How much does AVX-512 help Zen 5 in x265? And how to turn it on

Ryzen 9000 with AVX-512 in x265/Handbrake

You may not know this, but the x265 video encoder can use AVX-512 instructions. For historical reasons, however, they are left unused by default. In the past we made a guide showing how to enable these optimizations and looked at their effect, first on Intel Rocket Lake processors and then on Zen 4. Due to the popularity of those articles, we’ve now repeated the same tests on the new Zen 5 architecture, for comparison with previous cores.

This article is not meant as an extensive evaluation of the usefulness of AVX-512 or Zen 5. Our earlier, limited tests of AVX-512 came about mainly because we had to deal with one fact: the x265 encoder used in our methodology can use AVX-512 to improve performance, but it has these optimizations disabled by default. Enabling them is neither automatic nor completely straightforward (unless you run the plain standalone x265 binary from the command line), so our standard methodology does not include results measured with these instructions either.

So the encoding speed could be a bit better on Ryzen 9000 processors if you take the trouble to enable them. We’ll show you how to do that, and add some performance tests showing how Ryzen 9000 processors perform with AVX-512 in x265 compared to the previous generation and Intel Rocket Lake processors that also provided AVX-512.

Why doesn’t x265 use AVX-512 by default? One reason is that these optimizations cover only some encoder operations, so the FPS improvement is relatively small. Don’t expect anything remotely close to the 100% improvement that a 2× wider SIMD vector could theoretically achieve in isolated operations. At the same time, by processing 2× more data in a single instruction, AVX-512 also increased power consumption considerably when it first appeared. The first-generation Xeon Scalable processors (Skylake-SP), which were Intel’s SIMD state of the art when these optimizations were written, reduced clock speeds when running these instructions and suffered other, more complex performance penalties (activating and deactivating the 512-bit units triggered a heavy-handed transition mode with hidden performance penalties, and frequent switching itself killed performance). It was therefore not easy to exploit the performance potential of the server processors of that time.

When these optimizations made it into x265, it turned out that on Xeons the extra performance did not fully make up for the reduced clock speed. Because of this, it was decided to leave them disabled by default, so x265 won’t use them unless you force it to (you can read about the context here). This setting has persisted ever since and remains unchanged today in 2024. You can verify this by opening the encoding log and finding the line starting with “x265 [info]: using cpu capabilities:”. With the default settings, you’ll find something like this:

Indication of instruction sets used with default x265 settings

The screenshot was taken in the current Nightly build of Handbrake (2024092301) on a Ryzen 9 7950X processor, which also supports AVX-512, yet you can see that the encoder only automatically enabled optimizations using BMI2 and AVX2 instructions.

The developers previously recommended turning on AVX-512 when, for example, you are encoding 4K with very slow settings. However, if your processor is overclocked to a fixed clock speed or for some other reason does not reduce clock speeds when AVX-512 is running, you should generally always see an increase in encoding speed. This is the case with Rocket Lake processors, which, at least on Z590 boards, should keep clock speeds high even with AVX-512. Ryzen 7000 processors with Zen 4 cores should not reduce clock speeds at all (they also only have 256-bit units, so you can’t expect much extra performance from them).

The new Ryzen 9000 processors also don’t seem to suffer from underclocking when using AVX-512 instructions, or at least not significantly, and they don’t have the problems with switching between 256-bit and 512-bit mode that the former Intel Skylake-SP and Skylake-X processors did. Thus, it should be a good idea to enable AVX-512 on them as well.

How to enable AVX-512?

AVX-512 in x265 is enabled with the --asm avx512 parameter. Use it if you run x265.exe directly. If you use a GUI or another frontend, you need to find out how to pass this parameter through it to x265.
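For a standalone run, the parameter simply goes on the x265 command line. The sketch below only assembles such a command in Python. The executable name (x265, or x265.exe on Windows), the file names and the preset are placeholders; --asm avx512 is the part this article is about. --input, --output and --preset are standard x265 options, but it is worth checking x265 --help for your particular build.

```python
import shlex


def build_x265_command(src: str, dst: str, preset: str = "slow", avx512: bool = True) -> list[str]:
    """Assemble an x265 command line; src, dst and preset are placeholders."""
    cmd = ["x265", "--input", src, "--preset", preset, "--output", dst]
    if avx512:
        # AVX-512 is off by default in x265; --asm avx512 opts in explicitly.
        cmd += ["--asm", "avx512"]
    return cmd


cmd = build_x265_command("input.y4m", "output.hevc")
print(shlex.join(cmd))  # x265 --input input.y4m --preset slow --output output.hevc --asm avx512
# To actually run it: subprocess.run(cmd, check=True)
```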

In HandBrake this is done in the Video encoder settings: select an x265 profile, and at the bottom, in the “Advanced Options” box, you can see the command-line parameters that are passed to x265. There should already be a few there. Add a colon (no spaces) at the end to separate the parameters, then add “asm=avx512” after it, without the quotation marks (see image).
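The edit in the Advanced Options box amounts to a small string operation: split on colons, drop any asm= entry that is already there, and append asm=avx512. A minimal sketch, assuming the box holds plain key=value pairs separated by colons; the sample options are illustrative, not HandBrake’s defaults.

```python
def add_asm_avx512(options: str) -> str:
    """Return a colon-separated x265 option string with asm=avx512 appended."""
    # Split on colons, trim stray spaces and skip empty entries (e.g. a trailing colon).
    parts = [p.strip() for p in options.split(":") if p.strip()]
    # Drop an existing asm=... entry so the parameter is not set twice.
    parts = [p for p in parts if not p.startswith("asm=")]
    parts.append("asm=avx512")
    return ":".join(parts)


print(add_asm_avx512("aq-mode=3:psy-rd=2.0"))  # aq-mode=3:psy-rd=2.0:asm=avx512
print(add_asm_avx512(""))                      # asm=avx512
```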

Enabling AVX-512 on x265 in Handbrake

After this, the log should show that x265 also uses AVX-512. If your processor supports AVX-512, the line listing instruction set extensions should look like this:

Indication of used instruction sets after manual activation of AVX-512 (on Ryzen 9 7950X)
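If you would rather check a saved log than read it by eye, a small parser can pull out that line. This sketch assumes the line looks like the one in the screenshots: the prefix “x265 [info]: using cpu capabilities:” (possibly after a HandBrake timestamp) followed by space-separated names, with AVX-512 shown as a name starting with AVX512. The sample log lines are illustrative, not copied from our runs.

```python
CAPS_PREFIX = "x265 [info]: using cpu capabilities:"


def cpu_capabilities(log_text: str) -> list[str]:
    """Return the names listed on the first 'using cpu capabilities' line, or []."""
    for line in log_text.splitlines():
        idx = line.find(CAPS_PREFIX)
        if idx != -1:
            return line[idx + len(CAPS_PREFIX):].split()
    return []


def avx512_in_use(log_text: str) -> bool:
    """True if any listed capability name starts with AVX512 (assumed naming)."""
    return any(name.startswith("AVX512") for name in cpu_capabilities(log_text))


default_log = "[10:15:02] x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2"
forced_log = default_log + " AVX512"
print(avx512_in_use(default_log), avx512_in_use(forced_log))  # False True
```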

What does it do: Is AVX-512 in Zen 5 better than Intel’s?

But most of you are probably wondering how good the AVX-512 implementation is and how much performance it adds (or how much it hurts power consumption, if you remember the Rocket Lake test). Since the previous tests give us a baseline, we ran the same tests on Ryzen 9000 processors. Because the test methodology is unchanged (including the x265 and Handbrake versions), the AVX-512 results and gains should be directly comparable to the previous 2021 and 2022 tests.

With Zen 4, we found that turning on AVX-512 in x265 brought a fairly small benefit that was barely worth the effort: roughly 2% or less, whereas on Rocket Lake the performance of the same processor at the same clock speed improved by 7.5–9.0%. This is because Zen 4 supports these 512-bit instructions but executes them on 256-bit units in two passes. Still, the effect on Zen 4 is a net positive (with no penalty in power consumption or other drawbacks), so it is worth having these instructions enabled on Ryzen 7000 processors (and Ryzen 8000 APUs) with the Zen 4 architecture. It is also good news for AMD that Zen 4 shows no slowdown caused by inefficiencies of AVX-512 code compared to AVX2-only code. That was not a given, because the x265 optimizations were developed and tested on a processor with 512-bit units, and an Intel processor with a different microarchitecture at that, long before anyone could check whether they perform as expected on Zen 4.

But what about Zen 5? Its desktop version has full 512-bit units (the mobile Ryzen AI 300 series does not; it has 256-bit units), and for instructions working with integer data types, which are what multimedia code primarily uses, it can even perform four 512-bit operations per cycle (in the case of integer SIMD addition). So it should have even better theoretical SIMD performance than Rocket Lake, whose Cypress Cove core (the 14nm version of Sunny Cove from the 10nm Ice Lake processors) can perform at most two 512-bit integer additions per cycle. Zen 5 also has significantly better floating-point performance, while Rocket Lake has only half the 512-bit FMA throughput of the server Skylake-SP and Ice Lake-SP. But that won’t come into play anywhere in x265, as it doesn’t use floating-point instructions significantly (cutree rate control is perhaps the only exception, and even that is a minor factor).
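The per-cycle figures above translate into peak integer-add throughput with simple arithmetic. This back-of-the-envelope sketch only multiplies operations per cycle by vector width; it ignores latency, memory bandwidth and the scalar parts of the encoder, so it gives an upper bound rather than a prediction of x265 speed. The 16-bit element width is an illustrative choice.

```python
def peak_bytes_per_cycle(ops_per_cycle: int, vector_bits: int) -> int:
    """Peak bytes processed per cycle by SIMD integer adds (ignores everything else)."""
    return ops_per_cycle * vector_bits // 8


def peak_adds_per_cycle(ops_per_cycle: int, vector_bits: int, element_bits: int) -> int:
    """Peak number of element-wise integer additions per cycle."""
    return ops_per_cycle * (vector_bits // element_bits)


zen5 = peak_bytes_per_cycle(4, 512)          # four 512-bit integer adds per cycle
cypress_cove = peak_bytes_per_cycle(2, 512)  # two 512-bit integer adds per cycle
print(zen5, cypress_cove)                    # 256 128
# With 16-bit elements (e.g. pixel residuals), each 512-bit vector holds 32 lanes.
print(peak_adds_per_cycle(4, 512, 16), peak_adds_per_cycle(2, 512, 16))  # 128 64
```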

Below you can see the results of our encoding performance measurements on the Ryzen 5 9600X, Ryzen 7 9700X, Ryzen 9 9900X and Ryzen 9 9950X, both without and with AVX-512. The chart also includes the AVX-512 gains measured earlier on the Intel Core i9-11900K, i7-11700KF and i5-11400F representing Rocket Lake, and on the Ryzen 9 7900X and Ryzen 5 7600X representing Zen 4.

In the standard HWCooling reviews, only results without AVX-512 are included and only those are taken into account, but as you can see, the results of Zen 5 architecture processors would improve a little in this one discipline (even compared to Zen 4) if you are willing to turn on these optimizations in x265.

In the encoding task HWCooling uses for testing, AVX-512 brings +6% on the Ryzen 9 9900X and +7% on the Ryzen 9 9950X and Ryzen 5 9600X, while the Ryzen 7 9700X gains as much as +9%. These differences are not due to the instructions behaving differently on each processor; they come from other performance factors such as scaling to a given number of threads. Some of the variation is probably also due to the fact that, unfortunately, Handbrake rounds the resulting FPS to one decimal place, which is not precise enough.
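To get a feel for how much that rounding matters, the sketch below computes the range of true gains consistent with two FPS values displayed to one decimal place. The readings of 10.0 and 10.7 FPS are hypothetical, not our measurements. With them, a nominal +7% could really be anywhere from about +6% to +8%, and the band widens as FPS drops.

```python
def gain_bounds(fps_off: float, fps_on: float, step: float = 0.1) -> tuple[float, float]:
    """Range (in %) of the true AVX-512 gain consistent with FPS values rounded to `step`."""
    half = step / 2  # a displayed value can be off by up to half a step either way
    low = (fps_on - half) / (fps_off + half) - 1   # "on" was rounded up, "off" rounded down
    high = (fps_on + half) / (fps_off - half) - 1  # the opposite worst case
    return low * 100, high * 100


# Hypothetical readings: 10.0 FPS without AVX-512, 10.7 FPS with it (nominally +7%).
low, high = gain_bounds(10.0, 10.7)
print(f"{low:.1f}% to {high:.1f}%")  # 6.0% to 8.0%
```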

However, with a gain of +6% to +9%, the speed improvement from AVX-512 appears to be basically the same on processors with Zen 5 cores as on Intel Rocket Lake cores.

While encoding in x265 with AVX-512 active on Ryzen 9000 processors, we also measured power consumption. This time we did not determine how much AVX-512 itself increases power consumption (we don’t have numbers for the run with the optimizations disabled), but there doesn’t seem to be much need for it. Even with AVX-512 on, the Ryzen 9000 processors’ average power consumption stays virtually the same as in the Handbrake test with x264 (where AVX-512 should be used automatically); the differences are only a few watts. At the same time, power consumption is even lower than in the multithreaded Cinebench R23 workload, which does not use AVX-512 at all (it mostly uses 128-bit SIMD). So, at least in x265, AVX-512 does not seem to increase power consumption significantly. In multithreaded applications, these processors will simply draw what they typically draw in other multithreaded tasks.

The achieved clock speeds are also roughly the same (within a few dozen MHz) as those Ryzen 9000 processors run at in Cinebench R23, and temperatures are virtually identical to Cinebench R23 as well. Although we have no direct comparison with AVX-512 off to back this up, there seems to be no adverse thermal effect on Zen 5 when encoding with AVX-512. So it makes no sense not to use these instructions if your encoding application lets you enable AVX-512 (in Handbrake you can make this automatic by copying and editing the profile you use).
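If you keep many presets, the same edit can be applied to an exported HandBrake preset file instead of clicking through each one. This is a sketch under assumptions: the key names PresetList, Folder, ChildrenArray, VideoEncoder and VideoOptionExtra reflect our understanding of HandBrake’s exported JSON and should be checked against your own export. The function names and file names are hypothetical. Re-import the patched file in HandBrake afterwards.

```python
import json


def _add_asm_avx512(options: str) -> str:
    # Same colon-separated edit as in the Advanced Options box.
    parts = [p.strip() for p in options.split(":") if p.strip() and not p.strip().startswith("asm=")]
    return ":".join(parts + ["asm=avx512"])


def patch_presets(presets: list) -> int:
    """Add asm=avx512 to every x265 preset in a (possibly nested) list; return how many changed."""
    changed = 0
    for preset in presets:
        if preset.get("Folder"):
            # Folders are assumed to hold their presets in ChildrenArray.
            changed += patch_presets(preset.get("ChildrenArray", []))
        elif str(preset.get("VideoEncoder", "")).startswith("x265"):
            # Matches x265 as well as the 10-bit and 12-bit variants.
            preset["VideoOptionExtra"] = _add_asm_avx512(preset.get("VideoOptionExtra") or "")
            changed += 1
    return changed


def patch_preset_file(src: str, dst: str) -> int:
    """Read an exported preset file, patch the x265 presets and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    changed = patch_presets(data.get("PresetList", []))
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return changed


# Example: patch_preset_file("my_presets.json", "my_presets_avx512.json")
```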


As a reminder, on Rocket Lake processors turning on AVX-512 brought a performance increase similar to Zen 5, but it came with a 28–30% increase in power consumption (we actually measured it at the time), and temperatures jumped accordingly. For example, on the Core i9-11900K, consumption measured on the 12V cable went from 226 W to 292 W! In contrast, the Ryzen 7 9700X with the same eight cores and 16 threads consumes 104 W and performs 38% better. The Ryzen 9 9950X with 16 cores approaches 250 W (all these numbers include VRM losses, so the CPU’s actual power is likely below its 230 W PPT), but delivers 156% higher performance.
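Put as performance per watt, these figures make the gap even clearer. The sketch assumes that the “38% better” and “156% higher” figures are relative to the Core i9-11900K running with AVX-512 at 292 W, and it uses 250 W for the 9950X even though it draws slightly less, so its result is a lower bound. All numbers are 12V-cable readings including VRM losses, so the ratios compare like with like.

```python
def relative_efficiency(perf_ratio: float, watts: float, baseline_watts: float) -> float:
    """Performance per watt relative to a baseline whose performance is 1.0."""
    return perf_ratio / (watts / baseline_watts)


I9_11900K_AVX512_W = 292.0  # Core i9-11900K with AVX-512, 12V cable incl. VRM losses

# Ryzen 7 9700X: 104 W, 38% faster
print(round(relative_efficiency(1.38, 104.0, I9_11900K_AVX512_W), 2))  # 3.87
# Ryzen 9 9950X: "almost 250 W" taken as 250 W, 156% faster
print(round(relative_efficiency(2.56, 250.0, I9_11900K_AVX512_W), 2))  # 2.99
```

In other words, the eight-core Zen 5 part does nearly four times as much x265 work per watt as the eight-core Rocket Lake flagship.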

Conclusion: A free bonus

So AVX-512 clearly helps Zen 5 in x265. Personally, I was a bit surprised that the performance increase after turning on AVX-512 is not somewhat higher, considering that Zen 5’s SIMD unit is quite a bit more powerful than Rocket Lake’s. Zen 5 has four pipelines for 512-bit instructions and Rocket Lake only two (though it can execute three 256-bit operations per cycle). But if you compare Zen 5 with AVX-512 against Rocket Lake with AVX-512, you can immediately see the significantly higher performance per MHz (note the lead of the Ryzen 7 9700X over the Core i9-11900K, even though it runs at a slightly lower clock speed), so Zen 5’s more advanced SIMD unit does show its strength.

It’s possible Zen 5 would do a bit better if the AVX-512 code had been developed against its own SIMD architecture. Because the developers wrote, tested and profiled the code on Skylake processors, it may be better tuned for Intel cores. Such an effect might not be large, though, as Zen 5’s AVX-512 implementation seems to be generally strong, with few weaknesses.

A performance increase of 6 to 9% in this particular program (which comes without any degradation in output quality!) may not seem like much. But code like x264 and x265 is not easily parallelizable; on the contrary, it requires quite intelligent manual optimization in assembly. Only about half of the CPU time is actually spent in SIMD code (the rest is scalar, non-parallelizable code), and even within the SIMD portion, not all operations can be scaled to 512-bit vectors. Note also that even without AVX-512, the encoder is already very thoroughly optimized. Reaching a further overall speedup of this size without 512-bit vectors would probably require very extensive work and refactoring of all the important functions, possibly over many years, and a similar result might not be achievable at all without discovering some clever new tricks. Even then, such gains would probably stack with the benefit of the AVX-512 optimizations, so these instructions are definitely useful.
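One way to see why even a strong SIMD unit yields single-digit gains is Amdahl’s law. This framing is ours, not something stated by the x265 developers. Taking the “about half” of CPU time in SIMD code at face value (p = 0.5), even if every SIMD routine ran twice as fast, the encoder as a whole would be at most about 33% faster. The measured +6% to +9% corresponds to the SIMD portion as a whole running roughly 13 to 20% faster. The real split varies with preset and content, so treat these as illustrative numbers.

```python
def amdahl(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the runtime is sped up by a factor s (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)


def required_speedup(p: float, overall: float) -> float:
    """Speedup the fraction p must get to yield `overall` (Amdahl's law solved for s).

    Only meaningful for overall < 1 / (1 - p); beyond that no finite s suffices.
    """
    return p / (1.0 / overall - (1.0 - p))


P_SIMD = 0.5  # "about half" of CPU time in SIMD code, per the text; an approximation

print(f"{amdahl(P_SIMD, 2.0):.2f}")  # 1.33, the ceiling if ALL SIMD code ran 2x faster
for overall in (1.06, 1.09):
    print(f"{overall:.2f} -> {required_speedup(P_SIMD, overall):.2f}")
# 1.06 -> 1.13
# 1.09 -> 1.20
```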

Further reading

However, there are applications that parallelize more easily, where AVX-512 on Zen 5 brings significantly greater benefits. Examples can be found in Phoronix’s testing of the impact of AVX-512 on Linux.

If you are interested in AVX-512 on Zen 5, we recommend the article in which Alexander Yee, the author of y-cruncher, discusses Zen 5’s implementation of this instruction set extension:

Read more (external link): Zen5’s AVX512 Teardown + More…

If you’re particularly interested in encoding with x265 (or other encoders) on Zen 5, detailed tests comparing the Ryzen 9 7950X and Ryzen 9 9950X in x264, x265 (including its 10-bit variant) and SVT-AV1 have appeared online. What makes them interesting is that they measure performance not in just one encoder preset but in several, so you can pick the results that match whether you prefer fast compression or maximum quality. In fact, the performance ratio between Zen 4 and Zen 5 can vary quite a bit depending on the speed preset. Unfortunately, those tests appear to have been run without AVX-512 enabled in x265.

English translation and edit by Jozef Dudáš

