How much does AVX-512 help Zen 5 in x265? And how to turn it on

Ryzen 9000 with AVX-512 in x265/Handbrake

You may not know this, but the x265 video encoder can use AVX-512 instructions, which are left unused by default for historical reasons. In the past we published a guide showing how to enable these optimizations and looked at their effect, first on Intel Rocket Lake processors and then on Zen 4. Due to the popularity of those articles, we’ve now repeated the same tests on the new Zen 5 architecture, for comparison with previous cores.

This article is not an extensive evaluation of the usefulness of AVX-512 or of Zen 5. Our earlier, limited tests of AVX-512 came about mostly because we needed to address one fact: the x265 encoder used in our methodology can use AVX-512 to improve performance, but has these optimizations disabled by default. Because enabling them is neither automatic nor completely straightforward (unless you are running the plain standalone x265 binary from the command line), our standard methodology does not include results measured with these instructions either.

So the encoding speed could be a bit better on Ryzen 9000 processors if you take the trouble to enable them. We’ll show you how to do that, and add some performance tests showing how Ryzen 9000 processors perform with AVX-512 in x265 compared to the previous generation and Intel Rocket Lake processors that also provided AVX-512.

Why doesn’t x265 use AVX-512 by default? It’s because these optimizations only cover some encoder operations, and as a result the FPS improvement is relatively low. Don’t expect anything remotely close to the 100% improvement that a 2× wider SIMD vector could theoretically achieve in isolated operations. At the same time, by processing 2× more data in a single instruction, AVX-512 also increased power consumption by a lot when it first appeared. The first-generation Xeon Scalable processors (Skylake-SP), which were the Intel SIMD state of the art at the time these optimizations were written, reduced clock speeds when using these instructions and had other, more complex performance penalties (activating and deactivating the 512-bit units triggered a heavy-handed transition mode with hidden performance costs, and frequent switching itself killed performance). Therefore, it was not easy to exploit the performance potential on the server processors of that time.

When these optimizations got into x265, it was found that the performance gain on Xeons did not quite make up for the reduction in clock speed. Because of this, it was decided to leave them disabled by default, so x265 won’t use them unless you force it to (you can read about the context here). This setting has persisted ever since and is still unchanged today in 2024. You can verify this by opening the encoding log and finding the line starting with “x265 [info]: using cpu capabilities:”. With the default settings, you’ll find something like this:

Indication of instruction sets used with default x265 settings

The screenshot was taken in the current Nightly build of Handbrake (2024092301) on a Ryzen 9 7950X processor, which also supports AVX-512, yet you can see that the encoder only automatically enabled optimizations using BMI2 and AVX2 instructions.

The developers previously recommended turning on AVX-512 when, for example, you are encoding 4K with very slow settings. However, if your processor is overclocked to a fixed clock speed or for some other reason does not reduce clock speeds when AVX-512 is running, you should generally always see an increase in encoding speed. This is the case with Rocket Lake processors, which, at least on Z590 boards, should keep clock speeds high even with AVX-512. Ryzen 7000 processors with Zen 4 cores should not reduce clock speeds at all (they also only have 256-bit units, so you can’t expect much extra performance from them).

The new Ryzen 9000 processors also don’t seem to suffer from underclocking when using AVX-512 instructions, or at least not significantly, and they don’t have the problems with switching between 256-bit and 512-bit mode that the former Intel Skylake-SP and Skylake-X processors did. Thus, it should be a good idea to enable AVX-512 on them as well.

How to enable AVX-512?

The use of AVX-512 in x265 is enabled with the --asm avx512 parameter. Use this if you are running x265.exe directly. If you are using a GUI or another frontend, you need to find out how it passes this parameter to x265.
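For the standalone encoder, the command line can be put together as in the following minimal sketch. The file names and the helper function `build_x265_command` are hypothetical and only serve as an illustration; the `--asm avx512` parameter is the one described above. The sketch only builds and prints the command, it does not run the encoder.

```python
import shlex


def build_x265_command(input_path, output_path, enable_avx512=True):
    """Return an argument list for a standalone x265 run (hypothetical helper)."""
    argv = ["x265", "--input", input_path, "--output", output_path]
    if enable_avx512:
        # Ask x265 to also use its AVX-512 code paths (off by default).
        argv += ["--asm", "avx512"]
    return argv


# Placeholder file names, replace with your own source and target.
cmd = build_x265_command("input.y4m", "output.hevc")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` if you want to launch the encoder from a script.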

In HandBrake this is done in the Video encoder settings: select an x265 profile, and at the bottom, in the “Advanced Options” box, you will see the command line parameters that are passed to x265. There should already be a few there. What you need to do is add a colon (no spaces) at the end to separate the parameters, and then add “asm=avx512” after it (without the quotation marks, see image).
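The colon-separated format of that box can be illustrated with a small sketch. The function `add_x265_option` is a hypothetical helper, and the example string of pre-existing options is an assumption chosen only for illustration; your profile may contain different ones.

```python
def add_x265_option(advanced_options, option="asm=avx512"):
    """Append an option to a colon-separated x265 option string.

    If the same key (the part before '=') is already present,
    it is replaced instead of being added twice.
    """
    key = option.split("=", 1)[0]
    parts = [p.strip() for p in advanced_options.split(":") if p.strip()]
    parts = [p for p in parts if p.split("=", 1)[0] != key]
    parts.append(option)
    return ":".join(parts)


# Hypothetical contents of the "Advanced Options" box before the change.
existing = "strong-intra-smoothing=0:rect=0"
print(add_x265_option(existing))
```

The point of the helper is the separator rule: options are joined with a single colon and no spaces, which is what HandBrake hands over to x265.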

After this, a glance at the log should show that x265 now also uses AVX-512. The line about instruction set extensions should say this (if you have a processor with AVX-512):

Indication of used instruction sets after manual activation of AVX-512 (on Ryzen 9 7950X)
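If you would rather check the log automatically than read it by eye, a sketch like the following could work. It relies only on the line prefix quoted earlier; the two sample log lines are illustrative assumptions rather than copies of the screenshots, and the exact list of capabilities will differ by processor and x265 version.

```python
PREFIX = "x265 [info]: using cpu capabilities:"


def cpu_capabilities(log_text):
    """Return the capability names from the first matching log line."""
    for line in log_text.splitlines():
        pos = line.find(PREFIX)
        if pos != -1:
            return line[pos + len(PREFIX):].split()
    return []


def avx512_active(log_text):
    """True if the capabilities line lists an AVX-512 entry."""
    return any(cap.upper().startswith("AVX512") for cap in cpu_capabilities(log_text))


# Illustrative sample lines (assumed, not taken from a real log).
default_log = "x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2"
forced_log = default_log + " AVX512"

print(avx512_active(default_log))
print(avx512_active(forced_log))
```

Searching for the prefix anywhere in the line, instead of at its start, keeps the check working when a frontend puts a timestamp in front of the encoder output.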

What does it do: Is AVX-512 in Zen 5 better than Intel’s?

But most of you are probably wondering how good the AVX-512 implementation is and how much performance it adds (or how much it harms power consumption, if you still have the Rocket Lake test in your memory). Now that we have a basis for benchmarking from the previous tests, we ran the same tests on Ryzen 9000s. Thanks to the test methodology (including the same x265 and Handbrake versions) still being the same, the AVX-512 results and benefits should be directly comparable to the previous 2021 and 2022 tests:

Read more: Intel AVX-512 tested in x265: how to enable it and does it help?
Read more: How good is AMD’s AVX-512? Does it improve Zen 4 performance?

With Zen 4, we found that turning on AVX-512 in x265 had a fairly small benefit that was almost not worth the effort: the improvement was roughly 2% or less, while on Rocket Lake the performance of the same processor at the same clock speed improved by 7.5–9.0%. This is because Zen 4 does support the 512-bit instructions, but executes them on 256-bit units in two passes. Even so, the impact on Zen 4 is still a net positive (and it is not associated with higher power consumption or other drawbacks), so it is still beneficial to enable these instructions on Ryzen 7000 (and on Ryzen 8000 APUs) with the Zen 4 architecture. It is good news for AMD that code using AVX-512 does not run slower on Zen 4 than code using only AVX2 due to some inefficiency. That was not guaranteed, because the optimizations in x265 were developed and tested on a processor with 512-bit units, an Intel processor with a different microarchitecture at that, long before anyone could test how they perform on Zen 4.

But what about Zen 5? The desktop version of this core has full 512-bit units (the laptop version does not; the mobile Ryzen AI 300 has 256-bit units) and can perform up to four 512-bit operations per cycle for instructions working with integer data types, such as integer SIMD addition, which are what multimedia code primarily uses. So it should have even better theoretical SIMD performance than Rocket Lake, whose Cypress Cove core (the 14nm version of Sunny Cove from the 10nm Ice Lake processors) can perform a maximum of two 512-bit integer additions per cycle. Zen 5 also has significantly better floating-point performance, while Rocket Lake has just half the 512-bit FMA performance of the server Skylake-SP and Ice Lake-SP processors. But that is not going to come into play in x265, as it makes little use of floating-point instructions (cutree rate control being perhaps the only exception, and still a minor factor).
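To put the theoretical difference into numbers, here is a small arithmetic sketch using only the per-cycle figures quoted above. It describes peak integer addition throughput per core and per cycle, not real encoder performance.

```python
VECTOR_BITS = 512

# 512-bit integer additions per cycle, as stated in the text.
ADDS_PER_CYCLE = {
    "Zen 5 (desktop)": 4,
    "Rocket Lake (Cypress Cove)": 2,
}


def peak_bytes_per_cycle(ops_per_cycle, vector_bits=VECTOR_BITS):
    """Peak number of bytes processed by integer additions in one cycle."""
    return ops_per_cycle * vector_bits // 8


for core, ops in ADDS_PER_CYCLE.items():
    print(f"{core}: {peak_bytes_per_cycle(ops)} bytes per cycle")
```

On paper, that is a 2× advantage for Zen 5 in this one type of operation.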

Below you can see the results of our encoding performance measurements on the Ryzen 5 9600X, Ryzen 7 9700X, Ryzen 9 9900X and Ryzen 9 9950X, both without and with AVX-512. The chart also includes the AVX-512 benefit on the previously tested Intel Core i9-11900K, i7-11700KF and i5-11400F processors representing Rocket Lake, and on the Ryzen 9 7900X and Ryzen 5 7600X representing Zen 4.

In the standard HWCooling reviews, only results without AVX-512 are included and only those are taken into account, but as you can see, the results of Zen 5 architecture processors would improve a little in this one discipline (even compared to Zen 4) if you are willing to turn on these optimizations in x265.

The encoding task used by HWCooling for testing delivers a benefit of +6% on the Ryzen 9 9900X and +7% on the Ryzen 9 9950X and the Ryzen 5 9600X, while on the Ryzen 7 9700X the increase is as much as +9%. The differences are not due to the instructions behaving differently on each processor. They come from other performance factors such as scaling to a given number of threads, and some of the variation is probably due to the fact that Handbrake unfortunately rounds the resulting FPS to one decimal place, which is not accurate enough.

However, with +6% to +9% gain, it looks like the speed improvement of AVX-512 is basically the same on processors with Zen 5 cores as it is on Intel Rocket Lake cores.
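The effect of rounding the FPS to one decimal place can be estimated with a short sketch. The FPS values below are hypothetical and are not our measured results; they only show how wide the uncertainty of a percentage gain becomes when both inputs are rounded.

```python
def gain_percent(base_fps, avx512_fps):
    """Speed-up of the AVX-512 run over the baseline run, in percent."""
    return (avx512_fps / base_fps - 1.0) * 100.0


def gain_bounds_percent(base_fps, avx512_fps, step=0.1):
    """Lowest and highest gain consistent with values rounded to `step`.

    A value reported as x may really lie anywhere in [x - step/2, x + step/2].
    """
    half = step / 2.0
    lowest = gain_percent(base_fps + half, avx512_fps - half)
    highest = gain_percent(base_fps - half, avx512_fps + half)
    return lowest, highest


# Hypothetical rounded readings, for illustration only.
base, avx512 = 10.0, 10.7
low, high = gain_bounds_percent(base, avx512)
print(f"nominal gain: {gain_percent(base, avx512):.1f}%")
print(f"possible range: {low:.1f}% to {high:.1f}%")
```

With readings of this magnitude, the true gain could sit about one percentage point on either side of the nominal value, which is comparable to the differences between the individual processors.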

When encoding in x265 with AVX-512 active on Ryzen 9000 processors, we also measured power consumption. This time we did not determine the increase caused by AVX-512 itself (we don’t have numbers for a run with the optimizations disabled), but there doesn’t seem to be much need for that either. Even with AVX-512 on, the Ryzen 9000 power consumption stays at virtually the same average values as these CPUs draw in the Handbrake test with x264 (where AVX-512 should be used automatically); the differences are a few watts at most. At the same time, the power consumption is even lower than in the Cinebench R23 multithreaded workload, where AVX-512 is not used at all (mostly 128-bit SIMD is used). Thus, at least when running x265, power consumption does not seem to increase significantly due to AVX-512. In multithreaded applications, it will simply be what these processors typically show in other multithreaded tasks.

The achieved clock speeds are also roughly the same (within a few dozen MHz) as those that Ryzen 9000 processors run at in Cinebench R23, and we also get virtually identical temperatures as in Cinebench R23. Although we don’t have a measurement with AVX-512 disabled to back this up, it seems that encoding with AVX-512 has no adverse effect on the thermals of Zen 5. And so it makes no sense not to use these instructions if your encoding application allows you to enable AVX-512 (in Handbrake you can automate this by copying and editing the profile you use).

As a reminder, on Rocket Lake processors we saw a similar increase in performance after turning on AVX-512 as on Zen 5, but it led to a 28-30% increase in power consumption (we actually measured it at the time), and temperatures jumped up accordingly. For example, on the Core i9-11900K, the consumption measured on the 12V cable went from 226W to 292W! In contrast, the Ryzen 7 9700X with the same eight cores and 16 threads consumes 104 W and performs 38% better. The Ryzen 9 9950X with 16 cores almost reaches 250W consumption (all these numbers are with VRM losses included though, so actual power is likely below the CPU’s 230W PPT), but at 156% higher performance.
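The figures in the previous paragraph can be turned into a rough efficiency comparison. The sketch below assumes that the quoted 38% and 156% performance leads are relative to the Core i9-11900K with AVX-512 enabled (the 292W state), and it treats 250W as an upper bound for the Ryzen 9 9950X; both are our reading of the numbers, not separate measurements.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


def relative_perf_per_watt(perf_ratio, watts, ref_watts):
    """Performance per watt relative to a reference CPU (reference = 1.0)."""
    return perf_ratio / (watts / ref_watts)


I9_11900K_AVX512_W = 292.0  # 12V cable power with AVX-512 on

rise = percent_increase(226.0, I9_11900K_AVX512_W)
r7_9700x = relative_perf_per_watt(1.38, 104.0, I9_11900K_AVX512_W)
r9_9950x = relative_perf_per_watt(2.56, 250.0, I9_11900K_AVX512_W)

print(f"11900K power increase with AVX-512: {rise:.1f}%")
print(f"Ryzen 7 9700X perf/W vs 11900K: {r7_9700x:.2f}x")
print(f"Ryzen 9 9950X perf/W vs 11900K: at least {r9_9950x:.2f}x")
```

Under these assumptions, the 226W to 292W jump works out to roughly 29%, and the Ryzen 7 9700X delivers close to four times the performance per watt of the Core i9-11900K in this test.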

Conclusion: A free bonus

So AVX-512 is clearly helpful in x265 on Zen 5. Personally, I was a bit surprised that the performance increase after turning on AVX-512 is not somewhat higher, considering that the SIMD unit of Zen 5 is quite a bit more powerful than the one in Rocket Lake. Zen 5 has four pipelines for 512-bit instructions and Rocket Lake only two (it can do three 256-bit ops per cycle though). But if you compare Zen 5 with AVX-512 against Rocket Lake with AVX-512, you can immediately see the significantly higher performance per MHz (see the performance edge of the Ryzen 7 9700X over the 11900K, even though it runs at a slightly lower clock speed), so the more advanced SIMD unit of Zen 5 does show its strength.

It’s possible Zen 5 would do a bit better if the AVX-512 code had been developed against the core’s own SIMD unit architecture. The fact that the developers were writing, testing and profiling the code on Skylake processors may have left it better tuned for the Intel core. This effect might not be large though, as Zen 5’s AVX-512 implementation seems to be generally strong with few weaknesses.

A performance increase of 6 to 9% in this particular program (which comes without any degradation in output quality!) may not seem like much. But code like x264 and x265 is not easily parallelizable. On the contrary, it requires quite intelligent manual optimizations in assembly, only about half of the CPU time is actually spent in SIMD code (the rest is scalar and non-parallelizable), and even within the SIMD-capable portion, not all operations can be scaled to 512-bit vectors. Note that even without AVX-512, the encoder is already very thoroughly optimized. Achieving a further overall speed increase of this size without 512-bit vectors would probably require very extensive work and refactoring in all important functions, possibly over many years, and it is possible that a similar result could not be achieved at all without the discovery of some clever new tricks. Even in that case, the benefit would probably be cumulative with the effect of the optimizations that exploit AVX-512, so these instructions are definitely useful.
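The reasoning about the limited SIMD share can be made concrete with Amdahl’s law. The sketch below takes the rough “about half of the CPU time” estimate from the paragraph above as the SIMD fraction; that fraction is an approximation, so the outputs are ballpark values only.

```python
def overall_speedup(simd_fraction, simd_speedup):
    """Amdahl's law: total speed-up when only part of the work gets faster."""
    return 1.0 / ((1.0 - simd_fraction) + simd_fraction / simd_speedup)


def implied_simd_speedup(simd_fraction, overall):
    """Invert Amdahl's law: how much faster the SIMD part must have become."""
    remaining_simd_time = 1.0 / overall - (1.0 - simd_fraction)
    return simd_fraction / remaining_simd_time


SIMD_FRACTION = 0.5  # "about half of the CPU time", a rough estimate

# Best case: every SIMD operation runs twice as fast with 512-bit vectors.
print(f"ideal overall gain: {(overall_speedup(SIMD_FRACTION, 2.0) - 1) * 100:.0f}%")

# What the measured overall gains imply for the SIMD part alone.
for gain in (1.06, 1.09):
    s = implied_simd_speedup(SIMD_FRACTION, gain)
    print(f"overall {gain:.2f}x -> SIMD part about {s:.2f}x faster")
```

Even a perfect doubling of all SIMD code would give only about a one-third overall gain under this assumption, so an overall improvement of 6 to 9% corresponds to the SIMD portion itself running roughly 13 to 20% faster.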
