Why is Zen 4 so fast in Topaz Labs AI apps? In fact it’s Intel’s doing

Zen 4 and VNNI

Ryzen 7000 with the Zen 4 architecture is the first AMD processor family to support 512-bit AVX-512 vector instructions. We’ve already discussed their benefits (bigger or smaller) in a separate article. But the Zen 4 cores support another instruction set extension that used to be Intel’s pride and joy, and now the roles have reversed a bit: VNNI. It seems to bring huge performance improvements in a number of apps, despite the limited 256-bit width of Zen 4 SIMD units.

You may have heard of VNNI (Vector Neural Network Instructions) before under the name DL Boost. This designation covered two things: the 512-bit VNNI instructions, also sometimes referred to as AVX512_VNNI, and support for operations on the BFloat16 data type (AVX512_BF16). The second extension was first featured in the Cooper Lake server Xeons; the first one (VNNI) was one of Intel’s highlights for the 10nm Ice Lake and Tiger Lake processors (10th and 11th generation Core for laptops).

Intel promised that VNNI instructions would dramatically increase the performance of these processors in neural network operations, the “AI” applications for which these instructions are explicitly designed. The instructions work with 16-bit and 8-bit precision (with integer values), which is useful for inference, i.e. for applying an already trained network. The company then partnered with Topaz Labs to have them use VNNI (via the OpenVINO framework) to optimize their applications (Gigapixel AI, Denoise AI, Video Enhance AI…).
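To make the 8-bit case more concrete, here is a minimal Python sketch of what a VNNI dot-product instruction computes. It models the non-saturating `VPDPBUSD` instruction as we understand it from Intel's documentation: each 32-bit lane multiplies four unsigned 8-bit values by four signed 8-bit values and adds the sum to a 32-bit accumulator. The function names `vpdpbusd_lane` and `vpdpbusd` are made up for this illustration, and the code says nothing about how Topaz Labs or OpenVINO actually implement their kernels.

```python
def vpdpbusd_lane(acc, a_bytes, b_bytes):
    """Model one 32-bit lane: acc + sum of four (unsigned 8-bit * signed 8-bit) products."""
    assert len(a_bytes) == len(b_bytes) == 4
    total = acc
    for a, b in zip(a_bytes, b_bytes):
        assert 0 <= a <= 255, "first operand is treated as unsigned 8-bit"
        assert -128 <= b <= 127, "second operand is treated as signed 8-bit"
        total += a * b
    # The non-saturating form wraps around like ordinary 32-bit signed arithmetic.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def vpdpbusd(acc, a, b, vector_bits=512):
    """Model a whole vector: vector_bits / 32 lanes, four byte pairs per lane."""
    lanes = vector_bits // 32
    assert len(acc) == lanes and len(a) == len(b) == lanes * 4
    return [
        vpdpbusd_lane(acc[i], a[4 * i:4 * i + 4], b[4 * i:4 * i + 4])
        for i in range(lanes)
    ]


if __name__ == "__main__":
    # One 256-bit operation: 8 lanes, 32 byte-sized multiplications in total.
    result = vpdpbusd([0] * 8, [1] * 32, [2] * 32, vector_bits=256)
    print(result)
```

With `vector_bits=512`, one call covers 16 lanes, i.e. 64 multiplications; with `vector_bits=256`, it covers 8 lanes and 32 multiplications. Without VNNI, the same multiply-and-accumulate step takes a sequence of several instructions.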

Intel then showcased Topaz Labs apps in its official benchmarks, where VNNI gave the 10th/11th generation quad-core mobile processors higher performance than they would normally achieve. At the time, the advantage over competing processors without VNNI was significant.

Upscaling with AI from Topaz Labs (source: Intel)

Previously an advantage for Intel, now for the competition

With the arrival of Zen 4, however, the tables have turned. Ironically, Intel removed support for AVX512_VNNI instructions from Alder Lake processors because they use 512-bit ZMM registers and are one of the subsets of AVX-512 (albeit a very specific one). Conversely, AMD has introduced these instructions with the Zen 4 core, so the advantage is now on its side.

In Topaz Labs apps, we did observe performance well above what the Ryzen 7000 averages in other programs in our reviews. The Ryzen 9 7900X was 90–126% faster than the Ryzen 9 5900X, and the Alder Lake processors took a similar beating: against those, the Ryzen 9 7900X is 75–95% faster in these tests, which isn’t in line with results common in other benchmarks and apps. And the 7900X isn’t even the most powerful model in AMD’s Zen 4 lineup. We’ll see if the Ryzen 9 7950X manages to scale even higher. However, even the hexa-core Ryzen 5 7600X already shows really high performance.

Zen 4 Benchmarks: AI applications Topaz Labs



Such an extraordinary performance increase looks suspicious at first, but you may remember from the AVX-512 article that Phoronix found a number of tests using the OpenVINO framework (and hence probably VNNI instructions) where Zen 4 achieved a similar increase of up to 2×. So the explanation is fairly obvious: although the VNNI acceleration in Topaz Labs apps was originally designed for Intel processors, it is also automatically enabled on the Ryzen 7000.

Read more: AVX-512 on Ryzen 7000: how useful is it and is AMD’s implementation better than Intel’s?

We asked Topaz Labs directly about this and received confirmation that these programs do indeed use VNNI on Zen 4. And even though AMD implemented AVX-512 using 256-bit units, these instructions clearly deliver enough performance to make it worthwhile. So these scores are not some weird anomaly and do show a legitimate result. The speed boost is so unusually large because it comes from accelerating a specific operation, not from general code performance.

According to Topaz Labs, their applications should also use the form of VNNI called AVX2_VNNI or VNNI/256, which was created for Alder Lake processors. Since Intel disabled AVX-512 on these processors, the VNNI instructions using the same 512-bit registers had to be disabled as well. The small Gracemont cores don’t have them and only support AVX2 (apparently with 128-bit units). Because VNNI is so useful, though, Intel made the aforementioned AVX2_VNNI version for the hybrid processors, which works with just 256-bit registers. AVX2_VNNI should have just half the compute throughput (but so should Zen 4, given its double-pumped 256-bit operation), and it will probably also be slower on the E-cores than on the Golden Cove P-cores.
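If you want to check which of the two VNNI variants your own processor reports, the following Python sketch parses the CPU feature flags on Linux. It assumes the flag names the Linux kernel uses in `/proc/cpuinfo`, namely `avx512_vnni` for the 512-bit form and `avx_vnni` for the 256-bit form; the helper name `vnni_support` is our own invention. It only tells you what the CPU advertises, not which code path a particular application actually picks.

```python
from pathlib import Path


def vnni_support(cpuinfo_text):
    """Return which VNNI variants appear in the 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {
        "avx512_vnni": "avx512_vnni" in flags,  # 512-bit form (AVX512_VNNI)
        "avx_vnni": "avx_vnni" in flags,        # 256-bit form (AVX2_VNNI in this article)
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only; other systems need a different source
    if cpuinfo.exists():
        print(vnni_support(cpuinfo.read_text()))
    else:
        print("No /proc/cpuinfo here, cannot detect VNNI this way.")
```

The function takes the text as an argument, so it can also be fed a saved copy of `/proc/cpuinfo` from another machine.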

Intel slide advertising the high performance of Topaz Labs AI applications, enabled by the AVX512_VNNI instructions of Ice Lake and Tiger Lake processors (source: Intel)

And as the Core i9-12900K results show, the lower performance of AVX2_VNNI compared with the Zen 4 implementation is a very real thing. We originally wondered whether Topaz Labs’ AI applications perhaps ignore the AVX2_VNNI instructions in Alder Lake (or were not yet modified to make use of them), but the company says that this 256-bit version is actually used and thus Alder Lake is benefiting from it in these tests. (Unless their detection and usage is implemented in a later version than the one our methodology uses.) On the other hand, the performance of other Intel processors that should have the original full-performance 512-bit version of VNNI (Rocket Lake, for example the Core i9-11900K) is relatively low too. Those don’t see the kind of brutal performance increase over their predecessor (Core i9-10900K) that Zen 4 does.

Who knows, perhaps Intel is now regretting that it invested in accelerating apps like Topaz Labs software via VNNI and OpenVINO, now that it sees how, at least for the moment, this benefits the competition more than Intel itself…

Sources: Topaz Labs, Intel

English translation and edit by Jozef Dudáš

