Statistics from game developers reportedly show that tens of percent of all Core i9-13900K and i9-14900K processors suffer from crashes in games
It will soon be six months since the game crashes on 13th and 14th generation Intel Core CPUs became widely known. Intel has largely kept silent while looking for the root cause, but the issue is unfortunately still ongoing, and not at all rare. This has now been substantiated by an investigation by the YouTube channel Level1Techs, which says that tens of percent of these processors are displaying the problems.
Level1Techs turned to sources that may have more extensive data on how common the problems are for Raptor Lake processor owners. The first are game developers who have telemetry on how their games run on user PCs (reportedly a database of crashes from two games using Unreal Engine). The second are operators of servers built on Raptor Lake processors. Such machines are often used to run game servers, so this information probably also came from contacts at game development companies.
The instability issues are mostly reported on the 125W TDP Core i9 and Core i7 processors of the 13th and 14th generations (the 2022 Raptor Lake and 2023 Raptor Lake Refresh families). Fortunately, they don’t seem to affect the older 12th generation Alder Lake, at least for now. For simplicity’s sake, Level1Techs looked at what the error databases from the games show for the Core i9-13900K and Core i9-14900K CPUs.
Game crashes on Raptor Lake actually happening in practice
The best-known “detector” of instability on Intel processors is game crashes during shader decompression using the Oodle library, which first pointed the finger at processor instability and through which the problem was first widely observed. According to data obtained by Level1Techs, logged Oodle crashes are very strongly correlated with the problematic Intel Raptor Lake processors. In the error database, 1431 Oodle decompression crashes were recorded for the suspect processors, while a mere four incidents were recorded on all AMD processors in the dataset (with roughly 30% of players reportedly running an AMD processor, so there is a clear discrepancy).
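To put the size of that discrepancy in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures quoted above and makes one loud simplifying assumption that is not in the source: that the only players in the pool are the suspect Raptor Lake owners and the AMD owners, and that crashes would otherwise simply track player share. The function name is hypothetical and the result is illustrative, not a statistical analysis of the actual telemetry.

```python
# Figures quoted in the article (Level1Techs' summary of game telemetry).
raptor_lake_crashes = 1431  # Oodle decompression crashes on the suspect Intel CPUs
amd_crashes = 4             # Oodle decompression crashes on all AMD CPUs
amd_player_share = 0.30     # approximate share of players on AMD


def expected_amd_crashes(total_crashes: int, amd_share: float) -> float:
    """Crashes AMD players would log if crashes simply tracked player share.

    ASSUMPTION (not from the source): the crash pool consists only of the
    suspect Raptor Lake CPUs and AMD CPUs, and every player is equally
    likely to hit a crash regardless of platform.
    """
    return total_crashes * amd_share


total = raptor_lake_crashes + amd_crashes
expected = expected_amd_crashes(total, amd_player_share)
shortfall_factor = expected / amd_crashes

print(f"Expected AMD crashes under the assumption: about {expected:.0f}")
print(f"Observed AMD crashes: {amd_crashes}")
print(f"AMD logged roughly {shortfall_factor:.0f}x fewer crashes than expected")
```

Under that assumption, AMD systems would be expected to log a few hundred such crashes rather than four, which is why the article calls the discrepancy clear. The real gap depends on population details the source does not give, such as how many players use other Intel processors.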
According to Level1Techs’ analysis, it appears that up to 20–30% of gamers with a Core i9-13900K or Core i9-14900K processor experienced at least one “crash” while playing the games tracked. However, the Raptor Lake processors associated with these bugs are also quite often showing errors when accessing game data from a drive, according to game telemetry, so it seems that instability may be manifesting on this level as well. The same log databases from gaming telemetry did not show similar IO errors on the AMD platform.
Importantly, the data also suggests that the chance of getting errors on these processors increases with the amount of time they have been in use. We can’t say that for sure yet, because the available data is hard to evaluate properly, but it does seem to point in that direction. If this were confirmed, it could support the suspicion that the dreaded silicon degradation phenomenon (gradual buildup of damage to the circuitry and loss of functionality) is the factor behind the problem.
Overclocking can’t be the only cause: systems with server motherboards are seeing issues too
A rather significant finding is that a large number of the errors on Core i9-13900K and Core i9-14900K processors are reportedly also occurring when these processors are used in servers. Such systems are mostly built around boards with the W680 chipset, which is supposed to be a more stable and reliable platform and does not allow CPU overclocking. Their problems therefore cannot be explained simply as the result of overly aggressive and unstable overclocking, and thus chalked up to the user (or to the board or PC manufacturer, if it set the overclock automatically) rather than to the CPU. That point is often made in Intel’s defense, but in light of the latest findings this explanation feels more and more untenable.
Level1Techs says in the video that the channel investigated the issue in collaboration with two server operators using Raptor Lake processors with W680 boards from Asus and SuperMicro. Systems with both vendors’ motherboards have reportedly experienced instability issues in roughly 50% of cases, with the percentage being very close for both brands. And again, it is alleged that the frequency of problems seems to increase with the accumulated time the processors have been operating. Some operators have also reportedly started charging several times higher support fees for dedicated hosting on hardware with Raptor Lake processors (Core i9-13900K / Core i9-14900K), explicitly citing the frequent hardware problems and service interventions they cause as the reason.
Information from sources within PC manufacturers’ circles that Level1Techs contacted in parallel with this investigation is reportedly more vague, but these sources also reportedly state that problems can be experienced on up to 10–25% of Core i9-13900K and Core i9-14900K processors, which are considered faulty or “marginal”. (The video focuses on these two SKUs, but apparently the problems also affect the lower-end 13th and 14th generation Core models to some degree.) The staff contacted reportedly still have no information as to whether there is a known root cause from Intel, which is consistent with what the company is currently saying publicly about the status of the investigation.
Level1Techs also talks about attempts to restore the affected servers to operational status and get the 125W Core i9 processors to work without errors. Stability is supposedly improved by updating to the latest available board BIOSes, but unfortunately those don’t solve the problem completely. Lowering the maximum clock speed (CPU multiplier) to 5300 MHz, and particularly slowing down the memory (all the way to DDR4-4200 if four modules are used), supposedly helps. Turning off the E-Cores is sometimes offered or tried as a solution, but this is said not to help very reliably, so the E-Cores do not seem to be strongly tied to the problems.
It’s been a very long time and almost no communication from Intel
In any case, it’s troubling that this problem is apparently widespread, yet Intel hasn’t addressed it properly for almost half a year. The company only confirmed that it is in fact analysing the reports and the problem, and after that there was only a trickle of information about changes to the default settings of the boards, which Intel requested from manufacturers as a countermeasure. But while working on that front, Intel did not provide any statement aimed directly at customers, almost as if the company was trying to draw as little publicity to the issue as possible.
The situation raises the worry that at this point Intel is perhaps not hunting a specific bug that is known to be there somewhere, but rather hoping that such a bug in the clock speed management even exists, and that finding it would allow the chips to be fixed through some change or mitigation, perhaps to the code controlling the clock speed and power consumption. The problem is that so far Intel hasn’t found anything like that. In other words, it may be a structural problem where the architecture is actually working as expected and projected by the engineers, but that intended behaviour is itself turning out to be problematic, which can’t be changed by finding some hidden bug in the implementation.
It is possible that eventually no distinct fixable flaw behind the crashing and suspected degradation issues will be found and it will turn out that there is no simple solution. Unfortunately, the ultimate conclusion may be that Intel either failed to catch a stability issue affecting the processors during validation or manufacturing and assembly testing, or has erroneously set the CPUs to higher clock speeds than they are reliably capable of. Or that the company’s silicon aging and degradation rate projections and testing of the chips didn’t manage to successfully model and identify the long-term effects of running the silicon at high clock speeds and with large currents flowing through it under load.
However, these options could lead to very large financial losses due to RMAs or product recalls. We don’t yet know if things are going in that direction, but we also don’t know how ready Intel is to solve the problem in a customer-friendly way. After all, if it is confirmed that the chips degrade over time, then it may not be adequate to cover this issue only through the standard RMA process. Many users may want to run the CPU for longer than the two- or three-year warranty period, and a standard RMA won’t help them with failures caused by an inherent product defect after the warranty ends, while a recall of affected products would.
If you’re going to buy or upgrade a computer now, it’s probably best to avoid (at least for now) the processors that have these issues reported most often, which are the 125W Core i9 and i7 SKUs of the 13th and 14th generations, but it’s possible there’s a non-negligible risk of problems with the 65W Core i7 and i9 processors as well. In contrast, cheaper models like the 65W Core i5s will hopefully be safer due to their lower clock speeds. It’s also probably a good idea to avoid buying processors that are known to be prone to these issues second-hand. There is a chance that someone may be trying to sell their faulty CPU after seeing the first sign of instability, and once the buyer discovers it too, it can be too late, and an RMA claim as a second owner may be complicated or even impossible.
It remains to be seen whether these widespread problems might shake the uncanny loyalty of PC OEMs to Intel processors, and lead to a greater breakthrough of AMD chips into prebuilt PCs. Sourcing AMD processors instead of Raptor Lake (at least until Intel releases the next generation, Arrow Lake, whose clock speeds will hopefully be dialed in a bit to be stable and avoid degrading over time) is a solution to this mess that naturally presents itself, after all.
Source: Level1Techs, GamersNexus
English translation and edit by Jozef Dudáš