

Intel Accused of Inflating Over 2,600 CPU Benchmark Results (pcworld.com) 35

Posted by EditorDavid

on Saturday February 17, 2024 @04:34PM from the Intel-insiders dept.

An anonymous reader shared this report from PCWorld:

The Standard Performance Evaluation Corporation, better known as SPEC, has invalidated over 2,600 of its own results testing Xeon processors in the 2022 and 2023 versions of its popular industrial SPEC CPU 2017 test. After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."

In layman's terms, SPEC is accusing Intel of optimizing the compiler specifically for its benchmark, which means the results weren't indicative of how end users could expect to see performance in the real world. Intel's custom compiler might have been inflating the relevant results of the SPEC test by up to 9%...

Slightly newer versions of the compilers used with the latest industrial Xeon processors, the 5th-gen Emerald Rapids series, do not apply these allegedly performance-enhancing optimizations. I'll point out that both the Xeon processors and the SPEC 2017 test target high-end "big iron" hardware for industrial and educational applications, and aren't especially relevant to the consumer market we typically cover.

More info at ServeTheHome, Phoronix, and Tom's Hardware.

  • Remember when companies like Boeing and Intel were companies with a proud engineering culture?

    Yeah, me neither; that was a long time ago. Intel has been caught doing this type of stuff for decades.

    • Re:

      We're talking about compiler optimizations, not safety equipment on a passenger jet.

      Seems to me the issue goes away if Intel shares the compiler optimizations with the public; then the public will see the same performance as SPEC saw in its benchmarks.

      The reality is the processor did the work as fast as it did in the SPEC benchmark. INTEL isn't accused of submitting a modified processor or altering the benchmark software; they are accused of setting the compiler to make the best use of capabilities the processor offers.

      • They are accused of breaking this rule [spec.org]. IOW they created optimizations in their compiler that do nothing but speed up one benchmark (two in total) in the suite, but not other similar tasks that this benchmark is supposed to represent.

        Pretty much the only reason the SPEC people are not making a big stink about this is because Intel removed these optimizations in later versions of the compiler.

        • Re:

          PS: a little research gave me this: https://webdocs.cs.ualberta.ca... [ualberta.ca]

          My guess is that somebody tried the benchmark with the special optimization (and without), but a different workload, and realized the optimization suddenly didn't work as well.

          • Re:

            Sounds like an issue with the benchmark then. There are plenty of benchmarks that can simulate real workloads. The SPEC benchmark has very specific applicability, and yes, Xeons are optimized overall in that realm; it's why people pay for specific SKUs of these Xeons. If you need a particular optimization, it's likely you will set your compiler accordingly. I don't understand the issue overall. Do they expect SPEC to simulate a scientific workload without optimizations? Because that would reduce a lot of the performance.

      • I'm not trolling, but: isn't optimizing for one target sometimes (oftentimes?) at the expense of performance for another target? If so, optimizing for a narrow, very specific set of operations would leave you underperforming in everything else. And while the benchmark was designed to RESEMBLE a real workload, if their meddling was so precise, then it would definitely only improve an extremely specific set of instructions. In which case we're back to the argument that you would not be buying a processor based on that benchmark anyway.

        • I'm not trolling, but: isn't optimizing for one target sometimes (oftentimes?) at the expense of performance for another target?

          Not in this case. Intel cheated by modifying the compiler, not the silicon.

          Intel's compiler detected when it was compiling a benchmark and emitted optimized code. But there was no cost to non-benchmark code other than a few milliseconds of delay.

          Most people aren't affected because they don't use Intel's compiler. They use Microsoft compilers, GCC, or Clang.

          The lesson here is that you should never trust a benchmark from an interested party. Run your own benchmarks or get the results from someone you trust.

          The best benchmark is to run a system on your actual workload.
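
          A minimal sketch, in C++, of what "run your own benchmarks" can look like in practice. The workload() function is a hypothetical stand-in for your actual task, and the best-of-N timing policy is just one reasonable choice:

          ```cpp
          #include <algorithm>
          #include <chrono>
          #include <cstdio>
          #include <vector>

          // Hypothetical stand-in for the task you actually care about;
          // swap in your real code before drawing any conclusions.
          static long workload() {
              std::vector<long> v(1'000'000);
              for (std::size_t i = 0; i < v.size(); ++i)
                  v[i] = static_cast<long>(i) * 3;
              long sum = 0;
              for (long x : v) sum += x;
              return sum;
          }

          int main() {
              using clock = std::chrono::steady_clock;
              const int runs = 10;
              long sink = 0;  // keep results live so the optimizer can't drop the work
              auto best = std::chrono::nanoseconds::max();
              for (int i = 0; i < runs; ++i) {
                  const auto t0 = clock::now();
                  sink += workload();
                  const auto t1 = clock::now();
                  best = std::min(best,
                      std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0));
              }
              std::printf("best of %d runs: %.3f ms (checksum %ld)\n",
                          runs, best.count() / 1e6, sink);
          }
          ```

          Pinning the process to a core and disabling frequency scaling would tighten the numbers further, but the point stands: measure your own workload, not someone else's.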

      • Re:

        No.... They're accused of tweaking the compiler for the sole purpose of inflating the benchmark. This was a performance boost that nobody was going to see. This wasn't some improved code they created to solve a bottleneck or execute a few clocks faster.... The only damn thing their tweaks did was cause the benchmark to run faster.

        That breaks the rules. You don't get to claim speeds that nobody and nothing is going to see, because they cannot be achieved in the real world under any circumstances.

    • Re:

      If you count each of those 2,600 inflations as "fraud," you'd have to count every advertisement ever as fraud.

  • board member one: How are we going to beat AMD?
    board member two: We are going to cheat and let our brand name make up for any ties
    board member three: That's right, we are going to cheat and let our bonuses go high!

    • Re:

      Optimized or not, if all it takes to feed millions in revenue directly into executive pockets is selling a “benchmark” report full of shit few would ever replicate, then I put the blame more on the suckers falling for executive lies. It’s not that they’re brilliant at sales. It’s that most consumers are that gullible.

  • How aren't the test results relevant if the core designs are basically the same as in desktop CPUs? Only the number of cores in the SoC and the interface are different.

    • Re:

      The biggest difference is access to more RAM for the Xeons compared to the "desktop" version of the CPUs; some of the benchmarks in the SPEC suite simply benefit from more available RAM. (Also from faster RAM, but that shouldn't matter much within the same core generation.)

      But let's not forget: SPEC numbers only tell you something about a specific computer model with specific hardware and a specific compiler and settings. That's basically why they invalidated the results for 2500 machines and not for a couple.

  • So let me get this straight...

    SPEC designed a benchmark to represent a real-world workload, then INTEL optimized their compiler to maximize performance in that benchmark, now SPEC is saying that by optimizing for their (real world simulation) benchmark, users of INTEL processors aren't going to see real world performance that matches the benchmark results?

    Sounds like INTEL optimized for what SPEC considered real world workloads, and now SPEC is saying their benchmarks don't actually predict real world performance.

    • Re:

      No, that's simply not how "representation" works. When something is representative of something else, cheating on it doesn't make your case applicable to the other thing. Intel is not optimising for real world conditions. They are optimising specifically for something that is *not* the real world.

      • Re:

        SPEC benchmarks are pretty close to real world. It's the entire raison d'être of SPEC. If I want to know which CPU is best at e.g. fluid dynamics, I go to SPEC and see what CPU is best for the price, the optimizations that were made, which programming languages and compilers were used to get a certain result, etc. I don't go to SPEC to see an overall useless number like the PassMark scoring system.

    • Re:

      Both (sub-) benchmarks are real world code used in real programs doing real tasks. So yeah, try again and get this straight.
      • The benchmark is a representative example of a real calculation workload, not an exhaustive list of all workloads.

        The compiler spitting out hand-tuned machine code when it recognizes the benchmark is somewhat, but not completely unlike the Dieselgate scandal of cheating on emissions tests.

        • Re:

          Exactly my point, but let's make clear that if you change the workload (IOW the input file) of the benchmark, you can no longer compare it to any other published results, only to those made with the same workload. Again: that workload is part of the benchmark.

          But if a specific compiler optimization only gives a notable improvement for a benchmark with the default workload, but not with (most) others, there is something fishy going on.
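
          A toy C++ sketch of that "fishy" pattern: a routine that fingerprints its input and takes a shortcut only for the benchmark's reference workload. Every name here is invented for illustration, and SPEC's actual complaint concerns a compiler transformation of narrow applicability, not literal answer precomputation:

          ```cpp
          #include <cstdint>
          #include <numeric>
          #include <vector>

          // Hypothetical fingerprint that happens to match only the
          // benchmark's default input file.
          constexpr std::uint64_t kReferenceSignature = 0x1234'5678'9abc'def0;

          // FNV-1a hash over the input data.
          std::uint64_t fingerprint(const std::vector<int>& data) {
              std::uint64_t h = 1469598103934665603ull;
              for (int x : data) {
                  h ^= static_cast<std::uint32_t>(x);
                  h *= 1099511628211ull;
              }
              return h;
          }

          long process(const std::vector<int>& data) {
              if (fingerprint(data) == kReferenceSignature) {
                  // "Optimized" path: correct only for the exact reference
                  // input. Great benchmark score, zero benefit to any other
                  // workload.
                  return 42;  // pretend this is a precomputed answer
              }
              // Generic path that every real user actually runs.
              return std::accumulate(data.begin(), data.end(), 0L);
          }
          ```

          Change the input file and the fast path never fires, which is exactly the "notable improvement only with the default workload" symptom described above.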

        • Re:

          That's not how code/benchmarks work. You compile the code, then give it the data; the compiler cannot predict, when you compile a ray-tracing Fortran or C program, that you will then feed it a specific workload for benchmarking purposes.

    • Re:

      No. Imagine a test track for an autonomous vehicle. It is a standard track meant to be representative of city traffic but it plays out the same way every time. So instead of a system that truly drives the car autonomously, you simply build a clockwork that always operates the car the same way with no awareness of the surroundings.

      You'll get a 100% in the test and fail miserably in the real world.

      Similarly, Intel used a compiler rigged to do especially well on the benchmark and only the benchmark.

  • Ask me how I know? Intel might pick and choose their benchmark highlights, but at least I've never purchased a 600-dollar chip that melted in 3 months.

    • Re:

      And? So what. You got sold something covered under warranty due to a manufacturer defect. You didn't get lied to, and you were entitled to a replacement which AMD have honoured without issue. Things randomly break, it's a fact of life. It's why the concept of warranty exists in the first place.

      Comparing that with cheating a benchmark suite is not the same thing. Your whataboutism is lame.

  • Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas. AMD has beaten them handily since the launch of Ryzen in performance, efficiency, and sometimes cost. During that time, they lost Apple as a major customer because Apple would not wait year after year for chips that were not any better than the previous generation. ARM-based CPUs are the de facto CPUs in smartphones and tablets.

    Incidentally, there were shades of this cheating when Intel unveiled their "Go PC" campaign.

    • Re:

      Intel is the "Boeing" of semiconductors.

  • ... then you have to be able to explain how these specific benchmark values influence your personal purchasing decisions.

    After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."

    What the hell do the 523.xalancbmk and 623.xalancbmk benchmarks measure?

    https://www.spec.org/cpu2017/D... [spec.org]

    https://www.spec.org/cpu2017/D... [spec.org]

    Apparently they benchmark XML-to-HTML translation via XSLT, using the Xalan-C++ processor.

    • Re:

      Sorry, one link was bad - here's the correct link:

      https://www.spec.org/cpu2017/D... [spec.org]
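
      For a rough sense of the kind of work these two benchmarks represent, here is a minimal XML-to-HTML transformation sketch in C++. Note the assumptions: SPEC's xalancbmk is built from the Xalan-C++ processor, whereas this sketch uses libxslt, and the file names are placeholders:

      ```cpp
      #include <cstdio>
      #include <libxml/parser.h>
      #include <libxslt/transform.h>
      #include <libxslt/xsltutils.h>

      int main() {
          // Placeholder inputs: xalancbmk applies a stylesheet like this
          // to a large XML document and emits HTML.
          xsltStylesheetPtr style = xsltParseStylesheetFile(
              reinterpret_cast<const xmlChar*>("stylesheet.xsl"));
          xmlDocPtr input = xmlParseFile("input.xml");
          if (style == nullptr || input == nullptr)
              return 1;

          xmlDocPtr output = xsltApplyStylesheet(style, input, nullptr);
          if (output != nullptr) {
              xsltSaveResultToFile(stdout, output, style);  // write the HTML
              xmlFreeDoc(output);
          }

          xmlFreeDoc(input);
          xsltFreeStylesheet(style);
          xsltCleanupGlobals();
          xmlCleanupParser();
          return 0;
      }
      ```

      The benchmark's score then reflects how fast the compiled transformation engine chews through a fixed XML input, which is why compiler choices matter so much to the result.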

    • Re:

      So let's assume the individual compiler developer is using existing benchmarks both to run his test code and to confirm that his latest optimization push did not make the code base slower. If optimization for the specialized case gives a 9% increase, green light; but upon further review the specialized case exists in 1% of the code. Does one really roll back their specialized case? The developer will just not bother pushing his code to the approval process until he has 20 of those, or just before the bonus window.
    • Re:

      Ohh, so now you know those benchmarks are real world programs, and have switched to claiming cheating isn't a problem. Anything to defend Intel, eh?
    • The first thing you are missing here: Intel had prior knowledge of the benchmark code and their competition didn't, so they optimized the code using that prior knowledge, aka cheated.

      The second thing you are missing here is that CPUs destined for data centers are a $300+ billion market; if Intel can cheat to increase their market share by 0.5%, for example, it translates to a revenue increase of around $1.5 billion.

      That's why it's a big deal, and all this is par for the course for Intel, since they have a long history of sleaziness when it comes to benchmarks, especially when the competition is taking market share from them.

  • What is the purpose of the benchmark tests? Do they validate raw processor performance, or do they validate performance in a software task-oriented environment?
    Whatever anyone did in this story, it was not a hardware tweak (from what I can see reading the articles and other links).

    If Intel programmers could wrangle better performance out of a testing regime by writing a better compiler to produce more algorithmically compact and efficient machine code, then doesn't that mean that there is room for improvement?

  • No, in layman's terms, it is called cheating, plain and simple.

    This is no different from a teacher who, knowing what questions were in the coming exam, used those same questions as "examples" when teaching his/her students in class. Doing that would be called cheating, just like what Intel did.

    Still want to argue? Imagine it were some Chinese chip company doing this instead of Intel, would you still continue to defend this practice?

    • Re:

      When Volkswagen did the same (cheated the emissions test using prior knowledge of the test procedure), they got hit with billions in fines. https://en.wikipedia.org/wiki/... [wikipedia.org]

