Lies, Damned Lies and Benchmarking!




Making a decent, objective benchmark test is a hard, almost impossible task.

Why? Because the different systems must be tested under equal conditions.

A good example is the commercial ARM compilers. If you go to the Keil, IAR, etc. websites, you will see benchmarks showing that their compiler is the best in code size, speed, etc.

How is this done? You can always find an example for which your particular compiler produces faster and shorter code than the competitor's, so this is a cheap trick that fewer and fewer novices fall for.

All these commercial compilers usually show GCC as the slowest and least optimizing compiler; needless to say, they ran these tests with questionable options 🙂 Real tests show that almost all ARM compilers produce about the same code, with variations of only a few percent. Also, most commercial compilers would fail at the job where GCC shines, i.e. compiling and linking HUGE projects like the gigabytes of Linux kernel sources. So the choice of compiler should rather be based on the added value around it, i.e. which flash loaders, processor support and demo examples are available to kickstart your development.

So if we benchmark the same architecture, with the same code, the same compiler and the same settings, it’s possible to get something relevant. But how do we benchmark different architectures?

Let’s face it – this cannot be done objectively, and the recent debate about AnTuTu’s fake benchmark results for Intel’s new processors proves it.

In June, The Reg reported analyst firm ABI Research’s claim that it had pitted a Lenovo K900 smartphone based on Intel’s Atom Z2580 processor against a brace of devices built around ARM system-on-chip (SoC) components and found that not only did the Intel part perform better, but it also drew less power.
Jim McGregor of analyst firm Tirias Research smelled something fishy, and after investigating, he now says the surprise showing by Intel had less to do with the chip itself than it did with inconsistencies in the AnTuTu benchmark used to conduct the tests.
McGregor’s first clue was that different versions of the benchmark produced wildly different results.
“Going from the 2.9.3 version of the AnTuTu benchmark to the 3.3 version, the overall AnTuTu scores increased 122 percent, and the RAM score increased 292 percent for the Intel processor, while the scores for the Samsung processor increased only 59 percent and 53 percent, respectively,” McGregor wrote in a blog post at EE Times. “This was just from a change in the benchmark test, not the processors.”
Versions of AnTuTu for ARM chips are built using the open source GCC compiler. But beginning with version 2.9.4, AnTuTu for Intel is built using ICC, a proprietary optimizing compiler designed by Intel itself.
Working with AnTuTu and technology consulting firm BDTI, McGregor determined that the version of the benchmark built with ICC was allowing Intel processors to skip some of the instructions that make up the RAM performance test, leading to artificially inflated results.
AnTuTu released version 3.3.2 of the benchmark on Wednesday to address the problem, and according to McGregor, it negates Intel’s artificial advantage. Intel’s CPU and Overall scores are now about 20 per cent lower than they were with the previous build, and the RAM score is around 50 per cent lower.
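To put McGregor’s numbers in perspective, here is a small worked calculation. It assumes the hardware, and therefore the true Intel/Samsung performance ratio, did not change between the two AnTuTu versions. Let r be the Intel score divided by the Samsung score (r is just a label introduced here). The quoted increases alone then shift r by:

```latex
\frac{r_{\text{3.3}}}{r_{\text{2.9.3}}} = \frac{1 + 1.22}{1 + 0.59} = \frac{2.22}{1.59} \approx 1.40 \quad \text{(overall score)}

\frac{r_{\text{3.3}}}{r_{\text{2.9.3}}} = \frac{1 + 2.92}{1 + 0.53} = \frac{3.92}{1.53} \approx 2.56 \quad \text{(RAM score)}
```

In other words, the benchmark change by itself made the Intel chip look about 40 percent better relative to the Samsung chip overall, and about 2.56 times better in the RAM test.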

It’s still questionable whether these new results are valid, as AnTuTu didn’t explain what changed in the benchmark. As a result, AnTuTu is IMO totally compromised as a reliable source for benchmarking; switching to the Intel compiler smells of corruption 😉

More on this subject: