Lies, damn Lies and Benchmarking!



Making a decent and objective benchmark is a hard, almost impossible task.

Why? Because the different systems have to be put under equal testing conditions.

A good example is the commercial ARM compilers: if you go to the web sites of Keil, IAR, etc., you will see benchmarks showing how their compiler produces the best code, the fastest code, and so on.

How is this done? There is always an example you can craft for your specific compiler which executes faster and compiles to shorter code than the competition, so this is a cheap trick which fewer and fewer novices fall for.

All these commercial compilers usually show GCC as the slowest and least optimized compiler; needless to say, they run these tests with questionable options🙂 Real tests show that almost all ARM compilers produce about the same code, with variations of a few percent. Also, most of the commercial compilers would fail at the job where GCC shines, i.e. compiling and linking HUGE projects like the gigabytes of Linux kernel sources. So the choice of which compiler to use should focus on the additional value around them, i.e. what flash loaders, processor support and demo examples are available to kickstart your development.

So if we benchmark the same architecture, on the same code, with the same compiler and the same settings, it is possible to get something relevant; a minimal sketch of such a controlled measurement is shown below.
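For example, a controlled comparison of two boards of the same architecture could look like the following sketch: one source file, built on both targets with the identical compiler and flags (e.g. gcc -O2 bench.c -o bench). This is a minimal, hypothetical harness; the memcpy workload, buffer size and iteration count are my own illustrative choices, not taken from any published benchmark.

/* Hypothetical micro-benchmark sketch: same source, same compiler, same
 * flags on every board under test. The numbers it prints only mean
 * something when nothing but the hardware changes between runs. */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE   (1024 * 1024)   /* 1 MiB per copy (illustrative choice) */
#define ITERATIONS 100

static unsigned char src[BUF_SIZE];
static unsigned char dst[BUF_SIZE];

int main(void)
{
    struct timespec start, end;

    memset(src, 0xA5, BUF_SIZE);                /* deterministic input */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERATIONS; i++)
        memcpy(dst, src, BUF_SIZE);             /* the work being timed */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;

    /* Print the result so the compiler cannot silently drop the work. */
    printf("copied %d MiB in %.3f s (%.1f MiB/s)\n",
           ITERATIONS, seconds, ITERATIONS / seconds);

    return dst[0] == 0xA5 ? 0 : 1;              /* consume dst */
}

The point is not the absolute number: it is that the code, the compiler, the options and the measurement method are all held constant, so whatever difference remains can be attributed to the silicon.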

But how do we benchmark different architectures? Let's face it: this cannot be done objectively, and the recent debate about the fake AnTuTu benchmarking of Intel's new processors proves it.

In June, The Reg reported analyst firm ABI Research’s claim that it had pitted a Lenovo K900 smartphone based on Intel’s Atom Z2580 processor against a brace of devices built around ARM system-on-chip (SoC) components and found that not only did the Intel part perform better, but it also drew less power.
Jim McGregor of analyst firm Tirias Research smelled something fishy, and after investigating, he now says the surprise showing by Intel had less to do with the chip itself than with inconsistencies in the AnTuTu benchmark used to conduct the tests.
McGregor’s first clue was that different versions of the benchmark produced wildly different results.
“Going from the 2.9.3 version of the AnTuTu benchmark to the 3.3 version, the overall AnTuTu scores increased 122 percent, and the RAM score increased 292 percent for the Intel processor, while the scores for the Samsung processor increased only 59 percent and 53 percent, respectively,” McGregor wrote in a blog post at EE Times. “This was just from a change in the benchmark test, not the processors.”
Versions of AnTuTu for ARM chips are built using the open source GCC compiler. But beginning with version 2.9.4, AnTuTu for Intel is built using ICC, a proprietary optimizing compiler designed by Intel itself.
Working with AnTuTu and technology consulting firm BDTI, McGregor determined that the version of the benchmark built with ICC was allowing Intel processors to skip some of the instructions that make up the RAM performance test, leading to artificially inflated results.
AnTuTu released version 3.3.2 of the benchmark on Wednesday to address the problem, and according to McGregor, it negates Intel’s artificial advantage. Intel’s CPU and Overall scores are now about 20 per cent lower than they were with the previous build, and the RAM score is around 50 per cent lower.

It’s still questionable whether these new results are valid, as AnTuTu didn’t explain what changed in the benchmark. As a result, AnTuTu is IMO totally compromised as a reliable source for benchmarking, and the switch to the Intel compiler smells of corruption😉
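To see how a compiler can make a benchmark "skip" instructions without anyone touching the test code, consider the sketch below. It is purely illustrative and is not the AnTuTu source: it assumes a naive RAM test whose result is never used, which an aggressive optimizer is then entitled to remove entirely, so the timed region collapses to almost nothing.

/* Illustration only -- NOT the AnTuTu benchmark. Shows why a naive
 * "RAM test" can be optimized away by an aggressive compiler. */
#include <stdio.h>
#include <time.h>

#define WORDS (4 * 1024 * 1024)

static unsigned int buf[WORDS];
static volatile unsigned int sink;   /* volatile write keeps the result alive */

int main(void)
{
    struct timespec t0, t1;
    unsigned int sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned int i = 0; i < WORDS; i++)
        sum += buf[i];                /* the memory traffic being "measured" */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* If this line is missing, 'sum' has no observable use, the loop is
     * dead code, and a smart optimizer may delete it -- the test then
     * reports a spectacular but meaningless score. */
    sink = sum;

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3
              + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("summed %u words in %.3f ms\n", (unsigned)WORDS, ms);
    return 0;
}

Whether ICC did exactly this or something subtler, the lesson is the same: once the two platforms run binaries produced by different compilers with different optimizations, the benchmark no longer measures the same thing on both.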

More on this subject:

http://www.theregister.co.uk/2013/07/12/intel_atom_didnt_beat_arm/

http://www.theregister.co.uk/2013/06/14/intel_clover_trail_plus_benchmark_comparison_with_arm/

http://www.eetimes.com/author.asp?section_id=36&doc_id=1318857

http://www.eetimes.com/author.asp?section_id=36&doc_id=1318894

http://news.cnet.com/8301-1001_3-57593426-92/debate-sparked-about-benchmark-for-intel-arm-chips/

5 Comments

  1. Dirk Porsche
    Jul 17, 2013 @ 11:59:59

    Nice post. I have always wondered how to come up with a benchmark that actually means anything.

    You changed your theme. I liked the former one more, actually. It reflected, with its whiteness and simplicity, the technical focus of this blog better.

    The current one is too earthy.

    I really appreciate that you put more effort into this blog (at least it feels like you do), and touch on many interesting topics that aren’t directly related to your products.

    • OLIMEX Ltd
      Jul 17, 2013 @ 12:24:38

      The previous theme was clean indeed, but I wanted a side column where I could put Categories and links, so I had to look for another theme, one with two columns.

  2. Max
    Jul 17, 2013 @ 12:22:47

    Good catch, it certainly puts the earlier sensationalist headlines that probably everybody heard about into the proper perspective.

    However, for me, the significant part isn’t even whether Intel did actually suddenly beat ARM in performance and power consumption or not quite yet – but that now it presents a serious competition, whereas historically it was common knowledge that Intel couldn’t touch ARM’s supremacy in the mobile world before. Well, they sure seem to be in the same ballpark now. I guess that is when things start becoming interesting – will Intel simply sail past ARM, will ARM pull away again, or are we in for a “photo finish” type race…? We’ll see…

    • OLIMEX Ltd
      Jul 17, 2013 @ 12:31:55

      The Atom Z2580 processor is produced on a 32nm process while the Exynos 5 Octa is on 28nm, so technologically it is possible for the Exynos to be faster and consume less power; this of course depends on the design process as well.

      Intel has no other choice but to move into the phablet market, as PC volumes are declining and portable computing is rising.

  3. funlw65
    Jul 17, 2013 @ 14:13:55

    BDTI says:
    “If we really want to understand the performance of smartphones and their processors, I believe the answer is not to recycle 20-year-old benchmarks that were never designed for this purpose, and have little relevance to how users actually use their smartphones. Instead, I think we need a new benchmark, designed from the ground up to reflect whole-system performance (including battery life) as experienced by the user.”

    I may add: “… also including heating temperature, the increasing noise over time, and how it degrades” – this is how we evaluate and buy a product.
