Microsoft Caught Cheating in the Sunspider Benchmark

By | November 18, 2010


Oh boy, here we go again.

If the rumors prove to be true, then Internet Explorer 9 has some dirty tricks up its sleeve, and they have just been exposed.

To put it simply: Microsoft may be cheating in the Sunspider benchmark.

Mozilla engineer Rob Sayre was testing different web browsers when he noticed something odd.

Internet Explorer 9 was ~10 times faster in a specific SunSpider test (math-cordic) than any other browser.

Does that prove anything? No. So he continued to investigate.

By tweaking the code a little, he made a couple of test variations by adding a “return” and a “true” statement, which should not affect the results in any significant way.

What happened?

Instead of 1ms, it now took ~20ms for Internet Explorer 9 to complete this specific test.
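
To make the experiment concrete, here is a rough sketch of the kind of variations described above. It is not the actual SunSpider source or Rob Sayre's exact patch: the function names (`baseline`, `withTrue`, `withReturn`, `timeIt`), the loop body, and the call count are all made up for illustration. The only point is that the three functions behave identically from the caller's side, so a large timing gap between them would have to come from the engine rather than the code.

```javascript
// Illustrative stand-in for a CORDIC-style loop. This is NOT the real
// math-cordic code; every name and constant here is an assumption.
// The loop only updates local variables and nothing is returned, so the
// caller can never observe the work that was done.
function baseline(target) {
  var x = 1000, y = 0, angle = 0;
  for (var step = 0; step < 12; step++) {
    var sign = target > angle ? 1 : -1;
    var newX = x - sign * (y >> step);
    y = y + sign * (x >> step);
    x = newX;
    angle += sign;
  }
}

// Variation 1: same body plus a bare `true;` statement, which does nothing.
function withTrue(target) {
  var x = 1000, y = 0, angle = 0;
  for (var step = 0; step < 12; step++) {
    var sign = target > angle ? 1 : -1;
    var newX = x - sign * (y >> step);
    y = y + sign * (x >> step);
    x = newX;
    angle += sign;
  }
  true;
}

// Variation 2: same body plus an explicit `return;`, which still yields
// undefined, exactly like falling off the end of the function.
function withReturn(target) {
  var x = 1000, y = 0, angle = 0;
  for (var step = 0; step < 12; step++) {
    var sign = target > angle ? 1 : -1;
    var newX = x - sign * (y >> step);
    y = y + sign * (x >> step);
    x = newX;
    angle += sign;
  }
  return;
}

// Hypothetical timing helper: calls fn repeatedly and returns elapsed ms.
function timeIt(fn, calls) {
  var start = Date.now();
  for (var i = 0; i < calls; i++) {
    fn(i % 24);
  }
  return Date.now() - start;
}

var calls = 25000; // arbitrary call count chosen for this sketch
console.log("baseline:   " + timeIt(baseline, calls) + " ms");
console.log("withTrue:   " + timeIt(withTrue, calls) + " ms");
console.log("withReturn: " + timeIt(withReturn, calls) + " ms");
```

On an engine that treats all three functions the same way, the three timings should land in the same ballpark.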


What about Google Chrome or Opera? It made little to no difference.

[Screenshots: Google Chrome and Opera results]

What does it all mean? Either Microsoft has optimized IE9 specifically to triumph in the SunSpider benchmark, or it’s some sort of bug or unexplained issue.

What’s your take on this issue?

Update: Microsoft has clarified the benchmark results. See the “Dead Code Elimination in JavaScript” paragraph.
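
For readers new to the term, here is a minimal sketch of what dead code elimination means in principle. It says nothing about how IE9's engine actually implements it. The function names are invented for this example, and it assumes `n` is always a plain number (otherwise the comparison `i < n` could itself have side effects).

```javascript
// The sum is computed but never returned, stored, or printed, so no caller
// can tell whether the loop ran. That makes the loop a candidate for removal.
function discarded(n) {
  var sum = 0;
  for (var i = 0; i < n; i++) {
    sum += i >> 1;
  }
}

// What an optimizer could legally reduce `discarded` to, under the
// assumption above that n is a plain number.
function discardedAfterDce(n) {}

// NOT dead code: the loop writes to an array that is visible outside the
// function, so removing it would change the program's behavior.
var seen = [];
function observable(n) {
  for (var i = 0; i < n; i++) {
    seen.push(i);
  }
}

discarded(1000);
discardedAfterDce(1000);
observable(3);
console.log(seen.length);
```

The first two functions are indistinguishable to a caller, which is what makes the elimination safe. The third is not, because its effect can be observed through `seen`.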

Thanks, Toxigene.


About (Author Profile)


Vygantas is a former web designer whose projects are used by companies such as AMD, NVIDIA and the now-defunct Westwood Studios. Being passionate about software, Vygantas began his journalism career back in 2007 when he founded FavBrowser.com. Having said that, he is also an adrenaline junkie who enjoys good books, fitness activities and Forex trading.

Comments (21)


  1. nvm says:

    Did you expect Microsoft to be honest?

  2. I wouldn’t make any assumptions. It might be anything: one programmer’s mistake, a joke, a “too much optimization” problem, or some kind of bug. Or maybe someone just wanted to impress his own superior, and things just got out of control. In other words, an unintended consequence.

    Well, I wouldn’t at least blame “Microsoft” as a whole. I doubt that something like this would be an official decision.

    And one last thing: it is no surprise that browser vendors sometimes optimize their code for a particular set of tests. Microsoft just pushed it a little further. But, hey, their browser is fast in those particular situations :P

  3. lutze says:

    Not sure if ignoring dead code is cheating, especially since Microsoft wrote about it in a blog post:
    http://blogs.msdn.com/b/ie/archive/2010/11/17/html5-and-real-world-site-performance-seventh-ie9-platform-preview-available-for-developers.aspx

    • lutze?
      as in “Don’t forget Rule Number One” Lu-Tze ?

    • Toxigene says:

      It is cheating if they only employ dead code elimination on one occasion which triggers only on a specific benchmark and only with a strict set of code. It is not “analysis” then, it is specific optimization, which is cheating, since such “analysis” would not improve real-world JS performance. That’s what they did. Writing about it is trying to control the damage to their reputation. It’s called PR.

      • lutze says:

        “That’s what they did.”
        Well, do you have any proof that it works only in benchmarks?

        • Toxigene says:

          Since people have been repeatedly saying that modifying even slight portions of the code with items which should not change the process or outcome of the code just makes this “dead code elimination” not trigger, it’s suspicious behavior that it’s specifically searching for matching code. And if you read this:
          http://blog.mozilla.com/rob-sayre/2010/11/17/dead-code-elimination-for-beginners/

          You can see that it only triggers for functions that are specifically in that Sunspider benchmark. It doesn’t handle complementary operations. As someone in the IE blog pointed out:

          The Chakra DCE “optimization” handles ++ but not --, > but not <=, <>. In short, it handles exactly the set of operators found in math-cordic, and not others. This makes no sense if it’s intended to be a general-purpose optimization.

          No, it doesn’t work in “only benchmarks”, since it takes advantage of a structural flaw of the Sunspider benchmark, which might not be present in other benchmarks. But Sunspider is important, since the IE team has been touting it to back their “speed superiority” claim to be the fastest JS engine, and if you take out such suspicious behaviors then it loses such advantages. From the graph they posted you can see how close the bars of Opera and Chrome are to IE PP7, and this “flaw” might be the only thing that has allowed it to take the top place.

          • nobody says:

            i do not consider it cheating

            i also strongly suspect that similar activities were performed by other browser vendors as well. sometimes two implementations are equal in almost everything, but one is present in tests and other one isnt. it is quite normal to include the first one.

            ie9’s dead code removal is effective it seems and tailored for most common cases (how often do you use -- in your code? ++ is the thing to optimise for if you can pick only one)

            if you are a coder (i doubt, but who knows) get some opensource repo JS code, and run static analysis and compare how often ‘++’ appears compared to ‘--’ or ‘<>’)

            anyhow, js seriously needs new benchmarks. how come opera never made one (they claim they are useless, but like to brag about them anyway, and always moan that others did it wrong..)

          • Toxigene says:

            @nobody Yeah, it’d be nice and useful IF it actually, you know, triggered in anything other than the Sunspider benchmark. It seemingly doesn’t. And that’s the issue here. In their blog post, the IE devs posted how the concept of Dead Code Elimination works, in theory, and then explained why it works in math-cordic. Well duh, everyone knows why it works in math-cordic by now. The thing is, it doesn’t work elsewhere, nor if you change slight parts of it with redundant code. Yeah, very useful. And your argument about ++ being more is baseless and strictly narrowed to one case. If they were able to do DCE with ++, they won’t have trouble doing it with the other operators too. Making one set of optimization which specifically and strictly helps a part of Sunspider is cheating. What are the chances of them making a DCE algorithm which only works strictly for operators present in math-cordic?

            You may like to play the devil’s advocate, but fact is fact.

            Oh BTW, JS has plenty of new benchmarks. Feel free to look up V8 or Kraken, or even Celtic Kane. You seem to have a bad habit of bringing Opera in everything. I’m sure you’d have found something to complain about if Opera made one as well, it’s better they don’t give trolls that chance. :)

            Oh and as far as complaining and bragging is concerned, the IE team repeatedly keeps saying that “microbenchmarks” like Sunspider mean nothing, then keeps posting that same Sunspider chart with every release to brag about it. I wonder how you don’t notice that? Maybe you should take off that eyepatch of yours or something. lol

          • nobody says:

            eeer… not

            first – optimizing for EVERYTHING means no optimization at all. i’ll stand by my previous claim that it is very possible, that obscure operators like ‘--’ or ‘<>’ (use != instead) will NOT be taken into account. it takes time to do DCE and as gains are relatively small, doing it for all makes little sense.

            second – i havent seen analysis on how it performs in other scripts and cases. all i see is angry talk, no code that i can download and see for myself. until i see proof that ie did DCE for sunspider ONLY, then i’ll remain believing that they were not cheating

            third – opera should have produced benchmark. they are the only browser entity that didnt, and they are the browser entity that moans the loudest. their employees like to poke at others (Haavard – opvard at twitter) but when it comes to real work, they have nothing to show in this area. less talk, more work dear opera. you dont earn respect and recognition with words in this business.

            fourth – belittle me more. wow, so there ARE OTHER BENCHMARKS? oh noes.. mr obvious.

            the ie team has shown sunspider from the beginning of ie9 development. they started when ie9 looked VERY bad, and they are showing progress. given that benchmarks arent ideal, it is the progress that might mean more than raw results. and progress they did indeed.

            i still consider it fair (until somebody stop barking and start working and produce code that should, but does not start DCE in ie9).

            there should be new benchmarks. less synthetic, more ‘real world’ (including common human mistakes in coding, like idiotic use of hand made functions over built-in implementations etc)

          • Toxigene says:

            Really, now? Optimizing for everything means not optimizing at all? So dead code with ++ should be optimized and -- should not be because then it’d totally nullify the optimization? Right. You have some serious flaws in logic you need to work on. The fact that you are selectively ignoring the fact that the “DCE” only and ONLY works for operators present in math-cordic and even a single redundant statement makes it not trigger means you have an agenda, not an impartial argumentative mindset.

            Second: Uh, learn to read, I guess? What does the fact that the same code that triggers the DCE in math-cordic doesn’t trigger the DCE if you add a redundant true; that doesn’t change the process or result of the code in any way, tell you? I’m sure it tells you the same thing to any sane, logical person. It’s not people’s fault that you selectively decide to ignore it so you can debate with people.
            If you wanted to test things for yourself, you would have by now, or at least read the comments from independent sources who HAVE done the testing. It’s obvious you didn’t. That’s probably because you’re too busy making claims here and trying to refute facts with baseless claims.

            Yeah, they do that because they don’t believe in benchmarks. You’ll see more mozilla folks whining about benchmarks than Opera. Haavard is just one employee, what he says in his personal blog/twitter doesn’t represent the company’s stance. If you were professional you’d be able to recognize that. God forbid any browser company hired you and you still kept whining about Opera like you do here. The fact that opera can go toe-to-toe with competition in benchmarks made by other parties says much more than Chrome acing in their V8. Creating another redundant benchmark to make themselves look better and because “everyone’s doing it” is not something opera probably wants to do.

            Talk and no work? So you think that all Opera does is talk and their browser magically keeps improving in performance to beat Chrome every now and then? Or does only creating a benchmark count as “work” for you? Yeah IE made progress, so did Opera and now FF. Learn to recognize that without spewing hatewords and give credit where it’s due, it’ll come in handy in life.

            Every benchmark coming out nowadays claims to be more “real-world”. Kraken says it’s more “real-world” than Sunspider. Peacekeeper and IE’s ones claim the same thing. If you are not satisfied with them, I’m sure Opera releasing one wouldn’t satisfy you either. If you want a real-world benchmark, go test real-world web pages. Benchmarks by definition are synthetic. The world is too un-uniform and chaotic for there to be real-world benchmarks that represent every aspect of real-world websites.

          • nobody says:

            first: hard to do any tests using remote desktop on ipad in the middle of south african village (im on holidays). typing is hard enough and lag kills me.

            second: your 100 lines-of-text comment has almost no informational value just loud babbling

            optimizing for everything in DCE is in fact optimizing nothing! DCE is a ‘risk evaluation’ technique. it has a high cost, and doing it for EVERYTHING is in fact not optimizing at all. too much overhead for too many corner cases.

            it might lead to situation when it is much more costly than not doing it at all.

            compilers have easier time, because they.. well.. take their time to compile. JS is executed on the spot.

            as for the specifics of ie9, i have yet to see a link presented by you that shows more than it is listed in this article.

            adding ‘return’ or adding ‘return true’ DOES change things A LOT, because if function returns, then you cannot just skip it entirely – it isnt dead-weight code, so it must be evaluated and probably this part of DCE is a) not implemented yet b) deemed to be more costly than executing script itself. it is risk taking, and can be tweaked and probably will be.

            opera is a hypocrisy-driven company. they ‘do not believe in benchmarks’ but they BRAG A LOT about how fast opera in these benchmarks is (and sure it is, no denial here). they criticize other benchmarks, yet, they do not show anything for themselves IN THIS AREA (learn to read). they stomped on MS when MS provided tests for recently-famous HTML5 test, yet they themselves delivered 3 tests, that all were included. and they SHOUTED loudly that that test was biased towards the MS. no kidding?! all browsers were free to send testcases, MS did, others didnt, but it was opera that whined.

            in real-world testing opera fails, because it is almost not compatible with anything today. and it is not because of browser sniffing, it is because opera devs long ago did too much sniffing of the ‘we know it all’. and now they have borked: key events (all of them), Function.prototype.bind, history navigation back (google instant), various omissions in getters/setters etc. they always tried to swim upstream, and it will haunt them forever. look at this blog and at this post in particular: http://my.opera.com/sitepatching/blog/show.dml/21280702 (copying links on ipad sux)

            and see for yourself, that most of opera’s incompatibilities lie in opera’s OWN bugs

            as for your snide remark about my professional career.. i wish youd ever enjoy your work, company, colleagues and salary as much as i do.

  4. RamaSubbu SK says:

    Now it’s time to change the code in SunSpider JavaScript.

  5. filip007 says:

    “Dead Code Elimination” that’s funny, i guess they want more sloppy sites that will work better in IE, sure why not just say “Get off the net” then.

    • nobody says:

      every and i mean that – every compiler or even interpreter uses stuff like this. most probably other browsers use that too in their JIT modules. it simply is VERY effective, esp with javascript that has a very, very simple syntax.

      isolating forked-off code is the second step after sorting conditionals based on probability of success. basics of computing..

  6. Ichann says:

    Doesn’t Mozilla have anything better to do?

    All they seem to do is be butt hurt and claim this cheats and that cheats.

    Opera also cheated according to them.

  7. filip007 says:

    The point is that html writers must always improve code, this is just slack over slack. Only improving brings new things…

  8. daddylo says:

    Well well well … when Opera got a good mark in the Sunspider benchmark people said it was cheating …..
    Maybe IE is cheating ….. who cares ? ….. just something IE users can be happy about ….. and it is in beta so we will see when it comes out officially …..

    Maybe …. just maybe, it is about time IE made their own JavaScript benchmark (optimised for their own JavaScript engine) … nuff said …..

  9. mikah says:

    nobody why all the incessant ranting at Opera even in a thread about Microsoft ?
    I’d love to know why you’ve got your knickers in a twist over Opera, must have been pretty bad to generate this much hate.