VGTech is a blog where the developers and devops of Norway's most visited website share code and tricks of the trade.




Safari on iOS 5 Randomly Switches Images, Part 3

DevOps

We are still digging into the image bug we described in part 1 and part 2.

We have not been able to create a synthetic setup that triggers the bug, but we have managed to automatically detect it, alert on it and log it in our production environment.

We’ve modified our code so that on every pageview on http://touch.vg.no the client uses JavaScript to loop through all the images and check whether the dimensions of each image match those of its surrounding container.

If any image does not match, it is marked in red and the client posts as much information as possible about the problem to a server-side script that logs to Splunk.
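The check described above could look roughly like the following. This is a minimal sketch, not the code running on touch.vg.no: the function names, the `endpoint` parameter and the report fields are hypothetical, and it assumes that the container is the image's direct parent and that a correctly delivered image has the same intrinsic size as that container. It is written in ES5 style so that it would also run in browsers of that era.

```javascript
// Hypothetical sketch; names and report fields are illustrative only.

// True when a fully loaded image has a different intrinsic size than its
// container (assumed here to be the direct parent element).
function sizeMismatch(img, tolerance) {
  var box = img.parentNode;
  // Ignore images that have no container or have not finished loading.
  if (!box || !img.complete || !img.naturalWidth) return false;
  return Math.abs(img.naturalWidth - box.clientWidth) > tolerance ||
         Math.abs(img.naturalHeight - box.clientHeight) > tolerance;
}

// Returns the images whose size does not match their container.
function findMismatchedImages(images, tolerance) {
  var bad = [];
  for (var i = 0; i < images.length; i++) {
    if (sizeMismatch(images[i], tolerance || 0)) bad.push(images[i]);
  }
  return bad;
}

// Browser-only part: mark offenders in red and post a report.
// `endpoint` is a hypothetical URL of the server-side logging script.
function reportImageBugs(doc, endpoint) {
  var bad = findMismatchedImages(doc.images, 0);
  if (bad.length === 0) return bad;
  var report = { page: doc.location.href, useragent: navigator.userAgent, images: [] };
  for (var i = 0; i < bad.length; i++) {
    bad[i].style.outline = "3px solid red";
    report.images.push({
      src: bad[i].src,
      width: bad[i].naturalWidth,
      height: bad[i].naturalHeight,
      containerWidth: bad[i].parentNode.clientWidth,
      containerHeight: bad[i].parentNode.clientHeight
    });
  }
  var xhr = new XMLHttpRequest();
  xhr.open("POST", endpoint, true);
  xhr.setRequestHeader("Content-Type", "application/json");
  xhr.send(JSON.stringify(report));
  return bad;
}
```

In a page, something like `reportImageBugs(document, "/imagebug-log")` would be called once all images have loaded; the path is made up for the example. Keeping the size comparison separate from the DOM marking and the POST makes the comparison easy to test with plain objects.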

Using Splunk, we have tried to figure out which types of clients trigger the bug. This is what we have found so far:

  • It seems to be a problem in all browsers that have pipelining enabled
  • Opera Mini does funky stuff to images by design, so it’s a false positive
  • iOS 5 is overrepresented
  • Opera on Android (and Symbian) has all kinds of issues
  • The native Android browser has issues, but at a much lower rate than iOS and Opera

Here is a Splunk query that looks at the user agent of all browsers that triggered the bug in the last 24 hours.

sourcetype="imagebugs" NOT "Opera Mini"| rex field=useragent "(?<agent>Opera|Android|Symbian|iPhone|Windows Phone)" | top agent
place  useragent  count  percent
1      iPhone     1340   75.791855
2      Android    247    13.970588
3      Opera      148    8.371041
4      Symbian    33     1.866516
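For readers who don't use Splunk, the classification that the query performs can be sketched in JavaScript. This is only an illustration under stated assumptions: `topAgents` is a hypothetical helper, the "Opera Mini" exclusion is done with a case-sensitive substring test, and the percentage is computed over the classified events only.

```javascript
// Hypothetical sketch of the same classification outside Splunk.
// The alternation is copied from the rex expression in the query.
var AGENT_PATTERN = /Opera|Android|Symbian|iPhone|Windows Phone/;

// Takes an array of user-agent strings and returns rows of
// { agent, count, percent } sorted by count, highest first.
function topAgents(useragents) {
  var counts = {};
  var total = 0;
  for (var i = 0; i < useragents.length; i++) {
    var ua = useragents[i];
    // Opera Mini is treated as a false positive, so skip it.
    if (ua.indexOf("Opera Mini") !== -1) continue;
    // The first (leftmost) match decides the agent.
    var match = AGENT_PATTERN.exec(ua);
    if (!match) continue;
    counts[match[0]] = (counts[match[0]] || 0) + 1;
    total++;
  }
  var rows = [];
  for (var agent in counts) {
    if (Object.prototype.hasOwnProperty.call(counts, agent)) {
      rows.push({ agent: agent, count: counts[agent], percent: 100 * counts[agent] / total });
    }
  }
  rows.sort(function (a, b) { return b.count - a.count; });
  return rows;
}
```

Because the leftmost match wins, a user agent that starts with "Opera" and also mentions "Android" is counted as Opera, which fits the table listing Opera and Android as separate rows.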

At this point we set up a test environment to test all variants:

  • Hardware: MacBook Air
  • Operating system: OS X Lion
  • Chromium, latest daily snapshot, with pipelining turned on using chrome://flags
  • Firefox 9.0.1 with pipelining turned on using about:config
  • Opera 10.60 (pipelining enabled by default)
  • iOS Simulator 5.0 from the iPhone SDK
  • Android Emulator from the Android SDK
  • Network Link Conditioner (from Lion Xcode) to emulate different types of network
  • Wireshark listening on port 80
  • http://touch.vg.no/index2.php – this page uses a single host for all images to maximise the occurrence of the problem (with the parameter ?time=hammer it reloads the page until it fails)

This machine has been running at home and in the office, on wired LAN, wireless LAN, and wireless LAN through OpenVPN (SSL VPN).

With the iPhone emulator we have managed to trigger the bug in all network conditions except when running through OpenVPN (SSL VPN). Lowering the network quality seems to increase the bug rate.

We have not managed to trigger the bug in the Android emulator or in any of the other browsers.

This is the normal setup at www.vg.no which shows the error:

Since the two other major newspapers in Norway have reported the same problem, and they don’t use Varnish, we suspected that load balancing in general might be the triggering factor. To narrow down the problem, we put a Varnish instance directly on the internet with a public IP and hammered it with all the different browsers in our test environment.

The only browser in which we consistently managed to trigger the error was the iPhone emulator running iOS 5.0.1. It took anything from 15 to 1000 reloads to trigger, averaging around 170 reloads.
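To get a feel for what an average of around 170 reloads means when planning a test run, here is a small back-of-the-envelope sketch. It assumes that every reload fails independently with the same probability, which is a simplification we have not verified; the function names are hypothetical.

```javascript
// Rough model, assuming each reload triggers the bug independently
// with the same probability p (an unverified simplification).

// With a constant per-reload probability, the mean number of reloads
// until the first failure is 1 / p, so p is 1 / mean.
function failureProbability(meanReloads) {
  return 1 / meanReloads;
}

// Chance of seeing at least one failure within `reloads` attempts.
function chanceOfFailureWithin(p, reloads) {
  return 1 - Math.pow(1 - p, reloads);
}

// Number of reloads needed to see the bug with the given confidence.
function reloadsForConfidence(p, confidence) {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

var p = failureProbability(170); // about 0.6 % per reload
console.log(chanceOfFailureWithin(p, 15));
console.log(chanceOfFailureWithin(p, 1000));
console.log(reloadsForConfidence(p, 0.95));
```

Under this model, a run has to last roughly 500 reloads before we can be about 95 % sure that a browser which never shows the bug really behaves differently from the iPhone emulator.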

For anyone interested in diving into why this happens, here is an instance of the bug that triggered pretty early on a wired network without any traffic shaping.

Screenshot when the bug triggers:

Screenshot showing which pictures failed:

  • PCAP file – taken client-side (all 37 attempts) using Wireshark in the test environment

The first is the correct picture (of the soccer guy cheering):

http://t10.vg.no/drfront/images/2012/01/26/c=493,65,614,377;w=238;h=146;26646.jpg

The next image is supposed to be a picture of a guy who bought lots of planes for the Norwegian airline, but it is replaced with the image above:

http://static03.vg.no/drfront/images/2012/01/26/c=190,16,782,238;w=338;h=103;26605.jpg

In Wireshark, apply the filter tcp.stream eq 60 to see where things go wrong. In this case it seems that the client actually requests the image twice before the reply arrives, but that does not always seem to be the case.

Feedback appreciated!

Head of IT Operations. Unix and Linux hacker. Loves shellscript, splunk, perl and node.js


10 comments

  • Maarten

    Hi doesn't all touch.vg.no images display in opera mini on iPad an iPhone ? I was also wondering why movies play in safari but in opera mini I get "no flash"

    Kind regards

    M


  • maarten

    I meant why doesn't all of the images display in opera mini on iPhone and iPad like they do on db.no;)


  • Jan Vidar Krey

    Did you ever get to the bottom of this, or is this still an issue with pipelining?

    It seems like there is a bug in the iOS browser that causes it to request the same resource twice (packets 10631 and 10632) with a few milliseconds in between over the same pipeline.
    If nothing else, at least sub-optimal.


  • Richard Peacock

    On part 2, you mentioned adjusting the URL to include a cachebreaker for the images. Do you mean adding some randomness to the image URL? Like this: fun.jpg?rnd=345324. Would that actually prevent the problem? I ask because we have encountered this problem as well, and am just looking for any options I can.


  • maarten

    Hi this is still an issue. Why?


  • Yohann Richard

    Are you guys still having the issue, or were you able to solve it somehow ? We might see something similar on our end, although we don't repro this consistently.


    • Audun Ytterdal

      We have not solved the actual issue. We've just done stuff to minimize the possibility for the client to run pipelining by increasing the number of hosts we deliver images from and also using lazy loading of images on the front page. I've experienced that the Facebook iphone app has shown the wrong picture several times in the past. (It used to be very often earlier, now it is pretty rare) is it the same problem you think?

  • André Roaldseth

    Jan Vidar: Yes, it's still an issue, even in iOS 5.1.1.

    Richard: That would work yes, but give your users a slow user experience.

    We've currently "solved" it by using lazy loading (on scroll) for our images. This works because it doesn't give Safari enough simultaneous downloads to utilize pipelining.


  • pker868

    Hello,
    this set of articles is really interesting.
    I'd like to know if you have already tried to set up the test page to load only PNG images, and if there are any statistics about non-PNG errors.

    I'll try to explain my idea: maybe the iPhone seems to be an outlier because of the step that handles "other formats" and converts them to PNG. Maybe the algorithm simply adds more errors while the device performs pipelining. It would be interesting to convert all the images on the test page to PNG to see if this is a discriminating factor. Thank you and best regards.


  • ET

    Do you know if this is solved in iOS6? Unfortunately I can't find such info

