August 19, 2007

Flickr Counting

Vanity is necessarily involved in uploading photos to a publicly viewable site, and Flickr indulges that vanity by keeping count of views and favorites, and even computing some artificial measure of your pictures' Interestingness. For the hot chicks, it's easy to draw tons of admirers to their online photo albums. Not so much when it's just me and my random snaps and vacation shots with the SO (although she's drawn some attention, too).

Nevertheless, through the accumulation of family members' cursory interest, strangers' random clicks, and persistently whoring out pictures to Flickr Groups, I've managed to build up a respectable number of photo views. Flickr tells you a total number of Views on the front page, but that's only the views of the Photostream page itself, not counting views coming from other click-throughs like blog links or group photo-pools. So there's no single stat summing up all the actual views of the pictures themselves.

Now, being one of the pioneers of so-called Web 2.0, Flickr offers a complete array of APIs for any competent programmer to access stats, photos, etc. via SOAP, XML-RPC, JSON, using whatever programming language or web-app one may desire. Note the qualifications "competent" and "programmer", of which I am neither. I have done my share of data-munging back in the day, though. So it was old-school, brute-force, screen-scraping action for me.

  1. Use curl to download all the photo thumbnail pages, from 1 to however many (28 so far). Use the -b option with the cookies file borrowed from Camino/Firefox to log in, so that each photo's view count is visible on the page.

    curl -O -b cookies.txt "[1-28]"

  2. Filter out the actual number of views for each picture with

    grep -h "</b> views" page* | egrep -o "[0-9]+" > views.txt

  3. Actually add up the big list of numbers in views.txt and output totals and averages, using awk. The actual program is trivial and left as an exercise for the reader. Folks with a clue may use perl and real men can use Python or whatever, but really awk is all you need.
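
For anyone who'd rather skip the exercise, here is a minimal sketch of what step 3 might look like. It assumes views.txt holds one number per line, as produced by step 2; the function name summarize_views is made up for illustration, and this is not necessarily the exact program I ran.

```shell
# summarize_views: hypothetical helper name, not from the original script.
# Reads one integer per line (the views.txt format from step 2) and
# prints the photo count, total views, and average views per photo.
summarize_views() {
    awk '
        # Only count lines that are purely digits; skip blanks or stray text.
        /^[0-9]+$/ { total += $1; count++ }
        END {
            if (count == 0) { print "no data"; exit 1 }
            printf "photos: %d\ntotal views: %d\naverage: %.2f\n", count, total, total / count
        }
    ' "$1"
}
```

Run it as: summarize_views views.txt. Note that this only gives honest numbers if step 2 pulled out exactly one number per photo; any other digits on the matched lines would end up in views.txt and get counted as extra photos.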

No need to tell me what a pathetically unstable and inelegant hack the above method is. The script took less than a minute to run and got the job done. Averaging 24.15 views per photo for 504 pictures as of today. Sweet.

Posted by mikewang on 08:12 PM