af_gocomics plugin and xpath non-matching?

Posted: 12 Jul 2013, 18:26
by frameskip
So I was taking a look at the gocomics plugin the other day and noticed that fox is pulling the HTML from gocomics.com and using XPath to scrape out the comic URL. Using $entries = $xpath->query('(//img[@src])'); he's able to grab the image from the following (for example):

Code:

<img width="600" src="http://assets.amuniversal.com/b116ca2047f1013010fe001dd8b71c47" onload="Meebo('makeSharable',{element:this, type:'image', shadow:'none', title:'Calvin and Hobbes', url:document.location.href, tweet:'Check out Calvin and Hobbes on GoComics', description:'Check out Calvin and Hobbes on GoComics'})" class="strip" alt="Calvin and Hobbes">


When I looked at the gocomics.com HTML source, I noticed that there's also a larger version of the comic immediately following, inside a hidden div, which I would like to display instead:

Code:

<div style="display: none;" id="mutable_972267"><img src="http://assets.amuniversal.com/b1fb3b1047f1013010fe001dd8b71c47?width=900.0" class="strip" alt="B1fb3b1047f1013010fe001dd8b71c47?width=900"></div>


However, after doing some debugging on the plugin, it seems that for some reason the XPath query "//img[@src]" does not find this entry. I was ultimately able to scrape it out using "/div/img", but I was wondering if someone could help me understand why it is not found by an "//img" query (which should be pulling all <img> tags).

Cheers.

Re: af_gocomics plugin and xpath non-matching?

Posted: 13 Jul 2013, 04:05
by lotrfan
The XPath should find it, but the filtering loop (the "foreach ($entries as $entry) { ...") stops on the first img whose src URL matches "http://assets.amuniversal.com/...".
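
To make that concrete, here is a minimal sketch that can be run outside the plugin. The markup is a trimmed-down, hypothetical copy of the two snippets quoted in the first post, and the strpos() prefix check is only an assumption standing in for whatever matching the plugin's loop really does. It shows that the query returns both images, hidden div or not, and that a first-match loop never reaches the second one:

```php
<?php
// Hypothetical page fragment modelled on the two snippets quoted above
// (the onload attribute is dropped for brevity).
$html = <<<'HTML'
<html><body>
<img width="600" src="http://assets.amuniversal.com/b116ca2047f1013010fe001dd8b71c47" class="strip" alt="Calvin and Hobbes">
<div style="display: none;" id="mutable_972267"><img src="http://assets.amuniversal.com/b1fb3b1047f1013010fe001dd8b71c47?width=900.0" class="strip" alt="large"></div>
</body></html>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings about real-world HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same query as the plugin: every <img> that has a src attribute.
$entries = $xpath->query('(//img[@src])');

// Collect everything the query matched, in document order.
$all = [];
foreach ($entries as $entry) {
    $all[] = $entry->getAttribute('src');
}

// First-match loop (assumed shape): stop at the first src on the asset host.
$picked = null;
foreach ($entries as $entry) {
    $src = $entry->getAttribute('src');
    if (strpos($src, 'http://assets.amuniversal.com/') === 0) {
        $picked = $src;
        break;   // the larger image comes later in the document and is never examined
    }
}

echo count($all), " images matched\n";
echo "picked: ", $picked, "\n";
```

XPath works on the parsed document tree and knows nothing about CSS, so style="display: none;" has no effect on what "//img" matches.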

I updated the plugin to try to find the larger image, and then fall back on the regular size if necessary: https://github.com/lotrfan/Tiny-Tiny-RSS/commit/dc4dbdf5e209b6966c313a1c7c1e760dc46b1589

Re: af_gocomics plugin and xpath non-matching?

Posted: 14 Jul 2013, 07:28
by fox
Thanks, I'll merge it. It would probably be faster to do specific xpath queries instead of iterating but eh.
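
For reference, a rough sketch of what "specific xpath queries" could look like, with the same fall-back behaviour. This is not the code from the commit above. The function name find_comic_src(), the helper xpath_for(), and the idea of recognising the larger image by "width=" in its src are all assumptions made for illustration, based only on the two snippets quoted in the first post:

```php
<?php
// Hypothetical helper: returns the comic image URL, preferring the larger
// copy and falling back to the regular one. Returns null if neither is found.
function find_comic_src(DOMXPath $xpath) {
    $prefix = 'http://assets.amuniversal.com/';
    $queries = [
        // Assumption: the larger image is the one carrying "width=" in its src.
        "//img[starts-with(@src, '$prefix') and contains(@src, 'width=')]",
        // Fallback: any image served from the asset host.
        "//img[starts-with(@src, '$prefix')]",
    ];
    foreach ($queries as $q) {
        $nodes = $xpath->query($q);
        if ($nodes !== false && $nodes->length > 0) {
            return $nodes->item(0)->getAttribute('src');
        }
    }
    return null;
}

// Hypothetical helper for trying the function on a string of HTML.
function xpath_for($html) {
    libxml_use_internal_errors(true);   // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    return new DOMXPath($doc);
}

$small = '<img width="600" src="http://assets.amuniversal.com/b116ca2047f1013010fe001dd8b71c47" class="strip">';
$large = '<div style="display: none;"><img src="http://assets.amuniversal.com/b1fb3b1047f1013010fe001dd8b71c47?width=900.0" class="strip"></div>';

$both     = find_comic_src(xpath_for("<html><body>$small$large</body></html>"));
$fallback = find_comic_src(xpath_for("<html><body>$small</body></html>"));
$none     = find_comic_src(xpath_for('<html><body><p>no comic here</p></body></html>'));
```

starts-with() and contains() are both XPath 1.0 functions, so they work with DOMXPath. The filtering moves into the query itself and the PHP side only has to take the first node of whichever query matches.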