[1.7.9] Problem with feed from serienjunkies.org

Posted: 19 May 2013, 18:40
by Sledge
Hi,

I have strange problems with the rss-feed from serienjunkies.org (the feedlink is http://serienjunkies.org/xml/feeds/episoden.xml).

  • Compared with Google Reader, not all elements seem to find their way into the database
  • filtering/searching/updating is rather slow

The second point could be due to the rather high volume (~ 100 items a day) and running tt-rss on an AMD E-350 (slightly faster than an Intel Atom). Displaying just the unread items is as fast as for all other feeds.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 13:36
by fox
Do you want me to subscribe to this in google reader to make a comparison or would you care to be a bit more fucking specific re: elements or whatever?

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 14:22
by Sledge
For comparison I filtered the feed for "englisch" items (which took more than 20 seconds on my rig).

First screenshot is rssowl, second one from tt-rss.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 14:45
by fox
>(which took more than 20sec on my rig).

1. I don't think your E-350 pocket calculator is qualified to be called a rig. A rig it is not.

2. If you actually searched before posting, you would have known that google reader pulls older articles from the shared database when you subscribe, which is not something that happens with tt-rss for obvious reasons. You only get what was in the feed when you subscribe.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 14:46
by fox
I can't seriously believe that I'm educating people on google reader now here. What have I become. Fuck.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 14:49
by fox
Ah, sorry, your feed problems are not caused by older articles. They are (as usual) caused by invalid feed data:

<item>
   <title>[DEUTSCH] Navy.CIS.L.A.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP</title>
   <description>[DEUTSCH] Navy.CIS.L.A.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP</description>
   <pubDate>Mon, 20 May 2013 00:00:00 +0200</pubDate>
   <link>http://serienjunkies.org/serie/ncis-los-angeles/</link>
</item>
<item>
   <title>[DEUTSCH] NCIS.Los.Angeles.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.720p.WebHD.h264-euHD</title>
   <description>[DEUTSCH] NCIS.Los.Angeles.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.720p.WebHD.h264-euHD</description>
   <pubDate>Mon, 20 May 2013 00:00:00 +0200</pubDate>
   <link>http://serienjunkies.org/serie/ncis-los-angeles/</link>
</item>


As you see, the articles do not have unique identifiers set, so link is used as an identifier, which is shared amongst posts. Which is why you can't and never will be able to see all content until fuckwit feed authors fix their shit.

Hope that explains things.
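
To make the collision concrete, here is a minimal Python sketch (not tt-rss code, which is PHP) of a store keyed by item link. The `store_by_link` helper and the dict layout are assumptions made for illustration; whether the real updater skips or overwrites a colliding item is not stated in this thread, so the sketch simply keeps the first one.

```python
# The two items from the feed excerpt: different titles, identical link.
ITEMS = [
    {
        "title": "[DEUTSCH] Navy.CIS.L.A.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP",
        "link": "http://serienjunkies.org/serie/ncis-los-angeles/",
    },
    {
        "title": "[DEUTSCH] NCIS.Los.Angeles.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.720p.WebHD.h264-euHD",
        "link": "http://serienjunkies.org/serie/ncis-los-angeles/",
    },
]


def store_by_link(items):
    """Hypothetical store: one entry per link, later collisions are ignored."""
    stored = {}
    for item in items:
        # setdefault keeps the first item seen for a given link.
        stored.setdefault(item["link"], item)
    return stored


stored = store_by_link(ITEMS)
print(len(ITEMS), "items in feed,", len(stored), "stored")
```

With no guid in the feed, both releases map to the same key, so only one of them can ever show up.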

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 18:12
by xaberus
Just a crazy idea... (and I think I am going to regret saying it aloud as I had no time to look at the code), but is it possible to use an md5 of the item in these cases?

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 18:19
by fox
You can make a plugin which will generate IDs based on article content.
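
As a rough illustration of what "IDs based on article content" could look like, here is a small Python sketch. The function name `guid_from_content` and the choice of fields (title, description, pubDate) are assumptions, and SHA-1 is used only because it comes up later in the thread; an actual tt-rss plugin would be written in PHP.

```python
import hashlib


def guid_from_content(title, description, pub_date):
    """Hypothetical helper: hash the fields that distinguish one item from another."""
    # Join with a separator so ("ab", "c") and ("a", "bc") do not hash the same.
    payload = "\n".join([title, description, pub_date]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


first = guid_from_content(
    "[DEUTSCH] Navy.CIS.L.A.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP",
    "[DEUTSCH] Navy.CIS.L.A.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.1080p.WebHD.x264-TVP",
    "Mon, 20 May 2013 00:00:00 +0200",
)
second = guid_from_content(
    "[DEUTSCH] NCIS.Los.Angeles.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.720p.WebHD.h264-euHD",
    "[DEUTSCH] NCIS.Los.Angeles.S04E07.Die.groesste.Welle.GERMAN.DUBBED.DL.720p.WebHD.h264-euHD",
    "Mon, 20 May 2013 00:00:00 +0200",
)
print(first != second)
```

The trade-off is that any edit to a hashed field produces a new ID, so an edited article would look like a new one.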

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:02
by xaberus
Ok, I looked at the code, but I think I am still missing something. In rssfuncs.php:update_rss_feed() I see

$entry_guid = $item->get_id();
if (!$entry_guid) $entry_guid = $item->get_link();
if (!$entry_guid) $entry_guid = make_guid_from_title($item->get_title());

Shouldn't the items in the example get different guid's assigned (different titles)?

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:08
by fox
Please read my posts above.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:11
by sboulema
xaberus wrote:Ok, I looked at the code, but I think I am still missing something. In rssfuncs.php:update_rss_feed() I see

$entry_guid = $item->get_id();
if (!$entry_guid) $entry_guid = $item->get_link();
if (!$entry_guid) $entry_guid = make_guid_from_title($item->get_title());

Shouldn't the items in the example get different guid's assigned (different titles)?


No, because after the second line entry_guid isn't null anymore.
At a quick glance, swapping lines 2 and 3 should fix your problem...
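
The two orderings can be compared in a short Python sketch. This mirrors the three quoted PHP lines only loosely: items are plain dicts, and `make_guid_from_title` here is a stand-in, since the real helper's implementation is not shown in the thread.

```python
def make_guid_from_title(title):
    # Stand-in only; returns "" for an empty title so the next fallback applies.
    return title.strip().lower()


def guid_link_first(item):
    """Order as quoted from rssfuncs.php: id, then link, then title."""
    guid = item.get("id")
    if not guid:
        guid = item.get("link")
    if not guid:
        guid = make_guid_from_title(item.get("title", ""))
    return guid


def guid_title_first(item):
    """Order with lines 2 and 3 swapped: id, then title, then link."""
    guid = item.get("id")
    if not guid:
        guid = make_guid_from_title(item.get("title", ""))
    if not guid:
        guid = item.get("link")
    return guid


link = "http://serienjunkies.org/serie/ncis-los-angeles/"
a = {"title": "Navy.CIS.L.A.S04E07 1080p", "link": link}
b = {"title": "NCIS.Los.Angeles.S04E07 720p", "link": link}
a_edited = {"title": "Navy.CIS.L.A.S04E07 1080p (repack)", "link": link}

print(guid_link_first(a) == guid_link_first(b))          # same link, same ID
print(guid_title_first(a) == guid_title_first(b))        # different titles, different IDs
print(guid_title_first(a) == guid_title_first(a_edited)) # edited title, new ID
```

The swap separates these two items, but it also ties every guid-less article's identity to its title, so in any feed where a title is later edited the same article would get a second ID.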

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:19
by fox
Next step is whining here about duplicate articles in all feeds.

Edit: or missing articles, but I digress.

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:44
by xaberus
fox wrote:Please read my posts above.

Argh, the link is checked first then the title and the links are the same, so we get the same id for different items. Slip of thought. Sorry.

fox wrote:Next step is whining here about duplicate articles in all feeds.
Edit: or missing articles, but I digress.

As a side note, I just checked the RSS 2.0 spec and it says all elements of item are optional (but there must be at least one). WTH? On the other hand, stuffing every item through sha1() just to get an id would be a very bad idea as you said. Bad luck for the original poster, I guess. :-(

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 19:58
by fox
I think best case scenario here is mocking feed author until he puts his head out of his asshole and fixes his shit.

If that is for some reason unfeasible, it is possible to make a feed-specific plugin which will sit on HOOK_FEED_FETCHED, parse document XML, insert correct guid elements based on title / moon phase / whatever, and pass it forward to tt-rss update mechanism.

There are people on the forum who can help with the latter, but I personally recommend the former.
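
The XML rewriting step described above might look roughly like the following Python sketch. It only illustrates the transformation (parse, add a guid where missing, serialize); the function name `add_missing_guids`, the sample feed, and the choice to hash link plus title are all assumptions, and a real plugin would do this in PHP inside a HOOK_FEED_FETCHED handler whose exact signature is not shown in this thread.

```python
import hashlib
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example</title>
<item>
   <title>Navy.CIS.L.A.S04E07 1080p</title>
   <link>http://serienjunkies.org/serie/ncis-los-angeles/</link>
</item>
<item>
   <title>NCIS.Los.Angeles.S04E07 720p</title>
   <link>http://serienjunkies.org/serie/ncis-los-angeles/</link>
</item>
</channel></rss>"""


def add_missing_guids(feed_xml):
    """Return the feed XML with a generated <guid> on every item that lacks one."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        if item.find("guid") is not None:
            continue  # leave identifiers supplied by the feed author alone
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        digest = hashlib.sha1((link + "\n" + title).encode("utf-8")).hexdigest()
        # isPermaLink="false" marks the guid as an opaque string, not a URL.
        guid = ET.SubElement(item, "guid", {"isPermaLink": "false"})
        guid.text = digest
    return ET.tostring(root, encoding="unicode")


fixed = add_missing_guids(SAMPLE_FEED)
guids = [g.text for g in ET.fromstring(fixed).iter("guid")]
print(len(guids), len(set(guids)))
```

Because the generated guid depends on the title, the same caveat about edited titles applies, which is one reason to keep such a fix specific to the broken feed.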

Re: [1.7.9] Problem with feed from serienjunkies.org

Posted: 20 May 2013, 20:39
by Sledge
Well, I will complain in the forum over at serienjunkies.org then.