
Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 20:38
by HunterZ
So I subscribed to the following forum-generated feed yesterday: http://prospector.freeforums.org/feed.php

This is an extremely low-volume forum, but when I checked tt-rss this morning I saw over 800 new "articles" listed. Turns out that it was listing dozens of copies of each RSS article as if they were different, and I'm not sure why. I don't see any redundant articles or unreasonable-looking timestamps in the current feed source, but I'm not an RSS guru.

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 20:47
by xtaz
Cuz it looks like some idiot has put a session id in the id field, which is going to change every time the session is renewed, which is probably every time the feed is fetched. The id should be permanent and never change, because it's what is used to track articles.
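To make the mechanism concrete, here is a simplified model in Python (hypothetical, not tt-rss's actual PHP code): the reader remembers every id it has seen and treats any unseen id as a new article. If a session id is baked into the id, the same post looks new on every fetch. The function names and the session ids are made up for illustration.

```python
# Simplified, hypothetical model of id-based de-duplication
# (not tt-rss's actual PHP code).

def new_articles(seen_ids, fetched):
    """Return entries whose id has not been seen before, and mark them as seen."""
    fresh = []
    for entry_id, title in fetched:
        if entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append((entry_id, title))
    return fresh


def fetch_with_sid(sid):
    """Simulate one fetch of the same post, with a session id embedded in its id."""
    base = "http://prospector.freeforums.org/viewtopic.php?t=371&p=1640"
    return [(f"{base}&sid={sid}#p1640", "Same post every time")]


# Session id changes on every fetch: the same post is "new" each time.
seen = set()
for sid in ["aaa", "bbb", "ccc"]:  # made-up session ids
    print(len(new_articles(seen, fetch_with_sid(sid))))  # prints 1 three times

# Stable id: the post is new only on the first fetch.
stable_seen = set()
stable_feed = [("http://prospector.freeforums.org/viewtopic.php?t=371&p=1640#p1640", "Same post")]
for _ in range(3):
    print(len(new_articles(stable_seen, stable_feed)))  # prints 1, then 0, then 0
```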

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 20:56
by HunterZ
I must be blind - can you point me to a specific example? When I look at the feed code, I'm only seeing id tags with post URLs containing only topic and post numbers.

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 21:05
by xtaz
That's weird! When I look at it now, there's no sid. Looks like it's changing the content randomly, then. Even better! When I looked at it 10 minutes ago, it looked like this:

<id>http://prospector.freeforums.org/viewtopic.php?t=371&amp;p=1640&amp;sid=1bbcf113816943435593fd5943159cf7#p1640</id>

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 21:19
by HunterZ
Awesome. I've unsubscribed, as the maintainer of the forum hasn't even logged in all year - and probably has no idea that the RSS feature even exists in the first place. I just wanted to make sure it wasn't something on tt-rss' end.

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 21:31
by feader
We have had this before: you only see the sids when the request doesn't send the session cookie. Once the cookie is sent back, they disappear. For example:

# First request: no cookie is sent; -c saves the session cookie the server sets
$ curl -c /tmp/cookieJar.txt prospector.freeforums.org/feed.php
[…]
<link rel="self" type="application/atom+xml" href="http://prospector.freeforums.org/feed.php?sid=57fd73e259655a6199740b6ff771491d" />
<feed data with sids>

# Second request: -b sends the saved cookie back
$ curl -b /tmp/cookieJar.txt prospector.freeforums.org/feed.php
[…]
<link rel="self" type="application/atom+xml" href="http://prospector.freeforums.org/feed.php" />
<feed data without sids>

You can use the ff_FeedCleaner plugin to erase the sids. It was originally created for such a case (and back then, it also was a forum feed that showed this behaviour).
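ff_FeedCleaner itself is a tt-rss plugin configured with regexes; the Python sketch below is not its code, just an illustration of the same idea. It assumes the feed is Atom (as the rel="self" link with type application/atom+xml suggests) and drops the `sid` query parameter from every `<id>` and `<link href>`, so the ids stay the same between fetches. `strip_sid` and `clean_feed` are hypothetical helpers, and the sample feed is trimmed down from the URLs quoted in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the feed is Atom, so elements live in the Atom namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def strip_sid(url):
    """Remove the 'sid' query parameter from a URL, keeping everything else."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "sid"]
    # urlencode() may re-encode other parameters; fine for these simple URLs.
    return urlunsplit(parts._replace(query=urlencode(query)))


def clean_feed(xml_text):
    """Parse an Atom feed and strip sids from every <id> and <link href>."""
    root = ET.fromstring(xml_text)
    for el in root.iter(ATOM + "id"):
        if el.text:
            el.text = strip_sid(el.text.strip())
    for el in root.iter(ATOM + "link"):
        href = el.get("href")
        if href:
            el.set("href", strip_sid(href))
    return root


# Trimmed-down feed built from the URLs quoted in this thread.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" type="application/atom+xml" href="http://prospector.freeforums.org/feed.php?sid=57fd73e259655a6199740b6ff771491d" />
<entry>
<id>http://prospector.freeforums.org/viewtopic.php?t=371&amp;p=1640&amp;sid=1bbcf113816943435593fd5943159cf7#p1640</id>
</entry>
</feed>"""

root = clean_feed(sample)
print(root.find(ATOM + "link").get("href"))
# http://prospector.freeforums.org/feed.php
print(root.find(ATOM + "entry/" + ATOM + "id").text)
# http://prospector.freeforums.org/viewtopic.php?t=371&p=1640#p1640
```

Working on parsed URLs rather than on the raw XML text avoids having to deal with `&amp;` escaping and with a `sid` that happens to be the first query parameter.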

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 22:52
by ml78
Same problem with this feed: http://www.romandie.com/rss/flux.xml

Each refresh gets 150 new articles, most of them the same as previous ones.
If I check the feed a few hours later, I can get dozens of articles with the same subject and time.

Re: Feed causes tt-rss to go crazy?

Posted: 13 Aug 2013, 23:05
by HunterZ
Ooh, a regex plugin. Thanks!

Re: Feed causes tt-rss to go crazy?

Posted: 14 Aug 2013, 15:50
by feader
ml78 wrote: Same problem with this feed: http://www.romandie.com/rss/flux.xml

The problem here is that the guid changes:

[…]
<guid>http://www.romandie.com/news/n.asp?n=Nouvelles_regles_de_cybersecurite_pour_l_administration_federale22140820131310.asp-16959</guid>
[…]
[a bit later]
[…]
<guid>http://www.romandie.com/news/n.asp?n=Nouvelles_regles_de_cybersecurite_pour_l_administration_federale22140820131310.asp-16606</guid>
[…]

More precisely, only the numeric suffix matching /-[0-9]+$/ changes. The best course is to ask the content provider to drop these suffixes, or to omit the <guid> tag entirely, since the <link> values look fine.
In the meantime, you could remove the guid yourself, or use ff_FeedCleaner with:

[
    {
        "URL": "www.romandie.com/rss/",
        "type": "xpath_regex",
        "xpath": "//item/guid",
        "pattern": "/-[0-9]+$/",
        "replacement": ""
    }
]

(Disclaimer: I didn't test it).
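For reference, the intended effect of that config on the guid can be reproduced with a small Python sketch (assumed equivalent, not the plugin's implementation): find every `item/guid`, strip a trailing `-<digits>` with the same pattern (minus PHP's `/.../` delimiters), and check that the two guids quoted above collapse to the same value. `clean_guids`, `rss_with_guid` and the one-item feeds are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Same pattern as the ff_FeedCleaner config above, without the /.../ delimiters.
SUFFIX = re.compile(r"-[0-9]+$")


def clean_guids(rss_text):
    """Strip the trailing -<digits> from every item guid and return the guids."""
    root = ET.fromstring(rss_text)
    for guid in root.findall(".//item/guid"):
        if guid.text:
            guid.text = SUFFIX.sub("", guid.text.strip())
    return [g.text for g in root.findall(".//item/guid")]


def rss_with_guid(guid):
    """Build a hypothetical one-item RSS 2.0 feed with the given guid."""
    return f"<rss version='2.0'><channel><item><guid>{guid}</guid></item></channel></rss>"


base = ("http://www.romandie.com/news/n.asp?n=Nouvelles_regles_de_cybersecurite_"
        "pour_l_administration_federale22140820131310.asp")
first = clean_guids(rss_with_guid(base + "-16959"))  # guid from the first fetch
later = clean_guids(rss_with_guid(base + "-16606"))  # guid from a later fetch
print(first == later)  # True
```

Note that the pattern strips any trailing dash-number, so it is only safe for feeds where such a suffix is never a meaningful part of the id.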

Re: Feed causes tt-rss to go crazy?

Posted: 16 Aug 2013, 01:48
by ml78
Thanks. Works fine for me.