Feed issue: toucharcade.com +1

Striker21
Bear Rating Trainee
Posts: 42
Joined: 27 Oct 2015, 00:30

Postby Striker21 » 11 Sep 2016, 18:53

VPS, Ubuntu 14.04, PostgreSQL, TT-RSS v16.8 (main git)

CASE 1
Small mystery with a feed that seems to just not pull the latest page.
Tested http://toucharcade.com/feed/ on https://fakecake.org/myfeedsucks/

And seeing this <lastBuildDate>Sat, 03 Sep 2016 14:21:15 +0000</lastBuildDate>

If I pull the website up myself on my PC I'm getting a newer version:
<lastBuildDate>Sun, 11 Sep 2016 03:58:28 +0000</lastBuildDate>

The server has been fetching data just fine since 26th August but the last entry is 3rd of September (nothing after that).

I'm guessing it's not tiny itself, though it is a weird one. It seems to be using a cached version of the page. Is there some way of forcing tiny to refresh beyond what I've tried? And yes, it's been a full week of nothing. I also tried "shift-f-s" with no further luck. Feedback welcome :)

CASE 2
The other one I'm struggling to understand is from http://www.slantmagazine.com/rss

I've a few missing entries randomly within the feed, e.g. one of them is still in the RSS feed but it does not appear in Tiny (see code below).

From debugger (the first below is fetched, the second "Neon Bull" is NOT fetched and the third works fine):

Code:

[18:06:56/24897] guid 1,#When:19:41:00Z / SHA1:adbf06e8b09c407a54994e9e9b98d2121cb57e65
[18:06:56/24897] orig date: 1473450060
[18:06:56/24897] date 1473450060 [2016/09/09 19:41:00]
[18:06:56/24897] title Okkervil River: Away
[18:06:56/24897] link http://www.slantmagazine.com/music/review/okkervil-river-away
[18:06:56/24897] author
[18:06:56/24897] num_comments: 0
[18:06:56/24897] looking for tags...
[18:06:56/24897] tags found: music
[18:06:56/24897] done collecting data.
[18:06:56/24897] article hash: a691510f48dac646039db6bed8cc976b69170e48 [stored=a691510f48dac646039db6bed8cc976b69170e48]
[18:06:56/24897] stored article seems up to date [IID: 166147], updating timestamp only
[18:06:56/24897] guid 1,#When:19:20:00Z / SHA1:0ff6eb8a56576b661e32d5d14806cf3847cbb75c
[18:06:56/24897] orig date: 1473448800
[18:06:56/24897] date 1473448800 [2016/09/09 19:20:00]
[18:06:56/24897] title Neon Bull
[18:06:56/24897] link http://www.slantmagazine.com/dvd/review/neon-bull
[18:06:56/24897] author
[18:06:56/24897] num_comments: 0
[18:06:56/24897] looking for tags...
[18:06:56/24897] tags found: dvd and blu-ray
[18:06:56/24897] done collecting data.
[18:06:56/24897] article hash: 57e58373d3b577ad978ad6ef9ecbeacf3231536b [stored=7da3fac698345ce844de609fbf3f7d35fa7ca95e]
[18:06:56/24897] hash differs, applying plugin filters:
[18:06:56/24897] ... Af_Readability
[18:06:56/24897] === 0.0000 (sec)
[18:06:56/24897] ... Af_Unburn
[18:06:56/24897] === 0.0000 (sec)
[18:06:56/24897] plugin data: af_readability,af_unburn,
[18:06:56/24897] matched filter rules:
[18:06:56/24897] filter actions:
[18:06:56/24897] article labels:
[18:06:56/24897] force catchup:
[18:06:56/24897] base guid found, checking for user record
[18:06:56/24897] initial score: 0 [including plugin modifier: 0]
[18:06:56/24897] user record FOUND
[18:06:56/24897] RID: 82490, IID: 82398
[18:06:56/24897] assigning labels [other]...
[18:06:56/24897] assigning labels [filters]...
[18:06:56/24897] looking for enclosures...
[18:06:56/24897] article enclosures:
Array
(
)
[18:06:56/24897] filtered article tags:
Array
(
    [0] => dvd and blu-ray
    [1] => the house next door
)
[18:06:56/24897] article processed
[18:06:56/24897] guid 1,#When:16:21:00Z / SHA1:330734c863805705b23b39de3afdec41a88e587d
[18:06:56/24897] orig date: 1473438060
[18:06:56/24897] date 1473438060 [2016/09/09 16:21:00]
[18:06:56/24897] title It Had to Be Frank Sinatra: Tony Rome and Lady in Cement on Blu-ray
[18:06:56/24897] link http://www.slantmagazine.com/house/article/it-had-to-be-frank-sinatra-tony-rome-and-lady-in-cement-on-blu-ray
[18:06:56/24897] author
[18:06:56/24897] num_comments: 0
[18:06:56/24897] looking for tags...
[18:06:56/24897] tags found: the house next door
[18:06:56/24897] done collecting data.
[18:06:56/24897] article hash: 5bac83cb8501d9212cd86a6588b17d66084d20d8 [stored=5bac83cb8501d9212cd86a6588b17d66084d20d8]
[18:06:56/24897] stored article seems up to date [IID: 163564], updating timestamp only


Seems the failing one refers to a stored HASH 7da3fac698345ce844de609fbf3f7d35fa7ca95e

That hash is in my database; it's another feed entry, further down in the current RSS feed (I guess that is part of the problem somehow, though I don't understand why, or whether something else here should give me clues):
Fool the World: The Pixies’s Trompe le Monde Turns 25
SHA1:0ff6eb8a56576b661e32d5d14806cf3847cbb75c
http://www.slantmagazine.com/house/arti ... e-turns-25


Update: On the latter one I think the GUID might be to blame, and I'm not sure I have a way to avoid it. The problem article and the entry whose hash it matches both have #When:19:20:00Z as GUID. Happy to hear your thoughts on this, but I've resolved the second one with feedcleaner for now by replacing the guid part; hopefully I got that regex right.

Code:

        "URL": "http://www.slantmagazine.com/rss",
        "type": "regex",
        "pattern": "/<guid>(.*?)<\\/guid>/",
        "replacement": ""


Striker
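
For illustration, here is a minimal Python sketch of why a time-only GUID can make two different articles look like the same entry, and of what the feedcleaner pattern above would strip. The `storage_key` function is hypothetical: it assumes the reader derives its dedup key purely from the owner id plus the feed-supplied guid, which is only a guess based on the "1,#When:..." lines in the debug output, and it is not claimed to reproduce the hashes shown there.

```python
import hashlib
import re


def storage_key(owner_uid: int, guid: str) -> str:
    """Hypothetical dedup key: SHA1 over "<owner id>,<feed guid>" (an assumption)."""
    raw = f"{owner_uid},{guid}".encode("utf-8")
    return "SHA1:" + hashlib.sha1(raw).hexdigest()


# Two different articles whose guid carries only a time of day, no date or URL.
neon_bull = {"title": "Neon Bull", "guid": "#When:19:20:00Z"}
pixies = {
    "title": "Fool the World: The Pixies’s Trompe le Monde Turns 25",
    "guid": "#When:19:20:00Z",
}

# Same guid string in, same key out: the second article looks like an update
# of the first instead of a new entry.
collide = storage_key(1, neon_bull["guid"]) == storage_key(1, pixies["guid"])

# Python equivalent of the feedcleaner pattern: drop bare <guid>...</guid> elements.
GUID_RE = re.compile(r"<guid>(.*?)</guid>")


def strip_guid(xml_text: str) -> str:
    """Remove <guid> elements that have no attributes and sit on a single line."""
    return GUID_RE.sub("", xml_text)
```

Note that this pattern only matches a `<guid>` tag without attributes; if the feed ever emits something like `<guid isPermaLink="false">`, the element would be left in place. With the guid removed, the reader presumably has to fall back to some other identifier such as the link.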

Striker21
Bear Rating Trainee
Posts: 42
Joined: 27 Oct 2015, 00:30

Re: Feed issue: toucharcade.com +1

Postby Striker21 » 16 Sep 2016, 00:41

Does anybody have two cents on CASE 1 yet?

linoth
Bear Rating Trainee
Posts: 22
Joined: 15 May 2013, 11:34

Re: Feed issue: toucharcade.com +1

Postby linoth » 16 Sep 2016, 03:15

Sure, I'll take a stab at it.

So you have a feed that's serving old data to TTRSS and a feed checker that probably uses the same method as TTRSS.

The final line of the feed mentions "WP-super-cache."

The hell is that? To Google.
This plugin generates static html files from your dynamic WordPress blog. After a html file is generated your webserver will serve that file instead of processing the comparatively heavier and more expensive WordPress PHP scripts.

...

Legacy caching. This is mainly used to cache pages for known users. These are logged in users, visitors who leave comments or those who should be shown custom per-user data. It's the most flexible caching method but also the slowest. As each page is different it's often better not to cache pages for these users at all and avoid legacy caching. Legacy caching will also cache visits by unknown users if this caching mode is selected. You can have dynamic parts to your page in this mode too.

Couldn't possibly be related. Old data being sent by a server using a WordPress plugin specifically designed to send old data to recognized users to save server load.
<!-- Cached page generated by WP-Super-Cache on 2016-09-03 11:00:03 -->

Yep, couldn't possibly be related to somebody's (possibly badly configured) WordPress plugin. That's not something you see every day.

I'll leave anything further to someone more experienced.

Edit:
For shiggles, I tried http://toucharcade.com/feed/index.php on the feed checker and it produced a different file, but I doubt that will permanently avoid the problem.

Striker21
Bear Rating Trainee
Posts: 42
Joined: 27 Oct 2015, 00:30

Re: Feed issue: toucharcade.com +1

Postby Striker21 » 17 Sep 2016, 17:15

Good catch linoth, I had not noticed. That might indeed have something to do with it, but I don't know WP well enough to say for sure either. I'd like to send them a message to fix something specific, so if I do I guess I can refer to it as maybe being the reason.

I did take your suggestion of adding index.php to the URL, and Tiny Tiny RSS updated the feed for the first time since 3rd September, though the question is how long that will last, I guess. That said, I'm still not sure why that would make it refresh.

What still bugs me is that I can get a refreshed feed in a browser but that tiny somehow does not manage the same.

Christian

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feed issue: toucharcade.com +1

Postby fox » 17 Sep 2016, 18:44

>What still bugs me is that I can get a refreshed feed in a browser but that tiny somehow does not manage the same.

the plugin probably does something like "if (browser user agent) don't cache".
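
One way to check that guess is to request the feed with different User-Agent headers and compare what comes back. The sketch below is only an illustration under assumptions: the User-Agent strings are placeholders chosen by the caller, and whether the site actually varies its cache by user agent is unconfirmed. It extracts the `<lastBuildDate>` value and the WP-Super-Cache comment quoted earlier in the thread.

```python
import re
import urllib.request
from typing import Dict, Optional

LAST_BUILD_RE = re.compile(r"<lastBuildDate>(.*?)</lastBuildDate>", re.S)
CACHE_STAMP_RE = re.compile(
    r"<!--\s*Cached page generated by WP-Super-Cache on\s+(.*?)\s*-->"
)


def last_build_date(feed_xml: str) -> Optional[str]:
    """Return the text of the first <lastBuildDate> element, or None."""
    match = LAST_BUILD_RE.search(feed_xml)
    return match.group(1).strip() if match else None


def cache_stamp(feed_xml: str) -> Optional[str]:
    """Return the timestamp from a WP-Super-Cache footer comment, or None."""
    match = CACHE_STAMP_RE.search(feed_xml)
    return match.group(1) if match else None


def fetch(url: str, user_agent: str, timeout: float = 15.0) -> str:
    """Download a URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def compare(url: str, agents: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Fetch the same URL once per user agent and report both timestamps."""
    report = {}
    for name, user_agent in agents.items():
        body = fetch(url, user_agent)
        report[name] = {
            "lastBuildDate": last_build_date(body),
            "cache_stamp": cache_stamp(body),
        }
    return report
```

Calling `compare("http://toucharcade.com/feed/", {...})` with one browser-like and one reader-like User-Agent string would show whether the two get different `lastBuildDate` values; if they do, that supports the user-agent theory.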

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Feed issue: toucharcade.com +1

Postby JustAMacUser » 17 Sep 2016, 22:33

The WP Super Cache plugin uses PHP to build static HTML files (it can run in a few different ways, but this is the "best" way for visitor performance). They also suggest some fancy rewrite rules on the web server so that if a cached HTML file exists for the requested page, the web server uses it; otherwise it hits the PHP interpreter. This makes the site fast (especially under load). The problem is that if PHP is never hit, garbage collection never runs and aged files never get cleaned up.

The solution is to make sure WordPress's internal cron is run (wp-cron.php) and that WP Super Cache is set up to enable garbage collection.

None of this is relevant for the TT-RSS forums but, Striker21, you should contact the webmaster for the site in question because clearly their site is broken.
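
To illustrate the garbage-collection point above, here is a minimal Python sketch of what "cleaning up aged files" amounts to: delete cached HTML files older than some maximum age so the next request regenerates them. This is not WP Super Cache's actual code; the directory layout, the `.html` suffix, and the `collect_garbage` name are all assumptions made for the example.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def collect_garbage(
    cache_dir: Path, max_age_seconds: float, now: Optional[float] = None
) -> List[Path]:
    """Delete cached .html files older than max_age_seconds; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    # sorted() materializes the listing first, so deleting while looping is safe.
    for path in sorted(cache_dir.rglob("*.html")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

If nothing ever calls a routine like this (because the web server answers from the static files and the PHP side never runs), an old cached copy of the feed can keep being served indefinitely, which matches the symptom in CASE 1.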

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feed issue: toucharcade.com +1

Postby fox » 17 Sep 2016, 23:03

well this is wordpress, it's broken by design

Striker21
Bear Rating Trainee
Posts: 42
Joined: 27 Oct 2015, 00:30

Re: Feed issue: toucharcade.com +1

Postby Striker21 » 20 Sep 2016, 16:27

JustAMacUser wrote: The WP Super Cache plugin uses PHP to build static HTML files (it can run in a few different ways, but this is the "best" way for visitor performance). They also suggest some fancy rewrite rules on the web server so that if a cached HTML file exists for the requested page, the web server uses it; otherwise it hits the PHP interpreter. This makes the site fast (especially under load). The problem is that if PHP is never hit, garbage collection never runs and aged files never get cleaned up.

The solution is to make sure WordPress's internal cron is run (wp-cron.php) and that WP Super Cache is set up to enable garbage collection.

None of this is relevant for the TT-RSS forums but, Striker21, you should contact the webmaster for the site in question because clearly their site is broken.


Thanks, I'll reach out to them to ask, and also talk to another site that I just noticed also has a big gap in tiny (and has the super cache on). Presumably something similar is causing the same issue.

