Feeds downloaded twice during update by update.php --feeds?

Support requests, bug reports, etc. go here. Dedicated servers / VDS hosting only
firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Feeds downloaded twice during update by update.php --feeds?

Postby firewyre » 15 Nov 2013, 08:17

Hey Fox - this is probably a weird use case, but here goes. I belong to a site that provides an API for querying their database via HTTP, with the results sent back as RSS content. They give you an API key to use as a query string parameter and count how many API hits you make a day, with a cap of 1500 every 24 hours. I have 33 such feeds in tt-rss, and cron is currently set up to update them every 3 hours, or 8 times a day. That should come to 264 API hits a day (33 x 8). Instead, I'm seeing them record exactly double that: 528 hits per day. This means I'll run into the API limit much sooner than I hoped and will eventually have to dial down the update frequency to compensate.

So two questions... am I right that feeds are downloaded twice each while updating via update.php, and if so, is there any chance of getting that down to a single download per feed? I'm on v1.10.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feeds downloaded twice during update by update.php --fee

Postby fox » 15 Nov 2013, 09:05

>am I right that feeds are downloaded twice each while updating via update.php

That would be quite stupid, wouldn't it?

e: most probably, either the crontab is wrong or your math is somehow off.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 15 Nov 2013, 16:42

fox wrote: That would be quite stupid, wouldn't it?
e: most probably, either the crontab is wrong or your math is somehow off.

I know those are the most likely culprits, which is why I triple-checked everything before I risked facing your wrath. I even changed my API key and did a replace in all 33 feeds via SQL to be absolutely sure nothing else was reading these. At this point I'm 100% sure it's only this single instance of tt-rss. But maybe you'll find an error below, and I'll facepalm myself and we can both move on :)

Here's my crontab line, pretty vanilla (every 3 hours on the hour - my update log seems to confirm this):

0 */3 * * * MYUSERNAME /usr/bin/php /usr/syno/synoman/phpsrc/tt-rss/update.php --feeds --log /volume1/homes/MYUSERNAME/updateLog.x

Here's the API hit count straight from the site in question (528):

[Screenshot: the site's API hit counter showing 528]

And the number of feeds from this source, pulled from phpPgAdmin (33):

select * from ttrss_feeds where feed_url like '%foobar.org%'
[Screenshot: phpPgAdmin query result listing the 33 feeds]

It seemed like too much of a coincidence that 528 is EXACTLY double the expected number of 264 [33 x 8] hits, so I figured I'd reach out. Does anything jump out at you as a mistake?
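The expected-versus-observed numbers above can be checked with a few lines of PHP. This is a minimal sketch written for PHP 7 or later; the helper names (cron_runs_per_day, daily_hits, max_runs_under_cap) are hypothetical and not part of tt-rss. It counts the hours matched by the */3 hour field in the crontab, multiplies by the 33 feeds, and shows what the 1500-hits-per-day cap allows depending on whether each feed update costs one request or two.

```php
<?php
// Quota arithmetic for the setup in this thread:
// 33 feeds, crontab hour field "*/3", 1500 API hits per 24 hours.
// Helper names are hypothetical, for illustration only.

// Count how many hours in 0..23 a cron hour field with step $step matches.
function cron_runs_per_day(int $step): int
{
    $runs = 0;
    for ($hour = 0; $hour < 24; $hour++) {
        if ($hour % $step === 0) {
            $runs++;
        }
    }
    return $runs;
}

// Total remote hits per day for a given number of feeds and requests per feed update.
function daily_hits(int $feeds, int $runsPerDay, int $requestsPerFetch): int
{
    return $feeds * $runsPerDay * $requestsPerFetch;
}

// Largest number of full update runs per day that stays within the cap.
function max_runs_under_cap(int $cap, int $feeds, int $requestsPerFetch): int
{
    return intdiv($cap, $feeds * $requestsPerFetch);
}

$runs = cron_runs_per_day(3); // hours 0, 3, 6, ..., 21
printf("runs/day=%d expected=%d doubled=%d\n",
    $runs, daily_hits(33, $runs, 1), daily_hits(33, $runs, 2));
printf("max runs under 1500 cap: %d (1 req/feed), %d (2 req/feed)\n",
    max_runs_under_cap(1500, 33, 1), max_runs_under_cap(1500, 33, 2));
```

With one request per feed the cap would allow up to 45 full update runs a day; with two requests per feed, only 22.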

Actually, I just looked at the source, and it looks like get_favicon_url calls fetch_file_contents. That's called by check_feed_favicon, which in turn is called by the main update code in include/rssfuncs.php, so presumably it makes a second request in addition to the one that fetches the RSS content? Although I do see code in there that tries to update the icon only every 12 hours, so this shouldn't exactly double the hit count... I'm going to try hard-coding $favicon_needs_check to false for a day to see what impact this has.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feeds downloaded twice during update by update.php --fee

Postby fox » 15 Nov 2013, 18:29

This is strange. You can add logging to fetch_file_contents() in functions.php, which should be the only function tt-rss calls to download stuff.

Add something like this at the top of the function:

Code:

error_log(date("H.i.s Y-m-d") . " Requesting: $url\n", 3, "/tmp/ttrss_fetch.log");


Depending on whether /tmp is writable on your system, you may need to choose another folder. Then count how many times each URL appears in the file.
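The counting step can be done with a short PHP script. This is a sketch that assumes every relevant log line has exactly the shape produced by the error_log() call above ("H.i.s Y-m-d Requesting: <url>"); count_fetches is a hypothetical helper, not a tt-rss function. Keep in mind that it counts calls to fetch_file_contents(), not HTTP requests actually sent, so it can show 1 per feed while the remote counter shows 2.

```php
<?php
// Tally how often each URL appears in a fetch log whose lines look like
// "17.59.19 2013-11-15 Requesting: http://example.org/feed".
// count_fetches() is a hypothetical helper for illustration.
function count_fetches(string $logFile): array
{
    $lines = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $logFile");
    }

    $marker = 'Requesting: ';
    $counts = [];
    foreach ($lines as $line) {
        $pos = strpos($line, $marker);
        if ($pos === false) {
            continue; // not written by the logging call
        }
        $url = trim(substr($line, $pos + strlen($marker)));
        $counts[$url] = ($counts[$url] ?? 0) + 1;
    }

    arsort($counts); // most-requested URLs first
    return $counts;
}
```

For example, print_r(count_fetches("/tmp/ttrss_fetch.log")); after a single update run should list each feed URL once if fetch_file_contents() is called once per feed (array_key_first in the check below needs PHP 7.3+).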

You are right that it could be favicon-related, but it shouldn't check all the time. Either that, or some plugin you have enabled requests the URL again for whatever reason (try disabling all plugins).

e: You can also try updating one feed using the f D hotkey and see if that increases the counter by two.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 15 Nov 2013, 19:27

Thanks, I'll give all that a go and report back.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 15 Nov 2013, 21:51

Very interesting results so far:

  • The log file we wrote to indeed shows fetch_file_contents running once per feed
  • I ran an ad hoc update of all feeds; API hits went up by 66 rather than 33
  • I tried f D on one of these feeds, log showed one new entry but hit count went up by two
  • I hit the feed URL manually in a browser, API hit only went up by 1 (their site isn't double counting - a possibility I wanted to rule out)
  • I forced one of these feeds to update by clicking the feed again in the tt-rss UI, again the API hit counter went up by two
Perhaps the double-hits are coming from the implementation of fetch_file_contents. It clearly gets executed once per feed, but maybe it's somehow using curl twice? I'm about to dig into this possibility, as it's about the only thing I can think of at this point.

I will also try disabling plugins as you suggested, but I only have the following enabled:

  • auth_internal
  • updater
  • bookmarklets
  • note
And of these it looks like I can only disable bookmarklets.
Last edited by firewyre on 15 Nov 2013, 22:16, edited 1 time in total.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 15 Nov 2013, 22:03

OK, added some more logging, here's what I got:

17.59.19 2013-11-15 In CURL block!!
17.59.19 2013-11-15 In safe_mode/open_basedir block!!
17.59.20 2013-11-15 geturl just called!!
17.59.20 2013-11-15 calling curl_init on new url!!

geturl calls curl_exec and then after it returns, curl_exec is called again via "$contents = @curl_exec($ch);" just below. Could this account for the double hits?

I think I might try defining NO_CURL in config.php and seeing what I get.
Last edited by firewyre on 15 Nov 2013, 22:17, edited 1 time in total.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 15 Nov 2013, 22:11

Indeed, defining NO_CURL as true and updating via f D caused the API hit count to go up by only 1. Woo-hoo!! Is there a performance benefit to using curl vs. the other method of downloading the feed?

It looks like geturl is there to check whether there's a redirect response it needs to follow to get at the actual content, and to find that out it has to make a request and check the HTTP response code, which results in a second server call. I logged the value of safe_mode, which was empty, and of open_basedir, which was "/etc.defaults:/usr/bin/php:/usr/syno/synoman:/etc:/var/run:/tmp:/var/spool/php:/volume1/@tmp/php:/var/services/web:/var/services/photo:/var/services/blog:/var/services/homes". I'm curious why geturl is called when open_basedir has a value but not otherwise (assuming safe_mode is also empty). Wouldn't that redirection logic need to happen regardless of the value of open_basedir? Anyway, I'm running up against the limits of my PHP knowledge, so I'll stop here :)
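The branch selection described above can be summarized as a small model. This is a simplified sketch reconstructed from this thread, not the actual tt-rss source: fetch_path is a hypothetical name, and treating a missing curl extension like NO_CURL is an assumption. For a feed URL that does not redirect, the curl path with safe_mode or open_basedir set costs two requests (the geturl() probe plus the real curl_exec()), which matches the doubled counter.

```php
<?php
// Simplified model of the download-path choice discussed in this thread.
// NOT tt-rss source code: names and structure are illustrative only.

// Returns which path one feed fetch takes and how many HTTP requests it
// costs for a feed URL that does not redirect.
function fetch_path(bool $noCurl, bool $curlAvailable, string $safeMode, string $openBasedir): array
{
    // Assumption: a missing curl extension behaves like NO_CURL.
    if ($noCurl || !$curlAvailable) {
        return ['path' => 'non-curl fallback', 'requests' => 1];
    }

    // PHP truthiness: an empty string or "0" counts as "not set".
    if ((bool) $safeMode || (bool) $openBasedir) {
        // geturl() probes the URL to resolve redirects, then curl_exec()
        // downloads the content: two requests against the remote quota.
        return ['path' => 'curl + geturl', 'requests' => 2];
    }

    // Otherwise curl is assumed to follow redirects itself in one request.
    return ['path' => 'curl', 'requests' => 1];
}
```

On a live install the inputs would come from defined('NO_CURL'), function_exists('curl_init'), (string) ini_get('safe_mode') and (string) ini_get('open_basedir'); with the values logged above (empty safe_mode, non-empty open_basedir, no NO_CURL) the model lands on the two-request path.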

Looking forward to hearing your thoughts.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feeds downloaded twice during update by update.php --fee

Postby fox » 15 Nov 2013, 23:49

You're right: in an open_basedir + curl configuration, curl magically loses the ability to follow redirects (don't ask me why), so this geturl() stuff is needed. I think the request is supposed to be headers only, but I guess it still counts against your quota. Nice digging, I completely forgot about this thing.

You can disable curl; you shouldn't lose anything important except pubsubhubbub and some plugins that require it.

firewyre
Bear Rating Trainee
Posts: 19
Joined: 17 Apr 2013, 01:05
Location: Boston, MA

Re: Feeds downloaded twice during update by update.php --fee

Postby firewyre » 16 Nov 2013, 00:08

Awesome, good to know. It also looks like I could keep using curl by disabling open_basedir instead: from what I just read, it is an optional restriction, and by default PHP has access to all directories on the system. I have tt-rss running on a personal Synology box that came with open_basedir pre-populated. For anyone reading this who runs into a similar issue, here's the recommended way to disable open_basedir on a Synology device: http://forum.synology.com/enu/viewtopic.php?f=34&t=54617

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Feeds downloaded twice during update by update.php --fee

Postby fox » 16 Nov 2013, 00:27

My personal opinion is that open_basedir is a really silly kludge and a very stupid approach to ensuring application security (which is par for the course for php as a platform in general, it seems that they always choose the most idiotic solution to any problem), unfortunately a lot of people who don't know any better swear by it. The fact that a lot of really shitty code is written in php doesn't help things either.

feader
Bear Rating Master
Posts: 160
Joined: 26 Dec 2012, 20:03

Re: Feeds downloaded twice during update by update.php --fee

Postby feader » 16 Nov 2013, 00:47

fox wrote: You're right, in open_basedir + curl configuration curl magically loses ability to follow redirects (don't ask me why)

I believe this is intentionally done by php since a redirect could return URLs like file://some/path/ which could violate open_basedir restrictions, and php can't filter such URLs because libcurl is a black box and has no option tailored to this particular brand of php weirdness.
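One way to avoid the separate probe request while keeping curl is to follow redirects by hand: issue a normal GET with CURLOPT_FOLLOWLOCATION off, and if the response is a 3xx, request the next URL; otherwise use the body already downloaded. This is a swapped-in technique, not what tt-rss's geturl() does. The sketch assumes PHP 7.1+ (for array destructuring); fetch_following_redirects and curl_transport are hypothetical names, and the http/https-only check reflects the file:// concern above. A feed that does not redirect costs exactly one request.

```php
<?php
// Sketch: follow HTTP redirects manually, one GET per hop, so a feed that
// does not redirect costs exactly one request. Not tt-rss's geturl().

// $transport is a hypothetical callable taking a URL and returning
// [int $status, ?string $location, string $body].
function fetch_following_redirects(string $url, callable $transport, int $maxRedirects = 5): string
{
    for ($hop = 0; $hop <= $maxRedirects; $hop++) {
        // Only http/https targets: a redirect to file:// or similar is refused.
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            throw new RuntimeException("Refusing non-HTTP(S) URL: $url");
        }

        [$status, $location, $body] = $transport($url);

        if ($status >= 300 && $status < 400 && $location !== null) {
            $url = $location; // expected to be absolute (see curl_transport)
            continue;
        }

        // Not a redirect: the body we already downloaded is the content.
        return $body;
    }
    throw new RuntimeException('Too many redirects');
}

// Real-network transport: curl with automatic redirect following turned off.
function curl_transport(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // redirects are handled by the caller
        CURLOPT_CONNECTTIMEOUT => 15,
        CURLOPT_TIMEOUT => 45,
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // Absolute redirect target found in the response, or false/"" if none.
    $redirect = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return [$status, $redirect ?: null, $body];
}
```

Usage would look like fetch_following_redirects($feedUrl, 'curl_transport'). Injecting the transport keeps the redirect logic testable without network access.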
fox wrote: (which is par for the course for php as a platform in general, it seems that they always choose the most idiotic solution to any problem)

Oh yes. But tbf, php's libxml wrapper is much better than python's (nicest thing about php that comes to mind :? ).
