Plugin question - not sure if possible

Postby bmckenna » 16 May 2014, 20:02

I checked the "making plugins" page on the wiki, which basically said "there's no official information about making plugins," so I figured I'd ask some of the more experienced folks whether what I'm looking to do is possible.

Currently, I have 4 RSS feed entries set up for SlickDeals to monitor deals posted to the website. The deal categories are "frontpage deals," "hot topics," "hot deals," and "up and coming." The articles in each of these feeds link to forum threads, and there's a decent amount of overlap: the same forum thread often shows up in more than one category, but with a slightly different link. Example:

Code:

http://feedproxy.google.com/~r/SlickdealsnetUP/~3/6ULlBmUEc-o/6934468-500gb-western-digital-blue-laptop-hard-drive-39
http://feedproxy.google.com/~r/SlickdealsnetHT/~3/6ULlBmUEc-o/6934468-500gb-western-digital-blue-laptop-hard-drive-39


Both of those lead to the same forum thread, so browsing through all of the threads means I'm seeing a decent number of duplicates.

Is it possible to create a plugin that, when I view all deals in the "Deals" category I set up for the 4 RSS feeds, ignores near-duplicate entries? That is, it would parse the feed link (ignoring UP vs. HT) and not display articles that are duplicates.
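A minimal Python sketch of the link comparison described above, not a tt-rss plugin. It assumes every feedproxy link has the shape `/~r/<feed name>/~3/<token>/<thread slug>`, as in the two examples, and treats the trailing slug as the thread's identity. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def dedupe_key(link):
    """Return a key that ignores which FeedBurner feed a link came from."""
    parts = urlsplit(link)
    segments = [s for s in parts.path.split("/") if s]
    # Assumed shape: ~r / <feed name> / ~3 / <token> / <thread slug>
    if (
        parts.netloc == "feedproxy.google.com"
        and len(segments) == 5
        and segments[0] == "~r"
    ):
        return segments[-1]
    # Anything else is left alone and only matches itself.
    return link


def drop_near_duplicates(links):
    """Keep the first link seen for each key, in the original order."""
    seen = set()
    kept = []
    for link in links:
        key = dedupe_key(link)
        if key not in seen:
            seen.add(key)
            kept.append(link)
    return kept
```

Keying on the slug alone is a judgment call: it would also merge two links whose tokens differ, which may or may not be wanted.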

Thanks for your attention!

Postby scottjl » 17 May 2014, 00:19

You mean like

Preferences -> General -> Allow duplicate articles ?

Toggle that off and dupes go away, at least for me.

Postby bmckenna » 17 May 2014, 00:37

scottjl wrote:You mean like

Preferences -> General -> Allow duplicate articles ?

Toggle that off and dupes go away, at least for me.


It's toggled off already. As mentioned in the first post, the articles are not strict duplicates of each other: there are slight differences in the article URLs, so that's probably why they're not being seen as duplicates.

Postby bmckenna » 21 May 2014, 00:51

Bump.

I'm thinking this could be accomplished relatively simply with a database script, depending on the database structure. Something along the lines of "look in the database for articles from the past 2, maybe 3 days that almost match, and set all but one of them to 'read'." I'm just not sure whether I can set that up as a plugin that runs at the press of a button or on some automatic basis, or whether I should just set up a cron job to run a SQL script that does this, if that's easier than being able to run it on demand.
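A rough sketch of that database pass, written in Python against SQLite so it can be tried in isolation. The `articles` table and its columns are hypothetical stand-ins, not the real tt-rss schema, and "almost match" is assumed to mean "same trailing slug in the link".

```python
import sqlite3
import time

# Hypothetical table for illustration; not tt-rss's real schema.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    link TEXT NOT NULL,
    updated INTEGER NOT NULL,   -- Unix timestamp
    unread INTEGER NOT NULL DEFAULT 1
)
"""


def mark_near_duplicates_read(conn, days=3, now=None):
    """Mark all but the oldest article per thread slug as read; return how many."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    rows = conn.execute(
        "SELECT id, link FROM articles WHERE updated >= ? ORDER BY updated, id",
        (cutoff,),
    ).fetchall()
    seen = set()
    duplicate_ids = []
    for article_id, link in rows:
        # Assumption: the last path segment identifies the forum thread.
        slug = link.rstrip("/").rsplit("/", 1)[-1]
        if slug in seen:
            duplicate_ids.append((article_id,))
        else:
            seen.add(slug)
    conn.executemany("UPDATE articles SET unread = 0 WHERE id = ?", duplicate_ids)
    conn.commit()
    return len(duplicate_ids)
```

A script like this could be run from cron after each feed update; running it on demand from the UI would need a plugin around it.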

Postby fox » 21 May 2014, 01:29

Postgres has pretty good n-gram support for matching similar strings.
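In Postgres this kind of matching is available through the pg_trgm extension and its similarity() function. To illustrate the idea only, here is a simplified Python approximation. It is an assumption-laden sketch: it takes character trigrams of the whole lowercased string rather than splitting into words the way pg_trgm does, so the scores will not be identical.

```python
def trigrams(text):
    """Return the set of 3-character windows of the padded, lowercased text."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    # The padding guarantees each set has at least one element.
    return len(ta & tb) / len(ta | tb)
```

Two links that differ only in the feed name share almost all of their trigrams, so they score close to 1.0, while links to different threads score noticeably lower. Where to put the threshold is something to tune against real data.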

Postby syh » 21 May 2014, 07:32

Have you tried the af_unburn plug-in that comes with TT-RSS? I think it might resolve the URLs (feedproxy) and then, since they point to the same place, the duplicate check would work (in theory, I haven't looked at the code, but I think I remember messing around with it a while ago). Apologies if I've misunderstood or misremembered.
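For illustration, resolving a redirecting link to its final destination could look roughly like the Python sketch below. This is not af_unburn's code (which has not been checked here); `resolve_link` and its `open_url` parameter are hypothetical names, and the opener is injectable so the function can be exercised without network access.

```python
import urllib.request


def resolve_link(link, open_url=urllib.request.urlopen, timeout=10):
    """Follow redirects and return the final URL, or the original link on failure."""
    request = urllib.request.Request(link, method="HEAD")
    try:
        with open_url(request, timeout=timeout) as response:
            # urlopen follows HTTP redirects; geturl() reports where it ended up.
            return response.geturl()
    except OSError:
        # Covers urllib.error.URLError and socket-level failures.
        return link
```

If two feed links resolve to the same final URL, an exact-match duplicate check on the resolved value would then be enough.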

Postby bmckenna » 21 May 2014, 18:30

syh wrote:Have you tried the af_unburn plug-in that comes with TT-RSS? I think it might resolve the URLs (feedproxy) and then, since they point to the same place, the duplicate check would work (in theory, I haven't looked at the code, but I think I remember messing around with it a while ago). Apologies if I've misunderstood or misremembered.


Installed cURL and enabled the af_unburn plugin. The number of duplicates in the existing feeds didn't go down, but I'll keep an eye on new articles that come in to see if that takes care of it.

Edit - feeds just updated and they're still listing a feedproxy.google URL, so I'm assuming this didn't work.

According to this post - viewtopic.php?f=22&t=2865&p=16836&hilit=af_unburn#p16833 - it should be putting something in the plugin_data column of the entries table. It is not doing that.

Code:

<!-- CP[0] 0.0000 seconds -->
<!-- CP[04] 0.0016 seconds -->
[15:34:22/944] start
[15:34:22/944] local cache will not be used for this feed
[15:34:22/944] fetching [http://feeds.feedburner.com/SlickdealsnetHT]...
[15:34:22/944] If-Modified-Since: Wed, 21 May 2014 15:22:00 GMT
[15:34:22/944] fetch done.
[15:34:22/944] processing feed data...
[15:34:22/944] site_url: http://slickdeals.net/
[15:34:22/944] feed_title: SlickDeals.net
[15:34:22/944] loading filters & labels...
[15:34:22/944] 0 filters loaded.
[15:34:22/944] processing articles...
[15:34:22/944] f_guid http://feedproxy.google.com/~r/SlickdealsnetHT/~3/jk3Gu_5rMbQ/6944360-targus-apb27us-4800mah-external-battery-power-bank-for-smartphones-tablets-and-other-usb-mobile-devices-9-99
[15:34:22/944] guid 1,http://feedproxy.google.com/~r/SlickdealsnetHT/~3/jk3Gu_5rMbQ/6944360-targus-apb27us-4800mah-external-battery-power-bank-for-smartphones-tablets-and-other-usb-mobile-devices-9-99 / SHA1:309652f7f5e81c01e36032280aa6aaf4577a206a
[15:34:22/944] orig date: 1400656216
[15:34:22/944] date 1400656216 [2014/05/21 07:10:16]
[15:34:22/944] title Targus APB27US 4800mAh External Battery Power Bank for Smartphones, Tablets, and other USB Mobile Devices $9.99 (2 replies)
[15:34:22/944] link http://feedproxy.google.com/~r/SlickdealsnetHT/~3/jk3Gu_5rMbQ/6944360-targus-apb27us-4800mah-external-battery-power-bank-for-smartphones-tablets-and-other-usb-mobile-devices-9-99
[15:34:22/944] author
[15:34:22/944] num_comments: 0
[15:34:22/944] looking for tags...
[15:34:22/944] tags found:
[15:34:22/944] done collecting data.
[15:34:22/944] applying plugin filters..
[15:34:22/944] plugin data:
[15:34:22/944] base guid found, checking for user record
[15:34:22/944] article filters:
[15:34:22/944] initial score: 0
[15:34:22/944] user record FOUND
[15:34:22/944] RID: 181284, IID: 182940
[15:34:22/944] assigning labels...
[15:34:22/944] looking for enclosures...
[15:34:22/944] article enclosures:
Array


Plugin data is blank here as well.

Postby syh » 21 May 2014, 23:03

Ah, that's too bad, was hoping it would be a quick and easy fix. Maybe fox or one of the other much smarter people than me will be able to help.

Postby bmckenna » 22 May 2014, 00:27

syh wrote:Ah, that's too bad, was hoping it would be a quick and easy fix. Maybe fox or one of the other much smarter people than me will be able to help.


I'm hoping it will be quick and easy if someone can help me figure out why the plugin doesn't seem to be working. :D

Postby feader » 22 May 2014, 01:08

bmckenna wrote:I'm hoping it will be quick and easy if someone can help me figure out why the plugin doesn't seem to be working. :D

The problem is that the guid is set before HOOK_ARTICLE_FILTER, which is what af_feedmod hooks into, and for technical reasons the guid can't be changed at that stage anymore (it might be possible for first imports – you have to look carefully at the source if you intend to do that).

Two suggestions for your problem:
  • Try to filter the feed before its articles enter the database. You would have to hook in fairly early then, at HOOK_FEED_PARSED or a bit earlier at HOOK_FEED_FETCHED. Since at the former hook the feed is already a bit more structured, you may prefer it, otherwise you have to deal with raw XML. You could then remove the articles from the feed, or change them such that two articles you think are identical get the same guid (if it's an atom feed, look for the atom file in the same directory).
  • Try to hook into HOOK_ARTICLE_FILTER. As I wrote above, it shouldn't be possible to change the guid here, but you can try to work in tandem with filters, e.g. mark duplicate articles with some tag and delete them/mark them as read/whatever you like with a filter. If you want to query the DB, this might be the right hook since you can set the plugin_something field in the DB (if you don't delete the articles, that is).
You have some work to do, but I hope this helps a bit.
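The first suggestion (giving matching articles the same guid before they reach the database) can be sketched outside tt-rss like this. It is a Python illustration of the transformation only, not a plugin: it handles plain RSS 2.0 `<item>` elements, ignores Atom, and assumes the last path segment of the link identifies the thread. `normalise_guids` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def normalise_guids(feed_xml):
    """Rewrite each RSS item's guid to the last path segment of its link."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # nothing to derive a guid from
        guid = item.find("guid")
        if guid is None:
            guid = ET.SubElement(item, "guid")
        guid.text = link.rstrip("/").rsplit("/", 1)[-1]
        guid.set("isPermaLink", "false")  # the new guid is no longer a URL
    return ET.tostring(root, encoding="unicode")
```

In a real plugin the same rewrite would have to happen at one of the early hooks named above, on whatever structure that hook exposes.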

Postby fox » 22 May 2014, 01:43

An article filter can mark it as read, I think, which would effectively accomplish the goal.

Postby bmckenna » 22 May 2014, 18:36

feader wrote:
bmckenna wrote:I'm hoping it will be quick and easy if someone can help me figure out why the plugin doesn't seem to be working. :D

The problem is that the guid is set before HOOK_ARTICLE_FILTER, which is what af_feedmod hooks into, and for technical reasons the guid can't be changed at that stage anymore (it might be possible for first imports – you have to look carefully at the source if you intend to do that).

Two suggestions for your problem:
  • Try to filter the feed before its articles enter the database. You would have to hook in fairly early then, at HOOK_FEED_PARSED or a bit earlier at HOOK_FEED_FETCHED. Since at the former hook the feed is already a bit more structured, you may prefer it, otherwise you have to deal with raw XML. You could then remove the articles from the feed, or change them such that two articles you think are identical get the same guid (if it's an atom feed, look for the atom file in the same directory).
  • Try to hook into HOOK_ARTICLE_FILTER. As I wrote above, it shouldn't be possible to change the guid here, but you can try to work in tandem with filters, e.g. mark duplicate articles with some tag and delete them/mark them as read/whatever you like with a filter. If you want to query the DB, this might be the right hook since you can set the plugin_something field in the DB (if you don't delete the articles, that is).
You have some work to do, but I hope this helps a bit.



I'm confused by this...not only because of the technical aspects pertaining to the coding itself and what's called when, but because I'm trying to use af_unburn, not af_feedmod. Do I need feedmod enabled to use unburn?

As I said earlier in this thread, I'm thinking that if af_unburn were working (which I don't think it is), and if I understand the plugin's purpose correctly, which is to resolve feedburner article links to the actual destination link while it's updating the feeds, that should take care of the issue: it would resolve the two feedburner articles to the same URL, and the duplicate entries would be removed. Right now, the feedburner links are not getting resolved to their destination URLs. Both in the database (the link column) and in the tt-rss GUI, everything still points to the feedproxy.google.com link.

I'm not sure the question of what's called when is pertinent here. If the plugin was ever designed to work at all, it would need to modify the URL stored in the database during the feed update, before the GUID is set for the article entry, correct? That is, if the GUID can't be changed after the entry is created, as you said.

Postby feader » 22 May 2014, 19:58

bmckenna wrote:I'm confused by this...not only because of the technical aspects pertaining to the coding itself and what's called when, but because I'm trying to use af_unburn, not af_feedmod. Do I need feedmod enabled to use unburn?

My bad. I meant af_unburn, this has nothing to do with af_feedmod at all.

bmckenna wrote:I'm not sure the question of what's called when is pertinent here. If the plugin was ever designed to work at all, it would need to modify the URL stored in the database during the feed update, before the GUID is set for the article entry, correct? That is, if the GUID can't be changed after the entry is created, as you said.

The reason af_unburn can't do what you want is that the ARTICLE_FILTER hook is too late for that: the guid is already set by then. That means you either have to work with the earlier hooks, or hook into ARTICLE_FILTER and mark the article as read when you identify it as a duplicate, which is possible according to fox.

Postby bmckenna » 22 May 2014, 21:18

Okay, I think I see where you're coming from.

Independent of whether or not the duplicate articles setting should work (and based on your explanation I could see why it wouldn't), should the unburn plugin be resolving the URLs in the articles so that tt-rss's GUI/the database shows the links as being the resolved URLs? I thought that's what the purpose of the plugin was based on what I had read previously.

If that's what the plugin is supposed to be doing, I could write a simple database statement that runs every time the feeds are updated to delete articles with duplicate URLs. Since the different feedburner links would resolve to the same URL, I think this would solve my problem pretty neatly.
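As a sketch of such a statement, assuming the links have already been resolved so that duplicates are exact matches: the `articles(id, link)` table below is hypothetical, not the real tt-rss schema, and the SQL is only exercised against SQLite here. Other databases may need a different form, since some restrict deleting from a table that is also read in the subquery.

```python
import sqlite3

# Keep the lowest id for each link and delete every other row.
DELETE_DUPLICATES = """
DELETE FROM articles
WHERE id NOT IN (SELECT MIN(id) FROM articles GROUP BY link)
"""


def delete_duplicate_links(conn):
    """Delete rows whose link already exists on an earlier row; return the count."""
    cursor = conn.execute(DELETE_DUPLICATES)
    conn.commit()
    return cursor.rowcount
```

Deleting is harsher than marking as read: a deleted article may simply be re-imported on the next update if its guid is no longer known.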

I could also probably write a more detailed database statement that would delete duplicates before the feedburner is resolved, but I'm still curious as to whether or not the plugin is functioning at all, since in the feed debug view it does not appear to be.

Postby coplate » 25 May 2014, 08:50

feader wrote:The reason af_unburn can't do what you want is that the ARTICLE_FILTER hook is too late for that: the guid is already set by then. That means you either have to work with the earlier hooks, or hook into ARTICLE_FILTER and mark the article as read when you identify it as a duplicate, which is possible according to fox.


This is right. I have been looking at the source around the ARTICLE_FILTER hook for something unrelated, and there's a comment just before the article filter is called that describes the guid as 'read only'. The new value is not used even if you change it.

I was wondering if it would make sense to extend the plugin system with a hook type that is called before the database is checked. The biggest downside I see, as a new user, is that you can't look up any plugin_data by guid at that point, because this PRE_SOMETHING_ARTICLE_FILTER would be allowed to change the guid.

In my case, this would let me take a guid that has the update timestamp in it, where the rest of the article matches previous articles, and remove the timestamp so that I can match on the guid.

Either of us can do this in our own copies of the source, but if fox reads this again, would you be likely to reject a pull request with that feature?
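The guid rewrite described above might look something like this in Python. The guid format is purely an assumption for illustration (a stable id followed by a separator and an ISO 8601 timestamp); the actual feed's format is not given in the thread, and `strip_timestamp` is a hypothetical name.

```python
import re

# Hypothetical guid shape: "<stable id>@<ISO 8601 timestamp>".
_TRAILING_TIMESTAMP = re.compile(
    r"[-_:@#]?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})?$"
)


def strip_timestamp(guid):
    """Remove a trailing ISO 8601 timestamp so updated copies share one guid."""
    return _TRAILING_TIMESTAMP.sub("", guid)
```

A guid without a trailing timestamp passes through unchanged, so the function is safe to apply to every article in a feed.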

