New feature ? entry_link extract

Development-related discussion, including bundled plugins
MartinGS
Bear Rating Trainee
Posts: 4
Joined: 02 Sep 2011, 12:39

New feature ? entry_link extract

Postby MartinGS » 05 Jan 2012, 20:22

Hello,
In v1.5.8.1 I've added a way to extract content from link_entry and use it to replace the entry_content in ttrss_entries.

You could have a look at https://github.com/MartinGS/Tiny-Tiny-R ... .8.1_xpath

Are you interested in this feature? I have been using it successfully for 2 or 3 months.
If so, I'm willing to merge it into master for a pull request :)

Regards,
Martin

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: New feature ? entry_link extract

Postby fox » 06 Jan 2012, 00:05

My position on this is outlined here: http://tt-rss.org/redmine/issues/390#note-4

Re: New feature ? entry_link extract

Postby MartinGS » 06 Jan 2012, 13:58

If I understand correctly, because RSS providers only provide partial content, tt-rss should respect that?

The new feature I propose is optional and off by default.
So users could choose to respect what the RSS provider wants, but still be able to work around misconfigured RSS feeds.

Plus, the code I propose is less intrusive: it doesn't use any external library and only uses XPath to extract content.
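To make the proposal concrete, here is a minimal sketch of opt-in XPath extraction, written in Python rather than tt-rss's PHP. Everything in it is an assumption for illustration: `FeedSettings`, `content_xpath` and `maybe_replace_content` are hypothetical names, not taken from the branch linked above. Python's standard `xml.etree.ElementTree` supports only a subset of XPath and only parses well-formed XHTML, so real pages would likely need a proper HTML parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedSettings:
    # Hypothetical per-feed option: None means scraping is off (the default).
    content_xpath: Optional[str] = None


def inner_markup(elem: ET.Element) -> str:
    """Serialize an element's leading text and children, without the element's own tags."""
    parts = [elem.text or ""]
    # ET.tostring() also emits each child's tail text, so text between children is kept.
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


def maybe_replace_content(entry_content: str, page_xhtml: str, settings: FeedSettings) -> str:
    """Return scraped content if the feed opted in and the XPath matches, else the feed's own content."""
    if settings.content_xpath is None:
        return entry_content  # feature disabled for this feed
    try:
        root = ET.fromstring(page_xhtml)
    except ET.ParseError:
        # Real-world HTML is often not well-formed XML; keep the original content.
        return entry_content
    match = root.find(settings.content_xpath)  # ElementTree's limited XPath subset
    if match is None:
        return entry_content
    return inner_markup(match)
```

Keeping `None` as the default mirrors the "off by default" behaviour described above, and falling back to the original `entry_content` means a bad XPath or an unparseable page never leaves an entry empty.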

Re: New feature ? entry_link extract

Postby fox » 06 Jan 2012, 14:14

Point is, if they didn't want you to visit their website, they would've included the content in the feed. I don't think it makes a huge difference here whether scraping is optional or not.

Re: New feature ? entry_link extract

Postby MartinGS » 06 Jan 2012, 14:36

And what about broken or misconfigured feeds?

The Adblock Plus extension for Firefox blocks ads, but I don't think sites are okay with that.
The way users want to consume information doesn't always match how sites want them to access it.

If a site gives partial RSS content to force you to visit it, I personally think it's OK to use a tool to extract the content and "alter" the RSS feed.
If that's not how they want their RSS feed used, they could add ads to the feed or disable it.

Besides, you could always crawl the whole site with requests, parse the results and extract any content you want.

Re: New feature ? entry_link extract

Postby fox » 06 Jan 2012, 14:37

Just want to add that I completely understand that it is a useful feature to have (and various implementations of this have been submitted before), but I'm not sure that having this wouldn't piss off content providers, and this is not something that I think is wise to do.

Maybe it would be possible at some point to implement this (and other things) using a plugin system, but currently tt-rss doesn't have one.

MartinGS wrote:
"The Adblock Plus extension for Firefox blocks ads, but I don't think sites are okay with that."

They can't easily stop Adblock from working on their site, but blocking the tt-rss user agent is quite easy. This could be worked around, obviously, but I don't want to start this pointless war in the first place.

Re: New feature ? entry_link extract

Postby fox » 06 Jan 2012, 14:40

Maybe it would be possible to implement feed filter plugins, i.e. arbitrary plugins that mangle incoming RSS content. This could potentially be useful beyond scraping article content, e.g. for tt-rss instance hosters inserting local ads, and such. I could implement something like that.
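A feed filter plugin system of the kind suggested here could be as simple as a chain of functions, each receiving an incoming article and returning a possibly modified copy. The sketch below is hypothetical Python: `Article`, `FilterChain` and `append_footer` are illustrative names, not tt-rss's API. Under this assumption, scraping full content and inserting a host's local notice would just be two different filters registered on the same chain.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Article:
    title: str
    link: str
    content: str


# A filter takes an incoming article and returns a (possibly modified) copy.
FeedFilter = Callable[[Article], Article]


class FilterChain:
    """Runs registered filters in registration order on every incoming article."""

    def __init__(self) -> None:
        self._filters: List[FeedFilter] = []

    def register(self, f: FeedFilter) -> None:
        self._filters.append(f)

    def apply(self, article: Article) -> Article:
        for f in self._filters:
            article = f(article)  # each filter sees the previous filter's output
        return article


def append_footer(article: Article) -> Article:
    # Example filter: an instance host appending a local notice to the content.
    return replace(article, content=article.content + "<p>Hosted by example.org</p>")
```

Making `Article` immutable forces each filter to return a new value, which keeps plugins from silently interfering with one another's input.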

Re: New feature ? entry_link extract

Postby MartinGS » 06 Jan 2012, 16:59

I haven't used filter plugins yet; I'll take a look and see if they fit!

Re: New feature ? entry_link extract

Postby fox » 06 Jan 2012, 17:52

That ain't done yet, but could be implemented in the future.

