Plugin ff_FeedCleaner

feader
Bear Rating Master
Posts: 160
Joined: 26 Dec 2012, 20:03

Postby feader » 27 May 2013, 19:48

I created a plugin whose main purpose is to correct faulty feed data, hence the FeedCleaner suffix. It can also be used to modify feed URLs for the af_FeedMod plugin. It is available on GitHub and requires Tiny Tiny RSS version 1.8 or later.

Some documentation is provided on the GitHub page. As a first example, the erroneous feed described here can be corrected with this

Code:

[
   {
      "URL": "http://www.iswintercoming.com/feed.php",
      "type" : "regex",
      "pattern" : "/sid=[0-9a-f]{32}/",
      "replacement" : ""
   }
]

as input. The regular expressions are those from the PCRE module.
Last edited by feader on 17 Jul 2013, 21:49, edited 6 times in total.
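
To make the effect of the rule above concrete, here is a minimal Python sketch of the same substitution. It is only an illustration: the plugin runs inside Tiny Tiny RSS and uses PCRE, whereas Python's re module is a different engine that happens to agree with PCRE on this simple pattern. The sample link is made up, and the surrounding "/" delimiters of the PCRE pattern are dropped because Python does not use them.

```python
import re

# Pattern from the rule above, without the PCRE "/" delimiters.
SID_PATTERN = re.compile(r"sid=[0-9a-f]{32}")

# Hypothetical link with a 32-digit hex session id (not taken from the real feed).
sample_link = "http://www.example.com/item.php?id=7&sid=" + "0123456789abcdef" * 2

# An empty replacement simply deletes every match.
cleaned = SID_PATTERN.sub("", sample_link)
print(cleaned)
```

Only the sid=… part is removed; anything around it, such as a leftover "&", stays in the link.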

Latimer
Bear Rating Master
Posts: 131
Joined: 17 Mar 2013, 19:35

Re: Plugin ff_FeedCleaner

Postby Latimer » 28 May 2013, 03:47

Thanks, I'll definitely check it out once 1.7.10, or, rather, 1.8, is out.

wib
Bear Rating Trainee
Posts: 2
Joined: 11 May 2013, 13:42

Re: Plugin ff_FeedCleaner

Postby wib » 13 Jun 2013, 16:01

This is exactly what I needed. Looking forward to testing.

robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 15 Jun 2013, 13:40

Thank you, this looks great. Sadly I can't get it to work!

For http://adc.bmj.com/rss/ahead.xml

I want to replace the links to the full articles e.g.
http://adc.bmj.com/cgi/content/short/ar ... 59v1?rss=1
to
http://adc.bmj.com/cgi/content/long/arc ... 59v1?rss=1

I'm trying:

Code:

{
    "#^http://adc\\.bmj\\.com/rss/ahead\\.xml#" : {
        "type" : "regex",
        "pattern" : "#cgi/content/short#",
        "replacement" : "cgi/content/long"
    }
}


but it's not working. What am I doing wrong? Is there a better way?

Thanks,

Robin

Re: Plugin ff_FeedCleaner

Postby feader » 15 Jun 2013, 15:10

robinmarlow wrote: For http://adc.bmj.com/rss/ahead.xml

I want to replace the links to the full articles e.g.
http://adc.bmj.com/cgi/content/short/archdischild-2013-303959v1?rss=1
to
http://adc.bmj.com/cgi/content/long/archdischild-2013-303959v1?rss=1

Hi Robin,

For me, your regex does exactly what you are trying to achieve: I see only links to content/long in Tiny Tiny RSS. When clicking on such a link, for example http://adc.bmj.com/cgi/content/long/archdischild-2013-303959v1?rss=1, I get redirected to http://adc.bmj.com/content/early/2013/06/13/archdischild-2013-303959.long?rss=1.

Is that your problem?

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 15 Jun 2013, 18:19

That is exactly what I want to happen.... but it isn't!
I wondered if it only applied rules to newly fetched articles.

but creating a new feed & applying:

    "#^http://feeds\\.bbci\\.co\\.uk/news/rss\\.xml?edition=uk#" : {
        "type" : "regex",
        "pattern" : "#news#",
        "replacement" : "test"
    }

I would have thought I should have got lots of "test".... but again it didn't. Any ideas as to how I can troubleshoot it?

I can't see any errors in the tt-rss error log, is there anywhere else I can get a clue?

Re: Plugin ff_FeedCleaner

Postby feader » 15 Jun 2013, 18:53

robinmarlow wrote: but creating a new feed & applying:
    "#^http://feeds\\.bbci\\.co\\.uk/news/rss\\.xml?edition=uk#" : {
        "type" : "regex",
        "pattern" : "#news#",
        "replacement" : "test"
    }
I would have thought I should have got lots of tests.... but again it didn't. Any ideas as to how I can troubleshoot it?

OK, with this feed, I don't see test in the URLs either. At the moment, the plugin doesn't report anything to the debug log because I don't know how to do it right (if anyone knows a plugin that does this and could post a link to it, I'd be grateful).

The only thing we can do at the moment is to test the code by hand. I will look into it.
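
Testing a rule by hand can also be done outside Tiny Tiny RSS. The following is a rough Python sketch, not the plugin's code: it assumes that a rule applies when its key matches the feed URL as a regular expression, and that the pattern is then replaced in the feed data. The names strip_delimiters and apply_rules are made up for this illustration, and Python's re module stands in for PCRE, so anything beyond simple patterns (modifiers after the closing delimiter, $1-style references in the replacement) is not covered.

```python
import json
import re

# Configuration in the keyed format used in this thread (the adc.bmj.com rule from above).
CONFIG_TEXT = r'''
{
    "#^http://adc\\.bmj\\.com/rss/ahead\\.xml#" : {
        "type" : "regex",
        "pattern" : "#cgi/content/short#",
        "replacement" : "cgi/content/long"
    }
}
'''


def strip_delimiters(pcre: str) -> str:
    """Drop the leading and trailing PCRE delimiter (hypothetical helper, no modifiers)."""
    if len(pcre) < 2 or pcre[0] != pcre[-1]:
        raise ValueError(f"expected matching delimiters in {pcre!r}")
    return pcre[1:-1]


def apply_rules(config_text: str, feed_url: str, feed_data: str) -> str:
    """Apply every regex rule whose key matches feed_url to feed_data."""
    for url_regex, rule in json.loads(config_text).items():
        if rule.get("type") != "regex":
            continue
        if re.search(strip_delimiters(url_regex), feed_url):
            pattern = strip_delimiters(rule["pattern"])
            feed_data = re.sub(pattern, rule["replacement"], feed_data)
    return feed_data


link = "http://adc.bmj.com/cgi/content/short/archdischild-2013-303959v1?rss=1"
result = apply_rules(CONFIG_TEXT, "http://adc.bmj.com/rss/ahead.xml", link)
print(result)
```

If the printed link is unchanged, either the key did not match the feed URL or the pattern did not match the data, which narrows down where to look.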

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 15 Jun 2013, 18:59

Thanks! I was just investigating your (very neat) code to see how it works (I think I get the rough idea).
Adding a way to log something to the debug log would be great and, given your code, really easy if we knew how!
I had just started poking around to see what I can find, but nothing yet.
I can't see why my news example doesn't work either.

Robin

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 15 Jun 2013, 19:20

Tiny-Tiny-RSS / plugins / af_pennyarcade / init.php

appears to have some logging set up in it, but the same doesn't work when I put it into FeedCleaner.
However, I think this is actually a problem somewhere between computer and chair....

R

Re: Plugin ff_FeedCleaner

Postby feader » 15 Jun 2013, 19:32

robinmarlow wrote:[…]
but creating a new feed & applying: "#^http://feeds\\.bbci\\.co\\.uk/news/rss\\.xml?edition=uk#" : {
[…]


Sometimes … The problem is that '?' is a regex metacharacter. Could you try it with

Code:

"#^http://feeds\\.bbci\\.co\\.uk/news/rss\\.xml\\?edition=uk#"

as key?
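
The effect of the unescaped '?' can be checked in isolation. This is a small Python sketch, using Python's re module as a stand-in for PCRE (both treat '?' as "the preceding item is optional"); the patterns are the two keys from this thread without their "#" delimiters and with the JSON double backslashes already decoded to single ones.

```python
import re

feed_url = "http://feeds.bbci.co.uk/news/rss.xml?edition=uk"

# Unescaped "?": makes the preceding "l" optional, so the literal "?" in the URL is never matched.
unescaped = r"^http://feeds\.bbci\.co\.uk/news/rss\.xml?edition=uk"
# Escaped "\?": matches the literal question mark.
escaped = r"^http://feeds\.bbci\.co\.uk/news/rss\.xml\?edition=uk"

unescaped_hit = re.search(unescaped, feed_url) is not None
escaped_hit = re.search(escaped, feed_url) is not None

print(unescaped_hit)  # False
print(escaped_hit)    # True
```

With the unescaped key the rule is never selected for the feed, so no replacement happens and nothing is reported.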

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 15 Jun 2013, 20:58

Sorry, that still didn't work.

But your iswintercoming feed & example do work - so at least my computer can deal with regex - it is just choking on the sites I want!

Re: Plugin ff_FeedCleaner

Postby feader » 15 Jun 2013, 21:31

robinmarlow wrote:But your iswintercoming feed & example do work - so at least my computer can deal with regex - it is just choking on the sites I want!

Strange. With

Code:

"#^http://feeds\\.bbci\\.co\\.uk/news/rss\\.xml\\?edition=uk#" : {
    "type" : "regex",
    "pattern" : "#news#",
    "replacement" : "test"
}

I get tests and a nice 404 handler if I click on the URLs (that's our Beeb :wink: ). Sorry, I'm out of ideas at the moment.

Re: Plugin ff_FeedCleaner

Postby robinmarlow » 18 Jun 2013, 14:41

Fixed it. I needed to escape the backslashes in my regex pattern:

Code:

    "#^http://adc\\.bmj\\.com/rss/ahead\\.xml#" : {
        "type" : "regex",
        "pattern" : "#cgi\\/content\\/short#",
        "replacement" : "cgi\/content\/long"
    }


Robin
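
Because the configuration is JSON, every string is decoded once before the regex engine sees it. The Python sketch below shows what the two strings from the rule above turn into. It assumes only standard JSON decoding and uses Python's re module as a stand-in for PCRE; how the plugin itself processes the values is not shown in this thread.

```python
import json
import re

# The two JSON string literals from the rule above, exactly as typed in the configuration.
pattern_json = r'"#cgi\\/content\\/short#"'
replacement_json = r'"cgi\/content\/long"'

pattern = json.loads(pattern_json)          # JSON "\\" decodes to a single backslash
replacement = json.loads(replacement_json)  # JSON "\/" decodes to a plain "/"

print(pattern)      # #cgi\/content\/short#
print(replacement)  # cgi/content/long

# Without the "#" delimiters, "\/" in a regex simply matches "/".
link = "http://adc.bmj.com/cgi/content/short/archdischild-2013-303959v1?rss=1"
rewritten = re.sub(pattern[1:-1], replacement, link)
print(rewritten)
```

A backslash that is meant to reach the regex engine therefore has to be written twice in the JSON text.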

roshambo
Bear Rating Trainee
Posts: 35
Joined: 19 Jun 2013, 20:03

Re: Plugin ff_FeedCleaner

Postby roshambo » 25 Jun 2013, 00:39

Thanks for this. I'm trying to fix this feed: http://validator.w3.org/feed/check.cgi? ... wire%2Fall but I'm dumbfounded when it comes to regex. Also, ttrss is complaining about '&acirc' instead; I'm not sure which is correct. So far I have:

Code:

{
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
        "type" : "regex",
        "pattern" : "/\x80\x99/",
        "replacement" : ""
   },
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
        "type" : "regex",
        "pattern" : "/\x80\x98/",
        "replacement" : ""
   },
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
        "type" : "regex",
        "pattern" : "/\x85\x94/",
        "replacement" : ""
   },
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
        "type" : "regex",
        "pattern" : "/&acirc/",
        "replacement" : ""
   }
}

This results in invalid JSON. Any help would be appreciated.

Re: Plugin ff_FeedCleaner

Postby feader » 25 Jun 2013, 01:03

Different objects may not have the same key. This is a mistake on my side; in the next version the configuration will consist of unnamed objects with a url key. In the meantime, try dropping letters

Code:

{
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
       […]
   },
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/al\\#" : {
        […]
   },
  [etc]
}

or do it in one regex with alternation

Code:

{
  "#^http://feeds.feedburner\\.com/1500espn/sportswire/all\\#" : {
        "type" : "regex",
        "pattern" : "/\x80\x99|\x80\x98[|…]/",
        "replacement" : ""
   }
}

I'm not sure what you want to achieve with the \\# at the end, though. I'm also not sure what ESPN wants with &acirc, but I'd remove the whole &acirc; entity, including the semicolon. Last but not least, I'm not sure if the pattern /\x80\x99/ works as intended (consult the docs), and maybe you should contact the content provider first before using this plugin.
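
Two separate JSON issues from the configuration above can be reproduced with a standard parser. This Python sketch uses the json module from Python's standard library. Other parsers, including the one Tiny Tiny RSS uses, may treat duplicate keys differently, so take the first half as an illustration of the risk rather than a description of the plugin. The key "feed" is a shortened placeholder.

```python
import json

# Two members with the same key: Python's parser keeps only the last one,
# so the earlier rule is silently dropped.
duplicate_keys = '{"feed": {"pattern": "/a/"}, "feed": {"pattern": "/b/"}}'
parsed = json.loads(duplicate_keys)
print(parsed)  # {'feed': {'pattern': '/b/'}}

# "\x" is not one of the escape sequences JSON defines, so parsing fails.
bad_escape = r'{"pattern": "/\x80\x99/"}'
try:
    json.loads(bad_escape)
    escape_ok = True
except json.JSONDecodeError as err:
    escape_ok = False
    print("invalid JSON:", err)

# A doubled backslash is valid JSON and hands \x80\x99 on to the regex engine.
good_escape = r'{"pattern": "/\\x80\\x99/"}'
decoded_pattern = json.loads(good_escape)["pattern"]
print(decoded_pattern)  # /\x80\x99/
```

Whether the decoded pattern then matches the intended bytes in the feed is a separate question for the regex engine and the feed's encoding.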

