af_feedmod

thermionic
Bear Rating Trainee
Posts: 42
Joined: 15 May 2013, 13:50

af_feedmod

Postby thermionic » 20 May 2013, 13:48

Hi All,

I wrote a very basic howto and wondered if anyone else would like to contribute any of their configs to the thread.

Cheers

Latimer
Bear Rating Master
Posts: 131
Joined: 17 Mar 2013, 19:35

Re: af_feedmod

Postby Latimer » 20 May 2013, 18:01

Thanks to the plugin documentation and your howto it doesn't seem to be terribly difficult to come up with a proper configuration string, and I'm not sure whether it's helpful to build a collection of them. I thought I would share these for illustrative purposes. Feel free to tweak them however you want.

http://feeds.feedburner.com/AndroidPolice?format=xml

"AndroidPolice": {
    "type": "xpath",
    "xpath": "div[@class='post_content']"
}

http://www.theregister.co.uk/headlines.atom

"go.theregister.com": {
    "type": "xpath",
    "xpath": "div[@id='body']"
}

http://penny-arcade.com/feed

"penny-arcade.com" : {
    "type": "xpath",
    "xpath": "div[contains(@class, 'comic')] | //div[@class='body']"
}

J M L
Bear Rating Trainee
Posts: 13
Joined: 20 May 2013, 18:23

Re: af_feedmod

Postby J M L » 20 May 2013, 18:39

These are a few I've come up with:

{
   "jalopnik": {
      "type": "xpath",
      "xpath": "div[@class='eight mobile-four columns end']"
   },
   "io9": {
      "type": "xpath",
      "xpath": "div[@class='eight mobile-four columns end']"
   },
   "webbikeworld": {
      "type": "xpath",
      "xpath": "div[@class='leftcontainer']"
   },
   "slate": {
      "type": "xpath",
      "xpath": "div[@class='ht5-article sl-main-section']"
   },
   "cartalk": {
      "type": "xpath",
      "xpath": "div[@class='field-item even']"
   },
   "motorcycle-usa": {
      "type": "xpath",
      "xpath": "div[@id='articleleftcolumn']"
   },
   "badscience": {
      "type": "xpath",
      "xpath": "div[@id='content']"
   },
   "groklaw": {
      "type": "xpath",
      "xpath": "/html/body/table[1]/tbody/tr[3]/td[2]/table/tbody/tr/td/table[1]"
   },
   "motorcycle.com": {
      "type": "xpath",
      "xpath": "div[@class='body_content']"
   }
}


The Gawker Media ones (Jalopnik and Io9), Cartalk, and Motorcycle-USA all work well, as far as I can tell. If Io9 links to an article on Gizmodo (for example), the full text of that article will not be pulled, because its URL does not match the "io9" key. The Slate one works, but for a multipage article it only shows the first page. I haven't been able to test Motorcycle.com, badscience, or webbikeworld.

The Groklaw one does not work. That site uses tables for layout, and I'm not sure how to construct an XPath query to pull out just the cell I want. The XPath listed is what Chrome gives for "Copy XPath" on the appropriate table cell.

Re: af_feedmod

Postby Latimer » 20 May 2013, 19:05

There is a full text RSS feed for io9: http://io9.com/rss. I guess you can build one for jalopnik as well: http://jalopnik.com/rss

nevergrownup
Bear Rating Trainee
Posts: 6
Joined: 20 May 2013, 21:26

Re: af_feedmod

Postby nevergrownup » 20 May 2013, 21:47

I made a small tweak to af_feedmod to help support pages like groklaw which require xpaths from /html.

Change this line

$entries = $xpath->query('(//'.$config['xpath'].')');

to this

if(substr($config['xpath'],0,1) != "/") {
   $config['xpath'] = "//".$config['xpath'];
}
$entries = $xpath->query('('.$config['xpath'].')');


Now feedmod won't automatically prepend // if the XPath already starts with /.

Also, I've found that when building XPaths into tables you should drop tbody from the expression, so the groklaw config should look like this:

"groklaw": {
   "type": "xpath",
   "xpath": "/html/body/table[1]/tr[3]/td[2]/table/tr/td/table[1]"
}


With those two things I have groklaw in my feedlist with full articles.

I suppose we could also add a patch to automatically strip tbody from XPath expressions, but I'm not sure why Firefox/Chrome show tbody and PHP's DOMDocument doesn't, so I'm hesitant to hardcode something like that: it could vary site by site, or the DOM parser might be updated later on. Another option would be to have feedmod try the expression with tbody first and then, if that query finds nothing, retry with tbody stripped out. That would add some overhead, but it is probably trivial compared to the original page fetch.
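
The try-then-retry idea could be sketched roughly like this. It is only an illustration under assumptions: `query_with_tbody_fallback` is a made-up helper name, not something in af_feedmod, and the sample HTML is invented for the demo.

```php
<?php
// Hypothetical helper (not part of af_feedmod): run the XPath as written and,
// if nothing matches, retry with every "tbody" step removed.
function query_with_tbody_fallback(DOMXPath $xpath, string $expr)
{
    $entries = $xpath->query($expr);
    if ($entries !== false && $entries->length > 0) {
        return $entries;
    }
    // Drop "/tbody" steps, with or without a positional predicate such as [1].
    $stripped = preg_replace('#/tbody(\[\d+\])?(?=/|$)#', '', $expr);
    return $xpath->query($stripped);
}

// Invented sample markup: a layout table with no explicit <tbody>.
$html = '<html><body><table><tr><td>menu</td><td>article text</td></tr></table></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from sloppy real-world markup
$xpath = new DOMXPath($doc);

// An expression in the style of a browser's "Copy XPath", tbody included.
$entries = query_with_tbody_fallback($xpath, '/html/body/table[1]/tbody/tr/td[2]');

echo $entries->length, "\n";
echo $entries->item(0)->textContent, "\n";
```

The retry only runs when the first query comes back empty, so configs that already work are not affected.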

Re: af_feedmod

Postby J M L » 21 May 2013, 01:28

Latimer wrote: There is a full text RSS feed for io9: http://io9.com/rss. I guess you can build one for jalopnik as well: http://jalopnik.com/rss


That would have saved me some time. The ones I'd been using, http://feeds.gawker.com/io9/full, etc., stopped being a full feed a couple months ago.

Re: af_feedmod

Postby thermionic » 23 May 2013, 17:15

Does anyone have a working one for Ars Technica?

The XPath looks as if it should be

//*[@id="content"]/article

but I don't know how to convert this into an XPath that af_feedmod understands.

Re: af_feedmod

Postby nevergrownup » 23 May 2013, 17:51

Try:

div[@id="content"]/article

That will probably work, but I'm on my phone, so I have no easy way to verify it.
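
One thing to watch when pasting an expression like that into the JSON config: double quotes inside the XPath end the JSON string early. A small check, assuming the config is parsed with PHP's `json_decode` (an assumption about the plugin, not confirmed in this thread):

```php
<?php
// Unescaped double quotes inside the XPath make the JSON invalid.
$broken = '{"arstechnica": {"type": "xpath", "xpath": "div[@id="content"]/article"}}';

// Single quotes are valid XPath string delimiters and need no escaping in JSON.
$valid = '{"arstechnica": {"type": "xpath", "xpath": "div[@id=\'content\']/article"}}';

$brokenConfig = json_decode($broken, true);
$validConfig = json_decode($valid, true);

var_dump($brokenConfig); // json_decode returns NULL for invalid JSON
echo $validConfig['arstechnica']['xpath'], "\n";
```

Escaping the inner quotes as `\"` in the JSON would also work, but single quotes are easier to read.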

Re: af_feedmod

Postby Latimer » 23 May 2013, 18:03

How about this?

"arstechnica": {
    "type": "xpath",
    "xpath": "div[@class='article-content clearfix']"
}

dang
Bear Rating Trainee
Posts: 14
Joined: 19 Mar 2013, 22:06

Re: af_feedmod

Postby dang » 23 May 2013, 18:20

I just use:

"arstechnica": {
    "type": "xpath",
    "xpath": "article"
},


This works fine for all single-page articles; for multipage articles you have to click through.

Re: af_feedmod

Postby thermionic » 23 May 2013, 18:32

@Latimer that worked perfectly

Thanks!

Midas
Bear Rating Trainee
Posts: 7
Joined: 26 May 2013, 23:17

Re: af_feedmod

Postby Midas » 26 May 2013, 23:26

Hello!
Could you help me? I have tried many variants...
TT-RSS 1.7.9
RSS feed: http://pipes.yahoo.com/pipes/pipe.run?_id=7e9a203d8b4faaefcdb25e1ebc40f0e0&_render=rss
Entries link to, for example, "http://habrahabr.ru/post/180935/".
The config looks good, but it does not work.

{
"7e9a203d8b4faaefcdb25e1ebc40f0e0": {
    "type": "xpath",
    "xpath": "div[contains(@class, 'content')]"
}
}

Re: af_feedmod

Postby Latimer » 27 May 2013, 00:44

"habrahabr.ru": {
    "type": "xpath",
    "xpath": "div[@class='content html_format']"
}

The "array key" should be a part of the full article URL, not the feed URL.
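
A minimal sketch of that rule, assuming the plugin simply checks whether each key occurs somewhere in the article link. This is an illustration, not code copied from af_feedmod, and `find_config` is a made-up helper name.

```php
<?php
// Assumed matching rule: a config entry applies when its key is a substring
// of the article link. The feed URL is never consulted.
function find_config(array $configs, string $articleLink): ?array
{
    foreach ($configs as $urlPart => $config) {
        if (strpos($articleLink, (string) $urlPart) !== false) {
            return $config;
        }
    }
    return null;
}

$rule = ['type' => 'xpath', 'xpath' => "div[@class='content html_format']"];
$link = 'http://habrahabr.ru/post/180935/';

// Key taken from the Yahoo Pipes feed URL: never appears in the article link.
$byFeedId = find_config(['7e9a203d8b4faaefcdb25e1ebc40f0e0' => $rule], $link);

// Key taken from the article URL: matches.
$byDomain = find_config(['habrahabr.ru' => $rule], $link);

var_dump($byFeedId === null); // bool(true)
var_dump($byDomain === $rule); // bool(true)
```

The same rule would explain why an "io9" key does not pick up articles that link out to another domain.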

Re: af_feedmod

Postby Midas » 27 May 2013, 22:35

Latimer, Thank you!

Re: af_feedmod

Postby Midas » 31 May 2013, 22:43

Another question.
How do I set "force_charset" to "windows-1251"?

"shkolazhizni.ru": {
    "type": "xpath",
    "xpath": "div[@id='inner_wrapper']",
    "force_charset": "windows-1251"
}

This returns unreadable text.

