Importing of Google Reader Cached Feeds

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52


Postby gbcox » 30 Apr 2013, 20:36

Google Reader allows you to extract feed history, up to a limit of 1000 articles.

Here are two articles which explain how to extract the data:
http://googlesystem.blogspot.com.br/200 ... oogle.html
http://ashleyangell.com/2011/01/export- ... le-reader/

It would be nice to be able to import this information into the ttrss database. I've seen that you can import starred or shared items, but that isn't what I'm talking about. I'm interested in importing an entire feed history (my understanding is that this is limited to 1000 articles, but that is sufficient for most purposes) into the active database so I can search it. The end result would be that each imported feed starts with up to 1000 articles from Google's cache and then keeps growing through the normal update procedure.

This could possibly also be adapted for people who want to switch from RSSOwl, etc. to ttrss without losing their feed history.

KestL
Bear Rating Trainee
Posts: 2
Joined: 03 May 2013, 11:35

Re: Importing of Google Reader Cached Feeds

Postby KestL » 03 May 2013, 11:39

Hi!
You can find a manual on how to extract more than 1000 items.

robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Importing of Google Reader Cached Feeds

Postby robinmarlow » 21 May 2013, 14:01

From a brief look at the XML produced by these methods, it looks as if it wouldn't be too hard to mangle it so that the XML import plugin can cope with it.
I'll try to dust off my Perl and look at the plugin this weekend.

R

lotrfan
Bear Rating Disaster
Posts: 73
Joined: 18 Mar 2013, 04:42

Re: Importing of Google Reader Cached Feeds

Postby lotrfan » 22 May 2013, 00:18


robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Importing of Google Reader Cached Feeds

Postby robinmarlow » 22 May 2013, 01:55

Crikey, nice job! That would have taken me a while to code up.
My only suggestion for improvement would be to let it parse the subscriptions.xml that Google Takeout produces to get the feed names... but that isn't really needed unless you've got a lot of feeds.

Further poking around in the XML import plugin (plugins/googlereaderimport/init.php) suggests it shouldn't need much modification to let it parse these files.

robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Importing of Google Reader Cached Feeds

Postby robinmarlow » 24 May 2013, 02:21

Here is a parsing hack:


This will take the XML from either lotrfan's reader.pl or the manual way gbcox mentioned and turn it into an XML file that the import/export plugin can use.

Run (Linux again ;o):

php readerXML_to_ttrssXML.php input.xml > output.xml

You may need to gzip the output before the plugin will manage to load it.
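
For anyone without the gzip command line tool to hand, the compression step can be sketched in Python as below. This is only an illustration, not part of the parsing hack: the function name gzip_file is made up here, and output.xml is simply the file name from the command above.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: str) -> Path:
    """Compress src into src + '.gz', leaving the original file in place.

    gzip_file is a hypothetical helper name, not something from the thread.
    """
    source = Path(src)
    target = source.with_name(source.name + ".gz")
    # Stream the copy so a large export does not have to fit in memory.
    with source.open("rb") as fin, gzip.open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

Calling gzip_file("output.xml") would write output.xml.gz next to the original. The result is a single gzip-compressed XML file, which is a different thing from a .tar.gz archive.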

It works for the 3 feeds I've tried so far, but there could be others that don't play nice.
Let me know how you get on.

Robin

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52

Re: Importing of Google Reader Cached Feeds

Postby gbcox » 26 May 2013, 23:20

I tried the parsing hack. The text is displaying, but none of the photos are...

robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Importing of Google Reader Cached Feeds

Postby robinmarlow » 27 May 2013, 01:34

Halfway there, then ;o)

What is the feed address? I'll take a look and see what's going wrong.

R

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52

Re: Importing of Google Reader Cached Feeds

Postby gbcox » 27 May 2013, 05:46


lotrfan
Bear Rating Disaster
Posts: 73
Joined: 18 Mar 2013, 04:42

Re: Importing of Google Reader Cached Feeds

Postby lotrfan » 27 May 2013, 10:43


lotrfan
Bear Rating Disaster
Posts: 73
Joined: 18 Mar 2013, 04:42

Re: Importing of Google Reader Cached Feeds

Postby lotrfan » 27 May 2013, 10:49

By the way, nice job on the parser, robinmarlow! I've only tried it on the above feed (on which it works beautifully), as I haven't decided whether I'm going to import all of my old feeds... My TT-RSS server isn't the most powerful machine; I'm not sure it can handle tens of thousands more articles, as some queries already take a while...

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52

Re: Importing of Google Reader Cached Feeds

Postby gbcox » 27 May 2013, 21:47

@lotrfan - Thanks. I went to the plugins/import_export directory, renamed init.php to init.php.dist and replaced it with your version of init.php. I tested it on three feeds and all work fine now. I checked the error log and there are no errors there either. @Fox, do you think this change could be folded into the trunk, or do you think it would be better to create a separate plugin?

recognitium
Bear Rating Trainee
Posts: 14
Joined: 02 Jul 2013, 01:35

Re: Importing of Google Reader Cached Feeds

Postby recognitium » 02 Jul 2013, 12:44

@lotrfan and @robinmarlow, thanks for this great effort. It seems I could use it for something similar that I wanted to do.

You see, I downloaded all my tagged items as some form of XML (one XML file per tag) using the feed-archive tool from http://readerisdead.com/

One of the files I obtained from the tool above was:



My idea is to import them, since they are my own old curated news. My first try was to upload them and subscribe to the "dead" feeds. Even though I set purge = 0 and a limit of 45000 hours for new items (I am starting this user from scratch, and I want it to be properly set up before adding "alive" feeds), none of the items show up.

After that I found your posts. I changed init.php as indicated and tried running @robinmarlow's parsing hack on the files I downloaded. No luck there either: the parser just generates 42 B files.

You can laugh at me... obviously I am totally lost in this jungle, and, unfortunately, my computer and web skills are pretty low. Meanwhile I'll try to understand enough PHP to hack my own parser, but any help I could get would be great.

robinmarlow
Bear Rating Trainee
Posts: 11
Joined: 21 May 2013, 13:58

Re: Importing of Google Reader Cached Feeds

Postby robinmarlow » 02 Jul 2013, 14:34


recognitium
Bear Rating Trainee
Posts: 14
Joined: 02 Jul 2013, 01:35

Re: Importing of Google Reader Cached Feeds

Postby recognitium » 02 Jul 2013, 18:57

Thanks a lot. I will try it now.

---- UPDATE ---

As far as I could see, it worked like a charm!

I had 166 archived tags downloaded like that, and almost all of them were properly imported. I just made a rudimentary bash script to run the sed and the hack over all the files. Thank you so much @robinmarlow!!
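
A batch run of this kind might look roughly like the Python sketch below. It is not the bash script mentioned above, which was not posted. It assumes readerXML_to_ttrssXML.php sits in the current directory and is called the same way as earlier in the thread; the names convert_all and run_php_converter are invented for the example, and the sed step is left out because its expression is not shown here.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def run_php_converter(src: Path, dst: Path) -> None:
    """Equivalent of: php readerXML_to_ttrssXML.php src > dst"""
    with dst.open("wb") as out:
        subprocess.run(
            ["php", "readerXML_to_ttrssXML.php", str(src)],
            stdout=out,
            check=True,  # stop on the first file the converter rejects
        )


def convert_all(
    src_dir: str,
    dst_dir: str,
    convert: Callable[[Path, Path], None] = run_php_converter,
) -> List[Path]:
    """Convert every *.xml file in src_dir and write the results to dst_dir."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).glob("*.xml")):
        dst = out_dir / src.name
        convert(src, dst)
        written.append(dst)
    return written
```

Something like convert_all("tags", "converted") would then process a whole directory in one go; both directory names are placeholders. Any preprocessing, such as the sed step, would have to run on each file before convert is called.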

The only ones that had an error were the big ones (exceeding several MB). How was it that I could gzip them? Noob that I am, I tried to upload the archives directly as "tar.gz" and, of course, the import-export plugin couldn't read the XML.
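
On the tar.gz point: a .tar.gz is a tar archive wrapped in gzip, so decompressing it gives a tar container rather than XML, which would explain why the plugin could not read it. Assuming the plugin wants each XML file gzipped on its own (as the earlier advice to "gzip the output" suggests, though this is not confirmed in the thread), existing archives could be repacked with a sketch like this. The function name regzip_tar_members is made up for the example.

```python
import gzip
import shutil
import tarfile
from pathlib import Path
from typing import List


def regzip_tar_members(archive: str, dst_dir: str) -> List[Path]:
    """Write each .xml file inside a .tar.gz as its own .xml.gz in dst_dir."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, links and anything that is not an XML file.
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            # Keep only the base name so archive paths cannot escape dst_dir.
            target = out_dir / (Path(member.name).name + ".gz")
            source = tar.extractfile(member)
            with source, gzip.open(target, "wb") as fout:
                shutil.copyfileobj(source, fout)
            written.append(target)
    return written
```

Each resulting .xml.gz holds exactly one XML document, with no tar layer in between.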

