Update database content

User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Update database content

Postby dlohan » 11 May 2016, 17:37

I have several thousand articles I have archived in TT-RSS. Some of them arrive via Google RSS. Others do not. Some of the links resemble:

"https://www.google.com/url?rct=j&sa=t&url=http://thecork.ie/2016/03/16/cork-airport-get-two-new-routes-to-southampton-and-leeds-bradford/&ct=ga&cd=CAIyGzI5ZDZjMWRhMzczNzBlOTU6aWU6ZW46SUU6Ug&usg=AFQjCNGaSw-_EEoppiW7fQFFjFKSbcISEQ"

All I want is the middle part:

https://www.google.com/url?rct=j&sa=t&url=http://thecork.ie/2016/03/16/cork-airport-get-two-new-routes-to-southampton-and-leeds-bradford/&ct=ga&cd=CAIyGzI5ZDZjMWRhMzczNzBlOTU6aWU6ZW46SUU6Ug&usg=AFQjCNGaSw-_EEoppiW7fQFFjFKSbcISEQ

I have looked at plugins such as af_feedmod and ff_feedcleaner. The first appears to be irrelevant. The second will adjust data before it is parsed and I'm not sure whether it can make two adjustments. Aside from this it will not adjust the data already in my database. I also looked at Google New Cleanup Plugin at . Unless I am wrong this is not relevant either.

I had thought of trying a mySQL trigger. I wrote one to apply after data had been added to the table "entries".

BEGIN
UPDATE `ttrss_entries` SET `DaveLink` = `link` ,
`DaveFlag` =1 WHERE `DaveLink` = "" AND `DaveFlag` =0;

UPDATE `ttrss_entries` SET `link` = Right( `link` , Char_Length( `link` ) - ( Instr( `link` , "http://" ) -1 ) ) ,
`DaveFlag` =2 WHERE Instr( `link` , "https://www.google.com/url?rct=j&sa=t&url=" ) >0 AND Instr( `link` , "&ct=ga&" ) >0 AND `DaveFlag` =1;

UPDATE `ttrss_entries`
SET `link` = Left (`link`, Instr( `link` , "&ct=ga&")-1), `DaveFlag` = 0
WHERE Instr( `link` , "&ct=ga&" ) >0 and DaveFlag = 2;

END

The field DaveLink is a backup of the original link field and the field DaveFlag is a flag field I incorporated to avoid inconsistencies.

While it changed the data it seemed to kill the application and my feeds stopped downloading. I suspect many of you will know precisely why this occurred. Fortunately I'd made a backup and when I restored everything worked fine again, but my links remain as they were.

My questions then, if someone could assist:

1. Can I modify URLS already in the TinyTinyRSS database?
2. If not, can I use ff_feedcleaner to achieve clean urls for new data (examples from only show one part of the URL being cleaned. I need to remove the first and last parts.)
3. If ff_feedcleaner can be used can someone with expertise advise the code I should use?

All help greatly appreciated.

Latimer
Bear Rating Master
Bear Rating Master
Posts: 131
Joined: 17 Mar 2013, 19:35

Re: Update database content

Postby Latimer » 11 May 2016, 18:38

Seems like af_unburn does something similar for the new articles.

User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 11 May 2016, 20:20


JustAMacUser
Bear Rating Overlord
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Update database content

Postby JustAMacUser » 11 May 2016, 20:57


User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 11 May 2016, 21:33


User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 11 May 2016, 21:36

Thanks JustAMacUser.

User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 11 May 2016, 22:15

I have enabled ff_feedcleaner. Link https://github.com/wltb/ff_feedcleaner

I think the code I should be using is something along the lines of:

[
{
"URL": "www.google.com",
"type": "link_regex",
"pattern": "(?:https\:\/\/www\.google\.com)(?:.*?)url=(.*?)\&",
"replacement": "(?:.*?)url=(.*?)\&"
}
]

I know this isn't quite right yet. If there are any regex experts out there perhaps one of you might spot my mistake.

User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 12 May 2016, 15:09

As opposed to what I posted earlier I think what I need to do is capture the first and last parts and set them to null. To this end I have worked on a regex pattern which appears (using Regex101.com and Debuggex.com) to be correct. I have tried it in ff_feedcleaner but now I am getting an "invalid JSON" message:

[
{
"URL_re": "#https://(www\\.google\\.com#",
"type": "regex",
"pattern": "(.*?)(?:http:\/\/?[\da-z\.-]+\.[a-z\.]{2,6}[\/\w \.-]*)(.*?&.*$)",
"replacement": ""
}
]

User avatar
dlohan
Joy Division
Posts: 29
Joined: 04 Oct 2013, 22:26

Re: Update database content

Postby dlohan » 13 May 2016, 15:49

I give up. Completely stumped.


Return to “Everything else”

Who is online

Users browsing this forum: No registered users and 2 guests