HOOK to remove content from external servers

Request new functionality here
jmozmoz
Bear Rating Trainee
Posts: 26
Joined: 14 Apr 2013, 18:07

Postby jmozmoz » 25 Jan 2014, 05:04

Hi,

I would like to ask for help with the following problem:
Is it possible to configure tt-rss so that only images (and other content) hosted on the tt-rss server are shown in the feeds, or to achieve this with a plugin? (Images cached on the tt-rss server should also be shown.)

I was able to write a plugin based on af_fsckportal that does this for images linked in the feed "article". But images in the attachments/enclosures are not cached on the tt-rss server, and I didn't find a plugin hook to either cache them or remove the links when the feed is shown.

I guess this would require a new hook to modify the feed article after the attachments are added (at the very end of the function format_article) => feature request.

(One could choose the feed option to not show images at all, but I would prefer to use the cache feature and show only the cached images.)

Thank you for any help.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 25 Jan 2014, 11:00

I think I can move the part where tt-rss looks for enclosures to before the article filter hook is called, so that the hook could manipulate this information. You are right, though: currently it can't do that.

For the time being I can suggest a workaround: you can use the feed parsed hook and remove (or cache locally and rewrite the links, I guess) the enclosure, media:content, and other relevant elements from the DOM tree.
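A minimal sketch of the DOM part of that workaround, assuming the parsed feed is available as a DOMDocument. The helper name strip_enclosures is hypothetical and not part of tt-rss, and how the document is obtained from the hook is left open here:

```php
<?php
// Hypothetical helper (not part of tt-rss): removes enclosure-like
// elements from a parsed RSS/Atom document and returns how many were removed.
function strip_enclosures(DOMDocument $doc): int {
    $xpath = new DOMXPath($doc);
    // Register the prefixes used in the query below so it does not depend
    // on the prefixes the feed itself declares.
    $xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/');
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

    $nodes = $xpath->query(
        '//enclosure | //media:content | //media:thumbnail | //atom:link[@rel="enclosure"]'
    );

    $removed = 0;
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
        $removed++;
    }
    return $removed;
}

// Usage with a small made-up feed.
$feed = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><item>'
      . '<title>Example</title>'
      . '<enclosure url="http://example.com/a.mp3" type="audio/mpeg" length="1"/>'
      . '<media:content url="http://example.com/a.jpg"/>'
      . '</item></channel></rss>';

$doc = new DOMDocument();
$doc->loadXML($feed);
$count = strip_enclosures($doc);
```

Caching the files locally and rewriting the URLs instead of removing the nodes would follow the same pattern, with setAttribute() in place of removeChild().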

jmozmoz
Bear Rating Trainee
Posts: 26
Joined: 14 Apr 2013, 18:07

Postby jmozmoz » 25 Jan 2014, 20:37

Thank you for your answer. Can HOOK_FEED_PARSED really be used for this? It looks like I cannot modify the feed content with it. But I found https://github.com/wltb/ff_feedcleaner, and with the following config it removes the thumbnails from the enclosures:

[
    {
        "URL_re": "#.*#",
        "type": "xpath_regex",
        "xpath": "//media:content/@url|//media:thumbnail/@url",
        "pattern": "#.*#",
        "replacement": ""
    }
]
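To illustrate what such a rule amounts to, here is a rough sketch of applying an "xpath_regex" entry to raw feed data. This is a hypothetical re-implementation for illustration only (the function name apply_xpath_regex is made up), and ff_feedcleaner's real code may well differ:

```php
<?php
// Hypothetical re-implementation of an "xpath_regex" rule;
// not taken from ff_feedcleaner.
function apply_xpath_regex(string $feed_data, string $feed_url, array $rule): string {
    // Skip feeds whose URL does not match the rule.
    if (!preg_match($rule['URL_re'], $feed_url)) {
        return $feed_data;
    }

    $doc = new DOMDocument();
    if (!@$doc->loadXML($feed_data)) {
        return $feed_data; // leave unparsable feeds untouched
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('media', 'http://search.yahoo.com/mrss/');

    $nodes = $xpath->query($rule['xpath']);
    if ($nodes === false) {
        return $feed_data; // invalid XPath expression
    }

    // Rewrite the value of every selected node (here: url attributes).
    foreach ($nodes as $node) {
        $node->nodeValue = preg_replace($rule['pattern'], $rule['replacement'], $node->nodeValue);
    }
    return $doc->saveXML();
}

// Usage with the rule from the config above and a made-up feed.
$rule = [
    'URL_re'      => '#.*#',
    'type'        => 'xpath_regex',
    'xpath'       => '//media:content/@url|//media:thumbnail/@url',
    'pattern'     => '#.*#',
    'replacement' => '',
];
$feed = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><item>'
      . '<media:content url="http://example.com/a.jpg"/>'
      . '<media:thumbnail url="http://example.com/t.jpg"/>'
      . '</item></channel></rss>';
$cleaned = apply_xpath_regex($feed, 'http://example.com/feed', $rule);
```

Note that the rule only blanks the url attributes; the media:content and media:thumbnail elements themselves stay in the feed.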

Actually, this plugin uses HOOK_FEED_FETCHED.
And this is my plugin to remove the links within the feed articles:

<?php
class remove_external_links extends Plugin {

        private $host;

        function about() {
                return array(1.0,
                        "Remove links to images on external server",
                        "jmozmoz");
        }

        function init($host) {
                $this->host = $host;

                #$host->add_hook($host::HOOK_FEED_PARSED, $this);
                #$host->add_hook($host::HOOK_ARTICLE_FILTER, $this);
                $host->add_hook($host::HOOK_RENDER_ARTICLE_CDM, $this);
        }

        function hook_article_left_button($article) {
                return $this->hook_article_filter($article);
        }

        function hook_render_article_cdm($article) {
                return $this->hook_article_filter($article);
        }

        function hook_article_filter($article) {
                $owner_uid = $article["owner_uid"];
                $rss_link = get_self_url_prefix();

                if (strpos($article["plugin_data"], "remove_external_links,$owner_uid:") === FALSE) {

                        $doc = new DOMDocument();

                        $charset_hack = '<head>
                                <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
                        </head>';

                        @$doc->loadHTML($charset_hack . $article["content"]);

                        if ($doc) {
                                $xpath = new DOMXPath($doc);
                                $entries = $xpath->query('//img');

                                foreach ($entries as $entry) {
                                        $src = $entry->getAttribute("src");
                                        // Only touch absolute http(s) URLs that do not point at this tt-rss instance.
                                        if ((strpos($src, $rss_link) === FALSE) &&
                                            (preg_match("/^http/", $src))){
                                                // Use a text node: appendXML() rejects URLs containing "&" as malformed XML.
                                                $replacement = $doc->createTextNode($src);
                                                $entry->parentNode->replaceChild($replacement, $entry);
                                                #$entry->parentNode->removeChild($entry);
                                        }
                                }

                                // $basenode was never defined here, so serialize the whole document.
                                $article["content"] = $doc->saveXML();
                                $article["plugin_data"] = "remove_external_links,$owner_uid:" . $article["plugin_data"];

                        }
                } else if (isset($article["stored"]["content"])) {
                        $article["content"] = $article["stored"]["content"];
                }

                return $article;
        }

        function api_version() {
                return 2;
        }

}
?>

I guess if I want to cache enclosures, I would have to follow the examples of ff_feedcleaner and cache_starred_images.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 25 Jan 2014, 20:56

The argument to this hook is an object, so you should be able to modify it (as far as I remember, PHP always passes objects by reference). It's an instance of DOMDocument that you can use XPath on, etc.
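A small standalone illustration of that point. Strictly speaking PHP passes an object handle by value, but the practical effect is the same: changes made to the object inside a function are visible to the caller. The function drop_first_item is a made-up stand-in for a hook method, not tt-rss code:

```php
<?php
// Hypothetical stand-in for a hook that receives the parsed feed.
function drop_first_item(DOMDocument $doc): void {
    $item = $doc->getElementsByTagName('item')->item(0);
    if ($item !== null) {
        $item->parentNode->removeChild($item);
    }
    // Nothing is returned: the caller's variable refers to the same object.
}

$doc = new DOMDocument();
$doc->loadXML('<rss><channel><item/><item/></channel></rss>');
drop_first_item($doc);

// Counts the <item> elements left in the caller's document.
$remaining = $doc->getElementsByTagName('item')->length;
```

Only reassigning the parameter itself inside the function (for example $doc = new DOMDocument()) would go unnoticed by the caller.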

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Postby JustAMacUser » 12 May 2014, 22:00

I've also found that it would be useful to filter the enclosures when rendering articles, mostly because of SSL but also to handle some media types a little differently. I put together the following pull request that adds a plugin hook for filtering enclosures:

https://github.com/gothfox/Tiny-Tiny-RSS/pull/374
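
As a rough idea of the kind of filtering meant for the SSL case, the sketch below keeps only enclosures served over https. The function name and the array shape (one "content_url" key per enclosure) are assumptions for illustration and are not taken from the pull request:

```php
<?php
// Hypothetical filter; the enclosure array shape is an assumption.
function filter_insecure_enclosures(array $enclosures): array {
    $kept = array_filter($enclosures, function ($enc) {
        // parse_url() returns null or false when no scheme can be determined.
        $scheme = parse_url($enc['content_url'] ?? '', PHP_URL_SCHEME);
        return is_string($scheme) && strtolower($scheme) === 'https';
    });
    // Reindex so the result is a plain list again.
    return array_values($kept);
}

// Usage with made-up enclosures.
$enclosures = [
    ['content_url' => 'http://example.com/a.mp3',  'content_type' => 'audio/mpeg'],
    ['content_url' => 'https://example.com/b.jpg', 'content_type' => 'image/jpeg'],
];
$safe = filter_insecure_enclosures($enclosures);
```

Handling certain media types differently would be a matter of branching on the content type inside the same callback.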

jmozmoz
Bear Rating Trainee
Posts: 26
Joined: 14 Apr 2013, 18:07

Postby jmozmoz » 13 May 2014, 02:23

Here is the current state of my plugin. It removes links to external content in the news message (and hopefully soon also in attachments). The state of the plugin can be toggled with the icon in the toolbar:

https://github.com/jmozmoz/remove_external_content

jmozmoz
Bear Rating Trainee
Posts: 26
Joined: 14 Apr 2013, 18:07

Postby jmozmoz » 29 May 2014, 22:52

I added the new hook to my plugin (see github link above). It seems to work.

