I made a thing (again)

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 07 Jul 2015, 00:01

It even seems to work

https://github.com/gothfox/Tiny-Tiny-RS ... eadability

Also, af_redditimgur has similar functionality now.

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: I made a thing (again)

Postby JustAMacUser » 07 Jul 2015, 01:26

Awesome!

TT-RSS just keeps getting better.

nameless
Bear Rating Master
Posts: 126
Joined: 28 Aug 2013, 20:33

Re: I made a thing (again)

Postby nameless » 07 Jul 2015, 21:41

Great feature.
I think it would come in handy if you could enable this option when subscribing to a feed.
Then again, I dunno if there is a hook for that.

Re: I made a thing (again)

Postby fox » 07 Jul 2015, 21:55

there isn't; it's not like it's hard to enable it in the feed editor anyway

xtaz
Bear Rating Master
Posts: 174
Joined: 24 Dec 2009, 16:48

Re: I made a thing (again)

Postby xtaz » 13 Jul 2015, 16:05

I'd just like to say that this thing is awesome. Several of my feeds used to provide full text but switched to showing only the first paragraph, and this restores the full text on every one. Amazing work.

Maru
Bear Rating Trainee
Posts: 40
Joined: 20 Oct 2013, 14:26

Re: I made a thing (again)

Postby Maru » 13 Jul 2015, 18:03

Hmm, I enabled this for one of my feeds and while it does show the full article now, the encoding does not seem to be handled correctly. All umlauts are garbled: you see two characters for each 2-byte UTF-8 character, something you normally only see when UTF-8 content is interpreted with the wrong encoding. The feed in question is:

http://www.heise.de/newsticker/heise-atom.xml

Without the plugin enabled, and only the summary showing, the umlauts are displayed correctly, so maybe something goes wrong when fetching the full article.
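
For readers who have not seen this symptom before, here is a minimal sketch of how it arises. It is not taken from the plugin or the heise feed; the sample word and the choice of ISO-8859-1 as the "wrong" encoding are assumptions, and the mbstring extension is required.

```php
<?php
// Sketch only: reproduces the "two characters per umlaut" symptom.
// The sample word is an assumption, not text from the heise feed.

$utf8 = "f\xC3\xBCr";  // "für" encoded as UTF-8 (ü is the two bytes C3 BC)

// Misreading those bytes as ISO-8859-1 and re-encoding them as UTF-8
// turns the single ü into two separate characters.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($utf8, 'UTF-8'), "\n";     // character count of the original
echo mb_strlen($garbled, 'UTF-8'), "\n";  // one character longer
echo $garbled, "\n";
```

The extra character in the garbled string is the second byte of the umlaut, now shown on its own.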

Re: I made a thing (again)

Postby fox » 13 Jul 2015, 18:45

it could be broken; i had to add some workarounds so that non-unicode html pages would work properly. no idea why it would choke on utf-8.

Re: I made a thing (again)

Postby JustAMacUser » 13 Jul 2015, 18:53

In the af_readability plugin, commenting out:

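// Only rebuild the markup when the document does not report UTF-8.
if ($tmpdoc->encoding != 'UTF-8') {
    $tmpxpath = new DOMXPath($tmpdoc);

    // Drop every <meta> element, including any charset declaration.
    foreach ($tmpxpath->query("//meta") as $elem) {
        $elem->parentNode->removeChild($elem);
    }

    $tmp = $tmpdoc->saveHTML();
}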


Fixed it for the feed in question, but I imagine this will break non-UTF-8 pages.

fox: Is there a reason for handling non-UTF-8 character sets this way versus using something like mb_convert_encoding()?
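
To make the mb_convert_encoding() idea concrete, here is a rough sketch of what such an approach could look like. Both helper names are hypothetical and nothing here is the plugin's code; it assumes the charset is available from a Content-Type header and that mbstring is installed.

```php
<?php
// Hypothetical helper, not part of af_readability: pull the charset out of
// a Content-Type header value, or return null when none is declared.
function charset_from_content_type(string $header): ?string {
    if (preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $header, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert a fetched page to UTF-8 when the declared
// charset is something else; leave it untouched when it is UTF-8 or unknown.
// Note: an unrecognised charset name makes mb_convert_encoding() fail
// (a ValueError on PHP 8), so real code would need to guard against that.
function to_utf8(string $html, ?string $charset): string {
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

$page = "<p>f\xFCr</p>";  // "für" encoded as ISO-8859-1 (assumed sample)
$charset = charset_from_content_type('text/html; charset=ISO-8859-1');
echo to_utf8($page, $charset), "\n";
```

This only covers the conversion step; it does not answer whether the meta elements still have to be removed afterwards.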

Re: I made a thing (again)

Postby fox » 13 Jul 2015, 19:19

we need to remove meta elements anyway because domdocument will otherwise use them to output the source encoding on saveHTML() and everything will break a bit later

the real question is what's the encoding of that heise page if it's not UTF-8

e: lol, it's "utf-8". amazing.

e2: i'll never understand why we make a utf-8 domdocument, load some utf-8 content, remove a few tags, and it breaks on the output.

e3: i wonder what would happen if the feed is ucs-4 or something
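
One possible reading of the "utf-8" surprise is that a check like $tmpdoc->encoding != 'UTF-8' is case-sensitive, so a page that declares its charset in lowercase counts as non-UTF-8. A minimal sketch of a case-insensitive check follows; the helper name is hypothetical, it is not the fix that went into the plugin, and treating an empty label as "not UTF-8" is an assumption.

```php
<?php
// Hypothetical helper, not the plugin's code: true when the encoding label
// reported for a document names UTF-8, ignoring case and surrounding
// whitespace. An empty or missing label counts as "not known to be UTF-8".
function is_utf8_label(?string $encoding): bool {
    if ($encoding === null) {
        return false;
    }
    return strcasecmp(trim($encoding), 'UTF-8') === 0;
}

var_dump(is_utf8_label('UTF-8'));
var_dump(is_utf8_label('utf-8'));
var_dump(is_utf8_label(''));
var_dump(is_utf8_label('windows-1251'));
```

Whether an empty label should fall into the UTF-8 branch or the conversion branch is a design decision the caller still has to make.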

Re: I made a thing (again)

Postby JustAMacUser » 13 Jul 2015, 20:00

My experience is that PHP's DOMDocument module(s) are horribly broken when it comes to character sets.

The feed parser class goes to great lengths to handle character sets. I'm going to guess something like that is probably what's needed.
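
For illustration, one widely used workaround is to give loadHTML() an explicit encoding hint. This is a minimal sketch under the assumption that the input is already valid UTF-8; it is not confirmed to be what the feed parser class or the plugin does, and it needs the dom extension.

```php
<?php
// Sketch only: assumes $html is already valid UTF-8.
$html = "<p>f\xC3\xBCr</p>";  // "für" as UTF-8 (assumed sample)

$doc = new DOMDocument();

// Collect parser warnings internally instead of emitting them, since
// real-world markup is rarely clean.
libxml_use_internal_errors(true);

// The leading XML declaration hints to the parser that the bytes are UTF-8,
// so it does not have to guess the encoding.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$p = $doc->getElementsByTagName('p')->item(0);
echo $p->textContent, "\n";
```

The hint leaves an extra processing-instruction node in the document, which may need to be removed before calling saveHTML().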

Re: I made a thing (again)

Postby Maru » 13 Jul 2015, 23:36

fox wrote: we need to remove meta elements anyway because domdocument will otherwise use them to output the source encoding on saveHTML() and everything will break a bit later

the real question is what's the encoding of that heise page if it's not UTF-8

e: lol, it's "utf-8". amazing.

e2: i'll never understand why we make a utf-8 domdocument, load some utf-8 content, remove a few tags, and it breaks on the output.

e3: i wonder what would happen if the feed is ucs-4 or something


Hmm, should this be fixed by your recent changes to the plugin? I am still getting the wrong characters here.

Re: I made a thing (again)

Postby fox » 13 Jul 2015, 23:39

at least your heise feed parsed correctly last time I tried it.

Re: I made a thing (again)

Postby Maru » 14 Jul 2015, 08:49

fox wrote: at least your heise feed parsed correctly last time I tried it.

Strange, the characters are still broken for me on this feed.

Re: I made a thing (again)

Postby fox » 14 Jul 2015, 09:01

https://fakecake.org/uploads/2015/20150762SoJn.png

old articles won't update unless you run the feed through the debugger

Re: I made a thing (again)

Postby Maru » 14 Jul 2015, 09:54

fox wrote: https://fakecake.org/uploads/2015/20150762SoJn.png

old articles won't update unless you run the feed through the debugger

I did and I still get the wrong characters. (fD and then tick both checkboxes).

edit1: OK, apparently $tmpdoc->encoding is empty for me, so the if condition always triggers.

I found a workaround, though: if I call mb_convert_encoding() prior to loadHTML(), it works. Something along the lines of:

$tmpdoc = new DOMDocument("1.0", "UTF-8");
// Encode non-ASCII characters as HTML entities before parsing.
$tmp = mb_convert_encoding($tmp, 'HTML-ENTITIES', "UTF-8");
if (!$tmpdoc->loadHTML($tmp)) {
    return $article;
}


edit2: OK, I also found another solution with mb_detect_encoding().
That said, I will stop using it, since it really does some stupid things with forum feeds. The tt-rss feed, for example, shows the content of the first post for all related replies. I think I understand why, since the link is pointing at the entry itself.
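
The mb_detect_encoding() variant is not shown in the thread, so the following is only a guess at its general shape, not Maru's actual code. The function name and the candidate list are assumptions, and mbstring is required. As far as I know, the 'HTML-ENTITIES' target used in the workaround above is deprecated as of PHP 8.2, which is one reason to convert to UTF-8 directly.

```php
<?php
// Sketch only: normalise fetched markup to UTF-8 when its encoding is not
// known in advance. The default candidate list is an assumption.
function normalise_to_utf8(string $html, array $candidates = ['ISO-8859-1']): string {
    // Valid UTF-8 is passed through untouched, which avoids double encoding.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }

    // Strict mode rejects candidates for which the bytes are invalid.
    $detected = mb_detect_encoding($html, $candidates, true);
    if ($detected === false) {
        return $html;  // nothing matched; leave the input as it is
    }

    return mb_convert_encoding($html, 'UTF-8', $detected);
}

echo normalise_to_utf8("f\xFCr"), "\n";      // ISO-8859-1 input (assumed sample)
echo normalise_to_utf8("f\xC3\xBCr"), "\n";  // already UTF-8
```

Detection is a heuristic, so a charset declared by the server or the page is generally a better source when one is available.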

