I made a thing (again)

Posted: 07 Jul 2015, 00:01
by fox
It even seems to work

https://github.com/gothfox/Tiny-Tiny-RS ... eadability

Also, af_redditimgur has similar functionality now.

Re: I made a thing (again)

Posted: 07 Jul 2015, 01:26
by JustAMacUser
Awesome!

TT-RSS just keeps getting better.

Re: I made a thing (again)

Posted: 07 Jul 2015, 21:41
by nameless
great feature.
i think it would come in handy if you could enable this option when subscribing to a feed.
then again, i dunno if there is a hook for that.

Re: I made a thing (again)

Posted: 07 Jul 2015, 21:55
by fox
there isn't; it's not like it's hard to enable it in the feed editor anyway

Re: I made a thing (again)

Posted: 13 Jul 2015, 16:05
by xtaz
I'd just like to say that this thing is awesome. I've had several different feeds which used to give full text but changed it to only show the first paragraph and this restores the full text feed on every one. Amazing work.

Re: I made a thing (again)

Posted: 13 Jul 2015, 18:03
by Maru
Hmm, I enabled this for one of my feeds, and while it does show the full article now, the encoding does not seem to work correctly. All umlauts are shown in the wrong encoding: you see two characters for each 2-byte UTF-8 character, something you normally only see when UTF-8 content is displayed with the wrong encoding. The feed in question is:

http://www.heise.de/newsticker/heise-atom.xml

Without the plugin enabled, when only the summary is shown, the umlauts display correctly, so maybe something goes wrong when fetching the full article.

Re: I made a thing (again)

Posted: 13 Jul 2015, 18:45
by fox
it could be broken; i had to add some workarounds so that non-unicode html pages would work properly. no idea why it would choke on utf8.

Re: I made a thing (again)

Posted: 13 Jul 2015, 18:53
by JustAMacUser
In the af_readability plugin, commenting out:

         if ($tmpdoc->encoding != 'UTF-8') {
            $tmpxpath = new DOMXPath($tmpdoc);

            // drop <meta> charset declarations before re-serializing
            foreach ($tmpxpath->query("//meta") as $elem) {
               $elem->parentNode->removeChild($elem);
            }

            $tmp = $tmpdoc->saveHTML();
         }


That fixed it for the feed in question, but I imagine it will break non-UTF-8 pages.

fox: Is there a reason for handling non-UTF-8 character sets this way versus using something like mb_convert_encoding()?

Re: I made a thing (again)

Posted: 13 Jul 2015, 19:19
by fox
we need to remove meta elements anyway, because otherwise domdocument will use them to output the source encoding on saveHTML() and everything will break a bit later
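A minimal sketch of the approach fox describes, assuming the dom and mbstring extensions: parse the page, delete every meta element so DOMDocument has no source charset declaration to write back out, then serialize with saveHTML(). This is not the plugin's actual code, and the helper names load_utf8_html() and strip_meta_elements() are hypothetical. Non-ASCII characters are turned into numeric entities with mb_encode_numericentity() before loadHTML(), so the parser's charset guess no longer matters; this is the same idea as the mb_convert_encoding(..., 'HTML-ENTITIES', ...) workaround posted later in the thread, but uses a function that is not deprecated as of PHP 8.2.

```php
<?php
// Sketch, not the af_readability source. Needs the dom and mbstring extensions.

// Hypothetical helper: parse HTML whose bytes are known to be UTF-8.
function load_utf8_html(string $html): DOMDocument
{
    // Replace every code point >= 0x80 with &#NNN; so the input is pure
    // ASCII and libxml cannot misread multi-byte UTF-8 as a legacy charset.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true); // real-world markup is rarely valid
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Hypothetical helper: remove all <meta> elements and return how many were removed.
function strip_meta_elements(DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    $removed = 0;
    foreach ($xpath->query('//meta') as $meta) {
        $meta->parentNode->removeChild($meta);
        $removed++;
    }
    return $removed;
}
```

Usage would look like $doc = load_utf8_html($page); strip_meta_elements($doc); $html = $doc->saveHTML(); with the text content of the page surviving the round trip unchanged.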

the real question is what the encoding of that heise page is, if it's not UTF-8

e: lol, it's "utf-8". amazing.

e2: i'll never understand why we make a utf-8 domdocument, load some utf-8 content, remove a few tags, and it breaks on the output.

e3: i wonder what would happen if the feed is ucs-4 or something

Re: I made a thing (again)

Posted: 13 Jul 2015, 20:00
by JustAMacUser
My experience is that PHP's DOMDocument module(s) are horribly broken when it comes to character sets.

The feed parser class goes to great lengths to handle character sets. I'd guess something like that is what's needed here.

Re: I made a thing (again)

Posted: 13 Jul 2015, 23:36
by Maru
fox wrote: we need to remove meta elements anyway because domdocument will otherwise use it to output source encoding on saveHTML() and everything will break a bit later

hmm, should this be fixed by your recent changes to the plugin? I am still getting the wrong characters here.

Re: I made a thing (again)

Posted: 13 Jul 2015, 23:39
by fox
at least your heise feed parsed correctly last time I tried it.

Re: I made a thing (again)

Posted: 14 Jul 2015, 08:49
by Maru
fox wrote: at least your heise feed parsed correctly last time I tried it.

Strange, the characters are still broken for me on this feed.

Re: I made a thing (again)

Posted: 14 Jul 2015, 09:01
by fox
https://fakecake.org/uploads/2015/20150762SoJn.png

old articles won't update unless you run the feed through the debugger

Re: I made a thing (again)

Posted: 14 Jul 2015, 09:54
by Maru
fox wrote: https://fakecake.org/uploads/2015/20150762SoJn.png

old articles won't update unless you run the feed through the debugger

I did, and I still get the wrong characters (fD, then tick both checkboxes).

edit1: Oook, apparently $tmpdoc->encoding is empty for me, so the if always triggers.

I found a workaround, though: if I call mb_convert_encoding PRIOR to loadHTML, it works. Something along the lines of:

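$tmpdoc = new DOMDocument("1.0", "UTF-8");
// turn non-ASCII characters into HTML entities so loadHTML() cannot
// misread the raw UTF-8 bytes (the HTML-ENTITIES mbstring encoding
// is deprecated as of PHP 8.2)
$tmp = mb_convert_encoding($tmp, 'HTML-ENTITIES', "UTF-8");
if (!$tmpdoc->loadHTML($tmp))
 return $article;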


edit2: OK, I also found another solution with mb_detect_encoding.
That said, I will stop using it, since it does some really stupid things with forum feeds: the tt-rss feed, for example, shows the content of the first post for all related replies. I think I understand why, since the link points at the entry itself.
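The mb_detect_encoding code from edit2 was not posted, so the following is only a guess at that kind of solution: detect the page's encoding and convert it to UTF-8 before anything reaches DOMDocument. The function name normalize_to_utf8() and the candidate list (UTF-8, falling back to ISO-8859-1) are assumptions; a real page may need other candidates, or the charset from its HTTP header or meta tag. The converted string still carries no charset hint for loadHTML(), so it would be combined with the entity-encoding workaround shown earlier in the thread.

```php
<?php
// Sketch only: the candidate list and function name are assumptions, not
// the code referred to in the thread. Requires the mbstring extension.
function normalize_to_utf8(string $html): string
{
    // With $strict = true, encodings in which the bytes are invalid are ruled
    // out. Listing UTF-8 first makes it the generally preferred answer for
    // input that is valid UTF-8; ISO-8859-1 is the fallback because any byte
    // sequence is valid in it.
    $detected = mb_detect_encoding($html, ['UTF-8', 'ISO-8859-1'], true);

    if ($detected === false || $detected === 'UTF-8') {
        return $html; // already UTF-8 (or undetectable): leave untouched
    }

    // Re-encode the legacy bytes as UTF-8.
    return mb_convert_encoding($html, 'UTF-8', $detected);
}
```

For example, the Latin-1 bytes "Gr\xFC\xDFe" come back as the UTF-8 string "Grüße", while input that is already valid UTF-8 is returned unchanged.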