It even seems to work
https://github.com/gothfox/Tiny-Tiny-RS ... eadability
Also, af_redditimgur has similar functionality now.
I made a thing (again)
-
- Bear Rating Overlord
- Posts: 373
- Joined: 20 Aug 2013, 23:13
Re: I made a thing (again)
Awesome!
TT-RSS just keeps getting better.
Re: I made a thing (again)
great feature.
i think it would come in handy if you could enable this option when subscribing to a feed.
then again i dunno if there is a hook for that.
- fox
- ^ me reading your posts ^
- Posts: 6318
- Joined: 27 Aug 2005, 22:53
- Location: Saint-Petersburg, Russia
Re: I made a thing (again)
there isn't; it's not like it's hard to enable it in the feed editor anyway
Re: I made a thing (again)
I'd just like to say that this thing is awesome. I've had several different feeds which used to give full text but changed it to only show the first paragraph and this restores the full text feed on every one. Amazing work.
Re: I made a thing (again)
Hmm, I enabled this for one of my feeds here, and while it does show the full article now, the encoding does not seem to work correctly. All umlauts are shown in the wrong encoding (you see two characters for each 2-byte UTF-8 character, something you normally only see when UTF-8 content is decoded with the wrong encoding). The feed in question is:
http://www.heise.de/newsticker/heise-atom.xml
Without the plugin enabled and only showing the summary the umlauts are shown correctly, so maybe something goes wrong when fetching the full article..
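For readers following along, the "two characters per umlaut" symptom can be reproduced in isolation. This is only a minimal sketch, assuming the mbstring extension is available; the sample word and variable names are made up for illustration and have nothing to do with the plugin's actual code.

```php
<?php
// "für" as raw UTF-8 bytes; the ü is the two-byte sequence C3 BC.
$utf8 = "f\xC3\xBCr";

// Simulate a consumer that wrongly assumes ISO-8859-1: each byte is
// treated as its own character and then re-encoded as UTF-8.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($utf8, 'UTF-8'), "\n";     // 3 characters in the original
echo mb_strlen($garbled, 'UTF-8'), "\n";  // 4 characters after the wrong decode
echo bin2hex($garbled), "\n";             // 66c383c2bc72
```

The single ü turns into two characters (Ã and ¼), which matches the symptom described in the post.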
- fox
- ^ me reading your posts ^
- Posts: 6318
- Joined: 27 Aug 2005, 22:53
- Location: Saint-Petersburg, Russia
Re: I made a thing (again)
it could be broken, i had to add some workarounds so that non-unicode html pages would work properly, no idea why it would choke on utf8.
-
- Bear Rating Overlord
- Posts: 373
- Joined: 20 Aug 2013, 23:13
Re: I made a thing (again)
In the af_readability plugin, commenting out:
Code: Select all
if ($tmpdoc->encoding != 'UTF-8') {
    $tmpxpath = new DOMXPath($tmpdoc);
    foreach ($tmpxpath->query("//meta") as $elem) {
        $elem->parentNode->removeChild($elem);
    }
    $tmp = $tmpdoc->saveHTML();
}
Fixed it for the feed in question, but I imagine this will break non-UTF-8 pages.
fox: Is there a reason for handling non-UTF-8 character sets this way versus using something like mb_convert_encoding()?
- fox
- ^ me reading your posts ^
- Posts: 6318
- Joined: 27 Aug 2005, 22:53
- Location: Saint-Petersburg, Russia
Re: I made a thing (again)
we need to remove meta elements anyway because domdocument will otherwise use it to output source encoding on saveHTML() and everything will break a bit later
the real question is what's the encoding of that heise page if it's not UTF-8
e: lol, it's "utf-8". amazing.
e2: i'll never understand why we make a utf-8 domdocument, load some utf-8 content, remove a few tags, and it breaks on the output.
e3: i wonder what if the feed is ucs-4 or something, what would happen then
-
- Bear Rating Overlord
- Posts: 373
- Joined: 20 Aug 2013, 23:13
Re: I made a thing (again)
My experience is that PHP's DOMDocument module(s) are horribly broken when it comes to character sets.
The feed parser class goes to great lengths to handle character sets. I'm going to guess something like that is probably what's needed.
Re: I made a thing (again)
fox wrote:we need to remove meta elements anyway because domdocument will otherwise use it to output source encoding on saveHTML() and everything will break a bit later
the real question is what's the encoding of that heise page if its not UTF-8
e: lol, it's "utf-8". amazing.
e2: i'll never understand why we make a utf-8 domdocument, load some utf-8 content, remove a few tags, and it breaks on the output.
e3: i wonder what if the feed is ucs-4 or something, what would happen then
hmm should this be fixed by your recent changes to the plugin? I am still getting the wrong characters here..
- fox
- ^ me reading your posts ^
- Posts: 6318
- Joined: 27 Aug 2005, 22:53
- Location: Saint-Petersburg, Russia
Re: I made a thing (again)
at least your heise feed parsed correctly last time I tried it.
Re: I made a thing (again)
fox wrote:at least your heise feed parsed correctly last time I tried it.
Strange, the characters are still broken for me on this feed.
- fox
- ^ me reading your posts ^
- Posts: 6318
- Joined: 27 Aug 2005, 22:53
- Location: Saint-Petersburg, Russia
Re: I made a thing (again)
https://fakecake.org/uploads/2015/20150762SoJn.png
old articles won't update unless you run the feed through the debugger
Re: I made a thing (again)
fox wrote:https://fakecake.org/uploads/2015/20150762SoJn.png
old articles won't update unless you run the feed through the debugger
I did and I still get the wrong characters. (fD and then tick both checkboxes).
edit1: Oook, apparently $tmpdoc->encoding is empty for me so the if always triggers.
I found a workaround though: if I call mb_convert_encoding() PRIOR to loadHTML(), it works. Something along the lines of:
Code: Select all
$tmpdoc = new DOMDocument("1.0", "UTF-8");
$tmp = mb_convert_encoding($tmp, 'HTML-ENTITIES', "UTF-8");
if (!$tmpdoc->loadHTML($tmp))
    return $article;
edit2: Ok, I also found another solution with mb_detect_encoding.
That said, I will stop using it since it really does some stupid things with forum feeds. The tt-rss feed, for example, shows the content of the first post for all related replies. I think I understand why, since the link is pointing at the entry itself.
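The "convert before loadHTML" workaround from edit1 can be sketched as a standalone script. This is an illustration only, not the plugin's code: the helper name load_utf8_html() is hypothetical, and it swaps in mb_encode_numericentity() because the 'HTML-ENTITIES' pseudo-encoding used above is deprecated as of PHP 8.2. It assumes the input really is UTF-8 and that the dom and mbstring extensions are loaded.

```php
<?php
// Hypothetical helper: turn every non-ASCII code point into a numeric
// entity so the HTML parser's charset guess no longer matters.
function load_utf8_html(string $html): ?DOMDocument {
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    $ascii = mb_encode_numericentity($html, $map, 'UTF-8');

    $doc = new DOMDocument("1.0", "UTF-8");
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    if (!$doc->loadHTML($ascii)) {
        return null;
    }
    return $doc;
}

// "für" given as raw UTF-8 bytes, with no charset declaration in the markup.
$doc = load_utf8_html("<html><body><p>f\xC3\xBCr</p></body></html>");
$text = $doc->getElementsByTagName('p')->item(0)->textContent;

echo bin2hex($text), "\n"; // 66c3bc72, i.e. the ü is back as one UTF-8 character
```

Because the parser only ever sees ASCII, it should not matter whether a meta charset tag is present or what DOMDocument guesses for the encoding.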