
Specifying codepage for single feed

Posted: 10 Apr 2013, 09:49
by fhoshino
I'd like to request an option to specify the codepage for a single feed (not for writing to the database).
As one of my feeds is not in UTF-8, I sometimes get garbled strings.

Re: Specifying codepage for single feed

Posted: 10 Apr 2013, 21:52
by phz
The problem is most likely that the feed announces the wrong charset. That is a bug in the specific feed.

If you post a link to the feed in question, people can look at it and perhaps figure out what's wrong.

Re: Specifying codepage for single feed

Posted: 10 Apr 2013, 21:55
by fhoshino
I've posted on G+, but there has been no answer.
https://plus.google.com/109766076672185 ... DnEwWAxRnX

Re: Specifying codepage for single feed

Posted: 10 Apr 2013, 22:17
by phz
http://www.rthk.org.hk/rthk/news/rss/c_expressnews.xml — this is the feed in question for those who wonder.

It seems to be Chinese characters in big5 encoding, and it announces itself as that as well. The `file` tool identifies the file as ISO 8859-1, but that involves quite some guessing on its part, and running it through `iconv` with big5 as the source encoding seems to give perfectly consistent output.
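
For anyone who wants to repeat that check without `iconv`, here is a rough Python sketch of the same idea: read the encoding the feed announces in its XML declaration, then see whether the bytes decode strictly under it. The function names `declared_encoding` and `decodes_cleanly` and the sample string are made up for illustration; this is not how TT-RSS does it.

```python
import re

# Matches the encoding named in an XML declaration, e.g.
# <?xml version="1.0" encoding="big5"?>
_XML_DECL_ENCODING = re.compile(
    rb'^\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding announced in the XML declaration, or `default`."""
    match = _XML_DECL_ENCODING.match(raw[:200])
    return match.group(1).decode("ascii").lower() if match else default


def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    """True if `raw` is valid under `encoding` (strict decoding, no replacement)."""
    try:
        raw.decode(encoding, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not know.
        return False
    return True


if __name__ == "__main__":
    # Hypothetical stand-in for the real feed bytes.
    sample = '<?xml version="1.0" encoding="big5"?><title>新聞</title>'.encode("big5")
    enc = declared_encoding(sample)
    print(enc, decodes_cleanly(sample, enc), decodes_cleanly(sample, "utf-8"))
```

Note that in Python any byte string decodes under `latin-1`, so a tool guessing ISO 8859-1 is weak evidence on its own; a clean strict decode as big5 tells you more.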

I actually don't know why it doesn't work as it should. Perhaps someone with more internal knowledge on how encoding is handled in TT-RSS can give more info on whether TT-RSS is to "blame" here.

One weird and technically convoluted hack could be to set up a local translation script that polls the feed and converts it to UTF-8 on the fly, but I guess that is not really an option for most people.
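
The conversion step of such a script could look roughly like this in Python. It is only a sketch under assumptions: `feed_to_utf8` is a made-up name, big5 is assumed as the source encoding, and the polling and re-serving parts (for example with `urllib.request` and a small local web server) are left out.

```python
import re

# Captures the text around the encoding value in the XML declaration so that
# only the value itself is replaced.
_ENCODING_ATTR = re.compile(r'(<\?xml[^>]*?encoding=["\'])[^"\']+(["\'])')


def feed_to_utf8(raw: bytes, source_encoding: str = "big5") -> bytes:
    """Decode `raw` as `source_encoding` and re-encode it as UTF-8.

    The XML declaration is rewritten so the output announces UTF-8 instead of
    the original charset. Raises UnicodeDecodeError if the bytes are not valid
    in `source_encoding`.
    """
    text = raw.decode(source_encoding)
    text = _ENCODING_ATTR.sub(r"\g<1>utf-8\g<2>", text, count=1)
    return text.encode("utf-8")
```

Rewriting the declaration matters: if the body is converted but the first line still says big5, the reader will misinterpret the now-UTF-8 bytes.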

Re: Specifying codepage for single feed

Posted: 10 Apr 2013, 22:31
by fox
There will be no per-feed charset dropdowns. This should be properly fixed by contacting the publisher and asking them to fix their fucking feed. If that is not possible, this could be handled by a plugin (there's a hook for that).

Re: Specifying codepage for single feed

Posted: 10 Apr 2013, 23:42
by LifeWOutMilk
Image

The feed appears to be fine. It doesn't look like the issue is with tt-rss.

Re: Specifying codepage for single feed

Posted: 11 Apr 2013, 09:52
by phz
LifeWOutMilk wrote:Image

The feed appears to be fine. It doesn't look like the issue is with tt-rss.

Actually testing the feed to see if you can reproduce the bug!? Oh, I never… :-D

Well, that's great then. As I said, the feed seems to represent itself as big5, as it should.

As for the original poster, here are some leading questions for troubleshooting:
  • Which TT-RSS version are you using? Make sure it is somewhat recent.
  • Which browser are you using (which version, and on what OS)? If you paste the URL to the XML feed directly into the browser, can you see the correct characters? How about if you "View source"? Can you try accessing the feed via TT-RSS in another browser?

Re: Specifying codepage for single feed

Posted: 11 Apr 2013, 11:27
by fhoshino
phz wrote:http://www.rthk.org.hk/rthk/news/rss/c_expressnews.xml — this is the feed in question for those who wonder.

It seems to be Chinese characters in big5 encoding, and it announces itself as that as well. The `file` tool identifies the file as ISO 8859-1, but that involves quite some guessing on its part, and running it through `iconv` with big5 as the source encoding seems to give perfectly consistent output.

I actually don't know why it doesn't work as it should. Perhaps someone with more internal knowledge on how encoding is handled in TT-RSS can give more info on whether TT-RSS is to "blame" here.

One weird and technically convoluted hack could be to set up a local translation script that polls the feed and converts it to UTF-8 on the fly, but I guess that is not really an option for most people.


As I mentioned, the feed is sometimes rendered correctly, but sometimes it isn't.
Yes, the feed is big5, and it declares itself as big5.
I'm on trunk code (git source). I'm using Firefox Nightly on Windows 8 x64, and it renders the XML file correctly without a problem.
My server is a WAMP stack running on a Windows 7 x86 machine.

Re: Specifying codepage for single feed

Posted: 11 Apr 2013, 11:29
by fhoshino
fox wrote:There will be no per-feed charset dropdowns. This should be properly fixed by contacting the publisher and asking them to fix their fucking feed. If that is not possible, this could be handled by a plugin (there's a hook for that).


I guess there is no way to make them fix the feed (it's a government organization that is reluctant to change anything for the public).
Could you guide me on how to set up the plugin?

Re: Specifying codepage for single feed

Posted: 11 Apr 2013, 18:17
by phz
fhoshino wrote:As I mentioned, the feed is sometimes rendered correctly, but sometimes it isn't.

Try saving a snapshot of the feed on an occasion when it doesn't work for others to check.
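
If it helps, something along these lines could capture snapshots automatically each time the feed is fetched. It is a sketch with made-up names (`save_snapshot`, the `feed-...` file naming) and assumes big5. It always writes the untouched bytes and only labels the file by whether they decode, since it is not yet known where the garbling comes from.

```python
import time
from pathlib import Path
from typing import Tuple


def save_snapshot(raw: bytes, directory: str, encoding: str = "big5") -> Tuple[Path, bool]:
    """Write `raw` unchanged to a timestamped file inside `directory`.

    Returns the path written and whether the bytes decoded under `encoding`.
    The file name ends in "-ok" or "-bad" accordingly.
    """
    try:
        raw.decode(encoding)
        ok = True
    except UnicodeDecodeError:
        ok = False
    label = "ok" if ok else "bad"
    target = Path(directory) / time.strftime(f"feed-%Y%m%d-%H%M%S-{label}.xml")
    target.write_bytes(raw)  # exact bytes as received, no re-encoding
    return target, ok
```

Attaching one "-ok" and one "-bad" file (if any turn up) would let others compare the two directly.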

Re: Specifying codepage for single feed

Posted: 11 Apr 2013, 18:28
by fhoshino
phz wrote:
fhoshino wrote:As I mentioned, the feed is sometimes rendered correctly, but sometimes it isn't.

Try saving a snapshot of the feed on an occasion when it doesn't work for others to check.

[Attachment: 1.png (136.71 KiB)]

It works right sometimes, but sometimes it doesn't.