Trouble with latest xkcd cartoon

matrix64
Bear Rating Trainee
Posts: 5
Joined: 17 Nov 2009, 19:19

Postby matrix64 » 15 Jun 2015, 14:03

Today I've noticed tt-rss daemon logging some errors in syslog:

Code:

Jun 15 12:45:43 akula php: [tt-rss] E_USER_ERROR (256) (classes/db/mysqli.php:33) Query INSERT INTO ttrss_entries
                                                        (title,
                                                        guid,
                                                        link,
                                                        updated,
                                                        content,
                                                        content_hash,
                                                        no_orig_date,
                                                        date_updated,
                                                        date_entered,
                                                        comments,
                                                        num_comments,
                                                        plugin_data,
                                                        lang,
                                                        author)
                                                VALUES
                                                        ('Lyrics',
                                                        'SHA1:74fb4c1d1149ef619b07714c8369a9e8b90c4d65',
                                                        'http://xkcd.com/1538/',
                                                        '2015/06/15 04:00:00',
                                                        '<img src=\"http://imgs.xkcd.com/comics/lyrics.png\" title=\"To me, trying to understand song lyrics feels like when I see text in a dream but it𝔰 hอᵣd t₀ ᵣeₐd aกd 𝒾 canٖt fཱྀcu༧༦࿐༄\" alt=\"To me, trying to understand song lyrics feels like when I see text in a dream but it𝔰 hอᵣd t₀ ᵣeₐd aกd 𝒾 canٖt fཱྀcu༧༦࿐༄\" />',
                                                        '9e484d43e324a1039d7b2cd0afa6218d9a862039',
                                                        false,
                                                        NOW(),
                                                        '2015-06-15 10:45',
                                                        '',
                                                        '0',
                                                        '',
                                                        '',
                                                        '') failed: Incorrect string value: '\xF0\x9D\x94\xB0 h...' for column 'content' at row 1
Jun 15 12:45:43 akula php: [tt-rss] E_WARNING (2) (include/rssfuncs.php:1373) Invalid argument supplied for foreach()
Jun 15 12:45:43 akula php: [tt-rss] E_WARNING (2) (include/rssfuncs.php:1029) Invalid argument supplied for foreach()

I'm using the latest TTRSS version from Git on CentOS 7.
Is this a bug in TTRSS, misconfiguration on my part or something else?

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Trouble with latest xkcd cartoon

Postby fox » 15 Jun 2015, 14:25

i think its trying to tell you not to read this horrible comic but i could be mistaken

xtaz
Bear Rating Master
Posts: 174
Joined: 24 Dec 2009, 16:48

Re: Trouble with latest xkcd cartoon

Postby xtaz » 15 Jun 2015, 14:29

It could also be telling you to ditch MySQL and use PostgreSQL instead as that particular comic loaded fine in my tt-rss this morning which uses pgsql. MySQL is obviously having some kind of a mental at the UTF8 characters.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Trouble with latest xkcd cartoon

Postby fox » 15 Jun 2015, 14:40

supposedly tt-rss is escaping data before inserting but given that mysql has like ten different escaping functions called like mysql_escape_this_shit_for_reals_pls() maybe i'm just using the incorrect one idk

but yeah don't use mysql

etu
Bear Rating Trainee
Posts: 4
Joined: 15 Jun 2015, 14:59

Re: Trouble with latest xkcd cartoon

Postby etu » 15 Jun 2015, 15:02

Well, you should really leave the mysql_* functions behind anyways and go PDO all the way.

Then you can use prepared statements and get the correct escaping for free built in.

Plus you can use the same connection class to connect to both PG and My, no need to have separated code for that. Of course you still need different queries for it to work, but it's still a bit less duplication of code.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Trouble with latest xkcd cartoon

Postby fox » 15 Jun 2015, 15:14

thanks for your sage advice i'm expecting your pull request with everything rewritten to prepared statements any day now

>Plus you can use the same connection class to connect to both PG and My, no need to have separated code for that. Of course you still need different queries for it to work, but it's still a bit less duplication of code.

wow what amazing ideas you have

etu
Bear Rating Trainee
Posts: 4
Joined: 15 Jun 2015, 14:59

Re: Trouble with latest xkcd cartoon

Postby etu » 15 Jun 2015, 15:29

Well, didn't mean to be that rude.

Maybe at some point I'll get involved in development. I'll just jump into the IRC channel some day.

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Trouble with latest xkcd cartoon

Postby JustAMacUser » 15 Jun 2015, 15:42

xkcd worked fine today with my install using MySQL.

randompherret
Bear Rating Trainee
Posts: 36
Joined: 04 Jul 2013, 08:11

Re: Trouble with latest xkcd cartoon

Postby randompherret » 15 Jun 2015, 15:45

You might check your character set. There are some crazy characters in the caption. I use MySQL and it inserted fine, it just cut the content off at the first weird s.

matrix64
Bear Rating Trainee
Posts: 5
Joined: 17 Nov 2009, 19:19

Re: Trouble with latest xkcd cartoon

Postby matrix64 » 15 Jun 2015, 17:27

default_charset in php.ini is set to UTF8, and so is the character set in MariaDB's my.cnf for both server and client. The charset in the TTRSS config.php for MySQL is also set to UTF8.

I don't know of any other config locations.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Trouble with latest xkcd cartoon

Postby fox » 15 Jun 2015, 19:51

there's some obvious random binary garbage in your log, i'm not sure why it is not filtered by mysql escape. maybe it's not supposed to idk.

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Trouble with latest xkcd cartoon

Postby JustAMacUser » 16 Jun 2015, 04:04

After a little bit of research, it appears the problem is related to how MySQL handles UTF-8. When the database/table uses MySQL's utf8 character set (for example with the utf8_general_ci collation), MySQL stores at most 3 bytes per character. This xkcd comic uses Unicode characters that take 4 bytes each in UTF-8. I'm not sure why the OP has the INSERT completely fail. Mine worked in that I could view the comic, but the content column for that row was truncated at the first 4-byte character, which was right in the middle of the img alt text.
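To make the 3-byte versus 4-byte distinction concrete, here is a minimal PHP sketch. The helper name has_four_byte_chars() is made up for illustration and is not part of tt-rss. The 4-byte sequence is the one quoted in the error message from the first post.

```php
<?php
// Hypothetical helper (not part of tt-rss): true if $text contains a
// character above U+FFFF, i.e. one that needs 4 bytes in UTF-8.
function has_four_byte_chars($text) {
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

$plain       = "s";                // U+0073, 1 byte
$subscript_0 = "\xE2\x82\x80";     // U+2080, 3 bytes: fits in MySQL's utf8
$fraktur_s   = "\xF0\x9D\x94\xB0"; // U+1D530, 4 bytes: the bytes from the log

echo strlen($plain), "\n";       // 1
echo strlen($subscript_0), "\n"; // 3
echo strlen($fraktur_s), "\n";   // 4

var_dump(has_four_byte_chars($subscript_0)); // bool(false)
var_dump(has_four_byte_chars($fraktur_s));   // bool(true)
```

strlen() counts bytes, not characters, which is why it shows the encoded size here.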

The issue can be worked around in PHP by filtering the content before it goes into the database. Fox has this code in the feed parser:

Code:

$data = preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $data);


And a variation of this could be used to strip the higher characters before inserting the column into the database. The problem is that it's a clunky workaround.

The real solution is to convert all character sets and collations to utf8mb4 when using MySQL. I haven't finished testing this but it means some column indexes need to have their size adjusted to account for the byte change. A quick test in my dev environment with utf8mb4 resulted in the xkcd comic being inserted properly. An alternative to altering index sizes is to set innodb_large_prefix to true, but it has other dependencies and because it's disabled by default would likely cause most MySQL installs of TT-RSS to fail.
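The index size adjustment comes down to simple arithmetic. The sketch below assumes the usual InnoDB limit of 767 bytes per index key when innodb_large_prefix is off. The helper max_indexable_chars() is hypothetical, and the ALTER TABLE string only illustrates what a conversion statement could look like for the ttrss_entries table from the log. It is not a tested migration.

```php
<?php
// Assumption: InnoDB index keys are capped at 767 bytes unless
// innodb_large_prefix is enabled.
const INNODB_INDEX_BYTE_LIMIT = 767;

// Hypothetical helper: how many characters of a column fit into one index
// key when every character may need up to $bytes_per_char bytes.
function max_indexable_chars($bytes_per_char, $limit = INNODB_INDEX_BYTE_LIMIT) {
    return (int) floor($limit / $bytes_per_char);
}

echo max_indexable_chars(3), "\n"; // utf8:    255
echo max_indexable_chars(4), "\n"; // utf8mb4: 191

// Illustration only, not executed here: converting one table to utf8mb4.
$sql = "ALTER TABLE ttrss_entries CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
echo $sql, "\n";
```

So an indexed VARCHAR(255) column that fits under utf8 would need its index cut to 191 characters under utf8mb4.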

I found this blog post (https://mathiasbynens.be/notes/mysql-utf8mb4) which does a pretty good job of summarizing what's needed.

e: PHP might be the best way to handle this for now. After a bit of reading, using prefix indexes is likely to make the indexes useless and enabling larger InnoDB index support seems cumbersome at the moment. Something like:

Code:

// Replace every character above U+FFFF with U+FFFD and keep the result.
$value = preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);


From this Stack Overflow answer: http://stackoverflow.com/a/24672780

This replaces 4-byte characters with a placeholder character (U+FFFD, the Unicode replacement character). If we add this for MySQL only, right before the content is escaped and inserted/updated, it could fix it. I can do a pull request if fox okays this (I can test it a bit more too, since I'm on mobile at the moment).
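A small sketch of what that replacement does to a string, using a shortened fragment of the alt text. The function name replace_four_byte_chars() is hypothetical and the fragment is only an example.

```php
<?php
// Hypothetical helper, not part of tt-rss: swap every character above U+FFFF
// (4 bytes in UTF-8) for U+FFFD. Note that preg_replace() returns null if
// the input is not valid UTF-8.
function replace_four_byte_chars($value) {
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
}

// "it" + U+1D530 (4 bytes) + " h" + U+0E2D (3 bytes) + "rd"
$alt   = "it\xF0\x9D\x94\xB0 h\xE0\xB8\xADrd";
$clean = replace_four_byte_chars($alt);

echo strlen($alt), "\n";   // 13
echo strlen($clean), "\n"; // 12: the 4-byte character became 3-byte U+FFFD
```

The 3-byte character is left alone, so only the characters MySQL's utf8 cannot store are lost.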

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Trouble with latest xkcd cartoon

Postby fox » 16 Jun 2015, 07:59

this is some nice research, thanks

i'm not sure about the pull request because it only handles this oddity for content, so if those characters are in article title or w/e input might still fail. maybe it would be better to put this preg_replace() directly into mysql and mysqli DB classes escape() function so it would be handled for all input? i'm not sure what the performance hit would be but it shouldn't be that bad.

e: or maybe just go through article[] after it has been processed by the plugins and handle it for all fields there.

e2: https://github.com/gothfox/Tiny-Tiny-RS ... 9ce1d34b8a does this work?
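For illustration, handling every field of an article could look roughly like the sketch below. This is not the code from the linked commit. The function name and the array keys are assumptions made up for the example.

```php
<?php
// Hypothetical sketch: replace 4-byte characters in every string field of an
// article array, so title, content, author etc. are all covered.
function strip_four_byte_from_article(array $article) {
    foreach ($article as $key => $value) {
        if (is_string($value)) {
            $article[$key] = preg_replace(
                '/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
        }
    }
    return $article;
}

// Example input; the keys are assumed, not taken from tt-rss.
$article = array(
    "title"        => "Lyrics \xF0\x9D\x94\xB0", // ends with U+1D530
    "content"      => "can\xF0\x9D\x92\xBEt",    // contains U+1D4BE
    "num_comments" => 0,                          // non-strings are left alone
);

$article = strip_four_byte_from_article($article);
echo bin2hex($article["title"]), "\n";
```

Doing it once on the whole array keeps the filter out of the DB escape() path, at the cost of one regex pass per string field.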

matrix64
Bear Rating Trainee
Posts: 5
Joined: 17 Nov 2009, 19:19

Re: Trouble with latest xkcd cartoon

Postby matrix64 » 16 Jun 2015, 13:59


Yes, errors are gone. Thank you! :D

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Trouble with latest xkcd cartoon

Postby JustAMacUser » 16 Jun 2015, 23:11

fox wrote: i'm not sure about the pull request because it only handles this oddity for content, so if those characters are in article title or w/e input might still fail.


You're right. It was late when I put that together and obviously I had a bit of tunnel vision. Your commit for applying it to the whole article is the way to go. Thanks.

