Fixing LibXML error "Extra content at the end of document"

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 27 Jul 2013, 18:04

Hello folks,

Thought this might be of interest: today, while going through my tt-rss "Feeds with update errors" window for the first time, I found that quite a few feeds had the following error:

Code:

 LibXML error 5 at line 68 (column 1): Extra content at the end of the document

(line and column numbers of course varied).

I examined the XML and found that the webmaster had added a Google Analytics block of code at the end of the RSS document (i.e., right after the closing "</rss>" tag).
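The failure described above can be reproduced in a few lines. This is a minimal sketch, not the tt-rss code path: it uses Python's standard-library parser (expat rather than LibXML, so the error wording differs), and the sample feed, the trailing snippet, and the helper name `is_well_formed` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed; a real feed would carry items, links, etc.
CLEAN_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel><title>Example</title></channel>
</rss>
"""

# Hypothetical stand-in for an analytics snippet appended after </rss>.
TRAILING_JUNK = '<script type="text/javascript">var tracker = 1;</script>\n'


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(CLEAN_FEED))                  # clean feed parses
    print(is_well_formed(CLEAN_FEED + TRAILING_JUNK))  # second root element is rejected
```

An XML document may have only one root element, so any strict parser should reject the second case; only the exact message varies between libraries.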

So I did a little searching here on the forum and found this excellent post by raindog469. Unfortunately it didn't solve my problem right away, so I fiddled with it a little and, by adding one extra line, ended up with something that worked for my case:

Code:

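#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# URL of the feed to clean, passed as the "feed" query parameter.
my $url = param("feed");
die "Bad URL" unless defined $url && $url =~ /^https?:/i;

# Fetch the raw feed; the list form of open bypasses the shell.
open(my $wget, "-|", "wget", "-O-", $url) or die $!;
my $feed = join('', <$wget>);
close $wget;

# Replace every byte outside newline..tilde (tabs, non-ASCII) with a space.
$feed =~ s/[^\x0a-\x7e]/ /g;
# Percent-encode whitespace inside href values, keeping the closing quote.
1 while $feed =~ s/(href="[^"]+)\s([^"]*)"/$1%20$2"/ig;
# Escape bare ampersands; leave existing entities and character references alone.
$feed =~ s/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)/&amp;/g;
# Put adjacent tags on separate lines.
$feed =~ s/></>\n</g;
# Drop everything after the closing </rss> tag (e.g. an appended analytics block).
$feed =~ s/<\/rss>.*$/<\/rss>\n/si;

print header("application/rss+xml");
print $feed;
exit 0;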


Posting it in case it helps anyone else.
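For readers who don't use Perl, the truncation step of the script above can be sketched in Python as follows. The function name `strip_trailing_content` and its `root_tag` parameter are hypothetical, chosen here for illustration; the logic mirrors only the final `</rss>` substitution, not the rest of the script.

```python
import re


def strip_trailing_content(feed_text, root_tag="rss"):
    """Cut everything after the first closing root tag and end with a newline."""
    pattern = re.compile(
        r"(</%s\s*>).*\Z" % re.escape(root_tag),
        re.DOTALL | re.IGNORECASE,
    )
    # If no closing tag is found, the text is returned unchanged.
    return pattern.sub(r"\1\n", feed_text, count=1)


if __name__ == "__main__":
    broken = "<rss><channel/></rss>\n<script>var tracker = 1;</script>"
    print(strip_trailing_content(broken))
```

Like the Perl original, this is a blunt text-level fix: it cuts at the first closing tag it sees, so a literal "</rss>" inside a CDATA section would truncate the feed early. Passing root_tag="feed" would cover Atom documents, which the original script does not handle.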

Fox, what about incorporating similar code directly in TT-RSS?

Cheers,
--
Durval.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 28 Jul 2013, 00:06

>Fox, what about incorporating similar code directly in TT-RSS?

https://en.wikipedia.org/wiki/Garbage_In%2C_Garbage_Out

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 28 Jul 2013, 02:21

Hi Fox,

fox wrote:>Fox, what about incorporating similar code directly in TT-RSS?
https://en.wikipedia.org/wiki/Garbage_In%2C_Garbage_Out

Humrmrmr... good point, but please consider:

http://en.wikipedia.org/wiki/Be_conservative_in_what_you_send,_be_liberal_in_what_you_accept

Instead of Babbage (who is cited in the Wikipedia page you linked to and who, despite being a genius, never built much of anything), wouldn't you rather be on the side of Postel (who helped build the Internet)?

Cheers,
--
Durval.

Sidicas
Bear Rating Trainee
Posts: 12
Joined: 15 May 2013, 14:24

Postby Sidicas » 30 Jul 2013, 08:23

durval wrote:Humrmrmr... good point, but please consider:

http://en.wikipedia.org/wiki/Be_conservative_in_what_you_send,_be_liberal_in_what_you_accept

Instead of Babbage (who is cited in the Wikipedia page you linked to and who, despite being a genius, never built much of anything), wouldn't you rather be on the side of Postel (who helped build the Internet)?

That's what Microsoft did when they made Internet Explorer... Generally considered today to be bad decisions all around since you've now got all sorts of websites out there that render fine in IE but don't render properly in any generic standards-compliant browser.

Contact the website and ask them to fix their feed. I'm pretty sure you're not supposed to have any content outside of the XML boundaries. The best part is that there are a lot of open source feed parsers out there besides tt-rss that use libXML, and they'll throw the exact same error. So if you have the author fix the feed, it fixes it for everybody. If you patch tt-rss, it only fixes it for tt-rss users, and that's just not thinking about the bigger picture.

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 01 Aug 2013, 17:59

Hi Sidicas,

Sidicas wrote:That's what Microsoft did when they made Internet Explorer... Generally considered today to be bad decisions all around since you've now got all sorts of websites out there that render fine in IE but don't render properly in any generic standards-compliant browser.


I agree with you that Internet Explorer was a very "bad decision", but I fail to see how it could possibly be related to the principle I mentioned, namely to "be conservative in what you send and liberal in what you accept". If anything, MS decisions regarding IE were exactly the contrary: not only did they fail to accept a lot of very common HTML and JavaScript at the time (i.e., they were exactly the opposite of "be liberal in what you accept"), but they also pushed their own incompatible extensions (not only in HTML/JavaScript but also ActiveX and other "lock-in" shenanigans), so what they did was also the opposite of "be conservative in what you send". So I'm sorry, but I think that your mention of IE as a "bad example" serves at best to confirm my thesis instead of denying it (and at worst is a complete "non sequitur")...

About contacting the website and asking them to fix the feed: I did it more than once, and the few responses I got back were along the lines of "but it works with RSS Reader X and Y and Z, so the fault must be at your end"... and I agree with them: if the other readers accept it, then it may not be right "de jure", but it is indeed right "de facto", and TT-RSS should consider following suit.

About libXML and other RSS Reader software: can you cite another RSS Reader software which has the same issues with these XML mistakes as TT-RSS? If not, do you agree that they are probably fixing the XML before feeding it to LibXML? That's exactly what I'm suggesting that TT-RSS should do, too.

Cheers,
--
Durval.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 01 Aug 2013, 18:47

>and I agree with them: if the other readers accept it, then it may not be right "de jure", but it is indeed right "de facto", and TT-RSS should consider following suit.

Yes, let's all cater to idiots who produce broken content because there seems to be a lot of them and being idiots they are unlikely to change. Excellent idea right there. Instead of raising the bar, let's lower it even further.

I can understand why google reader wannabe services cater to their deranged demographic - they have a monetization strategy involving the cattle of their users. I am interested in nothing of the sort so both people who produce broken ass XML and people who demand support for it can go fuck themselves (or each other, whatever strikes their fancy). I hope I'm making myself clear enough because my position on this issue is not going to change.

People should learn to own up to their shitty programming and fix it instead of dragging everyone else into their cesspool of mediocrity.

>About libXML and other RSS Reader software: can you cite another RSS Reader software which has the same issues with these XML mistakes as TT-RSS? If not, do you agree that they are probably fixing the XML before feeding it to LibXML? That's exactly what I'm suggesting that TT-RSS should do, too.

If you had spent a few minutes searching this forum instead of posting essays on the subject of what tt-rss should do, you would have discovered several ways of doing just that which fit within the overall framework provided by the application.

Then again, that would require intelligence that someone blindly assuming invariably broken XML to be a de-facto standard would probably lack.

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 01 Aug 2013, 19:54

Hi Fox,

fox wrote:>and I agree with them: if the other readers accept it, then it may not be right "de jure", but it is indeed right "de facto", and TT-RSS should consider following suit.

Yes, let's all cater to idiots who produce broken content because there seems to be a lot of them and being idiots they are unlikely to change. Excellent idea right there. Instead of raising the bar, let's lower it even further.


That's certainly one (rather radical, IMHO) way of putting it; the other way (which I prefer) is simply to try to be as interoperable as possible and so to cater to as many users as possible.

fox wrote:I can understand why google reader wannabe services cater to their deranged demographic - they have a monetization strategy involving the cattle of their users. I am interested in nothing of the sort so both people who produce broken ass XML and people who demand support for it can go fuck themselves (or each other, whatever strikes their fancy). I hope I'm making myself clear enough because my position on this issue is not going to change.


:-) That's not only radical but also very graphic :-) Anyway, thanks for making yourself crystal clear on this subject. I shall not insist on it further; if TT-RSS ever bothers me enough in this regard, I will just fork it and have a go at it myself (thanks for making it open source).

I should point out that IMHO it's not just about monetization: it's about making the software as useful as possible for as many people as possible. And people sometimes want to access content that resides on servers that return less-than-ideal XML... telling them to go fsck themselves does not solve the issue.

On a side note, if you really don't care about monetization, perhaps you should consider taking out the "donate" button on the TT-RSS Wiki and also quit the flattr thing; saying that you are not interested in monetization and at the same time having these solicitations up might sound hypocritical (and, by the way, telling everyone who might think it hypocritical to go fsck themselves or each other, along with the "people who produce broken ass XML and the people who demand support for it", also won't solve it).

fox wrote:[...]
>About libXML and other RSS Reader software: can you cite another RSS Reader software which has the same issues with these XML mistakes as TT-RSS? If not, do you agree that they are probably fixing the XML before feeding it to LibXML? That's exactly what I'm suggesting that TT-RSS should do, too.


fox wrote:If you had spent a few minutes searching this forum instead of posting essays on the subject of what tt-rss should do, you would have discovered several ways of doing just that which fit within the overall framework provided by the application.
Then again, that would require intelligence that someone blindly assuming invariably broken XML to be a de-facto standard would probably lack.


Do you really have to go "ad hominem"? It weakens your whole argument, and moreover it's patently false: please notice that the first thing I posted in this thread was a reference to another thread here on the forum (which I found by, yes, searching) where a partial solution was offered, and I also posted my attempt at making it more comprehensive... so I'm clearly not only "posting essays on the subject"...

OTOH, perhaps I was not able to locate other solutions that may have been posted here for dealing with these issues that you refuse to code into TT-RSS; if you could be so kind as to post links to them instead of trying to offend me (no, I'm not offended, at least not yet), it would be much more productive not only for both of us but also for the other poor folks who might search for a way to fix this kind of issue in the future... and telling me to go fsck myself won't help anyone either.

Cheers,
--
Durval.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 01 Aug 2013, 20:29

>That's certainly one (rather radical, IMHO) way of putting it; the other way (which I prefer) is simply to try to be as interoperable as possible and so to cater to as many users as possible.

GLHF.

>On a side note, if you really don't care about monetization, perhaps you should consider taking out the "donate" button on the TT-RSS Wiki and also quit the flattr thing; saying that you are not interested in monetization and at the same time having these solicitations up might sound hypocritical

The fuck are you talking about? Wait, don't answer, I don't want to know. Stop posting instead. I'm about as interested in reading your wall of text essays as I am in working around broken XML.

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52

Postby gbcox » 01 Aug 2013, 20:32

fox wrote:People should learn to own up to their shitty programming and fix it instead of dragging everyone else into their cesspool of mediocrity.

Amen!

durval wrote:but I think that your mention of IE as a "bad example" serves at best to confirm my thesis instead of denying it (and at worst is a complete "non sequitur")...

That's a bit of a reach, and no it doesn't confirm your thesis. The bottom line is "one bad apple spoils the barrel".

In my view, there are people out there who are delusional and don't want to take the extra fraction of a second to do the right thing. Instead, for whatever perverse reason, they would much rather spin their wheels for hours on end coming up with perverse machinations to reach an end result. Then, they expect the rest of us to stand in line and feed the Frankenstein monster they have created.

There are plenty of feeds out there. If someone refuses to own up and fix theirs then dump it and choose another. I've found that most people aren't aware that there is a problem and are happy to fix their stuff.

AngryChris
Bear Rating Master
Posts: 135
Joined: 08 Apr 2013, 02:42

Postby AngryChris » 01 Aug 2013, 21:02

I'm not looking to fan any flames here, but to provide a suggestion. Fox, would it be possible to somehow implement xmllint functionality in the application via official plug-in (meaning a plug-in that is distributed alongside TT-RSS)? I don't mean re-write things so TT-RSS itself "cleans up" or ignores bad XML or whatever, but put, say, a plug-in in the official app that makes xmllint (if installed on the system) easy to enable with a checkbox?

Plugin: af_xmllint
Description: If you have it installed, runs all posts through xmllint prior to insertion into the database.
Version: 1.0
Author: fox (I hope!)

Is this a reasonable feature request?
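As a rough illustration of what such a pre-processing step might look like outside tt-rss, here is a sketch that pipes a feed through xmllint in recovery mode. It is not the plugin mentioned in this thread: the function names are hypothetical, it assumes an xmllint binary that accepts "-" for standard input, and it simply returns the input unchanged when xmllint is not installed.

```python
import shutil
import subprocess


def xmllint_recover_command():
    """Build the argument list for a recovering xmllint run that reads stdin."""
    # --recover: output whatever was parsable; --nonet: never fetch DTDs remotely.
    return ["xmllint", "--recover", "--nonet", "-"]


def recover_feed(feed_bytes):
    """Return xmllint's recovered output, or the input if xmllint is unavailable."""
    if shutil.which("xmllint") is None:
        return feed_bytes
    result = subprocess.run(
        xmllint_recover_command(),
        input=feed_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,  # xmllint exits non-zero on broken input even when it recovers
    )
    # Fall back to the original bytes if nothing usable came out.
    return result.stdout or feed_bytes


if __name__ == "__main__":
    broken = b"<rss><channel/></rss>\n<script>var tracker = 1;</script>\n"
    print(recover_feed(broken).decode("utf-8", errors="replace"))
```

How much xmllint manages to salvage depends on the kind of breakage, so the recovered output is best treated as a best-effort repair rather than a guarantee of a valid feed.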

feader
Bear Rating Master
Posts: 160
Joined: 26 Dec 2012, 20:03

Postby feader » 01 Aug 2013, 21:10

AngryChris wrote:I'm not looking to fan any flames here, but to provide a suggestion. Fox, would it be possible to somehow implement xmllint functionality in the application via official plug-in

It's not official, but a plugin already exists. I don't think that fetching a zip file and extracting it into the right directory is too much to ask. Someone could even make a Knowledge Base entry for this kind of stuff, so that every person with reasonable search skills can find all available solutions.

gbcox
Bear Rating Master
Posts: 149
Joined: 25 Apr 2013, 04:52

Postby gbcox » 01 Aug 2013, 21:15

Fox can and will do whatever he wants... The plugin exists, and people can seek it out if they want it. Personally, I don't see the point, other than that you're just asking him to support a crutch. Seriously, what is so hard about asking people to fix their stuff? Is it really that hard? Is the content in these broken feeds really so compelling and irreplaceable that the world must hack around their sloppy code? I really don't get it.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Postby fox » 01 Aug 2013, 22:16

AngryChris wrote:I'm not looking to fan any flames here, but to provide a suggestion. Fox, would it be possible to somehow implement xmllint functionality in the application via official plug-in (meaning a plug-in that is distributed alongside TT-RSS)? I don't mean re-write things so TT-RSS itself "cleans up" or ignores bad XML or whatever, but put, say, a plug-in in the official app that makes xmllint (if installed on the system) easy to enable with a checkbox?


Plugin already exists, why bundle it? It should be in the wiki index even.

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 02 Aug 2013, 17:54

Hi gbcox,

gbcox wrote:
durval wrote:but I think that your mention of IE as a "bad example" serves at best to confirm my thesis instead of denying it (and at worst is a complete "non sequitur")...

That's a bit of a reach, and no it doesn't confirm your thesis. The bottom line is "one bad apple spoils the barrel".

In my view, there are people out there who are delusional and don't want to take the extra fraction of a second to do the right thing. Instead, for whatever perverse reason, they would much rather spin their wheels for hours on end coming up with perverse machinations to reach an end result. Then, they expect the rest of us to stand in line and feed the Frankenstein monster they have created.

There are plenty of feeds out there. If someone refuses to own up and fix theirs then dump it and choose another. I've found that most people aren't aware that there is a problem and are happy to fix their stuff.


I think we should just agree to disagree on that...

Cheers,
--
Durval.
Last edited by durval on 02 Aug 2013, 18:15, edited 2 times in total.

durval
Bear Rating Trainee
Posts: 26
Joined: 27 Jul 2013, 13:35

Postby durval » 02 Aug 2013, 17:55

Hi Fox,
fox wrote:>On a side note, if you really don't care about monetization, perhaps you should consider taking out the "donate" button on the TT-RSS Wiki and also quit the flattr thing; saying that you are not interested in monetization and at the same time having these solicitations up might sound hypocritical

The fuck are you talking about? Wait, don't answer, I don't want to know. Stop posting instead. I'm about as interested in reading your wall of text essays as I am in working around broken XML.


Your wish has been granted...

Cheers,
--
Durval.

