Filter on Title Only is Matching Title or Content

onyxfox
Bear Rating Trainee
Posts: 18
Joined: 19 Mar 2013, 11:54

Postby onyxfox » 23 Mar 2013, 02:03

I have a filter set up to assign a label to articles which contain a certain word in the title only, but it is setting the label for any article that has the word in the title OR the content. I have repeatedly checked that the filter was set to match on the title only, and I have deleted and recreated the filters, but it still refuses to match on title only.

I did some searching around on the forums here and Redmine, but had no luck finding any other reports of this behavior, so I apologize if this is an already known issue.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Filter on Title Only is Matching Title or Content

Postby fox » 23 Mar 2013, 09:28

That's strange. You can check get_article_filters(); there are definitely separate blocks for title-only and content-only filters. :(
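
As a rough illustration of what "separate blocks" for title and content means, the sketch below tests a rule only against the field it targets. It is written in Python rather than the PHP that tt-rss uses, and the Rule class and rule_matches function are hypothetical names made up for this example, not the actual get_article_filters() code. The title and content excerpt come from the Engadget example reported later in this thread.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str  # regular expression to search for
    field: str    # "title", "content", or "both" (hypothetical field names)


def rule_matches(rule: Rule, title: str, content: str) -> bool:
    """Return True if the pattern is found in the field(s) the rule targets."""
    if rule.field == "title":
        haystacks = [title]
    elif rule.field == "content":
        haystacks = [content]
    elif rule.field == "both":
        haystacks = [title, content]
    else:
        raise ValueError(f"unknown field: {rule.field}")
    return any(re.search(rule.pattern, text, re.IGNORECASE) for text in haystacks)


title = "HP's $169 Slate 7 tablet apparently delayed until June"
content = "we know some of you can't wait to get your mitts on HP's new Slate 7 Android tablet."

title_only = rule_matches(Rule("android", "title"), title, content)
title_or_content = rule_matches(Rule("android", "both"), title, content)
print(title_only, title_or_content)
```

If matching really were restricted to the title, as in the first call, this article should not pick up the label; the reported behavior looks like the second call.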

indianbuckeye
Bear Rating Trainee
Posts: 9
Joined: 29 Mar 2013, 20:24

Postby indianbuckeye » 29 Mar 2013, 20:26

I see the same behavior on my installation. The filter is set to 'Title' but the results are for 'Title and Content'.

Postby fox » 29 Mar 2013, 20:58

Examples? Preferably with feed.

Postby onyxfox » 29 Mar 2013, 21:48

Glad to know I'm not alone. Thought I was going crazy there, lol!

I've been seeing it still happening, and I've triple checked my filters to be sure they are really title only, and I have checked in the mysql table to also be sure that it is saving it as a title only filter.

I have been busy and have not been able to drop in some debugging code to see what values the filter is seeing, but I did look at the rssfuncs.php, and it does look like it is filtering the title only, which is mysterious to me. I had planned on dumping variable values to a log file, but I just haven't had the time to take care of that yet.

Edit: I'll add some of the issues I have been having below as I get the data for them

Feed URL: http://www.engadget.com/rss.xml
Filter Match:
 - android on title in all feeds OR chrome on title in all feeds OR google on title in all feeds

False Positive:
Title:
HP's $169 Slate 7 tablet apparently delayed until June

Content:
Maybe it's that $169 price, or maybe it's the inclusion of an honest-to-goodness memory card reader, but we know some of you can't wait to get your mitts on HP's new Slate 7 Android tablet. Back when it was first announced, the company indicated it'd be available by April, but it would seem that plan has changed: the product page on HP's site is now saying the Slate won't arrive until sometime in June. We're not sure why there's a delay (we're asking for comment), but we do know this can't be good news for HP. By June, after all, Google I/O will have come and gone, and the next-gen Nexus 7 might already be on sale.

[Thanks, jmartj]

Filed under: Tablets, HP

Comments

Source: HP


Feed URL: http://feeds.mashable.com/Mashable
Filter Match:
facebook on title in all feeds OR zuckerberg on title in all feeds

False Positive:
Title:
The Strategy Behind the Viral Red Marriage Equality Campaign

Content:
The Human Rights Campaign's Director of Marketing Anastasia Khoo didn't decide to change the organization's equality sign logo from blue to red on a whim

The team at the HRC, the largest lobby organization dedicated to fighting for LGBT rights, knew that the two cases the Supreme Court heard this week on marriage equality were major, that Monday and Tuesday were moments of historic significance. The organization also foresaw that people would want to show their support for marriage equality. During the planning process, Khoo had the idea to turn the iconic blue and yellow equality sign logo red. Read more...

More about Facebook, Us, Features, Social Good, and Gay Rights


Feed URL: http://electronista.feedsportal.com/c/34342/f/626172/index.rss
Filter Match:
apple, icloud, ios, ipad, iphone, ipod, itunes, retina, or jony ive on title (yeah, 'ios' causes false positives, but not in the article's title below)

False Positive:
Title:
Best Buy hosting Samsung store-in-a-store in near future

Content:
Best Buy is reportedly going to play host to a number of pop-up Samsung shops in a number of its stores ahead of the Galaxy S4 launch in April. The temporary shops will apparently resemble the same "store-in-a-store" concept that Apple currently employs in Best Buy branches, and is said to provide customers more than one Galaxy S4 to try out....


Feed URL: http://feedproxy.google.com/TechCrunch
Filter Match:
facebook or zuckerberg in title, all feeds
also a second false match on:
reddit in title, all feeds

False Positive:
Title:
Focused On Women, Sprightly Debuts A Visual Content Platform Showing What's Hot Across Fashion, Beauty, Design Sites & More

Content:
Sprightly, a newly launching startup whose founding team has an extensive history working in female-focused businesses, including Refinery29, Etsy, Chloe+Isabel, and others, is debuting its content aggregation platform on Monday, with a focus on verticals like fashion, beauty, design, decor, and more. TechCrunch has early invites (see below).

According to company co-founder Jorge Lopez, an early Etsy developer and until just recently, VP of Innovation at Refinery29, the original inspiration for the service was to build something that he describes as “a Reddit for women.” What he means by that is a system that aggregates content from around the web, which is then ranked in order to give you a real-time view into what’s currently popular.

“I love Reddit. It’s my favorite thing in the entire world,” Lopez explains. “You land on Reddit and you get the front page of the Internet. You get everything that’s important to you right then. It’s a snapshot of the world, and it’s very focused on recency.”

He then thought about the fact that there wasn’t a similar service designed just for women. (I’d argue that lots of women like Reddit, in fact, but Lopez is referring to those “traditional” female-friendly interests – fashion, beauty, home decor, etc. – the kind of categories that have led to Pinterest’s rapid growth.)


Working with Sprightly co-founder Pamela Castillo, previously of Chloe + Isabel, Fashism, Plum Alley, and Market Publique, they’ve spent a couple of months building Sprightly, which aggregates website and blog content in real-time, ranks what’s trending based on social media scores (as opposed to voting, like on Reddit), then presents users with a one-stop destination showing everything that’s popular today, as well as ways to drill down into other sections to explore even further.

If anything, the resulting product has more in common with Pinterest or Flipboard than it does with Reddit, as it turns out. Instead of user-submitted links and votes, Sprightly’s content comes from nearly 900 websites across the verticals it targets, as well as anything else a user wants to add on their own.

“We do the work for you, to some extent,” says Lopez. “We say, these are the cool blogs that we put together, but you still have the power to add whatever you think is really cool.” In the future, the plan is to allow users to follow each others lists of blogs, which is somewhat similar to Flipboard’s newly launched custom magazines, except the lists would contain the blogs themselves, not individual pieces of content.

As opposed to user voting, Sprightly determines what’s trending based on social media signals, including Facebook Likes, tweets, the blog’s overall popularity, Pinterest pins, and more. And in addition to populating its own front page of what to read, Sprightly will also send users an email of the top ten things they should read today. A mobile app (pictured, right), now being built by Chamera Paul, is planned for a May debut.

The other big difference between Sprightly and its original source of inspiration in Reddit, is that the site is also heavily focused on visual imagery, giving it a Pinterest-like feel. There are some 200,000 images now indexed across its service (including animated gifs, natch). But Pinterest, Lopez explains, doesn’t focus on currently trending content, but rather popularity over time.

“We saw Pinterest the morning after the Oscars, and it mentioned nothing of the Oscars,” says Lopez. “Meanwhile, Sprightly had Oscar content through every single vertical, be it beauty, be it fashion – everything is completely based on recency. It’s like, ‘this is what’s hot right now,’” he says.

Given its overlap with Pinterest, Flipboard, Tumblr, and the like, it’s hard to say if this female-friendly aggregator will take off independently. But the founders have a history of working for startups targeting the female demographic, and it’s already very easy to lose yourself on the site for good chunks of time, which is promising. The product launching next week is a very early MVP, meant only to determine if such a thing has legs.

Users will be invited in batches, but TechCrunch readers who want to be at the head of that line can use this link to sign up: https://spright.ly/i/techcrunch. The first 100 who register will be the first to receive invites when Sprightly opens up on Monday.


Well, that's four different false positives from four feeds, all from posts made today. I hope that helps!

Postby fox » 30 Mar 2013, 12:50

Do you use "or" in the regular expression field in the filter? Screenshot your filter.

Edit: I tried that and I'm not seeing any false positives with the Engadget examples. The filter didn't match the listed post.

What could have happened is that Engadget originally posted this with a different title and then adjusted it. The title would automatically correct itself, but the filter obviously wouldn't unapply.

Postby onyxfox » 30 Mar 2013, 18:50

Oh no, not a literal OR in print, sorry. It is not exclusive to those with multiple match criteria either, as you can see with the reddit filter I had added.

Edit: I hadn't thought about it, but the Test button in the filter always shows the proper results.

Image

Postby fox » 30 Mar 2013, 18:53

>Edit: I hadn't thought about it, but the Test button in the filter always shows the proper results.

The Test button works on the current database content, which might have changed dozens of times while the feed was being updated.

Postby onyxfox » 30 Mar 2013, 19:15

So the likely answer is that the feed's title is getting edited sometime between initial insertion into tt-rss and the time that I read the article?

It seems very odd to me that the keywords that are causing the false positive are always in the content, even in those which have nothing to do with the keyword. I mean it makes me scratch my head in wonder as to why the feed source would have placed the keywords into the title at any point. Assuming this is the problem, I do see how this could cause a mistake to be made in the filter's output, though.

Postby fox » 30 Mar 2013, 19:33

>So the likely answer is that the feed's title is getting edited sometime between initial insertion into tt-rss and the time that I read the article?

This is the only theory I have. The alternative is that the match sometimes runs against a different field for some unknown reason. That is also possible, obviously, but I have no idea how or why.

>It seems very odd to me that the keywords that are causing the false positive are always in the content, even in those which have nothing to do with the keyword.

This actually makes some sense.

Initial headline: blah blah android tablet
Content: android blah blah
Editor: let's remove android from headline, it's in the article already
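
The theory can be sketched as a small simulation. This is Python, not tt-rss code; apply_title_filter and the article dict are hypothetical, and the "initial" title is invented purely for illustration (only the final title comes from the Engadget example in this thread). The assumption, as described above, is that a filter adds a label when it matches and is never un-applied afterwards.

```python
import re


def apply_title_filter(article: dict, pattern: str, label: str) -> None:
    """Add the label if the pattern matches the current title; never remove labels."""
    if re.search(pattern, article["title"], re.IGNORECASE):
        article["labels"].add(label)


article = {
    # Invented initial headline, standing in for "blah blah android tablet".
    "title": "HP's Slate 7 Android tablet apparently delayed until June",
    "content": "... HP's new Slate 7 Android tablet ...",
    "labels": set(),
}

# First feed update: the title contains "android", so the label is applied.
apply_title_filter(article, "android", "Google")

# The publisher edits the headline; the stored title is corrected on the next update.
article["title"] = "HP's $169 Slate 7 tablet apparently delayed until June"

# Later update: the title no longer matches, but the earlier label remains.
apply_title_filter(article, "android", "Google")

print(article["title"], article["labels"])
```

By the time the article is read, the label looks like a content match, because the keyword survives only in the content.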

Postby onyxfox » 30 Mar 2013, 23:03

This one just came across my feed. It matched on the content rather than the title, despite the filter being for title only. I'm pretty sure the title of the post wasn't changed. I have other RSS entries with the same title (from the same thread). It's from the RSS feed for this site even. It matched on my Google tagging filter (android, google, or chrome in title only). Of the three occurrences of this thread in my fresh articles, this is the only one which triggered the Google filter, and the only one in which the word android, google, or chrome appeared in the content.

I know it doesn't make sense, it's driving me crazy trying to explain it to myself as well, lol!

Image

Postby fox » 30 Mar 2013, 23:30

This is perplexing. :)

I think the best course of action at this point would be adding logging to get_article_filters() and waiting until this happens again. Which would be pretty soon because filters are applied on each update, even on already imported articles.

As a starting point I would suggest attached diff (it logs to a file in /tmp, you can easily change it to do syslog or whatever).

Edit: you can also see which filters are applied to an article in the f D display; check whether this is repeatable.
Attachments
log_applied_filters.diff
(685 Bytes) Downloaded 113 times
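
The kind of logging suggested above might look roughly like the following. This is not the attached diff, whose contents are not reproduced in the thread; it is a Python sketch under assumed names (log_matches, the applied_filters.log file name, and the (pattern, field) rule tuples are all hypothetical). It records which rule matched, on which field, and what the title was at that moment, so a later title change can be told apart from a genuine content match.

```python
import logging
import os
import re
import tempfile

# Hypothetical log location: a file in the system temp directory.
LOG_PATH = os.path.join(tempfile.gettempdir(), "applied_filters.log")

logger = logging.getLogger("applied_filters")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(LOG_PATH)
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
logger.addHandler(handler)


def log_matches(title, content, rules):
    """Check each (pattern, field) rule and log every match with the title seen."""
    matched = []
    for pattern, field in rules:
        text = title if field == "title" else content
        hit = re.search(pattern, text, re.IGNORECASE)
        if hit:
            logger.info("rule=%r field=%s matched=%r title=%r",
                        pattern, field, hit.group(0), title)
            matched.append((pattern, field))
    return matched


title = "Best Buy hosting Samsung store-in-a-store in near future"
content = 'resemble the same "store-in-a-store" concept that Apple currently employs'
matched = log_matches(title, content, [("apple", "title"), ("apple", "content")])
print(matched)
```

With a log like this, a false positive would show either a title-field match against an older title, or a match logged against the content field for a rule that should be title only.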

Postby fox » 30 Mar 2013, 23:56

Btw, in your screenshot you don't have "match any rule" selected. Is that how it's supposed to be?

Postby onyxfox » 31 Mar 2013, 00:12

I'll report back. I've added the logging.

Edit: As for the Match Any option, while I was trying to figure out what was giving false positives, I had deselected it to try and see if that made a difference, which it didn't. I do normally have it toggled on, yes.

Postby fox » 31 Mar 2013, 00:37

I made two feeds with same rules, one with match any and one without. I'll see if anything unusual gets caught.

Edit: are you absolutely sure you don't have something like a negative regexp checked in there somewhere?
