
Bayesian classifier for TTRSS

Posted: 11 Jun 2015, 22:41
by rknobbe-other
There have been other threads about reproducing the GReader "sort by magic" or other kinds of automatic classification. I took a stab at it using an external Perl script that employs AI::Categorize along with a couple of canned labels for training and scoring articles. Please take a look and provide feedback.

https://github.com/rknobbe/tt-rss-bayes-tools

Note: I'm also "rknobbe", but the password reset response isn't coming to me for some reason.

Re: Bayesian classifier for TTRSS

Posted: 11 Jun 2015, 22:51
by fox
this kinda thing would work great as a filter plugin, although i have no idea if there's any bayes stuff for php

Re: Bayesian classifier for TTRSS

Posted: 11 Jun 2015, 22:55
by rknobbe-other
I can take a stab at porting the glue logic to php; there are bayesian classifiers for php. What I would appreciate is if somebody could take a look at turning the label indicators in the article viewer into buttons, or some other slightly less clicky way of marking interesting/uninteresting. Labels are adequate, but there are too many steps for a simple training button.

Re: Bayesian classifier for TTRSS

Posted: 11 Jun 2015, 23:35
by fox
take a look at button plugins

a simple one that essentially adds like/dislike buttons to every article would suffice

Re: Bayesian classifier for TTRSS

Posted: 12 Jun 2015, 03:46
by JustAMacUser
This would be such a great feature...

Re: Bayesian classifier for TTRSS

Posted: 14 Jun 2015, 05:29
by rknobbe-other
Do plugins have a hook to modify article score? I see I can register an article filter, but it looks like the filter logic to set scores is in the mainline rssfuncs.php and not exposed to plugins.

Re: Bayesian classifier for TTRSS

Posted: 15 Jun 2015, 19:53
by fox
you're right, it's not exposed to plugins for some reason. most likely i forgot about it. :)

e: i think i'll only be able to add a special score modifier which would be added to the base score calculated by filters. adding persistent stuff seems error-prone, i.e. what if a plugin constantly incremented the score on each run

e2: https://github.com/gothfox/Tiny-Tiny-RS ... b764e49582
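
To illustrate the non-persistent modifier idea from the post above, here is a minimal Python sketch. It is not tt-rss code: `score_article`, the plugin callables, and the dict-shaped article are all hypothetical names made up for this example. The point is only that the final score is recomputed from the base score on every run, so a plugin that runs repeatedly cannot keep inflating it.

```python
def score_article(article, base_score, plugins):
    """Return base_score plus the sum of per-run plugin modifiers.

    Hypothetical sketch: `plugins` is a list of callables that each take
    the article and return an integer modifier. Nothing is stored between
    runs, so calling this twice gives the same result.
    """
    modifier = sum(plugin(article) for plugin in plugins)
    return base_score + modifier


# Example: two made-up plugins, one boosting and one penalizing.
plugins = [lambda article: 50, lambda article: -10]
article = {"title": "example"}
print(score_article(article, 100, plugins))
print(score_article(article, 100, plugins))  # same value on the second run
```

Had the modifier been written back into the stored score instead, the second run would start from the already modified value and drift upward.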

Re: Bayesian classifier for TTRSS

Posted: 15 Jun 2015, 22:25
by rknobbe-other
This looks great. I'll try it out tonight. While stalled on the php port, I realized that my current external (label-based) script really shouldn't be limited to 2 labels (interesting vs. uninteresting). I'm relaxing that constraint so that it does bayesian learning on any labels the user has provided, then applies the most likely labels to new articles. I'll push the updated script to github momentarily.
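
For readers unfamiliar with the approach, here is a minimal Python sketch of naive Bayes over an arbitrary set of labels. It is for illustration only: it is not the code from the Perl script, it does not use AI::Categorize, and the class name, tokenizer, and add-one smoothing are assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLabeler:
    """Multinomial naive Bayes over any number of user labels (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> training articles seen
        self.vocab = set()

    def train(self, label, text):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Return an unnormalized log-probability for every known label."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # label prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def best_label(self, text):
        """Return the most likely label, or None if nothing was trained."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get) if scores else None


nb = NaiveBayesLabeler()
nb.train("linux", "kernel release adds new filesystem driver")
nb.train("linux", "kernel patch fixes driver bug")
nb.train("cooking", "recipe for slow roasted tomato soup")
nb.train("cooking", "easy soup recipe with fresh basil")
print(nb.best_label("new kernel driver released"))
```

Since the labels are just dictionary keys, going from two labels to many needs no structural change. A real script would probably also want a confidence threshold so that articles matching no label well are left unlabeled.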

Re: Bayesian classifier for TTRSS

Posted: 15 Jun 2015, 22:28
by fox

Re: Bayesian classifier for TTRSS

Posted: 15 Jun 2015, 23:37
by rknobbe-other
yeah, I saw that one and avoided it when I saw that it needed shit-tons of extra stuff for a database backend.

this one:
https://github.com/atyks/PHP-Naive-Bayesian-Filter

only needs mysql and some foreign language skills

Re: Bayesian classifier for TTRSS

Posted: 15 Jun 2015, 23:55
by fox
errors in french, readme in half japanese, looks like a lot of fun

Re: Bayesian classifier for TTRSS

Posted: 17 Jun 2015, 03:37
by himynameschris
Hi all, I am interested in this as well. I found the web service "uclassify", which could easily be integrated into tt-rss in the short term. It seems to work pretty well, and the developers provide an example news classifier. The service is free up to 5000 API calls per day; past that there is a subscription (I am not affiliated in any way, I found them through a web search). They also have a local server version of their software, but I do not see any individual/open source licensing, only paid services.

http://blog.uclassify.com/tutorial-creating-your-own-classifier/


As for php tools, the following repo seems to be the most complete for what would be needed (pos tagger, stemmer, classifier with k-means or naive bayes).

https://github.com/angeloskath/php-nlp-tools or http://php-nlp-tools.com/


Sadly, php does not seem to be the best choice when it comes to natural language processing. I would personally like to implement this either in javascript on nodejs, so that performance could be improved with a native / c++ module if needed, or in Java, so that an Apache Spark instance could be used if the computing requirements become unmanageable. The process would run independently of the php processes, updating the mysql database with analytics and results; the views for those would need to be implemented with a php plugin.

Excellent javascript NLP library: https://github.com/NaturalNode/natural

Re: Bayesian classifier for TTRSS

Posted: 17 Jun 2015, 15:17
by fox
well, there's this now (postgresql only at the moment, although it's just a matter of writing the needed script in init_database() for mysql)

https://github.com/gothfox/Tiny-Tiny-RS ... sort_bayes

it probably doesn't work correctly and/or requires tons of further tweaking, comments welcome

* how it works *

there are two article buttons: one files the article in the good category, the other in neutral. there's no specific BAD category, so it only rates things up at the moment.

when processing articles it checks if the database is more or less filled for both categories and files stuff accordingly. if the database is not filled, it just puts everything into the neutral category.

when it rates an article up, either automatically or manually, the score is bumped 50 points up.
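
The flow described above can be sketched roughly as follows in Python. This is not the plugin's actual code (the plugin is PHP): the function names, the `classify` callable standing in for the Bayesian classifier, and the `MIN_TRAINED` threshold are all assumptions, since the post only says the database should be "more or less filled". Only the 50-point bump comes from the post.

```python
SCORE_BUMP = 50   # from the post: rated-up articles gain 50 points
MIN_TRAINED = 10  # assumption: hypothetical threshold for "more or less filled"


def categorize(article_text, trained_counts, classify):
    """Return "GOOD" or "NEUTRAL" for a new article.

    trained_counts: dict mapping category name -> number of training examples.
    classify: callable taking the text and returning a category name; a
    stand-in for whatever Bayesian classifier is actually used.
    """
    # Without enough training data in both categories, skip classification
    # and file everything as neutral.
    if any(trained_counts.get(c, 0) < MIN_TRAINED for c in ("GOOD", "NEUTRAL")):
        return "NEUTRAL"
    return classify(article_text)


def score_modifier(category):
    """There is no BAD category, so the modifier is only ever zero or positive."""
    return SCORE_BUMP if category == "GOOD" else 0


# Example with a dummy classifier that calls everything GOOD.
counts = {"GOOD": 25, "NEUTRAL": 40}
category = categorize("some article text", counts, lambda text: "GOOD")
print(category, score_modifier(category))
```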

disclaimer: i have literally no idea how any of this shit is supposed to work or if its the right way to do this lol

Re: Bayesian classifier for TTRSS

Posted: 17 Jun 2015, 18:18
by nameless
Is there anything to pay special attention to when updating?
I git pulled and activated the plugin, which is when ttrss began throwing errors; I didn't get redirected to the database updater afterwards.
Ttrss is still throwing errors even though the plugin is disabled now.
Did I miss out on anything?

Re: Bayesian classifier for TTRSS

Posted: 17 Jun 2015, 18:46
by fox
maybe post some errors idk