Perceptual image filtering plugin

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Perceptual image filtering plugin

Postby fox » 19 Jul 2016, 13:16

So, I was bored and made a thing: it checks images using perceptual image hashing and filters out ones similar to those first encountered in other (older) articles. The idea here is to filter out reposted images on reddit and stuff like that.
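
For anyone unfamiliar with perceptual hashing, the general idea can be sketched in a few lines of Python. This uses a simple "average hash" over an already-downscaled grayscale grid, which is an assumption made for illustration and not necessarily the algorithm the plugin implements. The function name `average_hash` and the sample pixel grids are hypothetical.

```python
def average_hash(pixels):
    """Return an integer hash of a grayscale grid (list of rows of 0-255 ints).

    Each pixel contributes one bit: 1 if it is brighter than the grid's mean.
    Small brightness changes (recompression, resizing) rarely flip a bit.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# Made-up 2x2 "images" for illustration only.
original = [[10, 200], [220, 30]]
reposted = [[12, 190], [210, 35]]    # same picture, slightly recompressed
different = [[200, 10], [30, 220]]   # inverted pattern

print(format(average_hash(original), "04b"))
print(format(average_hash(reposted), "04b"))
print(format(average_hash(different), "04b"))
```

The first two grids hash to the same value even though no pixel is identical, while the third does not; real hashes use a larger grid (typically 64 bits or so), but the principle is the same.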

Image

The whole thing is a *work in progress* but seems to work surprisingly well so far.

Repository: https://tt-rss.org/gitlab/fox/ttrss-per ... image-hash (clone to plugins.local/af_zz_img_phash)

Requirements:

1. GD
2. Enough memory to load potentially large images into GD and disk space to hold them
3. if using postgresql, count-bits extension: https://github.com/sldab/count-bits

Stuff to do:

1. Clone plugin to plugins.local
2. Import schema in (pluginroot)/sql
3. Enable plugin for feeds containing potential reposts in preferences and set maximum Hamming distance (the default should be ok).

If plugin catches a potential duplicate, it will rewrite the image to a plain-text link with a dialog next to it showing all related stuff in the database.

How to check similarity manually:

Code:

SELECT url, phash,
       unique_1bits((SELECT phash FROM ttrss_plugin_img_phash_urls
                     WHERE url = 'http://imgur.com/something.jpg'), phash) AS distance
FROM ttrss_plugin_img_phash_urls
ORDER BY distance
LIMIT 5;


on mysql replace unique_1bits with bit_count.
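
To make the query above easier to follow, here is a rough Python equivalent of what it computes. It assumes that unique_1bits(a, b) returns the number of bit positions in which a and b differ (that is how the query uses it, as a distance), and that hashes are non-negative integers. MySQL's BIT_COUNT takes a single argument, so the MySQL form would presumably be bit_count(a ^ b). The helper names and sample rows are hypothetical.

```python
def hamming_distance(a, b):
    # XOR leaves a 1 bit wherever the two hashes disagree; count those bits.
    # Assumes non-negative integers.
    return bin(a ^ b).count("1")


def closest(target, rows, limit=5):
    """rows: iterable of (url, phash) pairs.

    Returns up to `limit` (url, phash, distance) tuples, nearest first,
    mirroring ORDER BY distance LIMIT 5 in the SQL.
    """
    scored = [(url, phash, hamming_distance(target, phash)) for url, phash in rows]
    return sorted(scored, key=lambda r: r[2])[:limit]


# Made-up hashes for illustration only.
rows = [
    ("http://example.com/c.jpg", 0b01001101),
    ("http://example.com/a.jpg", 0b10110010),
    ("http://example.com/b.jpg", 0b10110011),
]
for row in closest(0b10110010, rows):
    print(row)
```

A row whose distance is at or below the configured maximum Hamming distance would be treated as a potential repost.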

sleeper_service
Bear Rating Overlord
Posts: 884
Joined: 30 Mar 2013, 23:50
Location: Dallas, Texas

Re: Perceptual image checking plugin

Postby sleeper_service » 20 Jul 2016, 02:44

I must be missing something, using PG, installed php_gd, plenty memory and space... cloned into plugins.local, imported schema. using a fresh pull of the main ttrss.

but I don't see the plugin anywhere, on the user, or system plugin list, or on the feed plugin tab.

I don't see it listed in output of update.php --list-plugins either.

don't see any errors on the web server logs, or in ttrss logs.

hint what I might be missing?

JustAMacUser
Bear Rating Overlord
Posts: 373
Joined: 20 Aug 2013, 23:13

Re: Perceptual image checking plugin

Postby JustAMacUser » 20 Jul 2016, 03:15

I think the PHP class name (Af_Zz_Img_Phash) and the plugin's directory name have to be the same?

Re: Perceptual image checking plugin

Postby sleeper_service » 20 Jul 2016, 03:21

aha! the directory name has to be af_zz_img_phash

mixed case didn't show up, but making the dir all lower case, (like other af_zz plugins) made it show up.

thanks!

Re: Perceptual image checking plugin

Postby fox » 20 Jul 2016, 09:10

oh right, i forgot about repo name being different from the directory name here

Re: Perceptual image checking plugin

Postby sleeper_service » 20 Jul 2016, 09:23

why for you sabbytage me???? :lol:

Re: Perceptual image checking plugin

Postby fox » 20 Jul 2016, 09:24

all part of the master plan

Re: Perceptual image filtering plugin

Postby fox » 20 Jul 2016, 17:35

trip report, this whole thing unfortunately scales like shit

e: updated version scales a lot better but needs a custom postgresql extension (linked in the op) and a schema reimport / update.

Re: Perceptual image filtering plugin

Postby sleeper_service » 21 Jul 2016, 12:03

well, *that* was fun...

gcc -O2 -m64 -march=opteron-sse3 -mpopcnt -fpic -I/opt/postgres/9.5-pgdg/include/64/server -o count_bits.so -shared -fPIC count_bits.c

hope it doesn't blow up my postgres.

e sometime later: no boom so far.

Re: Perceptual image filtering plugin

Postby sleeper_service » 21 Jul 2016, 13:58

well, i spoke too soon

Code:

2016-07-21 05:53:38 CDT [6341]: [4-1]     %LOG:  received fast shutdown request
2016-07-21 05:53:38 CDT [6341]: [5-1]     %LOG:  aborting any active transactions
2016-07-21 05:53:38 CDT [6347]: [2-1]     %LOG:  autovacuum launcher shutting down
2016-07-21 05:53:38 CDT [6341]: [6-1]     %LOG:  server process (PID 6626) was terminated by signal 4
2016-07-21 05:53:38 CDT [6341]: [7-1]     %DETAIL:  Failed process was running: SELECT article_guid FROM ttrss_plugin_img_phash_urls WHERE
        owner_uid = 2 AND
        created_at >= NOW() - INTERVAL '30 days'  AND
        unique_1bits('113299711', phash) <= 5 ORDER BY created_at LIMIT 1
2016-07-21 05:53:38 CDT [6341]: [8-1]     %LOG:  terminating any other active server processes
2016-07-21 05:53:38 CDT [6341]: [9-1]     %LOG:  abnormal database system shutdown


my first guess is that it doesn't like my gcc created count_bits.so with the sun studio compiled postgres server. :(

but, it was creating entries in the ttrss_plugin_img_phash_urls table ok... :scratching head:

Re: Perceptual image filtering plugin

Postby fox » 21 Jul 2016, 21:36

I used default debian stuff i.e. gcc and postgres-dev and it's rock solid so far. 9.4 though.

Re: Perceptual image filtering plugin

Postby sleeper_service » 22 Jul 2016, 03:48

Program terminated with signal 4, Illegal instruction.
#0 0xffff80ffbf1d1188 in unique_1bits ()
from /opt/postgres/9.5-pgdg/lib/64/count_bits.so

:(

Re: Perceptual image filtering plugin

Postby sleeper_service » 22 Jul 2016, 04:13

hokay, update, I recompiled the module, so it *didn't* use the popcntq instruction and now it's not coredumping postgres. fwiw:

gcc -O2 -m64 -march=opteron-sse3 -fpic -I/opt/postgres/9.5-pgdg/include/64/server -o count_bits.so -shared -fPIC count_bits.c

the previously mentioned select, which was hot death, now returns a hash.

ttrss=# SELECT article_guid FROM ttrss_plugin_img_phash_urls WHERE
[more] - > owner_uid = 2 AND
[more] - > created_at >= NOW() - INTERVAL '30 days' AND
[more] - > unique_1bits('113299711', phash) <= 5 ORDER BY created_at LIMIT 1;
article_guid
-----------------------------------------------
SHA1:584848d6f1443377f1bc62401454a6a6a0e02f30
(1 row)

so, yay!
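
For context on the fix above: POPCNT is a hardware instruction that counts set bits, and the signal 4 (illegal instruction) crash going away once -mpopcnt was dropped is consistent with the CPU not supporting it. Without that instruction the count has to be done in software. Below is a rough Python sketch of one such approach (clearing the lowest set bit in a loop); it is neither the code in count_bits.c nor what gcc actually emits, and `popcount` and `unique_1bits_sketch` are hypothetical names.

```python
def popcount(x):
    """Count the 1 bits in a non-negative integer without a POPCNT instruction."""
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


def unique_1bits_sketch(a, b):
    # Hypothetical stand-in: number of bits that differ between a and b.
    return popcount(a ^ b)


print(unique_1bits_sketch(113299711, 113299711))  # identical hashes
print(unique_1bits_sketch(0b1111, 0b0101))
```

The loop runs once per set bit, so it is slower than the single hardware instruction but works on any CPU.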

