Performance Regression (article purge)

Support requests, bug reports, etc. go here. Dedicated servers / VDS hosting only
AngryChris
Bear Rating Master
Bear Rating Master
Posts: 135
Joined: 08 Apr 2013, 02:42

Performance Regression (article purge)

Postby AngryChris » 15 Jul 2015, 01:55

This commit here https://github.com/gothfox/Tiny-Tiny-RS ... e4f5cd6b3e may have introduced a performance regression. The query in question:

Code: Select all

DELETE FROM ttrss_entries WHERE id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL);

I believe is doing a full table scan of ttrss_user_entries for every entry in ttrss_entries. There are now 2 copies of this query running on my system and they're pegging the CPU:

Code: Select all

[0] [email protected]:/home/cbell 2013-> ps -ef | grep DELETE | grep -v grep
postgres 21450  1744 81 16:39 ?        00:49:32 postgres: ttrss ttrssdb 127.0.0.1(33238) DELETE                                     
postgres 22363  1744 37 16:51 ?        00:18:37 postgres: ttrss ttrssdb 127.0.0.1(33524) DELETE                                     
[0] [email protected]:/home/cbell 2014->

Code: Select all

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21450 postgres  20   0  271460 159616 135656 R  98.8  4.4  52:58.42 postgres
22363 postgres  20   0  271728 155280 132196 R  98.2  4.3  22:04.33 postgres

You can see both of them running in the database here:

Code: Select all

ttrssdb=> select user, usesysid, query from pg_stat_activity;
 current_user | usesysid |                                               query
--------------+----------+----------------------------------------------------------------------------------------------------
 ttrss        |    16384 | DELETE FROM ttrss_entries WHERE                                                                   +
              |          |                         id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL)
 ttrss        |    16384 | DISCARD ALL
 ttrss        |    16384 | DISCARD ALL
 ttrss        |    16384 | select user, usesysid, query from pg_stat_activity;
 ttrss        |    16384 | DISCARD ALL
 ttrss        |    16384 | DELETE FROM ttrss_entries WHERE                                                                   +
              |          |                         id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL)
(6 rows)

ttrssdb=>

Here's a query explain:

Code: Select all

ttrssdb=> explain DELETE FROM ttrss_entries WHERE id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL);
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Delete on ttrss_entries  (cost=0.00..7885392957.15 rows=348578 width=6)
   ->  Seq Scan on ttrss_entries  (cost=0.00..7885392957.15 rows=348578 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..20871.78 rows=699852 width=4)
                 ->  Seq Scan on ttrss_user_entries  (cost=0.00..14638.52 rows=699852 width=4)
                       Filter: (ref_id IS NOT NULL)
(7 rows)

ttrssdb=>

So yes, it seems this is doing a total of 348,578 full table scans of the ttrss_user_entries table (the number of rows in ttrss_entries).

The previous query:

Code: Select all

DELETE FROM ttrss_entries WHERE(SELECT COUNT(int_id) FROM ttrss_user_entries WHERE ref_id = id) = 0

Does 3,486 single index scans of ttrss_user_entries (one index scan per row found in ttrss_entries).

Code: Select all

ttrssdb=> explain DELETE FROM ttrss_entries WHERE(SELECT COUNT(int_id) FROM ttrss_user_entries WHERE ref_id = id) = 0;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Delete on ttrss_entries  (cost=0.00..5950253.90 rows=3486 width=6)
   ->  Seq Scan on ttrss_entries  (cost=0.00..5950253.90 rows=3486 width=6)
         Filter: ((SubPlan 1) = 0)
         SubPlan 1
           ->  Aggregate  (cost=8.45..8.46 rows=1 width=4)
                 ->  Index Scan using ttrss_user_entries_ref_id_index on ttrss_user_entries  (cost=0.42..8.44 rows=1 width=4)
                       Index Cond: (ref_id = ttrss_entries.id)
(7 rows)

ttrssdb=>

It seems the new code isn't a performance breakthrough at all, but rather a performance decrease of several orders of magnitude. If this analysis is wrong, I'm open to correction.
Last edited by AngryChris on 15 Jul 2015, 04:55, edited 2 times in total.

AngryChris
Bear Rating Master
Bear Rating Master
Posts: 135
Joined: 08 Apr 2013, 02:42

Re: Performance Regression (article purge)

Postby AngryChris » 15 Jul 2015, 01:56

And now 3 copies are running:

Code: Select all

[0] [email protected]:/home/cbell 2019-> ps -ef | grep DELETE
postgres 21450  1744 82 16:39 ?        01:02:34 postgres: ttrss ttrssdb 127.0.0.1(33238) DELETE                                     
postgres 22363  1744 49 16:51 ?        00:31:49 postgres: ttrss ttrssdb 127.0.0.1(33524) DELETE                                     
postgres 25302  1744 11 17:38 ?        00:01:57 postgres: ttrss ttrssdb 127.0.0.1(34252) DELETE                                     
cbell    26344 22971  0 17:55 pts/9    00:00:00 grep --color=auto DELETE
[0] [email protected]:/home/cbell 2020-> psql -U ttrss ttrssdb
psql (9.3.9)
Type "help" for help.

ttrssdb=> SELECT usesysid, user, query FROM pg_stat_activity;
 usesysid | current_user |                                               query
----------+--------------+----------------------------------------------------------------------------------------------------
    16384 | ttrss        | DELETE FROM ttrss_entries WHERE                                                                   +
          |              |                         id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL)
    16384 | ttrss        | UPDATE ttrss_entries SET date_updated = NOW()                                                     +
          |              |                                                 WHERE id = '1186574'
    16384 | ttrss        | DELETE FROM ttrss_entries WHERE                                                                   +
          |              |                         id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL)
    16384 | ttrss        | SELECT usesysid, user, query FROM pg_stat_activity;
    16384 | ttrss        | DISCARD ALL
    16384 | ttrss        | DELETE FROM ttrss_entries WHERE                                                                   +
          |              |                         id NOT IN (SELECT ref_id FROM ttrss_user_entries WHERE ref_id IS NOT NULL)
(6 rows)

ttrssdb=>

User avatar
fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia
Contact:

Re: Performance Regression (article purge)

Postby fox » 15 Jul 2015, 08:31

i had my doubts. maybe it's faster only on mysql, who knows. i'll revert it.

e: the whole BREAKTHROUGH!!11 thing was p funny though

virgo
Bear Rating Trainee
Bear Rating Trainee
Posts: 37
Joined: 12 Jun 2013, 22:14

Re: Performance Regression (article purge)

Postby virgo » 15 Jul 2015, 12:33

I don't know much of PostgreSQL, but in Firebird NOT IN query is one of the worst things to do (because IN is run for every record in main table). But looking at the original query, you probably could get rid of COUNT by making it NOT EXISTS query instead of NOT IN.
So original:

DELETE FROM ttrss_entries WHERE
(SELECT COUNT(int_id) FROM ttrss_user_entries WHERE ref_id = id) = 0

could be rewritten

DELETE FROM ttrss_entries E WHERE
NOT EXISTS (SELECT U.ref_id FROM ttrss_user_entries U WHERE U.ref_id = E.id)

User avatar
fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia
Contact:

Re: Performance Regression (article purge)

Postby fox » 15 Jul 2015, 12:48

this doesn't work on mysql but it seems to be faster on postgres:

Code: Select all

 Delete on ttrss_entries e  (cost=5781.15..31289.13 rows=23 width=12) (actual time=473.111..473.111 rows=0 loops=1)
   ->  Hash Anti Join  (cost=5781.15..31289.13 rows=23 width=12) (actual time=473.107..473.107 rows=0 loops=1)
         Hash Cond: (e.id = u.ref_id)
         ->  Seq Scan on ttrss_entries e  (cost=0.00..22806.63 rows=130563 width=10) (actual time=0.031..139.054 rows=130611 loops=1)
         ->  Hash  (cost=3575.40..3575.40 rows=130540 width=10) (actual time=188.476..188.476 rows=130611 loops=1)
               Buckets: 16384  Batches: 2  Memory Usage: 2299kB
               ->  Seq Scan on ttrss_user_entries u  (cost=0.00..3575.40 rows=130540 width=10) (actual time=0.010..100.666 rows=130611 loops=1)
 Planning time: 0.295 ms
 Execution time: 473.458 ms
(9 rows)


original (also notice the rows difference):

Code: Select all

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on ttrss_entries  (cost=0.00..1126063.98 rows=653 width=6) (actual time=1159.215..1159.215 rows=0 loops=1)
   ->  Seq Scan on ttrss_entries  (cost=0.00..1126063.98 rows=653 width=6) (actual time=1159.211..1159.211 rows=0 loops=1)
         Filter: ((SubPlan 1) = 0)
         Rows Removed by Filter: 130612
         SubPlan 1
           ->  Aggregate  (cost=8.44..8.45 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=130612)
                 ->  Index Scan using ttrss_user_entries_ref_id_index on ttrss_user_entries  (cost=0.42..8.44 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=130612)
                       Index Cond: (ref_id = ttrss_entries.id)
 Planning time: 1.154 ms
 Execution time: 1159.426 ms
(10 rows)


in any case removing orphans is a housekeeping job which is done by the backend daemon, it doesn't really matter if its fast as long as it finishes in a reasonable timeframe.

virgo
Bear Rating Trainee
Bear Rating Trainee
Posts: 37
Joined: 12 Jun 2013, 22:14

Re: Performance Regression (article purge)

Postby virgo » 15 Jul 2015, 13:09

It seems, that mysql does not support table aliases in DELETE.
DELETE FROM ttrss_entries WHERE
NOT EXISTS (SELECT ref_id FROM ttrss_user_entries WHERE ref_id = id)
seems to work.
But I have no idea, why rows would be different. Because COUNT() = 0 should be when there is no record. I'm probably missing something.

User avatar
fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia
Contact:

Re: Performance Regression (article purge)

Postby fox » 15 Jul 2015, 13:24

oh this is the amount of lines returned by the analyze query, duh

i like your query, i think it's an improvement

also i had no idea you can use this shorter form for aliases, cool

e: merged, let's see how this goes

vidar
Bear Rating Trainee
Bear Rating Trainee
Posts: 13
Joined: 06 May 2013, 00:19

Re: Performance Regression (article purge)

Postby vidar » 15 Jul 2015, 18:47

I have been running with a patched version of that query for 5 months with no issues. Mine is very similar to virgo's:

Code: Select all

$result = db_query("DELETE FROM ttrss_entries WHERE
   not exists (SELECT 1 FROM ttrss_user_entries WHERE ref_id = id)");

I am on PostgreSQL. I think I saw execution time go from > 2 seconds to 1 second on average. Perhaps more important: I think this change reduces locking in the database. I used to have slow UI performance at times, always at the same time as long-held locks appeared in the database. Now I think changing this query got rid of the locks, but I can't guarantee that's what did it; I didn't keep notes. Either way, I agree (and so does PostgreSQL's EXPLAIN ANALYZE) that this is a more efficient query than the original. The only difference in my query is that it gets the literal 1 instead of any column, to guarantee that it does not cause a table access when an index access would be sufficient. I always do this by habit, but in this case I don't think it matters.

User avatar
sleeper_service
Bear Rating Overlord
Bear Rating Overlord
Posts: 884
Joined: 30 Mar 2013, 23:50
Location: Dallas, Texas

Re: Performance Regression (article purge)

Postby sleeper_service » 15 Jul 2015, 21:39

in case anybody doesn't know about this, http://explain.depesz.com/ it was suggested by a PG friend last night to make explain analyze output a bit more readable :)


Return to “Support”

Who is online

Users browsing this forum: No registered users and 11 guests