Discussion:
Unusually high IO for autovacuum worker
Vlad Bailescu
2013-02-01 08:14:29 UTC
Hi everyone,

We are experiencing a strange situation after upgrading our Postgres from
8.4 to 9.1 and our OS from Debian 6 to Ubuntu 12.04 LTS.

We have an application where we are collecting timestamp-based data. Since
a month of data goes over 30M rows (and growing each month), we decided to
partition the data on a year-and-month basis. We basically have one master
table and partition tables like master_y2013m01 and so on.

Because of our application stack (Hibernate ORM) we use a before_insert
trigger to insert rows into the right partition and in the master table and
then an after_insert trigger to delete them from the master table (we need
the original insert to return the inserted row info). This means the master
table becomes quickly populated with dead tuples and triggers the
autovacuum frequently (we are using default settings for autovacuum).
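A rough sketch of that trigger arrangement, purely illustrative: the table names, the "ts" timestamp column and the "id" key below are assumptions, not the actual schema:

```sql
-- BEFORE INSERT: copy the row into the matching monthly partition,
-- then let the original INSERT proceed into the master table so the
-- ORM still gets the inserted row back.
CREATE OR REPLACE FUNCTION master_route_insert() RETURNS trigger AS $$
BEGIN
    EXECUTE format('INSERT INTO %I SELECT ($1).*',
                   'master_y' || to_char(NEW.ts, 'YYYY') ||
                   'm'        || to_char(NEW.ts, 'MM'))
    USING NEW;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- AFTER INSERT: remove the row from the master table again; this is
-- what leaves the steady stream of dead tuples behind.
CREATE OR REPLACE FUNCTION master_cleanup() RETURNS trigger AS $$
BEGIN
    DELETE FROM ONLY master WHERE id = NEW.id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER master_bi BEFORE INSERT ON master
    FOR EACH ROW EXECUTE PROCEDURE master_route_insert();
CREATE TRIGGER master_ai AFTER INSERT ON master
    FOR EACH ROW EXECUTE PROCEDURE master_cleanup();
```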

After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6 to Ubuntu
12.04) we noticed a large increase in IO and traced it back to auto vacuum:
iotop and collectd report bursts of 4-6M/s taking 90 seconds followed by a
30 seconds pause. That's a lot of data to chew on especially since the
master table is only a couple of megabytes (indexes included).

Can anyone please explain why this is happening and how we can get around
it?

Thank you,
Vlad
Pavel Stehule
2013-02-01 08:29:23 UTC
Post by Vlad Bailescu
After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6 to Ubuntu
12.04) we noticed a large increase in IO and traced it back to autovacuum:
iotop and collectd report bursts of 4-6M/s taking 90 seconds followed by a
30 seconds pause. That's a lot of data to chew on especially since the
master table is only a couple of megabytes (indexes included).
We had a similar problem; you can try moving the stats file to a ramdisk
by setting stats_temp_directory to a location on the ramdisk.
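For reference, that amounts to pointing the GUC at a tmpfs mount in postgresql.conf; the mount point below is an example, not the original poster's path:

```
# e.g. mount a small tmpfs first (size is an example):
#   mount -t tmpfs -o size=64M,uid=postgres,gid=postgres tmpfs /pgstat_ramdisk

# postgresql.conf
stats_temp_directory = '/pgstat_ramdisk'
```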

http://www.postgresql.org/message-id/CAFj8pRCRPv3i-9WwsSCa=my9M+***@mail.gmail.com

Regards

Pavel Stehule
Post by Vlad Bailescu
Can anyone please explain why this is happening and how we can get around
it?
Thank you,
Vlad
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Pavan Deolasee
2013-02-01 09:01:46 UTC
Post by Vlad Bailescu
After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6 to Ubuntu
12.04) we noticed a large increase in IO and traced it back to autovacuum:
iotop and collectd report bursts of 4-6M/s taking 90 seconds followed by a
30 seconds pause. That's a lot of data to chew on especially since the
master table is only a couple of megabytes (indexes included).
Can anyone please explain why this is happening and how we can get around
it?
Do you know for sure that it's the master table generating all the vacuum
traffic? What about the partition tables? Do they get any updates/deletes?
It might be useful to turn autovacuum logging on and see which tables are
getting vacuumed, then correlate that with the increased IO activity
you're seeing.

Did you change any other parameters like checkpoint timeout/segments
etc.? It might be worthwhile to log checkpoint activity as well, to be
doubly sure.
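Turning on that logging amounts to two postgresql.conf settings (a threshold of 0 logs every autovacuum action; raise it later if the log gets noisy):

```
log_autovacuum_min_duration = 0   # log every autovacuum/autoanalyze action
log_checkpoints = on              # log checkpoint start/stop and statistics
```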

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Vlad Bailescu
2013-02-01 09:54:39 UTC
Post by Pavan Deolasee
Do you know for sure that its the master table that generating all the
vacuum traffic ? What about the partition tables ? Do they get any
updates/deletes ? It might be useful if you could turn autovacuum
logging ON and see which tables are getting vacuumed and correlate
that with the increased IO activity that you're seeing.
Did you change any other parameters like checkpoint timeout/segments
etc ? It might be worthwhile to log checkpoint activities as well to
be doubly sure.
Thanks,
Pavan
I'm pretty sure the IO is from the autovacuum on the master table because
its last_autovacuum stats update almost every minute and iotop shows
something like:

Total DISK READ: 5.80 M/s | Total DISK WRITE: 115.85 K/s
  TID  PRIO  USER      DISK READ   DISK WRITE  SWAPIN    IO>     COMMAND
 7681  be/4  postgres   5.93 M/s    0.00 B/s   0.00 %  63.62 %  postgres: autovacuum worker process fleet
15837  be/4  postgres   0.00 B/s   23.17 K/s   0.00 %   4.98 %  postgres: fleet fleet 127.0.0.1(53537) idle
16683  be/4  postgres   0.00 B/s   23.17 K/s   0.00 %   2.68 %  postgres: fleet fleet 127.0.0.1(53978) idle
18599  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   1.25 %  postgres: fleet fleet 127.0.0.1(59529) idle
15608  be/4  postgres   0.00 B/s   23.17 K/s   0.00 %   0.04 %  postgres: fleet fleet 127.0.0.1(53349) idle
10253  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(50639) idle
16524  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(53939) idle
10248  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(50634) idle
18606  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(59535) idle
14849  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: fleet fleet 127.0.0.1(52832) idle
30308  be/4  postgres   0.00 B/s    7.72 K/s   0.00 %   0.01 %  postgres: wal writer process
21528  be/4  www-data   0.00 B/s    3.86 K/s   0.00 %   0.00 %  nginx: worker process
30307  be/4  postgres   0.00 B/s   38.62 K/s   0.00 %   0.00 %  postgres: writer process

I just checked and our config only sets:

max_connections = 210
superuser_reserved_connections = 10

wal_buffers = 8MB

checkpoint_segments = 16
checkpoint_completion_target = 0.9

random_page_cost = 3.0

shared_buffers = 9GB
effective_cache_size = 28GB
work_mem = 88MB
maintenance_work_mem = 1GB

We're running on a server with 40GB RAM and a RAID 10 over 6 10k HDDs.

Stats show only 3 tables get frequently autovacuumed:

vehicle_position: 4689127 inserted, 0 updated, 4689127 deleted, 0 HOT
  updated; 0 live, 1985 dead tuples; last autovacuum 2013-02-01
  11:46:46.441227+02, last analyze 2013-01-31 14:48:00.775864+02, last
  autoanalyze 2013-02-01 11:46:46.494234+02; 0 vacuums, 3761 autovacuums,
  1 analyze, 3754 autoanalyzes; size 4016 kB

vehicle: 2124 inserted, 934640 updated, 0 deleted, 924003 HOT updated;
  2124 live, 390 dead tuples; last autovacuum 2013-02-01 11:47:46.067695+02,
  last autoanalyze 2013-02-01 11:47:46.325444+02; 0 vacuums, 1293
  autovacuums, 0 analyzes, 2038 autoanalyzes; size 2448 kB

input_current: 10032 inserted, 1649206 updated, 0 deleted, 1635902 HOT
  updated; 10032 live, 1728 dead tuples; last autovacuum 2013-02-01
  11:45:46.0681+02, last autoanalyze 2013-02-01 11:45:46.215048+02; 0
  vacuums, 442 autovacuums, 0 analyzes, 1294 autoanalyzes; size 2336 kB
Pavan Deolasee
2013-02-01 11:08:43 UTC
Post by Vlad Bailescu
Stats show only 3 tables get frequently autovacuumed
[...]
So there is certainly more than just one table involved here. Also, the
last column in the output above shows that these are fairly large tables.
In fact, if the last column is to be trusted, they are as large as 37GB,
20GB and 12GB respectively. Note the kB at the end of those values. Also,
given the large number of inserts/deletes, I would doubt the table can
really remain that small. Maybe auto-vacuum is truncating most of it and
that's why you see only a few MBs before it swells again by the time the
next vacuum cycle runs.

What's your rate of insertion/deletion in the first table? You also seem
to be updating the last two tables quite heavily, but most of those are
HOT updates, so your tables should not bloat too much.

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Vlad Bailescu
2013-02-01 11:49:26 UTC
Post by Pavan Deolasee
In fact, if the last column is to be trusted, they are as large as 37GB,
20GB and 12GB respectively. Note the kB at the end of those values.
[...]
Pavan, it seems there's a small issue with copy-paste and column
text-align. Table sizes are:

4136 kB 2448 kB 2336 kB
The actual partition tables are very large (e.g. vehicle_position_y2013_m01
is 6448 MB), but they don't seem to receive so much attention.

The vehicle table gets updated very often because it has a foreign key to
vehicle_position and that changes with every batch of data.

We were able to reproduce this on a test VM with minimum workload to
trigger autovacuum. Here's the relevant log portion:


2012-12-05 00:32:57 EET LOG: database system is ready to accept connections
2012-12-05 00:32:57 EET LOG: autovacuum launcher started
2012-12-05 00:36:38 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.03 sec
2012-12-05 00:37:39 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position_default" system usage: CPU 0.00s/0.15u sec elapsed 0.99 sec
2012-12-05 00:44:23 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 4.46s/0.61u sec elapsed 465.09 sec
2012-12-05 00:44:38 EET LOG: automatic vacuum of table "fleet.fleet.vehicle_position": index scans: 1
        pages: 8 removed, 0 remain
        tuples: 64 removed, 0 remain
        system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
2012-12-05 00:44:38 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec
2012-12-05 00:51:40 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 4.34s/0.53u sec elapsed 422.20 sec
2012-12-05 00:52:38 EET LOG: automatic vacuum of table "fleet.fleet.vehicle_position": index scans: 1
        pages: 10 removed, 0 remain
        tuples: 30 removed, 0 remain
        system usage: CPU 0.00s/0.00u sec elapsed 0.02 sec
2012-12-05 00:52:38 EET LOG: automatic analyze of table "fleet.fleet.vehicle_position" system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

During the analyze the IO was ~800-1000 K/s

Vlad
Pavan Deolasee
2013-02-01 12:40:53 UTC
Post by Vlad Bailescu
Pavan, it seems there's a small issue with copy-paste and column
text-align. Table sizes are:
4136 kB 2448 kB 2336 kB
Ah OK. I see.
Post by Vlad Bailescu
2012-12-05 00:44:23 EET LOG: automatic analyze of table
"fleet.fleet.vehicle_position" system usage: CPU 4.46s/0.61u sec elapsed
465.09 sec
This is the interesting piece of information. So it's the auto-analyze
that's causing all the IO activity. That explains why it was read-only IO
that we noticed earlier. What's happening here, and something that changed
from 8.4 to 9.1, is that whenever the parent table is analyzed, the child
tables are also automatically analyzed. I don't remember the rationale for
this change, but in your case the analyze on the parent table itself is
quite useless because even though you are inserting a large number of new
tuples, you are also immediately deleting them. I don't want to comment on
the design aspect of that, but you should be able to fix this problem by
disabling auto-analyze on the parent table.

Having said that, I don't see an easy way to disable just auto-analyze on
a table. You can run ALTER TABLE foo SET (autovacuum_enabled = false), but
that would also disable auto-vacuum, which you certainly don't want to do
because the parent table would just keep growing.

You can set autovacuum_analyze_threshold to an artificially high value to
mitigate the problem and reduce the frequency of auto-analyze on the
table, or see if you can completely avoid inserts/deletes on the parent
table:

ALTER TABLE vehicle_position SET (autovacuum_analyze_threshold = 2000000000);

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Vlad Bailescu
2013-02-01 13:40:43 UTC
Post by Pavan Deolasee
This is the interesting piece of information. So it's the auto-analyze
that's causing all the IO activity. [...] You can set
autovacuum_analyze_threshold to an artificially high value to mitigate the
problem and reduce the frequency of auto-analyze on the table, or see if
you can completely avoid inserts/deletes on the parent table.
Great insight Pavan,

I see this change in ANALYZE is documented since 9.0:

If the table being analyzed has one or more children, ANALYZE will gather
statistics twice: once on the rows of the parent table only, and a second
time on the rows of the parent table with all of its children. The
autovacuum daemon, however, will only consider inserts or updates on the
parent table when deciding whether to trigger an automatic analyze. If that
table is rarely inserted into or updated, the inheritance statistics will
not be up to date unless you run ANALYZE manually.
We will try setting a higher threshold for autovacuum on the parent table.

As for design, we could have gone this way or used a rule to redirect
inserts from the master to the children. My understanding is that rule
performance is not so good (especially since we're doing mostly single
INSERTs). Unfortunately, a plain trigger will not work for us at the
moment.

I saw your proposal on -hackers, I will keep an eye on it!

Thank you,
Vlad

Kevin Grittner
2013-02-01 13:27:58 UTC
Post by Vlad Bailescu
After upgrading our DB from 8.4 to 9.1 (and moving from Debian 6
to Ubuntu 12.04) we noticed a large increase in IO and traced it
back to auto vacuum: iotop and collectd report bursts of 4-6M/s
taking 90 seconds followed by a 30 seconds pause.
Can anyone please explain why this is happening and how we can
get around it?
It sounds like it could be an issue which will be fixed in the next
minor release:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5454344b968d6a189219cfd49af609a3e7d6af33

If this is the issue, a periodic manual VACUUM of the master table
should help for now.  When 9.1.8 is released you might want to
upgrade without a lot of delay, and see if you are then OK without
the manual vacuums.
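If it helps, that interim workaround could be scheduled from cron; the database name fleet and table vehicle_position are taken from earlier in the thread, and the 10-minute interval is arbitrary:

```
# crontab entry for the postgres user
*/10 * * * * psql -d fleet -c 'VACUUM fleet.vehicle_position;'
```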

-Kevin