Syncing an application cache with xmin

Discussion:

(too old to reply)

Jason Dusek

2013-02-03 15:22:44 UTC

Hello List,

Imagine a table of records describing the up-to-date state of
some objects:

id | t | data columns ...
------+-------------+------------------
uuid | timestamptz | ...

The `id' column is a PRIMARY KEY. When an object is updated, the
old record is archived and a new record, with `t' set to
transaction_timestamp(), is added. So the table really contains
just the up-to-date state of objects.

An application would like to cache this up to date state,
synchronizing at regular intervals. The application might not
maintain a persistent connection to the database, so
LISTEN/NOTIFY is not to be preferred. What are some algorithms
by which one can retrieve new rows since the last sync?

Since the sync is made at regular intervals -- let's call it
once a minute -- one could query for all records with `t' that
is less than a minute old. For safety's sake, one can query for
*two* minutes of data.

This would seem to solve the problem. It works pretty well.
However, a new challenger appears: imagine an inserting
transaction that runs for five minutes. None of rows are visible
and then they are added -- with a transaction_timestamp that is
five minutes in the past! These rows are never synced.

Now you could argue that writes shouldn't occur in such long
transactions and you would be right; but it happens that one-off
tools exhibit bad behaviour one wouldn't accept in a production
application, and fixing the tools can be hard to make happen.

I was reading today about the `xmin' column, `txid_current()`
and `pg_export_snapshot()` and I wonder if there is not a better
way to sync, using transaction IDs instead of time.

http://www.postgresql.org/docs/9.2/static/ddl-system-columns.html
http://www.postgresql.org/docs/9.2/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION

The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?
Would it be a misuse of XIDs?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Tom Lane

2013-02-03 16:09:40 UTC

Permalink

Post by Jason Dusek
The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?

Perfectly plausible, and often done in one guise or another. You can't
expect row XIDs to survive forever --- they'll be replaced by FrozenXID
after awhile to avoid problems due to transaction counter wraparound.
But for delays of a few minutes, in a database with an unremarkable
transaction rate, that's not an issue.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jason Dusek

2013-02-03 18:21:10 UTC

Permalink

Post by Tom Lane

Perfectly plausible, and often done in one guise or another.
You can't expect row XIDs to survive forever --- they'll be
replaced by FrozenXID after awhile to avoid problems due to
transaction counter wraparound. But for delays of a few
minutes, in a database with an unremarkable transaction rate,
that's not an issue.

What is the relationship of the epoch-extended XID returned by
`txid_current()' to the XIDs in rows? Do all rows from a
previous epoch always have the FrozenXID?

--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B

--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general