Jason Dusek
2013-02-03 15:22:44 UTC
Hello List,
Imagine a table of records describing the up-to-date state of
some objects:
id | t | data columns ...
------+-------------+------------------
uuid | timestamptz | ...
The `id' column is a PRIMARY KEY. When an object is updated, the
old record is archived and a new record, with `t' set to
transaction_timestamp(), is added. So the table really contains
just the up-to-date state of objects.
An application would like to cache this up to date state,
synchronizing at regular intervals. The application might not
maintain a persistent connection to the database, so
LISTEN/NOTIFY is not to be preferred. What are some algorithms
by which one can retrieve new rows since the last sync?
Since the sync is made at regular intervals -- let's call it
once a minute -- one could query for all records with `t' that
is less than a minute old. For safety's sake, one can query for
*two* minutes of data.
This would seem to solve the problem. It works pretty well.
However, a new challenger appears: imagine an inserting
transaction that runs for five minutes. None of rows are visible
and then they are added -- with a transaction_timestamp that is
five minutes in the past! These rows are never synced.
Now you could argue that writes shouldn't occur in such long
transactions and you would be right; but it happens that one-off
tools exhibit bad behaviour one wouldn't accept in a production
application, and fixing the tools can be hard to make happen.
I was reading today about the `xmin' column, `txid_current()`
and `pg_export_snapshot()` and I wonder if there is not a better
way to sync, using transaction IDs instead of time.
http://www.postgresql.org/docs/9.2/static/ddl-system-columns.html
http://www.postgresql.org/docs/9.2/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?
Would it be a misuse of XIDs?
--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B
Imagine a table of records describing the up-to-date state of
some objects:
id | t | data columns ...
------+-------------+------------------
uuid | timestamptz | ...
The `id' column is a PRIMARY KEY. When an object is updated, the
old record is archived and a new record, with `t' set to
transaction_timestamp(), is added. So the table really contains
just the up-to-date state of objects.
An application would like to cache this up to date state,
synchronizing at regular intervals. The application might not
maintain a persistent connection to the database, so
LISTEN/NOTIFY is not to be preferred. What are some algorithms
by which one can retrieve new rows since the last sync?
Since the sync is made at regular intervals -- let's call it
once a minute -- one could query for all records with `t' that
is less than a minute old. For safety's sake, one can query for
*two* minutes of data.
This would seem to solve the problem. It works pretty well.
However, a new challenger appears: imagine an inserting
transaction that runs for five minutes. None of rows are visible
and then they are added -- with a transaction_timestamp that is
five minutes in the past! These rows are never synced.
Now you could argue that writes shouldn't occur in such long
transactions and you would be right; but it happens that one-off
tools exhibit bad behaviour one wouldn't accept in a production
application, and fixing the tools can be hard to make happen.
I was reading today about the `xmin' column, `txid_current()`
and `pg_export_snapshot()` and I wonder if there is not a better
way to sync, using transaction IDs instead of time.
http://www.postgresql.org/docs/9.2/static/ddl-system-columns.html
http://www.postgresql.org/docs/9.2/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
The idea would be, to store information about the last XID in
the last sync and search for XIDs committed since then upon
reconnecting for sync. Perhaps `txid_current_snapshot()'
preserves enough information. Is this a plausible technique?
Would it be a misuse of XIDs?
--
Jason Dusek
pgp // solidsnack // C1EBC57DC55144F35460C8DF1FD4C6C1FED18A2B
--
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Sent via pgsql-general mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general