pg_prefaulter

command module
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 14, 2017 License: Apache-2.0 Imports: 2 Imported by: 0

README

= `pg_prefaulter`
include::front-matter.adoc[]
:keywords: postgresql, go, wal, fault, prefault, replication

[[introduction]]
== Introduction

https://github.com/joyent/pg_prefaulter[`pg_prefaulter`] is used to mitigate the
effects of PostgreSQL replication lag.

In a primary-follower database cluster using PostgreSQL streaming replication,
the PostgreSQL's primary database instance enjoys the benefit of being able to
parallelize its random IO workload.  The follower, however, is not so fortunate.
The follower receives a sequential stream of WAL records and sequentially
schedules all `pread(2)` IOs in order in a single process with blocking IO.  If
the average IO latency per random `pread(2)` is ~8ms, that means that the
follower can only complete ~125 random IO `pread(2)` calls per second.

The `pg_prefaulter` reads the stream of WAL records using
https://www.postgresql.org/docs/current/static/pgxlogdump.html[`pg_xlogdump(1)`]
and schedules concurrent IOs in order to prefault PostgreSQL heap pages into the
follower's filesystem cache.  The effect of this is dramatic in that the
follower is able to apply pages with a filesystem cache-hit.

If you've seen replication lag due to
https://github.com/reorg/pg_repack/[`pg_repack`],
https://www.postgresql.org/docs/current/static/sql-vacuum.html[`VACUUM`], or a
write-heavy workload, the `pg_prefaulter` should reduce apply lag considerably.
See the <<background,Background section>> for additional details.

[[usage]]
== Usage

[source,sh]
----
$ go get github.com/joyent/pg_prefaulter                // <1>
$ pg_prefaulter run                                     // <2>
$ pg_prefaulter -l DEBUG run                            // <3>
$ pg_prefaulter -l DEBUG run -D .pgdata_primary -p 5433 // <4>
$ pg_prefaulter -h                                      // <5>
$ pg_prefaulter run -h                                  // <6>
----
<1> Build and install `pg_prefaulter` into `$GOPATH/bin` (where `GOPATH` is
    likely `go env GOPATH`).
<2> The default mode of running `pg_prefaulter`
<3> Run `pg_preafulter` with ``DEBUG``-level logging.
<4> Run `pg_prefaulter` with various PG connect options.
<5> `pg_prefaulter` global help options
<6> ``pg_prefaulter``'s ``run``-specific options

NOTE: `pg_prefaulter` exposes itself as a https://github.com/google/gops[`gops`
client].

[[config]]
== Configuration Files

The following is an example configuration file.  This can be used instead of CLI
arguments.  The default configuration filename is `pg_prefaulter.toml`.  The
following is a reasonable default `pg_prefaulter.toml` configuration file.

[source,toml,numbered]
----
[log]
# level can be set to "DEBUG", "INFO", "WARN", "ERROR", or "FATAL"
#level = "INFO"

[postgresql]
#pgdata = "pgdata"
#database = "postgres"
#host = "/tmp"
#password = ""
#port = 5432
#user = "postgres"

[postgresql.xlog]
#pg_xlogdump-path = "/usr/local/bin/pg_xlogdump"

[circonus]
#enabled = true

[circonus.api]
# token is a required value if circonus.enabled=true
#token = ""

[circonus.broker]
# id should be set to "35" (the public HTTPTrap broker) if you have enterprise
# brokers configured in your account.
#id = 35

[circonus.check]
#tags = "dc1:mydc"
----

A complete configuration file with all options listed can be seen in
`pg_prefaulter.toml.sample-all`.

[[development]]
== Development

1. `make freshdb-primary`
2. `make freshdb-follower`
3. `make resetdb`
4. `make build`
5. `make check`
6. `./pg_prefaulter ...` # Iterate

To cross-compile and build a release use
https://github.com/goreleaser/goreleaser[`goreleaser`] and the `make release` or
`make release-snapshot` targets.  For development, the following should be
sufficient:

1. `go get -u github.com/goreleaser/goreleaser`
2. `goreleaser --snapshot --skip-validate --skip-publish --rm-dist`

[source,sh]
----
$ make
pg_prefaulter make(1) targets:
build            Build pg_prefaulter binary
check            Run go test
fmt              fmt and simplify the code
release-snapshot  Build a snapshot release
release          Build a release
vendor-status    Display the vendor/ status
vet              vet the binary (excluding dependencies)

cleandb-primary  Clean primary database
freshdb-primary  Drops and recreates the primary database
initdb-primary   initdb(1) a primary database
psql-primary     Open a psql(1) shell to the primary
startdb-primary  Start the primary database

cleandb-follower  Clean follower database
freshdb-follower  Drops and recreates the follower database
initdb-follower  initdb(1) a follower database
psql-follower    Open a psql(1) shell to the follower
startdb-follower  Start the follower database

createdb         Create the test database
dropdb           Reset the test database
gendata          Generate data in the primary
resetdb          Drop and recreate the database
testdb           Run database tests

controldata      Display pg_controldata(1) of the primary
psql             Open a psql(1) shell to the primary

clean            Clean target
cleandb-shard    Clean entire shard

help             This help message
----

[[background]]
== Background

PostgreSQL forks a separate process to handle each connection to the database.
As database worker processes handle work, they modify database files and commit
transactions.  Commit of a database transaction flushes an entry into the
write-ahead-log (WAL).

PostgreSQL's streaming-based replication system is based on "log-shipping:" the
WAL is streamed to remote hosts.  Database replicas apply the changes listed in
the instructions within the WAL using the PostgreSQL crash recovery mechanism:
they read the log from beginning to end, find the changes to the underlying
datafiles (the "PostgreSQL Heap", or "Heap" for short) and make the relevant
changes to the database files.

The replica's WAL-apply operation is performed in a single process, using a
single thread.  Each time a WAL-entry is "consumed" as the replica reads in the
corresponding page of the underlying datafile.  The WAL-replay process on the
replica waits for the serial execution of disk I/O to complete and load the
underlying page(s) from the heap.  Unlike the primary, the follower lacks the
ability to spread its work across multiple processes.  As a consequence, the
replicas only perform single-process, single-threaded, blocking IO, and cannot
apply the WAL as quickly as the primary who generates the WAL files and are
using parallel IO.  To add insult to injury, the follower is prone to having its
filesystem cache fault, resulting in a physical disk IO, further slowing down
the apply process.

Fortunately, the WAL-itself gives perfect information to the replica regarding
what I/O operations will be required (for example, "write 50 bytes to relation
16439 at block number 2357235235").  However, for correctness of transaction
commit it is important that the WAL be replayed in-order.  The `pg_prefaulter`
is a helper application for the replica: it queries the replica PostgreSQL
process for information about where it is in the replay.  The `pg_prefaulter`
then uses the information encoded in the WAL files themselves to initiate
async-I/O operations (with a configurable degree of concurrency) with the goal
of giving the operating system a chance to load the file into the filesystem
cache before the PostgreSQL startup process (e.g. `postgres: startup process
recovering 00000001000002870000005E`) fetches the page from the filesystem.
Once the portions of the database files are resident in filesystem cache, the
PostgreSQL replica will be able to access them without having to wait so long
because the necessary page will already be resident in the filesystem cache.
The end result is that even though the replica is still a single-process with a
single thread, access to random pages will return quickly because the `pread(2)`
call will result in a filesystem cache-hit operation (versus the pessimistic
path where the page faults in both PostgreSQL's cache, the VFS cache, and the
apply process blocks performing an an extended `pread(2)` operation).

Before:: Replication apply lag before and during the recovery once `pg_prefaulter` was deployed
  image:images/lag_recovery_thumb.png["Apply Lag recovery",height=400,link="images/lag_recovery.png"]
After:: Replication apply lag as observed during the same workload seen earlier while `pg_prefaulter` is still running
  image:images/lag_steady_state_thumb.png["Apply Lag steady-state",height=400,link="images/lag_steady_state.png"]

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL