Date:      Mon, 08 Jan 2007 00:15:22 +0100
From:      Ivan Voras <ivoras@fer.hr>
To:        freebsd-questions@freebsd.org
Subject:   Re: Tuning PostgreSQL for bulk imports
Message-ID:  <enruui$v68$1@sea.gmane.org>
In-Reply-To: <200701071618.11242.kirk@strauser.com>
References:  <200701071301.27423.kirk@strauser.com> <enrnt5$9h6$1@sea.gmane.org> <200701071618.11242.kirk@strauser.com>

Kirk Strauser wrote:
> On Sunday 07 January 2007 15:15, Ivan Voras wrote:

>> - What might help you is to keep the WAL (write-ahead log, i.e. journal)
>> files on a completely separate (and fast) drive from the rest of the
>> database, to allow parallelism and speed. For best results, format it
>> with 32k blocks/8k fragments.
>
> Thanks for the idea.  Assuming I actually get my wish of a matched set of 4
> high speed drives, would I be better off setting one aside for the journal,
> or striping them all together so everything benefits?

Everything is first written to the WAL, and then copied to the "normal"
database. So speed is important, but striping all 4 drives together
would probably be overkill. The logs are written sequentially, so seek
times are not that important. What are your current IO rates? Since you
have only two drives, you might be restricted by available disk
bandwidth...
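
Something along these lines, for example (untested sketch - I'm assuming
the data directory is /usr/local/pgsql/data and the WAL drive is
/dev/da1s1d, so adjust both, and do it with the server stopped):

newfs -b 32768 -f 8192 /dev/da1s1d    # 32k blocks, 8k fragments
mkdir -p /wal
mount /dev/da1s1d /wal

# move pg_xlog (the WAL directory) to the new drive, symlink it back
mv /usr/local/pgsql/data/pg_xlog /wal/pg_xlog
chown -R pgsql:pgsql /wal/pg_xlog
ln -s /wal/pg_xlog /usr/local/pgsql/data/pg_xlog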

>> - If you don't specifically need the atomicity of transactions, you
>> might divide your import into many small transactions, for example one
>> for every 100,000 rows instead of doing 8 million at once.
>
> Would that actually make a difference in total elapsed time spent importing?

It will certainly lower the final "commit" time and avoid copying large
chunks between the WAL and the database, but I don't have measurements
of my own.
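
If you want to try it, something like the following would do (only a
sketch - the file and table names are made up; each chunk goes through
its own COPY, so it commits separately):

# split the dump into 100,000-line pieces and load them one by one
split -l 100000 import.dat chunk.
for f in chunk.*; do
    psql -d mydb -c "\copy big_table from '$f'"
done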

Do you have an SMP machine? If so, you can try importing NCPU of the big
transactions in parallel. This could help you almost linearly, if your
disks allow it. (See
http://www.postgresql.org/docs/8.1/interactive/runtime-config-wal.html:
you can increase commit_delay and decrease commit_siblings to achieve a
sort of localized-async operation across several parallel transactions.)
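
Roughly like this (again just a sketch, assuming 4 CPUs and the chunk
files from above; commit_delay is in microseconds and commit_siblings
is how many other transactions must be open before the delay kicks in):

# run 4 chunk imports in parallel, each in its own session
for f in chunk.aa chunk.ab chunk.ac chunk.ad; do
    psql -d mydb <<EOF &
set commit_delay = 10000;
set commit_siblings = 1;
\copy big_table from '$f'
EOF
done
wait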


I've just remembered - do you run VACUUM ANALYZE after your big imports?
You should: it restores SELECT performance afterwards, though it won't
help the imports themselves.
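
For example (with your own database name in place of mydb):

psql -d mydb -c "VACUUM ANALYZE"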

>> Also, what version of PostgreSQL are you using? As a general rule, the
>> newer the version, the faster it is. This is especially true if you're
>> using 7.x - go to 8.1.5 immediately (but don't use 8.2 until 8.2.1 gets
>> out).
>
> We're already running 8.2 because it fixed some problems we were having with
> 8.1.5.  Other than the excessively long import times, it's absolutely
> screaming and we couldn't be more pleased.

Ok. Be careful not to use SELECT ... LIMIT ALL ;)

