Date:      Sun, 7 Jan 2007 20:54:47 -0600
From:      Kirk Strauser <kirk@strauser.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: Tuning PostgreSQL for bulk imports
Message-ID:  <200701072054.48442.kirk@strauser.com>
In-Reply-To: <20070108122245.53e3140b@localhost>
References:  <200701071301.27423.kirk@strauser.com> <200701071618.11242.kirk@strauser.com> <20070108122245.53e3140b@localhost>

On Sunday 07 January 2007 19:22, Norberto Meijome wrote:

> I could be wrong, and this may only apply to generating indexes?

That's what we're doing now.  By dropping the table and recreating it, all the
index maintenance gets deferred until a single batch at the end (which is
vastly faster in practice).  We also wrap the whole thing in a transaction so
that select queries still have access to the old data until the instant the
commit is finished, assuming that the import doesn't fail for some reason and
get rolled back.
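
For the curious, the pattern looks roughly like this (a minimal sketch in
Python with psycopg2; the table, columns, and file names are all made up for
the example):

    import psycopg2

    conn = psycopg2.connect("dbname=example")  # hypothetical DSN
    cur = conn.cursor()

    # psycopg2 starts a transaction implicitly, so everything below is
    # atomic: readers keep seeing the old data until COMMIT, and any
    # failure rolls the whole import back.
    cur.execute("DROP TABLE IF EXISTS invoices")
    cur.execute("CREATE TABLE invoices (id integer, total numeric)")

    # Bulk-load with COPY -- no per-row index maintenance happens here...
    with open("invoices.tsv") as f:
        cur.copy_from(f, "invoices")

    # ...because the index is built once, in a single batch, at the end.
    cur.execute("CREATE INDEX invoices_id_idx ON invoices (id)")
    conn.commit()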

> Something also to consider is, do you *have* to import all that data every
> time?  I.e., can you create data partitions, assign a new table to each
> day, create the appropriate rules, and then only dump from FoxPro the last
> day's info?

I wish.  First, we run the import hourly (more or less).  I save the output of
the last hour's FoxPro->PostgreSQL conversion and run a hand-written,
optimized diff against it.  If nothing has changed, that table gets skipped.
If a reasonably small percentage of rows has changed (which is almost always
the case), then I re-write it as a series of deletes followed by a bulk
import.  Basically, I patch the table.  It's nice to see a twenty-million-row
table update reduced to a patch file 100 lines long.  Oh, for the record, my
diff algorithm is written in Python and is still I/O-limited, even when
several copies are running in parallel.
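
The real script is heavily optimized and streams rather than slurping whole
files, but a toy version of the idea looks something like this (file names
and the key layout are invented for the example):

    # Each dump has one row per line, tab-separated, primary key first.
    def load(path):
        rows = {}
        with open(path) as f:
            for line in f:
                rows[line.split("\t", 1)[0]] = line
        return rows

    old = load("invoices.last")  # saved output of the previous run
    new = load("invoices.now")   # this hour's FoxPro conversion

    # Changed or vanished keys get deleted; changed or new rows get
    # re-imported in one bulk load afterward.
    deletes = [k for k in old if old[k] != new.get(k)]
    inserts = [new[k] for k in new if old.get(k) != new[k]]

    with open("invoices.patch.sql", "w") as patch:
        for k in deletes:
            # Real code would quote/escape the key properly.
            patch.write("DELETE FROM invoices WHERE id = %s;\n" % k)

    with open("invoices.patch.tsv", "w") as bulk:
        bulk.writelines(inserts)

If nothing changed, both output files come out empty and the table gets
skipped entirely.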

> if CPU is hitting too hard, you can always run pgsql under nice while
> importing this - it will still take 100% CPU, but yield as soon as
> something else needs to do some work.  Of course, this won't help if you
> are IO bound (I think?).  *Are* you CPU bound?

{Un,}fortunately, no.  The CPUs are still idle enough to get a lot of other
processing done without slowing the import process.

> Let us know what solution you come up with, and good luck :)

Will do, and thanks!
-- 
Kirk Strauser
