Date:      Mon, 7 Nov 2005 11:29:32 -0600
From:      Kirk Strauser <kirk@strauser.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: Fast diff command for large files?
Message-ID:  <200511071129.34262.kirk@strauser.com>
In-Reply-To: <cone.1131381646.500858.17113.1000@zoraida.natserv.net>
References:  <200511040956.19087.kirk@strauser.com> <200511041129.17912.kirk@strauser.com> <cone.1131381646.500858.17113.1000@zoraida.natserv.net>

On Monday 07 November 2005 10:40, francisco@natserv.net wrote:

> I had the same setup a while back.
> A few suggestions.

Thanks for the tips; unfortunately, any fix that involves touching the
FoxPro code is basically impossible.  It's not that we *can't*, but that
the sole FoxPro programmer at our company is completely occupied with other
projects.

> What type of system is this? In particular, can any record be modified,
> or are only recent records changed?

Nope - every line in each table is subject to change.

Here's how our current system works:

1) Copy each FoxPro table file (and associated memo file if one exists) to a
Unix server via Samba.
2) Run my modified version of the "xbase" program to convert each table to a
tab-delimited file that can be loaded into PostgreSQL using the "copy
table" command.  These files are named "foo.dump", "bar.dump", etc.
3) If "foo.dump-old" exists:
    a) Using Andrew's algorithm, get the difference between foo.dump-old and
       foo.dump.  Write these out as a set of "delete from ..." commands and
       a "copy table" command.  Pipe this relatively tiny file into the
       "psql" command to upload the modifications.
  Otherwise:
    b) Use the psql command to upload foo.dump
4) "mv foo.dump foo.dump-old"
5) Profit!
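
Step 3a can be sketched roughly as follows.  This is only an illustration,
not the actual script and not necessarily Andrew's algorithm: it uses a
plain set difference between the two dumps, and it assumes that the first
tab-delimited field of every line is a unique key.  The function name, the
"id" column name, and the BEGIN/COMMIT wrapper are all hypothetical.

```python
def build_update_script(old_lines, new_lines, table, key_column="id"):
    """Return psql input that turns the old dump's rows into the new dump's.

    old_lines / new_lines: lines of foo.dump-old and foo.dump without
    trailing newlines.  Assumption: the first tab-delimited field is a
    unique key, and its dump text can be used as-is in a SQL literal.
    """
    old_set = set(old_lines)
    new_set = set(new_lines)

    # Lines only in the old dump were deleted or changed;
    # lines only in the new dump were added or changed.
    removed = [line for line in old_lines if line not in new_set]
    added = [line for line in new_lines if line not in old_set]

    out = ["BEGIN;"]  # assumption: apply the whole change set atomically
    for line in removed:
        key = line.split("\t", 1)[0].replace("'", "''")
        out.append(f"DELETE FROM {table} WHERE {key_column} = '{key}';")
    if added:
        # Data lines follow the COPY command and end with a "\." line.
        out.append(f"COPY {table} FROM stdin;")
        out.extend(added)
        out.append("\\.")
    out.append("COMMIT;")
    return "\n".join(out) + "\n"
```

A changed row shows up in both lists, so its old version is deleted before
the new version is copied in.  The returned text would be piped into psql,
and the file stays small whenever few rows differ between the two dumps.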

I've already cut the runtime in half.  The next big step is going to be
getting our Windows admin to install rsync on the fileserver so that we can
minimize the time spent in step one.  With the exception of the space
required by keeping the old version of the dump files (step 4), this is
exceeding all of our performance expectations by a wide margin.

Even better, step 3a cuts the time that the PostgreSQL server has to spend
committing the new data by several orders of magnitude.  The net effect is
that our web visitors don't see a noticeable slowdown during the import
stage.
-- 
Kirk Strauser



