Date: Mon, 7 Nov 2005 09:48:22 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Fast diff command for large files?
Message-ID: <200511070948.27910.kirk@strauser.com>
In-Reply-To: <cb5206420511060539qe4d7c40i198e806950c60482@mail.gmail.com>
References: <200511040956.19087.kirk@strauser.com> <200511060657.39674.kirk@strauser.com> <cb5206420511060539qe4d7c40i198e806950c60482@mail.gmail.com>

On Sunday 06 November 2005 07:39, Andrew P. wrote:

> Note, that the difference must be kept in RAM, so it won't work if there
> are multi-gig diffs, but it will work very fast if the diffs are only
> 10-100Mb, it will work at close to I/O speed if the diff is under 10Mb.

Thanks, Andrew!  My Python script runs that algorithm in 17 seconds on a
400MB file with 10% CPU.

For anyone interested, here's my implementation.  Note that the readline()
method in Python always returns something, even at EOF (at which point you
get an empty string).  Also, empty strings evaluate as "false", which is
why the "if not (oldline or newline): break" code exits at the end.

    old_records = []
    new_records = []
    while 1:
        oldline, newline = oldfile.readline(), newfile.readline()
        if not (oldline or newline):
            break
        if oldline == newline:
            continue
        try:
            new_records.remove(oldline)
        except ValueError:
            if oldline:
                old_records.append(oldline)
        try:
            old_records.remove(newline)
        except ValueError:
            if newline:
                new_records.append(newline)

> Hope this gives you some idea.

It did.  It must've been a long work week, because that all seems so obvious
in retrospect but was completely opaque at the time.  Thanks again!

-- 
Kirk Strauser
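
For anyone who wants to try the loop from this message without two large files at hand, here is a minimal self-contained sketch. The function name diff_records and the in-memory io.StringIO inputs are assumptions made for illustration and are not part of the original script; the loop body follows the posted logic, which relies on readline() returning an empty string at EOF.

```python
import io


def diff_records(oldfile, newfile):
    """Return (old_records, new_records): lines seen only in the old file
    and lines seen only in the new file. Hypothetical wrapper name."""
    old_records = []
    new_records = []
    while True:
        oldline, newline = oldfile.readline(), newfile.readline()
        # readline() returns "" at EOF, and "" is falsy, so this
        # stops only once both files are exhausted.
        if not (oldline or newline):
            break
        if oldline == newline:
            continue
        # If this old line is already pending as "new", the pair cancels out;
        # otherwise remember it as a candidate removal.
        try:
            new_records.remove(oldline)
        except ValueError:
            if oldline:
                old_records.append(oldline)
        # Same check in the other direction for the new line.
        try:
            old_records.remove(newline)
        except ValueError:
            if newline:
                new_records.append(newline)
    return old_records, new_records


# Tiny stand-ins for the two files: "b" was removed, "e" was added.
old = io.StringIO("a\nb\nc\nd\n")
new = io.StringIO("a\nc\nd\ne\n")
removed, added = diff_records(old, new)
print(removed)  # lines only in the old input
print(added)    # lines only in the new input
```

Only the unmatched lines are held in the two lists, which fits the point quoted above that the difference must fit in RAM. Note that list.remove() scans the list linearly, so the cost per line grows with the number of pending differences.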