From: Bryan Drewery
Date: Fri, 4 Dec 2015 11:53:39 -0800
To: Ulrich Spörlein, freebsd-git@freebsd.org
Cc: freebsd-current@freebsd.org, git-admin@freebsd.org
Subject: Re: FYI: SVN to GIT converter currently broken, github is falling behind
Organization: FreeBSD
Message-ID: <5661EF43.9040406@FreeBSD.org>
References: <563EAAB8.5020702@freebsd.org>

On 12/4/2015 10:49 AM, Ulrich Spörlein wrote:
> 2015-11-08 12:06 GMT+01:00 Ulrich Spörlein:
>> 2015-11-08 11:32 GMT+01:00 Ulrich Spörlein:
>>> 2015-11-08 2:51 GMT+01:00 Alfred Perlstein:
>>>>>
>>>> Uli,
>>>>
>>>> One of the biggest concerns I've heard from folks using FreeBSD's git mirror
>>>> is that the hashes can change.
>>>>
>>>> I have a question about this.  Is it possible to keep track of what the
>>>> "official" git mirror (on github) is doing and keep that as a log.  Then
>>>> that log can be used to replay commits when there is a divergence problem.
>>>>
>>>> What I'm basically saying is that let's take this small example:
>>>>
>>>> importer is working fine @rev 10000
>>>> imports 10000
>>>> imports 10001
>>>> imports 10002
>>>> something happens to importer to give indeterminate shas.
>>>> imports 10003 - sha is "unstable" sha3
>>>> imports 10004 - sha is "unstable" sha4
>>>> imports 10005 - sha is "unstable" sha5
>>>> imports 10006 - sha is "unstable" sha6
>>>> importer is fixed
>>>>
>>>>
>>>> At this point normally we'd rewind the importer to 10002 and then force
>>>> update the affected branches.
>>>>
>>>> My question is... can the imports of 10003, 10004, 10005 and 10006 be put
>>>> into the importer such that any "mirror site" that re-does the import using
>>>> the most up to date importer will get the same shas.
>>>>
>>>> That would allow to proceed with 10007, etc without force pushing.
>>>>
>>>> This should be possible based on querying "git" for the meta data associated
>>>> with sha3..sha6 and then forcing those commits to have the same metadata.
>>>>
>>>> This would eliminate the concern about shas in the mirror changing that I've
>>>> heard.
>>>
>>> The goal of the conversion is that everyone can re-do the conversion
>>> in their basement and come up with the same history and checksums.
>>> This was not the case when I first started, as there was some
>>> non-deterministic hash structure being used in svn2git. This was fixed
>>> in the code and then all converter runs produced the very same
>>> results.
>>>
>>> The scenario that we have right now, is that one of the merge commits
>>> done about two weeks ago is being handled different by svn2git w/ svn
>>> v1.8 vs. svn v1.9 and I haven't investigated yet how the API's
>>> behavior changed to cause this. I'm afraid I also swapped out all my
>>> knowledge about svn2git internals and will have to redo this all from
>>> scratch :/
>>>
>>> Your suggestion could only work, if we hard-code this svn revision
>>> special handling into svn2git, either in the code or by providing more
>>> mappings and rules to the process. svn2git should run hermetic and not
>>> poke at github's commits to see how things were handled in the past.
>>> It has to be self-sufficient and must not depend on github.
>>>
>>> This would also only work, if the "breakage" window was very small,
>>> but it is already about two weeks long and will surely increase till I
>>> find the proper fix.
>>>
>>> So, to take a stand here: this sort of kludge is unlikely to ever
>>> happen. Git commit hashes *might* change in the future. I really don't
>>> see how this is a big deal anyway.  It happened once and I'm trying to
>>> have it never happen again. But why are people afraid of this
>>> happening?
>>> Every "official" git commit is tagged with a SVN revision
>>> and the contents of those revisions are obviously correct (just not
>>> the ancestry and the commit objects, possibly). So it would be easy to
>>> write a script that replays VendorA's git history and swaps out the
>>> new official commits for the old official commits. There would be no
>>> merge conflicts.
>>>
>>> I can see how this would be annoying if you have 100 developers and
>>> dozens of branches that are far from mainline FreeBSD. But I'm sure
>>> these companies that depend on git will come forward and donate some
>>> of their developer manpower to help me with keeping the converter
>>> stable/deterministic. Right? Right? :) :)
>>>
>>> Cheers,
>>> Uli
>>
>> Quick update: doc is so far unaffected by svn 1.9, but for ports, the
>> drift happened as of Jul 18, so you'd need to special case a lot of
>> commits.
>>
>> Here's the same commit, and the difference between 1.8 and 1.9:
>>
>> % git cat-file commit 803795d
>> tree 7fc83aba022834da5c218114b09ad4640735bcc0
>> parent c96fb0418e545a569b5975b4d878a30a948c29d5
>> author olgeni 1437203525 +0000
>> committer olgeni 1437203525 +0000
>>
>> Upgrade to version 0.4.1.
>> % git cat-file commit 61ca43b
>> tree 7fc83aba022834da5c218114b09ad4640735bcc0
>> parent c96fb0418e545a569b5975b4d878a30a948c29d5
>> author olgeni 1437203529 +0000
>> committer olgeni 1437203529 +0000
>>
>> Upgrade to version 0.4.1.
>>
>>
>> In case you don't see it, there's a 4s difference in the timestamps
>> for authoring and committing. Here's the original:
>>
>> % svn log -vc392405 svn://svn.freebsd.org/ports
>> ------------------------------------------------------------------------
>> r392405 | olgeni | 2015-07-18 09:12:05 +0200 (Sat, 18 Jul 2015) | 2 lines
>> Changed paths:
>> M /head/www/elixir-maru/Makefile
>> M /head/www/elixir-maru/distinfo
>>
>> Upgrade to version 0.4.1.
>>
>> ------------------------------------------------------------------------
>>
>> So yeah, svn 1.9 returned a timestamp that was off by 4s. WTF?
>>
>> For base it's actually even more complicated than I had thought so
>> far. But let's take this one step at time ...
>
> An update, which you won't like to hear:
>
> SVN v1.9 is totally innocent, the API changed a little and has been
> patched, this is not the source of the difference between the
> currently published repo and a clean run. The difference stems from
> the fact that the svnsync'ed copy on git.freebsd.org was poisoned and
> is *NOT* in sync with our main repo. People tell me this is due to a
> shortcoming of svnsync that can race and thus produce different
> metadata for a commit, depending on when it is run.
>
> This is a clusterfuck.
>
> Both freebsd-base and freebsd-ports are no longer reproducible by
> third-parties. It is only a matter of time when freebsd-doc is
> affected.
>
> clusteradm@ sadly has remained rather silent on this issue and unless
> we can move the mirroring to rsync or syncthing or whatever I don't
> see how the project can continue to provide a so-called git "mirror"
>

Running svnsync in 2 places and then calling them mirrors seems odd.
It's only needed once. (svnsync hurts global warming too). Then just
rsync or use the git mirroring features.

-- 
Regards,
Bryan Drewery
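
Illustration of the point in the thread above about the 4s timestamp drift: a git commit id is the SHA-1 of the commit object's header plus its text, so any change in the author or committer timestamp yields a different id even when tree and parent are identical. The following is a minimal sketch, not part of svn2git. The helper names git_object_id and commit_body are made up for this example, and the e-mail address is a placeholder (the archive stripped the real one), so the printed ids will not match 803795d or 61ca43b.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object: sha1("<kind> <len>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(timestamp: int) -> bytes:
    """Build commit text shaped like the quoted `git cat-file commit` output."""
    # ASSUMPTION: placeholder address; the real one is not in the archived mail.
    ident = f"olgeni <olgeni@example.invalid> {timestamp} +0000"
    return (
        "tree 7fc83aba022834da5c218114b09ad4640735bcc0\n"
        "parent c96fb0418e545a569b5975b4d878a30a948c29d5\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        "\n"
        "Upgrade to version 0.4.1.\n"
    ).encode("utf-8")


# The two timestamps quoted in the thread, four seconds apart.
id_a = git_object_id("commit", commit_body(1437203525))
id_b = git_object_id("commit", commit_body(1437203529))

print(id_a)
print(id_b)
print("identical:", id_a == id_b)
```

Because every descendant commit records its parent's id, a single drifted timestamp of this kind changes the ids of all later commits on the branch, which is why a mirror fed from a diverged svnsync copy cannot be reproduced by third parties.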