Subject: Re: Monitoring commits on all branches
To: Warner Losh
Cc: Dan Langille, freebsd-git@freebsd.org
From: Marc Branchaud
Date: Thu, 19 Nov 2020 17:50:13 -0500

On 2020-11-19 12:16 p.m., Warner Losh wrote:
>
> Thanks Marc! This is great advice... more comments below...
>
> On Thu, Nov 19, 2020 at 9:16 AM Marc Branchaud wrote:
>
> > On 2020-11-18 8:49 p.m., Dan Langille wrote:
> > > How can a repo be monitored for commits on all branches?
> > >
> > > I know how to ask a given branch: do you have any commits after
> > > foo_hash?
> > >
> > > How do I:
> > >
> > > * get a list of all commits since foo_hash
> >
> > A quick note about Warner's reply:
> >
> > > git log $hash..HEAD
> >
> > "HEAD" is just a git nickname for "whatever you have currently
> > checked-out" (which can be a branch, a tag, or a "detached" commit
> > SHA ID).
> >
> > > * know which branch each of those commits was on (e.g. master,
> > > branches/2020Q4)
> >
> > Unfortunately you'll find most normal git advice to be a bit
> > frustrating with the FreeBSD repos, because FreeBSD doesn't work the
> > way most people use git.
> > Specifically, the FreeBSD project does not ever merge branches (in
> > the git sense of the word "merge"). Things would be very, very much
> > easier if the FreeBSD project were to use git-style merging. I
> > believe there are discussions underway about adjusting the whole MFC
> > process for the git world. I admit that part of my motivation in
> > writing this message is to provide grist for that mill.
>
> FreeBSD src will be doing cherry-picks. There's only pain and suffering
> from merge commits in this environment. Git's tools are adequate to cope
> with individual and squashed cherry picks.

Fair enough. I'm also sure that the git community would welcome patches
that help make FreeBSD's workflow a bit smoother.

> > Fortunately even without git-merged branches, there are still git
> > tools that help, though they're not as precise as one would like.
>
> They are for src. I suspect for ports they might not be.
>
> > Let's look at a concrete example with the beta ports git repo (which
> > I just cloned), and compare the 2020Q4 and main branches. I'll start
> > with some overall exploration, then address your specific question.
> >
> > There are 299 commits in the 2020Q4 branch. I know this because
> >      git merge-base origin/main origin/branches/2020Q4
> > tells me where 2020Q4 branched off of main: commit 5dbe4e5f775ea2. And
> >      git rev-list 5dbe4e5f775ea2..origin/branches/2020Q4 | wc -l
> > says "299". (The "rev-list" command is a bare-bones version of "log"
> > that only lists commit SHA IDs.)
> >
> > Meanwhile there have been 4538 commits to the main branch since
> > commit 5dbe4e5f775ea2.
> >
> > As far as git is concerned, those 299 commits in 2020Q4 are
> > *different* from anything in main. Even though most of them made the
> > exact same code changes, they were created at different times, often
> > by different authors, and they have different commit messages.
>
> True.
>
> > But you can still ask git to look at the code-change level to see
> > which 2020Q4 commits exactly replicated the code change from main:
> >
> >      git cherry -v origin/main origin/branches/2020Q4
> >
> > This little piece of magic looks at the 299 commits in 2020Q4 that
> > are not in main and compares their code changes to the 4538 commits
> > in main that are not in 2020Q4. It prints out the 299 2020Q4 commit
> > SHA IDs, prefixed with either a "- " or a "+ ". The -v appends the
> > commit message's first line:
> >
> >      - 394d9746e5eea73f56334b2e7ddbdc8f686d6541 MFH: r550869
> >      + 1ac9571956759c91d852ee92859a12e52dcbde48 MFH: r550885 r550886
> >      - fd411bdfda55488b84de75e6b043c513a281abf0 MFH: r551209
> >      - 533cdaa97457b3318aebcc53f7a1a46ea66721da MFH: r551236
> >      ......
> >
> > A "-" means that the commit matches the code change made by a commit
> > in main, while a "+" means that the commit's code change does not
> > *exactly* match any main commit since commit 5dbe4e5f775ea2.
> >
> > So
> >      git cherry -v origin/main origin/branches/2020Q4 | grep ^-
> > shows us the 234 2020Q4 commits that made the exact same change as a
> > commit in main.
> >
> > And
> >      git cherry -v origin/main origin/branches/2020Q4 | grep ^+
> > shows us that there are 41 not-exactly-the-same-change commits in
> > 2020Q4. Mostly these are ones that combined two or more MFH's into
> > one commit (e.g. 2020Q4 commit 1ac95719567), or that changed a file
> > in a slightly different way (see the first patch hunk of 2020Q4
> > commit cbd002878f2, compared to its counterpart in main: commit
> > a5d21ea16b6).
>
> Yes. These sorts of issues are why doing merge commits isn't always the
> right way to go, because we're not merging the entire history together
> (doing a join), but rather just small subsets of it. How to cope with
> the mostly-the-same, small-files tree that is our ports tree, given
> that git's guessing does a poor job on such a tree, is an interesting
> problem to solve.
> Merge commits can help with some of the issue, but they can
> create other issues as well when done incorrectly....

I admit I don't quite follow you there, but I'm particularly ignorant of
the ports tree. I have some quite-likely-stupid ideas after having
played with it for 10 minutes while composing my earlier message, but
even if the ideas are somehow clever I suspect they'd entail too much
workflow change to be palatable.

> Even so, great hints for how to find cherry picked items. I suspect
> we'll need to have some tooling that embeds hash(es) into the commit
> message in some stylized way to allow tracking the non-trivial patch
> changes that sometimes happen: squashing several cherry picks, necessary
> differences due to branch drift, etc. It's unclear how we should do
> this, though, in a way that works well, is reliable and doesn't add
> undue friction to the process...

It's traditional when doing a cherry-pick to add a Cherry-picked-from:
line to the commit message. The "cherry-pick" command even has a -x
option that automatically records the source commit, appending a line of
the form "(cherry picked from commit <SHA>)" to the new commit's
message. (There's also a "git interpret-trailers" command that is a
general-purpose tool for manipulating "Foo: blah blah" lines in commit
messages.)

"git cherry-pick" might actually lead people away from squashing
together multiple changes into one commit, because you have to make a
bit of an effort to get cherry-pick to squash things up. I personally
think the project would benefit from discouraging squashed-together
MFC's.

> > Now to your specific question: Given a commit, how can we tell which
> > branches contain that code change? Let's look at main commit
> > 6a9a8389d609 which I've determined, through manual spelunking,
> > matches 2020Q4's commit 02eba4048564.
> >
> > At a basic level, "git cherry" can tell us that *something* in 2020Q4
> > made the same change as commit 6a9a8389d609.
> > Here I reversed the order of the branch names in the command:
> >      git cherry origin/branches/2020Q4 origin/main | grep 6a9a8389d609
> > This outputs:
> >      - 6a9a8389d609ca0370c8c6eb8f993c1aa4071681
> > and the "-" tells me that 6a9a8389d609's code change is *somewhere*
> > in 2020Q4's unique 299 commits.
> >
> > Unfortunately there's no convenient git command that'll tell you
> > *which* 2020Q4 commit replicated commit 6a9a8389d609. For that, we
> > need to do a bit of scripting:
> >
> > -----8<-----8<-----8<-----8<-----
> >
> > #!/bin/sh
> >
> > TARGET="6a9a8389d609"
> >
> > BASE=`git merge-base origin/branches/2020Q4 origin/main`
> >
> > TARGET_PATCH_ID=`git show -p $TARGET | git patch-id --stable | cut -f 1 -d ' '`
> >
> > for REV in `git rev-list $BASE..origin/branches/2020Q4`; do
> >     PATCH_ID=`git show -p $REV | git patch-id --stable | cut -f 1 -d ' '`
> >     if [ "$PATCH_ID" = "$TARGET_PATCH_ID" ]; then
> >        echo "Found a commit that replicated target commit $TARGET:"
> >        echo
> >        git show -s $REV
> >        exit 0
> >     fi
> > done
> >
> > echo "Did not find any commit that exactly replicated $TARGET."
> > exit 1
> >
> > ----->8----->8----->8----->8-----
> >
> > This only looks at the 2020Q4 branch, but it's easily adapted to look
> > at a user-specified branch, or multiple branches. (In the above I
> > used "git patch-id", which is what "git cherry" uses internally to
> > identify a commit's code changes.)
> >
> > I hope all this helps a bit!
>
> It does. I thought I'd had my head deep into git, but hadn't stumbled
> upon this.

I've been using git for over 10 years, and I still discover new things.
This "git cherry" stuff, for example, I've only started using a little
bit in the last few months.

> It looks useful enough I'll try to add a section to my FAQ.

I'm honoured!

		M.
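
The quoted script notes that it is "easily adapted" to a user-specified
branch or to several branches. A minimal sketch of one such adaptation
follows. It uses the same "git show | git patch-id --stable" comparison;
the function names (patch_id_of, find_replicas), the argument order and
the "branch SHA" output format are illustrative assumptions, not part of
the original script.

```shell
#!/bin/sh
# Sketch only: the names and output format below are hypothetical.

# Print the stable patch ID of a single commit (empty if it has no diff).
patch_id_of() {
    git show -p "$1" | git patch-id --stable | cut -f 1 -d ' '
}

# usage: find_replicas TARGET MAINLINE BRANCH...
# Prints "BRANCH SHA" for every commit on each BRANCH (since it forked
# from MAINLINE) whose patch ID equals TARGET's.
# Returns 0 if at least one match was found, 1 if none, 2 if TARGET has
# no patch ID.  Plain sh has no "local", so these variables are global.
find_replicas() {
    target=$1
    mainline=$2
    shift 2
    target_id=$(patch_id_of "$target")
    [ -n "$target_id" ] || return 2
    status=1
    for branch in "$@"; do
        # Skip branches that share no history with the mainline.
        base=$(git merge-base "$branch" "$mainline") || continue
        for rev in $(git rev-list "$base..$branch"); do
            if [ "$(patch_id_of "$rev")" = "$target_id" ]; then
                echo "$branch $rev"
                status=0
            fi
        done
    done
    return $status
}
```

With the functions loaded into a shell inside a clone, the example from
the message would be checked with something like
"find_replicas 6a9a8389d609 origin/main origin/branches/2020Q4", and
further branch names can be appended to the argument list. Like the
original, this runs "git show" once per commit, so it will be slow on
branches with many commits.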