From owner-svn-src-all@freebsd.org Tue Dec 8 22:23:02 2020
From: Kyle Evans
Date: Tue, 8 Dec 2020 16:22:46 -0600
Subject: Re: svn commit: r368439 - head/share/mk
Cc: src-committers, svn-src-all, svn-src-head
References: <202012081405.0B8E5PJM029095@repo.freebsd.org>
In-Reply-To: <202012081405.0B8E5PJM029095@repo.freebsd.org>

On Tue, Dec 8, 2020 at 8:05 AM Kyle Evans wrote:
>
> Author: kevans
> Date: Tue Dec 8 14:05:25 2020
> New Revision: 368439
> URL: https://svnweb.freebsd.org/changeset/base/368439
>
> Log:
> src.opts.mk: switch to bsdgrep as /usr/bin/grep
>
> [.. snip ...]
>
> I have some WIP to make bsdgrep faster, but do not consider it a blocker
> when compared to the pros of switching now (aforementioned bugs, licensing).
>
> [.. snip ...]

I was asked to collect some stats from that patch to speed up bsdgrep;
while the patch isn't ready yet, I decided to do a (really really)
rough comparison between gnugrep/bsdgrep as well to follow up on the
speed aspect and perhaps provide a baseline.

You can view the results of those comparisons (user time(1) output),
which felt 'representative enough' of the difference, here:
https://people.freebsd.org/~kevans/stable/grep-stats.txt

Some notes, to help with interpretation:
- This hardware is not great
- All runs were doing a recursive grep from the root of a non-active
  base/head checkout, -I was not specified, in search of instances of
  the same pattern (but actually literal)
- ${grep}-non == ${grep} -r 'closefrom' .
- ${grep}-n == ${grep} -nr 'closefrom' .
- ${grep}-c8 == ${grep} -rC8 'closefrom' .

The sampling was low enough quality that we can probably just discard
all of this, but I found the final two comparisons (gnugrep vs.
gnugrep -n vs. gnugrep -C8 and bsdgrep vs. bsdgrep -n vs. bsdgrep -C8)
interesting enough that I decided to share this despite the quality.

Here are the key points that I find interesting:

gnugrep sees a pretty significant difference from the baseline to
either of the other two modes.
This was expected to some extent: both -n and -C8 will imply some
level of line tracking when you're taking the chunked search approach,
as you need to count lines even in chunks that don't have any matches
for -n and you might even need to do the same for -C8. I think the much
smaller difference between the gnugrep baseline and -C8 indicates that
they probably don't take the simple/slow approach of counting all
newlines to determine that you have 8 and where the 8th prior started,
but instead wait for a match then start backtracking.

The surprising part about the bsdgrep comparison was that there is a
significant slowdown when we're checking context. There is almost
certainly room for improvement there.

Thanks,

Kyle Evans
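
For illustration only, here is a rough Python sketch of the two context
strategies discussed above, plus the newline counting that -n implies.
It is not taken from gnugrep or bsdgrep; the function names are
hypothetical, the pattern is assumed to be a non-empty literal with no
newline in it, and overlapping context windows are not merged the way a
real grep would merge them.

```python
from collections import deque


def context_by_tracking(data: bytes, pattern: bytes, before: int):
    """Simple/slow approach: split every line and remember the last
    `before` lines, whether or not a match ever shows up."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    results = []
    recent = deque(maxlen=before)
    for line in data.splitlines(keepends=True):
        if pattern in line:
            results.append((list(recent), line))
        recent.append(line)
    return results


def context_by_backtracking(data: bytes, pattern: bytes, before: int):
    """Search the buffer for the pattern first; only after a match,
    walk backwards over newlines to find the leading context."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    results = []
    pos = 0
    while True:
        hit = data.find(pattern, pos)
        if hit == -1:
            break
        # Boundaries of the matching line.
        line_start = data.rfind(b"\n", 0, hit) + 1
        line_end = data.find(b"\n", hit)
        line_end = len(data) if line_end == -1 else line_end + 1
        # Step back at most `before` lines from the matching line.
        ctx_start = line_start
        for _ in range(before):
            if ctx_start == 0:
                break
            ctx_start = data.rfind(b"\n", 0, ctx_start - 1) + 1
        context = data[ctx_start:line_start].splitlines(keepends=True)
        results.append((context, data[line_start:line_end]))
        pos = line_end  # report each matching line once
    return results


def line_number_at(data: bytes, offset: int) -> int:
    """1-based line number of `offset`; -n needs this count even for
    stretches of input that contain no match at all."""
    return data.count(b"\n", 0, offset) + 1
```

Both context functions return (context lines, matching line) pairs and
should agree on the same input. The difference is that the second one
only touches newlines near a match, which is the behaviour guessed at
for gnugrep above.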