From: Stefan Esser <se@freebsd.org>
To: Kyle Evans
Cc: FreeBSD Current
Subject: Re: grep extremely slow for LC_CTYPE=C? [SOLVED]
Date: Thu, 3 May 2018 19:54:56 +0200
Message-ID: <2324e7f9-e691-00ba-d45f-c392d2889416@freebsd.org>

On 03.05.18 at 17:28, Kyle Evans wrote:
> On Thu, May 3, 2018 at 10:19 AM, Stefan Esser wrote:
>> On 03.05.18 at 16:41, Kyle Evans wrote:
>>> Hmm... what does `grep -V` look like, just to confirm?
>>
>> Ah, yes, good point ...
>>
>> $ which grep
>> /usr/bin/grep
>>
>> $ grep -V
>> grep (GNU grep) 2.5.1-FreeBSD
>>
>> Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> So, it seems I have to complain somewhere else about this behavior ...
>
> Eh, no worries there. Newer GNU grep sucks less, and we're going to
> replace it Real Soon Now (TM).

Thank you very much - your reply was really helpful!

I just tested with GNU grep 2.27 (the current port version) and it does
not show the extreme slowness of the old version in FreeBSD, but is still
more than 10 times slower than BSD grep on my test data.

>> But I have (for a long time) in my /etc/src.conf:
>>
>> WITH_BSDGREP= yes
>> WITH_BSD_GREP_FASTMATCH= yes
>> WITHOUT_GNU_GREP_COMPAT= yes
>>
>> And before seeing the grep -V output, I was convinced that I had been using
>> BSD grep (i.e. that it replaced GNU grep with above options) by default ...
>>
>> But now I see that I need to invoke bsdgrep under that name.
>> It is very fast,
>> but does not give the expected (correct?) result, which is the single line
>> that is not suppressed by the pattern match ...
>
> This is actually because you've typo'd WITH_BSD_GREP. =) WITH_BSD_GREP
> will replace /usr/bin/grep with bsdgrep and put GNU grep at
> /usr/bin/gnugrep.

Yes, that was what I had expected, and I had correctly spelled WITH_BSD_PATCH,
but never bothered to check that I got the "grep" I wanted ...

> I also recommend using WITHOUT_BSD_GREP_FASTMATCH / not using
> WITH_BSD_GREP_FASTMATCH. See below response.

It is so much faster than GNU grep on this use-case anyway ;-)

$ sh grep-test.sh
All/mpfr-3.1.7.tgz
        0.14 real         0.13 user         0.00 sys
All/mpfr-3.1.7.tgz
        0.13 real         0.13 user         0.00 sys

This is a factor of 30 to 40 better than with our GNU grep (for the UTF-8
case, where it finishes in finite time, orders of magnitude faster for
LANG=C ;-) ).

And yes, FASTMATCH was responsible for the erroneous result in my previous
tests with BSD grep. Now that I have rebuilt it without that option, it
works perfectly for me :)

> BSD_GREP_FASTMATCH is best left off (default on HEAD)- it was disabled
> because the version of tre ("fastmatch") that bsdgrep uses is buggy
> and I don't want to invest the time to fix it. The performance of the
> version we use isn't any better than our libc regex(3), so I made the
> decision to switch it to that and focus efforts on optimizing our
> general regex implementation instead.

A decision I can well understand and sympathize with.

How about removing the BSD_GREP_FASTMATCH option, then?

> I have plans to replace our libc regex(3) with Onigmo [1], which is at
> least twice as fast as what we have and comes with all kinds of other
> extensions- GNU extensions will be exposed via libregex, and I also
> plan to install Onigmo on its own so that others can use that with its
> own interface.
> The difference between it and libregex will be that
> libregex exposes a regex(3) interface for using extensions with an
> option to go REG_POSIX.
>
> [1] https://github.com/k-takata/Onigmo

Great plan!

But for now BSD grep seems well up to the task, and my only problem now is
that I need to support stable releases that use (and will stay with) the
old GNU grep, so I'll need to keep the work-around (or perhaps depend on
the port version?).

Thanks again!

Best regards, Stefan
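
Postscript for readers of the archive: the check that was skipped in this
thread (confirming which grep implementation /usr/bin/grep really is) can be
scripted. The following is only a minimal sketch. The function name
grep_flavor is made up for illustration, and the "BSD grep" marker is an
assumption about what bsdgrep prints for -V; only the GNU banner is quoted
in this thread.

```sh
#!/bin/sh
# Classify a "grep -V" banner line as gnu, bsd, or unknown.
# grep_flavor is a hypothetical helper, not part of FreeBSD.
grep_flavor() {
    case "$1" in
        # Assumed bsdgrep banner. Checked first, since the version suffix
        # "-FreeBSD" also appears in the GNU banner quoted in this thread.
        *"BSD grep"*) echo bsd ;;
        # Matches e.g. "grep (GNU grep) 2.5.1-FreeBSD".
        *"GNU grep"*) echo gnu ;;
        *)            echo unknown ;;
    esac
}

# Usage: classify whatever "grep" resolves to in the current PATH.
banner=$(grep -V 2>&1 | head -n 1)
echo "grep is: $(grep_flavor "$banner")"
```

Running this after a rebuild would have shown right away that the
misspelled WITH_BSDGREP knob had left GNU grep in place.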