From owner-svn-src-head@freebsd.org Fri Sep 20 12:14:44 2019 Return-Path: Delivered-To: svn-src-head@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 8486CFBB16; Fri, 20 Sep 2019 12:14:44 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 46ZXhg2FhKz45yt; Fri, 20 Sep 2019 12:14:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id D38A736182F; Fri, 20 Sep 2019 22:14:40 +1000 (AEST) Date: Fri, 20 Sep 2019 22:14:39 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Ed Maste cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r351700 - head/lib/libc/string In-Reply-To: <201909021356.x82Dui6v077765@repo.freebsd.org> Message-ID: <20190920212702.N1017@besplex.bde.org> References: <201909021356.x82Dui6v077765@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=jpOVt7BSZ2e4Z31A5e1TngXxSK0=:19 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=cn9ZTVXxRq8eMFPCvVIA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-Rspamd-Queue-Id: 46ZXhg2FhKz45yt X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-2.30 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23:c]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DMARC_NA(0.00)[optusnet.com.au]; TO_DN_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; RWL_MAILSPIKE_POSSIBLE(0.00)[249.132.29.211.rep.mailspike.net : 127.0.0.17]; IP_SCORE(0.00)[ip: (-5.26), ipnet: 211.28.0.0/14(-3.26), asn: 4804(-2.40), country: AU(0.01)]; IP_SCORE_FREEMAIL(0.00)[]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Sep 2019 12:14:44 -0000 On Mon, 2 Sep 2019, Ed Maste wrote: > Author: emaste > Date: Mon Sep 2 13:56:44 2019 > New Revision: 351700 > URL: https://svnweb.freebsd.org/changeset/base/351700 > > Log: > libc: Use musl's optimized memchr > > Parentheses added to HASZERO macro to avoid a GCC warning. > > Reviewed by: kib, mjg > Obtained from: musl (snapshot at commit 4d0a82170a) > Sponsored by: The FreeBSD Foundation > Differential Revision: https://reviews.freebsd.org/D17631 > > Modified: > head/lib/libc/string/memchr.c > > Modified: head/lib/libc/string/memchr.c > ============================================================================== > --- head/lib/libc/string/memchr.c Mon Sep 2 13:55:31 2019 (r351699) > +++ head/lib/libc/string/memchr.c Mon Sep 2 13:56:44 2019 (r351700) > @@ -1,55 +1,54 @@ > /*- > - * SPDX-License-Identifier: BSD-3-Clause > + * SPDX-License-Identifier: MIT > * > - * Copyright (c) 1990, 1993 > - * The Regents of the University of California. All rights reserved. > + * Copyright (c) 2005-2014 Rich Felker, et al. > * > - * This code is derived from software contributed to Berkeley by > - * Chris Torek. I prefer Torek's version. Optimizing this function is especially unimportant, and this version has mounds of style bugs. > -void * > -memchr(const void *s, int c, size_t n) > -{ > - if (n != 0) { > - const unsigned char *p = s; > +#define SS (sizeof(size_t)) > +#define ALIGN (sizeof(size_t)-1) > +#define ONES ((size_t)-1/UCHAR_MAX) > +#define HIGHS (ONES * (UCHAR_MAX/2+1)) > +#define HASZERO(x) (((x)-ONES) & ~(x) & HIGHS) > > - do { > - if (*p++ == (unsigned char)c) > - return ((void *)(p - 1)); > - } while (--n != 0); Old KNF code. In the !__GNUC__ #else clausew , this is replaced by equivalent code with a style that is very far from KNF. If compilers understood this function, then they could reasonably optimize the old code in very AMD ways, e.g., use AVX512. This is not so reasonable for the new code. clang does such optimizations to a fault, but knows that it doesn't understand this code or even strlen(), so it does a naive translation. > +void *memchr(const void *src, int c, size_t n) > +{ > + const unsigned char *s = src; > + c = (unsigned char)c; > +#ifdef __GNUC__ > + for (; ((uintptr_t)s & ALIGN) && n && *s != c; s++, n--); > + if (n && *s != c) { > + typedef size_t __attribute__((__may_alias__)) word; This seems to have no dependencies on __GNUC__ except the use of anti-aliasing, and that dependendency is broken (very old versions of gcc don't even have __attribute__(()), and old versions don't have __may_alias). > + const word *w; > + size_t k = ONES * c; > + for (w = (const void *)s; n>=SS && !HASZERO(*w^k); w++, n-=SS); > + s = (const void *)w; > } This is cryptic as well as non-KNF. libc/string/strlen.c uses the same dubious optimization method, but it is at least clearly written and it is even in KNF. > - return (NULL); > +#endif > + for (; n && *s != c; s++, n--); > + return n ? (void *)s : 0; > } I checked that this optimization actually works. For long strings, it is almost sizeof(size_t) times faster, by accessing memory with a size sizeof(size_t). size_t is a not very good way of hard-coding the word size. Related silly optimizations: - i386 has memchr() in asm. This uses scasb, which was last optimal on the 8088, or perhaps the original i386. On freefall, it is several times slower than the naive translation of the naive C code. - i386 pessimizes several old string functions using scasb. amd64 mostly doesn't copy this mistake. Similarly for wide char string functions. Optimizing these is even more unimportant and just writing them in asm is quite complicated. They are perhaps not fully pessimized in asm on i386. amd64 doesn't have any of them in asm. - i386 also does dubious alignment optimizations - amd64 pessimizes strcpy() by implementing it as a C wrapper for stpcpy(). The wrapper code is MI, but is only used for amd64. - amd64 stpcpy(), strcat() and strcmp() uses the fancy optimization using the 0x0101... and 0x8080... tricks (as in this version of memchr() except not so cryptically since the constants are literal instead of constructed using macros from the word type). Optimizing strcat() some of these functions is unimportant. Worth doing at most once, in the C version. - the generic strlen() (but no other generic string function) is optimized using the 0x0101... and 0x8080... tricks. This optimization is dubious since it is a pessimization for short strings. Compilers often inline string functions, especially strlen() anyway. I think x86 compilers used to routinely inline strlen(), but now they know that they don't understand it, so they just call the library strlen() for most strings that are not known at compile time. This results in the fancy optimization actually being used. Sometimes this optimization improves the speed of a large program by as much as 0.001%. Bruce