Date: Wed, 4 Jan 2012 02:49:32 +1100 (EST)
From: Bruce Evans
To: Ed Schouten
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
    src-committers@freebsd.org
Subject: Re: svn commit: r229368 - in head: lib/libc lib/libc/arm/string
    lib/libc/i386/string lib/libc/mips/string lib/libc/string lib/libstand
    sys/boot/userboot/libstand
In-Reply-To: <201201030714.q037E2qq010125@svn.freebsd.org>
Message-ID: <20120104013401.S6960@besplex.bde.org>

On Tue, 3 Jan 2012, Ed Schouten wrote:

> Log:
>   Merge index() and strchr() together.
>
>   As I looked through the C library, I noticed the FreeBSD MIPS port has a
>   hand-written version of index(). This is nice, if it weren't for the
>   fact that most applications call strchr() instead.
>
>   Also, on the other architectures index() and strchr() are identical,
>   meaning we have two identical pieces of code in the C library and
>   statically linked applications.

Only in statically linked applications that used both, since they
weren't actually identical: they were intentionally put in separate
object files to avoid this problem. (In asm, you don't need symbol
magic to declare strong aliases; you just put 2 .globl labels together.
But this is usually wrong since it doesn't keep things separate enough.
Some files use #include to implement the multiple copies. For example,
amd64 and i386 don't bother optimizing memcpy() over memmove(), but
make it a copy in a separate file. The i386 index.S and strchr.S were
not so good: they duplicated the code.)

>   Solve this by naming the actual file strchr.[cS] and let it use
>   __strong_reference()/STRONG_ALIAS() to provide the index() routine. Do
>   the same for rindex()/strrchr().

This breaks the Standard C namespace. When they are in the same object
file, there is no way to get the standard name without also getting the
nonstandard name. So the following C-standard-conforming C program now
gets a linkage error (multiple definition of `index'), at least with
static linkage:

    #include <string.h>

    int index;

    char *
    foo(const char *p)
    {
        return strchr(p, '1');
    }

When they were in separate object files, the nonstandard name just added
to the general pollution in the libc runtime in a way that doesn't seem
to cause any problems in practice, since it is orthogonal to any uses of
the name in a conforming application.

We mostly use weak references in libraries, to avoid problems like this.
In libc, there were just 2 __strong_reference()s and 111
__weak_reference()s. One of the oldest weak references is from __vfscanf
to vfscanf. This is used to implement a bug in C90: C90 doesn't have
vfscanf, so it must not be in libc in a way that conflicts with any
application symbol named vfscanf.
libc needs vfscanf's functionality internally, and doesn't want to
duplicate the whole thing. So it puts the functionality in __vfscanf and
always uses that internally, and provides the vfscanf name solely as a
weak symbol. The symbol remains weak, and C90 remains sort of supported,
although the bug is fixed in C99 (it has vfscanf).

There are also _many_ (but not nearly all?) POSIX symbols that are
handled as weak references. Internally, they have names like _open and
weak symbols like `open' (for some reason, both _open and `open' are
shown by nm as weak). These are implemented more magically using
include/*namespace.h and macros in asm files.

I got the count of 111 by grepping for the C macro. This missed all the
asm macros. Grepping for ' W ' in the nm output for libc.a shows 1024
weak references. That's almost 30% of all symbols (there are 2195 ' T '
symbols). nm doesn't seem to provide a way to show what the symbols are
aliases for. It is worse for strong symbols (it shows them both as
' T ').

>
>   This seems to make the C libraries and static binaries slightly smaller,
>   but this reduction in size seems negligible.

Duplication of the object file (except for the global symbols) is best,
and may be required, even for the example of memcpy being identical to
memmove given above. It is useful to be able to put a breakpoint at
memcpy without having it trigger when memmove is called, and the C
standard might require memcpy and memmove to have different addresses.
Similarly for profiling. You want logically different functions to have
different addresses. I wonder if gprof knows enough about symbols to
prefer strchr over index if they are strong aliases for each other.

Bruce
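
The weak-alias arrangement described above (implementation under one
name, legacy name exported only as a weak symbol) can be sketched
roughly as follows. This is only an illustration under assumptions, not
the libc source: the names sketch_strchr and sketch_index are
hypothetical, and the GCC/Clang attribute syntax on an ELF target
stands in for the __weak_reference() macro.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the real implementation.  In libc this
 * would be the standard name (strchr) or a reserved internal name.
 */
char *
sketch_strchr(const char *s, int c)
{
    for (;; s++) {
        if (*s == (char)c)
            return (char *)s;   /* found; also matches the final NUL */
        if (*s == '\0')
            return NULL;        /* reached the end without a match */
    }
}

/*
 * Weak alias (assumes GCC or Clang on an ELF target): the legacy name
 * resolves to the same code, but a strong definition of sketch_index
 * elsewhere in the program overrides it instead of colliding with it.
 */
char *sketch_index(const char *s, int c)
    __attribute__((weak, alias("sketch_strchr")));
```

With a strong alias in place of the weak one, an application object
that defines its own sketch_index would be expected to hit the
multiple-definition error shown earlier when this object is linked in.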