From: Bruce Evans
Date: Thu, 21 Jan 2010 16:25:53 +1100 (EST)
To: Andrey Chernov
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans
Subject: Re: svn commit: r202572 - head/lib/libc/gen
Message-ID: <20100121155841.H1512@besplex.bde.org>
In-Reply-To: <20100120121827.GA55236@nagual.pp.ru>

On Wed, 20 Jan 2010, Andrey Chernov wrote:

> On Wed, Jan 20, 2010 at 09:33:08PM +1100, Bruce Evans wrote:
>>> But there is
>>> nothing said about opendir() & strcoll() relation in the mentioned
>>> standards. The only word I found is that opendir() returns "ordered"
>>> sequence, but nowhere mentioned ordered by what criteria, so perhaps they
>>> mean "stable":
>>
>> As I said before, sorting in opendir() has nothing to do with POSIX! It
>> is an implementation detail for union file systems/mounts.
>
> Moreover, even sorting itself is not required here. We sort just to remove
> dups.

Interesting. Why does it require a stable sort then? It only removes
duplicates by name. At least with strcmp() in the compare function, such
dups will remain together although they may be moved. A stable sort
would be needed if it must keep the original first of duplicates by name,
but it doesn't say that.

BTW, the statfs() to determine if this sort is necessary is a large
pessimization for NFS file systems. NFS caches most things but not
statfs(). Thus a readdir() over NFS does an expensive statfs() every time
although the directory contents will normally be cached after the first
time. I think the sorting belongs in file systems, not in readdir() where
it affects file systems that don't need it.

>> It should also give the FreeBSD
>> extension of POSIX. POSIX says: "If the strcoll() function fails,
>> then the return value of alphasort() is unspecified.", but this makes
>> alphasort() unusable since a qsort() comparison function must return
>> a specified value.
>
> To be used in practice, strcoll() should never fail, doing fallback to
> strcmp() instead, not only in that, but in lots of other cases too (it may
> set errno like EILSEQ, but not fail). The next important thing is to
> return 0 only for true binary equals, additionally ranking (e.g. by
> strcmp()) anything inside classes of equality to stabilize result.
>
> I hope our strcoll() will be kept in that state after implementing
> UCA too.

What is UCA?

Failing is a POSIX bug -- C99 doesn't allow it to fail. I think it
should at least be specified to return nonzero (unequal) on failure.
This is like comparisons of NaNs returning unequal even for comparisons
of identical NaNs.

Can it return equal for non-binary-equal strings? I think it can -- the
locale might have different encodings for strings that are considered
identical. Then duplicates should be determined according to strcoll(),
and file systems would have a hard time managing such duplicates when
they are created in a locale where they are non-duplicates.

Bruce
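
For reference, the comparison rule discussed above (order by strcoll(),
return 0 only for byte-identical names, then remove duplicates by name)
can be sketched as follows. This is only an illustration under stated
assumptions: name_cmp() and sort_and_dedup() are hypothetical names, not
the code in head/lib/libc/gen, and "duplicate" is taken to mean
byte-identical, which is exactly the assumption questioned in the last
paragraph of the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical comparison function (not the libc alphasort()): order by
 * strcoll(), then break ties with strcmp(), so that 0 is returned only
 * for byte-identical names.
 */
static int
name_cmp(const void *a, const void *b)
{
	const char *s1 = *(const char * const *)a;
	const char *s2 = *(const char * const *)b;
	int r;

	r = strcoll(s1, s2);
	if (r == 0)
		r = strcmp(s1, s2);
	return (r);
}

/*
 * Sort an array of names and drop byte-identical duplicates in place.
 * Returns the new count.  qsort() is not stable, so which of several
 * identical names survives is unspecified; only the name is kept, so
 * this does not matter here.
 */
size_t
sort_and_dedup(const char **names, size_t n)
{
	size_t i, out;

	assert(names != NULL || n == 0);
	if (n < 2)
		return (n);
	qsort(names, n, sizeof(names[0]), name_cmp);
	out = 1;
	for (i = 1; i < n; i++)
		if (strcmp(names[i], names[out - 1]) != 0)
			names[out++] = names[i];
	return (out);
}
```

With the strcmp() tie-breaker, identical names always end up adjacent
after the sort, whether or not the sort is stable. A stable sort would
matter only if the first of the duplicates in the original order had to
be the one that is kept.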