From owner-cvs-all Thu Jun 6 4:12:10 2002
Delivered-To: cvs-all@freebsd.org
Received: from treetop.robbins.dropbear.id.au (228.c.008.mel.iprimus.net.au [210.50.88.228])
	by hub.freebsd.org (Postfix) with ESMTP id 1E17237B407;
	Thu, 6 Jun 2002 04:11:59 -0700 (PDT)
Received: from treetop.robbins.dropbear.id.au (localhost [127.0.0.1])
	by treetop.robbins.dropbear.id.au (8.12.2/8.12.2) with ESMTP id g56ATlBF045694;
	Thu, 6 Jun 2002 20:29:48 +1000 (EST)
	(envelope-from tim@treetop.robbins.dropbear.id.au)
Received: (from tim@localhost)
	by treetop.robbins.dropbear.id.au (8.12.2/8.12.2/Submit) id g56ATg5B045693;
	Thu, 6 Jun 2002 20:29:42 +1000 (EST)
Date: Thu, 6 Jun 2002 20:29:42 +1000
From: "Tim J. Robbins"
To: "Andrey A. Chernov"
Cc: cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/usr.bin/uniq uniq.c
Message-ID: <20020606202942.A45282@treetop.robbins.dropbear.id.au>
References: <200206060313.g563DAi26751@freefall.freebsd.org>
	<20020606031545.GA83612@nagual.pp.ru>
	<20020606161843.A44561@treetop.robbins.dropbear.id.au>
	<20020606083246.GA85860@nagual.pp.ru>
	<20020606192402.A45186@treetop.robbins.dropbear.id.au>
	<20020606100352.GA86621@nagual.pp.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20020606100352.GA86621@nagual.pp.ru>; from ache@nagual.pp.ru on Thu, Jun 06, 2002 at 02:03:54PM +0400
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Jun 06, 2002 at 02:03:54PM +0400, Andrey A. Chernov wrote:

> 3) There is no much sense to discuss non-localized implementations you mention.

The GNU, Solaris and NetBSD implementations are localised, but do not use
strcoll() because it would be incorrect to do so.
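
To make the distinction concrete, here is a minimal sketch of the two kinds
of comparison. It is not taken from uniq.c or from any of the implementations
named above; the helper names lines_identical() and lines_collate_equal() are
hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from uniq.c): two lines are duplicates
 * only if they are byte-for-byte identical.
 */
static bool
lines_identical(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/*
 * Hypothetical helper: true when the current LC_COLLATE locale gives
 * the two lines the same position in the sort order. This answers a
 * question about ordering, not about identity.
 */
static bool
lines_collate_equal(const char *a, const char *b)
{
	return strcoll(a, b) == 0;
}
```

Called as lines_identical("ss", "\xdf"), the first helper returns false in
every locale. The second depends on the locale selected with setlocale() and
on how the platform's collation tables are built, so a duplicate test should
not be based on it.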

> 4) Uniq must be consistent with other utilities 'unique' concept to
> operate in the flow, like comm, join and sort, they _use_ collate, so uniq
> must not produce different conflicting results.
>
> 5) From common sense: in some languages
> alala
> and
> ssalala
> are the same.

strcoll() should not indicate that these strings are identical. If it does,
it is incorrectly implemented. FreeBSD's strcoll() and strxfrm() are
incorrectly implemented: strcoll("ss", "\xdf") == 0 in some locales on
FreeBSD, but equals 1, -1 or -108 on all Solaris locales.

strcmp() is the correct function to use to compare text strings for equality.
strcoll() is the correct function to use to compare the sorting order of text
strings. uniq is not interested in the sort order of strings; it is interested
in whether two lines of text are identical. If the sort utility is operating
correctly, identical input lines will be adjacent in the output.

$ export LANG=de_DE.ISO8859-15
$ printf "ss\n\337\nss\n\337\nss\n" | sort -u
ss
ß
$ printf "ss\n\337\nss\n\337\nss\n" | sort -u | uniq
ss

This behaviour is simply not correct, and the bug lies in FreeBSD's old uniq
implementation, not GNU sort. I shall not back this change out.

Tim