From owner-freebsd-bugs Thu Oct 21 8:57:41 1999
Delivered-To: freebsd-bugs@freebsd.org
Received: from isabase.philol.msu.ru (isabase.philol.msu.ru [195.208.217.73])
	by hub.freebsd.org (Postfix) with ESMTP id EE81C14F5B;
	Thu, 21 Oct 1999 08:56:53 -0700 (PDT)
	(envelope-from grg@isabase.philol.msu.ru)
Received: (from grg@localhost)
	by isabase.philol.msu.ru (8.9.3/8.9.2) id TAA36290;
	Thu, 21 Oct 1999 19:56:53 +0400 (MSD)
	(envelope-from grg)
Date: Thu, 21 Oct 1999 19:56:51 +0400
From: Grigoriy Strokin
To: freebsd-hackers@freebsd.org
Cc: ache@freebsd.org, freebsd-bugs@freebsd.org
Subject: comm doesn't obey current locale collation
Message-ID: <19991021195649.A36122@isabase.philol.msu.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 8bit
X-Mailer: Mutt 1.0pre3i
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Hello,

Six months ago I sent a 'send-pr' about /usr/bin/comm (Problem Report
bin/11221). There have still been no follow-ups, nor has the report been
assigned to a responsible person. What might this mean?

---------------- forward ---------------

Problem Report bin/11221

comm doesn't obey current locale collation

Confidential:   no
Severity:       serious
Priority:       medium
Responsible:    freebsd-bugs@freebsd.org
State:          open
Class:          sw-bug
Submitter-Id:   current-users
Arrival-Date:   Mon Apr 19 10:40:03 PDT 1999
Last-Modified:  never
Originator:     Grigoriy Strokin grg@philol.msu.ru
Release:        FreeBSD 3.1-STABLE i386
Organization:   Moscow University
Environment:    $LANG set to ru_RU.KOI8-R

Description

Comm produces wrong results when processing 8-bit text files that were
sorted with /usr/bin/sort according to the current locale (ru_RU.KOI8-R).

How-To-Repeat

Unpack the following shar archive and run

	LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt

Several identical characters will appear in both the first and the second
column. This must not happen with these files, because both were produced
as output of

	LANG=ru_RU.KOI8-R sort

---------------------CUT------------------------------------------
# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	jaa.srt
#	jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
Xô
Xõ
Xæ
Xö
Xé
Xç
Xà
Xù
Xü
Xñ
Xý
Xû
Xø
Xá
Xó
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
Xô
Xõ
Xæ
Xè
Xö
Xé
Xç
Xà
Xù
Xü
Xý
Xø
Xá
Xó
END-of-jaa2.srt
exit

Fix

Apply the patch:

--- comm.c.orig	Mon Apr 19 16:57:56 1999
+++ comm.c	Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
 #include
 #include
 #include
+#include
+#include

 #define	MAXLINELEN	(LINE_MAX + 1)

+/* The standard library has strcoll, an analog of strcmp that takes into account
+ * the current locale, but strcasecmp does not have such an analog.
+ * So let's define a replacement, locale_dependent_strcasecmp
+ * */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+	char a1[MAXLINELEN], a2[MAXLINELEN];
+	char *c;
+	for (c = a1; *s1; c++, s1++)
+		*c = toupper((unsigned char)(*s1));
+	*c = 0;
+	for (c = a2; *s2; c++, s2++)
+		*c = toupper((unsigned char)(*s2));
+	*c = 0;
+	return strcoll(a1, a2);
+}
+
 char *tabs[] = { "", "\t", "\t\t" };

 FILE *file __P((char *));
@@ -74,7 +94,7 @@
 	FILE *fp1, *fp2;
 	char *col1, *col2, *col3;
 	char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+	setlocale(LC_ALL, "");
 	flag1 = flag2 = flag3 = 1;
 	iflag = 0;

@@ -139,9 +159,9 @@

 		/* lines are the same */
 		if(iflag)
-			comp = strcasecmp(line1, line2);
+			comp = locale_dependent_strcasecmp(line1, line2);
 		else
-			comp = strcmp(line1, line2);
+			comp = strcoll(line1, line2);

 		if (!comp) {
 			read1 = read2 = 1;
====== CUT ========

-- 
=== Grigoriy Strokin, Lomonosov University (MGU), Moscow ===
=== contact info: http://isabase.philol.msu.ru/~grg/ ===
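
A note on the replacement function in the patch: it copies each line into a
fixed MAXLINELEN buffer, which matches the size of the line buffers in main
but would overflow if it were ever called with longer strings. A minimal
sketch of a variant without that assumption might look like the following.
The names upcase_copy and coll_casecmp are hypothetical and are not part of
the patch or of libc. Like the patch, the sketch relies on toupper, so it
only handles single-byte locales such as KOI8-R.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'ed copy of s with every byte
 * passed through toupper(), or NULL if the allocation fails.
 */
static char *
upcase_copy(const char *s)
{
	size_t i, len = strlen(s);
	char *copy = malloc(len + 1);

	if (copy == NULL)
		return (NULL);
	for (i = 0; i < len; i++)
		copy[i] = (char)toupper((unsigned char)s[i]);
	copy[len] = '\0';
	return (copy);
}

/*
 * Hypothetical variant of locale_dependent_strcasecmp(): upper-case both
 * strings under the current LC_CTYPE, then compare them with strcoll()
 * under the current LC_COLLATE. If memory runs out, it falls back to a
 * case-sensitive strcoll() on the original strings.
 */
int
coll_casecmp(const char *s1, const char *s2)
{
	char *u1, *u2;
	int comp;

	assert(s1 != NULL && s2 != NULL);
	u1 = upcase_copy(s1);
	u2 = upcase_copy(s2);
	if (u1 == NULL || u2 == NULL)
		comp = strcoll(s1, s2);	/* fallback, case-sensitive */
	else
		comp = strcoll(u1, u2);
	free(u1);	/* free(NULL) is a no-op */
	free(u2);
	return (comp);
}
```

As in the patch, the caller is expected to call setlocale(LC_ALL, "") once at
startup; without it the program stays in the "C" locale and strcoll orders
strings the same way strcmp does.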