Date:      Mon, 16 Oct 1995 18:20:35 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ache@astral.msk.su (Андрей Чернов)
Cc:        terry@lambert.org, hackers@freefall.freebsd.org, kaleb@x.org
Subject:   Re: A couple problems in FreeBSD 2.1.0-950922-SNAP
Message-ID:  <199510170120.SAA26017@phaeton.artisoft.com>
In-Reply-To: <yZsrkWmKU1@ache.dialup.demos.ru> from "Андрей Чернов" at Oct 17, 95 02:40:38 am

> >This is valid for all 8859-x display/input systems, since the reuse of
> >the code points is not transformed by this (8859-x does not encode
> >characters in those locations).
> 
> You consider one very simple case (isprint/iscntrl only) and think
> that it is a proof. What can you say about ispunct(), for example?
> It clearly differs between 8859-1 and 8859-5; islower/isupper differ
> too, as do tolower/toupper. Even isalpha differs.

What did I say before about lobbying international standards bodies
to replace 8859-5?

I don't know if I buy the [is,to][upper,lower] distinctions.  I think
they mainly concern undefined code points, and getting the wrong result
in an undefined area is not a problem.

> >The only potentially incorrect behaviour is on blanks not being interpreted
> >as blanks.  If you want a blank, you shouldn't be using some wild code
> >point other than 0x20 anyway.  You get what you deserve.
> 
> Well, isspace differs too.

Space isn't 0x20 in 8859-5?  Tab, LF, CR aren't the same?
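
To make the question concrete, here is a minimal sketch, assuming a
program that never calls setlocale() and so runs in the startup "C"
locale.  The helper names are hypothetical.  It checks that 0x20, tab,
LF and CR are all classified as whitespace, and lets you probe a code
point in the upper half (0xA0 here is only an illustrative choice).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: nonzero result of isspace() normalized to 1.
 * The argument is passed as unsigned char so that values above 0x7F
 * are valid inputs to the <ctype.h> functions. */
int is_space_in_current_locale(unsigned char c)
{
    return isspace(c) != 0;
}

/* Hypothetical helper: returns 1 if space, tab, LF and CR are all
 * whitespace under the current LC_CTYPE locale, 0 otherwise. */
int ascii_blanks_are_space(void)
{
    static const unsigned char blanks[] = { 0x20, '\t', '\n', '\r' };
    size_t i;

    for (i = 0; i < sizeof blanks; i++)
        if (!is_space_in_current_locale(blanks[i]))
            return 0;
    return 1;
}
```

In the "C" locale only the standard white-space characters count, so
is_space_in_current_locale(0xA0) is 0 there.  Whether it changes under
another locale depends on that locale's tables, which this sketch does
not assume anything about.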

> >The problems you will encounter in this circumstance are all *very*
> >specific to cases where a single file system is being used by multiple
> >nationalities of clients.
> 
> No, it is a different problem. By setting LANG to something != 8859-1
> (for programs that understand it) I assume that programs which
> do not understand it still work right.
> If they are strict ASCII, I am automatically protected from any
> unwanted effects. If they are 8859-1, I need to classify
> various unwanted effects for each != 8859-1 charset as
> 'default undefined behaviour'.

I agree.  And this is precisely the problem with the crt0.o/setlocale()
hack.  You are implicitly removing the protection from unwanted effects.
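
A rough sketch of the difference, assuming the hack amounts to an
implicit setlocale(LC_CTYPE, "") before main() (that is my reading of
it, not a quote of the actual crt0.o change).  The function names are
hypothetical.  A program that never asks for a locale stays in "C";
the second function is what it would have to do to opt in explicitly.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if LC_CTYPE is still the startup
 * "C" locale, i.e. nothing has switched the classification tables. */
int still_in_c_locale(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);  /* query only */

    return cur != NULL && strcmp(cur, "C") == 0;
}

/* Hypothetical helper: explicit opt-in to whatever the environment
 * (LANG, LC_CTYPE, ...) names.  Returns 1 on success, 0 on failure.
 * Assumption: this is the call the startup hack would make for you. */
int adopt_environment_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

A program that is unaware of locales never calls the second function,
so still_in_c_locale() keeps returning 1 for it.  Doing that call
behind its back is what takes the protection away.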


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


