Date:      Wed, 5 May 1999 21:49:34 -0500
From:      "G. Adam Stanislav" <adam@whizkidtech.net>
To:        Dmitrij Tejblum <dima@tejblum.dnttm.rssi.ru>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: wc* routines
Message-ID:  <19990505214934.B217@whizkidtech.net>
In-Reply-To: <199905041711.VAA04689@tejblum.dnttm.rssi.ru>; from Dmitrij Tejblum on Tue, May 04, 1999 at 09:11:45PM +0400
References:  <199905041711.VAA04689@tejblum.dnttm.rssi.ru>

On Tue, May 04, 1999 at 09:11:45PM +0400, Dmitrij Tejblum wrote:
> I don't like your idea that WEOF == INT_MIN. Apparently, everyone else
> has WEOF == -1 (== EOF), and there is no reason we should not as well.
> I don't know about "debugging purposes". WEOF == EOF should allow more
> code sharing with the existing libc. Note that FreeBSD already has some
> very sparse and nonstandard (but functional) wchar support.

Now that I have actually started coding, I agree with you. :-) I changed it to
(-1).
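To illustrate the code-sharing argument for WEOF == (-1), here is a minimal sketch. The names sketch_wint_t, SKETCH_WEOF, sketch_is_end and sketch_towupper_ascii are hypothetical stand-ins, not the real <wchar.h>/<wctype.h> definitions. It also assumes EOF is -1, which holds on FreeBSD and nearly everywhere else, although ISO C only requires EOF to be some negative int.

```c
#include <stdio.h>   /* EOF */

/*
 * Hypothetical stand-ins for illustration only: a real system takes
 * wint_t and WEOF from <wchar.h>/<wctype.h>.  The point is the choice
 * WEOF == (-1) == EOF rather than WEOF == INT_MIN.
 */
typedef int sketch_wint_t;
#define SKETCH_WEOF ((sketch_wint_t)-1)

/*
 * An end-of-input test written once for the narrow (EOF) case.
 * Because SKETCH_WEOF has the same value as EOF (assuming EOF == -1),
 * the same test works unchanged on wide values.
 */
int sketch_is_end(int c)
{
    return c == EOF;
}

/* ASCII-only upper-casing that passes the end marker through untouched. */
sketch_wint_t sketch_towupper_ascii(sketch_wint_t wc)
{
    if (sketch_is_end(wc))
        return wc;
    if (wc >= 'a' && wc <= 'z')
        return wc - 'a' + 'A';
    return wc;
}
```

With WEOF == INT_MIN, sketch_is_end() would need a separate wide variant. With WEOF == EOF, one helper serves both paths.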

> Note that a major portion of <wctype.h> is already almost implemented in
> FreeBSD: the plain ctype functions work with wide characters. So it should
> be fairly easy to write an almost working <wctype.h>. (BTW, it has been on
> my ToDo list for quite some time, and it is now not that far from the top.)

It is fairly easy regardless. :-) It is different from the plain ctype
functions, though. For example, iswdigit(ch) must return TRUE if ch is a digit
in Devanagari or Chinese or any other script. It must also be locale
independent: if ch is a digit in Unicode (or on any plane of its ISO 10646
extension), then it is a digit even if one's locale does not know about it.
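One common way to get a locale-independent, script-independent test is a sorted table of code-point ranges searched with a binary search. The sketch below uses the hypothetical name is_any_script_digit rather than iswdigit, because ISO C defines iswdigit() to accept only the ten digits '0' through '9'. The table is a small excerpt of Unicode decimal-digit runs, not a complete list.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */

/* Inclusive run of code points [lo, hi]. */
struct cp_range {
    uint32_t lo, hi;
};

/* Excerpt of Unicode decimal-digit runs, sorted by lo (NOT complete). */
static const struct cp_range digit_ranges[] = {
    { 0x0030, 0x0039 },  /* ASCII 0-9 */
    { 0x0660, 0x0669 },  /* Arabic-Indic digits */
    { 0x06F0, 0x06F9 },  /* Extended Arabic-Indic digits */
    { 0x0966, 0x096F },  /* Devanagari digits */
    { 0x09E6, 0x09EF },  /* Bengali digits */
    { 0xFF10, 0xFF19 },  /* Fullwidth digits */
};

/* Binary search over sorted, non-overlapping ranges: O(log n). */
static int in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < r[mid].lo)
            hi = mid;
        else if (cp > r[mid].hi)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical helper: the answer depends only on the code point, never the locale. */
int is_any_script_digit(uint32_t cp)
{
    return in_ranges(cp, digit_ranges,
                     sizeof digit_ranges / sizeof digit_ranges[0]);
}
```

Because the table holds ranges rather than individual code points, it stays small even when whole blocks of digits are added for new scripts.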

I evaluated two existing packages today. Both include the functionality of
<wctype.h> and more (their authors felt the C standard did not go far enough).

I exchanged email with both authors. Each suggested including their library in
the base distribution and making the C routines a front end to it. I liked the
idea at first, but, having thought about it for a couple of hours, I am now
more inclined to place both libraries in the ports collection and continue
working on my own routines.

One good thing about evaluating those two packages is that I noticed one of
them uses much the same algorithms I have come up with. It is good to get
confirmation that I am on the right track. :-)

The main problem with the front-end approach is that we would either have to
include one of their libraries in the C library, thus adding to it things that
simply do not belong there, or we would have to link programs with their
library just to use the wctype functions from the standard C library, which
would open a whole can of worms.

Nevertheless, I have posted links to their libraries on the web page and am
open to comments.

I have also discovered an important link today: It is no longer necessary to
spend $305 to get your own copy of the ISO 10646 standard: It can be downloaded
from the web either in MS Word format (yeah, right) or as a PostScript file.
The link is now listed on the page, which, again, is at
http://www.whizkidtech.net/i18n/wc/.

I will also need some input on a "philosophical" question. I will need to
build several tables for the <wctype.h> functionality, and the standard is
open: new codes can and will be added to it. So I need to decide whether to
hardcode the tables or place them in files. At this point, I am leaning toward
hardcoding them, for several reasons: a file can be misplaced, lost, or even
corrupted; the changes do not happen very often; the changes do not affect
major languages and are of little consequence to most computer users (so if
Egyptian hieroglyphics are added to plane 1 as planned, Egyptologists will need
to update their C libraries, while we mortals may pretty much ignore it); and
it is just as easy to download an update of the C library as an update of
several files.

For what it's worth, I will need to write some utilities for my own use that
generate the C code for the tables. So any time new codes of interest to only
a small group of people are added, that group can run the utilities on their
own computers and simply recompile the library, even if I am on vacation, or
whatever.
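A table-generating utility of the kind described above could be quite small. The sketch below is hypothetical: build_ranges, emit_table and struct cp_range are illustrative names, not an existing FreeBSD tool, and it assumes the input is already a sorted list of code points (for example, extracted from a Unicode data file). It collapses consecutive code points into inclusive ranges and prints them as a C initializer that can be compiled directly into the library.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t */
#include <stdio.h>   /* FILE, fprintf */

/* Inclusive run of code points [lo, hi]. */
struct cp_range {
    uint32_t lo, hi;
};

/*
 * Collapse a sorted list of code points (duplicates allowed) into
 * inclusive ranges.  `out` must have room for up to n entries.
 * Returns the number of ranges written.
 */
size_t build_ranges(const uint32_t *cps, size_t n, struct cp_range *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (count > 0 && cps[i] <= out[count - 1].hi + 1) {
            /* Adjacent to (or inside) the current run: extend it. */
            if (cps[i] > out[count - 1].hi)
                out[count - 1].hi = cps[i];
        } else {
            /* Gap found: start a new run. */
            out[count].lo = out[count].hi = cps[i];
            count++;
        }
    }
    return count;
}

/* Print the ranges as a hardcoded C table, ready to paste into libc sources. */
void emit_table(FILE *fp, const char *name,
                const struct cp_range *r, size_t n)
{
    fprintf(fp, "static const struct cp_range %s[] = {\n", name);
    for (size_t i = 0; i < n; i++)
        fprintf(fp, "    { 0x%04lX, 0x%04lX },\n",
                (unsigned long)r[i].lo, (unsigned long)r[i].hi);
    fprintf(fp, "};\n");
}
```

Generating C source, rather than reading a data file at run time, keeps the tables hardcoded in the library. It still lets anyone regenerate them mechanically when new codes are added.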

Adam

