Date:      Sun, 2 Apr 2000 05:16:00 +0000
From:      Anatoly Vorobey <mellon@pobox.com>
To:        hackers@freebsd.org
Cc:        yokota@freebsd.org, peter@freebsd.org
Subject:   Re: Unicode on FreeBSD
Message-ID:  <20000402051559.A52041@happy.checkpoint.com>
In-Reply-To: <20000329033908.A14122@happy.checkpoint.com>; from mellon@pobox.com on Wed, Mar 29, 2000 at 03:39:08AM +0000
References:  <20000320194702.11223.qmail@web3101.mail.yahoo.com> <8bitar$2i4f$1@bigeye.rhein-neckar.de> <20000329033908.A14122@happy.checkpoint.com>

On Wed, Mar 29, 2000 at 03:39:08AM +0000, Anatoly Vorobey wrote:
> I wonder how useful it would be to teach syscons/kbd to handle Unicode.

No replies so far; let me try again. I was not trying to convey the
attitude "let it be done" in my message; I was hoping for a reply
along the lines of "this would be a nice thing to have; if you do it,
I'll review it and, if it works right, I'll commit it". I believe that
what I am suggesting is a Good Thing, but if this belief is not shared
by others, there's not much sense in my trying to do it.

Thus I suggest teaching kbd/syscons/vga to use Unicode internally.
The picture would look as follows. A keymap specifies Unicode values 
(rather than 8-bit values as now) for keycodes; the console driver 
receives Unicode values from the keyboard driver. On the video side,
the console driver has a bunch of Unicode characters (rather than
8-bit characters) to render; in text mode, it translates them into
8-bit codes and puts them on the screen, the correct font having
been previously loaded; in raster mode, it uses the current font to
draw them out directly on the screen.
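
The picture above can be sketched roughly as follows. This is an
illustration in Python rather than kernel code; every name, keycode,
slot number and bitmap in it is a made-up assumption, not part of
syscons or the kbd driver.

```python
# Hypothetical sketch of the proposed data flow; nothing here is real
# syscons/kbd code, and all keycodes, slots and bitmaps are invented.

BLANK_SLOT = 0x20  # 8-bit code used when a character has no text-mode slot

# Keymap: keycode -> Unicode code point (instead of an 8-bit value).
keymap = {
    30: 0x0430,  # CYRILLIC SMALL LETTER A
    31: 0x0073,  # LATIN SMALL LETTER S
}

# Text mode: code point -> index into the loaded 256-glyph VGA font.
text_slots = {0x0430: 0xC1, 0x0073: 0x73}

# Raster mode: code point -> glyph bitmap rows (made-up 8x8 glyph).
raster_font = {0x0430: [0x00, 0x3C, 0x06, 0x3E, 0x66, 0x3E, 0x00, 0x00]}


def key_to_unicode(keycode):
    """Keyboard side: the console driver receives a code point."""
    return keymap.get(keycode)


def render_text_mode(code_point):
    """Video side, text mode: translate to an 8-bit font index."""
    return text_slots.get(code_point, BLANK_SLOT)


def render_raster_mode(code_point):
    """Video side, raster mode: fetch the bitmap to draw directly."""
    return raster_font.get(code_point)
```

The only point of the sketch is that the 8-bit value appears in one
place, the text-mode renderer, and nowhere else in the path.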

The benefits, rather considerable I think, are as follows:

- keymaps for different languages don't need to depend on encodings as
they do now (most languages currently have two or more different
encoding schemes provided for in /usr/share/syscons/keymaps ; if Unicode
values are specified in keymaps, these all go away and only different
key layouts will require separate keymap files); in fact, kbdcontrol(1)
can then be written to be aware of the symbolic Unicode names, which
would then be used in keymap files, simplifying them greatly.

- screen fonts as well don't need to depend on encodings - they will map
Unicode symbols into screen shapes. The redundant screen font files
go away.

- in raster modes (SC_PIXEL_MODE on, etc.) more than 256 characters can
now be trivially drawn. In fact, different languages that previously
occupied the same codespace in 8-bit (i.e. all languages except English)
can now be displayed together in these modes. Maybe there are consequences
for scripts such as hiragana? Consider the convenience for users
of scripts with relatively many characters.

- /usr/share/syscons/scrnmaps goes away, this kludge being no longer needed.

- the road is wide open for Unicode support in userland, through UTF-8. 
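
To illustrate the keymap and UTF-8 points in the list above, here is a
minimal sketch of a keymap written with symbolic Unicode names. The file
format (a keycode followed by a character name) and the keycodes are
assumptions made up for this example; the real kbdcontrol(1) keymap
syntax is different. Python's unicodedata module stands in for whatever
name table kbdcontrol(1) would carry.

```python
import unicodedata

# Hypothetical keymap source: "<keycode> <Unicode character name>".
# This is NOT the existing kbdcontrol(1) keymap format.
KEYMAP_TEXT = """\
16 CYRILLIC SMALL LETTER SHORT I
17 CYRILLIC SMALL LETTER TSE
18 LATIN SMALL LETTER E
"""


def parse_keymap(text):
    """Return {keycode: code point}, resolving names to code points."""
    keymap = {}
    for line in text.splitlines():
        keycode, name = line.split(maxsplit=1)
        keymap[int(keycode)] = ord(unicodedata.lookup(name))
    return keymap


keymap = parse_keymap(KEYMAP_TEXT)

# What a UTF-8 userland would read from the tty for keycode 16.
utf8 = chr(keymap[16]).encode("utf-8")
```

Such a file describes only the key layout; no encoding appears anywhere
in it, and the same code point can be handed to userland as UTF-8.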

The drawbacks, as I see them, are as follows:

- The format of screen font files must be changed. They may no longer
describe consecutive character codes, and 8-bit indexed arrays
go away. One font file may now describe lots of languages at once.

- much more kernel memory used for font files if they are unified as
above and used as a whole. Some mechanism may be used for telling 
kbdcontrol(1) and friends which subset of the font to load (doing this 
strictly by user's LANG won't let him use several languages at once though).
In text modes, a mapping must be created to squeeze Unicode into 
the available 8-bit VGA font space, and if there isn't enough space, someone
must decide which Unicode chars to let go and convert into blanks -- 
syscons is the module which will be doing this job, and userland may 
tell it what the really important Unicode chars are based on the user's
LANG.

- some rendering routines are slowed down due to the fact that simple
8-bit array lookups are no longer available for getting characters'
information. This may be circumvented somewhat by smart searches/hash 
tables.
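
Two of the drawbacks above can be sketched together: a sparse font
looked up by binary search instead of an 8-bit array index, and the
squeezing of such a font into the 256 text-mode slots with the
"important" characters placed first. The class, the function and the
priority list are hypothetical; in particular, how userland would derive
the priority list from LANG is left open here.

```python
from bisect import bisect_left


class SparseFont:
    """Glyphs kept sorted by code point and found by binary search."""

    def __init__(self, glyphs):
        items = sorted(glyphs.items())
        self.code_points = [cp for cp, _ in items]
        self.bitmaps = [bitmap for _, bitmap in items]

    def lookup(self, code_point):
        i = bisect_left(self.code_points, code_point)
        if i < len(self.code_points) and self.code_points[i] == code_point:
            return self.bitmaps[i]
        return None  # no glyph for this character


def assign_slots(font, preferred, slots=256):
    """Map code points to 8-bit slots, preferred characters first.

    Characters that do not fit get no entry; the renderer would
    show them as blanks.
    """
    wanted = set(preferred)
    ordered = [cp for cp in preferred if font.lookup(cp) is not None]
    ordered += [cp for cp in font.code_points if cp not in wanted]
    table = {}
    for cp in ordered:
        if len(table) == slots:
            break
        table.setdefault(cp, len(table))
    return table
```

A hash table would serve equally well for the lookup; binary search over
a sorted array is used here only because it needs no extra memory beyond
the font itself.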

Implementation considerations:

- may be done in stages, which is good. For instance, keyboard driver
together with kbdcontrol(1) and keymap files may be modified at first,
with syscons translating Unicode codes into 8-bit using a translation
table conveyed to it by kbdcontrol(1) and handling video exactly
as before. Later video routines are changed, etc.

- kbd driver changes aren't significant in the kernel, mainly type changes
and the like (who else except syscons/pcvt is using the kbd driver?).

- in syscons, virtual buffer stuff, font support, and the VGA renderer
need to be significantly changed.

- in userland, data files in /usr/share/syscons need to be changed, 
kbdcontrol(1) and vidcontrol(1) need to adapt to that, and a method
for relaying to syscons the current Unicode<->8bit translation table
(so that userland programs won't feel anything) needs to be added.
The other alternative is to do that conversion in userland libraries
and make syscons completely Unicode.
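
The Unicode<->8bit translation table used in the first stage could look
like the sketch below. It builds the table from one of Python's
single-byte codecs, with KOI8-R chosen purely as an example; the
function names are made up, and how kbdcontrol(1) would actually pass
such a table to syscons is not addressed.

```python
def build_table(encoding):
    """Return {code point: 8-bit value} for a single-byte encoding."""
    table = {}
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the encoding
        table[ord(ch)] = byte
    return table


def to_8bit(code_points, table, fallback=ord("?")):
    """Translate code points from the keyboard into 8-bit codes."""
    return bytes(table.get(cp, fallback) for cp in code_points)


koi8_table = build_table("koi8_r")
```

With a table like this in place, userland programs keep seeing the same
8-bit codes as before, while the keymap files already contain only
Unicode values.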

What do you think?
-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





