Date: Sun, 2 Apr 2000 05:16:00 +0000
From: Anatoly Vorobey
To: hackers@freebsd.org
Cc: yokota@freebsd.org, peter@freebsd.org
Subject: Re: Unicode on FreeBSD
Message-ID: <20000402051559.A52041@happy.checkpoint.com>
In-Reply-To: <20000329033908.A14122@happy.checkpoint.com>

On Wed, Mar 29, 2000 at 03:39:08AM +0000, Anatoly Vorobey wrote:
> I wonder how useful it would be to teach syscons/kbd to handle Unicode.

No replies so far; let me try again. I was not trying to convey the
attitude "let it be done" in my message; I was rather hoping for a
reply of the kind "this would be a nice thing to have; if you do it,
I'll review it, and if it works right, I'll commit it". I believe that
what I am suggesting is a Good Thing, but if this belief is not shared
by others, there's not much sense in my trying to do it.

Thus I suggest teaching kbd/syscons/vga to use Unicode internally.
The picture would look as follows. A keymap specifies Unicode values
(rather than 8-bit values as now) for keycodes; the console driver
receives Unicode values from the keyboard driver.
On the video side, the console driver has a bunch of Unicode characters
(rather than 8-bit characters) to render; in text mode, it translates
them into 8-bit codes and puts them on the screen, the correct font
having been previously loaded; in raster mode, it uses the current font
to draw them directly on the screen.

The benefits, rather considerable I think, are as follows:

- keymaps for different languages don't need to depend on encodings as
  they do now (most languages currently have two or more different
  encoding schemes provided for in /usr/share/syscons/keymaps; if
  Unicode values are specified in keymaps, these all go away and only
  different key layouts will require separate keymap files); in fact,
  kbdcontrol(1) can then be made aware of symbolic Unicode names,
  which would then be used in keymap files, simplifying them greatly.

- screen fonts don't need to depend on encodings either: they will map
  Unicode symbols into screen shapes. The redundant screen font files
  go away.

- in raster modes (SC_PIXEL_MODE on, etc.) more than 256 characters can
  now be trivially drawn. In fact, different languages that previously
  occupied the same codespace in 8-bit (i.e. all languages except
  English) can now be displayed together in these modes. Maybe there
  are consequences for scripts such as hiragana etc.? Consider the
  convenience for users of scripts with relatively many characters.

- /usr/share/syscons/scrnmaps goes away, this kludge being no longer
  needed.

- the road is wide open for Unicode support in userland, through UTF-8.

The drawbacks, as I see them, are as follows:

- The format of screen font files must be changed. They may no longer
  describe consecutive character codes, and 8-bit indexed arrays go
  away. One font file may now describe lots of languages at once.

- much more kernel memory is used for font files if they are unified as
  above and used as a whole.
  Some mechanism may be used for telling kbdcontrol(1) and friends
  which subset of the font to load (doing this strictly by the user's
  LANG won't let him use several languages at once, though). In text
  modes, a mapping must be created to squeeze Unicode into the
  available 8-bit VGA font space, and if there isn't enough space,
  someone must decide which Unicode chars to let go and convert into
  blanks. syscons is the module which will be doing this job, and
  userland may tell it which Unicode chars are the really important
  ones, based on the user's LANG.

- some rendering routines are slowed down because simple 8-bit array
  lookups are no longer available for getting characters' information.
  This may be circumvented somewhat by smart searches/hash tables.

Implementation considerations:

- may be done in stages, which is good. For instance, the keyboard
  driver together with kbdcontrol(1) and the keymap files may be
  modified first, with syscons translating Unicode codes into 8-bit
  using a translation table conveyed to it by kbdcontrol(1) and
  handling video exactly as before. Later the video routines are
  changed, etc.

- kbd driver changes in the kernel aren't significant, mainly type
  changes and the like (who else except syscons/pcvt is using the kbd
  driver?).

- in syscons, the virtual buffer stuff, font support, and the VGA
  renderer need to be significantly changed.

- in userland, the data files in /usr/share/syscons need to be changed,
  kbdcontrol(1) and vidcontrol(1) need to adapt to that, and a method
  for relaying to syscons the current Unicode<->8-bit translation table
  (so that userland programs won't feel anything) needs to be added.
  The other alternative is to do that conversion in userland libraries
  and make syscons completely Unicode.

What do you think?
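To make the text-mode mapping a little more concrete, here is a rough
sketch in C of the kind of lookup syscons would need: a table, sorted
by code point, that says which of the 256 VGA font slots holds a given
Unicode character, searched by binary search, with anything unmapped
turning into a blank. None of this exists in syscons; struct uni_slot,
uni_to_slot(), SLOT_BLANK, demo_tab and the slot numbers in it are all
made up for illustration, and a real table would be built by userland
and handed to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: Unicode code point `ucs` is drawn with the glyph
 * loaded into slot `slot` of the 256-glyph VGA text-mode font. */
struct uni_slot {
	uint32_t ucs;	/* Unicode code point */
	uint8_t	 slot;	/* index into the loaded VGA font */
};

/* Slot used for characters that did not fit (assumed: ASCII space). */
#define SLOT_BLANK	0x20

/*
 * Map a Unicode code point to a VGA font slot.  `tab` must be sorted
 * by `ucs` in ascending order; at most 256 entries make sense because
 * that is all the text-mode font can hold.  Unmapped code points come
 * back as SLOT_BLANK.
 */
uint8_t
uni_to_slot(const struct uni_slot *tab, size_t n, uint32_t ucs)
{
	size_t lo = 0, hi = n;

	assert(n <= 256);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].ucs == ucs)
			return (tab[mid].slot);
		if (tab[mid].ucs < ucs)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (SLOT_BLANK);
}

/* Illustrative table only; the slot assignments are invented. */
const struct uni_slot demo_tab[] = {
	{ 0x0041, 0x41 },	/* LATIN CAPITAL LETTER A */
	{ 0x0410, 0x80 },	/* CYRILLIC CAPITAL LETTER A */
	{ 0x05D0, 0xE0 },	/* HEBREW LETTER ALEF */
};
#define DEMO_TAB_LEN	(sizeof(demo_tab) / sizeof(demo_tab[0]))
```

With this table, uni_to_slot(demo_tab, DEMO_TAB_LEN, 0x0410) gives
slot 0x80, while a character that was left out, say U+3042, comes back
as SLOT_BLANK. A lookup costs at most about eight comparisons for a
full 256-entry table, which is the sort of "smart search" I have in
mind for replacing the plain 8-bit array index.
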
-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton