Date:      Wed, 18 Sep 1996 02:50:41 -0700 (PDT)
From:      asami@freebsd.org (Satoshi Asami)
To:        doc@freebsd.org
Subject:   more on Japanese handbook
Message-ID:  <199609180950.CAA15480@silvia.HIP.Berkeley.EDU>

John,

What's the current status of the merging effort?  Please let us know
your thoughts on the directory structure.

By the way, there is one more thing you may want to consider re the
handbook encoding.

The Japanese encoding used in the handbook sources (EUC-JP) is good in
that many tools allow the Japanese part of it to pass through
untouched, but Netscape (and maybe other browsers) has an annoying
tendency to misjudge the encoding of some files and treat them as
Shift-JIS (the brain-damaged code nobody likes, but since NEC decided
to use it for their once-popular PC98 series, it's not dying anytime
soon).

Also, there is no language information in EUC, so if someone reads the
pages using Netscape with the language set to Chinese, it will show
something totally incoherent (when it should have ignored the text).

The optimal solution (at least as seen from the Japanese side) is
to convert the file into JIS just before it's written to whatever
output files sgmlfmt is creating (or is it instant now? :).  This is
really quite simple, since all we need to do is scan for a byte
with the eighth bit set and convert it, along with the following
bytes that have the eighth bit set, from something like

1[B1] 1[B2] ... 1[B2N-1] 1[B2N]

 to

Esc '$' 'B' 0[B1] 0[B2] ... 0[B2N-1] 0[B2N] Esc '(' 'B'

(1 means the eighth bit is set, 0 means it is cleared; [BX] is the
 lower 7 bits of the X-th byte)

The nkf program (ports/japanese/nkf) is one such filter, but it's
gross overkill, as it needs to deal with all three forms of input and
output (plus MIME and...).  Since we only need EUC->JIS conversion, it
can be done with a 10-line (or so) C program.
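
Something along these lines would probably do.  This is only a rough
sketch of the scheme described above, not a tested tool: the function
name euc_to_jis and the buffer-based interface are made up for
illustration, and it assumes the input contains only ASCII and
two-byte JIS X 0208 characters.  It does not handle the EUC-JP
single-shift bytes 0x8E (half-width katakana) and 0x8F (JIS X 0212),
which would need their own escape sequences.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert n bytes of EUC-JP text in `in` to JIS (Esc $ B ... Esc ( B)
 * in `out`.  Returns the number of bytes written.
 * Assumption: `out` has room for at least 4 * n + 3 bytes, which
 * covers the worst case of every high-bit byte forming its own run.
 */
size_t euc_to_jis(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;
    int kanji = 0;      /* nonzero while inside an Esc $ B run */

    assert(in != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        int high = (in[i] & 0x80) != 0;

        if (high && !kanji) {
            /* Entering a run of eighth-bit-set bytes: Esc $ B */
            out[o++] = 0x1b; out[o++] = '$'; out[o++] = 'B';
            kanji = 1;
        } else if (!high && kanji) {
            /* Leaving the run: back to ASCII with Esc ( B */
            out[o++] = 0x1b; out[o++] = '('; out[o++] = 'B';
            kanji = 0;
        }
        out[o++] = in[i] & 0x7f;    /* clear the eighth bit */
    }
    if (kanji) {
        /* Input ended inside a run, so close it */
        out[o++] = 0x1b; out[o++] = '('; out[o++] = 'B';
    }
    return o;
}
```

The state is not carried across calls, so the whole input has to be
passed in one buffer; a real filter would more likely keep the same
flag while looping over getchar()/putchar().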

What do you think?  By the way, I'm not sure what the user should set
${LANG} to when there might be both EUC and JIS on the system; would
it suffice to just say "ja_JP"?  (I'm mostly asking Mr. Hanai here,
I guess.)

Satoshi


