FreeBSD Mail Archives

Date:      Tue, 3 Sep 1996 17:06:41 -0700 (PDT)
From:      asami@freebsd.org (Satoshi Asami)
To:        jfieber@indiana.edu
Cc:        hanai@astec.co.jp, doc@freebsd.org, core@freebsd.org
Subject:   Re: Warning: SGML doc changes
Message-ID:  <199609040006.RAA09404@leia.cs.berkeley.edu>
In-Reply-To: <Pine.BSI.3.95.960903091755.3606E-100000@fallout.campusview.indiana.edu> (message from John Fieber on Tue, 3 Sep 1996 09:40:24 -0500 (EST))

 * It appears to handle the HTML transform, but look at
 :
 * to verify.

It looks ok but I only took a glance.  The only thing I could find was 
that the mailto: urls (look at the Core Team roster etc.) seem mangled 
but that may be because of the brokenness of the particular set of
files you grabbed (this thing is still being updated daily).

 * 	       The ascii and postscript transform fall over when
 * they hit groff.  What sort of tweaks does groff need to work? 
 * Are they the sort of tweaks that could be added to the FreeBSD's
 * groff? 

The tweaks that are in /usr/ports/japanese/groff. ;)

Also, I heard sgmlfmt needs to call groff with "-T nippon" instead of
"-T ascii".  (The modified groff works exactly like the original
unless it's called with "-T nippon".)

 * I know very little about multibyte character processing but I
 * suspect instant's handling is more accident than intentional. 
 * Instant doesn't really do much with the actual content.  However
 * it does a little more than sgmlsasp did so there is some
 * potential for mangling.  I'm about to go down to the library and
 * check out "Understanding Japanese Information Processing" (Publ.
 * O'Reilly) to educate myself on this matter. 

That might be overkill.  There are three major character encodings,
one stateful (JIS) and two stateless (EUC-Japanese and Shift-JIS).
The one we are using, EUC-J, just sets the eighth bit on both bytes
of two-byte characters.  Unless you want to write something that groks 
all three, it's probably not worth the effort to read a whole book for
this. ;)
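
To make the eighth-bit rule concrete, here is a minimal Python sketch
that splits an EUC-J byte string into one-byte and two-byte units.  The
function name split_euc_jp is made up for illustration, and the sketch
assumes only plain ASCII and ordinary two-byte characters; the
three-byte sequences that full EUC-JP allows are not handled.

```python
def split_euc_jp(data: bytes) -> list[bytes]:
    """Split EUC-J bytes into single characters (hypothetical helper).

    Assumption: input holds only ASCII and two-byte characters whose
    bytes both have the eighth bit set.
    """
    units = []
    i = 0
    while i < len(data):
        # A byte with the eighth bit set starts a two-byte character.
        if (data[i] & 0x80) and i + 1 < len(data):
            units.append(data[i:i + 2])
            i += 2
        else:
            # Plain ASCII byte (or a dangling lead byte at end of input).
            units.append(data[i:i + 1])
            i += 1
    return units


# "FreeBSD " followed by three Japanese characters, encoded as EUC-JP.
sample = "FreeBSD \u65e5\u672c\u8a9e".encode("euc_jp")
for unit in split_euc_jp(sample):
    print(unit.hex())
```

Since the encoding is stateless, nothing besides the current byte has
to be remembered while scanning forward from the start of the text.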

Other than inserting line breaks between Japanese characters if
necessary (the Japanese language doesn't use spaces that often), and
making sure that you don't cut the line between the two bytes that
compose a single character, there isn't really that much to worry
about if you're talking EUC.  (Of course, there is the issue of certain
characters not being permitted to appear at the beginning of lines, like
the Japanese equivalents of "," and ".", but many programs (like
netscape3) don't handle this anyway so people are used to seeing them
screwed up....)
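
The line-breaking concern can be sketched the same way.  The Python
function below (wrap_euc_jp is a made-up name) fills lines up to a byte
width and only ever breaks between whole characters, never between the
two bytes of one.  It is a simplification under stated assumptions: it
breaks ASCII text anywhere rather than at spaces, it handles only
two-byte characters, and it does not implement the rule about
punctuation at the beginning of a line.

```python
def wrap_euc_jp(data: bytes, width: int) -> list[bytes]:
    """Wrap EUC-J bytes into lines of at most `width` bytes.

    Hypothetical helper; assumes width >= 2 and input made of ASCII
    plus two-byte characters with the eighth bit set on both bytes.
    """
    lines = []
    current = b""
    i = 0
    while i < len(data):
        # Take a two-byte character as one indivisible unit.
        step = 2 if ((data[i] & 0x80) and i + 1 < len(data)) else 1
        unit = data[i:i + step]
        # Start a new line if this unit would not fit on the current one.
        if current and len(current) + len(unit) > width:
            lines.append(current)
            current = b""
        current += unit
        i += step
    if current:
        lines.append(current)
    return lines


text = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8".encode("euc_jp")
for line in wrap_euc_jp(text, 5):
    # Every line decodes cleanly because no character was cut in half.
    print(line.decode("euc_jp"))
```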

Satoshi

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199609040006.RAA09404>
