Date: Tue, 3 Sep 1996 17:06:41 -0700 (PDT)
From: asami@freebsd.org (Satoshi Asami)
To: jfieber@indiana.edu
Cc: hanai@astec.co.jp, doc@freebsd.org, core@freebsd.org
Subject: Re: Warning: SGML doc changes
Message-ID: <199609040006.RAA09404@leia.cs.berkeley.edu>
In-Reply-To: <Pine.BSI.3.95.960903091755.3606E-100000@fallout.campusview.indiana.edu> (message from John Fieber on Tue, 3 Sep 1996 09:40:24 -0500 (EST))
 * It appears to handle the HTML transform, but look at :
 * to verify.

It looks ok but I only took a glance. The only thing I could find was that the mailto: URLs (look at the Core Team roster etc.) seem mangled, but that may be because of the brokenness of the particular set of files you grabbed (this thing is still being updated daily).

 * The ascii and postscript transform fall over when
 * they hit groff. What sort of tweaks does groff need to work?
 * Are they the sort of tweaks that could be added to the FreeBSD's
 * groff?

The tweaks that are in /usr/ports/japanese/groff. ;) Also, I heard sgmlfmt needs to call groff with "-T nippon" instead of "-T ascii". (The modified groff works exactly like the original unless it's called with "-T nippon".)

 * I know very little about multibyte character processing but I
 * suspect instant's handling is more accident than intentional.
 * Instant doesn't really do much with the actual content. However
 * it does a little more than sgmlsasp did so there is some
 * potential for mangling. I'm about to go down to the library and
 * check out "Understanding Japanese Information Processing" (Publ.
 * O'Reilly) to educate myself on this matter.

That might be overkill. There are three major character encodings, one stateful (JIS) and two stateless (EUC-Japanese and Shift-JIS). The one we are using, EUC-J, just sets the eighth bit on both bytes of two-byte characters. Unless you want to write something that groks all three, it's probably not worth the effort to read a whole book for this. ;)

Other than inserting line breaks between Japanese characters if necessary (the Japanese language doesn't use spaces that often), and making sure that you don't cut the line between the two bytes that compose a single character, there isn't really that much to worry about if you're talking EUC.

(Of course, there is also the issue of certain characters not being permitted to appear at the beginning of lines, like the Japanese equivalents of "," and ".", but many programs (like netscape3) don't handle this anyway, so people are used to seeing them screwed up....)

Satoshi