Date: Wed, 8 Mar 2000 14:51:58 +0300
From: "Andrey A. Chernov"
To: Hiroki Sato
Cc: phantom@FreeBSD.ORG, doc@FreeBSD.ORG
Subject: Re: SGML->HTML: entities translation is broken for non-Latin1 charsets
Message-ID: <20000308145158.A7844@nagual.pp.ru>
References: <20000306003545.A90564@nagual.pp.ru> <20000305151810.A200@scorpion.crimea.ua> <20000306130945.A92757@nagual.pp.ru> <20000305203633.A89852@nagual.pp.ru> <20000306130945.A92757@nagual.pp.ru> <200003081024.TAA24457@mail.geocities.co.jp>
In-Reply-To: <200003081024.TAA24457@mail.geocities.co.jp>; from hrs@geocities.co.jp on Wed, Mar 08, 2000 at 07:20:36PM +0900
Organization: Biomechanoid

On Wed, Mar 08, 2000 at 07:20:36PM +0900, Hiroki Sato wrote:
> the last conversion of >127 characters, so &lt; is output as
> an entity &lt;, but &nbsp; is output as raw code #160.

As I already wrote, the &nbsp; -> &#160; conversion is still valid per
the HTML specs, since numeric entities are interpreted using Unicode,
not the local charset. But not all browsers implement it properly :-(

The &nbsp; -> \xA0 conversion is invalid per the standard, since the
binary byte \xA0 is interpreted according to the local charset.

> This problem is unavoidable as long as we use the current version
> of tidy. We can build doc with NO_TIDY flag to avoid the problem
> tentatively (actually do so now in Japanese-doc), but I personally
> don't think this is a reasonable way.
>
> To tell the truth, this was pointed out and submitted a patch to
> fix it by Kuriyama-san before. It seemed that tidy developers
> didn't think it an important issue.

We need to distinguish between short-term and long-term solutions.
A short-term solution is any workaround that unbreaks the non-Latin1
www and docs right now. A long-term solution is either local patches
to the utilities or contacting their maintainers.

So, if you think that the only way to unbreak the FAQ right now is
NO_TIDY, it must be applied regardless of any services or features
lost in this step. First of all, things must either not be broken or
not be built at all. All other enhancements are optional.

-- 
Andrey A. Chernov
http://nagual.pp.ru/~ache/
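
A small illustration of the entity point made in this message, as a
sketch only: it uses Python's standard html module and built-in codecs,
and it takes KOI8-R as an example of a non-Latin1 charset (the thread
does not name a specific charset, so that choice is an assumption). The
helper names resolve_reference and decode_raw_byte are hypothetical. It
shows that a character reference resolves through Unicode regardless of
the page charset, while the raw byte \xA0 means different things under
different charsets.

```python
import html

# Unicode NO-BREAK SPACE, code point 160 (0xA0).
NBSP = "\u00a0"


def resolve_reference(ref: str) -> str:
    """Resolve an HTML character reference via Unicode, independent of charset."""
    return html.unescape(ref)


def decode_raw_byte(raw: bytes, charset: str) -> str:
    """Decode raw bytes according to the page's declared charset."""
    return raw.decode(charset)


if __name__ == "__main__":
    # Both the named and the numeric reference denote the same Unicode character.
    print(resolve_reference("&nbsp;") == NBSP)
    print(resolve_reference("&#160;") == NBSP)

    # The raw byte 0xA0 is a no-break space only when the charset is Latin-1.
    print(decode_raw_byte(b"\xa0", "latin-1") == NBSP)

    # Under KOI8-R (assumed example charset) the same byte is a different character.
    print(decode_raw_byte(b"\xa0", "koi8_r") == NBSP)
```

Under these assumptions, emitting the numeric reference keeps the
output charset-independent, whereas emitting the raw byte is only
correct for Latin-1 pages.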