From: Hiroki Sato
To: wblock@wonkity.com
Cc: freebsd-doc@FreeBSD.org
Date: Thu, 19 Jan 2012 08:44:34 +0900 (JST)
Message-Id: <20120119.084434.926306642968660094.hrs@allbsd.org>
Subject: Re: Tidy and HTML tab spacing

Warren Block wrote:
wb> HTML versions of FreeBSD documents are fed through tidy (www/tidy or
wb> www/tidy-devel) for cleanup.  There's a bug in tidy[1] that can cause
wb> tab stops to be wrong:
wb>
wb> http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
wb>
wb> Note how DISTNAME and EXTRACT_SUFX do not line up.  They are correct
wb> in the source book.sgml.
wb>
wb> So what to do?  I lean to fixing Tidy if possible.

 The reason we use Tidy is to fix the markup in the rendered output of
 various tools like Jade, not (only) to make it human-readable.  The
 output of Tidy is still not perfect from the viewpoint of standards
 conformance, but it is better than nothing, even if most modern web
 browsers can handle the rendered HTML directly.  It is known that
 there are some problems with entity dereferencing and white-space
 handling, as you also pointed out.

wb> 3. Tidy could be replaced with some other tool.  However, the others

 I tried xmlindent, xmlformat, and xmllint as replacements in the
 past, but they were intended for well-formed XML documents and were
 not sufficient for fixing malformed (sometimes broken) markup.

wb> 4. Add newlines to the HTML in the build process before it gets to
wb> tidy:
wb> s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

 I think this will break the results because a newline just after ">"
 is recognized as CDATA.

wb> 5. Don't tidy HTML files at all (suggested as an option by Benedict
wb> Reuschling).  The unprocessed HTML is ugly, but few people are going
wb> to look at it directly.
wb> Files that haven't been through tidy are a
wb> little larger, about 4% in the case of the Porter's Handbook.

 To eliminate Tidy we would have to improve the standards conformance
 of the rendered results.  I do not know the current situation
 precisely because I investigated it seven years ago, but I think it
 still has some glitches.

-- Hiroki
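
 The concern raised above about option 4 can be illustrated with a
 minimal sketch.  It is not part of the FreeBSD doc build; the sample
 fragment, the helper names (move_newline, pre_content), and the use
 of Python are all assumptions made for illustration.  It applies the
 quoted substitution to a Jade-style fragment and compares the raw
 character data between the PRE tags before and after.

```python
import re

# Hypothetical fragment in the style Jade emits: the line break sits
# inside the start tag, so it is markup rather than element content.
jade_html = (
    '<PRE\nCLASS="PROGRAMLISTING"\n>'
    'DISTNAME=\tfoo\nEXTRACT_SUFX=\t.tgz'
    '</PRE\n>'
)


def move_newline(html: str) -> str:
    """Apply the substitution quoted in option 4: move the newline
    from just before '>' to just after it."""
    return re.sub(r'CLASS="PROGRAMLISTING"\n>',
                  'CLASS="PROGRAMLISTING">\n', html)


def pre_content(html: str) -> str:
    """Return the raw characters between the PRE start and end tags."""
    match = re.search(r'<PRE[^>]*>(.*?)</PRE\s*>', html, flags=re.DOTALL)
    if match is None:
        raise ValueError("no PRE element found")
    return match.group(1)


before = pre_content(jade_html)
after = pre_content(move_newline(jade_html))

# Show the first few characters of the content in each case.
print(repr(before[:9]))
print(repr(after[:9]))
```

 In this sketch the content after the substitution is the original
 content with one extra leading newline, which is the character data
 the reply warns about.  Whether a given parser or browser later
 discards that leading newline is a separate question that the sketch
 does not address.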