From: Warren Block
To: freebsd-doc@freebsd.org
Date: Wed, 18 Jan 2012 15:49:48 -0700 (MST)
Subject: Re: Tidy and HTML tab spacing

HTML versions of FreeBSD documents are fed through tidy (www/tidy or
www/tidy-devel) for cleanup. There's a bug in tidy[1] that can cause
tab stops to be wrong:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623

Note how DISTNAME and EXTRACT_SUFX do not line up. They are correct in
the source book.sgml.

So what to do?

1. It might be possible to fix tidy. This would be the neatest.
   (See [1]).

2. An option could be added to tidy to ignore tabs. The HTML standard
   "strongly discourages" tabs in PRE elements[2], but does not
   disallow them. Using actual tabs has an added benefit to the user
   in that they could cut-and-paste or just drag-select Makefile
   examples to see embedded tabs.

3. Tidy could be replaced with some other tool. However, the others
   I've found have additional dependencies on either PHP or Java, so I
   did not test them for correct handling of tabs[3],[4]. Either one
   adds some overhead not just for doc build machines but anyone who
   wants to work on FreeBSD documentation.

4. Add newlines to the HTML in the build process before it gets to
   tidy:

     s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

5. Don't tidy HTML files at all (suggested as an option by Benedict
   Reuschling). The unprocessed HTML is ugly, but few people are going
   to look at it directly. Files that haven't been through tidy are a
   little larger, about 4% in the case of the Porter's Handbook.

Footnotes:

[1] In www/tidy-devel, line 355 of streamio.c does not realize that
characters at the beginning of the line may be inside a tag and should
not count as visible. The pre-tidy HTML output of the example above is

----
DISTNAME=	foo
EXTRACT_SUFX=	.tgz
----

The '>' before DISTNAME is being wrongly counted toward the tab stop.
See http://www.wonkity.com/~wblock/tidy/ for a slightly more detailed
example.

Tidy is mature software, and there's been a bug report for this
problem in the bug database since 2008:

https://sourceforge.net/tracker/?func=detail&aid=1885471&group_id=27659&atid=390963

So bug fixes in this area from the tidy project are unlikely.

[2] http://www.w3.org/TR/html401/struct/text.html#edef-PRE
[3] http://htmlpurifier.org/
[4] http://htmlcleaner.sourceforge.net/index.php
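
For anyone who wants to see the off-by-one concretely, here is a rough Python sketch of the effect described in [1] and of the substitution from option 4. It is only a simplified model, not tidy's actual code: it assumes 8-column tab stops, assumes tabs are expanded to spaces, and assumes the tag in the pre-tidy HTML is a PRE element whose closing '>' sits at the start of the first content line. The function names are made up for illustration.

```python
import re

TAB_WIDTH = 8  # assumption: tab stops every 8 columns


def expand_tabs(line, start_col=0, tab_width=TAB_WIDTH):
    """Expand tabs to spaces, pretending start_col columns are already used.

    start_col=1 models the behavior described in footnote [1], where the
    '>' that closes the tag is counted as a visible character.
    """
    col = start_col
    out = []
    for ch in line:
        if ch == "\t":
            spaces = tab_width - (col % tab_width)
            out.append(" " * spaces)
            col += spaces
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def value_column(line, value, start_col=0):
    """Return the visible column where value starts after tab expansion."""
    return expand_tabs(line, start_col).index(value)


def move_newline_after_tag(html):
    """Option 4: move the newline from before the tag's '>' to after it."""
    return re.sub(r'CLASS="PROGRAMLISTING"\n>',
                  'CLASS="PROGRAMLISTING">\n', html)


if __name__ == "__main__":
    # Correct expansion: both values start in the same column.
    print(value_column("DISTNAME=\tfoo", "foo"))
    print(value_column("EXTRACT_SUFX=\t.tgz", ".tgz"))
    # With the '>' counted on the first line, "foo" lands one column early.
    print(value_column("DISTNAME=\tfoo", "foo", start_col=1))
    # After the substitution, the content line no longer starts with '>'.
    before = '<PRE CLASS="PROGRAMLISTING"\n>DISTNAME=\tfoo'  # assumed markup
    print(repr(move_newline_after_tag(before)))
```

In this model the first line's value lands one column to the left of the second line's value whenever the '>' is counted, which matches the misalignment on the rendered page. Moving the newline puts the '>' on the tag's own line, so every content line starts at column zero.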