Date:      Thu, 19 Jan 2012 08:44:34 +0900 (JST)
From:      Hiroki Sato <hrs@FreeBSD.org>
To:        wblock@wonkity.com
Cc:        freebsd-doc@FreeBSD.org
Subject:   Re: Tidy and HTML tab spacing
Message-ID:  <20120119.084434.926306642968660094.hrs@allbsd.org>
In-Reply-To: <alpine.BSF.2.00.1201181520140.40712@wonkity.com>
References:  <alpine.BSF.2.00.1201181255210.39534@wonkity.com> <alpine.BSF.2.00.1201181520140.40712@wonkity.com>


Warren Block <wblock@wonkity.com> wrote
  in <alpine.BSF.2.00.1201181520140.40712@wonkity.com>:

wb> HTML versions of FreeBSD documents are fed through tidy (www/tidy or
wb> www/tidy-devel) for cleanup.  There's a bug in tidy[1] that can cause
wb> tab stops to be wrong:
wb> http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook/makefile-distfiles.html#AEN1623
wb>
wb> Note how DISTNAME and EXTRACT_SUFX do not line up.  They are correct
wb> in the source book.sgml.
wb>
wb> So what to do?

 I lean toward fixing Tidy if possible.  The reason we are using Tidy
 is to fix the markup in rendered results from various tools like
 Jade, not (only) for human readability.  The results of Tidy are
 still not perfect from the viewpoint of standards conformance, but
 they are better than nothing, even if most modern web browsers can
 handle the rendered HTML directly.

 It is known that there are some problems with entity dereferencing
 and white-space handling, as you also pointed out.

wb> 3. Tidy could be replaced with some other tool.  However, the others

 Although I tried xmlindent, xmlformat, and xmllint as replacements
 in the past, they were intended for well-formed XML documents and
 were not sufficient for fixing malformed (sometimes broken) markup.
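
 As a rough illustration of why XML-oriented tools fall short here,
 the sketch below uses Python's xml.etree.ElementTree as a stand-in
 for a strict XML parser (it is not one of the tools named above).
 The sample strings are made up for the example; the second imitates
 the kind of uppercase, unquoted, unclosed markup that older SGML
 tool chains can emit.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, not taken from the FreeBSD doc build.
well_formed = '<p class="x">one<br/>two</p>'
tag_soup = '<P CLASS=x>one<BR>two</P>'  # unquoted attribute, unclosed <BR>

print(is_well_formed(well_formed))
print(is_well_formed(tag_soup))
```

 A formatter built on such a parser stops at the first error instead
 of repairing the input, which is the gap Tidy fills.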

wb> 4. Add newlines to the HTML in the build process before it gets to
wb>    tidy:
wb>      s/CLASS="PROGRAMLISTING"\n>/CLASS="PROGRAMLISTING">\n/

 I think this will break the results because a newline just after ">"
 is recognized as CDATA.
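
 A minimal sketch of that concern, assuming Python's html.parser as
 a stand-in tokenizer (browsers and SGML parsers may treat a newline
 directly after a start tag differently).  The sample line and the
 helper names are hypothetical; the substitution is the one quoted
 above.  A newline inside the tag is not character data, but after
 the substitution it becomes part of the element's text.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every run of character data the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_of(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical input: the newline sits inside the start tag, before ">".
before = '<PRE CLASS="PROGRAMLISTING"\n>DISTNAME</PRE>'

# The proposed substitution moves the newline to just after ">".
after = re.sub(r'CLASS="PROGRAMLISTING"\n>',
               'CLASS="PROGRAMLISTING">\n',
               before)

print(repr(text_of(before)))
print(repr(text_of(after)))
```

 With this tokenizer the second result carries a leading newline
 that the first does not, so the listing's content changes.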

wb> 5. Don't tidy HTML files at all (suggested as an option by Benedict
wb>    Reuschling).  The unprocessed HTML is ugly, but few people are going
wb>    to look at it directly.  Files that haven't been through tidy are a
wb>    little larger, about 4% in the case of the Porter's Handbook.

 To eliminate Tidy we have to improve the standards conformance of
 the rendered results.  I do not know the current situation precisely
 because I investigated it seven years ago, but I think the rendered
 output still has some glitches.

-- Hiroki
