Date:      Sat, 21 Apr 2012 10:12:56 +0200
From:      Matthias Apitz <>
To:        Matthew Seaman <>
Subject:   Re: converting UTF-8 to HTML
Message-ID:  <20120421081256.GA8769@tinyCurrent>
In-Reply-To: <>
References:  <20120421055823.GA6788@tinyCurrent> <>

On Saturday, April 21, 2012 at 07:34:44AM +0100, Matthew Seaman wrote:

> www/tidy-devel
> (which is effectively a fork of the original www/tidy project, and has
> quite a lot of new functionality)
> If you specify 'ascii' for the output format, it should generate
> appropriate character escapes.

Thanks; it works fine if one specifies utf8 for input and ascii for
output in a config file .tidy like:

$ cat .tidy
output-xhtml: yes
add-xml-decl: no
doctype: strict
input-encoding: utf8
output-encoding: ascii
indent: auto
wrap: 76
repeated-attributes: keep-last
error-file: errs.txt

Then you can run tidy with that config and get valid, ASCII-only HTML, for example:

$ echo 'ΜΙΣΟ ΛΙΤΡΟ ΑΘΩΣ ΚΟΚΚΙΝΟ ΠΑΡΑΚΑΛΩ' | tidy -config .tidy 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

<html xmlns="">
  <meta name="generator" content=
  "HTML Tidy for FreeBSD (vers 7 December 2008), see" />


  &Mu;&Iota;&Sigma;&Omicron; &Lambda;&Iota;&Tau;&Rho;&Omicron;

This is exactly what I was looking for. Thanks
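
For anyone who cannot install tidy, here is a minimal Python sketch of the same kind of escaping. It is not how tidy does it internally; it only mimics the output shown above, assuming the HTML 4 named entities from the standard html.entities module and numeric references for everything else. The function name to_ascii_html is made up for this example.

```python
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Return text with markup characters and non-ASCII characters escaped.

    Hypothetical helper, not part of tidy: it uses a named entity when
    HTML 4 defines one and a decimal numeric reference otherwise.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "&<>":
            # Markup-significant ASCII characters always get escaped.
            out.append(f"&{codepoint2name[cp]};")
        elif cp < 128:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. U+039C -> &Mu;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No HTML 4 name: fall back to a numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)


print(to_ascii_html("ΜΙΣΟ ΛΙΤΡΟ"))
```

For the first two words of the test string this prints the same entities as the tidy output above. Unlike tidy, it does no HTML parsing or cleanup, so it is only suitable for plain text fragments.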

Matthias Apitz
t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211
e <> - w
UNIX since V7 on PDP-11 | UNIX on mainframe since ESER 1055 (IBM /370)
UNIX on x86 since SVR4.2 UnixWare 2.1.2 | FreeBSD since 2.2.5
