From: Matthias Apitz
To: Matthew Seaman
Cc: freebsd-questions@freebsd.org
Date: Sat, 21 Apr 2012 10:12:56 +0200
Subject: Re: converting UTF-8 to HTML

On Saturday, April 21, 2012 at 07:34:44AM +0100, Matthew Seaman wrote:

> www/tidy-devel
>
> (which is effectively a fork of the original www/tidy project, and has
> quite a lot of new functionality)
>
> If you specify 'ascii' for the output format, it should generate
> appropriate character escapes.

Thanks; it works fine if one specifies utf8 for input and ascii for
output in a config file .tidy like:

$ cat .tidy
output-xhtml: yes
add-xml-decl: no
doctype: strict
input-encoding: utf8
output-encoding: ascii
indent: auto
wrap: 76
repeated-attributes: keep-last
error-file: errs.txt

Then you can run tidy and get valid, ASCII-only HTML, for example:

$ echo 'ΜΙΣΟ ΛΙΤΡΟ ΑΘΩΣ ΚΟΚΚΙΝΟ ΠΑΡΑΚΑΛΩ' | tidy -config .tidy
ΜΙΣΟ ΛΙΤΡΟ ΑΘΩΣ ΚΟΚΚΙΝΟ ΠΑΡΑΚΑΛΩ

This is exactly what I was looking for.

Thanks

	matthias
-- 
Matthias Apitz
t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211
e - w http://www.unixarea.de/
UNIX since V7 on PDP-11 | UNIX on mainframe since ESER 1055 (IBM /370)
UNIX on x86 since SVR4.2 UnixWare 2.1.2 | FreeBSD since 2.2.5
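
For comparison, the same kind of UTF-8 to ASCII escaping can be sketched
in a few lines of Python. This is only an illustration of the idea, not
what tidy does internally: the helper name to_ascii_html is made up for
this example, it always emits numeric character references (&#NNN;), and
it does not build or validate a full HTML document the way tidy does.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return `text` as ASCII-only HTML text content.

    Hypothetical helper for illustration: markup characters (&, <, >)
    are escaped first, then every non-ASCII character is replaced by a
    numeric character reference such as &#924;.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Prints: &#924;&#921;&#931;&#927; &#923;&#921;&#932;&#929;&#927;
    print(to_ascii_html("ΜΙΣΟ ΛΙΤΡΟ"))
```

Plain ASCII input passes through unchanged apart from the escaping of
&, < and >, so the function is safe to apply to mixed text.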