Date:      Sun, 13 May 2007 15:20:09 GMT
From:      Jeroen Ruigrok van der Werven <asmodai@in-nomine.org>
To:        freebsd-doc@FreeBSD.org
Subject:   Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Message-ID:  <200705131520.l4DFK9nS000569@freefall.freebsd.org>

The following reply was made to PR docs/50211; it has been noted by GNATS.

From: Jeroen Ruigrok van der Werven <asmodai@in-nomine.org>
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Date: Sun, 13 May 2007 16:59:23 +0200

 A long-overdue update, I guess.
 
 Neither links nor elinks will help for the multibyte environments of Chinese,
 Japanese, Korean and the like. They simply do not understand encodings such
 as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.
 
 Using www/w3m-m17n I can at least view Japanese pages.
 Running 'w3m -dump http://website > dump.txt' on an EucJP-encoded page
 produces a UTF-8 encoded plain text file.
 
 The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
 (Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), VISCII (Vietnamese), and
 KOI8-U (Russian).
 
 I also tried some ISO-8859 dumps (8859-6 and 8859-7, for example), and they
 all work fine.
 
 So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.
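 
 As a rough sketch of what that change amounts to, the shell wrapper below
 calls whatever HTML2TXT points at with -dump, and adds a small check that the
 result decodes as UTF-8. Only HTML2TXT and 'w3m -dump' come from this mail;
 HTML2TXTOPTS, the default values, and the helper names html2txt and is_utf8
 are assumptions made for illustration, not the contents of doc.docbook.mk.

```shell
#!/bin/sh
# Sketch only. HTML2TXT is the variable named in this mail; HTML2TXTOPTS and
# both default values are assumptions, not copied from doc.docbook.mk.
HTML2TXT="${HTML2TXT:-w3m}"
HTML2TXTOPTS="${HTML2TXTOPTS:--dump}"

# html2txt INPUT: write a plain text rendering of INPUT (file or URL) to stdout.
html2txt() {
    # HTML2TXTOPTS is deliberately unquoted so that multiple options split.
    "$HTML2TXT" $HTML2TXTOPTS "$1"
}

# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# Relies on iconv exiting non-zero when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example, assuming w3m from www/w3m-m17n is installed:
#   html2txt http://website > dump.txt && is_utf8 dump.txt && echo "UTF-8 OK"
```

 Keeping the converter in a variable means links, elinks and w3m can be
 swapped without touching the callers.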
 
 -- 
 Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
 イェルーン ラウフロック ヴァン デル ウェルヴェン
 http://www.in-nomine.org/ | http://www.rangaku.org/
 Reality is an illusion, grimmer. The dreamlands are like masks within
 masks, and Time has no dominion beyond the Shroud...


