Date:      Mon, 24 Mar 2003 08:43:58 +0100
From:      Jeroen Ruigrok/asmodai <asmodai@wxs.nl>
To:        Giorgos Keramidas <keramida@ceid.upatras.gr>
Cc:        freebsd-doc@FreeBSD.ORG
Subject:   Re: docs/50211: [PATCH] Fix textfile creation
Message-ID:  <20030324074358.GL87781@nexus.ninth-circle.org>
In-Reply-To: <20030324024026.GA23139@gothmog.gr>
References:  <200303231710.h2NHAGEb024196@freefall.freebsd.org> <20030324020745.GA22656@gothmog.gr> <20030324024026.GA23139@gothmog.gr>

-On [20030324 04:02], Giorgos Keramidas (keramida@ceid.upatras.gr) wrote:
>Hmmm, now that I think about this reply a bit more, it might sound
>like I'm being negative just for the sake of it.  I'm not.

I wasn't even assuming that. :)

>I have been using w3m for producing text versions of the few Greek
>documents I managed to write so far.  It just works.  No special
>tweaking of ~/.w3m needed, no strange conversions done to 8-bit text.

The problem with w3m is that it doesn't format quite as well as (e)links
does (or did).  w3m breaks lines at awkward points, for instance in the
middle of <help@tendra.org> in the example below, whereas elinks does it
right.

Example:

w3m -dump -T text/html -cols 78:

For questions about TenDRA, read the documentation before contacting <    
                           help@tendra.org>.                              

elinks -dump -dump-width 78:

For questions about TenDRA, read the documentation before contacting
                         <help@tendra.org>.

lynx -dump -force_html -width=78:

For questions about TenDRA, read the [5]documentation before
              contacting <[6]help@tendra.org>.

References

[...]
   6. mailto:help@tendra.org

I need to tell w3m and lynx explicitly that the input is HTML (-T
text/html and -force_html above), since they apparently go by the
filename extension, which is not always .html but can also be
.html-text.
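
A minimal sketch of how the three invocations above could be compared
side by side, assuming Python is at hand.  The flags are the ones quoted
in this message; the helper names dump_commands and dump_all and the file
name article.html-text are hypothetical, for illustration only.  Output
is kept as raw bytes so that 8-bit text is not re-encoded along the way.

```python
import shutil
import subprocess

WIDTH = 78  # same width as in the examples above


def dump_commands(path, width=WIDTH):
    """Build the dump command line for each browser.

    w3m and lynx get an explicit "this is HTML" flag, so the result
    does not depend on the file extension (.html vs .html-text).
    """
    return {
        "w3m": ["w3m", "-dump", "-T", "text/html", "-cols", str(width), path],
        "elinks": ["elinks", "-dump", "-dump-width", str(width), path],
        "lynx": ["lynx", "-dump", "-force_html", f"-width={width}", path],
    }


def dump_all(path, width=WIDTH):
    """Run every browser that is installed; return {name: raw output bytes}."""
    results = {}
    for name, cmd in dump_commands(path, width).items():
        if shutil.which(name) is None:
            continue  # browser not installed, skip it
        proc = subprocess.run(cmd, capture_output=True, check=False)
        results[name] = proc.stdout
    return results


if __name__ == "__main__":
    # Hypothetical input file; replace with a real .html-text file.
    for name, output in dump_all("article.html-text").items():
        print(f"== {name} ==")
        print(output.decode("latin-1"))
```

Decoding as latin-1 in the final print is only there so that any byte
sequence can be shown; it says nothing about the real encoding of the
document.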

>If anyone is interested in testing this with a full doc and/or web
>build on i386, alpha and sparc64 I'd be very glad to commit it and
>update textproc/docproj to depend on w3m instead of links.

I am not convinced either one is the best solution thus far.  I am going
to hack elinks a bit to properly parse the Content-Type.  Funnily
enough, though, it ``translated'' the 8-bit Greek from a page into
latin-1 and I recognised Greek words. :)

I still need to test Korean, Japanese, Chinese, Taiwanese, Russian,
some European languages, and Hindi to get a good grip on what is and
what is not supported.

Btw, I truly think Unix sucks hard when it comes to l10n and i18n.  Been
trying to get my aterm to display Greek for the past hour or so.

-- 
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / a capoeirista
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7  9D88 97E6 839B 2EAC 625B
http://www.tendra.org/   | http://www.in-nomine.org/~asmodai/diary/
If you would thoroughly know anything, teach it to others...




