Date:      Mon, 24 Mar 2003 21:28:04 +0200
From:      Giorgos Keramidas <keramida@ceid.upatras.gr>
To:        Jeroen Ruigrok/asmodai <asmodai@wxs.nl>
Cc:        freebsd-doc@FreeBSD.ORG
Subject:   Re: docs/50211: [PATCH] Fix textfile creation
Message-ID:  <20030324192804.GA26996@gothmog.gr>
In-Reply-To: <20030324074358.GL87781@nexus.ninth-circle.org>
References:  <200303231710.h2NHAGEb024196@freefall.freebsd.org> <20030324020745.GA22656@gothmog.gr> <20030324024026.GA23139@gothmog.gr> <20030324074358.GL87781@nexus.ninth-circle.org>

On 2003-03-24 08:43, Jeroen Ruigrok/asmodai <asmodai@wxs.nl> wrote:
>-On [20030324 04:02], Giorgos Keramidas (keramida@ceid.upatras.gr) wrote:
>> I have been using w3m for producing text versions of the few Greek
>> documents I managed to write so far.  It just works.  No special
>> tweaking of ~/.w3m needed, no strange conversions done to 8-bit text.
>
> Problem with w3m is that it doesn't format quite as well as (e)links
> does (or did).
> w3m cuts text off at certain points, whereas elinks does it right.
>
> Example:
>
> w3m -dump -T text/html -cols 78:
>
> For questions about TenDRA, read the documentation before contacting <
>                            help@tendra.org>.

Ah, yes.  This is why I haven't been too persistent about switching to
w3m for everything.

> I need to add the explicit recognition of the HTML to w3m and lynx since
> they apparently look at the extension of the filename, which is not
> always .html, but also .html-text.

w3m seems to work fairly well with -T text/html, fwiw.
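
As a rough sketch of how a build script might wrap this, the snippet
below assembles the same w3m command line quoted above.  The function
names w3m_dump_argv and dump_html are made up for illustration; only the
-dump, -T text/html and -cols flags come from this thread.

```python
import subprocess


def w3m_dump_argv(path, cols=78):
    """Build the w3m command line quoted earlier in this thread."""
    # -T text/html forces the content type, so the file extension
    # (.html, .html-text, ...) no longer matters.
    return ["w3m", "-dump", "-T", "text/html", "-cols", str(cols), path]


def dump_html(path, cols=78):
    """Run w3m and return its output as raw, undecoded bytes."""
    # Assumes w3m is installed and on PATH; raises CalledProcessError
    # if w3m exits with a non-zero status.
    result = subprocess.run(w3m_dump_argv(path, cols),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Returning bytes rather than a decoded string is deliberate: the script
does not have to guess the charset of the dumped text.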

> I am not convinced either one is the best solution thus far.  I am
> going to hack elinks a bit to properly parse the Content-Type.
> Funnily though, it ``translated'' the 8-bit Greek from a page into
> latin-1 and I recognised Greek words. :)

It's a bit early to be certain how well it will work; I need to use it
for a while and read the documentation or source more carefully.  What you
describe seems to be a result of the notion elinks has of "output
terminals".  Is there some way of forcing elinks to use a "dumb"
terminal which we can set to ISO-8859-7 or whatever else with command
line options when -dump is used?

It does recognise Greek text but uses 7-bit approximations for the
output characters.  For instance:

	Ellhnik'o ke'imeno.

which is "Greek text" in 7-bit approximations of ISO-8859-7 Greek.

This is not enough though.  For European texts we need a browser that
can -dump 8-bit text without doing funky things with the characters.
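
To make the difference concrete, here is a small sketch that contrasts
the two behaviours.  The APPROX table is a hypothetical stand-in that
only covers the letters of the sample above, not the table elinks
really uses, and dump_8bit_clean is an illustrative name for what an
8-bit clean dump should amount to.

```python
# Hypothetical transliteration table; covers only the sample text.
APPROX = {
    "Ε": "E", "λ": "l", "η": "h", "ν": "n", "ι": "i", "κ": "k",
    "ό": "'o", "ε": "e", "ί": "'i", "μ": "m", "ο": "o",
}


def approximate_7bit(text):
    """Replace each Greek letter with an ASCII approximation."""
    return "".join(APPROX.get(ch, ch) for ch in text)


def dump_8bit_clean(raw, charset="iso8859_7"):
    """Decode and re-encode in the same charset, leaving bytes intact."""
    return raw.decode(charset).encode(charset)


sample = "Ελληνικό κείμενο."
raw = sample.encode("iso8859_7")   # bytes as they appear in the 8-bit page

lossy = approximate_7bit(raw.decode("iso8859_7"))
clean = dump_8bit_clean(raw)
```

Here lossy is plain ASCII and cannot be turned back into the original
reliably, while clean is byte-for-byte identical to the input.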

I can't speak for Chinese, Japanese, Hangul or any other language that
uses wide characters or Unicode, so I'll leave this to more
experienced people who actually use those languages and encodings.

> Btw, I truly think Unix sucks hard when it comes to l10n and i18n.  Been
> trying to get my aterm to display Greek for the past hour or so.

Ehm, I'm not using aterm but it's small enough.  Let me install it for
a while... [ installs aterm port ]

...ah there it is.  It works fine with Greek here.  I'm using a font
that I hacked with xmbdfed to add ISO-8859-7 Greek characters, derived
from lucida-typewriter-10 and it displays Greek fine, but fails to
read *any* Greek at all from the keyboard.

Transparencies and all the rest are nice, but terminal emulators
really need to grow up and learn to be 8-bit clean in 2003 :-(

- Giorgos

