Date:      Thu, 11 Apr 2002 15:49:08 +0300
From:      Giorgos Keramidas <keramida@ceid.upatras.gr>
To:        Dan Langille <dan@langille.org>
Cc:        Terry Lambert <tlambert2@mindspring.com>, chat@FreeBSD.ORG
Subject:   Re: what are these characters please?
Message-ID:  <20020411124908.GD39629@hades.hell.gr>
In-Reply-To: <20020411113858.E48BB3F30@bast.unixathome.org>
References:  <3CB571D6.2C10B9AA@mindspring.com> <20020411113858.E48BB3F30@bast.unixathome.org>

On 2002-04-11 07:38, Dan Langille wrote:
> And line 14 is:
>
>         [Submitted by: Ville SkyttESC,AdESC(B <ville.skytta@iki.fi>]
>
> I think my goal here is to remove all non-ISO-8859-1 characters from the
> incoming cvs-all message.  I've been searching newsgroups (comp.lang.perl
> and comp.text.xml) trying to find a simple solution.

You can probably get away with using col(1) and proper environment
settings to filter the CVS logs:

	$ env LANG=en_US LC_ALL=en_US.ISO8859-1 col -b
	Søren Schmidt
	Søren Schmidt

	$ env LANG=C LC_ALL=C col -b
	Søren Schmidt
	Sren Schmidt

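The name Søren contains "ø" (o with a stroke), which is a valid
ISO8859-1 character.  In each run above the first line is the input and
the second is what col(1) prints: with an ISO8859-1 locale the "ø" passes
through, while in the C locale it is dropped.  I'm not sure whether
col(1) can also strip all the escape sequences that you were having
trouble with, though.  It's just an idea; it might work, or it might not.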
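
If col(1) turns out not to remove the escape sequences, one option is to
strip them in a separate step before col(1) sees the text.  The following
is only a sketch under some assumptions: the function name strip_esc is
made up for illustration, and the pattern assumes the sequences have the
shape seen in the quoted line (ESC, optional intermediate bytes in the
range 0x20-0x2F, then one final byte in the range 0x30-0x7E).  It does not
try to handle other kinds of escape sequences.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only): remove escape
# sequences of the form ESC + intermediates (0x20-0x2F)* + final (0x30-0x7E)
# from standard input and write the result to standard output.
strip_esc() {
    # Build a literal ESC character in a portable way.
    esc=$(printf '\033')
    # LC_ALL=C makes the bracket ranges plain byte ranges.
    # '|' is used as the sed delimiter because '/' appears in the range.
    LC_ALL=C sed "s|${esc}[ -/]*[0-~]||g"
}
```

It could then be placed in front of col(1), for example
"strip_esc | env LC_ALL=en_US.ISO8859-1 col -b".  Note that this only
deletes the sequences themselves: the byte between ESC,A and ESC(B in the
quoted line stays as the plain ASCII "d" and is not mapped back to an
accented character.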

I don't know how you are using the CVS logs, so you'll have to set up
some Perl pipe to do the work yourself.  Perhaps something like:

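	# Reopen STDIN on the output of col(1).  STDIN is not closed first,
	# so that col(1) still inherits the original input stream.
	open(STDIN, "env LANG=en_US LC_ALL=en_US.ISO8859-1 col -b |")
	    or die "cannot run col: $!";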

Giorgos Keramidas                       FreeBSD Documentation Project
keramida@{freebsd.org,ceid.upatras.gr}  http://www.FreeBSD.org/docproj/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
