Date:      Tue, 16 Feb 1999 11:49:59 +1100
From:      Sue Blake <sue@welearn.com.au>
To:        Mark Ovens <marko@uk.radan.com>
Cc:        questions@FreeBSD.ORG
Subject:   Re: cleaning a text file
Message-ID:  <19990216114959.08931@welearn.com.au>
In-Reply-To: <19990216002703.A337@localhost>; from Mark Ovens on Tue, Feb 16, 1999 at 12:27:03AM +0000
References:  <19990215201056.19929@welearn.com.au> <Pine.BSF.3.91.990215010943.20451F-100000@dsinw.com> <19990216095232.J2207@lemis.com> <19990216103740.60271@welearn.com.au> <19990216002703.A337@localhost>

On Tue, Feb 16, 1999 at 12:27:03AM +0000, Mark Ovens wrote:
> On Tue, Feb 16, 1999 at 10:37:40AM +1100, Sue Blake wrote:
> > On Tue, Feb 16, 1999 at 09:52:32AM +1030, Greg Lehey wrote:
> > > On Monday, 15 February 1999 at  1:10:36 -0800, rick hamell wrote:
> > > >
> > > >> Also, this file has some very long lines which would get truncated
> > > >> or unexpectedly wrapped when sent as email. And if there is something
> > > >> strange, I have to read it and guess what it should have been.
> > > >>
> > > >> Maybe someone will come up with something for this particular case.
> > > >> I can't believe there's not some little utility for this that's been
> > > >> hanging around unloved for years.
> > > >
> > > > 	Oy! Ok... how does Greg reformat all those emails?
> > > 
> > > With Emacs.  I have a collection of macros which I'm constantly
> > > changing to catch up with new tricks that mailers discover.
> > > 
> > > To Sue's original question: it depends on what your text looks like.
> > > tr(1) will remove characters if you ask it to.
> > 
> > If I knew which characters were there (so I could ask tr to remove
> > them) I would have already removed them with my text editor.
> > 
> > >  fmt(1) might be useful for wrapping lines.
> > 
> > I don't see the long line lengths as a big problem at this stage, but
> > fmt might be useful later.
> > 
> > The problem is that I don't know which funny characters exist in the
> > file, if any. I want to find out what they are, so I can search for
> > them and eyeball them before killing them.
> > 
> > 
> > Just knowing which characters they are would give me many solutions
> > immediately. There still doesn't seem to be a way to find this out :-(
> > 
> 
> First you need to identify the offending characters.

Indeed. That is my sole problem.

> Use od(1) or
> hexdump(1) to identify them and then work out a filter.

Well, you hit it on the head. I was being very lazy about this because
I really don't feel like reading and assessing 3 million hex numbers
as they flow across the screen today, it's too hot. Maybe tomorrow.

> Are they all extended ASCII (>127) chars, or are some of them
> control (<32) chars?

If any exist in either of these categories, I want to be informed.
That's all I really need.
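
A minimal sketch of that kind of report, written in Python rather than
with the tools named in this thread. It counts every byte that falls in
either category, so the output is a short list instead of 3 million hex
numbers. The function names (suspicious_bytes, report) are made up for
illustration, and treating tab, newline and carriage return as harmless
is an assumption; adjust the allowed set to taste.

```python
import sys
from collections import Counter

# Assumption: tab, LF and CR are ordinary in a text file and not worth reporting.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def suspicious_bytes(data: bytes) -> Counter:
    """Count control bytes (<32, minus the allowed ones), DEL (127) and bytes >127."""
    return Counter(
        b for b in data
        if (b < 32 and b not in ALLOWED_CONTROLS) or b >= 127
    )


def report(data: bytes) -> list:
    """Return one line per distinct suspicious byte: hex value, decimal value, count."""
    counts = suspicious_bytes(data)
    return [f"0x{value:02x} ({value}): {count}" for value, count in sorted(counts.items())]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the file as raw bytes so nothing is decoded or silently replaced.
    with open(sys.argv[1], "rb") as f:
        for line in report(f.read()):
            print(line)
```

Saved under a hypothetical name such as funnychars.py, it would be run
as "python funnychars.py thefile". An empty result means the file holds
nothing but printable ASCII, tabs and line endings.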

> You could possibly use awk(1) as a filter,
> or write a simple C prog using isprint() and isspace().

Not in the short term I couldn't. But I'm surprised that if it's so
easy to write a program to do this nobody has done so in the ancient or
recent past. It seems like something that'd be wanted frequently, yet
the responses I'm getting suggest that hardly anyone has thought
about this problem much previously. I find that hard to believe,
but I am slowly coming round.
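
For what it's worth, the program Mark describes is only a few lines.
The sketch below is in Python, not C, and it imitates what isprint()
and isspace() accept in the default "C" locale instead of calling them.
It reports where each offending byte sits, so it can be found and
eyeballed before being killed. The name locate_suspicious is
hypothetical.

```python
import sys

# Bytes that C's isspace() accepts in the "C" locale.
C_WHITESPACE = b" \t\n\r\v\f"


def locate_suspicious(data: bytes):
    """Yield (line, column, byte value) for bytes that are neither printable nor whitespace."""
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, b in enumerate(line, start=1):
            printable = 32 <= b < 127  # the range isprint() accepts in the "C" locale
            if not (printable or b in C_WHITESPACE):
                yield line_no, col, b


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for line_no, col, b in locate_suspicious(f.read()):
            print(f"line {line_no}, column {col}: 0x{b:02x}")
```

Columns are counted in bytes, so on a line that contains multi-byte
characters the position may not match what a text editor shows.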


-- 

Regards,
        -*Sue*-




