Date: Mon, 15 Feb 1999 20:14:40 -0500
From: "Chuck O'Donnell"
To: Sue Blake
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: cleaning a text file
Message-ID: <19990215201440.A11649@milf18.bus.net>

On Tue, Feb 16, 1999 at 10:37:40AM +1100, Sue Blake wrote:
> On Tue, Feb 16, 1999 at 09:52:32AM +1030, Greg Lehey wrote:
> > On Monday, 15 February 1999 at 1:10:36 -0800, rick hamell wrote:
> > >
> > >> Also, this file has some very long lines which would get truncated
> > >> or unexpectedly wrapped when sent as email. And if there is something
> > >> strange, I have to read it and guess what it should have been.
> > >>
> > >> Maybe someone will come up with something for this particular case.
> > >> I can't believe there's not some little utility for this that's been
> > >> hanging around unloved for years.
> > >
> > > Oy! Ok... how does Greg reformat all those emails?
> >
> > With Emacs.
> > I have a collection of macros which I'm constantly
> > changing to catch up with new tricks that mailers discover.
> >
> > To Sue's original question: it depends on what your text looks like.
> > tr(1) will remove characters if you ask it to.
>
> If I knew which characters were there (so I could ask tr to remove
> them) I would have already removed them with my text editor.
>
> > fmt(1) might be useful for wrapping lines.
>
> I don't see the long line lengths as a big problem at this stage, but
> fmt might be useful later.
>
> The problem is that I don't know which funny characters exist in the
> file, if any. I want to find out what they are, so I can search for
> them and eyeball them before killing them.
>
> Just knowing which characters they are would give me many solutions
> immediately. There still doesn't seem to be a way to find this out :-(
>
> Maybe there's a long way... somehow put a linefeed after each character
> in the file (with sed?) and then sort it and look at the top and bottom
> of the sorted file.

If you just want to find funny chars, how about:

---------------
#!/usr/local/bin/perl
use strict;
use warnings;

# Any character NOT in this class is reported as "funny". '[' and ']'
# are listed too, so ordinary brackets are not flagged.
my $reg = qr/[^\w\s\$#\@!`~%^&*()+=|\\?<>,.\/"':;\{\}\[\]-]/;

while (my $line = <>) {
    chomp $line;
    # Copy of the line with every funny character replaced by '?'.
    (my $shown = $line) =~ s/$reg/?/g;
    # One report per funny character: the line, then a caret under it.
    while ($line =~ m/($reg)/g) {
        my $col = pos($line) - 1;    # 0-based offset of the match
        printf "%s\n%s^ L%d C%d\n", $shown, " " x $col, $., ord $1;
    }
}
---------------

Anything not in $reg will be marked and replaced with a '?' char. `L'
will show the line number and `C' is the decimal value of the character.
You could probably fix it so it does the right thing on long lines.

--
Chuck
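Sue's "long way" (one character per line, then sort) does work. Below is a minimal sketch, assuming a POSIX shell and a fold(1) that supports `-b` (FreeBSD and GNU both do). fold is swapped in for sed because it can break after every byte directly, and `uniq -c` is an addition to her idea that collapses the sorted list into one line per distinct byte. The function name `char_census` is hypothetical.

```shell
#!/bin/sh
# char_census FILE (hypothetical name): print every distinct byte in FILE
# once, preceded by how often it occurs, in byte order.
char_census() (
    # The body runs in a subshell, so this export does not leak out.
    # The C locale makes every tool below treat the input as raw bytes.
    export LC_ALL=C
    tr -d '\n' < "$1" |   # drop newlines so they are not counted
        fold -b -w 1 |    # put a linefeed after every byte
        sort |            # identical bytes end up on adjacent lines
        uniq -c           # one line per distinct byte, with its count
)
```

For example, `char_census letter.txt | cat -v` (hypothetical file name) makes control characters visible as `^X` and bytes above 127 as `M-` sequences. In byte order, control characters collect at the top and high-bit bytes at the bottom, just where Sue planned to look.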
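Once the offending characters are known, Greg's tr(1) suggestion can do the removal. Here is a rough sketch, assuming the goal is plain ASCII text. It keeps only tab, newline and printable ASCII and deletes everything else. The function name `strip_funny` is hypothetical, and the choice of which bytes to keep is an assumption you may want to change.

```shell
#!/bin/sh
# strip_funny (hypothetical name): copy stdin to stdout, deleting every
# byte that is not tab (\11), newline (\12) or printable ASCII (\40-\176).
strip_funny() {
    # -c complements the set and -d deletes, so tr removes the bytes
    # NOT listed; LC_ALL=C keeps the ranges byte-based.
    LC_ALL=C tr -cd '\11\12\40-\176'
}
```

Used as `strip_funny < letter.txt > letter.clean` (hypothetical file names). It is deliberately blunt: carriage returns vanish (so DOS line endings become Unix ones), and so do accented Latin-1 letters. That is a good reason to look at the characters before killing them, as Sue intended.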
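For the long-line half of the problem, fmt(1) refills paragraphs to a maximum width. The sketch below assumes an fmt that accepts `-w width` (current FreeBSD and GNU fmt both do). The wrapper name `wrap_lines` and its default of 72 columns are hypothetical choices, not anything fmt requires.

```shell
#!/bin/sh
# wrap_lines [WIDTH] (hypothetical name): refill stdin so that no output
# line is longer than WIDTH columns (72 if no width is given).
wrap_lines() {
    fmt -w "${1:-72}"
}
```

Because fmt joins short lines into paragraphs, it can also merge lines you wanted kept apart, such as tables or code. Try it on a copy first, e.g. `wrap_lines 72 < letter.txt > letter.wrapped`.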