Date: Sun, 12 Feb 2006 11:53:59 +0100
From: Kristian Vaaf <vaaf@broadpark.no>
To: Parv <parv@pair.com>
Cc: questions@freebsd.org
Subject: Re: Script to clean text files
Message-ID: <7.0.1.0.2.20060212114457.0219ab78@broadpark.no>
In-Reply-To: <20060211214549.GA1674@holestein.holy.cow>
References: <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no> <20060211214549.GA1674@holestein.holy.cow>

At 22:45 11.02.2006, Parv wrote:
>in message <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>,
>wrote Kristian Vaaf thusly...
> >
> > Among other things, this script is supposed to add an empty line at
> > the bottom of a file.
> >
> > But somehow it always removes the first line in a text file,
> > how do I stop this?
>
>Can you provide a small sample file complete w/ things that you
>want to remove?
>
> > #!/usr/local/bin/bash
> > #
> > # Remove CRLF, trailing whitespace and double lines.
>
>What are "double lines"?
>
> > # $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
> > #
> > for file in `find -s . -type f -not -name ".*"`; do
> >   if file -b "$file" | grep -q 'text'; then
> >     echo >> "$file"
> >     perl -i -pe 's/\015$//' "$file"
> >     perl -i -pe 's/[^\S\n]+$//g' "$file"
>
>Why do you have two perl runs?  More importantly, you will remove
>anything which is not whitespace or not newline.  That means, in the
>end, you should have a file filled w/ whitespace only.
>
> >     perl -pi -00 -e 1 "$file"
> >     echo "$file: Done"
> >   fi
> > done
>
>To remove CRLF, trailing whitespace, and 2 consecutive blank lines
>...
>
>  {
>    tr -d '\r' < "$file" \
>    | sed -E -e 's/[[:space:]]+$//' \
>    | cat -s - > "${file}.tmp"
>  } && mv -f "${file}.tmp" "$file"
>
>
>  - Parv
>
>--

Hello Parv!

Yes, I meant blank lines :)

I've used the script for a long time now. The only error is that it
removes the top blank line, if any, which is a bit annoying.

It's fine for scripts with shebangs but not for custom laid-out
documents etc.

I just wanted to know where that error was.

I use the Perl runs because those were the only runs people gave me.
You know how it is, you enter a FreeBSD help channel and ask how to do
this or that, and the upper gentlemen always reply "Learn Perl," and
then they go on giving you Perl runs :)

Your suggestion looks very, very good. So is this alright?

#!/usr/local/bin/bash
#
# Remove CRLF, trailing whitespace and blank lines.
# $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
#
for file in `find -s . -type f -not -name ".*"`; do
  if file -b "$file" | grep -q 'text'; then
    echo >> "$file"
    tr -d '\r' < "$file"
    sed -E -e 's/[[:space:]]+$//'
    cat -s - > "${file}.tmp" && mv -f "${file}.tmp" "$file"
    echo "$file: Done"
  fi
done

All the best man,
Vaaf
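
A note on the revised script above: with tr, sed and cat on separate lines and no pipes between them, each command runs on its own. tr prints to the terminal, while sed and cat wait on standard input, so nothing useful reaches the .tmp file. Parv's version joins the three with pipes. Below is a minimal sketch of that pipeline, assuming it is wrapped in a helper function; the name clean_file is made up for illustration and is not from the thread. As for the lost first blank line, the likely culprit in the old script is perl -00 (paragraph mode), which as far as I know skips empty lines at the start of the input; cat -s only squeezes runs of blank lines down to one, so a single leading blank line should survive.

```shell
#!/bin/sh
# clean_file is a hypothetical helper name, not part of the original script.
# It strips carriage returns, removes trailing whitespace, and squeezes
# runs of blank lines into a single blank line, rewriting the file in place.
clean_file() {
    file=$1
    {
        # The backslash-newline plus leading "|" keeps this one pipeline:
        # tr feeds sed, sed feeds cat, and only cat writes the temp file.
        tr -d '\r' < "$file" \
        | sed -E -e 's/[[:space:]]+$//' \
        | cat -s > "${file}.tmp"
    } && mv -f "${file}.tmp" "$file"   # replace the original only on success
}
```

Inside the existing loop, the three separate command lines would then become a single call, clean_file "$file", placed after the echo >> "$file" line.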