Date:      Sat, 11 Feb 2006 16:45:49 -0500
From:      Parv <parv@pair.com>
To:        Kristian Vaaf <vaaf@broadpark.no>
Cc:        questions@freebsd.org
Subject:   Re: Script to clean text files
Message-ID:  <20060211214549.GA1674@holestein.holy.cow>
In-Reply-To: <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>
References:  <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>

in message <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>,
wrote Kristian Vaaf thusly...
>
>
> Among other things, this script is supposed to add an empty line at
> the bottom of a file.
>
> But somehow it always removes the first line in a text file,
> how do I stop this?

Can you provide a small sample file complete w/ things that you
want to remove?


> #!/usr/local/bin/bash
> #
> #   Remove CRLF, trailing whitespace and double lines.

What are "double lines"?


> #   $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
> #
> for file in `find -s . -type f -not -name ".*"`; do
>       if file -b "$file" | grep -q 'text'; then
>               echo >> "$file"
>               perl -i -pe 's/\015$//' "$file"
>               perl -i -pe 's/[^\S\n]+$//g' "$file"

Why do you have two perl runs?  Both substitutions fit in a single
perl -i -pe invocation.  Note that [^\S\n] is a double negative: it
matches any whitespace character other than a newline, so the second
run strips trailing blanks and leaves the line endings alone.

>
>               perl -pi -00 -e 1 "$file"
>               echo "$file: Done"
>       fi
> done

To strip carriage returns and trailing whitespace, and to squeeze runs
of consecutive blank lines into a single blank line ...

  {
    tr -d '\r' < "$file" \
    | sed -E -e 's/[[:space:]]+$//' \
    | cat -s - > "${file}.tmp"
  } && mv -f "${file}.tmp" "$file"
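
The pipeline above could be dropped into the original loop roughly as
follows.  This is a sketch under assumptions: clean_file and clean_tree
are names I made up, the plain "find" call drops the -s (sorted) flag,
and files already ending in .tmp are skipped so that the temporary
copies are not picked up mid-run.

```shell
#!/bin/sh

# clean_file (hypothetical name): rewrite one file through the pipeline,
# replacing the original only if the pipeline succeeded.
clean_file() {
  {
    tr -d '\r' < "$1" \
    | sed -E -e 's/[[:space:]]+$//' \
    | cat -s - > "$1.tmp"
  } && mv -f "$1.tmp" "$1"
}

# clean_tree (hypothetical name): walk regular, non-dot files under the
# given directory (default: current one) and clean those that file(1)
# reports as text.  Reading names line by line copes with spaces in
# file names, which a for loop over backquoted find output does not.
clean_tree() {
  find "${1:-.}" -type f ! -name '.*' ! -name '*.tmp' |
  while IFS= read -r f; do
    if file -b "$f" | grep -q 'text'; then
      clean_file "$f" && echo "$f: Done"
    fi
  done
}
```

Calling "clean_tree ." from the top of the tree would then do the work;
file names containing newlines are still not handled.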


  - Parv

--



