Date:      Fri, 01 Jan 1999 19:51:43 +0000
From:      Mark Ovens <marko@uk.radan.com>
To:        Jerry Preeper <preeper@cts.com>
Cc:        freebsd-questions@FreeBSD.ORG
Subject:   Re: replace non-ascii characters
Message-ID:  <368D274F.7A57D11A@uk.radan.com>
References:  <3.0.5.32.19990101042759.008a1a70@crash.cts.com>

Jerry Preeper wrote:
> 
> I know this isn't really a freebsd question, but I'm not sure where else to
> ask.  I'm trying to write a small shell script that replaces non-ascii
> characters with the html equivalent in a file and just can't seem to figure
> how to identify the non-ascii characters.
> 
> for example, I have written a small shell script that takes a file name as
> input to replace them using sed.  Here is the script.
> 
> #!/bin/sh
>   for file in $*
>   do
>     sed -n "s/\\0x80/\&Ccedil\;/g" ${file}
>     sed -n "s/\\0x81/\&uuml\;/g" ${file}
>     ..... bunches more
>   done
> 
> The problem is the search part isn't finding the special character.  I have
> tried cutting and pasting the special character directly into the script as
> well, but it doesn't seem to work either.
> 
> Does anyone have any ideas on how to accomplish this?
> 

I've run into this problem before. As a suggestion, try:

#!/bin/sh
  for file in "$@"
  do
        # work from a copy in /tmp so the original can be rewritten
        tmp=/tmp/`basename "${file}"`.$$
        cp "${file}" "${tmp}" || exit 1
        awk '{gsub("\x80", "\\&Ccedil;")
                gsub("\x81", "\\&uuml;")

                # ...more of the same

                print}' < "${tmp}" > "${file}"
        rm "${tmp}"
  done

``gsub()'' replaces every occurrence in a line. Note that ``&'' in the
replacement string needs escaping with ``\\'', because an unescaped ``&''
stands for the matched text. I'm not sure whether there is a limit on the
length of line that awk can process.
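
To make the ``&'' point concrete, here is a minimal sketch rather than a
finished script. The function name to_entities is hypothetical, and two
things in it are my assumptions, not part of the suggestion above: it
spells the bytes in octal ("\200" for 0x80, "\201" for 0x81), since POSIX
awk defines octal escapes but not "\x", and it sets LC_ALL=C so that awk
treats the input as single bytes.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and replace two high-bit bytes
# with HTML entities. Octal "\200" and "\201" are 0x80 and 0x81.
# LC_ALL=C is an assumption to force byte-wise matching in awk.
to_entities() {
    LC_ALL=C awk '{
        gsub("\200", "\\&Ccedil;")   # "\\&" inserts a literal ampersand
        gsub("\201", "\\&uuml;")
        print
    }'
}

# An unescaped "&" in the replacement stands for the matched text:
echo "a-b" | awk '{ gsub("-", "[&]"); print }'       # prints a[-]b
# Escaped, it is a literal ampersand:
echo "a-b" | awk '{ gsub("-", "\\&amp;"); print }'   # prints a&amp;b
```

It could be used as ``to_entities < in.txt > out.txt'', which also avoids
reading and writing the same file at once.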

Perl may well provide the best solution though.
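
As a rough idea of what the Perl route might look like, called from the
shell: the wrapper name entities_inplace is hypothetical, and the .bak
suffix is just my choice for the backup copy. Perl matches "\x80" as a
single byte here as long as no Unicode I/O options are turned on, and
``&'' is not special in a Perl replacement, so it needs no escaping.

```shell
#!/bin/sh
# Hypothetical wrapper: edit each named file in place.
# -p  loops over the input lines and prints each one after the edits
# -i.bak  rewrites the file and keeps the original as FILE.bak
# -e  supplies the program text
entities_inplace() {
    perl -pi.bak -e 's/\x80/&Ccedil;/g; s/\x81/&uuml;/g' "$@"
}
```

Calling ``entities_inplace *.txt'' would then convert every file named on
the command line, with no temporary copy in /tmp to manage.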

HTH, Happy New Year

> Thanks in advance.
> 
> Jerry
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-questions" in the body of the message

-- 
  Trust the computer industry to shorten Year 2000 to Y2K. It
  was this thinking that caused the problem in the first place.

Mark Ovens, CNC Applications Engineer, Radan Computational Ltd
Sheet Metal CAD/CAM Solutions
mailto:marko@uk.radan.com    http://www.radan.com

