Date:      Tue, 27 May 2008 08:29:03 +0200
From:      Karel Miklav <karel@inetis.com>
To:        Oliver Fromme <olli@lurza.secnetix.de>
Cc:        delphij@freebsd.org, chinsan <chinsan.tw@gmail.com>, freebsd-questions@FreeBSD.ORG
Subject:   Re: Sed, shell and hexadecimal character codes
Message-ID:  <483BAA2F.30009@inetis.com>
In-Reply-To: <200805231523.m4NFNOwO024115@lurza.secnetix.de>
References:  <200805231523.m4NFNOwO024115@lurza.secnetix.de>

Oliver Fromme wrote:
> Karel Miklav wrote:
>  > There's a tip in the FreeBSD fortunes database that says:
>  > 
>  > > Want to strip the UTF-8 BOM (Byte Order Mark) from given files?
>  > > 
>  > > sed -e '1s/^\xef\xbb\xbf//' < bomfile > newfile
> 
> FreeBSD's sed(1) doesn't support hexadecimal or octal
> sequences.  I think even GNU sed doesn't support them, but
> you might try it yourself (/usr/ports/textproc/gsed).
> 
> I don't know why that fortunes entry exists.  It's wrong.

That's what I thought. Maybe we should replace the recipe with
the awk version Oliver proposed below?

>  > I can't make it work, and I can't find any other way to
>  > work with hex codes in scripts or on the command line, so
>  > I'm kinda depressed :) I get by with xxd for now, but if
>  > it is possible to avoid it, I'd like to hear about it.
> 
> There is no standard for handling octal and hexadecimal
> sequences, unfortunately, so you have to consult the
> manual page to find out.  For example, tr(1) supports
> octal sequences only (no hexadecimal), while awk(1)
> supports both.  So the above line could be rewritten
> with awk:
> 
> awk '{if(NR==1)sub(/^\xef\xbb\xbf/, "");print}' < bomfile > newfile
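
If sed is still preferred, one possible workaround is to let
printf(1) produce the three BOM bytes from octal escapes and hand
them to sed as literal bytes, so sed never has to parse an escape
sequence itself. This is only a sketch under assumptions: a POSIX
shell, a printf that accepts \ooo octal escapes in its format
string, and a hypothetical helper name strip_bom that does not
come from this thread.

```shell
# Hypothetical helper: strip a leading UTF-8 BOM from standard input.
# Assumption: printf(1) understands \ooo octal escapes in the format.
# 357 273 277 (octal) are the bytes EF BB BF (hex).
bom=$(printf '\357\273\277')

strip_bom() {
    # LC_ALL=C makes sed treat the input as plain bytes.
    # The BOM is matched literally, and only on line 1.
    LC_ALL=C sed "1s/^$bom//"
}
```

It would be used the same way as the recipes above, for example:
strip_bom < bomfile > newfile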




