Date:      Fri, 9 Oct 2009 20:29:27 +0200 (CEST)
From:      Oliver Fromme <olli@lurza.secnetix.de>
To:        wblock@wonkity.com (Warren Block)
Cc:        freebsd-questions@freebsd.org
Subject:   Re: for perl wizards.
Message-ID:  <200910091829.n99ITRFG031873@lurza.secnetix.de>
In-Reply-To: <alpine.BSF.2.00.0910091140180.28881@wonkity.com>

Warren Block wrote:
 > Oliver Fromme wrote:
 > > Warren Block wrote:
 > > > Oliver Fromme wrote:
 > > > > Gary Kline wrote:
 > > > > > 
 > > > > > Whenever I save a wordprocessor file [OOo, say] into a
 > > > > > text file, I get a slew of hex codes to indicate the char to be
 > > > > > used.  I'm looking for a perl one-liner or script to translate
 > > > > > hex back into ', ", -- [that's a dash], and so forth.  Why does
 > > > > > this fail to trans the hex code to an apostrophe?
 > > > > > 
 > > > > > perl -pi.bak -e 's/\xe2\x80\x99/'/g'
 > > > > 
 > > > > You need to escape the inner quote character, of course.
 > > > > I think sed is better suited for this task than perl.
 > > > 
 > > > That's twice now people have suggested sed instead of perl.  Why?  For
 > > > many uses, perl is a better sed than sed.  The regex engine is far more
 > > > powerful and escapes are much simpler.
 > > 
 > > Neither powerful regexes nor escapes will help in this case.
 > 
 > Certainly \x will not help in sed; sed doesn't have it.

Right, that's an annoying flaw in sed (it doesn't even
support the \0 syntax for octal values, which is more
standard than \x).

Normally I just type such characters literally, which
is accepted fine by sed (it is 8 bit clean).
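
As a rough sketch of what I mean by typing the character literally
(the function name to_ascii_quote is invented for illustration, and
the script is assumed to be saved as UTF-8, so that the character
between the slashes really is the three bytes E2 80 99):

```shell
# Hypothetical helper: replace a literally typed right single
# quotation mark with a plain ASCII apostrophe.
to_ascii_quote() {
    # The sed script is in double quotes, so the replacement '
    # needs no escaping at the shell level.
    sed "s/’/'/g"
}

# Sample input built from octal escapes (342 200 231 = E2 80 99).
printf 'it\342\200\231s\n' | to_ascii_quote
```

This only works if the editor and terminal pass the bytes through
unchanged; no \x or octal escape is involved on the sed side.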

However, in this particular case I really recommend using
the "recode" tool (ports/converters/recode) to convert
from UTF-8 to some other encoding.  Much easier, and more
correct.

E2-80-99 is the UTF-8 encoding of U+2019, which isn't even
a real apostrophe; it's a right single quotation mark.  An
apostrophe would be ASCII 0x27.

Maybe the OP should configure his software to not save the
file with UTF-8 encoding in the first place.  I'm not an
OOo user, so I can't tell how to do that.  But obviously
the OP doesn't want the file to be stored as UTF-8.

 > It's possible "Mastering Regular Expressions" has influenced my thinking 
 > on this.

This isn't about regular expressions at all.  This is
about replacing fixed strings.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"One of the main causes of the fall of the Roman Empire was that,
lacking zero, they had no way to indicate successful termination
of their C programs."
        -- Robert Firth


