Date: Fri, 9 Oct 2009 20:29:27 +0200 (CEST)
From: Oliver Fromme <olli@lurza.secnetix.de>
To: wblock@wonkity.com (Warren Block)
Cc: freebsd-questions@freebsd.org
Subject: Re: for perl wizards.
Message-ID: <200910091829.n99ITRFG031873@lurza.secnetix.de>
In-Reply-To: <alpine.BSF.2.00.0910091140180.28881@wonkity.com>
Warren Block wrote:
> Oliver Fromme wrote:
> > Warren Block wrote:
> > > Oliver Fromme wrote:
> > > > Gary Kline wrote:
> > > > >
> > > > > Whenever I save a word processor file [OOo, say] into a
> > > > > text file, I get a slew of hex codes to indicate the char to be
> > > > > used.  I'm looking for a perl one-liner or script to translate
> > > > > hex back into ', ", -- [that's a dash], and so forth.  Why does
> > > > > this fail to translate the hex code to an apostrophe?
> > > > >
> > > > > perl -pi.bak -e 's/\xe2\x80\x99/'/g'
> > > >
> > > > You need to escape the inner quote character, of course.
> > > > I think sed is better suited for this task than perl.
> > >
> > > That's twice now people have suggested sed instead of perl.  Why?  For
> > > many uses, perl is a better sed than sed.  The regex engine is far more
> > > powerful and escapes are much simpler.
> >
> > Neither powerful regexes nor escapes will help in this case.
>
> Certainly \x will not help in sed; sed doesn't have it.

Right, that's an annoying flaw in sed (it doesn't even support the \0
syntax for octal values, which is more standard than \x).  Normally I
just type such characters literally, which sed accepts fine (it is
8-bit clean).

However, in this particular case I really recommend using the "recode"
tool (ports/converters/recode) to convert from UTF-8 to some other
encoding.  That is much easier, and more correct.  The byte sequence
E2 80 99 (Unicode U+2019) isn't even a real apostrophe in UTF-8; it's a
right single quotation mark.  An apostrophe would be ASCII 0x27.

Maybe the OP should configure his software not to save the file with
UTF-8 encoding in the first place.  I'm not an OOo user, so I can't say
how to do that.  But obviously the OP doesn't want the file stored as
UTF-8.

> It's possible "Mastering Regular Expressions" has influenced my thinking
> on this.

This isn't about regular expressions at all.  This is about replacing
fixed strings.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht München, HRA 74606; management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Registergericht
München, HRB 125758; managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"One of the main causes of the fall of the Roman Empire was that, lacking
zero, they had no way to indicate successful termination of their C programs."
        -- Robert Firth