Date: Tue, 07 Jul 2009 15:49:27 +0100
From: Matthew Seaman <m.seaman@infracaninophile.co.uk>
To: "Aryeh M. Friedman" <aryeh.friedman@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: ot: regular expression help
Message-ID: <4A536077.1080002@infracaninophile.co.uk>
In-Reply-To: <4A5353D1.5010807@gmail.com>
References: <4A5353D1.5010807@gmail.com>

Aryeh M. Friedman wrote:
> I am attempting to make (without the perl expansions) a regular
> expression that when used as a delim will split words on any
> punctuation/whitespace character *EXCEPT* "$" (for java people I want
> to feed it into something like this:
>
> for(String foo:input.split([insert regex here])
>      ...

Well, there's no way to say "all foo except bar" using standard regexes,
so you can't use the [:punct:] character class.  You'll have to roll
your own class.

If your input is ASCII then see ispunct(3) for a handy list of all the
ASCII punctuation characters.  I guess you'll need a RE something like
this:

    []!"#%&'\(\)\*\+,\./:;<=>?@[\\^_`{\|}~-[:space:]]+

although that's completely untried, quite likely to not have all the
metacharacters properly escaped (exactly what is or isn't a
metacharacter depends on the RE implementation you're using) and is
probably horribly confused due to the inclusion of '[', '-' and ']'
amongst the characters matched in the range.

If you're using anything other than ASCII, then I suspect you're going
to have problems with RE libs anyhow, unless you can somehow use PCRE.
The \p{isPunct} and \p{isWhite} escapes for matching Unicode punctuation
or whitespace are probably what you need.

Even so, your best choice would probably be to separately check strings
for the presence of $ characters -- maybe transform those $ characters
to something else -- and then split on any remaining punctuation
characters.

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.
7 Priory Courtyard, Flat 3
Ramsgate, Kent, CT11 9PW
PGP: http://www.infracaninophile.co.uk/pgpkey
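
For the Java case in the original question, here is a minimal sketch of the split, assuming java.util.regex. Unlike POSIX regexes, Java's Pattern supports character-class intersection with `&&`, so "punctuation or whitespace, except $" can be written directly. The second pattern is a hand-rolled class in the spirit of the one in the message, with the escaping adjusted for Java; it is an assumption of this sketch, not the regex from the message. The names `DollarSplit`, `split` and `splitExplicit` are hypothetical, and `\p{Punct}` here covers ASCII punctuation only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class; all names are illustrative only.
final class DollarSplit {

    // ASCII punctuation or whitespace, intersected with "anything but $".
    // The '+' collapses runs of delimiters into a single split point.
    static final Pattern BY_INTERSECTION =
        Pattern.compile("[[\\p{Punct}\\s]&&[^$]]+");

    // Hand-rolled alternative: every ASCII punctuation character except
    // '$', plus whitespace. '[', ']', '\', '^' and '-' are escaped so they
    // are taken literally inside the character class.
    static final Pattern BY_EXPLICIT_CLASS =
        Pattern.compile("[!\"#%&'()*+,./:;<=>?@\\[\\]\\\\\\^_`{|}~\\-\\s]+");

    static String[] split(String input) {
        return BY_INTERSECTION.split(input);
    }

    static String[] splitExplicit(String input) {
        return BY_EXPLICIT_CLASS.split(input);
    }

    public static void main(String[] args) {
        String input = "foo$bar, baz.qux;  x_y";
        // prints [foo$bar, baz, qux, x, y]
        System.out.println(Arrays.toString(split(input)));
    }
}
```

If the input starts with a delimiter, both variants return an empty string as the first element, so callers may want to skip empty tokens.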