Date:      Tue, 07 Jul 2009 15:49:27 +0100
From:      Matthew Seaman <>
To:        "Aryeh M. Friedman" <>
Subject:   Re: ot: regular expression help


Aryeh M. Friedman wrote:
> I am attempting to make (without the perl extensions) a regular
> expression that, when used as a delimiter, will split words on any
> punctuation/whitespace character *EXCEPT* "$" (for java people, I want to
> feed it into something like this):
> for(String foo:input.split([insert regex here]))
>    ...

Well, there's no way to say "all foo except bar" using standard regexes, so
you can't just use the [:punct:] character class. You'll have to roll your own.

If your input is ASCII, then see ispunct(3) for a handy list of all the
ASCII punctuation characters.  I guess you'll need an RE something like this:

   [regular expression not preserved in the archived message]

although that's completely untried, quite likely to not have all the
metacharacters properly escaped (exactly what is or isn't a metacharacter
depends on the RE implementation you're using) and is probably horribly
confused due to the inclusion of '[', '-' and ']' amongst the characters
matched in the range.
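
A hand-rolled class along those lines could be sketched in Java, the
language the question mentions. This is only an illustration and not the
expression from the original message: the class name SplitExceptDollar and
its helpers are made up for the example, and the punctuation list is the
ASCII set from ispunct(3) with '$' left out. Every character is
backslash-escaped, which java.util.regex allows for any non-alphabetic
character, so '[', '-' and ']' cannot be misread as class syntax.

```java
import java.util.Arrays;

public class SplitExceptDollar {
    // ASCII punctuation as listed by ispunct(3), with '$' deliberately omitted.
    static final String PUNCT_NO_DOLLAR = "!\"#%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // Builds a bracket expression of whitespace plus each escaped punctuation
    // character; the trailing '+' treats a run of delimiters as one.
    static String buildDelimiterRegex() {
        StringBuilder sb = new StringBuilder("[\\s");
        for (char c : PUNCT_NO_DOLLAR.toCharArray()) {
            sb.append('\\').append(c);
        }
        return sb.append("]+").toString();
    }

    static String[] split(String input) {
        return input.split(buildDelimiterRegex());
    }

    public static void main(String[] args) {
        // prints [price, $5, tax, free, really]
        System.out.println(Arrays.toString(split("price: $5, tax-free! (really)")));
    }
}
```

Note that String.split returns a leading empty string when the input
starts with a delimiter, so callers may want to skip empty tokens.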

If you're using anything other than ASCII, then I suspect you're going
to have problems with RE libs anyhow, unless you can somehow use PCRE.
The \p{...} property escapes for matching Unicode punctuation or
whitespace (\p{P} and \p{Z} in PCRE; the exact names vary between
implementations) are probably what you need.
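
For what it's worth, java.util.regex is one engine that does understand
Unicode property escapes, and it also has a character-class intersection
operator (&&) that can express "all of these except $" directly. The sketch
below assumes that engine; the class name UnicodeSplitExceptDollar is
invented for the example, and the choice of \p{P}, \p{S} and \p{Z} as
"punctuation and whitespace" is my assumption about what should count as a
delimiter. This is a Java extension, not portable regex syntax.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class UnicodeSplitExceptDollar {
    // \p{P} = punctuation, \p{S} = symbols (where '$' itself lives),
    // \p{Z} = separators, \s = ASCII whitespace such as tab and newline.
    // "&&[^$]" intersects that union with "anything but '$'".
    static final Pattern DELIMITERS =
            Pattern.compile("[\\p{P}\\p{S}\\p{Z}\\s&&[^$]]+");

    static String[] split(String input) {
        return DELIMITERS.split(input);
    }

    public static void main(String[] args) {
        // Input contains a no-break space, an inverted question mark and an
        // ellipsis, written as escapes to avoid source-encoding problems.
        String input = "caf\u00e9\u00a0\u00bfqu\u00e9? $5\u2026ok";
        System.out.println(Arrays.toString(split(input)));
    }
}
```

For ASCII-only input the same trick works with Java's POSIX-style class,
as in "[\\p{Punct}\\s&&[^$]]+".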

Even so, your best choice would probably be to separately check strings
for the presence of $ characters -- maybe transform those $ characters to
something else -- and then split on any remaining punctuation characters.
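
The transform-then-split idea might look something like this in Java. It is
a rough sketch under a few assumptions of mine: the input never contains the
private-use character U+E000 that stands in for '$', the input is ASCII so
that \p{Punct} covers all the punctuation, and the class name
DollarPlaceholderSplit is made up for the example.

```java
import java.util.Arrays;

public class DollarPlaceholderSplit {
    // Stand-in for '$'. A private-use code point is neither punctuation
    // nor whitespace, so the split below leaves it alone.
    static final char PLACEHOLDER = '\uE000';

    static String[] split(String input) {
        // 1. Hide every '$' behind the placeholder.
        String masked = input.replace('$', PLACEHOLDER);
        // 2. Split on runs of ASCII punctuation or whitespace.
        String[] words = masked.split("[\\p{Punct}\\s]+");
        // 3. Put the '$' characters back.
        for (int i = 0; i < words.length; i++) {
            words[i] = words[i].replace(PLACEHOLDER, '$');
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [cost, $10, name, $USER]
        System.out.println(Arrays.toString(split("cost=$10; name=$USER")));
    }
}
```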



Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP:     Ramsgate
                                                  Kent, CT11 9PW
