From: Matthew Seaman
Date: Tue, 07 Jul 2009 15:49:27 +0100
To: "Aryeh M. Friedman"
Cc: freebsd-questions@freebsd.org
Subject: Re: ot: regular expression help
Message-ID: <4A536077.1080002@infracaninophile.co.uk>
In-Reply-To: <4A5353D1.5010807@gmail.com>

Aryeh M. Friedman wrote:
> I am attempting to make (without the perl expansions) a regular
> expansion that when used as a delim will split words on any
> punction/whitespace character *EXCEPT* "$" (for java people I want to
> feed it into something like this:
>
> for(String foo:input.split([insert regex here])
> ...

Well, there's no way to say "all foo except bar" using standard regexes,
so you can't use the [:punct:] character class.  You'll have to roll your
own class.  If your input is ASCII then see ispunct(3) for a handy list of
all the ASCII punctuation characters.  I guess you'll need an RE something
like this:

    []!"#%&'\(\)\*\+,\./:;<=>?@[\\^_`{\|}~-[:space:]]+

although that's completely untried, quite likely to not have all the
metacharacters properly escaped (exactly what is or isn't a metacharacter
depends on the RE implementation you're using) and is probably horribly
confused due to the inclusion of '[' '-' and ']' amongst the characters
matched in the range.

If you're using anything other than ASCII, then I suspect you're going to
have problems with RE libs anyhow, unless you can somehow use PCRE.
The \p{isPunct} and \p{isWhite} escapes for matching Unicode punctuation
or whitespace are probably what you need.

Even so, your best choice would probably be to separately check strings
for the presence of $ characters -- maybe transform those $ characters to
something else -- and then split on any remaining punctuation characters.

	Cheers,

	Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
                                                  Kent, CT11 9PW
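
For the Java case in the question, the split can be sketched roughly as
below.  This is a minimal sketch assuming ASCII input, not a tested
drop-in: the class and method names (DollarSafeSplit, splitExplicit,
splitWithIntersection) are made up for illustration.  The first pattern
is a hand-rolled class listing the ASCII punctuation from ispunct(3)
minus '$', escaped the way java.util.regex expects.  The second relies on
the "&&" character-class intersection that java.util.regex offers as an
extension; that is not part of POSIX regexes, so it will not carry over
to other RE libraries.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper class: splits on runs of ASCII punctuation or
// whitespace while leaving '$' inside the resulting words.
class DollarSafeSplit {

    // Hand-rolled class: whitespace plus every ASCII punctuation
    // character except '$'. '-', '[', '\' and ']' are escaped because
    // they are special inside a java.util.regex character class.
    private static final Pattern EXPLICIT =
        Pattern.compile("[\\s!\"#%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]+");

    // Java-only alternative: (ASCII punctuation or whitespace)
    // intersected with "anything but '$'".
    private static final Pattern INTERSECTION =
        Pattern.compile("[\\p{Punct}\\s&&[^$]]+");

    static String[] splitExplicit(String input) {
        return EXPLICIT.split(input);
    }

    static String[] splitWithIntersection(String input) {
        return INTERSECTION.split(input);
    }

    public static void main(String[] args) {
        String input = "foo$bar, baz.qux  quux";
        System.out.println(Arrays.toString(splitExplicit(input)));
        System.out.println(Arrays.toString(splitWithIntersection(input)));
    }
}
```

Note that String.split (and Pattern.split) returns an empty first element
when the input starts with a delimiter, so callers may want to skip empty
strings.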