Date: Tue, 22 Oct 2002 12:15:59 -0500
From: D J Hawkey Jr <hawkeyd@visi.com>
To: questions at FreeBSD <freebsd-questions@freebsd.org>
Subject: OT: regex(3) and POSIX collating sequences
Message-ID: <20021022121559.A86362@sheol.localdomain>
Hi.

This is rather off-topic, but as the trouble I'm having is on a FreeBSD
box, I'm hoping you'll excuse me.

What's up with collating sequences and the regcomp(3) function? From the
re_format(7) man page:

    Within a bracket expression, a collating element (a character, a
    multi-character sequence that collates as if it were a single
    character, or a collating-sequence name for either) enclosed in `[.'
    and `.]' stands for the sequence of characters of that collating
    element. The sequence is a single element of the bracket expression's
    list. A bracket expression containing a multi-character collating
    element can thus match more than one character, e.g. if the collating
    sequence includes a `ch' collating element, then the RE `[[.ch.]]*c'
    matches the first five characters of `chchcc'.

But darned if I can get it to work:

    $ echo "ZXCV asdf qwer" |sed -e "s/[^[.ZXCV.][.1234.]]/zxcv/"
    sed: 1: "s/[^[.ZXCV.][.1234.]]/zxcv/ ": RE error: invalid collating element

Foolishness, yes, but it illustrates my problem nicely. I've got a program
that uses REs, and it reports this error when I try to use a
"[[.phrase.]]" bracket syntax. Relevant code example:

    #include <sys/types.h>
    #include <regex.h>
    #include <stdio.h>      /* fprintf() */

    #define REGCOMP_FLAGS (REG_EXTENDED | REG_NOSUB)

    regex_t re;
    int result;
    char *phrase = "[^[.ZXCV.][.1234.]]";
    char buffer[256];

    if ((result = regcomp(&re, phrase, REGCOMP_FLAGS)) != 0) {
        /* Compilation failed, so there is no compiled RE to regfree(). */
        regerror(result, &re, buffer, sizeof(buffer));
        fprintf(stderr, "regcomp(\"%s\") error: %s\n", phrase, buffer);
    }

This works for everything I've thrown at it except for a "[[.whatever.]]"
bracket expression; regcomp(3) refuses to compile it. The REG_NOSUB is
intentional; I only need to know whether a match occurs with regexec(3).

What the devil have I missed? Or, what is an acceptable RE that matches
"anything except "ZXCV" or "1234""?

Please CC: me, I'm not subscribed.

Thanks,
Dave

-- 
  ______________________                         ______________________
  \__________________   \    D. J. HAWKEY JR.   /   __________________/
     \________________/\     hawkeyd@visi.com    /\________________/
                      http://www.visi.com/~hawkeyd/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
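
One way to read the man page passage quoted above is that `[.' and `.]' can only name a collating element the locale already defines, such as `ch' in the man page's example, so an arbitrary phrase like "ZXCV" is rejected at compile time. POSIX extended REs also have no operator that negates a whole string. A minimal sketch of a workaround, under those assumptions, is to compile the alternation "ZXCV|1234" and invert the regexec(3) result in the caller. The function names contains_forbidden() and is_acceptable() are hypothetical and exist only for this illustration.

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` contains "ZXCV" or "1234",
 * 0 if it contains neither, and -1 if the pattern fails to compile.
 */
static int contains_forbidden(const char *text)
{
    regex_t re;
    int rc;

    /* Plain alternation; no [. .] collating syntax is involved. */
    if (regcomp(&re, "ZXCV|1234", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* REG_NOSUB was given, so no match offsets are requested. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}

/*
 * Hypothetical helper: "anything except ZXCV or 1234" is expressed by
 * negating the match result rather than by the RE itself.
 */
static int is_acceptable(const char *text)
{
    return contains_forbidden(text) == 0;
}
```

Here is_acceptable("asdf qwer") would be true and is_acceptable("ZXCV asdf qwer") false. With sed(1) the same inversion could be done with the `!' address modifier instead of inside the RE.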