Date: Tue, 22 Oct 2002 12:15:59 -0500
From: D J Hawkey Jr
To: questions at FreeBSD
Subject: OT: regex(3) and POSIX collating sequences
Message-ID: <20021022121559.A86362@sheol.localdomain>
Reply-To: hawkeyd@visi.com

Hi.

This is rather off-topic, but as the trouble I'm having is on a FreeBSD
box, I'm hoping you'll excuse me.

What's up with collating sequences and the regcomp(3) function? From the
re_format(7) man page:

    Within a bracket expression, a collating element (a character, a
    multi-character sequence that collates as if it were a single
    character, or a collating-sequence name for either) enclosed in `[.'
    and `.]' stands for the sequence of characters of that collating
    element. The sequence is a single element of the bracket expression's
    list.
    A bracket expression containing a multi-character collating element
    can thus match more than one character, e.g. if the collating
    sequence includes a `ch' collating element, then the RE `[[.ch.]]*c'
    matches the first five characters of `chchcc'.

But darned if I can get it to work:

    $ echo "ZXCV asdf qwer" |sed -e "s/[^[.ZXCV.][.1234.]]/zxcv/"
    sed: 1: "s/[^[.ZXCV.][.1234.]]/zxcv/ ": RE error: invalid collating element

Foolishness, yes, but it illustrates my problem nicely. I've got a
program that uses REs, and it reports this error when I try to use a
"[[.phrase.]]" bracket syntax. Relevant code example:

    #include <stdio.h>
    #include <regex.h>

    #define REGCOMP_FLAGS (REG_EXTENDED | REG_NOSUB)

    regex_t re;
    int result;
    char *phrase = "[^[.ZXCV.][.1234.]]";
    char buffer[256];

    /* regcomp() returns 0 on success, an error code otherwise */
    if ((result = regcomp(&re, phrase, REGCOMP_FLAGS)) != 0) {
        regerror(result, &re, buffer, sizeof(buffer));
        regfree(&re);
        fprintf(stderr, "regcomp(\"%s\") error: %s\n", phrase, buffer);
    }

This works for everything I've thrown at it except for a
"[[.whatever.]]" bracket expression; regcomp(3) refuses to compile it.
The REG_NOSUB is intentional; I only need to know whether a match occurs
with regexec(3).

What the devil have I missed? Or, what is an acceptable RE that matches
'anything except "ZXCV" or "1234"'?

Please CC: me, I'm not subscribed.

Thanks,
Dave

-- 
  ______________________                         ______________________
  \__________________   \    D. J. HAWKEY JR.   /   __________________/
     \________________/\     hawkeyd@visi.com    /\________________/
                      http://www.visi.com/~hawkeyd/
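
A note on the closing question, offered as a minimal sketch rather than
a definitive answer. `[.' and `.]' name a single collating element that
the current locale defines, and the C/POSIX locale defines no
multi-character ones, so an arbitrary phrase such as ZXCV is rejected.
A bracket expression also matches only one collating element at a time,
so it cannot negate a whole phrase. One common workaround is to match
the phrases with ordinary alternation and invert the result of
regexec(3). The helper name lacks_phrases below is hypothetical and is
not part of the original program.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the original post).
 * Returns 1 if `line` contains neither "ZXCV" nor "1234",
 * 0 if it contains either one, and -1 on a regex error.
 */
int lacks_phrases(const char *line)
{
    regex_t re;
    int rc;

    assert(line != NULL);

    /* Plain ERE alternation; no collating elements are involved. */
    if (regcomp(&re, "ZXCV|1234", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* regexec() returns 0 on a match and REG_NOMATCH on no match. */
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 0;               /* one of the phrases was found */
    if (rc == REG_NOMATCH)
        return 1;               /* neither phrase is present */
    return -1;                  /* some other regexec() failure */
}
```

With this sketch, lacks_phrases("asdf qwer") returns 1 and
lacks_phrases("ZXCV asdf qwer") returns 0. The negation lives in the
caller's logic instead of in the RE itself.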