Date:      Tue, 22 Oct 2002 12:15:59 -0500
From:      D J Hawkey Jr <hawkeyd@visi.com>
To:        questions at FreeBSD <freebsd-questions@freebsd.org>
Subject:   OT: regex(3) and POSIX collating sequences
Message-ID:  <20021022121559.A86362@sheol.localdomain>

Hi.

This is rather off-topic, but as the trouble I'm having is on a FreeBSD
box, I'm hoping you'll excuse me.

What's up with collating sequences and the regcomp(3) function? From the
re_format(7) man page:

     Within a bracket expression, a collating element (a character, a multi-
     character sequence that collates as if it were a single character, or a
     collating-sequence name for either) enclosed in `[.' and `.]' stands for
     the sequence of characters of that collating element.  The sequence is a
     single element of the bracket expression's list.  A bracket expression
     containing a multi-character collating element can thus match more than
     one character, e.g. if the collating sequence includes a `ch' collating
     element, then the RE `[[.ch.]]*c' matches the first five characters of
     `chchcc'.

But darned if I can get it to work:

    $ echo "ZXCV asdf qwer" |sed -e "s/[^[.ZXCV.][.1234.]]/zxcv/"
    sed: 1: "s/[^[.ZXCV.][.1234.]]/zxcv/
    ": RE error: invalid collating element

Foolishness, yes, but it illustrates my problem nicely. I've got a program
that uses REs, and it reports this error when I try to use a "[[.phrase.]]"
bracket syntax. Relevant code example:

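    #include <sys/types.h>
    #include <stdio.h>
    #include <regex.h>

    #define REGCOMP_FLAGS    (REG_EXTENDED | REG_NOSUB)

    int
    main(void)
    {
        regex_t re;
        int result;
        const char *phrase = "[^[.ZXCV.][.1234.]]";
        char buffer[256];

        /* Compile the RE; a non-zero result is a regcomp(3) error code. */
        if ((result = regcomp(&re, phrase, REGCOMP_FLAGS)) != 0)
        {
            /* Translate the error code into a readable message. */
            regerror(result, &re, buffer, sizeof(buffer));

            fprintf(stderr, "regcomp(\"%s\") error: %s\n", phrase, buffer);
            return 1;
        }

        /* Free the RE only after a successful compile. */
        regfree(&re);
        return 0;
    }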

This works for everything I've thrown at it except for a "[[.whatever.]]"
bracket expression. regcomp(3) refuses to compile it. The REG_NOSUB is
intentional; I only need to know that a match occurs with regexec(3).
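
To narrow the failure down, it may help to probe which patterns regcomp(3) accepts at all. What follows is only a minimal sketch: try_compile() is a hypothetical helper name, and it assumes the program runs in the default "C" locale, which as far as I know defines no multi-character collating elements such as "ZXCV".

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * try_compile() is a hypothetical helper, not part of regex(3).
 * It returns 0 if the pattern compiles, otherwise the non-zero
 * regcomp(3) error code (REG_ECOLLATE, REG_EBRACK, ...).
 */
int
try_compile(const char *pattern)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0)
        regfree(&re);   /* free only a successfully compiled RE */
    return rc;
}
```

Calling try_compile("[[.ZXCV.]]") and try_compile("ZXCV|1234") side by side separates a problem with the collating element itself from a problem with the rest of the bracket expression.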

What the devil have I missed? Or, what is an acceptable RE that matches
anything except "ZXCV" or "1234"?
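
One possible workaround, sketched under the assumption that "anything except" means "the input contains neither phrase": compile the positive alternation "ZXCV|1234" and invert the regexec(3) result, rather than trying to negate whole words inside a bracket expression. The helper name lacks_phrases() is hypothetical, and this is not necessarily the only or best approach.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * lacks_phrases() is a hypothetical helper name.
 * Returns 1 if text contains neither "ZXCV" nor "1234",
 * 0 if it contains at least one of them,
 * -1 if the RE fails to compile or regexec(3) reports an error.
 */
int
lacks_phrases(const char *text)
{
    regex_t re;
    int rc;

    assert(text != NULL);

    if (regcomp(&re, "ZXCV|1234", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB only the match/no-match status is reported. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    if (rc == REG_NOMATCH)
        return 1;       /* neither phrase found */
    return (rc == 0) ? 0 : -1;
}
```

With this sketch, lacks_phrases("ZXCV asdf qwer") reports 0 because the input contains "ZXCV", while lacks_phrases("asdf qwer") reports 1.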

Please CC: me, I'm not subscribed. Thanks,
Dave

-- 
  ______________________                         ______________________
  \__________________   \    D. J. HAWKEY JR.   /   __________________/
     \________________/\     hawkeyd@visi.com    /\________________/
                      http://www.visi.com/~hawkeyd/

