Date:      Mon, 13 Nov 2017 14:35:29 -0500
From:      mfv <mfv@bway.net>
To:        "James B. Byrne via freebsd-questions" <freebsd-questions@freebsd.org>
Cc:        byrnejb@harte-lyne.ca
Subject:   Re: Regex character and collation class documentation
Message-ID:  <20171113143529.572a4b76@gecko4>
In-Reply-To: <b0835f510ae66a82808725fa8ae8c7d0.squirrel@webmail.harte-lyne.ca>
References:  <mailman.90.1510315202.51235.freebsd-questions@freebsd.org> <68be33ca89aab31e068253dffe129021.squirrel@webmail.harte-lyne.ca> <20171111104543.11279fb7@gecko4> <b0835f510ae66a82808725fa8ae8c7d0.squirrel@webmail.harte-lyne.ca>

> On Mon, 2017-11-13 at 09:09 "James B. Byrne via freebsd-questions"
> <freebsd-questions@freebsd.org> wrote:
>
>On Sat, November 11, 2017 10:45, mfv wrote:
>
>> As a result I did some more digging and discovered that the valid
>> names for [[.<name>.]] are contained in
>> /usr/src/lib/libc/regex/cname.h.  The names in "man ascii" are a subset of cname.h.
>>
>> It also explains why [[.SP.]] generates an error message.  Even
>> though SP is listed in "man ascii" it is not specified in cname.h.
>>
>> Cheers ...
>>
>> Marek
>>  
>
>A file named cname.h does not even exist on my system.  At least if it
>does then find does not report it.  On the other hand, this file:
>
>/usr/local/include/nstring.h
>
>contains this:
>
>/* The standard C library routines isdigit(), for some weird
>   historical reason, does not take a character (type 'char') as its
>   argument.  Instead it takes an integer.  When the integer is a whole
>   number, it represents a character in the obvious way using the local
>   character set encoding.  When the integer is negative, the results
>   are undefined.
>
>   Passing a character to isdigit(), which expects an integer,
>   results in isdigit() sometimes getting a negative number.
>
>   On some systems, when the integer is negative, it represents exactly
>   the character you want it to anyway (e.g. -1 is the character that
>   is encoded 0xFF).  But on others, it does not.
>
>   (The same is true of other routines like isdigit()).
>
>   Therefore, we have the substitutes for isdigit() etc. that take an
>   actual character (type 'char') as an argument.
>*/
>
>#define ISALNUM(C) (isalnum((unsigned char)(C)))
>#define ISALPHA(C) (isalpha((unsigned char)(C)))
>#define ISCNTRL(C) (iscntrl((unsigned char)(C)))
>#define ISDIGIT(C) (isdigit((unsigned char)(C)))
>#define ISGRAPH(C) (isgraph((unsigned char)(C)))
>#define ISLOWER(C) (islower((unsigned char)(C)))
>#define ISPRINT(C) (isprint((unsigned char)(C)))
>#define ISPUNCT(C) (ispunct((unsigned char)(C)))
>#define ISSPACE(C) (isspace((unsigned char)(C)))
>#define ISUPPER(C) (isupper((unsigned char)(C)))
>#define ISXDIGIT(C) (isxdigit((unsigned char)(C)))
>#define TOUPPER(C) ((char)toupper((unsigned char)(C)))
>
>But nowhere can I find 'isnul' or 'ISNUL'.
>
>
>

Hello James,

Do you have /usr/src on your system?  The directories under /usr/src
hold the source code used to build FreeBSD on one's own computer.

If not, here is a link to the GitHub mirror of the FreeBSD source tree,
where the source code for /usr/src/lib/libc/regex/cname.h can be seen:

 https://github.com/freebsd/freebsd/blob/master/lib/libc/regex/cname.h

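All names listed on the left can be used inside a bracket expression,
[[.name.]], to match the character listed on the right.  This works in
sed and in any other FreeBSD program that uses the libc regex routines.
For example, with sed -E, /[[.asterisk.]]{3}/ matches ***.  In a basic
regular expression (sed without -E) the interval is written \{3\}.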
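
The same kind of match can be tried directly through the POSIX
regcomp()/regexec() interface, which FreeBSD's sed uses from libc.  This
is a minimal sketch; regex_matches() is a hypothetical helper, not a libc
function.  Support for named collating elements depends on the C library:
FreeBSD's libc accepts the names in cname.h, while other implementations
(for example glibc in the C locale) may reject them with REG_ECOLLATE, so
the helper reports a compile failure separately from "no match".

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if pattern matches somewhere in s,
 * 0 if it does not, and -1 if regcomp() rejects the pattern (for
 * example REG_ECOLLATE for a collating-element name the C library
 * does not know).  Pass REG_EXTENDED in cflags for sed -E syntax. */
int regex_matches(const char *pattern, int cflags, const char *s)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

On FreeBSD, regex_matches("[[.asterisk.]]{3}", REG_EXTENDED, "a***b")
is expected to return 1.  Without REG_EXTENDED the pattern is a basic
regex, where "{3}" is literal text and "\{3\}" is the interval.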

Some of the characters have two names.  For example, the control
character with octal code \007 is named both 'BEL' and 'alert'.
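
For illustration, the lookup that cname.h supports can be sketched as a
linear search over name/character pairs.  This is a minimal sketch,
assuming a struct with a name field and a code field; the table is a
hypothetical three-entry excerpt built only from the names mentioned in
this thread, not the real file, and lookup_cname() is a hypothetical
helper rather than a libc function.  Two names for one character, as
with BEL and alert, need nothing special: each is simply its own entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/character pair, modeled on the entries in cname.h. */
struct cname {
    const char *name;
    char code;
};

/* Hypothetical excerpt only; the real table covers every ASCII character. */
static const struct cname cnames_excerpt[] = {
    {"BEL",      '\007'},
    {"alert",    '\007'},   /* second name for the same character */
    {"asterisk", '*'},
    {NULL,       0}         /* sentinel ends the table */
};

/* Look up a [[.name.]] collating-element name.  Returns 1 and stores the
 * character in *out if the name is in the table, otherwise returns 0. */
int lookup_cname(const char *name, char *out)
{
    for (const struct cname *cp = cnames_excerpt; cp->name != NULL; cp++) {
        if (strcmp(cp->name, name) == 0) {
            *out = cp->code;
            return 1;
        }
    }
    return 0;
}
```

A name missing from the table, such as SP, makes lookup_cname() return
0; in the libc regex code the corresponding failure is what produces the
error message for [[.SP.]].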

I do not know the purpose of /usr/local/include/nstring.h.  As such I
cannot shed any light on that particular file.
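
Whatever the origin of nstring.h, the macros quoted from it follow a
standard C idiom: the <ctype.h> functions take an int whose value must
be representable as unsigned char or equal EOF, so a plain char, which
may be signed, is cast to unsigned char first.  A minimal sketch using a
hypothetical count_digits() helper; note that <ctype.h> has no isnul(),
because testing for NUL is just a comparison with '\0', as the loop
condition below does.

```c
#include <ctype.h>
#include <stddef.h>

/* Same idea as the ISDIGIT() macro quoted from nstring.h: convert the
 * char to unsigned char before calling isdigit(), so a byte such as 0xFF
 * is passed as 255 rather than as a negative value where char is signed. */
#define ISDIGIT(C) (isdigit((unsigned char)(C)))

/* Hypothetical helper: count the decimal digits in a NUL-terminated string. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {   /* the "is NUL" test is a plain comparison */
        if (ISDIGIT(*s))
            n++;
    }
    return n;
}
```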

Cheers ...

Marek


