Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Jun 2014 20:10:11 -0500
From:      Pedro Giffuni <pfg@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Stefan Farfeleder <stefanf@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r267675 - head/lib/libc/regex
Message-ID:  <B74D36BF-1BF7-40E4-9583-76E6A917ECDC@freebsd.org>
In-Reply-To: <20140621072634.I921@besplex.bde.org>
References:  <201406201529.s5KFTAEB068038@svn.freebsd.org> <20140620182311.GA1214@mole.fafoe.narf.at> <53A48D62.4060801@FreeBSD.org> <20140621072634.I921@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

Il giorno 20/giu/2014, alle ore 16:59, Bruce Evans =
<brde@optusnet.com.au> ha scritto:

>=20
>> El 6/20/2014 1:23 PM, Stefan Farfeleder escribi=F3:
>>> On Fri, Jun 20, 2014 at 03:29:10PM +0000, Pedro F. Giffuni wrote:
>>>> ...
>>>> Log:
>>>>   regex: Make use of reallocf().
>>>>   Use of reallocf is useful in libraries as we are not certain the
>>>>   application will exit after NULL.
>>>> ...
>>>>   This somewhat reduces portability but if since you are building
>>>>   this as part of libc it is likely you have our non-standard
>>>>   reallocf(3) already.
>=20
> It also somewhat increases namespace pollution.
>=20
>>>> Modified: head/lib/libc/regex/regcomp.c
>>>> =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
>>>> --- head/lib/libc/regex/regcomp.c	Fri Jun 20 13:26:49 2014 =
(r267674)
>>>> +++ head/lib/libc/regex/regcomp.c	Fri Jun 20 15:29:09 2014 =
(r267675)
>>>> @@ -1111,7 +1111,7 @@ allocset(struct parse *p)
>>>>  {
>>>>  	cset *cs, *ncs;
>>>> -	ncs =3D realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
>>>> +	ncs =3D reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
>>>>  	if (ncs =3D=3D NULL) {
>>>>  		SETERROR(REG_ESPACE);
>>>>  		return (NULL);
>>>> @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t
>>>>  	if (ch < NC)
>>>>  		cs->bmp[ch >> 3] |=3D 1 << (ch & 7);
>>>>  	else {
>>>> -		newwides =3D realloc(cs->wides, (cs->nwides + 1) *
>>>> +		newwides =3D reallocf(cs->wides, (cs->nwides + 1) *
>>>>  		    sizeof(*cs->wides));
>>>>  		if (newwides =3D=3D NULL) {
>>>>  			SETERROR(REG_ESPACE);
>>> I don't think these changes are OK. If reallocf() fails here, the
>>> cs->wides pointer will be freed and later freeset() will call
>>> free(cs->wides), probably crashing. The other cases are most =
probably
>>> similar though I haven't examined them closely.
>>=20
>> OK ...
>>=20
>> I don't think there is any problem:
>>=20
>> If reallocf fails, newwides will be set to NULL and if free() is =
called it doesn't do anything when the argument is NULL.
>>=20
>> Also freeset() is meant to be called to "free a now-unused set" and =
it is not called within the library. I would think using a value when =
the allocation has failed is a much more serious issue than attempting =
to fail to free it after trying to use it. ;-).
>=20
> It doesn't look OK to me.  It turns the careful use of newwides, etc.,
> into garbage, and leaves a garbage pointer in cs->wides, etc., to =
cause
> problems later.  It might work without this careful use of a temporary
> variable, but even that is not clear.  Consider:
>=20
> - start with a consistent data structure with some pointers to =
malloced
>  storage in it; foo->p say
> - simple reallocf() allocation strategy:
>    foo->p =3D reallocf(foo->p, size)
>    if (foo->p =3D=3D NULL)
>      return (ERROR);
>  This leaves the data structure consistent if and only if the only =
thing
>  in it relevant to p is p itself, with foo->p serving as a flag for
>  validity.
> - more complicated reallocf() allocation strategy:
>    foo->p =3D reallocf(foo->p, size)
>    if (foo->p =3D=3D NULL) {
>      foo->psize =3D 0;
>      foo->pvalid =3D 0;
>      foo->foovalid =3D 0;
>      ...
>      return (ERROR);
>    }
>  This still can't do things like cleaning up what foo->p points to, =
since
>  reallocf() clobbered that.
> This shows that reallocf() is only useful in simple cases.  In =
general, you
> must keep the pointer valid to clean things up:
>    newp =3D reallocf(foo->p, size)
>    if (newp =3D=3D NULL) {
>      for (i =3D 0; i < foo->psize; i++)
> 	free(foo->p[i]);
>      foo->p =3D NULL;
>      foo->psize =3D 0;
>      foo->pvalid =3D 0;
>      foo->foovalid =3D 0;
>      ...
>      return (ERROR);
>    }
>=20
> Of course, malloc never fails so all this error checking is more a =
source
> of complexity and bugs than useful.
>=20

This is interesting, however I have problems understanding that a =
library would let you ignore an error condition and pass you garbage as =
part of it=92s normal behavior. Plus being a standard library I am not =
sure if you can count on other implementations doing the same ...


> Re-quoting/recovering some context:
>=20
> @ Modified: head/lib/libc/regex/regcomp.c
> @ =
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D
> @ --- head/lib/libc/regex/regcomp.c	Fri Jun 20 13:26:49 2014	=
(r267674)
> @ +++ head/lib/libc/regex/regcomp.c	Fri Jun 20 15:29:09 2014	=
(r267675)
> @ @@ -1111,7 +1111,7 @@ allocset(struct parse *p)
> @  {
> @  	cset *cs, *ncs;
> @ @ -	ncs =3D realloc(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
> @ +	ncs =3D reallocf(p->g->sets, (p->g->ncsets + 1) * sizeof(*ncs));
> @  	if (ncs =3D=3D NULL) {
> @  		SETERROR(REG_ESPACE);
> @  		return (NULL);
>=20
> This code shows that even Henry didn't know how to program memory =
allocation
> in 1988.  The code was much worse in 4.4BSD-Lite2:
>=20
> old@ 		if (p->g->sets =3D=3D NULL)
> old@ 			p->g->sets =3D (cset *)malloc(nc * =
sizeof(cset));
> old@ 		else
> old@ 			p->g->sets =3D (cset *)realloc((char =
*)p->g->sets,
> old@ 							nc * =
sizeof(cset));
>=20
> This just leaked memory.  It was improved by assigning to ncs and by
> removing the pre-C90 and C++ casts.  Now it is unimproved by turning =
the
> careful use of ncs into garbage and leaving garbage in *p.
>=20
> old@ 		if (p->g->setbits =3D=3D NULL)
> old@ 			p->g->setbits =3D (uch *)malloc(nbytes);
> old@ 		else {
> old@ 			p->g->setbits =3D (uch *)realloc((char =
*)p->g->setbits,
> old@ 								nbytes);
> old@ 			/* xxx this isn't right if setbits is now NULL =
*/
> old@ 			for (i =3D 0; i < no; i++)
> old@ 				p->g->sets[i].ptr =3D p->g->setbits + =
css*(i/CHAR_BIT);
> old@ 		}
>=20
> Honest memory mismanagement.  It just assumes that malloc() and =
realloc()
> can't fail.  In -current, the function is much simpler and doesn't =
have
> this code.  I think part of the simplication is to use realloc() of =
NULL
> instead of malloc() to start up.
>=20
> @ @@ -1174,7 +1174,7 @@ CHadd(struct parse *p, cset *cs, wint_t @  	=
if (ch < NC)
> @  		cs->bmp[ch >> 3] |=3D 1 << (ch & 7);
> @  	else {
> @ -		newwides =3D realloc(cs->wides, (cs->nwides + 1) *
> @ +		newwides =3D reallocf(cs->wides, (cs->nwides + 1) *
> @  		    sizeof(*cs->wides));
> @  		if (newwides =3D=3D NULL) {
> @  			SETERROR(REG_ESPACE);
> @ @@ -1203,7 +1203,7 @@ CHaddrange(struct parse *p, cset *cs, wi
> @  		CHadd(p, cs, min);
> @  	if (min >=3D max)
> @  		return;
> @ -	newranges =3D realloc(cs->ranges, (cs->nranges + 1) *
> @ +	newranges =3D reallocf(cs->ranges, (cs->nranges + 1) *
> @  	    sizeof(*cs->ranges));
> @  	if (newranges =3D=3D NULL) {
> @  		SETERROR(REG_ESPACE);
> @ @@ -1227,7 +1227,7 @@ CHaddtype(struct parse *p, cset *cs, wct
> @  	for (i =3D 0; i < NC; i++)
> @  		if (iswctype(i, wct))
> @  			CHadd(p, cs, i);
> @ -	newtypes =3D realloc(cs->types, (cs->ntypes + 1) *
> @ +	newtypes =3D reallocf(cs->types, (cs->ntypes + 1) *
> @  	    sizeof(*cs->types));
> @  	if (newtypes =3D=3D NULL) {
> @  		SETERROR(REG_ESPACE);
> @ @@ -1350,7 +1350,7 @@ enlarge(struct parse *p, sopno size)
> @  	if (p->ssize >=3D size)
> @  		return 1;
> @ @ -	sp =3D (sop *)realloc(p->strip, size*sizeof(sop));
> @ +	sp =3D (sop *)reallocf(p->strip, size*sizeof(sop));
> @  	if (sp =3D=3D NULL) {
> @  		SETERROR(REG_ESPACE);
> @  		return 0;
>=20
> Similarly in 4 more cases, except most of them weren't in old =
versions.
>=20
> @ @@ -1368,7 +1368,7 @@ static void
> @  stripsnug(struct parse *p, struct re_guts *g)
> @  {
> @  	g->nstates =3D p->slen;
> @ -	g->strip =3D (sop *)realloc((char *)p->strip, p->slen * =
sizeof(sop));
> @ +	g->strip =3D (sop *)reallocf((char *)p->strip, p->slen * =
sizeof(sop));
> @  	if (g->strip =3D=3D NULL) {
> @  		SETERROR(REG_ESPACE);
> @  		g->strip =3D p->strip;
>=20
> This was the only realloc() that had not been cleaned up, so using
> reallocf() has a chance of improving it.  It still has the pre-C90 and
> C++ casts.  But there is an obvious new bug visible without reading =
more
> than the patch context:
> - the local variable might not be needed now, since the variable =
assigned
>  to, g->strip, is different from the variable realloced, p->strip
> - on realloc failure, p->strip becomes invalid
> - the error handling is completely broken.  It points g->strip at the =
old
>  (now deallocated) storage.  This is immediate use of the garbage =
pointer
>  created by using reallocf().
>=20
> Clearly, only realloc() code of the form "p =3D realloc(p, size);", =
where the
> pointer assigned to is the same as the pointer realloced, can be =
improved
> by blindly converting it to use reallocf().
>=20

Ah yes, my fault there.

> The last function is most interesting.  It seems to be to reduce the =
size.
> So an allocation failure is even less likely than usually, except lots =
of
> small allocations and reallocations are more likely to waste space =
than
> save it.  But if failure occurs it is almost harmless, and the error
> handling of keeping using the previous allocation was good.  Maybe =
Henry
> knew how to program memory allocation after all.  (The last function
> survived previously-unchanged from 4.4BSDLite-2 except for removing =
K&R
> support and 'register=92.

OK, it=92s pretty tricky, but I agree that reallocf() simply won=92t do =
anything here. I will revert.

Thanks!

Pedro.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?B74D36BF-1BF7-40E4-9583-76E6A917ECDC>