Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 11 Apr 2016 23:57:34 +0200
From:      Marius Strobl <marius@freebsd.org>
To:        Shigeharu TAKENO <shige@iee.niit.ac.jp>, gabor@FreeBSD.org
Cc:        freebsd-sparc64@freebsd.org, Joerg Wunsch <joerg_wunsch@uriah.heep.sax.de>
Subject:   Re: /usr/bin/sort may be incorrect
Message-ID:  <20160411215734.GA2101@alchemy.franken.de>
In-Reply-To: <201603311122.u2VBMPam007017@pc98tak.iee.niit.ac.jp>
References:  <201603250229.u2P2TVLp003567@pc98tak.iee.niit.ac.jp> <201603310446.u2V4kLGM003303@pc98tak.iee.niit.ac.jp> <20160331072303.GP53011@uriah.heep.sax.de> <201603311122.u2VBMPam007017@pc98tak.iee.niit.ac.jp>

next in thread | previous in thread | raw e-mail | index | archive | help

--/NkBOFFp2J2Af1nK
Content-Type: multipart/mixed; boundary="qMm9M+Fa2AknHoGS"
Content-Disposition: inline


--qMm9M+Fa2AknHoGS
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Mar 31, 2016 at 08:22:25PM +0900, Shigeharu TAKENO wrote:
> shige 03/31 2016
> ----------------
>=20
> Thank you for your reply.
>=20
> Joerg Wunsch wrote:
>=20
> | struct key_value
> | {
> |    struct bwstring *k;
> |    struct key_hint hint[];
> | };
> |=20
> | If that works for you, too, I think it would be the preferrable way to
> | write it.
>=20
> Unfortunately this does not fix the problem.
>=20
>=20
> | > The k field of key_value may be overwritten by the hint field
> | > in numcoll_impl(), gnumcoll() and monthcoll() (coll.c), and the
> | > pointer value of k may change to incorrect value.
> |=20
> | Are you saying that something like
> |=20
> | struct key_value *kw;
> |=20
> | ...
> |=20
> |    kw->hint[-1] =3D something;
> |=20
> | happens?  That would certainly be a bug in the code then that ought to
> | be fixed, rather than worked around.
>=20
> I tested under your suggestion "struct key_hint hint[]", which=20
> behaves as the same of default sort command.
>=20
> % ( echo 2 5 8 ; echo 2 6 5 ) | sort -n +0 -1 +1 -2 +2 -3
>=20
>=20
> In key_coll(struct keys_array *ps1, struct keys_array *ps2,=20
>   size_t offset) (in coll.c), initial pointer values are the
> followings:
>=20
>  &(ps1->key[0]) =3D 0x40c140f8
>  &(ps1->key[1]) =3D 0x40c14100
>  &(ps1->key[2]) =3D 0x40c14108
>  &(ps2->key[0]) =3D 0x40c14088
>  &(ps2->key[1]) =3D 0x40c14090
>  &(ps2->key[2]) =3D 0x40c14198
>  (the pointer repeat is only 8 byte.)
>=20
>  ps1->key[0].k =3D 0x40c060e0
>  ps1->key[1].k =3D 0x40c060f0
>  ps1->key[2].k =3D 0x40c06100
>  ps2->key[0].k =3D 0x40c060a0
>  ps2->key[1].k =3D 0x40c060b0
>  ps2->key[2].k =3D 0x40c060c0
>=20
> key_coll() calls sm->func() =3D numcoll(), and it uses
> numcoll_impl(struct key_value *kv1, struct key_value *kv2) with
> ps1->key[i] and ps2->key[i]. The function numcoll_impl() uses k
> field and hint field of struct key_value.
>=20
>=20
> For i =3D 0, the k field pointers of arguments kv1 and kv2 of=20
> numcoll_impl() are correct:
>=20
>  kv1->k =3D 0x40c060e0, kv2->k =3D 0x40c060a0
>=20
> but the hint field pointers of kv1, kv2 are doughtful:
>=20
>  &(kv1->hint) =3D 0x40c14100, &(kv2->hint) =3D 0x40c14090
>=20
> which are the same value of &(ps1->key[1]) and &(ps2->key[1]).
>=20
>=20
> And for i =3D 1, the k field pointers of arguments kv1 and kv2=20
> become incorrect:
>=20
>  kv1->k =3D 0x140c060f0, kv2->k =3D 0x140c060b0
>=20
> which are added 0x100000000 to the original pointer value.=20
> The sort command stops where it uses the value.
>=20
>=20
> If we use the definition "struct key_hint hint[1]", the repeat
> of pointers of ps1->key[i] becomes 32 byte, and incorrect changes=20
> of pointers do not occur.
>=20
>   &(ps1->key[0]) =3D 0x40c08208
>   &(ps1->key[1]) =3D 0x40c08228
>   &(ps1->key[2]) =3D 0x40c08248
>=20

AFAICT is sort(1) relying on undefined behavior. It builds up an
array of struct keys_array, which in turn has a zero length array
(apparently unnecessary) of struct key_value, which in turn has
a zero length array of struct key_hint. While sort(1) takes care
to allocate space for the latter based on need_hint, it relies
on the compiler to determine the correct offsets of the struct
key_value elements within the array of struct keys_array. Given
that the compiler isn't aware of the actual size - for essentially
the same reason zero length arrays also don't contribute to the
sizeof() of such a struct - of struct key_value, that doesn't
work, i. e. struct key_hint isn't accounted for in the offset
calculation leading to the corruption described above.
I'm not exactly sure why this bug apparently doesn't affect x86
but I think that's due to its little-endian addressing avoiding
corruption in practice; that's somewhat hard to think through
given that struct bwstring and the use thereof are a similarly
=2E.. interesting constructs.
Anyway, using an accessor function for determining the correct
offsets of struct key_value elements within the array of struct
keys_array as in the attached patch fixes the problem for me.
However, I suspect that the space savings by only allocating
memory for struct key_hint when needed don't really justify
all the hackery involved in that approach. At least, changing
struct key_hint to be a fixed member of struct key_value and
removing all parts hanging off from need_hint would make the
code a lot cleaner and readable.

Gabor, what do you think?

Marius


--qMm9M+Fa2AknHoGS
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="sort.diff"
Content-Transfer-Encoding: quoted-printable

Index: coll.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- coll.c	(revision 297803)
+++ coll.c	(working copy)
@@ -105,14 +105,29 @@
 {
=20
 	if (ka) {
-		for (size_t i =3D 0; i < keys_num; ++i)
-			if (ka->key[i].k && ka->key[i].k !=3D s)
-				bwsfree(ka->key[i].k);
+		for (size_t i =3D 0; i < keys_num; ++i) {
+			const struct key_value *kv;
+
+			kv =3D get_key_from_keys_array(ka, i);
+			if (kv->k && kv->k !=3D s)
+				bwsfree(kv->k);
+		}
 		memset(ka, 0, keys_array_size());
 	}
 }
=20
 /*
+ * Get pointer to a key value in the keys set
+ */
+struct key_value *
+get_key_from_keys_array(struct keys_array *ka, size_t ind)
+{
+
+	return ((struct key_value *)((caddr_t)&(ka->key[ind]) +
+	    (key_hint_size() * ind)));
+}
+
+/*
  * Set value of a key in the keys set
  */
 void
@@ -122,7 +137,7 @@
 	if (ka && keys_num > ind) {
 		struct key_value *kv;
=20
-		kv =3D &(ka->key[ind]);
+		kv =3D get_key_from_keys_array(ka, ind);
=20
 		if (kv->k && kv->k !=3D s)
 			bwsfree(kv->k);
@@ -156,9 +171,9 @@
 		if (si->str)
 			ret +=3D bws_memsize(si->str);
 		for (size_t i =3D 0; i < keys_num; ++i) {
-			struct key_value *kv;
+			const struct key_value *kv;
=20
-			kv =3D &(si->ka.key[i]);
+			kv =3D get_key_from_keys_array(&si->ka, i);
=20
 			if (kv->k !=3D si->str)
 				ret +=3D bws_memsize(kv->k);
@@ -475,16 +490,19 @@
 int
 key_coll(struct keys_array *ps1, struct keys_array *ps2, size_t offset)
 {
+	struct key_value *kv1, *kv2;
 	struct sort_mods *sm;
 	int res =3D 0;
=20
 	for (size_t i =3D 0; i < keys_num; ++i) {
+		kv1 =3D get_key_from_keys_array(ps1, i);
+		kv2 =3D get_key_from_keys_array(ps2, i);
 		sm =3D &(keys[i].sm);
=20
 		if (sm->rflag)
-			res =3D sm->func(&(ps2->key[i]), &(ps1->key[i]), offset);
+			res =3D sm->func(kv2, kv1, offset);
 		else
-			res =3D sm->func(&(ps1->key[i]), &(ps2->key[i]), offset);
+			res =3D sm->func(kv1, kv2, offset);
=20
 		if (res)
 			break;
Index: coll.h
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- coll.h	(revision 297803)
+++ coll.h	(working copy)
@@ -91,7 +91,7 @@
 {
 	struct bwstring		*k; /* key string */
 	struct key_hint		 hint[0]; /* key sort hint */
-};
+} __packed;
=20
 /*
  * Set of keys container object.
@@ -146,6 +146,7 @@
=20
 struct keys_array *keys_array_alloc(void);
 size_t keys_array_size(void);
+struct key_value *get_key_from_keys_array(struct keys_array *ka, size_t in=
d);
 void set_key_on_keys_array(struct keys_array *ka, struct bwstring *s, size=
_t ind);
 void clean_keys_array(const struct bwstring *s, struct keys_array *ka);
=20
Index: radixsort.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- radixsort.c	(revision 297803)
+++ radixsort.c	(working copy)
@@ -243,9 +243,11 @@
 static inline int
 get_wc_index(struct sort_list_item *sli, size_t level)
 {
+	const struct key_value *kv;
 	const struct bwstring *bws;
=20
-	bws =3D sli->ka.key[0].k;
+	kv =3D get_key_from_keys_array(&sli->ka, 0);
+	bws =3D kv->k;
=20
 	if ((BWSLEN(bws) > level))
 		return (unsigned char) BWS_GET(bws,level);

--qMm9M+Fa2AknHoGS--

--/NkBOFFp2J2Af1nK
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQJ8BAEBCgBmBQJXDB3JXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXQ1M0Q5QjQzNTVGOTU5ODBGQzVENzZCMDIy
MEI3MERFMTNGMUQxRTRGAAoJECC3DeE/HR5PQYgP/jfxi7sCgIH7Nnn2ylnM3Ixx
lLAYZxlqsdMp/cUAtzP1MMWtYjdobhgiXDY9+289Azf2I3S8bC6lh6STqRsP2dp3
Ouh/YkSc0sG+TeuBo28wv7EnGk4ubEDyhMLBZg/J6wouTo4QI2ioj3Lq77AJHdmo
4+UO5OC5npBRM37ICOirhSdqbEkjIdYgvdY1ZgZ2TpY8u0P/oJ0Prra1imBJ6dpL
nTC6ZykAyoNvwOc9mWxjXmMQcCIDYdeljCjtSSZhGXqXiqy+FDiFVvKtDWngPcRo
sfpVEVcgmjto7Tn7QncVAz7n9JYPGyeZaxLFyGGn3nDTpR8Eu4X9okpVcbQi67Hq
+/3ZoB8Hm3burBENNo0h860PUymC+8an24iHfmZ/S1nOoVkgEvCddrFSCZNbtVdR
04CeBzE+Qlyx0s+0YXovfuvZZ1Mzz4GAo8z6Psj3FgJ0Nij4y0P9cnP04B1phYFA
MkmOgdJu0br3iz4RkAzdgL+TN/6z9nSJUBpnADukWTizglf0PhgCj7ztUgH7A05H
YKD5sB5bYp3dhrTM1kGqawDljq3u9dqIEoQfglpjiStBePAwOo45RTrx4Zh16b62
toqrwn4hZnXlWkwPUpU4alPM4/jJYACqOhoAoR/qHP0d9Vc94c6kSIUoJKeDtDRV
vlpPci8CMMdMg9HehT8z
=umiB
-----END PGP SIGNATURE-----

--/NkBOFFp2J2Af1nK--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160411215734.GA2101>