From owner-freebsd-arch@FreeBSD.ORG Fri Jul 5 19:52:41 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B7735AAC for ; Fri, 5 Jul 2013 19:52:41 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id 2362B1FD4 for ; Fri, 5 Jul 2013 19:52:39 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 987E3209; Fri, 5 Jul 2013 21:47:49 +0200 (CEST) Date: Fri, 5 Jul 2013 21:52:55 +0200 From: Pawel Jakub Dawidek To: Poul-Henning Kamp Subject: Re: General purpose library for name/value pairs. Message-ID: <20130705195255.GB25842@garage.freebsd.pl> References: <20130704215329.GG1402@garage.freebsd.pl> <4818.1373008073@critter.freebsd.dk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PmA2V3Z32TCmWXqI" Content-Disposition: inline In-Reply-To: <4818.1373008073@critter.freebsd.dk> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jul 2013 19:52:41 -0000 --PmA2V3Z32TCmWXqI Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jul 05, 2013 at 07:07:53AM +0000, Poul-Henning Kamp wrote: > In message <20130704215329.GG1402@garage.freebsd.pl>, Pawel Jakub Dawidek= write > s: >=20 > >Currently we have plenty of random methods to pass data around: > >- nmount(2) implements its own name/value pair approach, > >- GEOM uses XML to pass data to userland, > >- various sysctls and ioctls use binary structures to export data to > > userland. >=20 > You even overlooked a number of other marshalling implementations, > NETGRAPH for instance. >=20 > As should be obvious from the above list, where I'm guilty of most > of it, I think this is a fundamentally a good and overdue idea. >=20 > I am not entirely convinced that one-size-fits-all is possible, but > a name/value list is basically what I ended up with both in nmount() > and GEOM::gctl. >=20 > Exporting complex state with XML is very hard to beat when it comes > to code complexity in the kernel. >=20 > For one thing, you'd need nested n/v lists which is where I decided > in GEOM that XML was the way to go. I do support nesting, but with a limit to avoid too deep recursion during nvlist_pack/unpack. I haven't said that, but my plan is not to replace existing mechanisms, ie. XML in GEOM or even nmount(2). This would be hard as we would need to retain backward compatibility anyway. My main goal is to have something for use with new code. If someone will be willing to replace existing formats, that's fine be me, but not my main goal. > Also, exporting performance counters via mmap(2) (see: devstat) is > far superior to anything else, because it means no-syscall high-speed > monitoring is possible. Yes, libnv is not for use in very-performance-sensitive cases. > ifconfig(8) and netstat(1) could really use a total rewrite along > those lines. >=20 > So, yes I think your n/v idea has a lot of merits, but it's not > going to be The Final Solution, and you should not force it on > problems where we have better solutions. I'm not planning to:) > Implementation issues: >=20 > Are duplicate names allowed, and if so, do they keep stable order ? Duplicates are allowed and the ordering is stable unless one of those flags is specified: /* Duplicated names are not allowed. */ #define NV_FLAG_UNIQUE_NAME 0x01 /* Duplicated names of the same type are not allowed. */ #define NV_FLAG_UNIQUE_NAME_TYPE 0x02 To be honest I'd much prefer not to support duplicates, because arrays of values are supported as well as nesting. Not supporting duplicates would simplify implementation a bit. > Otherwise we will need some other API-convenient form of array-simulation, > for instance for variable number of slices in GEOM slicers etc. The API does support arrays. All nvlist_add_() functions: void nvlist_add_nvpair(nvlist_t *nvl, nvpair_t *nvp); void nvlist_add_null(nvlist_t *nvl, const char *name); void nvlist_add_bool(nvlist_t *nvl, const char *name, bool value); void nvlist_add_int8(nvlist_t *nvl, const char *name, int8_t value); void nvlist_add_uint8(nvlist_t *nvl, const char *name, uint8_t value); void nvlist_add_int16(nvlist_t *nvl, const char *name, int16_t value); void nvlist_add_uint16(nvlist_t *nvl, const char *name, uint16_t value); void nvlist_add_int32(nvlist_t *nvl, const char *name, int32_t value); void nvlist_add_uint32(nvlist_t *nvl, const char *name, uint32_t value); void nvlist_add_int64(nvlist_t *nvl, const char *name, int64_t value); void nvlist_add_uint64(nvlist_t *nvl, const char *name, uint64_t value); void nvlist_add_string(nvlist_t *nvl, const char *name, const char *value); void nvlist_add_stringf(nvlist_t *nvl, const char *name, const char *value= fmt, ...) __printflike(3, 4); void nvlist_add_stringv(nvlist_t *nvl, const char *name, const char *value= fmt, va_list valueap) __printflike(3, 0); void nvlist_add_nvlist(nvlist_t *nvl, const char *name, const nvlist_t *va= lue); void nvlist_add_descriptor(nvlist_t *nvl, const char *name, int value); void nvlist_add_bool_array(nvlist_t *nvl, const char *name, const bool *va= lue, size_t nitems); void nvlist_add_int8_array(nvlist_t *nvl, const char *name, const int8_t *= value, size_t nitems); void nvlist_add_uint8_array(nvlist_t *nvl, const char *name, const uint8_t= *value, size_t nitems); void nvlist_add_int16_array(nvlist_t *nvl, const char *name, const int16_t= *value, size_t nitems); void nvlist_add_uint16_array(nvlist_t *nvl, const char *name, const uint16= _t *value, size_t nitems); void nvlist_add_int32_array(nvlist_t *nvl, const char *name, const int32_t= *value, size_t nitems); void nvlist_add_uint32_array(nvlist_t *nvl, const char *name, const uint32= _t *value, size_t nitems); void nvlist_add_int64_array(nvlist_t *nvl, const char *name, const int64_t= *value, size_t nitems); void nvlist_add_uint64_array(nvlist_t *nvl, const char *name, const uint64= _t *value, size_t nitems); void nvlist_add_string_array(nvlist_t *nvl, const char *name, const char *= const *value, size_t nitems); void nvlist_add_nvlist_array(nvlist_t *nvl, const char *name, const nvlist= _t * const *value, size_t nitems); void nvlist_add_descriptor_array(nvlist_t *nvl, const char *name, const in= t *value, size_t nitems); Because it is easy to use format strings for names and because the library support nesting, one could also do the following: nvlist_t *nvl, *slice; int i; nvl =3D nvlist_create(0); nvlist_add_uint32(nvl, "nslices", nslices); for (i =3D 0; i < nslices; i++) { slice =3D nvlist_create(0); nvlist_add_uint64(slice, "offset", slices[i]->offset); nvlist_add_uint64(slice, "size", slices[i]->size); nvlist_add_string(slice, "label", slices[i]->label); nvlist_addf_nvlist(nvl, slice, "slice%d", i); } > >Note that we neither check if allocation succeeded nor individual > >additions. The library is designed so that write-like operations can > >gracefully handle previous failures.=20 >=20 > Yes, this is important, otherwise the error handling gets too maddening. Right. sbuf(9) was inspiration for that:) > Also remember a function for debugging which renders a nvlist_t (into > a sbuf ?) Currently I have: void nvlist_dump(const nvlist_t *nvl, int fd); void nvlist_fdump(const nvlist_t *nvl, FILE *out); Not depending on sbuf could encourage wider addoption, so I'd prefer not to do that. Converting it to JSON as simple string might be good idea. > >So, to add elements to the list one can use either nvlist_add_() > >or nvlist_move_() functions. To get the values one also have two > >choices: nvlist_get_() and nvlist_take_() - the former just > >returns the value (but the value is still part of the nvlist) and the > >latter removes associated nvpair from nvlist and returns its value. >=20 > Do consider having these functions in a variant with a default > argument, so that people don't have to wrap each optioanl n/v pair > in an if(). Using default value to report a problem is too error-prone for my taste and not intuitive. I was considering it (I even have earlier nv implementation in src/sbin/hastd/nv.c that does that), but I decided against it. For some types it is trivial, ie. if nvlist_get_string() returns NULL we know there is a problem, but if nvlist_get_int32() returns 0 it is hard to say if it is a bug or not. In Solaris/IllumOS there is additional set of functions that return an error and use additional function argument for the result, for example we could have: int32_t nvlist_get_int32(const nvlist_t *nvl, const char *name); int nvlist_cget_int32(const nvlist_t *nvl, const char *name, int32_t *valu= ep); I was considering this as well, but there is a lot of functions as it is now and I'm not sure I want to add ~100 more (nvlist_cget_() and nvlist_ctake_() for every type multiplied by format string version multiplied by va_list version). > One of the things you will need in the kernel is to check that sets > of nv's contain what they should, before continuing processing. >=20 > One particular thing to detect is extra, unwanted, elements, > which at least represent a programmer mistake and which may become > a security risk down the road, if they suddenly gets used. >=20 > I would do this globally, by insisting that (kernel-)code cleans > the nv lists it is passed, and then in syscall-return check that all > nv-lists are in fact empty. >=20 > Leaving it to programmers is more risky, most people will not > grasp the need to be that careful for the long term, but in > that case you will need functions like: >=20 > int nvlist_accept_only(nvlist_t *, const char *, ...); >=20 > which complains if there are any "strange" nv pairs. > (see above for arrays, it should support that.) >=20 > The complementary function, is also needed: >=20 > int nvlist_has_all_of(nvlist_t *, const char *, ...); That's interesting idea. I do that in my current consumers of this API by walking nvlist_t in a loop and checking for extra nvpairs. Having more reusable mechanism for that would be nice. What you propose is not enough, as we need three informations about nvpair (in most cases): 1. name 2. type 3. required/optional So we could allow to create static const arrays with nvlist specification that we could compare against, eg. struct nvpair_spec { const char *name; int type; bool required; }; #define NVPAIR_SPEC_SENTINEL { NULL, NV_TYPE_NONE, false } static const struct nvpair_spec geom_provider_spec[] =3D { { "name", NV_TYPE_STRING, true }, { "mediasize", NV_TYPE_UINT64, true }, { "sectorsize", NV_TYPE_UINT64, true }, { "stripesize", NV_TYPE_UINT64, false }, NVPAIR_SPEC_SENTINEL }; /* nvlist_check() could take buffer for detailed error message */ if (nvlist_check(nvl, geom_provider_spec) < 0) err(1, "nvlist doesn't match specification"); > Finally, this kind of complex-data API causes errors which errno > does not have the expressive power to communicate: >=20 > "foo" not allowed when "bar" also specified >=20 > Some years ago I proposed an API extension to allow the kernel/libs > to return "detailed error strings" and I think this is a better > idea than trying to shoe-horn errorstrings into individual API call. >=20 > Something like: >=20 > int error_buffer(char *, size_t len) >=20 > Registers a char[] where kernel/libs can dump error strings. >=20 > Thereafter, whenever a library or kernel call fail, you *MAY* have > additional details in that bufffer: >=20 > static char myerrbuf[1024]; > [...] > assert(0 =3D=3D error_buffer(myerrbuf, sizeof myerrbuf)); > strcpy(myerrbuf, "(No details available)"; > [...] > nvl =3D nvlist_recv(sock); > if (nvl =3D=3D NULL) > err(1, "nvlist_recv() failed: %s", myerrbuf); > [...] >=20 > This is just to show the principle, it should be thought through, > for instance maybe strerror() should know about and report the > buffer contents, to get instant benefit without code changes ? Fine idea, but doesn't seem to directly related to libnv. I can imagine implementing it in libnv internally, so that we not only keep internal error number, but also internal error message that can be obtained with nvlist_errmsg(). If nvl=3DNULL is passed it will just return "No memory" or similar. It would be especially useful for the nvlist_check() above, but also to tell which nvlist_add_() exactly failed, etc. I like it. There is only a problem with translating those messages, which we keep avoiding. > And now for a really nasty question: >=20 > Have you considered marshalling the n/v lists as JSON strings ? >=20 > It would be pretty dang efficient just to stuff it into an sbuf > and it would make transport into the kernel a breeze. >=20 > Nobody says the kernel has to support anything but that exact > JSON subset generated by your library functions, so parsing it > it not too bad. No, I haven't considered it, but currently with: void *nvlist_pack(const nvlist_t *nvl, size_t *sizep); nvlist_t *nvlist_unpack(const void *buf, size_t size); you can easly pack it to a binary blob and pass to the kernel. Thanks for your input. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com --PmA2V3Z32TCmWXqI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.20 (FreeBSD) iEYEARECAAYFAlHXJBcACgkQForvXbEpPzTSrwCg+GegJJpkjgGsQmxO5d8OhD2E uoUAoMCny23u1qwbDbynXG8nJqDKAIQh =1saQ -----END PGP SIGNATURE----- --PmA2V3Z32TCmWXqI--