FreeBSD Mail Archives

Date:      Thu, 31 Jul 2014 11:55:45 -0400
From:      "Garance A Drosehn" <drosih@rpi.edu>
To:        "Zaro Korchev" <zkorchev@mail.bg>
Cc:        Jonathan Anderson <jonathan.robert.anderson@gmail.com>, Eitan Adler <lists@eitanadler.com>, Alfred Perlstein <bright@mu.org>, David Chisnall <theraven@theravensnest.org>, freebsd-hackers@freebsd.org
Subject:   Re: [GSoC] Machine readable output from userland utilities
Message-ID:  <1F6CE510-95F2-46C8-AB0E-D5987CF13760@rpi.edu>
In-Reply-To: <4C29B30D-6833-4F0E-B071-C8EA215C0A17@mail.bg>
References:  <0FCB749A-67F7-4C2F-AAC1-32D0BD67B502@theravensnest.org> <4C29B30D-6833-4F0E-B071-C8EA215C0A17@mail.bg>

Apologies for quoting so much, but I do have something to add way
down at the end of this, and I expect few people remember the
original message.  I see 'soc-status@freebsd.org' in one of the
messages quoted here.  Should I be sending this to that list too?

Way back on 3 Jun 2014, Zaro Korchev wrote:
> Hi everybody
>
> I see there are several different ideas about how the output
> format should be specified.
>
> I first started using an option named -O with the idea that
> this can be changed when the best variant is decided.
>
>
> There is the idea with the environment variable that we
> discussed with Eitan:
>
> On 29 May 2014 at 18:31, Eitan Adler wrote :
>> On 29 May 2014 05:12, Zaro Korchev <zkorchev@mail.bg> wrote:
>>> I thought about whether it is better to use an option or environment
>>> variable. I did it with option because it is easier to switch an
>>> option on/off. It appears that the flag -O is free in almost all
>>> tools. I have no problem making it use an environment variable.
>>
>> My concern is that future standards may require this option (or at
>> least, would be precluded from using it).  In addition, it may
>> conflict with non-base utilities, such as coreutils ones.
>
> ----
>
> There is the pipelining idea of Jonathan:
>
> On 23 May 2014 at 16:27, Jonathan Anderson wrote :
>> Imagine:
>>
>> $ ifconfig | filterBy "ether" " 3c:07:.*" | sortBy "ether" | output my_ifconfig.format   # or "json" or "xml" or ...
>>
>> A pipeline of little tools, each doing one thing well: how much more
>> unix can you get? Currently, every command-line tool has to do two or
>> three things:
>> 1. its primary job,
>> 2. output some arbitrary text format (that you're never allowed to
>> change because other tools scrape it) and
>> 3. (optionally) parse arbitrary text formats generated by users or
>> some other tool.
>>
>> Task 2 is annoying: in order to usefully query command-line tools, I
>> have to write a parser. The tool has binary data, I want binary data,
>> but we have to go through a dump/parse dance in order for me to get
>> the data. This is the approach (again, from Plan 9) that brings you
>> Linux sysfs. Perhaps David would now like to comment on his
>> cross-platform "how much battery do I have" experience. :)
>>
>> Task 3 isn't just annoying, however, it's risky. If every tool
>> implements its own string protocol parsing, we greatly increase the
>> risk of unnoticed bugs. Better to centralize as much string parsing as
>> possible into a single library, which can be rigorously analyzed (and
>> optimized!).
>>
>> Imagine if geom didn't have to speak XML natively, but rather used a
>> supported-everywhere-in-base data structure that users could convert
>> into XML if they need it. Desktop applications are going to start
>> requiring structured data passing via kdbus-like interfaces (currently
>> based on GLib's GVariant), so we might as well have a structured
>> representation that we like and are able to provide ABI support for
>> (and, in the kdbus case, can possibly be converted to/from GVariant as
>> required).
>
> ----
>
> There is the long option idea of David:
>
>> From : David Chisnall <theraven@theravensnest.org>
>> Date : 2 June 2014 16:31:11
>> To : Zaro Korchev <zkorchev@mail.bg>
>> Cc : soc-status@freebsd.org
>>
>> On 2 Jun 2014, at 12:43, Zaro Korchev <zkorchev@mail.bg> wrote:
>>
>>> At the moment both ls and vmstat are told to output JSON by specifying
>>> the -O option. However as I discussed with my mentor, this will be
>>> changed. The idea is to use an environment variable instead of
>>> the -O flag.
>>
>> I don't like the idea of using an environment variable, because this is
>> something that you might want to control on a per-command basis within
>> a pipeline.  Especially with respect to incremental adoption, if you
>> have some commands that will emit their default format, which is sent
>> to sed / awk / whatever, and some that will emit json natively, you
>> don't want to suddenly have the output format from the legacy tools
>> change once they gain machine-readable output support.
>>
>> One *very* important thing to do is standardise the command-line flag
>> that is used to specify the output format.  This may involve also
>> converting some of the tools to use getopt_long if they don't already
>> (lots of tools already use most single-digit options, so there's no
>> possibility to define a single-letter flag that will be useable on
>> all tools).
>>
>>> I understand your concerns about multi-threading. The idea is to
>>> have functions that serialize the object in an allocated buffer as
>>> it is constructed. Here is a more detailed example of what I mean:
>>
>> It would be better to have some stream output API as the default.  If
>> one back end only supports writing to buffers, then you can add an
>> extra alloc / write / free sequence to hide it, but it would be good
>> if the interface understands writing directly to file descriptors.
>> If the back end natively supports streaming, then you don't need
>> to buffer the output.
>
>
> As you have more experience I believe you can decide which is the best.
>
> I like the pipelining and the long option idea the most. At the moment
> I'm working on porting more tools to use libsol so this decision is
> not urgent. I can change how the format is specified easily.
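
David's two points above, a standard long option for selecting the
output format and an output interface that writes straight to a
stream, could look roughly like this in C.  This is only a sketch
under assumptions, not libsol's actual API: the option name
"--format", the format names "text" and "json", and the functions
parse_format(), parse_args() and emit_pair() are all hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formats; FMT_TEXT stands for the tool's existing output. */
enum out_format { FMT_TEXT, FMT_JSON };

/* Map a --format argument to an enum value; return -1 for unknown names. */
int
parse_format(const char *name, enum out_format *fmt)
{
	if (strcmp(name, "text") == 0)
		*fmt = FMT_TEXT;
	else if (strcmp(name, "json") == 0)
		*fmt = FMT_JSON;
	else
		return (-1);
	return (0);
}

/*
 * Parse the hypothetical long-only option --format=NAME with
 * getopt_long(3).  Returns 0 on success, -1 on an unknown option
 * or an unknown format name.
 */
int
parse_args(int argc, char *argv[], enum out_format *fmt)
{
	static const struct option longopts[] = {
		{ "format", required_argument, NULL, 'F' },
		{ NULL, 0, NULL, 0 }
	};
	int ch;

	/* Empty short-option string: only the long form is accepted. */
	while ((ch = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
		if (ch != 'F' || parse_format(optarg, fmt) != 0)
			return (-1);
	}
	return (0);
}

/*
 * Write one key/value record directly to a stream, with no
 * intermediate buffer.  The key is assumed not to need JSON escaping.
 */
void
emit_pair(FILE *out, enum out_format fmt, const char *key, long value)
{
	assert(out != NULL);
	if (fmt == FMT_JSON)
		fprintf(out, "{\"%s\": %ld}\n", key, value);
	else
		fprintf(out, "%s: %ld\n", key, value);
}
```

A tool's main() would call parse_args() once and then pass stdout to
emit_pair().  A caller that wants the result in memory rather than on
a descriptor can pass a stream from open_memstream(3) instead.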

Last night it occurred to me that another way to do this would be to have
a second version of each command.  So "wc" and "mro-wc", "uptime" and
"mro-uptime", etc.  This might seem like some horrible bloat, but you
could have the two commands hard-linked to each other, and the behavior
of the executable would depend upon what name it was executed as.
The amount of disk space used would be pretty close to the same used
by any of the other methods.
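
The name-based dispatch described above needs only a small check at
startup.  A minimal sketch, assuming the "mro-" prefix from this
proposal and a hypothetical helper invoked_as_mro() that inspects
argv[0] (on FreeBSD, getprogname(3) could supply the name instead):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical prefix that marks the machine-readable-output variant. */
#define MRO_PREFIX "mro-"

/*
 * Return true when the program was started under its "mro-" name,
 * e.g. /usr/bin/mro-wc hard-linked to /usr/bin/wc.  Only the last
 * path component of argv[0] is examined.
 */
bool
invoked_as_mro(const char *argv0)
{
	const char *base;

	assert(argv0 != NULL);
	base = strrchr(argv0, '/');
	base = (base != NULL) ? base + 1 : argv0;
	return (strncmp(base, MRO_PREFIX, strlen(MRO_PREFIX)) == 0);
}
```

Since only the last path component is examined, invoking /bin/mro/df
would not by itself select the MRO behavior.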

Advantages of this:
1. The "mro" (machine-readable-output) version can start fresh with
    whatever command-line arguments it wants to support.  You can
    make *all* "mro" commands use the same parameters and environment
    variables for formatting options.
2. It is trivial for a human to see if there is an "mro" version of
    a command that they're going to use in some script.  They don't
    have to go reading through the 20-screen man page for 'ps' to
    see if it can be made to generate MRO.
3. It is trivial for some cross-platform package to use autoconf
    to find out if a given OS has the "mro" version of whatever
    commands they wanted.

If it seems pretty ugly to clutter /bin and /usr/bin with two versions
of many commands, the MRO versions could also be moved into separate
directories from the original commands.  /bin/mro/df, /bin/mro/ps, etc.
If someone *wants* to type in MRO commands at the command line, they
can add that to their PATH.  But I expect most usage of these commands
will be in scripts, and scripts can set their own paths.

(And yes, I think I would still want separate command names with a common
prefix even if all the MRO commands are in their own directory.  I don't
want someone to get the MRO version of a command by mistake if they put
the MRO directory before the standard directory.)

-- 
Garance Alistair Drosehn                =     drosih@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1F6CE510-95F2-46C8-AB0E-D5987CF13760>
