FreeBSD Mail Archives

Date:      Wed, 17 Oct 2007 16:04:40 +0200
From:      Alexander Leidinger <netchild@FreeBSD.org>
To:        Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc:        Wilko Bulte <wb@freebie.xs4all.nl>, src-committers@FreeBSD.org, cvs-all@FreeBSD.org, cvs-src@FreeBSD.org
Subject:   Re: cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults rc.conf src/etc/rc.d Makefile sensorsd src/lib/libc/gen sysctl.3 src/sbin/sysctl sysctl.8 sysctl.c src/share/man/man5 rc.conf.5 src/share/man/man9 Makefile sensor_attach.9 src/sys/conf files ...
Message-ID:  <20071017160440.b6fd00xs6cog888g@webmail.leidinger.net>
In-Reply-To: <13868.1192555960@critter.freebsd.dk>
References:  <13868.1192555960@critter.freebsd.dk>

Quoting Poul-Henning Kamp <phk@phk.freebsd.dk> (from Tue, 16 Oct 2007
17:32:40 +0000):

> In message <20071016183311.lu97hbwzggsk4ow4@webmail.leidinger.net>,
> Alexander Leidinger writes:
>
>>> Yes, that is the abstract argument, but the very same argument can
>>> be made for every other single kind of entity which consumes or
>>> produces bytes, from fingerprint readers to 9-track tape stations.
>>
>> Why do we have a common linked list API? It's easy enough to do it
>> again and again and again... We have it because we don't want to do it
>> again and again... And with the sensors API we gain something similar.
>
> There is a very big difference between <sys/queue.h> and sensors,
> in that <sys/queue.h> is not an external API, but a convenience
> tool for code to maintain its own data internally, whereas sensors
> is an API for exporting data.

The idea behind both is the same: don't do by hand what can be done
with less work through an API.

>> It adds meta-data which can be used in an automated way. This is done
>> with a consistent and documented API. Sure, we can do it with sysctls
>> by hand, but see above.
>
> What exactly do you mean when you say "used in an automated way" ?

You can write a probe for a monitoring system which looks at
hw.sensors and, based upon the data it sees there, generates at least
<name> <type> <value> reports, without the probe having to be changed
to handle sensors which weren't seen before.
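
A minimal sketch of such a probe, in Python. The line format is an
assumption and not taken from this thread: it presumes sysctl prints
one sensor per line as "hw.sensors.<device>.<type><index>" followed by
"=" or ":" and the value. The function name parse_sensor_lines is
hypothetical.

```python
import re

# ASSUMED line shape (not confirmed by the thread):
#   hw.sensors.<device>.<type><index>=<value> [unit ...]
#   hw.sensors.<device>.<type><index>: <value> [unit ...]
LINE_RE = re.compile(
    r"^hw\.sensors\.(?P<device>[^.]+)\.(?P<type>[a-z_]+?)(?P<index>\d+)"
    r"\s*[=:]\s*(?P<value>\S+)"
)


def parse_sensor_lines(lines):
    """Turn sysctl-style text lines into (name, type, value) tuples.

    Lines that do not look like hw.sensors entries are skipped, so the
    probe needs no change when a previously unseen sensor shows up.
    """
    reports = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        name = "{}.{}{}".format(m["device"], m["type"], m["index"])
        reports.append((name, m["type"], m["value"]))
    return reports
```

In practice the lines would come from the output of "sysctl hw.sensors";
the sketch only handles the text so that it stays independent of how
the data is fetched.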

> Can I run some magic program and tell it "alert me if something is
> wrong" or do I have to write a tedious configuration file to explain
> what "something is wrong" would look like to the program ?

It depends on the magic program and the sensor. As Constantine already
explained and you surely know, there are stupid sensors and there are
smart sensors.

>> It is not supposed to make the monitoring itself easier.
>> [...]
>> A human being still has to interpret the measurements. No doubts. But
>> with the framework you don't have to hunt down where to read the
>> sensor data, and how to name it. You can write a probe which takes
>> everything in the sensors mib and let it produce names and values
>> for the probed things automatically.
>
> So the only problem sensors solves, is that it defines a single
> place in the sysctl tree, where you can find all sorts of non-random
> numbers, each of which comes with a piece of ascii text that isn't
> formatted in any consistent way ?
>
> I'd say, lets raise the bar several notches right here.
>
> How about we look at what is desirable from such a subsystem, and
> see what architecture that mandates ?

Hmmm... "desirable" is not the same as "useful" or "necessary". Let's
try not to overengineer this. Note: I also think that as long as we
don't prevent the framework from being able to handle specific things
which we think are desirable, the framework doesn't need to handle
all of them from day one.

> Here are some things to think about:
>
> * Input only or input & output ?
>
>   Would it make sense to be able to control the fans or power
>   to various subsystems while we are at it ?

Apart from what Constantine said:
Do you want to change the power of various subsystems? Isn't the
system supposed to do it itself in a sensible and automatic way? I
would say it depends. Most of the time I don't want to fight with
something like this in a production system (and AFAIK Intel tries to
do more and more themselves regarding power control in their CPUs, as
they noticed that "messing around" with this is often ...
suboptimal).

> * It should be possible to implement a sensor in userland, so that
>   interface to external sensors is possible without forcing the
>   code into the kernel.  Think: Maxim/Dallas 1-Wire temperature
>   sensors and similar.

I see hw.sensors as an interface to get sensor data which is within
control of the kernel into the userland. I don't think of it as
something where "sensor" includes status info from userland
applications. I fail to see where it is beneficial to put data into
the kernel which isn't measured by something in the kernel (e.g. the
fill level of a database or any other value a userland program
produces). Could you please explain why it should be possible to feed
such userland data into the kernel?

> * Metadata information in machine readable format:
>     - recommended, min and max poll rate

Typically the monitoring programs I know poll at a fixed rate. The
sensor framework already caches data, and it is up to the code which
puts the value into place to decide if the sensor has to be queried
again or not.
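
To illustrate the caching idea only: the following Python sketch is
not the framework's code. The class name CachedSensor, the max_age
parameter and the injectable clock are all hypothetical. It shows the
producer side deciding whether the hardware is read again, regardless
of how often a monitoring program polls.

```python
import time


class CachedSensor:
    """Cache one sensor reading; re-read the hardware only when stale."""

    def __init__(self, read_hw, max_age, clock=time.monotonic):
        self._read_hw = read_hw    # callable doing the (slow) hardware read
        self._max_age = max_age    # seconds a cached value stays valid
        self._clock = clock        # injectable so the logic can be tested
        self._value = None
        self._stamp = None

    def value(self):
        now = self._clock()
        # The producer, not the poller, decides whether to query again.
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._value = self._read_hw()
            self._stamp = now
        return self._value
```

A poller calling value() every second against a sensor created with
max_age=5.0 would trigger a hardware read only once every five
seconds.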

>     - Nominal value, quantization step and alarm limit(s)

Alarm limits are normally set in the monitoring applications I know,
and depend upon various factors.

>     - alarm transgression severity for system integrity

How can a sensor know this? If it is about something which is in
active use, a violation of a specific value may be critical for the
entire system, but if it is just present in the system and not used at
all, crossing the same value may not be critical. In general this is a
policy decision which cannot be made by the person writing the
handover of this data to the sensor framework.

>     - sensorfailure severity for system integrity

Ditto.

>     - physical location of measured quantity

Do you know monitoring programs which allow probes to submit this
information to the monitoring program? If not, why should the
framework keep this information in the kernel when a file on the
system satisfies the same requirement?

>     ...

So far the things you mentioned are better suited to be kept in the
userland instead of in the kernel. A simple file with a specific
syntax would be enough to let a probe automatically match a specific
sensor with this metadata and transfer it to the monitoring
application (if the monitoring application is suited to accept this
kind of data).
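
As a sketch of what such a userland file and the matching could look
like, in Python. The file syntax is invented for this example (one
sensor name per line followed by key=value pairs, "#" starting a
comment), and the names parse_metadata and annotate are hypothetical.

```python
def parse_metadata(text):
    """Parse lines of the ASSUMED form '<sensor name> key=value ...'."""
    meta = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, *pairs = line.split()
        entry = meta.setdefault(name, {})
        for pair in pairs:
            key, sep, value = pair.partition("=")
            if sep:                            # ignore tokens without '='
                entry[key] = value
    return meta


def annotate(reports, meta):
    """Attach metadata to (name, type, value) reports by sensor name.

    Sensors without an entry in the file get an empty dict, so unknown
    sensors are still reported.
    """
    return [(name, stype, value, meta.get(name, {}))
            for name, stype, value in reports]
```

Values containing whitespace are not supported by this simple syntax;
a real file format would have to settle that, along with which keys
(location, limits, severity) a monitoring application can accept.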

> * Event support ?
>     - enumeration, arrival and departure of sensors
>     - alarm transgressions
>     - sensor failure
>     ...
>
> * Interface and integration with IPMI, ACPI and similar.
>     Do any of these have a metadata format we can use ?

Constantine answered those already.

> and probably a lot of other stuff I didn't think of right now...
>
>> Now... how much hardware out there supports IPMI, or
>> better... how much in production use doesn't use IPMI?
>
> But don't you think it would be better to have a subsystem that
> made it possible to use IPMI and ACPI, than to just say "Naa,
> that sucks, it must do, because we don't support it" ?

I haven't said it doesn't support it. I told you already that Nate
didn't identify anything in the sensors framework which prevents
using ACPI with the sensors framework, and Constantine already showed
how it integrates with IPMI.

>>> Let me get this straight, you're telling me:
>>>
>>>     "I'm worried about this code running as root, so I'm putting
>>>     it in the kernel instead."
>>
>> You missed the point.
>
> No, I most certainly did not.
>
> By defining the sensor API (on top of sysctl) at the kernel/userland
> boundary you have decided that all sensor implementations must live
> in the kernel, there is no room in your architecture for sensors
> that live in userland.

No, I didn't. I said (even last time, when you first told us that you
don't like the sensors framework) that the sensors framework is
supposed to export data which lives in the kernel to the userland. I
never said the sensors framework is supposed to be the one and only
way of getting status data from a running system. Userland status
belongs to userland programs. It would be nice to have a userland
framework which collects userland status, so that you don't have to
run around, and it may not be a bad idea for this userland framework
to also collect the data from the sensors framework (e.g. one plugin
to get all hw.sensors data, instead of multiple plugins to get all the
various states from the various places of a non-uniform status export
from kernel to userland). But here we talk about exporting data from
the kernel to the userland via an API, not about a userland framework
to collect status information (some people may say we already have
this with SNMP).

> Effectively, you have elevated all sensor implementations to root++
> privilege, even if they don't need any privilege at all.

No, I haven't, see above.

> I don't care much about who wrote the code or how trustworthy they
> are, that's a problem that can be fixed along the way.
>
> But I do care about taking away, by design, the choice of running
> at low privilege from people who implement sensors.

I'm not taking this away.

>>> I repeat: The SoC interface is not the gateway to -current.
>>
>> It provides an idea in what people are interested in.
>
> Sure, lets list "Peace in the middle-east" on there, I'm sure people
> are interested :-)
>
> "People", whoever they are, are interested in anything that sounds
> fancy or flashy, but that doesn't mean that they can or will actually
> use it for anything if somebody produce it, and it certainly gives
> no guarantee that you will not shoot yourself in the foot along the
> way if you do so.

You said you don't like the idea of a unified way of exporting sensor
data which lives in the kernel to the userland. You didn't provide
technical arguments against such an API (I'm not taking lm.4 into
account ATM, as this was not your main objection). When the "People"
voted for this project, they voted for this idea, which seems to work
nicely in OpenBSD. You have failed so far to show that it doesn't work
in FreeBSD, while we've seen several examples where we get benefits
from it. I don't say the implementation is free of bugs or cannot be
improved, but you are not talking about code, you are talking about
the idea. You are slapping our fellow committers who voted in favour
of this project (I don't count myself here) in the face. You are
doing the same with those people who didn't vote against this
project.

>> And several
>> committers here in the thread also showed interest in this framework
>> (maybe not in the current implementation, but at least in the idea
>> behind it).
>
> Right, but if we didn't object, you had saddled us with this implementation,
> without any actual discussion about what exactly the idea behind it
> was and if that was the right idea for us.

I wouldn't have saddled us with the implementation. I would have maybe
saddled us with the API for the lifetime of one released branch... if
nobody had improved it in the next 18 months (when HEAD is branched).
I don't think your very negative-sounding sentence above is deserved.
The people voting in the GSoC are supposed to reject ideas which they
identify as being outright bad. And for projects they vote for, they
check whether it makes sense. If it doesn't make sense, they don't
vote for a project. The big disconnect between FreeBSD and the SoC
which you are trying to portray here does not exist. I agree that not
everything which is produced in the SoC deserves its way into CVS.
But what got rejected so far is stuff which proved during the SoC to
be not usable: either because the architecture doesn't fit, like with
the pluggable disk schedulers when you introduced GEOM and the project
wasn't usable anymore, or because the student missed the goal, or when
the goal was achieved but the implementation was cruel, or when we
noticed that the project needs a complete rearchitecture because the
initial design doesn't fit.

>> Just because you do not see how such a framework can be
>> useful to you (so far I have the impression from your mails, that you
>> object to the idea of this framework),
>
> I *can* see why and how such a framework can be useful, that's why
> I'm objecting to this half-baked attempt at it.

Now you sound different than before. Before, you said you don't like
the idea of such a framework at all. Some of the points you bring to
the table above look overengineered to me (I pointed them out). And so
far I wasn't able to identify a point there which the sensors
framework prevents from being implemented. I also want to point out
that so far the goal was to do what is needed and improve the
architecture/implementation evolutionarily, instead of trying to
produce a big thing with bells, whistles, trumpets and whatever (in
Germany we call this "eierlegende Wollmilchsau"). We all know that
overengineered projects typically fail, and that the evolutionary
behavior in open source software produces very good results (e.g.,
SMP in 4.x was very good for its time, and now that we have "more
SMP" and raised expectations we morphed into something better). Some
of the things you want to have for sensors look nice. Some of this
nice stuff doesn't belong in the kernel. And I haven't seen anything
which cannot be done with the sensors framework in a next step. So
far the sensors framework allows us to provide features we don't have
in RELENG_x.

>>> Ten years ago when we didn't have P4 and the _extensive_ infrastructure
>>> for making it easy for people to work out of the tree, we had to do
>>> stuff like that, but there is no excuse for it today.
>>
>> Nobody is perfect. There will always be some bugs when something is
>> committed to -current.
>
> Bugs, yes, and we have means to deal with them.
>
> But we should try much harder to avoid half-baked concepts and wrong
> architecture, because that is 10 times harder to fix than a plain
> bug is.

Feel free to point out wrong architecture. Regarding the half-baked
part... so far Constantine already showed what is possible to do from
the list you came up with. I still think you are fighting against the
framework based mostly upon feelings, and not upon technical facts.

>> You don't talk about obvious problems here.
>> There's no destabilized system, there are no panics. You talk about
>> not using an underdocumented API and not using a generic framework for
>> creating tasks [...]
>
> Yes, it does appear to me that we are not on the same level of
> abstraction.
>
> I am indeed not talking about how many compiler warnings or style(9)
> infractions this code has.

gcc 4 introduced new warnings. Without compiling this with -Werror on
e.g. RELENG_6 with gcc 3, I don't think you should talk about compiler
warnings at the moment, as the kernel is on a similar level (in case
we don't compile with -Werror anymore). Regarding style(9), Constantine
made several commits in p4 during the SoC.

> I'm talking about:
>   - if it actually solves a problem for us that we have.

Yes.

>   - if it should solve more problems than it does right now.

As you have seen it already does more than you think it is able to do.

>   - if it creates even more problems down the road.

Have you identified some problems?

> I'm talking about architecture, you're talking about code.

Wrong. I never said the code cannot be improved or that it is free of
flaws. I'm talking about your behavior of rejecting the idea (not even
the architecture, but the idea) of the sensors framework without
accepting that other people see a benefit in such a framework, and
calling it crap without coming up with technical reasons. I'm also
talking about the idea of the framework and what it is supposed to do,
while you say that you don't like the idea. I may also have talked
about parts of the code, but it is wrong to say that I focus on the
code. And I'm also the wrong person to talk to about the code;
Constantine is the person to talk with if it is about the code.

Bye,
Alexander.

-- 
http://www.Leidinger.net  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org     netchild @ FreeBSD.org  : PGP ID = 72077137
No one can feel as helpless as the owner of a sick goldfish.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071017160440.b6fd00xs6cog888g>
