From: Alexander Leidinger
To: Poul-Henning Kamp
Cc: Wilko Bulte, src-committers@FreeBSD.org, cvs-all@FreeBSD.org, cvs-src@FreeBSD.org
Date: Wed, 17 Oct 2007 16:04:40 +0200
Subject: Re: cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults rc.conf src/etc/rc.d Makefile sensorsd src/lib/libc/gen sysctl.3 src/sbin/sysctl sysctl.8 sysctl.c src/share/man/man5 rc.conf.5 src/share/man/man9 Makefile sensor_attach.9 src/sys/conf files ...
Message-ID: <20071017160440.b6fd00xs6cog888g@webmail.leidinger.net>
In-Reply-To: <13868.1192555960@critter.freebsd.dk>

Quoting Poul-Henning Kamp (from Tue, 16 Oct 2007 17:32:40 +0000):

> In message <20071016183311.lu97hbwzggsk4ow4@webmail.leidinger.net>,
> Alexander Leidinger writes:
>
>>> Yes, that is the abstract argument, but the very same argument can
>>> be made for every other single kind of entity which consumes or
>>> produces bytes, from fingerprint readers to 9-track tape stations.
>>
>> Why do we have a common linked list API? It's easy enough to do it
>> again and again and again... We have it because we don't want to do it
>> again and again... And with the sensors API we gain something similar.
>
> There is a very big difference between and sensors,
> in that is not an external API, but a convenience
> tool for code to maintain its own data internally, whereas sensors
> is an API for exporting data.

The idea behind it is the same: don't do by hand what can be done with less work through an API.

>> It adds meta-data which can be used in an automated way. This is done
>> with a consistent and documented API. Sure, we can do it with sysctls
>> by hand, but see above.
>
> What exactly do you mean when you say "used in an automated way" ?

You can write a probe for a monitoring system which looks at hw.sensors and, based on the data it sees there, can generate at least reports, without the probe having to be changed to handle sensors which weren't seen before.

> Can I run some magic program and tell it "alert me if something is
> wrong" or do I have to write a tedious configuration file to explain
> what "something is wrong" would look like to the program ?

It depends on the magic program and the sensor. As Constantine already explained and you surely know, there are stupid sensors and there are smart sensors.

>> It is not supposed to make the monitoring itself easier.
>> [...]
>> A human being still has to interprete the measurements. No doubts. But
>> with the framework you don't have to hunt down where to read the
>> sensor data, and how to name it. You can write a probe which takes
>> everything in the the sensors mib and let it produce names and values
>> for the probed things automatically.
>
> So the only problem sensors solves, is that it defines a single
> place in the sysctl tree, where you can find all sorts of non-random
> numbers, each of which comes with a piece of ascii text that isn't
> formatted in any consistent way ?
>
> I'd say, lets raise the bar several notches right here.
>
> How about we look at what is desirable from such a subsystem, and
> see what architecture that mandates ?

Hmmm... "desirable" is not the same as "useful" or "necessary". Let's try not to overengineer this. Note: I also think that as long as we don't prevent the framework from handling specific things which we think are desirable, it doesn't need to be able to handle all of them from day one.

> Here are some things to think about:
>
> * Input only or input & output ?
>
>   Would it make sense to be able to control the fans or power
>   to various subsystems while we are at it ?

Apart from what Constantine said:

Do you want to change the power of various subsystems? Isn't the system supposed to do that itself in a sensible and automatic way? I would say it depends. Most of the time I don't want to fight with something like this in a production system (and AFAIK Intel tries to do more and more themselves regarding power control in their CPUs, as they noticed that "messing around" with this is often ... suboptimal).

> * It should be possible to implement a sensor in userland, so that
>   interface to external sensors is possible without forcing the
>   code into the kernel. Think: Maxim/Dallas 1-Wire temperature
>   sensors and similar.

I see hw.sensors as an interface to get sensor data which is under the control of the kernel into userland. I don't think of it as something where "sensor" includes status info from userland applications. I fail to see where it is beneficial to put data into it which isn't measured by something in the kernel (e.g. the fill level of a database or any other value a userland program produces). Could you please explain why it should be possible to feed such userland data into the kernel?

> * Metadata information in machine redable format:
>   - recommended, min and max poll rate

Typically the monitoring programs I know poll at a fixed rate. The sensor framework already caches data, and it is up to the code which puts the value into place to decide whether the sensor has to be queried again or not.

>   - Nominal value, quantization step and alarm limit(s)

Alarm limits are normally set in the monitoring applications I know, and depend upon various factors.

>   - alarm transgression severity for system integrity

How can a sensor know this?
If it is about something which is in active use, a violation of a specific value may be critical for the entire system, but if it is just present in the system and not used at all, crossing the same value may not be critical. In general this is a policy decision which cannot be made by the person writing the handover of this data to the sensor framework.

>   - sensorfailure severity for system integrity

Ditto.

>   - physical location of measured quantity

Do you know monitoring programs which allow probes to submit this information to the monitoring program? If not, why should the framework keep this information in the kernel when a file on the system satisfies the same requirement?

>   ...

So far the things you mentioned are better kept in userland than in the kernel. A simple file with a specific syntax would be enough to let a probe automatically match a specific sensor with this metadata and transfer it to the monitoring application (if the monitoring application is able to accept this kind of data).

> * Event support ?
>   - enumeration, arrival and departure of sensors
>   - alarm transgressions
>   - sensor failure
>   ...
>
> * Interface and integration with IPMI, ACPI and similar.
>   Do any of these have a metadata format we can use ?

Constantine answered those already.

> and probably a lot of other stuff I didn't think of right now...
>
>> Now... how much hardware out there supports IPMI, or
>> better... how much in production use doesn't use IPMI?
>
> But don't you think it would be better to have a subsystem that
> made it possible to use IPMI and ACPI, than to just say "Naa,
> that sucks, it must do, because we don't support it" ?

I haven't said it doesn't support it. I told you already that Nate didn't identify anything in the sensors framework which prevents using ACPI with it, and Constantine already showed how it integrates with IPMI.

>>> Let me get this straight, you're telling me:
>>>
>>>     "I'm worried about this code running as root, so I'm putting
>>>     it in the kernel instead."
>>
>> You missed the point.
>
> No, I most certainly did not.
>
> By defining the sensor API (on top of sysctl) at the kernel/userland
> boundary you have decided that all sensor implementations must live
> in the kernel, there is no room in your architecture for sensors
> that live in userland.

No, I didn't. I said (even last time, when you first told us that you don't like the sensors framework) that the sensors framework is supposed to export data which lives in the kernel to userland. I never said the sensors framework is supposed to be the one and only way of getting status data from a running system. Userland status belongs to userland programs. It would be nice to have a userland framework which collects userland status, so that you don't have to run around, and it may not be a bad idea for this userland framework to also collect the data from the sensors framework (e.g. one plugin to get all hw.sensors data, instead of multiple plugins to get the various states from the various places of a non-uniform status export from kernel to userland). But here we are talking about exporting data from the kernel to userland via an API, not about a userland framework to collect status information (some people may say we already have this with SNMP).

> Effectively, you have elevated all sensor implementations to root++
> priviledge, even if they don't need any priviledge at all.

No, I haven't, see above.

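To illustrate the "one plugin to get all hw.sensors data" idea, here is a minimal sketch in Python. It is not part of the sensors framework. It assumes the readings arrive as "name: value" text lines, and the metadata file syntax, the function names and all sensor names and values are invented for the example.

```python
# Sketch of a generic probe. It knows nothing about individual sensors: it
# reports whatever appears under one sysctl subtree and attaches optional
# metadata kept in a plain userland file.
# ASSUMPTIONS: readings arrive as "name: value" text lines; the metadata
# syntax ("sensor key=value ...") and all names/values below are invented.

PREFIX = "hw.sensors."

SAMPLE_READINGS = """\
hw.sensors.example0.temp0: 45.00 degC
hw.sensors.example0.fan0: 2400 RPM
kern.ostype: FreeBSD
"""

SAMPLE_METADATA = """\
# sensor          metadata (hypothetical syntax)
example0.temp0    location=cpu-socket max=70
"""


def probe(text, prefix=PREFIX):
    """Return {sensor: raw value} for every line below prefix."""
    readings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and name.startswith(prefix):
            readings[name[len(prefix):]] = value.strip()
    return readings


def load_metadata(text):
    """Return {sensor: {key: value}} parsed from the metadata text."""
    metadata = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        sensor, *pairs = line.split()
        metadata[sensor] = dict(p.split("=", 1) for p in pairs if "=" in p)
    return metadata


def report(readings, metadata):
    """Return one report line per sensor, sorted by sensor name."""
    lines = []
    for sensor in sorted(readings):
        extra = metadata.get(sensor, {})
        suffix = "".join(f" [{k}={v}]" for k, v in sorted(extra.items()))
        lines.append(f"{sensor}: {readings[sensor]}{suffix}")
    return lines


if __name__ == "__main__":
    for line in report(probe(SAMPLE_READINGS), load_metadata(SAMPLE_METADATA)):
        print(line)
```

The probe never names a specific sensor, so a sensor it has not seen before simply shows up as one more report line. Policy data such as location or limits stays in the userland file and is only attached when an entry for that sensor exists.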
> I don't care much about who wrote the code or how trustworthy they
> are, that's a problem that can be fixed along the way.
>
> But I do care about taking away, by design, the choice of running
> at low priviledge from people who implement sensors.

I'm not taking this away.

>>> I repeat: The SoC interface is not the gateway to -current.
>>
>> It provides an idea in what people are interested in.
>
> Sure, lets list "Peace in the middle-east" on there, I'm sure people
> are interested :-)
>
> "People", whoever they are, are interested in anything that sounds
> fancy or flashy, but that doesn't mean that they can or will actually
> use it for anything if somebody produce it, and it certainly gives
> no guarantee that you will not shoot yourself in the foot along the
> way if you do so.

You said you don't like the idea of a unified way of exporting sensor data which lives in the kernel to userland. You didn't provide technical arguments against such an API (I'm not taking lm.4 into account ATM, as this was not your main objection). When the "People" voted for this project, they voted for this idea, which seems to work nicely in OpenBSD. You have failed so far to show that it doesn't work in FreeBSD, while we've seen several examples where we benefit from it. I don't say the implementation is free of bugs or cannot be improved, but you are not talking about code, you are talking about the idea. You are slapping our fellow committers (I don't count myself here) who voted in favour of this project in the face. You are doing the same to those people who didn't vote against this project.

>> And several
>> committers here in the thread also showed interest in this framework
>> (maybe not in the current implementation, but at least in the idea
>> behind it).
>
> Right, but if we didn't object, you had saddled us with this implementation,
> without any actual discussion about what exactly the idea behind it
> was and if that was the right idea for us.

I wouldn't have saddled us with the implementation. I might have saddled us with the API for the lifetime of one released branch... if nobody had improved it in the next 18 months (when HEAD is branched). I don't think your very negative-sounding sentence above is deserved. The people voting in the GSoC are supposed to reject ideas which they identify as outright bad. For the projects they vote for, they check whether it makes sense; if it doesn't, they don't vote for it. The big disconnect between FreeBSD and the SoC which you are trying to portray here does not exist. I agree that not everything produced in the SoC deserves its way into CVS. But what has been rejected so far is stuff which proved during the SoC to be unusable: either because the architecture doesn't fit (like the pluggable disk schedulers, where you introduced GEOM and the project wasn't usable anymore), or because the student missed the goal, or because the goal was achieved but the implementation was a cruel one, or because we noticed that the project needs a complete rearchitecture since the initial design doesn't fit.

>> Just because you do not see how such a framework can be
>> useful to you (so far I have the impression from your mails, that you
>> object to the idea of this framework),
>
> I *can* see why and how such a framework can be useful, that's why
> I'm objecting to this half-baked attempt at it.

Now you sound different than before. Before, you said you don't like the idea of such a framework at all. Some of the points you bring to the table above look overengineered to me (I pointed them out).
And so far I wasn't able to identify a point there which the sensors framework prevents us from implementing. I also want to point out that so far the goal was to do what is needed and improve the architecture/implementation in an evolutionary way, instead of trying to produce a big thing with bells, whistles, trumpets and whatever (in Germany we call this an "eierlegende Wollmilchsau"). We all know that overengineered projects typically fail, and that the evolutionary behavior in open source software produces very good results (e.g., SMP in 4.x was very good for its time, and now that we have "more SMP" and raised expectations we have morphed into something better). Some of the things you want to have for sensors look nice. Some of this nice stuff doesn't belong in the kernel. And I haven't seen anything which cannot be done with the sensors framework in a next step. So far the sensors framework allows us to provide features we don't have in RELENG_x.

>>> Ten years ago when we didn't have P4 and the _extensive_ infrastructure
>>> for making it easy for people to work out of the tree, we had to do
>>> stuff like that, but there is no excuse for it today.
>>
>> Nobody is perfect. There will always be some bugs when something is
>> committed to -current.
>
> Bugs, yes, and we have means to deal with them.
>
> But we should try much harder to avoid half-baked concepts and wrong
> architecture, because that is 10 times harder to fix than a plain
> bug is.

Feel free to point out wrong architecture. Regarding the half-baked part... so far Constantine has already shown what is possible from the list you came up with. I still think you are fighting against the framework based mostly upon feelings, and not upon technical facts.

>> You don't talk about obvious problems here.
>> There's no destabilized system, there are no panics.
>> You talk about
>> not using an underdocumented API and not using a generic framework for
>> creating tasks [...]
>
> Yes, it does appear to me that we are not on the same level of
> abstraction.
>
> I am indeed not talking about how many compiler warnings or style(9)
> infractions this code has.

gcc 4 introduced new warnings. Without compiling this with -Werror on e.g. RELENG_6 with gcc 3, I don't think you should talk about compiler warnings at the moment, as the kernel is on a similar level (in case we don't compile with -Werror anymore). Regarding style(9), Constantine made several commits in p4 during the SoC.

> I'm talking about:
>   - if it actually solves a problem for us that we have.

Yes.

>   - if if should solve more problems than it does right now.

As you have seen, it already does more than you think it is able to do.

>   - if it creates even more problems down the road.

Have you identified some problems?

> I'm talking about architecture, you're talking about code.

Wrong. I never said the code cannot be improved or that it is free of flaws. I'm talking about your behavior of rejecting the idea (not even the architecture, but the idea) of the sensors framework without accepting that other people see a benefit in such a framework, and calling it crap without coming up with technical reasons. I'm also talking about the idea of the framework and what it is supposed to do, while you say that you don't like the idea. I may also have talked about parts of the code, but it is wrong to say that I focus on the code. And I'm also the wrong person to talk to about the code; Constantine is the person to talk with when it comes to the code.

Bye,
Alexander.

--
http://www.Leidinger.net  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org    netchild @ FreeBSD.org  : PGP ID = 72077137
No one can feel as helpless as the owner of a sick goldfish.