From owner-freebsd-stable@FreeBSD.ORG Mon Oct 22 15:38:24 2012
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id C8696FFD;
    Mon, 22 Oct 2012 15:38:24 +0000 (UTC)
    (envelope-from takeda@takeda.tk)
Received: from chinatsu.takeda.tk (mail.takeda.tk [74.0.89.210])
    by mx1.freebsd.org (Postfix) with ESMTP id 906388FC14;
    Mon, 22 Oct 2012 15:38:24 +0000 (UTC)
Received: from takeda-ws.lan (takeda-ws.lan [10.0.0.3]) (authenticated bits=0)
    by chinatsu.takeda.tk (8.14.5/8.14.5) with ESMTP id q9MFcHo4016455
    (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO);
    Mon, 22 Oct 2012 08:38:18 -0700 (PDT)
    (envelope-from takeda@takeda.tk)
Date: Mon, 22 Oct 2012 08:38:11 -0700
From: Derek Kulinski
X-Priority: 3 (Normal)
Message-ID: <35578786.20121022083811@takeda.tk>
To: Jeremy Chadwick
Subject: Re: Problem reading vitals from Gigabyte H77-DH3H
In-Reply-To: <20121022130348.GA28302@icarus.home.lan>
References: <20121022130348.GA28302@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-stable@FreeBSD.org, avg@FreeBSD.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Mon, 22 Oct 2012 15:38:24 -0000

Hello Jeremy,

Monday, October 22, 2012, 6:03:49 AM, you wrote:

> I'm not subscribed to the FreeBSD lists any longer, but I did come
> across this thread via the web:

> http://lists.freebsd.org/pipermail/freebsd-stable/2012-October/070169.html

> Either (or both) of you are free to bounce a copy of my Email here to
> the list if you feel it'd benefit others.

> I have a lot of familiarity with hardware monitoring chips and
> interfacing with them (as the author of ports/sysutils/bsdhwmon).
> The H/W monitoring chip on that Gigabyte motherboard is either **not**
> the same one, or it has resistors/pullups that differ from what the
> OpenBSD sensors framework code expects. That is quite evident from the
> below. There are also very likely labels that are wrong. I'll get to
> explaining how to fix that properly further down.

> Let me explain in detail one section at a time:

>> hw.sensors.it0.volt0: 1,42 VDC (VCORE_A)
>> hw.sensors.it0.volt1: 2,72 VDC (VCORE_B)

> The term "Vcore" refers to the CPU core voltage. This is on a
> per-physical-CPU basis. This software is assuming there are 2 physical
> CPUs (not cores, I'm talking about physical processors).

> VCORE_A may be correct (meaning 1.42V), however it depends on the CPU
> model. Derek did not disclose this so I cannot tell you if 1.42V is
> considered "correct" or not. Some models run at 1.2V, others 1.5V,
> others vary.

It is an i5-3470 3.2GHz quad core (the entire component list I used for
the build is here: http://pcpartpicker.com/p/koz3).

The CPU is not overclocked; I set "auto" for all settings of this kind
in the BIOS.

> VCORE_B is probably not VCORE_B at all. However, worse: 2.72V does not
> look to be a correct/valid voltage no matter what (even if for an MCH or
> a southbridge). So probably a calculation error or it's reading the
> wrong bits from the chip.

>> hw.sensors.it0.volt2: 2,70 VDC (+3.3V)

> This is also wrong -- either the voltage or the label. There is no way
> your system would be stable if a +3.3V line was at +2.7V. So another
> calculation error or reading wrong bits from the chip.

>> hw.sensors.it0.volt3: 4,60 VDC (+5V)

> This is probably also wrong, but it's hard to say. +5V is relied on
> heavily throughout the entire system, so a 0.4V drop is pretty damn
> major. So probably another calculation error or reading wrong bits from
> the chip.

>> hw.sensors.it0.volt4: 0,06 VDC (+12V)

> This is flat out completely wrong on numerous levels.

>> hw.sensors.it0.volt5: -5,08 VDC (Unused)

> No idea.
> This could be -5V monitoring, but it depends. Only Gigabyte would know.

>> hw.sensors.it0.volt6: -6,53 VDC (-12V)

> Also totally wrong (voltage and label). So another calculation error or
> reading wrong bits from the chip.

>> hw.sensors.it0.volt7: 3,74 VDC (+5VSB)

> Also totally wrong (voltage and/or label). "+5Vsb" stands for "+5V
> standby"; it's the +5V line that comes off the PSU and is *always on*,
> even when the motherboard is off. It's what allows systems to power
> back up from sleep state. So another calculation error or reading wrong
> bits from the chip.

>> hw.sensors.it0.volt8: 2,14 VDC (VBAT)

> Also totally wrong (voltage and/or label). "VBAT" refers to the voltage
> of the CMOS battery, which should be +3.3V. So another calculation
> error or reading wrong bits from the chip.

> Here is what proper labels and a proper system should show, as an
> example:

> # bsdhwmon
> CPU1 Temperature        31 C
> System Temperature      35 C
> FAN1                     0 RPM
> FAN2                     0 RPM
> FAN3                     0 RPM
> FAN4                  2042 RPM
> FAN5                     0 RPM
> FAN6                  1875 RPM
> VcoreA                 1.106 V
> MCH Core               1.522 V
> -12V                 -12.288 V
> V_DIMM                 1.712 V
> +3.3V                  3.392 V
> +12V                  12.096 V
> 5Vsb                   5.070 V
> 5VDD                   5.118 V
> P_VTT                  1.142 V
> Vbat                   3.328 V

> The bottom line here is this: the problem with the sensors framework is
> that it has no concept of per-motherboard engineering (to my knowledge).
> Again, that is why I designed bsdhwmon the way I did -- I key off of
> SMBIOS string data because it's the only way to do things as reliably as
> possible. Each motherboard model requires unique support. Without
> that, voltage calculations are either wrong, or labels are completely
> wrong, or both.

> If I could get within the bowels of Gigabyte and actually talk to a
> **real engineer** and not tech support, I could find out if their
> GA-H77-DS3H motherboard has SMBus tie-ins for their H/W monitoring chip.
> If it does, I **absolutely** could add PROPER support for it to
> bsdhwmon.
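To make the point about per-motherboard engineering concrete: a monitoring chip's ADC can only measure a small input range, so board designers feed the higher rails through resistor dividers, and software has to multiply the raw reading back up by that board's ratio. The following is a minimal Python sketch of that idea, not bsdhwmon's actual code. The 16 mV step size, the resistor values, and the vendor/board names used as SMBIOS keys are all hypothetical and do not come from Gigabyte or from any datasheet.

```python
# Assumed ADC resolution: volts per raw count (hypothetical value).
ADC_LSB_VOLTS = 0.016


def divider_scale(r_top, r_bottom):
    """Factor that undoes a resistor divider placed in front of the ADC."""
    return (r_top + r_bottom) / r_bottom


# Hypothetical per-board table keyed on SMBIOS (maker, product) strings.
# The same chip input is wired with different resistors on each board.
BOARDS = {
    ("ExampleVendor", "BOARD-A"): {"+12V": divider_scale(30.0, 10.0)},
    ("ExampleVendor", "BOARD-B"): {"+12V": divider_scale(56.0, 10.0)},
}


def rail_voltage(maker, product, rail, raw):
    """Convert a raw ADC count to volts using the board-specific scale."""
    try:
        scale = BOARDS[(maker, product)][rail]
    except KeyError:
        # Unknown board or rail: refuse to guess rather than print nonsense.
        raise LookupError(f"no scaling data for {maker} {product} {rail}")
    return raw * ADC_LSB_VOLTS * scale


# The same raw count decodes to very different voltages per board.
print(round(rail_voltage("ExampleVendor", "BOARD-A", "+12V", 188), 3))
print(round(rail_voltage("ExampleVendor", "BOARD-B", "+12V", 188), 3))
```

With these made-up numbers the same raw count of 188 decodes to about 12.03 V using one board's table and about 19.85 V using the other's, which is the kind of error that shows up when a driver applies one fixed formula to every board.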
> However, regardless of that, it also requires the owner of the
> motherboard to be able to run the monitoring software provided by the
> vendor for the board (usually Windows software) as a "baseline"
> comparison -- or -- take a screenshot of the hardware monitoring details
> in the BIOS (or UEFI system) for comparison. Sometimes a VERY HIGH
> RESOLUTION photo of the motherboard is helpful -- though sometimes this
> isn't useful because motherboard vendors actually use "emulation modes"
> of their Super I/O chips (e.g. Chip Z is installed on the board, but
> it's configured to emulate Chip X which the Chip company made 2 years
> ago). I've found this on many Supermicro boards actually -- what's
> silkscreened on the chips says one thing but how the chip *behaves* is
> another.

Not exactly a screenshot, but I wrote down the values given by the BIOS:

CPU Vcore      1.044V
DRAM Voltage   1.524V
+3.3V          3.363V
+12V          12.168V
CPU Temp       33C
System Temp    30C

Please let me know if this is enough.

As for the picture of the motherboard, this one
(http://www.nix.ru/autocatalog/motherboards_gigabyte/135869_2245_draft.jpg)
looks way better than any of my pictures.

It is revision 1.0. Gigabyte also seems to have a rev 1.1, but 1.0 is
the one I use.

> But sometimes even WITH proper documentation from the vendor there are
> unexplained issues. Two examples taken from bsdhwmon's doc/BUGS:

> Winbond W83792D: +5V Vcc is incorrect
> =======================================
> Currently, boards which use the Winbond W83792D H/W monitoring IC will
> have their +5V voltage shown incorrectly.

> I've mailed Supermicro to try and find out why the calculation formula
> is wrong (since what I'm following comes from Winbond), but as of this
> writing have received no response.

> I have also looked at the Linux lm-sensors project, but the code is
> quite "spaghetti" -- it's hard to discern what the calculation values
> are, and if they're the same for all W83792D systems.
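Going back to the "baseline" comparison described earlier in this message: the check itself is simple to sketch. The Python below compares three of the hw.sensors readings quoted in this thread against the BIOS values Derek wrote down. The 5% tolerance is an arbitrary assumption, and so is the pairing of VCORE_A with the BIOS "CPU Vcore" entry; the function name `compare` is purely illustrative.

```python
# Readings from the hw.sensors.it0 output quoted in this thread.
DRIVER = {"Vcore": 1.42, "+3.3V": 2.70, "+12V": 0.06}

# Values written down from the BIOS hardware monitor screen.
BIOS = {"Vcore": 1.044, "+3.3V": 3.363, "+12V": 12.168}


def compare(driver, baseline, tolerance=0.05):
    """Return {label: (relative_error, within_tolerance)} per rail.

    tolerance is an assumed acceptance threshold (5% by default).
    """
    report = {}
    for label, expected in baseline.items():
        measured = driver[label]
        rel_err = abs(measured - expected) / abs(expected)
        report[label] = (rel_err, rel_err <= tolerance)
    return report


for label, (rel_err, ok) in compare(DRIVER, BIOS).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{label:6s} off by {rel_err:6.1%}  {status}")
```

All three rails fall well outside the assumed tolerance, which agrees with the assessment above that the driver's scaling, its labels, or both do not match this board.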
> Winbond W83792D: FAN3 RPMs may be inaccurate/high
> ===================================================
> I've received a single (isolated) report involving a Supermicro P8SCi
> board reporting absurdly high values for FAN3. Example:

> FAN1        0 RPM
> FAN2     2909 RPM
> FAN3    84375 RPM
> FAN4        0 RPM
> FAN5        0 RPM

> Further executions of bsdhwmon did not exhibit this problem. However,
> I take the report seriously, as it could indicate a strange bug in
> bsdhwmon, or possibly a bug in the Winbond W83792D chipset. At this
> time I have not been able to determine the root cause, however the
> user had his fan RPM configuration in the system BIOS set to
> "3-pin Server" rather than "Disabled" (which runs the fans at full
> speed). This could be a bug in the Winbond chipset, but I simply
> don't know.

> ------------

> I refuse to interface with Super I/O or H/W monitoring chips that use
> the classic ISA interface (/dev/io) because it's an extremely risky
> interface. You can crash and lock up a system very very easily with
> this model. The wrong I/O port or wrong bit set in the wrong sub-reg
> and pow, the system is in a weird state. It's a lot more difficult with
> SMBus given the unique assignment of a slave device address per-device.

> Don't get me started on what Linux lm-sensors looks like either. Good
> god what a mess. Does it work? Yeah, it works. But it's just such a
> garbled mess of code and "configs" and some abstract strangeness. It
> really doesn't read well, and is not well commented to boot.

> I wish I could help solve this in some way for you guys (without using
> sensors). I've spent way too many years doing H/W monitoring "stuff",
> and concluded long ago that on FreeBSD H/W monitoring is absolutely
> doable but we need support from vendors on a per-motherboard basis.
> Supermicro happens to be one of the few vendors who is quite good about
> this, barring the Winbond W83792D +5V Vcc problem.
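As a side note on the FAN3 figure quoted above: many Winbond monitoring chips derive RPM from a tachometer count register as 1,350,000 / (count x divisor). Assuming the W83792D follows that same formula (this sketch is an illustration only and has not been checked against bsdhwmon's source), 84375 RPM is exactly what a count-times-divisor product of 16 produces, so a single tiny or misread count would be enough to yield it. The handling of counts 0 and 255 below is an assumed convention.

```python
# Assumed constant from the common Winbond formula:
#   RPM = 1350000 / (count * divisor)
TACH_CLOCK = 1_350_000


def fan_rpm(count, divisor):
    """Convert an 8-bit tachometer count and clock divisor to RPM."""
    if count in (0, 255):
        # Assumed convention: 255 means the counter saturated (fan stopped
        # or absent) and 0 is not a usable count; report 0 RPM for both.
        return 0
    return TACH_CLOCK // (count * divisor)


def count_product_for(rpm):
    """Invert the formula: which count*divisor product yields this RPM?"""
    return TACH_CLOCK / rpm


print(fan_rpm(58, 8))            # 2909, a plausible fan speed
print(fan_rpm(16, 1))            # 84375, the value from the report
print(count_product_for(84375))  # 16.0
```

Under these assumptions the FAN2 value of 2909 RPM corresponds to a product of 464 (for example a count of 58 with a divisor of 8), while the FAN3 outlier corresponds to a product of only 16. That is consistent with a transient bad count or divisor read, but it does not establish the root cause.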
> The biggest problem: this kind of support/effort is quite literally a
> full-time job. Finding/getting in contact with engineers deep within
> the bowels of companies is the hardest part.

> P.S. -- Question for Andriy: I thought it was established long ago that
> none of this monitoring should be done in the kernel? Were you around
> when someone took the time to port the OpenBSD sensors framework to
> FreeBSD, and it resulted in a *massive* discussion and backlash from
> FreeBSD kernel committers stating "this should not go in the kernel?"
> Now I see this, and mention of an it(4) driver...? What exactly is
> going on? To put it in California-style: "dude. This REALLY pisses me
> off. WTF is going on over there?"

-- 
Best regards,
 Derek                          mailto:takeda@takeda.tk

Always remember you're unique, just like everyone else.