From owner-freebsd-performance@FreeBSD.ORG Sun Apr 8 13:00:33 2012
Message-ID: <4F818A3B.5040904@quip.cz>
Date: Sun, 08 Apr 2012 14:53:15 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Nikolay Denev
References: <4F7ED7F4.5060509@zedat.fu-berlin.de> <687BFFD7-1456-4D7B-AFB2-356EE9B0D1DD@gmail.com>
In-Reply-To: <687BFFD7-1456-4D7B-AFB2-356EE9B0D1DD@gmail.com>
Cc: freebsd-performance@freebsd.org, Current FreeBSD, "O. Hartmann"
Subject: Re: ECC memory driver in FreeBSD 10?
List-Id: Performance/tuning

Nikolay Denev wrote:
> On Apr 6, 2012, at 2:48 PM, O. Hartmann wrote:
>
>> I'm looking for a way to force FreeBSD 10 to maintain/watch ECC errors
>> reported by UEFI (or BIOS).
>> Since ECC is said to be essential for server systems both in business
>> and science and I do not question this, I was wondering if I can not
>> report ECC errors via a watchdog or UEFI (ACPI?) report to syslog
>> facility on FreeBSD.
>> FreeBSD is supposed to be a server operating system, as far as I know,
>> so I believe there must be something which has not revealed itself
>> to me, yet.
>
> If the hardware supports it, such errors should be logged as MCEs (Machine Check Exceptions).
> I can say for sure it works pretty well with Dell servers, as I had one with a failing RAM module, and
> it reported the corrected ECC errors in dmesg.

Memory ECC errors are logged to messages, and you can decode them with
sysutils/mcelog. I did it in the past on one of our Sun Fire X2100 M2
with FreeBSD 8.x.

Miroslav Lachman

From owner-freebsd-performance@FreeBSD.ORG Mon Apr 9 10:04:29 2012
Message-ID: <4F82B42B.1050900@zedat.fu-berlin.de>
Date: Mon, 09 Apr 2012 12:04:27 +0200
From: "O. Hartmann"
To: Miroslav Lachman <000.fbsd@quip.cz>
References: <4F7ED7F4.5060509@zedat.fu-berlin.de> <687BFFD7-1456-4D7B-AFB2-356EE9B0D1DD@gmail.com> <4F818A3B.5040904@quip.cz>
In-Reply-To: <4F818A3B.5040904@quip.cz>
Cc: Nikolay Denev, freebsd-performance@freebsd.org, Current FreeBSD
Subject: Re: ECC memory driver in FreeBSD 10?

On 04/08/12 14:53, Miroslav Lachman wrote:
> Nikolay Denev wrote:
>> On Apr 6, 2012, at 2:48 PM, O. Hartmann wrote:
>>
>>> I'm looking for a way to force FreeBSD 10 to maintain/watch ECC errors
>>> reported by UEFI (or BIOS).
>>> Since ECC is said to be essential for server systems both in business
>>> and science and I do not question this, I was wondering if I can not
>>> report ECC errors via a watchdog or UEFI (ACPI?) report to syslog
>>> facility on FreeBSD.
>>> FreeBSD is supposed to be a server operating system, as far as I know,
>>> so I believe there must be something which has not revealed itself
>>> to me, yet.
>>
>> If the hardware supports it, such errors should be logged as MCEs
>> (Machine Check Exceptions).
>> I can say for sure it works pretty well with Dell servers, as I had
>> one with a failing RAM module, and
>> it reported the corrected ECC errors in dmesg.
>
> Memory ECC errors are logged to messages, and you can decode them with
> sysutils/mcelog. I did it in the past on one of our Sun Fire X2100 M2
> with FreeBSD 8.x.
>
> Miroslav Lachman

It seems that I have been blessed with non-faulty memory over the past
three or four years. The last time I saw errors was around 2000. All of our
24/7 servers have ECC RAM.

So, your replies all imply that if I log the system's messages via syslog
properly (as we do, remotely, on a centralized server), then ECC errors
should be reported by the FreeBSD kernel in a canonical way, as the UEFI/BIOS
reports them?
Without special drivers/tools, scripts which scan for those errors
should report occurrences?

That my (FreeBSD) boxes haven't shown errors of that kind - a colleague's
Linux boxes did once! - doesn't imply missing capabilities.
This is nice to hear/read.

Thanks a lot,

Oliver

From owner-freebsd-performance@FreeBSD.ORG Mon Apr 9 13:42:04 2012
From: Andrew Boyer
Date: Mon, 9 Apr 2012 09:32:01 -0400
To: O. Hartmann
References: <4F7ED7F4.5060509@zedat.fu-berlin.de> <687BFFD7-1456-4D7B-AFB2-356EE9B0D1DD@gmail.com> <4F818A3B.5040904@quip.cz> <4F82B42B.1050900@zedat.fu-berlin.de>
In-Reply-To: <4F82B42B.1050900@zedat.fu-berlin.de>
Cc: Nikolay Denev, freebsd-performance@freebsd.org, Current FreeBSD, Miroslav Lachman <000.fbsd@quip.cz>
Subject: Re: ECC memory driver in FreeBSD 10?

On Apr 9, 2012, at 6:04 AM, O. Hartmann wrote:

> On 04/08/12 14:53, Miroslav Lachman wrote:
>> Nikolay Denev wrote:
>>> On Apr 6, 2012, at 2:48 PM, O. Hartmann wrote:
>>>
>>>> I'm looking for a way to force FreeBSD 10 to maintain/watch ECC errors
>>>> reported by UEFI (or BIOS).
>>>> Since ECC is said to be essential for server systems both in business
>>>> and science and I do not question this, I was wondering if I can not
>>>> report ECC errors via a watchdog or UEFI (ACPI?) report to syslog
>>>> facility on FreeBSD.
>>>> FreeBSD is supposed to be a server operating system, as far as I know,
>>>> so I believe there must be something which has not revealed itself
>>>> to me, yet.
>>>
>>> If the hardware supports it, such errors should be logged as MCEs
>>> (Machine Check Exceptions).
>>> I can say for sure it works pretty well with Dell servers, as I had
>>> one with a failing RAM module, and
>>> it reported the corrected ECC errors in dmesg.
>>
>> Memory ECC errors are logged to messages, and you can decode them with
>> sysutils/mcelog. I did it in the past on one of our Sun Fire X2100 M2
>> with FreeBSD 8.x.
>>
>> Miroslav Lachman
>
> It seems that I have been blessed with non-faulty memory over the past
> three or four years. The last time I saw errors was around 2000. All of our
> 24/7 servers have ECC RAM.
>
> So, your replies all imply that if I log the system's messages via syslog
> properly (as we do, remotely, on a centralized server), then ECC errors
> should be reported by the FreeBSD kernel in a canonical way, as the UEFI/BIOS
> reports them?
> Without special drivers/tools, scripts which scan for those errors
> should report occurrences?
>
> That my (FreeBSD) boxes haven't shown errors of that kind - a colleague's
> Linux boxes did once! - doesn't imply missing capabilities.
> This is nice to hear/read.
>
> Thanks a lot,
>
> Oliver

This is what you see in syslog when sys/x86/x86/mca.c detects a memory error:

> Mar 16 12:37:33 hostname kernel: MCA: Bank 8, Status 0x8c0000400001009f
> Mar 16 12:37:33 hostname kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
> Mar 16 12:37:33 hostname kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 0
> Mar 16 12:37:33 hostname kernel: MCA: CPU 0 COR (1) RD channel ?? memory error
> Mar 16 12:37:33 hostname kernel: MCA: Address 0xb43ca6240
> Mar 16 12:37:33 hostname kernel: MCA: Misc 0x4ac8111000064808

mcelog will help you figure out which DIMM is affected.

Also, if your server includes an IPMI controller, the BIOS should be set
up to log memory errors to the IPMI system event log (SEL). You can look
at the SEL with ipmitool from the ports collection. 'ipmitool sel list'
will show you whether any errors have been reported.

-Andrew

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com
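
Regarding the question earlier in the thread about scripts that scan syslog for such errors, here is a minimal sketch in Python. It assumes the "MCA: Bank N, Status 0x..." line format from the excerpt above and the IA32_MCi_STATUS bit layout as I understand it from the Intel SDM (VAL/OVER/UC/EN/MISCV/ADDRV/PCC in bits 63 to 57, corrected error count in bits 52:38, MCA error code in bits 15:0). The function names `decode_status` and `scan` are made up for illustration, the corrected-count field is only meaningful on CPUs that implement it, and none of this replaces mcelog.

```python
import re

# Matches the "MCA: Bank N, Status 0x..." line format from the log excerpt.
BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")

# Flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# MMM field of a memory-controller error code (000F 0000 1MMM CCCC).
MEM_OPS = {0: "generic", 1: "read", 2: "write",
           3: "address/command", 4: "scrub"}


def decode_status(status):
    """Decode a 64-bit MCi_STATUS value into a small dictionary."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    info = {
        "flags": flags,
        # UC (bit 61) clear means the error was corrected by hardware.
        "corrected": not ((status >> 61) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "error_code": code,
    }
    # Memory-controller errors: bits 15-13 and 11-8 clear, bit 7 set.
    if (code & 0xEF80) == 0x0080:
        channel = code & 0xF
        info["memory_op"] = MEM_OPS.get((code >> 4) & 0x7, "reserved")
        # Channel 0xF means "not specified" ("channel ??" in the kernel log).
        info["channel"] = None if channel == 0xF else channel
    return info


def scan(lines):
    """Yield (bank, decoded_status) for every MCA bank line in `lines`."""
    for line in lines:
        match = BANK_RE.search(line)
        if match:
            yield int(match.group(1)), decode_status(int(match.group(2), 16))


if __name__ == "__main__":
    sample = ["Mar 16 12:37:33 hostname kernel: "
              "MCA: Bank 8, Status 0x8c0000400001009f"]
    for bank, info in scan(sample):
        print(bank, info)
```

To check a live system, pass an open log file to `scan` instead of the sample list; on a default FreeBSD setup the kernel messages usually end up in /var/log/messages, but that depends on the local syslog configuration.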