From: "William A. Mahaffey III"
To: FreeBSD Questions !!!!
Subject: Ominous smartd messages ....
Date: Tue, 2 Aug 2016 22:48:21 -0453.75

I have a workstation, purpose-built within the last 12 mos., that is my dev. box for in-house software (CFD code & user interface). It has eight 1 TB HDDs in an unmirrored ZFS pool. All data is backed up nightly by rsync to other boxen on my LAN & weekly by compressed tar to those boxen as well. This A.M., I noticed some messages on the console from smartd about 1 of the HDDs.
I tailed my syslog file & see the following:

[root@devbox, /etc, 9:42:37pm] 360 % tail -50 /var/log/messages ; hwclock -r ; date
Jul 26 05:11:34 devbox kernel: pid 39865 (time), uid 1110: exited on signal 6
Jul 26 05:22:29 devbox kernel: pid 41916 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 26 05:22:41 devbox kernel: pid 41953 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 26 05:22:52 devbox kernel: pid 41990 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 26 22:51:42 devbox healthd: A value of 1.64 for CPU #0 Core Voltage with a range of (1.65 <= n <= 2.30)
Jul 27 06:30:56 devbox kernel: pid 71015 (memcheck-amd64-free), uid 1110: exited on signal 6
Jul 27 06:30:56 devbox kernel: pid 71014 (time), uid 1110: exited on signal 6
Jul 27 06:41:44 devbox kernel: pid 73053 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 27 06:41:55 devbox kernel: pid 73090 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 27 06:42:07 devbox kernel: pid 73127 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 28 05:10:56 devbox kernel: pid 2029 (memcheck-amd64-free), uid 1110: exited on signal 6
Jul 28 05:10:56 devbox kernel: pid 2028 (time), uid 1110: exited on signal 6
Jul 28 05:21:41 devbox kernel: pid 4067 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 28 05:21:51 devbox kernel: pid 4104 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 28 05:22:02 devbox kernel: pid 4141 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 29 06:30:46 devbox kernel: pid 33158 (memcheck-amd64-free), uid 1110: exited on signal 6
Jul 29 06:30:47 devbox kernel: pid 33157 (time), uid 1110: exited on signal 6
Jul 29 06:41:39 devbox kernel: pid 35197 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 29 06:41:51 devbox kernel: pid 35234 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 29 06:42:01 devbox kernel: pid 35271 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 30 05:10:54 devbox kernel: pid 64110 (memcheck-amd64-free), uid 1110: exited on signal 6
Jul 30 05:10:54 devbox kernel: pid 64109 (time), uid 1110: exited on signal 6
Jul 30 05:21:44 devbox kernel: pid 66149 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 30 05:21:55 devbox kernel: pid 66186 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 30 05:22:06 devbox kernel: pid 66223 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 31 16:18:28 devbox kernel: pid 96421 (memcheck-amd64-free), uid 1110: exited on signal 6
Jul 31 16:18:28 devbox kernel: pid 96420 (time), uid 1110: exited on signal 6
Jul 31 16:29:15 devbox kernel: pid 98457 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 31 16:29:26 devbox kernel: pid 98494 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Jul 31 16:29:36 devbox kernel: pid 98531 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 1 06:31:17 devbox kernel: pid 26641 (memcheck-amd64-free), uid 1110: exited on signal 6
Aug 1 06:31:17 devbox kernel: pid 26640 (time), uid 1110: exited on signal 6
Aug 1 06:42:45 devbox kernel: pid 28681 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 1 06:42:58 devbox kernel: pid 28718 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 1 06:43:09 devbox kernel: pid 28755 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 1 23:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 1 23:39:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 00:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 00:39:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 01:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 01:39:58 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 02:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 02:39:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 03:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 03:39:58 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
Aug 2 05:11:06 devbox kernel: pid 57518 (memcheck-amd64-free), uid 1110: exited on signal 6
Aug 2 05:11:06 devbox kernel: pid 57517 (time), uid 1110: exited on signal 6
Aug 2 05:21:55 devbox kernel: pid 59557 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 2 05:22:07 devbox kernel: pid 59594 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
Aug 2 05:22:16 devbox kernel: pid 59642 (PreBFCGL.opteron.TE), uid 1110: exited on signal 6 (core dumped)
hwclock: Command not found.
Tue Aug 2 22:28:28 MCDT 2016
You have new mail.
[root@devbox, /etc, 10:28:28pm] 361 % uname -a
FreeBSD devbox 9.3-RELEASE-p33 FreeBSD 9.3-RELEASE-p33 #0: Wed Jan 13 17:55:39 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
[root@devbox, /etc, 10:28:53pm] 362 %

I am asking about the smartd messages from overnight Aug 1-2. The rest gives a bit of context about the duty cycle of the machine, which is pretty light: it rebuilds the code nightly & runs a suite of scripted tests, some of which obviously don't complete successfully :-). The HDDs were brand new 2.5" SATA3 7200 RPM HGSTs, which I have had good luck with in other builds (I have about 25 in service, all purchased new from NewEgg for each build). Case is well ventilated, drives are cool, etc.
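For anyone who wants to pull just the smartd lines out of a log like the one above, here is a minimal sketch in Python. The function name summarize_pending and the regex are illustrative assumptions written against the message format shown in this log, not anything shipped with smartd. It only tallies how often each device was reported and the largest pending-sector count seen, which makes it easier to tell whether the count is holding steady or growing.

```python
import re
from collections import Counter

# Matches smartd syslog lines of the form seen in the log above, e.g.
#   ... smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors
# The pattern is an assumption based on those lines only.
PENDING_RE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>\S+), "
    r"(?P<count>\d+) Currently unreadable \(pending\) sectors"
)


def summarize_pending(lines):
    """Return {device: (number_of_reports, largest_pending_count)}."""
    reports = Counter()
    worst = {}
    for line in lines:
        m = PENDING_RE.search(line)
        if m is None:
            continue  # not a pending-sector report, skip it
        dev = m.group("dev")
        reports[dev] += 1
        worst[dev] = max(worst.get(dev, 0), int(m.group("count")))
    return {dev: (reports[dev], worst[dev]) for dev in reports}


if __name__ == "__main__":
    # Three lines copied from the log above; only two are smartd reports.
    sample = [
        "Aug 1 23:09:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors",
        "Aug 1 23:39:59 devbox smartd[835]: Device: /dev/ada5, 8 Currently unreadable (pending) sectors",
        "Aug 2 05:11:06 devbox kernel: pid 57517 (time), uid 1110: exited on signal 6",
    ]
    print(summarize_pending(sample))
```

To run it against the real file, pass an open file object instead of the sample list, e.g. summarize_pending(open("/var/log/messages")).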
From a shell script to nicely format smartctl output:

[root@devbox, /etc, 10:39:28pm] 364 % hddtemp /dev/ada[0-7]
SMART supported, SMART enabled
drive /dev/ada0: HGST HTS721010A9E630, S/N: JR10046P1EEJZN, Temp. 27 degC, min/max, cycle: 17/36, lifetime: 17/36, lifetime avg. 25 degC
drive /dev/ada1: HGST HTS721010A9E630, S/N: JR10046P1E5ZWN, Temp. 27 degC, min/max, cycle: 17/37, lifetime: 17/37, lifetime avg. 25 degC
drive /dev/ada2: HGST HTS721010A9E630, S/N: JR10046P1EKG0N, Temp. 27 degC, min/max, cycle: 16/36, lifetime: 16/36, lifetime avg. 24 degC
drive /dev/ada3: HGST HTS721010A9E630, S/N: JR10046P1EK72N, Temp. 27 degC, min/max, cycle: 17/37, lifetime: 17/37, lifetime avg. 25 degC
drive /dev/ada4: HGST HTS721010A9E630, S/N: JR10046P1E649N, Temp. 25 degC, min/max, cycle: 15/32, lifetime: 15/33, lifetime avg. 23 degC
drive /dev/ada5: HGST HTS721010A9E630, S/N: JR10046P1E69HN, Temp. 25 degC, min/max, cycle: 14/31, lifetime: 14/31, lifetime avg. 22 degC
drive /dev/ada6: HGST HTS721010A9E630, S/N: JR10046P1EK92N, Temp. 25 degC, min/max, cycle: 14/31, lifetime: 14/32, lifetime avg. 22 degC
drive /dev/ada7: HGST HTS721010A9E630, S/N: JR10046P1E6A4N, Temp. 25 degC, min/max, cycle: 14/32, lifetime: 14/32, lifetime avg. 23 degC
[root@devbox, /etc, 10:39:43pm] 365 %

My question is: Are these messages benign, or am I in the market for more hardware? *ANY* more questions, please ask. TIA & have a good one.

--
William A. Mahaffey III
----------------------------------------------------------------------
"The M1 Garand is without doubt the finest implement of war ever devised by man."
-- Gen. George S. Patton Jr.