Date:      Mon, 24 Nov 2008 12:48:22 -0800
From:      Jo Rhett <jrhett@svcolo.com>
To:        freebsd-stable Stable <freebsd-stable@freebsd.org>
Cc:        Jeremy Chadwick <koitsu@freebsd.org>
Subject:   smartd long self-test causes drives to hang
Message-ID:  <EBDD87D8-401B-4812-9121-C3301C06276B@svcolo.com>

I've spent about 3 months tracking down what was causing my personal
colo box to start getting "sluggish" right around dawn every Saturday
morning.  It took so long because some mornings I simply couldn't pull
my head out of my tail enough to do proper debugging.

The cause was *really slow* filesystem response time.  No cron jobs ran
in that period.  No specific process ran any slower than another,
although I eventually learned that those which did no file I/O were
fine.  And finally I realized that just "ls -la" was very slow (~1
minute) even after I had killed off every disk-using process in the
system.  SMTP and HTTP in particular were basically fubar.

No data loss, just *real slow*.  Nothing other than a soft reboot ever
solved the problem.  Even leaving it running with only minimal processes
for 24 hours didn't bring it back to normal.

Finally I was browsing through Jeremy Chadwick's list of known ATA
problems and spotted his comments about smartd self-tests causing
problems.  Sure enough, my long self-test was scheduled for 5am on
Saturday mornings.  Rechecking the observed slow-down periods
confirmed that the problem never became visible before 5am.
(Sometimes it took up to 45 minutes before things slowed down enough
to set off monitoring alarms.)

So, long story short, if you're having weirdness in system response
time - check the smartd configuration, and try disabling the
self-tests.  The short self-test I was running daily didn't appear to
affect anything, but the long test left the system shuddering and
limping at best.
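
For anyone checking their own setup, here is a rough sketch of what the
relevant smartd.conf entry might look like.  The device name (/dev/ad0),
the -a directive and the 2am hour for the short test are assumptions for
illustration only, not taken from the actual config; only the Saturday
5am long test matches what is described above.  The -s schedule pattern
is T/MM/DD/d/HH, where d is the day of the week (1 = Monday, 7 = Sunday).

```
# Hypothetical entry: short self-test daily at 2am (assumed hour),
# long self-test on Saturdays (day 6) at 5am.
/dev/ad0 -a -s (S/../.././02|L/../../6/05)

# Same entry with the long self-test dropped and the short one kept.
# Use one or the other, not both.
#/dev/ad0 -a -s S/../.././02
```

smartd has to be restarted (or sent SIGHUP) to pick up the change.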


