From owner-freebsd-questions@FreeBSD.ORG  Wed Aug  7 20:36:27 2013
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id A727E4B6
 for <freebsd-questions@freebsd.org>; Wed,  7 Aug 2013 20:36:27 +0000 (UTC)
 (envelope-from jdavidlists@gmail.com)
Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com
 [IPv6:2607:f8b0:4001:c03::22c])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 78F5127AB
 for <freebsd-questions@freebsd.org>; Wed,  7 Aug 2013 20:36:27 +0000 (UTC)
Received: by mail-ie0-f172.google.com with SMTP id 17so346093iea.31
 for <freebsd-questions@freebsd.org>; Wed, 07 Aug 2013 13:36:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=FWmV1lZpV0wfCDxxuAYL5+FRbbt/Ve+4Qp6KTfr8oy4=;
 b=dH+5Upt/Jv9ivVDcvHn1blUCpOXQrNszaoMh2HMtQZSTPK5fDGi9I0rnue6sL9oKAM
 TGPbq37gjocJbh6n/lwa/WKEJl6+onOCgxuyI6CPXo10HowbEP3MEpgKWQn/Re3fjZbz
 ZqrAoX+b8MflIIrkr0SbW47obIfjSS6e7/fLz1A9aU+NiVqBxMt2cuqp95uO6VvOM+xv
 bMoUzLDaXDDzp9y7JjIVxjsu6keacKOnyLkBEl6ZgF2z/j+iYa1v7HZStDoBaaOYFgj5
 dANFR0rwNEcrDJT4EOI6gwetez+MWITMG9WMecRSTQOMFnTkWGSVm6or01iPqkAkQQi4
 CFow==
MIME-Version: 1.0
X-Received: by 10.43.137.9 with SMTP id im9mr456979icc.39.1375907786465; Wed,
 07 Aug 2013 13:36:26 -0700 (PDT)
Sender: jdavidlists@gmail.com
Received: by 10.42.150.196 with HTTP; Wed, 7 Aug 2013 13:36:26 -0700 (PDT)
In-Reply-To: <CAEhBLvg7ZUMja5zpFm2UQBXESW-0fL9L7EatR2aasstXd8ALHA@mail.gmail.com>
References: <CABXB=RSRnB41yjq5Qcbiz-JCRssNwx2AatJ2Dn+HhuD9GaBh+w@mail.gmail.com>
 <CAEhBLvg7ZUMja5zpFm2UQBXESW-0fL9L7EatR2aasstXd8ALHA@mail.gmail.com>
Date: Wed, 7 Aug 2013 16:36:26 -0400
X-Google-Sender-Auth: TWNBsZ2kkLZaK8L0XG5ha7qJxc4
Message-ID: <CABXB=RRvp0BLURq7M9iBb5anqaGsrvXeA1WmAroNji6bZP8p4w@mail.gmail.com>
Subject: Re: Terrible disk performance with LSI / FreeBSD 9.2-RC1
From: J David <j.david.lists@gmail.com>
To: James Gosnell <jamesgosnell@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-questions@freebsd.org
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
 <mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Aug 2013 20:36:27 -0000

On Wed, Aug 7, 2013 at 3:15 PM, James Gosnell <jamesgosnell@gmail.com> wrote:
> Maybe one of your drives is bad, so it's constantly doing error correction?

Not according to SMART; all the drives report no problems.  Also, all
the drives seem to perform in lock-step for both reading and writing.
Normally, when one drive in an array is failing, all the drives may be
handling the same number of reads, but the failing drive will often report
100% busy and/or multi-second svc_t's while the others sit at 4% busy
with 20 msec svc_t's or similar.  In this case, it's acting as if the
disks are all hugely overloaded, except without even the high
svc_t's I typically associate with an overworked array.

The speeds do fluctuate.  Last night it was down to 64 KB/sec of reads
per drive (about 15 reads/sec), with all drives still reporting 90% busy.
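For a rough sense of scale, the per-drive figures above can be turned into an average request size and an implied busy time per request (utilization law, U = X * S). This is a minimal sketch, assuming "64k/sec" means 64 KB/s and that the 90% figure is the %b / %busy column from iostat -x or gstat; the per_read_stats function name is hypothetical:

```shell
#!/bin/sh
# Back-of-envelope check on per-drive iostat/gstat figures.
# Usage: per_read_stats KB_PER_SEC READS_PER_SEC BUSY_PCT
# (per_read_stats is a hypothetical helper, not a system tool.)
per_read_stats() {
    awk -v kb="$1" -v r="$2" -v b="$3" 'BEGIN {
        # Average request size = throughput / request rate.
        printf "avg read size: %.1f KB\n", kb / r
        # Utilization law (U = X * S): busy time per request = U / X.
        printf "busy time per read: %.0f ms\n", (b / 100) / r * 1000
    }'
}

# Assumed inputs from the observation above: 64 KB/s, 15 reads/s, 90% busy.
per_read_stats 64 15 90
```

With those inputs it works out to roughly 4.3 KB per read and about 60 ms of busy time per read. Treat this as an approximation only: %b counts time during which at least one request is outstanding, so it is not a direct measure of per-request service time.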

It feels like some sort of issue with the
bus/controller/kernel/driver/ZFS that is affecting all the drives
equally.

Also, even ls takes forever (10-30 seconds for "ls -lh /"), but when it
eventually does finish, "time ls -lh /" reports:

        0.02 real         0.00 user         0.00 sys

I'm really not sure what to make of that. An attempt to run "ps axlww |
fgrep ls" while the ls was running failed, because ps hangs just as
long as ls does.  So it's as if the system is repeatedly putting
anything that touches the disks on hold, even when all the requested
data is clearly in cache.  (That apparently includes even loading the
/bin/ls binary itself, or running "ls -lh /" twice in a row.)

Thanks!