Date:      Fri, 22 May 2015 14:10:15 -0500
From:      Thomas Johnson <tommyj27@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   zpool on Dell MD3000 causes frequent hangs
Message-ID:  <CAMwYC7bJpQ__7YvoMokW%2Bo9SEkGGJ=Xw48PFTkgSnT650X_YGQ@mail.gmail.com>

Hello,

I am trying to track down an ongoing issue that I've been having, and I am
looking for suggestions on a possible cause or on how I might troubleshoot
further.

The issue seems to be related to a Dell MD3000 storage array, which
contains a zpool. It seems that the host attached to the array will
occasionally hang, usually during periods of high disk activity
(annoyingly, usually about 0300).

When the system hangs, I can ping the host and switch between virtual
consoles (but cannot interact with them). The system is otherwise
unresponsive, with no errors reported on the console or in the logs. The
only remedy I have found is to hard-reset the host.

I believe this issue is tied to the MD3000. I have tried swapping out SAS
cables, HBAs, the controller on the MD3000, and the host itself. I have
updated all the firmware I can find. Before I upgraded the host OS to
FreeBSD 10.1 (from 10.0) last month, I experienced hangs about once a
month. Since the upgrade, I have seen several events per week.

In addition to the MD3000, I have a set of USB drives that are used in a
rotation as offsite backups for the zpool. I have seen a number of hang
events during zfs send/receive transfers to the USB disk.

After the most recent hang, I removed two [consumer] SSDs from the pool
that were being used as cache devices. It is too early to tell if this
change had any impact.

Here is some of the pertinent output from the host. I can provide any other
information that would be helpful.

root@leopard:/home/tom-> uname -a
FreeBSD leopard 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0 r281232: Tue Apr  7 17:38:04 CDT 2015 root@cheshire-b:/pkg/base/obj_10.1-RELEASE-p9/pkg/base/src_10.1-RELEASE-p9/sys/GENERIC amd64
root@leopard:/home/tom-> zpool list
NAME          SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
backup       5.31T  3.61T  1.70T    22%         -    68%  1.00x  ONLINE  -
jumpdrive_f  2.72T  2.04T   693G    30%         -    75%  1.00x  ONLINE  -
root@leopard:/home/tom-> zpool status backup
  pool: backup
 state: ONLINE
  scan: scrub repaired 0 in 13h15m with 0 errors on Wed May 13 16:17:29 2015
config:

    NAME        STATE     READ WRITE CKSUM
    backup      ONLINE       0     0     0
      da0       ONLINE       0     0     0

errors: No known data errors
root@leopard:/home/tom-> zpool get all backup
NAME    PROPERTY                       VALUE                          SOURCE
backup  size                           5.31T                          -
backup  capacity                       68%                            -
backup  altroot                        -                              default
backup  health                         ONLINE                         -
backup  guid                           12638712474922952450           default
backup  version                        -                              default
backup  bootfs                         -                              default
backup  delegation                     on                             default
backup  autoreplace                    off                            default
backup  cachefile                      -                              default
backup  failmode                       wait                           default
backup  listsnapshots                  off                            default
backup  autoexpand                     off                            default
backup  dedupditto                     0                              default
backup  dedupratio                     1.00x                          -
backup  free                           1.70T                          -
backup  allocated                      3.61T                          -
backup  readonly                       off                            -
backup  comment                        -                              default
backup  expandsize                     0                              -
backup  freeing                        0                              default
backup  fragmentation                  22%                            -
backup  leaked                         0                              default
backup  feature@async_destroy          enabled                        local
backup  feature@empty_bpobj            active                         local
backup  feature@lz4_compress           active                         local
backup  feature@multi_vdev_crash_dump  enabled                        local
backup  feature@spacemap_histogram     active                         local
backup  feature@enabled_txg            active                         local
backup  feature@hole_birth             active                         local
backup  feature@extensible_dataset     enabled                        local
backup  feature@embedded_data          active                         local
backup  feature@bookmarks              enabled                        local
backup  feature@filesystem_limits      enabled                        local