From owner-freebsd-questions@FreeBSD.ORG Mon May 20 02:55:15 2013
From: Dennis Glatting <dg@pki2.com>
To: Paul Kraus
Cc: Tijl Coosemans, freebsd-questions@freebsd.org
Subject: Re: More than 32 CPUs under 8.4-P
Date: Sun, 19 May 2013 19:54:59 -0700
Message-ID: <1369018499.16472.65.camel@btw.pki2.com>
In-Reply-To: <1369014335.16472.60.camel@btw.pki2.com>
References: <1368897188.16472.19.camel@btw.pki2.com>
 <51989FDA.5070302@coosemans.org>
 <1368978686.16472.25.camel@btw.pki2.com>
 <1369014335.16472.60.camel@btw.pki2.com>

Minutes after I typed that message, the 2x16 system panicked with the
following backtrace:

kdb_backtrace
panic
vdev_deadman
vdev_deadman
vdev_deadman
spa_deadman
softclock
intr_event_execute_handlers
ithread_loop
fork_exit
fork_trampoline

I had just created a memory disk when that happened:

root@iirc:~ # mdconfig -a -t swap -s 1g -u 1
root@iirc:~ # newfs -U /dev/md1
root@iirc:~ # mount /dev/md1 /mnt
root@iirc:~ # cp -p procstat kgdb /mnt
root@iirc:~ # cd /rescue/
root@iirc:/rescue # cp -p * /mnt
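
[For reference, a minimal sketch of staging statically linked tools on a
swap-backed memory disk ahead of a hang, following the commands above, so
they can still be run once disk I/O stops. The /root/static paths and the
camcontrol copy are illustrative assumptions, not taken from this message:]

# Create and mount a 1 GB swap-backed memory disk (unit and size as above).
mdconfig -a -t swap -s 1g -u 1
newfs -U /dev/md1
mount /dev/md1 /mnt

# Copy the statically linked /rescue tools, plus any extra static
# diagnostics built ahead of time (these source paths are hypothetical).
cp -p /rescue/* /mnt
cp -p /root/static/procstat /root/static/kgdb /root/static/camcontrol /mnt

# Once disk I/O hangs, run the copies from the memory disk and write the
# output back to it, e.g. kernel stacks for every thread:
/mnt/procstat -kk -a > /mnt/procstat-hang.txt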
> > "Hang" means the system is alive and on the network but disk I/O has > stopped. Run any command except statically linked executables on a > memory volume and they will not run (no output or return to command > prompt). This includes "reboot," which never really reboots. > > The volumes where work is performed are typically 12-33TB RAIDz2 > volumes. For example: > > root@mc:~ # zpool list disk-1 > NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT > disk-1 16.2T 5.86T 10.4T 36% 1.32x ONLINE - > > root@mc:~ # zpool status disk-1 > pool: disk-1 > state: ONLINE > scan: scrub repaired 0 in 21h53m with 0 errors on Mon Apr 29 01:52:55 > 2013 > config: > > NAME STATE READ WRITE CKSUM > disk-1 ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > da4 ONLINE 0 0 0 > da7 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > da6 ONLINE 0 0 0 > cache > da0 ONLINE 0 0 0 > > errors: No known data errors > > > > * I say no CPU related issues because I have run into SATA timeout > > issues with an external SATA enclosure with 4 drives (I know, SATA port > > expanders are evil, but it is my best option here). Sometimes the zpool > > hangs hard, sometimes just becomes unresponsive for a while. My "fix", > > such as it is, is to tune the zfs per vdev queue depth as follows: > > > > vfs.zfs.vdev.min_pending="3" > > vfs.zfs.vdev.max_pending="5" > > > > I've not tried those. Currently, these are mine: > > vfs.zfs.write_limit_override="1G" > vfs.zfs.arc_max="8G" > vfs.zfs.txg.timeout=15 > vfs.zfs.cache_flush_disable=1 > > # Recommended from the net > # April, 2013 > vfs.zfs.l2arc_norw=0 # Default is 1 > vfs.zfs.l2arc_feed_again=0 # Default is 1 > vfs.zfs.l2arc_noprefetch=0 # Default is 0 > vfs.zfs.l2arc_feed_min_ms=1000 # Default is 200 > > > > The defaults are 5 and 10 respectively, and when I run with those I > > have the timeout issues, but only under very heavy I/O load. I only > > generate such load when migrating large amounts of data, which > > thankfully does not happen all that often. > > > > Two days ago when the 9.1 system hanged I was able to run a static > procstat where it inadvertently(?) printed that da0 wasn't responsive on > the console. Unfortunately I didn't have a static camcontrol ready so I > was unable to query it. > > That said, according to the criteria from > https://wiki.freebsd.org/AvgZfsDeadlockDebug that hang isn't a true ZFS > problem, yet hung it was. > > I have since (today) updated the firmware of most of the devices in that > system and it is currently running some tasks. Most of the disks in that > system are Seagate but the un-updated devices include three WD disks > (RAID1 OS and a swap disk) -- unupdated because I haven't been able to > figure WD firmware download out) and a SSD where the manufacturer > indicates the firmware diff is minor, though I plan to go back and flash > it anyway. > > If my 4x16 system ever finishes I will be updating its device's firmware > too but it is an 8.4-P system and doesn't give me any trouble. Another > 4x16 system gave me ZFS trouble under 9.1 but when I downgraded to 8.4-P > it has been stable as a rock for the past 22 days often under heavy > load. > > > > > > _______________________________________________ > freebsd-questions@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-questions > To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" -- Dennis Glatting