From owner-freebsd-fs@FreeBSD.ORG Sun Mar 17 00:57:49 2013
From: Michael DeMan
Date: Sat, 16 Mar 2013 17:48:54 -0700
To: J David, freebsd-fs@freebsd.org
Subject: Re: FreeBSD & no single point of failure file service
Message-Id: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com>

Hi David,

We are looking at the exact same thing - let me know what you find out.

I think it is pretty obvious that ixsystems.com has this figured out along with all the tricky details - but for the particular company I am looking to implement this for, vendors that can't show their prices for products are vendors we have to stay away from, because not showing pricing means it starts at $100K minimum plus giant annual support fees.  In all honesty some kind of third-party designed solution with only minimal support would be fine for us, but I don't think that is their regular market.

I was thinking to maybe test something out like:

#1. A couple of old Dell 2970 head units with LSI cards.
#2. One dual-port SAS chassis.
#3. Figure out what needs to happen with devd+carp in order for the head end units to RELIABLY know when to export/import ZFS and when to advertise NFS/iSCSI, etc.

A couple of catches with this, of course: for #3 there could be some kind of unexpected heartbeat failure between the two head end units where they both decide the other is gone and both become masters - which would probably result in catastrophic corruption on the file system.

SuperMicro does have that one chassis that accepts lots of drives and two custom motherboards that are linked internally via 10GB - I think ixsystems uses that.  So in theory the edge case of the accidental 'master/master' configuration is helped by the hardware.  By the same token I am skeptical of having both head end units in a single chassis.  Pardon me for being paranoid.

So the conclusion I came to with #3 for a home-brew design was that devd+carp is great overall, but there needs to be an additional out-of-band confirmation between the two head end units.

Scenario is: #1-#2 above.

The head units are wired up such that they are providing storage and also running (hsrp/carp/vrrp) on the main link that they vend their storage resources off to the network.

They are also connected via another channel - this could be a cross-over ethernet link, a serial cable - or, in my case, simply re-using the dedicated ethernet port that is used for management-only access to the servers and is already out of band.

If a network engineer comes along and tweaks the NFS/iSCSI switches or something else, makes a mistake, and that link between the two head end units is broken - both machines are going to want to be masters, and write directly to whatever shared physical storage they have?

This is where the additional link between the head units comes in.  The storage delivery side of things has 'split brain' - the head end units cannot talk to each other, but may be able to talk to some (or all) clients that use their services.  With the current design for ZFS v28 there can be only one master utilizing the physically attached storage from the head ends - otherwise a small problem that could have been better fixed by just having an outage turns into a potential loss of all the data everywhere?

So basically failover between the head units works as follows:

A) I am secondary on the big storage ethernet link and the primary has timed out on telling me it is still alive.
B) Confirm on the out-of-band link whether the primary is still up or not, and what it thinks the state of affairs may be.  (Optimize by starting this check the first time the primary heartbeat is lost - not after the timeout?)
C) If the primary thinks it has lost connectivity to the clients, then confirm it is also no longer acting as the primary for the physical storage, and I should attach the storage and try to become the primary.
D) ??? If the primary thinks it still can connect to the clients, then what?
E) From (C) above - let's be sure to avoid a flapping situation.
F) No matter what, if it cannot be decided which head end unit should be the 'master' (vending NFS/iSCSI and also handling the physical storage) - then both units should deny services?

Longer e-mail than I expected.  Thanks for the post - it made me think about things.  Probably there are huge problems in my above synopsis.  The hard work is always in the details, not the design?

- Mike

On Mar 9, 2013, at 3:40 PM, J David wrote:

> Hello,
>
> I would like to build a file server with no single point of failure, and I
> would like to use FreeBSD and ZFS to do it.
>
> The hardware configuration we're looking at would be two servers with 4x
> SAS connectors and two SAS JBOD shelves.  Both servers would have dual
> connections to each shelf.
>
> The disks would be configured in mirrored pairs, with one disk from each
> pair in each shelf.  One pair for ZIL, one or two pairs for L2ARC, and the
> rest for ZFS data.
>
> We would be shooting for an active/standby configuration where the standby
> system is booted up but doesn't touch the bus unless/until it detects CARP
> failover from the master via devd, then it does a zpool import.  (Even so
> all TCP sessions for NFS and iSCSI will get reset, which seems unavoidable
> but recoverable.)
>
> This will be really expensive to test, so I would be very interested if
> anyone has feedback on how FreeBSD will handle this type of shared-SAS
> hardware configuration.
>
> Thanks for any advice!
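For anyone wanting to prototype the devd+carp hand-off sketched in steps A)-F) above, here is a minimal sketch only, not a tested recipe: the pool name, the peer address, the helper script path and the out-of-band check are all illustrative assumptions, and the exact devd(8) event to hook (a link-state change on a carp interface on older releases, a CARP MASTER/BACKUP notification on newer ones) depends on the FreeBSD version.

#!/bin/sh
# /usr/local/sbin/ha-failover.sh -- hypothetical helper invoked by a
# devd(8) rule when the CARP state of the storage link changes.
# Everything here (pool name, peer address, which services to start)
# is an assumption for illustration, not a recommendation from the list.

POOL="tank"             # pool living on the dual-ported SAS shelves
PEER_OOB="10.0.0.2"     # other head unit, on the out-of-band management link

peer_alive_oob() {
    # Step B: out-of-band confirmation.  Faked here with a ping over the
    # management network; a real setup would query an agent on the peer
    # and ask what it believes its own role is.
    ping -c 1 -t 2 "$PEER_OOB" > /dev/null 2>&1
}

case "$1" in
become-master)
    if peer_alive_oob; then
        # Steps D/F: the peer is still reachable out of band, so the safe
        # answer is to refuse the import rather than risk two masters
        # writing to the same pool.
        logger "ha-failover: peer still answers out-of-band, refusing takeover"
        exit 1
    fi
    # Step C: the peer is gone on both links; take over the shared storage
    # and start vending it again.
    zpool import -f "$POOL" && service nfsd onestart
    ;;
become-backup)
    # Lost CARP mastership: stop serving and release the pool so the other
    # head unit can import it cleanly.
    service nfsd onestop
    zpool export "$POOL"
    ;;
esac

A corresponding devd rule would simply map the MASTER/BACKUP transitions to the 'become-master'/'become-backup' arguments; the interesting (and unsolved) part of the thread is what the out-of-band check should do when the two boxes disagree.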
From owner-freebsd-fs@FreeBSD.ORG Sun Mar 17 01:00:08 2013
From: Michael DeMan
Date: Sat, 16 Mar 2013 18:00:03 -0700
To: J David, freebsd-fs@freebsd.org
Subject: Re: FreeBSD & no single point of failure file service
In-Reply-To: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com>

Errata...

--- By 'out of band', for my case, I mean simply another ethernet link that by convention is physically separate from the 'primary' storage ethernet.  Good enough for my use case.

--- On (F) below - I meant 'if neither head unit can decide whether it should be the master or not' - then they both deny services.  Better that bugs cause outages rather than data loss?

- Mike

On Mar 16, 2013, at 5:48 PM, Michael DeMan wrote:

> So basically failover between the head units works as follows:
>
> A) I am secondary on the big storage ethernet link and the primary has timed out on telling me it is still alive.
> B) Confirm on the out-of-band link whether the primary is still up or not, and what it thinks the state of affairs may be.  (Optimize by starting this check the first time the primary heartbeat is lost - not after the timeout?)
> C) If the primary thinks it has lost connectivity to the clients, then confirm it is also no longer acting as the primary for the physical storage, and I should attach the storage and try to become the primary.
> D) ??? If the primary thinks it still can connect to the clients, then what?
> E) From (C) above - let's be sure to avoid a flapping situation.
> F) No matter what, if it cannot be decided which head end unit should be the 'master' (vending NFS/iSCSI and also handling the physical storage) - then both units should deny services?
>
> - Mike

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 17 13:41:58 2013
Date: Sun, 17 Mar 2013 17:41:49 +0400 (MSK)
From: Dmitry Morozovsky
To: Michael DeMan
Cc: freebsd-fs@freebsd.org
Subject: Re: FreeBSD & no single point of failure file service

Michael,

"great minds think alike", as I have just written a mail about HA storage here :)

On Sat, 16 Mar 2013, Michael DeMan wrote:

> --- By 'out of band', for my case, I mean simply another ethernet link that by
> convention is physically separate from the 'primary' storage ethernet.  Good
> enough for my use case.

Hmm, is Dell's management interface accessible from the mainboard directly?

> --- On (F) below - I meant 'if neither head unit can decide whether it should
> be the master or not' - then they both deny services.  Better that bugs cause
> outages rather than data loss?

I think it's better to administratively define an "emergency master" than to go nowhere without service.
Of course, changing this role shouldn't be allowed without the other half alive.

[snip]

-- 
Sincerely,
D.Marck                                     [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                                 marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 17 14:52:16 2013
Date: Sun, 17 Mar 2013 14:52:16 GMT
Message-Id: <201303171452.r2HEqGXN027685@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Subject: Re: kern/176978: [zfs] [panic] zfs send -D causes "panic: System call ioctl returning with 1 locks held"

Old Synopsis: zfs send -D causes "panic: System call ioctl returning with 1 locks held"
New Synopsis: [zfs] [panic] zfs send -D causes "panic: System call ioctl returning with 1 locks held"

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sun Mar 17 14:51:58 UTC 2013
Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=176978 From owner-freebsd-fs@FreeBSD.ORG Sun Mar 17 20:50:01 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8F784C1B for ; Sun, 17 Mar 2013 20:50:01 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 6381D1F2 for ; Sun, 17 Mar 2013 20:50:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r2HKo1tU092612 for ; Sun, 17 Mar 2013 20:50:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r2HKo1lh092610; Sun, 17 Mar 2013 20:50:01 GMT (envelope-from gnats) Date: Sun, 17 Mar 2013 20:50:01 GMT Message-Id: <201303172050.r2HKo1lh092610@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Andriy Gapon Subject: Re: kern/176978: [zfs] [panic] zfs send -D causes " panic: System call ioctl returning with 1 locks held" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Mar 2013 20:50:01 -0000 The following reply was made to PR kern/176978; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@FreeBSD.org, nwf@cs.jhu.edu Cc: Subject: Re: kern/176978: [zfs] [panic] zfs send -D causes "panic: System call ioctl returning with 1 locks held" Date: Sun, 17 Mar 2013 22:39:51 +0200 Please try to obtain a crashdump according to the following guidelines: http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 11:06:44 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E752FA4A for ; Mon, 18 Mar 2013 11:06:44 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 5DCBFAA8 for ; Mon, 18 Mar 2013 11:06:41 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r2IB6fiS002112 for ; Mon, 18 Mar 2013 11:06:41 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r2IB6f9t002110 for freebsd-fs@FreeBSD.org; Mon, 18 Mar 2013 11:06:41 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 18 Mar 2013 11:06:41 GMT Message-Id: <201303181106.r2IB6f9t002110@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 11:06:45 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. 
These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/174060 fs [ext2fs] Ext2FS system crashes (buffer overflow?) o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o bin/161807 fs [patch] add option for explicitly specifying metadata o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o 
kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot 
normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 300 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 12:03:05 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AD9F75B1 for ; Mon, 18 Mar 2013 12:03:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 10D8A695 for ; Mon, 18 Mar 2013 12:03:03 +0000 (UTC) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r2I6HhVC010164 for ; Mon, 18 Mar 2013 17:17:43 +1100 Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r2I6HUCP025820 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Mar 2013 17:17:32 +1100 Date: Mon, 18 Mar 2013 17:17:30 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Peter Maloney Subject: Re: Aligning MBR for ZFS boot help In-Reply-To: <5141BA2A.9080904@brockmann-consult.de> Message-ID: <20130318170513.X1164@besplex.bde.org> References: <513C1629.50501@caltel.com> <513CD9AB.5080903@caltel.com> <513CE369.4030303@caltel.com> <1362951595.99445.2.camel@btw.pki2.com> <513E1208.5020804@caltel.com> <20130312203745.A1130@besplex.bde.org> <513F8F04.60206@caltel.com> <20130313232247.B1078@besplex.bde.org> <5140F373.1010907@caltel.com> <20130314195715.Y909@besplex.bde.org> <5141B8B6.4010209@brockmann-consult.de> <5141BA2A.9080904@brockmann-consult.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=JMpjKL2b c=1 sm=1 a=u3bVZBOdoLwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=cUKNXEIY390A:10 a=KQk-SLzs4LMGY7x5F1IA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 12:03:05 -0000 On Thu, 14 Mar 2013, Peter Maloney wrote: > On 2013-03-14 12:47, Peter Maloney wrote: >> On 2013-03-14 10:41, Bruce Evans wrote: >>> On Wed, 13 Mar 2013, Cody Ritts wrote: >>> >>>> So, by setting those CHS values I am: >>>> making the 
partition table more compatible with other operating >>>> systems and BIOSes? >>>> and giving some utilities the CHS stuff they need to function right? >>> It's not completely clear that S=32 H=64 is portable, but it is what most >>> old SCSI BIOSes used. >>> >>> Also, if the disk already has some partitions with a certain geometry, >>> use >>> the same geometry for other partitions and don't use fdisk's defaults if >>> they differ. >>> >>> Bruce >> Oh man... I thought yeah that -a 1 or -a 2048 should work, but it >> doesn't. And then I thought I'd be extra crafty and use dd to directly >> write the partition table myself and send that as a solution to you >> guys, but even that fails! >> >> Here's writing a 63 alignment mbr to the disk, just to prove dd can do this: >> >> # gdd if=mbr.img of=/dev/md10 bs=512 count=1 >> 1+0 records in >> 1+0 records out >> 512 bytes (512 B) copied, 16.8709 s, 0.0 kB/s >> >> # gpart show md10 >> => 63 4194241 md10 MBR (2.0G) >> 63 40950 1 freebsd (20M) >> 41013 4153291 - free - (2G) >> >> Here's changing the start sector on the first partition to 2048 ;) >> Writing to the device works with bs=512, but not bs=1, so we use a file >> and bs=1 to do our edits, and then bs=512 to the disk. I use files too often to edit disks, because the binary editor that I use is old and assumes that block devices aren't broken, so it doesn't do its own blocking and thus always fails for disks, since it always writes 1 byte at a time. Of course, it is safer to edit a copy, but then it is too easy to make an error with the input or output offsets when dd'ing the files back to the disk. >> # gdd if=<(echo -ne "\x00\x08" ) of=mbr.img bs=1 seek=454 >> 2+0 records in >> 2+0 records out >> 2 bytes (2 B) copied, 0.000112023 s, 17.9 kB/s >> >> Here's writing the new 2048 aligned mbr to the disk: >> >> # gdd if=mbr.img of=/dev/md10 bs=1 count=1 >> gdd: writing `/dev/md10': *Invalid argument* >> 1+0 records in >> 0+0 records out >> 0 bytes (0 B) copied, 21.0247 s, 0.0 kB/s >> >> :O >> _________________________________________ > > Oh, and I almost forgot the most important part... the solution! > > The solution is to align to 129024 sectors instead, which fits the needs > of modern 512/1024/2048 alignment, and also the crazy old thing. Except it is beyond the end of the disk for a crazy old fdisk :-). > # gpart add -t freebsd -a 129024 -s 1M md10 > md10s1 added > # gpart add -t freebsd -a 129024 -s 1511M md10 > md10s2 added > # gpart show md10 > => 63 4194241 md10 MBR (2.0G) > 63 128961 - free - (63M) > 129024 2016 1 freebsd (1M) > 131040 127008 - free - (62M) > 258048 2967552 2 freebsd (1.4G) > 3225600 968704 - free - (473M) > ... 
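As a small aside on where the 129024 figure comes from: assuming the "crazy old thing" is the 63-sector track geometry discussed earlier in the thread, 129024 works because 63 and 2048 share no common factor, so their least common multiple is simply their product.  A quick sanity check in sh (illustrative only):

echo $((63 * 2048))      # 129024
echo $((129024 % 63))    # 0 -> aligned to 63-sector tracks
echo $((129024 % 2048))  # 0 -> aligned to 1 MiB (2048 x 512-byte sectors)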
Bruce From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 14:22:21 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8BF8A774; Mon, 18 Mar 2013 14:22:21 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 51AAEDA2; Mon, 18 Mar 2013 14:22:21 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id A25A3B968; Mon, 18 Mar 2013 10:22:20 -0400 (EDT) From: John Baldwin To: Rick Macklem Subject: Re: Deadlock in the NFS client Date: Mon, 18 Mar 2013 10:01:10 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201303181001.10217.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 18 Mar 2013 10:22:20 -0400 (EDT) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 14:22:21 -0000 On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > John Baldwin wrote: > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote: > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote: > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote: > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote: > > > > > > John Baldwin wrote: > > > > > > > I ran into a machine that had a deadlock among certain files > > > > > > > on a > > > > > > > given NFS > > > > > > > mount today. I'm not sure how best to resolve it, though it > > > > > > > seems like > > > > > > > perhaps there is a bug with how the pool of nfsiod threads > > > > > > > is managed. > > > > > > > Anyway, more details on the actual hang below. This was on > > > > > > > 8.x with > > > > > > > the > > > > > > > old NFS client, but I don't see anything in HEAD that would > > > > > > > fix this. > > > > > > > > > > > > > > First note that the system was idle so it had dropped down > > > > > > > to only one > > > > > > > nfsiod thread. > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't bite > > > > > > more often. > > > > > > It seems to me that this snippet of code from nfs_asyncio() > > > > > > makes too > > > > > > weak an assumption: > > > > > > /* > > > > > > * If none are free, we may already have an iod working on > > > > > > this mount > > > > > > * point. If so, it will process our request. > > > > > > */ > > > > > > if (!gotiod) { > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > NFS_DPF(ASYNCIO, > > > > > > ("nfs_asyncio: %d iods are already processing mount %p\n", > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > gotiod = TRUE; > > > > > > } > > > > > > } > > > > > > It assumes that, since an nfsiod thread is processing some > > > > > > buffer for the > > > > > > mount, it will become available to do this one, which isn't > > > > > > true for your > > > > > > deadlock. 
> > > > > > > > > > > > I think the simple fix would be to recode nfs_asyncio() so > > > > > > that > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread that > > > > > > it > > > > > > has assigned to do the I/O, getting rid of the above. The > > > > > > problem > > > > > > with doing this is that it may result in a lot more > > > > > > synchronous I/O > > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). Maybe > > > > > > more > > > > > > synchronous I/O could be avoided by allowing nfs_asyncio() to > > > > > > create a > > > > > > new thread even if the total is above nfs_iodmax. (I think > > > > > > this would > > > > > > require the fixed array to be replaced with a linked list and > > > > > > might > > > > > > result in a large number of nfsiod threads.) Maybe just having > > > > > > a large > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > Does having a large # of nfsiod threads cause any serious > > > > > > problem for > > > > > > most systems these days? > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and then, > > > > > > instead > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed > > > > > > number of > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > understanding that > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. The > > > > > proposal would > > > > > only allow system to survive slightly longer after the deadlock > > > > > appeared. > > > > > And, I think that allowing the unbound amount of nfsiod threads > > > > > is also > > > > > fatal. > > > > > > > > > > The issue there is the LOR between buffer lock and vnode lock. > > > > > Buffer lock > > > > > always must come after the vnode lock. The problematic nfsiod > > > > > thread, which > > > > > locks the vnode, volatile this rule, because despite the > > > > > LK_KERNPROC > > > > > ownership of the buffer lock, it is the thread which de fact > > > > > owns the > > > > > buffer (only the thread can unlock it). > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() > > > > > from the > > > > > nfs_readdirplusrpc(). From my reading of the code, nfs_nget() > > > > > should > > > > > be capable of correctly handling the lock failure. And EBUSY > > > > > would > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. > > > > > > > > Yes, thinking about this more, I do think the right answer is for > > > > readdirplus to do this. The only question I have is if it should > > > > do > > > > this always, or if it should do this only from the nfsiod thread. > > > > I > > > > believe you can't get this in the non-nfsiod case. > > > > > > I agree that it looks as of the workaround only needed for nfsiod > > > thread. > > > On the other hand, it is not immediately obvious how to detect that > > > the current thread is nfsio daemon. Probably a thread flag should be > > > set. > > > > OTOH, updating the attributes from readdir+ is only an optimization > > anyway, so > > just having it always do LK_NOWAIT is probably ok (and simple). > > Currently I'm > > trying to develop a test case to provoke this so I can test the fix, > > but no > > luck on that yet. > > > > -- > > John Baldwin > Just fyi, ignore my comment about the second version of the patch that > disables the nfsiod threads from doing readdirplus running faster. 
It was just that when I tested the 2nd patch, the server's caches were primed.  Oops.
>
> However, so far the minimal testing I've done has been essentially performance neutral between the unpatched and patched versions.
>
> Hopefully John has a convenient way to do some performance testing, since I won't be able to do much until the end of April.

Performance testing I don't really have available.  What I am focusing on atm is testing that the deadlock is fixed (I have a way to reproduce it now).

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 15:20:48 2013
Date: Mon, 18 Mar 2013 15:50:37 +0100
From: Davide D'Amico
To: freebsd-fs@freebsd.org
Subject: FreBSD 9.1 and ZFS v28 performances

Hi all,
I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB RAM, an H710 controller (no JBOD) and 15K rpm SAS HDs: I will use it for a MySQL 5.6 server, so I am trying to use ZFS to get L2ARC and ZIL benefits.

I created a RAID10 and used zpool to create a pool on top:

# zpool create DATA mfid3
# zpool add DATA cache mfid1 log mfid2

I have a question on ZFS performance.  Using:

dd if=/dev/zero of=file.out bs=16k count=1M

I cannot go faster than 400MB/s, so I think I'm missing something; I tried removing the ZIL and removing the L2ARC, but everything is still the same. 
Here my configuration details: OS: FreeBSD 9.1 amd64 GENERIC /boot/loader.conf vfs.zfs.arc_min="4096M" vfs.zfs.arc_max="15872M" vm.kmem_size_max="64G" vm.kmem_size="49152M" vfs.zfs.write_limit_override=1073741824 /etc/sysctl.conf: kern.ipc.somaxconn=32768 kern.threads.max_threads_per_proc=16384 kern.maxfiles=262144 kern.maxfilesperproc=131072 kern.ipc.nmbclusters=65536 kern.corefile="/var/coredumps/%U.%N.%P.core" vfs.zfs.prefetch_disable="1" kern.maxvnodes=250000 mfiutil show volumes: mfi0 Volumes: Id Size Level Stripe State Cache Name mfid0 ( 278G) RAID-1 64k OPTIMAL Disabled mfid1 ( 118G) RAID-0 64k OPTIMAL Disabled mfid2 ( 118G) RAID-0 64k OPTIMAL Disabled mfid3 ( 1116G) RAID-10 64k OPTIMAL Disabled zpool status: pool: DATA state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM DATA ONLINE 0 0 0 mfid3 ONLINE 0 0 0 logs mfid2 ONLINE 0 0 0 cache mfid1 ONLINE 0 0 0 errors: No known data errors zfs get all DATA NAME PROPERTY VALUE SOURCE DATA type filesystem - DATA creation Mon Mar 18 13:41 2013 - DATA used 53.0G - DATA available 1.02T - DATA referenced 53.0G - DATA compressratio 1.00x - DATA mounted yes - DATA quota none default DATA reservation none default DATA recordsize 16K local DATA mountpoint /DATA default DATA sharenfs off default DATA checksum on default DATA compression off default DATA atime off local DATA devices on default DATA exec on default DATA setuid on default DATA readonly off default DATA jailed off default DATA snapdir hidden default DATA aclmode discard default DATA aclinherit restricted default DATA canmount on default DATA xattr off temporary DATA copies 1 default DATA version 5 - DATA utf8only off - DATA normalization none - DATA casesensitivity sensitive - DATA vscan off default DATA nbmand off default DATA sharesmb off default DATA refquota none default DATA refreservation none default DATA primarycache metadata local DATA secondarycache all default DATA usedbysnapshots 0 - DATA usedbydataset 53.0G - DATA usedbychildren 242K - DATA usedbyrefreservation 0 - DATA logbias latency default DATA dedup off default DATA mlslabel - DATA sync standard default DATA refcompressratio 1.00x - DATA written 53.0G - DATA zfs:zfs_nocacheflush 1 local I'm using recordsize=16k because of mysql. I am trying to use sysbench (0.5, not in the ports yet) with oltp test suite and my performances not so good. Any advice? Thanks, d. 
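For reference, a minimal way to see what that dd run actually exercises, assuming the layout above (data on mfid3, log on mfid2, cache on mfid1); the file name and count below are only illustrative:

# zpool iostat -v DATA 1

Run that in a second terminal while the dd is going: the log and cache vdevs should stay essentially idle, because dd issues only asynchronous writes, so neither the ZIL nor the L2ARC is involved. Forcing synchronous semantics is the quickest way to make the separate log device do any work at all (a diagnostic, not a tuning suggestion):

# zfs set sync=always DATA
# dd if=/dev/zero of=/DATA/sync-test.out bs=16k count=65536
# zfs set sync=standard DATA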
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 15:31:32 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BAF8E8EB for ; Mon, 18 Mar 2013 15:31:32 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 60BD22C5 for ; Mon, 18 Mar 2013 15:31:32 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002792921.msg for ; Mon, 18 Mar 2013 15:31:31 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 15:31:31 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> From: "Steven Hartland" To: "Davide D'Amico" , References: <514729BD.2000608@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 15:31:51 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 15:31:32 -0000 ----- Original Message ----- From: "Davide D'Amico" To: Sent: Monday, March 18, 2013 2:50 PM Subject: FreBSD 9.1 and ZFS v28 performances > Hi all, > I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 > controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 > server, so I am trying to use ZFS to get L2ARC and ZIL benefits. > > I created a RAID10 and used zpool to create a pool on top: > > # zpool create DATA mfid3 > # zpool add DATA cache mfid1 log mfid2 > > I have a question on zfs performances. Using: > > dd if=/dev/zero of=file.out bs=16k count=1M > > I cannot go faster than 400MB/s so I think I'm missing something; I > tried removing zil, removing l2arc but everything is still the same. 
> > Here my configuration details: > > OS: FreeBSD 9.1 amd64 GENERIC > > /boot/loader.conf > vfs.zfs.arc_min="4096M" > vfs.zfs.arc_max="15872M" > vm.kmem_size_max="64G" > vm.kmem_size="49152M" > vfs.zfs.write_limit_override=1073741824 > > /etc/sysctl.conf: > kern.ipc.somaxconn=32768 > kern.threads.max_threads_per_proc=16384 > kern.maxfiles=262144 > kern.maxfilesperproc=131072 > kern.ipc.nmbclusters=65536 > kern.corefile="/var/coredumps/%U.%N.%P.core" > vfs.zfs.prefetch_disable="1" > kern.maxvnodes=250000 > > mfiutil show volumes: > mfi0 Volumes: > Id Size Level Stripe State Cache Name > mfid0 ( 278G) RAID-1 64k OPTIMAL Disabled > mfid1 ( 118G) RAID-0 64k OPTIMAL Disabled > mfid2 ( 118G) RAID-0 64k OPTIMAL Disabled > mfid3 ( 1116G) RAID-10 64k OPTIMAL Disabled > > zpool status: > pool: DATA > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > DATA ONLINE 0 0 0 > mfid3 ONLINE 0 0 0 > logs > mfid2 ONLINE 0 0 0 > cache > mfid1 ONLINE 0 0 0 > > errors: No known data errors > > zfs get all DATA > NAME PROPERTY VALUE SOURCE > DATA type filesystem - > DATA creation Mon Mar 18 13:41 2013 - > DATA used 53.0G - > DATA available 1.02T - > DATA referenced 53.0G - > DATA compressratio 1.00x - > DATA mounted yes - > DATA quota none default > DATA reservation none default > DATA recordsize 16K local > DATA mountpoint /DATA default > DATA sharenfs off default > DATA checksum on default > DATA compression off default > DATA atime off local > DATA devices on default > DATA exec on default > DATA setuid on default > DATA readonly off default > DATA jailed off default > DATA snapdir hidden default > DATA aclmode discard default > DATA aclinherit restricted default > DATA canmount on default > DATA xattr off temporary > DATA copies 1 default > DATA version 5 - > DATA utf8only off - > DATA normalization none - > DATA casesensitivity sensitive - > DATA vscan off default > DATA nbmand off default > DATA sharesmb off default > DATA refquota none default > DATA refreservation none default > DATA primarycache metadata local > DATA secondarycache all default > DATA usedbysnapshots 0 - > DATA usedbydataset 53.0G - > DATA usedbychildren 242K - > DATA usedbyrefreservation 0 - > DATA logbias latency default > DATA dedup off default > DATA mlslabel - > DATA sync standard default > DATA refcompressratio 1.00x - > DATA written 53.0G - > DATA zfs:zfs_nocacheflush 1 local > > > I'm using recordsize=16k because of mysql. > > I am trying to use sysbench (0.5, not in the ports yet) with oltp test > suite and my performances not so good. First off ideally you shouldn't use RAID controllers for ZFS, let it have the raw disks and use a JBOD controller e.g. mps not a HW RAID controller like mfi. HEAD has some significant changes for the mfi driver specifically:- http://svnweb.freebsd.org/base?view=revision&revision=247369 This fixes lots off bugs but also enables full queue support on TBOLT cards so if your mfi is a TBOLT card you may see some speed up in random IO, not that this would effect your test here. While having a separate ZIL disk is good, your benefits may well be limited if said disk is a traditional HD, better to look at enterprise SSD's for this. The same and them some applies to your L2ARC disks. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. 
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 15:50:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1F62CE19 for ; Mon, 18 Mar 2013 15:50:24 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id EB5DC3DA for ; Mon, 18 Mar 2013 15:50:23 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHcKL-000Igz-Ud; Mon, 18 Mar 2013 11:50:21 -0400 Date: Mon, 18 Mar 2013 11:50:21 -0400 From: Gary Palmer To: Davide D'Amico Subject: Re: FreBSD 9.1 and ZFS v28 performances Message-ID: <20130318155021.GC52706@in-addr.com> References: <514729BD.2000608@contactlab.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <514729BD.2000608@contactlab.com> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 15:50:24 -0000 On Mon, Mar 18, 2013 at 03:50:37PM +0100, Davide D'Amico wrote: > Hi all, > I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 > controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 > server, so I am trying to use ZFS to get L2ARC and ZIL benefits. > > I created a RAID10 and used zpool to create a pool on top: > > # zpool create DATA mfid3 > # zpool add DATA cache mfid1 log mfid2 > > I have a question on zfs performances. Using: > > dd if=/dev/zero of=file.out bs=16k count=1M > > I cannot go faster than 400MB/s so I think I'm missing something; I > tried removing zil, removing l2arc but everything is still the same. > If you have ufs on mfid3 then does the performance change? How about if you umount the filesystem and dd to the raw device? In other words, are you sure this is a zfs issue and not an issue somewhere else? What I/O rate were you hoping/expecting to see? 
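A sketch of the raw-device comparison suggested above, assuming mfid3 holds nothing that still matters (writing to the raw volume destroys whatever is on it):

# zpool export DATA    (or umount the filesystem, whichever currently sits on mfid3)
# dd if=/dev/zero of=/dev/mfid3 bs=1m count=4096

If the raw volume also tops out around 400MB/s, the ceiling is the controller and disks rather than ZFS or UFS.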
Gary From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 16:13:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 14C3C6E4 for ; Mon, 18 Mar 2013 16:13:21 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 83DB2733 for ; Mon, 18 Mar 2013 16:13:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363623199; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=eLBOwZ3vgEy4iR2t3fXLFVToaqfL0EBS2B9xDpgAyz8=; b=q6yMORYgzL0YEMookDdtIcNN8uoZ9J68bXH4NPBiC2bJSXRK/sDUfUaLs9/FtqPN kA7jW6AES/ggrtWelGcHRlVPejsxHAXhQHIZUecw42pRYAec8caoLVJLC7AyLXrt 99UHq22kfDO2vJxyG0DfQn/ZBMFOFdITaMAeDRRp/VM=; Received: from [213.92.90.12] ([213.92.90.12:58431] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 40/90-24145-F1D37415; Mon, 18 Mar 2013 17:13:19 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHcgY-000P5V-U1 for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 17:13:19 +0100 Received: (qmail 96437 invoked by uid 89); 18 Mar 2013 16:13:18 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 18 Mar 2013 16:13:18 -0000 Message-ID: <51473D1D.3050306@contactlab.com> Date: Mon, 18 Mar 2013 17:13:17 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> In-Reply-To: <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 16:13:21 -0000 Il 18/03/13 16:31, Steven Hartland ha scritto: > > ----- Original Message ----- From: "Davide D'Amico" > > To: > Sent: Monday, March 18, 2013 2:50 PM > Subject: FreBSD 9.1 and ZFS v28 performances > > >> Hi all, >> I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 >> controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 >> server, so I am trying to use ZFS to get L2ARC and ZIL benefits. >> >> I created a RAID10 and used zpool to create a pool on top: >> >> # zpool create DATA mfid3 >> # zpool add DATA cache mfid1 log mfid2 >> >> I have a question on zfs performances. Using: >> >> dd if=/dev/zero of=file.out bs=16k count=1M >> >> I cannot go faster than 400MB/s so I think I'm missing something; I >> tried removing zil, removing l2arc but everything is still the same. 
>> >> Here my configuration details: >> >> OS: FreeBSD 9.1 amd64 GENERIC >> >> /boot/loader.conf >> vfs.zfs.arc_min="4096M" >> vfs.zfs.arc_max="15872M" >> vm.kmem_size_max="64G" >> vm.kmem_size="49152M" >> vfs.zfs.write_limit_override=1073741824 >> >> /etc/sysctl.conf: >> kern.ipc.somaxconn=32768 >> kern.threads.max_threads_per_proc=16384 >> kern.maxfiles=262144 >> kern.maxfilesperproc=131072 >> kern.ipc.nmbclusters=65536 >> kern.corefile="/var/coredumps/%U.%N.%P.core" >> vfs.zfs.prefetch_disable="1" >> kern.maxvnodes=250000 >> >> mfiutil show volumes: >> mfi0 Volumes: >> Id Size Level Stripe State Cache Name >> mfid0 ( 278G) RAID-1 64k OPTIMAL Disabled >> mfid1 ( 118G) RAID-0 64k OPTIMAL Disabled >> mfid2 ( 118G) RAID-0 64k OPTIMAL Disabled >> mfid3 ( 1116G) RAID-10 64k OPTIMAL Disabled >> >> zpool status: >> pool: DATA >> state: ONLINE >> scan: none requested >> config: >> >> NAME STATE READ WRITE CKSUM >> DATA ONLINE 0 0 0 >> mfid3 ONLINE 0 0 0 >> logs >> mfid2 ONLINE 0 0 0 >> cache >> mfid1 ONLINE 0 0 0 >> >> errors: No known data errors >> >> zfs get all DATA >> NAME PROPERTY VALUE SOURCE >> DATA type filesystem - >> DATA creation Mon Mar 18 13:41 2013 - >> DATA used 53.0G - >> DATA available 1.02T - >> DATA referenced 53.0G - >> DATA compressratio 1.00x - >> DATA mounted yes - >> DATA quota none default >> DATA reservation none default >> DATA recordsize 16K local >> DATA mountpoint /DATA default >> DATA sharenfs off default >> DATA checksum on default >> DATA compression off default >> DATA atime off local >> DATA devices on default >> DATA exec on default >> DATA setuid on default >> DATA readonly off default >> DATA jailed off default >> DATA snapdir hidden default >> DATA aclmode discard default >> DATA aclinherit restricted default >> DATA canmount on default >> DATA xattr off temporary >> DATA copies 1 default >> DATA version 5 - >> DATA utf8only off - >> DATA normalization none - >> DATA casesensitivity sensitive - >> DATA vscan off default >> DATA nbmand off default >> DATA sharesmb off default >> DATA refquota none default >> DATA refreservation none default >> DATA primarycache metadata local >> DATA secondarycache all default >> DATA usedbysnapshots 0 - >> DATA usedbydataset 53.0G - >> DATA usedbychildren 242K - >> DATA usedbyrefreservation 0 - >> DATA logbias latency default >> DATA dedup off default >> DATA mlslabel - >> DATA sync standard default >> DATA refcompressratio 1.00x - >> DATA written 53.0G - >> DATA zfs:zfs_nocacheflush 1 local >> >> >> I'm using recordsize=16k because of mysql. >> >> I am trying to use sysbench (0.5, not in the ports yet) with oltp test >> suite and my performances not so good. > > First off ideally you shouldn't use RAID controllers for ZFS, let it > have the raw disks and use a JBOD controller e.g. mps not a HW RAID > controller like mfi. I tried removing the hardware raid10 and leaving 4 disks unconfigured and then: # mfiutil create jbod mfid3 mfid4 mfid5 mfid6 same behaviour/performance (probably because perc h710 'sees' them as raid0-single disks devices. 
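A quick way to double-check what the controller is actually exporting after that jbod step; as noted above, on a PERC H710 a "JBOD" created this way is still a single-drive RAID-0 volume, so the controller and its cache policy stay in the write path:

# mfiutil show config
# mfiutil show volumes
# mfiutil cache mfid3

The last command prints the current cache policy for the named volume (mfid3 here stands for whichever volume is being tested).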
Here my controller details: mfi0 Firmware Package Version: 21.0.2-0001 mfi0 Firmware Images: Name Version Date Time Status BIOS 5.30.00_4.12.05.00_0x05110000 1/ 7/2012 1/ 7/2012 active CTLR 4.00-0014 Aug 04 2011 12:49:17 active PCLI 05.00-03:#%00008 Feb 17 2011 14:03:12 active APP 3.130.05-1587 Apr 03 2012 09:36:13 active NVDT 2.1108.03-0076 Dec 02 2011 22:55:02 active BTBL 2.03.00.00-0003 Dec 16 2010 17:31:28 active BOOT 06.253.57.219 9/9/2010 15:32:25 active > > HEAD has some significant changes for the mfi driver specifically:- > http://svnweb.freebsd.org/base?view=revision&revision=247369 > > This fixes lots off bugs but also enables full queue support on TBOLT > cards so if your mfi is a TBOLT card you may see some speed up in > random IO, not that this would effect your test here. > > While having a separate ZIL disk is good, your benefits may well be > limited if said disk is a traditional HD, better to look at enterprise > SSD's for this. The same and them some applies to your L2ARC disks. I'm using SSD disks for zfs cache and zfs log: mfi0 Physical Drives: 0 ( 279G) ONLINE SAS E1:S0 1 ( 279G) ONLINE SAS E1:S1 2 ( 558G) ONLINE SAS E1:S2 3 ( 558G) ONLINE SAS E1:S3 4 ( 558G) ONLINE SAS E1:S4 5 ( 558G) ONLINE SAS E1:S5 6 ( 119G) ONLINE SATA E1:S6 7 ( 119G) ONLINE SATA E1:S7 Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 17:08:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E16A37E6 for ; Mon, 18 Mar 2013 17:08:47 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 4EAF0A1B for ; Mon, 18 Mar 2013 17:08:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363626525; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=CqwNbUHeVlgkEioYCIl6fNp2F+aN+taTxGm0sRwsUS0=; b=TdlzFoxfwPy3g1gWvxNg4+QlLIU/2sjnTfPAHWCDybYFHZOw8eUFaFt4HGEWB6Gr p4xFTzd0tQJVaquHu/Ihj4xNh0IsndLL6Xzk1fTC7PZNBVcqt24ULwELhA+zu0SE ATYehSjSLf0MDoXObqv43yqtCz+3TFSrr6eaVKeWw8g=; Received: from [213.92.90.12] ([213.92.90.12:28352] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 37/F8-24145-D1A47415; Mon, 18 Mar 2013 18:08:45 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHdYD-0003F7-Gz for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 18:08:45 +0100 Received: (qmail 12465 invoked by uid 89); 18 Mar 2013 17:08:45 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 18 Mar 2013 17:08:45 -0000 Message-ID: <51474A1C.7090604@contactlab.com> Date: Mon, 18 Mar 2013 18:08:44 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: kpneal@pobox.com Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <20130318163833.GA11916@neutralgood.org> In-Reply-To: <20130318163833.GA11916@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 17:08:47 -0000 Il 18/03/13 17:38, kpneal@pobox.com ha scritto: > On Mon, Mar 18, 2013 at 03:31:51PM -0000, Steven Hartland wrote: >> >> ----- Original Message ----- >> From: "Davide D'Amico" >> To: >> Sent: Monday, March 18, 2013 2:50 PM >> Subject: FreBSD 9.1 and ZFS v28 performances >> >> >>> Hi all, >>> I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 >>> controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 >>> server, so I am trying to use ZFS to get L2ARC and ZIL benefits. >>> >>> I created a RAID10 and used zpool to create a pool on top: >>> >>> # zpool create DATA mfid3 >>> # zpool add DATA cache mfid1 log mfid2 >>> >>> I have a question on zfs performances. Using: >>> >>> dd if=/dev/zero of=file.out bs=16k count=1M >>> >>> I cannot go faster than 400MB/s so I think I'm missing something; I >>> tried removing zil, removing l2arc but everything is still the same. > > The ZIL only helps with synchronous writes. This is something apps must > request specifically typically and I would guess that dd would not do that. > So the ZIL doesn't affect your test. > > The L2ARC is a read cache. It does very little for writes. If the ZFS cache > working set fits all in memory then the L2ARC does nothing for you. Since > you are writing the only thing needed from the ARC is metadata. > >>> mfiutil show volumes: >>> mfi0 Volumes: >>> Id Size Level Stripe State Cache Name >>> mfid0 ( 278G) RAID-1 64k OPTIMAL Disabled >>> mfid1 ( 118G) RAID-0 64k OPTIMAL Disabled >>> mfid2 ( 118G) RAID-0 64k OPTIMAL Disabled >>> mfid3 ( 1116G) RAID-10 64k OPTIMAL Disabled >>> >>> zpool status: >>> pool: DATA >>> state: ONLINE >>> scan: none requested >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> DATA ONLINE 0 0 0 >>> mfid3 ONLINE 0 0 0 >>> logs >>> mfid2 ONLINE 0 0 0 >>> cache >>> mfid1 ONLINE 0 0 0 > > Warning: your ZIL should probably be mirrored. If it isn't, and the drive > fails, AND your machine takes a sudden dive (kernel panic, power outage, > etc) then you will lose data. I know: here is only one drive because i'm testing, yet. > >> DATA primarycache metadata local >> DATA secondarycache all default > > Is there a specific reason that you are making a point of not putting > regular data in the ARC? If you do that then reads of data will look in > the L2ARC, which is a normal 15k drive, before hitting the main pool drives > which also consists of normal 15k drives. Adding an extra set of spinning > rust before accessing your spinning rust doesn't sound helpful. Ok, good point. > >> HEAD has some significant changes for the mfi driver specifically:- >> http://svnweb.freebsd.org/base?view=revision&revision=247369 >> >> This fixes lots off bugs but also enables full queue support on TBOLT >> cards so if your mfi is a TBOLT card you may see some speed up in >> random IO, not that this would effect your test here. > > I believe the H710 is a TBOLT card. It was released with the 12G servers > like the R720. Anyway I cannot use -CURRENT on a production server, so I should use 9.1. My 'goal' is to understand if I can perform better than UFS on the same hardware setup. 
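The primarycache exchange above comes down to two commands on the pool from earlier in the thread; whether caching file data in the ARC actually helps depends on the working set, so treat it as something to measure rather than a definitive setting:

# zfs get primarycache,secondarycache DATA
# zfs set primarycache=all DATA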
So, I removed the pool and formatted mfid3 using UFS: # mount | grep mfid3 /dev/mfid3 on /DATA (ufs, local, soft-updates) where I have: # dd if=/dev/zero of=file.out bs=8k count=1M 1048576+0 records in 1048576+0 records out 8589934592 bytes transferred in 21.406975 secs (401268025 bytes/sec) (executed 4-5 times) I have the same throughput than ZFS with or without ZIL. I don't know if this is "normal" or if I am missing something on my setup, that's why I'm asking if I can do something more or if with this setup this value is the best I can have. > > I don't believe the OP mentioned how many drives are in the RAID10. More > drives ~== more parallelism ~== better performance. So I too am wondering > how much performance is expected. > >> While having a separate ZIL disk is good, your benefits may well be >> limited if said disk is a traditional HD, better to look at enterprise >> SSD's for this. The same and them some applies to your L2ARC disks. > > Before purchasing SSD's check the H710 docs to make sure they are allowed. > The 6/i in my R610 specifically says that if an SSD is used it must be the > only drive. Your R720's H710 is much newer and thus may not have that > restriction. Still, checking the documentation is cheap. > Yes, in R620 SSD are allowed (and a possible choice in the online DELL configurator). Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 17:28:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 13CE9F10 for ; Mon, 18 Mar 2013 17:28:01 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id ADE6BB2A for ; Mon, 18 Mar 2013 17:28:00 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002794477.msg for ; Mon, 18 Mar 2013 17:27:58 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 17:27:58 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 17:28:16 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 17:28:01 -0000 ----- Original Message ----- From: "Davide D'Amico" To: "Steven Hartland" Cc: Sent: Monday, March 18, 2013 4:13 PM Subject: Re: FreBSD 9.1 and ZFS v28 performances > Il 18/03/13 16:31, Steven Hartland ha scritto: >> >> ----- Original Message ----- From: "Davide D'Amico" >> >> To: >> Sent: Monday, 
March 18, 2013 2:50 PM >> Subject: FreBSD 9.1 and ZFS v28 performances >> >> >>> Hi all, >>> I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 >>> controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 >>> server, so I am trying to use ZFS to get L2ARC and ZIL benefits. >>> >>> I created a RAID10 and used zpool to create a pool on top: >> >> While having a separate ZIL disk is good, your benefits may well be >> limited if said disk is a traditional HD, better to look at enterprise >> SSD's for this. The same and them some applies to your L2ARC disks. > > I'm using SSD disks for zfs cache and zfs log: > > mfi0 Physical Drives: > 0 ( 279G) ONLINE SAS E1:S0 > 1 ( 279G) ONLINE SAS E1:S1 > 2 ( 558G) ONLINE SAS E1:S2 > 3 ( 558G) ONLINE SAS E1:S3 > 4 ( 558G) ONLINE SAS E1:S4 > 5 ( 558G) ONLINE SAS E1:S5 > 6 ( 119G) ONLINE SATA E1:S6 > 7 ( 119G) ONLINE SATA E1:S7 So RAID10 on just 6 disks in effect just 3 active spindles? If so then your throughput of 400MB/s is about right. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 17:30:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BE4F6FB1 for ; Mon, 18 Mar 2013 17:30:26 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 46214B56 for ; Mon, 18 Mar 2013 17:30:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363627824; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=oWCgdyOTGKjdYxIt5UJCpZ5Mg2l3GsYl0riHr51dVK4=; b=T05owhKo6zWh3nz85j5r0N06m/T/N0CIVO889jTqt9Sj1c7GTLIPHFLkOlOgDcVK BJ2Lrhn7DB8cHWZbLFIPxDdnDYp0Vuf+gPhVPkwraGzBOYQ113MAl8xEYL/1n76j WdWf3F9QDklWt/P0amDXJUX99xdWStHC4ZmJrl/g0gs=; Received: from [213.92.90.12] ([213.92.90.12:30244] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id D2/00-24145-03F47415; Mon, 18 Mar 2013 18:30:24 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHdtA-0004eH-GT for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 18:30:24 +0100 Received: (qmail 17869 invoked by uid 89); 18 Mar 2013 17:30:24 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 18 Mar 2013 17:30:24 -0000 Message-ID: <51474F2F.5040003@contactlab.com> Date: Mon, 18 Mar 2013 18:30:23 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> In-Reply-To: 
<1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 17:30:26 -0000 > > So RAID10 on just 6 disks in effect just 3 active spindles? If so then your > throughput of 400MB/s is about right. Well, my RAID10 is on 4 disk (2 spindle) so do I have 400MB/s (3GBps) because the max throughput is 6Gbps? Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 17:41:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4F0F8537 for ; Mon, 18 Mar 2013 17:41:51 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id E69E7CD7 for ; Mon, 18 Mar 2013 17:41:50 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002794660.msg for ; Mon, 18 Mar 2013 17:41:49 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 17:41:49 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 17:42:09 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 17:41:51 -0000 ----- Original Message ----- From: "Davide D'Amico" >> So RAID10 on just 6 disks in effect just 3 active spindles? If so then your >> throughput of 400MB/s is about right. > Well, my RAID10 is on 4 disk (2 spindle) so do I have 400MB/s (3GBps) > because the max throughput is 6Gbps? You'll be limited by the actual disks. For your disks this is stated as 122 to 204MB/s sustained. So if your getting 400MB/s your doing well :) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
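The arithmetic behind that answer, using the ~200MB/s sustained per-disk figure quoted above: a 4-disk RAID10 is two mirrored pairs, and a streaming write only has to reach each pair once, so the ceiling is roughly

  2 pairs x ~200MB/s per disk = ~400MB/s of sequential writes

400MB/s is about 3.2Gbit/s, which is why it happens to sit near the old 3Gbps figure, but that is coincidence: the 6Gbps SAS link (~600MB/s usable per lane) is a per-device limit and is not the bottleneck here.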
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 17:44:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C2793841 for ; Mon, 18 Mar 2013 17:44:10 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 1D193D0B for ; Mon, 18 Mar 2013 17:44:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363628649; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=MI9T3+J2COwwM19HFb3ZWCFdH3SfAOWxm3eEQq1pBA0=; b=RWEL125z+LFK+jdIfLnQbgPu/mE8by6FMTl60KBAqc4x/reIh3ZKiOromFOggyUi 1WIQsvt6ekcks+hd4krBWCc+e1RVKqHANXkTN8IN6DRorzCNQuHXX2c25pQaKdAy 9C5ivRQlCZuypqLMmnlII+rQz4iPWF8q4e5rX5k2kE0=; Received: from [213.92.90.12] ([213.92.90.12:32069] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 72/F7-24145-96257415; Mon, 18 Mar 2013 18:44:09 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHe6S-0005X6-Ty for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 18:44:09 +0100 Received: (qmail 21268 invoked by uid 89); 18 Mar 2013 17:44:08 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 18 Mar 2013 17:44:08 -0000 Message-ID: <51475267.1050204@contactlab.com> Date: Mon, 18 Mar 2013 18:44:07 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 17:44:10 -0000 Il 18/03/13 18:42, Steven Hartland ha scritto: > ----- Original Message ----- From: "Davide D'Amico" > > >>> So RAID10 on just 6 disks in effect just 3 active spindles? If so >>> then your >>> throughput of 400MB/s is about right. >> Well, my RAID10 is on 4 disk (2 spindle) so do I have 400MB/s (3GBps) >> because the max throughput is 6Gbps? > > You'll be limited by the actual disks. For your disks this is stated > as 122 to 204MB/s sustained. So if your getting 400MB/s your doing well :) > Thanks, now it's clear. d. 
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 18:07:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 73D502FA for ; Mon, 18 Mar 2013 18:07:28 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id C5D2EE5E for ; Mon, 18 Mar 2013 18:07:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363630046; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=zFFeV4LFNH7ZEAuvq7lRUEparnboqD8Fgviy7EG9jCc=; b=EI36Z+6LN4hALq2PhZ0nx6LogSslVc0TKp4vA4ZjLjGv3qOkkmbb0TXrceDiPCvu UrBTs65lQiZsNUnRXSsHJJwA7ipXZbjmySb56XUTF6Sxj+zQrXJ7l7dw+8SMjigj v9vHHqZjDtYocrT94sXV4LGrDR/mvxnfSpx/zjL/tO4=; Received: from [213.92.90.12] ([213.92.90.12:47135] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 5C/0C-24145-ED757415; Mon, 18 Mar 2013 19:07:26 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHeT0-0006uO-El for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 19:07:26 +0100 Received: (qmail 26556 invoked by uid 89); 18 Mar 2013 18:07:26 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 18 Mar 2013 18:07:26 -0000 Message-ID: <514757DD.9030705@contactlab.com> Date: Mon, 18 Mar 2013 19:07:25 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> In-Reply-To: <51475267.1050204@contactlab.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 18:07:28 -0000 Il 18/03/13 18:44, Davide D'Amico ha scritto: > Il 18/03/13 18:42, Steven Hartland ha scritto: >> ----- Original Message ----- From: "Davide D'Amico" >> >> >>>> So RAID10 on just 6 disks in effect just 3 active spindles? If so >>>> then your >>>> throughput of 400MB/s is about right. >>> Well, my RAID10 is on 4 disk (2 spindle) so do I have 400MB/s (3GBps) >>> because the max throughput is 6Gbps? >> >> You'll be limited by the actual disks. For your disks this is stated >> as 122 to 204MB/s sustained. So if your getting 400MB/s your doing >> well :) >> > > Thanks, now it's clear. 
But now I do other tests using a lua script with sysbench with different setups: UFS on RAID10 HW: General statistics: total time: 36.1023s total number of events: 1 total time taken by event execution: 36.1002s UFS on 1 SSD: General statistics: total time: 36.3970s total number of events: 1 total time taken by event execution: 36.3948s ZFS (mirror mfid3 mfid4 mirror mfid5 mfid6): General statistics: total time: 78.0531s total number of events: 1 total time taken by event execution: 78.0509s ZFS with ZIL: General statistics: total time: 85.2306s total number of events: 1 total time taken by event execution: 85.2285s The workload is always the same (a set of 50k mysql myisam queries), and as you can see zfs is really slow compared to ufs, and I don't know why :( The latest check I should do is using a L2ARC, but I'll do tomorrow. Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 18:28:13 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 051C652F for ; Mon, 18 Mar 2013 18:28:13 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8BBBDF43 for ; Mon, 18 Mar 2013 18:28:12 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002795481.msg for ; Mon, 18 Mar 2013 18:28:09 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 18:28:09 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 18:28:30 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 18:28:13 -0000 ----- Original Message ----- From: "Davide D'Amico" > But now I do other tests using a lua script with sysbench with different > setups: > > UFS on RAID10 HW: > General statistics: > total time: 36.1023s > total number of events: 1 > total time taken by event execution: 36.1002s > > UFS on 1 SSD: > General statistics: > total time: 36.3970s > total number of events: 1 > total time taken by event execution: 36.3948s > > ZFS (mirror mfid3 mfid4 mirror mfid5 mfid6): > General statistics: > total time: 78.0531s > total number of events: 1 > total time taken by event execution: 78.0509s > > ZFS with ZIL: > General statistics: > total time: 
85.2306s > total number of events: 1 > total time taken by event execution: 85.2285s > > > The workload is always the same (a set of 50k mysql myisam queries), and > as you can see zfs is really slow compared to ufs, and I don't know why :( > > The latest check I should do is using a L2ARC, but I'll do tomorrow. How does ZFS compare if you do it on 1 SSD as per your second UFS test? As I'm wondering the mfi cache is kicking in? While running the tests what sort of thing are you seeing from gstat, any disks maxing? If so primarily read or write? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 18:35:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E1DE08BB for ; Mon, 18 Mar 2013 18:35:26 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 73547FF2 for ; Mon, 18 Mar 2013 18:35:25 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002795548.msg for ; Mon, 18 Mar 2013 18:35:25 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 18:35:25 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 18:35:44 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 18:35:26 -0000 ----- Original Message ----- From: "Davide D'Amico" To: "Steven Hartland" Cc: Sent: Monday, March 18, 2013 6:07 PM Subject: Re: FreBSD 9.1 and ZFS v28 performances > Il 18/03/13 18:44, Davide D'Amico ha scritto: >> Il 18/03/13 18:42, Steven Hartland ha scritto: >>> ----- Original Message ----- From: "Davide D'Amico" >>> >>> >>>>> So RAID10 on just 6 disks in effect just 3 active spindles? If so >>>>> then your >>>>> throughput of 400MB/s is about right. 
>>>> Well, my RAID10 is on 4 disk (2 spindle) so do I have 400MB/s (3GBps) >>>> because the max throughput is 6Gbps? >>> >>> You'll be limited by the actual disks. For your disks this is stated >>> as 122 to 204MB/s sustained. So if your getting 400MB/s your doing >>> well :) >>> >> >> Thanks, now it's clear. > > But now I do other tests using a lua script with sysbench with different > setups: > > UFS on RAID10 HW: > General statistics: > total time: 36.1023s > total number of events: 1 > total time taken by event execution: 36.1002s > > UFS on 1 SSD: > General statistics: > total time: 36.3970s > total number of events: 1 > total time taken by event execution: 36.3948s > > ZFS (mirror mfid3 mfid4 mirror mfid5 mfid6): > General statistics: > total time: 78.0531s > total number of events: 1 > total time taken by event execution: 78.0509s > > ZFS with ZIL: > General statistics: > total time: 85.2306s > total number of events: 1 > total time taken by event execution: 85.2285s > > > The workload is always the same (a set of 50k mysql myisam queries), and > as you can see zfs is really slow compared to ufs, and I don't know why :( > > The latest check I should do is using a L2ARC, but I'll do tomorrow. Oh and another thing if this is mysql did you set the right settings for your ZFS volume e.g. zfs set atime=off tank zfs create tank/mysql zfs set recordsize=16k tank/mysql Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
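A slightly fuller version of that dataset layout, keeping the pool name tank from the example above (the thread's pool is called DATA; the dataset name and the primarycache line are suggestions to test, not settings taken from the original mails):

# zfs create tank/mysql
# zfs set recordsize=16k tank/mysql
# zfs set atime=off tank/mysql
# zfs set primarycache=all tank/mysql

Setting recordsize before copying the data in matters, since the property only applies to blocks written after the change.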
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 18:55:22 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5501CF93; Mon, 18 Mar 2013 18:55:22 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1B7F51AB; Mon, 18 Mar 2013 18:55:22 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 671EAB926; Mon, 18 Mar 2013 14:55:21 -0400 (EDT) From: John Baldwin To: Rick Macklem Subject: Re: Deadlock in the NFS client Date: Mon, 18 Mar 2013 14:45:57 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <88927360.3963361.1363399419023.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201303181445.57714.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 18 Mar 2013 14:55:21 -0400 (EDT) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 18:55:22 -0000 On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > John Baldwin wrote: > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote: > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote: > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov wrote: > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem wrote: > > > > > > John Baldwin wrote: > > > > > > > I ran into a machine that had a deadlock among certain files > > > > > > > on a > > > > > > > given NFS > > > > > > > mount today. I'm not sure how best to resolve it, though it > > > > > > > seems like > > > > > > > perhaps there is a bug with how the pool of nfsiod threads > > > > > > > is managed. > > > > > > > Anyway, more details on the actual hang below. This was on > > > > > > > 8.x with > > > > > > > the > > > > > > > old NFS client, but I don't see anything in HEAD that would > > > > > > > fix this. > > > > > > > > > > > > > > First note that the system was idle so it had dropped down > > > > > > > to only one > > > > > > > nfsiod thread. > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't bite > > > > > > more often. > > > > > > It seems to me that this snippet of code from nfs_asyncio() > > > > > > makes too > > > > > > weak an assumption: > > > > > > /* > > > > > > * If none are free, we may already have an iod working on > > > > > > this mount > > > > > > * point. If so, it will process our request. > > > > > > */ > > > > > > if (!gotiod) { > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > NFS_DPF(ASYNCIO, > > > > > > ("nfs_asyncio: %d iods are already processing mount %p\n", > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > gotiod = TRUE; > > > > > > } > > > > > > } > > > > > > It assumes that, since an nfsiod thread is processing some > > > > > > buffer for the > > > > > > mount, it will become available to do this one, which isn't > > > > > > true for your > > > > > > deadlock. 
> > > > > > > > > > > > I think the simple fix would be to recode nfs_asyncio() so > > > > > > that > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread that > > > > > > it > > > > > > has assigned to do the I/O, getting rid of the above. The > > > > > > problem > > > > > > with doing this is that it may result in a lot more > > > > > > synchronous I/O > > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). Maybe > > > > > > more > > > > > > synchronous I/O could be avoided by allowing nfs_asyncio() to > > > > > > create a > > > > > > new thread even if the total is above nfs_iodmax. (I think > > > > > > this would > > > > > > require the fixed array to be replaced with a linked list and > > > > > > might > > > > > > result in a large number of nfsiod threads.) Maybe just having > > > > > > a large > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > Does having a large # of nfsiod threads cause any serious > > > > > > problem for > > > > > > most systems these days? > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and then, > > > > > > instead > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed > > > > > > number of > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > understanding that > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. The > > > > > proposal would > > > > > only allow system to survive slightly longer after the deadlock > > > > > appeared. > > > > > And, I think that allowing the unbound amount of nfsiod threads > > > > > is also > > > > > fatal. > > > > > > > > > > The issue there is the LOR between buffer lock and vnode lock. > > > > > Buffer lock > > > > > always must come after the vnode lock. The problematic nfsiod > > > > > thread, which > > > > > locks the vnode, volatile this rule, because despite the > > > > > LK_KERNPROC > > > > > ownership of the buffer lock, it is the thread which de fact > > > > > owns the > > > > > buffer (only the thread can unlock it). > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() > > > > > from the > > > > > nfs_readdirplusrpc(). From my reading of the code, nfs_nget() > > > > > should > > > > > be capable of correctly handling the lock failure. And EBUSY > > > > > would > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. > > > > > > > > Yes, thinking about this more, I do think the right answer is for > > > > readdirplus to do this. The only question I have is if it should > > > > do > > > > this always, or if it should do this only from the nfsiod thread. > > > > I > > > > believe you can't get this in the non-nfsiod case. > > > > > > I agree that it looks as of the workaround only needed for nfsiod > > > thread. > > > On the other hand, it is not immediately obvious how to detect that > > > the current thread is nfsio daemon. Probably a thread flag should be > > > set. > > > > OTOH, updating the attributes from readdir+ is only an optimization > > anyway, so > > just having it always do LK_NOWAIT is probably ok (and simple). > > Currently I'm > > trying to develop a test case to provoke this so I can test the fix, > > but no > > luck on that yet. > > > > -- > > John Baldwin > Just fyi, ignore my comment about the second version of the patch that > disables the nfsiod threads from doing readdirplus running faster. 
It > was just that when I tested the 2nd patch, the server's caches were > primed. Oops. > > However, sofar the minimal testing I've done has been essentially > performance neutral between the unpatch and patched versions. > > Hopefully John has a convenient way to do some performance testing, > since I won't be able to do much until the end of April. This does fix the deadlock, but I can't really speak to performance. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 19:13:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 4BFF04BB for ; Mon, 18 Mar 2013 19:13:50 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id B57032C8 for ; Mon, 18 Mar 2013 19:13:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363634027; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=slIuux2+uiX4OxPB6aqnTNoxhyw/gXgfV/tyM51hZAw=; b=K9liGf6b4w2pLeTU1pVdgOmzvi9wFWvI/1nNGdfVqzon1KR4GZpheueEneMhKKM5 MI5D8af+uO2sPHZLU6oRo6BjEtIam0aoExSysOw3I0q3KuVQjAoqCgvJkFZY6Z7V Z+eHv5tU0EJ+1J32ee4qvALFJ7dCJ11BFFpX6xKfN0w=; Received: from [213.92.90.12] ([213.92.90.12:47782] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 2E/C3-24145-B6767415; Mon, 18 Mar 2013 20:13:47 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHfVD-0009e7-AV for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 20:13:47 +0100 Received: (qmail 37078 invoked by uid 80); 18 Mar 2013 19:13:47 -0000 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.50 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 18 Mar 2013 20:13:47 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> Message-ID: <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 19:13:50 -0000 > How does ZFS compare if you do it on 1 SSD as per your second > UFS test? As I'm wondering the mfi cache is kicking in? Well, it was a test :) The MFI cache is enabled because I am using mfid* as jbod (mfiutil create jbod mfid3 mfid4 mfid5 mfid6): > > While running the tests what sort of thing are you > seeing from gstat, any disks maxing? If so primarily > read or write? 
Here the r/w pattern using zpool iostat 2: DATA 52.2G 1.03T 102 0 1.60M 0 DATA 52.2G 1.03T 7 105 128K 674K DATA 52.2G 1.03T 40 0 655K 0 DATA 52.2G 1.03T 16 0 264K 0 DATA 52.2G 1.03T 7 154 120K 991K DATA 52.2G 1.03T 125 0 1.95M 0 DATA 52.2G 1.03T 44 117 711K 718K DATA 52.2G 1.03T 63 0 1015K 0 DATA 52.2G 1.03T 39 0 631K 0 DATA 52.2G 1.03T 1 152 24.0K 1006K DATA 52.2G 1.03T 9 0 152K 0 DATA 52.2G 1.03T 2 100 40.0K 571K DATA 52.2G 1.03T 41 0 663K 0 DATA 52.2G 1.03T 41 0 658K 89.9K DATA 52.2G 1.03T 1 114 24.0K 741K DATA 52.2G 1.03T 0 0 0 0 DATA 52.2G 1.03T 2 155 40.0K 977K DATA 52.2G 1.03T 3 0 63.9K 0 DATA 52.2G 1.03T 28 0 456K 0 DATA 52.2G 1.03T 98 125 1.49M 863K DATA 52.2G 1.03T 122 0 1.89M 0 DATA 52.2G 1.03T 70 123 1.10M 841K DATA 52.2G 1.03T 21 0 352K 0 DATA 52.2G 1.03T 1 0 24.0K 0 DATA 52.2G 1.03T 10 160 168K 1.06M DATA 52.2G 1.03T 6 0 112K 0 DATA 52.2G 1.03T 0 126 7.99K 908K DATA 52.2G 1.03T 50 0 807K 0 DATA 52.2G 1.03T 19 0 320K 97.9K DATA 52.2G 1.03T 4 122 66.9K 862K DATA 52.2G 1.03T 6 0 104K 0 DATA 52.2G 1.03T 0 164 0 1.06M DATA 52.2G 1.03T 128 0 2.01M 0 DATA 52.2G 1.03T 0 0 0 0 DATA 52.2G 1.03T 0 106 0 649K DATA 52.2G 1.03T 5 0 95.9K 0 DATA 52.2G 1.03T 8 114 144K 711K DATA 52.2G 1.03T 40 0 655K 0 DATA 52.2G 1.03T 47 0 759K 0 DATA 52.2G 1.03T 13 96 216K 551K DATA 52.2G 1.03T 2 0 40.0K 0 DATA 52.2G 1.03T 0 97 0 402K And the result from sysbench: General statistics: total time: 82.9567s total number of events: 1 total time taken by event execution: 82.9545s Using a SSD: # iostat mfid2 -x 2 tty mfid2 cpu tin tout KB/t tps MB/s us ni sy in id 0 32 125.21 31 3.84 0 0 0 0 99 0 170 0.00 0 0.00 1 0 0 0 99 0 22 0.00 0 0.00 3 0 2 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 32.00 2 0.08 3 0 1 0 96 0 22 32.00 0 0.02 3 0 1 0 96 0 22 4.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 2 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 1 0 96 0 22 0.00 0 0.00 3 0 2 0 96 0 22 44.80 67 2.95 3 0 1 0 96 0 22 87.58 9 0.81 3 0 2 0 96 0 22 32.00 3 0.09 2 0 2 0 96 0 585 0.00 0 0.00 3 0 1 0 96 0 22 4.00 0 0.00 0 0 0 0 100 And the result from sysbench: General statistics: total time: 36.1146s total number of events: 1 total time taken by event execution: 36.1123s That are the same results using SAS disks. d. 
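A note on reading the figures above: zpool iostat without -v only reports pool-wide totals, so it cannot answer the question of whether any single disk is maxing out, and it hides what the separate log and cache devices are doing. A minimal sketch of a per-device view, using standard tools and the pool name from this thread:

# zpool iostat -v DATA 2
# gstat -a

zpool iostat -v breaks the numbers out per vdev (including the mfid2 log and mfid1 cache devices) every 2 seconds, and gstat -a shows only the providers that are actually busy, with reads and writes split out.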
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 19:20:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A02CF71A for ; Mon, 18 Mar 2013 19:20:06 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 08278326 for ; Mon, 18 Mar 2013 19:20:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363634404; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=jzrfGO3zwKzlqSZQAxvhI3ql49ohWjs5C8eQDQoj5m8=; b=rQ8OGabPXZDmfkoYLM4CFYh6Y9fO4nRFuAflN4WHqqOohhrf7gCT+mhCPYQPqSor 7B1DZr4QMZZrMY5x9kDCksaOU4ZStYxl6M+6FASaADuN4qZ2aTwQyL1Zx/E/aLfS GOjmFzHnldWAGvXNkUATj8FRZ5XMynPYhIj1nZtd3o0=; Received: from [213.92.90.12] ([213.92.90.12:35596] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id E6/FC-24145-4E867415; Mon, 18 Mar 2013 20:20:04 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHfbI-0009p5-J9 for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 20:20:04 +0100 Received: (qmail 37758 invoked by uid 80); 18 Mar 2013 19:20:04 -0000 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.227 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 18 Mar 2013 20:20:04 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> Message-ID: <4280fd76b9b18376e11e90705e0c736c@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 19:20:06 -0000 [...] > Oh and another thing if this is mysql did you set the right settings > for your ZFS volume e.g. > zfs set atime=off tank > zfs create tank/mysql > zfs set recordsize=16k tank/mysql # zfs get all DATA | egrep -e "atime|record" DATA recordsize 16K local DATA atime off local I didn't create tank/mysql, I issued: # zpool create DATA mirror ... # cd /DATA # zfs set atime=off DATA # zfs set recordsize=16k DATA # mkdir mysql # chown mysql:mysql mysql # cp -Rp /repo/mysql/* /DATA/mysql [..] # I think it's the same. I've used: vfs.zfs.prefetch_disable="1" and vfs.zfs.prefetch_disable="0" same result. I tried enabling/disabling in /boot/loader.conf (I have 32GB ram): vfs.zfs.arc_min="4096M" vfs.zfs.arc_max="15872M" vm.kmem_size_max="64G" vm.kmem_size="49152M" vfs.zfs.write_limit_override=1073741824 same result. Thanks, d. 
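On the "I think it's the same" point: setting the properties on the pool root works out the same here only because recordsize=16k was set before the MySQL files were copied in; recordsize applies to newly written blocks, not to data already on disk. A dedicated child dataset mainly adds per-dataset tuning and snapshots on top of that. A minimal sketch of that variant, reusing the names already in this thread:

# zfs create -o recordsize=16k -o atime=off DATA/mysql
# chown mysql:mysql /DATA/mysql
# cp -Rp /repo/mysql/* /DATA/mysql/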
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 19:27:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D0AABA54 for ; Mon, 18 Mar 2013 19:27:50 +0000 (UTC) (envelope-from prvs=17892983bb=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 60796386 for ; Mon, 18 Mar 2013 19:27:49 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002796179.msg for ; Mon, 18 Mar 2013 19:27:49 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 18 Mar 2013 19:27:49 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=17892983bb=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> From: "Steven Hartland" To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Mon, 18 Mar 2013 19:28:07 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 19:27:50 -0000 ----- Original Message ----- From: "Davide D'Amico" >> How does ZFS compare if you do it on 1 SSD as per your second >> UFS test? As I'm wondering the mfi cache is kicking in? > > Well, it was a test :) > > The MFI cache is enabled because I am using mfid* as jbod (mfiutil > create jbod mfid3 mfid4 mfid5 mfid6): Don't use mfiutil to do this it doesnt work it creates mirrors. Use MegaCli instead to create real jbods e.g. MegaCli -AdpSetProp -EnableJBOD -1 -aALL >> >> While running the tests what sort of thing are you >> seeing from gstat, any disks maxing? If so primarily >> read or write? > Here the r/w pattern using zpool iostat 2: > > DATA 52.2G 1.03T 102 0 1.60M 0 > DATA 52.2G 1.03T 7 105 128K 674K ... > DATA 52.2G 1.03T 0 97 0 402K > > And the result from sysbench: > General statistics: > total time: 82.9567s > total number of events: 1 > total time taken by event execution: 82.9545s Thats hardly doing any disk access at all, so odd it would be doubling your benchmark time. 
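As a quick sanity check on the cache question raised above, mfiutil can report the per-volume cache policy directly, which is an easy way to confirm whether the controller cache is really enabled on the volume being benchmarked. A sketch only, with the volume names used in this thread:

# mfiutil show volumes
# mfiutil cache mfid3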
> Using a SSD: > # iostat mfid2 -x 2 > tty mfid2 cpu > tin tout KB/t tps MB/s us ni sy in id > 0 32 125.21 31 3.84 0 0 0 0 99 > 0 170 0.00 0 0.00 1 0 0 0 99 > 0 22 0.00 0 0.00 3 0 2 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 32.00 2 0.08 3 0 1 0 96 > 0 22 32.00 0 0.02 3 0 1 0 96 > 0 22 4.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 2 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 1 0 96 > 0 22 0.00 0 0.00 3 0 2 0 96 > 0 22 44.80 67 2.95 3 0 1 0 96 > 0 22 87.58 9 0.81 3 0 2 0 96 > 0 22 32.00 3 0.09 2 0 2 0 96 > 0 585 0.00 0 0.00 3 0 1 0 96 > 0 22 4.00 0 0.00 0 0 0 0 100 > > And the result from sysbench: > General statistics: > total time: 36.1146s > total number of events: 1 > total time taken by event execution: 36.1123s > > That are the same results using SAS disks. So this is ZFS on the SSD, resulting the same benchmark results as UFS? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 19:32:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2FD63B15 for ; Mon, 18 Mar 2013 19:32:35 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 98E3F3E0 for ; Mon, 18 Mar 2013 19:32:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363635153; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=BBXXHZoVi/PC8vRoGjC8vAUUF9Z2GOwqEiG25RLjlfc=; b=QSNmh/lvsxHFxPQaLMzpvMozsIjU5eb0Ll3372swo0ppxRxC0gFKrfIY1Gdikl+K XQqJkyJVOFTT9foZ0n7hEWZlBihHJ3Lz3W+8GWW8NpLogwFDk+hGH7jx/oyaUP7Y bAELcjI0KDCzxe4pM+0F5VwHhMks2qxinLxnAM44cPc=; Received: from [213.92.90.12] ([213.92.90.12:10473] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id A2/7E-24145-1DB67415; Mon, 18 Mar 2013 20:32:33 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHfnN-000AI5-09 for freebsd-fs@freebsd.org; Mon, 18 Mar 2013 20:32:33 +0100 Received: (qmail 39556 invoked by uid 80); 18 Mar 2013 19:32:32 -0000 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.228 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Mon, 18 Mar 2013 20:32:32 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> 
<51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> Message-ID: <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 19:32:35 -0000 Il 18.03.2013 20:28 Steven Hartland ha scritto: > ----- Original Message ----- From: "Davide D'Amico" > >>> How does ZFS compare if you do it on 1 SSD as per your second >>> UFS test? As I'm wondering the mfi cache is kicking in? >> Well, it was a test :) >> The MFI cache is enabled because I am using mfid* as jbod (mfiutil >> create jbod mfid3 mfid4 mfid5 mfid6): > > Don't use mfiutil to do this it doesnt work it creates mirrors. > > Use MegaCli instead to create real jbods e.g. > MegaCli -AdpSetProp -EnableJBOD -1 -aALL > Ok, I'll give it a try (never used, I thought it has been dismissed), and I'll let you know. >> And the result from sysbench: >> General statistics: >> total time: 82.9567s >> total number of events: 1 >> total time taken by event execution: 82.9545s > > Thats hardly doing any disk access at all, so odd it would be doubling > your benchmark time. > >> Using a SSD: >> # iostat mfid2 -x 2 >> tty mfid2 cpu >> tin tout KB/t tps MB/s us ni sy in id >> 0 32 125.21 31 3.84 0 0 0 0 99 [...] >> 0 585 0.00 0 0.00 3 0 1 0 96 >> 0 22 4.00 0 0.00 0 0 0 0 100 >> And the result from sysbench: >> General statistics: >> total time: 36.1146s >> total number of events: 1 >> total time taken by event execution: 36.1123s >> That are the same results using SAS disks. > > So this is ZFS on the SSD, resulting the same benchmark results as > UFS? This is UFS on SSD, that has the same behaviour than UFS on RAID10 HW on SAS drives. d. 
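For completeness, the JBOD switch Steven describes means rebuilding the pool on the raw disks rather than on the mfiutil-created volumes. A rough sketch only: it reuses the MegaCli invocation quoted above, assumes the H710 firmware actually accepts the EnableJBOD property (not all PERC firmwares do), and uses placeholder device names, since the disks will likely appear under different names once real JBOD mode is enabled:

# zpool destroy DATA
# MegaCli -AdpSetProp -EnableJBOD -1 -aALL
# zpool create DATA mirror <disk0> <disk1> mirror <disk2> <disk3>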
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 18 23:03:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CBD72B0D for ; Mon, 18 Mar 2013 23:03:52 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [IPv6:2a01:4f8:141:52a3:186::]) by mx1.freebsd.org (Postfix) with ESMTP id 856AC2A1 for ; Mon, 18 Mar 2013 23:03:52 +0000 (UTC) Received: from [10.10.1.100] (unknown [217.71.4.82]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id 35D69D9C0C; Tue, 19 Mar 2013 00:03:50 +0100 (CET) X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk 35D69D9C0C DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default; t=1363647831; bh=sgxXbJO9U0H1q1ZUuG7bv0okumGHtTTFvWx7KTVxlTk=; h=Date:From:To:Subject; b=QlpR8ik7uF2Kn6ECWKjvbyF10gSSnPj6d7WZSfkfyyZHgd1iBdfdkCuhejrcitp7M 34i/ybk0VTVRG8AIK+sOelMgZEhaFNlbcXjaE8fzMIpmpoIRD1JZPuKr+fKQTdUn+C ATwZhahBOy7d8S4RoldxHuxvS4ZhNUMUGMEhvSe8= Message-ID: <51479D54.1040509@gibfest.dk> Date: Tue, 19 Mar 2013 00:03:48 +0100 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: When will we see TRIM support for GELI volumes ? X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Mar 2013 23:03:52 -0000 Hello there, I was happy to see TRIM support in UFS and ZFS, however: I would really like to see TRIM support for GELI volumes. I finally got an SSD with TRIM support for the laptop, but I can't really use it with GELI disk encryption because the lack of TRIM support makes writing to the disk really slow after a while. I've been told this is not a huge job, but I wouldn't know. I can't understand why more people aren't asking for this. Do people not encrypt their laptops, or do they not use SSDs ? Thanks in advance! 
:) Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 00:14:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3ACAE60F for ; Tue, 19 Mar 2013 00:14:11 +0000 (UTC) (envelope-from prvs=1790199af7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id CDD5C7C1 for ; Tue, 19 Mar 2013 00:14:10 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002799626.msg for ; Tue, 19 Mar 2013 00:14:09 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 19 Mar 2013 00:14:09 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1790199af7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Tue, 19 Mar 2013 00:14:26 -0000 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 00:14:11 -0000 ----- Original Message -----=20 From: "Davide D'Amico" >>> And the result from sysbench: >>> General statistics: >>> total time: 82.9567s >>> total number of events: 1 >>> total time taken by event execution: 82.9545s >>=20 >> Thats hardly doing any disk access at all, so odd it would be doubling >> your benchmark time. >>=20 >>> Using a SSD: >>> # iostat mfid2 -x 2 >>> tty mfid2 cpu >>> tin tout KB/t tps MB/s us ni sy in id >>> 0 32 125.21 31 3.84 0 0 0 0 99 > [...] >>> 0 585 0.00 0 0.00 3 0 1 0 96 >>> 0 22 4.00 0 0.00 0 0 0 0 100 >>> And the result from sysbench: >>> General statistics: >>> total time: 36.1146s >>> total number of events: 1 >>> total time taken by event execution: 36.1123s >>> That are the same results using SAS disks. >>=20 >> So this is ZFS on the SSD, resulting the same benchmark results as=20 >> UFS? > This is UFS on SSD, that has the same behaviour than UFS on RAID10 HW=20 > on SAS drives. I'd recommend doing the same test on the SSD with ZFS as well as that would give you a simple like for like comparison. Regards Steve =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D This e.mail is private and confidential between Multiplay (UK) Ltd. 
and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.=20 In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 00:38:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6F1D1955 for ; Tue, 19 Mar 2013 00:38:39 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-la0-x231.google.com (mail-la0-x231.google.com [IPv6:2a00:1450:4010:c03::231]) by mx1.freebsd.org (Postfix) with ESMTP id D39BF863 for ; Tue, 19 Mar 2013 00:38:38 +0000 (UTC) Received: by mail-la0-f49.google.com with SMTP id fs13so6890651lab.8 for ; Mon, 18 Mar 2013 17:38:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:x-gm-message-state; bh=QL+MHMa0V5luMGQvUqxsdUEiy57mzJCmgjDSVCqBuT0=; b=lT76LR4mY4YZISFSSzTalYcD9fCcd9T5bh/C1vFr9hWaxNBFf8A4LOu2g906fq6VMg nYbpZlN8iHNy0QCs6CZO8+oqXNqSC91a58NUcZHIlPL/r/tRTFalaVhO0MvbgbI7VqA+ MRUIUkwFbohFvM0Wp0mQmwqV2p3y59DGm6JZGufagYKDEPNB6y2Fp5x1psdRTfDnlrX5 oMnuNt3evxqIvRR0Pov7LfjrnbcWK3bxVeVQXvo/jJHYyaIoKyqGzY8PI3BPnKk7GRnb xNRDwb7tQN5vaLgM/GL91rzGkBvAvzmKkmoM1h7i2SF6N+cYpmC33vu/zx7FSqYWBboN lTTg== MIME-Version: 1.0 X-Received: by 10.152.109.208 with SMTP id hu16mr42288lab.45.1363653517786; Mon, 18 Mar 2013 17:38:37 -0700 (PDT) Received: by 10.112.144.104 with HTTP; Mon, 18 Mar 2013 17:38:37 -0700 (PDT) Received: by 10.112.144.104 with HTTP; Mon, 18 Mar 2013 17:38:37 -0700 (PDT) In-Reply-To: <20130318163833.GA11916@neutralgood.org> References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <20130318163833.GA11916@neutralgood.org> Date: Tue, 19 Mar 2013 01:38:37 +0100 Message-ID: Subject: Re: FreBSD 9.1 and ZFS v28 performances From: Damien Fleuriot To: kpneal@pobox.com X-Gm-Message-State: ALoCoQmoUKxuddyyz2q0nWdducG6PxJJxPReSy20ST8a0ODSKf0lvmDbTp+EuBrmfW9ORIHuyz16 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org, Davide D'Amico X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 00:38:39 -0000 On Mar 18, 2013 5:39 PM, wrote: > > On Mon, Mar 18, 2013 at 03:31:51PM -0000, Steven Hartland wrote: > > > > ----- Original Message ----- > > From: "Davide D'Amico" > > To: > > Sent: Monday, March 18, 2013 2:50 PM > > Subject: FreBSD 9.1 and ZFS v28 performances > > > > > > > Hi all, > > > I'm trying to use ZFS on a DELL R720 with 2x6-core, 32GB ram, H710 > > > controller (no JBOD) and 15K rpm SAS HD: I will use it for a mysql 5.6 > > > server, so I am trying to use ZFS to get L2ARC and ZIL benefits. > > > > > > I created a RAID10 and used zpool to create a pool on top: > > > > > > # zpool create DATA mfid3 > > > # zpool add DATA cache mfid1 log mfid2 > > > > > > I have a question on zfs performances. 
Using: > > > > > > dd if=/dev/zero of=file.out bs=16k count=1M > > > > > > I cannot go faster than 400MB/s so I think I'm missing something; I > > > tried removing zil, removing l2arc but everything is still the same. > > The ZIL only helps with synchronous writes. This is something apps must > request specifically typically and I would guess that dd would not do that. > So the ZIL doesn't affect your test. > > The L2ARC is a read cache. It does very little for writes. If the ZFS cache > working set fits all in memory then the L2ARC does nothing for you. Since > you are writing the only thing needed from the ARC is metadata. > > > > mfiutil show volumes: > > > mfi0 Volumes: > > > Id Size Level Stripe State Cache Name > > > mfid0 ( 278G) RAID-1 64k OPTIMAL Disabled > > > mfid1 ( 118G) RAID-0 64k OPTIMAL Disabled > > > mfid2 ( 118G) RAID-0 64k OPTIMAL Disabled > > > mfid3 ( 1116G) RAID-10 64k OPTIMAL Disabled > > > > > > zpool status: > > > pool: DATA > > > state: ONLINE > > > scan: none requested > > > config: > > > > > > NAME STATE READ WRITE CKSUM > > > DATA ONLINE 0 0 0 > > > mfid3 ONLINE 0 0 0 > > > logs > > > mfid2 ONLINE 0 0 0 > > > cache > > > mfid1 ONLINE 0 0 0 > > Warning: your ZIL should probably be mirrored. If it isn't, and the drive > fails, AND your machine takes a sudden dive (kernel panic, power outage, > etc) then you will lose data. > How so ? Unless he loses the zil device itself, I can't see how he'd lose pending trasnsactions. > > DATA primarycache metadata local > > DATA secondarycache all default > > Is there a specific reason that you are making a point of not putting > regular data in the ARC? If you do that then reads of data will look in > the L2ARC, which is a normal 15k drive, before hitting the main pool drives > which also consists of normal 15k drives. Adding an extra set of spinning > rust before accessing your spinning rust doesn't sound helpful. > > > HEAD has some significant changes for the mfi driver specifically:- > > http://svnweb.freebsd.org/base?view=revision&revision=247369 > > > > This fixes lots off bugs but also enables full queue support on TBOLT > > cards so if your mfi is a TBOLT card you may see some speed up in > > random IO, not that this would effect your test here. > > I believe the H710 is a TBOLT card. It was released with the 12G servers > like the R720. > That's a negatory, we've got r[4-7]10 servers here with h710 raid cards. > I don't believe the OP mentioned how many drives are in the RAID10. More > drives ~== more parallelism ~== better performance. So I too am wondering > how much performance is expected. > > > While having a separate ZIL disk is good, your benefits may well be > > limited if said disk is a traditional HD, better to look at enterprise > > SSD's for this. The same and them some applies to your L2ARC disks. > > Before purchasing SSD's check the H710 docs to make sure they are allowed. > The 6/i in my R610 specifically says that if an SSD is used it must be the > only drive. Your R720's H710 is much newer and thus may not have that > restriction. Still, checking the documentation is cheap. > > -- > Kevin P. Neal http://www.pobox.com/~kpn/ > > "It sounded pretty good, but it's hard to tell how it will work out > in practice." 
-- Dennis Ritchie, ~1977, "Summary of a DEC 32-bit machine" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 00:50:48 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5C6D5D20; Tue, 19 Mar 2013 00:50:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E16788C7; Tue, 19 Mar 2013 00:50:47 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEANe1R1GDaFvO/2dsb2JhbABDiC26I4JugXR0giQBAQUjBFIbDgoCAg0ZAlkGLod5r22SWYEjjC2BDDQHgi2BEwOWXpECgyYggTc1 X-IronPort-AV: E=Sophos;i="4.84,868,1355115600"; d="scan'208";a="19627880" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 18 Mar 2013 20:50:40 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 9B8C8B3F17; Mon, 18 Mar 2013 20:50:40 -0400 (EDT) Date: Mon, 18 Mar 2013 20:50:40 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <972993089.4035818.1363654240617.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201303181445.57714.jhb@freebsd.org> Subject: Re: Deadlock in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 00:50:48 -0000 John Baldwin wrote: > On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > > John Baldwin wrote: > > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote: > > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote: > > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov > > > > > wrote: > > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem > > > > > > wrote: > > > > > > > John Baldwin wrote: > > > > > > > > I ran into a machine that had a deadlock among certain > > > > > > > > files > > > > > > > > on a > > > > > > > > given NFS > > > > > > > > mount today. I'm not sure how best to resolve it, though > > > > > > > > it > > > > > > > > seems like > > > > > > > > perhaps there is a bug with how the pool of nfsiod > > > > > > > > threads > > > > > > > > is managed. > > > > > > > > Anyway, more details on the actual hang below. This was > > > > > > > > on > > > > > > > > 8.x with > > > > > > > > the > > > > > > > > old NFS client, but I don't see anything in HEAD that > > > > > > > > would > > > > > > > > fix this. > > > > > > > > > > > > > > > > First note that the system was idle so it had dropped > > > > > > > > down > > > > > > > > to only one > > > > > > > > nfsiod thread. > > > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't > > > > > > > bite > > > > > > > more often. 
> > > > > > > It seems to me that this snippet of code from > > > > > > > nfs_asyncio() > > > > > > > makes too > > > > > > > weak an assumption: > > > > > > > /* > > > > > > > * If none are free, we may already have an iod working > > > > > > > on > > > > > > > this mount > > > > > > > * point. If so, it will process our request. > > > > > > > */ > > > > > > > if (!gotiod) { > > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > > NFS_DPF(ASYNCIO, > > > > > > > ("nfs_asyncio: %d iods are already processing mount > > > > > > > %p\n", > > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > > gotiod = TRUE; > > > > > > > } > > > > > > > } > > > > > > > It assumes that, since an nfsiod thread is processing some > > > > > > > buffer for the > > > > > > > mount, it will become available to do this one, which > > > > > > > isn't > > > > > > > true for your > > > > > > > deadlock. > > > > > > > > > > > > > > I think the simple fix would be to recode nfs_asyncio() so > > > > > > > that > > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread > > > > > > > that > > > > > > > it > > > > > > > has assigned to do the I/O, getting rid of the above. The > > > > > > > problem > > > > > > > with doing this is that it may result in a lot more > > > > > > > synchronous I/O > > > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). > > > > > > > Maybe > > > > > > > more > > > > > > > synchronous I/O could be avoided by allowing nfs_asyncio() > > > > > > > to > > > > > > > create a > > > > > > > new thread even if the total is above nfs_iodmax. (I think > > > > > > > this would > > > > > > > require the fixed array to be replaced with a linked list > > > > > > > and > > > > > > > might > > > > > > > result in a large number of nfsiod threads.) Maybe just > > > > > > > having > > > > > > > a large > > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > > > Does having a large # of nfsiod threads cause any serious > > > > > > > problem for > > > > > > > most systems these days? > > > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and then, > > > > > > > instead > > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed > > > > > > > number of > > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > > understanding that > > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. The > > > > > > proposal would > > > > > > only allow system to survive slightly longer after the > > > > > > deadlock > > > > > > appeared. > > > > > > And, I think that allowing the unbound amount of nfsiod > > > > > > threads > > > > > > is also > > > > > > fatal. > > > > > > > > > > > > The issue there is the LOR between buffer lock and vnode > > > > > > lock. > > > > > > Buffer lock > > > > > > always must come after the vnode lock. The problematic > > > > > > nfsiod > > > > > > thread, which > > > > > > locks the vnode, volatile this rule, because despite the > > > > > > LK_KERNPROC > > > > > > ownership of the buffer lock, it is the thread which de fact > > > > > > owns the > > > > > > buffer (only the thread can unlock it). > > > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() > > > > > > from the > > > > > > nfs_readdirplusrpc(). From my reading of the code, > > > > > > nfs_nget() > > > > > > should > > > > > > be capable of correctly handling the lock failure. 
And EBUSY > > > > > > would > > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. > > > > > > > > > > Yes, thinking about this more, I do think the right answer is > > > > > for > > > > > readdirplus to do this. The only question I have is if it > > > > > should > > > > > do > > > > > this always, or if it should do this only from the nfsiod > > > > > thread. > > > > > I > > > > > believe you can't get this in the non-nfsiod case. > > > > > > > > I agree that it looks as of the workaround only needed for > > > > nfsiod > > > > thread. > > > > On the other hand, it is not immediately obvious how to detect > > > > that > > > > the current thread is nfsio daemon. Probably a thread flag > > > > should be > > > > set. > > > > > > OTOH, updating the attributes from readdir+ is only an > > > optimization > > > anyway, so > > > just having it always do LK_NOWAIT is probably ok (and simple). > > > Currently I'm > > > trying to develop a test case to provoke this so I can test the > > > fix, > > > but no > > > luck on that yet. > > > > > > -- > > > John Baldwin > > Just fyi, ignore my comment about the second version of the patch > > that > > disables the nfsiod threads from doing readdirplus running faster. > > It > > was just that when I tested the 2nd patch, the server's caches were > > primed. Oops. > > > > However, sofar the minimal testing I've done has been essentially > > performance neutral between the unpatch and patched versions. > > > > Hopefully John has a convenient way to do some performance testing, > > since I won't be able to do much until the end of April. > > This does fix the deadlock, but I can't really speak to performance. > Ok, well it seemed performance neutral for "ls -lR". I'll try some other things like doing "find" and see if I find any cases where there is a significant performance difference. (I don't have a particularily good performance testing setup, since it uses old laptops, but it's better than nothing.) 
If others have ideas for good tests to run for readdir perf., feel free to suggest them via email, rick > -- > John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 01:00:56 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 24944ECC; Tue, 19 Mar 2013 01:00:56 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id A5717924; Tue, 19 Mar 2013 01:00:54 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEABC3R1GDaFvO/2dsb2JhbABDiC26I4JugXR0giQBAQQBIwRSBRYOCgICDRkCWQYuh3MGr22SW4EjjTk0B4ItgRMDll6RAoMmIIFs X-IronPort-AV: E=Sophos;i="4.84,868,1355115600"; d="scan'208";a="19629136" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 18 Mar 2013 21:00:54 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4ACD2B3F1D; Mon, 18 Mar 2013 21:00:54 -0400 (EDT) Date: Mon, 18 Mar 2013 21:00:54 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <2141845166.4036172.1363654854297.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201303181001.10217.jhb@freebsd.org> Subject: Re: Deadlock in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 01:00:56 -0000 John Baldwin wrote: > On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > > John Baldwin wrote: > > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote: > > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote: > > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov > > > > > wrote: > > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem > > > > > > wrote: > > > > > > > John Baldwin wrote: > > > > > > > > I ran into a machine that had a deadlock among certain > > > > > > > > files > > > > > > > > on a > > > > > > > > given NFS > > > > > > > > mount today. I'm not sure how best to resolve it, though > > > > > > > > it > > > > > > > > seems like > > > > > > > > perhaps there is a bug with how the pool of nfsiod > > > > > > > > threads > > > > > > > > is managed. > > > > > > > > Anyway, more details on the actual hang below. This was > > > > > > > > on > > > > > > > > 8.x with > > > > > > > > the > > > > > > > > old NFS client, but I don't see anything in HEAD that > > > > > > > > would > > > > > > > > fix this. > > > > > > > > > > > > > > > > First note that the system was idle so it had dropped > > > > > > > > down > > > > > > > > to only one > > > > > > > > nfsiod thread. > > > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't > > > > > > > bite > > > > > > > more often. > > > > > > > It seems to me that this snippet of code from > > > > > > > nfs_asyncio() > > > > > > > makes too > > > > > > > weak an assumption: > > > > > > > /* > > > > > > > * If none are free, we may already have an iod working > > > > > > > on > > > > > > > this mount > > > > > > > * point. 
If so, it will process our request. > > > > > > > */ > > > > > > > if (!gotiod) { > > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > > NFS_DPF(ASYNCIO, > > > > > > > ("nfs_asyncio: %d iods are already processing mount > > > > > > > %p\n", > > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > > gotiod = TRUE; > > > > > > > } > > > > > > > } > > > > > > > It assumes that, since an nfsiod thread is processing some > > > > > > > buffer for the > > > > > > > mount, it will become available to do this one, which > > > > > > > isn't > > > > > > > true for your > > > > > > > deadlock. > > > > > > > > > > > > > > I think the simple fix would be to recode nfs_asyncio() so > > > > > > > that > > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread > > > > > > > that > > > > > > > it > > > > > > > has assigned to do the I/O, getting rid of the above. The > > > > > > > problem > > > > > > > with doing this is that it may result in a lot more > > > > > > > synchronous I/O > > > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). > > > > > > > Maybe > > > > > > > more > > > > > > > synchronous I/O could be avoided by allowing nfs_asyncio() > > > > > > > to > > > > > > > create a > > > > > > > new thread even if the total is above nfs_iodmax. (I think > > > > > > > this would > > > > > > > require the fixed array to be replaced with a linked list > > > > > > > and > > > > > > > might > > > > > > > result in a large number of nfsiod threads.) Maybe just > > > > > > > having > > > > > > > a large > > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > > > Does having a large # of nfsiod threads cause any serious > > > > > > > problem for > > > > > > > most systems these days? > > > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and then, > > > > > > > instead > > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed > > > > > > > number of > > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > > understanding that > > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. The > > > > > > proposal would > > > > > > only allow system to survive slightly longer after the > > > > > > deadlock > > > > > > appeared. > > > > > > And, I think that allowing the unbound amount of nfsiod > > > > > > threads > > > > > > is also > > > > > > fatal. > > > > > > > > > > > > The issue there is the LOR between buffer lock and vnode > > > > > > lock. > > > > > > Buffer lock > > > > > > always must come after the vnode lock. The problematic > > > > > > nfsiod > > > > > > thread, which > > > > > > locks the vnode, volatile this rule, because despite the > > > > > > LK_KERNPROC > > > > > > ownership of the buffer lock, it is the thread which de fact > > > > > > owns the > > > > > > buffer (only the thread can unlock it). > > > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() > > > > > > from the > > > > > > nfs_readdirplusrpc(). From my reading of the code, > > > > > > nfs_nget() > > > > > > should > > > > > > be capable of correctly handling the lock failure. And EBUSY > > > > > > would > > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. > > > > > > > > > > Yes, thinking about this more, I do think the right answer is > > > > > for > > > > > readdirplus to do this. 
The only question I have is if it > > > > > should > > > > > do > > > > > this always, or if it should do this only from the nfsiod > > > > > thread. > > > > > I > > > > > believe you can't get this in the non-nfsiod case. > > > > > > > > I agree that it looks as of the workaround only needed for > > > > nfsiod > > > > thread. > > > > On the other hand, it is not immediately obvious how to detect > > > > that > > > > the current thread is nfsio daemon. Probably a thread flag > > > > should be > > > > set. > > > > > > OTOH, updating the attributes from readdir+ is only an > > > optimization > > > anyway, so > > > just having it always do LK_NOWAIT is probably ok (and simple). > > > Currently I'm > > > trying to develop a test case to provoke this so I can test the > > > fix, > > > but no > > > luck on that yet. > > > > > > -- > > > John Baldwin > > Just fyi, ignore my comment about the second version of the patch > > that > > disables the nfsiod threads from doing readdirplus running faster. > > It > > was just that when I tested the 2nd patch, the server's caches were > > primed. Oops. > > > > However, sofar the minimal testing I've done has been essentially > > performance neutral between the unpatch and patched versions. > > > > Hopefully John has a convenient way to do some performance testing, > > since I won't be able to do much until the end of April. > > Performance testing I don't really have available. All I've been doing are things like (assuming /mnt is an NFSv3 mount point): # cd /mnt # time ls -lR > /dev/null # time ls -R > /dev/null - for both a patched and unpatched kernel (Oh, and you need to keep the server's caches pretty consistent. For me once I run the test once, the server caches end up primed and then the times seem to be pretty consistent, but I am only using old laptops.) Maybe you could do something like the above? (I'll try some finds too.) (I don't really have any clever ideas for other tests.) rick ps: There is a readdir test in Connectathon test suite I'll run too. > What I am focusing > on atm > is testing that the deadlock is fixed (I have a way to reproduce it > now). 
> > -- > John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 01:12:09 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 10370202 for ; Tue, 19 Mar 2013 01:12:09 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [176.9.9.186]) by mx1.freebsd.org (Postfix) with ESMTP id EBD2D970 for ; Tue, 19 Mar 2013 01:12:06 +0000 (UTC) Received: from [10.10.1.100] (unknown [217.71.4.82]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id B867DD9D54; Tue, 19 Mar 2013 02:11:59 +0100 (CET) X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk B867DD9D54 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default; t=1363655524; bh=7mjPJ0WtO25ShsnIPtbx7XFXOBSgHiwZoaDF+82O1so=; h=Date:From:To:CC:Subject:References:In-Reply-To; b=Z5/W9a97wsZ/qfvwEruerZqOC3vXrYqucD4MCCfp01VarLpmiEfRF6ATsXcZlHmUG 1BlzV/JA90eSOVcyYeHnLypIkDBr44CSCO85SV28hckrGNsHEtptSkCMSf4gKQsU4J af7dIdu/2fKSc/c+dMgEpBhTVkl+rsl2znqrEiW8= Message-ID: <5147BB5C.7020205@gibfest.dk> Date: Tue, 19 Mar 2013 02:11:56 +0100 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: kpneal@pobox.com Subject: Re: When will we see TRIM support for GELI volumes ? References: <51479D54.1040509@gibfest.dk> <20130319000232.GA18711@neutralgood.org> In-Reply-To: <20130319000232.GA18711@neutralgood.org> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 01:12:09 -0000 On 19-03-2013 01:02, kpneal@pobox.com wrote: > On Tue, Mar 19, 2013 at 12:03:48AM +0100, Thomas Steen Rasmussen wrote: >> Hello there, >> >> I was happy to see TRIM support in UFS and ZFS, however: >> I would really like to see TRIM support for GELI volumes. >> >> I finally got an SSD with TRIM support for the laptop, but I can't >> really use it with GELI disk encryption because the lack of TRIM >> support makes writing to the disk really slow after a while. >> >> I've been told this is not a huge job, but I wouldn't know. >> >> I can't understand why more people aren't asking for this. >> Do people not encrypt their laptops, or do they not use SSDs ? > Wouldn't that defeat the purpose somewhat? > > With an encrypted disk an attacker who gets the disk does not know > which parts of the disk have valid data and which do not. But with > TRIM the drive does know where the valid data is, and so an attacker > knows as well. > > Does it make sense to put a flashing neon sign up that says "secret data > right here!"? Hello, This is a bit off topic, but I'll bite: I suppose it depends on the use-case. personally I could care less if a thief who steals my laptop knows that the disk contains encrypted data. If I was hiding some top secret files from a government I might feel different, but I'm not so I don't. I do feel though that in this day in age we should strive to encrypt everything, even data that is not secret. Network connections too. 
Doing so protects your privacy, and more importantly, if one day you DO have something that is really secret, it doesn't stand out :) Have you tried using an SSD without TRIM support ? It really is awfully slow, I'm talking 10-20-30 seconds freezes while the disk is writing. It is not usable - but neither is a laptop without disk encryption (to me) :) /Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 07:09:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DDB3216A for ; Tue, 19 Mar 2013 07:09:44 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 445E2829 for ; Tue, 19 Mar 2013 07:09:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363676981; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=t3WhTOrOMpoe9TCk+z5ewLAAOJUGSMF5ofax2W/m3jc=; b=ANtVlD2j/N0NSL8p9tQRMwfjBGyNdoKM7NmW9jU8B5fgjdT/dtuyk0hl1jy0jd2m CmLDaPmvVhdaH88/F9XyZxPTZpWN9Ui/3q+g+/KtANFW9dGvih3kZUoRFEH5LjGX IFscjAUNLtXYoAo3R6XOjL4+VYmrX9jDfX0i5Ba8KiU=; Received: from [213.92.90.12] ([213.92.90.12:26604] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 9F/ED-24145-53F08415; Tue, 19 Mar 2013 08:09:41 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHqg0-0002ph-V2 for freebsd-fs@freebsd.org; Tue, 19 Mar 2013 08:09:41 +0100 Received: (qmail 10886 invoked by uid 80); 19 Mar 2013 07:09:40 -0000 To: Damien Fleuriot Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.16 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Tue, 19 Mar 2013 08:09:40 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <20130318163833.GA11916@neutralgood.org> Message-ID: X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 07:09:44 -0000 [...] > That's a negatory, we've got r[4-7]10 servers here with h710 raid > cards. Thank you, and what kind of performances there? d. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 07:12:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B793A32B for ; Tue, 19 Mar 2013 07:12:37 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 819EA856 for ; Tue, 19 Mar 2013 07:12:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363677155; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=7R1Hjkq3SWT55kFFhipF/oFatS+XCXLuoliFoCiSVmI=; b=o1doOYPxw9UvuqauDGRmFInBitCdTX7JYBo994mqTYucqszeWHoYtgp4bx4B48C6 Wes9d9dY7nV9BhmO/I6+4Rb+i2sZy+RXkdM+x/2ZQ8OBNLyZaxYOw2Qa9/jo7w6X 4jcOcSsFdWUvWLkbRDuoh1Dwq+SpKEBlqOtGKe+60VI=; Received: from [213.92.90.12] ([213.92.90.12:11073] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 60/02-24145-3EF08415; Tue, 19 Mar 2013 08:12:35 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHqio-0002wd-Aj for freebsd-fs@freebsd.org; Tue, 19 Mar 2013 08:12:34 +0100 Received: (qmail 11318 invoked by uid 80); 19 Mar 2013 07:12:34 -0000 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.228 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Tue, 19 Mar 2013 08:12:34 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> Message-ID: <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 07:12:37 -0000 [...] > I'd recommend doing the same test on the SSD with ZFS as well as that > would > give you a simple like for like comparison. 
Here we are: # zpool create DATAZFS mfid2 1° round: General statistics: total time: 90.1662s total number of events: 1 total time taken by event execution: 90.1631s 2° round: General statistics: total time: 82.3333s total number of events: 1 total time taken by event execution: 82.3307s 3° round: General statistics: total time: 81.5421s total number of events: 1 total time taken by event execution: 81.5399s 4° round: General statistics: total time: 82.1657s total number of events: 1 total time taken by event execution: 82.1636s # zfs get all DATAZFS NAME PROPERTY VALUE SOURCE DATAZFS type filesystem - DATAZFS creation Tue Mar 19 7:36 2013 - DATAZFS used 52.0G - DATAZFS available 64.2G - DATAZFS referenced 52.0G - DATAZFS compressratio 1.00x - DATAZFS mounted yes - DATAZFS quota none default DATAZFS reservation none default DATAZFS recordsize 128K default DATAZFS mountpoint /DATAZFS default DATAZFS sharenfs off default DATAZFS checksum on default DATAZFS compression off default DATAZFS atime off local DATAZFS devices on default DATAZFS exec on default DATAZFS setuid on default DATAZFS readonly off default DATAZFS jailed off default DATAZFS snapdir hidden default DATAZFS aclmode discard default DATAZFS aclinherit restricted default DATAZFS canmount on default DATAZFS xattr off temporary DATAZFS copies 1 default DATAZFS version 5 - DATAZFS utf8only off - DATAZFS normalization none - DATAZFS casesensitivity sensitive - DATAZFS vscan off default DATAZFS nbmand off default DATAZFS sharesmb off default DATAZFS refquota none default DATAZFS refreservation none default DATAZFS primarycache all default DATAZFS secondarycache all default DATAZFS usedbysnapshots 0 - DATAZFS usedbydataset 52.0G - DATAZFS usedbychildren 225K - DATAZFS usedbyrefreservation 0 - DATAZFS logbias latency default DATAZFS dedup off default DATAZFS mlslabel - DATAZFS sync standard default DATAZFS refcompressratio 1.00x - DATAZFS written 52.0G - Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 08:18:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2F67A818 for ; Tue, 19 Mar 2013 08:18:19 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from flat.berklix.org (flat.berklix.org [83.236.223.115]) by mx1.freebsd.org (Postfix) with ESMTP id AAA92AC8 for ; Tue, 19 Mar 2013 08:18:17 +0000 (UTC) Received: from mart.js.berklix.net (pD9FBEBA4.dip.t-dialin.net [217.251.235.164]) (authenticated bits=128) by flat.berklix.org (8.14.5/8.14.5) with ESMTP id r2J8HNA7055092; Tue, 19 Mar 2013 09:17:24 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by mart.js.berklix.net (8.14.3/8.14.3) with ESMTP id r2J8HuKB065604; Tue, 19 Mar 2013 09:17:56 +0100 (CET) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id r2J8Hkdg052031; Tue, 19 Mar 2013 09:17:51 +0100 (CET) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201303190817.r2J8Hkdg052031@fire.js.berklix.net> To: Thomas Steen Rasmussen Subject: Re: When will we see TRIM support for GELI volumes ? From: "Julian H. Stacey" Organization: http://berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Tue, 19 Mar 2013 02:11:56 +0100." 
<5147BB5C.7020205@gibfest.dk> Date: Tue, 19 Mar 2013 09:17:46 +0100 Sender: jhs@berklix.com Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 08:18:19 -0000 Thomas Steen Rasmussen wrote: > On 19-03-2013 01:02, kpneal@pobox.com wrote: > > On Tue, Mar 19, 2013 at 12:03:48AM +0100, Thomas Steen Rasmussen wrote: > >> Hello there, > >> > >> I was happy to see TRIM support in UFS and ZFS, however: > >> I would really like to see TRIM support for GELI volumes. > >> > >> I finally got an SSD with TRIM support for the laptop, but I can't > >> really use it with GELI disk encryption because the lack of TRIM > >> support makes writing to the disk really slow after a while. > >> > >> I've been told this is not a huge job, but I wouldn't know. > >> > >> I can't understand why more people aren't asking for this. > >> Do people not encrypt their laptops, or do they not use SSDs ? > > Wouldn't that defeat the purpose somewhat? > > > > With an encrypted disk an attacker who gets the disk does not know > > which parts of the disk have valid data and which do not. But with > > TRIM the drive does know where the valid data is, and so an attacker > > knows as well. > > > > Does it make sense to put a flashing neon sign up that says "secret data > > right here!"? > Hello, > > This is a bit off topic, but I'll bite: > > I suppose it depends on the use-case. personally I could care > less if a thief who steals my laptop knows that the disk > contains encrypted data. If I was hiding some top secret files > from a government I might feel different, but I'm not so I don't. > > I do feel though that in this day in age we should strive to encrypt > everything, even data that is not secret. Network connections too. > > Doing so protects your privacy, and more importantly, if one day > you DO have something that is really secret, it doesn't stand out :) > > Have you tried using an SSD without TRIM support ? It really is > awfully slow, I'm talking 10-20-30 seconds freezes while the disk > is writing. It is not usable - but neither is a laptop without disk > encryption (to me) :) My laptop has a hard disk with gbde encryption not geli. No big pauses I've noticed. Maybe your pauses may come from something else ? ( eg lack of RAM or CPU ? (in my case on a tower + X, my I see occasional nasty long pauses from bursts of background activity when crontab + fetchmail feeds occasional large files into procmail with 15,000 anti spam rules), yup, my own fault ) To find what's causing your pauses, ideas to be tried on similar load: top, iostat, (etc) take out components to narrow down suspicion: try gbde instead for a while for comparison try a hard disk (*) for a while to see if its the SSD (*: internal or external boot via USB, OK, clunky, but only for a while for test). Good luck Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com Reply below not above, like a play script. Indent old text with "> ". Send plain text. No quoted-printable, HTML, base64, multipart/alternative. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 08:25:58 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 071EDAFC for ; Tue, 19 Mar 2013 08:25:58 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id AB026B29 for ; Tue, 19 Mar 2013 08:25:57 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 0B36D81C; Tue, 19 Mar 2013 09:22:39 +0100 (CET) Date: Tue, 19 Mar 2013 09:27:32 +0100 From: Pawel Jakub Dawidek To: Thomas Steen Rasmussen Subject: Re: When will we see TRIM support for GELI volumes ? Message-ID: <20130319082732.GB1367@garage.freebsd.pl> References: <51479D54.1040509@gibfest.dk> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="PmA2V3Z32TCmWXqI" Content-Disposition: inline In-Reply-To: <51479D54.1040509@gibfest.dk> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 08:25:58 -0000 --PmA2V3Z32TCmWXqI Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 19, 2013 at 12:03:48AM +0100, Thomas Steen Rasmussen wrote: > Hello there, >=20 > I was happy to see TRIM support in UFS and ZFS, however: > I would really like to see TRIM support for GELI volumes. At this point I am convinced that TRIM support should be added to GELI. In the past I wanted to implement BIO_DELETE support as secure delete, which can be found in the comment: case BIO_DELETE: /* * We could eventually support BIO_DELETE request. * It could be done by overwritting requested sector * with random data g_eli_overwrites number of times. */ default: g_io_deliver(bp, EOPNOTSUPP); return; } This was written back when none of the file systems we had supported TRIM. > I finally got an SSD with TRIM support for the laptop, but I can't > really use it with GELI disk encryption because the lack of TRIM > support makes writing to the disk really slow after a while. This is not what I see. On one of my SSDs in my laptop I've two partitions, both running ZFS, but one of them on top of GELI. I don't use ZFS TRIM yet, as I see no slowdown whatsoever. How can you say this is lack of TRIM slowing your writes? The performance degraded over time? > I've been told this is not a huge job, but I wouldn't know. It isn't. My idea to implement this is the following: - Add -t and -T flags to geli init/onetime/configure subcommands. -t will enable TRIM and -T will disable it. TRIM should be enabled by default for providers that are only encrypted and disabled by default for providers with integrity verification. - Add G_ELI_FLAG_TRIM flag that is set by default and configured using new switches above. - Update g_eli.c to pass BIO_DELETEs down if the G_ELI_FLAG_TRIM flag is set. If BIO_DELETE returns EOPNOTSUPP error, the G_ELI_FLAG_TRIM should be removed from the in-memory structure (but not from on-disk metadata, of course). Unfortunately I have no time currently to implement this, so if someone would like to beat me to it, this is how I'd imagine it. 
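To make the last point concrete, the BIO_DELETE case in g_eli_start() could end up looking roughly like the existing BIO_FLUSH pass-through. This is only a sketch of the idea described above, not code that exists today; G_ELI_FLAG_TRIM is the flag name proposed in this mail, and the fragment assumes the usual local variables (sc, bp, cbp, cp) of g_eli_start():

	case BIO_DELETE:
		/*
		 * Sketch only: pass the delete request down to the backing
		 * provider when the proposed G_ELI_FLAG_TRIM flag is set;
		 * otherwise keep rejecting it as the current code does.
		 */
		if ((sc->sc_flags & G_ELI_FLAG_TRIM) == 0) {
			g_io_deliver(bp, EOPNOTSUPP);
			return;
		}
		cbp = g_clone_bio(bp);
		if (cbp == NULL) {
			g_io_deliver(bp, ENOMEM);
			return;
		}
		/* Bypass encryption entirely, as is done for BIO_FLUSH. */
		cbp->bio_done = g_std_done;
		cp = LIST_FIRST(&sc->sc_geom->consumer);
		cbp->bio_to = cp->provider;
		g_io_request(cbp, cp);
		/*
		 * In the completion path: if the lower provider answers
		 * EOPNOTSUPP, clear G_ELI_FLAG_TRIM in the in-memory softc
		 * only, never in the on-disk metadata.
		 */
		break;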
--=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl --PmA2V3Z32TCmWXqI Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFIIXQACgkQForvXbEpPzTfGACgzz1qbF8TyixQ9A8Oja2bc7YF kRoAoM7yd1L6tMNDMAP6mSqg+jcCn2Wt =GTl0 -----END PGP SIGNATURE----- --PmA2V3Z32TCmWXqI-- From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 09:18:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AB519DA8 for ; Tue, 19 Mar 2013 09:18:35 +0000 (UTC) (envelope-from prvs=1790199af7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 4981F26B for ; Tue, 19 Mar 2013 09:18:35 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002804292.msg for ; Tue, 19 Mar 2013 09:18:32 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 19 Mar 2013 09:18:32 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1790199af7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> From: "Steven Hartland" To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Tue, 19 Mar 2013 09:18:54 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=response Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 09:18:35 -0000 ----- Original Message ----- From: "Davide D'Amico" To: "Steven Hartland" Cc: Sent: Tuesday, March 19, 2013 7:12 AM Subject: Re: FreBSD 9.1 and ZFS v28 performances > [...] >> I'd recommend doing the same test on the SSD with ZFS as well as that would >> give you a simple like for like comparison. 
> > Here we are: > > # zpool create DATAZFS mfid2 > > 1° round: > General statistics: > total time: 90.1662s > total number of events: 1 > total time taken by event execution: 90.1631s > > 2° round: > General statistics: > total time: 82.3333s > total number of events: 1 > total time taken by event execution: 82.3307s > > 3° round: > General statistics: > total time: 81.5421s > total number of events: 1 > total time taken by event execution: 81.5399s > > 4° round: > General statistics: > total time: 82.1657s > total number of events: 1 > total time taken by event execution: 82.1636s > > > # zfs get all DATAZFS > NAME PROPERTY VALUE SOURCE > DATAZFS type filesystem - > DATAZFS creation Tue Mar 19 7:36 2013 - > DATAZFS used 52.0G - > DATAZFS available 64.2G - > DATAZFS referenced 52.0G - > DATAZFS compressratio 1.00x - > DATAZFS mounted yes - > DATAZFS quota none default > DATAZFS reservation none default > DATAZFS recordsize 128K default > DATAZFS mountpoint /DATAZFS default > DATAZFS sharenfs off default > DATAZFS checksum on default > DATAZFS compression off default > DATAZFS atime off local > DATAZFS devices on default > DATAZFS exec on default > DATAZFS setuid on default > DATAZFS readonly off default > DATAZFS jailed off default > DATAZFS snapdir hidden default > DATAZFS aclmode discard default > DATAZFS aclinherit restricted default > DATAZFS canmount on default > DATAZFS xattr off temporary > DATAZFS copies 1 default > DATAZFS version 5 - > DATAZFS utf8only off - > DATAZFS normalization none - > DATAZFS casesensitivity sensitive - > DATAZFS vscan off default > DATAZFS nbmand off default > DATAZFS sharesmb off default > DATAZFS refquota none default > DATAZFS refreservation none default > DATAZFS primarycache all default > DATAZFS secondarycache all default > DATAZFS usedbysnapshots 0 - > DATAZFS usedbydataset 52.0G - > DATAZFS usedbychildren 225K - > DATAZFS usedbyrefreservation 0 - > DATAZFS logbias latency default > DATAZFS dedup off default > DATAZFS mlslabel - > DATAZFS sync standard default > DATAZFS refcompressratio 1.00x - > DATAZFS written 52.0G - That's got the wrong record size for mysql :( ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 09:22:51 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DF04BE8F for ; Tue, 19 Mar 2013 09:22:51 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 5D7AB29D for ; Tue, 19 Mar 2013 09:22:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363684969; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=gjAcgnIqLgM38w4+m1RMXQ0aYzpEdWXsRPB233Og0mM=; b=cMtOedExmywgLxTLWK7Ymhm51Pw/xezyyyVdTkp+1O/St3MpfF+w131mZi1rZxgp EK1+BAD07YtVpm/xdb/9nHEt9zOwIqlrCzZ3QO4357alFC7s3oJEVh8UsE22kfeo pn8u9EgmFJQbfr7Eia8H7/b4EnoTk3XyRyPeEpZCmkI=; Received: from [213.92.90.12] ([213.92.90.12:58752] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 85/05-24145-96E28415; Tue, 19 Mar 2013 10:22:49 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHskq-0009lJ-Q7 for freebsd-fs@freebsd.org; Tue, 19 Mar 2013 10:22:49 +0100 Received: (qmail 37524 invoked by uid 89); 19 Mar 2013 09:22:48 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 19 Mar 2013 09:22:48 -0000 Message-ID: <51482E67.8060900@contactlab.com> Date: Tue, 19 Mar 2013 10:22:47 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> In-Reply-To: <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 09:22:51 -0000 Il 19/03/13 10:18, Steven Hartland ha scritto: > > ----- Original Message ----- From: "Davide D'Amico" > > To: "Steven Hartland" > Cc: > Sent: Tuesday, March 19, 2013 7:12 AM > Subject: Re: FreBSD 9.1 and ZFS v28 performances > > >> [...] >>> I'd recommend doing the same test on the SSD with ZFS as well as that >>> would >>> give you a simple like for like comparison. 
>> >> Here we are: >> >> # zpool create DATAZFS mfid2 >> >> 1° round: >> General statistics: >> total time: 90.1662s >> total number of events: 1 >> total time taken by event execution: 90.1631s >> >> 2° round: >> General statistics: >> total time: 82.3333s >> total number of events: 1 >> total time taken by event execution: 82.3307s >> >> 3° round: >> General statistics: >> total time: 81.5421s >> total number of events: 1 >> total time taken by event execution: 81.5399s >> >> 4° round: >> General statistics: >> total time: 82.1657s >> total number of events: 1 >> total time taken by event execution: 82.1636s >> >> >> # zfs get all DATAZFS >> NAME PROPERTY VALUE SOURCE >> DATAZFS type filesystem - >> DATAZFS creation Tue Mar 19 7:36 2013 - >> DATAZFS used 52.0G - >> DATAZFS available 64.2G - >> DATAZFS referenced 52.0G - >> DATAZFS compressratio 1.00x - >> DATAZFS mounted yes - >> DATAZFS quota none default >> DATAZFS reservation none default >> DATAZFS recordsize 128K default >> DATAZFS mountpoint /DATAZFS default >> DATAZFS sharenfs off default >> DATAZFS checksum on default >> DATAZFS compression off default >> DATAZFS atime off local >> DATAZFS devices on default >> DATAZFS exec on default >> DATAZFS setuid on default >> DATAZFS readonly off default >> DATAZFS jailed off default >> DATAZFS snapdir hidden default >> DATAZFS aclmode discard default >> DATAZFS aclinherit restricted default >> DATAZFS canmount on default >> DATAZFS xattr off temporary >> DATAZFS copies 1 default >> DATAZFS version 5 - >> DATAZFS utf8only off - >> DATAZFS normalization none - >> DATAZFS casesensitivity sensitive - >> DATAZFS vscan off default >> DATAZFS nbmand off default >> DATAZFS sharesmb off default >> DATAZFS refquota none default >> DATAZFS refreservation none default >> DATAZFS primarycache all default >> DATAZFS secondarycache all default >> DATAZFS usedbysnapshots 0 - >> DATAZFS usedbydataset 52.0G - >> DATAZFS usedbychildren 225K - >> DATAZFS usedbyrefreservation 0 - >> DATAZFS logbias latency default >> DATAZFS dedup off default >> DATAZFS mlslabel - >> DATAZFS sync standard default >> DATAZFS refcompressratio 1.00x - >> DATAZFS written 52.0G - > > That's got the wrong record size for mysql :( Sorry, my fault (I've made so many tests...). I'll modify it asap, copy the mysql/* files again and posting here the results. Thanks, d. 
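For reference, the usual fix here, assuming InnoDB with its default 16k page size, is to set the dataset's recordsize to match before copying the data files back (recordsize only affects files written after the change, which is why the mysql/* files have to be copied again):

# zfs set recordsize=16k DATAZFS

or, keeping the pool default and giving the database its own dataset:

# zfs create -o recordsize=16k DATAZFS/mysql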
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 09:25:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7335FF22 for ; Tue, 19 Mar 2013 09:25:54 +0000 (UTC) (envelope-from prvs=1790199af7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id E32072B3 for ; Tue, 19 Mar 2013 09:25:53 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002804341.msg for ; Tue, 19 Mar 2013 09:25:51 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 19 Mar 2013 09:25:51 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1790199af7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <394EAD0E59004D1B8D2010F94DDB7C84@multiplay.co.uk> From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> <51482E67.8060900@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Tue, 19 Mar 2013 09:26:12 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=response Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 09:25:54 -0000 ----- Original Message ----- From: "Davide D'Amico" >>> [...] >>>> I'd recommend doing the same test on the SSD with ZFS as well as that >>>> would >>>> give you a simple like for like comparison. 
>>> >>> Here we are: >>> >>> # zpool create DATAZFS mfid2 >>> >>> 1° round: >>> General statistics: >>> total time: 90.1662s >>> total number of events: 1 >>> total time taken by event execution: 90.1631s >>> >>> 2° round: >>> General statistics: >>> total time: 82.3333s >>> total number of events: 1 >>> total time taken by event execution: 82.3307s >>> >>> 3° round: >>> General statistics: >>> total time: 81.5421s >>> total number of events: 1 >>> total time taken by event execution: 81.5399s >>> >>> 4° round: >>> General statistics: >>> total time: 82.1657s >>> total number of events: 1 >>> total time taken by event execution: 82.1636s >>> >>> >>> # zfs get all DATAZFS >>> NAME PROPERTY VALUE SOURCE >>> DATAZFS type filesystem - >>> DATAZFS creation Tue Mar 19 7:36 2013 - >>> DATAZFS used 52.0G - >>> DATAZFS available 64.2G - >>> DATAZFS referenced 52.0G - >>> DATAZFS compressratio 1.00x - >>> DATAZFS mounted yes - >>> DATAZFS quota none default >>> DATAZFS reservation none default >>> DATAZFS recordsize 128K default >>> DATAZFS mountpoint /DATAZFS default >>> DATAZFS sharenfs off default >>> DATAZFS checksum on default >>> DATAZFS compression off default >>> DATAZFS atime off local >>> DATAZFS devices on default >>> DATAZFS exec on default >>> DATAZFS setuid on default >>> DATAZFS readonly off default >>> DATAZFS jailed off default >>> DATAZFS snapdir hidden default >>> DATAZFS aclmode discard default >>> DATAZFS aclinherit restricted default >>> DATAZFS canmount on default >>> DATAZFS xattr off temporary >>> DATAZFS copies 1 default >>> DATAZFS version 5 - >>> DATAZFS utf8only off - >>> DATAZFS normalization none - >>> DATAZFS casesensitivity sensitive - >>> DATAZFS vscan off default >>> DATAZFS nbmand off default >>> DATAZFS sharesmb off default >>> DATAZFS refquota none default >>> DATAZFS refreservation none default >>> DATAZFS primarycache all default >>> DATAZFS secondarycache all default >>> DATAZFS usedbysnapshots 0 - >>> DATAZFS usedbydataset 52.0G - >>> DATAZFS usedbychildren 225K - >>> DATAZFS usedbyrefreservation 0 - >>> DATAZFS logbias latency default >>> DATAZFS dedup off default >>> DATAZFS mlslabel - >>> DATAZFS sync standard default >>> DATAZFS refcompressratio 1.00x - >>> DATAZFS written 52.0G - >> >> That's got the wrong record size for mysql :( > > Sorry, my fault (I've made so many tests...). I'll modify it asap, copy the mysql/* files again and posting here the results. Is this a test I could possibly run here? I have a machine on test for mysql so if its something you can let me have the data for I can run some tests locally too. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 09:28:24 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C44D8143 for ; Tue, 19 Mar 2013 09:28:24 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 48CFE2E3 for ; Tue, 19 Mar 2013 09:28:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363685303; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=ANrVo+KmA50pxJYx93WZNrctB1nbTgGYRYQfQmmwtLQ=; b=r9gc62B3OlbDEFeNnrH1fFUxlxIv+XVN3ZsXNI8QfCNbaxmBFizzWmhHQcGz++lY cYp5hQbpi7tXzCajj/NvevYKaydFOKHcEu9YgrjNGShStnPUqlYlRT3YszbhRlvR AhrTIfp7NGKQvWZOXhCu8ufvF/MbNgg5z4ZeNaEnvas=; Received: from [213.92.90.12] ([213.92.90.12:42278] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id AF/1E-24145-7BF28415; Tue, 19 Mar 2013 10:28:23 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UHsqF-000AIA-3v for freebsd-fs@freebsd.org; Tue, 19 Mar 2013 10:28:23 +0100 Received: (qmail 39559 invoked by uid 89); 19 Mar 2013 09:28:22 -0000 Received: from localhost (HELO davepro.local) (127.0.0.1) by mx3-master.housing.tomato.lan with SMTP; 19 Mar 2013 09:28:22 -0000 Message-ID: <51482FB5.2000305@contactlab.com> Date: Tue, 19 Mar 2013 10:28:21 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> <51482E67.8060900@contactlab.com> <394EAD0E59004D1B8D2010F94DDB7C84@multiplay.co.uk> In-Reply-To: <394EAD0E59004D1B8D2010F94DDB7C84@multiplay.co.uk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 09:28:24 -0000 Il 19/03/13 10:26, Steven Hartland ha scritto: > > ----- Original Message ----- From: "Davide D'Amico" > >>>> [...] >>>>> I'd recommend doing the same test on the SSD with ZFS as well as that >>>>> would >>>>> give you a simple like for like comparison. 
>>>> >>>> Here we are: >>>> >>>> # zpool create DATAZFS mfid2 >>>> >>>> 1° round: >>>> General statistics: >>>> total time: 90.1662s >>>> total number of events: 1 >>>> total time taken by event execution: 90.1631s >>>> >>>> 2° round: >>>> General statistics: >>>> total time: 82.3333s >>>> total number of events: 1 >>>> total time taken by event execution: 82.3307s >>>> >>>> 3° round: >>>> General statistics: >>>> total time: 81.5421s >>>> total number of events: 1 >>>> total time taken by event execution: 81.5399s >>>> >>>> 4° round: >>>> General statistics: >>>> total time: 82.1657s >>>> total number of events: 1 >>>> total time taken by event execution: 82.1636s >>>> >>>> >>>> # zfs get all DATAZFS >>>> NAME PROPERTY VALUE SOURCE >>>> DATAZFS type filesystem - >>>> DATAZFS creation Tue Mar 19 7:36 2013 - >>>> DATAZFS used 52.0G - >>>> DATAZFS available 64.2G - >>>> DATAZFS referenced 52.0G - >>>> DATAZFS compressratio 1.00x - >>>> DATAZFS mounted yes - >>>> DATAZFS quota none default >>>> DATAZFS reservation none default >>>> DATAZFS recordsize 128K default >>>> DATAZFS mountpoint /DATAZFS default >>>> DATAZFS sharenfs off default >>>> DATAZFS checksum on default >>>> DATAZFS compression off default >>>> DATAZFS atime off local >>>> DATAZFS devices on default >>>> DATAZFS exec on default >>>> DATAZFS setuid on default >>>> DATAZFS readonly off default >>>> DATAZFS jailed off default >>>> DATAZFS snapdir hidden default >>>> DATAZFS aclmode discard default >>>> DATAZFS aclinherit restricted default >>>> DATAZFS canmount on default >>>> DATAZFS xattr off temporary >>>> DATAZFS copies 1 default >>>> DATAZFS version 5 - >>>> DATAZFS utf8only off - >>>> DATAZFS normalization none - >>>> DATAZFS casesensitivity sensitive - >>>> DATAZFS vscan off default >>>> DATAZFS nbmand off default >>>> DATAZFS sharesmb off default >>>> DATAZFS refquota none default >>>> DATAZFS refreservation none default >>>> DATAZFS primarycache all default >>>> DATAZFS secondarycache all default >>>> DATAZFS usedbysnapshots 0 - >>>> DATAZFS usedbydataset 52.0G - >>>> DATAZFS usedbychildren 225K - >>>> DATAZFS usedbyrefreservation 0 - >>>> DATAZFS logbias latency default >>>> DATAZFS dedup off default >>>> DATAZFS mlslabel - >>>> DATAZFS sync standard default >>>> DATAZFS refcompressratio 1.00x - >>>> DATAZFS written 52.0G - >>> >>> That's got the wrong record size for mysql :( >> >> Sorry, my fault (I've made so many tests...). I'll modify it asap, >> copy the mysql/* files again and posting here the results. > > Is this a test I could possibly run here? I have a machine on test for > mysql so if its something you can let me have the data for I can run > some tests locally too. i don't know if you have installed sysbench 0.5.0 somewhere in you server, but we could test using *standard* oltp tests (the dataset I'm using is strictly private) included in sysbench package. Is it possible, for you? I'm using mysql-5.6.10 enterprise (but I think the one you find in the ports tree it's a good choice, too). Thanks, d. 
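For anyone reproducing this, a typical run of the stock OLTP test looks roughly like the following. This is the 0.4-style command line as shipped in ports; the table size, thread count and credentials are placeholders, and sysbench 0.5 drives the same test through Lua scripts instead:

# sysbench --test=oltp --mysql-user=root --mysql-db=sbtest --oltp-table-size=1000000 prepare
# sysbench --test=oltp --mysql-user=root --mysql-db=sbtest --oltp-table-size=1000000 --num-threads=8 --max-requests=1000000 run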
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 09:59:18 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EDC017B4; Tue, 19 Mar 2013 09:59:18 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [176.9.9.186]) by mx1.freebsd.org (Postfix) with ESMTP id 8223169A; Tue, 19 Mar 2013 09:59:17 +0000 (UTC) Received: from [10.20.15.71] (out1.hq.siminn.dk [195.184.109.1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id D054FDA40D; Tue, 19 Mar 2013 10:59:15 +0100 (CET) X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk D054FDA40D DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default; t=1363687156; bh=b9Elak3SamYTgaOsvZSxk9hWgnEJYXdlxzRr9EsngME=; h=Date:From:To:CC:Subject:References:In-Reply-To; b=RhD6MzH684rfY3Kk8VZm/oOXEIIfHv9iPK7cmVTOjQx2i0IxNKCFsoH6YOZzxyQAq ANQPh/uKXitnSvrYqtubzrEP6NmiR2QlKDliAIX4rjut/MnSYCyxoIW069xxZhNEV2 URQvCxclPOBi5goG4O6CDrA8KGYdWLfYsk2hmds4= Message-ID: <514836F2.9080900@gibfest.dk> Date: Tue, 19 Mar 2013 10:59:14 +0100 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Pawel Jakub Dawidek Subject: Re: When will we see TRIM support for GELI volumes ? References: <51479D54.1040509@gibfest.dk> <20130319082732.GB1367@garage.freebsd.pl> In-Reply-To: <20130319082732.GB1367@garage.freebsd.pl> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 09:59:19 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 19-03-2013 09:27, Pawel Jakub Dawidek wrote: > On Tue, Mar 19, 2013 at 12:03:48AM +0100, Thomas Steen Rasmussen wrote: >> Hello there, >> >> I was happy to see TRIM support in UFS and ZFS, however: >> I would really like to see TRIM support for GELI volumes. > > At this point I am convinced that TRIM support should be added to GELI. Great! >> I finally got an SSD with TRIM support for the laptop, but I can't >> really use it with GELI disk encryption because the lack of TRIM >> support makes writing to the disk really slow after a while. > > This is not what I see. On one of my SSDs in my laptop I've two > partitions, both running ZFS, but one of them on top of GELI. > I don't use ZFS TRIM yet, as I see no slowdown whatsoever. > > How can you say this is lack of TRIM slowing your writes? > The performance degraded over time? Yes, the performance degraded a lot over time - in the beginning it was very fast, but as as soon as I had written the amount of space that the disk can hold, writes slowed down to a crawl. The disk was not full, data had been written and deleted, but once I had written approx the 120GB that is the size of the SSD, so it had to start deleting sectors to write, it got really slow. It still reads and for example boots fast, but if I download a large file on a fast connection, the entire machine freezes while it writes. I jumped on the SSD wagon pretty early, before the SSDs had TRIM support, so I have a good idea of what it "feels like". 
I have also confirmed that plain UFS with TRIM straight on the SSD (without GELI) is plenty fast on the same hardware. So I am indeed reasonably certain that this is caused by the lack of TRIM support. Pawel, on the system where you don't have TRIM enabled, but you still have decent performance: have you written more data than the capacity of the disk yet ? If not, the wear-levelling of the SSD just writes to fresh sectors and you will not notice a slowdown (yet). >> I've been told this is not a huge job, but I wouldn't know. > > It isn't. Great! :) > - Add -t and -T flags to geli init/onetime/configure subcommands. > -t will enable TRIM and -T will disable it. TRIM should be enabled by > default for providers that are only encrypted and disabled by default > for providers with integrity verification. > > - Add G_ELI_FLAG_TRIM flag that is set by default and configured using > new switches above. > > - Update g_eli.c to pass BIO_DELETEs down if the G_ELI_FLAG_TRIM flag is > set. If BIO_DELETE returns EOPNOTSUPP error, the G_ELI_FLAG_TRIM > should be removed from the in-memory structure (but not from on-disk > metadata, of course). This does sound pretty doable. Thank you for the info! /Thomas Steen Rasmussen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlFINvIACgkQGjEBQafC9MC7+gCeOx0IbHRMl5JSLkybsZloKyKh xNEAn1hz9o9Az8sPjrKoj3L4Ao63h1K2 =w/U3 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 14:05:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 22360F31 for ; Tue, 19 Mar 2013 14:05:23 +0000 (UTC) (envelope-from prvs=1790199af7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8C5908D6 for ; Tue, 19 Mar 2013 14:05:22 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002807892.msg for ; Tue, 19 Mar 2013 14:05:19 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 19 Mar 2013 14:05:19 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1790199af7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <21EBCFAD4455494EA3893BBBA6F2A7A5@multiplay.co.uk> From: "Steven Hartland" To: "Davide D'Amico" References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> <897DB64CEBAF4F04AE9C76B3F686E497@multiplay.co.uk> <13317bbd289c4c828f134e2c2592a2d7@sys.tomatointeractive.it> <01576f39e05f96ab3b3c822531e0c286@sys.tomatointeractive.it> <0D88348E154D43E58597FF40BA41D22F@multiplay.co.uk> <51482E67.8060900@contactlab.com> <394EAD0E59004D1B8D2010F94DDB7C84@multiplay.co.uk> <51482FB5.2000305@contactlab.com> Subject: Re: FreBSD 9.1 and ZFS v28 performances Date: Tue, 19 Mar 2013 14:05:41 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="UTF-8"; reply-type=response 
Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 14:05:23 -0000
----- Original Message ----- From: "Davide D'Amico" > i don't know if you have installed sysbench 0.5.0 somewhere in you > server, but we could test using *standard* oltp tests (the dataset I'm > using is strictly private) included in sysbench package. > > Is it possible, for you? > > I'm using mysql-5.6.10 enterprise (but I think the one you find in the > ports tree it's a good choice, too).
I've installed the latest sysbench version from ports, which is 0.4.12_1. Having run the standard oltp test in both read-only and read+write modes, it doesn't look like a very good benchmark for disk, as it does very little disk access.
== ZFS == Test execution summary: total time: 248.5637s total number of events: 1000468 total time taken by event execution: 3968.4764
== UFS == Test execution summary: total time: 220.7406s total number of events: 1000588 total time taken by event execution: 3523.5829
I noticed that UFS's TRIM support is very poor compared to ZFS's: it queued all TRIMs until after the test completed and issued many more BIO_DELETE requests, causing significant system overhead and disk saturation after completion.
Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 14:49:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CD982895 for ; Tue, 19 Mar 2013 14:49:04 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-la0-x22a.google.com (mail-la0-x22a.google.com [IPv6:2a00:1450:4010:c03::22a]) by mx1.freebsd.org (Postfix) with ESMTP id 5FDDFBF2 for ; Tue, 19 Mar 2013 14:49:04 +0000 (UTC) Received: by mail-la0-f42.google.com with SMTP id fe20so1078525lab.29 for ; Tue, 19 Mar 2013 07:49:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=1iPTxiZXE6+/qLziTHEc7KH0/2WdGD7Kz9+Vo8Vjzl8=; b=LZrg4wyeRtmn7DPDGpo/NhQetLcGoqkam66VfGeyPF1mME0HrZGSLZS62nBkl2gCfO crLel0P1e8x1CvylzdxjlhuDhXTxG9q7ROFQZ38a01yxXCQ0/xugE7J0aJxF3YiniAm8 tERs7FtVIyLk5RGDc+BrJr8lAe52OuMjrR2LcRG3zJMedyObmJlppjLDqq/dutHUEJWt MLljANMyff87PEcQ+uRjNp1puZMEClCQMwNM63RVsngebQNVfjX/Fijfm0lIyeIGyQqt b/hD/rzGK2a3aTxGLGu0xGY0R2XkeJLuiNFJ/PFuiuZu0tki7jNXRv9Nmk3P6Zy31o3+ Okpg== MIME-Version: 1.0 X-Received: by 10.112.43.232 with SMTP id z8mr8139495lbl.135.1363704542910; Tue, 19 Mar 2013 07:49:02 -0700 (PDT) Received: by 10.112.26.135 with HTTP; Tue, 19 Mar 2013 07:49:02 -0700 (PDT) Date: Tue, 19 Mar 2013 14:49:02 +0000 Message-ID: Subject: ZFS TRIM support MFC From: Tom Evans To: FreeBSD FS Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 14:49:04 -0000 Hi all I was wondering if there was an anticipated date when the ZFS TRIM support would be MFC'd to a stable branch? 
Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 14:57:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0BE73F7A for ; Tue, 19 Mar 2013 14:57:21 +0000 (UTC) (envelope-from prvs=1790199af7=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 9EC0BD10 for ; Tue, 19 Mar 2013 14:57:20 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002808389.msg for ; Tue, 19 Mar 2013 14:57:14 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 19 Mar 2013 14:57:14 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1790199af7=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Tom Evans" , "FreeBSD FS" References: Subject: Re: ZFS TRIM support MFC Date: Tue, 19 Mar 2013 14:57:33 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 14:57:21 -0000 ----- Original Message ----- From: "Tom Evans" > I was wondering if there was an anticipated date when the ZFS TRIM > support would be MFC'd to a stable branch? That's likely something that I'll do. I've just finished a round of patches which are in review ATM and will hopefully get committed in the next week or two. After those have had time to bed down in HEAD, another few weeks, then would be the first point at which this should be MFC'ed IMO. That said we've been running TRIM on 8.3 for months in production so if your happy with back porting the changes yourself you should have no problem doing so. If you happen to be on 8 then I could provide you with our current patch set. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 14:57:41 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 71D76FEC for ; Tue, 19 Mar 2013 14:57:41 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id B8C9CD1E for ; Tue, 19 Mar 2013 14:57:40 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA14239; Tue, 19 Mar 2013 16:57:38 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <51487CE1.5090703@FreeBSD.org> Date: Tue, 19 Mar 2013 16:57:37 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130313 Thunderbird/17.0.4 MIME-Version: 1.0 To: Freddie Cash Subject: Re: Strange slowdown when cache devices enabled in ZFS References: <51430744.6020004@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 14:57:41 -0000 on 15/03/2013 16:58 Freddie Cash said the following: > How does one do that? I've never done that before. > > Point me to some docs, and I'll see what I can find out. hwpmc(4), pmcstat(8) and/or /usr/share/dtrace/toolkit/hotkernel and/or (just an example) dtrace -n 'profile:::profile-4001 { @stacks[pid, tid, execname, stack()] = count(); } END { trunc(@stacks, 10); printa(@stacks); }' > On Fri, Mar 15, 2013 at 4:34 AM, Andriy Gapon > wrote: > > on 14/03/2013 20:13 Freddie Cash said the following: > > the l2arc_feed_thread of zfskern will spin until it takes up 100% > > of a CPU core > > If you see a thread taking 100% where it shouldn't, then just profile it and > actually see what it's doing. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 15:00:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D27B218E for ; Tue, 19 Mar 2013 15:00:44 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-la0-x235.google.com (mail-la0-x235.google.com [IPv6:2a00:1450:4010:c03::235]) by mx1.freebsd.org (Postfix) with ESMTP id 62CCED53 for ; Tue, 19 Mar 2013 15:00:44 +0000 (UTC) Received: by mail-la0-f53.google.com with SMTP id fr10so1111783lab.12 for ; Tue, 19 Mar 2013 08:00:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=CXEtu6900fdXV8kOGgWP/JeetNOhJYvgONx3+8a+O+A=; b=nPL7Y+OZyo3VERXYgTAtu9rwpBwrVENYum7NpgN4v+MxMSNW1L4AE8ivVzyB7QviOf SBJu8hk+LSanOL8eVUHJEgSa7WFKpV/JTJ9dQYa9KV4OE+2JSTWtbBEQ9Fqe4HfQJzWt bXTi7NDbNVt40wwxSfP58YU0Y3cSIXeRKJhR0zmT0D1nqoTOwuVlvtu42aTJybGUV3og 3KUXaZbICGAfzpKkGUgObf0abB3rmpmgldV6m7+c3Y4JIvVjjiQWsL1q7T8TaCRY2wdL YXa3H6cSrcml3XjIq8CUG6w7PpD4jEO9/BlL/DOa6PjP1OI2CszQlvCrUhsksPYcu0LV +hyQ== MIME-Version: 1.0 X-Received: by 10.112.23.35 with SMTP id j3mr8001375lbf.60.1363705243290; Tue, 19 Mar 2013 08:00:43 -0700 (PDT) Received: by 10.112.26.135 with HTTP; Tue, 19 Mar 2013 08:00:43 -0700 (PDT) In-Reply-To: References: Date: Tue, 19 Mar 2013 15:00:43 +0000 Message-ID: Subject: Re: ZFS TRIM support MFC From: Tom Evans To: Steven Hartland Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 15:00:44 -0000 On Tue, Mar 19, 2013 at 2:57 PM, Steven Hartland wrote: > ----- Original Message ----- From: "Tom Evans" > > >> I was wondering if there was an anticipated date when the ZFS TRIM >> support would be MFC'd to a stable branch? > > > That's likely something that I'll do. I've just finished a round of > patches which are in review ATM and will hopefully get committed > in the next week or two. Excellent news! I didn't want to hassle anyone over this, but noticed it was coming up to it's 6 month anniversary on HEAD :) > > After those have had time to bed down in HEAD, another few weeks, > then would be the first point at which this should be MFC'ed IMO. > > That said we've been running TRIM on 8.3 for months in production > so if your happy with back porting the changes yourself you should > have no problem doing so. If you happen to be on 8 then I could > provide you with our current patch set. I'm on 9. I saw your big list of patches for 8.3, but I was concerned that I wouldn't know which ones are necessary, which ones are not, and which ones had already been merged to 9. I'm happy to wait! 
Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 17:45:59 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 99745FD8 for ; Tue, 19 Mar 2013 17:45:59 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 1409495F for ; Tue, 19 Mar 2013 17:45:58 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r2JHjvU5057025 for ; Tue, 19 Mar 2013 21:45:57 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 19 Mar 2013 21:45:57 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-fs@FreeBSD.org Subject: LSI 9260: is there a way to configure it JBOD like mps? Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (woozle.rinet.ru [0.0.0.0]); Tue, 19 Mar 2013 21:45:57 +0400 (MSK) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 17:45:59 -0000 Dear colleagues, I'm currently in process of making new backup server, based on LSI 9260 controller. I'm planning to use ZFS over disks, hence the most natural way seems to configure mfi to JBOD mode - but I can't find easy way to reach this, neither in BIOS utilities nor via MegaCli Any hints? Thanks! -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 18:00:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 382B2BBB for ; Tue, 19 Mar 2013 18:00:52 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-la0-x236.google.com (mail-la0-x236.google.com [IPv6:2a00:1450:4010:c03::236]) by mx1.freebsd.org (Postfix) with ESMTP id BE560A4B for ; Tue, 19 Mar 2013 18:00:51 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id gw10so1503502lab.13 for ; Tue, 19 Mar 2013 11:00:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=u781O+GnNG6xrtxldW2sGmz8LSO5bsU8O3ZlIHmuk4k=; b=gdoKyiU6MgnZeBIlHhuI4O25Pae7dVpGSU3V77Fw785gBJ9nYUEaIupi1cRWTTvyDM fS2NmaQ2MROW6QGvJOxdcA8bA4C4/t8+eoYwmA+ZPjl5vanL1YX4Vj9Khpe8qlOGD3g4 hJH2HQUjda5G8rpbJISCSZZmkexcsy2/W2cblF20PiahkEijx+3kz2z6VmILtOZUUBAB ErOvlJ/SHyTVdWb20tOUJrFdnoa/9/XhysUHf9BEFYQLNKGbyHulpOsbUUuTIcuCKRQp fHrhDFIrzKF80fmg+UsH/SSN9ASrd2BJNY9m93qExamqJMgIAeHTTKARIW6vnJomF/Qx nnPA== MIME-Version: 1.0 X-Received: by 10.112.40.228 with SMTP id a4mr8408737lbl.26.1363716050530; Tue, 19 Mar 2013 11:00:50 -0700 (PDT) Received: by 10.112.26.135 with HTTP; Tue, 19 Mar 2013 11:00:50 -0700 (PDT) In-Reply-To: References: Date: Tue, 19 Mar 2013 18:00:50 +0000 Message-ID: Subject: Re: LSI 9260: is there a way to configure it JBOD like 
mps? From: Tom Evans To: Dmitry Morozovsky Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 18:00:52 -0000 On Tue, Mar 19, 2013 at 5:45 PM, Dmitry Morozovsky wrote: > Dear colleagues, > > I'm currently in process of making new backup server, based on LSI 9260 > controller. I'm planning to use ZFS over disks, hence the most natural way > seems to configure mfi to JBOD mode - but I can't find easy way to reach this, > neither in BIOS utilities nor via MegaCli > > Any hints? > > Thanks! > 9260 should be SAS-2008 based, so mps(4) not mfi(4). The internet[1] suggests that this card should be flashable to a 9211-8i with IT mode firmware, which is just about the ultimate ZFS card, instant-JBOD on inserting a disk, passthru for SMART, high performance, etc. Cheers Tom [1] http://blog.grem.de/sysadmin/LSI-SAS2008-Flashing-2012-04-12-22-17.html From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 18:27:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 89F31346 for ; Tue, 19 Mar 2013 18:27:52 +0000 (UTC) (envelope-from michael@fuckner.net) Received: from mo6-p00-ob.rzone.de (mo6-p00-ob.rzone.de [IPv6:2a01:238:20a:202:5300::1]) by mx1.freebsd.org (Postfix) with ESMTP id 260B0C25 for ; Tue, 19 Mar 2013 18:27:51 +0000 (UTC) X-RZG-AUTH: :IWUHfUGtd9+6EujMWHx57N4dWae4bmTL/JIGbzkGUoozgk+7q1hCDwOR4JI33JuV X-RZG-CLASS-ID: mo00 Received: from [10.1.2.100] (port-47413.pppoe.wtnet.de [46.59.233.239]) by smtp.strato.de (josoe mo41) (RZmta 31.21 AUTH) with ESMTPA id i01235p2JGvUaI for ; Tue, 19 Mar 2013 19:27:43 +0100 (CET) Message-ID: <5148A017.2060108@fuckner.net> Date: Tue, 19 Mar 2013 18:27:51 +0100 From: Michael Fuckner User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 18:27:52 -0000 Am 3/19/2013 7:00 PM, schrieb Tom Evans: > On Tue, Mar 19, 2013 at 5:45 PM, Dmitry Morozovsky wrote: >> Dear colleagues, >> >> I'm currently in process of making new backup server, based on LSI 9260 >> controller. I'm planning to use ZFS over disks, hence the most natural way >> seems to configure mfi to JBOD mode - but I can't find easy way to reach this, >> neither in BIOS utilities nor via MegaCli there is no such way :( > 9260 should be SAS-2008 based, so mps(4) not mfi(4). IMHO 9260 is a megaraid card (with Cache and BBU Option) based on 2108, not 2008 Chipset. It is not flashable to an HBA. This wouldn't make sense anyway since HBAs are way cheaper than Raid Cards. I also tried to configure megaraid for ZFS- the good thing is that you can make use of the controller cache. The bad thing is: it is very nasty to configure: every disk needs to be an own raidset with a Raid0 array. When a disk fails- the controller complains about unrecoverable raids/volumes. There is nothing like an HBA mode. 
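For completeness, MegaCli can, if memory serves, create that one-RAID0-per-disk layout in a single command, along the lines of the following; the exact option name and any cache-policy arguments should be double-checked against MegaCli's help output before relying on them:

# MegaCli -CfgEachDskRaid0 WB -aALL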
I saw Adaptec claims to have controllers behaving like an HBA with cache, but I think it will still take some weeks until I have my card. Regards, Michael!
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 18:36:13 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8603A6AE for ; Tue, 19 Mar 2013 18:36:13 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 4DC76D0C for ; Tue, 19 Mar 2013 18:36:13 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:a1ea:346e:4c51:7ab3]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 0F76F4AC58; Tue, 19 Mar 2013 22:36:11 +0400 (MSK) Date: Tue, 19 Mar 2013 22:36:08 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1584925436.20130319223608@serebryakov.spb.ru> To: Dmitry Morozovsky Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 18:36:13 -0000
Hello, Dmitry. You wrote on 19 March 2013, 21:45:57: DM> I'm currently in process of making new backup server, based on LSI 9260 DM> controller. I'm planning to use ZFS over disks, hence the most natural way DM> seems to configure mfi to JBOD mode - but I can't find easy way to reach this, DM> neither in BIOS utilities nor via MegaCli
The ZFS Administration Guide from Sun, errr, Oracle, says that it is a bad idea to use anything but raw spindles for ZFS. You don't need JBOD mode, you need many separate disks, added to a ZFS vdev. It allows ZFS to plan parallel requests & distribute vital metainformation properly.
-- // Black Lion AKA Lev Serebryakov
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 18:56:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C6326FD; Tue, 19 Mar 2013 18:56:54 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [46.4.40.135]) by mx1.freebsd.org (Postfix) with ESMTP id 8514BDED; Tue, 19 Mar 2013 18:56:54 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:a1ea:346e:4c51:7ab3]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 49ACF4AC58; Tue, 19 Mar 2013 22:56:46 +0400 (MSK) Date: Tue, 19 Mar 2013 22:56:42 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1954349453.20130319225642@serebryakov.spb.ru> To: Pawel Jakub Dawidek Subject: Re: When will we see TRIM support for GELI volumes ?
In-Reply-To: <20130319082732.GB1367@garage.freebsd.pl> References: <51479D54.1040509@gibfest.dk> <20130319082732.GB1367@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 18:56:54 -0000 Hello, Pawel. You wrote on 19 March 2013, 12:27:32: PJD> This is not what I see. On one of my SSDs in my laptop I've two PJD> partitions, both running ZFS, but one of them on top of GELI. PJD> I don't use ZFS TRIM yet, as I see no slowdown whatsoever. It depends on your SSD controller and write rate. SandForce-based SSDs degrade badly without TRIM and cannot recover performance by themselves if there is a lot of writes. But modern SSDs on Marvell, Indilinx and LAMP-based controllers restore write performance after some idle time - not as effectively as with TRIM, but to a very good level. And SF-2281 based SSDs suck in this area badly. -- // Black Lion AKA Lev Serebryakov
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 19:04:23 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1795A3F0 for ; Tue, 19 Mar 2013 19:04:23 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id 9BF3CE43 for ; Tue, 19 Mar 2013 19:04:22 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r2JJ4EXZ060868; Tue, 19 Mar 2013 23:04:14 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 19 Mar 2013 23:04:14 +0400 (MSK) From: Dmitry Morozovsky To: Tom Evans Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (woozle.rinet.ru [0.0.0.0]); Tue, 19 Mar 2013 23:04:14 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 19:04:23 -0000 On Tue, 19 Mar 2013, Tom Evans wrote: > > I'm currently in process of making new backup server, based on LSI 9260 > > controller. I'm planning to use ZFS over disks, hence the most natural way > > seems to configure mfi to JBOD mode - but I can't find easy way to reach this, > > neither in BIOS utilities nor via MegaCli > > 9260 should be SAS-2008 based, so mps(4) not mfi(4). Well, it is at least detected by stable/9 GENERIC as mfi > The internet[1] suggests that this card should be flashable to a > 9211-8i with IT mode firmware, which is just about the ultimate ZFS > card, instant-JBOD on inserting a disk, passthru for SMART, high > performance, etc. Will check, thanks for the reference.
> [1] http://blog.grem.de/sysadmin/LSI-SAS2008-Flashing-2012-04-12-22-17.html > -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 20:45:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A103FCD8 for ; Tue, 19 Mar 2013 20:45:42 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-vc0-f173.google.com (mail-vc0-f173.google.com [209.85.220.173]) by mx1.freebsd.org (Postfix) with ESMTP id 59094662 for ; Tue, 19 Mar 2013 20:45:42 +0000 (UTC) Received: by mail-vc0-f173.google.com with SMTP id gd11so774926vcb.18 for ; Tue, 19 Mar 2013 13:45:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:date:message-id:subject:from:to :content-type; bh=Jd2vuCNMXPekFiWoAhcCnrqbsagPz7kdBqDucUPeamg=; b=QbbE9+3zdH4cc8wsavLi5Iu0NYYSDvH7F9ydvQdifZOToDhz8x672/aQ8aMe8paTA5 aapxAwPooo8orX3eZAjv3LTZQG9C2XrD3jOGqZLPSL99ZVCnrAnCoq55qq+aceiFk32K Jxq9dx8QXtzvZn03+osscpiqXlou2SYrjQOIeDeIjL+bu8Xr9ZRCC5n8MhbFEqIZh2+f GXkVuUsDgOvAKjP/9sMRd+6H0V8kbmBXocgvr4Y92AknKZNztf414VthV5YcVQtLIh/+ EhZa4YFOh90hQIB5WNg8W2rjz4ISGgomnCL9SWDpQTy73L1Jb6Y4AIEKepUzlil7DpvJ AxPQ== MIME-Version: 1.0 X-Received: by 10.220.222.8 with SMTP id ie8mr4674577vcb.27.1363725936614; Tue, 19 Mar 2013 13:45:36 -0700 (PDT) Received: by 10.220.232.6 with HTTP; Tue, 19 Mar 2013 13:45:36 -0700 (PDT) Date: Tue, 19 Mar 2013 16:45:36 -0400 Message-ID: Subject: ZFS: Almost a minute of dirty buffers? From: Zaphod Beeblebrox To: freebsd-fs Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 20:45:42 -0000 During a recent protracted power outage at home, it came time to shutdown my ZFS fileserver. This doesn't happen often --- it's a reliable performer. The kicker is that _after_ the buffers have been sync'd for UFS (root/var/usr on UFS), ZFS spends some time shutting down --- or that's what I believe since the disk usage lights on the ZFS drives are going crazy. ... and ZFS takes nearly a minute of very active disk to shutdown ?!!? Are these dirty buffers? What is it doing? This period of disk blinking seems to be related to uptime (ie: longer uptime, longer blinking on shutdown). 
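One way to get a rough idea of how much un-synced data ZFS is holding at any moment is to look at the anonymous (dirty) buffers in the ARC and at the txg sync interval. A sketch only: the first sysctl name is an assumption and may not exist on every 9.x build, the second is standard:

  sysctl kstat.zfs.misc.arcstats.anon_size   # approx. dirty data waiting for the next txg
  sysctl vfs.zfs.txg.timeout                 # seconds between forced txg syncs

Comparing that number just before a shutdown with the length of the disk activity should show whether dirty data is really what is being flushed.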
For the curious, my ZFS config is: [1:1:301]root@virtual:~> zpool status pool: vr2 state: ONLINE scan: resilvered 30.8M in 0h2m with 0 errors on Tue Feb 26 20:41:45 2013 config: NAME STATE READ WRITE CKSUM vr2 ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 label/vr2-d0 ONLINE 0 0 0 label/vr2-d1 ONLINE 0 0 0 label/vr2-d2a ONLINE 0 0 0 label/vr2-d3a ONLINE 0 0 0 label/vr2-d4 ONLINE 0 0 0 label/vr2-d5 ONLINE 0 0 0 label/vr2-d6 ONLINE 0 0 0 label/vr2-d7c ONLINE 0 0 0 label/vr2-d8 ONLINE 0 0 0 raidz1-1 ONLINE 0 0 0 gpt/vr2-e0 ONLINE 0 0 0 gpt/vr2-e1 ONLINE 0 0 0 gpt/vr2-e2 ONLINE 0 0 0 gpt/vr2-e3 ONLINE 0 0 0 gpt/vr2-e4 ONLINE 0 0 0 gpt/vr2-e5 ONLINE 0 0 0 gpt/vr2-e6 ONLINE 0 0 0 gpt/vr2-e7 ONLINE 0 0 0 errors: No known data errors I know that the two vdevs are not the same size (9 disks and 8 disks), but I noticed this behavior when there was only one vdev in this array, too. Most of the ZFS usage is via NFS, SMB or iSCSI. From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 21:09:57 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CAA6351D for ; Tue, 19 Mar 2013 21:09:57 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id 96EBE775 for ; Tue, 19 Mar 2013 21:09:57 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76) (envelope-from ) id 1UI3CU-0001ko-DQ for freebsd-fs@freebsd.org; Tue, 19 Mar 2013 16:32:02 -0400 Message-ID: <5148CB42.6090001@cse.yorku.ca> Date: Tue, 19 Mar 2013 16:32:02 -0400 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: best freebsd version for zfs file server References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: Hi. I hope to soon put into production a new file server hosting many ZFS filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out what is ultimately the "best" version of FreeBSD to run on a production file server. I believe that it doesn't make sense to stick directly to the 9.1/release because there have already been many ZFS problems that were solved in 9.1/stable. On the other hand, stable doesn't necessarily have to be "stable"! Of course "release" might not be "stable" either if there's a bug that say, causes a hang on my controller card, and it's not fixed in anything but "stable"! Yet, "stable" might "break" something else. I'm wondering what people who are running FreeBSD file servers in production do -- do you track individual changes, and compile release + individual bug fixes that likely affect you, or, in my case, if I run "stable", do all my testing with "stable", do I run that version of stable, and only attempt to upgrade to the next "stable" release while very carefully reviewing the bug list, then holding my breath when the server comes up? Any recommendations would be appreciated. I know there are a lot of people who are happily running FreeBSD file servers. :) [...] 
Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 21:09:57 -0000 Hi. I hope to soon put into production a new file server hosting many ZFS filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out what is ultimately the "best" version of FreeBSD to run on a production file server. I believe that it doesn't make sense to stick directly to the 9.1/release because there have already been many ZFS problems that were solved in 9.1/stable. On the other hand, stable doesn't necessarily have to be "stable"! Of course "release" might not be "stable" either if there's a bug that say, causes a hang on my controller card, and it's not fixed in anything but "stable"! Yet, "stable" might "break" something else. I'm wondering what people who are running FreeBSD file servers in production do -- do you track individual changes, and compile release + individual bug fixes that likely affect you, or, in my case, if I run "stable", do all my testing with "stable", do I run that version of stable, and only attempt to upgrade to the next "stable" release while very carefully reviewing the bug list, then holding my breath when the server comes up? Any recommendations would be appreciated. I know there are a lot of people who are happily running FreeBSD file servers. :) Jason. On 03/19/2013 03:04 PM, Dmitry Morozovsky wrote: > On Tue, 19 Mar 2013, Tom Evans wrote: > >>> I'm currently in process of making new backup server, based on LSI 9260 >>> controller. I'm planning to use ZFS over disks, hence the most natural way >>> seems to configure mfi to JBOD mode - but I can't find easy way to reach this, >>> neither in BIOS utilities nor via MegaCli >> 9260 should be SAS-2008 based, so mps(4) not mfi(4). > Well, it at least detected by stable/9 GENERIC as mfi > >> The internet[1] suggests that this card should be flashable to a >> 9211-8i with IT mode firmware, which is just about the ultimate ZFS >> card, instant-JBOD on inserting a disk, passthru for SMART, high >> performance, etc. > Will check, thanks for the reference. 
> >> [1] http://blog.grem.de/sysadmin/LSI-SAS2008-Flashing-2012-04-12-22-17.html >> From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 21:35:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A3176A4A for ; Tue, 19 Mar 2013 21:35:41 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-f41.google.com (mail-qa0-f41.google.com [209.85.216.41]) by mx1.freebsd.org (Postfix) with ESMTP id 6A3F0888 for ; Tue, 19 Mar 2013 21:35:41 +0000 (UTC) Received: by mail-qa0-f41.google.com with SMTP id bs12so2701333qab.14 for ; Tue, 19 Mar 2013 14:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=LSZ8vcbF9A+Oo9JTmGuzAnk8XfVEHYfx2kJLdpLxLww=; b=bbqcrJfafYAHgPo0hYmCiMy/XjezpweX7PedKauvTTuvbYyRdlHH10DIb7/+tPnsKw JxTWaGsrtNgTAJ+xY1JqSDiD/HT0z94FEbpPDQReTx3Wc45SSc5k/9KNLrWl8c+O8QpS JFPauE4kAtCBaLGeLLizM/oAOF9t3rO7Fj0s5Xum6j6e2mdRNVJ73E6w73Vd5TiMzHXK s+Q92jpzGAcdacn6PiytETitbiSSNh+QoaXw/Wdu3BWV/dUICzbSdedCSyvk3a8oBRm4 Rz7ri+d2QX7bOq6gu/stgakGWQq9f4Fwumqy/gQVWauJHwR4kX9RTAusnU7xm2R8qVIC NwmA== MIME-Version: 1.0 X-Received: by 10.229.76.209 with SMTP id d17mr797363qck.78.1363728934832; Tue, 19 Mar 2013 14:35:34 -0700 (PDT) Received: by 10.49.50.67 with HTTP; Tue, 19 Mar 2013 14:35:34 -0700 (PDT) In-Reply-To: <5148CB42.6090001@cse.yorku.ca> References: <5148CB42.6090001@cse.yorku.ca> Date: Tue, 19 Mar 2013 14:35:34 -0700 Message-ID: Subject: Re: best freebsd version for zfs file server From: Freddie Cash To: Jason Keltz Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 21:35:41 -0000 9-STABLE after r248369. That enables: - feature flags - lz4 compression - deadman thread - various other bug fixes, tweaks, updates, etc If you don't want to run the very latest stable, the following command is useful: svn co -r 248369 svn://svn0.us-west.freebsd.org/base/stable/9 /usr/src I read through the svn-src-stable-9 mailing list archives to see if there are any interesting commits for devices or services that I use, and update to those revisions as needed. 
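The follow-up to spotting an interesting commit is the normal source upgrade cycle against that revision; a condensed sketch, where the revision number is just the example from above and the usual caveats from /usr/src/UPDATING apply:

  svn update -r 248369 /usr/src
  cd /usr/src
  make buildworld && make buildkernel
  make installkernel
  shutdown -r now
  # after rebooting into the new kernel:
  cd /usr/src && make installworld && mergemaster -Ui

and repeat whenever a commit worth having lands in stable/9.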
For example, there's some nice mps(4) and zfs-related commits listed here: http://lists.freebsd.org/pipermail/svn-src-stable-9/2013-March/thread.html -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue Mar 19 21:49:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D32BDED8 for ; Tue, 19 Mar 2013 21:49:22 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 25792908 for ; Tue, 19 Mar 2013 21:49:20 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAJjcSFGDaFvO/2dsb2JhbAA7CIglvHiBcnSCJAEBAQMBAQEBIAQnIAsFFg4KERkCBCUBCSYGCAcEARwEh20GDK82gkCQDY1FFQQGdhkbB4ItgRMDjzyEZYI+gR+PY4MmIDJ9CBce X-IronPort-AV: E=Sophos;i="4.84,874,1355115600"; d="scan'208";a="19806651" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 19 Mar 2013 17:49:19 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 11AB2B3EEF; Tue, 19 Mar 2013 17:49:19 -0400 (EDT) Date: Tue, 19 Mar 2013 17:49:19 -0400 (EDT) From: Rick Macklem To: Jason Keltz Message-ID: <312742115.4078123.1363729759052.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <5148CB42.6090001@cse.yorku.ca> Subject: Re: best freebsd version for zfs file server MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_4078122_800517510.1363729759048" X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Mar 2013 21:49:22 -0000 ------=_Part_4078122_800517510.1363729759048 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Jason Keltz wrote: > Hi. > I hope to soon put into production a new file server hosting many ZFS > filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x > 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out > what is ultimately the "best" version of FreeBSD to run on a > production > file server. I believe that it doesn't make sense to stick directly to > the 9.1/release because there have already been many ZFS problems that > were solved in 9.1/stable. On the other hand, stable doesn't > necessarily have to be "stable"! Of course "release" might not be > "stable" either if there's a bug that say, causes a hang on my > controller card, and it's not fixed in anything but "stable"! Yet, > "stable" might "break" something else. I'm wondering what people who > are running FreeBSD file servers in production do -- do you track > individual changes, and compile release + individual bug fixes that > likely affect you, or, in my case, if I run "stable", do all my > testing > with "stable", do I run that version of stable, and only attempt to > upgrade to the next "stable" release while very carefully reviewing > the > bug list, then holding my breath when the server comes up? Any > recommendations would be appreciated. I know there are a lot of people > who are happily running FreeBSD file servers. :) > > Jason. 
> You might want to consider the attached patch which Garrett Wollman has been testing. It is not even in head yet, but earlier versions of the patch have been in testing for a while. It allows you to adjust tunables to trade increased storage use in the DRC (mostly mbuf clusters) for decreased mutex lock contention and cpu overheads. rick > On 03/19/2013 03:04 PM, Dmitry Morozovsky wrote: > > On Tue, 19 Mar 2013, Tom Evans wrote: > > > >>> I'm currently in process of making new backup server, based on LSI > >>> 9260 > >>> controller. I'm planning to use ZFS over disks, hence the most > >>> natural way > >>> seems to configure mfi to JBOD mode - but I can't find easy way to > >>> reach this, > >>> neither in BIOS utilities nor via MegaCli > >> 9260 should be SAS-2008 based, so mps(4) not mfi(4). > > Well, it at least detected by stable/9 GENERIC as mfi > > > >> The internet[1] suggests that this card should be flashable to a > >> 9211-8i with IT mode firmware, which is just about the ultimate ZFS > >> card, instant-JBOD on inserting a disk, passthru for SMART, high > >> performance, etc. > > Will check, thanks for the reference. > > > >> [1] > >> http://blog.grem.de/sysadmin/LSI-SAS2008-Flashing-2012-04-12-22-17.html > >> > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" ------=_Part_4078122_800517510.1363729759048 Content-Type: text/x-patch; name=drc4.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=drc4.patch LS0tIGZzL25mc3NlcnZlci9uZnNfbmZzZGNhY2hlLmMub3JpZwkyMDEzLTAxLTA3IDA5OjA0OjEz LjAwMDAwMDAwMCAtMDUwMAorKysgZnMvbmZzc2VydmVyL25mc19uZnNkY2FjaGUuYwkyMDEzLTAz LTEyIDIyOjQyOjA1LjAwMDAwMDAwMCAtMDQwMApAQCAtMTYwLDEyICsxNjAsMzEgQEAgX19GQlNE SUQoIiRGcmVlQlNEOiBwcm9qZWN0cy9uZnN2NC1wYWNrcgogI2luY2x1ZGUgPGZzL25mcy9uZnNw b3J0Lmg+CiAKIGV4dGVybiBzdHJ1Y3QgbmZzc3RhdHMgbmV3bmZzc3RhdHM7Ci1ORlNDQUNIRU1V VEVYOworZXh0ZXJuIHN0cnVjdCBtdHggbmZzcmNfdGNwbXR4W05GU1JWQ0FDSEVfSEFTSFNJWkVd OworZXh0ZXJuIHN0cnVjdCBtdHggbmZzcmNfdWRwbXR4OwogaW50IG5mc3JjX2Zsb29kbGV2ZWwg PSBORlNSVkNBQ0hFX0ZMT09ETEVWRUwsIG5mc3JjX3RjcHNhdmVkcmVwbGllcyA9IDA7CiAjZW5k aWYJLyogIUFQUExFS0VYVCAqLwogCi1zdGF0aWMgaW50IG5mc3JjX3RjcG5vbmlkZW1wb3RlbnQg PSAxOwotc3RhdGljIGludCBuZnNyY191ZHBoaWdod2F0ZXIgPSBORlNSVkNBQ0hFX1VEUEhJR0hX QVRFUiwgbmZzcmNfdWRwY2FjaGVzaXplID0gMDsKK1NZU0NUTF9ERUNMKF92ZnNfbmZzZCk7CisK K3N0YXRpYyB1X2ludAluZnNyY190Y3BoaWdod2F0ZXIgPSAwOworU1lTQ1RMX1VJTlQoX3Zmc19u ZnNkLCBPSURfQVVUTywgdGNwaGlnaHdhdGVyLCBDVExGTEFHX1JXLAorICAgICZuZnNyY190Y3Bo aWdod2F0ZXIsIDAsCisgICAgIkhpZ2ggd2F0ZXIgbWFyayBmb3IgVENQIGNhY2hlIGVudHJpZXMi KTsKK3N0YXRpYyB1X2ludAluZnNyY191ZHBoaWdod2F0ZXIgPSBORlNSVkNBQ0hFX1VEUEhJR0hX QVRFUjsKK1NZU0NUTF9VSU5UKF92ZnNfbmZzZCwgT0lEX0FVVE8sIHVkcGhpZ2h3YXRlciwgQ1RM RkxBR19SVywKKyAgICAmbmZzcmNfdWRwaGlnaHdhdGVyLCAwLAorICAgICJIaWdoIHdhdGVyIG1h cmsgZm9yIFVEUCBjYWNoZSBlbnRyaWVzIik7CitzdGF0aWMgdV9pbnQJbmZzcmNfdGNwdGltZW91 dCA9IE5GU1JWQ0FDSEVfVENQVElNRU9VVDsKK1NZU0NUTF9VSU5UKF92ZnNfbmZzZCwgT0lEX0FV VE8sIHRjcGNhY2hldGltZW8sIENUTEZMQUdfUlcsCisgICAgJm5mc3JjX3RjcHRpbWVvdXQsIDAs CisgICAgIlRpbWVvdXQgZm9yIFRDUCBlbnRyaWVzIGluIHRoZSBEUkMiKTsKK3N0YXRpYyB1X2lu dCBuZnNyY190Y3Bub25pZGVtcG90ZW50ID0gMTsKK1NZU0NUTF9VSU5UKF92ZnNfbmZzZCwgT0lE X0FVVE8sIGNhY2hldGNwLCBDVExGTEFHX1JXLAorICAgICZuZnNyY190Y3Bub25pZGVtcG90ZW50 LCAwLAorICAgICJFbmFibGUgdGhlIERSQyBmb3IgTkZTIG92ZXIgVENQIik7CisKK3N0YXRpYyBp 
bnQgbmZzcmNfdWRwY2FjaGVzaXplID0gMDsKIHN0YXRpYyBUQUlMUV9IRUFEKCwgbmZzcnZjYWNo ZSkgbmZzcnZ1ZHBscnU7CiBzdGF0aWMgc3RydWN0IG5mc3J2aGFzaGhlYWQgbmZzcnZoYXNodGJs W05GU1JWQ0FDSEVfSEFTSFNJWkVdLAogICAgIG5mc3J2dWRwaGFzaHRibFtORlNSVkNBQ0hFX0hB U0hTSVpFXTsKQEAgLTE5NywxMCArMjE2LDExIEBAIHN0YXRpYyBpbnQgbmV3bmZzdjJfcHJvY2lk W05GU19WM05QUk9DU10KIAlORlNWMlBST0NfTk9PUCwKIH07CiAKKyNkZWZpbmUJbmZzcmNfaGFz aCh4aWQpCSgoKHhpZCkgKyAoKHhpZCkgPj4gMjQpKSAlIE5GU1JWQ0FDSEVfSEFTSFNJWkUpCiAj ZGVmaW5lCU5GU1JDVURQSEFTSCh4aWQpIFwKLQkoJm5mc3J2dWRwaGFzaHRibFsoKHhpZCkgKyAo KHhpZCkgPj4gMjQpKSAlIE5GU1JWQ0FDSEVfSEFTSFNJWkVdKQorCSgmbmZzcnZ1ZHBoYXNodGJs W25mc3JjX2hhc2goeGlkKV0pCiAjZGVmaW5lCU5GU1JDSEFTSCh4aWQpIFwKLQkoJm5mc3J2aGFz aHRibFsoKHhpZCkgKyAoKHhpZCkgPj4gMjQpKSAlIE5GU1JWQ0FDSEVfSEFTSFNJWkVdKQorCSgm bmZzcnZoYXNodGJsW25mc3JjX2hhc2goeGlkKV0pCiAjZGVmaW5lCVRSVUUJMQogI2RlZmluZQlG QUxTRQkwCiAjZGVmaW5lCU5GU1JWQ0FDSEVfQ0hFQ0tMRU4JMTAwCkBAIC0yNTEsNiArMjcxLDE4 IEBAIHN0YXRpYyBpbnQgbmZzcmNfZ2V0bGVuYW5kY2tzdW0obWJ1Zl90IG0KIHN0YXRpYyB2b2lk IG5mc3JjX21hcmtzYW1ldGNwY29ubih1X2ludDY0X3QpOwogCiAvKgorICogUmV0dXJuIHRoZSBj b3JyZWN0IG11dGV4IGZvciB0aGlzIGNhY2hlIGVudHJ5LgorICovCitzdGF0aWMgX19pbmxpbmUg c3RydWN0IG10eCAqCituZnNyY19jYWNoZW11dGV4KHN0cnVjdCBuZnNydmNhY2hlICpycCkKK3sK KworCWlmICgocnAtPnJjX2ZsYWcgJiBSQ19VRFApICE9IDApCisJCXJldHVybiAoJm5mc3JjX3Vk cG10eCk7CisJcmV0dXJuICgmbmZzcmNfdGNwbXR4W25mc3JjX2hhc2gocnAtPnJjX3hpZCldKTsK K30KKworLyoKICAqIEluaXRpYWxpemUgdGhlIHNlcnZlciByZXF1ZXN0IGNhY2hlIGxpc3QKICAq LwogQVBQTEVTVEFUSUMgdm9pZApAQCAtMzI1LDEwICszNTcsMTIgQEAgbmZzcmNfZ2V0dWRwKHN0 cnVjdCBuZnNydl9kZXNjcmlwdCAqbmQsIAogCXN0cnVjdCBzb2NrYWRkcl9pbjYgKnNhZGRyNjsK IAlzdHJ1Y3QgbmZzcnZoYXNoaGVhZCAqaHA7CiAJaW50IHJldCA9IDA7CisJc3RydWN0IG10eCAq bXV0ZXg7CiAKKwltdXRleCA9IG5mc3JjX2NhY2hlbXV0ZXgobmV3cnApOwogCWhwID0gTkZTUkNV RFBIQVNIKG5ld3JwLT5yY194aWQpOwogbG9vcDoKLQlORlNMT0NLQ0FDSEUoKTsKKwltdHhfbG9j ayhtdXRleCk7CiAJTElTVF9GT1JFQUNIKHJwLCBocCwgcmNfaGFzaCkgewogCSAgICBpZiAobmV3 cnAtPnJjX3hpZCA9PSBycC0+cmNfeGlkICYmCiAJCW5ld3JwLT5yY19wcm9jID09IHJwLT5yY19w cm9jICYmCkBAIC0zMzYsOCArMzcwLDggQEAgbG9vcDoKIAkJbmZzYWRkcl9tYXRjaChORVRGQU1J TFkocnApLCAmcnAtPnJjX2hhZGRyLCBuZC0+bmRfbmFtKSkgewogCQkJaWYgKChycC0+cmNfZmxh ZyAmIFJDX0xPQ0tFRCkgIT0gMCkgewogCQkJCXJwLT5yY19mbGFnIHw9IFJDX1dBTlRFRDsKLQkJ CQkodm9pZCltdHhfc2xlZXAocnAsIE5GU0NBQ0hFTVVURVhQVFIsCi0JCQkJICAgIChQWkVSTyAt IDEpIHwgUERST1AsICJuZnNyYyIsIDEwICogaHopOworCQkJCSh2b2lkKW10eF9zbGVlcChycCwg bXV0ZXgsIChQWkVSTyAtIDEpIHwgUERST1AsCisJCQkJICAgICJuZnNyYyIsIDEwICogaHopOwog CQkJCWdvdG8gbG9vcDsKIAkJCX0KIAkJCWlmIChycC0+cmNfZmxhZyA9PSAwKQpAQCAtMzQ3LDE0 ICszODEsMTQgQEAgbG9vcDoKIAkJCVRBSUxRX0lOU0VSVF9UQUlMKCZuZnNydnVkcGxydSwgcnAs IHJjX2xydSk7CiAJCQlpZiAocnAtPnJjX2ZsYWcgJiBSQ19JTlBST0cpIHsKIAkJCQluZXduZnNz dGF0cy5zcnZjYWNoZV9pbnByb2doaXRzKys7Ci0JCQkJTkZTVU5MT0NLQ0FDSEUoKTsKKwkJCQlt dHhfdW5sb2NrKG11dGV4KTsKIAkJCQlyZXQgPSBSQ19EUk9QSVQ7CiAJCQl9IGVsc2UgaWYgKHJw LT5yY19mbGFnICYgUkNfUkVQU1RBVFVTKSB7CiAJCQkJLyoKIAkJCQkgKiBWMiBvbmx5LgogCQkJ CSAqLwogCQkJCW5ld25mc3N0YXRzLnNydmNhY2hlX25vbmlkZW1kb25laGl0cysrOwotCQkJCU5G U1VOTE9DS0NBQ0hFKCk7CisJCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCQkJbmZzcnZkX3JlcGhl YWQobmQpOwogCQkJCSoobmQtPm5kX2VycnApID0gcnAtPnJjX3N0YXR1czsKIAkJCQlyZXQgPSBS Q19SRVBMWTsKQEAgLTM2Miw3ICszOTYsNyBAQCBsb29wOgogCQkJCQlORlNSVkNBQ0hFX1VEUFRJ TUVPVVQ7CiAJCQl9IGVsc2UgaWYgKHJwLT5yY19mbGFnICYgUkNfUkVQTUJVRikgewogCQkJCW5l d25mc3N0YXRzLnNydmNhY2hlX25vbmlkZW1kb25laGl0cysrOwotCQkJCU5GU1VOTE9DS0NBQ0hF KCk7CisJCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCQkJbmQtPm5kX21yZXEgPSBtX2NvcHltKHJw LT5yY19yZXBseSwgMCwKIAkJCQkJTV9DT1BZQUxMLCBNX1dBSVRPSyk7CiAJCQkJcmV0ID0gUkNf 
UkVQTFk7CkBAIC0zOTIsNyArNDI2LDcgQEAgbG9vcDoKIAl9CiAJTElTVF9JTlNFUlRfSEVBRCho cCwgbmV3cnAsIHJjX2hhc2gpOwogCVRBSUxRX0lOU0VSVF9UQUlMKCZuZnNydnVkcGxydSwgbmV3 cnAsIHJjX2xydSk7Ci0JTkZTVU5MT0NLQ0FDSEUoKTsKKwltdHhfdW5sb2NrKG11dGV4KTsKIAlu ZC0+bmRfcnAgPSBuZXdycDsKIAlyZXQgPSBSQ19ET0lUOwogCkBAIC00MTAsMTIgKzQ0NCwxNiBA QCBuZnNydmRfdXBkYXRlY2FjaGUoc3RydWN0IG5mc3J2X2Rlc2NyaXB0CiAJc3RydWN0IG5mc3J2 Y2FjaGUgKnJwOwogCXN0cnVjdCBuZnNydmNhY2hlICpyZXRycCA9IE5VTEw7CiAJbWJ1Zl90IG07 CisJc3RydWN0IG10eCAqbXV0ZXg7CiAKKwlpZiAobmZzcmNfdGNwaGlnaHdhdGVyID4gbmZzcmNf Zmxvb2RsZXZlbCkKKwkJbmZzcmNfZmxvb2RsZXZlbCA9IG5mc3JjX3RjcGhpZ2h3YXRlcjsKIAly cCA9IG5kLT5uZF9ycDsKIAlpZiAoIXJwKQogCQlwYW5pYygibmZzcnZkX3VwZGF0ZWNhY2hlIG51 bGwgcnAiKTsKIAluZC0+bmRfcnAgPSBOVUxMOwotCU5GU0xPQ0tDQUNIRSgpOworCW11dGV4ID0g bmZzcmNfY2FjaGVtdXRleChycCk7CisJbXR4X2xvY2sobXV0ZXgpOwogCW5mc3JjX2xvY2socnAp OwogCWlmICghKHJwLT5yY19mbGFnICYgUkNfSU5QUk9HKSkKIAkJcGFuaWMoIm5mc3J2ZF91cGRh dGVjYWNoZSBub3QgaW5wcm9nIik7CkBAIC00MzAsNyArNDY4LDcgQEAgbmZzcnZkX3VwZGF0ZWNh Y2hlKHN0cnVjdCBuZnNydl9kZXNjcmlwdAogCSAqLwogCWlmIChuZC0+bmRfcmVwc3RhdCA9PSBO RlNFUlJfUkVQTFlGUk9NQ0FDSEUpIHsKIAkJbmV3bmZzc3RhdHMuc3J2Y2FjaGVfbm9uaWRlbWRv bmVoaXRzKys7Ci0JCU5GU1VOTE9DS0NBQ0hFKCk7CisJCW10eF91bmxvY2sobXV0ZXgpOwogCQlu ZC0+bmRfcmVwc3RhdCA9IDA7CiAJCWlmIChuZC0+bmRfbXJlcSkKIAkJCW1idWZfZnJlZW0obmQt Pm5kX21yZXEpOwpAQCAtNDM4LDcgKzQ3Niw3IEBAIG5mc3J2ZF91cGRhdGVjYWNoZShzdHJ1Y3Qg bmZzcnZfZGVzY3JpcHQKIAkJCXBhbmljKCJyZXBseSBmcm9tIGNhY2hlIik7CiAJCW5kLT5uZF9t cmVxID0gbV9jb3B5bShycC0+cmNfcmVwbHksIDAsCiAJCSAgICBNX0NPUFlBTEwsIE1fV0FJVE9L KTsKLQkJcnAtPnJjX3RpbWVzdGFtcCA9IE5GU0RfTU9OT1NFQyArIE5GU1JWQ0FDSEVfVENQVElN RU9VVDsKKwkJcnAtPnJjX3RpbWVzdGFtcCA9IE5GU0RfTU9OT1NFQyArIG5mc3JjX3RjcHRpbWVv dXQ7CiAJCW5mc3JjX3VubG9jayhycCk7CiAJCWdvdG8gb3V0OwogCX0KQEAgLTQ2MywyMSArNTAx LDIxIEBAIG5mc3J2ZF91cGRhdGVjYWNoZShzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQKIAkJICAgIG5m c3YyX3JlcHN0YXRbbmV3bmZzdjJfcHJvY2lkW25kLT5uZF9wcm9jbnVtXV0pIHsKIAkJCXJwLT5y Y19zdGF0dXMgPSBuZC0+bmRfcmVwc3RhdDsKIAkJCXJwLT5yY19mbGFnIHw9IFJDX1JFUFNUQVRV UzsKLQkJCU5GU1VOTE9DS0NBQ0hFKCk7CisJCQltdHhfdW5sb2NrKG11dGV4KTsKIAkJfSBlbHNl IHsKIAkJCWlmICghKHJwLT5yY19mbGFnICYgUkNfVURQKSkgewotCQkJICAgIG5mc3JjX3RjcHNh dmVkcmVwbGllcysrOworCQkJICAgIGF0b21pY19hZGRfaW50KCZuZnNyY190Y3BzYXZlZHJlcGxp ZXMsIDEpOwogCQkJICAgIGlmIChuZnNyY190Y3BzYXZlZHJlcGxpZXMgPgogCQkJCW5ld25mc3N0 YXRzLnNydmNhY2hlX3RjcHBlYWspCiAJCQkJbmV3bmZzc3RhdHMuc3J2Y2FjaGVfdGNwcGVhayA9 CiAJCQkJICAgIG5mc3JjX3RjcHNhdmVkcmVwbGllczsKIAkJCX0KLQkJCU5GU1VOTE9DS0NBQ0hF KCk7Ci0JCQltID0gbV9jb3B5bShuZC0+bmRfbXJlcSwgMCwgTV9DT1BZQUxMLCBNX1dBSVRPSyk7 Ci0JCQlORlNMT0NLQ0FDSEUoKTsKKwkJCW10eF91bmxvY2sobXV0ZXgpOworCQkJbSA9IG1fY29w eW0obmQtPm5kX21yZXEsIDAsIE1fQ09QWUFMTCwgTV9XQUlUKTsKKwkJCW10eF9sb2NrKG11dGV4 KTsKIAkJCXJwLT5yY19yZXBseSA9IG07CiAJCQlycC0+cmNfZmxhZyB8PSBSQ19SRVBNQlVGOwot CQkJTkZTVU5MT0NLQ0FDSEUoKTsKKwkJCW10eF91bmxvY2sobXV0ZXgpOwogCQl9CiAJCWlmIChy cC0+cmNfZmxhZyAmIFJDX1VEUCkgewogCQkJcnAtPnJjX3RpbWVzdGFtcCA9IE5GU0RfTU9OT1NF QyArCkBAIC00ODUsNyArNTIzLDcgQEAgbmZzcnZkX3VwZGF0ZWNhY2hlKHN0cnVjdCBuZnNydl9k ZXNjcmlwdAogCQkJbmZzcmNfdW5sb2NrKHJwKTsKIAkJfSBlbHNlIHsKIAkJCXJwLT5yY190aW1l c3RhbXAgPSBORlNEX01PTk9TRUMgKwotCQkJICAgIE5GU1JWQ0FDSEVfVENQVElNRU9VVDsKKwkJ CSAgICBuZnNyY190Y3B0aW1lb3V0OwogCQkJaWYgKHJwLT5yY19yZWZjbnQgPiAwKQogCQkJCW5m c3JjX3VubG9jayhycCk7CiAJCQllbHNlCkBAIC00OTMsNyArNTMxLDcgQEAgbmZzcnZkX3VwZGF0 ZWNhY2hlKHN0cnVjdCBuZnNydl9kZXNjcmlwdAogCQl9CiAJfSBlbHNlIHsKIAkJbmZzcmNfZnJl ZWNhY2hlKHJwKTsKLQkJTkZTVU5MT0NLQ0FDSEUoKTsKKwkJbXR4X3VubG9jayhtdXRleCk7CiAJ fQogCiBvdXQ6CkBAIC01MDksMTQgKzU0NywxNiBAQCBvdXQ6CiBBUFBMRVNUQVRJQyB2b2lkCiBu 
ZnNydmRfZGVsY2FjaGUoc3RydWN0IG5mc3J2Y2FjaGUgKnJwKQogeworCXN0cnVjdCBtdHggKm11 dGV4OwogCisJbXV0ZXggPSBuZnNyY19jYWNoZW11dGV4KHJwKTsKIAlpZiAoIShycC0+cmNfZmxh ZyAmIFJDX0lOUFJPRykpCiAJCXBhbmljKCJuZnNydmRfZGVsY2FjaGUgbm90IGluIHByb2ciKTsK LQlORlNMT0NLQ0FDSEUoKTsKKwltdHhfbG9jayhtdXRleCk7CiAJcnAtPnJjX2ZsYWcgJj0gflJD X0lOUFJPRzsKIAlpZiAocnAtPnJjX3JlZmNudCA9PSAwICYmICEocnAtPnJjX2ZsYWcgJiBSQ19M T0NLRUQpKQogCQluZnNyY19mcmVlY2FjaGUocnApOwotCU5GU1VOTE9DS0NBQ0hFKCk7CisJbXR4 X3VubG9jayhtdXRleCk7CiB9CiAKIC8qCkBAIC01MjgsNyArNTY4LDkgQEAgQVBQTEVTVEFUSUMg dm9pZAogbmZzcnZkX3NlbnRjYWNoZShzdHJ1Y3QgbmZzcnZjYWNoZSAqcnAsIHN0cnVjdCBzb2Nr ZXQgKnNvLCBpbnQgZXJyKQogewogCXRjcF9zZXEgdG1wX3NlcTsKKwlzdHJ1Y3QgbXR4ICptdXRl eDsKIAorCW11dGV4ID0gbmZzcmNfY2FjaGVtdXRleChycCk7CiAJaWYgKCEocnAtPnJjX2ZsYWcg JiBSQ19MT0NLRUQpKQogCQlwYW5pYygibmZzcnZkX3NlbnRjYWNoZSBub3QgbG9ja2VkIik7CiAJ aWYgKCFlcnIpIHsKQEAgLTUzNywxMCArNTc5LDEwIEBAIG5mc3J2ZF9zZW50Y2FjaGUoc3RydWN0 IG5mc3J2Y2FjaGUgKnJwLCAKIAkJICAgICBzby0+c29fcHJvdG8tPnByX3Byb3RvY29sICE9IElQ UFJPVE9fVENQKQogCQkJcGFuaWMoIm5mcyBzZW50IGNhY2hlIik7CiAJCWlmIChuZnNydl9nZXRz b2Nrc2VxbnVtKHNvLCAmdG1wX3NlcSkpIHsKLQkJCU5GU0xPQ0tDQUNIRSgpOworCQkJbXR4X2xv Y2sobXV0ZXgpOwogCQkJcnAtPnJjX3RjcHNlcSA9IHRtcF9zZXE7CiAJCQlycC0+cmNfZmxhZyB8 PSBSQ19UQ1BTRVE7Ci0JCQlORlNVTkxPQ0tDQUNIRSgpOworCQkJbXR4X3VubG9jayhtdXRleCk7 CiAJCX0KIAl9CiAJbmZzcmNfdW5sb2NrKHJwKTsKQEAgLTU1OSwxMSArNjAxLDEzIEBAIG5mc3Jj X2dldHRjcChzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKm5kLCAKIAlzdHJ1Y3QgbmZzcnZjYWNoZSAq aGl0cnA7CiAJc3RydWN0IG5mc3J2aGFzaGhlYWQgKmhwLCBuZnNyY190ZW1wbGlzdDsKIAlpbnQg aGl0LCByZXQgPSAwOworCXN0cnVjdCBtdHggKm11dGV4OwogCisJbXV0ZXggPSBuZnNyY19jYWNo ZW11dGV4KG5ld3JwKTsKIAlocCA9IE5GU1JDSEFTSChuZXdycC0+cmNfeGlkKTsKIAluZXdycC0+ cmNfcmVxbGVuID0gbmZzcmNfZ2V0bGVuYW5kY2tzdW0obmQtPm5kX21yZXAsICZuZXdycC0+cmNf Y2tzdW0pOwogdHJ5YWdhaW46Ci0JTkZTTE9DS0NBQ0hFKCk7CisJbXR4X2xvY2sobXV0ZXgpOwog CWhpdCA9IDE7CiAJTElTVF9JTklUKCZuZnNyY190ZW1wbGlzdCk7CiAJLyoKQEAgLTYyMSw4ICs2 NjUsOCBAQCB0cnlhZ2FpbjoKIAkJcnAgPSBoaXRycDsKIAkJaWYgKChycC0+cmNfZmxhZyAmIFJD X0xPQ0tFRCkgIT0gMCkgewogCQkJcnAtPnJjX2ZsYWcgfD0gUkNfV0FOVEVEOwotCQkJKHZvaWQp bXR4X3NsZWVwKHJwLCBORlNDQUNIRU1VVEVYUFRSLAotCQkJICAgIChQWkVSTyAtIDEpIHwgUERS T1AsICJuZnNyYyIsIDEwICogaHopOworCQkJKHZvaWQpbXR4X3NsZWVwKHJwLCBtdXRleCwgKFBa RVJPIC0gMSkgfCBQRFJPUCwKKwkJCSAgICAibmZzcmMiLCAxMCAqIGh6KTsKIAkJCWdvdG8gdHJ5 YWdhaW47CiAJCX0KIAkJaWYgKHJwLT5yY19mbGFnID09IDApCkBAIC02MzAsNyArNjc0LDcgQEAg dHJ5YWdhaW46CiAJCXJwLT5yY19mbGFnIHw9IFJDX0xPQ0tFRDsKIAkJaWYgKHJwLT5yY19mbGFn ICYgUkNfSU5QUk9HKSB7CiAJCQluZXduZnNzdGF0cy5zcnZjYWNoZV9pbnByb2doaXRzKys7Ci0J CQlORlNVTkxPQ0tDQUNIRSgpOworCQkJbXR4X3VubG9jayhtdXRleCk7CiAJCQlpZiAobmV3cnAt PnJjX3NvY2tyZWYgPT0gcnAtPnJjX3NvY2tyZWYpCiAJCQkJbmZzcmNfbWFya3NhbWV0Y3Bjb25u KHJwLT5yY19zb2NrcmVmKTsKIAkJCXJldCA9IFJDX0RST1BJVDsKQEAgLTYzOSwyNCArNjgzLDI0 IEBAIHRyeWFnYWluOgogCQkJICogVjIgb25seS4KIAkJCSAqLwogCQkJbmV3bmZzc3RhdHMuc3J2 Y2FjaGVfbm9uaWRlbWRvbmVoaXRzKys7Ci0JCQlORlNVTkxPQ0tDQUNIRSgpOworCQkJbXR4X3Vu bG9jayhtdXRleCk7CiAJCQlpZiAobmV3cnAtPnJjX3NvY2tyZWYgPT0gcnAtPnJjX3NvY2tyZWYp CiAJCQkJbmZzcmNfbWFya3NhbWV0Y3Bjb25uKHJwLT5yY19zb2NrcmVmKTsKIAkJCXJldCA9IFJD X1JFUExZOwogCQkJbmZzcnZkX3JlcGhlYWQobmQpOwogCQkJKihuZC0+bmRfZXJycCkgPSBycC0+ cmNfc3RhdHVzOwogCQkJcnAtPnJjX3RpbWVzdGFtcCA9IE5GU0RfTU9OT1NFQyArCi0JCQkJTkZT UlZDQUNIRV9UQ1BUSU1FT1VUOworCQkJCW5mc3JjX3RjcHRpbWVvdXQ7CiAJCX0gZWxzZSBpZiAo cnAtPnJjX2ZsYWcgJiBSQ19SRVBNQlVGKSB7CiAJCQluZXduZnNzdGF0cy5zcnZjYWNoZV9ub25p ZGVtZG9uZWhpdHMrKzsKLQkJCU5GU1VOTE9DS0NBQ0hFKCk7CisJCQltdHhfdW5sb2NrKG11dGV4 KTsKIAkJCWlmIChuZXdycC0+cmNfc29ja3JlZiA9PSBycC0+cmNfc29ja3JlZikKIAkJCQluZnNy 
Y19tYXJrc2FtZXRjcGNvbm4ocnAtPnJjX3NvY2tyZWYpOwogCQkJcmV0ID0gUkNfUkVQTFk7CiAJ CQluZC0+bmRfbXJlcSA9IG1fY29weW0ocnAtPnJjX3JlcGx5LCAwLAogCQkJCU1fQ09QWUFMTCwg TV9XQUlUT0spOwogCQkJcnAtPnJjX3RpbWVzdGFtcCA9IE5GU0RfTU9OT1NFQyArCi0JCQkJTkZT UlZDQUNIRV9UQ1BUSU1FT1VUOworCQkJCW5mc3JjX3RjcHRpbWVvdXQ7CiAJCX0gZWxzZSB7CiAJ CQlwYW5pYygibmZzIHRjcCBjYWNoZTEiKTsKIAkJfQpAQCAtNjc0LDcgKzcxOCw3IEBAIHRyeWFn YWluOgogCW5ld3JwLT5yY19jYWNoZXRpbWUgPSBORlNEX01PTk9TRUM7CiAJbmV3cnAtPnJjX2Zs YWcgfD0gUkNfSU5QUk9HOwogCUxJU1RfSU5TRVJUX0hFQUQoaHAsIG5ld3JwLCByY19oYXNoKTsK LQlORlNVTkxPQ0tDQUNIRSgpOworCW10eF91bmxvY2sobXV0ZXgpOwogCW5kLT5uZF9ycCA9IG5l d3JwOwogCXJldCA9IFJDX0RPSVQ7CiAKQEAgLTY4NSwxNiArNzI5LDE3IEBAIG91dDoKIAogLyoK ICAqIExvY2sgYSBjYWNoZSBlbnRyeS4KLSAqIEFsc28gcHV0cyBhIG11dGV4IGxvY2sgb24gdGhl IGNhY2hlIGxpc3QuCiAgKi8KIHN0YXRpYyB2b2lkCiBuZnNyY19sb2NrKHN0cnVjdCBuZnNydmNh Y2hlICpycCkKIHsKLQlORlNDQUNIRUxPQ0tSRVFVSVJFRCgpOworCXN0cnVjdCBtdHggKm11dGV4 OworCisJbXV0ZXggPSBuZnNyY19jYWNoZW11dGV4KHJwKTsKKwltdHhfYXNzZXJ0KG11dGV4LCBN QV9PV05FRCk7CiAJd2hpbGUgKChycC0+cmNfZmxhZyAmIFJDX0xPQ0tFRCkgIT0gMCkgewogCQly cC0+cmNfZmxhZyB8PSBSQ19XQU5URUQ7Ci0JCSh2b2lkKW10eF9zbGVlcChycCwgTkZTQ0FDSEVN VVRFWFBUUiwgUFpFUk8gLSAxLAotCQkgICAgIm5mc3JjIiwgMCk7CisJCSh2b2lkKW10eF9zbGVl cChycCwgbXV0ZXgsIFBaRVJPIC0gMSwgIm5mc3JjIiwgMCk7CiAJfQogCXJwLT5yY19mbGFnIHw9 IFJDX0xPQ0tFRDsKIH0KQEAgLTcwNSwxMSArNzUwLDEzIEBAIG5mc3JjX2xvY2soc3RydWN0IG5m c3J2Y2FjaGUgKnJwKQogc3RhdGljIHZvaWQKIG5mc3JjX3VubG9jayhzdHJ1Y3QgbmZzcnZjYWNo ZSAqcnApCiB7CisJc3RydWN0IG10eCAqbXV0ZXg7CiAKLQlORlNMT0NLQ0FDSEUoKTsKKwltdXRl eCA9IG5mc3JjX2NhY2hlbXV0ZXgocnApOworCW10eF9sb2NrKG11dGV4KTsKIAlycC0+cmNfZmxh ZyAmPSB+UkNfTE9DS0VEOwogCW5mc3JjX3dhbnRlZChycCk7Ci0JTkZTVU5MT0NLQ0FDSEUoKTsK KwltdHhfdW5sb2NrKG11dGV4KTsKIH0KIAogLyoKQEAgLTczMiw3ICs3NzksNiBAQCBzdGF0aWMg dm9pZAogbmZzcmNfZnJlZWNhY2hlKHN0cnVjdCBuZnNydmNhY2hlICpycCkKIHsKIAotCU5GU0NB Q0hFTE9DS1JFUVVJUkVEKCk7CiAJTElTVF9SRU1PVkUocnAsIHJjX2hhc2gpOwogCWlmIChycC0+ cmNfZmxhZyAmIFJDX1VEUCkgewogCQlUQUlMUV9SRU1PVkUoJm5mc3J2dWRwbHJ1LCBycCwgcmNf bHJ1KTsKQEAgLTc0Miw3ICs3ODgsNyBAQCBuZnNyY19mcmVlY2FjaGUoc3RydWN0IG5mc3J2Y2Fj aGUgKnJwKQogCWlmIChycC0+cmNfZmxhZyAmIFJDX1JFUE1CVUYpIHsKIAkJbWJ1Zl9mcmVlbShy cC0+cmNfcmVwbHkpOwogCQlpZiAoIShycC0+cmNfZmxhZyAmIFJDX1VEUCkpCi0JCQluZnNyY190 Y3BzYXZlZHJlcGxpZXMtLTsKKwkJCWF0b21pY19hZGRfaW50KCZuZnNyY190Y3BzYXZlZHJlcGxp ZXMsIC0xKTsKIAl9CiAJRlJFRSgoY2FkZHJfdClycCwgTV9ORlNSVkNBQ0hFKTsKIAluZXduZnNz dGF0cy5zcnZjYWNoZV9zaXplLS07CkBAIC03NTcsMjAgKzgwMywyMiBAQCBuZnNydmRfY2xlYW5j YWNoZSh2b2lkKQogCXN0cnVjdCBuZnNydmNhY2hlICpycCwgKm5leHRycDsKIAlpbnQgaTsKIAot CU5GU0xPQ0tDQUNIRSgpOwogCWZvciAoaSA9IDA7IGkgPCBORlNSVkNBQ0hFX0hBU0hTSVpFOyBp KyspIHsKKwkJbXR4X2xvY2soJm5mc3JjX3RjcG10eFtpXSk7CiAJCUxJU1RfRk9SRUFDSF9TQUZF KHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19oYXNoLCBuZXh0cnApIHsKIAkJCW5mc3JjX2ZyZWVj YWNoZShycCk7CiAJCX0KKwkJbXR4X3VubG9jaygmbmZzcmNfdGNwbXR4W2ldKTsKIAl9CisJbXR4 X2xvY2soJm5mc3JjX3VkcG10eCk7CiAJZm9yIChpID0gMDsgaSA8IE5GU1JWQ0FDSEVfSEFTSFNJ WkU7IGkrKykgewogCQlMSVNUX0ZPUkVBQ0hfU0FGRShycCwgJm5mc3J2dWRwaGFzaHRibFtpXSwg cmNfaGFzaCwgbmV4dHJwKSB7CiAJCQluZnNyY19mcmVlY2FjaGUocnApOwogCQl9CiAJfQogCW5l d25mc3N0YXRzLnNydmNhY2hlX3NpemUgPSAwOworCW10eF91bmxvY2soJm5mc3JjX3VkcG10eCk7 CiAJbmZzcmNfdGNwc2F2ZWRyZXBsaWVzID0gMDsKLQlORlNVTkxPQ0tDQUNIRSgpOwogfQogCiAv KgpAQCAtNzgwLDI4ICs4MjgsOTcgQEAgc3RhdGljIHZvaWQKIG5mc3JjX3RyaW1jYWNoZSh1X2lu dDY0X3Qgc29ja3JlZiwgc3RydWN0IHNvY2tldCAqc28pCiB7CiAJc3RydWN0IG5mc3J2Y2FjaGUg KnJwLCAqbmV4dHJwOwotCWludCBpOworCWludCBpLCBqLCBrLCB0aW1lX2hpc3RvWzEwXTsKKwl0 aW1lX3QgdGhpc3N0YW1wOworCXN0YXRpYyB0aW1lX3QgdWRwX2xhc3R0cmltID0gMCwgdGNwX2xh 
c3R0cmltID0gMDsKKwlzdGF0aWMgaW50IG9uZXRocmVhZCA9IDA7CiAKLQlORlNMT0NLQ0FDSEUo KTsKLQlUQUlMUV9GT1JFQUNIX1NBRkUocnAsICZuZnNydnVkcGxydSwgcmNfbHJ1LCBuZXh0cnAp IHsKLQkJaWYgKCEocnAtPnJjX2ZsYWcgJiAoUkNfSU5QUk9HfFJDX0xPQ0tFRHxSQ19XQU5URUQp KQotCQkgICAgICYmIHJwLT5yY19yZWZjbnQgPT0gMAotCQkgICAgICYmICgocnAtPnJjX2ZsYWcg JiBSQ19SRUZDTlQpIHx8Ci0JCQkgTkZTRF9NT05PU0VDID4gcnAtPnJjX3RpbWVzdGFtcCB8fAot CQkJIG5mc3JjX3VkcGNhY2hlc2l6ZSA+IG5mc3JjX3VkcGhpZ2h3YXRlcikpCi0JCQluZnNyY19m cmVlY2FjaGUocnApOwotCX0KLQlmb3IgKGkgPSAwOyBpIDwgTkZTUlZDQUNIRV9IQVNIU0laRTsg aSsrKSB7Ci0JCUxJU1RfRk9SRUFDSF9TQUZFKHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19oYXNo LCBuZXh0cnApIHsKKwlpZiAoYXRvbWljX2NtcHNldF9hY3FfaW50KCZvbmV0aHJlYWQsIDAsIDEp ID09IDApCisJCXJldHVybjsKKwlpZiAoTkZTRF9NT05PU0VDICE9IHVkcF9sYXN0dHJpbSB8fAor CSAgICBuZnNyY191ZHBjYWNoZXNpemUgPj0gKG5mc3JjX3VkcGhpZ2h3YXRlciArCisJICAgIG5m c3JjX3VkcGhpZ2h3YXRlciAvIDIpKSB7CisJCW10eF9sb2NrKCZuZnNyY191ZHBtdHgpOworCQl1 ZHBfbGFzdHRyaW0gPSBORlNEX01PTk9TRUM7CisJCVRBSUxRX0ZPUkVBQ0hfU0FGRShycCwgJm5m c3J2dWRwbHJ1LCByY19scnUsIG5leHRycCkgewogCQkJaWYgKCEocnAtPnJjX2ZsYWcgJiAoUkNf SU5QUk9HfFJDX0xPQ0tFRHxSQ19XQU5URUQpKQogCQkJICAgICAmJiBycC0+cmNfcmVmY250ID09 IDAKIAkJCSAgICAgJiYgKChycC0+cmNfZmxhZyAmIFJDX1JFRkNOVCkgfHwKLQkJCQkgTkZTRF9N T05PU0VDID4gcnAtPnJjX3RpbWVzdGFtcCB8fAotCQkJCSBuZnNyY19hY3RpdmVzb2NrZXQocnAs IHNvY2tyZWYsIHNvKSkpCisJCQkJIHVkcF9sYXN0dHJpbSA+IHJwLT5yY190aW1lc3RhbXAgfHwK KwkJCQkgbmZzcmNfdWRwY2FjaGVzaXplID4gbmZzcmNfdWRwaGlnaHdhdGVyKSkKIAkJCQluZnNy Y19mcmVlY2FjaGUocnApOwogCQl9CisJCW10eF91bmxvY2soJm5mc3JjX3VkcG10eCk7CisJfQor CWlmIChORlNEX01PTk9TRUMgIT0gdGNwX2xhc3R0cmltIHx8CisJICAgIG5mc3JjX3RjcHNhdmVk cmVwbGllcyA+PSBuZnNyY190Y3BoaWdod2F0ZXIpIHsKKwkJZm9yIChpID0gMDsgaSA8IDEwOyBp KyspCisJCQl0aW1lX2hpc3RvW2ldID0gMDsKKwkJZm9yIChpID0gMDsgaSA8IE5GU1JWQ0FDSEVf SEFTSFNJWkU7IGkrKykgeworCQkJbXR4X2xvY2soJm5mc3JjX3RjcG10eFtpXSk7CisJCQlpZiAo aSA9PSAwKQorCQkJCXRjcF9sYXN0dHJpbSA9IE5GU0RfTU9OT1NFQzsKKwkJCUxJU1RfRk9SRUFD SF9TQUZFKHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19oYXNoLAorCQkJICAgIG5leHRycCkgewor CQkJCWlmICghKHJwLT5yY19mbGFnICYKKwkJCQkgICAgIChSQ19JTlBST0d8UkNfTE9DS0VEfFJD X1dBTlRFRCkpCisJCQkJICAgICAmJiBycC0+cmNfcmVmY250ID09IDApIHsKKwkJCQkJLyoKKwkJ CQkJICogVGhlIHRpbWVzdGFtcHMgcmFuZ2UgZnJvbSByb3VnaGx5IHRoZQorCQkJCQkgKiBwcmVz ZW50ICh0Y3BfbGFzdHRyaW0pIHRvIHRoZSBwcmVzZW50CisJCQkJCSAqICsgbmZzcmNfdGNwdGlt ZW91dC4gR2VuZXJhdGUgYSBzaW1wbGUKKwkJCQkJICogaGlzdG9ncmFtIG9mIHdoZXJlIHRoZSB0 aW1lb3V0cyBmYWxsLgorCQkJCQkgKi8KKwkJCQkJaiA9IHJwLT5yY190aW1lc3RhbXAgLSB0Y3Bf bGFzdHRyaW07CisJCQkJCWlmIChqID49IG5mc3JjX3RjcHRpbWVvdXQpCisJCQkJCQlqID0gbmZz cmNfdGNwdGltZW91dCAtIDE7CisJCQkJCWlmIChqIDwgMCkKKwkJCQkJCWogPSAwOworCQkJCQlq ID0gKGogKiAxMCAvIG5mc3JjX3RjcHRpbWVvdXQpICUgMTA7CisJCQkJCXRpbWVfaGlzdG9bal0r KzsKKwkJCQkJaWYgKChycC0+cmNfZmxhZyAmIFJDX1JFRkNOVCkgfHwKKwkJCQkJICAgIHRjcF9s YXN0dHJpbSA+IHJwLT5yY190aW1lc3RhbXAgfHwKKwkJCQkJICAgIG5mc3JjX2FjdGl2ZXNvY2tl dChycCwgc29ja3JlZiwgc28pKQorCQkJCQkJbmZzcmNfZnJlZWNhY2hlKHJwKTsKKwkJCQl9CisJ CQl9CisJCQltdHhfdW5sb2NrKCZuZnNyY190Y3BtdHhbaV0pOworCQl9CisJCWogPSBuZnNyY190 Y3BoaWdod2F0ZXIgLyA1OwkvKiAyMCUgb2YgaXQgKi8KKwkJaWYgKGogPiAwICYmIChuZnNyY190 Y3BzYXZlZHJlcGxpZXMgKyBqKSA+IG5mc3JjX3RjcGhpZ2h3YXRlcikgeworCQkJLyoKKwkJCSAq IFRyaW0gc29tZSBtb3JlIHdpdGggYSBzbWFsbGVyIHRpbWVvdXQgb2YgYXMgbGl0dGxlCisJCQkg KiBhcyAyMCUgb2YgbmZzcmNfdGNwdGltZW91dCB0byB0cnkgYW5kIGdldCBiZWxvdworCQkJICog ODAlIG9mIHRoZSBuZnNyY190Y3BoaWdod2F0ZXIuCisJCQkgKi8KKwkJCWsgPSAwOworCQkJZm9y IChpID0gMDsgaSA8IDg7IGkrKykgeworCQkJCWsgKz0gdGltZV9oaXN0b1tpXTsKKwkJCQlpZiAo ayA+IGopCisJCQkJCWJyZWFrOworCQkJfQorCQkJayA9IG5mc3JjX3RjcHRpbWVvdXQgKiAoaSAr 
IDEpIC8gMTA7CisJCQlpZiAoayA8IDEpCisJCQkJayA9IDE7CisJCQl0aGlzc3RhbXAgPSB0Y3Bf bGFzdHRyaW0gKyBrOworCQkJZm9yIChpID0gMDsgaSA8IE5GU1JWQ0FDSEVfSEFTSFNJWkU7IGkr KykgeworCQkJCW10eF9sb2NrKCZuZnNyY190Y3BtdHhbaV0pOworCQkJCUxJU1RfRk9SRUFDSF9T QUZFKHJwLCAmbmZzcnZoYXNodGJsW2ldLCByY19oYXNoLAorCQkJCSAgICBuZXh0cnApIHsKKwkJ CQkJaWYgKCEocnAtPnJjX2ZsYWcgJgorCQkJCQkgICAgIChSQ19JTlBST0d8UkNfTE9DS0VEfFJD X1dBTlRFRCkpCisJCQkJCSAgICAgJiYgcnAtPnJjX3JlZmNudCA9PSAwCisJCQkJCSAgICAgJiYg KChycC0+cmNfZmxhZyAmIFJDX1JFRkNOVCkgfHwKKwkJCQkJCSB0aGlzc3RhbXAgPiBycC0+cmNf dGltZXN0YW1wIHx8CisJCQkJCQkgbmZzcmNfYWN0aXZlc29ja2V0KHJwLCBzb2NrcmVmLAorCQkJ CQkJICAgIHNvKSkpCisJCQkJCQluZnNyY19mcmVlY2FjaGUocnApOworCQkJCX0KKwkJCQltdHhf dW5sb2NrKCZuZnNyY190Y3BtdHhbaV0pOworCQkJfQorCQl9CiAJfQotCU5GU1VOTE9DS0NBQ0hF KCk7CisJYXRvbWljX3N0b3JlX3JlbF9pbnQoJm9uZXRocmVhZCwgMCk7CiB9CiAKIC8qCkBAIC04 MTAsMTIgKzkyNywxNCBAQCBuZnNyY190cmltY2FjaGUodV9pbnQ2NF90IHNvY2tyZWYsIHN0cnVj CiBBUFBMRVNUQVRJQyB2b2lkCiBuZnNydmRfcmVmY2FjaGUoc3RydWN0IG5mc3J2Y2FjaGUgKnJw KQogeworCXN0cnVjdCBtdHggKm11dGV4OwogCi0JTkZTTE9DS0NBQ0hFKCk7CisJbXV0ZXggPSBu ZnNyY19jYWNoZW11dGV4KHJwKTsKKwltdHhfbG9jayhtdXRleCk7CiAJaWYgKHJwLT5yY19yZWZj bnQgPCAwKQogCQlwYW5pYygibmZzIGNhY2hlIHJlZmNudCIpOwogCXJwLT5yY19yZWZjbnQrKzsK LQlORlNVTkxPQ0tDQUNIRSgpOworCW10eF91bmxvY2sobXV0ZXgpOwogfQogCiAvKgpAQCAtODI0 LDE0ICs5NDMsMTYgQEAgbmZzcnZkX3JlZmNhY2hlKHN0cnVjdCBuZnNydmNhY2hlICpycCkKIEFQ UExFU1RBVElDIHZvaWQKIG5mc3J2ZF9kZXJlZmNhY2hlKHN0cnVjdCBuZnNydmNhY2hlICpycCkK IHsKKwlzdHJ1Y3QgbXR4ICptdXRleDsKIAotCU5GU0xPQ0tDQUNIRSgpOworCW11dGV4ID0gbmZz cmNfY2FjaGVtdXRleChycCk7CisJbXR4X2xvY2sobXV0ZXgpOwogCWlmIChycC0+cmNfcmVmY250 IDw9IDApCiAJCXBhbmljKCJuZnMgY2FjaGUgZGVyZWZjbnQiKTsKIAlycC0+cmNfcmVmY250LS07 CiAJaWYgKHJwLT5yY19yZWZjbnQgPT0gMCAmJiAhKHJwLT5yY19mbGFnICYgKFJDX0xPQ0tFRCB8 IFJDX0lOUFJPRykpKQogCQluZnNyY19mcmVlY2FjaGUocnApOwotCU5GU1VOTE9DS0NBQ0hFKCk7 CisJbXR4X3VubG9jayhtdXRleCk7CiB9CiAKIC8qCi0tLSBmcy9uZnNzZXJ2ZXIvbmZzX25mc2Rw b3J0LmMub3JpZwkyMDEzLTAzLTAyIDE4OjE5OjM0LjAwMDAwMDAwMCAtMDUwMAorKysgZnMvbmZz c2VydmVyL25mc19uZnNkcG9ydC5jCTIwMTMtMDMtMTIgMTc6NTE6MzEuMDAwMDAwMDAwIC0wNDAw CkBAIC02MSw3ICs2MSw4IEBAIGV4dGVybiBzdHJ1Y3QgbmZzdjRsb2NrIG5mc2Rfc3VzcGVuZF9s b2MKIGV4dGVybiBzdHJ1Y3QgbmZzc2Vzc2lvbmhhc2ggbmZzc2Vzc2lvbmhhc2hbTkZTU0VTU0lP TkhBU0hTSVpFXTsKIHN0cnVjdCB2ZnNvcHRsaXN0IG5mc3Y0cm9vdF9vcHQsIG5mc3Y0cm9vdF9u ZXdvcHQ7CiBORlNETE9DS01VVEVYOwotc3RydWN0IG10eCBuZnNfY2FjaGVfbXV0ZXg7CitzdHJ1 Y3QgbXR4IG5mc3JjX3RjcG10eFtORlNSVkNBQ0hFX0hBU0hTSVpFXTsKK3N0cnVjdCBtdHggbmZz cmNfdWRwbXR4Owogc3RydWN0IG10eCBuZnNfdjRyb290X211dGV4Owogc3RydWN0IG5mc3J2Zmgg bmZzX3Jvb3RmaCwgbmZzX3B1YmZoOwogaW50IG5mc19wdWJmaHNldCA9IDAsIG5mc19yb290Zmhz ZXQgPSAwOwpAQCAtMzMwNSw3ICszMzA2LDEwIEBAIG5mc2RfbW9kZXZlbnQobW9kdWxlX3QgbW9k LCBpbnQgdHlwZSwgdm8KIAkJaWYgKGxvYWRlZCkKIAkJCWdvdG8gb3V0OwogCQluZXduZnNfcG9y dGluaXQoKTsKLQkJbXR4X2luaXQoJm5mc19jYWNoZV9tdXRleCwgIm5mc19jYWNoZV9tdXRleCIs IE5VTEwsIE1UWF9ERUYpOworCQlmb3IgKGkgPSAwOyBpIDwgTkZTUlZDQUNIRV9IQVNIU0laRTsg aSsrKQorCQkJbXR4X2luaXQoJm5mc3JjX3RjcG10eFtpXSwgIm5mc190Y3BjYWNoZV9tdXRleCIs IE5VTEwsCisJCQkgICAgTVRYX0RFRik7CisJCW10eF9pbml0KCZuZnNyY191ZHBtdHgsICJuZnNf dWRwY2FjaGVfbXV0ZXgiLCBOVUxMLCBNVFhfREVGKTsKIAkJbXR4X2luaXQoJm5mc192NHJvb3Rf bXV0ZXgsICJuZnNfdjRyb290X211dGV4IiwgTlVMTCwgTVRYX0RFRik7CiAJCW10eF9pbml0KCZu ZnN2NHJvb3RfbW50Lm1udF9tdHgsICJzdHJ1Y3QgbW91bnQgbXR4IiwgTlVMTCwKIAkJICAgIE1U WF9ERUYpOwpAQCAtMzM1Miw3ICszMzU2LDkgQEAgbmZzZF9tb2RldmVudChtb2R1bGVfdCBtb2Qs IGludCB0eXBlLCB2bwogCQkJc3ZjcG9vbF9kZXN0cm95KG5mc3J2ZF9wb29sKTsKIAogCQkvKiBh bmQgZ2V0IHJpZCBvZiB0aGUgbG9ja3MgKi8KLQkJbXR4X2Rlc3Ryb3koJm5mc19jYWNoZV9tdXRl 
eCk7CisJCWZvciAoaSA9IDA7IGkgPCBORlNSVkNBQ0hFX0hBU0hTSVpFOyBpKyspCisJCQltdHhf ZGVzdHJveSgmbmZzcmNfdGNwbXR4W2ldKTsKKwkJbXR4X2Rlc3Ryb3koJm5mc3JjX3VkcG10eCk7 CiAJCW10eF9kZXN0cm95KCZuZnNfdjRyb290X211dGV4KTsKIAkJbXR4X2Rlc3Ryb3koJm5mc3Y0 cm9vdF9tbnQubW50X210eCk7CiAJCWZvciAoaSA9IDA7IGkgPCBORlNTRVNTSU9OSEFTSFNJWkU7 IGkrKykKLS0tIGZzL25mcy9uZnNwb3J0Lmgub3JpZwkyMDEzLTAzLTAyIDE4OjM1OjEzLjAwMDAw MDAwMCAtMDUwMAorKysgZnMvbmZzL25mc3BvcnQuaAkyMDEzLTAzLTEyIDE3OjUxOjMxLjAwMDAw MDAwMCAtMDQwMApAQCAtNjA5LDExICs2MDksNiBAQCB2b2lkIG5mc3J2ZF9yY3Yoc3RydWN0IHNv Y2tldCAqLCB2b2lkICosCiAjZGVmaW5lCU5GU1JFUVNQSU5MT0NLCQlleHRlcm4gc3RydWN0IG10 eCBuZnNfcmVxX211dGV4CiAjZGVmaW5lCU5GU0xPQ0tSRVEoKQkJbXR4X2xvY2soJm5mc19yZXFf bXV0ZXgpCiAjZGVmaW5lCU5GU1VOTE9DS1JFUSgpCQltdHhfdW5sb2NrKCZuZnNfcmVxX211dGV4 KQotI2RlZmluZQlORlNDQUNIRU1VVEVYCQlleHRlcm4gc3RydWN0IG10eCBuZnNfY2FjaGVfbXV0 ZXgKLSNkZWZpbmUJTkZTQ0FDSEVNVVRFWFBUUgkoJm5mc19jYWNoZV9tdXRleCkKLSNkZWZpbmUJ TkZTTE9DS0NBQ0hFKCkJCW10eF9sb2NrKCZuZnNfY2FjaGVfbXV0ZXgpCi0jZGVmaW5lCU5GU1VO TE9DS0NBQ0hFKCkJbXR4X3VubG9jaygmbmZzX2NhY2hlX211dGV4KQotI2RlZmluZQlORlNDQUNI RUxPQ0tSRVFVSVJFRCgpCW10eF9hc3NlcnQoJm5mc19jYWNoZV9tdXRleCwgTUFfT1dORUQpCiAj ZGVmaW5lCU5GU1NPQ0tNVVRFWAkJZXh0ZXJuIHN0cnVjdCBtdHggbmZzX3Nsb2NrX211dGV4CiAj ZGVmaW5lCU5GU1NPQ0tNVVRFWFBUUgkJKCZuZnNfc2xvY2tfbXV0ZXgpCiAjZGVmaW5lCU5GU0xP Q0tTT0NLKCkJCW10eF9sb2NrKCZuZnNfc2xvY2tfbXV0ZXgpCi0tLSBmcy9uZnMvbmZzcnZjYWNo ZS5oLm9yaWcJMjAxMy0wMS0wNyAwOTowNDoxNS4wMDAwMDAwMDAgLTA1MDAKKysrIGZzL25mcy9u ZnNydmNhY2hlLmgJMjAxMy0wMy0xMiAxODowMjo0Mi4wMDAwMDAwMDAgLTA0MDAKQEAgLTQxLDcg KzQxLDcgQEAKICNkZWZpbmUJTkZTUlZDQUNIRV9NQVhfU0laRQkyMDQ4CiAjZGVmaW5lCU5GU1JW Q0FDSEVfTUlOX1NJWkUJICA2NAogCi0jZGVmaW5lCU5GU1JWQ0FDSEVfSEFTSFNJWkUJMjAKKyNk ZWZpbmUJTkZTUlZDQUNIRV9IQVNIU0laRQk1MDAKIAogc3RydWN0IG5mc3J2Y2FjaGUgewogCUxJ U1RfRU5UUlkobmZzcnZjYWNoZSkgcmNfaGFzaDsJCS8qIEhhc2ggY2hhaW4gKi8K ------=_Part_4078122_800517510.1363729759048-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 00:36:17 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 721C8D7B for ; Wed, 20 Mar 2013 00:36:17 +0000 (UTC) (envelope-from jdavidlists@gmail.com) Received: from mail-ie0-x233.google.com (mail-ie0-x233.google.com [IPv6:2607:f8b0:4001:c03::233]) by mx1.freebsd.org (Postfix) with ESMTP id 26311F23 for ; Wed, 20 Mar 2013 00:36:17 +0000 (UTC) Received: by mail-ie0-f179.google.com with SMTP id k11so1455591iea.24 for ; Tue, 19 Mar 2013 17:36:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=pVZG9cCZcsCPjbuI0grN+HOCru/99LUYkfAqHxMLR10=; b=WbyUpWTH+GoyLGjcYndXM00CQ/YDKK6CgBRsCqY9KDhl+AoNIs38DyMZY+JHiG4Eql UrWZ9RDH79hR6AyYdiqoaP7Bldzt4GkUnM+1+TNkAgEyKPECvP0R+6JGmcdg6t6H9KPa /Q72/whOi5WJ6+T+xxFcupQ9P0JcUVLg1BzeunYfUTlAlSjUstGaKbSsDy5JMmaUN9go Ha/ndD4OsTJODh/bGqbAlsaX7JWbwT4k4s2lhlVYp1n3+BSkikv+McAsoUTdj2k5IpnA kFlme2TFtJIHxj6zrqwf0aUhXP5BtP54KNi4Fi7QkqRd3ZcAnEDnv89VCp+218KLqDiG Hi9w== MIME-Version: 1.0 X-Received: by 10.43.88.134 with SMTP id ba6mr12294050icc.18.1363739776128; Tue, 19 Mar 2013 17:36:16 -0700 (PDT) Sender: jdavidlists@gmail.com Received: by 10.42.153.133 with HTTP; Tue, 19 Mar 2013 17:36:15 -0700 (PDT) In-Reply-To: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com> References: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com> Date: Tue, 19 Mar 2013 20:36:15 -0400 X-Google-Sender-Auth: r6XxkX7Ebaq24Hvaq1nAF6pS5ts 
Message-ID: Subject: Re: FreeBSD & no single point of failure file service From: J David To: Michael DeMan Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 00:36:17 -0000 On Sat, Mar 16, 2013 at 8:48 PM, Michael DeMan wrote: > I was thinking to maybe test something out like: > > #1. A couple old Dell 2970s head units with LSI cards. > #2. One dual-port SAS chassis. > #3. Figure out what needs to happen with devd+carp in order for the head > end units to REALIBLY know when to export/import ZFS and when to advertise > NFS/iSCSI, etc. > I was trying to figure out if it could be tested with a couple of virtual machines pointed at the same shared disk image. :) > A couple catches with this of course is that for #3 there could be some > kind of unexpected heartbeat failure between the two head end units where > they both decide the other is gone and both become masters - which would > probably result in catastrophic corruption on the file system. > I think you almost need three (or more) participants, rather than two. Then, the participants elect a master, and if you don't have a majority (e.g. two out of three votes), you didn't win. Only two of them need to be connected to the actual disks. The additional voter(s) could be one or more consumers of the filesystem services, which would tend to help keep the available one winning the master role in a split-brain scenario. That probably needs to be complicated slightly, as the export/import process isn't anything like instant. So if you get that scenario where the master loses connectivity to the clients but not the FS and you still need to promote a new master -- or you want to do manual failover for maintenance reasons -- you do need to make sure "export" finishes before "import" starts. You could wait until you've been master for X seconds before starting your import (where maybe X ~= 30), and the whole world will wait with you. Another alternative would be some sort of shared permanent storage, like a non-ZFS partition or drive upon which the master writes a timestamp, and the slave reads it. You don't touch the drives until either the timestamp says it's all clear or the timestamp is X seconds old. But then you run into all those goofy shared disk read caching issues, and I'm not at all sure you can peek at one partition of a SAS drive while another partition is mounted on another system. (The alternative being to dedicate two drives for that purposes, which two drives to share one 512 byte sector sounds terribly wasteful.) The third possibility would be to do it without shared storage: a machine could just broadcast "I'm touching the drives!" every second and a newly-elected master would have to wait until those messages stop for X seconds or until it sees "I'm not touching the drives!" before proceeding. That would be a little less reliable if the newly-elected master rebooted unless each machine keeps a persistent copy in local storage. In that scheme, you would just have to make sure you started/stopped things in the right order. Start: 1. Start greedy shouter. 2. Import ZFS pool. 3. ifup service interface. (Arguably doesn't even need CARP at this point.) 4. Start NFS/iSCSI Stop: 1. Stop NFS/iSCSI. 2. Ifdown service interface. 3. Export ZFS pool. 4. 
Stop greedy shouter. CARP loses a lot of value because it's not like TCP sessions for NFS or iSCSI can live migrate between machines anyway, but might still be useful to make sure the interface IPs have the same MAC address. Either way, I think the interface in question should be explicitly marked up/down rather than utilizing CARP for automatic interface failover. I don't think it's a good idea for a service IP to jump to a machine if it's 100% certain that that machine won't be ready. That is particularly true in the case of a previously-down master returning to service alongside a working new master. Of course the simplest solution of all is just to not implement automated failover right away. If the machines are there and configured and there is 24x7 admin, just make sure they always boot up in standby mode and have to be manually promoted to master. The time it would take for an admin to log in to the standby server and type "the_student_is_now_the_master.sh" is still probably a huge improvement over whatever the present state of affairs is. :) That would allow some time to examine real-world failure cases in a bit more detail, observe the decisions the admin makes about when to fail over, and maybe come up with a better / more resilient design that better models those decisions. SuperMicro does have that one chassis that accepts lots of drives and two > custom motherboards that are linked internally via 10GB - I think ixsystems > uses that. So in theory the edge case of the accidental 'master/master' > configuration is helped by hardware. By the same token I am skeptical of > having both head end units in a single chassis. Pardon me for being > paranoid. > I tried to convince myself "it's OK as long as they only common part is sheet metal." But yes, I've seen that and as cool as it looks, it makes me nervous too. > The hard work is always in the details, not the design? > Too right. Of course there's a whole other category of problems, like those where ZFS can run with a failed cache dev but sometimes won't import without it. Hopefully those types of problems are mostly behind us. I know I still read a lot of stuff on this list about ZFS that makes me even more nervous than putting all my eggs in one sheet metal basket. Thanks! 
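To make the start/stop ordering above concrete, here is a minimal sketch of the pair of actions a devd hook (or an admin running the_student_is_now_the_master.sh by hand) would perform. The pool, interface and address names are placeholders, the iSCSI target is assumed to be istgt from ports, and the "greedy shouter" is left as a stub:

  #!/bin/sh
  # promote: take over the storage and the service address
  POOL=tank; IF=em0; IP=192.0.2.10/24
  start_shouter                     # stub: begin broadcasting "I'm touching the drives"
  zpool import -f $POOL || exit 1   # only after the quiet/timestamp check has passed
  ifconfig $IF inet $IP alias
  service nfsd onestart
  service istgt onestart

  # demote: exact reverse order
  service istgt onestop
  service nfsd onestop
  ifconfig $IF inet $IP -alias
  zpool export $POOL
  stop_shouter                      # stub

Nothing here solves the split-brain detection itself; it only keeps the export/import and the service address from overlapping once a decision has been made.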
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 01:37:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 980FB642 for ; Wed, 20 Mar 2013 01:37:42 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 250B4235 for ; Wed, 20 Mar 2013 01:37:41 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id r2K1bYgB020319; Tue, 19 Mar 2013 20:37:37 -0500 (CDT) Date: Tue, 19 Mar 2013 20:37:34 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Steven Hartland Subject: Re: FreBSD 9.1 and ZFS v28 performances In-Reply-To: Message-ID: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 19 Mar 2013 20:37:37 -0500 (CDT) Cc: freebsd-fs@freebsd.org, Davide D'Amico X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 01:37:42 -0000 On Mon, 18 Mar 2013, Steven Hartland wrote: > > Oh and another thing if this is mysql did you set the right settings > for your ZFS volume e.g. > zfs set atime=off tank > zfs create tank/mysql > zfs set recordsize=16k tank/mysql Very importantly, the recordsize should be set before first creating the database file. The recordsize becomes a property of the file. Even if one sets it to 16k, the file will continue to use 128k if that was the setting when it was created. 
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 01:45:54 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 11A32715 for ; Wed, 20 Mar 2013 01:45:54 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 9549726D for ; Wed, 20 Mar 2013 01:45:53 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id r2K1YGLu020314; Tue, 19 Mar 2013 20:34:17 -0500 (CDT) Date: Tue, 19 Mar 2013 20:34:16 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "Davide D'Amico" Subject: Re: FreBSD 9.1 and ZFS v28 performances In-Reply-To: <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> Message-ID: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 19 Mar 2013 20:34:17 -0500 (CDT) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 01:45:54 -0000 On Mon, 18 Mar 2013, Davide D'Amico wrote: >> >> While running the tests what sort of thing are you >> seeing from gstat, any disks maxing? If so primarily >> read or write? > Here the r/w pattern using zpool iostat 2: Using 'zpool iostat 2' is not likely to be very useful since zfs writes all of its data in bursts and may wait up to 5 seconds to do so. If your benchmark uses synchronous writes and does continuous updates, then you should see zfs writing continuously to your zil device (or the pool).
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 03:36:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 975A6F43 for ; Wed, 20 Mar 2013 03:36:41 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 190AA979 for ; Wed, 20 Mar 2013 03:36:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363750592; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=uq19aj7NlNIj026w35pouYR8mMZm2AO78OwdjaF/bIE=; b=drcV6jMAkCCdyRTrspu3jAdbhbKqj01a14BfuGtMs95uQg2RwOlADA67OJmktRbp jPBMO1KIkf+VWMNxhLFSbMtkjdCGLRSfHOnLwn83+lzuVutd6ri88seDz+uclUCN v60OvwzmXFZsawwCYkX1SUh/B4fuEAmwCFqcVHuEarc=; Received: from [213.92.90.12] ([213.92.90.12:61650] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id DD/B6-24145-0CE29415; Wed, 20 Mar 2013 04:36:32 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UI9pI-0000fY-EI for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 04:36:32 +0100 Received: (qmail 2570 invoked by uid 80); 20 Mar 2013 03:36:32 -0000 To: Bob Friesenhahn Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.227 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Wed, 20 Mar 2013 04:36:32 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> Message-ID: <4930f6fddf6a995051bc6554d1a6a6b7@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 03:36:41 -0000 Il 20.03.2013 02:37 Bob Friesenhahn ha scritto: > On Mon, 18 Mar 2013, Steven Hartland wrote: >> >> Oh and another thing if this is mysql did you set the right settings >> for your ZFS volume e.g. >> zfs set atime=off tank >> zfs create tank/mysql >> zfs set recordsize=16k tank/mysql > > Very importantly, the recordsize should be set before first creating > the database file. The recordsize becomes a property of the file. > Even if one sets it to 16k, the file will continue to use 128k if that > was the setting when it was created. Well, after changing the recordsite property, I copied the file from an UFS partition (using cp -Rp): this should use recordsize=16k, right? Thanks, d. 
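One way to check what a given file is really using (the exact zdb output fields vary
between versions, so treat this as a rough recipe, with paths and the object number as
placeholders): look up the file's object number, dump it, and look at the data block size.

  # The inode number reported by ls is the ZFS object number:
  ls -i /tank/mysql/ibdata1
  # Dump that object; a file written after the property change should show
  # dblk=16K rather than 128K:
  zdb -ddddd tank/mysql 12345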
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 03:39:00 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B780F13C for ; Wed, 20 Mar 2013 03:39:00 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 2FAF8996 for ; Wed, 20 Mar 2013 03:39:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363750739; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=UiXrIpyjveheb2bnmOMcaWyzYnXnGehw/+R8Lg/NgHo=; b=wKOscCPFBblYQ4KEO+O+xQ4fxjlN//RhL7b45yIUjYHFEO6VbiWH5rMPC/0vM6UX MM2qstbVXU8t0sl9BGn1xo37/N/znJcfFQcVlE0eO4kxUuvon+T2Q+mHUzs5xFQN GqRdYuaXS7vjBWITg+zh5C3d/Y0YZyXRZnYqmMkXtOk=; Received: from [213.92.90.12] ([213.92.90.12:57676] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id 27/18-24145-35F29415; Wed, 20 Mar 2013 04:38:59 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UI9rf-0000j6-2j for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 04:38:59 +0100 Received: (qmail 2791 invoked by uid 80); 20 Mar 2013 03:38:59 -0000 To: Bob Friesenhahn Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.51 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Wed, 20 Mar 2013 04:38:58 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> Message-ID: <6d4c0172b3aa157a4e80212be4083966@sys.tomatointeractive.it> X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 03:39:00 -0000 Il 20.03.2013 02:34 Bob Friesenhahn ha scritto: > On Mon, 18 Mar 2013, Davide D'Amico wrote: >>> While running the tests what sort of thing are you >>> seeing from gstat, any disks maxing? If so primarily >>> read or write? >> Here the r/w pattern using zpool iostat 2: > > Using 'zpool iostat 2' is not likely to be very useful since zfs > writes all of is data in bursts and may wait up to 5 seconds to do so. > > If your benchmark uses synchronous writes and does continous updates, > then you should see zfs writing continiously to your zil device (or > the pool). And so, considering I'm using an SSD as ZIL device, I don't understand why ZFS performances are so slow (0.5x UFS performances). My benchmark is a set of 50k queries (select, insert, updates) that 'stress' the FS more than the 'simple' oltp tests. Thank, d. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 10:45:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DF99987F for ; Wed, 20 Mar 2013 10:45:49 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 56A4BE2A for ; Wed, 20 Mar 2013 10:45:48 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r2KAjbAn072541 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 20 Mar 2013 12:45:37 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <51499351.1040406@digsys.bg> Date: Wed, 20 Mar 2013 12:45:37 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130304 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <4930f6fddf6a995051bc6554d1a6a6b7@sys.tomatointeractive.it> In-Reply-To: <4930f6fddf6a995051bc6554d1a6a6b7@sys.tomatointeractive.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 10:45:49 -0000 On 20.03.13 05:36, Davide D'Amico wrote: > Il 20.03.2013 02:37 Bob Friesenhahn ha scritto: >> On Mon, 18 Mar 2013, Steven Hartland wrote: >>> >>> Oh and another thing if this is mysql did you set the right settings >>> for your ZFS volume e.g. >>> zfs set atime=off tank >>> zfs create tank/mysql >>> zfs set recordsize=16k tank/mysql >> >> Very importantly, the recordsize should be set before first creating >> the database file. The recordsize becomes a property of the file. >> Even if one sets it to 16k, the file will continue to use 128k if that >> was the setting when it was created. > > Well, after changing the recordsite property, I copied the file from > an UFS partition (using cp -Rp): this should use recordsize=16k, right? Perhaps, if you delete the file, or preferably the entire ZFS dataset first. Copying an file over another existing, does not change anything with the destination file except it's contents and modification times. As is always with changing settings, it is safer to just create the entire data set from scratch, with the new settings. 
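Roughly, and assuming the data can simply be recopied from the UFS source afterwards
(dataset and paths are illustrative):

  # Destroys the existing dataset and everything in it - back it up first!
  zfs destroy tank/mysql
  # Recreate with the desired settings, then copy the data back in; the new
  # files inherit the 16k recordsize as they are written:
  zfs create -o recordsize=16k -o atime=off tank/mysql
  cp -Rp /mnt/ufs/mysql/ /tank/mysql/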
Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 10:59:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 96DECA10 for ; Wed, 20 Mar 2013 10:59:14 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 211AFE9F for ; Wed, 20 Mar 2013 10:59:13 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r2KAxACu075535 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 20 Mar 2013 12:59:10 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <5149967E.4050900@digsys.bg> Date: Wed, 20 Mar 2013 12:59:10 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130304 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: When will we see TRIM support for GELI volumes ? References: <51479D54.1040509@gibfest.dk> <20130319000232.GA18711@neutralgood.org> <5147BB5C.7020205@gibfest.dk> In-Reply-To: <5147BB5C.7020205@gibfest.dk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 10:59:14 -0000 On 19.03.13 03:11, Thomas Steen Rasmussen wrote: > Have you tried using an SSD without TRIM support ? It really is > awfully slow, I'm talking 10-20-30 seconds freezes while the disk is > writing. There are different SSD disks, just as there are different beers. Many SSDs that do not offer TRIP (or you not use it) perform just fine and you can hardly ever saturate them on a laptop. These just cost more. Having said that, there should be really way to use cheaper components, that greatly benefit (to a point) from TRIM. But, you have to balance your act. By the way, many SSDs perform awfully on writes with an sector size of 512b. Try setting the sector size to 4096 (for example) and see if this will make any difference for you. The comment before about TRIM being bad idea with encrypted storage is very valid. You don't want anyone to know the layout of the data on the drive. Considering, that today anyone can have access to huge computing farms, anything that can make the task of decrypting more difficult is more than welcome. If you want to be safe, just use more performant drive and encrypt it all, with no gaps. The bigger the drive, the safer your data is. Of course, as with everything UNIX, you should be allowed to shoot yourself in the foot. 
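On the sector-size point above, for a geli provider that is chosen at init time; a rough
sketch only, with ada0p3 as a placeholder (re-initialising a provider makes any existing
encrypted data unreachable, so this is for new setups):

  geli init -s 4096 /dev/ada0p3                    # -s sets the sector size
  geli attach /dev/ada0p3
  diskinfo -v /dev/ada0p3.eli | grep sectorsize    # should report 4096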
Maybe name the sysctl that activates TRIM on GELI something like kern.geom.eli.insecure_trim :) Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 12:24:26 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B17DFEE2 for ; Wed, 20 Mar 2013 12:24:26 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id 461A9343 for ; Wed, 20 Mar 2013 12:24:25 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UII41-0003bO-L5 for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 13:24:18 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UII41-0001qs-EY for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 13:24:17 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <42B9D942BA134E16AFDDB564858CA007@multiplay.co.uk> <1bfdea0efb95a7e06554dadf703d58e7@sys.tomatointeractive.it> Date: Wed, 20 Mar 2013 13:24:17 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: ++++++++ X-Spam-Score: 8.2 X-Spam-Status: Yes, score=8.2 required=5.0 tests=BAYES_40, IN_PBL_AND_BAYES_40, RCVD_IN_SBL autolearn=disabled version=3.3.1 X-Spam-Flag: YES X-Scan-Signature: a8ecdd0179e5342c74548fafd5461917 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 12:24:26 -0000 On Wed, 20 Mar 2013 02:34:16 +0100, Bob Friesenhahn wrote: > On Mon, 18 Mar 2013, Davide D'Amico wrote: >>> While running the tests what sort of thing are you >>> seeing from gstat, any disks maxing? If so primarily >>> read or write? >> Here the r/w pattern using zpool iostat 2: > > Using 'zpool iostat 2' is not likely to be very useful since zfs writes > all of is data in bursts and may wait up to 5 seconds to do so. > > If your benchmark uses synchronous writes and does continous updates, > then you should see zfs writing continiously to your zil device (or the > pool). > > Bob Zpool iostat 2 still prints the per-second-speed. So it is the amount of data divided by 2. It is easier to reason about it if you use zpool iostat 1. Ronald. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 12:32:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6D7E1FF4 for ; Wed, 20 Mar 2013 12:32:21 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id E3A7C397 for ; Wed, 20 Mar 2013 12:32:20 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UIIBn-0004dS-KG for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 13:32:20 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UIIBm-0002Io-OE for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 13:32:18 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <4930f6fddf6a995051bc6554d1a6a6b7@sys.tomatointeractive.it> <51499351.1040406@digsys.bg> Date: Wed, 20 Mar 2013 13:32:19 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <51499351.1040406@digsys.bg> User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: +++++ X-Spam-Score: 5.2 X-Spam-Status: Yes, score=5.2 required=5.0 tests=BAYES_20, RCVD_IN_SBL autolearn=disabled version=3.3.1 X-Spam-Flag: YES X-Scan-Signature: 938925967a2432a0d8c7279c30be63be X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 12:32:21 -0000 On Wed, 20 Mar 2013 11:45:37 +0100, Daniel Kalchev wrote: > > On 20.03.13 05:36, Davide D'Amico wrote: >> Il 20.03.2013 02:37 Bob Friesenhahn ha scritto: >>> On Mon, 18 Mar 2013, Steven Hartland wrote: >>>> >>>> Oh and another thing if this is mysql did you set the right settings >>>> for your ZFS volume e.g. >>>> zfs set atime=off tank >>>> zfs create tank/mysql >>>> zfs set recordsize=16k tank/mysql >>> >>> Very importantly, the recordsize should be set before first creating >>> the database file. The recordsize becomes a property of the file. >>> Even if one sets it to 16k, the file will continue to use 128k if that >>> was the setting when it was created. >> >> Well, after changing the recordsite property, I copied the file from an >> UFS partition (using cp -Rp): this should use recordsize=16k, right? > > Perhaps, if you delete the file, or preferably the entire ZFS dataset > first. Copying an file over another existing, does not change anything > with the destination file except it's contents and modification times. > As is always with changing settings, it is safer to just create the > entire data set from scratch, with the new settings. > > Daniel ZFS never overwrites contents of a files. It always allocates new blocks. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 13:02:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D0BE8912 for ; Wed, 20 Mar 2013 13:02:07 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id 6340C6FA for ; Wed, 20 Mar 2013 13:02:07 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UIIea-0008He-Nl for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 14:02:05 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UIIea-00047h-Ge for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 14:02:04 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: FreBSD 9.1 and ZFS v28 performances References: <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> <4930f6fddf6a995051bc6554d1a6a6b7@sys.tomatointeractive.it> <51499351.1040406@digsys.bg> <20130320124501.GA60926@neutralgood.org> Date: Wed, 20 Mar 2013 14:02:04 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <20130320124501.GA60926@neutralgood.org> User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: ++++++++ X-Spam-Score: 8.2 X-Spam-Status: Yes, score=8.2 required=5.0 tests=BAYES_40, IN_PBL_AND_BAYES_40, RCVD_IN_SBL autolearn=disabled version=3.3.1 X-Spam-Flag: YES X-Scan-Signature: 76f3589a93270604ea078d468a2051b3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 13:02:07 -0000 On Wed, 20 Mar 2013 13:45:01 +0100, wrote: > On Wed, Mar 20, 2013 at 01:32:19PM +0100, Ronald Klop wrote: >> On Wed, 20 Mar 2013 11:45:37 +0100, Daniel Kalchev >> wrote: >> >> Well, after changing the recordsite property, I copied the file from >> an >> >> UFS partition (using cp -Rp): this should use recordsize=16k, right? >> > >> > Perhaps, if you delete the file, or preferably the entire ZFS dataset >> > first. Copying an file over another existing, does not change anything >> > with the destination file except it's contents and modification times. >> > As is always with changing settings, it is safer to just create the >> > entire data set from scratch, with the new settings. >> > >> > Daniel >> >> ZFS never overwrites contents of a files. It always allocates new >> blocks. > > True but not really relevant. > > Taking an existing file, truncating it to length zero, and then putting > data into it results in the same file having different contents. But > deleting the existing file, creating a new file, and putting data into it > gives you (like I said) a new/different file. This is true with both UFS > and ZFS. > > Applications don't care that ZFS does COW under the hood. Applications > care that the observed behavior of ZFS be similar to UFS. > It is relevant. After changing the recordsize all new blocks will get the new recordsize. 
The discussion was not about if a file is the same one or not. It is about if the recordsize changes. And recreating the volume/pool is not needed for that. Ronald. From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 13:04:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 000519B7 for ; Wed, 20 Mar 2013 13:04:03 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.35]) by mx1.freebsd.org (Postfix) with ESMTP id B25C771F for ; Wed, 20 Mar 2013 13:04:03 +0000 (UTC) Received: from [84.44.154.73] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1UIIak-0000iK-9a; Wed, 20 Mar 2013 13:58:06 +0100 Date: Wed, 20 Mar 2013 13:57:59 +0100 From: Fabian Keil To: Daniel Kalchev Subject: Re: When will we see TRIM support for GELI volumes ? Message-ID: <20130320135759.48b5dba8@fabiankeil.de> In-Reply-To: <5149967E.4050900@digsys.bg> References: <51479D54.1040509@gibfest.dk> <20130319000232.GA18711@neutralgood.org> <5147BB5C.7020205@gibfest.dk> <5149967E.4050900@digsys.bg> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/SHPNJFjjZ8J0EguYFKqh0Ie"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 13:04:04 -0000 --Sig_/SHPNJFjjZ8J0EguYFKqh0Ie Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Daniel Kalchev wrote: > The comment before about TRIM being bad idea with encrypted storage is=20 > very valid. You don't want anyone to know the layout of the data on the=20 > drive. Considering, that today anyone can have access to huge computing=20 > farms, anything that can make the task of decrypting more difficult is=20 > more than welcome. If you want to be safe, just use more performant=20 > drive and encrypt it all, with no gaps. The bigger the drive, the safer=20 > your data is. Why would it be safer? I agree that there might be scenarios in which one might not want to disclose how much of the disk is used for actual data, but I'd expect brute force attacks to concentrate on getting the master key instead of dealing with every sector on its own. As long as a single provider is used encrypting more data shouldn't make this attack harder. Trimming could decrease the chances of recovering a previous copy of the master key, though. Are you aware of other attacks on geli? 
Fabian --Sig_/SHPNJFjjZ8J0EguYFKqh0Ie Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFJsl4ACgkQBYqIVf93VJ047QCgrKwKynMskvwBFho1mDR8515S c70AoIZHBg2TjmIfTagkTbTc9dU7jHVH =rX3n -----END PGP SIGNATURE----- --Sig_/SHPNJFjjZ8J0EguYFKqh0Ie-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 13:07:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DC0DBA77 for ; Wed, 20 Mar 2013 13:07:52 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id 85B3F74A for ; Wed, 20 Mar 2013 13:07:52 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UIIkA-0000o4-5c for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 14:07:51 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UIIk9-0004Ha-W6 for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 14:07:49 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Subject: Re: best freebsd version for zfs file server References: <5148CB42.6090001@cse.yorku.ca> Date: Wed, 20 Mar 2013 14:07:50 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <5148CB42.6090001@cse.yorku.ca> User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: ++++++++ X-Spam-Score: 8.2 X-Spam-Status: Yes, score=8.2 required=5.0 tests=BAYES_40, IN_PBL_AND_BAYES_40, RCVD_IN_SBL autolearn=disabled version=3.3.1 X-Spam-Flag: YES X-Scan-Signature: 897836312160ed0141c32cdc6ac56212 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 13:07:52 -0000 On Tue, 19 Mar 2013 21:32:02 +0100, Jason Keltz wrote: > Hi. > I hope to soon put into production a new file server hosting many ZFS > filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x > 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out > what is ultimately the "best" version of FreeBSD to run on a production > file server. I believe that it doesn't make sense to stick directly to > the 9.1/release because there have already been many ZFS problems that > were solved in 9.1/stable. On the other hand, stable doesn't > necessarily have to be "stable"! Of course "release" might not be > "stable" either if there's a bug that say, causes a hang on my > controller card, and it's not fixed in anything but "stable"! Yet, > "stable" might "break" something else. I'm wondering what people who > are running FreeBSD file servers in production do -- do you track > individual changes, and compile release + individual bug fixes that > likely affect you, or, in my case, if I run "stable", do all my testing > with "stable", do I run that version of stable, and only attempt to > upgrade to the next "stable" release while very carefully reviewing the > bug list, then holding my breath when the server comes up? Any > recommendations would be appreciated. 
I know there are a lot of people > who are happily running FreeBSD file servers. :) I would run 9-RELEASE until there is a really (really really) good reason to do otherwise. There is no reason to get h*rny about every feature or every additional commit if just serving files works really well. Ronald. > > Jason. > > On 03/19/2013 03:04 PM, Dmitry Morozovsky wrote: >> On Tue, 19 Mar 2013, Tom Evans wrote: >> >>>> I'm currently in process of making new backup server, based on LSI >>>> 9260 >>>> controller. I'm planning to use ZFS over disks, hence the most >>>> natural way >>>> seems to configure mfi to JBOD mode - but I can't find easy way to >>>> reach this, >>>> neither in BIOS utilities nor via MegaCli >>> 9260 should be SAS-2008 based, so mps(4) not mfi(4). >> Well, it at least detected by stable/9 GENERIC as mfi >> >>> The internet[1] suggests that this card should be flashable to a >>> 9211-8i with IT mode firmware, which is just about the ultimate ZFS >>> card, instant-JBOD on inserting a disk, passthru for SMART, high >>> performance, etc. >> Will check, thanks for the reference. >> >>> [1] >>> http://blog.grem.de/sysadmin/LSI-SAS2008-Flashing-2012-04-12-22-17.html >>> > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 13:34:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 357B14EE for ; Wed, 20 Mar 2013 13:34:03 +0000 (UTC) (envelope-from jas@cse.yorku.ca) Received: from bronze.cs.yorku.ca (bronze.cs.yorku.ca [130.63.95.34]) by mx1.freebsd.org (Postfix) with ESMTP id CD5D58AE for ; Wed, 20 Mar 2013 13:34:02 +0000 (UTC) Received: from [130.63.97.125] (ident=jas) by bronze.cs.yorku.ca with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76) (envelope-from ) id 1UIJ9V-0002Qc-IN for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 09:34:01 -0400 Message-ID: <5149BAC9.9080609@cse.yorku.ca> Date: Wed, 20 Mar 2013 09:34:01 -0400 From: Jason Keltz User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: best freebsd version for zfs file server References: <5148CB42.6090001@cse.yorku.ca> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Spam-Level: - X-Spam-Report: Content preview: On 03/20/2013 09:07 AM, Ronald Klop wrote: > On Tue, 19 Mar 2013 21:32:02 +0100, Jason Keltz wrote: > >> Hi. >> I hope to soon put into production a new file server hosting many ZFS >> filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x >> 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out >> what is ultimately the "best" version of FreeBSD to run on a >> production file server. I believe that it doesn't make sense to >> stick directly to the 9.1/release because there have already been >> many ZFS problems that were solved in 9.1/stable. On the other hand, >> stable doesn't necessarily have to be "stable"! Of course "release" >> might not be "stable" either if there's a bug that say, causes a hang >> on my controller card, and it's not fixed in anything but "stable"! >> Yet, "stable" might "break" something else. 
I'm wondering what >> people who are running FreeBSD file servers in production do -- do >> you track individual changes, and compile release + individual bug >> fixes that likely affect you, or, in my case, if I run "stable", do >> all my testing with "stable", do I run that version of stable, and >> only attempt to upgrade to the next "stable" release while very >> carefully reviewing the bug list, then holding my breath when the >> server comes up? Any recommendations would be appreciated. I know >> there are a lot of people who are happily running FreeBSD file >> servers. :) > > I would run 9-RELEASE until there is a really (really really) good > reason to do otherwise. > There is no reason to get h*rny about every feature or every > additional commit if just serving files works really well. > > Ronald. [...] Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SHORTCIRCUIT Not all rules were run, due to a shortcircuited rule -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 13:34:03 -0000 On 03/20/2013 09:07 AM, Ronald Klop wrote: > On Tue, 19 Mar 2013 21:32:02 +0100, Jason Keltz wrote: > >> Hi. >> I hope to soon put into production a new file server hosting many ZFS >> filesystem with FreeBSD. The system has 2 x 9205-8e cards, and 1 x >> 9207-8i card and 24 x 900 GB 10K RPM drives. I'm trying to figure out >> what is ultimately the "best" version of FreeBSD to run on a >> production file server. I believe that it doesn't make sense to >> stick directly to the 9.1/release because there have already been >> many ZFS problems that were solved in 9.1/stable. On the other hand, >> stable doesn't necessarily have to be "stable"! Of course "release" >> might not be "stable" either if there's a bug that say, causes a hang >> on my controller card, and it's not fixed in anything but "stable"! >> Yet, "stable" might "break" something else. I'm wondering what >> people who are running FreeBSD file servers in production do -- do >> you track individual changes, and compile release + individual bug >> fixes that likely affect you, or, in my case, if I run "stable", do >> all my testing with "stable", do I run that version of stable, and >> only attempt to upgrade to the next "stable" release while very >> carefully reviewing the bug list, then holding my breath when the >> server comes up? Any recommendations would be appreciated. I know >> there are a lot of people who are happily running FreeBSD file >> servers. :) > > I would run 9-RELEASE until there is a really (really really) good > reason to do otherwise. > There is no reason to get h*rny about every feature or every > additional commit if just serving files works really well. > > Ronald. Hi Ronald, I'm not at all concerned about new functionality, or even minor bug fixes to general O/S commands that I likely won't be using on the server anyway. That obviously leaves "current" out of the question, especially for a production server. That being said, stability changes with respect to ZFS (of which there are already several in 9.1-stable as Freddie pointed out) are what I'm after. I wish there was a better separation in FreeBSD between the "critical" versus "not so critical" patches ... 
something like "release" that I can always download and know that I can't really "go wrong"... the difference between adding functionality, and fixing critical bugs in existing functionality. I might be misunderstanding the whole concept, but it's probably what puzzles me more than anything about FreeBSD. I'm coming from the RHEL world where I can rely on vendor binary kernels to fix serious bugs without "adding" new functionality. Sometimes, things break, but in general, it's all pretty good... Jason. The truth is, on my other machines, I'm using to running RHEL where there are frequent binary ker From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 14:20:03 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 994DE72 for ; Wed, 20 Mar 2013 14:20:03 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from mail.ultra-secure.de (mail.ultra-secure.de [78.47.114.122]) by mx1.freebsd.org (Postfix) with ESMTP id 0520DAD2 for ; Wed, 20 Mar 2013 14:20:02 +0000 (UTC) Received: (qmail 60277 invoked by uid 89); 20 Mar 2013 14:15:22 -0000 Received: by simscan 1.4.0 ppid: 60272, pid: 60274, t: 0.0744s scanners: attach: 1.4.0 clamav: 0.97.3/m:54/d:16876 Received: from unknown (HELO suse3) (rainer@ultra-secure.de@212.71.117.1) by mail.ultra-secure.de with ESMTPA; 20 Mar 2013 14:15:22 -0000 Date: Wed, 20 Mar 2013 15:15:21 +0100 From: Rainer Duffner To: freebsd-fs@freebsd.org Subject: Re: best freebsd version for zfs file server Message-ID: <20130320151521.1b0e00b0@suse3> In-Reply-To: <5149BAC9.9080609@cse.yorku.ca> References: <5148CB42.6090001@cse.yorku.ca> <5149BAC9.9080609@cse.yorku.ca> X-Mailer: Claws Mail 3.8.1 (GTK+ 2.24.10; x86_64-suse-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 14:20:03 -0000 > I'm not at all concerned about new functionality, or even minor bug > fixes to general O/S commands that I likely won't be using on the > server anyway. That obviously leaves "current" out of the question, > especially for a production server. That being said, stability > changes with respect to ZFS (of which there are already several in > 9.1-stable as Freddie pointed out) are what I'm after. I wish there > was a better separation in FreeBSD between the "critical" versus "not > so critical" patches ... something like "release" that I can always > download and know that I can't really "go wrong"... the difference > between adding functionality, and fixing critical bugs in existing > functionality. I might be misunderstanding the whole concept, but > it's probably what puzzles me more than anything about FreeBSD. I'm > coming from the RHEL world where I can rely on vendor binary kernels > to fix serious bugs without "adding" new functionality. Sometimes, > things break, but in general, it's all pretty good... AFAIK, the reason why this "stable-stable" version of FreeBSD does not exist is purely due to the lack of resources (money mostly). If somebody would pay for it, it would happen. Personally, I'm glad FreeBSD get's out one release per year but it would be interesting to know if somebody from the Foundation has done the maths on this and could come up with an estimate about the sort of (financial) commitment this would require. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 18:01:21 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id F04CCFBA for ; Wed, 20 Mar 2013 18:01:21 +0000 (UTC) (envelope-from dean.jones@oregonstate.edu) Received: from smtp1.oregonstate.edu (smtp1.oregonstate.edu [128.193.15.35]) by mx1.freebsd.org (Postfix) with ESMTP id CB7D398C for ; Wed, 20 Mar 2013 18:01:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.oregonstate.edu (Postfix) with ESMTP id ACD373E980 for ; Wed, 20 Mar 2013 11:01:15 -0700 (PDT) X-Virus-Scanned: amavisd-new at oregonstate.edu Received: from smtp1.oregonstate.edu ([127.0.0.1]) by localhost (smtp.oregonstate.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JgtmPtKbbQTg for ; Wed, 20 Mar 2013 11:01:15 -0700 (PDT) Received: from mail-ie0-f173.google.com (mail-ie0-f173.google.com [209.85.223.173]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by smtp1.oregonstate.edu (Postfix) with ESMTPSA id 613403E972 for ; Wed, 20 Mar 2013 11:01:15 -0700 (PDT) Received: by mail-ie0-f173.google.com with SMTP id 9so2432961iec.32 for ; Wed, 20 Mar 2013 11:01:14 -0700 (PDT) X-Received: by 10.50.170.36 with SMTP id aj4mr25168igc.4.1363802474591; Wed, 20 Mar 2013 11:01:14 -0700 (PDT) MIME-Version: 1.0 Received: by 10.64.33.161 with HTTP; Wed, 20 Mar 2013 11:00:54 -0700 (PDT) In-Reply-To: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com> References: <6B3D0B04-9DCE-47A4-A582-08DD640E5676@deman.com> From: Dean Jones Date: Wed, 20 Mar 2013 11:00:54 -0700 Message-ID: Subject: Re: FreeBSD & no single point of failure file service To: "freebsd-fs@freebsd.org" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 18:01:22 -0000 On Sat, Mar 16, 2013 at 5:48 PM, Michael DeMan wrote: snip > In all honesty some kind of 3rd party designed solution with only minimal > support would be fine for us, but I don't think that is their regular > market. 
> Check out these guys: http://www.high-availability.com/zfs-ha-plugin/ From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 19:52:04 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CAEF62AE for ; Wed, 20 Mar 2013 19:52:04 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 32193122 for ; Wed, 20 Mar 2013 19:52:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; d=contactlab.it; s=clab1; c=relaxed/relaxed; q=dns/txt; i=@contactlab.it; t=1363809120; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=qElUqbXG2aG52j74Zu7lVRsmEcweSRYv8gVwZaoOScY=; b=rqnI3ojVAibOnhcg0fSP1nD0qS5GnGHaoHCucIUzGjMDbucVZEvZinG1qoE4Bb7Z wQ1f4AHO1dRT0JaN6bE6W3PdRVwQRzC2P23dWU5TbcrZs8KzAECwNCaJsZG+Pojy mo3xVdUD5w0H+Bo2dQfePuSxtpK7fLYg0lBvgFRjfco=; Received: from [213.92.90.12] ([213.92.90.12:36364] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.5.1.37854 r(Momo-dev:3.5.1.0)) with ESMTP id F7/E1-24145-0631A415; Wed, 20 Mar 2013 20:52:00 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UIP3I-0009kq-A7 for freebsd-fs@freebsd.org; Wed, 20 Mar 2013 20:52:00 +0100 Received: (qmail 37494 invoked by uid 80); 20 Mar 2013 19:52:00 -0000 To: Bob Friesenhahn Subject: Re: FreBSD 9.1 and ZFS v28 performances X-PHP-Script: uebmeil.sys.tomatointeractive.it/index.php for 172.16.16.227 X-PHP-Originating-Script: 0:main.inc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Wed, 20 Mar 2013 20:52:00 +0100 From: Davide D'Amico Organization: ContactLab Mail-Reply-To: In-Reply-To: References: <514729BD.2000608@contactlab.com> <810E5C08C2D149DBAC94E30678234995@multiplay.co.uk> <51473D1D.3050306@contactlab.com> <1DD6360145924BE0ABF2D0979287F5F4@multiplay.co.uk> <51474F2F.5040003@contactlab.com> <51475267.1050204@contactlab.com> <514757DD.9030705@contactlab.com> Message-ID: X-Sender: davide.damico@contactlab.com User-Agent: Roundcube Webmail/0.8.5 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: davide.damico@contactlab.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 19:52:04 -0000 Il 20.03.2013 02:37 Bob Friesenhahn ha scritto: > On Mon, 18 Mar 2013, Steven Hartland wrote: >> >> Oh and another thing if this is mysql did you set the right settings >> for your ZFS volume e.g. >> zfs set atime=off tank >> zfs create tank/mysql >> zfs set recordsize=16k tank/mysql > > Very importantly, the recordsize should be set before first creating > the database file. The recordsize becomes a property of the file. > Even if one sets it to 16k, the file will continue to use 128k if that > was the setting when it was created. To recap, a collegue of mine was able to reach the same performances of UFS using a lot of *magic* knobs or sysctls such as vfs.zfs.txg.{synctime,timeout}, vfs.zfs.write_limit_override, txg.write_limit_override and vfs.zfs.zil_disable=0. 
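For reference, knobs like those are normally set from /boot/loader.conf (or at runtime via
sysctl where the OID is writable). A sketch only, with illustrative values rather than the
actual settings used here; the exact names and defaults vary between FreeBSD/ZFS versions,
and newer ZFS replaces the old zil_disable switch with the per-dataset sync property:

  # /boot/loader.conf - verify each OID exists on your version first
  # ("sysctl -a | grep vfs.zfs") before copying anything from here.
  vfs.zfs.txg.timeout="5"
  vfs.zfs.write_limit_override="1073741824"   # 1 GiB, example value only
  # Per-dataset alternative to the old vfs.zfs.zil_disable tunable:
  #   zfs set sync=standard tank/mysql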
So, the lesson for me is: with my hardware (R720 12core, 32GB ram), with my raid/disks setup, with the software I have to use (mysql-5.6.10-ent), with the dataset/payload I have to use I think I'll use UFS (I have a lot of other dbserver using UFS) or (but I will know this only tomorrow) I'll test CentOS on the same hardware to see how does it perform, so I'll post my questions and/or results on freebsd-performances ml :) Thanks, d. From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 23:44:30 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A831199C for ; Wed, 20 Mar 2013 23:44:30 +0000 (UTC) (envelope-from prvs=1791dbe725=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 4CE97AFC for ; Wed, 20 Mar 2013 23:44:29 +0000 (UTC) Received: from r2d2 ([82.12.16.150]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002833862.msg for ; Wed, 20 Mar 2013 23:44:28 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 20 Mar 2013 23:44:28 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDDNSBL-Result: mail1.multiplay.co.uk, Wed, 20 Mar 2013 23:44:28 +0000 zen.spamhaus.org returned result of 127.0.0.11 X-MDRemoteIP: 82.12.16.150 X-Return-Path: prvs=1791dbe725=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.org Message-ID: From: "Steven Hartland" To: "Dmitry Morozovsky" , References: Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? Date: Wed, 20 Mar 2013 23:44:23 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 23:44:30 -0000 ----- Original Message ----- From: "Dmitry Morozovsky" > I'm currently in process of making new backup server, based on LSI 9260 > controller. I'm planning to use ZFS over disks, hence the most natural way > seems to configure mfi to JBOD mode - but I can't find easy way to reach this, > neither in BIOS utilities nor via MegaCli > > Any hints? I don't remember the model number, but you may find there's an IT mode FW which will make it report under mps and not mfi. If that's one of the versions which has no IT mode FW then you can try configuring JBOD under mfi with something like:- MegaCli -AdpSetProp -EnableJBOD -1 -aALL Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 20 23:46:14 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 61793A15; Wed, 20 Mar 2013 23:46:14 +0000 (UTC) (envelope-from prvs=1791dbe725=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id AFC43B0C; Wed, 20 Mar 2013 23:46:13 +0000 (UTC) Received: from r2d2 ([82.12.16.150]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002833897.msg; Wed, 20 Mar 2013 23:46:11 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 20 Mar 2013 23:46:11 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDDNSBL-Result: mail1.multiplay.co.uk, Wed, 20 Mar 2013 23:46:11 +0000 zen.spamhaus.org returned result of 127.0.0.11 X-MDRemoteIP: 82.12.16.150 X-Return-Path: prvs=1791dbe725=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <69BACDB971CC4008B0546E3D5824A51D@multiplay.co.uk> From: "Steven Hartland" To: , "Pawel Jakub Dawidek" References: <51479D54.1040509@gibfest.dk> <20130319082732.GB1367@garage.freebsd.pl> <1954349453.20130319225642@serebryakov.spb.ru> Subject: Re: When will we see TRIM support for GELI volumes ? Date: Wed, 20 Mar 2013 23:46:09 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 20 Mar 2013 23:46:14 -0000 ----- Original Message ----- From: "Lev Serebryakov" To: "Pawel Jakub Dawidek" Cc: Sent: Tuesday, March 19, 2013 6:56 PM Subject: Re: When will we see TRIM support for GELI volumes ? > Hello, Pawel. > You wrote 19 марта 2013 г., 12:27:32: > > PJD> This is not what I see. On one of my SSDs in my laptop I've two > PJD> partitions, both running ZFS, but one of them on top of GELI. > PJD> I don't use ZFS TRIM yet, as I see no slowdown whatsoever. > It depends on your SSD controller and write rate. SandForce-based SSD > degrades badly without TRIM and can not recover performance by > themselves if here is a lot of writes. But modern SSD on Marvell, > Indilinx and LAMP-based SSD restore write performance after some idle > time, not so effectively as with TRIM, but to very good level. And > SF-2281 based SSD sucks in this area badly. Also make sure your running 503 FW or above on controllers running 5 series FW otherwise TRIM will be broken :( Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
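A quick way to see which firmware revision a drive is actually running (device name is a
placeholder):

  camcontrol identify ada0 | grep -i firmware
  # or, with sysutils/smartmontools installed:
  smartctl -i /dev/ada0 | grep -i firmware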
From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 05:10:22 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BBCFD8C0 for ; Thu, 21 Mar 2013 05:10:22 +0000 (UTC) (envelope-from rcartwri@asu.edu) Received: from mail-we0-x234.google.com (mail-we0-x234.google.com [IPv6:2a00:1450:400c:c03::234]) by mx1.freebsd.org (Postfix) with ESMTP id 5D63F8F6 for ; Thu, 21 Mar 2013 05:10:22 +0000 (UTC) Received: by mail-we0-f180.google.com with SMTP id k14so1944968wer.25 for ; Wed, 20 Mar 2013 22:10:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:x-gm-message-state; bh=i3B7tLjG9TSnjPgoIWHcPjgZkmFJrwKwvyPYRMx7kwk=; b=FKNl1Nq0QWcwou2ty5hyz73wTWbTZSC9ehTTfhEHnJ7/7yuiSOjVbCRnFiSqPwHdDy zXE6Y49YzoCv0+yKsB80vhi/ClsSXC8KPWSxYzyA1LcBpecLkBIdyKoolFfRO4WCU2L/ sNdS85tTROIy1NddPdRp18bn5xolBWwtugzB5RVHs8p5zQLiCc6TAxisqMVgnfrX/GNi o3e1SsV6MpB/0CawfMLrsywxsEDFpFJzjf0PjwmgL67U3lKTqWTuKy/bhZEfLMbgSBqk fA0nqDQb1Ir+qQ1WbFe14NVARkRlwAlAFm8Nv/B3tmzjmxjXTJdb6o709b610DUdn5kq 7Wpw== MIME-Version: 1.0 X-Received: by 10.194.109.35 with SMTP id hp3mr14732953wjb.15.1363842620273; Wed, 20 Mar 2013 22:10:20 -0700 (PDT) Received: by 10.180.198.2 with HTTP; Wed, 20 Mar 2013 22:10:20 -0700 (PDT) In-Reply-To: <20130321044557.GA15977@icarus.home.lan> References: <20130321044557.GA15977@icarus.home.lan> Date: Wed, 20 Mar 2013 22:10:20 -0700 Message-ID: Subject: Re: ZFS question From: "Reed A. Cartwright" To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQktkDXWaEd17hl403UoW+4VWKfU4fGfg5nGPIdWrqmYO3gCsHfLP9I8ZvKaxZMq2KAcEyGY Cc: freebsd-fs@freebsd.org, Quartz , freebsd-questions@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 05:10:22 -0000 Note that my issue seems to do with an interaction between the CAM system and the MPS driver in 9.1. Thus it is more than likely different than what you are experiencing Quartz. Now that ZFS deadman has been incorporated into stable, I'll probably give a 9.1 (i.e. 9/stable) another try. Jeremy, I have a question about enabling kernel dumps based on my current swap config. I currently have a 1TB drive split into 4 geli encrypted swap partitions (Freebsd doesn't like swap partitions over ~250 GB and I have lots of RAM). These partitions are UFS-swap partitions and are not backed by any mirroing or ZFSing. So, how do I best enable crash dumps? If I need to remove encryption, I can do that. On Wed, Mar 20, 2013 at 9:45 PM, Jeremy Chadwick wrote: > (Please keep me CC'd as I'm not subscribed to -questions) > > > Lots to say about this. > > 1. freebsd-fs is the proper list for filesystem-oriented questions of > this sort, especially for ZFS. > > 2. The issue you've described is experienced by some, and **not** > experienced by even more/just as many, so please keep that in mind. > Each/every person's situation/environment/issue has to be treated > separately/as unique. > > 3. You haven't provided any useful details, even in your follow-up post > here: > > http://lists.freebsd.org/pipermail/freebsd-questions/2013-March/249958.html > > All you've provided is a "general overview" with no technical details, > no actual data. 
You need to provide that data verbatim. You need to > provide: > > - Contents of /boot/loader.conf > - Contents of /etc/sysctl.conf > - Output from "zpool status" > - Output from "zpool get all" > - Output from "zfs get all" > - Output from "dmesg" (probably the most important) > - Output from "sysctl vfs.zfs kstat.zfs" > > I particularly tend to assist with disk-level problems, so if this turns > out to be a disk-level issue (and NOT a controller or controller driver > issue), I can help quite a bit with that. > > 4. I would **not** suggest rolling back to 9.0. This recommendation is > solves nothing -- if there is truly a bug/livelock issue, then that > needs to be tracked down. By rolling back, if there is an issue, you're > effectively ensuring it'll never get investigated or fixed, which means > you can probably expect to see this in 9.2, 9.3, or even 10.x onward. > > If you can't deal with the instability, or don't have the > time/cycles/interest to help track it down, that's perfectly okay too: > my recommendation is to go back to UFS (there's no shame in that). > > Else, as always, I strongly recommend running stable/9 (keep reading). > > 5. stable/9 (a.k.a. FreeBSD 9.1-STABLE) just recently (~5 days ago) > MFC'd an Illumos ZFS feature solely to help debug/troubleshoot this > exact type of situation: introduction of the ZFS deadmean thread. > Reference materials for what that is: > > http://svnweb.freebsd.org/base?view=revision&revision=248369 > http://svnweb.freebsd.org/base?view=revision&revision=247265 > https://www.illumos.org/issues/3246 > > The purpose of this feature (enabled by default) is to induce a kernel > panic when ZFS I/O stalls/hangs for unexpectedly long periods of time > (configurable via vfs.zfs.deadman_synctime). > > Once the panic happens (assuming your system is configured with a slice > dedicated to swap (ZFS-backed swap = bad bad bad) and use of > dumpdev="auto" in rc.conf), upon reboot the system should extract the > crash dump from swap and save it into /var/crash. At that point kernel > developers on the -fs list can help tell you *exactly* what to do with > kgdb(1) that can shed some light on what happened/where the issue may > lie. > > All that's assuming that the issue truly is ZFS waiting for I/O and not > something else (like ZFS internally spinning hard in its own code). > > Good luck, and let us know how you want to proceed. > > -- > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Mountain View, CA, US | > | Making life hard for others since 1977. PGP 4BD6C0CB | > > _______________________________________________ > freebsd-questions@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-questions > To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" -- Reed A. 
Cartwright, PhD Assistant Professor of Genomics, Evolution, and Bioinformatics School of Life Sciences Center for Evolutionary Medicine and Informatics The Biodesign Institute Arizona State University From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 06:06:40 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1DD82F8C for ; Thu, 21 Mar 2013 06:06:40 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id 02A8BA92 for ; Thu, 21 Mar 2013 06:06:40 +0000 (UTC) Received: from omta12.emeryville.ca.mail.comcast.net ([76.96.30.44]) by qmta01.emeryville.ca.mail.comcast.net with comcast id E64E1l0080x6nqcA166fHn; Thu, 21 Mar 2013 06:06:39 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta12.emeryville.ca.mail.comcast.net with comcast id E66e1l00H1t3BNj8Y66eYY; Thu, 21 Mar 2013 06:06:39 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id A860973A1C; Wed, 20 Mar 2013 23:06:38 -0700 (PDT) Date: Wed, 20 Mar 2013 23:06:38 -0700 From: Jeremy Chadwick To: "Reed A. Cartwright" Subject: Re: ZFS question Message-ID: <20130321060638.GA16997@icarus.home.lan> References: <20130321044557.GA15977@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1363845999; bh=jWURl9cCVQ/iQZkr2W9H0HhtchnKtuRe0xLji1lZSc4=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=QbtEmLtW8e2S2DR7N558D5KFDzwUlbF/KywR3K6wzsMmkwvtvmoo1zgMprvqT3kK9 /HNNSWJzh4xyRgf1CfVbPYvOUGKnnwUUxIZXe2k9vIlbLLwPr6qc5NCo7h6mDwj/p6 lRq4yIEzJPsU5dXRdpXD9qa7+xxJ9Nf/Gxz6YYQUQcWwUm6NKp1Uug8/Nu1tbvnPIH ugP0kiN1SL74awZHLX5034mjV+JNyHJTTA6eVwjVxUpD3EMqXHu+Xss9B2yfv7MuHo UGpRNtqCjEfusR96k8vLErls2Av5xp8k5M5gS3auUcfMiUe9LQipRGiwui+KTHQfBt Npiq0SGYG+LgA== Cc: freebsd-fs@freebsd.org, Quartz , freebsd-questions@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 06:06:40 -0000 On Wed, Mar 20, 2013 at 10:10:20PM -0700, Reed A. Cartwright wrote: > {snipped stuff about CAM and mps and ZFS deadman} > > Jeremy, I have a question about enabling kernel dumps based on my > current swap config. > > I currently have a 1TB drive split into 4 geli encrypted swap > partitions (Freebsd doesn't like swap partitions over ~250 GB and I > have lots of RAM). > > These partitions are UFS-swap partitions and are > not backed by any mirroing or ZFSing. > > So, how do I best enable crash dumps? If I need to remove encryption, > I can do that. I have zero familiarity with geli(8), gbde(8), and file-based swap. My gut feeling is that you cannot use this to achieve a proper kernel panic dump, but I have not tried it. You can force a kernel panic via "sysctl debug.kdb.panic=1". I'm not sure if an automatic memory dump to swap happens with the stock GENERIC kernel however. I can talk more about that if needed (it involves adding some options to your kernel config, and one rc.conf variable). 
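For what it's worth, here is a rough, untested sketch of how the pieces usually fit together on a box with a plain, unencrypted swap slice (the device name below is only a placeholder, not taken from your system):

   # /etc/rc.conf
   dumpdev="auto"              # or an explicit slice, e.g. "/dev/ada0s1b"

   # one-time test, on a scratch box only:
   dumpon -v /dev/ada0s1b      # point the kernel at the dump device now
   sysctl debug.kdb.panic=1    # force a test panic; after the reboot,
                               # savecore should leave the dump in /var/crash

Whether the same dance works when swap lives on a geli provider is exactly the part I can't vouch for.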
Regarding "enabling crash dumps" as a general concept: In rc.conf you need to have dumpdev="auto" (or point it to a specific disk slice, but auto works just fine assuming you have a "swap" or "dump" device defines in /etc/fstab -- see savecore(8) man page). Full details are in rc.conf(5). How this works: After a system reboots, during rc script startup, rc.d/savecore runs savecore which examines the configured dumpdev for headers + tries to detect if there was previously a kernel panic. If it finds one, it begins pulling the data out of swap and writing the results directly to /var/crash in a series of files (again, see savecore(8)). It does this ***before*** swapon(8) is run (reason why should be obvious) via rc.d/swapon. After it finishes, swapon is run (meaning anything previously written to the swap slice is effectively lost), and the system continues through the rest of the rc scripts. Purely for educational purposes: to examine system rc script order, see rcorder(8) or run "rcorder /etc/rc.d/*". -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 08:53:08 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 12E32E9A for ; Thu, 21 Mar 2013 08:53:08 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta12.emeryville.ca.mail.comcast.net (qmta12.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:227]) by mx1.freebsd.org (Postfix) with ESMTP id 86CD33F4 for ; Thu, 21 Mar 2013 08:53:07 +0000 (UTC) Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51]) by qmta12.emeryville.ca.mail.comcast.net with comcast id E8sC1l00216AWCUAC8t6sF; Thu, 21 Mar 2013 08:53:06 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta06.emeryville.ca.mail.comcast.net with comcast id E8t51l0051t3BNj8S8t5mB; Thu, 21 Mar 2013 08:53:05 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 06DA373A1C; Thu, 21 Mar 2013 01:53:05 -0700 (PDT) Date: Thu, 21 Mar 2013 01:53:05 -0700 From: Jeremy Chadwick To: Quartz Subject: Re: ZFS question Message-ID: <20130321085304.GB16997@icarus.home.lan> References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <514AA192.2090006@sneakertech.com> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1363855986; bh=baLzKJJBCnjjQfirS66V/ar7JjtznQKqtHg4/ZOyjPY=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=bSA3vYK3Xx2tPNXn0/EbmUdYw6ETilVQQks6xp6wm98+bnd8NGVIuJbNhXggH7BoW RdgDCKAiFw54aAUQVYpqPLPHuNVfSvfoHqn1NxgYtpk1mMWOrOximL9BXTdPkXvNGw TCLhIr9CiB5R+d6cxHqgulLvjUsBl3tHDjBUYOS2LJwoDKGIBsMsQ7kVDGC+mDHNaS 5j0FnSvkoqyqyKIMOKtzouyU3OOkKTPkKw9ISAMal2sb1Z4Tt3s1HU9cnUZ5+XrEK/ qY+hKfP3nCl/3hSymmCVXX7NrWceei8kBFKH7I7KRF0nLgfAJTm9UoeuMrzveL3WS0 g4fGFAvl5wy5g== Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 08:53:08 -0000 On Thu, Mar 21, 2013 at 01:58:42AM -0400, Quartz wrote: > > >1. 
freebsd-fs is the proper list for filesystem-oriented questions of > >this sort, especially for ZFS. > > Ok, I'm assuming I should subscribe to that list and post there then? Correct. Cross-posting this thread to freebsd-fs (e.g. adding it to the CC line) is generally shunned. I've changed the CC line to use freebsd-fs@ instead, and will follow-up with freebsd-questions@ stating that the thread/discussion has been moved. I've also snipped the rest of our conversation because once I got to the very, VERY end of the convo and recapped what all has been said in this thread (how you reported the problem vs. what the problem is), I realise none of this really matters. I also don't want to get into a discussion about -RELEASE vs. -STABLE because I could practically write a book on the subject (particularly why -STABLE is a better choice). One thing I did want to discuss: > There are eight drives in the machine at the moment, and I'm not > messing with partitions yet because I don't want to complicate things. > (I will eventually be going that route though as the controller tends > to renumber drives in a first-come-first-serve order that makes some > things difficult). Solving this is easy, WITHOUT use of partitions or labels. There is a feature of CAM(4) called "wired down" or "wiring down", where you can in essence statically map a SATA port to a static device number regardless if a disk is inserted at the time the kernel boots (i.e. SATA port 0 on controller X is always ada2, SATA port 1 on controller X is always ada3, SATA port 0 on controller Y is always ada0, etc.). I've discussed how to do this many times over the years, including recently as well. It involves some lines in /boot/loader.conf. It can can sometimes be tricky to figure out depending on the type of controllers you're using, but you do the work/set this up *once* and never touch it again (barring changing brands of controllers). Trust me, it's really not that bad. I can help you with this, but I need to see a dmesg (everything from boot to the point mountroot gets done). > >All that's assuming that the issue truly is ZFS waiting for I/O and not > >something else > > Well, everything I've read so far indicates that zfs has issues when > dealing with un-writable pools, so I assume that's what's going on > here. Let's recap what was said; I'm sorry for hemming and hawing over what was said, but the way your phrased your issue/situation matters. This is how you described your problem initially: > I'm experiencing fatal issues with pools hanging my machine requiring a > hard-reset. This, to me, means something very different than what was described in a subsequent follow-up: > However, when I pop a third drive, the machine becomes VERY unstable. I > can nose around the boot drive just fine, but anything involving i/o > that so much as sneezes in the general direction of the pool hangs the > machine. Once this happens I can log in via ssh, but that's pretty much > it. > > The machine never recovers (at least, not inside 35 minutes, which is > the most I'm willing to wait). Reconnecting the drives has no effect. My > only option is to hard reset the machine with the front panel button. > Googling for info suggested I try changing the pool's "failmode" setting > from "wait" to "continue", but that doesn't appear to make any > difference. For reference, this is a virgin 9.1-release installed off > the dvd image with no ports or packages or any extra anything. So let's recap, along with some answers: S1. 
In your situation, when a ZFS pool loses enough vdev or vdev members to cause permanent pool damage (as in completely 100% unrecoverable, such as losing 3 disks of a raidz2 pool), any I/O to the pool results in that applications hanging. The system is still functional/usable (e.g. I/O to other pools and non-ZFS filesystems works fine), just that I/O to the now-busted pool hangs indefinitely. A1. This is because "failmode=wait" on the pool, which is the default property value. This is by design; there is no ZFS "timeout" for this sort of thing. "failmode=continue" is what you're looking for (keep reading). S2. If the pool uses "failmode=continue", there is no change in behaviour, (i.e. EIO is still never returned). A2. That sounds like a bug then. I test your claim below, and you might be surprised at the findings. S3. If the previously-yanked disks are reinserted, the issue remains. A3. What you're looking for is the "autoreplace" pool property. However, on FreeBSD, this property is in effect a no-op; manual intervention is always required to replace a disk ("zpool replace"). Solaris/Illumos/etc. don't have this problem because they have proper notification frameworks (fmd/FMF and SMF) that can make this happen. On FreeBSD, you could accomplish running "zpool replace" automatically with devd(8), but that's up to you. Now let's talk about the "failmode=continue" bug/issue. Here's a testbox I use for testing issues with CAM, ZFS, and other bits: root@testbox:/root # uname -a FreeBSD testbox.home.lan 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 root@testbox:/root # zpool create array raidz2 da1 da2 da3 da4 root@testbox:/root # zpool status pool: array state: ONLINE scan: none requested config: NAME STATE READ WRITE CKSUM array ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 da3 ONLINE 0 0 0 da4 ONLINE 0 0 0 errors: No known data errors root@testbox:/root # zpool set failmode=continue array Now in another window, launching dd to do some gradual but continuous I/O, and use Ctrl-T (SIGUSR1) to get statuses: root@testbox:/root # dd if=/dev/zero of=/array/testfile bs=1 load: 0.00 cmd: dd 939 [running] 0.62r 0.00u 0.62s 5% 1508k 83348+0 records in 83347+0 records out 83347 bytes transferred in 0.620288 secs (134368 bytes/sec) Now I physically remove da4... root@testbox:/root # zpool status pool: array state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. scan: none requested config: NAME STATE READ WRITE CKSUM array DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 da3 ONLINE 0 0 0 9863791736611294808 REMOVED 0 0 0 was /dev/da4 errors: No known data errors dd is still transferring data: load: 0.53 cmd: dd 939 [running] 39.58r 0.55u 38.94s 100% 1512k 5792063+0 records in 5792062+0 records out 5792062 bytes transferred in 39.580059 secs (146338 bytes/sec) Now I physically remove da3... root@testbox:/root # zpool status pool: array state: DEGRADED status: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. action: Online the device using 'zpool online' or replace the device with 'zpool replace'. 
scan: none requested config: NAME STATE READ WRITE CKSUM array DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 16564477967045696210 REMOVED 0 0 0 was /dev/da3 9863791736611294808 REMOVED 0 0 0 was /dev/da4 errors: No known data errors dd is still going: load: 0.81 cmd: dd 939 [running] 83.55r 1.28u 81.63s 100% 1512k 12537268+0 records in 12537267+0 records out 12537267 bytes transferred in 83.552147 secs (150053 bytes/sec) Now I physically remove da2... root@testbox:/root # zpool status pool: array state: DEGRADED status: One or more devices are faulted in response to IO failures. action: Make sure the affected devices are connected, then run 'zpool clear'. see: http://illumos.org/msg/ZFS-8000-JQ scan: none requested config: NAME STATE READ WRITE CKSUM array DEGRADED 0 16 0 raidz2-0 DEGRADED 0 40 0 da1 ONLINE 0 0 0 da2 ONLINE 0 46 0 16564477967045696210 REMOVED 0 0 0 was /dev/da3 9863791736611294808 REMOVED 0 0 0 was /dev/da4 errors: 2 data errors, use '-v' for a list And in the other window where dd is running, it immediately terminates with EIO: dd: /array/testfile: Input/output error 22475027+0 records in 22475026+0 records out 22475026 bytes transferred in 150.249338 secs (149585 bytes/sec) root@testbox:/root # So at this point, I can safely say that ***actively running*** processes which are doing I/O to the pool DO get passed on EIO status. But just wait, the situation gets more interesting... One thing to note (and it's important) above is that da2 is still considered "ONLINE". More on that in a moment. I then decide to then issue some other I/O requests to /array (such as copying /array/testfile to /tmp), to see what the behaviour is in this state: root@testbox:/root # ls -l /array total 21984 -rw-r--r-- 1 root wheel 22475026 Mar 21 01:11 testfile How this ls worked is beyond me, since the pool is effectively broken. Possibly some of this is being pulled from the ARC or vnode caching, I don't know. Anyway, I decide to copy /array/testfile to /tmp to see what happens: root@testbox:/root # cp /array/testfile /tmp load: 0.00 cmd: cp 959 [tx->tx_sync_done_cv)] 4.88r 0.00u 0.10s 0% 2520k load: 0.00 cmd: cp 959 [tx->tx_sync_done_cv)] 7.02r 0.00u 0.10s 0% 2520k ^C^C^C^C^Z Clearly you can see here that a syscall of sorts is stuck indefinitely waiting on the kernel. Kernel call stack for cp: root@testbox:/root # procstat -kk 959 PID TID COMM TDNAME KSTACK 959 100090 cp - mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 txg_wait_synced+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x8d vputx+0x2d8 vn_close+0xa4 vn_closefile+0x5d _fdrop+0x23 closef+0x52 kern_close+0x172 amd64_syscall+0x546 Xfast_syscall+0xf7 So while this is going on, I decide to reattach da2 with the plan of issuing "zpool replace array da2" -- sure, even though the pool is completely horked (data loss) at this point, I figure what the hell. Upon inserting da2, CAM and its related bits say nothing about device insertion. When da2 was removed, indeed there were messages. 
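(An aside, and not something I tried during this run: when a reinserted disk doesn't get picked up, the usual manual prod is to ask CAM to rescan its buses:

root@testbox:/root # camcontrol rescan all

Whether that would have brought da2 back in this particular state, I can't say; treat it as a guess at what to poke, not a tested fix.)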
Hmm, this sounds reminiscent of something I've seen recently (keep reading): root@testbox:/root # camcontrol devlist at scbus1 target 0 lun 0 (pass0,cd0) at scbus2 target 0 lun 0 (pass1,da0) at scbus2 target 1 lun 0 (pass2,da1) at scbus2 target 2 lun 0 (pass3,da2) root@testbox:/root # ls -l /dev/da* crw-r----- 1 root operator 0, 88 Mar 21 00:52 /dev/da0 crw-r----- 1 root operator 0, 94 Mar 21 00:52 /dev/da0p1 crw-r----- 1 root operator 0, 95 Mar 21 00:52 /dev/da0p2 crw-r----- 1 root operator 0, 96 Mar 21 00:52 /dev/da0p3 crw-r----- 1 root operator 0, 89 Mar 21 00:52 /dev/da1 Notice no /dev/da2. So this shouldn't come as much of a surprise: root@testbox:/root # zpool replace array da2 cannot open 'da2': no such GEOM provider must be a full path or shorthand device name This would indicate a separate/different bug, probably in CAM or its related pieces. There were fixes for very similar situations to this in stable/9 recently -- I know because I was the person who reported such. mav@ and ken@ worked out a series of kinks/bugs in CAM pertaining to pass(4) and xpt(4) and some other things. You can read about that here: http://lists.freebsd.org/pipermail/freebsd-fs/2013-February/016515.html http://lists.freebsd.org/pipermail/freebsd-fs/2013-February/016524.html For me to determine if those fixes address the above oddity while testing, I would need to build stable/9 on this testbox. I can do that, and will try to dedicate some time to it tomorrow. So in summary: there seem to be multiple issues shown above, but I can confirm that failmode=continue **does** pass EIO to *running* processes that are doing I/O. Subsequent I/O, however, is questionable at this time. I'll end this Email with (hopefully) an educational statement: I hope my analysis shows you why very thorough, detailed output/etc. needs to be provided when reporting a problem, and not just some "general" description. This is why hard data/logs/etc. are necessary, and why every single step of the way needs to be provided, including physical tasks performed. P.S. -- I started this Email at 23:15 PDT. It's now 01:52 PDT. To whom should I send a bill for time rendered? ;-) -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 10:31:25 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 35BDCBC4 for ; Thu, 21 Mar 2013 10:31:25 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id B6850ADF for ; Thu, 21 Mar 2013 10:31:24 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r2LAVG3N070268; Thu, 21 Mar 2013 14:31:16 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Thu, 21 Mar 2013 14:31:16 +0400 (MSK) From: Dmitry Morozovsky To: Steven Hartland Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? 
In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (woozle.rinet.ru [0.0.0.0]); Thu, 21 Mar 2013 14:31:16 +0400 (MSK) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 10:31:25 -0000 On Wed, 20 Mar 2013, Steven Hartland wrote: > ----- Original Message ----- From: "Dmitry Morozovsky" > > > I'm currently in process of making new backup server, based on LSI 9260 > > controller. I'm planning to use ZFS over disks, hence the most natural way > > seems to configure mfi to JBOD mode - but I can't find easy way to reach > > this, neither in BIOS utilities nor via MegaCli > > > > Any hints? > > I don't remember the model number, but you may find there's an > IT mode FW which will make it report under mps and not mfi. > > If that's one of the versions which has no IT mode FW then you > can try configuring JBOD under mfi with something like:- > > MegaCli -AdpSetProp -EnableJBOD -1 -aALL I did. No luck :( Well, it seems I'll have to create 12 RAOI0 volumes :-/ -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 10:54:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 34D0C5BE for ; Thu, 21 Mar 2013 10:54:06 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.8]) by mx1.freebsd.org (Postfix) with ESMTP id C6D5BD0B for ; Thu, 21 Mar 2013 10:54:05 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu2) with ESMTP (Nemesis) id 0LjO9b-1UqJmy0vZn-00cypS; Thu, 21 Mar 2013 11:53:59 +0100 Message-ID: <514AE6C6.6070702@brockmann-consult.de> Date: Thu, 21 Mar 2013 11:53:58 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Dmitry Morozovsky Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? 
References: In-Reply-To: X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:3bNy+a8jieaWdto3e+P0N67qZr+Bfkxn+9ZW9RGx9D6 OMtUtWYHxrZnnXOHGa4d1T3JfjGqdU3Zg6YO61+GXfY3/Qad57 nr+xKfhwr7F8ORrdkMsF+d4rFkxeMT3nUBTHQpV9SXQ4BQLlTL eIDYtqg2YADmw4XixWb6svEEYJxU+qq2Ky1jOIdtR7Be15FGM2 1VletXkRE5IkUlqrCI3EVqj2C9tH/ZUHOfyulhUYi0qEj7OKFD AATTdnaZEHCEChh8EKmZ1lICjoHKEATYYMjBex/WEeHnjYkmhz PLYgoRE4pWeBsZ56RaHCmYoc60PevRrll1VkqxoX/yd81eDiI7 W/Q0gXrmCdklODDP8fs77WaQUcm4ezY3rVbKB27PY Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 10:54:06 -0000 On 2013-03-21 11:31, Dmitry Morozovsky wrote: > Well, it seems I'll have to create 12 RAOI0 volumes :-/ If you do that, you still have vendor lock and all the rest of the problems. Why not just sell/trade your 1 raid card for 2 or 3 HBAs? If that's an 8 port one, it's something like $480 and an HBA is only around $230 (9211-8i). If you plan on permanently using this machine for pure ZFS, this is the best option. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 10:54:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A16CB5CC for ; Thu, 21 Mar 2013 10:54:11 +0000 (UTC) (envelope-from rincebrain@gmail.com) Received: from mail-ia0-x231.google.com (mail-ia0-x231.google.com [IPv6:2607:f8b0:4001:c02::231]) by mx1.freebsd.org (Postfix) with ESMTP id 74F82D10 for ; Thu, 21 Mar 2013 10:54:11 +0000 (UTC) Received: by mail-ia0-f177.google.com with SMTP id y25so2212021iay.8 for ; Thu, 21 Mar 2013 03:54:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=nRW4wQ6c8n0u+35VUcoLvXE+3cg7l9xOrDIwEKvXo1M=; b=YBB6ueafgi4NZxRqaLXpqKIy+dNb1JRvBecy0on+i9gfT1Rg8vTufRRk4wevXoFrG6 57uhL1iel4Yy12w/8mgdxpKzfEOFupUKkav5T76/JCRLum+mMAq+ZTuFPFKpViELpxDG xQmWBwD2G0M39rgHT8GFf56q271OVsv5pjDIbFA1keFeI0wM2RVIgg8+FUHjHFoGj6qS 9g71/cMG3Y/rjKJrxBuENFItCs2YKaBDnNhmVOJ8ePPWMWpVwfH6y+cY2LN/yd7irnCY rB9YqoJyl+nSX/R3KSROJPnY4ZCkNiEfOKEDaQ3RsFR6ZmS0BI25s7PR2gMe7ZyevnuF WwoA== MIME-Version: 1.0 X-Received: by 10.50.192.165 with SMTP id hh5mr1790892igc.89.1363863251098; Thu, 21 Mar 2013 03:54:11 -0700 (PDT) Sender: rincebrain@gmail.com Received: by 10.64.165.166 with HTTP; Thu, 21 Mar 2013 03:54:10 -0700 (PDT) In-Reply-To: References: Date: Thu, 21 Mar 2013 06:54:10 -0400 X-Google-Sender-Auth: qqKT7U4O_dYDUiRDh7CSQp-vwyE Message-ID: Subject: Re: LSI 9260: is there a way to configure it JBOD like mps? From: Rich To: Dmitry Morozovsky Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 10:54:11 -0000 On Thu, Mar 21, 2013 at 6:31 AM, Dmitry Morozovsky wrote: > On Wed, 20 Mar 2013, Steven Hartland wrote: > >> ----- Original Message ----- From: "Dmitry Morozovsky" >> >> > I'm currently in process of making new backup server, based on LSI 9260 >> > controller. 
I'm planning to use ZFS over disks, hence the most natural way >> > seems to configure mfi to JBOD mode - but I can't find easy way to reach >> > this, neither in BIOS utilities nor via MegaCli >> > >> > Any hints? >> >> I don't remember the model number, but you may find there's an >> IT mode FW which will make it report under mps and not mfi. >> >> If that's one of the versions which has no IT mode FW then you >> can try configuring JBOD under mfi with something like:- >> >> MegaCli -AdpSetProp -EnableJBOD -1 -aALL > > I did. No luck :( > > Well, it seems I'll have to create 12 RAOI0 volumes :-/ 924x will do JBOD, 926x will not. One of those annoying little caveats for no clear reason. But yes, I'd imagine you can destructively reflash it from DOS into a 9211-8i if it's the 2008 chipset. [If it's 2108/2208, that's a different FW but same concept.] - Rich From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 11:53:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E5824C91 for ; Thu, 21 Mar 2013 11:53:20 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay05.ispgateway.de (smtprelay05.ispgateway.de [80.67.31.97]) by mx1.freebsd.org (Postfix) with ESMTP id 7AF6595 for ; Thu, 21 Mar 2013 11:53:20 +0000 (UTC) Received: from [78.35.142.252] (helo=fabiankeil.de) by smtprelay05.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1UIe3U-0002qN-DQ; Thu, 21 Mar 2013 12:53:12 +0100 Date: Thu, 21 Mar 2013 12:53:07 +0100 From: Fabian Keil To: "Reed A. Cartwright" Subject: Re: ZFS question Message-ID: <20130321125307.131a8727@fabiankeil.de> In-Reply-To: References: <20130321044557.GA15977@icarus.home.lan> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/e72goy4LgjDLwcZ_ngRRHDu"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 11:53:21 -0000 --Sig_/e72goy4LgjDLwcZ_ngRRHDu Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable "Reed A. Cartwright" wrote: > Note that my issue seems to do with an interaction between the CAM > system and the MPS driver in 9.1. Thus it is more than likely > different than what you are experiencing Quartz. At least from your description it's not obvious to me why the problem should be caused by an interaction between CAM and MPS and not by one of the driver-independent ZFS deadlocks. If you haven't already, you might want to have a look at: https://wiki.freebsd.org/AvgZfsDeadlockDebug =20 > Now that ZFS deadman has been incorporated into stable, I'll probably > give a 9.1 (i.e. 9/stable) another try. At least on my system the ZFS deadman doesn't trigger in case of ZFS-internal deadlocks and I don't think it's supposed to either: https://www.illumos.org/issues/3246 Of course knowing whether or not it triggers in your case could still be useful. > Jeremy, I have a question about enabling kernel dumps based on my > current swap config. >=20 > I currently have a 1TB drive split into 4 geli encrypted swap > partitions (Freebsd doesn't like swap partitions over ~250 GB and I > have lots of RAM). These partitions are UFS-swap partitions and are > not backed by any mirroing or ZFSing. 
>=20 > So, how do I best enable crash dumps? If I need to remove encryption, > I can do that. Crash dumps can be written to any device and temporarily attaching an USB stick might be more convenient than using one of the "UFS-swap" partitions which you'll have to recreate afterwards. Dumping on geli providers should work, but I haven't tested it as I use one-time master keys for swap. Fabian --Sig_/e72goy4LgjDLwcZ_ngRRHDu Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFK9KgACgkQBYqIVf93VJ2lmwCgh2To3nMFKJ8VdxtS2cuSWYUf Tp4An3dys3KTujJhY/gnWfjo1tsCQpgi =Z0QM -----END PGP SIGNATURE----- --Sig_/e72goy4LgjDLwcZ_ngRRHDu-- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 15:53:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6D1E0AF9 for ; Thu, 21 Mar 2013 15:53:11 +0000 (UTC) (envelope-from josh@signalboxes.net) Received: from mail-ob0-x22d.google.com (mail-ob0-x22d.google.com [IPv6:2607:f8b0:4003:c01::22d]) by mx1.freebsd.org (Postfix) with ESMTP id 368333C9 for ; Thu, 21 Mar 2013 15:53:11 +0000 (UTC) Received: by mail-ob0-f173.google.com with SMTP id dn14so3026389obc.18 for ; Thu, 21 Mar 2013 08:53:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:x-received:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=DksBotVl9R+toSWnHBukgd2F+JiNCVes5cwVOxBcEnk=; b=ib9BUP6tn/tWwgvT3Gm8y/dWrmcBRvmlfFx9/AF3JSxJFLFyRc4Vc4JUBzemfGOLk7 qRRv865LzYQIBJcdwhbyvA3E1WueaDKEXT9uiIQMc57CtEoGC49gQsyUWs+nOboaGWOH mdApglCurCC0yi+ABe0U4Rhp1YiIiG4vpHYmZJ/CrOo7W1q8zeKSS7aospwDc3SIHxau wrm5vSARhFxNM/i3HyNWbt/asef+7sYxdFsTfbcT52amC9ZaRnkRBDAEs6SVOAoHqXqP AVffcwNDMHRjL0Yan7Pb/9jM11LIOjs5xOetLxcnRhQhW3ayYGVnCyTTfIM/GldRMk29 NIqQ== X-Received: by 10.60.3.233 with SMTP id f9mr7191425oef.32.1363881190567; Thu, 21 Mar 2013 08:53:10 -0700 (PDT) Received: from mail-ob0-x22b.google.com (mail-ob0-x22b.google.com [2607:f8b0:4003:c01::22b]) by mx.google.com with ESMTPS id 4sm7395326obj.7.2013.03.21.08.53.10 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 21 Mar 2013 08:53:10 -0700 (PDT) Received: by mail-ob0-f171.google.com with SMTP id x4so2993093obh.2 for ; Thu, 21 Mar 2013 08:53:09 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.60.11.8 with SMTP id m8mr7297224oeb.22.1363881189756; Thu, 21 Mar 2013 08:53:09 -0700 (PDT) Received: by 10.60.62.168 with HTTP; Thu, 21 Mar 2013 08:53:09 -0700 (PDT) Date: Thu, 21 Mar 2013 09:53:09 -0600 Message-ID: Subject: ZFS + NFS poor performance after restarting from 100 day uptime From: Josh Beard To: freebsd-fs@freebsd.org X-Gm-Message-State: ALoCoQlIIvyVxMt1RGcbLtF07en2LxEFhYz8Pa4EQBgXIyPOI1RCVcGPkU0T2HCtZeIVgqgNBYyF Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 15:53:11 -0000 Hello, I have a system with 12 disks spread between 2 raidz1. I'm using the native ("new") NFS to export a pool on this. This has worked very well all along, but since a reboot, has performed horribly - unusably under load. 
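(In ZFS terms the layout boils down to roughly this, reconstructed from the zpool status output further down rather than from the exact commands that were used:

zpool create store raidz1 da1 da2 da3 da4 da5 da6 raidz1 da7 da8 da9 da10 da11 da12 spare da13
zfs set sharenfs=on store

with the pool then served out over the new NFS server; the sharenfs line is just one way of doing the export.)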
The system was running 9.1-rc3 and I upgraded it to 9.1-release-p1 (GENERIC kernel) after ~110 days of running (with zero performance issues). After rebooting from the upgrade, I'm finding the disks seem constantly slammed. gstat reports 90-100% busy most of the day with only ~100-130 ops/s. I didn't change any settings in /etc/sysctl.conf or /boot/loader. No ZFS tuning, etc. I've looked at the commits between 9.1-rc3 and 9.1-release-p1 and I can't see any reason why simply upgrading it would cause this. Since I had no issues at all with the same configuration before a reboot, I'm reluctant to start tweaking things too much, as I didn't have to before. As far as I can tell, the disks are fine. They're attached to a 3ware 9650 RAID card, configured to pass through the disks. The disks are WD5000AAKS-22YGA0. This system is used by many users for network home directories. I should also say - the RAID controller is scheduled to do media checks weekly. This usually takes ~10 hours. The weekend after the restart, it took ~28 hours, but came back fine. Does anyone have any recommendations as to where I might start looking? I feel like this must have something to do with the controller card, as the software changes were none that should have any impact on the performance and the mere act of restarting the system (shutdown -r, not a true power cycle) seemed to cause this. For trials, I /have/ tried some common tuning (but reverted). Such as changing the values of vfs.zfs.vdev.max_pending, vfs.zfs.write_limit_override=1073741824. Even for comparison, I set sync=disabled (not permanently) on that zpool. Performance improved, but nothing significant, and I certainly don't want to have to start tuning these if I didn't need to before. Thanks! --- /boot/loader.conf: hw.em.num_queues=1 hw.usb.no_pf=1 vfs.zfs.arc_max="13958643712" # 24 GB RAM coretemp_load="YES" aio_load="YES" loader_logo="beastie" autoboot_delay="5" Nothing in /etc/sysctl.conf Output of zpool status: pool: store state: ONLINE scan: scrub repaired 0 in 7h51m with 0 errors on Sun Mar 17 03:51:58 2013 config: NAME STATE READ WRITE CKSUM store ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 da1 ONLINE 0 0 0 da2 ONLINE 0 0 0 da3 ONLINE 0 0 0 da4 ONLINE 0 0 0 da5 ONLINE 0 0 0 da6 ONLINE 0 0 0 raidz1-1 ONLINE 0 0 0 da7 ONLINE 0 0 0 da8 ONLINE 0 0 0 da9 ONLINE 0 0 0 da10 ONLINE 0 0 0 da11 ONLINE 0 0 0 da12 ONLINE 0 0 0 spares da13 AVAIL errors: No known data errors A snip of gstat: dT: 1.002s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0| cd0 0 1 0 0 0.0 1 32 0.2 0.0| da0 0 0 0 0 0.0 0 0 0.0 0.0| da0p1 0 1 0 0 0.0 1 32 0.2 0.0| da0p2 0 0 0 0 0.0 0 0 0.0 0.0| da0p3 4 160 126 1319 31.3 34 100 0.1 100.3| da1 4 146 110 1289 33.6 36 98 0.1 97.8| da2 4 142 107 1370 36.1 35 101 0.2 101.9| da3 4 121 95 1360 35.6 26 19 0.1 95.9| da4 4 151 117 1409 34.0 34 102 0.1 100.1| da5 4 141 109 1366 35.9 32 101 0.1 97.9| da6 4 136 118 1207 24.6 18 13 0.1 87.0| da7 4 118 102 1278 32.2 16 12 0.1 89.8| da8 4 138 116 1240 33.4 22 55 0.1 100.0| da9 4 133 117 1269 27.8 16 13 0.1 86.5| da10 4 121 102 1302 53.1 19 51 0.1 100.0| da11 4 120 99 1242 40.7 21 51 0.1 99.7| da12 dmesg (note the 100.000MB/s transfers. That seems low(?) However, looking at older logs, that's always been the case): Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. 
FreeBSD 9.1-RELEASE-p1 #0 r247731: Sun Mar 3 15:11:33 MST 2013 root@topeka:/usr/obj/usr/src/sys/GENERIC amd64 CPU: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz (2133.45-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x106a5 Family = 6 Model = 1a Stepping = 5 Features=0xbfebfbff Features2=0x9ce3bd AMD Features=0x28100800 AMD Features2=0x1 TSC: P-state invariant, performance statistics real memory = 25773998080 (24580 MB) avail memory = 24795668480 (23646 MB) Event timer "LAPIC" quality 400 ACPI APIC Table: <010511 APIC1122> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs FreeBSD/SMP: 2 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 2 cpu2 (AP): APIC ID: 4 cpu3 (AP): APIC ID: 6 cpu4 (AP): APIC ID: 16 cpu5 (AP): APIC ID: 18 cpu6 (AP): APIC ID: 20 cpu7 (AP): APIC ID: 22 ioapic0: Changing APIC ID to 1 ioapic1: Changing APIC ID to 3 ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-47 on motherboard kbd1 at kbdmux0 ctl: CAM Target Layer loaded acpi0: on motherboard acpi0: Power Button (fixed) acpi0: reservation of 400, 100 (3) failed cpu0: on acpi0 ACPI Warning: Incorrect checksum in table [OEMB] - 0x90, should be 0x8D (20110527/tbutils-282) cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 cpu4: on acpi0 cpu5: on acpi0 cpu6: on acpi0 cpu7: on acpi0 attimer0: port 0x40-0x43 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 atrtc0: port 0x70-0x71 irq 8 on acpi0 Event timer "RTC" frequency 32768 Hz quality 0 hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 450 Event timer "HPET1" frequency 14318180 Hz quality 440 Event timer "HPET2" frequency 14318180 Hz quality 440 Event timer "HPET3" frequency 14318180 Hz quality 440 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0 pcib0: port 0xcf8-0xcff iomem 0xfed40000-0xfed44fff on acpi0 pci0: on pcib0 pcib1: at device 1.0 on pci0 pci1: on pcib1 3ware device driver for 9000 series storage controllers, version: 3.80.06.003 twa0: <3ware 9000 series Storage Controller> port 0xa800-0xa8ff mem 0xf6000000-0xf7ffffff,0xfb9de000-0xfb9defff irq 28 at device 0.0 on pci1 twa0: INFO: (0x04: 0x0053): Battery capacity test is overdue: twa0: INFO: (0x15: 0x1300): Controller details:: Model 9650SE-16ML, 16 ports, Firmware FE9X 4.10.00.024, BIOS BE9X 4.08.00.004 pcib2: at device 3.0 on pci0 pci2: on pcib2 pcib3: at device 7.0 on pci0 pci3: on pcib3 pcib4: at device 9.0 on pci0 pci4: on pcib4 em0: port 0xbc00-0xbc1f mem 0xfba20000-0xfba3ffff,0xfba80000-0xfbafffff,0xfba1c000-0xfba1ffff irq 32 at device 0.0 on pci4 em0: Using MSIX interrupts with 3 vectors em0: Ethernet address: 00:1b:21:b8:cf:1b pci0: at device 20.0 (no driver attached) pci0: at device 20.1 (no driver attached) pci0: at device 20.2 (no driver attached) pci0: at device 20.3 (no driver attached) pci0: at device 22.0 (no driver attached) pci0: at device 22.1 (no driver attached) pci0: at device 22.2 (no driver attached) pci0: at device 22.3 (no driver attached) pci0: at device 22.4 (no driver attached) pci0: at device 22.5 (no driver attached) pci0: at device 22.6 (no driver attached) pci0: at device 22.7 (no driver attached) uhci0: port 0x9c00-0x9c1f irq 16 at device 26.0 on pci0 uhci0: LegSup = 0x2f00 usbus0 on uhci0 uhci1: port 0x9880-0x989f irq 21 at device 26.1 on pci0 uhci1: LegSup = 0x2f00 usbus1 on uhci1 uhci2: port 0x9800-0x981f irq 19 at device 26.2 on 
pci0 uhci2: LegSup = 0x2f00 usbus2 on uhci2 ehci0: mem 0xfbeda000-0xfbeda3ff irq 18 at device 26.7 on pci0 usbus3: EHCI version 1.0 usbus3 on ehci0 hdac0: mem 0xfbed4000-0xfbed7fff irq 22 at device 27.0 on pci0 pcib5: irq 17 at device 28.0 on pci0 pci5: on pcib5 em1: port 0xcc00-0xcc1f mem 0xfbb20000-0xfbb3ffff,0xfbb80000-0xfbbfffff,0xfbb1c000-0xfbb1ffff irq 16 at device 0.0 on pci5 em1: Using MSIX interrupts with 3 vectors em1: Ethernet address: 00:1b:21:b8:d0:1d pcib6: irq 17 at device 28.4 on pci0 pci6: on pcib6 em2: port 0xdc00-0xdc1f mem 0xfbce0000-0xfbcfffff,0xfbcdc000-0xfbcdffff irq 16 at device 0.0 on pci6 em2: Using MSIX interrupts with 3 vectors em2: Ethernet address: 00:25:90:32:c9:26 pcib7: irq 16 at device 28.5 on pci0 pci7: on pcib7 em3: port 0xec00-0xec1f mem 0xfbde0000-0xfbdfffff,0xfbddc000-0xfbddffff irq 17 at device 0.0 on pci7 em3: Using MSIX interrupts with 3 vectors em3: Ethernet address: 00:25:90:32:c9:27 uhci3: port 0x9480-0x949f irq 23 at device 29.0 on pci0 uhci3: LegSup = 0x2f00 usbus4 on uhci3 uhci4: port 0x9400-0x941f irq 19 at device 29.1 on pci0 uhci4: LegSup = 0x2f00 usbus5 on uhci4 uhci5: port 0x9080-0x909f irq 18 at device 29.2 on pci0 uhci5: LegSup = 0x2f00 usbus6 on uhci5 ehci1: mem 0xfbed8000-0xfbed83ff irq 23 at device 29.7 on pci0 usbus7: EHCI version 1.0 usbus7 on ehci1 pcib8: at device 30.0 on pci0 pci8: on pcib8 vgapci0: mem 0xf9000000-0xf9ffffff,0xfaffc000-0xfaffffff,0xfb000000-0xfb7fffff irq 18 at device 1.0 on pci8 isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff90-0xff9f,0xffa0-0xffaf at device 31.2 on pci0 ata0: at channel 0 on atapci0 ata1: at channel 1 on atapci0 pci0: at device 31.3 (no driver attached) acpi_button0: on acpi0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 qpi0: on motherboard pcib9: pcibus 255 on qpi0 pci255: on pcib9 pcib10: pcibus 254 on qpi0 pci254: on pcib10 orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xc9fff,0xca000-0xcbfff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 atkbdc0: at port 0x60,0x64 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] ppc0: cannot reserve I/O port range coretemp0: on cpu0 est0: on cpu0 p4tcc0: on cpu0 coretemp1: on cpu1 est1: on cpu1 p4tcc1: on cpu1 coretemp2: on cpu2 est2: on cpu2 p4tcc2: on cpu2 coretemp3: on cpu3 est3: on cpu3 p4tcc3: on cpu3 coretemp4: on cpu4 est4: on cpu4 p4tcc4: on cpu4 coretemp5: on cpu5 est5: on cpu5 p4tcc5: on cpu5 coretemp6: on cpu6 est6: on cpu6 p4tcc6: on cpu6 coretemp7: on cpu7 est7: on cpu7 p4tcc7: on cpu7 Timecounters tick every 1.000 msec usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 480Mbps High Speed USB v2.0 usbus4: 12Mbps Full Speed USB v1.0 usbus5: 12Mbps Full Speed USB v1.0 usbus6: 12Mbps Full Speed USB v1.0 usbus7: 480Mbps High Speed USB v2.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 ugen4.1: at usbus4 uhub4: on usbus4 ugen5.1: at usbus5 uhub5: on usbus5 ugen6.1: at usbus6 uhub6: on usbus6 ugen7.1: at usbus7 uhub7: on usbus7 uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub4: 2 ports with 2 removable, self powered uhub5: 2 ports with 2 
removable, self powered uhub6: 2 ports with 2 removable, self powered da0 at twa0 bus 0 scbus0 target 0 lun 0 da0: Fixed Direct Access SCSI-5 device da0: 100.000MB/s transfers da0: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) cd0 at ata0 bus 0 scbus1 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes) cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed da1 at twa0 bus 0 scbus0 target 1 lun 0 da1: Fixed Direct Access SCSI-5 device da1: 100.000MB/s transfers da1: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da2 at twa0 bus 0 scbus0 target 2 lun 0 da2: Fixed Direct Access SCSI-5 device da2: 100.000MB/s transfers da2: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da3 at twa0 bus 0 scbus0 target 3 lun 0 da3: Fixed Direct Access SCSI-5 device da3: 100.000MB/s transfers da3: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da4 at twa0 bus 0 scbus0 target 4 lun 0 da4: Fixed Direct Access SCSI-5 device da4: 100.000MB/s transfers da4: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da5 at twa0 bus 0 scbus0 target 5 lun 0 da5: Fixed Direct Access SCSI-5 device da5: 100.000MB/s transfers da5: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da6 at twa0 bus 0 scbus0 target 6 lun 0 da6: Fixed Direct Access SCSI-5 device da6: 100.000MB/s transfers da6: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da7 at twa0 bus 0 scbus0 target 7 lun 0 da7: Fixed Direct Access SCSI-5 device da7: 100.000MB/s transfers da7: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da8 at twa0 bus 0 scbus0 target 8 lun 0 da8: Fixed Direct Access SCSI-5 device da8: 100.000MB/s transfers da8: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da9 at twa0 bus 0 scbus0 target 9 lun 0 da9: Fixed Direct Access SCSI-5 device da9: 100.000MB/s transfers da9: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da10 at twa0 bus 0 scbus0 target 10 lun 0 da10: Fixed Direct Access SCSI-5 device da10: 100.000MB/s transfers da10: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da11 at twa0 bus 0 scbus0 target 11 lun 0 da11: Fixed Direct Access SCSI-5 device da11: 100.000MB/s transfers da11: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da12 at twa0 bus 0 scbus0 target 12 lun 0 da12: Fixed Direct Access SCSI-5 device da12: 100.000MB/s transfers da12: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da13 at twa0 bus 0 scbus0 target 13 lun 0 da13: Fixed Direct Access SCSI-5 device da13: 100.000MB/s transfers da13: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da14 at twa0 bus 0 scbus0 target 14 lun 0 da14: Fixed Direct Access SCSI-5 device da14: 100.000MB/s transfers da14: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) SMP: AP CPU #2 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #1 Launched! 
da7: 100.000MB/s transfers da7: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da8 at twa0 bus 0 scbus0 target 8 lun 0 da8: Fixed Direct Access SCSI-5 device da8: 100.000MB/s transfers da8: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da9 at twa0 bus 0 scbus0 target 9 lun 0 da9: Fixed Direct Access SCSI-5 device da9: 100.000MB/s transfers da9: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da10 at twa0 bus 0 scbus0 target 10 lun 0 da10: Fixed Direct Access SCSI-5 device da10: 100.000MB/s transfers da10: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da11 at twa0 bus 0 scbus0 target 11 lun 0 da11: Fixed Direct Access SCSI-5 device da11: 100.000MB/s transfers da11: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da12 at twa0 bus 0 scbus0 target 12 lun 0 da12: Fixed Direct Access SCSI-5 device da12: 100.000MB/s transfers da12: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da13 at twa0 bus 0 scbus0 target 13 lun 0 da13: Fixed Direct Access SCSI-5 device da13: 100.000MB/s transfers da13: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) da14 at twa0 bus 0 scbus0 target 14 lun 0 da14: Fixed Direct Access SCSI-5 device da14: 100.000MB/s transfers da14: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C) SMP: AP CPU #2 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #1 Launched! SMP: AP CPU #7 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #5 Launched! Timecounter "TSC-low" frequency 16667583 Hz quality 1000 uhub3: 6 ports with 6 removable, self powered uhub7: 6 ports with 6 removable, self powered Trying to mount root from ufs:/dev/da0p2 [rw]... ugen4.2: at usbus4 ukbd0: on usbus4 kbd2 at ukbd0 ums0: on usbus4 ums0: 5 buttons and [XYZ] coordinates ID=1 ZFS filesystem version 5 ZFS storage pool version 28 em0: link state changed to UP lagg0: link state changed to UP em2: link state changed to UP em1: link state changed to UP em3: link state changed to UP From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 16:14:50 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DF8F8539 for ; Thu, 21 Mar 2013 16:14:50 +0000 (UTC) (envelope-from prvs=1792b855f9=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 6EBCE73E for ; Thu, 21 Mar 2013 16:14:50 +0000 (UTC) Received: from r2d2 ([82.12.16.150]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002850053.msg for ; Thu, 21 Mar 2013 16:14:48 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 21 Mar 2013 16:14:48 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDDNSBL-Result: mail1.multiplay.co.uk, Thu, 21 Mar 2013 16:14:48 +0000 zen.spamhaus.org returned result of 127.0.0.11 X-MDRemoteIP: 82.12.16.150 X-Return-Path: prvs=1792b855f9=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: From: "Steven Hartland" To: "Josh Beard" , References: Subject: Re: ZFS + NFS poor performance after restarting from 100 day uptime Date: Thu, 21 Mar 2013 16:14:46 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By 
Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 16:14:50 -0000 ----- Original Message ----- From: "Josh Beard" To: Sent: Thursday, March 21, 2013 3:53 PM Subject: ZFS + NFS poor performance after restarting from 100 day uptime > Hello, > > I have a system with 12 disks spread between 2 raidz1. I'm using the > native ("new") NFS to export a pool on this. This has worked very well all > along, but since a reboot, has performed horribly - unusably under load. > > The system was running 9.1-rc3 and I upgraded it to 9.1-release-p1 (GENERIC > kernel) after ~110 days of running (with zero performance issues). After > rebooting from the upgrade, I'm finding the disks seem constantly slammed. > gstat reports 90-100% busy most of the day with only ~100-130 ops/s. > > I didn't change any settings in /etc/sysctl.conf or /boot/loader. No ZFS > tuning, etc. I've looked at the commits between 9.1-rc3 and 9.1-release-p1 > and I can't see any reason why simply upgrading it would cause this. ... > A snip of gstat: > dT: 1.002s w: 1.000s > L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name > 0 0 0 0 0.0 0 0 0.0 0.0| cd0 > 0 1 0 0 0.0 1 32 0.2 0.0| da0 > 0 0 0 0 0.0 0 0 0.0 0.0| da0p1 > 0 1 0 0 0.0 1 32 0.2 0.0| da0p2 > 0 0 0 0 0.0 0 0 0.0 0.0| da0p3 > 4 160 126 1319 31.3 34 100 0.1 100.3| da1 > 4 146 110 1289 33.6 36 98 0.1 97.8| da2 > 4 142 107 1370 36.1 35 101 0.2 101.9| da3 > 4 121 95 1360 35.6 26 19 0.1 95.9| da4 > 4 151 117 1409 34.0 34 102 0.1 100.1| da5 > 4 141 109 1366 35.9 32 101 0.1 97.9| da6 > 4 136 118 1207 24.6 18 13 0.1 87.0| da7 > 4 118 102 1278 32.2 16 12 0.1 89.8| da8 > 4 138 116 1240 33.4 22 55 0.1 100.0| da9 > 4 133 117 1269 27.8 16 13 0.1 86.5| da10 > 4 121 102 1302 53.1 19 51 0.1 100.0| da11 > 4 120 99 1242 40.7 21 51 0.1 99.7| da12 Your ops/s are be maxing your disks. You say "only" but the ~190 ops/s is what HD's will peak at, so whatever our machine is doing is causing it to max the available IO for your disks. If you boot back to your previous kernel does the problem go away? If so you could look at the changes between the two kernel revisions for possible causes and if needed to a binary chop with kernel builds to narrow down the cause. Regards Steve Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 17:10:19 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DE2FB3DA for ; Thu, 21 Mar 2013 17:10:19 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from relay02.pair.com (relay02.pair.com [209.68.5.16]) by mx1.freebsd.org (Postfix) with SMTP id 82699D4F for ; Thu, 21 Mar 2013 17:10:19 +0000 (UTC) Received: (qmail 84319 invoked by uid 0); 21 Mar 2013 17:10:18 -0000 Received: from 173.48.104.62 (HELO ?10.2.2.1?) 
(173.48.104.62) by relay02.pair.com with SMTP; 21 Mar 2013 17:10:18 -0000 X-pair-Authenticated: 173.48.104.62 Message-ID: <514B3EF9.2000105@sneakertech.com> Date: Thu, 21 Mar 2013 13:10:17 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: ZFS question References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> In-Reply-To: <20130321085304.GB16997@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 17:10:19 -0000 > I've changed the CC line to use freebsd-fs@ instead, and will follow-up > with freebsd-questions@ stating that the thread/discussion has been > moved. Ok, I'm not yet subscribed to -fs, so lemme do that. Once that's set up I'll continue there with a different (more descriptive) subject line. >I also don't want to get into a discussion > about -RELEASE vs. -STABLE because I could practically write a book on > the subject (particularly why -STABLE is a better choice). While I understand that -stable is probably better for typical use, I want to cut out as many variables as possible until I figure out what's going on here. But yes, this is a different argument for another time. ______________________________________ it has a certain smooth-brained appeal From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 17:22:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D8645632 for ; Thu, 21 Mar 2013 17:22:47 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from relay00.pair.com (relay00.pair.com [209.68.5.9]) by mx1.freebsd.org (Postfix) with SMTP id 9495AE0B for ; Thu, 21 Mar 2013 17:22:47 +0000 (UTC) Received: (qmail 57110 invoked by uid 0); 21 Mar 2013 17:22:40 -0000 Received: from 173.48.104.62 (HELO ?10.2.2.1?) (173.48.104.62) by relay00.pair.com with SMTP; 21 Mar 2013 17:22:40 -0000 X-pair-Authenticated: 173.48.104.62 Message-ID: <514B41DF.4010802@sneakertech.com> Date: Thu, 21 Mar 2013 13:22:39 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Jeremy Chadwick Subject: ZFS: Failed pool causes system to hang References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> In-Reply-To: <20130321085304.GB16997@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 17:22:47 -0000 So looking through the past couple emails, it doesn't appear that anyone included a complete copy of my description when CCing -fs, so I'm going to do that now just so there's no confusion. I'll respond to Jeremy's questions separately. ------ I have a raidz2 comprised of six sata drives connected via my motherboard's intel southbridge sata ports. 
All of the bios raid options are disabled and the drives are in straight ahci mode (hotswap enabled). The system (accounts, home dir, etc) is installed on a separate 7th drive formatted as normal ufs, connected to a separate non-intel motherboard port. As part of my initial stress testing, I'm simulating failures by popping the sata cable to various drives in the 6x pool. If I pop two drives, the pool goes into 'degraded' mode and everything works as expected. I can zero and replace the drives, etc, no problem. However, when I pop a third drive, the machine becomes VERY unstable. I can nose around the boot drive just fine, but anything involving i/o that so much as sneezes in the general direction of the pool hangs the machine. Once this happens I can log in via ssh, but that's pretty much it. I've reinstalled and tested this over a dozen times, and it's perfectly repeatable: `ls` the dir where the pool is mounted? hang. I'm already in the dir, and try to `cd` back to my home dir? hang. zpool destroy? hang. zpool replace? hang. zpool history? hang. shutdown -r now? gets halfway through, then hang. reboot -q? same as shutdown. The machine never recovers (at least, not inside 35 minutes, which is the most I'm willing to wait). Reconnecting the drives has no effect. My only option is to hard reset the machine with the front panel button. Googling for info suggested I try changing the pool's "failmode" setting from "wait" to "continue", but that doesn't appear to make any difference. For reference, this is a virgin 9.1-release installed off the dvd image with no ports or packages or any extra anything. I don't think I'm doing anything wrong procedure wise. I fully understand and accept that a raidz2 with three dead drives is toast, but I will NOT accept having it take down the rest of the machine with it. As it stands, I can't even reliably look at what state the pool is in. I can't even nuke the pool and start over without taking the whole machine offline. ______________________________________ it has a certain smooth-brained appeal From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 18:11:07 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 3E96E5F7 for ; Thu, 21 Mar 2013 18:11:07 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from relay00.pair.com (relay00.pair.com [209.68.5.9]) by mx1.freebsd.org (Postfix) with SMTP id 073D915F for ; Thu, 21 Mar 2013 18:11:06 +0000 (UTC) Received: (qmail 78625 invoked by uid 0); 21 Mar 2013 18:11:04 -0000 Received: from 173.48.104.62 (HELO ?10.2.2.1?) 
(173.48.104.62) by relay00.pair.com with SMTP; 21 Mar 2013 18:11:04 -0000 X-pair-Authenticated: 173.48.104.62 Message-ID: <514B4D38.6090101@sneakertech.com> Date: Thu, 21 Mar 2013 14:11:04 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: ZFS: Failed pool causes system to hang References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> In-Reply-To: <20130321085304.GB16997@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 18:11:07 -0000 >> I'm not >> messing with partitions yet because I don't want to complicate things. >> (I will eventually be going that route though as the controller tends >> to renumber drives in a first-come-first-serve order that makes some >> things difficult). > > Solving this is easy, WITHOUT use of partitions or labels. There is a > feature of CAM(4) called "wired down" or "wiring down", where you can in > essence statically map a SATA port to a static device number regardless > if a disk is inserted at the time the kernel boots My wording implied the wrong thing here: the dev ID mapping issue is *one* of the reasons I'm going to go with partitions. Another other being the "replacement disk is one sector too small" issue, and that gpt labels give me the ability to reference drives by arbitrary string, which makes it easier because I don't have to remember which dev ID corresponds to which physical bay. I probably want to know about this trick anyway though, it looks useful. > I can help you with this, but I need to see a dmesg (everything from > boot to the point mountroot gets done). Can do, but I'll need to reinstall again first. Gimmie a little while. >> I'm experiencing fatal issues with pools hanging my machine requiring a >> hard-reset. > > This, to me, means something very different than what was described in a > subsequent follow-up: Well, what I meant here is that when the pool fails it takes the entire machine down with it in short order. Having a machine become unresponsive and require a panel-button hard reset (with subsequent fsck-ing and possible corruption) counts as a fatal problem in my book. I don't accept this type of behavior in *any* system, even a windows desktop. > S1. In your situation, when a ZFS pool loses enough vdev or vdev members > to cause permanent pool damage (as in completely 100% unrecoverable, > such as losing 3 disks of a raidz2 pool), any I/O to the pool results in > that applications hanging. Sorta. Yes the command I issued hangs, but so do a lot of other things as well. I can't kill -9 any of them or reboot or anything. >The system is still functional/usable (e.g. > I/O to other pools and non-ZFS filesystems works fine), Assuming I do those *first*. Once something touches the pool, all bets are off. 'ps' and 'top' seem safe, but things like 'cd' are a gamble. Admittedly though, I haven't spent any time testing exactly what does and doesn't work and if there's a pattern to it. > A1. This is because "failmode=wait" on the pool, which is the default > property value. This is by design; there is no ZFS "timeout" for this > sort of thing. 
"failmode=continue" is what you're looking for (keep > reading). > > S2. If the pool uses "failmode=continue", there is no change in > behaviour, (i.e. EIO is still never returned). > > A2. That sounds like a bug then. I test your claim below, and you might > be surprised at the findings. As far as I'm aware, "wait" will hang all i/o read or write, whereas "continue" is supposed to hang only write. My problem (as near I can tell) is that nothing informs or limits processes from trying to write to the pool, so "continue" effectively only delays the inevitable by several seconds. > S3. If the previously-yanked disks are reinserted, the issue remains. > > A3. What you're looking for is the "autoreplace" pool property. No it's not. I *don't* want the pool trying to suck up a freshly inserted drive without my explicit say so. I only mentioned this because some other thread I was reading implied that zfs would come back to life if it could talk to the drive again. > And in the other window where dd is running, it immediately terminates > with EIO: IIRC I only tried popping a third disk during activity once... It was during scp from another machine and it just paused. During all other tests, I've waited to make sure everything settles down first. > One thing to note (and it's important) above is that da2 is still > considered "ONLINE". More on that in a moment. Yeah I noticed that in my testing. > root@testbox:/root # zpool replace array da2 > cannot open 'da2': no such GEOM provider > must be a full path or shorthand device name > > This would indicate a separate/different bug, probably in CAM or its > related pieces. I don't even get as far as this. Most of the time, once something caused the hang, not a lot works past that point. Assuming I followed your example to the letter and typed 'ls' first, 'zpool replace' would have just hung as well without printing anything. > I'll end this Email with (hopefully) an educational statement: I hope > my analysis shows you why very thorough, detailed output/etc. needs to > be provided when reporting a problem, and not just some "general" > description. This is why hard data/logs/etc. are necessary, and why > every single step of the way needs to be provided, including physical > tasks performed. Oh I agree, but etiquette dictates I don't spam people with 5kb of unsolicited text including every possible detail about everything, especially when I'm not even sure if it's the right mailing list. > P.S. -- I started this Email at 23:15 PDT. It's now 01:52 PDT. To whom > should I send a bill for time rendered? ;-) Ha, I think I have you beat there :) I'll frequently spend hours writing single emails. 
______________________________________ it has a certain smooth-brained appeal From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 20:01:32 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 54A4C7A3; Thu, 21 Mar 2013 20:01:32 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 1B234827; Thu, 21 Mar 2013 20:01:32 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 75702B926; Thu, 21 Mar 2013 16:01:31 -0400 (EDT) From: John Baldwin To: Rick Macklem Subject: Re: Deadlock in the NFS client Date: Thu, 21 Mar 2013 16:00:46 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <2141845166.4036172.1363654854297.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <2141845166.4036172.1363654854297.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201303211600.46884.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 21 Mar 2013 16:01:31 -0400 (EDT) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 20:01:32 -0000 On Monday, March 18, 2013 9:00:54 pm Rick Macklem wrote: > John Baldwin wrote: > > On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > > > John Baldwin wrote: > > > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov wrote: > > > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin wrote: > > > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov > > > > > > wrote: > > > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem > > > > > > > wrote: > > > > > > > > John Baldwin wrote: > > > > > > > > > I ran into a machine that had a deadlock among certain > > > > > > > > > files > > > > > > > > > on a > > > > > > > > > given NFS > > > > > > > > > mount today. I'm not sure how best to resolve it, though > > > > > > > > > it > > > > > > > > > seems like > > > > > > > > > perhaps there is a bug with how the pool of nfsiod > > > > > > > > > threads > > > > > > > > > is managed. > > > > > > > > > Anyway, more details on the actual hang below. This was > > > > > > > > > on > > > > > > > > > 8.x with > > > > > > > > > the > > > > > > > > > old NFS client, but I don't see anything in HEAD that > > > > > > > > > would > > > > > > > > > fix this. > > > > > > > > > > > > > > > > > > First note that the system was idle so it had dropped > > > > > > > > > down > > > > > > > > > to only one > > > > > > > > > nfsiod thread. > > > > > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it doesn't > > > > > > > > bite > > > > > > > > more often. > > > > > > > > It seems to me that this snippet of code from > > > > > > > > nfs_asyncio() > > > > > > > > makes too > > > > > > > > weak an assumption: > > > > > > > > /* > > > > > > > > * If none are free, we may already have an iod working > > > > > > > > on > > > > > > > > this mount > > > > > > > > * point. If so, it will process our request. 
> > > > > > > > */ > > > > > > > > if (!gotiod) { > > > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > > > NFS_DPF(ASYNCIO, > > > > > > > > ("nfs_asyncio: %d iods are already processing mount > > > > > > > > %p\n", > > > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > > > gotiod = TRUE; > > > > > > > > } > > > > > > > > } > > > > > > > > It assumes that, since an nfsiod thread is processing some > > > > > > > > buffer for the > > > > > > > > mount, it will become available to do this one, which > > > > > > > > isn't > > > > > > > > true for your > > > > > > > > deadlock. > > > > > > > > > > > > > > > > I think the simple fix would be to recode nfs_asyncio() so > > > > > > > > that > > > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod thread > > > > > > > > that > > > > > > > > it > > > > > > > > has assigned to do the I/O, getting rid of the above. The > > > > > > > > problem > > > > > > > > with doing this is that it may result in a lot more > > > > > > > > synchronous I/O > > > > > > > > (nfs_asyncio() returns EIO, so the caller does the I/O). > > > > > > > > Maybe > > > > > > > > more > > > > > > > > synchronous I/O could be avoided by allowing nfs_asyncio() > > > > > > > > to > > > > > > > > create a > > > > > > > > new thread even if the total is above nfs_iodmax. (I think > > > > > > > > this would > > > > > > > > require the fixed array to be replaced with a linked list > > > > > > > > and > > > > > > > > might > > > > > > > > result in a large number of nfsiod threads.) Maybe just > > > > > > > > having > > > > > > > > a large > > > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > > > > > Does having a large # of nfsiod threads cause any serious > > > > > > > > problem for > > > > > > > > most systems these days? > > > > > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and then, > > > > > > > > instead > > > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a fixed > > > > > > > > number of > > > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > > > understanding that > > > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. The > > > > > > > proposal would > > > > > > > only allow system to survive slightly longer after the > > > > > > > deadlock > > > > > > > appeared. > > > > > > > And, I think that allowing the unbound amount of nfsiod > > > > > > > threads > > > > > > > is also > > > > > > > fatal. > > > > > > > > > > > > > > The issue there is the LOR between buffer lock and vnode > > > > > > > lock. > > > > > > > Buffer lock > > > > > > > always must come after the vnode lock. The problematic > > > > > > > nfsiod > > > > > > > thread, which > > > > > > > locks the vnode, volatile this rule, because despite the > > > > > > > LK_KERNPROC > > > > > > > ownership of the buffer lock, it is the thread which de fact > > > > > > > owns the > > > > > > > buffer (only the thread can unlock it). > > > > > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to nfs_nget() > > > > > > > from the > > > > > > > nfs_readdirplusrpc(). From my reading of the code, > > > > > > > nfs_nget() > > > > > > > should > > > > > > > be capable of correctly handling the lock failure. And EBUSY > > > > > > > would > > > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. 
> > > > > > > > > > > > Yes, thinking about this more, I do think the right answer is > > > > > > for > > > > > > readdirplus to do this. The only question I have is if it > > > > > > should > > > > > > do > > > > > > this always, or if it should do this only from the nfsiod > > > > > > thread. > > > > > > I > > > > > > believe you can't get this in the non-nfsiod case. > > > > > > > > > > I agree that it looks as of the workaround only needed for > > > > > nfsiod > > > > > thread. > > > > > On the other hand, it is not immediately obvious how to detect > > > > > that > > > > > the current thread is nfsio daemon. Probably a thread flag > > > > > should be > > > > > set. > > > > > > > > OTOH, updating the attributes from readdir+ is only an > > > > optimization > > > > anyway, so > > > > just having it always do LK_NOWAIT is probably ok (and simple). > > > > Currently I'm > > > > trying to develop a test case to provoke this so I can test the > > > > fix, > > > > but no > > > > luck on that yet. > > > > > > > > -- > > > > John Baldwin > > > Just fyi, ignore my comment about the second version of the patch > > > that > > > disables the nfsiod threads from doing readdirplus running faster. > > > It > > > was just that when I tested the 2nd patch, the server's caches were > > > primed. Oops. > > > > > > However, sofar the minimal testing I've done has been essentially > > > performance neutral between the unpatch and patched versions. > > > > > > Hopefully John has a convenient way to do some performance testing, > > > since I won't be able to do much until the end of April. > > > > Performance testing I don't really have available. > All I've been doing are things like (assuming /mnt is an NFSv3 mount point): > # cd /mnt > # time ls -lR > /dev/null > # time ls -R > /dev/null > - for both a patched and unpatched kernel > (Oh, and you need to keep the server's caches pretty consistent. For me > once I run the test once, the server caches end up primed and then > the times seem to be pretty consistent, but I am only using old laptops.) > > Maybe you could do something like the above? (I'll try some finds too.) > (I don't really have any clever ideas for other tests.) I've been doing find across trees on different servers (one mounted with rdirplus and one without). I've compared the current behavior (blocking lock in rdirplus + readahead) to disabling readahead for dirs as well as just using non-blocking locks in the nfsiod case in readdir+ processing. I also ran my tests with various concurrent number of jobs (up to 8 since this is an 8-core machine). The three cases were all basically the same, and the two possible fixes were no different, so I think you can fix this however you want. 
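A rough sketch of the comparison described above, for anyone who wants to repeat it: the same export mounted twice over NFSv3, once with readdirplus and once without, each walked by eight concurrent find jobs. The server name, export path and mount points are made up for illustration.

# Mount the same export with and without readdirplus.
mkdir -p /mnt/rdplus /mnt/nordplus
mount -t nfs -o nfsv3,rdirplus server:/export /mnt/rdplus
mount -t nfs -o nfsv3 server:/export /mnt/nordplus

# Walk each tree with 8 concurrent jobs; find -ls forces attribute lookups,
# much like ls -lR, and the whole batch is timed per mount.
for m in /mnt/rdplus /mnt/nordplus; do
    echo "=== $m ==="
    /usr/bin/time sh -c '
        for i in 1 2 3 4 5 6 7 8; do
            find "$1" -ls > /dev/null &
        done
        wait' timer "$m"
done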
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Thu Mar 21 23:24:31 2013 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BD570A55; Thu, 21 Mar 2013 23:24:31 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 51B63168; Thu, 21 Mar 2013 23:24:30 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAJOVS1GDaFvO/2dsb2JhbABDiCC6KIMKgXF0giQBAQUjBFIbDgoCAg0ZAlkGLod5sAmSJoEjjTo0B4ItgRMDlmSRAoMmIIFs X-IronPort-AV: E=Sophos;i="4.84,888,1355115600"; d="scan'208";a="20199593" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Mar 2013 19:24:24 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 0D057B4035; Thu, 21 Mar 2013 19:24:24 -0400 (EDT) Date: Thu, 21 Mar 2013 19:24:24 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <1881768576.34111.1363908264039.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201303211600.46884.jhb@freebsd.org> Subject: Re: Deadlock in the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Rick Macklem , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Mar 2013 23:24:31 -0000 John Baldwin wrote: > On Monday, March 18, 2013 9:00:54 pm Rick Macklem wrote: > > John Baldwin wrote: > > > On Friday, March 15, 2013 10:03:39 pm Rick Macklem wrote: > > > > John Baldwin wrote: > > > > > On Thursday, March 14, 2013 1:22:39 pm Konstantin Belousov > > > > > wrote: > > > > > > On Thu, Mar 14, 2013 at 10:57:13AM -0400, John Baldwin > > > > > > wrote: > > > > > > > On Thursday, March 14, 2013 5:27:28 am Konstantin Belousov > > > > > > > wrote: > > > > > > > > On Wed, Mar 13, 2013 at 07:33:35PM -0400, Rick Macklem > > > > > > > > wrote: > > > > > > > > > John Baldwin wrote: > > > > > > > > > > I ran into a machine that had a deadlock among > > > > > > > > > > certain > > > > > > > > > > files > > > > > > > > > > on a > > > > > > > > > > given NFS > > > > > > > > > > mount today. I'm not sure how best to resolve it, > > > > > > > > > > though > > > > > > > > > > it > > > > > > > > > > seems like > > > > > > > > > > perhaps there is a bug with how the pool of nfsiod > > > > > > > > > > threads > > > > > > > > > > is managed. > > > > > > > > > > Anyway, more details on the actual hang below. This > > > > > > > > > > was > > > > > > > > > > on > > > > > > > > > > 8.x with > > > > > > > > > > the > > > > > > > > > > old NFS client, but I don't see anything in HEAD > > > > > > > > > > that > > > > > > > > > > would > > > > > > > > > > fix this. > > > > > > > > > > > > > > > > > > > > First note that the system was idle so it had > > > > > > > > > > dropped > > > > > > > > > > down > > > > > > > > > > to only one > > > > > > > > > > nfsiod thread. > > > > > > > > > > > > > > > > > > > Hmm, I see the problem and I'm a bit surprised it > > > > > > > > > doesn't > > > > > > > > > bite > > > > > > > > > more often. 
> > > > > > > > > It seems to me that this snippet of code from > > > > > > > > > nfs_asyncio() > > > > > > > > > makes too > > > > > > > > > weak an assumption: > > > > > > > > > /* > > > > > > > > > * If none are free, we may already have an iod > > > > > > > > > working > > > > > > > > > on > > > > > > > > > this mount > > > > > > > > > * point. If so, it will process our request. > > > > > > > > > */ > > > > > > > > > if (!gotiod) { > > > > > > > > > if (nmp->nm_bufqiods > 0) { > > > > > > > > > NFS_DPF(ASYNCIO, > > > > > > > > > ("nfs_asyncio: %d iods are already processing mount > > > > > > > > > %p\n", > > > > > > > > > nmp->nm_bufqiods, nmp)); > > > > > > > > > gotiod = TRUE; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > It assumes that, since an nfsiod thread is processing > > > > > > > > > some > > > > > > > > > buffer for the > > > > > > > > > mount, it will become available to do this one, which > > > > > > > > > isn't > > > > > > > > > true for your > > > > > > > > > deadlock. > > > > > > > > > > > > > > > > > > I think the simple fix would be to recode > > > > > > > > > nfs_asyncio() so > > > > > > > > > that > > > > > > > > > it only returns 0 if it finds an AVAILABLE nfsiod > > > > > > > > > thread > > > > > > > > > that > > > > > > > > > it > > > > > > > > > has assigned to do the I/O, getting rid of the above. > > > > > > > > > The > > > > > > > > > problem > > > > > > > > > with doing this is that it may result in a lot more > > > > > > > > > synchronous I/O > > > > > > > > > (nfs_asyncio() returns EIO, so the caller does the > > > > > > > > > I/O). > > > > > > > > > Maybe > > > > > > > > > more > > > > > > > > > synchronous I/O could be avoided by allowing > > > > > > > > > nfs_asyncio() > > > > > > > > > to > > > > > > > > > create a > > > > > > > > > new thread even if the total is above nfs_iodmax. (I > > > > > > > > > think > > > > > > > > > this would > > > > > > > > > require the fixed array to be replaced with a linked > > > > > > > > > list > > > > > > > > > and > > > > > > > > > might > > > > > > > > > result in a large number of nfsiod threads.) Maybe > > > > > > > > > just > > > > > > > > > having > > > > > > > > > a large > > > > > > > > > nfs_iodmax would be an adequate compromise? > > > > > > > > > > > > > > > > > > Does having a large # of nfsiod threads cause any > > > > > > > > > serious > > > > > > > > > problem for > > > > > > > > > most systems these days? > > > > > > > > > > > > > > > > > > I'd be tempted to recode nfs_asyncio() as above and > > > > > > > > > then, > > > > > > > > > instead > > > > > > > > > of nfs_iodmin and nfs_iodmax, I'd simply have: - a > > > > > > > > > fixed > > > > > > > > > number of > > > > > > > > > nfsiod threads (this could be a tunable, with the > > > > > > > > > understanding that > > > > > > > > > it should be large for good performance) > > > > > > > > > > > > > > > > > > > > > > > > > I do not see how this would solve the deadlock itself. > > > > > > > > The > > > > > > > > proposal would > > > > > > > > only allow system to survive slightly longer after the > > > > > > > > deadlock > > > > > > > > appeared. > > > > > > > > And, I think that allowing the unbound amount of nfsiod > > > > > > > > threads > > > > > > > > is also > > > > > > > > fatal. > > > > > > > > > > > > > > > > The issue there is the LOR between buffer lock and vnode > > > > > > > > lock. > > > > > > > > Buffer lock > > > > > > > > always must come after the vnode lock. 
The problematic > > > > > > > > nfsiod > > > > > > > > thread, which > > > > > > > > locks the vnode, volatile this rule, because despite the > > > > > > > > LK_KERNPROC > > > > > > > > ownership of the buffer lock, it is the thread which de > > > > > > > > fact > > > > > > > > owns the > > > > > > > > buffer (only the thread can unlock it). > > > > > > > > > > > > > > > > A possible solution would be to pass LK_NOWAIT to > > > > > > > > nfs_nget() > > > > > > > > from the > > > > > > > > nfs_readdirplusrpc(). From my reading of the code, > > > > > > > > nfs_nget() > > > > > > > > should > > > > > > > > be capable of correctly handling the lock failure. And > > > > > > > > EBUSY > > > > > > > > would > > > > > > > > result in doit = 0, which should be fine too. > > > > > > > > > > > > > > > > It is possible that EBUSY should be reset to 0, though. > > > > > > > > > > > > > > Yes, thinking about this more, I do think the right answer > > > > > > > is > > > > > > > for > > > > > > > readdirplus to do this. The only question I have is if it > > > > > > > should > > > > > > > do > > > > > > > this always, or if it should do this only from the nfsiod > > > > > > > thread. > > > > > > > I > > > > > > > believe you can't get this in the non-nfsiod case. > > > > > > > > > > > > I agree that it looks as of the workaround only needed for > > > > > > nfsiod > > > > > > thread. > > > > > > On the other hand, it is not immediately obvious how to > > > > > > detect > > > > > > that > > > > > > the current thread is nfsio daemon. Probably a thread flag > > > > > > should be > > > > > > set. > > > > > > > > > > OTOH, updating the attributes from readdir+ is only an > > > > > optimization > > > > > anyway, so > > > > > just having it always do LK_NOWAIT is probably ok (and > > > > > simple). > > > > > Currently I'm > > > > > trying to develop a test case to provoke this so I can test > > > > > the > > > > > fix, > > > > > but no > > > > > luck on that yet. > > > > > > > > > > -- > > > > > John Baldwin > > > > Just fyi, ignore my comment about the second version of the > > > > patch > > > > that > > > > disables the nfsiod threads from doing readdirplus running > > > > faster. > > > > It > > > > was just that when I tested the 2nd patch, the server's caches > > > > were > > > > primed. Oops. > > > > > > > > However, sofar the minimal testing I've done has been > > > > essentially > > > > performance neutral between the unpatch and patched versions. > > > > > > > > Hopefully John has a convenient way to do some performance > > > > testing, > > > > since I won't be able to do much until the end of April. > > > > > > Performance testing I don't really have available. > > All I've been doing are things like (assuming /mnt is an NFSv3 mount > > point): > > # cd /mnt > > # time ls -lR > /dev/null > > # time ls -R > /dev/null > > - for both a patched and unpatched kernel > > (Oh, and you need to keep the server's caches pretty consistent. For > > me > > once I run the test once, the server caches end up primed and then > > the times seem to be pretty consistent, but I am only using old > > laptops.) > > > > Maybe you could do something like the above? (I'll try some finds > > too.) > > (I don't really have any clever ideas for other tests.) > > I've been doing find across trees on different servers (one mounted > with > rdirplus and one without). 
I've compared the current behavior > (blocking > lock in rdirplus + readahead) to disabling readahead for dirs as well > as > just using non-blocking locks in the nfsiod case in readdir+ > processing. > I also ran my tests with various concurrent number of jobs (up to 8 > since > this is an 8-core machine). The three cases were all basically the > same, > and the two possible fixes were no different, so I think you can fix > this > however you want. > > -- > John Baldwin Yep, same here. (Basically performance neutral for what I've tried. At most 0.5% slower without using the nfsiods and that might have been within the statistical variance of the test runs.) I'll commit the one that disables read-ahead for rdirplus in April, if that's ok with everyone. That way if there is a big performance hit for some situation, it can be avoided by taking the "rdirplus" option off the mount. Thanks for your help with this, rick From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 08:48:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1F86F500 for ; Fri, 22 Mar 2013 08:48:33 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 8F397EAA for ; Fri, 22 Mar 2013 08:48:31 +0000 (UTC) Received: from server.rulingia.com (c220-239-237-213.belrs5.nsw.optusnet.com.au [220.239.237.213]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id r2M8OqDW077490 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 22 Mar 2013 19:24:53 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id r2M8OlwL035104 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 22 Mar 2013 19:24:47 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id r2M8Ol3i035103; Fri, 22 Mar 2013 19:24:47 +1100 (EST) (envelope-from peter) Date: Fri, 22 Mar 2013 19:24:47 +1100 From: Peter Jeremy To: Zaphod Beeblebrox Subject: Re: ZFS: Almost a minute of dirty buffers? Message-ID: <20130322082447.GC81066@server.rulingia.com> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="BXVAT5kNtrzKuDFl" Content-Disposition: inline In-Reply-To: X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 08:48:33 -0000 --BXVAT5kNtrzKuDFl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2013-Mar-19 16:45:36 -0400, Zaphod Beeblebrox wrote: >... and ZFS takes nearly a minute of very active disk to shutdown ?!!? > >Are these dirty buffers? What is it doing? This period of disk blinking >seems to be related to uptime (ie: longer uptime, longer blinking on >shutdown). Well, ZFS will be flushing all dirty buffers and the ZIL and then serially (synchronously) updating all 4 vdev headers on each disk - though this shouldn't take a minute. How many filesystems are in your pool? 
How much dirty data does your system have? --=20 Peter Jeremy --BXVAT5kNtrzKuDFl Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFMFU4ACgkQ/opHv/APuIcfXACgslEAV3a9kBHzyXyPK+9xzE+j or8An3+oSDOnzB1GTu1u8M4KJOocnlq3 =QKQl -----END PGP SIGNATURE----- --BXVAT5kNtrzKuDFl-- From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 12:38:54 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D9E2A78F for ; Fri, 22 Mar 2013 12:38:54 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:150:6101::4]) by mx1.freebsd.org (Postfix) with ESMTP id 96A13611 for ; Fri, 22 Mar 2013 12:38:54 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.2]) by mail.vx.sk (Postfix) with ESMTP id 96577350E9 for ; Fri, 22 Mar 2013 13:38:52 +0100 (CET) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk by core.vx.sk (amavisd-new, unix socket) with LMTP id P2KEj8Mwma5r for ; Fri, 22 Mar 2013 13:38:50 +0100 (CET) Received: from [10.9.8.1] (chello085216226145.chello.sk [85.216.226.145]) by mail.vx.sk (Postfix) with ESMTPSA id 79D37350E2 for ; Fri, 22 Mar 2013 13:38:49 +0100 (CET) Message-ID: <514C50D6.9080302@FreeBSD.org> Date: Fri, 22 Mar 2013 13:38:46 +0100 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org Subject: [CFT] libzfs_core for 9-STABLE X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 12:38:54 -0000 Hello all, libzfs_core and the rewritten locking code around dsl_sync_dataset have been commited to -HEAD: http://svnweb.freebsd.org/changeset/base/248571 The scheduled merge date to 9-STABLE is around Apr 21, 2013. Early adopters can test new code by applying the following patch (against stable/9 r248611): http://people.freebsd.org/~mm/patches/zfs/stable-9-248611-lzc.patch.gz Steps to apply to a clean checked-out source: cd /path/to/src patch -p0 < /path/to/stable-9-248611-lzc.patch Alternatively you can download a pre-compiled amd64 mfsBSD image for testing: (see http://mfsbsd.vx.sk for more information on mfsBSD) http://mfsbsd.vx.sk/files/testing/ I am primarily interested in the following areas of feedback: - stability - backward compatibility (new kernel, old utilities) Feedback and suggestions are welcome. 
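For anyone trying this on a scratch box, a sketch of one way to put the steps above together; the checkout target directory, the use of svn and KERNCONF=GENERIC are assumptions on my part, while the revision and patch URL are the ones given above.

# Check out the exact revision the patch was generated against.
svn checkout -r 248611 svn://svn.freebsd.org/base/stable/9 /usr/src

# Fetch, unpack and apply the patch from the top of the tree.
cd /usr/src
fetch http://people.freebsd.org/~mm/patches/zfs/stable-9-248611-lzc.patch.gz
gunzip stable-9-248611-lzc.patch.gz
patch -p0 < stable-9-248611-lzc.patch

# Build and install a test kernel; rebuild world as well if you want the new
# userland bits rather than testing new-kernel/old-utilities compatibility.
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC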
-- Martin Matuska FreeBSD committer http://blog.vx.sk From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 15:07:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2DC4B44E for ; Fri, 22 Mar 2013 15:07:52 +0000 (UTC) (envelope-from zeus@ibs.dn.ua) Received: from relay.ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by mx1.freebsd.org (Postfix) with ESMTP id A7AE0359 for ; Fri, 22 Mar 2013 15:07:51 +0000 (UTC) Received: from ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by relay.ibs.dn.ua with ESMTP id r2MF7gv8093503 for ; Fri, 22 Mar 2013 17:07:43 +0200 (EET) Message-ID: <20130322170742.93502@relay.ibs.dn.ua> Date: Fri, 22 Mar 2013 17:07:42 +0300 From: Zeus Panchenko To: cc: Subject: ZFS pool size to bearable free space amount ratio Organization: I.B.S. LLC X-Mailer: MH-E 8.3.1; GNU Mailutils 2.99.97; GNU Emacs 24.0.93 X-Face: &sReWXo3Iwtqql1[My(t1Gkx; y?KF@KF`4X+'9Cs@PtK^y%}^.>Mtbpyz6U=,Op:KPOT.uG )Nvx`=er!l?WASh7KeaGhga"1[&yz$_7ir'cVp7o%CGbJ/V)j/=]vzvvcqcZkf; JDurQG6wTg+?/xA go`}1.Ze//K; Fk&/&OoHd'[b7iGt2UO>o(YskCT[_D)kh4!yY'<&:yt+zM=A`@`~9U+P[qS:f; #9z~ Or/Bo#N-'S'!'[3Wog'ADkyMqmGDvga?WW)qd=?)`Y&k=o}>!ST\ MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Zeus Panchenko List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 15:07:52 -0000 =2D----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 hi all, please, share your experience on the subject ... what is ZFS-pool-size to bearable-free-space-amount ratio? how much percents of total ZFS pool space has to be free before performance starts worsening? =2D --=20 Zeus V. Panchenko jid:zeus@im.ibs.dn.ua IT Dpt., I.B.S. LLC GMT+2 (EET) =2D----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFMc74ACgkQr3jpPg/3oyrtuwCfZKx+ze0/VTGnP1SNcqie1yAK eH0AoLDpmkYLcFXWEnmq2EZJk3JS+/1p =3DUmMt =2D----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 15:10:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C81CC5FE for ; Fri, 22 Mar 2013 15:10:20 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ee0-f51.google.com (mail-ee0-f51.google.com [74.125.83.51]) by mx1.freebsd.org (Postfix) with ESMTP id 6186D385 for ; Fri, 22 Mar 2013 15:10:20 +0000 (UTC) Received: by mail-ee0-f51.google.com with SMTP id d17so2262507eek.10 for ; Fri, 22 Mar 2013 08:10:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=7lyWeEEAGRIh61ZWEpErPIditezPMp6VtRUTx+pDoj8=; b=fsSsX9zH/PZePlZ5zHh01t8i99CdJs1157GypO+Y0cBHB3uNhRKoUErAfKEtC16TEd 5BZ9I/0gIWhVRkIu3SdAYb2q6tAHKYlUzX/6fegNHyPdCZ90T72Ajg/MCYmYOatOSOUb uBBhH+JH/tnKsiAD5UtVgthVe1iq8VhcLpyCL5dLEBcXfPWqdjZxaskUzoyk4qD97WIn V1MCSTcedMNDkVwiE71npAW987Hsl4cgh8AtVpcZKVg4vBoi8xF7JjbXsqvYG4wkNmX5 tTqI7Tr1M9Qb7aNEt6jGY8hol7dLYReR5D4v8++hpPrVan5q69vmazdht0w6WbO0XoyJ DeTw== X-Received: by 10.14.5.6 with SMTP id 6mr5912326eek.42.1363965019568; Fri, 22 Mar 2013 08:10:19 -0700 (PDT) Received: from [192.168.50.105] (double-l.xs4all.nl. 
[80.126.205.144]) by mx.google.com with ESMTPS id q5sm3506696eeo.17.2013.03.22.08.10.18 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 22 Mar 2013 08:10:18 -0700 (PDT) Message-ID: <514C745B.5000803@gmail.com> Date: Fri, 22 Mar 2013 16:10:19 +0100 From: Johan Hendriks User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Zeus Panchenko , freebsd-fs@freebsd.org Subject: Re: ZFS pool size to bearable free space amount ratio References: <20130322170742.93502@relay.ibs.dn.ua> In-Reply-To: <20130322170742.93502@relay.ibs.dn.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 15:10:20 -0000 Zeus Panchenko schreef: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > hi all, > > please, share your experience on the subject ... > > what is ZFS-pool-size to bearable-free-space-amount ratio? > > how much percents of total ZFS pool space has to be free before > performance starts worsening? > > - -- > Zeus V. Panchenko jid:zeus@im.ibs.dn.ua > IT Dpt., I.B.S. LLC GMT+2 (EET) > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.19 (FreeBSD) > > iEYEARECAAYFAlFMc74ACgkQr3jpPg/3oyrtuwCfZKx+ze0/VTGnP1SNcqie1yAK > eH0AoLDpmkYLcFXWEnmq2EZJk3JS+/1p > =UmMt > -----END PGP SIGNATURE----- > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" It is best to keep the pool below 80% Above that, things starts to slow down. gr johan Hendriks From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 15:26:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D46CDE59 for ; Fri, 22 Mar 2013 15:26:59 +0000 (UTC) (envelope-from zeus@ibs.dn.ua) Received: from relay.ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by mx1.freebsd.org (Postfix) with ESMTP id 5C26E6C4 for ; Fri, 22 Mar 2013 15:26:58 +0000 (UTC) Received: from ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by relay.ibs.dn.ua with ESMTP id r2MFQvEH095913 for ; Fri, 22 Mar 2013 17:26:57 +0200 (EET) Message-ID: <20130322172657.95912@relay.ibs.dn.ua> Date: Fri, 22 Mar 2013 17:26:57 +0300 From: Zeus Panchenko To: cc: Subject: RAM amount recommendations for ZFS pools with ZIL and L2ARC on SSD Organization: I.B.S. 
LLC X-Mailer: MH-E 8.3.1; GNU Mailutils 2.99.97; GNU Emacs 24.0.93 X-Face: &sReWXo3Iwtqql1[My(t1Gkx; y?KF@KF`4X+'9Cs@PtK^y%}^.>Mtbpyz6U=,Op:KPOT.uG )Nvx`=er!l?WASh7KeaGhga"1[&yz$_7ir'cVp7o%CGbJ/V)j/=]vzvvcqcZkf; JDurQG6wTg+?/xA go`}1.Ze//K; Fk&/&OoHd'[b7iGt2UO>o(YskCT[_D)kh4!yY'<&:yt+zM=A`@`~9U+P[qS:f; #9z~ Or/Bo#N-'S'!'[3Wog'ADkyMqmGDvga?WW)qd=?)`Y&k=o}>!ST\ MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Zeus Panchenko List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 15:26:59 -0000 =2D----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 hi all, while discovering the subj, I found recommendations at: http://doc.freenas.org/index.php/Hardware_Recommendations#RAM =2D --------------------------------------------------------------------- ... a general rule of thumb is 1 GB of RAM for every 1TB of storage ... If you plan to use ZFS deduplication, a general rule of thumb is 5 GB RAM per TB of storage to be deduplicated. =2D --------------------------------------------------------------------- so, are these recommendations correct for ZFS pools with ZIL and L2ARC on SSD configurations? are there corellations between "RAM amount" and "with/without ZIL, L2ARC on separate devices pool configuration" ? =2D --=20 Zeus V. Panchenko jid:zeus@im.ibs.dn.ua IT Dpt., I.B.S. LLC GMT+2 (EET) =2D----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFMeEEACgkQr3jpPg/3oyr6awCfYllLOsUVicDA6OHt644HHQNF aSUAn0lvTY86uKdm1D3nT6lz7KxmuRyA =3D9DNs =2D----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 18:17:48 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EC738F1D for ; Fri, 22 Mar 2013 18:17:47 +0000 (UTC) (envelope-from josh@signalboxes.net) Received: from mail-ob0-x22a.google.com (mail-ob0-x22a.google.com [IPv6:2607:f8b0:4003:c01::22a]) by mx1.freebsd.org (Postfix) with ESMTP id B42947B7 for ; Fri, 22 Mar 2013 18:17:47 +0000 (UTC) Received: by mail-ob0-f170.google.com with SMTP id wc20so4351136obb.29 for ; Fri, 22 Mar 2013 11:17:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:x-received:in-reply-to:references:date :message-id:subject:from:to:cc:content-type:x-gm-message-state; bh=jAOjl2sMt/HpbIyZMc7IZegZwhQvq0EkQn/xEbRxyPA=; b=aFPQ/2EXRiBxvyE5N6XNB/Mh4TG/YILHK8wjNXMQGVtrCCLEKuH7meTWuU0DXe/hRz KpZyNFVpGsOSMXCrLa23dHfeJwPHDK8E+udj/i2Z1+pstQlc4kRhAyXi4x43avJ6G7SE aisxgmo86StYB1waV7DDxgO0DeCYgI0KIaSp2VHbKwi3o3LByU5wb1TBzVR4fjXbYXgh 0sN7pgA49PcDfnwgRcO1orxo6/hqMksjvx3+Bf9g87cfJZx6P3ivKxcfa6yad7TuHQpB vzhWYoShQiyC8CtfgqT0cUVvLr2d5H4vASCuv3EHljvODNnDJpXJ+WzvRpjmQYBm63KS 2hbg== X-Received: by 10.60.14.71 with SMTP id n7mr2742411oec.135.1363976267162; Fri, 22 Mar 2013 11:17:47 -0700 (PDT) Received: from mail-ob0-x234.google.com (mail-ob0-x234.google.com [2607:f8b0:4003:c01::234]) by mx.google.com with ESMTPS id w10sm3314862oed.2.2013.03.22.11.17.46 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 22 Mar 2013 11:17:46 -0700 (PDT) Received: by mail-ob0-f180.google.com with SMTP id wo10so1973153obc.39 for ; Fri, 22 Mar 2013 11:17:46 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.60.11.8 with SMTP id m8mr2947829oeb.22.1363976265951; Fri, 22 Mar 2013 11:17:45 
-0700 (PDT) Received: by 10.60.62.168 with HTTP; Fri, 22 Mar 2013 11:17:45 -0700 (PDT) In-Reply-To: References: Date: Fri, 22 Mar 2013 12:17:45 -0600 Message-ID: Subject: Re: ZFS + NFS poor performance after restarting from 100 day uptime From: Josh Beard To: Steven Hartland X-Gm-Message-State: ALoCoQlbllTJIAHuHXTwAvryKCwdCQY4RUMIK391p4bNYFI4bgNNPiuf7njkCzJJ4JNiwCZm6a04 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 18:17:48 -0000 On Thu, Mar 21, 2013 at 10:14 AM, Steven Hartland wrote: > > ----- Original Message ----- From: "Josh Beard" > To: > Sent: Thursday, March 21, 2013 3:53 PM > Subject: ZFS + NFS poor performance after restarting from 100 day uptime > > > > Hello, >> >> I have a system with 12 disks spread between 2 raidz1. I'm using the >> native ("new") NFS to export a pool on this. This has worked very well >> all >> along, but since a reboot, has performed horribly - unusably under load. >> >> The system was running 9.1-rc3 and I upgraded it to 9.1-release-p1 >> (GENERIC >> kernel) after ~110 days of running (with zero performance issues). After >> rebooting from the upgrade, I'm finding the disks seem constantly slammed. >> gstat reports 90-100% busy most of the day with only ~100-130 ops/s. >> >> I didn't change any settings in /etc/sysctl.conf or /boot/loader. No ZFS >> tuning, etc. I've looked at the commits between 9.1-rc3 and >> 9.1-release-p1 >> and I can't see any reason why simply upgrading it would cause this. >> > ... > >> A snip of gstat: >> dT: 1.002s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name >> 0 0 0 0 0.0 0 0 0.0 0.0| cd0 >> 0 1 0 0 0.0 1 32 0.2 0.0| da0 >> 0 0 0 0 0.0 0 0 0.0 0.0| da0p1 >> 0 1 0 0 0.0 1 32 0.2 0.0| da0p2 >> 0 0 0 0 0.0 0 0 0.0 0.0| da0p3 >> 4 160 126 1319 31.3 34 100 0.1 100.3| da1 >> 4 146 110 1289 33.6 36 98 0.1 97.8| da2 >> 4 142 107 1370 36.1 35 101 0.2 101.9| da3 >> 4 121 95 1360 35.6 26 19 0.1 95.9| da4 >> 4 151 117 1409 34.0 34 102 0.1 100.1| da5 >> 4 141 109 1366 35.9 32 101 0.1 97.9| da6 >> 4 136 118 1207 24.6 18 13 0.1 87.0| da7 >> 4 118 102 1278 32.2 16 12 0.1 89.8| da8 >> 4 138 116 1240 33.4 22 55 0.1 100.0| da9 >> 4 133 117 1269 27.8 16 13 0.1 86.5| da10 >> 4 121 102 1302 53.1 19 51 0.1 100.0| da11 >> 4 120 99 1242 40.7 21 51 0.1 99.7| da12 >> > > Your ops/s are be maxing your disks. You say "only" but the ~190 ops/s > is what HD's will peak at, so whatever our machine is doing is causing > it to max the available IO for your disks. > > If you boot back to your previous kernel does the problem go away? > > If so you could look at the changes between the two kernel revisions > for possible causes and if needed to a binary chop with kernel builds > to narrow down the cause. > > Regards > Steve > > Regards > Steve > > > Steve, Thanks for your response. I booted with the old kernel (9.1-RC3) and the problem disappeared! 
We're getting 3x the performance with the previous kernel than we do with the 9.1-RELEASE-p1 kernel: Output from gstat: 1 362 0 0 0.0 345 20894 9.4 52.9| da1 1 365 0 0 0.0 348 20893 9.4 54.1| da2 1 367 0 0 0.0 350 20920 9.3 52.6| da3 1 362 0 0 0.0 345 21275 9.5 54.1| da4 1 363 0 0 0.0 346 21250 9.6 54.2| da5 1 359 0 0 0.0 342 21352 9.5 53.8| da6 1 347 0 0 0.0 330 20486 9.4 52.3| da7 1 353 0 0 0.0 336 20689 9.6 52.9| da8 1 355 0 0 0.0 338 20669 9.5 53.0| da9 1 357 0 0 0.0 340 20770 9.5 52.5| da10 1 351 0 0 0.0 334 20641 9.4 53.1| da11 1 362 0 0 0.0 345 21155 9.6 54.1| da12 The kernels were compiled identically using GENERIC with no modification. I'm no expert, but none of the stuff I've seen looking at svn commits looks like it would have any impact on this. Any clues? From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 19:07:11 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4A675E52 for ; Fri, 22 Mar 2013 19:07:11 +0000 (UTC) (envelope-from prvs=179345a321=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id CCB14C27 for ; Fri, 22 Mar 2013 19:07:10 +0000 (UTC) Received: from r2d2 ([82.12.16.150]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002874290.msg for ; Fri, 22 Mar 2013 19:07:02 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Fri, 22 Mar 2013 19:07:02 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDDNSBL-Result: mail1.multiplay.co.uk, Fri, 22 Mar 2013 19:07:02 +0000 zen.spamhaus.org returned result of 127.0.0.11 X-MDRemoteIP: 82.12.16.150 X-Return-Path: prvs=179345a321=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@freebsd.org Message-ID: <12CCA57CCC7E4F16A1147F8422F5F151@multiplay.co.uk> From: "Steven Hartland" To: "Josh Beard" References: Subject: Re: ZFS + NFS poor performance after restarting from 100 day uptime Date: Fri, 22 Mar 2013 19:07:06 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 19:07:11 -0000 > ----- Original Message ----- > From: Josh Beard >> A snip of gstat: >> dT: 1.002s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name ... >> 4 160 126 1319 31.3 34 100 0.1 100.3| da1 >> 4 146 110 1289 33.6 36 98 0.1 97.8| da2 >> 4 142 107 1370 36.1 35 101 0.2 101.9| da3 >> 4 121 95 1360 35.6 26 19 0.1 95.9| da4 >> 4 151 117 1409 34.0 34 102 0.1 100.1| da5 >> 4 141 109 1366 35.9 32 101 0.1 97.9| da6 >> 4 136 118 1207 24.6 18 13 0.1 87.0| da7 >> 4 118 102 1278 32.2 16 12 0.1 89.8| da8 >> 4 138 116 1240 33.4 22 55 0.1 100.0| da9 >> 4 133 117 1269 27.8 16 13 0.1 86.5| da10 >> 4 121 102 1302 53.1 19 51 0.1 100.0| da11 >> 4 120 99 1242 40.7 21 51 0.1 99.7| da12 >> >> Your ops/s are be maxing your disks. 
You say "only" but the ~190 ops/s >> is what HD's will peak at, so whatever our machine is doing is causing >> it to max the available IO for your disks. >> >> If you boot back to your previous kernel does the problem go away? >> >> If so you could look at the changes between the two kernel revisions >> for possible causes and if needed to a binary chop with kernel builds >> to narrow down the cause. > > Thanks for your response. I booted with the old kernel (9.1-RC3) and the > problem disappeared! We're getting 3x the performance with the previous > kernel than we do with the 9.1-RELEASE-p1 kernel: > > Output from gstat: > > 1 362 0 0 0.0 345 20894 9.4 52.9| da1 > 1 365 0 0 0.0 348 20893 9.4 54.1| da2 > 1 367 0 0 0.0 350 20920 9.3 52.6| da3 > 1 362 0 0 0.0 345 21275 9.5 54.1| da4 > 1 363 0 0 0.0 346 21250 9.6 54.2| da5 > 1 359 0 0 0.0 342 21352 9.5 53.8| da6 > 1 347 0 0 0.0 330 20486 9.4 52.3| da7 > 1 353 0 0 0.0 336 20689 9.6 52.9| da8 > 1 355 0 0 0.0 338 20669 9.5 53.0| da9 > 1 357 0 0 0.0 340 20770 9.5 52.5| da10 > 1 351 0 0 0.0 334 20641 9.4 53.1| da11 > 1 362 0 0 0.0 345 21155 9.6 54.1| da12 > > > The kernels were compiled identically using GENERIC with no modification. > I'm no expert, but none of the stuff I've seen looking at svn commits > looks like it would have any impact on this. Any clues? You're seeing a totally different profile there Josh, as in all writes and no reads, whereas before you were seeing mainly reads and some writes. So I would ask if you're sure you're seeing the same workload, or has something external changed too? Might be worth rebooting back to the new kernel and seeing if you still see the issue ;-) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
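On the mechanics of flipping between the two kernels for this kind of A/B test, a small sketch; it assumes the previous 9.1-RC3 kernel is still in /boot/kernel.old, which is where installkernel normally leaves it, and that /usr/src holds the tree being bisected.

# Confirm both kernels are present.
ls -d /boot/kernel /boot/kernel.old

# Boot the old kernel once without touching loader.conf, then reboot to
# test; a later plain reboot returns to the default /boot/kernel.
nextboot -k kernel.old
shutdown -r now

# For the binary-chop suggestion: update the tree to a revision roughly
# midway between 9.1-RC3 and 9.1-RELEASE-p1, rebuild the kernel, retest,
# and halve the range each time (<midpoint-revision> is a placeholder).
svn update -r <midpoint-revision> /usr/src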
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 20:24:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 05ECAD47 for ; Fri, 22 Mar 2013 20:24:49 +0000 (UTC) (envelope-from josh@signalboxes.net) Received: from mail-ob0-x235.google.com (mail-ob0-x235.google.com [IPv6:2607:f8b0:4003:c01::235]) by mx1.freebsd.org (Postfix) with ESMTP id C032C3C9 for ; Fri, 22 Mar 2013 20:24:48 +0000 (UTC) Received: by mail-ob0-f181.google.com with SMTP id ni5so4382414obc.26 for ; Fri, 22 Mar 2013 13:24:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:x-received:in-reply-to:references:date :message-id:subject:from:to:cc:content-type:x-gm-message-state; bh=8eRxiX8TLbZE9RWCP1DouShI7+KEw9dBfhWyoIKBqJM=; b=eoQYNYVoVuCqCLTI+/EK5ciSnXqZht1kxjQ97SCSdwciXFvAO64H/UtMIpkmj3jqoG Yfp8p3HdhNfCRx9V1sWopjgRWcwAWj/gzkY2co2uAkZZMEsG+1/+CgpamlpIfi7vsRxd iBiNBiXUagQDh467qCrLkhpxcNzNzt7wVxSsl5WohqN5OOyvZWcv4P8sied42k1KzdrV S7DCJfuwlqRuWFWfnm8uc3Z0U3E6FfjJdWRfN+1AOUryIrdYHixyMn2uAnAEuAd2P3wc 8FBV3nfxBZuH3XA12m9BYyFEp53RAVrWfpA4HmK2G0mPzLJ31v8JA6gtWJGrlR0MOka7 V1Wg== X-Received: by 10.60.25.4 with SMTP id y4mr3125484oef.114.1363983888088; Fri, 22 Mar 2013 13:24:48 -0700 (PDT) Received: from mail-ob0-x22b.google.com (mail-ob0-x22b.google.com [2607:f8b0:4003:c01::22b]) by mx.google.com with ESMTPS id j4sm3739249oea.3.2013.03.22.13.24.47 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 22 Mar 2013 13:24:47 -0700 (PDT) Received: by mail-ob0-f171.google.com with SMTP id x4so4398504obh.16 for ; Fri, 22 Mar 2013 13:24:47 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.60.12.170 with SMTP id z10mr3303695oeb.122.1363983887060; Fri, 22 Mar 2013 13:24:47 -0700 (PDT) Received: by 10.60.62.168 with HTTP; Fri, 22 Mar 2013 13:24:46 -0700 (PDT) In-Reply-To: <12CCA57CCC7E4F16A1147F8422F5F151@multiplay.co.uk> References: <12CCA57CCC7E4F16A1147F8422F5F151@multiplay.co.uk> Date: Fri, 22 Mar 2013 14:24:46 -0600 Message-ID: Subject: Re: ZFS + NFS poor performance after restarting from 100 day uptime From: Josh Beard To: Steven Hartland X-Gm-Message-State: ALoCoQmW2PohbfUENA3CiW2aPbqm47KQUKmqn5tDnqEQ3jHqUHeqgYrer6qzJOojuCSeQD2//kNk Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 20:24:49 -0000 On Fri, Mar 22, 2013 at 1:07 PM, Steven Hartland wrote: > > ----- Original Message ----- From: Josh Beard >> >>> A snip of gstat: >>> >>> dT: 1.002s w: 1.000s >>> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name >>> >> ... > >> 4 160 126 1319 31.3 34 100 0.1 100.3| da1 >>> 4 146 110 1289 33.6 36 98 0.1 97.8| da2 >>> 4 142 107 1370 36.1 35 101 0.2 101.9| da3 >>> 4 121 95 1360 35.6 26 19 0.1 95.9| da4 >>> 4 151 117 1409 34.0 34 102 0.1 100.1| da5 >>> 4 141 109 1366 35.9 32 101 0.1 97.9| da6 >>> 4 136 118 1207 24.6 18 13 0.1 87.0| da7 >>> 4 118 102 1278 32.2 16 12 0.1 89.8| da8 >>> 4 138 116 1240 33.4 22 55 0.1 100.0| da9 >>> 4 133 117 1269 27.8 16 13 0.1 86.5| da10 >>> 4 121 102 1302 53.1 19 51 0.1 100.0| da11 >>> 4 120 99 1242 40.7 21 51 0.1 99.7| da12 >>> >>> Your ops/s are be maxing your disks. 
You say "only" but the ~190 ops/s >>> is what HD's will peak at, so whatever our machine is doing is causing >>> it to max the available IO for your disks. >>> >>> If you boot back to your previous kernel does the problem go away? >>> >>> If so you could look at the changes between the two kernel revisions >>> for possible causes and if needed to a binary chop with kernel builds >>> to narrow down the cause. >>> >> >> Thanks for your response. I booted with the old kernel (9.1-RC3) and the >> problem disappeared! We're getting 3x the performance with the previous >> kernel than we do with the 9.1-RELEASE-p1 kernel: >> >> Output from gstat: >> >> 1 362 0 0 0.0 345 20894 9.4 52.9| da1 >> 1 365 0 0 0.0 348 20893 9.4 54.1| da2 >> 1 367 0 0 0.0 350 20920 9.3 52.6| da3 >> 1 362 0 0 0.0 345 21275 9.5 54.1| da4 >> 1 363 0 0 0.0 346 21250 9.6 54.2| da5 >> 1 359 0 0 0.0 342 21352 9.5 53.8| da6 >> 1 347 0 0 0.0 330 20486 9.4 52.3| da7 >> 1 353 0 0 0.0 336 20689 9.6 52.9| da8 >> 1 355 0 0 0.0 338 20669 9.5 53.0| da9 >> 1 357 0 0 0.0 340 20770 9.5 52.5| da10 >> 1 351 0 0 0.0 334 20641 9.4 53.1| da11 >> 1 362 0 0 0.0 345 21155 9.6 54.1| da12 >> >> >> The kernels were compiled identically using GENERIC with no modification. >> I'm no expert, but none of the stuff I've seen looking at svn commits >> looks like it would have any impact on this. Any clues? >> > > Your seeing a totally different profile there Josh as in all writes no > reads where as before you where seeing mainly reads and some writes. > > So I would ask if your sure your seeing the same work load, or has > something external changed too? > > Might be worth rebooting back to the new kernel and seeing if your > still see the issue ;-) > > > Regards > Steve > > Regards > Steve > > Steve, You're absolutely right. I didn't catch that, but the total ops/s is reaching quite a bit higher. Things are certainly more responsive than they have been, for what it's worth, so it "feels right." I'm also not seeing this thing consistently railed to 100% busy like I was before with similar testing (that was 50 machines just pushing data with dd). I won't be able to get a good comparison until Monday, when our students come back (this is a file server for a public school district and used for network homes). 
Josh From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 21:47:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8C99F3DB for ; Fri, 22 Mar 2013 21:47:52 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [176.9.45.25]) by mx1.freebsd.org (Postfix) with ESMTP id 3241AD24 for ; Fri, 22 Mar 2013 21:47:52 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.2]) by mail.vx.sk (Postfix) with ESMTP id 2F08734CC6; Fri, 22 Mar 2013 22:47:45 +0100 (CET) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk by core.vx.sk (amavisd-new, unix socket) with LMTP id nWYZ1Ao69EEs; Fri, 22 Mar 2013 22:47:43 +0100 (CET) Received: from [10.9.8.1] (chello085216226145.chello.sk [85.216.226.145]) by mail.vx.sk (Postfix) with ESMTPSA id CD14734CBC; Fri, 22 Mar 2013 22:47:42 +0100 (CET) Message-ID: <514CD17C.70709@FreeBSD.org> Date: Fri, 22 Mar 2013 22:47:40 +0100 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130307 Thunderbird/17.0.4 MIME-Version: 1.0 To: Johan Hendriks Subject: Re: ZFS pool size to bearable free space amount ratio References: <20130322170742.93502@relay.ibs.dn.ua> <514C745B.5000803@gmail.com> In-Reply-To: <514C745B.5000803@gmail.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 21:47:52 -0000 With the recent merges from illumos (merged to 9-STABLE in r248571) this should get somewhat better. http://svnweb.freebsd.org/base?view=revision&revision=248369 https://www.illumos.org/issues/3552 https://www.illumos.org/issues/3564 https://www.illumos.org/issues/3578 More technical information on space maps: https://blogs.oracle.com/bonwick/entry/space_maps On 22.3.2013 16:10, Johan Hendriks wrote: > Zeus Panchenko schreef: > hi all, > > please, share your experience on the subject ... > > what is ZFS-pool-size to bearable-free-space-amount ratio? > > how much percents of total ZFS pool space has to be free before > performance starts worsening? > > -- Zeus V. Panchenko jid:zeus@im.ibs.dn.ua > IT Dpt., I.B.S. LLC GMT+2 (EET) >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > It is best to keep the pool below 80% > Above that, things starts to slow down. 
> > gr > johan Hendriks > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Martin Matuska FreeBSD committer http://blog.vx.sk From owner-freebsd-fs@FreeBSD.ORG Fri Mar 22 21:50:20 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AE12454C; Fri, 22 Mar 2013 21:50:20 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) by mx1.freebsd.org (Postfix) with ESMTP id 892E0D61; Fri, 22 Mar 2013 21:50:20 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r2MLoK1Y058981; Fri, 22 Mar 2013 21:50:20 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r2MLoKXt058980; Fri, 22 Mar 2013 21:50:20 GMT (envelope-from linimon) Date: Fri, 22 Mar 2013 21:50:20 GMT Message-Id: <201303222150.r2MLoKXt058980@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/177240: [zfs] zpool import failed with state UNAVAIL but all disks are ONLINE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Mar 2013 21:50:20 -0000 Old Synopsis: zpool import failed with state UNAVAIL but all disks are ONLINE New Synopsis: [zfs] zpool import failed with state UNAVAIL but all disks are ONLINE Responsible-Changed-From-To: freebsd-amd64->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Mar 22 21:50:08 UTC 2013 Responsible-Changed-Why: reclassify and assign. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=177240 From owner-freebsd-fs@FreeBSD.ORG Sat Mar 23 05:16:10 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0B5AB8A2; Sat, 23 Mar 2013 05:16:10 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-ob0-x235.google.com (mail-ob0-x235.google.com [IPv6:2607:f8b0:4003:c01::235]) by mx1.freebsd.org (Postfix) with ESMTP id 82253131; Sat, 23 Mar 2013 05:16:09 +0000 (UTC) Received: by mail-ob0-f181.google.com with SMTP id ni5so4595711obc.26 for ; Fri, 22 Mar 2013 22:16:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:date:x-google-sender-auth:message-id :subject:from:to:cc:content-type; bh=Xg9biQo4hvXnDZDmfUnmAJkH4uMFiiJChxAMJbJiIfU=; b=csXqoJ8NCCxVAT14MWn8XLBNaZSf26wNcLNO8TixvzLrCDu9h+RhjQ6HA5hNeklU0t d5+ocn/XQvOVw81HEzwajuFPX+m5ML86lyUgzggvMlY7SXjv0BTnQrxlw75OkQokixMD IU5nu0u5QedqGDDmk6Fz9VfLUljnWiZrTmgHenRp09hvnyKF0BVjiH9XpDXWrTomeeus SQ7JiqUngm6eU44fXtBmlDkVW4Va+Y934bmypubEu+E6YUGVf3dZ5bO0CIF2N9e6gYIm yCsRctKgPi6KNxKo0rwvJqpraO5v4IDJVwl6xgH+F43KAz38leLKd5O7m8B9n0eeSUiR qQDg== MIME-Version: 1.0 X-Received: by 10.60.24.197 with SMTP id w5mr4237667oef.6.1364015769149; Fri, 22 Mar 2013 22:16:09 -0700 (PDT) Sender: kob6558@gmail.com Received: by 10.76.33.7 with HTTP; Fri, 22 Mar 2013 22:16:09 -0700 (PDT) Date: Fri, 22 Mar 2013 22:16:09 -0700 X-Google-Sender-Auth: hQA7GG9DWwcxXZbk16pfMKWxRwQ Message-ID: Subject: Report on issues with fusefs From: Kevin Oberman To: Attilio Rao Content-Type: text/plain; charset=UTF-8 Cc: Florian Smeets , Peter Holm , bdrewery@freebsd.org, FreeBSD FS , freebsd-current@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Mar 2013 05:16:10 -0000 I've now been using fusefs regularly for a few months and I have found a few issues that I wanted to report. Most disturbing are corrupted NTFS file systems. On several occasions I have found that an NTFS file system could not be written to with either FreeBSD or Windows. I had to use Windows disk check to repair the file system, but a few files were lost. This may be an issue with either fusefs or ntfs-3g. Not sure which, but it is likely tied to the next issue. On several occasions an attempt to re-boot my systems when NTFS volumes were mounted failed. After a power cycle the system came back, but the file systems were not clean and had to be fscked. All UFS systems checked clean and had no errors at all. I suspect that fusefs or ntfs-3g was the cause, as I had been manually unmounting the NTFS file systems before issuing the shutdown. The unmount has always succeeded in an odd way (see the third issue below), and the system has always shut down cleanly. The failures only seem to have happened when the NTFS volumes have been written to. The final issue is that I can't unmount a single NTFS volume. I normally have two NTFS volumes mounted, but issuing a umount on either will unmount both. This is rather annoying. I assume it is the result of all fusefs filesystems being /dev/fuse. I have not been able to figure out any way to unmount only one volume. I can't say whether this has any link to the file system corruptions. Could there be an issue with one of the volumes not actually being properly unmounted when both are unmounted by a single umount? 
While these are a bit of an annoyance, I continue to use fusefs with ntfs-3g and it generally is working fine. -- R. Kevin Oberman, Network Engineer E-mail: rkoberman@gmail.com From owner-freebsd-fs@FreeBSD.ORG Sat Mar 23 06:48:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1BA8A495; Sat, 23 Mar 2013 06:48:42 +0000 (UTC) (envelope-from zec@fer.hr) Received: from mail.fer.hr (mail.fer.hr [161.53.72.233]) by mx1.freebsd.org (Postfix) with ESMTP id A2D473E2; Sat, 23 Mar 2013 06:48:41 +0000 (UTC) Received: from tpx32.lan (89.164.207.80) by MAIL.fer.hr (161.53.72.233) with Microsoft SMTP Server (TLS) id 14.2.309.2; Sat, 23 Mar 2013 07:48:33 +0100 From: Marko Zec To: Subject: Re: Report on issues with fusefs Date: Sat, 23 Mar 2013 07:47:54 +0100 User-Agent: KMail/1.9.10 References: In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-ID: <201303230747.55405.zec@fer.hr> X-Originating-IP: [89.164.207.80] Cc: FreeBSD FS , Florian Smeets , Peter Holm , bdrewery@freebsd.org, Attilio Rao , Kevin Oberman X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Mar 2013 06:48:42 -0000 On Saturday 23 March 2013 06:16:09 Kevin Oberman wrote: > I've now been using fusefs regularly for a few months and I have found > a few issues that i wanted to report. > > Most disturbing is corrupted NTFS systems. On several occasions I have > found an NTFS system could not be written to with either FreeBSD or > Windows. I had to user Windows disk check to repair the file system, > but a few files were lost. this may be an issue with either fusefs or > ntfs-3g. Not sure which, but it is likely tied to the next issue. +1. This can be reproduced fairly easily by copying large file hierarchies to a NTFS volume (such as a snapshot of our src tree). Marko > On several occasions an attempt to re-boot my systems when NTFS > volumes were mounted failed. After a power cycle the system came back, > but te file systems were not clean and had to be fscked. All UFS > systems checked clean and had no errors at all. I suspect that fusefs > or ntfs-3g was the cause as I have been manually unmounted the NTFS > systems before issuing the shutdown. The unmount has always succeeded > in an odd way (issue 3), and the system has always shut down cleanly. > The failures only seem to have happened when the NTFS volumes have > been written to. > > The final issue is that I can't unmount a single NTFS volume. I > normally have two NTFS volumes mounted, but issuing a umount on either > will unmount both. This is rather annoying. I assume it is the result > of all fusefs filesystems being /dev/fuse. I have not been able to > figure out any way to unmount only one volume. I can't say whether > this has any link to the file system corruptions. Could there be an > issue with one of the volumes not actually being properly unmounted > when both are unmounted by a single umount? > > While these are a bit of an annoyance, I continue to use fusefs with > ntfs-3g and it generally is working fine. 
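As a minimal sketch of the unmount behaviour Kevin and Marko describe, assuming two NTFS volumes mounted through ntfs-3g from sysutils/fusefs-ntfs (the device names and mount points below are hypothetical, not taken from these messages):

# Mount two NTFS volumes; both will show /dev/fuse as their device.
ntfs-3g /dev/ada1s1 /mnt/ntfs1
ntfs-3g /dev/ada1s2 /mnt/ntfs2
mount | grep fuse
# Detach only the second volume by its mount point, then check whether
# the first one is still mounted.
umount /mnt/ntfs2
mount | grep fuse

If the first volume has disappeared from the second listing as well, that matches the report above of a single umount taking down both fusefs mounts.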
From owner-freebsd-fs@FreeBSD.ORG Sat Mar 23 20:54:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1DEF240D for ; Sat, 23 Mar 2013 20:54:06 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from relay02.pair.com (relay02.pair.com [209.68.5.16]) by mx1.freebsd.org (Postfix) with SMTP id B83A3794 for ; Sat, 23 Mar 2013 20:54:05 +0000 (UTC) Received: (qmail 98323 invoked by uid 0); 23 Mar 2013 20:54:03 -0000 Received: from 173.48.104.62 (HELO ?10.2.2.1?) (173.48.104.62) by relay02.pair.com with SMTP; 23 Mar 2013 20:54:03 -0000 X-pair-Authenticated: 173.48.104.62 Message-ID: <514E166A.5070409@sneakertech.com> Date: Sat, 23 Mar 2013 16:54:02 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: ZFS: Failed pool causes system to hang References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> <514B4D38.6090101@sneakertech.com> In-Reply-To: <514B4D38.6090101@sneakertech.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Mar 2013 20:54:06 -0000 Ok, I have my dmesg (sorry for the delay, I've been knocked out with a head cold the past few days). What's the preferred protocol for sending it? Should I attach it to a list post, or throw it up on some website, or what? 
______________________________________ it has a certain smooth-brained appeal From owner-freebsd-fs@FreeBSD.ORG Sat Mar 23 22:52:39 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A38DF1FF for ; Sat, 23 Mar 2013 22:52:39 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id 88629F3E for ; Sat, 23 Mar 2013 22:52:39 +0000 (UTC) Received: from omta05.emeryville.ca.mail.comcast.net ([76.96.30.43]) by qmta01.emeryville.ca.mail.comcast.net with comcast id F8QZ1l00D0vp7WLA1Asels; Sat, 23 Mar 2013 22:52:38 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta05.emeryville.ca.mail.comcast.net with comcast id FAsd1l00E1t3BNj8RAsdqu; Sat, 23 Mar 2013 22:52:37 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 3C95773A1E; Sat, 23 Mar 2013 15:52:37 -0700 (PDT) Date: Sat, 23 Mar 2013 15:52:37 -0700 From: Jeremy Chadwick To: Quartz Subject: Re: ZFS: Failed pool causes system to hang Message-ID: <20130323225237.GA85482@icarus.home.lan> References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> <514B4D38.6090101@sneakertech.com> <514E166A.5070409@sneakertech.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <514E166A.5070409@sneakertech.com> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1364079158; bh=MY5eqbCXN7estjvdzDkteS7qv81f0k0sJsI/FTb4lTY=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=Yikd0F9+dKqh6h4wflKQ0PG7PidYdZId+n8MyDTxEHkIVbSlV8BAwsOaXSSUF3G2v UJ6LgW5qXA6GWh6q/lnGaA5/c/PCz5W7Vrje53M+WhiDUcNKIxXP4VLh2eKaPaiX8n 7Q1ZJ0Uy/Af6P9VSQE7ZnppO3hhn/9yJ6HSqqDBgIQ74c47S/A30UZItXOPwfg3L2F mu7r4prxp6QFpelIQKSHZeUl59N4fxBs1qoimaEcdg7jjrZqhZVw6XFl8KQIfhoM/G h12r1TlP6RBlLuYf7KHGfLI4Uxh3hAuZXzPshBC/4Cq5cHmkkxFhQ0h2yRPbvxD+c/ qrqZvrWIym+ZQ== Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Mar 2013 22:52:39 -0000 On Sat, Mar 23, 2013 at 04:54:02PM -0400, Quartz wrote: > Ok, have my dmesg (sorry for the delay, I've been knocked out with a > head cold the past few days). > > What's the preferred protocol for sending it? Should I attach it to > list post, or throw it up on some website, or what? Pasting the contents here would be fine, pasting it on pastebin or equivalent + providing a URL is also fine. Attachments on the list are often deleted/stripped by the mailing list software, so don't do that. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
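As a trivial sketch of capturing the output Jeremy asks to have pasted, the whole kernel message buffer can be written to a file and then pasted inline or uploaded (the output path is arbitrary):

dmesg -a > /var/tmp/dmesg.txt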