From owner-freebsd-fs@freebsd.org Sun Jun 19 01:14:34 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A98D5A789CD for ; Sun, 19 Jun 2016 01:14:34 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: from mail-pf0-x22e.google.com (mail-pf0-x22e.google.com [IPv6:2607:f8b0:400e:c00::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7A7022F0F; Sun, 19 Jun 2016 01:14:34 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: by mail-pf0-x22e.google.com with SMTP id i123so38505661pfg.0; Sat, 18 Jun 2016 18:14:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:mime-version:in-reply-to:content-transfer-encoding :message-id:cc:from:subject:date:to; bh=A44CKbDxu5iaWZMS7kXVDJbHoOuBuxE2cSJDEam7IVI=; b=uzEr+xm4RUIXkRY2glweH1XayTggJcUuuvhPRBOohgggcxRFGfPgzOicukgdbYDGqI 4n08qXPCZTVjBcEKM5GDxC72tDnjjFcum0OBNXbixTybwbeFZGPFnJaSUNiB20jyIt72 DJcMn1bdVLFSN45A1Sm+LLKZffMpdzqSl2DFTaDqvIiy8m/oVeYDm2gm4AP8UFckZSn1 TaqtgM2mw10aEVh92haoSi8t12YUF0TU3w8MYvDZJV6t2tzYDiceJaHjh2u+vYbqx8+Q g6v841KY0quWXxlEuMr2fGUxbKyVeM9Gb6JiYVfe55zEY11QblGcgNovYErjGkEu+YBx iNbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:mime-version:in-reply-to :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=A44CKbDxu5iaWZMS7kXVDJbHoOuBuxE2cSJDEam7IVI=; b=iXGWZZb8xl1LkGvqAW+y7EUo0jJsd4KXQfx3BZCiUqAi/YeN8Rk+DdQ2qurYwBFt5i XMlxiV9abfJ9ncDXK6YNZEh+a0Z5GSnXzUrLmvivMMdKvObGWmhfRp+xofj3fapsoCct Ld1nzHVLPVZ4WxR0zEFwrfejO2TvZa02T//JY3fwM5ygxd0hQcdIbxeI2KewVu3C4TYl bTLYM+X7cLTU3TwTa0rKTuwa05jR+ZHtuifUKsawMMKRFt8dB+cxfAuGpNS6nUviEa9U G4vpCoP6YHEzUkSomNATUfFQUhEjdupGskJeRuqhWRbhaPa6ePjG8khndFG5qFu/TXIX cnIQ== X-Gm-Message-State: ALyK8tJ5fSZzQPTr4W4pZA/qWdxsjfmBvAsyYjOOrTeDsjU+B4tksWW1u/T0sC+/wpk8HA== X-Received: by 10.98.35.27 with SMTP id j27mr12485659pfj.10.1466298873778; Sat, 18 Jun 2016 18:14:33 -0700 (PDT) Received: from [21.247.192.62] ([172.56.30.157]) by smtp.gmail.com with ESMTPSA id wt6sm78137666pab.3.2016.06.18.18.14.31 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 18 Jun 2016 18:14:32 -0700 (PDT) References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> Mime-Version: 1.0 (1.0) In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Message-Id: <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> Cc: Rick Macklem , freebsd-fs , Alexander Motin X-Mailer: iPhone Mail (13F69) From: Chris Watson Subject: Re: pNFS server Plan B Date: Sat, 18 Jun 2016 20:14:29 -0500 To: Jordan Hubbard X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 01:14:34 -0000 Since Jordan brought up clustering, I would be interested to hear Justin Gib= bs thoughts here. I know about a year ago he was asked on an "after hours" v= ideo chat hosted by Matt Aherns about a feature he would really like to see a= nd he mentioned he would really like, in a universe filled with time and mon= ey I'm sure, to work on a native clustering solution for FreeBSD. I don't kn= ow if he is subscribed to the list, and I'm certainly not throwing him under= the bus by bringing his name up, but I know he has at least been thinking a= bout this for some time and probably has some value to add here.=20 Chris Sent from my iPhone 5 > On Jun 18, 2016, at 3:50 PM, Jordan Hubbard wrote: >=20 >=20 >> On Jun 13, 2016, at 3:28 PM, Rick Macklem wrote: >>=20 >> You may have already heard of Plan A, which sort of worked >> and you could test by following the instructions here: >>=20 >> http://people.freebsd.org/~rmacklem/pnfs-setup.txt >>=20 >> However, it is very slow for metadata operations (everything other than >> read/write) and I don't think it is very useful. >=20 > Hi guys, >=20 > I finally got a chance to catch up and bring up Rick=E2=80=99s pNFS setup o= n a couple of test machines. He=E2=80=99s right, obviously - The =E2=80=9Cp= lan A=E2=80=9D approach is a bit convoluted and not at all surprisingly slow= . With all of those transits twixt kernel and userland, not to mention glus= terfs itself which has not really been tuned for our platform (there are a n= umber of papers on this we probably haven=E2=80=99t even all read yet), we=E2= =80=99re obviously still in the =E2=80=9Cfirst make it work=E2=80=9D stage. >=20 > That said, I think there are probably more possible plans than just A and B= here, and we should give the broader topic of =E2=80=9Cwhat does FreeBSD wa= nt to do in the Enterprise / Cloud computing space?" at least some considera= tion at the same time, since there are more than a few goals running in para= llel here. >=20 > First, let=E2=80=99s talk about our story around clustered filesystems + a= ssociated command-and-control APIs in FreeBSD. There is something of an emb= arrassment of riches in the industry at the moment - glusterfs, ceph, Hadoop= HDFS, RiakCS, moose, etc. All or most of them offer different pros and con= s, and all offer more than just the ability to store files and scale =E2=80=9C= elastically=E2=80=9D. They also have ReST APIs for configuring and monitori= ng the health of the cluster, some offer object as well as file storage, and= Riak offers a distributed KVS for storing information *about* file objects i= n addition to the object themselves (and when your application involves stor= ing and managing several million photos, for example, the idea of distributi= ng the index as well as the files in a fault-tolerant fashion is also compel= ling). Some, if not most, of them are also far better supported under Linux= than FreeBSD (I don=E2=80=99t think we even have a working ceph port yet). = I=E2=80=99m not saying we need to blindly follow the herds and do all the s= ame things others are doing here, either, I=E2=80=99m just saying that it=E2= =80=99s a much bigger problem space than simply =E2=80=9Cparallelizing NFS=E2= =80=9D and if we can kill multiple birds with one stone on the way to doing t= hat, we should certainly consider doing so. >=20 > Why? Because pNFS was first introduced as a draft RFC (RFC5661 ) in 2005. The linux folks have been worki= ng on it since 2006. Ten years is a long time in this business, and when I raise= d the topic of pNFS at the recent SNIA DSI conference (where storage develop= ers gather to talk about trends and things), the most prevalent reaction I g= ot was =E2=80=9Cpeople are still using pNFS?!=E2=80=9D This is clearly one= of those technologies that may still have some runway left, but it=E2=80=99= s been rapidly overtaken by other approaches to solving more or less the sam= e problems in coherent, distributed filesystem access and if we want to get m= indshare for this, we should at least have an answer ready for the =E2=80=9C= why did you guys do pNFS that way rather than just shimming it on top of ${s= omeNewerHotness}??=E2=80=9D argument. I=E2=80=99m not suggesting pNFS is d= ead - hell, even AFS still appears to be somewhat= alive, but there=E2=80=99s a difference between appealing to an increasingl= y narrow niche and trying to solve the sorts of problems most DevOps folks w= orking At Scale these days are running into. >=20 > That is also why I am not sure I would totally embrace the idea of a centr= al MDS being a Real Option. Sure, the risks can be mitigated (as you say, b= y mirroring it), but even saying the words =E2=80=9Ccentral MDS=E2=80=9D (or= central anything) may be such a turn-off to those very same DevOps folks, f= olks who have been burned so many times by SPOFs and scaling bottlenecks in l= arge environments, that we'll lose the audience the minute they hear the tri= gger phrase. Even if it means signing up for Other Problems later, it=E2=80= =99s a lot easier to =E2=80=9Csell=E2=80=9D the concept of completely distri= buted mechanisms where, if there is any notion of centralization at all, it=E2= =80=99s at least the result of a quorum election and the DevOps folks don=E2= =80=99t have to do anything manually to cause it to happen - the cluster is =E2= =80=9Cresilient" and "self-healing" and they are happy with being able to sa= y those buzzwords to the CIO, who nods knowingly and tells them they=E2=80=99= re doing a fine job! >=20 > Let=E2=80=99s get back, however, to the notion of downing multiple avians w= ith the same semi-spherical kinetic projectile: What seems to be The Rage a= t the moment, and I don=E2=80=99t know how well it actually scales since I=E2= =80=99ve yet to be at the pointy end of such a real-world deployment, is the= idea of clustering the storage (=E2=80=9Csomehow=E2=80=9D) underneath and t= hen providing NFS and SMB protocol access entirely in userland, usually with= both of those services cooperating with the same lock manager and even the s= ame ACL translation layer. Our buddies at Red Hat do this with glusterfs at= the bottom and NFS Ganesha + Samba on top - I talked to one of the Samba co= re team guys at SNIA and he indicated that this was increasingly common, wit= h the team having helped here and there when approached by different vendors= with the same idea. We (iXsystems) also get a lot of requests to be able t= o make the same file(s) available via both NFS and SMB at the same time and t= hey don=E2=80=99t much at all like being told =E2=80=9Cbut that=E2=80=99s da= ngerous - don=E2=80=99t do that! Your file contents and permissions models a= re not guaranteed to survive such an experience!=E2=80=9D They really want t= o do it, because the rest of the world lives in Heterogenous environments an= d that=E2=80=99s just the way it is. >=20 > Even the object storage folks, like Openstack=E2=80=99s Swift project, are= spending significant amounts of mental energy on the topic of how to re-exp= ort their object stores as shared filesystems over NFS and SMB, the single c= onsistent and distributed object store being, of course, Their Thing. They w= ish, of course, that the rest of the world would just fall into line and use= their object system for everything, but they also get that the "legacy stuf= f=E2=80=9D just won=E2=80=99t go away and needs some sort of attention if th= ey=E2=80=99re to remain players at the standards table. >=20 > So anyway, that=E2=80=99s the view I have from the perspective of someone w= ho actually sells storage solutions for a living, and while I could certainl= y =E2=80=9Csell some pNFS=E2=80=9D to various customers who just want to add= a dash of steroids to their current NFS infrastructure, or need to use NFS b= ut also need to store far more data into a single namespace than any one box= will accommodate, I also know that offering even more elastic solutions wil= l be a necessary part of offering solutions to the growing contingent of fol= ks who are not tied to any existing storage infrastructure and have various n= on-greybearded folks shouting in their ears about object this and cloud that= . Might there not be some compromise solution which allows us to put more o= f this in userland with less context switches in and out of the kernel, also= giving us the option of presenting a more united front to multiple protocol= s that require more ACL and lock impedance-matching than we=E2=80=99d ever w= ant to put in the kernel anyway? >=20 > - Jordan >=20 >=20 >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Sun Jun 19 01:50:57 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68A02A7A044 for ; Sun, 19 Jun 2016 01:50:57 +0000 (UTC) (envelope-from jkh@ixsystems.com) Received: from barracuda.ixsystems.com (barracuda.ixsystems.com [12.229.62.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certificate Authority - G2" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4B7811B55 for ; Sun, 19 Jun 2016 01:50:56 +0000 (UTC) (envelope-from jkh@ixsystems.com) X-ASG-Debug-ID: 1466301054-08ca041142196c80001-3nHGF7 Received: from zimbra.ixsystems.com ([10.246.0.20]) by barracuda.ixsystems.com with ESMTP id k71RYlA3B161ecdL (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 18 Jun 2016 18:50:54 -0700 (PDT) X-Barracuda-Envelope-From: jkh@ixsystems.com X-Barracuda-RBL-Trusted-Forwarder: 10.246.0.20 X-ASG-Whitelist: Client Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 4B213DD0D9E; Sat, 18 Jun 2016 18:50:54 -0700 (PDT) Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id HvfPflKr0gQY; Sat, 18 Jun 2016 18:50:53 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 477A8DD0D9D; Sat, 18 Jun 2016 18:50:53 -0700 (PDT) X-Virus-Scanned: amavisd-new at ixsystems.com Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 7P2zdQr-883f; Sat, 18 Jun 2016 18:50:53 -0700 (PDT) Received: from [172.20.0.10] (vpn.ixsystems.com [10.249.0.2]) by zimbra.ixsystems.com (Postfix) with ESMTPSA id AB66BDD0D6C; Sat, 18 Jun 2016 18:50:52 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: pNFS server Plan B From: Jordan Hubbard X-ASG-Orig-Subj: Re: pNFS server Plan B In-Reply-To: <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> Date: Sat, 18 Jun 2016 18:50:52 -0700 Cc: Rick Macklem , freebsd-fs , Alexander Motin Content-Transfer-Encoding: quoted-printable Message-Id: References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> To: Chris Watson X-Mailer: Apple Mail (2.3124) X-Barracuda-Connect: UNKNOWN[10.246.0.20] X-Barracuda-Start-Time: 1466301054 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://10.246.0.26:443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at ixsystems.com X-Barracuda-BRTS-Status: 1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 01:50:57 -0000 > On Jun 18, 2016, at 6:14 PM, Chris Watson wrote: >=20 > Since Jordan brought up clustering, I would be interested to hear = Justin Gibbs thoughts here. I know about a year ago he was asked on an = "after hours" video chat hosted by Matt Aherns about a feature he would = really like to see and he mentioned he would really like, in a universe = filled with time and money I'm sure, to work on a native clustering = solution for FreeBSD. I don't know if he is subscribed to the list, and = I'm certainly not throwing him under the bus by bringing his name up, = but I know he has at least been thinking about this for some time and = probably has some value to add here.=20 I think we should also be careful to define our terms in such a = discussion. Specifically: 1. Are we talking about block-level clustering underneath ZFS (e.g. HAST = or ${somethingElse}) or otherwise incorporated into ZFS itself at some = low level? If you Google for =E2=80=9CHigh-availability ZFS=E2=80=9D = you will encounter things like RSF-1 or the somewhat more mysterious = Zetavault (http://www.zeta.systems/zetavault/high-availability/) but = it=E2=80=99s not entirely clear how these technologies work, they simply = claim to =E2=80=9Cscale-out ZFS=E2=80=9D or =E2=80=9Ccluster ZFS=E2=80=9D = (which can be done within ZFS or one level above and still probably pass = the Marketing Test for what people are willing to put on a web page). 2. Are we talking about clustering at a slightly higher level, in a = filesystem-agnostic fashion which still preserves filesystem semantics? 3. Are we talking about clustering for data objects, in a fashion which = does not necessarily provide filesystem semantics (a sharding database = which can store arbitrary BLOBs would qualify)? For all of the above: Are we seeking to be compatible with any other = mechanisms, or are we talking about a FreeBSD-only solution? This is why I brought up glusterfs / ceph / RiakCS in my previous = comments - when talking to the $users that Rick wants to involve in the = discussion, they rarely come to the table asking for =E2=80=9Csome or = any sort of clustering, don=E2=80=99t care which or how it works=E2=80=9D = - they ask if I can offer an S3 compatible object store with horizontal = scaling, or if they can use NFS in some clustered fashion where = there=E2=80=99s a single namespace offering petabytes of storage with = configurable redundancy such that no portion of that namespace is ever = unavailable. I=E2=80=99d be interested in what Justin had in mind when he asked Matt = about this. Being able to =E2=80=9Cattach ZFS pools to one another=E2=80=9D= in such a fashion that all clients just see One Big Pool and ZFS=E2=80=99= s own redundancy / snapshotting characteristics magically apply to the = =C3=BCberpool would be Pretty Cool, obviously, and would allow one to do = round-robin DNS for NFS such that any node could serve the same = contents, but that also sounds pretty ambitious, depending on how it=E2=80= =99s implemented. - Jordan From owner-freebsd-fs@freebsd.org Sun Jun 19 11:38:59 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0EE7DA7A24E for ; Sun, 19 Jun 2016 11:38:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F3010297E for ; Sun, 19 Jun 2016 11:38:58 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5JBcwaC070227 for ; Sun, 19 Jun 2016 11:38:58 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 210347] External USB Raidz2 zpool cause zfsloader auto reboot Date: Sun, 19 Jun 2016 11:38:59 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 11:38:59 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D210347 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Sun Jun 19 16:31:30 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 727C7A7A70B for ; Sun, 19 Jun 2016 16:31:30 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 45F2B2A2A; Sun, 19 Jun 2016 16:31:29 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u5JGVE9R084240 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Sun, 19 Jun 2016 09:31:19 -0700 (PDT) (envelope-from julian@freebsd.org) Subject: Re: pNFS server Plan B To: Jordan Hubbard , Chris Watson References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> Cc: freebsd-fs , Alexander Motin From: Julian Elischer Message-ID: <61bf7e38-4164-c784-6301-50e73564745c@freebsd.org> Date: Mon, 20 Jun 2016 00:31:08 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 16:31:30 -0000 On 19/06/2016 9:50 AM, Jordan Hubbard wrote: > 1. Are we talking about block-level clustering underneath ZFS (e.g. HAST or ${somethingElse}) or otherwise incorporated into ZFS itself at some low level? If you Google for “High-availability ZFS” you will encounter things like RSF-1 or the somewhat more mysterious Zetavault (http://www.zeta.systems/zetavault/high-availability/) but it’s not entirely clear how these technologies work, they simply claim to “scale-out ZFS” or “cluster ZFS” (which can be done within ZFS or one level above and still probably pass the Marketing Test for what people are willing to put on a web page). umm look at Panzura who have been selling this on FreeBSD for 4 years and need FreeBSD devs in the bay area (or closer than me)) From owner-freebsd-fs@freebsd.org Sun Jun 19 17:54:14 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D0E12A7A113 for ; Sun, 19 Jun 2016 17:54:14 +0000 (UTC) (envelope-from jkh@ixsystems.com) Received: from barracuda.ixsystems.com (barracuda.ixsystems.com [12.229.62.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certificate Authority - G2" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B37432408 for ; Sun, 19 Jun 2016 17:54:14 +0000 (UTC) (envelope-from jkh@ixsystems.com) X-ASG-Debug-ID: 1466358853-08ca04114219bf90001-3nHGF7 Received: from zimbra.ixsystems.com ([10.246.0.20]) by barracuda.ixsystems.com with ESMTP id yOgj3asggQMQrDI5 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sun, 19 Jun 2016 10:54:13 -0700 (PDT) X-Barracuda-Envelope-From: jkh@ixsystems.com X-Barracuda-RBL-Trusted-Forwarder: 10.246.0.20 X-ASG-Whitelist: Client Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 3A39EDD1270; Sun, 19 Jun 2016 10:54:13 -0700 (PDT) Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id XFRzIUlePSgb; Sun, 19 Jun 2016 10:54:13 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id C9931DD1274; Sun, 19 Jun 2016 10:54:12 -0700 (PDT) X-Virus-Scanned: amavisd-new at ixsystems.com Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 8qmz5ZUk6NvR; Sun, 19 Jun 2016 10:54:12 -0700 (PDT) Received: from [172.20.0.6] (vpn.ixsystems.com [10.249.0.2]) by zimbra.ixsystems.com (Postfix) with ESMTPSA id 65313DD1270; Sun, 19 Jun 2016 10:54:12 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: pNFS server Plan B From: Jordan Hubbard X-ASG-Orig-Subj: Re: pNFS server Plan B In-Reply-To: <61bf7e38-4164-c784-6301-50e73564745c@freebsd.org> Date: Sun, 19 Jun 2016 10:54:13 -0700 Cc: Chris Watson , freebsd-fs , Alexander Motin Content-Transfer-Encoding: quoted-printable Message-Id: References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> <61bf7e38-4164-c784-6301-50e73564745c@freebsd.org> To: Julian Elischer X-Mailer: Apple Mail (2.3124) X-Barracuda-Connect: UNKNOWN[10.246.0.20] X-Barracuda-Start-Time: 1466358853 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://10.246.0.26:443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at ixsystems.com X-Barracuda-BRTS-Status: 1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 17:54:14 -0000 > On Jun 19, 2016, at 9:31 AM, Julian Elischer = wrote: >=20 > umm look at Panzura who have been selling this on FreeBSD for 4 years = and need FreeBSD devs in the bay area (or closer than me)) Well, unlike Panzura, I think we=E2=80=99re also looking for an open = source solution that can be upstreamed to FreeBSD and/or (probably = better) the OpenZFS project. Any takers on that? My hand is up. :) - Jordan From owner-freebsd-fs@freebsd.org Sun Jun 19 19:38:10 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 44F04A7ADFB for ; Sun, 19 Jun 2016 19:38:10 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: from mail-wm0-x229.google.com (mail-wm0-x229.google.com [IPv6:2a00:1450:400c:c09::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B81B52726 for ; Sun, 19 Jun 2016 19:38:09 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: by mail-wm0-x229.google.com with SMTP id a66so52085911wme.0 for ; Sun, 19 Jun 2016 12:38:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=to:from:subject:message-id:date:user-agent:mime-version :content-transfer-encoding; bh=HU0kcL6RSAjOe42a8yzDDEC3w20eR8/JO8Y1J1WG8oI=; b=X1KmXEU8SplTvvv1tcmBr9aEud4vmYzYXaq6iPYrZGrQ0VpgoTSxeiRdBW5kx+wDzJ /QYx6Ap3cfGScYst5NMdLP497QSrXCFOQU6LQDKDXT3UDhqeKezYxtjGqzYEZseUw0BY xvm/fyFu0x8iDwXlE3B7jITpwnJ7+6OqBp+CV9IgWxnC8MPnG91BHEAokuECg54c4G0A 1wPUdZKCEfGp7lwXnJRfJmYvJCem1Fq3IYFDYFSYk3uboSeVYVYqskc2ehVzTl5L22/E Z17nlgnUuc2um5YMH/2ZN8PDHoM4vHmup8DgO5Z3/IwuZnyY+rSKERm5r6oFpOalbEzn 36pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:to:from:subject:message-id:date:user-agent :mime-version:content-transfer-encoding; bh=HU0kcL6RSAjOe42a8yzDDEC3w20eR8/JO8Y1J1WG8oI=; b=PboFbD6V41HYusd+4pyk4q6gB2zTyrAqbY8OUKlMWJMD1ElGZcac+hkhQXAkyu44jE y8F5DqEphQkTYuFe0dsFUZdAOmQFcTEgs1kUMWURh4Mj889/6QhyNKj+7S0X6wiYWkgT OXIjzpIeN/nFYTW0q9Km4IPvvE4EcZ/XyAaXMtTOn5rGE7ZFyq8Q/+PaXEOWg5TNs8v/ sZqSdCf/aoj+M9kxyhDQw7JUKWbvyXTFRa+/FDYpsFE/ZuhU7V+/i4qT3mN3rs1q+BSw FVpc3+oaf6Pn0dknfLcLGkRdLitXriBAAC1YT3+YQ0FbFSQoCuzDR0+SXzljw/Hs/Yw9 cUlA== X-Gm-Message-State: ALyK8tIlFrd1oIi8KwgS6bnZyU2GUqNfnWpGZhynxOOnPG4kWZJhUmMqJMBCRsdh6qlCdQ== X-Received: by 10.28.141.4 with SMTP id p4mr8325567wmd.46.1466365087968; Sun, 19 Jun 2016 12:38:07 -0700 (PDT) Received: from [192.168.20.30] (optiplexnetworks.plus.com. [212.159.80.17]) by smtp.googlemail.com with ESMTPSA id z5sm4727158wme.5.2016.06.19.12.38.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 19 Jun 2016 12:38:07 -0700 (PDT) To: FreeBSD Filesystems From: Kaya Saman Subject: High CPU Interrupt using ZFS Message-ID: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> Date: Sun, 19 Jun 2016 20:38:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 19:38:10 -0000 Hi, I have a strange problem and I'm not sure if anyone has ever experienced this to help give me some advice on how to tackle it. Basically I run ZFS as root FS mirrored over two drives which are directly connected to the SATA connectors on a SuperMicro Xeon E5 server based MB. Then I have an LSI HBA connected to the remaining disks with various ZPOOLs. The main pool has ZIL and L2ARC enabled. As the majority of data is A/V content I disabled Prefetch as instructed in the FreeBSD tuning tips guide. https://www.freebsd.org/doc/handbook/zfs.html For some reason after a period of time the CPU interrupt will just got sky high and the system will totally bog down. My home drive is running off the "Main Pool" too and when this happens it becomes inaccessible. The system runs FBSD 10.3: 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 ZPOOL List output: # zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT ZPOOL_2 27.2T 26.3T 884G - 41% 96% 1.00x ONLINE - ZPOOL_3 298G 248G 50.2G - 34% 83% 1.00x ONLINE - ZPOOL_4 1.81T 1.75T 66.4G - 25% 96% 1.00x ONLINE - ZPOOL_5 186G 171G 14.9G - 62% 92% 1.00x ONLINE - workspaces 119G 77.7G 41.3G - 56% 65% 1.00x ONLINE - zroot 111G 88.9G 22.1G - 70% 80% 1.00x ONLINE - The system has a Xeon E5 with 24GB RAM and 16GB of Swap space. I also run 5x jails on this box. 1x for network based monitoring (munin, zabbix etc) 1x DB jail which runs Postgresql and Mysql + some others; they are all run off the ZRoot Boot Loader info: zfs_load="YES" kern.ipc.semmni=6000000 kern.ipc.semmns=6000000 kern.ipc.semmnu=256 net.isr.numthreads=4 net.isr.maxthreads=4 net.isr.bindthreads=1 vfs.zfs.l2arc_noprefetch=1 Other information: # camcontrol devlist at scbus0 target 8 lun 0 (pass0,da0) at scbus0 target 10 lun 0 (pass1,da1) at scbus0 target 11 lun 0 (pass2,da2) at scbus0 target 12 lun 0 (pass3,da3) at scbus0 target 13 lun 0 (pass4,ses0) at scbus0 target 14 lun 0 (pass5,da4) at scbus0 target 15 lun 0 (pass6,da5) at scbus0 target 17 lun 0 (pass7,da6) at scbus0 target 18 lun 0 (pass8,da7) at scbus0 target 19 lun 0 (pass9,da8) at scbus0 target 20 lun 0 (pass10,da9) at scbus0 target 21 lun 0 (pass11,da10) at scbus0 target 22 lun 0 (pass12,da11) at scbus0 target 29 lun 0 (pass13,da12) at scbus0 target 30 lun 0 (pass14,da13) at scbus0 target 31 lun 0 (pass15,da14) at scbus0 target 34 lun 0 (pass16,da15) at scbus0 target 35 lun 0 (pass17,da16) at scbus0 target 36 lun 0 (pass18,da17) at scbus0 target 37 lun 0 (pass19,da18) at scbus0 target 38 lun 0 (pass20,da19) at scbus0 target 40 lun 0 (pass21,da20) at scbus0 target 41 lun 0 (pass22,da21) at scbus2 target 0 lun 0 (pass23,ada0) at scbus3 target 0 lun 0 (pass24,ada1) at scbus8 target 0 lun 0 (pass25,ses1) Sysctl output for ZFS: # sysctl -a |grep zfs 2 PART diskid/DISK-1350790500009986007Fp2 229319956992 512 i 2 o 10737435648 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b 2 PART diskid/DISK-1350790500009986007Fp1 10737418240 512 i 1 o 17408 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b 2 PART diskid/DISK-13507905000099860071p2 229319956992 512 i 2 o 10737435648 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b 2 PART diskid/DISK-13507905000099860071p1 10735321088 512 i 1 o 2097152 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b 2 PART diskid/DISK-14067903000097960BD7p3 119445590016 512 i 3 o 8590065664 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b 1 PART ada0p3 119445590016 512 i 3 o 8590065664 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b z0xfffff80012422d00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"]; zfs::vdev freebsd-zfs freebsd-zfs freebsd-zfs freebsd-zfs freebsd-zfs freebsd-zfs vfs.zfs.trim.max_interval: 1 vfs.zfs.trim.timeout: 30 vfs.zfs.trim.txg_delay: 32 vfs.zfs.trim.enabled: 1 vfs.zfs.vol.unmap_enabled: 1 vfs.zfs.vol.mode: 1 vfs.zfs.version.zpl: 5 vfs.zfs.version.spa: 5000 vfs.zfs.version.acl: 1 vfs.zfs.version.ioctl: 5 vfs.zfs.debug: 0 vfs.zfs.super_owner: 0 vfs.zfs.sync_pass_rewrite: 2 vfs.zfs.sync_pass_dont_compress: 5 vfs.zfs.sync_pass_deferred_free: 2 vfs.zfs.zio.exclude_metadata: 0 vfs.zfs.zio.use_uma: 1 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_replay_disable: 0 vfs.zfs.min_auto_ashift: 9 vfs.zfs.max_auto_ashift: 13 vfs.zfs.vdev.trim_max_pending: 10000 vfs.zfs.vdev.bio_delete_disable: 0 vfs.zfs.vdev.bio_flush_disable: 0 vfs.zfs.vdev.write_gap_limit: 4096 vfs.zfs.vdev.read_gap_limit: 32768 vfs.zfs.vdev.aggregation_limit: 131072 vfs.zfs.vdev.trim_max_active: 64 vfs.zfs.vdev.trim_min_active: 1 vfs.zfs.vdev.scrub_max_active: 2 vfs.zfs.vdev.scrub_min_active: 1 vfs.zfs.vdev.async_write_max_active: 10 vfs.zfs.vdev.async_write_min_active: 1 vfs.zfs.vdev.async_read_max_active: 3 vfs.zfs.vdev.async_read_min_active: 1 vfs.zfs.vdev.sync_write_max_active: 10 vfs.zfs.vdev.sync_write_min_active: 10 vfs.zfs.vdev.sync_read_max_active: 10 vfs.zfs.vdev.sync_read_min_active: 10 vfs.zfs.vdev.max_active: 1000 vfs.zfs.vdev.async_write_active_max_dirty_percent: 60 vfs.zfs.vdev.async_write_active_min_dirty_percent: 30 vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1 vfs.zfs.vdev.mirror.non_rotating_inc: 0 vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576 vfs.zfs.vdev.mirror.rotating_seek_inc: 5 vfs.zfs.vdev.mirror.rotating_inc: 0 vfs.zfs.vdev.trim_on_init: 1 vfs.zfs.vdev.cache.bshift: 16 vfs.zfs.vdev.cache.size: 0 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.vdev.metaslabs_per_vdev: 200 vfs.zfs.txg.timeout: 5 vfs.zfs.space_map_blksz: 4096 vfs.zfs.spa_slop_shift: 5 vfs.zfs.spa_asize_inflation: 24 vfs.zfs.deadman_enabled: 1 vfs.zfs.deadman_checktime_ms: 5000 vfs.zfs.deadman_synctime_ms: 1000000 vfs.zfs.recover: 0 vfs.zfs.spa_load_verify_data: 1 vfs.zfs.spa_load_verify_metadata: 1 vfs.zfs.spa_load_verify_maxinflight: 10000 vfs.zfs.check_hostid: 1 vfs.zfs.mg_fragmentation_threshold: 85 vfs.zfs.mg_noalloc_threshold: 0 vfs.zfs.condense_pct: 200 vfs.zfs.metaslab.bias_enabled: 1 vfs.zfs.metaslab.lba_weighting_enabled: 1 vfs.zfs.metaslab.fragmentation_factor_enabled: 1 vfs.zfs.metaslab.preload_enabled: 1 vfs.zfs.metaslab.preload_limit: 3 vfs.zfs.metaslab.unload_delay: 8 vfs.zfs.metaslab.load_pct: 50 vfs.zfs.metaslab.min_alloc_size: 33554432 vfs.zfs.metaslab.df_free_pct: 4 vfs.zfs.metaslab.df_alloc_threshold: 131072 vfs.zfs.metaslab.debug_unload: 0 vfs.zfs.metaslab.debug_load: 0 vfs.zfs.metaslab.fragmentation_threshold: 70 vfs.zfs.metaslab.gang_bang: 16777217 vfs.zfs.free_bpobj_enabled: 1 vfs.zfs.free_max_blocks: 18446744073709551615 vfs.zfs.no_scrub_prefetch: 0 vfs.zfs.no_scrub_io: 0 vfs.zfs.resilver_min_time_ms: 3000 vfs.zfs.free_min_time_ms: 1000 vfs.zfs.scan_min_time_ms: 1000 vfs.zfs.scan_idle: 50 vfs.zfs.scrub_delay: 4 vfs.zfs.resilver_delay: 2 vfs.zfs.top_maxinflight: 32 vfs.zfs.zfetch.array_rd_sz: 1048576 vfs.zfs.zfetch.max_distance: 8388608 vfs.zfs.zfetch.min_sec_reap: 2 vfs.zfs.zfetch.max_streams: 8 vfs.zfs.prefetch_disable: 0 vfs.zfs.delay_scale: 500000 vfs.zfs.delay_min_dirty_percent: 60 vfs.zfs.dirty_data_sync: 67108864 vfs.zfs.dirty_data_max_percent: 10 vfs.zfs.dirty_data_max_max: 4294967296 vfs.zfs.dirty_data_max: 2570453401 vfs.zfs.max_recordsize: 1048576 vfs.zfs.mdcomp_disable: 0 vfs.zfs.nopwrite_enabled: 1 vfs.zfs.dedup.prefetch: 1 vfs.zfs.l2c_only_size: 0 vfs.zfs.mfu_ghost_data_lsize: 3288968704 vfs.zfs.mfu_ghost_metadata_lsize: 5136092672 vfs.zfs.mfu_ghost_size: 8425061376 vfs.zfs.mfu_data_lsize: 8574981632 vfs.zfs.mfu_metadata_lsize: 68123648 vfs.zfs.mfu_size: 8745474560 vfs.zfs.mru_ghost_data_lsize: 5324684800 vfs.zfs.mru_ghost_metadata_lsize: 923847680 vfs.zfs.mru_ghost_size: 6248532480 vfs.zfs.mru_data_lsize: 1456756224 vfs.zfs.mru_metadata_lsize: 1278004224 vfs.zfs.mru_size: 2862586368 vfs.zfs.anon_data_lsize: 0 vfs.zfs.anon_metadata_lsize: 0 vfs.zfs.anon_size: 2841088 vfs.zfs.l2arc_norw: 1 vfs.zfs.l2arc_feed_again: 1 vfs.zfs.l2arc_noprefetch: 1 vfs.zfs.l2arc_feed_min_ms: 200 vfs.zfs.l2arc_feed_secs: 1 vfs.zfs.l2arc_headroom: 2 vfs.zfs.l2arc_write_boost: 134217728 vfs.zfs.l2arc_write_max: 67108864 vfs.zfs.arc_meta_limit: 5979973632 vfs.zfs.arc_free_target: 42350 vfs.zfs.arc_shrink_shift: 7 vfs.zfs.arc_average_blocksize: 8192 vfs.zfs.arc_min: 2989986816 vfs.zfs.arc_max: 23919894528 debug.zfs_flags: 0 kstat.zfs.misc.vdev_cache_stats.misses: 0 kstat.zfs.misc.vdev_cache_stats.hits: 0 kstat.zfs.misc.vdev_cache_stats.delegations: 0 kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 3034069 kstat.zfs.misc.arcstats.sync_wait_for_async: 1779944 kstat.zfs.misc.arcstats.arc_meta_min: 1494993408 kstat.zfs.misc.arcstats.arc_meta_max: 12233249160 kstat.zfs.misc.arcstats.arc_meta_limit: 5979973632 kstat.zfs.misc.arcstats.arc_meta_used: 4638138472 kstat.zfs.misc.arcstats.duplicate_reads: 1709068 kstat.zfs.misc.arcstats.duplicate_buffers_size: 0 kstat.zfs.misc.arcstats.duplicate_buffers: 0 kstat.zfs.misc.arcstats.memory_throttle_count: 0 kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 2200 kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 4772510 kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 561679501832704 kstat.zfs.misc.arcstats.l2_write_pios: 377935 kstat.zfs.misc.arcstats.l2_write_buffer_iter: 1193136 kstat.zfs.misc.arcstats.l2_write_full: 148 kstat.zfs.misc.arcstats.l2_write_not_cacheable: 264598116 kstat.zfs.misc.arcstats.l2_write_io_in_progress: 83 kstat.zfs.misc.arcstats.l2_write_in_l2: 5284665382 kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 14626890890 kstat.zfs.misc.arcstats.l2_write_passed_headroom: 3318575 kstat.zfs.misc.arcstats.l2_write_trylock_fail: 6328431 kstat.zfs.misc.arcstats.l2_compress_failures: 655251 kstat.zfs.misc.arcstats.l2_compress_zeros: 0 kstat.zfs.misc.arcstats.l2_compress_successes: 1205377 kstat.zfs.misc.arcstats.l2_hdr_size: 63556704 kstat.zfs.misc.arcstats.l2_asize: 84595239936 kstat.zfs.misc.arcstats.l2_size: 93178570752 kstat.zfs.misc.arcstats.l2_io_error: 0 kstat.zfs.misc.arcstats.l2_cksum_bad: 0 kstat.zfs.misc.arcstats.l2_abort_lowmem: 42 kstat.zfs.misc.arcstats.l2_cdata_free_on_write: 41 kstat.zfs.misc.arcstats.l2_free_on_write: 722 kstat.zfs.misc.arcstats.l2_evict_l1cached: 0 kstat.zfs.misc.arcstats.l2_evict_reading: 0 kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0 kstat.zfs.misc.arcstats.l2_writes_lock_retry: 63 kstat.zfs.misc.arcstats.l2_writes_error: 0 kstat.zfs.misc.arcstats.l2_writes_done: 377935 kstat.zfs.misc.arcstats.l2_writes_sent: 377935 kstat.zfs.misc.arcstats.l2_write_bytes: 101118255104 kstat.zfs.misc.arcstats.l2_read_bytes: 59571878912 kstat.zfs.misc.arcstats.l2_rw_clash: 0 kstat.zfs.misc.arcstats.l2_feeds: 1193136 kstat.zfs.misc.arcstats.l2_misses: 137818470 kstat.zfs.misc.arcstats.l2_hits: 3613135 kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 5136092672 kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 3722561024 kstat.zfs.misc.arcstats.mfu_ghost_size: 8858653696 kstat.zfs.misc.arcstats.mfu_evictable_metadata: 68123648 kstat.zfs.misc.arcstats.mfu_evictable_data: 8575112704 kstat.zfs.misc.arcstats.mfu_size: 8745605632 kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 923847680 kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 5324684800 kstat.zfs.misc.arcstats.mru_ghost_size: 6248532480 kstat.zfs.misc.arcstats.mru_evictable_metadata: 1278004224 kstat.zfs.misc.arcstats.mru_evictable_data: 1457411584 kstat.zfs.misc.arcstats.mru_size: 2863241728 kstat.zfs.misc.arcstats.anon_evictable_metadata: 0 kstat.zfs.misc.arcstats.anon_evictable_data: 0 kstat.zfs.misc.arcstats.anon_size: 2038272 kstat.zfs.misc.arcstats.other_size: 2797801256 kstat.zfs.misc.arcstats.metadata_size: 1576374272 kstat.zfs.misc.arcstats.data_size: 10034527744 kstat.zfs.misc.arcstats.hdr_size: 200406240 kstat.zfs.misc.arcstats.size: 14672666216 kstat.zfs.misc.arcstats.c_max: 23919894528 kstat.zfs.misc.arcstats.c_min: 2989986816 kstat.zfs.misc.arcstats.c: 14673666683 kstat.zfs.misc.arcstats.p: 8668447917 kstat.zfs.misc.arcstats.hash_chain_max: 7 kstat.zfs.misc.arcstats.hash_chains: 219061 kstat.zfs.misc.arcstats.hash_collisions: 33107789 kstat.zfs.misc.arcstats.hash_elements_max: 1529284 kstat.zfs.misc.arcstats.hash_elements: 1529163 kstat.zfs.misc.arcstats.evict_l2_skip: 0 kstat.zfs.misc.arcstats.evict_l2_ineligible: 353901531136 kstat.zfs.misc.arcstats.evict_l2_eligible: 611148992512 kstat.zfs.misc.arcstats.evict_l2_cached: 471776311808 kstat.zfs.misc.arcstats.evict_not_enough: 2164 kstat.zfs.misc.arcstats.evict_skip: 232562 kstat.zfs.misc.arcstats.mutex_miss: 17547 kstat.zfs.misc.arcstats.deleted: 10350064 kstat.zfs.misc.arcstats.allocated: 172235521 kstat.zfs.misc.arcstats.mfu_ghost_hits: 8494679 kstat.zfs.misc.arcstats.mfu_hits: 1457647309 kstat.zfs.misc.arcstats.mru_ghost_hits: 5765227 kstat.zfs.misc.arcstats.mru_hits: 90829356 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4657105 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 14515029 kstat.zfs.misc.arcstats.prefetch_data_misses: 6610395 kstat.zfs.misc.arcstats.prefetch_data_hits: 6837739 kstat.zfs.misc.arcstats.demand_metadata_misses: 127929204 kstat.zfs.misc.arcstats.demand_metadata_hits: 404655615 kstat.zfs.misc.arcstats.demand_data_misses: 2235496 kstat.zfs.misc.arcstats.demand_data_hits: 1138662563 kstat.zfs.misc.arcstats.misses: 141432200 kstat.zfs.misc.arcstats.hits: 1564670946 kstat.zfs.misc.zcompstats.skipped_insufficient_gain: 4581339 kstat.zfs.misc.zcompstats.empty: 842987 kstat.zfs.misc.zcompstats.attempts: 121463608 kstat.zfs.misc.zfetchstats.max_streams: 2029717049 kstat.zfs.misc.zfetchstats.misses: 2043239863 kstat.zfs.misc.zfetchstats.hits: 15544425 kstat.zfs.misc.xuio_stats.write_buf_nocopy: 1453761 kstat.zfs.misc.xuio_stats.write_buf_copied: 0 kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0 kstat.zfs.misc.xuio_stats.read_buf_copied: 0 kstat.zfs.misc.xuio_stats.onloan_write_buf: 0 kstat.zfs.misc.xuio_stats.onloan_read_buf: 0 kstat.zfs.misc.zio_trim.failed: 0 kstat.zfs.misc.zio_trim.unsupported: 867 kstat.zfs.misc.zio_trim.success: 165333478 kstat.zfs.misc.zio_trim.bytes: 9734294003712 security.jail.param.allow.mount.zfs: 0 security.jail.mount_zfs_allowed: 0 I really don't know but could it be a conflict between the MB SATA ports and LSI HBA?? As upon startup there do seem to be some ATA Error messages in dmesg... So more of a physical HW issue then FS based? Or is it due to the term "bursty IO" that happens with ZFS... either way I have been looking at this for months trying to figure things out but other then a reboot nothing I do makes things better! Turning off my monitoring jail does help on occasion but outside of that I'm lost. I have another NAS based system with UFS root on SSD also yet ZPOOLs over the various large mechanical drives but never run into this particular issue! Would anyone be able to help??? Many thanks. Kaya From owner-freebsd-fs@freebsd.org Sun Jun 19 19:46:02 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5C9C8A7907F for ; Sun, 19 Jun 2016 19:46:02 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4CB192BE3 for ; Sun, 19 Jun 2016 19:46:02 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5JJk190060132 for ; Sun, 19 Jun 2016 19:46:02 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 139715] [zfs] vfs.numvnodes leak on busy zfs Date: Sun, 19 Jun 2016 19:46:01 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 19:46:02 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D139715 Kirk McKusick changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mckusick@FreeBSD.org --- Comment #7 from Kirk McKusick --- Hopefully this problem, if it still exists, has been fixed for the upcoming 11.0 release by commit 301996. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Sun Jun 19 19:53:12 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3ADAA792A8 for ; Sun, 19 Jun 2016 19:53:12 +0000 (UTC) (envelope-from fidaj@ukr.net) Received: from frv157.fwdcdn.com (frv157.fwdcdn.com [212.42.77.157]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 66666100F for ; Sun, 19 Jun 2016 19:53:12 +0000 (UTC) (envelope-from fidaj@ukr.net) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=ukr.net; s=fsm; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Subject:Cc:To:From:Date; bh=Md2EGEmg17ekiDcbB9bvGslPbXXfjdFelDnb5v23dlU=; b=TpSRXcBdnqAh6GsPgqY3YXNF0OhGMYEv2v6PUD+Um+pDzPkVcpRTU6QmzZoogKejgOund9CIGKnoVpH1YCcrlvluh6wKlu5/FGjP2pdO2piVlSCCYYrRmre3gXmUdKWfD1wlYuB79mTrqMaj0PNOafqHHwmflce6J2F1jZTXZzI=; Received: from [37.229.193.176] (helo=nonamehost.local) by frv157.fwdcdn.com with esmtpsa ID 1bEimK-0000cl-TZ ; Sun, 19 Jun 2016 22:53:08 +0300 Date: Sun, 19 Jun 2016 22:53:08 +0300 From: Ivan Klymenko To: Kaya Saman Cc: FreeBSD Filesystems Subject: Re: High CPU Interrupt using ZFS Message-ID: <20160619225308.2339eead@nonamehost.local> In-Reply-To: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> References: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.29; amd64-portbld-freebsd11.0) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Authentication-Result: IP=37.229.193.176; mail.from=fidaj@ukr.net; dkim=pass; header.d=ukr.net X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 19:53:12 -0000 On Sun, 19 Jun 2016 20:38:05 +0100 Kaya Saman wrote: > kern.ipc.semmns=6000000 Is it really needs a number of semaphores? From owner-freebsd-fs@freebsd.org Sun Jun 19 20:07:57 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 479A5A795A1 for ; Sun, 19 Jun 2016 20:07:57 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2B14318F2 for ; Sun, 19 Jun 2016 20:07:57 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5JK7unn037056 for ; Sun, 19 Jun 2016 20:07:57 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 139715] [zfs] vfs.numvnodes leak on busy zfs Date: Sun, 19 Jun 2016 20:07:56 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 20:07:57 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D139715 --- Comment #8 from Kirk McKusick --- Addendum to my previous comment, this was fixed by commits 301996 and 30199= 7. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Sun Jun 19 20:45:51 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B5C4A7716A for ; Sun, 19 Jun 2016 20:45:51 +0000 (UTC) (envelope-from paul@kraus-haus.org) Received: from mail-qk0-x230.google.com (mail-qk0-x230.google.com [IPv6:2607:f8b0:400d:c09::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5E1242BB9 for ; Sun, 19 Jun 2016 20:45:51 +0000 (UTC) (envelope-from paul@kraus-haus.org) Received: by mail-qk0-x230.google.com with SMTP id p10so140108947qke.3 for ; Sun, 19 Jun 2016 13:45:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kraus-haus-org.20150623.gappssmtp.com; s=20150623; h=subject:mime-version:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=Y8QUsRYPkDNx1mE8TkzoGx1smx0hwgWAZ0hU0prQxJo=; b=fKJ0X4cnh58+yQz1FtHK7j1vWSxthvkDMkl4RRyxn0ctAzPd2UgSiU0aIvzl6feJF8 qc4IJHCqSokSqZHzJeciaiMIkOvdIeKPAcIRaSBbtX+yf/YChVZKTRZUr1IibMRCt7td kYFz3d8TEp1VVKTuqsAmuhMznB9xawF/XuOBhsksncNF2rvU71sVUbMKvtdtxJcxuh0B xe+2xSQZumaxFPh+1CTfAbqlE+KEyw3CpIbxH6KOVc64Ch8K5mnmzt92j9//L6y5KxU8 DJg/OY/Tpps7d+sm0qy9mmkUywxqR04/FeC5WYVjDD4rNpX1KirQUwdtOQjCpkUQRs+X Uv9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:mime-version:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=Y8QUsRYPkDNx1mE8TkzoGx1smx0hwgWAZ0hU0prQxJo=; b=dMyPaGe3tECfLbE99DZNSbcGQVUfq83xuXNcHkTesFNMmKC0WU5FQ7lGGmZAM2dhqg LcV5TpSg7wKF5TijX8Jt6NFlQcReP4WoqPM1MFS38lZE+4qeEjoajudbwaR82YqxT7LY UM86G+kcz3kAIV9pMMciJC3iOmpz4B+keLoOiRtxk3voqPgTTXCaDSLGQo8JTSEEvFA7 2PyvT5ODSTYxQR3bUVxn5502vhWALGujanNTthMABTAwUzvMnu/okRo/P3VBOKBhARhK wwaQzMGUMWC1glO+Vu/C/xMzjEvYxuLf6RUafi0gn4N8sKKVB8BQfdkLKT9buME5CDf3 f7Hw== X-Gm-Message-State: ALyK8tJmdtyHdMO08C0C3vecprpJ2YjjIr5THd+jJ19GphpkJee6eXJe8IE5ZV4fwMNe5A== X-Received: by 10.55.77.4 with SMTP id a4mr16212142qkb.198.1466369150352; Sun, 19 Jun 2016 13:45:50 -0700 (PDT) Received: from [192.168.2.133] (pool-100-4-209-221.albyny.fios.verizon.net. [100.4.209.221]) by smtp.gmail.com with ESMTPSA id t27sm9480703qtc.30.2016.06.19.13.45.49 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sun, 19 Jun 2016 13:45:49 -0700 (PDT) Subject: Re: High CPU Interrupt using ZFS Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Content-Type: text/plain; charset=utf-8 From: Paul Kraus In-Reply-To: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> Date: Sun, 19 Jun 2016 16:45:48 -0400 Cc: FreeBSD Filesystems Content-Transfer-Encoding: quoted-printable Message-Id: <2F83F199-80C1-4B98-A18D-C5343EE4F783@kraus-haus.org> References: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> To: Kaya Saman X-Mailer: Apple Mail (2.3124) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 20:45:51 -0000 > On Jun 19, 2016, at 3:38 PM, Kaya Saman wrote: > # zpool list > NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH = ALTROOT > ZPOOL_2 27.2T 26.3T 884G - 41% 96% 1.00x ONLINE = - > ZPOOL_3 298G 248G 50.2G - 34% 83% 1.00x ONLINE = - > ZPOOL_4 1.81T 1.75T 66.4G - 25% 96% 1.00x ONLINE = - > ZPOOL_5 186G 171G 14.9G - 62% 92% 1.00x ONLINE = - > workspaces 119G 77.7G 41.3G - 56% 65% 1.00x ONLINE = - > zroot 111G 88.9G 22.1G - 70% 80% 1.00x ONLINE = - Are you aware that ZFS performance drops substantially once a pool = exceeds a certain % full, the threshold for which varies with pool type = and work load. It is generally considered a bad idea to run pools more = than 80% full with any configuration or workload. ZFS is designed first = and foremost for data integrity, not performance and running pools too = full causes _huge_ write performance penalties. Does your system hang = correspond to a write request to any of the pools that are more than 80% = full ? The pool that is at 92% capacity and 62% fragmented is especially = at risk for this behavior. The underlying reason for this behavior is that as a pool get more and = more full it takes more and more time to find an appropriate available = slab to write new data to, since _all_ writes are treated as new data = (that is the whole point of the Copy on Write design) _any_ write to a = close to full pool incurs the huge performance penalty. This means that if you write the data and _never_ modify it and that you = can stand the write penalty as you add data to the mostly full zpools, = then you may be able to use ZFS like this, otherwise just don=E2=80=99t. On my virtual hosts, running FreeBSD 10.x and VirtualBox, a pool more = than 80% full will make the VMs unacceptably unresponsive, I strive to = keep the pools at less than 60% capacity. Disk storage is (relatively) = cheap these days. From owner-freebsd-fs@freebsd.org Sun Jun 19 21:08:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DABD3A77540 for ; Sun, 19 Jun 2016 21:08:58 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: from mail-wm0-x22e.google.com (mail-wm0-x22e.google.com [IPv6:2a00:1450:400c:c09::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7A085131C for ; Sun, 19 Jun 2016 21:08:58 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: by mail-wm0-x22e.google.com with SMTP id r201so36771638wme.1 for ; Sun, 19 Jun 2016 14:08:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=+Ut9zSQCpJGxEkobF3odWKwUrfZDLb0gSBldIxOBmjg=; b=A/97cRQlFyNaJZrkTaoByb6fRI9R785rl/B1E+JraTAgx2gTRwpLTpLUVuKs7Fdhco yj15Dp++f8j/4cxepbWsZUjptzrEan1m6OXlHUuEezHQ3czgXkBwrqvZPPru/5rhbszo saeJYNbTkC1AjpOPuzpESKMdv1/KgmFPXlf7BM9aUR8MIwQF8sLUbJ8nRaYOjIpsrj3D JTGsabBn82yFZ/HIXkVmgXe4TyHU8aqvxFdIapyEgu4kat33WHfPHTthQjT+tkVk1I+V Ch80bG1BUzreOYXJjz80AtNmS76aoQjAp9F+ZAFjHghLHelk3AgyPHhX1POFEYPdzJHq 5DFw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=+Ut9zSQCpJGxEkobF3odWKwUrfZDLb0gSBldIxOBmjg=; b=GJqAEWAAaTaCQI1Pz6V/vXYBtYPdpWJ19tgH93zCgofpDXGp5Q05SVozomf6ej7UOA 0GRhum9To2+MMAOly0hFMyhzpqtjQz2VKclblgAu2xdz/mfHoju0rFG6gqJQ/fRMKgSQ 32HT1H+dGVfjOmLv/j/INSCKRs/2vfwW5v5kUXivqDtq9XWG86xV4gia0ZX4lNEx2z3p yHuPEv8VTPZYizjHyQiRyqivWfXEJVT+tTZzw4b8K1m4mhXtGqoJfuSmVX2HigVZvn9d 6oosBWX5q/gi9GBh1tfpIFy8lXuI13mmt7Rx8+ypefatp2YizRkM+jCTNYoQ6q4olhhh LEmA== X-Gm-Message-State: ALyK8tLZbaJ/VOQUKdo91m+a7iOxgkK63U0GVbEkcUAxLGZRPjY1AGwbfekFYWASBao+gA== X-Received: by 10.28.41.134 with SMTP id p128mr2876764wmp.20.1466370536329; Sun, 19 Jun 2016 14:08:56 -0700 (PDT) Received: from [192.168.20.30] (optiplexnetworks.plus.com. [212.159.80.17]) by smtp.googlemail.com with ESMTPSA id ej9sm26377632wjd.7.2016.06.19.14.08.55 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 19 Jun 2016 14:08:55 -0700 (PDT) Subject: Re: High CPU Interrupt using ZFS To: Paul Kraus References: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> <2F83F199-80C1-4B98-A18D-C5343EE4F783@kraus-haus.org> Cc: FreeBSD Filesystems From: Kaya Saman Message-ID: <71559f37-9439-f7b4-9d1f-091875c098ab@gmail.com> Date: Sun, 19 Jun 2016 22:08:54 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <2F83F199-80C1-4B98-A18D-C5343EE4F783@kraus-haus.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 21:08:58 -0000 On 06/19/2016 09:45 PM, Paul Kraus wrote: >> On Jun 19, 2016, at 3:38 PM, Kaya Saman wrote: > > >> # zpool list >> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT >> ZPOOL_2 27.2T 26.3T 884G - 41% 96% 1.00x ONLINE - >> ZPOOL_3 298G 248G 50.2G - 34% 83% 1.00x ONLINE - >> ZPOOL_4 1.81T 1.75T 66.4G - 25% 96% 1.00x ONLINE - >> ZPOOL_5 186G 171G 14.9G - 62% 92% 1.00x ONLINE - >> workspaces 119G 77.7G 41.3G - 56% 65% 1.00x ONLINE - >> zroot 111G 88.9G 22.1G - 70% 80% 1.00x ONLINE - > Are you aware that ZFS performance drops substantially once a pool exceeds a certain % full, the threshold for which varies with pool type and work load. It is generally considered a bad idea to run pools more than 80% full with any configuration or workload. ZFS is designed first and foremost for data integrity, not performance and running pools too full causes _huge_ write performance penalties. Does your system hang correspond to a write request to any of the pools that are more than 80% full ? The pool that is at 92% capacity and 62% fragmented is especially at risk for this behavior. > > The underlying reason for this behavior is that as a pool get more and more full it takes more and more time to find an appropriate available slab to write new data to, since _all_ writes are treated as new data (that is the whole point of the Copy on Write design) _any_ write to a close to full pool incurs the huge performance penalty. > > This means that if you write the data and _never_ modify it and that you can stand the write penalty as you add data to the mostly full zpools, then you may be able to use ZFS like this, otherwise just don’t. > > On my virtual hosts, running FreeBSD 10.x and VirtualBox, a pool more than 80% full will make the VMs unacceptably unresponsive, I strive to keep the pools at less than 60% capacity. Disk storage is (relatively) cheap these days. > Thanks for this! Yep I was aware that things would "slow down" but not render the system totally unresponsive. This maybe the reason.... as once the CPU interrupt goes to unacceptably high levels even the NICs start flapping as they're all bound to an LACP LAGG interface. I will probably need to get a few JBOD chassis for expansion as though my chassis can take 22 disks they are all full :-( Regards, Kaya From owner-freebsd-fs@freebsd.org Sun Jun 19 21:10:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1FD63A775B7 for ; Sun, 19 Jun 2016 21:10:56 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: from mail-wm0-x22b.google.com (mail-wm0-x22b.google.com [IPv6:2a00:1450:400c:c09::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A827E141A for ; Sun, 19 Jun 2016 21:10:55 +0000 (UTC) (envelope-from kayasaman@gmail.com) Received: by mail-wm0-x22b.google.com with SMTP id f126so46963724wma.1 for ; Sun, 19 Jun 2016 14:10:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=mewqFbgCdFm6xkq4oEmkTegbnqMZ+ymM4VMDc6QVBGE=; b=G4YrSF1C4anL5l/3k4e7rG1xk4Lf75a6SNKNzMAiZqTbeljAKhnVif9topAXDOv5Aq XnePNM0quRFrKdcJgz3bx4b4kDkR/KG9tceYsvrI5W/1RiSggg9pRs6Ec8K25oHKRV3b PpnoYkvyGHOGjnQpNwrL0O2KPzhLJ6mK5dPbihWW9IXWQzWK1nNz1fcK3NBaJ0F7Kxwq 7B3eF2BHJkRTbnh6oHmfXw5PzLoTykzHRfzcmVOw2I36j2oezTjz4d4uZE6WImTvWlkk AJyfA6dl0f75Q4/HTwAWGyUpNOvrnxIORGUgbsaYRkFFzIniftrQWTGrZKuBLQRvLv93 WvQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=mewqFbgCdFm6xkq4oEmkTegbnqMZ+ymM4VMDc6QVBGE=; b=I4j0m29TAWTW9kWLQwoxzR1NS7gWqT9ar/OYz5tomBCRYtuznUjQJ1/j01/vY5LPFv lTsFFYH9MqmmAYGTfgmq457ELbfSz6BceG2r1l427i4+kCafQcc7Y5XInyQcuvN5UEGG DCCTbqEFkcx9dFEvyu7Ey3F4FVxZ0eKjhn2tbRbFTjNGo+B3whwVWTcKsENWE7UMgZ/r sSN0dalFhoZePQVksImuB/8tDazMaiWSGXCggexmiIa7m0OlRLyGqlLlVYiLZQtetHBj G0Ljl/GZeU+ywIzaiP9au9EJmBkDEtl4q5A+wnE5O96yNovYG4oW4CNoUlNwkF/P+8JO IwuQ== X-Gm-Message-State: ALyK8tLo6u2Jq3zNluxgmCIfRbA7rMyBSqbmSdo/A+sdpMrOIY7inu7YlRAdVHEcmhU2pA== X-Received: by 10.194.88.5 with SMTP id bc5mr12633122wjb.100.1466370654257; Sun, 19 Jun 2016 14:10:54 -0700 (PDT) Received: from [192.168.20.30] (optiplexnetworks.plus.com. [212.159.80.17]) by smtp.googlemail.com with ESMTPSA id r6sm10185478wme.14.2016.06.19.14.10.53 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 19 Jun 2016 14:10:53 -0700 (PDT) Subject: Re: High CPU Interrupt using ZFS To: Ivan Klymenko References: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> <20160619225308.2339eead@nonamehost.local> Cc: FreeBSD Filesystems From: Kaya Saman Message-ID: <44afded2-82a1-cd5f-4d2d-9b8355ed18a6@gmail.com> Date: Sun, 19 Jun 2016 22:10:52 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <20160619225308.2339eead@nonamehost.local> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 21:10:56 -0000 On 06/19/2016 08:53 PM, Ivan Klymenko wrote: > On Sun, 19 Jun 2016 20:38:05 +0100 > Kaya Saman wrote: > >> kern.ipc.semmns=6000000 > Is it really needs a number of semaphores? Yep I do as am running too many processes for a single machine ( I think ) and without the number set that high the system kept complaining and some processes wouldn't run. I really need to migrate or offload some of the stuff onto a separate machine. From owner-freebsd-fs@freebsd.org Sun Jun 19 21:47:21 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DB1FCA77FEB for ; Sun, 19 Jun 2016 21:47:21 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: from mail-wm0-x233.google.com (mail-wm0-x233.google.com [IPv6:2a00:1450:400c:c09::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4870426D7 for ; Sun, 19 Jun 2016 21:47:20 +0000 (UTC) (envelope-from killing@multiplay.co.uk) Received: by mail-wm0-x233.google.com with SMTP id a66so54417349wme.0 for ; Sun, 19 Jun 2016 14:47:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=multiplay-co-uk.20150623.gappssmtp.com; s=20150623; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to; bh=N1HeFim2tKA08zAQ8kopXDpHPx73hvGMgXGxzMkvmAw=; b=neTb/ia0c7czTThzH1aBnm1cYqdZoA5Zp+iP0OdpOHZ5dTsk51+/OvpVt1i2vWywlD 7NthcuRXYPNYvSM/GAlfIADl2gf7HaSiW61ph9oyOajGV3Kt4uq5kgWF3z2Yib+q1X/z Hl46rQkxMwVndWLDxOrHLrByEghqP7Th9nPYTFMp4AiS5MVY8EMM/bX1mlakvq6xLPTA bONZEpPS4Wyd5nbyooxRoDhCNveofpOmQSiwddm5cTvqA7kTcGqS0pI/rL4XIfaYdVO0 1nid5J3hUSH3GmEJBgSkb9fYXbW1aC/FhX69uhZG2EoyyAYEfEK9X0rLr1LalG+8fg7M feew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to; bh=N1HeFim2tKA08zAQ8kopXDpHPx73hvGMgXGxzMkvmAw=; b=C9za7B2vp2x3+HKCzp2HAwCKvXoqndYYPinyQkj+7Wv8t1IasDbVSyU0KKiGCdeF5E TOoXQtFg1a3/bYvOzNLsuqUqHTwHfKWc8RM4bUMzzATXe7PltSrrKfWPONs6gfW/YUis GmnEImiDArPf+s51g1/R27ITY6SKQhHm5OYs5skh6/VDoJ3m1t7ZlaHCastIhhxXsIpj Lwl1Di4mfW0UJW/KJW1hZpdEalu0d8ETUHuw3f7130F60E4CtFssJOlSqcfnWRC/XtPc yQn+OlCXUm+McZKpY1EY+Bsw+5uRJHRr+H6c9XGfemP6bNPKWimuPky5gn9xzvdyXyRQ ckAg== X-Gm-Message-State: ALyK8tKmhINPoqZNgjWnF/5icrTnM/8WIop9kG2T1blr4EAAe4P8wfk6TNoGOWhxKchcvDgr X-Received: by 10.28.48.138 with SMTP id w132mr7624181wmw.44.1466372839242; Sun, 19 Jun 2016 14:47:19 -0700 (PDT) Received: from [10.10.1.58] (liv3d.labs.multiplay.co.uk. [82.69.141.171]) by smtp.gmail.com with ESMTPSA id ue1sm39476536wjc.44.2016.06.19.14.47.18 for (version=TLSv1/SSLv3 cipher=OTHER); Sun, 19 Jun 2016 14:47:18 -0700 (PDT) Subject: Re: High CPU Interrupt using ZFS To: freebsd-fs@freebsd.org References: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> From: Steven Hartland Message-ID: <48d498c8-ef9c-355b-ed5e-43ae003e8925@multiplay.co.uk> Date: Sun, 19 Jun 2016 22:47:20 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <57cfcda4-6ff7-0c2e-4f58-ad09ce7cab28@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 21:47:22 -0000 You usage levels are really high I would recommend keeping things below 80% otherwise when new data is written its much more costly to locate free space. On 19/06/2016 20:38, Kaya Saman wrote: > Hi, > > > I have a strange problem and I'm not sure if anyone has ever > experienced this to help give me some advice on how to tackle it. > > > Basically I run ZFS as root FS mirrored over two drives which are > directly connected to the SATA connectors on a SuperMicro Xeon E5 > server based MB. > > > Then I have an LSI HBA connected to the remaining disks with various > ZPOOLs. The main pool has ZIL and L2ARC enabled. > > > As the majority of data is A/V content I disabled Prefetch as > instructed in the FreeBSD tuning tips guide. > > > https://www.freebsd.org/doc/handbook/zfs.html > > > For some reason after a period of time the CPU interrupt will just got > sky high and the system will totally bog down. My home drive is > running off the "Main Pool" too and when this happens it becomes > inaccessible. > > > The system runs FBSD 10.3: 10.3-RELEASE FreeBSD 10.3-RELEASE #0 > r297264: Fri Mar 25 02:10:02 UTC 2016 > > > ZPOOL List output: > > > # zpool list > NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH > ALTROOT > ZPOOL_2 27.2T 26.3T 884G - 41% 96% 1.00x ONLINE - > ZPOOL_3 298G 248G 50.2G - 34% 83% 1.00x ONLINE - > ZPOOL_4 1.81T 1.75T 66.4G - 25% 96% 1.00x ONLINE - > ZPOOL_5 186G 171G 14.9G - 62% 92% 1.00x ONLINE - > workspaces 119G 77.7G 41.3G - 56% 65% 1.00x ONLINE - > zroot 111G 88.9G 22.1G - 70% 80% 1.00x ONLINE - > > > The system has a Xeon E5 with 24GB RAM and 16GB of Swap space. > > > I also run 5x jails on this box. > > 1x for network based monitoring (munin, zabbix etc) > > 1x DB jail which runs Postgresql and Mysql > > > + some others; they are all run off the ZRoot > > > Boot Loader info: > > > zfs_load="YES" > > kern.ipc.semmni=6000000 > kern.ipc.semmns=6000000 > kern.ipc.semmnu=256 > > net.isr.numthreads=4 > net.isr.maxthreads=4 > net.isr.bindthreads=1 > > vfs.zfs.l2arc_noprefetch=1 > > > Other information: > > > # camcontrol devlist > at scbus0 target 8 lun 0 (pass0,da0) > at scbus0 target 10 lun 0 (pass1,da1) > at scbus0 target 11 lun 0 (pass2,da2) > at scbus0 target 12 lun 0 (pass3,da3) > at scbus0 target 13 lun 0 (pass4,ses0) > at scbus0 target 14 lun 0 (pass5,da4) > at scbus0 target 15 lun 0 (pass6,da5) > at scbus0 target 17 lun 0 (pass7,da6) > at scbus0 target 18 lun 0 (pass8,da7) > at scbus0 target 19 lun 0 (pass9,da8) > at scbus0 target 20 lun 0 (pass10,da9) > at scbus0 target 21 lun 0 > (pass11,da10) > at scbus0 target 22 lun 0 > (pass12,da11) > at scbus0 target 29 lun 0 > (pass13,da12) > at scbus0 target 30 lun 0 > (pass14,da13) > at scbus0 target 31 lun 0 > (pass15,da14) > at scbus0 target 34 lun 0 > (pass16,da15) > at scbus0 target 35 lun 0 > (pass17,da16) > at scbus0 target 36 lun 0 > (pass18,da17) > at scbus0 target 37 lun 0 > (pass19,da18) > at scbus0 target 38 lun 0 > (pass20,da19) > at scbus0 target 40 lun 0 > (pass21,da20) > at scbus0 target 41 lun 0 > (pass22,da21) > at scbus2 target 0 lun 0 (pass23,ada0) > at scbus3 target 0 lun 0 (pass24,ada1) > at scbus8 target 0 lun 0 (pass25,ses1) > > > Sysctl output for ZFS: > > > # sysctl -a |grep zfs > 2 PART diskid/DISK-1350790500009986007Fp2 229319956992 512 i 2 o > 10737435648 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > 2 PART diskid/DISK-1350790500009986007Fp1 10737418240 512 i 1 o 17408 > ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > 2 PART diskid/DISK-13507905000099860071p2 229319956992 512 i 2 o > 10737435648 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > 2 PART diskid/DISK-13507905000099860071p1 10735321088 512 i 1 o > 2097152 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > 2 PART diskid/DISK-14067903000097960BD7p3 119445590016 512 i 3 o > 8590065664 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > 1 PART ada0p3 119445590016 512 i 3 o 8590065664 ty freebsd-zfs xs GPT > xt 516e7cba-6ecf-11d6-8ff8-00022d09712b > z0xfffff80012422d00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"]; > zfs::vdev > freebsd-zfs > freebsd-zfs > freebsd-zfs > freebsd-zfs > freebsd-zfs > freebsd-zfs > vfs.zfs.trim.max_interval: 1 > vfs.zfs.trim.timeout: 30 > vfs.zfs.trim.txg_delay: 32 > vfs.zfs.trim.enabled: 1 > vfs.zfs.vol.unmap_enabled: 1 > vfs.zfs.vol.mode: 1 > vfs.zfs.version.zpl: 5 > vfs.zfs.version.spa: 5000 > vfs.zfs.version.acl: 1 > vfs.zfs.version.ioctl: 5 > vfs.zfs.debug: 0 > vfs.zfs.super_owner: 0 > vfs.zfs.sync_pass_rewrite: 2 > vfs.zfs.sync_pass_dont_compress: 5 > vfs.zfs.sync_pass_deferred_free: 2 > vfs.zfs.zio.exclude_metadata: 0 > vfs.zfs.zio.use_uma: 1 > vfs.zfs.cache_flush_disable: 0 > vfs.zfs.zil_replay_disable: 0 > vfs.zfs.min_auto_ashift: 9 > vfs.zfs.max_auto_ashift: 13 > vfs.zfs.vdev.trim_max_pending: 10000 > vfs.zfs.vdev.bio_delete_disable: 0 > vfs.zfs.vdev.bio_flush_disable: 0 > vfs.zfs.vdev.write_gap_limit: 4096 > vfs.zfs.vdev.read_gap_limit: 32768 > vfs.zfs.vdev.aggregation_limit: 131072 > vfs.zfs.vdev.trim_max_active: 64 > vfs.zfs.vdev.trim_min_active: 1 > vfs.zfs.vdev.scrub_max_active: 2 > vfs.zfs.vdev.scrub_min_active: 1 > vfs.zfs.vdev.async_write_max_active: 10 > vfs.zfs.vdev.async_write_min_active: 1 > vfs.zfs.vdev.async_read_max_active: 3 > vfs.zfs.vdev.async_read_min_active: 1 > vfs.zfs.vdev.sync_write_max_active: 10 > vfs.zfs.vdev.sync_write_min_active: 10 > vfs.zfs.vdev.sync_read_max_active: 10 > vfs.zfs.vdev.sync_read_min_active: 10 > vfs.zfs.vdev.max_active: 1000 > vfs.zfs.vdev.async_write_active_max_dirty_percent: 60 > vfs.zfs.vdev.async_write_active_min_dirty_percent: 30 > vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1 > vfs.zfs.vdev.mirror.non_rotating_inc: 0 > vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576 > vfs.zfs.vdev.mirror.rotating_seek_inc: 5 > vfs.zfs.vdev.mirror.rotating_inc: 0 > vfs.zfs.vdev.trim_on_init: 1 > vfs.zfs.vdev.cache.bshift: 16 > vfs.zfs.vdev.cache.size: 0 > vfs.zfs.vdev.cache.max: 16384 > vfs.zfs.vdev.metaslabs_per_vdev: 200 > vfs.zfs.txg.timeout: 5 > vfs.zfs.space_map_blksz: 4096 > vfs.zfs.spa_slop_shift: 5 > vfs.zfs.spa_asize_inflation: 24 > vfs.zfs.deadman_enabled: 1 > vfs.zfs.deadman_checktime_ms: 5000 > vfs.zfs.deadman_synctime_ms: 1000000 > vfs.zfs.recover: 0 > vfs.zfs.spa_load_verify_data: 1 > vfs.zfs.spa_load_verify_metadata: 1 > vfs.zfs.spa_load_verify_maxinflight: 10000 > vfs.zfs.check_hostid: 1 > vfs.zfs.mg_fragmentation_threshold: 85 > vfs.zfs.mg_noalloc_threshold: 0 > vfs.zfs.condense_pct: 200 > vfs.zfs.metaslab.bias_enabled: 1 > vfs.zfs.metaslab.lba_weighting_enabled: 1 > vfs.zfs.metaslab.fragmentation_factor_enabled: 1 > vfs.zfs.metaslab.preload_enabled: 1 > vfs.zfs.metaslab.preload_limit: 3 > vfs.zfs.metaslab.unload_delay: 8 > vfs.zfs.metaslab.load_pct: 50 > vfs.zfs.metaslab.min_alloc_size: 33554432 > vfs.zfs.metaslab.df_free_pct: 4 > vfs.zfs.metaslab.df_alloc_threshold: 131072 > vfs.zfs.metaslab.debug_unload: 0 > vfs.zfs.metaslab.debug_load: 0 > vfs.zfs.metaslab.fragmentation_threshold: 70 > vfs.zfs.metaslab.gang_bang: 16777217 > vfs.zfs.free_bpobj_enabled: 1 > vfs.zfs.free_max_blocks: 18446744073709551615 > vfs.zfs.no_scrub_prefetch: 0 > vfs.zfs.no_scrub_io: 0 > vfs.zfs.resilver_min_time_ms: 3000 > vfs.zfs.free_min_time_ms: 1000 > vfs.zfs.scan_min_time_ms: 1000 > vfs.zfs.scan_idle: 50 > vfs.zfs.scrub_delay: 4 > vfs.zfs.resilver_delay: 2 > vfs.zfs.top_maxinflight: 32 > vfs.zfs.zfetch.array_rd_sz: 1048576 > vfs.zfs.zfetch.max_distance: 8388608 > vfs.zfs.zfetch.min_sec_reap: 2 > vfs.zfs.zfetch.max_streams: 8 > vfs.zfs.prefetch_disable: 0 > vfs.zfs.delay_scale: 500000 > vfs.zfs.delay_min_dirty_percent: 60 > vfs.zfs.dirty_data_sync: 67108864 > vfs.zfs.dirty_data_max_percent: 10 > vfs.zfs.dirty_data_max_max: 4294967296 > vfs.zfs.dirty_data_max: 2570453401 > vfs.zfs.max_recordsize: 1048576 > vfs.zfs.mdcomp_disable: 0 > vfs.zfs.nopwrite_enabled: 1 > vfs.zfs.dedup.prefetch: 1 > vfs.zfs.l2c_only_size: 0 > vfs.zfs.mfu_ghost_data_lsize: 3288968704 > vfs.zfs.mfu_ghost_metadata_lsize: 5136092672 > vfs.zfs.mfu_ghost_size: 8425061376 > vfs.zfs.mfu_data_lsize: 8574981632 > vfs.zfs.mfu_metadata_lsize: 68123648 > vfs.zfs.mfu_size: 8745474560 > vfs.zfs.mru_ghost_data_lsize: 5324684800 > vfs.zfs.mru_ghost_metadata_lsize: 923847680 > vfs.zfs.mru_ghost_size: 6248532480 > vfs.zfs.mru_data_lsize: 1456756224 > vfs.zfs.mru_metadata_lsize: 1278004224 > vfs.zfs.mru_size: 2862586368 > vfs.zfs.anon_data_lsize: 0 > vfs.zfs.anon_metadata_lsize: 0 > vfs.zfs.anon_size: 2841088 > vfs.zfs.l2arc_norw: 1 > vfs.zfs.l2arc_feed_again: 1 > vfs.zfs.l2arc_noprefetch: 1 > vfs.zfs.l2arc_feed_min_ms: 200 > vfs.zfs.l2arc_feed_secs: 1 > vfs.zfs.l2arc_headroom: 2 > vfs.zfs.l2arc_write_boost: 134217728 > vfs.zfs.l2arc_write_max: 67108864 > vfs.zfs.arc_meta_limit: 5979973632 > vfs.zfs.arc_free_target: 42350 > vfs.zfs.arc_shrink_shift: 7 > vfs.zfs.arc_average_blocksize: 8192 > vfs.zfs.arc_min: 2989986816 > vfs.zfs.arc_max: 23919894528 > debug.zfs_flags: 0 > kstat.zfs.misc.vdev_cache_stats.misses: 0 > kstat.zfs.misc.vdev_cache_stats.hits: 0 > kstat.zfs.misc.vdev_cache_stats.delegations: 0 > kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 3034069 > kstat.zfs.misc.arcstats.sync_wait_for_async: 1779944 > kstat.zfs.misc.arcstats.arc_meta_min: 1494993408 > kstat.zfs.misc.arcstats.arc_meta_max: 12233249160 > kstat.zfs.misc.arcstats.arc_meta_limit: 5979973632 > kstat.zfs.misc.arcstats.arc_meta_used: 4638138472 > kstat.zfs.misc.arcstats.duplicate_reads: 1709068 > kstat.zfs.misc.arcstats.duplicate_buffers_size: 0 > kstat.zfs.misc.arcstats.duplicate_buffers: 0 > kstat.zfs.misc.arcstats.memory_throttle_count: 0 > kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 2200 > kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 4772510 > kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 561679501832704 > kstat.zfs.misc.arcstats.l2_write_pios: 377935 > kstat.zfs.misc.arcstats.l2_write_buffer_iter: 1193136 > kstat.zfs.misc.arcstats.l2_write_full: 148 > kstat.zfs.misc.arcstats.l2_write_not_cacheable: 264598116 > kstat.zfs.misc.arcstats.l2_write_io_in_progress: 83 > kstat.zfs.misc.arcstats.l2_write_in_l2: 5284665382 > kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 14626890890 > kstat.zfs.misc.arcstats.l2_write_passed_headroom: 3318575 > kstat.zfs.misc.arcstats.l2_write_trylock_fail: 6328431 > kstat.zfs.misc.arcstats.l2_compress_failures: 655251 > kstat.zfs.misc.arcstats.l2_compress_zeros: 0 > kstat.zfs.misc.arcstats.l2_compress_successes: 1205377 > kstat.zfs.misc.arcstats.l2_hdr_size: 63556704 > kstat.zfs.misc.arcstats.l2_asize: 84595239936 > kstat.zfs.misc.arcstats.l2_size: 93178570752 > kstat.zfs.misc.arcstats.l2_io_error: 0 > kstat.zfs.misc.arcstats.l2_cksum_bad: 0 > kstat.zfs.misc.arcstats.l2_abort_lowmem: 42 > kstat.zfs.misc.arcstats.l2_cdata_free_on_write: 41 > kstat.zfs.misc.arcstats.l2_free_on_write: 722 > kstat.zfs.misc.arcstats.l2_evict_l1cached: 0 > kstat.zfs.misc.arcstats.l2_evict_reading: 0 > kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0 > kstat.zfs.misc.arcstats.l2_writes_lock_retry: 63 > kstat.zfs.misc.arcstats.l2_writes_error: 0 > kstat.zfs.misc.arcstats.l2_writes_done: 377935 > kstat.zfs.misc.arcstats.l2_writes_sent: 377935 > kstat.zfs.misc.arcstats.l2_write_bytes: 101118255104 > kstat.zfs.misc.arcstats.l2_read_bytes: 59571878912 > kstat.zfs.misc.arcstats.l2_rw_clash: 0 > kstat.zfs.misc.arcstats.l2_feeds: 1193136 > kstat.zfs.misc.arcstats.l2_misses: 137818470 > kstat.zfs.misc.arcstats.l2_hits: 3613135 > kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 5136092672 > kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 3722561024 > kstat.zfs.misc.arcstats.mfu_ghost_size: 8858653696 > kstat.zfs.misc.arcstats.mfu_evictable_metadata: 68123648 > kstat.zfs.misc.arcstats.mfu_evictable_data: 8575112704 > kstat.zfs.misc.arcstats.mfu_size: 8745605632 > kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 923847680 > kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 5324684800 > kstat.zfs.misc.arcstats.mru_ghost_size: 6248532480 > kstat.zfs.misc.arcstats.mru_evictable_metadata: 1278004224 > kstat.zfs.misc.arcstats.mru_evictable_data: 1457411584 > kstat.zfs.misc.arcstats.mru_size: 2863241728 > kstat.zfs.misc.arcstats.anon_evictable_metadata: 0 > kstat.zfs.misc.arcstats.anon_evictable_data: 0 > kstat.zfs.misc.arcstats.anon_size: 2038272 > kstat.zfs.misc.arcstats.other_size: 2797801256 > kstat.zfs.misc.arcstats.metadata_size: 1576374272 > kstat.zfs.misc.arcstats.data_size: 10034527744 > kstat.zfs.misc.arcstats.hdr_size: 200406240 > kstat.zfs.misc.arcstats.size: 14672666216 > kstat.zfs.misc.arcstats.c_max: 23919894528 > kstat.zfs.misc.arcstats.c_min: 2989986816 > kstat.zfs.misc.arcstats.c: 14673666683 > kstat.zfs.misc.arcstats.p: 8668447917 > kstat.zfs.misc.arcstats.hash_chain_max: 7 > kstat.zfs.misc.arcstats.hash_chains: 219061 > kstat.zfs.misc.arcstats.hash_collisions: 33107789 > kstat.zfs.misc.arcstats.hash_elements_max: 1529284 > kstat.zfs.misc.arcstats.hash_elements: 1529163 > kstat.zfs.misc.arcstats.evict_l2_skip: 0 > kstat.zfs.misc.arcstats.evict_l2_ineligible: 353901531136 > kstat.zfs.misc.arcstats.evict_l2_eligible: 611148992512 > kstat.zfs.misc.arcstats.evict_l2_cached: 471776311808 > kstat.zfs.misc.arcstats.evict_not_enough: 2164 > kstat.zfs.misc.arcstats.evict_skip: 232562 > kstat.zfs.misc.arcstats.mutex_miss: 17547 > kstat.zfs.misc.arcstats.deleted: 10350064 > kstat.zfs.misc.arcstats.allocated: 172235521 > kstat.zfs.misc.arcstats.mfu_ghost_hits: 8494679 > kstat.zfs.misc.arcstats.mfu_hits: 1457647309 > kstat.zfs.misc.arcstats.mru_ghost_hits: 5765227 > kstat.zfs.misc.arcstats.mru_hits: 90829356 > kstat.zfs.misc.arcstats.prefetch_metadata_misses: 4657105 > kstat.zfs.misc.arcstats.prefetch_metadata_hits: 14515029 > kstat.zfs.misc.arcstats.prefetch_data_misses: 6610395 > kstat.zfs.misc.arcstats.prefetch_data_hits: 6837739 > kstat.zfs.misc.arcstats.demand_metadata_misses: 127929204 > kstat.zfs.misc.arcstats.demand_metadata_hits: 404655615 > kstat.zfs.misc.arcstats.demand_data_misses: 2235496 > kstat.zfs.misc.arcstats.demand_data_hits: 1138662563 > kstat.zfs.misc.arcstats.misses: 141432200 > kstat.zfs.misc.arcstats.hits: 1564670946 > kstat.zfs.misc.zcompstats.skipped_insufficient_gain: 4581339 > kstat.zfs.misc.zcompstats.empty: 842987 > kstat.zfs.misc.zcompstats.attempts: 121463608 > kstat.zfs.misc.zfetchstats.max_streams: 2029717049 > kstat.zfs.misc.zfetchstats.misses: 2043239863 > kstat.zfs.misc.zfetchstats.hits: 15544425 > kstat.zfs.misc.xuio_stats.write_buf_nocopy: 1453761 > kstat.zfs.misc.xuio_stats.write_buf_copied: 0 > kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0 > kstat.zfs.misc.xuio_stats.read_buf_copied: 0 > kstat.zfs.misc.xuio_stats.onloan_write_buf: 0 > kstat.zfs.misc.xuio_stats.onloan_read_buf: 0 > kstat.zfs.misc.zio_trim.failed: 0 > kstat.zfs.misc.zio_trim.unsupported: 867 > kstat.zfs.misc.zio_trim.success: 165333478 > kstat.zfs.misc.zio_trim.bytes: 9734294003712 > security.jail.param.allow.mount.zfs: 0 > security.jail.mount_zfs_allowed: 0 > > > I really don't know but could it be a conflict between the MB SATA > ports and LSI HBA?? As upon startup there do seem to be some ATA Error > messages in dmesg... > > So more of a physical HW issue then FS based? > > Or is it due to the term "bursty IO" that happens with ZFS... either > way I have been looking at this for months trying to figure things out > but other then a reboot nothing I do makes things better! Turning off > my monitoring jail does help on occasion but outside of that I'm lost. > > > I have another NAS based system with UFS root on SSD also yet ZPOOLs > over the various large mechanical drives but never run into this > particular issue! > > > Would anyone be able to help??? > > > Many thanks. > > > Kaya > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Sun Jun 19 23:29:26 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 13485A7A0B9 for ; Sun, 19 Jun 2016 23:29:26 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 8F99A2EBF; Sun, 19 Jun 2016 23:29:25 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:TEfmQRJsW/aqu/m6StmcpTZWNBhigK39O0sv0rFitYgUL//xwZ3uMQTl6Ol3ixeRBMOAu6MC2rad6fuocFdDyKjCmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TWM5DIfUi/yKRBybrysXNWC3oLmi6vooNX6WEZhunmUWftKNhK4rAHc5IE9oLBJDeIP8CbPuWZCYO9MxGlldhq5lhf44dqsrtY4q3wD86Fpy8kVc6Lgc60+BZxFBj4vKWx9sM/otTHCXRCe/WcRV35QmR1NVVvr9hb/C63wuSiyk+N22y2XOIWiV7U9Ujem4qJDVRjnlSoDLz5/+2iB2Z84t75SvB/0/083+IXTeozAbPc= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DYBACJKmdX/61jaINchBR9Bq5bjXwihXUCgV4RAQEBAQEBAQFkJ4IxghoBAQEDASMESAoFCwIBCA4KAgINGQICITYCBBMbh3sDDwgOrz6MHw2DXgEBAQcBAQEBI4EBhSaETYJDgWcCFIMBgloFjXWKTTSGBoYqhDGMa4gKh2sCNCCEDCAyiQVEfwEBAQ X-IronPort-AV: E=Sophos;i="5.26,495,1459828800"; d="scan'208";a="288352735" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 19 Jun 2016 19:29:14 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 406D415F5C0; Sun, 19 Jun 2016 19:29:14 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id CV-WxLD3aO4Y; Sun, 19 Jun 2016 19:29:13 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 245AC15F5C6; Sun, 19 Jun 2016 19:29:13 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id xQ7sZFtPMu2C; Sun, 19 Jun 2016 19:29:13 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id F35F115F5C0; Sun, 19 Jun 2016 19:29:12 -0400 (EDT) Date: Sun, 19 Jun 2016 19:29:12 -0400 (EDT) From: Rick Macklem To: Jordan Hubbard Cc: Chris Watson , freebsd-fs , Alexander Motin Message-ID: <1845469514.159182764.1466378952929.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com> Subject: Re: pNFS server Plan B MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF47 (Win)/8.0.9_GA_6191) Thread-Topic: pNFS server Plan B Thread-Index: 10x17g/wb7qkit7Lp7CytadquFv17Q== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Jun 2016 23:29:26 -0000 Jordan Hubbard wrote: >=20 > > On Jun 18, 2016, at 6:14 PM, Chris Watson wrote: > >=20 > > Since Jordan brought up clustering, I would be interested to hear Justi= n > > Gibbs thoughts here. I know about a year ago he was asked on an "after > > hours" video chat hosted by Matt Aherns about a feature he would really > > like to see and he mentioned he would really like, in a universe filled > > with time and money I'm sure, to work on a native clustering solution f= or > > FreeBSD. I don't know if he is subscribed to the list, and I'm certainl= y > > not throwing him under the bus by bringing his name up, but I know he h= as > > at least been thinking about this for some time and probably has some > > value to add here. >=20 > I think we should also be careful to define our terms in such a discussio= n. > Specifically: >=20 > 1. Are we talking about block-level clustering underneath ZFS (e.g. HAST = or > ${somethingElse}) or otherwise incorporated into ZFS itself at some low > level? If you Google for =E2=80=9CHigh-availability ZFS=E2=80=9D you wil= l encounter things > like RSF-1 or the somewhat more mysterious Zetavault > (http://www.zeta.systems/zetavault/high-availability/) but it=E2=80=99s n= ot entirely > clear how these technologies work, they simply claim to =E2=80=9Cscale-ou= t ZFS=E2=80=9D or > =E2=80=9Ccluster ZFS=E2=80=9D (which can be done within ZFS or one level = above and still > probably pass the Marketing Test for what people are willing to put on a = web > page). >=20 > 2. Are we talking about clustering at a slightly higher level, in a > filesystem-agnostic fashion which still preserves filesystem semantics? >=20 > 3. Are we talking about clustering for data objects, in a fashion which d= oes > not necessarily provide filesystem semantics (a sharding database which c= an > store arbitrary BLOBs would qualify)? >=20 For the pNFS use case I am looking at, I would say #2. I suspect #1 sits at a low enough level that redirecting I/O via the pNFS l= ayouts isn't practical, since ZFS is taking care of block allocations, etc. I see #3 as a separate problem space, since NFS deals with files and not ob= jects. However, GlusterFS maps file objects on top of the POSIX-like FS, so I supp= ose that could be done at the client end. (What glusterfs.org calls SwiftonFile, I t= hink?) It is also possible to map POSIX files onto file objects, but that sounds l= ike more work, which would need to be done under the NFS service. > For all of the above: Are we seeking to be compatible with any other > mechanisms, or are we talking about a FreeBSD-only solution? >=20 > This is why I brought up glusterfs / ceph / RiakCS in my previous comment= s - > when talking to the $users that Rick wants to involve in the discussion, > they rarely come to the table asking for =E2=80=9Csome or any sort of clu= stering, > don=E2=80=99t care which or how it works=E2=80=9D - they ask if I can off= er an S3 compatible > object store with horizontal scaling, or=20 > if they can use NFS in some > clustered fashion where there=E2=80=99s a single namespace offering petab= ytes of > storage with configurable redundancy such that no portion of that namespa= ce > is ever unavailable. >=20 I tend to think of this last case as the target for any pNFS server. The ba= sic idea is to redirect the I/O operations to wherever the data is actually sto= red, so that I/O performance doesn't degrade with scale. If redundancy is a necessary feature, then maybe Plan A is preferable to Pl= an B, since GlusterFS does provide for redundancy and resilvering of lost copies,= at least from my understanding of the docs on gluster.org. I'd also like to see how GlusterFS performs on a typical Linux setup. Even without having the nfsd use FUSE, access of GlusterFS via FUSE results= in crossing user (syscall on mount) --> kernel --> user (glusterfs daemon) within the c= lient machine, if I understand how GlusterFS works. Then the gluster brick server glusterf= sd daemon does file system syscall(s) to get at the actual file on the underlying FS (xfs = or ZFS or ...). As such, there is already a lot of user<->kernel boundary crossings. I wonder how much delay is added by the extra nfsd step for metadata? - I can't say much about performance of Plan A yet, but metadata operations= are slow and latency seems to be the issue. (I actually seem to get better perform= ance by disabling SMP, for example.) > I=E2=80=99d be interested in what Justin had in mind when he asked Matt a= bout this. > Being able to =E2=80=9Cattach ZFS pools to one another=E2=80=9D in such a= fashion that all > clients just see One Big Pool and ZFS=E2=80=99s own redundancy / snapshot= ting > characteristics magically apply to the =C3=BCberpool would be Pretty Cool= , > obviously, and would allow one to do round-robin DNS for NFS such that an= y > node could serve the same contents, but that also sounds pretty ambitious= , > depending on how it=E2=80=99s implemented. >=20 This would probably work with the extant nfsd and wouldn't have a use for p= NFS. I also agree that this sounds pretty ambitious. rick > - Jordan >=20 >=20 From owner-freebsd-fs@freebsd.org Mon Jun 20 01:54:46 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3636A79DCC for ; Mon, 20 Jun 2016 01:54:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 40D9F2549; Mon, 20 Jun 2016 01:54:45 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:VUEyRBMMkLwWEY8xV2sl6mtUPXoX/o7sNwtQ0KIMzox0KPn4rarrMEGX3/hxlliBBdydsKIVzbuK+P29EUU7or+/81k6OKRWUBEEjchE1ycBO+WiTXPBEfjxciYhF95DXlI2t1uyMExSBdqsLwaK+i760zceF13FOBZvIaytQ8iJ35Xxhrz5psCbSj4LrQT+SIs6FA+xowTVu5teqqpZAYF19CH0pGBVcf9d32JiKAHbtR/94sCt4MwrqHwI6Lpyv/JHBIf9ZagxS/R4ET4sOno5rJnpthnrTBuU92AAX2AN1BFPBl6Wwgv9W8LLsyD5/s900yqeMMi+GaoxUD+h66puYALvhzoKMyY5tmre3J8jxJlHqQ6s8kQsi7XfZ5uYYaJz X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2CtBABTTGdX/61jaINcFoN+fQa8VyKFdQKBYREBAQEBAQEBAWQngjGCGgEBAQIBASMEQQwFBQsCAQgOCgICDRkCAlcCBBMZAgSICQgOr0iQBwEBAQEGAQEBAQEBIYEBhSaETYQqAhSDAYJaBYYAknaGBoVohCVOhzGFOo91AjQggggcgWggMgGJBER/AQEB X-IronPort-AV: E=Sophos;i="5.26,496,1459828800"; d="scan'208";a="288359348" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 19 Jun 2016 21:54:23 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 7A3D515F5CB; Sun, 19 Jun 2016 21:54:23 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id z_5kHGqocULR; Sun, 19 Jun 2016 21:54:22 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 2677D15F5CE; Sun, 19 Jun 2016 21:54:22 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id V0YNgW5izuwd; Sun, 19 Jun 2016 21:54:22 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 06B0915F5CB; Sun, 19 Jun 2016 21:54:22 -0400 (EDT) Date: Sun, 19 Jun 2016 21:54:22 -0400 (EDT) From: Rick Macklem To: Jordan Hubbard Cc: freebsd-fs , Alexander Motin Message-ID: <1996808572.159331289.1466387661988.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> Subject: Re: pNFS server Plan B MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF47 (Win)/8.0.9_GA_6191) Thread-Topic: pNFS server Plan B Thread-Index: 20IRk/I/vFO7zBU673ZT0UPB7/0o2w== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 01:54:46 -0000 Jordan Hubbard wrote: >=20 > > On Jun 13, 2016, at 3:28 PM, Rick Macklem wrote: > >=20 > > You may have already heard of Plan A, which sort of worked > > and you could test by following the instructions here: > >=20 > > http://people.freebsd.org/~rmacklem/pnfs-setup.txt > >=20 > > However, it is very slow for metadata operations (everything other than > > read/write) and I don't think it is very useful. >=20 > Hi guys, >=20 > I finally got a chance to catch up and bring up Rick=E2=80=99s pNFS setup= on a couple > of test machines. He=E2=80=99s right, obviously - The =E2=80=9Cplan A=E2= =80=9D approach is a bit > convoluted and not at all surprisingly slow. With all of those transits > twixt kernel and userland, not to mention glusterfs itself which has not > really been tuned for our platform (there are a number of papers on this = we > probably haven=E2=80=99t even all read yet), we=E2=80=99re obviously stil= l in the =E2=80=9Cfirst > make it work=E2=80=9D stage. >=20 > That said, I think there are probably more possible plans than just A and= B > here, and we should give the broader topic of =E2=80=9Cwhat does FreeBSD = want to do > in the Enterprise / Cloud computing space?" at least some consideration a= t > the same time, since there are more than a few goals running in parallel > here. >=20 > First, let=E2=80=99s talk about our story around clustered filesystems + = associated > command-and-control APIs in FreeBSD. There is something of an embarrassm= ent > of riches in the industry at the moment - glusterfs, ceph, Hadoop HDFS, > RiakCS, moose, etc. All or most of them offer different pros and cons, a= nd > all offer more than just the ability to store files and scale =E2=80=9Cel= astically=E2=80=9D. > They also have ReST APIs for configuring and monitoring the health of the > cluster, some offer object as well as file storage, and Riak offers a > distributed KVS for storing information *about* file objects in addition = to > the object themselves (and when your application involves storing and > managing several million photos, for example, the idea of distributing th= e > index as well as the files in a fault-tolerant fashion is also compelling= ). > Some, if not most, of them are also far better supported under Linux than > FreeBSD (I don=E2=80=99t think we even have a working ceph port yet). I= =E2=80=99m not > saying we need to blindly follow the herds and do all the same things oth= ers > are doing here, either, I=E2=80=99m just saying that it=E2=80=99s a much = bigger problem > space than simply =E2=80=9Cparallelizing NFS=E2=80=9D and if we can kill = multiple birds with > one stone on the way to doing that, we should certainly consider doing so= . >=20 > Why? Because pNFS was first introduced as a draft RFC (RFC5661 > ) in 2005. The linux folks ha= ve > been working on it > si= nce > 2006. Ten years is a long time in this business, and when I raised the > topic of pNFS at the recent SNIA DSI conference (where storage developers > gather to talk about trends and things), the most prevalent reaction I go= t > was =E2=80=9Cpeople are still using pNFS?!=E2=80=9D This is clearly one= of those > technologies that may still have some runway left, but it=E2=80=99s been = rapidly > overtaken by other approaches to solving more or less the same problems i= n > coherent, distributed filesystem access and if we want to get mindshare f= or > this, we should at least have an answer ready for the =E2=80=9Cwhy did yo= u guys do > pNFS that way rather than just shimming it on top of ${someNewerHotness}?= ?=E2=80=9D > argument. I=E2=80=99m not suggesting pNFS is dead - hell, even AFS > still appears to be somewhat alive, but there= =E2=80=99s a > difference between appealing to an increasingly narrow niche and trying t= o > solve the sorts of problems most DevOps folks working At Scale these days > are running into. >=20 Here are a few pNFS papers from the Netapp and Panansas sites. They are dated 2012->2015: (these papers give a nice overview of what pNFS is) http://www.netapp.com/us/media/tr-4063.pdf http://www.netapp.com/us/media/tr-4239.pdf http://www.netapp.com/us/media/wp-7153.pdf http://www.panasas.com/products/pnfs-overview One of these notes that the first Linux distribution that shipped with pNFS support was RHEL6.4 in 2013. So, I have no idea if it will catch on, but I don't think it can be conside= red end of life. (Many use NFSv3 and its RFC is dated June 1995.) rick > That is also why I am not sure I would totally embrace the idea of a cent= ral > MDS being a Real Option. Sure, the risks can be mitigated (as you say, b= y > mirroring it), but even saying the words =E2=80=9Ccentral MDS=E2=80=9D (o= r central anything) > may be such a turn-off to those very same DevOps folks, folks who have be= en > burned so many times by SPOFs and scaling bottlenecks in large environmen= ts, > that we'll lose the audience the minute they hear the trigger phrase. Ev= en > if it means signing up for Other Problems later, it=E2=80=99s a lot easie= r to =E2=80=9Csell=E2=80=9D > the concept of completely distributed mechanisms where, if there is any > notion of centralization at all, it=E2=80=99s at least the result of a qu= orum > election and the DevOps folks don=E2=80=99t have to do anything manually = to cause it > to happen - the cluster is =E2=80=9Cresilient" and "self-healing" and the= y are happy > with being able to say those buzzwords to the CIO, who nods knowingly and > tells them they=E2=80=99re doing a fine job! >=20 > Let=E2=80=99s get back, however, to the notion of downing multiple avians= with the > same semi-spherical kinetic projectile: What seems to be The Rage at the > moment, and I don=E2=80=99t know how well it actually scales since I=E2= =80=99ve yet to be at > the pointy end of such a real-world deployment, is the idea of clustering > the storage (=E2=80=9Csomehow=E2=80=9D) underneath and then providing NFS= and SMB protocol > access entirely in userland, usually with both of those services cooperat= ing > with the same lock manager and even the same ACL translation layer. Our > buddies at Red Hat do this with glusterfs at the bottom and NFS Ganesha + > Samba on top - I talked to one of the Samba core team guys at SNIA and he > indicated that this was increasingly common, with the team having helped > here and there when approached by different vendors with the same idea. = We > (iXsystems) also get a lot of requests to be able to make the same file(s= ) > available via both NFS and SMB at the same time and they don=E2=80=99t mu= ch at all > like being told =E2=80=9Cbut that=E2=80=99s dangerous - don=E2=80=99t do = that! Your file contents > and permissions models are not guaranteed to survive such an experience!= =E2=80=9D > They really want to do it, because the rest of the world lives in > Heterogenous environments and that=E2=80=99s just the way it is. >=20 > Even the object storage folks, like Openstack=E2=80=99s Swift project, ar= e spending > significant amounts of mental energy on the topic of how to re-export the= ir > object stores as shared filesystems over NFS and SMB, the single consiste= nt > and distributed object store being, of course, Their Thing. They wish, o= f > course, that the rest of the world would just fall into line and use thei= r > object system for everything, but they also get that the "legacy stuff=E2= =80=9D just > won=E2=80=99t go away and needs some sort of attention if they=E2=80=99re= to remain players > at the standards table. >=20 > So anyway, that=E2=80=99s the view I have from the perspective of someone= who > actually sells storage solutions for a living, and while I could certainl= y > =E2=80=9Csell some pNFS=E2=80=9D to various customers who just want to ad= d a dash of > steroids to their current NFS infrastructure, or need to use NFS but also > need to store far more data into a single namespace than any one box will > accommodate, I also know that offering even more elastic solutions will b= e a > necessary part of offering solutions to the growing contingent of folks w= ho > are not tied to any existing storage infrastructure and have various > non-greybearded folks shouting in their ears about object this and cloud > that. Might there not be some compromise solution which allows us to put > more of this in userland with less context switches in and out of the > kernel, also giving us the option of presenting a more united front to > multiple protocols that require more ACL and lock impedance-matching than > we=E2=80=99d ever want to put in the kernel anyway? >=20 > - Jordan >=20 >=20 >=20 >=20 From owner-freebsd-fs@freebsd.org Mon Jun 20 03:27:37 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9287A7BEE1 for ; Mon, 20 Jun 2016 03:27:37 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A20A22CCC for ; Mon, 20 Jun 2016 03:27:37 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5K3Rb2o032580 for ; Mon, 20 Jun 2016 03:27:37 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 210347] External USB Raidz2 zpool cause zfsloader auto reboot Date: Mon, 20 Jun 2016 03:27:37 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: 10.3-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: chunlinyao@gmail.com X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 03:27:37 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D210347 --- Comment #1 from chunlinyao@gmail.com --- This error is dependent on the Mobo. I installed that internal disk and external enclosure to another PC, It works fine. Maybe there's a bug in the bios or zfsloader. I think it is hard to reproduce on other environment. If anyone require detail information, I will test it on my environment. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Mon Jun 20 10:01:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 557CBA79CDB for ; Mon, 20 Jun 2016 10:01:42 +0000 (UTC) (envelope-from dfr@rabson.org) Received: from mail-ob0-x22a.google.com (mail-ob0-x22a.google.com [IPv6:2607:f8b0:4003:c01::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1EF2228E2 for ; Mon, 20 Jun 2016 10:01:42 +0000 (UTC) (envelope-from dfr@rabson.org) Received: by mail-ob0-x22a.google.com with SMTP id ot10so64983obb.2 for ; Mon, 20 Jun 2016 03:01:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rabson-org.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=yAkun73SYRxvnX59kJLizbLIUNLVd8MJkWfR0VxBL4w=; b=FRk3O6nyS/ykLf2hwzzsrql74uTC61y2bl4xPs8A+0wBsgQGymHAyRnIgEyKEzGmEb iQKK6n/Jq8hLhIam31rvSssR7U2RfFPD4ZKqkglPkUMoXbvq32EllwpDaEt8HPVZMRjn C3cng59DwD8MgMPcOISkUyM74WYA8gwVEVNEALDo1YG2GdVx+M/w2N9YRk/6sNsgVvQq Zu7CpG7wBEBAxMCJIe6mENd3ruWCwuxXe0u2DWLj76tst23t3To+muNS2GSKyNrYyaTX /cKzD0Axqpqg3yataJtEvwVLA6PmnOXsi7yUEZttBSLFhTPcU3lU9/vmzO+NMyYSkJ8k kAqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=yAkun73SYRxvnX59kJLizbLIUNLVd8MJkWfR0VxBL4w=; b=TpAt4FOdLZ9IS1gdYFbLnTMAgUQ6xS/Ro+MOD3h8Y017mv8JGcrOeH5sM9F9E6zG2z w8WJqR3hNxxbtG7z2d8lynHpoQGHgdT7zZuy1XovWfB2Tpj9KwsDcb190Khb7rAYN3ow LqsjoljDKAkFd5kFwMr0Eyk0Vav+x8/PayFZ3Bg/iG3TgGvrboVIOcuo/y9+lmeH/ujM 4biJUAp0EyzZyXXEwCf6I8OyUFdP0P8nfq6K3EYIYj7O5asYSIf8LsND9ccXi3jZGfBl lUQGGcXxvFBXrrRecfgYUPdpDxfOoDZvm5z6whqPJe8o9iLKR8UgouHS1+pPCyZB+owA gJkA== X-Gm-Message-State: ALyK8tJf7xOy3foZQxeduNndci494yHZYG8x15WHWfV1DnVQBrRBrozGGIgqgOM9gFvDcCeDiVOBHspHVm999A== X-Received: by 10.157.31.82 with SMTP id x18mr2275600otx.137.1466416901259; Mon, 20 Jun 2016 03:01:41 -0700 (PDT) MIME-Version: 1.0 Received: by 10.182.22.68 with HTTP; Mon, 20 Jun 2016 03:01:40 -0700 (PDT) In-Reply-To: References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> From: Doug Rabson Date: Mon, 20 Jun 2016 11:01:40 +0100 Message-ID: Subject: Re: pNFS server Plan B To: Jordan Hubbard Cc: Rick Macklem , freebsd-fs , Alexander Motin Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 10:01:42 -0000 On 18 June 2016 at 21:50, Jordan Hubbard wrote: > > > On Jun 13, 2016, at 3:28 PM, Rick Macklem wrote: > > > > You may have already heard of Plan A, which sort of worked > > and you could test by following the instructions here: > > > > http://people.freebsd.org/~rmacklem/pnfs-setup.txt > > > > However, it is very slow for metadata operations (everything other than > > read/write) and I don't think it is very useful. > > Hi guys, > > I finally got a chance to catch up and bring up Rick=E2=80=99s pNFS setup= on a > couple of test machines. He=E2=80=99s right, obviously - The =E2=80=9Cpl= an A=E2=80=9D approach is > a bit convoluted and not at all surprisingly slow. With all of those > transits twixt kernel and userland, not to mention glusterfs itself which > has not really been tuned for our platform (there are a number of papers = on > this we probably haven=E2=80=99t even all read yet), we=E2=80=99re obviou= sly still in the > =E2=80=9Cfirst make it work=E2=80=9D stage. > > That said, I think there are probably more possible plans than just A and > B here, and we should give the broader topic of =E2=80=9Cwhat does FreeBS= D want to > do in the Enterprise / Cloud computing space?" at least some consideratio= n > at the same time, since there are more than a few goals running in parall= el > here. > > First, let=E2=80=99s talk about our story around clustered filesystems + > associated command-and-control APIs in FreeBSD. There is something of an > embarrassment of riches in the industry at the moment - glusterfs, ceph, > Hadoop HDFS, RiakCS, moose, etc. All or most of them offer different pro= s > and cons, and all offer more than just the ability to store files and sca= le > =E2=80=9Celastically=E2=80=9D. They also have ReST APIs for configuring = and monitoring the > health of the cluster, some offer object as well as file storage, and Ria= k > offers a distributed KVS for storing information *about* file objects in > addition to the object themselves (and when your application involves > storing and managing several million photos, for example, the idea of > distributing the index as well as the files in a fault-tolerant fashion i= s > also compelling). Some, if not most, of them are also far better support= ed > under Linux than FreeBSD (I don=E2=80=99t think we even have a working ce= ph port > yet). I=E2=80=99m not saying we need to blindly follow the herds and do= all the > same things others are doing here, either, I=E2=80=99m just saying that i= t=E2=80=99s a much > bigger problem space than simply =E2=80=9Cparallelizing NFS=E2=80=9D and = if we can kill > multiple birds with one stone on the way to doing that, we should certain= ly > consider doing so. > > Why? Because pNFS was first introduced as a draft RFC (RFC5661 < > https://datatracker.ietf.org/doc/rfc5661/>) in 2005. The linux folks > have been working on it < > http://events.linuxfoundation.org/sites/events/files/slides/pnfs.pdf> > since 2006. Ten years is a long time in this business, and when I raised > the topic of pNFS at the recent SNIA DSI conference (where storage > developers gather to talk about trends and things), the most prevalent > reaction I got was =E2=80=9Cpeople are still using pNFS?!=E2=80=9D This= is clearly one of > those technologies that may still have some runway left, but it=E2=80=99s= been > rapidly overtaken by other approaches to solving more or less the same > problems in coherent, distributed filesystem access and if we want to get > mindshare for this, we should at least have an answer ready for the =E2= =80=9Cwhy > did you guys do pNFS that way rather than just shimming it on top of > ${someNewerHotness}??=E2=80=9D argument. I=E2=80=99m not suggesting pNF= S is dead - hell, > even AFS still appears to be somewhat alive, > but there=E2=80=99s a difference between appealing to an increasingly nar= row niche > and trying to solve the sorts of problems most DevOps folks working At > Scale these days are running into. > > That is also why I am not sure I would totally embrace the idea of a > central MDS being a Real Option. Sure, the risks can be mitigated (as yo= u > say, by mirroring it), but even saying the words =E2=80=9Ccentral MDS=E2= =80=9D (or central > anything) may be such a turn-off to those very same DevOps folks, folks w= ho > have been burned so many times by SPOFs and scaling bottlenecks in large > environments, that we'll lose the audience the minute they hear the trigg= er > phrase. Even if it means signing up for Other Problems later, it=E2=80= =99s a lot > easier to =E2=80=9Csell=E2=80=9D the concept of completely distributed me= chanisms where, if > there is any notion of centralization at all, it=E2=80=99s at least the r= esult of a > quorum election and the DevOps folks don=E2=80=99t have to do anything ma= nually to > cause it to happen - the cluster is =E2=80=9Cresilient" and "self-healing= " and they > are happy with being able to say those buzzwords to the CIO, who nods > knowingly and tells them they=E2=80=99re doing a fine job! > My main reason for liking NFS is that it has decent client support in upstream Linux. One reason I started working on pNFS was that at $work our existing cluster filesystem product which uses a proprietary client protocol caused us to delay OS upgrades for months while we waited for $vendor to port their client code to RHEL7. The NFS protocol is well documented with several accessible reference implementations and pNFS gives enough flexibility to support a distributed filesystem at an interesting scale. You mention a 'central MDS' as being an issue. I'm not going to go through your list but at least HDFS also has this 'issue' and it doesn't seem to be a problem for many users storing >100 Pb across >10^5 servers. In practice, the MDS would be replicated for redundancy - there are lots of approaches for this, my preference being Paxos but Raft would work just as well. Google's GFS also followed this model and was an extremely reliable large scale filesystem. I am building an MDS as a layer on top of a key/value database which offers the possibility of moving the backing store to some kind of distributed key/value store in future which would remove the scaling and reliability concerns. > > Let=E2=80=99s get back, however, to the notion of downing multiple avians= with the > same semi-spherical kinetic projectile: What seems to be The Rage at the > moment, and I don=E2=80=99t know how well it actually scales since I=E2= =80=99ve yet to be > at the pointy end of such a real-world deployment, is the idea of > clustering the storage (=E2=80=9Csomehow=E2=80=9D) underneath and then pr= oviding NFS and > SMB protocol access entirely in userland, usually with both of those > services cooperating with the same lock manager and even the same ACL > translation layer. Our buddies at Red Hat do this with glusterfs at the > bottom and NFS Ganesha + Samba on top - I talked to one of the Samba core > team guys at SNIA and he indicated that this was increasingly common, wit= h > the team having helped here and there when approached by different vendor= s > with the same idea. We (iXsystems) also get a lot of requests to be abl= e > to make the same file(s) available via both NFS and SMB at the same time > and they don=E2=80=99t much at all like being told =E2=80=9Cbut that=E2= =80=99s dangerous - don=E2=80=99t do > that! Your file contents and permissions models are not guaranteed to > survive such an experience!=E2=80=9D They really want to do it, because = the rest > of the world lives in Heterogenous environments and that=E2=80=99s just t= he way it > is. > > Even the object storage folks, like Openstack=E2=80=99s Swift project, ar= e > spending significant amounts of mental energy on the topic of how to > re-export their object stores as shared filesystems over NFS and SMB, the > single consistent and distributed object store being, of course, Their > Thing. They wish, of course, that the rest of the world would just fall > into line and use their object system for everything, but they also get > that the "legacy stuff=E2=80=9D just won=E2=80=99t go away and needs some= sort of attention > if they=E2=80=99re to remain players at the standards table. > > So anyway, that=E2=80=99s the view I have from the perspective of someone= who > actually sells storage solutions for a living, and while I could certainl= y > =E2=80=9Csell some pNFS=E2=80=9D to various customers who just want to ad= d a dash of > steroids to their current NFS infrastructure, or need to use NFS but also > need to store far more data into a single namespace than any one box will > accommodate, I also know that offering even more elastic solutions will b= e > a necessary part of offering solutions to the growing contingent of folks > who are not tied to any existing storage infrastructure and have various > non-greybearded folks shouting in their ears about object this and cloud > that. Might there not be some compromise solution which allows us to put > more of this in userland with less context switches in and out of the > kernel, also giving us the option of presenting a more united front to > multiple protocols that require more ACL and lock impedance-matching than > we=E2=80=99d ever want to put in the kernel anyway? > I can agree with this - everything I'm working on is in userland. Given that I'm not trying to export a local filesystem most of the reasons for wanting a kernel implementation disappear. Adding support for NFS over RDMA removes all the network context switching and for frequently accessed data would typically be served out of a userland cache which removes the rest of the context switches. From owner-freebsd-fs@freebsd.org Mon Jun 20 15:00:43 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E0269AC43D5 for ; Mon, 20 Jun 2016 15:00:43 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from venus.codepro.be (venus.codepro.be [IPv6:2a01:4f8:162:1127::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.codepro.be", Issuer "Gandi Standard SSL CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AB6E82A3F for ; Mon, 20 Jun 2016 15:00:43 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from [150.158.232.236] (114.40-78-194.adsl-static.isp.belgacom.be [194.78.40.114]) (Authenticated sender: kp) by venus.codepro.be (Postfix) with ESMTPSA id DC7A81EF71 for ; Mon, 20 Jun 2016 17:00:41 +0200 (CEST) From: "Kristof Provost" To: freebsd-fs@freebsd.org Subject: Re: Panic with Root on ZFS Date: Mon, 20 Jun 2016 17:00:40 +0200 Message-ID: <614F3551-AC9B-4D99-9811-2E0092152FB0@FreeBSD.org> In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Mailer: MailMate (1.9.4r5234) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 15:00:44 -0000 (In case this affects other people too) On 15 Jun 2016, at 21:38, Kristof Provost wrote: > I’ running a root-on-ZFS system and reliably see this panic during boot: > It’s a raidz vdev. The faulting kernel is r301916 (head). > The last version known to boot it r299060 (head). > After bisecting it looks like r300881 is responsible for the panic. Backing it out lets me boot with r302028. Regards, Kristof From owner-freebsd-fs@freebsd.org Mon Jun 20 21:05:32 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4C8E6AC47B0; Mon, 20 Jun 2016 21:05:32 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-yw0-x230.google.com (mail-yw0-x230.google.com [IPv6:2607:f8b0:4002:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0E1C729C6; Mon, 20 Jun 2016 21:05:32 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: by mail-yw0-x230.google.com with SMTP id i12so37812897ywa.1; Mon, 20 Jun 2016 14:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to; bh=zhxF3O3Z4/iFA/VXcurrF8ydkC4MdzPnyxlEFUiqBd8=; b=ZogEQ7KHgt6uN8lDT8KBROYkdr5lU+YJoB7uM+wSvFTL5p4zKK7lrI3MGdTmHHidCQ 1+ewLZY8+ba8i7WULExMGIztbBMJwfvBPBbnhkACWWYVQSJwKZ8mFrtGV8KPs2sVEjya HmW8ZxpWXzINlejrmBbmAatukz0VqAPWColP2XKoikKp3jLKgvHgaS3nynpa/bL47Tev 83/gFIPjdocfCJ3MGk+EU5is2kct3HqFc0qC9l15axKWlAAedOD80FyxxI0nBYB1pmrP BBN1GgbM4Aj8o+g69QDJuZfmZIGUk0zx4jMqH+QBazbLajSV1z725xSm57T/mZz17hpF U8qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=zhxF3O3Z4/iFA/VXcurrF8ydkC4MdzPnyxlEFUiqBd8=; b=AHlF6KhuxWSKL4BYHmnBVCuGPl/4ur44rIPTQzQ/U4F3wTroveoexoWiiB1+AM223F 7KH0v04su2FESbxfOKaPHdQv0AiLMDNqruPmWYn//DHkkyrIMhVpSF5tkZw2YJMRFC1T TxmR2+fObPp/BxRj2fNYwEguzpi16mjtDq1ym/EmDYiekX5EI/rrd8HrOvYilFxO4jpV ZGZh3OB0+o/RxWjeI1jtmO94cNalImUYSU9GW1sTn38FrO20eOdWBkpcCcvIsc98bU4w 53tw528k4xZWhZV0AOiNlU6tKA3/vzOBFCGmiQ/QxJykUUu4OBxGYcKSAcD770blfoX0 Mbsw== X-Gm-Message-State: ALyK8tIaAGLTj6PMHbsc0U71wcs4JJQJzebr6VCdJ9dyBhXF6Yw2KmJ5+/2HNHX2SRU94hUJgm59tylMCWetRQ== X-Received: by 10.13.221.204 with SMTP id g195mr5655025ywe.238.1466456731187; Mon, 20 Jun 2016 14:05:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.27.130 with HTTP; Mon, 20 Jun 2016 14:05:30 -0700 (PDT) From: Zaphod Beeblebrox Date: Mon, 20 Jun 2016 17:05:30 -0400 Message-ID: Subject: The small installations network filesystem and users. To: freebsd-fs , FreeBSD Hackers Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 21:05:32 -0000 Correct me if I'm wrong, but amidst discussions of pNFS (among other things) I thought I should bring up something someone said to me: and that is (to quote him) "using NFS is too hard, I always fail." I can empathize (although I know better) with this statement. I've been using NFS since v2 was a "new thing." Rick Maclem was the sysadmin at my University. So here's the thing. SMB is easier to implement than NFSv4. NFSv3 is easier to implement than v4. In general, even though I know what is required, I implement SMB or v3 rather than v4... which means I'm better off than my friend: he just does without network filesystems. Back-in-the-day, (1995-ish) I worked for an outfit that released on some 30 odd platforms including VMS. We had /d// mounted on every machine. Besides the fact that power outages were a bit of a nightmare (many machines didn't recover well if their NFS imports were not yet ready), This worked well and you could access your home directory on any machine from any other machine. The company never really had the money to have a proper home directory server ... and generally that ended up being your own workstation... and we worked on satellite imagery ... so disks were always full... and the backbone was 10Base2... But just networking 2 FreeBSD boxes' filesystems seems harder than that lot back then. Add in a couple linux boxes and something from M$, and you're into the territory where you just scp files around. I get the fact that network authentication is hard. I get that this is the problem. I've made 3 or 4 serious runs at LDAP ... but I haven't gotten it working. Is it time we (FreeBSD) had a solution that at least worked? Something ever-so-close-to turnkey? I've we're looking at the other more complex adoptions (like pNFS and ZFS and whatnot) ... it would seem that we should ship something that has a chance of working. From owner-freebsd-fs@freebsd.org Mon Jun 20 22:11:20 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72AE5AC451C; Mon, 20 Jun 2016 22:11:20 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: from mail-ob0-x235.google.com (mail-ob0-x235.google.com [IPv6:2607:f8b0:4003:c01::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 29314297F; Mon, 20 Jun 2016 22:11:20 +0000 (UTC) (envelope-from bsdunix44@gmail.com) Received: by mail-ob0-x235.google.com with SMTP id mu6so1173302obc.3; Mon, 20 Jun 2016 15:11:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=references:mime-version:in-reply-to:content-transfer-encoding :message-id:cc:from:subject:date:to; bh=pd4D1vJ1e3sfx0cwKLev4YCZDbMTLfP2qfUMdoUdGpo=; b=PiWdI4Br8XjsjNfBVPqpAxWS1M56zy/7Ph+YoPg6jAiQyF873FwlDIWM7m3A7tx2vs NHHs17QGTdJq6d8NUI/QNtvtOVF+QPEKeSGWQdt9AiRuLV5Notn7RqbZDh+rtFAe99Og RMeK6TR7qqdecm20U1o2OpZMbsBb4FfjGSTriH1VCRtXDRKiAQgljksfvCZX3FZEGKO7 mv7CBxV4Lz7xUp/HNnvTbOPdHPLOKL3PefSmucI9XkmJN8lFVTQtt6x+o4S5Q+HJdTQj R0qGdNZepMSMjkLbVzSx1ZQVPQR7ieFCe1vdGflvHxWZ/48FowKdMSw7l2AD7N2Jkebi ej1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:references:mime-version:in-reply-to :content-transfer-encoding:message-id:cc:from:subject:date:to; bh=pd4D1vJ1e3sfx0cwKLev4YCZDbMTLfP2qfUMdoUdGpo=; b=Y7HziGyzL4L5iAt7sBBXDgd+88+wfz7Oc6/bROPF9Q75yDCs2EyOfqKx2hW2FxOiRM t5WYzD4RqvEJH/nSAj/ukPoxmVDV2BHQePS1uGFO1JuxzRBY0+ocNCuR2DQKD7p36jpX 8DGbGjaBzwWMhWiNR9yhHPi8jyzRgjT8JORqBaplX9NVNsEC6NmMRUMBcF2fXy6nETGc 4t/7zmNctksyhZVneBpEr3oUB6MxjZD034QHX81nKEV2xODgzrx4NIQt1aX5XCi2ycrB tr7BI7ahA03ti1ZZHvu5RfvFQnzFMfJULZ2WSfO9R3lmuZyTZwwolxrbpsRz+Ekeexj3 Wm4g== X-Gm-Message-State: ALyK8tK9k4a7AlrBFNqw0szqRWQABjoroVKYk+lWAAhCbq2Z2kX2ALhDWwuo8buMTDpglw== X-Received: by 10.157.38.185 with SMTP id l54mr12707874otb.112.1466460678994; Mon, 20 Jun 2016 15:11:18 -0700 (PDT) Received: from [192.168.0.10] (cpe-70-118-225-173.kc.res.rr.com. [70.118.225.173]) by smtp.gmail.com with ESMTPSA id e48sm4762426otc.23.2016.06.20.15.11.17 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 20 Jun 2016 15:11:18 -0700 (PDT) References: Mime-Version: 1.0 (1.0) In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <9BB7E8B3-EC0E-457E-B2B2-FB80B1CF02B0@gmail.com> Cc: freebsd-fs , FreeBSD Hackers X-Mailer: iPhone Mail (13F69) From: Chris Watson Subject: Re: The small installations network filesystem and users. Date: Mon, 20 Jun 2016 17:11:16 -0500 To: Zaphod Beeblebrox X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jun 2016 22:11:20 -0000 I'm glad you brought this up. I wanted to but I've heard it before on the li= sts and realize that there is this disconnect between the developers doing t= he actual work to implement these things and the end users.=20 I have always been very grateful to all the developers who over the years, a= nd I've been a FreeBSD consumer since the late 90s? And attended my first us= enix/freenix conf in Monterrey in 2001?, have done some really hard work on m= any many things in FreeBSD. For zero pay. But the thing that has always both= ered me about a lot of it is, it's just to complex to use for most end users= . Not all. But people want to get work done. Sifting through .conf files, go= ogling howtos, spending more time configuring it than installing it has alwa= ys been an issue. Developers in general do not think like an end user. And t= his leads to non developers just going "screw it I'll just get it running on= Linux with my GUI installer." Which is why FreeNas is so popular. It's take= n a lot, not all, but a lot of the pain and time consuming nature of learnin= g all the ins and outs of a NAS appliance from the equation.=20 It's wonderful to have flexibility and lord knows there are plenty of option= s and flags for most software. ZFS took a lot of pain out of file systems an= d volume management. I remember in 2001 staring at an HPUX box trying to fig= ure out Its volume manager and truth be told I never did and wanted to stick= my head in a meat grinder. It would have been less painful. I don't know if= the problem is simply writing things that are simple and optionally complex= is hard? Or if the people doing the work just want it to work for them and d= on't really want to take even more time to sit down and actually consider th= e software and its management from a users/consumers viewpoint.=20 There was a photo from bsdcan this year of a "sysadmin spotting" shirt. If y= ou read the text on it you actually begin to see how systemic and difficult a= ctually using and configuring most software is. It's probably a good reason m= ost developers use macs. In addition to better HW support. I'm not sure what= the solution to this is. I think it would be great if beta testers and the d= evelopers had a closer connection and issues were handles in a timely manner= . But in a volunteer project I get why that is unreasonable. But I mean go t= hrough the bug database and you can see PRs that are years old. I don't know= . I just know I'm getting to old to spend all day beating my head against so= ftware to get it working. Honestly if I have to spend over an hour reading c= rap docs all over the net because your manpage make no sense or is vague, tr= ying to configure the software, your software sucks and I'm rm'ing it. I rec= ently went through this with opensmtpd. I went right back to postfix. And al= l over something as simple or should be as simple as mail aliases!=20 Chris Sent from my iPhone 5 > On Jun 20, 2016, at 4:05 PM, Zaphod Beeblebrox wrote: >=20 > Correct me if I'm wrong, but amidst discussions of pNFS (among other > things) I thought I should bring up something someone said to me: and that= > is (to quote him) "using NFS is too hard, I always fail." >=20 > I can empathize (although I know better) with this statement. I've been > using NFS since v2 was a "new thing." Rick Maclem was the sysadmin at my > University. >=20 > So here's the thing. SMB is easier to implement than NFSv4. NFSv3 is > easier to implement than v4. In general, even though I know what is > required, I implement SMB or v3 rather than v4... which means I'm better > off than my friend: he just does without network filesystems. >=20 > Back-in-the-day, (1995-ish) I worked for an outfit that released on some 3= 0 > odd platforms including VMS. We had /d// mounted on every > machine. Besides the fact that power outages were a bit of a nightmare > (many machines didn't recover well if their NFS imports were not yet > ready), This worked well and you could access your home directory on any > machine from any other machine. The company never really had the money to= > have a proper home directory server ... and generally that ended up being > your own workstation... and we worked on satellite imagery ... so disks > were always full... and the backbone was 10Base2... >=20 > But just networking 2 FreeBSD boxes' filesystems seems harder than that lo= t > back then. Add in a couple linux boxes and something from M$, and you're > into the territory where you just scp files around. >=20 > I get the fact that network authentication is hard. I get that this is th= e > problem. I've made 3 or 4 serious runs at LDAP ... but I haven't gotten i= t > working. Is it time we (FreeBSD) had a solution that at least worked? > Something ever-so-close-to turnkey? >=20 > I've we're looking at the other more complex adoptions (like pNFS and ZFS > and whatnot) ... it would seem that we should ship something that has a > chance of working. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Tue Jun 21 00:58:19 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AD441A7AA39; Tue, 21 Jun 2016 00:58:19 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: from mail-yw0-x230.google.com (mail-yw0-x230.google.com [IPv6:2607:f8b0:4002:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6D91713D9; Tue, 21 Jun 2016 00:58:19 +0000 (UTC) (envelope-from zbeeble@gmail.com) Received: by mail-yw0-x230.google.com with SMTP id i12so1035353ywa.1; Mon, 20 Jun 2016 17:58:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=BlkcvIv/mNI6Yc7HidWq/8qV8OyrMptLR56uH23xosI=; b=D/UipZ2CdGHVURfkoIMxt8xOqKi3x8c+YHw2jmJwfvk0t4GLxl7jJ/95Z3MIqRzFWJ UVDRGEgITHVZILUqPqGrecjmo+whIJmSFVCACaAI6JuB4ElJ+HHS4TwU0LvLE0PuFbtJ EswAyPDF3vrvC3E6VoUQvpM/u/FXpWT5wtlR/kO/6Ic/kfyN0AG/IH4eazF1/bkaopRT +RuqfFVk9hHJeBFVm6wnf86UXGnvLr2KFw83dr65rCFnRIHZn0Stjzz2nU/QPaOjrY6W LlOrTnEaqKiQ5QdAUXwnTzEaCKI9UbZjoRbiFhw5FJhRu/ARlTMzwPFsJmzXoS18EtUn kp2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=BlkcvIv/mNI6Yc7HidWq/8qV8OyrMptLR56uH23xosI=; b=CNwfCFuE60GSZzVWtBSkNCzPAKELTs7DXzIazV+gr8aZ6/PnooC8YO/AO4yn5gtiTX 9IodCmvyRNEsUe3afw4EOnjsap+L7flQ2eQZG8vBvwH8a6SdAZrtOJCDNZJDcxz9/qmc srbho8kx8/zXTIgiUDRnPfYnpabKY0XZuaYq8hcSb3mbwezFiPATUqxPb0UDg/qkctlU YrdmeBspL2+wgkvERmX4MvPlFPFhG1bAzJrkd3zs46FiLJzEpCzMSuWFfA1K8opaQW8P LioggofBK56j69axvqVHzCpkD6G4O+dVDHNkhV0TCOB9//l0O1QmB+c/gBcUnM8s4Td0 AXIg== X-Gm-Message-State: ALyK8tK/lTVXHjkwR69YeDdzS0ZLMCDEqgRB0gFccEwmIv24gGmqjmhK2bmxL2GpIhLh+6kT+nEUjKNwL7jkZw== X-Received: by 10.129.145.136 with SMTP id i130mr5702225ywg.1.1466470698674; Mon, 20 Jun 2016 17:58:18 -0700 (PDT) MIME-Version: 1.0 Received: by 10.37.27.130 with HTTP; Mon, 20 Jun 2016 17:58:18 -0700 (PDT) In-Reply-To: <9BB7E8B3-EC0E-457E-B2B2-FB80B1CF02B0@gmail.com> References: <9BB7E8B3-EC0E-457E-B2B2-FB80B1CF02B0@gmail.com> From: Zaphod Beeblebrox Date: Mon, 20 Jun 2016 20:58:18 -0400 Message-ID: Subject: Re: The small installations network filesystem and users. To: Chris Watson Cc: freebsd-fs , FreeBSD Hackers Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 00:58:19 -0000 On Mon, Jun 20, 2016 at 6:11 PM, Chris Watson wrote: > I'm glad you brought this up. I wanted to but I've heard it before on the > lists and realize that there is this disconnect between the developers > doing the actual work to implement these things and the end users. > > [...] > > There was a photo from bsdcan this year of a "sysadmin spotting" shirt. If > you read the text on it you actually begin to see how systemic and > difficult actually using and configuring most software is. It's probably a > good reason most developers use macs. In addition to better HW support. I'm > not sure what the solution to this is. I think it would be great if beta > testers and the developers had a closer connection and issues were handles > in a timely manner. But in a volunteer project I get why that is > unreasonable. But I mean go through the bug database and you can see PRs > that are years old. I don't know. I just know I'm getting to old to spend > all day beating my head against software to get it working. Honestly if I > have to spend over an hour reading crap docs all over the net because your > manpage make no sense or is vague, trying to configure the software, your > software sucks and I'm rm'ing it. I recently went through this with > opensmtpd. I went right back to postfix. And all over something as simple > or should be as simple as mail aliases! > > Not exactly where I expected this post to go, but for the record, I was at BSDCan this year. When I can get my head around something, I have submitted patches (ethernet drivers, netgraph, softupdate bugs (back-in-the-day), many ports and a few userland utilities). I'm not exactly a user who chucks things and installs linux. I even run a full on ADSL-providing ISP on FreeBSD without help from any non-FreeBSD product other than my core switch. That-all-said, authentication is a possible huge win. I was recently involved in a deployment of ubuntu that included LDAP and even though it was a mess, it eventually was hammered into working. Ubuntu and the implementation were not my choice, but you do-what-you're-told when someone else is paying the bill. Honestly, I don't know how I would have pitched FreeBSD there. Not even ubuntu itself had LDAP right. It was a combination of third parties. Even with that gigantic head start, LDAP was a bear --- but AFAICT, LDAP is _required_ for NFSv4 deployments. Now, LDAP without Winblows is slightly less of a bear, _but_ Maybe this dovetails with a subtext at BSDCan's keysigning BOF: that many projects risk irrelevance with their complexity. It's not that I believe complex setups are bad. But simple things need be simple. I have 3 machines at home (for instance) and a cluster of 8 machines in colo (run the ISP). On my 3 machines at home, I run NFSv3 because it works and I can get it setup. I'd like to run NFSv4 because then my windows machines would look at it, but I run SMB instead (v3, no less) because it roughly works. So at home... I have three machines and a fairly liberal hacking time budget. I have failed at LDAP several times. I'm back to copying the master.password file around because that works. I don't like it, but it works. It seems like the breakeven for LDAP effort vs. scp master.password is somewhere around 50 machines. -ish. I realize the real problem is that authentication has become more complex in the world since networks can't be trusted. I have to wonder if we're getting back closer to that now with all the tunneling on wifi and campus networks. Sigh. I'm starting to feel like this whole post has no purpose. From owner-freebsd-fs@freebsd.org Tue Jun 21 02:11:17 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7C4E0A7AAB6; Tue, 21 Jun 2016 02:11:17 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from mail.netplex.net (mail.netplex.net [204.213.176.9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.netplex.net", Issuer "RapidSSL SHA256 CA - G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 33CD319F5; Tue, 21 Jun 2016 02:11:17 +0000 (UTC) (envelope-from deischen@freebsd.org) Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.15.1/8.15.1/NETPLEX) with ESMTP id u5L20VdP025342; Mon, 20 Jun 2016 22:00:31 -0400 X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net) X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.4.3 (mail.netplex.net [204.213.176.9]); Mon, 20 Jun 2016 22:00:31 -0400 (EDT) Date: Mon, 20 Jun 2016 22:00:31 -0400 (EDT) From: Daniel Eischen X-X-Sender: eischen@sea.ntplx.net Reply-To: Daniel Eischen To: Zaphod Beeblebrox cc: Chris Watson , freebsd-fs , FreeBSD Hackers Subject: Re: The small installations network filesystem and users. In-Reply-To: Message-ID: References: <9BB7E8B3-EC0E-457E-B2B2-FB80B1CF02B0@gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 02:11:17 -0000 On Mon, 20 Jun 2016, Zaphod Beeblebrox wrote: > > That-all-said, authentication is a possible huge win. I was recently > involved in a deployment of ubuntu that included LDAP and even though it > was a mess, it eventually was hammered into working. Ubuntu and the > implementation were not my choice, but you do-what-you're-told when someone > else is paying the bill. Honestly, I don't know how I would have pitched > FreeBSD there. Not even ubuntu itself had LDAP right. It was a > combination of third parties. Even with that gigantic head start, LDAP was > a bear --- but AFAICT, LDAP is _required_ for NFSv4 deployments. Now, LDAP > without Winblows is slightly less of a bear, _but_ > > Maybe this dovetails with a subtext at BSDCan's keysigning BOF: that many > projects risk irrelevance with their complexity. It's not that I believe > complex setups are bad. But simple things need be simple. I have 3 > machines at home (for instance) and a cluster of 8 machines in colo (run > the ISP). On my 3 machines at home, I run NFSv3 because it works and I can > get it setup. I'd like to run NFSv4 because then my windows machines would > look at it, but I run SMB instead (v3, no less) because it roughly works. > So at home... I have three machines and a fairly liberal hacking time > budget. I have failed at LDAP several times. I'm back to copying the > master.password file around because that works. I don't like it, but it > works. It seems like the breakeven for LDAP effort vs. scp master.password > is somewhere around 50 machines. -ish. > > I realize the real problem is that authentication has become more complex > in the world since networks can't be trusted. I have to wonder if we're > getting back closer to that now with all the tunneling on wifi and campus > networks. Sigh. I'm starting to feel like this whole post has no purpose. We should support LDAP client out of the box, in base. What sucks now is that we need 3 packages (plus their dependencies) and multiple config files for ldap: pam_ldap nss_ldap openldap-client And modify/tailor 3 config files in ${LOCALBASE}, all similarly: ldap.conf nss_ldap.conf openldap/ldap.conf Then the secret files also in ${LOCALBASE}, again with the same info: etc/ldap.secret etc/nss_ldap.secret Then you have to deal with the certificates, and more than one is a pain. Then in ${BASE} you have to add an ldap file in /etc/pam.d/. And modify /etc/nsswitch.conf. It seems easier, with less config duplication, in Solaris (11): # Initialize the NSS database. $ certutil -N -d /var/ldap $ chmod 444 /var/ldap/*.db # Add your certificate(s). $ certutil -A -n -i /tmp/certfile.pem -t CT -d /var/ldap # Setup the system as an LDAP client. $ ldapclient init Modifying /etc/nsswitch.conf on FreeBSD is easier than Solaris, still haven't gotten use to the many keystrokes needed for svc mods: $ svccfg svc:/system/name-service/switch> setprop config/host = astring: "files dns" svc:/system/name-service/switch> setprop config/password = astring: "files [NOTFOUND=continue] ldap" svc:/system/name-service/switch> setprop config/group = astring: "files [NOTFOUND=continue] ldap" svc:/system/name-service/switch> select system/name-service/switch:default svc:/system/name-service/switch:default> refresh svc:/system/name-service/switch:default> validate svc:/system/name-service/switch:default> quit $ svcadm refresh name-service/switch -- DE From owner-freebsd-fs@freebsd.org Tue Jun 21 02:54:21 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BCA22AC52A7 for ; Tue, 21 Jun 2016 02:54:21 +0000 (UTC) (envelope-from jkh@ixsystems.com) Received: from barracuda.ixsystems.com (barracuda.ixsystems.com [12.229.62.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certificate Authority - G2" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id A041D12BA for ; Tue, 21 Jun 2016 02:54:21 +0000 (UTC) (envelope-from jkh@ixsystems.com) X-ASG-Debug-ID: 1466477660-08ca0411401af1c0001-3nHGF7 Received: from zimbra.ixsystems.com ([10.246.0.20]) by barracuda.ixsystems.com with ESMTP id fH24DdDtIrRWPyYo (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 20 Jun 2016 19:54:20 -0700 (PDT) X-Barracuda-Envelope-From: jkh@ixsystems.com X-Barracuda-RBL-Trusted-Forwarder: 10.246.0.20 X-ASG-Whitelist: Client Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 656F1DE1BD2; Mon, 20 Jun 2016 19:54:20 -0700 (PDT) Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 6CbDzFxCe2mH; Mon, 20 Jun 2016 19:54:20 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id DF189DE1BD0; Mon, 20 Jun 2016 19:54:19 -0700 (PDT) X-Virus-Scanned: amavisd-new at ixsystems.com Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id HRcf4eGmXRwl; Mon, 20 Jun 2016 19:54:19 -0700 (PDT) Received: from [172.20.0.18] (vpn.ixsystems.com [10.249.0.2]) by zimbra.ixsystems.com (Postfix) with ESMTPSA id A5D32DE1BCB; Mon, 20 Jun 2016 19:54:19 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: pNFS server Plan B From: Jordan Hubbard X-ASG-Orig-Subj: Re: pNFS server Plan B In-Reply-To: Date: Mon, 20 Jun 2016 19:54:24 -0700 Cc: freebsd-fs , Alexander Motin Content-Transfer-Encoding: quoted-printable Message-Id: <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> To: Doug Rabson X-Mailer: Apple Mail (2.3124) X-Barracuda-Connect: UNKNOWN[10.246.0.20] X-Barracuda-Start-Time: 1466477660 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://10.246.0.26:443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at ixsystems.com X-Barracuda-BRTS-Status: 1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 02:54:21 -0000 OK, wow. This appears to have turned into something of a referendum on = NFS and, just based on Rick and Doug=E2=80=99s defense of pNFS, I also = think my commentary on that may have been misconstrued somewhat. So, let me just set the record straight by saying that I=E2=80=99m all = in favor of pNFS. It addresses a very definite need in the Enterprise = marketplace and gives FreeBSD yet another arrow in its quiver when it = comes to being =E2=80=9Ca player=E2=80=9D in that (ever-growing) arena. = The only point I was trying to make before was that if we could ALSO = address clustering in a more general way as part of providing a pNFS = solution, that would be great. I am not, however, the one writing the = code and if my comments were in any way discouraging to the folks that = are, I apologize and want to express my enthusiasm for it. If iXsystems = engineering resources can contribute in any way to moving this ball = forward, let me know and we=E2=80=99ll start doing so. On the more general point of =E2=80=9CNFS is hard, let=E2=80=99s go = shopping=E2=80=9D let me also say that it=E2=80=99s kind of important = not to conflate end-user targeted solutions with enterprise solutions. = Setting up a Kerberized NFSv4, for example, is not really designed to be = trivial to set up and if anyone is waiting for that to happen, they may = be waiting a very long time (like, forever). NFS and SMB are both = fairly simple technologies to use if you restrict yourself to using, = say, just 20% of their overall feature-sets. Once you add ACLs, = Directory Services, user/group and permissions mappings, and any of the = other more enterprise-centric features of these filesharing = technologies, however, things rapidly get more complicated and the = DevOps people who routinely play in these kinds of environments are = quite happy to have all those options available because they=E2=80=99re = not consumers operating in consumer environments. =20 Sun didn=E2=80=99t design NFS to be particularly consumer-centric, for = that matter, and if you think SMB is =E2=80=9Csimple=E2=80=9D because = you clicked Network on Windows Explorer one day and stuff just = automagically appeared, you should try operating it in a serious Windows = Enterprise environment (just flip through some of the SMB bugs in the = FreeNAS bug tracker - = https://bugs.freenas.org/projects/freenas/issues?utf8=3D=E2=9C=93&set_filt= er=3D1&f%5B%5D=3Dstatus_id&op%5Bstatus_id%5D=3D*&f%5B%5D=3Dcategory_id&op%= 5Bcategory_id%5D=3D%3D&v%5Bcategory_id%5D%5B%5D=3D57&f%5B%5D=3D&c%5B%5D=3D= tracker&c%5B%5D=3Dstatus&c%5B%5D=3Dpriority&c%5B%5D=3Dsubject&c%5B%5D=3Das= signed_to&c%5B%5D=3Dupdated_on&c%5B%5D=3Dfixed_version&group_by=3D - if = you want to see the kinds of problems users wrestle with all the time). Anyway, I=E2=80=99ll get off the soapbox now, I just wanted to dispute = the premise that =E2=80=9Csimple file sharing=E2=80=9D that is also = =E2=80=9Csecure file sharing=E2=80=9D and =E2=80=9Cflexible file = sharing=E2=80=9D doesn=E2=80=99t really exist. The simplest end-user = oriented file sharing system I=E2=80=99ve used to date is probably AFP, = and Apple has been trying to kill it for years, probably because it = doesn=E2=80=99t have all those extra knobs and Kerberos / Directory = Services integration business users have been asking for (it=E2=80=99s = also not particularly industry standard). - Jordan From owner-freebsd-fs@freebsd.org Tue Jun 21 06:05:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CF744A7B135; Tue, 21 Jun 2016 06:05:28 +0000 (UTC) (envelope-from gerrit.kuehn@aei.mpg.de) Received: from umail.aei.mpg.de (umail.aei.mpg.de [194.94.224.6]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 85E29174A; Tue, 21 Jun 2016 06:05:28 +0000 (UTC) (envelope-from gerrit.kuehn@aei.mpg.de) Received: from mailgate.aei.mpg.de (mailgate.aei.mpg.de [194.94.224.5]) by umail.aei.mpg.de (Postfix) with ESMTP id 583AF20015F; Tue, 21 Jun 2016 07:56:32 +0200 (CEST) Received: from mailgate.aei.mpg.de (localhost [127.0.0.1]) by localhost (Postfix) with SMTP id 4A1B5406ADE; Tue, 21 Jun 2016 07:56:32 +0200 (CEST) Received: from intranet.aei.uni-hannover.de (ahin1.aei.uni-hannover.de [130.75.117.40]) by mailgate.aei.mpg.de (Postfix) with ESMTP id 2AF47406ADB; Tue, 21 Jun 2016 07:56:32 +0200 (CEST) Received: from arc.aei.uni-hannover.de ([130.75.117.1]) by intranet.aei.uni-hannover.de (IBM Domino Release 9.0.1FP5) with ESMTP id 2016062107563122-16958 ; Tue, 21 Jun 2016 07:56:31 +0200 Date: Tue, 21 Jun 2016 07:56:31 +0200 From: Gerrit =?ISO-8859-1?Q?K=FChn?= To: Daniel Eischen Cc: Zaphod Beeblebrox , freebsd-fs , FreeBSD Hackers Subject: Re: The small installations network filesystem and users. Message-Id: <20160621075631.38c2eeaa7c224aa22ea9be4d@aei.mpg.de> In-Reply-To: References: <9BB7E8B3-EC0E-457E-B2B2-FB80B1CF02B0@gmail.com> Organization: Max Planck Gesellschaft X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.22; amd64-portbld-freebsd10.0) Mime-Version: 1.0 X-MIMETrack: Itemize by SMTP Server on intranet/aei-hannover(Release 9.0.1FP5|November 22, 2015) at 21/06/2016 07:56:31, Serialize by Router on intranet/aei-hannover(Release 9.0.1FP5|November 22, 2015) at 21/06/2016 07:56:31, Serialize complete at 21/06/2016 07:56:31 X-TNEFEvaluated: 1 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII X-PMX-Version: 6.0.2.2308539, Antispam-Engine: 2.7.2.2107409, Antispam-Data: 2016.6.21.54517 X-PerlMx-Spam: Gauge=IIIIIIIII, Probability=9%, Report=' MULTIPLE_RCPTS 0.1, HTML_00_01 0.05, HTML_00_10 0.05, MIME_LOWER_CASE 0.05, BODYTEXTP_SIZE_3000_LESS 0, BODY_SIZE_1000_LESS 0, BODY_SIZE_2000_LESS 0, BODY_SIZE_5000_LESS 0, BODY_SIZE_7000_LESS 0, BODY_SIZE_800_899 0, IN_REP_TO 0, LEGITIMATE_NEGATE 0, MSG_THREAD 0, MULTIPLE_RCPTS_RND 0, NO_URI_HTTPS 0, REFERENCES 0, __ANY_URI 0, __BOUNCE_CHALLENGE_SUBJ 0, __BOUNCE_NDR_SUBJ_EXEMPT 0, __CT 0, __CTE 0, __CT_TEXT_PLAIN 0, __HAS_CC_HDR 0, __HAS_FROM 0, __HAS_MSGID 0, __HAS_X_MAILER 0, __IN_REP_TO 0, __MIME_TEXT_ONLY 0, __MIME_VERSION 0, __MULTIPLE_RCPTS_CC_X2 0, __REFERENCES 0, __SANE_MSGID 0, __SUBJ_ALPHA_NEGATE 0, __TO_MALFORMED_2 0, __URI_NO_WWW 0, __URI_NS ' X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 06:05:28 -0000 On Mon, 20 Jun 2016 22:00:31 -0400 (EDT) Daniel Eischen wrote about Re: The small installations network filesystem and users.: DE> We should support LDAP client out of the box, in base. What DE> sucks now is that we need 3 packages (plus their dependencies) DE> and multiple config files for ldap: DE> DE> pam_ldap DE> nss_ldap DE> openldap-client I only have to install/config ldap-clients every now and then, but I would also strongly favour a more "integrated" setup (if that requires having it in base is a different question, though). A few weeks ago I used nss-pam-ldapd instead of pam_ldap and nss_ldap for the first time, and it appeared to work with a bit less of a hassle for me (otoh, I don't do any funky things here, I just need a replacement for what we did with NIS something like 20 years ago). cu Gerrit From owner-freebsd-fs@freebsd.org Tue Jun 21 15:03:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 33B9CAC4DBE for ; Tue, 21 Jun 2016 15:03:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 237A723D0 for ; Tue, 21 Jun 2016 15:03:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5LF3r0V032849 for ; Tue, 21 Jun 2016 15:03:53 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 210430] Lock order reversal zfs_vfsops.c - when switching to snapshots directory Date: Tue, 21 Jun 2016 15:03:53 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 15:03:53 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D210430 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-fs@FreeBSD.org --- Comment #1 from Andriy Gapon --- This is a harmless warning which happens only because vnodes under .zfs are tagged with special "zfs_gfs" tag while logically they belong to the same z= fs filesystem as the normal vnodes. I am not sure if there is any real reason to have that "zfs_gfs" tag, but t= here is no bug here. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Tue Jun 21 15:04:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5E1FEAC4E6D for ; Tue, 21 Jun 2016 15:04:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4DBD624C3 for ; Tue, 21 Jun 2016 15:04:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5LF4oaK052040 for ; Tue, 21 Jun 2016 15:04:50 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 210430] Lock order reversal zfs_vfsops.c - when switching to snapshots directory Date: Tue, 21 Jun 2016 15:04:50 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: Closed X-Bugzilla-Resolution: Works As Intended X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status resolution Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 15:04:50 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D210430 Andriy Gapon changed: What |Removed |Added ---------------------------------------------------------------------------- Status|New |Closed Resolution|--- |Works As Intended --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Tue Jun 21 16:20:33 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4628AC5BD5 for ; Tue, 21 Jun 2016 16:20:33 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-it0-x22c.google.com (mail-it0-x22c.google.com [IPv6:2607:f8b0:4001:c0b::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 94CE3219C for ; Tue, 21 Jun 2016 16:20:33 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by mail-it0-x22c.google.com with SMTP id f6so18139247ith.0 for ; Tue, 21 Jun 2016 09:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kateley-com.20150623.gappssmtp.com; s=20150623; h=reply-to:subject:references:to:from:organization:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=xUZbt6t5MHl8Y9Ex6+5wM7zbfuQ0jK4TD+HlOgqS9MQ=; b=rCl7umQ/p68PjmhsnLeV2m34xTCOQKVvPr1Jh3hRZTq7ihGzJO9zve/Q1nmVS62+J4 zQ0bMK+KV2QmVeeBiEqZpEsvlfmLNnm56bj3X7UY8Gs8897Pv5L3EtPuObLl8TFdfb4M SiySybo8kkaho41r6APirjaOdsII+8gfWgAA8RsLIoskIMUcM1zQoSyYkT09Ggxk6Ccm AfuxoC8MNMBSiIl/S88BUakcsa5GoZhNiY3X0FimYCM40Ia2IcaGaezNJU+T9bEJz/6s 1BLyTRMIcn5WagvXV7aJbKhVZDbSHmknsk+XwrJOEltds0fNnxSb8B+y23JToCTmA00Q PlFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:reply-to:subject:references:to:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-transfer-encoding; bh=xUZbt6t5MHl8Y9Ex6+5wM7zbfuQ0jK4TD+HlOgqS9MQ=; b=STgar0GTqtMdOjv7SUiDw+tRyLIENB1Qtoji1Bh2l2sOM8HAVG/9oOCDDA7qrooV64 bdVBMis2jesBVAAs5fh1a1gUX7BhsGCK/mh9FzjNpc9psV+VusgbtWyikC475b6cHYFF exRiG6ELYJ3iNxvWFhEAtq+Dc1rcyjzfDW/m6GX7eiK/2OTud5gZpSdpQ+Rj8NEw06Kr qH7Zsv7LgMviitwIy3pZYSECTTUnzEDXa+/ujZNdfJ1mH0Gej1wKUJbNbKtRORpQ0G1F rYjlUS+H/s3ITy8ID8LqamvGlZwenZVK6kuErPqoh2yvGUscIRzXm2AdwDQE4bHADmCn M5TQ== X-Gm-Message-State: ALyK8tKo5I+GLAROiJGfrbaqtys9RBxbW2LqIv6LxrJjqc69dfkag1R+RWlicCu/n8Gz2Q== X-Received: by 10.36.192.9 with SMTP id u9mr7160157itf.90.1466526032478; Tue, 21 Jun 2016 09:20:32 -0700 (PDT) Received: from Kateleyco-iMac.local ([63.231.252.189]) by smtp.googlemail.com with ESMTPSA id s205sm1644167itd.16.2016.06.21.09.20.29 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 21 Jun 2016 09:20:31 -0700 (PDT) Reply-To: linda@kateley.com Subject: Re: pNFS server Plan B References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> To: freebsd-fs@freebsd.org From: Linda Kateley Organization: Kateley Company Message-ID: <5ccbd619-88f2-8480-727a-4b70f11a35ba@kateley.com> Date: Tue, 21 Jun 2016 11:20:27 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 16:20:33 -0000 I have really enjoyed this discussion. Just to echo this point further. I have spent most of my career with 1 foot in opensource and the other 3 feet in the enterprise(And yes I have 4 feet.). Enterprise always makes decisions based on reliability or someone telling them something is reliable. If you ask 100 vmware admins why they use nfs probably 100 will say because vmware recommends it. If you ask a CT at vmware why they recommend it, the couple I have asked have said because it is a reliable transport. Vmware now has interest in pnfs. Technology gets driven by business/enterprise. I talked to a CA at a large electronics chain and asked why they are using ceph and he said about 100 words, then said because red hat recommends it with openstack. Intel is driving lustre. RHEL driving ceph. Vmware driving pnfs. I don't see anyone driving gluster. Every once in awhile you see products grow on their merit(watching proxmox and zerto right now) but those usually get swooped up by a bigger one. To the point of setting up kerberized nfs, AD has made kerberos easy, it could be just as easy with nfs. Everything is easy once you know it. lk On 6/20/16 9:54 PM, Jordan Hubbard wrote: > OK, wow. This appears to have turned into something of a referendum on NFS and, just based on Rick and Doug’s defense of pNFS, I also think my commentary on that may have been misconstrued somewhat. > > So, let me just set the record straight by saying that I’m all in favor of pNFS. It addresses a very definite need in the Enterprise marketplace and gives FreeBSD yet another arrow in its quiver when it comes to being “a player” in that (ever-growing) arena. The only point I was trying to make before was that if we could ALSO address clustering in a more general way as part of providing a pNFS solution, that would be great. I am not, however, the one writing the code and if my comments were in any way discouraging to the folks that are, I apologize and want to express my enthusiasm for it. If iXsystems engineering resources can contribute in any way to moving this ball forward, let me know and we’ll start doing so. > > On the more general point of “NFS is hard, let’s go shopping” let me also say that it’s kind of important not to conflate end-user targeted solutions with enterprise solutions. Setting up a Kerberized NFSv4, for example, is not really designed to be trivial to set up and if anyone is waiting for that to happen, they may be waiting a very long time (like, forever). NFS and SMB are both fairly simple technologies to use if you restrict yourself to using, say, just 20% of their overall feature-sets. Once you add ACLs, Directory Services, user/group and permissions mappings, and any of the other more enterprise-centric features of these filesharing technologies, however, things rapidly get more complicated and the DevOps people who routinely play in these kinds of environments are quite happy to have all those options available because they’re not consumers operating in consumer environments. > > Sun didn’t design NFS to be particularly consumer-centric, for that matter, and if you think SMB is “simple” because you clicked Network on Windows Explorer one day and stuff just automagically appeared, you should try operating it in a serious Windows Enterprise environment (just flip through some of the SMB bugs in the FreeNAS bug tracker - https://bugs.freenas.org/projects/freenas/issues?utf8=✓&set_filter=1&f%5B%5D=status_id&op%5Bstatus_id%5D=*&f%5B%5D=category_id&op%5Bcategory_id%5D=%3D&v%5Bcategory_id%5D%5B%5D=57&f%5B%5D=&c%5B%5D=tracker&c%5B%5D=status&c%5B%5D=priority&c%5B%5D=subject&c%5B%5D=assigned_to&c%5B%5D=updated_on&c%5B%5D=fixed_version&group_by= - if you want to see the kinds of problems users wrestle with all the time). > > Anyway, I’ll get off the soapbox now, I just wanted to dispute the premise that “simple file sharing” that is also “secure file sharing” and “flexible file sharing” doesn’t really exist. The simplest end-user oriented file sharing system I’ve used to date is probably AFP, and Apple has been trying to kill it for years, probably because it doesn’t have all those extra knobs and Kerberos / Directory Services integration business users have been asking for (it’s also not particularly industry standard). > > - Jordan > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Tue Jun 21 17:18:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3FE71AC4F94 for ; Tue, 21 Jun 2016 17:18:41 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2205827F5; Tue, 21 Jun 2016 17:18:40 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from Julian-MBP3.local (ppp121-45-242-176.lns20.per4.internode.on.net [121.45.242.176]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u5LHISxB093782 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Tue, 21 Jun 2016 10:18:31 -0700 (PDT) (envelope-from julian@freebsd.org) Subject: Re: futimens and utimensat vs birthtime (resurrected) To: Kirk McKusick References: <201508142122.t7ELMPjR002452@chez.mckusick.com> <56401002.8020909@freebsd.org> Cc: "freebsd-fs@freebsd.org" , "'Jilles Tjoelker'" , John Baldwin From: Julian Elischer Message-ID: Date: Wed, 22 Jun 2016 01:18:22 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <56401002.8020909@freebsd.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 17:18:41 -0000 bringing this up again. see below for new info.. On 9/11/2015 11:16 AM, Julian Elischer wrote: > On 8/15/15 5:22 AM, Kirk McKusick wrote: >>> From: John Baldwin >>> To: freebsd-current@freebsd.org >>> Subject: Re: futimens and utimensat vs birthtime >>> Date: Fri, 14 Aug 2015 10:39:41 -0700 >>> Cc: "freebsd-fs@freebsd.org" , >>> "'Jilles Tjoelker'" >>> >>> On Friday, August 14, 2015 10:46:10 PM Julian Elischer wrote: >>>> I would like to implement this call. but would like input as to it's >>>> nature. >>>> The code inside the system would already appear to support handling >>>> three elements, though it needs some scrutiny, >>>> so all that is needed is a system call with the ability to set the >>>> birthtime directly. >>>> >>>> Whether it should take the form of the existing calls but expecting >>>> three items is up for discussion. >>>> Maybe teh addition of a flags argument to specify which items are >>>> present and which to set. >>>> >>>> ideas? >>> I believe these should be new calls. Only utimensat() provides a >>> flag >>> argument, but it is reserved for AT_* flags. I would be fine with >>> something like futimens3() and utimensat3() (where 3 means "three >>> timespecs"). Jilles implemented futimens() and utimensat(), so he >>> might have ideas as well. I would probably stick the birth time in >>> the third (final) timespec slot to make it easier to update new code >>> (you can use an #ifdef just around ts[2] without having to #ifdef the >>> entire block). >>> >>> -- >>> John Baldwin >> I concur with John's suggestion. Add a new system call with three >> argument set of times specifying birthtime as the last one. I >> proposed doing this when I added birthtime, but did not as the >> sentiment at the time was that it would gratuitously make FreeBSD >> written applications less portable if they used this new non-standard >> system call. > > time has passed and I would like to get back to this: > There was some feedback last time. Taking that into account: > > One problem with the '3 arg' version is that we have to reinvent it > again if we get a 4th. > We could make something like the following: > > It has been suggested that a 4th entry might be "last archive time" > and that > "time created on this filesystem" and "file created first time > (ever)" might also > be separate in some systems. (as examples of why 3 might not be enough) ok, so a real 4th arg has turned up it turns out that to be really compatible with windows servers and if you are running a filesystem capable of doing it.. then you need to be able to stamp the "change time". Apparently Windows does this an there are applications that require it. I believe that most filesystems would simply not do this but at $JOB we have our own FS and it CAN do this. we just need a way to interface to it. So now we have 4 real and a hypothetical timestamps. access time modification time birth/creation time change time (archive time) The filesystem would have to support changing ctime, and you'd have to have some other safeguards but does anyone have any comments on the example below? Does anyone know what people like netapp and panasas do? isilon? > > the syscall name is also not decided. (fsetnstimes()) > one suggested form is: > $name (int fd, int32 flags/mask, const struct timespec *arrayptr[]); > > vs the current: > utimensat(int fd, const char *path, const struct timespec times[2], > int flag); > > where mask is: > --- > 0x01 disable_heuristic > 0x02 AT_SYMLINK_NOFOLLOW > 0x04-0x08 unused > -- times present--- > 0x10 access time > 0x20 mod time > 0x40 birth time > 0x80 change time > 0x100 archive time > 0x200-on reserverd for future times > > any bit not set in 0x010-on is not represented in the array. > no bits would be a nop (the price for orthogonality) and would > effectively be the same as a test for writeability. > "disable heuristic" would disable the forcing of birthtime back to > mod time or earlier (and any other 'logical fixes') > setting all 5 'time-present' bits would imply the array has 5 entries. > > anyone care to comment ? >> >> Kirk McKusick >> > From owner-freebsd-fs@freebsd.org Tue Jun 21 17:50:23 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA290AC56A7 for ; Tue, 21 Jun 2016 17:50:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BA3C218D4 for ; Tue, 21 Jun 2016 17:50:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5LHoLf4057845 for ; Tue, 21 Jun 2016 17:50:23 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204764] Filesystem deadlock, process in vodead state Date: Tue, 21 Jun 2016 17:50:21 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: commit-hook@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 17:50:23 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D204764 --- Comment #26 from commit-hook@freebsd.org --- A commit references this bug: Author: kib Date: Tue Jun 21 17:49:33 UTC 2016 New revision: 302063 URL: https://svnweb.freebsd.org/changeset/base/302063 Log: The vmtotal sysctl handler marks active vm objects to calculate statistics. Marking is done by setting the OBJ_ACTIVE flag. The flags change is locked, but the problem is that many parts of system assume that vm object initialization ensures that no other code could change the object, and thus performed lockless. The end result is corrupted flags in vm objects, most visible is spurious OBJ_DEAD flag, causing random hangs. Avoid the active object marking, instead provide equally inexact but immutable is_object_alive() definition for the object mapped state. Avoid iterating over the processes mappings altogether by using arguably improved definition of the paging thread as one which sleeps on the v_free_count. PR: 204764 Diagnosed by: pho Tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (gjb) Changes: head/sys/vm/vm_meter.c head/sys/vm/vm_object.h --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Tue Jun 21 19:30:43 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1E366AC5F24 for ; Tue, 21 Jun 2016 19:30:43 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D658716F4; Tue, 21 Jun 2016 19:30:42 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 9C3EAB9A1; Tue, 21 Jun 2016 15:30:41 -0400 (EDT) From: John Baldwin To: Julian Elischer Cc: Kirk McKusick , "freebsd-fs@freebsd.org" , 'Jilles Tjoelker' Subject: Re: futimens and utimensat vs birthtime (resurrected) Date: Tue, 21 Jun 2016 11:55:53 -0700 Message-ID: <2931129.2EFevmTKZu@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.3-STABLE; KDE/4.14.3; amd64; ; ) In-Reply-To: References: <201508142122.t7ELMPjR002452@chez.mckusick.com> <56401002.8020909@freebsd.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 21 Jun 2016 15:30:41 -0400 (EDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 19:30:43 -0000 On Wednesday, June 22, 2016 01:18:22 AM Julian Elischer wrote: > bringing this up again. > > see below for new info.. > > On 9/11/2015 11:16 AM, Julian Elischer wrote: > > On 8/15/15 5:22 AM, Kirk McKusick wrote: > >>> From: John Baldwin > >>> To: freebsd-current@freebsd.org > >>> Subject: Re: futimens and utimensat vs birthtime > >>> Date: Fri, 14 Aug 2015 10:39:41 -0700 > >>> Cc: "freebsd-fs@freebsd.org" , > >>> "'Jilles Tjoelker'" > >>> > >>> On Friday, August 14, 2015 10:46:10 PM Julian Elischer wrote: > >>>> I would like to implement this call. but would like input as to it's > >>>> nature. > >>>> The code inside the system would already appear to support handling > >>>> three elements, though it needs some scrutiny, > >>>> so all that is needed is a system call with the ability to set the > >>>> birthtime directly. > >>>> > >>>> Whether it should take the form of the existing calls but expecting > >>>> three items is up for discussion. > >>>> Maybe teh addition of a flags argument to specify which items are > >>>> present and which to set. > >>>> > >>>> ideas? > >>> I believe these should be new calls. Only utimensat() provides a > >>> flag > >>> argument, but it is reserved for AT_* flags. I would be fine with > >>> something like futimens3() and utimensat3() (where 3 means "three > >>> timespecs"). Jilles implemented futimens() and utimensat(), so he > >>> might have ideas as well. I would probably stick the birth time in > >>> the third (final) timespec slot to make it easier to update new code > >>> (you can use an #ifdef just around ts[2] without having to #ifdef the > >>> entire block). > >>> > >> I concur with John's suggestion. Add a new system call with three > >> argument set of times specifying birthtime as the last one. I > >> proposed doing this when I added birthtime, but did not as the > >> sentiment at the time was that it would gratuitously make FreeBSD > >> written applications less portable if they used this new non-standard > >> system call. > > > > time has passed and I would like to get back to this: > > There was some feedback last time. Taking that into account: > > > > One problem with the '3 arg' version is that we have to reinvent it > > again if we get a 4th. > > We could make something like the following: > > > > It has been suggested that a 4th entry might be "last archive time" > > and that > > "time created on this filesystem" and "file created first time > > (ever)" might also > > be separate in some systems. (as examples of why 3 might not be enough) > > ok, so a real 4th arg has turned up > > it turns out that to be really compatible with windows servers > and if you are running a filesystem capable of doing it.. > then you need to be able to stamp the "change time". > > Apparently Windows does this an there are applications that require it. > I believe that most filesystems would simply not do this but at $JOB > we have our own FS and it CAN do this. > we just need a way to interface to it. > So now we have 4 real and a hypothetical timestamps. > access time > modification time > birth/creation time > change time > (archive time) I think the bitmask thing is perhaps complex, and posix already prefers to handle "sparse" timespec arrays via UTIME_OMIT. I would suggest adding a 'count' argument that specifies the number of items in the array. Callers can use UTIME_OMIT for items in the array they do not wish to set. I think having some helper macros might be nice for the array indicies if that doesn't hose the namespace too badly (UTIME_ACCESS, UTIME_CHANGE, UTIME_MODIFY, UTIME_BIRTH, UTIME_MAX, etc.). You could then do something like: struct timespec ts[UTIME_MAX]; for (i = 0; i < UTIME_MAX; i++) ts[i].tv_nsec = UTIME_OMIT; #ifdef UTIME_BIRTH ts[UTIME_BIRTH] = foo; #endif if (utimensat5(fd, path, ts, count, 0) < 0) err(...); -- John Baldwin From owner-freebsd-fs@freebsd.org Tue Jun 21 20:36:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4E0D3AC527D for ; Tue, 21 Jun 2016 20:36:56 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 39FB728C3 for ; Tue, 21 Jun 2016 20:36:56 +0000 (UTC) (envelope-from ken@kdm.org) Received: by mailman.ysv.freebsd.org (Postfix) id 32BB7AC527A; Tue, 21 Jun 2016 20:36:56 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 321C0AC5278; Tue, 21 Jun 2016 20:36:56 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D874528BF; Tue, 21 Jun 2016 20:36:55 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (localhost [127.0.0.1]) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPS id u5LKaqpR090893 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 21 Jun 2016 16:36:53 -0400 (EDT) (envelope-from ken@mithlond.kdm.org) Received: (from ken@localhost) by mithlond.kdm.org (8.15.2/8.14.9/Submit) id u5LKaqpp090892; Tue, 21 Jun 2016 16:36:52 -0400 (EDT) (envelope-from ken) Date: Tue, 21 Jun 2016 16:36:52 -0400 From: "Kenneth D. Merry" To: current@freebsd.org Cc: fs@freebsd.org Subject: Heads Up: struct disk KBI change Message-ID: <20160621203652.GA90745@mithlond.kdm.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [127.0.0.1]); Tue, 21 Jun 2016 16:36:54 -0400 (EDT) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mithlond.kdm.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 20:36:56 -0000 This will break binary compatibility for loadable modules that depend on struct disk. DISK_VERSION has been bumped, and I bumped __FreeBSD_version in a subsequent change. So, if you have module that uses struct disk, you'll need to recompile against the latest version of head. Ken ----- Forwarded message from "Kenneth D. Merry" ----- Date: Tue, 21 Jun 2016 20:18:19 +0000 (UTC) From: "Kenneth D. Merry" To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r302069 - head/sys/geom Author: ken Date: Tue Jun 21 20:18:19 2016 New Revision: 302069 URL: https://svnweb.freebsd.org/changeset/base/302069 Log: Fix a bug that caused da(4) instances to hang around after the underlying device is gone. The problem was that when disk_gone() is called, if the GEOM disk creation process has not yet happened, the withering process couldn't start. We didn't record any state in the GEOM disk code, and so the d_gone() callback to the da(4) driver never happened. The solution is to track the state of the creation process, and initiate the withering process from g_disk_create() if the disk is being created. This change does add fields to struct disk, and so I have bumped DISK_VERSION. geom_disk.c: Track where we are in the disk creation process, and check to see whether our underlying disk has gone away or not. In disk_gone(), set a new d_goneflag variable that g_disk_create() can check to see if it needs to clean up the disk instance. geom_disk.h: Add a mutex to struct disk (for internal use) disk init level, and a gone flag. Bump DISK_VERSION because the size of struct disk has changed and fields have been added at the beginning. Sponsored by: Spectra Logic Approved by: re (marius) Modified: head/sys/geom/geom_disk.c head/sys/geom/geom_disk.h ----- End forwarded message ----- -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@freebsd.org Tue Jun 21 20:42:41 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 637F0AC5538 for ; Tue, 21 Jun 2016 20:42:41 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 701092E07 for ; Tue, 21 Jun 2016 20:42:39 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:iTVBdRf54qM5qvazSPDEa6ZLlGMj4u6mDksu8pMizoh2WeGdxc6zYh7h7PlgxGXEQZ/co6odzbGH6+a4ASdQv96oizMrTt9lb1c9k8IYnggtUoauKHbQC7rUVRE8B9lIT1R//nu2YgB/Ecf6YEDO8DXptWZBUiv2OQc9HOnpAIma153xjLHqvcWLKFUWzBOGIppMbzyO5T3LsccXhYYwYo0Q8TDu5kVyRuJN2GlzLkiSlRuvru25/Zpk7jgC86l5r50IAu3Heb8lR+lYECg+KDJyo8nqrgXYCwWV63YWSWlQlQBHRA3M7RX/V5G2tirhqut71i7dM9f7QLovVXGs9PRXT0rsiSELPhY6+X3ajsFhyqlcpUGPvRt6lrTVa4LdEfN1fafQeJtOX29IVcVVWilpH4SzcoYLF+pHNu8O/Nq1nEcHsRbrXVrkP+jo0DId3nI= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2CvBAA0pmlX/61jaINdhBR9BrxrFw2FKUoCgXARAQEBAQEBAQFkJ4IxghoBAQEDAQEBASArIAsFCwIBCBgCAg0ZAgInAQkYAQ0CDAcEARoCBIgHCA6yLpBLAQEBBwEBAQEBARwFgQGFJoRNhCMBAQUWgkk4E4JHBYZLki6FH2mFNZISj3YCNCCCCByBaCAyB4kMNn8BAQE X-IronPort-AV: E=Sophos;i="5.26,506,1459828800"; d="scan'208";a="288811352" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Jun 2016 16:42:28 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id A26D415F5E1; Tue, 21 Jun 2016 16:42:28 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 0ljxOMr8zeUk; Tue, 21 Jun 2016 16:42:27 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 57D1E15F5E2; Tue, 21 Jun 2016 16:42:27 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id M-jZQW8pY5TP; Tue, 21 Jun 2016 16:42:27 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3521615F5E1; Tue, 21 Jun 2016 16:42:27 -0400 (EDT) Date: Tue, 21 Jun 2016 16:42:26 -0400 (EDT) From: Rick Macklem To: linda@kateley.com Cc: freebsd-fs@freebsd.org Message-ID: <661035450.164996651.1466541746806.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <5ccbd619-88f2-8480-727a-4b70f11a35ba@kateley.com> References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> <5ccbd619-88f2-8480-727a-4b70f11a35ba@kateley.com> Subject: Re: pNFS server Plan B MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF47 (Win)/8.0.9_GA_6191) Thread-Topic: pNFS server Plan B Thread-Index: +mYn4vlmaonMnV7Wc1qSenr7TemuVQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 20:42:41 -0000 Linda Kateley wrote: > I have really enjoyed this discussion. Just to echo this point further. > I have spent most of my career with 1 foot in opensource and the other 3 > feet in the enterprise(And yes I have 4 feet.). Enterprise always makes > decisions based on reliability or someone telling them something is > reliable. If you ask 100 vmware admins why they use nfs probably 100 > will say because vmware recommends it. If you ask a CT at vmware why > they recommend it, the couple I have asked have said because it is a > reliable transport. >=20 > Vmware now has interest in pnfs. >=20 > Technology gets driven by business/enterprise. I talked to a CA at a > large electronics chain and asked why they are using ceph and he said > about 100 words, then said because red hat recommends it with openstack. >=20 > Intel is driving lustre. RHEL driving ceph. Vmware driving pnfs. I don't > see anyone driving gluster. >=20 I don't know of any vendors (Redhat people basically maintain it, afaik), b= ut Jordan sent me this a little while back: https://www.socallinuxexpo.org/scale/14x/presentations/scaling-glusterfs-fa= cebook Facebook is a user, but a large one. Although GlusterFS seems to supports OpenStack stuff, it seems to be layere= d on top of the POSIX file system using something they call SwiftOnFile. Thanks for the comments, rick > Every once in awhile you see products grow on their merit(watching > proxmox and zerto right now) but those usually get swooped up by a > bigger one. >=20 > To the point of setting up kerberized nfs, AD has made kerberos easy, it > could be just as easy with nfs. Everything is easy once you know it. >=20 > lk >=20 >=20 > On 6/20/16 9:54 PM, Jordan Hubbard wrote: > > OK, wow. This appears to have turned into something of a referendum on= NFS > > and, just based on Rick and Doug=E2=80=99s defense of pNFS, I also thin= k my > > commentary on that may have been misconstrued somewhat. > > > > So, let me just set the record straight by saying that I=E2=80=99m all = in favor of > > pNFS. It addresses a very definite need in the Enterprise marketplace = and > > gives FreeBSD yet another arrow in its quiver when it comes to being = =E2=80=9Ca > > player=E2=80=9D in that (ever-growing) arena. The only point I was try= ing to make > > before was that if we could ALSO address clustering in a more general w= ay > > as part of providing a pNFS solution, that would be great. I am not, > > however, the one writing the code and if my comments were in any way > > discouraging to the folks that are, I apologize and want to express my > > enthusiasm for it. If iXsystems engineering resources can contribute i= n > > any way to moving this ball forward, let me know and we=E2=80=99ll star= t doing so. > > > > On the more general point of =E2=80=9CNFS is hard, let=E2=80=99s go sho= pping=E2=80=9D let me also > > say that it=E2=80=99s kind of important not to conflate end-user target= ed > > solutions with enterprise solutions. Setting up a Kerberized NFSv4, fo= r > > example, is not really designed to be trivial to set up and if anyone i= s > > waiting for that to happen, they may be waiting a very long time (like, > > forever). NFS and SMB are both fairly simple technologies to use if yo= u > > restrict yourself to using, say, just 20% of their overall feature-sets= . > > Once you add ACLs, Directory Services, user/group and permissions > > mappings, and any of the other more enterprise-centric features of thes= e > > filesharing technologies, however, things rapidly get more complicated = and > > the DevOps people who routinely play in these kinds of environments are > > quite happy to have all those options available because they=E2=80=99re= not > > consumers operating in consumer environments. > > > > Sun didn=E2=80=99t design NFS to be particularly consumer-centric, for = that matter, > > and if you think SMB is =E2=80=9Csimple=E2=80=9D because you clicked Ne= twork on Windows > > Explorer one day and stuff just automagically appeared, you should try > > operating it in a serious Windows Enterprise environment (just flip > > through some of the SMB bugs in the FreeNAS bug tracker - > > https://bugs.freenas.org/projects/freenas/issues?utf8=3D=E2=9C=93&set_f= ilter=3D1&f%5B%5D=3Dstatus_id&op%5Bstatus_id%5D=3D*&f%5B%5D=3Dcategory_id&o= p%5Bcategory_id%5D=3D%3D&v%5Bcategory_id%5D%5B%5D=3D57&f%5B%5D=3D&c%5B%5D= =3Dtracker&c%5B%5D=3Dstatus&c%5B%5D=3Dpriority&c%5B%5D=3Dsubject&c%5B%5D=3D= assigned_to&c%5B%5D=3Dupdated_on&c%5B%5D=3Dfixed_version&group_by=3D > > - if you want to see the kinds of problems users wrestle with all the > > time). > > > > Anyway, I=E2=80=99ll get off the soapbox now, I just wanted to dispute = the premise > > that =E2=80=9Csimple file sharing=E2=80=9D that is also =E2=80=9Csecure= file sharing=E2=80=9D and > > =E2=80=9Cflexible file sharing=E2=80=9D doesn=E2=80=99t really exist. = The simplest end-user > > oriented file sharing system I=E2=80=99ve used to date is probably AFP,= and Apple > > has been trying to kill it for years, probably because it doesn=E2=80= =99t have all > > those extra knobs and Kerberos / Directory Services integration busines= s > > users have been asking for (it=E2=80=99s also not particularly industry= standard). > > > > - Jordan > > > > > > > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Tue Jun 21 21:54:58 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BC7CAAC53D4 for ; Tue, 21 Jun 2016 21:54:58 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 458CB228A; Tue, 21 Jun 2016 21:54:57 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:odWhRRZWHwd4vwOZ15x0Fy3/LSx+4OfEezUN459isYplN5qZpcm5bnLW6fgltlLVR4KTs6sC0LuO9fi5EjRaqb+681k8M7V0HycfjssXmwFySOWkMmbcaMDQUiohAc5ZX0Vk9XzoeWJcGcL5ekGA6ibqtW1aJBzzOEJPK/jvHcaK1oLshrj0o8SYO18ArQH+SI0xBS3+lR/WuMgSjNkqAYcK4TyNnEF1ff9Lz3hjP1OZkkW0zM6x+Jl+73YY4Kp5pIYTGZn9KoY/V7BRCnwGLmo/7dfn/U3BTgun52sHQ34Knx9TRQPC6UepcI32t37At+F+kAyTNs7yQLV8DS6n5qxoTBLtoDoAOCM09HnXzMd52vEI6Cm9rgByltaHKLqeM+BzK/vQ X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2C9BABSt2lX/61jaINdhBQuTwa8axcLhStKAoFxEAEBAQEBAQEBZCeCMYIaAQEBAwEBAQEgBCcgCwULAgEIDgoCAg0ZAgInAQkYAQ0CBAgHBAEaAgSIBwgOshGQSwEBAQEGAQEBAQEdBYEBhSaDSoEDhCMBAQUWgwGCWgWTM4VGhgiFNY1ThD+PdgI1H4IIHIFoIDIHiQw2fwEBAQ X-IronPort-AV: E=Sophos;i="5.26,506,1459828800"; d="scan'208";a="290713447" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Jun 2016 17:54:45 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id D9FE215F5E9; Tue, 21 Jun 2016 17:54:45 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id SQv2xGM2nS3g; Tue, 21 Jun 2016 17:54:44 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id B6E1615F5EA; Tue, 21 Jun 2016 17:54:44 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id SrvKT0rIOG_w; Tue, 21 Jun 2016 17:54:44 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 9099D15F5E9; Tue, 21 Jun 2016 17:54:44 -0400 (EDT) Date: Tue, 21 Jun 2016 17:54:44 -0400 (EDT) From: Rick Macklem To: Jordan Hubbard Cc: Doug Rabson , freebsd-fs , Alexander Motin Message-ID: <901639916.165261477.1466546084028.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <74CD7EB1-1656-4511-8B63-5C4401D1BB8D@ixsystems.com> Subject: Re: pNFS server Plan B MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF47 (Win)/8.0.9_GA_6191) Thread-Topic: pNFS server Plan B Thread-Index: Eq1mVZaOzAjOuMK1UDy3wrWAHDMYZw== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2016 21:54:58 -0000 Jordan Hubbard wrote: > OK, wow. This appears to have turned into something of a referendum on N= FS > and, just based on Rick and Doug=E2=80=99s defense of pNFS, I also think = my > commentary on that may have been misconstrued somewhat. >=20 Actually, I thought it had become a referendum on LDAP;-) As for defending pNFS, all I was trying to say was that "although it is har= d to believe, it has taken 10years for pNFS to hit the streets". As such, it is anyone's guess w.r.t. whether or not it will become widely adopted? If it came across as more than that, I am the one that should be apologizin= g and am in no way discouraged by any of the comments. > So, let me just set the record straight by saying that I=E2=80=99m all in= favor of > pNFS. It addresses a very definite need in the Enterprise marketplace an= d > gives FreeBSD yet another arrow in its quiver when it comes to being =E2= =80=9Ca > player=E2=80=9D in that (ever-growing) arena. The only point I was tryin= g to make > before was that if we could ALSO address clustering in a more general way= as > part of providing a pNFS solution, that would be great. When I did a fairly superficial evaluation of the open source clustering sy= stems out there (looking at online doc and not actually their code), it seemed th= at GlusterFS was the best bet for "one size fits all". It had: - a distributed file system (replication, etc) with a POSIX/FUSE interface. - SwiftOnFile that put the Swift/Openstack on top of this. - It had decentralized metadata handling. For pNFS: - It had a NFSv3 server built into it. - Was ported to FreeBSD. The others were: - Object store only with no POSIX file system support or - Single centralized metadata store (MooseFS, for example) - No FreeBSD port and rumoured to be hard to port (Ceph, Lustre are two exa= mples). Now that I've worked with GlusterFS a little bit, I am skeptical that it ca= n deliver adequate performance for pNFS using the nfsd. I am still hoping I w= ill be proven wrong on this, but??? A GlusterFS/Ganesha-NFS user space solution may be feasible. This is what t= he GlusterFS folk are planning. However, for FreeBSD... - Ganesha-NFS apparently was ported to FreeBSD, but the port was removed fr= om their source tree and it is said it now uses Linux-specific thread primit= ives. --> As such, I have no idea what effort is involved in getting this porte= d and working well on FreeBSD is. - I would also wait until this is working in Linux and would want to do an evaluation of that, to make sure it actually works/performs well, before considering this. *** For me personally, I am probably not interested in working on this. I know the FreeBSD nfsd kernel code well and can easily work with that, but Ganesha-NFS would be an entirely different beast. Bottom line, at this point I am skeptical that a generic clustering system will work for pNFS. > I am not, however, > the one writing the code and if my comments were in any way discouraging = to > the folks that are, I apologize and want to express my enthusiasm for it. > If iXsystems engineering resources can contribute in any way to moving th= is > ball forward, let me know and we=E2=80=99ll start doing so. >=20 Well, although they may not be useful for building a pNFS server, but some = sort of evaluation of the open source clustering systems might be useful. Sooner or later, the Enterprise marketplace may want one or more of these a= nd it seems to me that having one of these layered on top of ZFS may be an att= ractive solution. - Some will never be ported to FreeBSD, but the ones that are could probabl= y be evaluated fairly easily, if you have the resources. Since almost all the code I've written gets reused if I do a PlanB, I will probably pursue that, leaving the GlusterFS interface bits in place in case they are useful. Thanks for all the interesting comments, rick > On the more general point of =E2=80=9CNFS is hard, let=E2=80=99s go shopp= ing=E2=80=9D let me also say > that it=E2=80=99s kind of important not to conflate end-user targeted sol= utions with > enterprise solutions. Setting up a Kerberized NFSv4, for example, is not > really designed to be trivial to set up and if anyone is waiting for that= to > happen, they may be waiting a very long time (like, forever). NFS and SM= B > are both fairly simple technologies to use if you restrict yourself to > using, say, just 20% of their overall feature-sets. Once you add ACLs, > Directory Services, user/group and permissions mappings, and any of the > other more enterprise-centric features of these filesharing technologies, > however, things rapidly get more complicated and the DevOps people who > routinely play in these kinds of environments are quite happy to have all > those options available because they=E2=80=99re not consumers operating i= n consumer > environments. >=20 > Sun didn=E2=80=99t design NFS to be particularly consumer-centric, for th= at matter, > and if you think SMB is =E2=80=9Csimple=E2=80=9D because you clicked Netw= ork on Windows > Explorer one day and stuff just automagically appeared, you should try > operating it in a serious Windows Enterprise environment (just flip throu= gh > some of the SMB bugs in the FreeNAS bug tracker - > https://bugs.freenas.org/projects/freenas/issues?utf8=3D=E2=9C=93&set_fil= ter=3D1&f%5B%5D=3Dstatus_id&op%5Bstatus_id%5D=3D*&f%5B%5D=3Dcategory_id&op%= 5Bcategory_id%5D=3D%3D&v%5Bcategory_id%5D%5B%5D=3D57&f%5B%5D=3D&c%5B%5D=3Dt= racker&c%5B%5D=3Dstatus&c%5B%5D=3Dpriority&c%5B%5D=3Dsubject&c%5B%5D=3Dassi= gned_to&c%5B%5D=3Dupdated_on&c%5B%5D=3Dfixed_version&group_by=3D > - if you want to see the kinds of problems users wrestle with all the tim= e). >=20 > Anyway, I=E2=80=99ll get off the soapbox now, I just wanted to dispute th= e premise > that =E2=80=9Csimple file sharing=E2=80=9D that is also =E2=80=9Csecure f= ile sharing=E2=80=9D and =E2=80=9Cflexible > file sharing=E2=80=9D doesn=E2=80=99t really exist. The simplest end-use= r oriented file > sharing system I=E2=80=99ve used to date is probably AFP, and Apple has b= een trying > to kill it for years, probably because it doesn=E2=80=99t have all those = extra knobs > and Kerberos / Directory Services integration business users have been > asking for (it=E2=80=99s also not particularly industry standard). >=20 > - Jordan >=20 >=20 >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Jun 22 11:26:26 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 28118AC58CC for ; Wed, 22 Jun 2016 11:26:26 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E3B5F2DFA for ; Wed, 22 Jun 2016 11:26:25 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id 3A79A153402; Wed, 22 Jun 2016 10:57:05 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xWtymWGBsLaS; Wed, 22 Jun 2016 10:56:55 +0200 (CEST) Received: from [192.168.10.10] (asus [192.168.10.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id F0A86153430; Wed, 22 Jun 2016 10:56:54 +0200 (CEST) Subject: Re: pNFS server Plan B To: Jordan Hubbard , Rick Macklem References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> Cc: freebsd-fs , Alexander Motin From: Willem Jan Withagen Message-ID: <16d38847-f515-f532-1300-d2843005999e@digiware.nl> Date: Wed, 22 Jun 2016 10:56:57 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Jun 2016 11:26:26 -0000 Hi Jordan, To rip just a bit of your text out of context: On 18-6-2016 22:50, Jordan Hubbard wrote: > Some, if not most, of them are also far > better supported under Linux than FreeBSD (I don’t think we even have > a working ceph port yet). In the spare time I have left, I'm trying to get a lot of small fixes into the ceph tree to get it actually compiling, testing, and running on FreeBSD. But Ceph is a lot of code, and since a lot of people are working on it, the number of code changes are big. And just keeping up with that is sometimes hard. More and more Linux-isms are dropped into the the code. So progress is slow. I only because it is hard to get people to look at the commits and get them. Current state is that I have it compile everything, and I can run 120 of 129 test with success. I once had them complete all, but then a busload of changes were dropped in the tree. And so I needed to "start "repairing" again. I gave a small presentation of my work thus far at Ceph Day Cern in Geneva. https://indico.cern.ch/event/542464/contributions/2202309/ Differences in code are not really that big in the CC-code, most of the things to fix are additional tools that have to deal with the infrastructure that fully assumes it is running a Linux-distro. Next to that is Ceph going to its own diskstore system: BlueStore, where as I hope(d) to base it on a ZFS underlying layer... To run BlueStore AIO is needed for diskdevices, but the current AIO is not call for call compatible, and requires a glue layer. I have not looked into the size of the semantic problems between Linux and FreeBSD here. On the other hand they just declared CephFS (a posix filesystem running on Ceph) stable and to be used. --WjW From owner-freebsd-fs@freebsd.org Thu Jun 23 06:08:23 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 29356B73308 for ; Thu, 23 Jun 2016 06:08:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 197BC2986 for ; Thu, 23 Jun 2016 06:08:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5N68MXM014527 for ; Thu, 23 Jun 2016 06:08:22 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 209158] node / npm triggering zfs rename deadlock Date: Thu, 23 Jun 2016 06:08:22 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: swills@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2016 06:08:23 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D209158 --- Comment #31 from Steve Wills --- (In reply to Doug Luce from comment #30) I'm currently testing this patch and so far it fixes the issue and doesn't cause any adverse affects, but it's only been running for a few hours. More testing would be good, but this looks promising to me. --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Thu Jun 23 10:09:40 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 91194A79FCD for ; Thu, 23 Jun 2016 10:09:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 673A727FA for ; Thu, 23 Jun 2016 10:09:40 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5NA9dUF094384 for ; Thu, 23 Jun 2016 10:09:40 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 194513] zfs recv hangs in state kmem arena Date: Thu, 23 Jun 2016 10:09:40 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: shared+bugs.freebsd.org@twingly.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2016 10:09:40 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D194513 --- Comment #17 from Twingly --- We are also seeing this on our backup server 64 GB RAM 4x6TB (zpool with 2 mirrors) *no* SSD ZIL or L2Arc zfs send ... | ssh backupserver zfs receive zfs receive -u ... often hangs, and I have to reboot the server to get rid of the hang process= and to be able to receive datasets again no problem with 'zpool history', it runs fine $ uname -a FreeBSD hodor.live.lkp.primelabs.se 10.2-RELEASE-p18 FreeBSD 10.2-RELEASE-p= 18 #0: Sat May 28 08:53:43 UTC 2016=20=20=20=20 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 $ freebsd-version 10.2-RELEASE-p19 $ ps -l -U backup UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 1006 59745 1 0 20 0 42248 3164 kmem are Ds - 0:03.81 zfs receive= -u storage/backup/mysql/leaf/data $ procstat 59745 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 59745 1 59745 59745 0 1 backup kmem are FreeBSD ELF64 zfs $ sudo procstat -kk 59745 Password: PID TID COMM TDNAME KSTACK 59745 100809 zfs - mi_switch+0xe1 sleepq_wait+0= x3a _cv_wait+0x16d vmem_xalloc+0x568 vmem_alloc+0x3d kmem_malloc+0x33 uma_large_malloc+0x49 malloc+0x43 dmu_recv_stream+0x114 zfs_ioc_recv+0x955 zfsdev_ioctl+0x5ca devfs_ioctl_f+0x139 kern_ioctl+0x255 sys_ioctl+0x140 amd64_syscall+0x357 Xfast_syscall+0xfb $ sysctl -h hw.physmem vm.kmem_size vm.kmem_size_max hw.physmem: 68,578,963,456 vm.kmem_size: 66,774,925,312 vm.kmem_size_max: 1,319,413,950,874 I will look into raising vm.kmem_size, but right now I have to reboot hand hopefully get some backups going... --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Thu Jun 23 10:46:35 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 144D4AC5C64 for ; Thu, 23 Jun 2016 10:46:35 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0110D1E35 for ; Thu, 23 Jun 2016 10:46:35 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u5NAkXbZ077081 for ; Thu, 23 Jun 2016 10:46:34 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 194513] zfs recv hangs in state kmem arena Date: Thu, 23 Jun 2016 10:46:33 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: shared+bugs.freebsd.org@twingly.com X-Bugzilla-Status: Open X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: attachments.created Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2016 10:46:35 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D194513 --- Comment #18 from Twingly --- Created attachment 171704 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=3D171704&action= =3Dedit reboot doesn't get past this When zfs recive has hung, 'reboot' won't work to reboot the machine, it just hangs and I have to reset the machine via remote management. 'reboot' works fine on this machine when there is no hung zfs receive proce= sses --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Fri Jun 24 07:35:24 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 80BFEB80F0C for ; Fri, 24 Jun 2016 07:35:24 +0000 (UTC) (envelope-from jkh@ixsystems.com) Received: from barracuda.ixsystems.com (barracuda.ixsystems.com [12.229.62.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.ixsystems.com", Issuer "Go Daddy Secure Certificate Authority - G2" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 55571280C for ; Fri, 24 Jun 2016 07:35:24 +0000 (UTC) (envelope-from jkh@ixsystems.com) X-ASG-Debug-ID: 1466753723-08ca041121252c0001-3nHGF7 Received: from zimbra.ixsystems.com ([10.246.0.20]) by barracuda.ixsystems.com with ESMTP id R38iwKZ2iihlhQDA (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 24 Jun 2016 00:35:23 -0700 (PDT) X-Barracuda-Envelope-From: jkh@ixsystems.com X-Barracuda-RBL-Trusted-Forwarder: 10.246.0.20 X-ASG-Whitelist: Client Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 4C0FDDE0F26; Fri, 24 Jun 2016 00:35:23 -0700 (PDT) Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id ISbnJgLYUd9H; Fri, 24 Jun 2016 00:35:22 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.ixsystems.com (Postfix) with ESMTP id 3D2E1DE4CB1; Fri, 24 Jun 2016 00:35:22 -0700 (PDT) X-Virus-Scanned: amavisd-new at ixsystems.com Received: from zimbra.ixsystems.com ([127.0.0.1]) by localhost (zimbra.ixsystems.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id fXhc0qangd-3; Fri, 24 Jun 2016 00:35:22 -0700 (PDT) Received: from [172.20.0.18] (vpn.ixsystems.com [10.249.0.2]) by zimbra.ixsystems.com (Postfix) with ESMTPSA id D4321DE4BA6; Fri, 24 Jun 2016 00:35:21 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\)) Subject: Re: pNFS server Plan B From: Jordan Hubbard X-ASG-Orig-Subj: Re: pNFS server Plan B In-Reply-To: <16d38847-f515-f532-1300-d2843005999e@digiware.nl> Date: Fri, 24 Jun 2016 00:35:21 -0700 Cc: Rick Macklem , freebsd-fs , Alexander Motin Content-Transfer-Encoding: quoted-printable Message-Id: <0CB465F9-B819-4DA7-969C-690A02BEB66E@ixsystems.com> References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <16d38847-f515-f532-1300-d2843005999e@digiware.nl> To: Willem Jan Withagen X-Mailer: Apple Mail (2.3124) X-Barracuda-Connect: UNKNOWN[10.246.0.20] X-Barracuda-Start-Time: 1466753723 X-Barracuda-Encrypted: ECDHE-RSA-AES256-GCM-SHA384 X-Barracuda-URL: https://10.246.0.26:443/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at ixsystems.com X-Barracuda-BRTS-Status: 1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2016 07:35:24 -0000 > On Jun 22, 2016, at 1:56 AM, Willem Jan Withagen = wrote: >=20 > In the spare time I have left, I'm trying to get a lot of small fixes > into the ceph tree to get it actually compiling, testing, and running = on > FreeBSD. But Ceph is a lot of code, and since a lot of people are > working on it, the number of code changes are big. Hi Willem, Yes, I read your paper on the porting effort! I also took a look at porting ceph myself, about 2 years ago, and = rapidly concluded that it wasn=E2=80=99t a small / trivial effort by any = means and would require a strong justification in terms of ceph=E2=80=99s = feature set over glusterfs / moose / OpenAFS / RiakCS / etc. Since = that time, there=E2=80=99s been customer interest but nothing truly = =E2=80=9Cstrong=E2=80=9D per-se. My attraction to ceph remains centered = around at least these 4 things: 1. Distributed Object store with S3-compatible ReST API 2. Interoperates with Openstack via Swift compatibility 3. Block storage (RADOS) - possibly useful for iSCSI and other block = storage requirements 4. Filesystem interface Is there anything we can do to help? Do the CEPH folks seem receptive = to actually having a =E2=80=9CTier 1=E2=80=9D FreeBSD port? I know that = stas@ did an early almost-port awhile back, but it never reached = fruition and my feeling was that they (ceph) might be a little gun-shy = about seeing another port that might wind up in the same place, crufting = up their code base to no purpose. Do you have any initial impressions = about that? I=E2=80=99ve never talked to any of the 3 principle guys = working on the project and this is pure guesswork on my part. - Jordan From owner-freebsd-fs@freebsd.org Fri Jun 24 08:31:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E5D4EA77E70 for ; Fri, 24 Jun 2016 08:31:53 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9B455204F; Fri, 24 Jun 2016 08:31:53 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id AC1B9153401; Fri, 24 Jun 2016 10:31:50 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.com Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dsKx4bkuC9WT; Fri, 24 Jun 2016 10:31:22 +0200 (CEST) Received: from [192.168.10.67] (opteron [192.168.10.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.digiware.nl (Postfix) with ESMTPSA id 29129153402; Fri, 24 Jun 2016 10:21:14 +0200 (CEST) Subject: Re: pNFS server Plan B To: Jordan Hubbard References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <16d38847-f515-f532-1300-d2843005999e@digiware.nl> <0CB465F9-B819-4DA7-969C-690A02BEB66E@ixsystems.com> Cc: Rick Macklem , freebsd-fs , Alexander Motin From: Willem Jan Withagen Message-ID: <20e89f76-867f-67b7-bb80-17acf8de6ed3@digiware.nl> Date: Fri, 24 Jun 2016 10:21:07 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 In-Reply-To: <0CB465F9-B819-4DA7-969C-690A02BEB66E@ixsystems.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jun 2016 08:31:54 -0000 On 24-6-2016 09:35, Jordan Hubbard wrote: > >> On Jun 22, 2016, at 1:56 AM, Willem Jan Withagen >> wrote: >> >> In the spare time I have left, I'm trying to get a lot of small >> fixes into the ceph tree to get it actually compiling, testing, and >> running on FreeBSD. But Ceph is a lot of code, and since a lot of >> people are working on it, the number of code changes are big. > > Hi Willem, > > Yes, I read your paper on the porting effort! > > I also took a look at porting ceph myself, about 2 years ago, and > rapidly concluded that it wasn’t a small / trivial effort by any > means and would require a strong justification in terms of ceph’s > feature set over glusterfs / moose / OpenAFS / RiakCS / etc. Since > that time, there’s been customer interest but nothing truly “strong” > per-se. I've been going at it since last November... And all I go in are about 3 batches of FreeBSD specific commits. Lots has to do with release windows and code slush, like we know on FreeBSD. But then still reviews tend to slow and I need people to push to look at them. Whilst in the mean time all kinds of thing get pulled and inserted in the tree, that seriously are not FreeBSD. Sometimes I see them during commit, and "negotiate" better comparability with the author. At other times I missed the whole thing, and I need to rebase to get ride of merge conflicts. To find out the hard way that somebody has made the whole peer communication async. And has thrown kqueue for the BSDs at it. But they don't work (yet). So to get my other patches in, if First need to fix this. Takes a lot of time ..... That all said I was in Geneva and a lot of the Ceph people were there including Sage Weil. And I go the feeling they appreciated a larger community. I think they see what ZFS has done with OpenZFS and see that communities get somewhere. Now on of the things to do to continue, now that I sort of can compile and run the first testset, is set up sort of my own Jenkins stuff. So that I can at least test drive some of the tree automagically to get some testcoverage of the code on FreeBSD. In my mind (and Sage warned me that that will be more or less required) it is the only way to actually get a serious foot in the door with the Ceph guys. > My attraction to ceph remains centered around at least these > 4 things: > > 1. Distributed Object store with S3-compatible ReST API > 2. Interoperates with Openstack via Swift compatibility > 3. Block storage > (RADOS) - possibly useful for iSCSI and other block storage > requirements > 4. Filesystem interface > > Is there anything we can do to help? I'll get back on that in a separate Email. > Do the CEPH folks seem > receptive to actually having a “Tier 1” FreeBSD port? I know that > stas@ did an early almost-port awhile back, but it never reached > fruition and my feeling was that they (ceph) might be a little > gun-shy about seeing another port that might wind up in the same > place, crufting up their code base to no purpose. Well, as you know, I for the era before there was automake.... So then porting was still very much an art. So I've been balancing between crufting up the code, and hiding things nice and cleanly in C++ classes and place. And as an go inbetween stuff get stuck in compat.h. One of my slides was actually about the impact of foreign code in the tree. And uptill now that is relatively minimal. Which seemed to please a lot of the folks. But they also like the idee that getting FreeBSD stuff in actually showed code weakness (and fixes) in the odd corners. > Do you have any > initial impressions about that? I’ve never talked to any of the 3 > principle guys working on the project and this is pure guesswork on > my part. I think they are going their own path, like writting their own datastore so they can do things they require that posix can't deliver. And as such are also diverging from what is default on Linux. The systemarchitect in me also sees things in pain happen, because of the "reinvention" of things. But then again, that happens with projects this big. Things like checksums, compression, encryption, .... Lots of stuff I've seen happen to ZFS over its time. But so be it, everybody gets to chose their own axes to grind. The community person to talk to is perhaps Patrick McGarry, but even Sage would be good to talk to. --WjW From owner-freebsd-fs@freebsd.org Sat Jun 25 14:32:56 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C6592B801FB for ; Sat, 25 Jun 2016 14:32:56 +0000 (UTC) (envelope-from mchangir@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AC52F1891 for ; Sat, 25 Jun 2016 14:32:56 +0000 (UTC) (envelope-from mchangir@redhat.com) Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7F07E81F03 for ; Sat, 25 Jun 2016 14:32:55 +0000 (UTC) Received: from [10.10.50.6] (unused [10.10.50.6] (may be forged)) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u5PEWrhC026014 for ; Sat, 25 Jun 2016 10:32:54 -0400 To: freebsd-fs@freebsd.org From: Milind Changire Subject: Can NetBSD fdiscard() syscall support be added to FreeBSD? Organization: Red Hat India Private Limited Message-ID: <59b3ac11-6f2a-1f38-df8f-3f07859a9f9f@redhat.com> Date: Sat, 25 Jun 2016 20:02:52 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Sat, 25 Jun 2016 14:32:55 +0000 (UTC) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 25 Jun 2016 14:32:56 -0000 I couldn't find any discussion in the mailing list archives regarding this issue. Kindly point me to the appropriate discussion thread if there's any. ----- Here's my point of view ... fdiscard() will help to support rsync --inplace and --sparse flags at the same time which will help the Gluster geo-replication feature. Punching holes in a file is possible on: 1. Linux 2. NetBSD but not 3. FreeBSD at the moment. In-place hole punching will help Gluster geo-replicate virtual machine images efficiently. Kindly comment. -- Milind