Date: Sat, 18 Jun 2016 13:50:29 -0700
From: Jordan Hubbard <jkh@ixsystems.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject: Re: pNFS server Plan B
Message-ID: <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com>
In-Reply-To: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca>
References: <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca>
> On Jun 13, 2016, at 3:28 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>
> You may have already heard of Plan A, which sort of worked
> and you could test by following the instructions here:
>
> http://people.freebsd.org/~rmacklem/pnfs-setup.txt
>
> However, it is very slow for metadata operations (everything other than
> read/write) and I don't think it is very useful.

Hi guys,

I finally got a chance to catch up and bring up Rick's pNFS setup on a
couple of test machines.  He's right, obviously - the "plan A" approach
is a bit convoluted and, not at all surprisingly, slow.  With all of
those transits twixt kernel and userland, not to mention glusterfs
itself, which has not really been tuned for our platform (there are a
number of papers on this we probably haven't even all read yet), we're
obviously still in the "first make it work" stage.
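(For anyone else who wants to follow along at home: if I'm reading
Rick's notes correctly, the client end of plan A is just a stock
NFSv4.1 mount with the pnfs option turned on - the hostname below is a
stand-in for whatever MDS box you configured per pnfs-setup.txt, so
treat this as a sketch and defer to his instructions for the real
layout:

	# "your-mds-host" is hypothetical - see pnfs-setup.txt
	mount -t nfs -o nfsv4,minorversion=1,pnfs your-mds-host:/ /mnt

The client then gets layouts from the MDS telling it which data
servers to talk to for the actual reads and writes; it's on the server
side, where every operation has to bounce out to the userland glusterfs
bits and back, that the plan A sluggishness lives.)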
That said, I think there are probably more possible plans than just A
and B here, and we should give the broader topic of "what does FreeBSD
want to do in the Enterprise / Cloud computing space?" at least some
consideration at the same time, since there are more than a few goals
running in parallel here.

First, let's talk about our story around clustered filesystems +
associated command-and-control APIs in FreeBSD.  There is something of
an embarrassment of riches in the industry at the moment - glusterfs,
ceph, Hadoop HDFS, RiakCS, moose, etc.  All or most of them offer
different pros and cons, and all offer more than just the ability to
store files and scale "elastically".  They also have ReST APIs for
configuring and monitoring the health of the cluster, some offer object
as well as file storage, and Riak offers a distributed KVS for storing
information *about* file objects in addition to the objects themselves
(and when your application involves storing and managing several
million photos, for example, the idea of distributing the index as well
as the files in a fault-tolerant fashion is also compelling).  Some, if
not most, of them are also far better supported under Linux than
FreeBSD (I don't think we even have a working ceph port yet).  I'm not
saying we need to blindly follow the herds and do all the same things
others are doing here, either; I'm just saying that it's a much bigger
problem space than simply "parallelizing NFS" and if we can kill
multiple birds with one stone on the way to doing that, we should
certainly consider doing so.

Why?  Because pNFS was first introduced in draft form back in 2005
(eventually standardized as RFC 5661
<https://datatracker.ietf.org/doc/rfc5661/>).  The linux folks have
been working on it
<http://events.linuxfoundation.org/sites/events/files/slides/pnfs.pdf>
since 2006.  Ten years is a long time in this business, and when I
raised the topic of pNFS at the recent SNIA DSI conference (where
storage developers gather to talk about trends and things), the most
prevalent reaction I got was "people are still using pNFS?!"  This is
clearly one of those technologies that may still have some runway left,
but it's been rapidly overtaken by other approaches to solving more or
less the same problems in coherent, distributed filesystem access, and
if we want to get mindshare for this, we should at least have an answer
ready for the "why did you guys do pNFS that way rather than just
shimming it on top of ${someNewerHotness}??" argument.  I'm not
suggesting pNFS is dead - hell, even AFS <https://www.openafs.org/>
still appears to be somewhat alive, but there's a difference between
appealing to an increasingly narrow niche and trying to solve the sorts
of problems most DevOps folks working At Scale these days are running
into.

That is also why I am not sure I would totally embrace the idea of a
central MDS being a Real Option.  Sure, the risks can be mitigated (as
you say, by mirroring it), but even saying the words "central MDS" (or
central anything) may be such a turn-off to those very same DevOps
folks, folks who have been burned so many times by SPOFs and scaling
bottlenecks in large environments, that we'll lose the audience the
minute they hear the trigger phrase.  Even if it means signing up for
Other Problems later, it's a lot easier to "sell" the concept of
completely distributed mechanisms where, if there is any notion of
centralization at all, it's at least the result of a quorum election
and the DevOps folks don't have to do anything manually to cause it to
happen - the cluster is "resilient" and "self-healing" and they are
happy with being able to say those buzzwords to the CIO, who nods
knowingly and tells them they're doing a fine job!

Let's get back, however, to the notion of downing multiple avians with
the same semi-spherical kinetic projectile:  What seems to be The Rage
at the moment, and I don't know how well it actually scales since I've
yet to be at the pointy end of such a real-world deployment, is the
idea of clustering the storage ("somehow") underneath and then
providing NFS and SMB protocol access entirely in userland, usually
with both of those services cooperating with the same lock manager and
even the same ACL translation layer.  Our buddies at Red Hat do this
with glusterfs at the bottom and NFS Ganesha + Samba on top - I talked
to one of the Samba core team guys at SNIA and he indicated that this
was increasingly common, with the team having helped here and there
when approached by different vendors with the same idea.  We
(iXsystems) also get a lot of requests to be able to make the same
file(s) available via both NFS and SMB at the same time, and they don't
much at all like being told "but that's dangerous - don't do that!
Your file contents and permissions models are not guaranteed to survive
such an experience!"  They really want to do it, because the rest of
the world lives in heterogeneous environments and that's just the way
it is.
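(To make the Red Hat recipe concrete: the NFS half is just Ganesha's
GLUSTER FSAL, and the SMB half is Samba's matching glusterfs VFS
module.  I'm going from memory rather than a known-good config here,
and the volume/share names are made up, so treat the following as a
sketch of the shape of the thing rather than something to paste in:

	# ganesha.conf - export a gluster volume via userland NFS
	EXPORT {
		Export_Id = 1;
		Path = "/testvol";
		Pseudo = "/testvol";
		Access_Type = RW;
		FSAL {
			Name = GLUSTER;
			Hostname = "localhost";   # gluster volfile server
			Volume = "testvol";
		}
	}

	# smb.conf - the same volume over SMB, also without the kernel
	[testvol]
		vfs objects = glusterfs
		path = /
		glusterfs:volume = testvol
		kernel share modes = no

Wiring up the exports is the easy ninety percent, of course - the part
that actually matters is the bit I mentioned above: getting the two
daemons to agree on byte-range locks and ACL semantics so the NFS and
SMB views of the same file can't silently diverge.)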
Even the object storage folks, like OpenStack's Swift project, are
spending significant amounts of mental energy on the topic of how to
re-export their object stores as shared filesystems over NFS and SMB,
the single consistent and distributed object store being, of course,
Their Thing.  They wish, of course, that the rest of the world would
just fall into line and use their object system for everything, but
they also get that the "legacy stuff" just won't go away and needs some
sort of attention if they're to remain players at the standards table.

So anyway, that's the view I have from the perspective of someone who
actually sells storage solutions for a living, and while I could
certainly "sell some pNFS" to various customers who just want to add a
dash of steroids to their current NFS infrastructure, or need to use
NFS but also need to store far more data into a single namespace than
any one box will accommodate, I also know that offering even more
elastic solutions will be a necessary part of offering solutions to the
growing contingent of folks who are not tied to any existing storage
infrastructure and have various non-greybearded folks shouting in their
ears about object this and cloud that.  Might there not be some
compromise solution which allows us to put more of this in userland
with fewer context switches in and out of the kernel, also giving us
the option of presenting a more united front to multiple protocols that
require more ACL and lock impedance-matching than we'd ever want to put
in the kernel anyway?

- Jordan