Date:      Sat, 18 Jun 2016 20:14:29 -0500
From:      Chris Watson <bsdunix44@gmail.com>
To:        Jordan Hubbard <jkh@ixsystems.com>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, freebsd-fs <freebsd-fs@freebsd.org>, Alexander Motin <mav@freebsd.org>
Subject:   Re: pNFS server Plan B
Message-ID:  <7E27FA25-E18F-41D3-8974-EAE1EACABF38@gmail.com>
In-Reply-To: <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com>
References:  <1524639039.147096032.1465856925174.JavaMail.zimbra@uoguelph.ca> <D20C793E-A2FD-49F3-AD88-7C2FED5E7715@ixsystems.com>

Since Jordan brought up clustering, I would be interested to hear Justin
Gibbs's thoughts here. I know about a year ago he was asked on an "after
hours" video chat hosted by Matt Ahrens about a feature he would really
like to see, and he mentioned that, in a universe filled with time and
money I'm sure, he would like to work on a native clustering solution for
FreeBSD. I don't know if he is subscribed to the list, and I'm certainly
not throwing him under the bus by bringing his name up, but I know he has
at least been thinking about this for some time and probably has some
value to add here.

Chris

Sent from my iPhone 5

> On Jun 18, 2016, at 3:50 PM, Jordan Hubbard <jkh@ixsystems.com> wrote:
>
>> On Jun 13, 2016, at 3:28 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>> You may have already heard of Plan A, which sort of worked
>> and you could test by following the instructions here:
>>
>> http://people.freebsd.org/~rmacklem/pnfs-setup.txt
>>
>> However, it is very slow for metadata operations (everything other than
>> read/write) and I don't think it is very useful.
>
> Hi guys,
>
> I finally got a chance to catch up and bring up Rick's pNFS setup on a
> couple of test machines.  He's right, obviously - the "plan A" approach
> is a bit convoluted and, not at all surprisingly, slow.  With all of
> those transits twixt kernel and userland, not to mention glusterfs
> itself, which has not really been tuned for our platform (there are a
> number of papers on this we probably haven't even all read yet), we're
> obviously still in the "first make it work" stage.
>
> That said, I think there are probably more possible plans than just A
> and B here, and we should give the broader topic of "what does FreeBSD
> want to do in the Enterprise / Cloud computing space?" at least some
> consideration at the same time, since there are more than a few goals
> running in parallel here.
>
> First, let's talk about our story around clustered filesystems +
> associated command-and-control APIs in FreeBSD.  There is something of
> an embarrassment of riches in the industry at the moment - glusterfs,
> ceph, Hadoop HDFS, RiakCS, MooseFS, etc.  All or most of them offer
> different pros and cons, and all offer more than just the ability to
> store files and scale "elastically".  They also have ReST APIs for
> configuring and monitoring the health of the cluster, some offer object
> as well as file storage, and Riak offers a distributed KVS for storing
> information *about* file objects in addition to the objects themselves
> (and when your application involves storing and managing several
> million photos, for example, the idea of distributing the index as well
> as the files in a fault-tolerant fashion is also compelling).  Some, if
> not most, of them are also far better supported under Linux than
> FreeBSD (I don't think we even have a working ceph port yet).  I'm not
> saying we need to blindly follow the herd and do all the same things
> others are doing here, either; I'm just saying that it's a much bigger
> problem space than simply "parallelizing NFS", and if we can kill
> multiple birds with one stone on the way to doing that, we should
> certainly consider doing so.
>
> Why?  Because pNFS was first introduced as a draft RFC (RFC 5661
> <https://datatracker.ietf.org/doc/rfc5661/>) in 2005.  The Linux folks
> have been working on it
> <http://events.linuxfoundation.org/sites/events/files/slides/pnfs.pdf>
> since 2006.  Ten years is a long time in this business, and when I
> raised the topic of pNFS at the recent SNIA DSI conference (where
> storage developers gather to talk about trends and things), the most
> prevalent reaction I got was "people are still using pNFS?!"  This is
> clearly one of those technologies that may still have some runway left,
> but it's been rapidly overtaken by other approaches to solving more or
> less the same problems in coherent, distributed filesystem access, and
> if we want to get mindshare for this, we should at least have an answer
> ready for the "why did you guys do pNFS that way rather than just
> shimming it on top of ${someNewerHotness}??" argument.  I'm not
> suggesting pNFS is dead - hell, even AFS <https://www.openafs.org/>
> still appears to be somewhat alive - but there's a difference between
> appealing to an increasingly narrow niche and trying to solve the sorts
> of problems most DevOps folks working At Scale these days are running
> into.
>
> That is also why I am not sure I would totally embrace the idea of a
> central MDS being a Real Option.  Sure, the risks can be mitigated (as
> you say, by mirroring it), but even saying the words "central MDS" (or
> central anything) may be such a turn-off to those very same DevOps
> folks, folks who have been burned so many times by SPOFs and scaling
> bottlenecks in large environments, that we'll lose the audience the
> minute they hear the trigger phrase.  Even if it means signing up for
> Other Problems later, it's a lot easier to "sell" the concept of
> completely distributed mechanisms where, if there is any notion of
> centralization at all, it's at least the result of a quorum election
> and the DevOps folks don't have to do anything manually to cause it to
> happen - the cluster is "resilient" and "self-healing" and they are
> happy with being able to say those buzzwords to the CIO, who nods
> knowingly and tells them they're doing a fine job!
>
> Let's get back, however, to the notion of downing multiple avians with
> the same semi-spherical kinetic projectile: what seems to be The Rage
> at the moment, and I don't know how well it actually scales since I've
> yet to be at the pointy end of such a real-world deployment, is the
> idea of clustering the storage ("somehow") underneath and then
> providing NFS and SMB protocol access entirely in userland, usually
> with both of those services cooperating with the same lock manager and
> even the same ACL translation layer.  Our buddies at Red Hat do this
> with glusterfs at the bottom and NFS Ganesha + Samba on top - I talked
> to one of the Samba core team guys at SNIA and he indicated that this
> was increasingly common, with the team having helped here and there
> when approached by different vendors with the same idea.  We
> (iXsystems) also get a lot of requests to make the same file(s)
> available via both NFS and SMB at the same time, and the folks asking
> don't much like being told "but that's dangerous - don't do that!  Your
> file contents and permissions models are not guaranteed to survive such
> an experience!"  They really want to do it, because the rest of the
> world lives in heterogeneous environments and that's just the way it is.
>
> Even the object storage folks, like OpenStack's Swift project, are
> spending significant amounts of mental energy on the topic of how to
> re-export their object stores as shared filesystems over NFS and SMB,
> the single consistent and distributed object store being, of course,
> Their Thing.  They wish, of course, that the rest of the world would
> just fall into line and use their object system for everything, but
> they also get that the "legacy stuff" just won't go away and needs some
> sort of attention if they're to remain players at the standards table.
>
> So anyway, that's the view I have from the perspective of someone who
> actually sells storage solutions for a living, and while I could
> certainly "sell some pNFS" to various customers who just want to add a
> dash of steroids to their current NFS infrastructure, or who need to
> use NFS but also need to store far more data into a single namespace
> than any one box will accommodate, I also know that offering even more
> elastic solutions will be a necessary part of offering solutions to the
> growing contingent of folks who are not tied to any existing storage
> infrastructure and have various non-greybearded folks shouting in their
> ears about object this and cloud that.  Might there not be some
> compromise solution which allows us to put more of this in userland,
> with fewer context switches in and out of the kernel, while also giving
> us the option of presenting a more united front to multiple protocols
> that require more ACL and lock impedance-matching than we'd ever want
> to put in the kernel anyway?
>
> - Jordan
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


