Date: Wed, 9 Sep 2015 08:20:24 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Outback Dingo
Cc: Mark Saad, freebsd-fs@freebsd.org, Rakshith Venkatesh, Jordan Hubbard
Subject: Re: CEPH + FreeBSD

Outback Dingo wrote:
> On Wed, Sep 9, 2015 at 11:31 AM, Mark Saad wrote:
>
> > All
> >  What about leofs. It's in ports and has an S3 obj store and NFS
> > support out of the box
> >
> > http://www.freshports.org/databases/leofs/
> > http://leo-project.net
>
> LeoFS supports NFSv3 and does not have a lock manager....
>
I doubt lack of a lock manager is an issue for what I want to do, since the
NFSv4.1 metadata server (just a regular NFSv4.1 server that can give out
layouts for reading/writing the data directly on the data servers) handles
the locking. It is actually much easier to keep track of the locking in the
NFSv4.1 server and not have to worry about locking on the underlying cluster
FS. All I intend to do with an NFSv3 server on the data server(s) is do
Read/Write RPCs. Everything else is handled via the NFSv4.1 metadata server.
(The original RFC required use of NFSv4.1 read/write ops on the data servers,
but a new layout type called flex files supports NFSv3 Read/Write for the
data servers.)
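To make that split concrete, here is roughly the sort of information a flex
files layout hands the client. This is only a sketch; the struct and field
names below are made up for illustration and are not the actual flex files
XDR or anything in the kernel nfsd:

/*
 * Sketch only: the kind of information the NFSv4.1 metadata server hands a
 * client so the client can do plain NFSv3 Read/Write RPCs directly against
 * a data server.  Names are invented; this is not the real flex files XDR.
 */
#include <stdint.h>
#include <sys/socket.h>

#define SKETCH_MAXFHLEN 64      /* an NFSv3 file handle is at most 64 bytes */

struct sketch_dataserver {
        struct sockaddr_storage ds_addr; /* where the NFSv3 data server is */
        uint32_t ds_nfsvers;             /* 3 => plain NFSv3 Read/Write */
        uint32_t ds_fhlen;
        uint8_t  ds_fh[SKETCH_MAXFHLEN]; /* file handle to use on that server */
        uint32_t ds_uid, ds_gid;         /* synthetic creds for the I/O RPCs */
};

/*
 * Roughly what a LAYOUTGET reply boils down to for one file: a byte range
 * the client may read/write directly, plus the data server(s) to do it on.
 * Opens, locks and attributes all stay with the metadata server, which is
 * why the data servers only need to handle Read/Write.
 */
struct sketch_layout {
        uint64_t lo_offset;
        uint64_t lo_length;
        int      lo_rw;                    /* 0 = read-only, 1 = read/write */
        uint32_t lo_ndataservers;
        struct sketch_dataserver lo_ds[4]; /* one per mirror/stripe */
};

With something like that in hand, the client aims its Read/Write RPCs
straight at ds_addr using ds_fh and never needs any lock state on the data
server itself.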
The key issue for me is whether or not LeoFS has a VFS interface to a
POSIX-like file system (via FUSE or ???). At a quick glance at the web page,
I don't see any mention of this?

Why? Well, simply because I am looking at extending the current kernel based
NFSv4.1 server to support pNFS. Obviously, there are other ways an
NFSv4.1/pNFS server can be built (the userland NFS-Ganesha that is on Linux,
for example), but that isn't what I'm interested in doing.

Btw, I took a quick look at MooseFS and it does seem to have this and could
be an alternative to glusterFS. It isn't an object store and only appears to
have a single metadata server, which might be a limitation for the long term?
It sounds like MooseFS uses a custom protocol for the chunk/data servers and
I don't feel like trying to define yet another layout type, so I think I
would need to add a partial NFSv3 server to the chunk/data servers.
I will be looking more closely at both glusterFS and MooseFS soon.

If there are yet more of these cluster object stores that you think might be
worth considering, feel free to mention them. (I thought I had looked at most
of them, but hadn't noticed MooseFS, so...)

Thanks for all the comments, rick

>
> >
> > ---
> > Mark Saad | nonesuch@longcount.org
> >
> > > On Sep 6, 2015, at 6:18 PM, Rick Macklem wrote:
> > >
> > > Jordan Hubbard wrote:
> > >>
> > >>> On Sep 3, 2015, at 4:17 AM, Rick Macklem wrote:
> > >>>
> > >>> Slightly off topic but, btw, there is a port of GlusterFS and those
> > >>> folks do seem interested in seeing it brought "up to speed". I am not
> > >>> sure how mature it is at this point, but it has been known to build
> > >>> on amd64. (I don't have an amd64 machine, so I haven't gotten around
> > >>> to building/testing it, but I do plan to try and use it as a basis
> > >>> for a pNFS server, if I can figure out how to get the FH info out of
> > >>> it. I'm working on that;-)
> > >>
> > >> There are at least two distributed (multi-node) object stores for
> > >> FreeBSD that I know of.
> > >>
> > >> One is glusterfs, for which I'm not even really clear on the status
> > >> of the ports. I don't see any glusterfs port in the master branch of
> > >> https://github.com/freebsd/freebsd-ports (or
> > >> https://github.com/freebsd/freebsd-ports/tree/branches/2015Q3 for
> > >> that matter).
> > >>
> > >> Our FreeNAS ports tree (https://github.com/freenas/ports), in which
> > >> we have a bit more latitude to add and curate our own ports, has both
> > >> a net/glusterfs and sysutils/glusterfs, from separate sources (looks
> > >> like we need to clean things up) - net/glusterfs lists
> > >> craig001@lerwick.hopto.org as the MAINTAINER and is at version 3.6.2.
> > >> The sysutils/glusterfs port lists bapt@FreeBSD.org as the MAINTAINER
> > >> and is at version 20140811.
> > >>
> > >> I'm not really sure about the provenance since we were simply
> > >> evaluating glusterfs for a while and may have pulled in interim
> > >> versions from those sources, but obviously it would be best to have
> > >> an official maintainer and someone in the FreeBSD project actually
> > >> curating a glusterfs port so that all users of FreeBSD can use it. It
> > >> would also be fairly key to your own efforts, assuming you decide to
> > >> pursue glusterfs as a foundation technology for pNFS.
> > >>
> > >> The other object store, which is pretty mature and is currently
> > >> leading the pack (of two :) ) for inclusion into FreeNAS is RiakCS
> > >> from Basho. There is a port currently in databases/riak but it's
> > >> pretty out of date at version 1.4.12 (the current version is 2.0.1,
> > >> with 2.0 being a major upgrade of RiakCS).
> > >>
> > >> We are very interested in investigating various ways of shimming
> > >> RiakCS to NFS, using RiakCS as a back-end store. Is that something
> > >> you'd be amenable to discussing? I'd be happy to send you an amd64
> > >> architecture machine to develop on. :)
> > > Hmm. From a quick look at their web page (I looked once before as
> > > well), I don't think RiakCS has what I need to do pNFS in a reasonable
> > > (for me) amount of effort. Two things that glusterFS has that I am
> > > hoping to use (and I don't think RiakCS has either of these) are:
> > > - A Fuse file system interface which allows the kernel nfsd to access
> > >   the store as a file system, so that it can provide the metadata
> > >   services (NFS without the reads/writes).
> > > - A userland NFSv3 server in each node which will allow the node to
> > >   act as a data server.
> > >
> > > If I am wrong and RiakCS does support a VFS file system interface (via
> > > Fuse or ???), then please correct me. With that, it might be a
> > > reasonable alternative.
> > > I'll admit I've spent a little time looking at the glusterFS sources
> > > and haven't yet solved the problem of how to generate the file handles
> > > I need, but that sounds trivial compared with an entire Fuse and/or
> > > VFS file system interface, I think?
> > >
> > > In general, using a cloud object store to implement a pNFS server is a
> > > *mis*use of the technology, imho. I think it may be possible with
> > > glusterFS, since that technology seems to be based on a cluster file
> > > system, which is what a pNFS server can also use.
> > >
> > > I think there would be a lot of work involved in mapping a POSIX file
> > > system onto the Riak database and then exporting that via NFS, etc. It
> > > might also be more practical to do this via a userland NFS service
> > > than the kernel based one currently in FreeBSD. (glusterFS is starting
> > > to use the NFS-Ganesha server, but I believe it is pretty Linux
> > > specific, so I doubt it would be useful for Riak running on FreeBSD?)
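(Expanding on the FH problem above, since it really is the crux of it: the
kernel nfsd builds its wire file handles from whatever the exported file
system returns via VOP_VPTOFH() and turns them back into vnodes with
VFS_FHTOVP(). The sketch below is only an illustration of that, with made-up
helper names and all of the error handling, locking and fsid-to-mount lookup
left out; it is not the actual nfsd code.)

/*
 * Simplified sketch (not the actual nfsd code paths): how the kernel NFS
 * server gets from a vnode to a wire file handle and back again.
 */
#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/* vnode -> wire file handle: fsid of the mount plus the fs-specific fid. */
static int
sketch_vptofh(struct vnode *vp, fhandle_t *fhp)
{
        fhp->fh_fsid = vp->v_mount->mnt_stat.f_fsid;
        return (VOP_VPTOFH(vp, &fhp->fh_fid));
}

/* wire file handle -> locked vnode, done for each incoming RPC. */
static int
sketch_fhtovp(struct mount *mp, fhandle_t *fhp, struct vnode **vpp)
{
        /* mp would really be looked up from fhp->fh_fsid. */
        return (VFS_FHTOVP(mp, &fhp->fh_fid, LK_EXCLUSIVE, vpp));
}

Whatever cluster FS sits underneath has to give the server something stable
and unique enough to put in fh_fid, and that is exactly the bit I still need
to dig out of glusterFS.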
> > >
> > > rick
> > >
> > >> - Jordan
> >
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >
>