Date:      Fri, 25 May 2018 21:14:04 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Cc:        James Rose <james.rose@framestore.com>
Subject:   pNFS mirror file distribution (big picture question?)
Message-ID:  <YTXPR0101MB09595418D7B853DBCEB9458CDD690@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>

Hi,

#1 The code currently in projects/pnfs-planb-server allows creation of sets of
mirrored data servers (DSs). For example, the "-p" nfsd option argument:
nfsv4-data0#nfsv4-data1,nfsv4-data2#nfsv4-data3
defines two mirrored sets of data servers with two servers in each one.
("#" separates mirrors within a mirror set)

I did this a couple of years ago, in part because I thought having a well
defined "mirror" for a DS would facilitate mirror recovery.
Now that I have completed the mirror recovery code, having a defined mirror
set is not needed.

#2 An alternate mirroring approach would be what I might call the
random/distributed approach, where each file is distributed on any two (or
more) of the DSs.
For this approach, the "-p" nfsd option argument:
nfsv4-data0,nfsv4-data1,nfsv4-data2,nfsv4-data3
defines four DSs and a separate flag would say "two way mirroring", so
each file would be placed on 2 of the 4 DSs.
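To illustrate the placement policy that #2 implies, here is a small sketch
(my illustration only, not code from the projects branch; a real MDS would
probably also weight the choice by the free space each DS reports) of an MDS
picking two distinct DSs out of four for a new data file:

    /*
     * Sketch only: choose "mirrorcnt" distinct DSs from "dscnt"
     * candidates for a new data file.
     */
    #include <stdio.h>
    #include <stdlib.h>

    static void
    pickds(int dscnt, int mirrorcnt, int *chosen)
    {
            int cand, dup, i, j;

            for (i = 0; i < mirrorcnt; i++) {
                    do {
                            /* Pick a candidate DS uniformly at random. */
                            cand = arc4random_uniform(dscnt);
                            /* Reject it if it was already chosen. */
                            dup = 0;
                            for (j = 0; j < i; j++)
                                    if (chosen[j] == cand)
                                            dup = 1;
                    } while (dup != 0);
                    chosen[i] = cand;
            }
    }

    int
    main(void)
    {
            int chosen[2];

            pickds(4, 2, chosen);   /* 4 DSs, two way mirroring */
            printf("data file goes on nfsv4-data%d and nfsv4-data%d\n",
                chosen[0], chosen[1]);
            return (0);
    }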

The question is "should I switch the code to approach #2?".

I won't call it elegant, but #1 is neat and tidy, since the sysadmin knows
that a data file ends up on either the nfsv4-data0/nfsv4-data1 pair or the
nfsv4-data2/nfsv4-data3 pair.
Assuming the mirrored DSs in a set have the same amount of storage, they will
have the same amount of free space.
--> This implies that they will run out of space at the same time and the
    pNFS service won't be able to write to files on the mirror set.

With #2, one of the DSs will probably run out of space first. I think this
will cause a client trying to write a file on it to report a write error to
the Metadata Server (MDS), and that will cause the DS to be taken offline.
Then the write will succeed on the other mirror and things will continue to
run.
Eventually all the DSs will fill up, but hopefully a sysadmin can step in and
fix the "out of space" problem before that point.
Another advantage I can see for #2 is that it gives the MDS more flexibility
than #1 does when it chooses which DSs to create the data files on.
(It will be less "neat and tidy", but the sysadmin can find out which DSs
store the data for a file on the MDS on a "per file" basis.)

James Rose was asking about "manual migration". It is almost the same as what
is already done for mirror recovery and is a pretty trivial addition for #2.
For #1, it can be done, but is more work. "Manual migration" refers to a
sysadmin running a command that moves a data file from one DS to another.
(Others that are more clever than I am could use the "manual migration"
syscall to implement automagic versions that try to balance storage use and
I/O load.)
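From the sysadmin's point of view, migration might look something like this
(the command name and arguments here are hypothetical, since nothing is
implemented yet):

    # move the data file backing /mds/export/foo from nfsv4-data0
    # to nfsv4-data2
    pnfsmigrate nfsv4-data0 nfsv4-data2 /mds/export/foo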

Given easier migration and what I think is better handling of "out of space"
failures, I am leaning towards switching the code to #2.

What do others think? rick


