Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 9 May 2019 21:43:56 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "cem@freebsd.org" <cem@freebsd.org>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.org>
Subject:   Re: test hash functions for fsid
Message-ID:  <YQBPR0101MB226084001EB8D4756A3D81BEDD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CAG6CVpWpduHKVU9Y_W01sZUcL6zT4uP5iNnF-4Lr3Bv_bt=jUg@mail.gmail.com>
References:  <YQBPR0101MB2260D82BAE348FB82902508CDD320@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM> <CAG6CVpUER0ODF5mfdGFA-soSpGbh=qkjc4rKiJEdaOUaUaUmAw@mail.gmail.com> <YQBPR0101MB226068818D8C0D9591DD3D94DD330@YQBPR0101MB2260.CANPRD01.PROD.OUTLOOK.COM>, <CAG6CVpWpduHKVU9Y_W01sZUcL6zT4uP5iNnF-4Lr3Bv_bt=jUg@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Conrad Meyer wrote:
>On Wed, May 8, 2019 at 5:41 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:
>[liberal snipping throughout :-)]
>>
>> I'll admit I had never heard of PCTRIE, but <sys/pctrie.h> seems to be
>> #ifdef _KERNEL, so it can't be used in userland?
>
>Ah, you're completely correct.  I had considered using PCTRIE for a
>userspace application earlier, and had started converting it to be
>buildable in userspace, but I never finished.  (I determined it
>wouldn't be a good fit for that application, unfortunately.)  I do
>think it might be a good datastructure to expose to base userspace
>programs eventually, but you probably don't really want to tackle that
>extra work :-).  A hash table is totally fine.
>
>> Yes. I just chose 256 for this test program. Unfortunately, the way moun=
td.c is
>> currently coded, the hash table must be sized before the getmntinfo() ca=
ll,
>> so it must be sized before it knows how many file systems there are.
>> Since this is userland and table size isn't a memory issue, I'm just tem=
pted to
>> make it large enough for a large server with something like 25,000 file =
systems.
>
>No objection, so long as people can still run tiny NFS servers on
>their Raspberry Pis. :-)
I do now think I can dynamically size it based on how many file systems
(which is how many structures) will be linked off the table, so this should=
n't be
a problem.

>> (I'd guess somewhere in the 256->1024 range would work. I'm not sure wha=
t
>>  you mean by load factor below 0.8?)
>
>Load factor is just number of valid entries divided by space in the
>table.  So if your table has 256 spaces, load factor 0.8 would be
>having about 205 valid entries.
Hmm. That would suggest a large table size of about 100,000 for Peter's
72,000 file system case. It would result in a list length averaging about 1=
 entry,
but I think a linear search through 10-20 entries won't take long enough to=
 be
a problem for this case. (This happens whenever mountd receives a SIGHUP an=
d each time a client does a mount.)

As noted in the last post, I was thinking along the lines of 10->20 as a ta=
rget
linked list length. (Or "table_size =3D num / L", where L is the target ave=
rage list length.=20
L =3D 10->20 would be roughly a load average of 10->20.)

Does that sound reasonable? (Interested in hearing from anyone on this.)

>> Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a=
 collision
>> and I don't really care about other file system types.
>
>Ah, sorry =97 I didn't read carefully enough when looking for fsid initial=
ization.
>
>> After all, it just does
>>  a linear search down the list of N file systems right and just about an=
ything
>>  should be an improvement.)
>
>Yes :-).
>
>> I added a simple (take the low order bits of val[0]) case to the test. I=
 actually
>> suspect any of the hash functions will work well enough, since, as you n=
ote, most of the values (val[0] and 24bits of val[1]) are from a random # g=
enerator which should
>> be pretty uniform in distribution for ZFS.
>> UFS now uses a value from the superblock. It appears that newfs sets val=
[0] to the
>> creation time of the file system and val[1] to a random value.
>
>Hm, it looks like makefs sets val[1] (fs_id[1]) to a non-random value
>generated by the predictable PRNG random(3), for reproducible build
>reasons.  makefs seeds srandom(3) with either the current time in
>seconds (low entropy) or some known timestamp (either from a file,
>also in seconds, or an integer) (no entropy).  I guess random(3) may
>provide better distribution for the table than a plain global counter,
>though.
Yes. It would be the distribution of values and not their randomness that w=
ould
matter for this case. Hopefully someone with a large # of UFS file systems =
will
run the test program and post the results. To be honest, I doubt if anyone =
will
create a server with enough UFS file systems for it to be important to hash=
 their
fsid well. It works fine for the small number of UFS file systems I have.)

>> If it had been the
>> reverse, I would be tempted to only use val[0], but I'll see how well th=
e test goes
>> for Peter.
>
>Seems like you were right =97 any old function is good enough :-).
>From Peter's test, the first three did fine and almost the same. The last t=
wo weren't
as good, but were still tolerable.

Thanks for the comments, rick



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQBPR0101MB226084001EB8D4756A3D81BEDD330>