From: Conrad Meyer
Reply-To: cem@freebsd.org
Date: Thu, 9 May 2019 11:11:00 -0700
Subject: Re: test hash functions for fsid
To: Rick Macklem
Cc: "freebsd-fs@freebsd.org"

Hi Rick,

On Wed, May 8, 2019 at 5:41 PM Rick Macklem wrote:
[liberal snipping throughout :-)]
>
> I'll admit I had never heard of PCTRIE, but seems to be
> #ifdef _KERNEL, so it can't be used in userland?

Ah, you're completely correct.  I had considered using PCTRIE for a
userspace application earlier, and had started converting it to be
buildable in userspace, but I never finished.  (I determined it
wouldn't be a good fit for that application, unfortunately.)  I do
think it might be a good datastructure to expose to base userspace
programs eventually, but you probably don't really want to tackle that
extra work :-).

A hash table is totally fine.

> Yes. I just chose 256 for this test program. Unfortunately, the way mountd.c is
> currently coded, the hash table must be sized before the getmntinfo() call,
> so it must be sized before it knows how many file systems there are.
> Since this is userland and table size isn't a memory issue, I'm just tempted to
> make it large enough for a large server with something like 25,000 file systems.

No objection, so long as people can still run tiny NFS servers on
their Raspberry Pis. :-)

> (I'd guess somewhere in the 256->1024 range would work. I'm not sure what
> you mean by load factor below 0.8?)

Load factor is just number of valid entries divided by space in the
table.  So if your table has 256 spaces, load factor 0.8 would be
having about 205 valid entries.

> Fortunately, neither ZFS nor UFS uses vfs_getnewfsid() unless there is a collision
> and I don't really care about other file system types.

Ah, sorry -- I didn't read carefully enough when looking for fsid
initialization.

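
To make the load factor arithmetic above concrete, here is a minimal
sketch in C.  The function names (max_entries, min_slots) and the use
of an integer percentage are assumptions made for illustration; nothing
here is taken from mountd.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helpers, not part of mountd.c.
 * The load factor is expressed as an integer percentage (80 == 0.8)
 * so that the arithmetic stays exact.
 */

/* Largest number of entries that keeps the table at or below max_load_pct. */
size_t
max_entries(size_t nslots, unsigned int max_load_pct)
{
	assert(max_load_pct > 0 && max_load_pct <= 100);
	return (nslots * max_load_pct / 100);
}

/* Smallest table size that holds nentries at or below max_load_pct. */
size_t
min_slots(size_t nentries, unsigned int max_load_pct)
{
	assert(max_load_pct > 0 && max_load_pct <= 100);
	/* Round up so the resulting load factor never exceeds the target. */
	return ((nentries * 100 + max_load_pct - 1) / max_load_pct);
}
```

For a 256-slot table, max_entries(256, 80) is 204 (256 * 0.8 = 204.8,
rounded down), and min_slots(25000, 80) is 31250, which is roughly the
table size a 25,000 file system server would want at that load factor.
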
> After all, it just does
> a linear search down the list of N file systems right and just about anything
> should be an improvement.)

Yes :-).

> I added a simple (take the low order bits of val[0]) case to the test. I actually
> suspect any of the hash functions will work well enough, since, as you note, most of the values (val[0] and 24bits of val[1]) are from a random # generator which should
> be pretty uniform in distribution for ZFS.
> UFS now uses a value from the superblock. It appears that newfs sets val[0] to the
> creation time of the file system and val[1] to a random value.

Hm, it looks like makefs sets val[1] (fs_id[1]) to a non-random value
generated by the predictable PRNG random(3), for reproducible build
reasons.  makefs seeds srandom(3) with either the current time in
seconds (low entropy) or some known timestamp (either from a file,
also in seconds, or an integer) (no entropy).  I guess random(3) may
provide better distribution for the table than a plain global counter,
though.

> If it had been the
> reverse, I would be tempted to only use val[0], but I'll see how well the test goes
> for Peter.

Seems like you were right -- any old function is good enough :-).

Take care,
Conrad
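
As a rough sketch of the simple case discussed above (take the low
order bits of val[0]), something like the following would do.  The
struct fsid_sketch type is a stand-in that mirrors the two 32-bit
values of an fsid so the example builds anywhere; the names
FSID_HASH_SIZE and fsid_hash_lowbits are hypothetical, and the table
size of 256 is just the value mentioned for the test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed table size; must be a power of two for the mask to work. */
#define	FSID_HASH_SIZE	256

/* Stand-in for an fsid: two 32-bit values, val[0] and val[1]. */
struct fsid_sketch {
	int32_t val[2];
};

/*
 * Hypothetical hash: keep only the low order bits of val[0].
 * The cast to uint32_t avoids masking a negative signed value.
 */
unsigned int
fsid_hash_lowbits(const struct fsid_sketch *fsid)
{
	assert(fsid != NULL);
	return ((uint32_t)fsid->val[0] & (FSID_HASH_SIZE - 1));
}
```

This only spreads entries well when the low bits of val[0] vary between
file systems, which is the assumption being tested in this thread.
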