FreeBSD Mail Archives

Date:      Mon, 26 Feb 2001 00:14:36 +0100
From:      Bernd Walter <ticso@cicely8.cicely.de>
To:        David Gilbert <dgilbert@velocet.ca>
Cc:        Matt Dillon <dillon@earth.backplane.com>, Bernd Walter <ticso@cicely5.cicely.de>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: [hackers] Re: Large MFS on NFS-swap?
Message-ID:  <20010226001436.A728@cicely8.cicely.de>
In-Reply-To: <15001.35517.468307.915125@trooper.velocet.net>; from dgilbert@velocet.ca on Sun, Feb 25, 2001 at 05:44:13PM -0500
References:  <15000.8884.6165.759008@trooper.velocet.net> <20010225042933.A508@cicely5.cicely.de> <200102250644.f1P6iuL12016@earth.backplane.com> <15001.21129.307283.198917@trooper.velocet.net> <200102251913.f1PJDAc15495@earth.backplane.com> <15001.35517.468307.915125@trooper.velocet.net>

On Sun, Feb 25, 2001 at 05:44:13PM -0500, David Gilbert wrote:
> >>>>> "Matt" == Matt Dillon <dillon@earth.backplane.com> writes:
> 
> [... my newfw bomb deleted ...]
> 
> Matt>     I had a set of patches for newfs a year or two ago but never
> Matt> incorporated them.  We'd have to do a run-through on newfs to
> Matt> get it to newfs a swap-backed (i.e. 4K/sector) 1TB filesystem.
> 
> Matt>     Actually, this brings up a good point.  Drive storage is
> Matt> beginning to reach the limitations of FFS and our internal (512
> Matt> byte/block) block numbering scheme.  IBM is almost certain to
> Matt> come out with their 500GB hard drive sometime this year.  We
> Matt> should probably do a bit of cleanup work to make sure that we
> Matt> can at least handle FFS's theoretical limitations for real.
> 
> That and the availability of vinum and other raid solutions.  You can
> always make multiple partitions for no good reason (other than
> filesystem limitations), but we were planning to put together a 1TB
> filesystem next month.  From what you're telling me, I'd need larger
> block sizes to make this work?

With 512-byte block sizes you are limited to 1T because the physical
block number is a signed 32-bit value.
FFS uses 32-bit frag numbers (I wouldn't count on the high bit).
A fragment defaults to 1k, so even with 1k fragments the limit is
at least 2T.
It is possible to reformat most SCSI disks to 1k or 2k block sizes,
but I'm not sure if vinum handles non-512-byte blocks correctly,
and I don't know if the buffer code always uses 512 or physical sizes.
Maybe ccd is an option.
AFAIK the same limit is there for SCSI, as SCSI uses 32-bit block numbers.
Using a RAID controller will show the same limits, and it's
usually untested with block sizes != 512.
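
The arithmetic behind these limits can be sketched roughly as follows.
This is only an illustration, assuming that a signed 32-bit block number
leaves 31 usable bits; the function name max_bytes and its parameters
are made up for the example and do not come from any kernel source.

```python
# Rough sketch of the addressing limits discussed above.
# Assumption: a signed 32-bit block number leaves 31 usable bits.
# max_bytes() is a hypothetical helper, not an existing API.

TIB = 1 << 40  # one terabyte in the binary sense (2**40 bytes)


def max_bytes(unit_size, usable_bits=31):
    """Largest addressable size in bytes for a given block/fragment size."""
    return (1 << usable_bits) * unit_size


# Print the limit for a few block or fragment sizes.
for unit in (512, 1024, 2048, 4096):
    print(f"{unit:5d}-byte units: {max_bytes(unit) // TIB} TiB")
```

With 31 usable bits this gives 1T for 512-byte blocks and 2T for 1k
fragments, matching the figures above. Passing usable_bits=32 shows what
an unsigned 32-bit block number would allow at the same unit size.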

> IMHO, we might reconsider that.  With SAN-type designs, you're
> probably going to find the distribution of filesizes on
> multi-terabyte filesystems that are shared by 100's of computers to
> be roughly the same as the filesize distributions on today's
> filesystems.
> 
> Making the run for larger block sizes puts us in the same league as
> DOS.  While it will stave off the wolves, it will only work for so
> long given Moore's law.

No one is saying that that's the way the world will go.
But it is a workable stopgap until the world is more perfect.

The base design should allow using 64-bit values some day without
the scaling problems and consistency risks that DOS has.

The steps have to be taken in order: first increase the limits of the
buffer system, and then those of the filesystem after that.
I'm sure the people with knowledge about that will do what
is possible after -current stabilizes a bit.
Changing everything at the same time is not a good way to go.

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010226001436.A728>
