From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 05:09:21 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 311FF16A4B3; Tue, 21 Oct 2003 05:09:21 -0700 (PDT)
Received: from genius.tao.org.uk (genius.tao.org.uk [212.135.162.51])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 57AB843FA3; Tue, 21 Oct 2003 05:09:20 -0700 (PDT)
	(envelope-from joe@genius.tao.org.uk)
Received: by genius.tao.org.uk (Postfix, from userid 100)
	id DD409476C; Tue, 21 Oct 2003 13:09:18 +0100 (BST)
Date: Tue, 21 Oct 2003 13:09:18 +0100
From: Josef Karthauser <joe@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Message-ID: <20031021120918.GC15345@genius.tao.org.uk>
Mail-Followup-To: Josef Karthauser <joe@FreeBSD.org>,
	freebsd-fs@FreeBSD.org, current@FreeBSD.org
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="Y5rl02BVI9TCfPar"
Content-Disposition: inline
User-Agent: Mutt/1.5.4i
cc: current@FreeBSD.org
Subject: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 12:09:21 -0000


--Y5rl02BVI9TCfPar
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

I'm trying to set a FreeBSD 5.1 machine up as an NFS client.  The
server is on an SGI box.  Things are strange:

    phoenix# uname -a
    FreeBSD phoenix.mydomain 5.1-CURRENT FreeBSD 5.1-CURRENT #0: Thu Sep 18=
 15:20:19 GMT 2003 root@pheonix.mydomain:/usr/obj/usr/src/sys/GENERIC  i386

    phoenix# ls -ld /mnt
    drwxr-xr-x  2 root  wheel  512 Jun  5 01:53 /mnt

    phoenix# mount rebus:/rebus/home /mnt
    phoenix# ls -ld /mnt
    ls: /mnt: Permission denied
    phoenix# ls -ld /* | grep mnt

    phoenix# umount /mnt
    phoenix# ls -ld /* | grep mnt
    drwxr-xr-x   2 root  wheel   512 Jun  5 01:53 /mnt

What's going on here?  Is it a bug or something that I'm doing wrong?

    phoenix# grep nfs /etc/rc.conf
    nfs_client_enable=3D"YES"         # This host is an NFS client (or NO).

The NFS server is:
    IRIX64 rebus 6.5 04101930 IP35 mips

Joe
--=20
Josef Karthauser (joe@tao.org.uk)	       http://www.josef-k.net/
FreeBSD (cvs meister, admin and hacker)     http://www.uk.FreeBSD.org/
Physics Particle Theory (student)   http://www.pact.cpes.sussex.ac.uk/
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D An eclectic mix of fact an=
d theory. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

--Y5rl02BVI9TCfPar
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iEYEARECAAYFAj+VIe4ACgkQXVIcjOaxUBYajgCdGAdJ/9jILRfhHzrPfwjBAm8o
qRwAoOEcC1XF1uCOBYPdHRaTtOqSCSRy
=+G/n
-----END PGP SIGNATURE-----

--Y5rl02BVI9TCfPar--

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 08:23:59 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id CA99716A4B3; Tue, 21 Oct 2003 08:23:59 -0700 (PDT)
Received: from genius.tao.org.uk (genius.tao.org.uk [212.135.162.51])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 261A143FAF; Tue, 21 Oct 2003 08:23:57 -0700 (PDT)
	(envelope-from joe@genius.tao.org.uk)
Received: by genius.tao.org.uk (Postfix, from userid 100)
	id D05C04228; Tue, 21 Oct 2003 16:23:51 +0100 (BST)
Date: Tue, 21 Oct 2003 16:23:51 +0100
From: Josef Karthauser <joe@FreeBSD.org>
To: ticso@cicely.de
Message-ID: <20031021152351.GB1438@genius.tao.org.uk>
Mail-Followup-To: Josef Karthauser <joe@FreeBSD.org>,
	ticso@cicely.de, freebsd-fs@FreeBSD.org, current@FreeBSD.org
References: <20031021120918.GC15345@genius.tao.org.uk>
	<20031021133336.GT38650@cicely12.cicely.de>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="mojUlQ0s9EVzWg2t"
Content-Disposition: inline
In-Reply-To: <20031021133336.GT38650@cicely12.cicely.de>
User-Agent: Mutt/1.5.4i
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 15:24:00 -0000


--mojUlQ0s9EVzWg2t
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 21, 2003 at 03:33:37PM +0200, Bernd Walter wrote:
>=20
> You are root - and root is often mapped to nobody on the server.
> Are you shure that nobody is allowed to see?
> The ls -ld /mnt case is strange, but /mnt is already on the server
> namespace.
>=20

The linux boxes on the network don't appear to have any problems.
Either way why is the /mnt entry disappearing?

    phoenix# mount rebus:/rebus/home /mnt
    phoenix# suspend
    [1] + Suspended (signal)      su
    $ id
    uid=3D1001(joe) gid=3D1001(joe) groups=3D1001(joe), 0(wheel)
    $ ls /mnt
    ls: /mnt: Permission denied
    $ ls -l /
    ls: mnt: Permission denied
    total 45
    -r--r--r--   1 root  wheel  4735 Jun  5 01:57 COPYRIGHT
    drwxr-xr-x   2 root  wheel  1024 Sep 17 19:31 bin
    drwxr-xr-x   6 root  wheel   512 Sep 18 17:04 boot
    drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 cdrom
    lrwxr-xr-x   1 root  wheel    10 Jul  2 17:49 compat -> usr/compat
    dr-xr-xr-x   4 root  wheel   512 Oct 21 12:09 dev
    drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 dist
    -rw-------   1 root  wheel  4096 Sep 17 11:28 entropy
    drwxr-xr-x  16 root  wheel  2048 Oct 21 12:21 etc
    lrwxr-xr-x   1 root  wheel     9 Sep 17 12:10 home -> /usr/home
    drwxr-xr-x   2 root  wheel  1024 Sep 17 19:33 lib
    drwxr-xr-x   2 root  wheel   512 Sep 17 19:33 libexec
    lrwxr-xr-x   1 root  wheel    10 Sep 23 11:40 local -> /usr/local
    dr-xr-xr-x   2 root  wheel   512 Jun  5 01:53 proc
    drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 rescue
    drwxr-xr-x   3 root  wheel   512 Sep 29 13:01 root
    drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 sbin
    drwxr-xr-x   4 root  wheel  1024 Jul  2 17:32 stand
    lrwxr-xr-x   1 root  wheel    11 Sep 17 19:30 sys -> usr/src/sys
    drwxrwxrwt   4 root  wheel   512 Oct 21 15:27 tmp
    drwxr-xr-x  18 root  wheel   512 Oct 16 11:48 usr
    drwxr-xr-x  20 root  wheel   512 Oct 16 11:53 var

    $ fg
    su
    phoenix# umount /mnt
    phoenix# ls -l /
    total 51
    -rw-r--r--   2 root  wheel   797 Jun  5 01:57 .cshrc
    -rw-r--r--   2 root  wheel   251 Jun  5 01:57 .profile
    -r--r--r--   1 root  wheel  4735 Jun  5 01:57 COPYRIGHT
    drwxr-xr-x   2 root  wheel  1024 Sep 17 19:31 bin
    drwxr-xr-x   6 root  wheel   512 Sep 18 17:04 boot
    drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 cdrom
    lrwxr-xr-x   1 root  wheel    10 Jul  2 17:49 compat -> usr/compat
    dr-xr-xr-x   4 root  wheel   512 Oct 21 12:09 dev
    drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 dist
    -rw-------   1 root  wheel  4096 Sep 17 11:28 entropy
    drwxr-xr-x  16 root  wheel  2048 Oct 21 12:21 etc
    lrwxr-xr-x   1 root  wheel     9 Sep 17 12:10 home -> /usr/home
    drwxr-xr-x   2 root  wheel  1024 Sep 17 19:33 lib
    drwxr-xr-x   2 root  wheel   512 Sep 17 19:33 libexec
    lrwxr-xr-x   1 root  wheel    10 Sep 23 11:40 local -> /usr/local
    drwxr-xr-x   2 root  wheel   512 Jun  5 01:53 mnt
    dr-xr-xr-x   2 root  wheel   512 Jun  5 01:53 proc
    drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 rescue
    drwxr-xr-x   3 root  wheel   512 Sep 29 13:01 root
    drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 sbin
    drwxr-xr-x   4 root  wheel  1024 Jul  2 17:32 stand
    lrwxr-xr-x   1 root  wheel    11 Sep 17 19:30 sys -> usr/src/sys
    drwxrwxrwt   4 root  wheel   512 Oct 21 15:27 tmp
    drwxr-xr-x  18 root  wheel   512 Oct 16 11:48 usr
    drwxr-xr-x  20 root  wheel   512 Oct 16 11:53 var

Joe
--=20
Josef Karthauser (joe@tao.org.uk)	       http://www.josef-k.net/
FreeBSD (cvs meister, admin and hacker)     http://www.uk.FreeBSD.org/
Physics Particle Theory (student)   http://www.pact.cpes.sussex.ac.uk/
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D An eclectic mix of fact an=
d theory. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

--mojUlQ0s9EVzWg2t
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iEYEARECAAYFAj+VT4cACgkQXVIcjOaxUBbKpgCfUWatD80U6RRDqVO35zkE01Aq
vSMAnjJ+LEBWEUSu/QSe7wQm+edIlOx6
=LBn1
-----END PGP SIGNATURE-----

--mojUlQ0s9EVzWg2t--

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 09:27:46 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id F0B5316A4B3; Tue, 21 Oct 2003 09:27:46 -0700 (PDT)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 48A1E43F75; Tue, 21 Oct 2003 09:27:46 -0700 (PDT)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id 3C4B42ED441; Tue, 21 Oct 2003 09:27:46 -0700 (PDT)
Date: Tue, 21 Oct 2003 09:27:46 -0700
From: Alfred Perlstein <bright@mu.org>
To: Josef Karthauser <joe@FreeBSD.org>, ticso@cicely.de,
	freebsd-fs@FreeBSD.org, current@FreeBSD.org
Message-ID: <20031021162746.GB99943@elvis.mu.org>
References: <20031021120918.GC15345@genius.tao.org.uk>
	<20031021133336.GT38650@cicely12.cicely.de>
	<20031021152351.GB1438@genius.tao.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20031021152351.GB1438@genius.tao.org.uk>
User-Agent: Mutt/1.4.1i
Subject: Re: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 16:27:47 -0000

* Josef Karthauser <joe@FreeBSD.org> [031021 08:24] wrote:
> On Tue, Oct 21, 2003 at 03:33:37PM +0200, Bernd Walter wrote:
> > 
> > You are root - and root is often mapped to nobody on the server.
> > Are you shure that nobody is allowed to see?
> > The ls -ld /mnt case is strange, but /mnt is already on the server
> > namespace.
> > 
> 
> The linux boxes on the network don't appear to have any problems.
> Either way why is the /mnt entry disappearing?

I saw this before with QNX server and FreeBSD client as well.  Same
behavior, root ok, other users not.  I was nearly a year ago, but
I haven't seen a fix go by either so...

-Alfred

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 09:33:03 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4729516A4C0; Tue, 21 Oct 2003 09:33:03 -0700 (PDT)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A29B643FBF; Tue, 21 Oct 2003 09:33:01 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.9p2/8.12.9) with ESMTP id h9LGWCMg031433;
	Tue, 21 Oct 2003 12:32:12 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)h9LGWChh031430;
	Tue, 21 Oct 2003 12:32:12 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Tue, 21 Oct 2003 12:32:12 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.org>
X-Sender: robert@fledge.watson.org
To: Josef Karthauser <joe@FreeBSD.org>
In-Reply-To: <20031021120918.GC15345@genius.tao.org.uk>
Message-ID: <Pine.NEB.3.96L.1031021123049.31330B-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 16:33:03 -0000


On Tue, 21 Oct 2003, Josef Karthauser wrote:

> I'm trying to set a FreeBSD 5.1 machine up as an NFS client.  The
> server is on an SGI box.  Things are strange:

Any chance you could grab a copy of ethereal and do a bit of on-the-wire
inspection of the RPCs?  It would be interesting to know which of the
requests are serviced out of the client cache, and which make it to the
server.  It would also be interesting to see if you can see failures in
the wire protocol, or if they're purely an artifact of the client.

Also, can you confirm the Linux and FreeBSD clients are both using the
same version of NFS with similar protocol settings (i.e., NFSv3 over UDP).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 09:39:59 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 49ACD16A4B3; Tue, 21 Oct 2003 09:39:59 -0700 (PDT)
Received: from obsecurity.dyndns.org
	(adsl-63-207-60-234.dsl.lsan03.pacbell.net [63.207.60.234])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 47DAF43FAF; Tue, 21 Oct 2003 09:39:58 -0700 (PDT)
	(envelope-from kris@obsecurity.org)
Received: from rot13.obsecurity.org (rot13.obsecurity.org [10.0.0.5])
	by obsecurity.dyndns.org (Postfix) with ESMTP
	id 11BCD66C9E; Tue, 21 Oct 2003 09:39:58 -0700 (PDT)
Received: by rot13.obsecurity.org (Postfix, from userid 1000)
	id F11725C4; Tue, 21 Oct 2003 09:39:57 -0700 (PDT)
Date: Tue, 21 Oct 2003 09:39:57 -0700
From: Kris Kennaway <kris@obsecurity.org>
To: Robert Watson <rwatson@FreeBSD.org>
Message-ID: <20031021163957.GA66248@rot13.obsecurity.org>
References: <20031021120918.GC15345@genius.tao.org.uk>
	<Pine.NEB.3.96L.1031021123049.31330B-100000@fledge.watson.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="DocE+STaALJfprDB"
Content-Disposition: inline
In-Reply-To: <Pine.NEB.3.96L.1031021123049.31330B-100000@fledge.watson.org>
User-Agent: Mutt/1.4.1i
cc: Josef Karthauser <joe@FreeBSD.org>
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 16:39:59 -0000


--DocE+STaALJfprDB
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Oct 21, 2003 at 12:32:12PM -0400, Robert Watson wrote:
>=20
> On Tue, 21 Oct 2003, Josef Karthauser wrote:
>=20
> > I'm trying to set a FreeBSD 5.1 machine up as an NFS client.  The
> > server is on an SGI box.  Things are strange:
>=20
> Any chance you could grab a copy of ethereal and do a bit of on-the-wire
> inspection of the RPCs?  It would be interesting to know which of the
> requests are serviced out of the client cache, and which make it to the
> server.  It would also be interesting to see if you can see failures in
> the wire protocol, or if they're purely an artifact of the client.
>=20
> Also, can you confirm the Linux and FreeBSD clients are both using the
> same version of NFS with similar protocol settings (i.e., NFSv3 over UDP).

Does Linux do NFSv3 yet?  I thought that at least until recently there
were stability issues and it was recommended it not be used.

Kris

--DocE+STaALJfprDB
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/lWFdWry0BWjoQKURAvplAKDQ6j7JNVHpjTZMGNlvnHwKLp3x0wCgzyTK
kGBIwjDdgO32OTYiPvY9i3g=
=KLkc
-----END PGP SIGNATURE-----

--DocE+STaALJfprDB--

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 10:12:48 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 7328816A4B3
	for <freebsd-fs@FreeBSD.org>; Tue, 21 Oct 2003 10:12:48 -0700 (PDT)
Received: from web14107.mail.yahoo.com (web14107.mail.yahoo.com
	[216.136.172.137])
	by mx1.FreeBSD.org (Postfix) with SMTP id 97B4B43FA3
	for <freebsd-fs@FreeBSD.org>; Tue, 21 Oct 2003 10:12:46 -0700 (PDT)
	(envelope-from cguttesen@yahoo.dk)
Message-ID: <20031021171246.64372.qmail@web14107.mail.yahoo.com>
Received: from [194.248.174.33] by web14107.mail.yahoo.com via HTTP;
	Tue, 21 Oct 2003 19:12:46 CEST
Date: Tue, 21 Oct 2003 19:12:46 +0200 (CEST)
From: =?iso-8859-1?q?Claus=20Guttesen?= <cguttesen@yahoo.dk>
To: Kris Kennaway <kris@obsecurity.org>,
	Robert Watson <rwatson@FreeBSD.org>
In-Reply-To: <20031021163957.GA66248@rot13.obsecurity.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
cc: Josef Karthauser <joe@FreeBSD.org>
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 17:12:48 -0000

hi.

> 
> Does Linux do NFSv3 yet?  I thought that at least
> until recently there
> were stability issues and it was recommended it not
> be used.
> 

I had some problems with stale NFS handle when
NFS-mounting two FreeBSD 5.1 client (one with the
frozen 5.1 and one as of Oct. 10'th) to a Linux server
with ReiserFS.

When I mounted with ver. 2 the problems went away on
the FreeBSD with source from Oct. 10'th and have less
stale NFS handles with the frozen 5.1.

regards
Claus


Yahoo! Mail (http://dk.mail.yahoo.com) - Gratis: 6 MB lagerplads, spamfilter og virusscan

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 15:54:42 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5844816A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 15:54:42 -0700 (PDT)
Received: from mail.siscom.net (mail2.siscom.net [209.251.18.174])
	by mx1.FreeBSD.org (Postfix) with SMTP id 4760A43F75
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 15:54:41 -0700 (PDT)
	(envelope-from support@newshosting.com)
Received: (qmail 44142 invoked by uid 1005); 21 Oct 2003 22:54:27 -0000
Received: from support@newshosting.com by mail.siscom.net by uid 0 with
	qmail-scanner-1.14  (f-prot: 3.12.  Clear:. 
	Processed in 0.224603 secs); 21 Oct 2003 22:54:27 -0000
X-Qmail-Scanner-Mail-From: support@newshosting.com via mail.siscom.net
X-Qmail-Scanner: 1.14 (Clear:. Processed in 0.224603 secs)
Received: from unknown (HELO newshosting.com) (209.251.6.250)
  by mail.siscom.net with SMTP; 21 Oct 2003 22:54:26 -0000
Message-ID: <3F95B946.8010309@newshosting.com>
Date: Tue, 21 Oct 2003 18:55:02 -0400
From: NH Support <support@newshosting.com>
Organization: Newshosting.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.5b) Gecko/20030914 Thunderbird/0.3a
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Subject: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 22:54:42 -0000

Hello,

I'm working on a new cluster design and had a quick question. If I have 
a few boxes mounting the same FS (over a SAN) all read-only will it 
work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..

Lets take it one step further.. lets say I have 1 box that mounts it 
RW.. and it updates the contents .. will the other systems that have it 
mounted RO puke?

-j
SISCOM

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 16:34:14 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id B78D216A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 16:34:14 -0700 (PDT)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 510CD43F93
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 16:34:14 -0700 (PDT)
	(envelope-from bright@elvis.mu.org)
Received: by elvis.mu.org (Postfix, from userid 1192)
	id 46D4D2ED445; Tue, 21 Oct 2003 16:34:14 -0700 (PDT)
Date: Tue, 21 Oct 2003 16:34:14 -0700
From: Alfred Perlstein <bright@mu.org>
To: NH Support <support@newshosting.com>
Message-ID: <20031021233414.GJ99943@elvis.mu.org>
References: <3F95B946.8010309@newshosting.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3F95B946.8010309@newshosting.com>
User-Agent: Mutt/1.4.1i
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 23:34:14 -0000

* NH Support <support@newshosting.com> [031021 15:55] wrote:
> Hello,
> 
> I'm working on a new cluster design and had a quick question. If I have 
> a few boxes mounting the same FS (over a SAN) all read-only will it 
> work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..

You shouldn't.

> Lets take it one step further.. lets say I have 1 box that mounts it 
> RW.. and it updates the contents .. will the other systems that have it 
> mounted RO puke?

Likely.

-- 
- Alfred Perlstein
- Research Engineering Development Inc.
- email: bright@mu.org cell: 408-480-4684

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 16:53:01 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DA5FC16A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 16:53:01 -0700 (PDT)
Received: from mail.siscom.net (mail2.siscom.net [209.251.18.174])
	by mx1.FreeBSD.org (Postfix) with SMTP id A7C7243F85
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 16:53:00 -0700 (PDT)
	(envelope-from radams@siscom.net)
Received: (qmail 2568 invoked by uid 1005); 21 Oct 2003 23:52:47 -0000
Received: from radams@siscom.net by mail.siscom.net by uid 0 with
	qmail-scanner-1.14  (f-prot: 3.12.  Clear:. 
	Processed in 0.104372 secs); 21 Oct 2003 23:52:47 -0000
X-Qmail-Scanner-Mail-From: radams@siscom.net via mail.siscom.net
X-Qmail-Scanner: 1.14 (Clear:. Processed in 0.104372 secs)
Received: from unknown (HELO siscom.net) (209.251.6.250)
  by mail.siscom.net with SMTP; 21 Oct 2003 23:52:47 -0000
Message-ID: <3F95C6F3.8030005@siscom.net>
Date: Tue, 21 Oct 2003 19:53:23 -0400
From: "Robert J. Adams (jason)" <radams@siscom.net>
Organization: Newshosting.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.5b) Gecko/20030914 Thunderbird/0.3a
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <3F95B946.8010309@newshosting.com>
	<20031021233414.GJ99943@elvis.mu.org>
In-Reply-To: <20031021233414.GJ99943@elvis.mu.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 21 Oct 2003 23:53:02 -0000

Alfred Perlstein wrote:

>>Hello,
>>
>>I'm working on a new cluster design and had a quick question. If I have 
>>a few boxes mounting the same FS (over a SAN) all read-only will it 
>>work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..
> 
> 
> You shouldn't.

I shouldn't do this or I shouldn't have trouble? :)

>>Lets take it one step further.. lets say I have 1 box that mounts it 
>>RW.. and it updates the contents .. will the other systems that have it 
>>mounted RO puke?
> 
> 
> Likely.

Well shit.. I need this.

-j
SISCOM

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 17:08:57 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9C3D516A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 17:08:57 -0700 (PDT)
Received: from quic.net (rrcs-central-24-123-205-180.biz.rr.com
	[24.123.205.180])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D470543F85
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 17:08:54 -0700 (PDT)
	(envelope-from utsl@quic.net)
Received: from localhost (localhost [127.0.0.1])
  (uid 1032)
  by quic.net with local; Tue, 21 Oct 2003 20:08:53 -0400
Date: Tue, 21 Oct 2003 20:08:53 -0400
To: "Robert J. Adams (jason)" <radams@siscom.net>
Message-ID: <20031022000853.GA409@quic.net>
References: <3F95B946.8010309@newshosting.com>
	<20031021233414.GJ99943@elvis.mu.org> <3F95C6F3.8030005@siscom.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <3F95C6F3.8030005@siscom.net>
User-Agent: Mutt/1.3.28i
From: Nathan Hawkins <utsl@quic.net>
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 00:08:57 -0000

On Tue, Oct 21, 2003 at 07:53:23PM -0400, Robert J. Adams (jason) wrote:
> Alfred Perlstein wrote:
> 
> >>Hello,
> >>
> >>I'm working on a new cluster design and had a quick question. If I have 
> >>a few boxes mounting the same FS (over a SAN) all read-only will it 
> >>work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..
> >
> >
> >You shouldn't.
> 
> I shouldn't do this or I shouldn't have trouble? :)

No, you can get away with _all_ read only. It's the part where you mount
RW somewhere that causes trouble. There is a little problem of cache
coherency.

> >>Lets take it one step further.. lets say I have 1 box that mounts it 
> >>RW.. and it updates the contents .. will the other systems that have it 
> >>mounted RO puke?
> >
> >
> >Likely.
> 
> Well shit.. I need this.

There are some options:
1. Go to NAS, and use NFS.
2. Switch OS to one that has a cluster filesystem
3. Implement a filesystem with cluster support
4. Don't use a filesystem, use devices, and work around the problem in
userspace

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 17:53:42 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0371416A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 17:53:42 -0700 (PDT)
Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2ED3143FB1
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 17:53:41 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Received: from vicor.com (localhost [127.0.0.1])
	by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9M0mpT1015227;
	Tue, 21 Oct 2003 17:48:51 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Message-ID: <3F95D3F3.2050203@vicor.com>
Date: Tue, 21 Oct 2003 17:48:51 -0700
From: Ken Marx <kmarx@vicor.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: Ken Marx <kmarx@vicor.com>
cc: Cayford Burrell <cburrell@vicor.com>
cc: Julian Elischer <julian@vicor-nb.com>
cc: victor elischer <VicPE@aol.com>
cc: John Lynch <jpl@vicor.com>
cc: Josh Howard <jrh@vicor.com>
cc: Dave Parker Smith <davep@vicor.com>
Subject: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 00:53:42 -0000

Hi,

We have 560GB raids that were sometimes bogging down heavily
in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4)
we find that when:

	o the raid file system grows to over 85% capacity (with only
	  30% inode usage)
	o we create ~1500 or so 2-6kb files in a given dir
	o (note: soft updates NOT enabled)

We see:

	o 100% cpu utilization, all in system
	o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s

We profiled the kernel and found a large number of calls to ffs_alloc().
After many twisty pasages, we finally diff'd 4.4 with 4.8 ffs_alloc.c,
and found a major difference in the ffs_dirpref() call. Hacking the
4.4 logic back in 'fixed' the problem: We can now fill the /raid
entirely with no real noticeable performance degradation.

The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain
things fairly clearly:

4.4 -  ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
--------------------------------------
 * The policy implemented by this algorithm is to select from
 * among those cylinder groups with above the average number of
 * free inodes, the one with the smallest number of directories.

4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
-----------------------------------------
 * The policy implemented by this algorithm is to allocate a
 * directory inode in the same cylinder group as its parent
 * directory, but also to reserve space for its files inodes
 * and data. Restrict the number of directories which may be
 * allocated one after another in the same cylinder group
 * without intervening allocation of files.
 *
 * If we allocate a first level directory then force allocation
 * in another cylinder group.

For us, the 4.4 policy seems far superior, at least when the file system
approches capacity.

We'd like to avoid local kernel hacks and keep with main line
FreeBSD code. Is there some way that the old policy can be supported,
perhaps via a tunefs or sysctl type option?

Actually, if the new policy can be fixed up to avoid the problem, that
would of course be just as dandy.

Thanks very much,
k
-- 
Ken Marx, kmarx@vicor-nb.com
We need to hit the nail on the head and set the agenda regarding total 
quality.
		- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 18:06:10 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4A55E16A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 18:06:10 -0700 (PDT)
Received: from imf18aec.mail.bellsouth.net (imf18aec.mail.bellsouth.net
	[205.152.59.66])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 276EB43FD7
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 18:06:09 -0700 (PDT)
	(envelope-from drhodus@catpa.com)
Received: from catpa.com ([68.209.168.6]) by imf18aec.mail.bellsouth.net
          (InterMail vM.5.01.05.27 201-253-122-126-127-20021220) with ESMTP
          id <20031022010604.XGWR1849.imf18aec.mail.bellsouth.net@catpa.com>;
          Tue, 21 Oct 2003 21:06:04 -0400
Date: Tue, 21 Oct 2003 21:06:04 -0400
Content-Type: text/plain; charset=US-ASCII; format=flowed
Mime-Version: 1.0 (Apple Message framework v552)
To: Ken Marx <kmarx@vicor.com>
From: David Rhodus <drhodus@catpa.com>
In-Reply-To: <3F95D3F3.2050203@vicor.com>
Message-Id: <E9300747-042B-11D8-8810-00039380DD2C@catpa.com>
Content-Transfer-Encoding: 7bit
X-Mailer: Apple Mail (2.552)
cc: freebsd-fs@freebsd.org
cc: Cayford Burrell <cburrell@vicor.com>
cc: Julian Elischer <julian@vicor-nb.com>
cc: victor elischer <VicPE@aol.com>
cc: John Lynch <jpl@vicor.com>
cc: Josh Howard <jrh@vicor.com>
cc: Dave Parker Smith <davep@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 01:06:10 -0000


On Tuesday, October 21, 2003, at 08:48 PM, Ken Marx wrote:

> Hi,
>
> We have 560GB raids that were sometimes bogging down heavily
> in our production systems. Under 4.8-RELEASE (recently upgrated from 
> 4.4)
> we find that when:
>
> 	o the raid file system grows to over 85% capacity (with only
> 	  30% inode usage)
> 	o we create ~1500 or so 2-6kb files in a given dir
> 	o (note: soft updates NOT enabled)

I have one question, why do you have softupdates turned off ?
With softupdates it could be possible to get faster writes than using
an async mount.

-DR

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 19:02:00 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DFF8416A4B3
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 19:02:00 -0700 (PDT)
Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2F3CA43FB1
	for <freebsd-fs@freebsd.org>; Tue, 21 Oct 2003 19:01:58 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Received: from vicor.com (localhost [127.0.0.1])
	by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9M1v2T1016186;
	Tue, 21 Oct 2003 18:57:03 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Message-ID: <3F95E3EE.4070401@vicor.com>
Date: Tue, 21 Oct 2003 18:57:02 -0700
From: Ken Marx <kmarx@vicor.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: David Rhodus <drhodus@catpa.com>
References: <E9300747-042B-11D8-8810-00039380DD2C@catpa.com>
In-Reply-To: <E9300747-042B-11D8-8810-00039380DD2C@catpa.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: Cayford Burrell <cburrell@vicor.com>
cc: Julian Elischer <julian@vicor-nb.com>
cc: victor elischer <VicPE@aol.com>
cc: John Lynch <jpl@vicor.com>
cc: Josh Howard <jrh@vicor.com>
cc: Dave Parker Smith <davep@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 02:02:01 -0000

Wow - quick reply. Thanks!

I dunno. Wasn't my idea.

I just quickly tried this: In the problem dirs it still bogs
down for about 20 seconds or so. I wish I could tell you
how long that was taking before, but I wasn't there for
that part, and I have to take off just now.

The systat -vmstat numbers look similar, but I don't
want to make any bold claims.

I'll re-disable soft updates, retest in the morning, and report back.

Thanks again,
k.

David Rhodus wrote:
> 
> On Tuesday, October 21, 2003, at 08:48 PM, Ken Marx wrote:
> 
>> Hi,
>>
>> We have 560GB raids that were sometimes bogging down heavily
>> in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4)
>> we find that when:
>>
>>     o the raid file system grows to over 85% capacity (with only
>>       30% inode usage)
>>     o we create ~1500 or so 2-6kb files in a given dir
>>     o (note: soft updates NOT enabled)
> 
> 
> I have one question, why do you have softupdates turned off ?
> With softupdates it could be possible to get faster writes than using
> an async mount.
> 
> -DR
> 
> 
> 

-- 
Ken Marx, kmarx@vicor-nb.com
They must reach agreement and stop beating around the bush on the long pole in 
the tent.
		- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 21 22:28:46 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 3B3B116A4B3
	for <freebsd-fs@FreeBSD.org>; Tue, 21 Oct 2003 22:28:46 -0700 (PDT)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D7F2143F93
	for <freebsd-fs@FreeBSD.org>; Tue, 21 Oct 2003 22:28:41 -0700 (PDT)
	(envelope-from bde@zeta.org.au)
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id PAA05003;
	Wed, 22 Oct 2003 15:27:19 +1000
Date: Wed, 22 Oct 2003 15:25:58 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: David Rhodus <drhodus@catpa.com>
In-Reply-To: <E9300747-042B-11D8-8810-00039380DD2C@catpa.com>
Message-ID: <20031022145836.J21067@gamplex.bde.org>
References: <E9300747-042B-11D8-8810-00039380DD2C@catpa.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Ken Marx <kmarx@vicor.com>
cc: freebsd-fs@FreeBSD.org
cc: Cayford Burrell <cburrell@vicor.com>
cc: Julian Elischer <julian@vicor-nb.com>
cc: victor elischer <VicPE@aol.com>
cc: John Lynch <jpl@vicor.com>
cc: Josh Howard <jrh@vicor.com>
cc: Dave Parker Smith <davep@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 05:28:46 -0000

On Tue, 21 Oct 2003, David Rhodus wrote:

> I have one question, why do you have softupdates turned off ?
> With softupdates it could be possible to get faster writes than using
> an async mount.

Soft updates have never been faster than async for me, and in recent
simple tests (copying /usr/src) they have become significantly slower
than even ordinary (non-soft-update, non-sync, non-async mounts):

2002/05/30
----------
ffs-16384-2048-1:
tarcp /e src:                278.82 real         0.76 user        14.99 sys
ffs-16384-2048-as-1:
tarcp /e src:                180.39 real         0.73 user        13.69 sys
ffs-16384-2048-su-1:
tarcp /e src:                181.98 real         0.69 user        13.81 sys

2003/09/23
----------
ffs-16384-02048-1:
tarcp /f src:                 68.66 real         0.82 user        13.81 sys
ffs-16384-02048-as-1:
tarcp /f src:                 41.09 real         0.83 user        11.25 sys
ffs-16384-02048-su-1:
tarcp /f src:                111.62 real         0.82 user        11.49 sys

ffs-16384-02048-1 means ffs with a block size of 16384, a fragment size of
2048, soft updates and ffs^WUFS1, etc. (there was no ffs2 at the time of the
old benchmark and the "-1" in it actually meant doreallocblks=1).

The machine is Athlon 1600XP overclocked running -current at the time with
only the following major changes:
- main memory increased from 512MB to 1024MB.  This increases the relevance
  of the test as a write benchmark by giving enough memory to keep the
  source of the copy cached.
- target disk changed from an IBM-DTLA-307030 (30MB ATA) to an
  IC35L060AVVA07-0 (60MB ATA).  The file system had size 13GB in both cases
  and was at a similar offset in the disks; this puts it closer to the outer
  tracks on the larger disk so accesses to it were faster (approx. 40MB/sec
  vs 29 MB/sec)
Most of the real times were improved significantly by the hardware changes,
but for some reason soft updates didn't benefit as much as the others.  This
behaviour is not dependent on the block/frag sizes or ffs1/2.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 01:44:02 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 03A5A16A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 01:44:02 -0700 (PDT)
Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net
	[207.217.120.189])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 787CD43FBD
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 01:44:00 -0700 (PDT)
	(envelope-from tlambert2@mindspring.com)
Received: from user-2ivfjup.dialup.mindspring.com ([165.247.207.217]
	helo=mindspring.com)
	by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128)
	(Exim 3.33 #1)	id 1ACEaz-0002BE-00; Wed, 22 Oct 2003 01:43:58 -0700
Message-ID: <3F96431E.A30656E3@mindspring.com>
Date: Wed, 22 Oct 2003 01:43:10 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.79 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: "Robert J. Adams (jason)" <radams@siscom.net>
References: <3F95B946.8010309@newshosting.com>
	<20031021233414.GJ99943@elvis.mu.org> <3F95C6F3.8030005@siscom.net>
Content-Type: text/plain; charset=big5
Content-Transfer-Encoding: 7bit
X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a494b6a1359814014566978bc4660100a9350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 08:44:02 -0000

"Robert J. Adams (jason)" wrote:
> Alfred Perlstein wrote:
> >>Hello,
> >>
> >>I'm working on a new cluster design and had a quick question. If I have
> >>a few boxes mounting the same FS (over a SAN) all read-only will it
> >>work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..
> >
> > You shouldn't.
> 
> I shouldn't do this or I shouldn't have trouble? :)
> 
> >>Lets take it one step further.. lets say I have 1 box that mounts it
> >>RW.. and it updates the contents .. will the other systems that have it
> >>mounted RO puke?
> >
> >
> > Likely.
> 
> Well shit.. I need this.

Then you need a new FS.

The issue is that you effectively need block-level or range of
blocks locking on the device over the shared interface wire to
be able to do this effectively, since a device that is a target
of multiple master devices has to know who to permit onto the
blocks and who not to permit onto the blocks.

Firewire was supposed to fix this, and so was SCSI 3.  The parts
of the SCSI 3 standard that deal with this particular issue have
not been finalized, because each device vendor is jockeying to
get their implementation standardized to get a jump on all the
other vendors, instead of cooperating on establishing an open
standard.  This is one of the main reasons that the SCSI 3
standard is not yet final (the other main reason is that a number
of the participants also sell IDE disks, and whatever's bad for
SCSI is good for IDE, so they are being obstructionist jerks
because they can).

There are a number of FS implementations that can deal with this,
however, and they way they deal with this is by implementing an
out-of-device-control-band block-level or range of blocks locking
protocol, usually over ethernet, to ensure that they can get
exclusive access to the blocks.  Usually, this is implemented as
multiple reader, single writer locking, with the ability to go
exclusive ("SIX locking" -- "Shared Intention eXclusive"; look
for it in your favorite search engine).

Obviously, doing this in-band with explicit enforcement, and no
issue of inter-node failure recovery being necessary because the
locks are stored in the physical device (i.e. the SCSI 3 approach)
would have significant performance benefits over the external lock
manager that relies on the machines voluntarily participating and
not going down.

One example of an FS that can do this is GFS, from Sistina; they
used to have an open-source version (under the GPL), but appear
to have since come to their senses.  I ported all the user space
tools for GFS to FreeBSD in about 4 hours of work one night, when
it was still available under the GPL.  See their propaganda at:

	http://www.sistina.com/products_gfs.htm

IBM also has two FS's that can do this, but they don't even run
on Linux, let alone FreeBSD.

In theory, SGI CXFS will also do this (I haven't gotten enough
information from non-proprietary channels to be able to disclose
much here and be on sound legal footing).


Another company that had a product in this space was Zambeel; they
were a Fremont startup, and, among other people, they had hired
Mohit Aron from Rice University (he did the ResCon LRP implementation
and was associated with the SCALA Server project and Peter Druschel's
group).  The company showed a lot of promise, but apparently burnt
all it's first round money to the tune $65M at the rate of $1M/month,
with only 90 people in headcount a little more than a year ago.

Unfortunately, they croaked last April:
http://www.byteandswitch.com/document.asp?doc_id=31886&site=byteandswitch
and it's not likely that anyone will be jumping into the space very
soon, since it hasn't been very profitable for the companies trying
to stake out territory there.


Anyway, the normal way this is handled for SAN/NAS devices is
to carve out a logical volume region on a per-machine basis, and
forget the locking altogether (giving a management node "ownership"
of the "as yet unallocated regions"), which avoid contention by
separation of the contention domain entirely.  Not a very
satisfying way of doing it, if you ask me.

-- Terry

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 03:28:33 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2962016A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 03:28:33 -0700 (PDT)
Received: from frontend3.aha.ru (frontend3.aha.ru [195.2.83.143])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C620643F3F
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 03:28:31 -0700 (PDT)
	(envelope-from uitm@zmail.ru)
Received: from [195.2.83.134] (HELO backend4.aha.ru)
  by frontend3.aha.ru (CommuniGate Pro SMTP 4.1.5)
  with ESMTP id 44190818; Wed, 22 Oct 2003 14:28:30 +0400
Received: from [193.125.99.9] (account uitm@zmail.ru)
  by backend4.aha.ru (CommuniGate Pro WebUser 4.1.5)
  with HTTP id 16445456; Wed, 22 Oct 2003 14:28:30 +0400
From: "Andrey Alekseyev" <uitm@zmail.ru>
To: "Robert J. Adams (jason)" <radams@siscom.net>,
	"Terry Lambert" <tlambert2@mindspring.com>
X-Mailer: CommuniGate Pro WebUser Interface v.4.1.5
Date: Wed, 22 Oct 2003 14:28:30 +0400
Message-ID: <web-16445456@backend4.aha.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset="KOI8-R"; format="flowed"
Content-Transfer-Encoding: 8bit
cc: freebsd-fs@freebsd.org
Subject: 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 10:28:33 -0000

>One example of an FS that can do this is GFS, from Sistina; they
>used to have an open-source version (under the GPL), but appear

An excellent implementation of an extremely cost-effective 
yet
very flexible and powerful shared-FS could be DataPlow SFS
(see http://www.dataplow.com).

DataPlow SFS is available for Solaris (both Intel and 
Sparc), Irix
and Linux and is the ultimate product of its class.
---
Professional hosting for everyone - http://www.host.ru

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 12:57:53 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id EBD2616A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 12:57:53 -0700 (PDT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6065143F93
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 12:57:53 -0700 (PDT)
	(envelope-from julian@vicor.com)
Received: by mail.vicor-nb.com (Postfix, from userid 1058)
	id 27C707A49F; Wed, 22 Oct 2003 12:57:53 -0700 (PDT)
To: freebsd-fs@freebsd.org, kmarx@vicor.com, mckusick@mckusick.com
In-Reply-To: <3F95D3F3.2050203@vicor.com>
Message-Id: <20031022195753.27C707A49F@mail.vicor-nb.com>
Date: Wed, 22 Oct 2003 12:57:53 -0700 (PDT)
From: julian@vicor.com (Julian Elischer)
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: jrh@vicor.com
cc: davep@vicor.com
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 19:57:54 -0000

Kirk?, 
I'm away in easwtern europe on the end of a wet bit of string....


>From kmarx@vicor.com  Tue Oct 21 17:53:42 2003
	X-Original-To: julian@vicor-nb.com
	Delivered-To: julian@vicor-nb.com
	Date: Tue, 21 Oct 2003 17:48:51 -0700
	From: Ken Marx <kmarx@vicor.com>
	User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
	X-Accept-Language: en-us, en
	MIME-Version: 1.0
	To: freebsd-fs@freebsd.org
	Cc: Julian Elischer <julian@vicor-nb.com>,
		John Lynch <jpl@vicor.com>, Dave Parker Smith <davep@vicor.com>,
		Cayford Burrell <cburrell@vicor.com>,
		victor elischer <VicPE@aol.com>, Josh Howard <jrh@vicor.com>,
		Ken Marx <kmarx@vicor.com>
	Subject: 4.8 ffs_dirpref problem
	Content-Type: text/plain; charset=us-ascii; format=flowed
	Content-Transfer-Encoding: 7bit

	Hi,

	We have 560GB raids that were sometimes bogging down heavily
	in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4)
	we find that when:

		o the raid file system grows to over 85% capacity (with only
		  30% inode usage)
		o we create ~1500 or so 2-6kb files in a given dir
		o (note: soft updates NOT enabled)

	We see:

		o 100% cpu utilization, all in system
		o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s

	We profiled the kernel and found a large number of calls to ffs_alloc().
	After many twisty pasages, we finally diff'd 4.4 with 4.8 ffs_alloc.c,
	and found a major difference in the ffs_dirpref() call. Hacking the
	4.4 logic back in 'fixed' the problem: We can now fill the /raid
	entirely with no real noticeable performance degradation.

	The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain
	things fairly clearly:

	4.4 -  ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
	--------------------------------------
	 * The policy implemented by this algorithm is to select from
	 * among those cylinder groups with above the average number of
	 * free inodes, the one with the smallest number of directories.

	4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
	-----------------------------------------
	 * The policy implemented by this algorithm is to allocate a
	 * directory inode in the same cylinder group as its parent
	 * directory, but also to reserve space for its files inodes
	 * and data. Restrict the number of directories which may be
	 * allocated one after another in the same cylinder group
	 * without intervening allocation of files.
	 *
	 * If we allocate a first level directory then force allocation
	 * in another cylinder group.

	For us, the 4.4 policy seems far superior, at least when the file system
	approches capacity.

	We'd like to avoid local kernel hacks and keep with main line
	FreeBSD code. Is there some way that the old policy can be supported,
	perhaps via a tunefs or sysctl type option?

	Actually, if the new policy can be fixed up to avoid the problem, that
	would of course be just as dandy.

	Thanks very much,
	k
	-- 
	Ken Marx, kmarx@vicor-nb.com
	We need to hit the nail on the head and set the agenda regarding total 
	quality.
			- http://www.bigshed.com/cgi-bin/speak.cgi


From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 15:52:35 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 945FF16A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 15:52:35 -0700 (PDT)
Received: from carver.gumbysoft.com (carver.gumbysoft.com [66.220.23.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A9DBD43F3F
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 15:52:32 -0700 (PDT)
	(envelope-from dwhite@gumbysoft.com)
Received: by carver.gumbysoft.com (Postfix, from userid 1000)
	id 9BB6172DA8; Wed, 22 Oct 2003 15:52:32 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
	by carver.gumbysoft.com (Postfix) with ESMTP
	id 99A6C72DA3; Wed, 22 Oct 2003 15:52:32 -0700 (PDT)
Date: Wed, 22 Oct 2003 15:52:32 -0700 (PDT)
From: Doug White <dwhite@gumbysoft.com>
To: Ken Marx <kmarx@vicor.com>
In-Reply-To: <3F95D3F3.2050203@vicor.com>
Message-ID: <20031022154631.H71676@carver.gumbysoft.com>
References: <3F95D3F3.2050203@vicor.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-fs@freebsd.org
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Oct 2003 22:52:35 -0000

Purge extensive cc:.

On Tue, 21 Oct 2003, Ken Marx wrote:

> We have 560GB raids that were sometimes bogging down heavily
> in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4)
> we find that when:
>
> 	o the raid file system grows to over 85% capacity (with only
> 	  30% inode usage)
> 	o we create ~1500 or so 2-6kb files in a given dir
> 	o (note: soft updates NOT enabled)

Interesting problems and analysis.

If I'm reading the diffs and source right, the old allocation algorithm
exists at the end of the dirpref function, below the comment about the
backstop.  It would be interesting to wrap the rest of the function in a
tunable so you could easily short-circuit to the backstop.

I don't know if it could be done on a per-filesystem basis. You might just
have to eat the old layout semantics for the entire system if we want to
keep the cost of the tunable low.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 18:58:50 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 192B716A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 18:58:50 -0700 (PDT)
Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4B90143FB1
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 18:58:49 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Received: from vicor.com (localhost [127.0.0.1])
	by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9N1ruT1036900;
	Wed, 22 Oct 2003 18:53:56 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Message-ID: <3F9734B4.2050201@vicor.com>
Date: Wed, 22 Oct 2003 18:53:56 -0700
From: Ken Marx <kmarx@vicor.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Doug White <dwhite@gumbysoft.com>
References: <3F95D3F3.2050203@vicor.com>
	<20031022154631.H71676@carver.gumbysoft.com>
In-Reply-To: <20031022154631.H71676@carver.gumbysoft.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-fs@freebsd.org
cc: Ken Marx <kmarx@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 01:58:50 -0000

Thanks very much for the reply. (Sorry about the cc list - I'm
working with others on this, but will forward things on from now on.)

It seems to me the old source differs just slightly from the
new 'backstop'. Old dirpref has an additional check for

	if (fs->fs_cs(fs, cg).cs_ndir < minndir 

I don't know how relevant this is. I could hack things to
only do the backstop and see empircally what happens.
Open to suggestions.

To recap what we did/found today:

1. Soft updates on non-hacked ffs_dirpref() does seem to improve
   things significantly, but not quite as much as using 4.4 code.

2. Talking with others here: We're a bit scared to enable soft updates
   in production. These are 7x24 sites. If there's a crash, we're
   concerned that at the very least, downtime could increase due to fsck.
   And at worst, more data could be lost.

3. Strangely, I can't find ffs_dirpref in the kgmon call graph. I
   even made it a non-static function. Hm...  Don't quite grok this.

4. I haven't looked into where tunefs (et al) store their info (superblock(s)?).
   Presuming for the sake of argument that making the backstop
   code configurable is a reasonable approach, might we have room
   to store this as a file system attribute?

5. We discussed why others might not be seeing this elsewhere. The guess
   is that it *does* happen, but that typical file systems aren't 550Gb.
   Hence the cost of the linear fallback cycl group searches isn't as
   noticeable. (I can try to give rationale for our large fs, but anyway,
   it's what we have.)

   I suppose one approach (uh, hack) is to do the backstop code first
   under certain extreme conditions (such as huge number of cyl groups)?

6. Open to ideas of where to instrument ffs a bit more. E.g., counters
   in ffs_alloc() for which strategies it uses or some such? Conditional upon?

7. Are there other tunefs settings that might help? We tried changing
   avg files/dir (-s) from 64 to 4096 since we often have >> 64.
   Results were varried: Sometimes things went more quickly, but we
   still often saw the very sluggish behavior.


Again thanks. And thanks in advance for any further guidance.

regards,
k


Doug White wrote:
> Purge extensive cc:.
> 
> On Tue, 21 Oct 2003, Ken Marx wrote:
> 
> 
>>We have 560GB raids that were sometimes bogging down heavily
>>in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4)
>>we find that when:
>>
>>	o the raid file system grows to over 85% capacity (with only
>>	  30% inode usage)
>>	o we create ~1500 or so 2-6kb files in a given dir
>>	o (note: soft updates NOT enabled)
> 
> 
> Interesting problems and analysis.
> 
> If I'm reading the diffs and source right, the old allocation algorithm
> exists at the end of the dirpref function, below the comment about the
> backstop.  It would be interesting to wrap the rest of the function in a
> tunable so you could easily short-circuit to the backstop.
> 
> I don't know if it could be done on a per-filesystem basis. You might just
> have to eat the old layout semantics for the entire system if we want to
> keep the cost of the tunable low.
> 

-- 
Ken Marx, kmarx@vicor-nb.com
Speaking candidly, I say that we intend to do the right thing and maintain our 
commitment to the impedance match.
		- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 22 22:30:04 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4814716A4B3
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 22:30:04 -0700 (PDT)
Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7578A43FAF
	for <freebsd-fs@freebsd.org>; Wed, 22 Oct 2003 22:30:03 -0700 (PDT)
	(envelope-from mckusick@beastie.mckusick.com)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h9MNbseN005704;
	Wed, 22 Oct 2003 16:37:54 -0700 (PDT)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200310222337.h9MNbseN005704@beastie.mckusick.com>
To: Ken Marx <kmarx@vicor.com>
In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT."
             <20031022195753.27C707A49F@mail.vicor-nb.com> 
Date: Wed, 22 Oct 2003 16:37:54 -0700
From: Kirk McKusick <mckusick@beastie.mckusick.com>
X-Mailman-Approved-At: Thu, 23 Oct 2003 06:57:01 -0700
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: julian@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: Grigoriy Orlov <gluk@ptci.ru>
cc: jrh@vicor.com
cc: davep@vicor.com
Subject: Re: 4.8 ffs_dirpref problem 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 05:30:04 -0000

I believe that you can dsolve your problem by tuning the existing
algorithm using tunefs. There are two parameters to control dirpref,
avgfilesize (which defaults to 16384) and filesperdir (which defaults
to 50). I suggest that you try using an avgfilesize of 4096 and
filesperdir of 1500. This is done by running tunefs on the unmounted
(or at least mounted read-only) filesystem as:

	tunefs -f 4096 -s 1500 /dev/<disk for my broken filesystem>

Note that this affects future layout, so needs to be done before you
put any data into the filesystem. If you are building the filesystem
from scratch, you can use:

	newfs -g 4096 -h 1500 ...

to set these fields. Please let me know if this solves your problem.
If it does not, I will ask Grigoriy Orlov <gluk@ptci.ru> if he has
any ideas on how to proceed.

	Kirk McKusick

=-=-=-=-=-=-=

> Date: Tue, 21 Oct 2003 17:48:51 -0700
> From: Ken Marx <kmarx@vicor.com>
> To: freebsd-fs@freebsd.org
> Cc: Julian Elischer <julian@vicor-nb.com>,
> 	John Lynch <jpl@vicor.com>, Dave Parker Smith <davep@vicor.com>,
> 	Cayford Burrell <cburrell@vicor.com>,
> 	victor elischer <VicPE@aol.com>, Josh Howard <jrh@vicor.com>,
> 	Ken Marx <kmarx@vicor.com>
> Subject: 4.8 ffs_dirpref problem
> 
> Hi,
> 
> We have 560GB raids that were sometimes bogging down heavily
> in our production systems. Under 4.8-RELEASE (recently
> upgrated from 4.4) we find that when:
> 
> 	o the raid file system grows to over 85% capacity (with only
> 	  30% inode usage)
> 	o we create ~1500 or so 2-6kb files in a given dir
> 	o (note: soft updates NOT enabled)
> 
> We see:
> 
> 	o 100% cpu utilization, all in system
> 	o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s
> 
> We profiled the kernel and found a large number of calls to
> ffs_alloc().  After many twisty pasages, we finally diff'd 4.4
> with 4.8 ffs_alloc.c, and found a major difference in the
> ffs_dirpref() call. Hacking the 4.4 logic back in 'fixed' the
> problem: We can now fill the /raid entirely with no real
> noticeable performance degradation.
> 
> The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain
> things fairly clearly:
> 
> 4.4 -  ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
> --------------------------------------
>  * The policy implemented by this algorithm is to select from
>  * among those cylinder groups with above the average number of
>  * free inodes, the one with the smallest number of directories.
> 
> 4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
> -----------------------------------------
>  * The policy implemented by this algorithm is to allocate a
>  * directory inode in the same cylinder group as its parent
>  * directory, but also to reserve space for its files inodes
>  * and data. Restrict the number of directories which may be
>  * allocated one after another in the same cylinder group
>  * without intervening allocation of files.
>  *
>  * If we allocate a first level directory then force allocation
>  * in another cylinder group.
> 
> For us, the 4.4 policy seems far superior, at least when the file system
> approches capacity.
> 
> We'd like to avoid local kernel hacks and keep with main line
> FreeBSD code. Is there some way that the old policy can be supported,
> perhaps via a tunefs or sysctl type option?
> 
> Actually, if the new policy can be fixed up to avoid the problem, that
> would of course be just as dandy.
> 
> Thanks very much,
> k
> -- 
> Ken Marx, kmarx@vicor-nb.com
> We need to hit the nail on the head and set the agenda regarding total 
> quality.
> 		- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 23 08:01:51 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BF1AC16A4B3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 08:01:51 -0700 (PDT)
Received: from perrin.nxad.com (internal.nxad.com [69.1.70.251])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2FAA343FE3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 08:01:51 -0700 (PDT)
	(envelope-from hmp@nxad.com)
Received: by perrin.nxad.com (Postfix, from userid 1072)
	id 5251921068; Thu, 23 Oct 2003 08:01:50 -0700 (PDT)
Date: Thu, 23 Oct 2003 08:01:50 -0700
From: Hiten Pandya <hmp@FreeBSD.ORG>
To: Terry Lambert <tlambert2@mindspring.com>
Message-ID: <20031023150150.GA46202@perrin.nxad.com>
References: <3F95B946.8010309@newshosting.com>
	<20031021233414.GJ99943@elvis.mu.org> <3F95C6F3.8030005@siscom.net>
	<3F96431E.A30656E3@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3F96431E.A30656E3@mindspring.com>
X-Operating-System: FreeBSD FreeBSD 4.7-STABLE
User-Agent: Mutt/1.5.4i
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 15:01:51 -0000

On Wed, Oct 22, 2003 at 01:43:10AM -0700, Terry Lambert wrote:
:
: [ ... ]
:
: One example of an FS that can do this is GFS, from Sistina; they
: used to have an open-source version (under the GPL), but appear
: to have since come to their senses.  I ported all the user space
: tools for GFS to FreeBSD in about 4 hours of work one night, when
: it was still available under the GPL.  See their propaganda at:
: 
: 	http://www.sistina.com/products_gfs.htm

	On the other hand, you could also check out the OpenGFS project
	which is still being worked on actively:

	http://opengfs.sourceforge.net/

	This is for Linux, ofcourse.
 
: Anyway, the normal way this is handled for SAN/NAS devices is
: to carve out a logical volume region on a per-machine basis, and
: forget the locking altogether (giving a management node "ownership"
: of the "as yet unallocated regions"), which avoid contention by
: separation of the contention domain entirely.  Not a very
: satisfying way of doing it, if you ask me.

	You could also checkout another interesting file system, called
	Lustre, located at http://www.lustre.org/, which could probably
	help your need.

	Regards,

-- 
Hiten Pandya
hmp@FreeBSD.ORG

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 23 10:19:33 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id D712F16A4B3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 10:19:33 -0700 (PDT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D01C543FA3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 10:19:32 -0700 (PDT)
	(envelope-from julian@vicor.com)
Received: by mail.vicor-nb.com (Postfix, from userid 1058)
	id 9F36C7A425; Thu, 23 Oct 2003 10:19:32 -0700 (PDT)
To: kmarx@vicor.com, mckusick@beastie.mckusick.com
In-Reply-To: <200310222337.h9MNbseN005704@beastie.mckusick.com>
Message-Id: <20031023171932.9F36C7A425@mail.vicor-nb.com>
Date: Thu, 23 Oct 2003 10:19:32 -0700 (PDT)
From: julian@vicor.com (Julian Elischer)
X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: davep@vicor.com
cc: julian@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: gluk@ptci.ru
cc: jrh@vicor.com
cc: julian@vicor-nb.com
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 17:19:34 -0000

> From mckusick@beastie.mckusick.com  Wed Oct 22 22:30:03 2003
> X-Original-To: julian@vicor-nb.com
> Delivered-To: julian@vicor-nb.com
> To: Ken Marx <kmarx@vicor.com>
> Subject: Re: 4.8 ffs_dirpref problem 
> Cc: freebsd-fs@freebsd.org, cburrell@vicor.com, davep@vicor.com,
> 	jpl@vicor.com, jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com,
> 	julian@vicor.com, Grigoriy Orlov <gluk@ptci.ru>
> In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT."
>              <20031022195753.27C707A49F@mail.vicor-nb.com> 
> Date: Wed, 22 Oct 2003 16:37:54 -0700
> From: Kirk McKusick <mckusick@beastie.mckusick.com>

> I believe that you can dsolve your problem by tuning the existing
> algorithm using tunefs. There are two parameters to control dirpref,
> avgfilesize (which defaults to 16384) and filesperdir (which defaults
> to 50). I suggest that you try using an avgfilesize of 4096 and
> filesperdir of 1500. This is done by running tunefs on the unmounted
> (or at least mounted read-only) filesystem as:

> 	tunefs -f 4096 -s 1500 /dev/<disk for my broken filesystem>

On the same filesystem are directories that contain 1GB files
and others that contain maybe 100 100K files (images)


> Note that this affects future layout, so needs to be done before you
> put any data into the filesystem. If you are building the filesystem
> from scratch, you can use:

would this have an effect on an existing filesystem with respect to new data
being added to it?


> 	newfs -g 4096 -h 1500 ...
>
> to set these fields. Please let me know if this solves your problem.
> If it does not, I will ask Grigoriy Orlov <gluk@ptci.ru> if he has
> any ideas on how to proceed.

> 	Kirk McKusick

> =-=-=-=-=-=-=

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 23 11:12:57 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BDABB16A4C0
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 11:12:57 -0700 (PDT)
Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id F3EE443FB1
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 11:12:56 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Received: from vicor.com (localhost [127.0.0.1])
	by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9NI82T1050765;
	Thu, 23 Oct 2003 11:08:02 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Message-ID: <3F981902.90607@vicor.com>
Date: Thu, 23 Oct 2003 11:08:02 -0700
From: Ken Marx <kmarx@vicor.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Julian Elischer <julian@vicor.com>
References: <20031023171932.9F36C7A425@mail.vicor-nb.com>
In-Reply-To: <20031023171932.9F36C7A425@mail.vicor-nb.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: davep@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: gluk@ptci.ru
cc: jrh@vicor.com
cc: mckusick@beastie.mckusick.com
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 18:12:58 -0000

Thanks for the reply,

We actually *did* try -s 4096 yesterday (not quite what you suggested)
with spotty results: Sometimes it seemed to go more quickly, but often
not.

Let me clarify our test: We have a 1.5gb tar file from our production
raid that fairly represents the distribution of data. We hit the
performance problem when we get to dirs with lots of small-ish files.
But, as Julian mentioned, we typically have many flavors of file
sizes and populations.

Admittedly, our untar'ing test isn't necessarily representitive
of what happens in production - we were just trying to fill the
disk and recreate the problem here. We *did* at least hit a noticeable
problem, and we believe it's the same behavior that's hitting production.

I just tried your exact suggested settings on an fs that was
already 96% full, and still experienced the very sluggish
behavior on exactly the same type of files/dirs.

Our untar typically takes around 60-100 sec of system time
when things are going ok; 300-1000+ sec when the sluggishness occurs.
This time tends to increase as we get closer to 99%. Sometimes
as high as 4000+ secs.

I wasn't clear from your mail if I should newfs the entire
fs and start over, or if I could have expected the settings
to make a difference for any NEW data.

I can do this latter if you think it's required. The test will
then take several hours to run since we need at least 85% disk usage
to start seeing the problem.

Thanks!
k

Julian Elischer wrote:
>>From mckusick@beastie.mckusick.com  Wed Oct 22 22:30:03 2003
>>X-Original-To: julian@vicor-nb.com
>>Delivered-To: julian@vicor-nb.com
>>To: Ken Marx <kmarx@vicor.com>
>>Subject: Re: 4.8 ffs_dirpref problem 
>>Cc: freebsd-fs@freebsd.org, cburrell@vicor.com, davep@vicor.com,
>>	jpl@vicor.com, jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com,
>>	julian@vicor.com, Grigoriy Orlov <gluk@ptci.ru>
>>In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT."
>>             <20031022195753.27C707A49F@mail.vicor-nb.com> 
>>Date: Wed, 22 Oct 2003 16:37:54 -0700
>>From: Kirk McKusick <mckusick@beastie.mckusick.com>
> 
> 
>>I believe that you can dsolve your problem by tuning the existing
>>algorithm using tunefs. There are two parameters to control dirpref,
>>avgfilesize (which defaults to 16384) and filesperdir (which defaults
>>to 50). I suggest that you try using an avgfilesize of 4096 and
>>filesperdir of 1500. This is done by running tunefs on the unmounted
>>(or at least mounted read-only) filesystem as:
> 
> 
>>	tunefs -f 4096 -s 1500 /dev/<disk for my broken filesystem>
> 
> 
> On the same filesystem are directories that contain 1GB files
> and others that contain maybe 100 100K files (images)
> 
> 
> 
>>Note that this affects future layout, so needs to be done before you
>>put any data into the filesystem. If you are building the filesystem
>>from scratch, you can use:
> 
> 
> would this have an effect on an existing filesystem with respect to new data
> being added to it?
> 
> 
> 
> 
> 
>>	newfs -g 4096 -h 1500 ...
>>
>>to set these fields. Please let me know if this solves your problem.
>>If it does not, I will ask Grigoriy Orlov <gluk@ptci.ru> if he has
>>any ideas on how to proceed.
> 
> 
>>	Kirk McKusick
> 
> 
>>=-=-=-=-=-=-=
> 
> 
> 

-- 
Ken Marx, kmarx@vicor-nb.com
It's too costly to get lean and mean and analyze progress on the diminishing 
expectations.
		- http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 23 12:46:28 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 6C86A16A4B3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 12:46:28 -0700 (PDT)
Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9CDDF43FBD
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 12:46:27 -0700 (PDT)
	(envelope-from mckusick@beastie.mckusick.com)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h9NJkQeN007683;
	Thu, 23 Oct 2003 12:46:26 -0700 (PDT)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200310231946.h9NJkQeN007683@beastie.mckusick.com>
To: Ken Marx <kmarx@vicor.com>
In-Reply-To: Your message of "Thu, 23 Oct 2003 11:08:02 PDT."
             <3F981902.90607@vicor.com> 
Date: Thu, 23 Oct 2003 12:46:26 -0700
From: Kirk McKusick <mckusick@beastie.mckusick.com>
X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: davep@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: gluk@ptci.ru
cc: jrh@vicor.com
cc: Julian Elischer <julian@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 19:46:28 -0000

	Date: Thu, 23 Oct 2003 11:08:02 -0700
	From: Ken Marx <kmarx@vicor.com>
	To: Julian Elischer <julian@vicor.com>
	CC: mckusick@mckusick.com, cburrell@vicor.com, davep@vicor.com,
		freebsd-fs@freebsd.org, gluk@ptci.ru, jpl@vicor.com,
		jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com
	Subject: Re: 4.8 ffs_dirpref problem
	X-ASK-Info: Confirmed by User

	Thanks for the reply,

	We actually *did* try -s 4096 yesterday (not quite what you
	suggested) with spotty results: Sometimes it seemed to go
	more quickly, but often not.

	Let me clarify our test: We have a 1.5gb tar file from our
	production raid that fairly represents the distribution of
	data. We hit the performance problem when we get to dirs
	with lots of small-ish files.  But, as Julian mentioned,
	we typically have many flavors of file sizes and populations.

	Admittedly, our untar'ing test isn't necessarily representitive
	of what happens in production - we were just trying to fill
	the disk and recreate the problem here. We *did* at least
	hit a noticeable problem, and we believe it's the same
	behavior that's hitting production.

	I just tried your exact suggested settings on an fs that
	was already 96% full, and still experienced the very sluggish
	behavior on exactly the same type of files/dirs.

	Our untar typically takes around 60-100 sec of system time
	when things are going ok; 300-1000+ sec when the sluggishness
	occurs.  This time tends to increase as we get closer to
	99%. Sometimes as high as 4000+ secs.

	I wasn't clear from your mail if I should newfs the entire
	fs and start over, or if I could have expected the settings
	to make a difference for any NEW data.

	I can do this latter if you think it's required. The test
	will then take several hours to run since we need at least
	85% disk usage to start seeing the problem.

	Thanks!
	k

Unfortunately, I do believe that you will need to start over from
scratch with a newfs. The problem is that by the time you are at
85% full with the old parameters, the directory structure is already
too "dense" forcing you to search far and wide for more inodes. If
you start from the beginning with a large filesperdir then your
directory structure will expand across more of the disk which
should approximate the old algorithm.

	Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 23 16:58:36 2003
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2988416A4B3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 16:58:36 -0700 (PDT)
Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 549C443FE3
	for <freebsd-fs@freebsd.org>; Thu, 23 Oct 2003 16:58:35 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Received: from vicor.com (localhost [127.0.0.1])
	by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9NNrdT1063942;
	Thu, 23 Oct 2003 16:53:39 -0700 (PDT)
	(envelope-from kmarx@vicor.com)
Message-ID: <3F986A03.2050809@vicor.com>
Date: Thu, 23 Oct 2003 16:53:39 -0700
From: Ken Marx <kmarx@vicor.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Kirk McKusick <mckusick@beastie.mckusick.com>
References: <200310231946.h9NJkQeN007683@beastie.mckusick.com>
In-Reply-To: <200310231946.h9NJkQeN007683@beastie.mckusick.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700
cc: freebsd-fs@freebsd.org
cc: cburrell@vicor.com
cc: julian@vicor-nb.com
cc: davep@vicor.com
cc: VicPE@aol.com
cc: jpl@vicor.com
cc: gluk@ptci.ru
cc: jrh@vicor.com
cc: Julian Elischer <julian@vicor.com>
Subject: Re: 4.8 ffs_dirpref problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Oct 2003 23:58:36 -0000

Ok, thanks, Kirk. Re newfs'ing and re-doing our test is on the
todo list. Probably an overnight thing.

Meanwhile we did a bit more digging and, maybe, found an anomaly:

We did a few escapes to ddb while the perfomance was bad to
see what a typical stack was:

--- interrupt, eip = 0xc01d9af4, esp = 0xcfe24bf8, ebp = 0xcfe24c04 ---
gbincore(cf3c6d00,1d090040,cfe24ca8,401,0) at gbincore+0x34
getblk(cf3c6d00,1d090040,1000,0,0) at getblk+0x80
bread(cf3c6d00,1d090040,1000,0,cfe24ca8) at bread+0x27
ffs_alloccg(c21eaf00,1d09,0,800) at ffs_alloccg+0x70
ffs_hashalloc(c21eaf00,1908,6420008,800,c026f110) at ffs_hashalloc+0x8c
ffs_alloc(c21eaf00,0,6420008,800,c1f93080) at ffs_alloc+0xc9
ffs_balloc(cfe24e2c,cfc9da40,c203bd80,20001,cfccfde0) at ffs_balloc+0x46a
ffs_write(cfe24e64,c203bd80,cf9934e0,41b,c03695a0) at ffs_write+0x319
vn_write(c203bd80,cfe24ed4,c1f93080,0,cf9934e0) at vn_write+0x15e
dofilewrite(cf9934e0,c203bd80,4,809d200,41b) at dofilewrite+0xc1
write(cf9934e0,cfe24f80,41b,809d200,0) at write+0x3b
---------------

So, alloccg logic needs to get the cg block. It goes through
getblk which in turn looks to see if the block is alredy in
an in-mem hashtable via the lookup routine, gbincore.

Julian had the thought that perhaps there was something funny
about this hash table. Possible wrt to cg blocks.

So, we hacked in a frew routines to histogram how often each
bucket was searched, and the 'average depth' of the bucket.
(This crude average is total running sum of depths found over all times
bucket was searched, divided by total times bucket was searched.)

We found that block numbers really spike at bucket 250, and
that the avg-depth of that bucket is 10-100 times that of any other
over the total of 1023 buckets in the hash:

 bh[247]: freq=1863, avgdepth = 1
 bh[248]: freq=1860, avgdepth = 1
 bh[249]: freq=1777, avgdepth = 1
 bh[250]: freq=969100, avgdepth = 440
 bh[251]: freq=1595, avgdepth = 12
 bh[252]: freq=1437, avgdepth = 1

To verify that these were cg block lookups we did a
similar histogram of hash indexes for the actual
bread() calls in ffs_alloccg. That is the bucket
that would be hashed for

	(ip->i_devvp, fsbtodb(fs, cgtod(fs, cg))

We got similar, corroborating results:

 bh[248]: freq=0
 bh[249]: freq=0
 bh[250]: freq=662387
 bh[251]: freq=0
 bh[252]: freq=40
 bh[253]: freq=0

It appears that lookups for cg blocks (that are probably
in memory already) tend to be more costly than necessary(?).

So, it may be that a better tuned file system would likely help.
But is it also possible that tuning wouldn't be needed if
the hash table were more evenly distributed?

We can dump the block list for the anomalous hashtable
bucket if you wish. And/or any other info/suggestions you
have for that matter. Maybe we'll hack in a new hashing
function just for kicks to see what happens...

Thanks again for your time!

k
	

Kirk McKusick wrote:
> 	Date: Thu, 23 Oct 2003 11:08:02 -0700
> 	From: Ken Marx <kmarx@vicor.com>
> 	To: Julian Elischer <julian@vicor.com>
> 	CC: mckusick@mckusick.com, cburrell@vicor.com, davep@vicor.com,
> 		freebsd-fs@freebsd.org, gluk@ptci.ru, jpl@vicor.com,
> 		jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com
> 	Subject: Re: 4.8 ffs_dirpref problem
> 	X-ASK-Info: Confirmed by User
> 
> 	Thanks for the reply,
> 
> 	We actually *did* try -s 4096 yesterday (not quite what you
> 	suggested) with spotty results: Sometimes it seemed to go
> 	more quickly, but often not.
> 
> 	Let me clarify our test: We have a 1.5gb tar file from our
> 	production raid that fairly represents the distribution of
> 	data. We hit the performance problem when we get to dirs
> 	with lots of small-ish files.  But, as Julian mentioned,
> 	we typically have many flavors of file sizes and populations.
> 
> 	Admittedly, our untar'ing test isn't necessarily representitive
> 	of what happens in production - we were just trying to fill
> 	the disk and recreate the problem here. We *did* at least
> 	hit a noticeable problem, and we believe it's the same
> 	behavior that's hitting production.
> 
> 	I just tried your exact suggested settings on an fs that
> 	was already 96% full, and still experienced the very sluggish
> 	behavior on exactly the same type of files/dirs.
> 
> 	Our untar typically takes around 60-100 sec of system time
> 	when things are going ok; 300-1000+ sec when the sluggishness
> 	occurs.  This time tends to increase as we get closer to
> 	99%. Sometimes as high as 4000+ secs.
> 
> 	I wasn't clear from your mail if I should newfs the entire
> 	fs and start over, or if I could have expected the settings
> 	to make a difference for any NEW data.
> 
> 	I can do this latter if you think it's required. The test
> 	will then take several hours to run since we need at least
> 	85% disk usage to start seeing the problem.
> 
> 	Thanks!
> 	k
> 
> Unfortunately, I do believe that you will need to start over from
> scratch with a newfs. The problem is that by the time you are at
> 85% full with the old parameters, the directory structure is already
> too "dense" forcing you to search far and wide for more inodes. If
> you start from the beginning with a large filesperdir then your
> directory structure will expand across more of the disk which
> should approximate the old algorithm.
> 
> 	Kirk McKusick
> 
> 

-- 
Ken Marx, kmarx@vicor-nb.com
It's an orthogonal issue to leverage our critical resources and focus hard to 
resolve the market forces.
		- http://www.bigshed.com/cgi-bin/speak.cgi