From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 05:09:21 2003
Return-Path: Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 311FF16A4B3; Tue, 21 Oct 2003 05:09:21 -0700 (PDT)
Received: from genius.tao.org.uk (genius.tao.org.uk [212.135.162.51]) by mx1.FreeBSD.org (Postfix) with ESMTP id 57AB843FA3; Tue, 21 Oct 2003 05:09:20 -0700 (PDT) (envelope-from joe@genius.tao.org.uk)
Received: by genius.tao.org.uk (Postfix, from userid 100) id DD409476C; Tue, 21 Oct 2003 13:09:18 +0100 (BST)
Date: Tue, 21 Oct 2003 13:09:18 +0100
From: Josef Karthauser
To: freebsd-fs@FreeBSD.org
Message-ID: <20031021120918.GC15345@genius.tao.org.uk>
Mail-Followup-To: Josef Karthauser, freebsd-fs@FreeBSD.org, current@FreeBSD.org
Mime-Version: 1.0
User-Agent: Mutt/1.5.4i
cc: current@FreeBSD.org
Subject: Problems with NFS (client) under 5.1.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Tue, 21 Oct 2003 12:09:21 -0000

I'm trying to set a FreeBSD 5.1 machine up as an NFS client. The
server is on an SGI box. Things are strange:

phoenix# uname -a
FreeBSD phoenix.mydomain 5.1-CURRENT FreeBSD 5.1-CURRENT #0: Thu Sep 18 15:20:19 GMT 2003   root@pheonix.mydomain:/usr/obj/usr/src/sys/GENERIC  i386
phoenix# ls -ld /mnt
drwxr-xr-x  2 root  wheel  512 Jun  5 01:53 /mnt
phoenix# mount rebus:/rebus/home /mnt
phoenix# ls -ld /mnt
ls: /mnt: Permission denied
phoenix# ls -ld /* | grep mnt
phoenix# umount /mnt
phoenix# ls -ld /* | grep mnt
drwxr-xr-x  2 root  wheel  512 Jun  5 01:53 /mnt

What's going on here?
Is it a bug or something that I'm doing wrong?

phoenix# grep nfs /etc/rc.conf
nfs_client_enable="YES"         # This host is an NFS client (or NO).

The NFS server is: IRIX64 rebus 6.5 04101930 IP35 mips

Joe
--
Josef Karthauser (joe@tao.org.uk)                http://www.josef-k.net/
FreeBSD (cvs meister, admin and hacker)      http://www.uk.FreeBSD.org/
Physics Particle Theory (student)  http://www.pact.cpes.sussex.ac.uk/
================ An eclectic mix of fact and theory. =================

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 08:23:59 2003
Date: Tue, 21 Oct 2003 16:23:51 +0100
From: Josef Karthauser
To: ticso@cicely.de
Message-ID: <20031021152351.GB1438@genius.tao.org.uk>
In-Reply-To: <20031021133336.GT38650@cicely12.cicely.de>
User-Agent: Mutt/1.5.4i
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.

On Tue, Oct 21, 2003 at 03:33:37PM +0200, Bernd Walter wrote:
>
> You are root - and root is often mapped to nobody on the server.
> Are you sure that nobody is allowed to see?
> The ls -ld /mnt case is strange, but /mnt is already in the server
> namespace.
>

The Linux boxes on the network don't appear to have any problems.
Either way, why is the /mnt entry disappearing?

phoenix# mount rebus:/rebus/home /mnt
phoenix# suspend
[1] + Suspended (signal)       su
$ id
uid=1001(joe) gid=1001(joe) groups=1001(joe), 0(wheel)
$ ls /mnt
ls: /mnt: Permission denied
$ ls -l /
ls: mnt: Permission denied
total 45
-r--r--r--   1 root  wheel  4735 Jun  5 01:57 COPYRIGHT
drwxr-xr-x   2 root  wheel  1024 Sep 17 19:31 bin
drwxr-xr-x   6 root  wheel   512 Sep 18 17:04 boot
drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 cdrom
lrwxr-xr-x   1 root  wheel    10 Jul  2 17:49 compat -> usr/compat
dr-xr-xr-x   4 root  wheel   512 Oct 21 12:09 dev
drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 dist
-rw-------   1 root  wheel  4096 Sep 17 11:28 entropy
drwxr-xr-x  16 root  wheel  2048 Oct 21 12:21 etc
lrwxr-xr-x   1 root  wheel     9 Sep 17 12:10 home -> /usr/home
drwxr-xr-x   2 root  wheel  1024 Sep 17 19:33 lib
drwxr-xr-x   2 root  wheel   512 Sep 17 19:33 libexec
lrwxr-xr-x   1 root  wheel    10 Sep 23 11:40 local -> /usr/local
dr-xr-xr-x   2 root  wheel   512 Jun  5 01:53 proc
drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 rescue
drwxr-xr-x   3 root  wheel   512 Sep 29 13:01 root
drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 sbin
drwxr-xr-x   4 root  wheel  1024 Jul  2 17:32 stand
lrwxr-xr-x   1 root  wheel    11 Sep 17 19:30 sys -> usr/src/sys
drwxrwxrwt   4 root  wheel   512 Oct 21 15:27 tmp
drwxr-xr-x  18 root  wheel   512 Oct 16 11:48 usr
drwxr-xr-x  20 root  wheel   512 Oct 16 11:53 var
$ fg
su
phoenix# umount /mnt
phoenix# ls -l /
total 51
-rw-r--r--   2 root  wheel   797 Jun  5 01:57 .cshrc
-rw-r--r--   2 root  wheel   251 Jun  5 01:57 .profile
-r--r--r--   1 root  wheel  4735 Jun  5 01:57 COPYRIGHT
drwxr-xr-x   2 root  wheel  1024 Sep 17 19:31 bin
drwxr-xr-x   6 root  wheel   512 Sep 18 17:04 boot
drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 cdrom
lrwxr-xr-x   1 root  wheel    10 Jul  2 17:49 compat -> usr/compat
dr-xr-xr-x   4 root  wheel   512 Oct 21 12:09 dev
drwxr-xr-x   2 root  wheel   512 Jul  2 17:32 dist
-rw-------   1 root  wheel  4096 Sep 17 11:28 entropy
drwxr-xr-x  16 root  wheel  2048 Oct 21 12:21 etc
lrwxr-xr-x   1 root  wheel     9 Sep 17 12:10 home -> /usr/home
drwxr-xr-x   2 root  wheel  1024 Sep 17 19:33 lib
drwxr-xr-x   2 root  wheel   512 Sep 17 19:33 libexec
lrwxr-xr-x   1 root  wheel    10 Sep 23 11:40 local -> /usr/local
drwxr-xr-x   2 root  wheel   512 Jun  5 01:53 mnt
dr-xr-xr-x   2 root  wheel   512 Jun  5 01:53 proc
drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 rescue
drwxr-xr-x   3 root  wheel   512 Sep 29 13:01 root
drwxr-xr-x   2 root  wheel  2560 Sep 17 19:33 sbin
drwxr-xr-x   4 root  wheel  1024 Jul  2 17:32 stand
lrwxr-xr-x   1 root  wheel    11 Sep 17 19:30 sys -> usr/src/sys
drwxrwxrwt   4 root  wheel   512 Oct 21 15:27 tmp
drwxr-xr-x  18 root  wheel   512 Oct 16 11:48 usr
drwxr-xr-x  20 root  wheel   512 Oct 16 11:53 var

Joe
--
Josef Karthauser (joe@tao.org.uk)                http://www.josef-k.net/
FreeBSD (cvs meister, admin and hacker)      http://www.uk.FreeBSD.org/
Physics Particle Theory (student)  http://www.pact.cpes.sussex.ac.uk/
================ An eclectic mix of fact and theory.
=================

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 09:27:46 2003
Date: Tue, 21 Oct 2003 09:27:46 -0700
From: Alfred Perlstein
To: Josef Karthauser, ticso@cicely.de, freebsd-fs@FreeBSD.org, current@FreeBSD.org
Message-ID: <20031021162746.GB99943@elvis.mu.org>
In-Reply-To: <20031021152351.GB1438@genius.tao.org.uk>
User-Agent: Mutt/1.4.1i
Subject: Re: Problems with NFS (client) under 5.1.

* Josef Karthauser [031021 08:24] wrote:
> On Tue, Oct 21, 2003 at 03:33:37PM +0200, Bernd Walter wrote:
> >
> > You are root - and root is often mapped to nobody on the server.
> > Are you sure that nobody is allowed to see?
> > The ls -ld /mnt case is strange, but /mnt is already in the server
> > namespace.
>
> The Linux boxes on the network don't appear to have any problems.
> Either way, why is the /mnt entry disappearing?

I saw this before with a QNX server and FreeBSD client as well. Same
behavior: root OK, other users not. It was nearly a year ago, but I
haven't seen a fix go by either, so...

-Alfred

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 09:33:03 2003
Date: Tue, 21 Oct 2003 12:32:12 -0400 (EDT)
From: Robert Watson
To: Josef Karthauser
In-Reply-To: <20031021120918.GC15345@genius.tao.org.uk>
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.

On Tue, 21 Oct 2003, Josef Karthauser wrote:

> I'm trying to set a FreeBSD 5.1 machine up as an NFS client. The
> server is on an SGI box.
> Things are strange:

Any chance you could grab a copy of ethereal and do a bit of on-the-wire
inspection of the RPCs? It would be interesting to know which of the
requests are serviced out of the client cache, and which make it to the
server. It would also be interesting to see whether the failures are
visible in the wire protocol, or whether they're purely an artifact of
the client.

Also, can you confirm the Linux and FreeBSD clients are both using the
same version of NFS with similar protocol settings (i.e., NFSv3 over
UDP)?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 09:39:59 2003
Date: Tue, 21 Oct 2003 09:39:57 -0700
From: Kris Kennaway
To: Robert Watson
Message-ID: <20031021163957.GA66248@rot13.obsecurity.org>
User-Agent: Mutt/1.4.1i
cc: Josef Karthauser
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.
On Tue, Oct 21, 2003 at 12:32:12PM -0400, Robert Watson wrote:
>
> On Tue, 21 Oct 2003, Josef Karthauser wrote:
>
> > I'm trying to set a FreeBSD 5.1 machine up as an NFS client. The
> > server is on an SGI box. Things are strange:
>
> Any chance you could grab a copy of ethereal and do a bit of on-the-wire
> inspection of the RPCs? It would be interesting to know which of the
> requests are serviced out of the client cache, and which make it to the
> server. It would also be interesting to see whether the failures are
> visible in the wire protocol, or whether they're purely an artifact of
> the client.
>
> Also, can you confirm the Linux and FreeBSD clients are both using the
> same version of NFS with similar protocol settings (i.e., NFSv3 over UDP)?

Does Linux do NFSv3 yet? I thought that, at least until recently, there
were stability issues and it was recommended that it not be used.
Kris

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 10:12:48 2003
Date: Tue, 21 Oct 2003 19:12:46 +0200 (CEST)
From: Claus Guttesen
To: Kris Kennaway, Robert Watson
Message-ID: <20031021171246.64372.qmail@web14107.mail.yahoo.com>
In-Reply-To: <20031021163957.GA66248@rot13.obsecurity.org>
cc: Josef Karthauser
cc: freebsd-fs@FreeBSD.org
cc: current@FreeBSD.org
Subject: Re: Problems with NFS (client) under 5.1.

hi.

> Does Linux do NFSv3 yet? I thought that at least until recently there
> were stability issues and it was recommended it not be used.

I had some problems with stale NFS handles when NFS-mounting two FreeBSD
5.1 clients (one with the frozen 5.1 and one as of Oct. 10th) to a Linux
server with ReiserFS. When I mounted with ver. 2 the problems went away
on the FreeBSD with source from Oct.
10th, and there are fewer stale NFS handles with the frozen 5.1.

regards
Claus

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 15:54:42 2003
Date: Tue, 21 Oct 2003 18:55:02 -0400
From: NH Support
Organization: Newshosting.com
To: freebsd-fs@freebsd.org
Message-ID: <3F95B946.8010309@newshosting.com>
Subject: >1 systems 1 FS

Hello,

I'm working on a new cluster design and had a quick question. If I have
a few boxes mounting the same FS (over a SAN), all read-only, will it
work? Will I have any trouble? Has anyone tried this with UFS/UFS2?
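For what it's worth, the all-read-only setup asked about above amounts to each node mounting the shared LUN with `-o ro`; the device node and mount point below are hypothetical placeholders, not from the original thread:

```shell
# On each cluster node sharing the SAN LUN (names are illustrative):
# with every node mounting read-only, no client dirties the on-disk
# state, so the UFS metadata each node caches cannot be invalidated
# behind its back.
mount -t ufs -o ro /dev/da1s1e /shared
```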
Let's take it one step further.. let's say I have 1 box that mounts it
RW.. and it updates the contents.. will the other systems that have it
mounted RO puke?

-j
SISCOM

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 16:34:14 2003
Date: Tue, 21 Oct 2003 16:34:14 -0700
From: Alfred Perlstein
To: NH Support
Message-ID: <20031021233414.GJ99943@elvis.mu.org>
In-Reply-To: <3F95B946.8010309@newshosting.com>
User-Agent: Mutt/1.4.1i
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS

* NH Support [031021 15:55] wrote:
> Hello,
>
> I'm working on a new cluster design and had a quick question. If I have
> a few boxes mounting the same FS (over a SAN) all read-only will it
> work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..

You shouldn't.

> Let's take it one step further.. let's say I have 1 box that mounts it
> RW.. and it updates the contents.. will the other systems that have it
> mounted RO puke?

Likely.

--
- Alfred Perlstein
- Research Engineering Development Inc.
- email: bright@mu.org  cell: 408-480-4684

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 16:53:01 2003
Date: Tue, 21 Oct 2003 19:53:23 -0400
From: "Robert J. Adams (jason)"
Organization: Newshosting.com
To: freebsd-fs@freebsd.org
Message-ID: <3F95C6F3.8030005@siscom.net>
In-Reply-To: <20031021233414.GJ99943@elvis.mu.org>
Subject: Re: >1 systems 1 FS

Alfred Perlstein wrote:

>> Hello,
>>
>> I'm working on a new cluster design and had a quick question. If I have
>> a few boxes mounting the same FS (over a SAN) all read-only will it
>> work? Will I have any trouble?
>> Has anyone tried this with UFS/UFS2 ..
>
> You shouldn't.

I shouldn't do this, or I shouldn't have trouble? :)

>> Let's take it one step further.. let's say I have 1 box that mounts it
>> RW.. and it updates the contents.. will the other systems that have it
>> mounted RO puke?
>
> Likely.

Well shit.. I need this.

-j
SISCOM

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 17:08:57 2003
Date: Tue, 21 Oct 2003 20:08:53 -0400
From: Nathan Hawkins
To: "Robert J. Adams (jason)"
Message-ID: <20031022000853.GA409@quic.net>
In-Reply-To: <3F95C6F3.8030005@siscom.net>
User-Agent: Mutt/1.3.28i
cc: freebsd-fs@freebsd.org
Subject: Re: >1 systems 1 FS

On Tue, Oct 21, 2003 at 07:53:23PM -0400, Robert J. Adams (jason) wrote:
> Alfred Perlstein wrote:
>
> >> Hello,
> >>
> >> I'm working on a new cluster design and had a quick question. If I have
> >> a few boxes mounting the same FS (over a SAN) all read-only will it
> >> work? Will I have any trouble? Has anyone tried this with UFS/UFS2 ..
> > You shouldn't.
>
> I shouldn't do this or I shouldn't have trouble? :)

No, you can get away with _all_ read-only. It's the part where you mount
RW somewhere that causes trouble. There is a little problem of cache
coherency.

> >> Let's take it one step further.. let's say I have 1 box that mounts it
> >> RW.. and it updates the contents.. will the other systems that have it
> >> mounted RO puke?
>
> Likely.
>
> Well shit.. I need this.

There are some options:

1. Go to NAS, and use NFS.
2. Switch OS to one that has a cluster filesystem.
3. Implement a filesystem with cluster support.
4. Don't use a filesystem; use devices, and work around the problem in
   userspace.

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 17:53:42 2003
Date: Tue, 21 Oct 2003 17:48:51 -0700
From: Ken Marx
To: freebsd-fs@freebsd.org
Message-ID: <3F95D3F3.2050203@vicor.com>
cc: Ken Marx
cc: Cayford Burrell
cc: Julian Elischer
cc: victor elischer
cc: John Lynch
cc: Josh Howard
cc: Dave Parker Smith
Subject: 4.8 ffs_dirpref problem
Hi,

We have 560GB raids that were sometimes bogging down heavily in our
production systems. Under 4.8-RELEASE (recently upgraded from 4.4) we
find that when:

o the raid file system grows to over 85% capacity (with only
  30% inode usage)
o we create ~1500 or so 2-6kb files in a given dir
o (note: soft updates NOT enabled)

We see:

o 100% CPU utilization, all in system
o I/O transfer rates of ~200kb/sec, down from the normal 15-30MB/s

We profiled the kernel and found a large number of calls to ffs_alloc().
After many twisty passages, we finally diff'd the 4.4 and 4.8 versions
of ffs_alloc.c and found a major difference in the ffs_dirpref() call.
Hacking the 4.4 logic back in 'fixed' the problem: we can now fill
/raid entirely with no real noticeable performance degradation.

The nice comments for the 4.4/4.8 versions of ffs_dirpref() seem to
explain things fairly clearly:

4.4 - ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps:
--------------------------------------
 * The policy implemented by this algorithm is to select from
 * among those cylinder groups with above the average number of
 * free inodes, the one with the smallest number of directories.

4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon:
-----------------------------------------
 * The policy implemented by this algorithm is to allocate a
 * directory inode in the same cylinder group as its parent
 * directory, but also to reserve space for its files inodes
 * and data. Restrict the number of directories which may be
 * allocated one after another in the same cylinder group
 * without intervening allocation of files.
 *
 * If we allocate a first level directory then force allocation
 * in another cylinder group.

For us, the 4.4 policy seems far superior, at least when the file
system approaches capacity. We'd like to avoid local kernel hacks and
keep with mainline FreeBSD code.
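The small-file workload described above can be reproduced with a quick userland sketch; the directory name and the 4kb file size (mid-range of the reported 2-6kb) are arbitrary choices, and on the real system this would be run on the nearly-full raid and timed before and after changing the ffs_dirpref() policy:

```shell
# Create ~1500 small files in a single directory, as in the problem
# workload.  mktemp gives us a scratch directory; time the loop on the
# nearly-full filesystem to observe the slowdown.
dir=$(mktemp -d /tmp/dirpref.XXXXXX)
i=0
while [ "$i" -lt 1500 ]; do
    # one 4kb file per iteration
    dd if=/dev/zero of="$dir/file$i" bs=4k count=1 2>/dev/null
    i=$((i + 1))
done
echo "created $(ls "$dir" | wc -l) files in $dir"
```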
Is there some way that the old policy can be supported, perhaps via a
tunefs or sysctl type option? Actually, if the new policy can be fixed
up to avoid the problem, that would of course be just as dandy.

Thanks very much,
k

--
Ken Marx, kmarx@vicor-nb.com
We need to hit the nail on the head and set the agenda regarding total
quality.
 - http://www.bigshed.com/cgi-bin/speak.cgi

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 18:06:10 2003
Date: Tue, 21 Oct 2003 21:06:04 -0400
From: David Rhodus
To: Ken Marx
In-Reply-To: <3F95D3F3.2050203@vicor.com>
cc: freebsd-fs@freebsd.org
cc: Cayford Burrell
cc: Julian Elischer
cc: victor elischer
cc: John Lynch
cc: Josh Howard
cc: Dave Parker Smith
Subject: Re: 4.8 ffs_dirpref problem

On Tuesday, October 21, 2003, at 08:48 PM, Ken Marx wrote:

> Hi,
>
> We have 560GB raids that were sometimes bogging down heavily
> in our production systems.
> Under 4.8-RELEASE (recently upgraded from 4.4)
> we find that when:
>
> o the raid file system grows to over 85% capacity (with only
>   30% inode usage)
> o we create ~1500 or so 2-6kb files in a given dir
> o (note: soft updates NOT enabled)

I have one question: why do you have softupdates turned off?
With softupdates it could be possible to get faster writes than using
an async mount.

-DR

From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 19:02:00 2003
Date: Tue, 21 Oct 2003 18:57:02 -0700
From: Ken Marx
To: David Rhodus
Message-ID: <3F95E3EE.4070401@vicor.com>
cc: freebsd-fs@freebsd.org
cc: Cayford Burrell
cc: Julian Elischer
cc: victor elischer
cc: John Lynch
cc: Josh Howard
cc: Dave Parker Smith
Subject: Re: 4.8 ffs_dirpref problem

Wow - quick reply. Thanks!

I dunno. Wasn't my idea. I just quickly tried this: in the problem dirs
it still bogs down for about 20 seconds or so.
I wish I could tell you how long that was taking before, but I wasn't there for that part, and I have to take off just now. The systat -vmstat numbers look similar, but I don't want to make any bold claims. I'll re-disable soft updates, retest in the morning, and report back. Thanks again, k. David Rhodus wrote: > > On Tuesday, October 21, 2003, at 08:48 PM, Ken Marx wrote: > >> Hi, >> >> We have 560GB raids that were sometimes bogging down heavily >> in our production systems. Under 4.8-RELEASE (recently upgraded from 4.4) >> we find that when: >> >> o the raid file system grows to over 85% capacity (with only >> 30% inode usage) >> o we create ~1500 or so 2-6kb files in a given dir >> o (note: soft updates NOT enabled) > > > I have one question, why do you have softupdates turned off? > With softupdates it could be possible to get faster writes than using > an async mount. > > -DR > > > -- Ken Marx, kmarx@vicor-nb.com They must reach agreement and stop beating around the bush on the long pole in the tent.
- http://www.bigshed.com/cgi-bin/speak.cgi From owner-freebsd-fs@FreeBSD.ORG Tue Oct 21 22:28:46 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3B3B116A4B3 for ; Tue, 21 Oct 2003 22:28:46 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id D7F2143F93 for ; Tue, 21 Oct 2003 22:28:41 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id PAA05003; Wed, 22 Oct 2003 15:27:19 +1000 Date: Wed, 22 Oct 2003 15:25:58 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: David Rhodus In-Reply-To: Message-ID: <20031022145836.J21067@gamplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: Ken Marx cc: freebsd-fs@FreeBSD.org cc: Cayford Burrell cc: Julian Elischer cc: victor elischer cc: John Lynch cc: Josh Howard cc: Dave Parker Smith Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2003 05:28:46 -0000 On Tue, 21 Oct 2003, David Rhodus wrote: > I have one question, why do you have softupdates turned off ? > With softupdates it could be possible to get faster writes than using > an async mount. 
Soft updates have never been faster than async for me, and in recent simple tests (copying /usr/src) they have become significantly slower than even ordinary (non-soft-update, non-sync, non-async) mounts: 2002/05/30 ---------- ffs-16384-2048-1: tarcp /e src: 278.82 real 0.76 user 14.99 sys ffs-16384-2048-as-1: tarcp /e src: 180.39 real 0.73 user 13.69 sys ffs-16384-2048-su-1: tarcp /e src: 181.98 real 0.69 user 13.81 sys 2003/09/23 ---------- ffs-16384-02048-1: tarcp /f src: 68.66 real 0.82 user 13.81 sys ffs-16384-02048-as-1: tarcp /f src: 41.09 real 0.83 user 11.25 sys ffs-16384-02048-su-1: tarcp /f src: 111.62 real 0.82 user 11.49 sys ffs-16384-02048-1 means ffs with a block size of 16384, a fragment size of 2048, soft updates and ffs^WUFS1, etc. (there was no ffs2 at the time of the old benchmark and the "-1" in it actually meant doreallocblks=1). The machine is an overclocked Athlon 1600XP running -current at the time, with only the following major changes: - main memory increased from 512MB to 1024MB. This increases the relevance of the test as a write benchmark by giving enough memory to keep the source of the copy cached. - target disk changed from an IBM-DTLA-307030 (30GB ATA) to an IC35L060AVVA07-0 (60GB ATA). The file system had size 13GB in both cases and was at a similar offset in the disks; this puts it closer to the outer tracks on the larger disk so accesses to it were faster (approx. 40MB/sec vs 29MB/sec). Most of the real times were improved significantly by the hardware changes, but for some reason soft updates didn't benefit as much as the others. This behaviour is not dependent on the block/frag sizes or ffs1/2.
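The workload behind these numbers is essentially a timed recursive copy of a source tree full of small files. As a rough illustration only (the tree shape, file sizes, and paths below are invented; the real benchmark ran tarcp between ffs mounts with default, async, and soft-update options), a minimal timing harness might look like:

```python
import os
import shutil
import tempfile
import time

def make_tree(root, ndirs=20, nfiles=50, size=4096):
    # Build a tree of small files, loosely resembling a source tree.
    for d in range(ndirs):
        dpath = os.path.join(root, "dir%02d" % d)
        os.makedirs(dpath)
        for f in range(nfiles):
            with open(os.path.join(dpath, "file%03d" % f), "wb") as fp:
                fp.write(b"x" * size)

def timed_copy(src, dst):
    # Copy the tree and return wall-clock seconds, analogous to the
    # "real" column in the tarcp timings above.
    t0 = time.time()
    shutil.copytree(src, dst)
    return time.time() - t0

if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "src")
    os.makedirs(src)
    make_tree(src)
    # In the real test, dst would sit on filesystems mounted with
    # default, async, and soft-update options respectively.
    elapsed = timed_copy(src, os.path.join(work, "dst"))
    print("copied in %.2f sec" % elapsed)
    shutil.rmtree(work)
```

Running the same copy with dst on differently mounted filesystems, and averaging several runs, gives comparisons of the kind shown in the tables.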
Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 01:44:02 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 03A5A16A4B3 for ; Wed, 22 Oct 2003 01:44:02 -0700 (PDT) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 787CD43FBD for ; Wed, 22 Oct 2003 01:44:00 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from user-2ivfjup.dialup.mindspring.com ([165.247.207.217] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 1ACEaz-0002BE-00; Wed, 22 Oct 2003 01:43:58 -0700 Message-ID: <3F96431E.A30656E3@mindspring.com> Date: Wed, 22 Oct 2003 01:43:10 -0700 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: "Robert J. Adams (jason)" References: <3F95B946.8010309@newshosting.com> <20031021233414.GJ99943@elvis.mu.org> <3F95C6F3.8030005@siscom.net> Content-Type: text/plain; charset=big5 Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a494b6a1359814014566978bc4660100a9350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c cc: freebsd-fs@freebsd.org Subject: Re: >1 systems 1 FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2003 08:44:02 -0000 "Robert J. Adams (jason)" wrote: > Alfred Perlstein wrote: > >>Hello, > >> > >>I'm working on a new cluster design and had a quick question. If I have > >>a few boxes mounting the same FS (over a SAN) all read-only will it > >>work? Will I have any trouble? Has anyone tried this with UFS/UFS2 .. > > > > You shouldn't. > > I shouldn't do this or I shouldn't have trouble? :) > > >>Lets take it one step further.. 
let's say I have 1 box that mounts it > >>RW.. and it updates the contents .. will the other systems that have it > >>mounted RO puke? > > > > > > Likely. > > Well shit.. I need this. Then you need a new FS. The issue is that you effectively need block-level or range of blocks locking on the device over the shared interface wire to be able to do this effectively, since a device that is a target of multiple master devices has to know who to permit onto the blocks and who not to permit onto the blocks. Firewire was supposed to fix this, and so was SCSI 3. The parts of the SCSI 3 standard that deal with this particular issue have not been finalized, because each device vendor is jockeying to get their implementation standardized to get a jump on all the other vendors, instead of cooperating on establishing an open standard. This is one of the main reasons that the SCSI 3 standard is not yet final (the other main reason is that a number of the participants also sell IDE disks, and whatever's bad for SCSI is good for IDE, so they are being obstructionist jerks because they can). There are a number of FS implementations that can deal with this, however, and the way they deal with this is by implementing an out-of-device-control-band block-level or range of blocks locking protocol, usually over ethernet, to ensure that they can get exclusive access to the blocks. Usually, this is implemented as multiple reader, single writer locking, with the ability to go exclusive ("SIX locking" -- "Shared Intention eXclusive"; look for it in your favorite search engine). Obviously, doing this in-band with explicit enforcement, and no issue of inter-node failure recovery being necessary because the locks are stored in the physical device (i.e. the SCSI 3 approach) would have significant performance benefits over the external lock manager that relies on the machines voluntarily participating and not going down.
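The multiple-reader, single-writer scheme with SIX locking described above can be illustrated with a toy lock manager. This is only a sketch of the classic multi-granularity compatibility matrix, not anything a real shared-disk filesystem ships; a real lock manager also handles queueing, escalation, and node failure:

```python
# Toy multi-granularity lock manager illustrating SIX
# ("Shared Intention eXclusive") compatibility.  IS/IX are intention
# modes, S is shared, X is exclusive, SIX is shared with intention
# to go exclusive.  Each entry lists the held modes a new request
# is compatible with.
COMPAT = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

class LockManager:
    def __init__(self):
        self.held = {}  # resource -> list of (owner, mode)

    def acquire(self, owner, resource, mode):
        """Grant the lock iff mode is compatible with every current holder."""
        for _other, held_mode in self.held.get(resource, []):
            if mode not in COMPAT[held_mode]:
                return False
        self.held.setdefault(resource, []).append((owner, mode))
        return True

    def release(self, owner, resource):
        self.held[resource] = [
            (o, m) for o, m in self.held.get(resource, []) if o != owner
        ]
```

Multiple readers (S) coexist; a writer needs X, which excludes everyone; SIX lets one node read a whole range while signalling that it may escalate to exclusive.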
One example of an FS that can do this is GFS, from Sistina; they used to have an open-source version (under the GPL), but appear to have since come to their senses. I ported all the user space tools for GFS to FreeBSD in about 4 hours of work one night, when it was still available under the GPL. See their propaganda at: http://www.sistina.com/products_gfs.htm IBM also has two FS's that can do this, but they don't even run on Linux, let alone FreeBSD. In theory, SGI CXFS will also do this (I haven't gotten enough information from non-proprietary channels to be able to disclose much here and be on sound legal footing). Another company that had a product in this space was Zambeel; they were a Fremont startup, and, among other people, they had hired Mohit Aron from Rice University (he did the ResCon LRP implementation and was associated with the SCALA Server project and Peter Druschel's group). The company showed a lot of promise, but apparently burnt all its first round money to the tune of $65M at the rate of $1M/month, with only 90 people in headcount a little more than a year ago. Unfortunately, they croaked last April: http://www.byteandswitch.com/document.asp?doc_id=31886&site=byteandswitch and it's not likely that anyone will be jumping into the space very soon, since it hasn't been very profitable for the companies trying to stake out territory there. Anyway, the normal way this is handled for SAN/NAS devices is to carve out a logical volume region on a per-machine basis, and forget the locking altogether (giving a management node "ownership" of the "as yet unallocated regions"), which avoids contention by separation of the contention domain entirely. Not a very satisfying way of doing it, if you ask me.
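The per-machine carve-out approach in the last paragraph can be sketched as a trivial allocator in which a management node owns the as-yet-unallocated block range and hands out disjoint regions. All names and numbers below are invented for illustration:

```python
class RegionManager:
    """Management node that 'owns' the as-yet-unallocated block range
    and carves disjoint per-machine regions out of it.  With each
    machine writing only inside its own region, no cross-node block
    locking is needed -- contention is avoided by construction."""

    def __init__(self, total_blocks):
        self.next_free = 0
        self.total_blocks = total_blocks
        self.regions = {}  # machine -> (start, end), half-open interval

    def carve(self, machine, nblocks):
        # Hand the next contiguous range to the requesting machine.
        if self.next_free + nblocks > self.total_blocks:
            raise ValueError("volume exhausted")
        region = (self.next_free, self.next_free + nblocks)
        self.regions[machine] = region
        self.next_free += nblocks
        return region

    def owner_of(self, block):
        # Answer which machine may write a given block.
        for machine, (start, end) in self.regions.items():
            if start <= block < end:
                return machine
        return None  # still owned by the management node
```

Since the regions never overlap, the only coordination point is the management node handing out ranges, which is exactly the separation of contention domains described above.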
-- Terry From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 03:28:33 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2962016A4B3 for ; Wed, 22 Oct 2003 03:28:33 -0700 (PDT) Received: from frontend3.aha.ru (frontend3.aha.ru [195.2.83.143]) by mx1.FreeBSD.org (Postfix) with ESMTP id C620643F3F for ; Wed, 22 Oct 2003 03:28:31 -0700 (PDT) (envelope-from uitm@zmail.ru) Received: from [195.2.83.134] (HELO backend4.aha.ru) by frontend3.aha.ru (CommuniGate Pro SMTP 4.1.5) with ESMTP id 44190818; Wed, 22 Oct 2003 14:28:30 +0400 Received: from [193.125.99.9] (account uitm@zmail.ru) by backend4.aha.ru (CommuniGate Pro WebUser 4.1.5) with HTTP id 16445456; Wed, 22 Oct 2003 14:28:30 +0400 From: "Andrey Alekseyev" To: "Robert J. Adams (jason)" , "Terry Lambert" X-Mailer: CommuniGate Pro WebUser Interface v.4.1.5 Date: Wed, 22 Oct 2003 14:28:30 +0400 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="KOI8-R"; format="flowed" Content-Transfer-Encoding: 8bit cc: freebsd-fs@freebsd.org Subject: X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2003 10:28:33 -0000 >One example of an FS that can do this is GFS, from Sistina; they >used to have an open-source version (under the GPL), but appear An excellent implementation of an extremely cost-effective yet very flexible and powerful shared-FS could be DataPlow SFS (see http://www.dataplow.com). DataPlow SFS is available for Solaris (both Intel and Sparc), Irix and Linux and is the ultimate product of its class. 
--- Professional hosting for everyone - http://www.host.ru From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 12:57:53 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EBD2616A4B3 for ; Wed, 22 Oct 2003 12:57:53 -0700 (PDT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6065143F93 for ; Wed, 22 Oct 2003 12:57:53 -0700 (PDT) (envelope-from julian@vicor.com) Received: by mail.vicor-nb.com (Postfix, from userid 1058) id 27C707A49F; Wed, 22 Oct 2003 12:57:53 -0700 (PDT) To: freebsd-fs@freebsd.org, kmarx@vicor.com, mckusick@mckusick.com In-Reply-To: <3F95D3F3.2050203@vicor.com> Message-Id: <20031022195753.27C707A49F@mail.vicor-nb.com> Date: Wed, 22 Oct 2003 12:57:53 -0700 (PDT) From: julian@vicor.com (Julian Elischer) cc: cburrell@vicor.com cc: julian@vicor-nb.com cc: VicPE@aol.com cc: jpl@vicor.com cc: jrh@vicor.com cc: davep@vicor.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2003 19:57:54 -0000 Kirk?, I'm away in eastern Europe on the end of a wet bit of string....
>From kmarx@vicor.com Tue Oct 21 17:53:42 2003 X-Original-To: julian@vicor-nb.com Delivered-To: julian@vicor-nb.com Date: Tue, 21 Oct 2003 17:48:51 -0700 From: Ken Marx User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-fs@freebsd.org Cc: Julian Elischer , John Lynch , Dave Parker Smith , Cayford Burrell , victor elischer , Josh Howard , Ken Marx Subject: 4.8 ffs_dirpref problem Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Hi, We have 560GB raids that were sometimes bogging down heavily in our production systems. Under 4.8-RELEASE (recently upgraded from 4.4) we find that when: o the raid file system grows to over 85% capacity (with only 30% inode usage) o we create ~1500 or so 2-6kb files in a given dir o (note: soft updates NOT enabled) We see: o 100% cpu utilization, all in system o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s We profiled the kernel and found a large number of calls to ffs_alloc(). After many twisty passages, we finally diff'd 4.4 with 4.8 ffs_alloc.c, and found a major difference in the ffs_dirpref() call. Hacking the 4.4 logic back in 'fixed' the problem: We can now fill the /raid entirely with no real noticeable performance degradation. The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain things fairly clearly: 4.4 - ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps: -------------------------------------- * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. 4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon: ----------------------------------------- * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files inodes * and data.
Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. For us, the 4.4 policy seems far superior, at least when the file system approaches capacity. We'd like to avoid local kernel hacks and keep with main line FreeBSD code. Is there some way that the old policy can be supported, perhaps via a tunefs or sysctl type option? Actually, if the new policy can be fixed up to avoid the problem, that would of course be just as dandy. Thanks very much, k -- Ken Marx, kmarx@vicor-nb.com We need to hit the nail on the head and set the agenda regarding total quality. - http://www.bigshed.com/cgi-bin/speak.cgi From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 15:52:35 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 945FF16A4B3 for ; Wed, 22 Oct 2003 15:52:35 -0700 (PDT) Received: from carver.gumbysoft.com (carver.gumbysoft.com [66.220.23.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id A9DBD43F3F for ; Wed, 22 Oct 2003 15:52:32 -0700 (PDT) (envelope-from dwhite@gumbysoft.com) Received: by carver.gumbysoft.com (Postfix, from userid 1000) id 9BB6172DA8; Wed, 22 Oct 2003 15:52:32 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by carver.gumbysoft.com (Postfix) with ESMTP id 99A6C72DA3; Wed, 22 Oct 2003 15:52:32 -0700 (PDT) Date: Wed, 22 Oct 2003 15:52:32 -0700 (PDT) From: Doug White To: Ken Marx In-Reply-To: <3F95D3F3.2050203@vicor.com> Message-ID: <20031022154631.H71676@carver.gumbysoft.com> References: <3F95D3F3.2050203@vicor.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-fs@freebsd.org Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Oct 2003 22:52:35 -0000 Purge extensive cc:. On Tue, 21 Oct 2003, Ken Marx wrote: > We have 560GB raids that were sometimes bogging down heavily > in our production systems. Under 4.8-RELEASE (recently upgrated from 4.4) > we find that when: > > o the raid file system grows to over 85% capacity (with only > 30% inode usage) > o we create ~1500 or so 2-6kb files in a given dir > o (note: soft updates NOT enabled) Interesting problems and analysis. If I'm reading the diffs and source right, the old allocation algorithm exists at the end of the dirpref function, below the comment about the backstop. It would be interesting to wrap the rest of the function in a tunable so you could easily short-circuit to the backstop. I don't know if it could be done on a per-filesystem basis. You might just have to eat the old layout semantics for the entire system if we want to keep the cost of the tunable low. -- Doug White | FreeBSD: The Power to Serve dwhite@gumbysoft.com | www.FreeBSD.org From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 18:58:50 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 192B716A4B3 for ; Wed, 22 Oct 2003 18:58:50 -0700 (PDT) Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4B90143FB1 for ; Wed, 22 Oct 2003 18:58:49 -0700 (PDT) (envelope-from kmarx@vicor.com) Received: from vicor.com (localhost [127.0.0.1]) by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9N1ruT1036900; Wed, 22 Oct 2003 18:53:56 -0700 (PDT) (envelope-from kmarx@vicor.com) Message-ID: <3F9734B4.2050201@vicor.com> Date: Wed, 22 Oct 2003 18:53:56 -0700 From: Ken Marx User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Doug White References: 
<3F95D3F3.2050203@vicor.com> <20031022154631.H71676@carver.gumbysoft.com> In-Reply-To: <20031022154631.H71676@carver.gumbysoft.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-fs@freebsd.org cc: Ken Marx Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 01:58:50 -0000 Thanks very much for the reply. (Sorry about the cc list - I'm working with others on this, but will forward things on from now on.) It seems to me the old source differs just slightly from the new 'backstop'. Old dirpref has an additional check for if (fs->fs_cs(fs, cg).cs_ndir < minndir I don't know how relevant this is. I could hack things to only do the backstop and see empirically what happens. Open to suggestions. To recap what we did/found today: 1. Soft updates on non-hacked ffs_dirpref() does seem to improve things significantly, but not quite as much as using 4.4 code. 2. Talking with others here: We're a bit scared to enable soft updates in production. These are 7x24 sites. If there's a crash, we're concerned that at the very least, downtime could increase due to fsck. And at worst, more data could be lost. 3. Strangely, I can't find ffs_dirpref in the kgmon call graph. I even made it a non-static function. Hm... Don't quite grok this. 4. I haven't looked into where tunefs (et al) store their info (superblock(s)?). Presuming for the sake of argument that making the backstop code configurable is a reasonable approach, might we have room to store this as a file system attribute? 5. We discussed why others might not be seeing this elsewhere. The guess is that it *does* happen, but that typical file systems aren't 550Gb. Hence the cost of the linear fallback cylinder group searches isn't as noticeable.
(I can try to give rationale for our large fs, but anyway, it's what we have.) I suppose one approach (uh, hack) is to do the backstop code first under certain extreme conditions (such as huge number of cyl groups)? 6. Open to ideas of where to instrument ffs a bit more. E.g., counters in ffs_alloc() for which strategies it uses or some such? Conditional upon? 7. Are there other tunefs settings that might help? We tried changing avg files/dir (-s) from 64 to 4096 since we often have >> 64. Results were varied: Sometimes things went more quickly, but we still often saw the very sluggish behavior. Again thanks. And thanks in advance for any further guidance. regards, k Doug White wrote: > Purge extensive cc:. > > On Tue, 21 Oct 2003, Ken Marx wrote: > > >>We have 560GB raids that were sometimes bogging down heavily >>in our production systems. Under 4.8-RELEASE (recently upgraded from 4.4) >>we find that when: >> >> o the raid file system grows to over 85% capacity (with only >> 30% inode usage) >> o we create ~1500 or so 2-6kb files in a given dir >> o (note: soft updates NOT enabled) > > > Interesting problems and analysis. > > If I'm reading the diffs and source right, the old allocation algorithm > exists at the end of the dirpref function, below the comment about the > backstop. It would be interesting to wrap the rest of the function in a > tunable so you could easily short-circuit to the backstop. > > I don't know if it could be done on a per-filesystem basis. You might just > have to eat the old layout semantics for the entire system if we want to > keep the cost of the tunable low. > -- Ken Marx, kmarx@vicor-nb.com Speaking candidly, I say that we intend to do the right thing and maintain our commitment to the impedance match.
- http://www.bigshed.com/cgi-bin/speak.cgi From owner-freebsd-fs@FreeBSD.ORG Wed Oct 22 22:30:04 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4814716A4B3 for ; Wed, 22 Oct 2003 22:30:04 -0700 (PDT) Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7578A43FAF for ; Wed, 22 Oct 2003 22:30:03 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h9MNbseN005704; Wed, 22 Oct 2003 16:37:54 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200310222337.h9MNbseN005704@beastie.mckusick.com> To: Ken Marx In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT." <20031022195753.27C707A49F@mail.vicor-nb.com> Date: Wed, 22 Oct 2003 16:37:54 -0700 From: Kirk McKusick X-Mailman-Approved-At: Thu, 23 Oct 2003 06:57:01 -0700 cc: freebsd-fs@freebsd.org cc: cburrell@vicor.com cc: julian@vicor-nb.com cc: julian@vicor.com cc: VicPE@aol.com cc: jpl@vicor.com cc: Grigoriy Orlov cc: jrh@vicor.com cc: davep@vicor.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 05:30:04 -0000 I believe that you can solve your problem by tuning the existing algorithm using tunefs. There are two parameters to control dirpref, avgfilesize (which defaults to 16384) and filesperdir (which defaults to 50). I suggest that you try using an avgfilesize of 4096 and filesperdir of 1500.
This is done by running tunefs on the unmounted (or at least mounted read-only) filesystem as: tunefs -f 4096 -s 1500 /dev/ Note that this affects future layout, so needs to be done before you put any data into the filesystem. If you are building the filesystem from scratch, you can use: newfs -g 4096 -h 1500 ... to set these fields. Please let me know if this solves your problem. If it does not, I will ask Grigoriy Orlov if he has any ideas on how to proceed. Kirk McKusick =-=-=-=-=-=-= > Date: Tue, 21 Oct 2003 17:48:51 -0700 > From: Ken Marx > To: freebsd-fs@freebsd.org > Cc: Julian Elischer , > John Lynch , Dave Parker Smith , > Cayford Burrell , > victor elischer , Josh Howard , > Ken Marx > Subject: 4.8 ffs_dirpref problem > > Hi, > > We have 560GB raids that were sometimes bogging down heavily > in our production systems. Under 4.8-RELEASE (recently > upgraded from 4.4) we find that when: > > o the raid file system grows to over 85% capacity (with only > 30% inode usage) > o we create ~1500 or so 2-6kb files in a given dir > o (note: soft updates NOT enabled) > > We see: > > o 100% cpu utilization, all in system > o I/O transfer rates of ~200kb/sec, down from normal of 15-30MB/s > > We profiled the kernel and found a large number of calls to > ffs_alloc(). After many twisty passages, we finally diff'd 4.4 > with 4.8 ffs_alloc.c, and found a major difference in the > ffs_dirpref() call. Hacking the 4.4 logic back in 'fixed' the > problem: We can now fill the /raid entirely with no real > noticeable performance degradation. > > The nice comments for 4.4/4.8 versions of ffs_dirpref() seem to explain > things fairly clearly: > > 4.4 - ffs_alloc.c,v 1.64.2.1 2000/03/16 08:15:53 ps: > -------------------------------------- > * The policy implemented by this algorithm is to select from > * among those cylinder groups with above the average number of > * free inodes, the one with the smallest number of directories.
> > 4.8 - ffs_alloc.c,v 1.64.2.2 2001/09/21 19:15:21 dillon: > ----------------------------------------- > * The policy implemented by this algorithm is to allocate a > * directory inode in the same cylinder group as its parent > * directory, but also to reserve space for its files inodes > * and data. Restrict the number of directories which may be > * allocated one after another in the same cylinder group > * without intervening allocation of files. > * > * If we allocate a first level directory then force allocation > * in another cylinder group. > > For us, the 4.4 policy seems far superior, at least when the file system > approaches capacity. > > We'd like to avoid local kernel hacks and keep with main line > FreeBSD code. Is there some way that the old policy can be supported, > perhaps via a tunefs or sysctl type option? > > Actually, if the new policy can be fixed up to avoid the problem, that > would of course be just as dandy. > > Thanks very much, > k > -- > Ken Marx, kmarx@vicor-nb.com > We need to hit the nail on the head and set the agenda regarding total > quality.
> - http://www.bigshed.com/cgi-bin/speak.cgi From owner-freebsd-fs@FreeBSD.ORG Thu Oct 23 08:01:51 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BF1AC16A4B3 for ; Thu, 23 Oct 2003 08:01:51 -0700 (PDT) Received: from perrin.nxad.com (internal.nxad.com [69.1.70.251]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2FAA343FE3 for ; Thu, 23 Oct 2003 08:01:51 -0700 (PDT) (envelope-from hmp@nxad.com) Received: by perrin.nxad.com (Postfix, from userid 1072) id 5251921068; Thu, 23 Oct 2003 08:01:50 -0700 (PDT) Date: Thu, 23 Oct 2003 08:01:50 -0700 From: Hiten Pandya To: Terry Lambert Message-ID: <20031023150150.GA46202@perrin.nxad.com> References: <3F95B946.8010309@newshosting.com> <20031021233414.GJ99943@elvis.mu.org> <3F95C6F3.8030005@siscom.net> <3F96431E.A30656E3@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3F96431E.A30656E3@mindspring.com> X-Operating-System: FreeBSD FreeBSD 4.7-STABLE User-Agent: Mutt/1.5.4i cc: freebsd-fs@freebsd.org Subject: Re: >1 systems 1 FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 15:01:51 -0000 On Wed, Oct 22, 2003 at 01:43:10AM -0700, Terry Lambert wrote: : : [ ... ] : : One example of an FS that can do this is GFS, from Sistina; they : used to have an open-source version (under the GPL), but appear : to have since come to their senses. I ported all the user space : tools for GFS to FreeBSD in about 4 hours of work one night, when : it was still available under the GPL. 
See their propaganda at: : : http://www.sistina.com/products_gfs.htm On the other hand, you could also check out the OpenGFS project which is still being worked on actively: http://opengfs.sourceforge.net/ This is for Linux, of course. : Anyway, the normal way this is handled for SAN/NAS devices is : to carve out a logical volume region on a per-machine basis, and : forget the locking altogether (giving a management node "ownership" : of the "as yet unallocated regions"), which avoids contention by : separation of the contention domain entirely. Not a very : satisfying way of doing it, if you ask me. You could also check out another interesting file system, called Lustre, located at http://www.lustre.org/, which could probably help your need. Regards, -- Hiten Pandya hmp@FreeBSD.ORG From owner-freebsd-fs@FreeBSD.ORG Thu Oct 23 10:19:33 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D712F16A4B3 for ; Thu, 23 Oct 2003 10:19:33 -0700 (PDT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id D01C543FA3 for ; Thu, 23 Oct 2003 10:19:32 -0700 (PDT) (envelope-from julian@vicor.com) Received: by mail.vicor-nb.com (Postfix, from userid 1058) id 9F36C7A425; Thu, 23 Oct 2003 10:19:32 -0700 (PDT) To: kmarx@vicor.com, mckusick@beastie.mckusick.com In-Reply-To: <200310222337.h9MNbseN005704@beastie.mckusick.com> Message-Id: <20031023171932.9F36C7A425@mail.vicor-nb.com> Date: Thu, 23 Oct 2003 10:19:32 -0700 (PDT) From: julian@vicor.com (Julian Elischer) X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700 cc: freebsd-fs@freebsd.org cc: cburrell@vicor.com cc: davep@vicor.com cc: julian@vicor.com cc: VicPE@aol.com cc: jpl@vicor.com cc: gluk@ptci.ru cc: jrh@vicor.com cc: julian@vicor-nb.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 17:19:34 -0000 > From mckusick@beastie.mckusick.com Wed Oct 22 22:30:03 2003 > X-Original-To: julian@vicor-nb.com > Delivered-To: julian@vicor-nb.com > To: Ken Marx > Subject: Re: 4.8 ffs_dirpref problem > Cc: freebsd-fs@freebsd.org, cburrell@vicor.com, davep@vicor.com, > jpl@vicor.com, jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com, > julian@vicor.com, Grigoriy Orlov > In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT." > <20031022195753.27C707A49F@mail.vicor-nb.com> > Date: Wed, 22 Oct 2003 16:37:54 -0700 > From: Kirk McKusick > I believe that you can solve your problem by tuning the existing > algorithm using tunefs. There are two parameters to control dirpref, > avgfilesize (which defaults to 16384) and filesperdir (which defaults > to 50). I suggest that you try using an avgfilesize of 4096 and > filesperdir of 1500. This is done by running tunefs on the unmounted > (or at least mounted read-only) filesystem as: > tunefs -f 4096 -s 1500 /dev/ On the same filesystem are directories that contain 1GB files and others that contain maybe 100 100K files (images). > Note that this affects future layout, so needs to be done before you > put any data into the filesystem. If you are building the filesystem > from scratch, you can use: would this have an effect on an existing filesystem with respect to new data being added to it? > newfs -g 4096 -h 1500 ... > > to set these fields. Please let me know if this solves your problem. > If it does not, I will ask Grigoriy Orlov if he has
> Kirk McKusick > =-=-=-=-=-=-= From owner-freebsd-fs@FreeBSD.ORG Thu Oct 23 11:12:57 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BDABB16A4C0 for ; Thu, 23 Oct 2003 11:12:57 -0700 (PDT) Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81]) by mx1.FreeBSD.org (Postfix) with ESMTP id F3EE443FB1 for ; Thu, 23 Oct 2003 11:12:56 -0700 (PDT) (envelope-from kmarx@vicor.com) Received: from vicor.com (localhost [127.0.0.1]) by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9NI82T1050765; Thu, 23 Oct 2003 11:08:02 -0700 (PDT) (envelope-from kmarx@vicor.com) Message-ID: <3F981902.90607@vicor.com> Date: Thu, 23 Oct 2003 11:08:02 -0700 From: Ken Marx User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Julian Elischer References: <20031023171932.9F36C7A425@mail.vicor-nb.com> In-Reply-To: <20031023171932.9F36C7A425@mail.vicor-nb.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700 cc: freebsd-fs@freebsd.org cc: cburrell@vicor.com cc: julian@vicor-nb.com cc: davep@vicor.com cc: VicPE@aol.com cc: jpl@vicor.com cc: gluk@ptci.ru cc: jrh@vicor.com cc: mckusick@beastie.mckusick.com Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 18:12:58 -0000 Thanks for the reply, We actually *did* try -s 4096 yesterday (not quite what you suggested) with spotty results: Sometimes it seemed to go more quickly, but often not. Let me clarify our test: We have a 1.5gb tar file from our production raid that fairly represents the distribution of data. 
We hit the performance problem when we get to dirs with lots of small-ish files. But, as Julian mentioned, we typically have many flavors of file sizes and populations. Admittedly, our untar'ing test isn't necessarily representative of what happens in production - we were just trying to fill the disk and recreate the problem here. We *did* at least hit a noticeable problem, and we believe it's the same behavior that's hitting production. I just tried your exact suggested settings on an fs that was already 96% full, and still experienced the very sluggish behavior on exactly the same type of files/dirs. Our untar typically takes around 60-100 sec of system time when things are going ok; 300-1000+ sec when the sluggishness occurs. This time tends to increase as we get closer to 99%. Sometimes as high as 4000+ secs. I wasn't clear from your mail if I should newfs the entire fs and start over, or if I could have expected the settings to make a difference for any NEW data. I can do the latter if you think it's required. The test will then take several hours to run since we need at least 85% disk usage to start seeing the problem. Thanks! k Julian Elischer wrote: >>From mckusick@beastie.mckusick.com Wed Oct 22 22:30:03 2003 >>X-Original-To: julian@vicor-nb.com >>Delivered-To: julian@vicor-nb.com >>To: Ken Marx >>Subject: Re: 4.8 ffs_dirpref problem >>Cc: freebsd-fs@freebsd.org, cburrell@vicor.com, davep@vicor.com, >> jpl@vicor.com, jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com, >> julian@vicor.com, Grigoriy Orlov >>In-Reply-To: Your message of "Wed, 22 Oct 2003 12:57:53 PDT." >> <20031022195753.27C707A49F@mail.vicor-nb.com> >>Date: Wed, 22 Oct 2003 16:37:54 -0700 >>From: Kirk McKusick > > >>I believe that you can solve your problem by tuning the existing >>algorithm using tunefs. There are two parameters to control dirpref, >>avgfilesize (which defaults to 16384) and filesperdir (which defaults >>to 50).
I suggest that you try using an avgfilesize of 4096 and >>filesperdir of 1500. This is done by running tunefs on the unmounted >>(or at least mounted read-only) filesystem as: > > >> tunefs -f 4096 -s 1500 /dev/ > > > On the same filesystem are directories that contain 1GB files > and others that contain maybe 100 100K files (images) > > > >>Note that this affects future layout, so needs to be done before you >>put any data into the filesystem. If you are building the filesystem >>from scratch, you can use: > > > would this have an effect on an existing filesystem with respect to new data > being added to it? > > > > > >> newfs -g 4096 -h 1500 ... >> >>to set these fields. Please let me know if this solves your problem. >>If it does not, I will ask Grigoriy Orlov if he has >>any ideas on how to proceed. > > >> Kirk McKusick > > >>=-=-=-=-=-=-= > > > -- Ken Marx, kmarx@vicor-nb.com It's too costly to get lean and mean and analyze progress on the diminishing expectations. - http://www.bigshed.com/cgi-bin/speak.cgi From owner-freebsd-fs@FreeBSD.ORG Thu Oct 23 12:46:28 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6C86A16A4B3 for ; Thu, 23 Oct 2003 12:46:28 -0700 (PDT) Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9CDDF43FBD for ; Thu, 23 Oct 2003 12:46:27 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.12.8/8.12.3) with ESMTP id h9NJkQeN007683; Thu, 23 Oct 2003 12:46:26 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200310231946.h9NJkQeN007683@beastie.mckusick.com> To: Ken Marx In-Reply-To: Your message of "Thu, 23 Oct 2003 11:08:02 PDT." 
<3F981902.90607@vicor.com> Date: Thu, 23 Oct 2003 12:46:26 -0700 From: Kirk McKusick X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700 cc: freebsd-fs@freebsd.org cc: cburrell@vicor.com cc: julian@vicor-nb.com cc: davep@vicor.com cc: VicPE@aol.com cc: jpl@vicor.com cc: gluk@ptci.ru cc: jrh@vicor.com cc: Julian Elischer Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 19:46:28 -0000 Date: Thu, 23 Oct 2003 11:08:02 -0700 From: Ken Marx To: Julian Elischer CC: mckusick@mckusick.com, cburrell@vicor.com, davep@vicor.com, freebsd-fs@freebsd.org, gluk@ptci.ru, jpl@vicor.com, jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com Subject: Re: 4.8 ffs_dirpref problem X-ASK-Info: Confirmed by User Thanks for the reply, We actually *did* try -s 4096 yesterday (not quite what you suggested) with spotty results: Sometimes it seemed to go more quickly, but often not. Let me clarify our test: We have a 1.5gb tar file from our production raid that fairly represents the distribution of data. We hit the performance problem when we get to dirs with lots of small-ish files. But, as Julian mentioned, we typically have many flavors of file sizes and populations. Admittedly, our untar'ing test isn't necessarily representitive of what happens in production - we were just trying to fill the disk and recreate the problem here. We *did* at least hit a noticeable problem, and we believe it's the same behavior that's hitting production. I just tried your exact suggested settings on an fs that was already 96% full, and still experienced the very sluggish behavior on exactly the same type of files/dirs. Our untar typically takes around 60-100 sec of system time when things are going ok; 300-1000+ sec when the sluggishness occurs. This time tends to increase as we get closer to 99%. 
Sometimes as high as 4000+ secs. I wasn't clear from your mail if I should newfs the entire fs and start over, or if I could have expected the settings to make a difference for any NEW data. I can do the latter if you think it's required. The test will then take several hours to run since we need at least 85% disk usage to start seeing the problem. Thanks! k Unfortunately, I do believe that you will need to start over from scratch with a newfs. The problem is that by the time you are at 85% full with the old parameters, the directory structure is already too "dense", forcing you to search far and wide for more inodes. If you start from the beginning with a large filesperdir then your directory structure will expand across more of the disk, which should approximate the old algorithm. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Thu Oct 23 16:58:36 2003 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2988416A4B3 for ; Thu, 23 Oct 2003 16:58:36 -0700 (PDT) Received: from sploot.vicor-nb.com (sploot.vicor-nb.com [208.206.78.81]) by mx1.FreeBSD.org (Postfix) with ESMTP id 549C443FE3 for ; Thu, 23 Oct 2003 16:58:35 -0700 (PDT) (envelope-from kmarx@vicor.com) Received: from vicor.com (localhost [127.0.0.1]) by sploot.vicor-nb.com (8.12.8/8.12.8) with ESMTP id h9NNrdT1063942; Thu, 23 Oct 2003 16:53:39 -0700 (PDT) (envelope-from kmarx@vicor.com) Message-ID: <3F986A03.2050809@vicor.com> Date: Thu, 23 Oct 2003 16:53:39 -0700 From: Ken Marx User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3b) Gecko/20030402 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Kirk McKusick References: <200310231946.h9NJkQeN007683@beastie.mckusick.com> In-Reply-To: <200310231946.h9NJkQeN007683@beastie.mckusick.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Sat, 25 Oct 2003 07:10:35 -0700 cc:
freebsd-fs@freebsd.org cc: cburrell@vicor.com cc: julian@vicor-nb.com cc: davep@vicor.com cc: VicPE@aol.com cc: jpl@vicor.com cc: gluk@ptci.ru cc: jrh@vicor.com cc: Julian Elischer Subject: Re: 4.8 ffs_dirpref problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2003 23:58:36 -0000 Ok, thanks, Kirk. Re newfs'ing and re-doing our test is on the todo list. Probably an overnight thing. Meanwhile we did a bit more digging and, maybe, found an anomaly: We did a few escapes to ddb while the performance was bad to see what a typical stack was: --- interrupt, eip = 0xc01d9af4, esp = 0xcfe24bf8, ebp = 0xcfe24c04 --- gbincore(cf3c6d00,1d090040,cfe24ca8,401,0) at gbincore+0x34 getblk(cf3c6d00,1d090040,1000,0,0) at getblk+0x80 bread(cf3c6d00,1d090040,1000,0,cfe24ca8) at bread+0x27 ffs_alloccg(c21eaf00,1d09,0,800) at ffs_alloccg+0x70 ffs_hashalloc(c21eaf00,1908,6420008,800,c026f110) at ffs_hashalloc+0x8c ffs_alloc(c21eaf00,0,6420008,800,c1f93080) at ffs_alloc+0xc9 ffs_balloc(cfe24e2c,cfc9da40,c203bd80,20001,cfccfde0) at ffs_balloc+0x46a ffs_write(cfe24e64,c203bd80,cf9934e0,41b,c03695a0) at ffs_write+0x319 vn_write(c203bd80,cfe24ed4,c1f93080,0,cf9934e0) at vn_write+0x15e dofilewrite(cf9934e0,c203bd80,4,809d200,41b) at dofilewrite+0xc1 write(cf9934e0,cfe24f80,41b,809d200,0) at write+0x3b --------------- So, alloccg logic needs to get the cg block. It goes through getblk which in turn looks to see if the block is already in an in-mem hashtable via the lookup routine, gbincore. Julian had the thought that perhaps there was something funny about this hash table. Possibly wrt cg blocks. So, we hacked in a few routines to histogram how often each bucket was searched, and the 'average depth' of the bucket.
(This crude average is total running sum of depths found over all times bucket was searched, divided by total times bucket was searched.) We found that block numbers really spike at bucket 250, and that the avg-depth of that bucket is 10-100 times that of any other over the total of 1023 buckets in the hash: bh[247]: freq=1863, avgdepth = 1 bh[248]: freq=1860, avgdepth = 1 bh[249]: freq=1777, avgdepth = 1 bh[250]: freq=969100, avgdepth = 440 bh[251]: freq=1595, avgdepth = 12 bh[252]: freq=1437, avgdepth = 1 To verify that these were cg block lookups we did a similar histogram of hash indexes for the actual bread() calls in ffs_alloccg. That is the bucket that would be hashed for (ip->i_devvp, fsbtodb(fs, cgtod(fs, cg))). We got similar, corroborating results: bh[248]: freq=0 bh[249]: freq=0 bh[250]: freq=662387 bh[251]: freq=0 bh[252]: freq=40 bh[253]: freq=0 It appears that lookups for cg blocks (that are probably in memory already) tend to be more costly than necessary(?). So, it may be that a better tuned file system would likely help. But is it also possible that tuning wouldn't be needed if the hash table were more evenly distributed? We can dump the block list for the anomalous hashtable bucket if you wish. And/or any other info/suggestions you have for that matter. Maybe we'll hack in a new hashing function just for kicks to see what happens... Thanks again for your time! k Kirk McKusick wrote: > Date: Thu, 23 Oct 2003 11:08:02 -0700 > From: Ken Marx > To: Julian Elischer > CC: mckusick@mckusick.com, cburrell@vicor.com, davep@vicor.com, > freebsd-fs@freebsd.org, gluk@ptci.ru, jpl@vicor.com, > jrh@vicor.com, julian@vicor-nb.com, VicPE@aol.com > Subject: Re: 4.8 ffs_dirpref problem > X-ASK-Info: Confirmed by User > > Thanks for the reply, > > We actually *did* try -s 4096 yesterday (not quite what you > suggested) with spotty results: Sometimes it seemed to go > more quickly, but often not.
> > Let me clarify our test: We have a 1.5gb tar file from our > production raid that fairly represents the distribution of > data. We hit the performance problem when we get to dirs > with lots of small-ish files. But, as Julian mentioned, > we typically have many flavors of file sizes and populations. > > Admittedly, our untar'ing test isn't necessarily representitive > of what happens in production - we were just trying to fill > the disk and recreate the problem here. We *did* at least > hit a noticeable problem, and we believe it's the same > behavior that's hitting production. > > I just tried your exact suggested settings on an fs that > was already 96% full, and still experienced the very sluggish > behavior on exactly the same type of files/dirs. > > Our untar typically takes around 60-100 sec of system time > when things are going ok; 300-1000+ sec when the sluggishness > occurs. This time tends to increase as we get closer to > 99%. Sometimes as high as 4000+ secs. > > I wasn't clear from your mail if I should newfs the entire > fs and start over, or if I could have expected the settings > to make a difference for any NEW data. > > I can do this latter if you think it's required. The test > will then take several hours to run since we need at least > 85% disk usage to start seeing the problem. > > Thanks! > k > > Unfortunately, I do believe that you will need to start over from > scratch with a newfs. The problem is that by the time you are at > 85% full with the old parameters, the directory structure is already > too "dense" forcing you to search far and wide for more inodes. If > you start from the beginning with a large filesperdir then your > directory structure will expand across more of the disk which > should approximate the old algorithm. > > Kirk McKusick > > -- Ken Marx, kmarx@vicor-nb.com It's an orthogonal issue to leverage our critical resources and focus hard to resolve the market forces. - http://www.bigshed.com/cgi-bin/speak.cgi
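[Archive note] The bucket-250 spike Ken measured is consistent with a stride effect: cylinder-group blocks sit at a regular stride across the disk, and if that stride shares a large factor with the hash-table size, a simple block-number hash funnels every cg block into one chain. The sketch below is a toy model only - the bucket count, stride, and hash are made-up illustrative values, not the actual 4.8 bufhash function or this filesystem's geometry:

```python
from collections import Counter

NBUCKETS = 1024          # hypothetical bucket count (power of two), not the real bufhashmask
CG_STRIDE = 16 * 1024    # hypothetical spacing, in disk blocks, between cg blocks

def bufhash(blkno, nbuckets=NBUCKETS):
    """Toy stand-in for the kernel's buffer-cache hash: a plain mask of the block number."""
    return blkno & (nbuckets - 1)

def bucket_histogram(blknos):
    """Count how many block numbers land in each hash bucket."""
    return Counter(bufhash(b) for b in blknos)

# Cylinder-group blocks sit at a fixed stride across the disk.  Because the
# stride here is a multiple of the bucket count, every cg block collides
# into a single chain -- the analogue of the bh[250] spike in the histogram.
cg_blocks = [cg * CG_STRIDE for cg in range(200)]
hist = bucket_histogram(cg_blocks)
assert len(hist) == 1 and hist[0] == 200

# Block numbers with a stride coprime to the bucket count spread evenly.
data_blocks = range(0, 20000, 7)
assert len(bucket_histogram(data_blocks)) == NBUCKETS
```

If the real hash behaved this way, folding more entropy into the index (e.g. mixing the vnode pointer or upper block-number bits) would spread the cg blocks across chains, which is the kind of replacement hash Ken proposes hacking in to test.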