From owner-freebsd-fs Thu Dec 14 14:57:45 2000
From owner-freebsd-fs@FreeBSD.ORG Thu Dec 14 14:57:42 2000
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP id 6DD0E37B402
	for ; Thu, 14 Dec 2000 14:57:42 -0800 (PST)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id PAA29369;
	Thu, 14 Dec 2000 15:53:27 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208) via SMTP
	by smtp04.primenet.com, id smtpdAAA.kaqn5; Thu Dec 14 15:53:20 2000
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id PAA15102;
	Thu, 14 Dec 2000 15:57:28 -0700 (MST)
From: Terry Lambert
Message-Id: <200012142257.PAA15102@usr08.primenet.com>
Subject: Re: Filesystem tuning (minimize seeks)
To: henrich@sigbus.com (Charles Henrich)
Date: Thu, 14 Dec 2000 22:57:26 +0000 (GMT)
Cc: tlambert@primenet.com (Terry Lambert), freebsd-fs@FreeBSD.ORG
In-Reply-To: <20001213130138.A25214@sigbus.com> from "Charles Henrich" at Dec 13, 2000 01:01:38 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: tlambert@usr08.primenet.com
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > Yes, my test is running about 25-50 machines writing a 20mb file to the
> > > FreeBSD box.  (The clients are FreeBSD as well).  The write is nothing
> > > more than a dd.
>
> I think maybe you've misunderstood my initial question.  What filesystem
> tuning options are there, or any suggestions, to reduce the amount of
> seeking going on when N files are being created and written to at once?
> I have N machines; each one opens a file, writes out a chunk of data,
> then closes the file.  Unfortunately, because all 50 are doing this
> simultaneously, the data is getting written to disk very non-sequentially
> (from a per-file perspective).  Are there any options to UFS (or via
> NFSd?) to delay writes, or anything of that nature to allow the data to
> be serialized more often than not?

The NFS protocol is defined as not returning success unless the write
has been committed to stable storage.  In FreeBSD, this tends to
serialize NFS I/O from a single client, and between multiple clients
once they exceed the number of nfsiod's you are running.

For your large number of clients, increasing the number of nfsiod's
should prevent inter-client contention.

For the write latency and the intra-client contention (e.g. several
writes from a single client), the only thing you can really do at this
time is mount the exported FS async.

SVR4 has an option called "write gathering", in which the server
violates the NFS protocol definition (and makes server failures nearly
impossible to recover from completely) by scheduling the write to
occur after a short delay and lying to the client that the data has
already been committed to stable storage.  If subsequent writes then
fall in the same pages as the previous write, the writes are "gathered
together" and done as a single physical write.

In general, most high performance NFS servers have battery backed RAM
in which they log the writes, so they can tell the client that a write
has been committed to stable storage without lying: if the system
fails, the write log is replayed after reboot to recover any writes
the server had acknowledged but not yet committed.  Network Appliance,
PrestoServ, and similar products use this technique.
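For concreteness, going back to the two FreeBSD-side knobs above (more
NFS daemons and an async mount), here is roughly what they look like.
The daemon counts, device name, and mount point are only placeholders
for your setup; note that the server-side daemon is nfsd, while the
nfsiod's run on each client:

    # /etc/rc.conf on the server: run more nfsd's
    nfs_server_enable="YES"
    nfs_server_flags="-u -t -n 16"

    # /etc/rc.conf on each client: run more nfsiod's
    nfs_client_enable="YES"
    nfs_client_flags="-n 8"

    # /etc/fstab on the server: mount the exported filesystem async
    # (recent writes can be lost in a crash, as described above)
    /dev/da0s1e   /export   ufs   rw,async   2   2

    # ...or switch an already-mounted filesystem without a reboot:
    mount -u -o async /export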
If you end up with a lot of client stalls because a client is stalling
itself (i.e. not inter-client stalls, which can be fixed by upping the
number of nfsiod's), then you might want to consider going to one of
these boxes.

If the data is not critical, an async mount might resolve the problem.
The added risk, the same as with write gathering, is that after a
crash you will have to redo the work done between the last time the
writes were actually committed and the time of the crash.  In practice
that generally means restarting the clients: they believe the server
when it says the data has been written to stable storage, so there is
no way to make them rewrite the missing sections, assuming they even
still have them available.

> I mean, in top, what is the process state "inode" referring to?  What
> is the process blocking on at that point?

An inode allocation into the ihash (inode hash) table.

You should be able to tune the number of inodes upward (on the machine
with the problem; I'm assuming the NFS server) to keep them from being
recycled unnecessarily quickly.  This should actually produce a
significant speedup: when an inode is recycled, the inode/vnode
association is destroyed even though the cache contents hung off the
vnode are still valid, so those contents become unrecoverable and have
to be recreated by rereading them off of disk.  Having enough inodes
to let cached vnodes stay associated keeps that cached data
recoverable.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message