From owner-freebsd-fs Wed Dec 16 13:18:28 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id NAA13293 for freebsd-fs-outgoing; Wed, 16 Dec 1998 13:18:28 -0800 (PST) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from pail.scd.ucar.edu (pail.scd.ucar.edu [128.117.28.5]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id NAA13284 for ; Wed, 16 Dec 1998 13:18:25 -0800 (PST) (envelope-from rousskov@nlanr.net) Received: from localhost (rousskov@localhost) by pail.scd.ucar.edu (8.8.7/8.8.7) with SMTP id OAA29627 for ; Wed, 16 Dec 1998 14:17:57 -0700 (MST) (envelope-from rousskov@nlanr.net) X-Authentication-Warning: pail.scd.ucar.edu: rousskov owned process doing -bs Date: Wed, 16 Dec 1998 14:17:57 -0700 (MST) From: Alex Rousskov X-Sender: rousskov@pail.scd.ucar.edu Reply-To: Alex Rousskov To: FreeBSD-FS@FreeBSD.ORG Subject: Fast/slow close and open Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hi there, I have a strange problem that is probably FS related. Any help is appreciated. Background: ---------- - A small program ("player") replays a trace from Squid Web proxy running under Web Polygraph benchmark - The trace consists of ~ 300K of open(2)/write(2)/close(2) file system calls - The calls may interleave (e.g. open(#5), write(#3), write(#5), close(#3), write(#5)) but all FS calls are blocking (no threads and such) - No artificial delays between the calls are introduced - Most files are "small": 11K mean, 7.5K median - No other activity on the system - FreeBSD 2.2.7-RELEASE - 256 RAM; Pentium II (267.27-MHz 686-class CPU) - kern.update is set to 43200 (12 hours) (for these tests I do not care about FS consistency) - 9GB disk(s) ["IBM DDRS-39130W S92A" type 0 fixed SCSI 2; Direct-Access 8715MB] with one partition per disk - newfs -o time mount options: rw,noauto - each disk has /cache?/??/???/ directories pre-created (1x16x128 directories); no other data on the disk - Squid (and hence player) fill leaf directories with files one leaf directory at a time (until there are 128 files in the directory) - Disk space utilization starts with 0% and is 20% at the end of the end of each experiment The problem: ----------- The player measures open/write/close delays using gettimeofday() calls wrapped around file system calls. I monitor sudden peaks and dives in close and open calls throughput: http://ircache.nlanr.net/Polygraph/tmp/ During peaks, close(2) response time DEcreases from a mean of 17 msec to tens of usec(!) and open(2) response time INcreases from 20 to 27 msec or so. During dives, both response times increase 50-400%. Write(2) response time is very fast (e.g., 300 usec mean) and is virtually not affected by the peaks and dives. There are no peaks for 3- and 4- disks experiments. Dives are present on 1-, 2-, 3-, and 4-disk runs. There are up to three bursts within a ~2-3 hour experiment so it is hard to say if they occur at "close-to-regular" intervals. Usually the bursts are 40-80 minutes apart. Each burst lasts 5-10 minutes (9K-18K open/close calls) so it is not a "random" thing. The same behavior was measured on Squid. The player program is very small and simple and confirms that those oddities are not caused by something in Squid or in the network. Question: -------- What kind of system activity can suddenly significantly speedup or slowdown open/close calls? Since it happens once in 40-80 minutes and lasts for several minutes (with 30-50 open_calls/sec rate) it is probably something major.. I am especially confused by close calls being almost zero-overhead in the middle of a big experiment. Looks like some write-behind buffers suddenly appear out of nowhere, get used (speeding up close calls), and then disappear. Dives are also very disturbing as they hurt overall performance... Any clues? Thank you, Alex. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message