From: Andreas Koch
Date: Thu, 6 Jun 2002 22:49:48 +0200
To: freebsd-stable@freebsd.org
Subject: 4.6-RC: Glacial speed of dump backups
Message-ID: <20020606204948.GA4540@ultra4.eis.cs.tu-bs.de>

The following applies to 4.6-RC cvsup'ped on May 23.

I noticed that I have considerable problems getting dump to actually
stream on my DLT VS80 tape drive, regardless of the block length
specified (`-b' option to dump). Instead, the drive operates in
start-stop-rewind-start mode (colloquially known as shoe-shining).

Note that it is possible to operate in streaming mode when I pipe the
output of dump to a buffering program such as team (from the ports
tree). But then the multi-volume capability, which depends on dump's
end-of-media recognition (quite essential when writing to compressed
tapes with variable capacity), is of course no longer available.

So, I looked deeper to enable the direct use of dump without a
buffering utility, and something seems fishy.

But first, some more details: I am trying to back up a file system
hosted on an Adaptec 2400A IDE RAID controller operating in RAID 5
mode. The machine itself has an Athlon XP 1700+ CPU and 512 MB of
memory. The sustained read throughput from the RAID (as measured by
reading 1 GB of data from a raw partition to /dev/null using dd) is
16-17 MB/s. When going through the file system, this is only a bit
lower (roughly 16 MB/s for non-fragmented files).

The tape drive can accept 3 MB/s natively and up to 6 MB/s for
compressible data. For a well-compressible file system such as /usr,
a high input data rate is thus a necessity to keep the tape streaming.

This machine was newly installed and hasn't been used much, so the
degree of fragmentation on the file systems is very low (for the /usr
example used below, fsck reports a value of just 0.2%).

However, when using a command such as

    dump -0af /dev/nsa0 /usr

the average throughput reported by dump is only 2.7 MB/s (which
explains the constant shoe-shining). Adding a buffering command such
as

    dump -0af - /usr | team 8m 32 >/dev/nsa0

keeps the tape streaming and leads to an average throughput of
5.7 MB/s (but makes multi-volume backups impossible).

After establishing in this way that the path to the tape itself is not
at fault, I performed some more experiments, concentrating on dump
writing to /dev/null while keeping an eye on the iostat output for the
disk.
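
To picture what a buffering stage between dump and the tape does in
the team pipeline above, here is a minimal Python sketch of the
general idea: many small writes are collected and passed on only as
full-sized blocks. This is not how team is implemented (it also
decouples reader and writer, which this sketch ignores); the function
name `reblock`, the 2300-byte writes, and the 64 KB block size are
assumptions chosen purely for illustration.

```python
def reblock(chunks, block_size):
    """Yield fixed-size blocks from an iterable of arbitrarily sized chunks.

    Small chunks are accumulated until a full block is available.
    The final block may be shorter than block_size.
    """
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= block_size:
            yield bytes(buf[:block_size])
            del buf[:block_size]
    if buf:
        yield bytes(buf)


# Hypothetical input: 100 small writes of 2300 bytes each.
small_writes = [b"x" * 2300 for _ in range(100)]
blocks = list(reblock(small_writes, 64 * 1024))

print(len(small_writes), "small writes become", len(blocks), "blocks")
print("block sizes:", [len(b) for b in blocks])
```

The downstream device then sees a few large writes instead of many
small ones, which is the effect the buffering command is being used
for here.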

To my great astonishment, the command

    dump -0af /dev/null /usr

has roughly the following throughput profile:

1) In the first phase, dump's Pass I and II (mapping files and
   directories), I get the following data from iostat:

      tty       da0               acd0              acd1             cpu
 tin tout  KB/t  tps  MB/s   KB/t  tps  MB/s   KB/t  tps  MB/s  us ni sy in  id
   0   76 16.00  783 12.24   0.00    0  0.00   0.00    0  0.00   1  0  1  2  96
   0   76 16.00  742 11.59   0.00    0  0.00   0.00    0  0.00   1  0  0  1  98
   0   76 16.00  777 12.14   0.00    0  0.00   0.00    0  0.00   1  0  2  0  97
   0   76 16.00  743 11.60   0.00    0  0.00   0.00    0  0.00   1  0  0  0  99
   0   76 16.00  769 12.02   0.00    0  0.00   0.00    0  0.00   0  0  0  0 100
   0   76 16.00  770 12.04   0.00    0  0.00   0.00    0  0.00   0  0  0  0 100
   0   76 16.00  757 11.83   0.00    0  0.00   0.00    0  0.00   2  0  0  0  98
   0   76 16.00  768 12.00   0.00    0  0.00   0.00    0  0.00   0  0  1  1  98
   0   76 16.00  766 11.97   0.00    0  0.00   0.00    0  0.00   1  0  0  2  97

   Thus, dump appears to be reading 16 KB blocks from the disk da0,
   leading to a throughput of 11-12 MB/s, which isn't too shabby.

2) Then, in Pass III (dumping directories), the directory data is
   supposed to be written to tape (or, in this scenario, to
   /dev/null). Now the throughput profile changes to

      tty       da0               acd0              acd1             cpu
 tin tout  KB/t  tps  MB/s   KB/t  tps  MB/s   KB/t  tps  MB/s  us ni sy in  id
   0   76  2.32  651  1.47   0.00    0  0.00   0.00    0  0.00   1  0  4  0  95
   0  229  2.33  757  1.72   0.00    0  0.00   0.00    0  0.00   0  0  2  1  97
   0   76  2.34  581  1.33   0.00    0  0.00   0.00    0  0.00   0  0  3  0  97
   0   76  2.34  549  1.25   0.00    0  0.00   0.00    0  0.00   0  0  3  0  97
   0   76  2.41  548  1.29   0.00    0  0.00   0.00    0  0.00   1  0  0  1  98
   0   76  2.35  647  1.49   0.00    0  0.00   0.00    0  0.00   0  0  3  0  97
   0   76  2.32  672  1.52   0.00    0  0.00   0.00    0  0.00   0  0  3  0  97
   0   76  2.36  577  1.33   0.00    0  0.00   0.00    0  0.00   0  0  1  4  95
   0   76  2.34  678  1.55   0.00    0  0.00   0.00    0  0.00   0  0  3  0  97
   0   76  2.34  788  1.80   0.00    0  0.00   0.00    0  0.00   0  0  3  3  94

   Two things are immediately noticeable: dump now accesses the disk
   in blocks of only about 2.3 KB, and throughput drops
   correspondingly to only 1.47 MB/s. With a real tape, the drive
   would start shoe-shining right from the start here.
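
As a cross-check on the iostat figures for the first two phases, the
MB/s column should simply be transfer size times transfer rate. The
Python sketch below recomputes it for the first three da0 rows of each
table above. The helper name `mb_per_s` and the assumption that
1 MB = 1024 KB are mine; the numbers are copied from the tables.

```python
# (KB/t, tps, MB/s as reported) for da0, first three rows of each table.
samples = {
    "Pass I/II": [(16.00, 783, 12.24), (16.00, 742, 11.59), (16.00, 777, 12.14)],
    "Pass III": [(2.32, 651, 1.47), (2.33, 757, 1.72), (2.34, 581, 1.33)],
}


def mb_per_s(kb_per_transfer, transfers_per_s):
    """Throughput implied by transfer size and rate (assumes 1 MB = 1024 KB)."""
    return kb_per_transfer * transfers_per_s / 1024.0


for phase, rows in samples.items():
    for kb_t, tps, reported in rows:
        derived = mb_per_s(kb_t, tps)
        print(f"{phase:10s} {kb_t:6.2f} KB/t x {tps:4d} tps "
              f"= {derived:5.2f} MB/s (iostat: {reported:.2f})")
```

The transfer rate is roughly the same in both phases (around 550-790
per second), so the drop in throughput tracks the drop in transfer
size from 16 KB to about 2.3 KB almost one to one.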

3) In Pass IV (dumping files), throughput rates vary wildly:

      tty       da0               acd0              acd1             cpu
 tin tout  KB/t  tps  MB/s   KB/t  tps  MB/s   KB/t  tps  MB/s  us ni sy in  id
   0   76  2.49  273  0.66   0.00    0  0.00   0.00    0  0.00   0  0  0  0 100
   0   76  2.45  268  0.64   0.00    0  0.00   0.00    0  0.00   0  0  0  0 100
   0   76  2.67  297  0.78   0.00    0  0.00   0.00    0  0.00   0  0  2  1  97
   0   76  3.00  290  0.85   0.00    0  0.00   0.00    0  0.00   0  0  1  0  99
   0   76  2.66  313  0.81   0.00    0  0.00   0.00    0  0.00   0  0  1  0  99
   0   76  3.76  281  1.03   0.00    0  0.00   0.00    0  0.00   0  0  1  0  99
   0   76  2.67  445  1.16   0.00    0  0.00   0.00    0  0.00   1  0  1  1  97
   0   76  3.66  227  0.81   0.00    0  0.00   0.00    0  0.00   1  0  0  0  99
   0  229  2.49  649  1.58   0.00    0  0.00   0.00    0  0.00   0  0  2  0  98
   0   76  2.40  974  2.28   0.00    0  0.00   0.00    0  0.00   0  0  1  3  96
   0   76  4.87  732  3.48   0.00    0  0.00   0.00    0  0.00   0  0  4  1  95
   0   76  3.14  840  2.57   0.00    0  0.00   0.00    0  0.00   1  0  7  1  91
   0   76  4.35  796  3.38   0.00    0  0.00   0.00    0  0.00   3  0  5  1  91
   0   76  3.70  964  3.49   0.00    0  0.00   0.00    0  0.00   0  0  3  1  96
   0   76  2.77  833  2.25   0.00    0  0.00   0.00    0  0.00   1  0  1  0  98
   0   76  2.62 1073  2.75   0.00    0  0.00   0.00    0  0.00   1  0  1  2  96
   0   77  3.56  451  1.57   0.00    0  0.00   0.00    0  0.00   0  0  2  0  98
   0   76  3.29  686  2.20   0.00    0  0.00   0.00    0  0.00   0  0  4  1  95

In general, all of these rates are insufficient to keep the tape
streaming (especially when considering the compressibility of the
data). Furthermore, the block sizes used for the reads are also quite
low, and the average number of transactions has dropped as well
(though with some peaks).

Given the low degree of fragmentation, dump should easily be able to
achieve the 6 MB/s required to operate the tape drive in streaming
mode, especially considering that dump internally appears to perform
some double buffering itself (three processes: one reading, one for
slack, one writing).

Currently, the only explanation I can think of is this: dump always
seems to read the entire file at once if the file size is less than
64 KB. Otherwise, the file is read in multiple 64 KB chunks.
Each chunk is sent individually to the tape, even if the block size is
larger (e.g., -b 1000 to set 500KB blocks is capped at 64KB). Thus,
for small files, the lack of adequate buffering plus the file system
overhead for the large number of files leads to the reduction in
throughput. The reason `| team 8m 32' helps is that many of these
small files are collected together and the write to tape is only
started after sufficient data has actually been accumulated.

Maybe someone more familiar with the internal operation of dump could
clarify this (from glancing at the source, I am not too clear on the
interaction between the master and slave processes).

If the previous hypothesis is indeed true, dump in its current form
would be severely limited when trying to use reasonably fast tape
drives in a multi-volume backup situation.

As for the speed of the tape drive: the DLT VS80 I used is actually on
the lower end of the spectrum. SDLT has 11 MB/s native, and LTO goes
up to 15 MB/s native. How do people with those drives keep their tapes
streaming? Alternatively, does anyone know of a more intelligent dump
variant for FreeBSD that performs better buffering internally?

I would be grateful for any comments. Am I overlooking something? Or
is there a real problem in dump as distributed by FreeBSD?

Many thanks for any help (and for the patience in reading these
ramblings :),

  Andreas Koch

--
Andreas Koch                                 Email : koch@eis.cs.tu-bs.de
Technische Universit"at Braunschweig         Phone : x49-531-391-2384
Abteilung Entwurf integrierter Schaltungen   FAX   : x49-531-391-5840
M"uhlenpfordtstr. 23, D-38106 Braunschweig, Germany  * PGP key available *