Date: Mon, 6 Jul 2009 11:12:46 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: freebsd-arch@freebsd.org
Subject: Re: DFLTPHYS vs MAXPHYS
Message-ID: <200907061812.n66ICkTc075260@apollo.backplane.com>
References: <4A4FAA2D.3020409@FreeBSD.org>
 <20090705100044.4053e2f9@ernst.jennejohn.org>
 <4A50667F.7080608@FreeBSD.org>
 <20090705223126.I42918@delplex.bde.org>
 <4A50BA9A.9080005@FreeBSD.org>
 <20090706005851.L1439@besplex.bde.org>
 <4A50DEE8.6080406@FreeBSD.org>
 <20090706034250.C2240@besplex.bde.org>
 <4A50F619.4020101@FreeBSD.org>
 <20090707011217.O43961@delplex.bde.org>
 <4A522DC1.2080908@FreeBSD.org>
Linear dd

      tty             da0              cpu
 tin tout   KB/t    tps   MB/s  us ni sy in  id
   0   11   0.50  17511   8.55   0  0 15  0  85  bs=512
   0   11   1.00  16108  15.73   0  0 12  0  87  bs=1024
   0   11   2.00  14758  28.82   0  0 11  0  89  bs=2048
   0   11   4.00  12195  47.64   0  0  7  0  93  bs=4096
   0   11   8.00   8026  62.70   0  0  5  0  95  bs=8192    << MB/s breakpt
   0   11  16.00   4018  62.78   0  0  4  0  96  bs=16384
   0   11  32.00   2025  63.28   0  0  2  0  98  bs=32768   << id breakpt
   0   11  64.00   1004  62.75   0  0  1  0  99  bs=65536
   0   11 128.00    506  63.25   0  0  1  0  99  bs=131072

Random seek/read

      tty             da0              cpu
 tin tout   KB/t    tps   MB/s  us ni sy in  id
   0   11   0.50    189   0.09   0  0  0  0 100  bs=512
   0   11   1.00    184   0.18   0  0  0  0 100  bs=1024
   0   11   2.00    177   0.35   0  0  0  0 100  bs=2048
   0   11   4.00    175   0.68   0  0  0  0 100  bs=4096
   0   11   8.00    172   1.34   0  0  0  0 100  bs=8192
   0   11  16.00    166   2.59   0  0  0  0 100  bs=16384
   0   11  32.00    159   4.97   0  0  1  0  99  bs=32768
   0   11  64.00    142   8.87   0  0  0  0 100  bs=65536
   0   11 128.00    117  14.62   0  0  0  0 100  bs=131072
                    ^^^    ^^^
                    note TPS rate and MB/s

Which is the more important tuning variable? Efficiency of linear reads or saving re-seeks by buffering more data? If you didn't choose saving re-seeks you lose.

To go from 16K to 32K requires saving 5% of future re-seeks to break-even.
To go from 32K to 64K requires saving 11% of future re-seeks.
To go from 64K to 128K requires saving 18% of future re-seeks.
(at least with this particular disk)

At the point where the block size exceeds 32768 if you aren't saving re-seeks with locality of reference from the additional cached data, you lose. If you are saving reseeks you win. cpu caches do not enter into the equation at all.

For most filesystems the re-seeks being saved depend on the access pattern. For example, if you are doing a ls -lR or a find the re-seek pattern will be related to inode and directory lookups. The number of inodes which fit in a cluster_read(), assuming reasonable locality of reference, will wind up determining the performance.
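
The break-even percentages quoted above can be approximated from the tps column of the random seek/read table. The sketch below is one reading of how they might be derived, not something the post spells out: it assumes that a workload needing N seeks at the smaller block size breaks even when the larger block size completes the remaining N * (1 - f) seeks in the same wall-clock time, so f = 1 - tps_large / tps_small. The names RANDOM_TPS, break_even_fraction and break_even_table are made up for illustration.

```python
# Random seek/read results from the table above: block size (bytes) -> tps.
RANDOM_TPS = {
    16384: 166,
    32768: 159,
    65536: 142,
    131072: 117,
}


def break_even_fraction(tps_small, tps_large):
    """Fraction of future re-seeks the larger block size must eliminate.

    Assumption (not stated in the post): N seeks at the smaller block size
    take N / tps_small seconds, and the larger block size breaks even when
    N * (1 - f) seeks take the same time, i.e. f = 1 - tps_large / tps_small.
    """
    return 1.0 - tps_large / tps_small


def break_even_table(tps_by_bs):
    """Return (small_bs, large_bs, fraction) for each adjacent pair of sizes."""
    sizes = sorted(tps_by_bs)
    return [
        (small, large, break_even_fraction(tps_by_bs[small], tps_by_bs[large]))
        for small, large in zip(sizes, sizes[1:])
    ]


if __name__ == "__main__":
    for small, large, f in break_even_table(RANDOM_TPS):
        print(f"{small // 1024}K -> {large // 1024}K: {f * 100:.1f}%")
```

Under that assumption the script gives about 4.2%, 10.7% and 17.6%. The last two round to the 11% and 18% quoted above; the first is a little under the quoted 5%, which was presumably rounded up or computed slightly differently.
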
However, as the buffer size grows the total number of bytes you are able to cache becomes the dominant factor in calculating the re-seek efficiency. I don't have a graph for that but, ultimately, it means that reading very large blocks (e.g. 1MB) with a non-linear access pattern is bad because most of the additional data cached will never be used before the memory winds up being re-used to cache some other cluster.

Another thing to note here is that command transfer overhead also becomes mostly irrelevant once you hit 32K, even if you have a lot of discrete disks.

I/Os of less than 8KB are clearly wasteful of resources (in my test even a linear transfer couldn't achieve the bandwidth ceiling of the device). I/Os greater than 32K are clearly dependent on saving re-seeks.

Note in particular that the data transfer rate for random I/O roughly doubles as the buffer size doubles (because seek times are so long). In other words, it's a huge win if you are actually able to save future re-seeks by caching the additional data.

What this all means is that cpu caches are basically irrelevant when it comes to hard drive I/O. You are either saving enough re-seeks to make up for the greater seek latency or you aren't. One re-seek is something like 7ms. 7ms is a LONG time, which is why the cpu caches are irrelevant for choosing the block size. One can bean-count cache misses all day long but it won't make the machine perform any better in this case.

-Matt