Date: Wed, 31 Aug 2011 03:12:11 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Lev Serebryakov <lev@FreeBSD.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
Message-ID: <20110831101211.GA98865@icarus.home.lan>
In-Reply-To: <147623060.20110831123623@serebryakov.spb.ru>
References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru>
On Wed, Aug 31, 2011 at 12:36:23PM +0400, Lev Serebryakov wrote:
> Hello, Jeremy.
> You wrote, on 31 August 2011 at 4:42:51:
>
> > Furthermore, why are these benchmarks not providing speed data
> > per-device (e.g. gstat or iostat -x data)? There is a possibility that
> > one of your drives could be performing at less-than-ideal rates (yes,
> > intermittently) and therefore impacts (intermittently) your overall I/O
> > throughput.
>
> Ok. I've run my benchmark while `iostat -x -d -c 999999' was running.
> The results look like this:
>
> device     r/s    w/s     kr/s    kw/s  wait  svc_t  %b
> ada1     340.9  292.9  43138.8   146.5     0    1.2  42
> ada2     340.9  293.9  43138.8   147.0     0    1.9  63
> ada3     340.9  292.9  43044.7   146.5     0    1.5  57
> ada4     341.9  292.9  43232.9   146.5     0    1.3  42
> ada5     341.9  292.0  43138.8   146.0     2    1.3  40
>
> {snipping text, focusing on data}
>
> device     r/s    w/s     kr/s    kw/s  wait  svc_t  %b
> ada1     165.3   87.0  10515.9    43.5     2    5.0  50
> ada2     165.3   87.0  10547.2    43.5     2    7.7  61
> ada3     167.2   87.0  10703.7    43.5     1    6.1  55
> ada4     165.3   87.0  10484.6    43.5     3    4.9  44
> ada5     160.4   87.0  10265.5    43.5     5    5.1  48
>
> device     r/s    w/s     kr/s    kw/s  wait  svc_t  %b
> ada1     884.1  350.9  56583.1   175.4     0    1.0  49
> ada2     886.1  350.9  56677.2   175.4     0    1.3  58
> ada3     882.2  349.9  56489.0   175.0     2    1.7  63
> ada4     885.1  350.9  56614.5   175.4     0    1.4  64
> ada5     887.1  350.9  56739.9   175.4     0    1.5  63
>
> device     r/s    w/s     kr/s    kw/s  wait  svc_t  %b
> ada1     640.6  261.5  41001.3   130.8     0    0.9  40
> ada2     639.7  261.5  40969.9   130.8     0    0.9  35
> ada3     637.7  262.5  40844.5   131.3     0    1.5  46
> ada4     640.6  260.6  41001.3   130.3     1    1.3  65
> ada5     638.7  261.5  40875.9   130.8     0    1.3  46
>
> device     r/s    w/s     kr/s    kw/s  wait  svc_t  %b
> ada1     243.7  102.8  15660.2    51.4     2    1.9  36
> ada2     240.8  102.8  15503.6    51.4     3    1.9  43
> ada3     242.7  103.7  15566.2    51.9     0    1.9  30
> ada4     244.7  103.7  15785.5    51.9     2    2.4  56
> ada5     243.7  102.8  15566.2    51.4     2    1.8  30

This benchmark data is more or less unhelpful, because writes are
occurring in the middle of your reads.
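As an aside, aggregating those per-device kr/s columns is easy to
script rather than doing by hand. A sketch (the field position and the
`ada` device-name prefix are assumptions based on the iostat -x layout
shown above; other iostat versions may order columns differently):

```shell
# Sum the kr/s column (4th field) across the ada* lines of a single
# iostat -x snapshot read from stdin. Field position assumes the
# layout shown above.
sum_kr() {
    awk '/^ada/ { total += $4 } END { printf "%.1f KByte/sec\n", total }'
}

# Example: feed it one snapshot, e.g.
#   iostat -x -d | sum_kr
```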
There's another spun-off portion of this thread discussing how you're
benchmarking these things (specifically some code you wrote?). I don't
know what else to say in this regard. It would really help if you could
use something like bonnie++ and make sure the filesystem is not being
used by ANYTHING during your benchmarks.

Anyway, the data is interesting: from an aggregate-total perspective,
you're hitting some arbitrary limit on all of your devices, which
almost indicates memory bus throttling or something along those lines;
CPU time? I really don't know.

Aggregate read speeds (kr/s summed across the five drives),
respectively:

43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 == 215694.0 KByte/sec
10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 ==  52516.9 KByte/sec
56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 == 283103.7 KByte/sec
41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 == 204692.9 KByte/sec
15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 ==  78081.7 KByte/sec

The totals are "all over the place", but what interests me the most is
that the total aggregate never exceeds an amount slightly under
300 MBytes/sec. That number has some relevance if, say, you're using a
port multiplier (5 devices aggregated across one SATA300 port).
Despite these being WD20EARS drives (4 platters, ugh!), these
individual devices should each be able to push 75-90 MBytes/sec on
writes, and slightly more on reads.

Like you, it also interests me that all the drives behave the same;
that is, speeds are roughly the same on all 5 devices simultaneously,
regardless of speed/rate/throughput.

Here's an idea: can you stop using the filesystem for a bit and
instead do raw dd's from all of the /dev/adaX entries to /dev/null
simultaneously (pick something like bs=64k or bs=256k), then run your
iostats? I'm basically trying to figure out whether the bad speeds
come from the devices themselves or from the geom_raid5 stuff. You get
where I'm going with this.
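Something like the following would do it. This is just a sketch -- the
ada1-ada5 names are taken from the iostat output above, so adjust to
match your system. It emits the dd commands rather than running them
directly, so you can eyeball the list before piping it to sh:

```shell
# Emit one backgrounded dd per drive, plus a trailing "wait", so the
# whole set can be piped to sh and the reads run in parallel. Device
# names are taken from the iostat output above -- adjust as needed.
gen_parallel_reads() {
    for d in "$@"; do
        printf 'dd if=/dev/%s of=/dev/null bs=256k &\n' "$d"
    done
    echo 'wait'
}

gen_parallel_reads ada1 ada2 ada3 ada4 ada5

# To actually run the reads (watch iostat in another terminal):
#   sh thisscript.sh | sh
```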
If 5 simultaneous dds reading from the drives are very fast (way
faster than the above), and there aren't sporadic drops in performance
that aren't caused by writes (hence my "stop using the filesystem"
comment), then I think we've narrowed down where the issue lies -- not
the drives.

> 1) The benchmark induces some writing. atime modification? No, I've
> turned that off, but it doesn't help. I'm afraid this read-write
> interleaving could be the cause of the "problems", but I don't
> understand WHY there is any writing at all (1 write per 2 reads on
> average) when a read-only benchmark runs. It doesn't write any logs,
> etc. Yes, the writing speed is very low -- every write transaction is
> about 2Kb -- but WHY are they there?! If I stop the benchmark, there
> is less than 1 write transaction per second.

(Note: I'm going to assume by "Kb" you mean "kilobytes" and not
"kilobits"; B = byte, b = bit. This is why I got into the habit of
just writing out the unit in full, because too many people try to
shorthand it and pick the wrong one. And it'll be a cold day in hell
before I ever use "XXbi" (e.g. kibi, mebi, gibi, tebi).)

The dd method I describe should absolutely not induce writes, hence my
recommendation. If writes are seen during the dd's, then either the
filesystem is mounted and FreeBSD is doing something "interesting" at
the filesystem or VFS level, or your system is actually an
izbushka..... Maybe softupdates are somehow responsible? Not sure.

> 2) Without `-x' it shows that the typical read transaction size is
> about 50Kb. That is very strange, as geom_raid5 shows (I have
> diagnostics in it) that almost all file access is aligned and
> 128Kb-sized...

I'm not sure -- please take what I say here with a grain of salt --
but I believe there was a recent discussion on -stable or -fs about
some sort of 64KByte "limit" within UFS/UFS2 somewhere? I think I'm
thinking of "MAX_BSIZE". I'm having a lot of difficulty following all
these storage-related threads.
Everyone seems to show up "in bulk" on the mailing lists all at once,
and it's overwhelming at times. I'm getting old, in more ways than one.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB   |