Date: Mon, 26 Oct 2009 02:30:32 +0100
From: Solon Lutz <solon@pyro.de>
To: Wes Morgan <morganw@chemikals.org>, freebsd-fs@FreeBSD.ORG
Subject: Re: raidz slowing down
Message-ID: <1791999980.20091026023032@pyro.de>
In-Reply-To: <alpine.BSF.2.00.0910251941080.2854@ibyngvyr.purzvxnyf.bet>
References: <886802879.20091008113716@pyro.de> <alpine.BSF.2.00.0910251941080.2854@ibyngvyr.purzvxnyf.bet>
> Did you ever get any response? I have a very similar sounding issue with
> my raidz2. I've always assumed it was because the volume was nearly full
> and maybe some fragmentation or something. All of my devices are on MPT
> controllers, so I don't think that the highpoint device is an issue.

Nope, no responses...

Since I was working on a rescue operation, I didn't have the patience to
eliminate all kinds of errors, so I swapped out da1 (maybe a little bit
slow or buggy?) and used the forensics version of dd, 'dcfldd'. It has a
split option, and I suspected that ZFS has problems when writing huge
continuous data streams - so I split the 10TB into 100GB files, which
took about 11 hours.

I don't know if this is a general problem, or if it only happens when the
input is delivered at a much higher data rate. In this case, the
HW-RAID/zpool was able to deliver data at 600MB/s, while the RAIDZ/zpool
could only write at 130MB/s. The dynamics of this 'slow-down', which I
could watch via gstat, looked as if the whole access on the device level
was desynchronizing completely. In the end, before I quit the process,
write speed was down to 5MB/s!

But as I mentioned earlier, I had no nerves for bug-hunting, due to a
bigger (still unsolved) problem at hand. Maybe somebody else would like
to investigate? I'm busy with ZFS forensics...

solon


> On Thu, 8 Oct 2009, Solon Lutz wrote:

>> I built a 9x hdd 11TB raidz for some rescue purposes and started
>> copying an image from another partition via "dd if=/dev/da0..." to it.
>> It consists of: ad4 da1 da2 da3 da4 da5 da6 da7 da8; da1 to da8 are
>> connected via two highpoint controllers.
>> In the beginning write speeds were quite fair:
>>
>> dT: 1.002s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>     0    424      0      0    0.0    424  52483   33.9   84.6| ad4
>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>    35    356      0      0    0.0    356  44584   76.4  124.5| da1
>>    35    296      0      0    0.0    296  36919   84.5  121.0| da2
>>    34    361      0      0    0.0    361  45111   75.5  124.7| da3
>>    35    346      0      0    0.0    346  43196   78.6  123.2| da4
>>    35    344      0      0    0.0    344  42940   80.0  124.7| da5
>>    35    343      0      0    0.0    343  42812   80.7  124.5| da6
>>    35    344      0      0    0.0    344  43051   79.8  123.9| da7
>>    34    342      0      0    0.0    342  42796   80.6  124.4| da8
>>
>> Now, some 10 hours and 2.5TB later, it looks like this most of the time:
>>
>> dT: 1.002s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>     0     10      0      0    0.0     10      6    0.8    0.2| ad4
>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>     4     13      0      0    0.0     13      8  550.4  178.5| da1
>>     0     12      0      0    0.0     12      7    0.7    0.2| da2
>>     0     11      0      0    0.0     11      7    0.7    0.2| da3
>>     0     10      0      0    0.0     10      5    0.6    0.2| da4
>>     0     11      0      0    0.0     11      6    0.9    0.3| da5
>>     0     12      0      0    0.0     12      7    0.7    0.2| da6
>>     0     11      0      0    0.0     11      7    0.7    0.2| da7
>>     0      9      0      0    0.0      9      6    0.8    0.2| da8
>>
>> da1 seems to be busy most of the time, and every few seconds all the
>> other devices write some data at nearly normal speed:
>>
>> dT: 1.003s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>     0    254      0      0    0.0    254  31331   34.9   35.4| ad4
>>     0      0      0      0    0.0      0      0    0.0    0.0| da0
>>     4      0      0      0    0.0      0      0    0.0    0.0| da1
>>     0    254      0      0    0.0    254  31346  107.4  104.5| da2
>>     0    256      0      0    0.0    256  31345  108.1  104.0| da3
>>     0    255      0      0    0.0    255  31345  110.2  105.1| da4
>>    35    200      0      0    0.0    200  24912  143.3  115.0| da5
>>    35    211      0      0    0.0    211  26303  137.8  114.9| da6
>>    35    210      0      0    0.0    210  26079  139.3  114.9| da7
>>    35    209      0      0    0.0    209  25952  135.2  113.7| da8
>>
>> Sometimes it even gets back to 'normal' behaviour, but never reaches
>> the speeds it once had:
>>
>> dT: 1.002s  w: 1.000s
>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>    35    274      0      0    0.0    274  34334   44.2   66.6| ad4
>>     0   1166   1166 149243    0.1      0      0    0.0   14.3| da0
>>    35    120      0      0    0.0    120  14717   94.4   64.5| da1
>>    35     96      0      0    0.0     96  11665  113.9   64.3| da2
>>    35    100      0      0    0.0    100  12288   98.7   63.9| da3
>>    35    103      0      0    0.0    103  12496   93.4   59.4| da4
>>    34    112      0      0    0.0    112  13694  106.1   67.4| da5
>>    35     71      0      0    0.0     71   8596  115.3   66.8| da6
>>    35    116      0      0    0.0    116  14205  101.7   67.3| da7
>>    35     83      0      0    0.0     83  10066  112.2   65.9| da8
>>
>> Syslog reports the following:
>>
>> Oct  8 09:53:40 radium kernel: hptrr: start channel [0,0]
>> Oct  8 09:53:40 radium kernel: hptrr: channel [0,0] started successfully
>> Oct  8 09:57:44 radium kernel: hptrr: start channel [0,0]
>> Oct  8 09:57:45 radium kernel: hptrr: channel [0,0] started successfully
>> Oct  8 10:54:26 radium kernel: hptrr: start channel [0,0]
>> Oct  8 10:54:27 radium kernel: hptrr: channel [0,0] started successfully
>> Oct  8 11:10:29 radium kernel: hptrr: start channel [0,0]
>> Oct  8 11:10:30 radium kernel: hptrr: channel [0,0] started successfully
>> Oct  8 11:17:27 radium kernel: hptrr: start channel [0,0]
>> Oct  8 11:17:27 radium kernel: hptrr: channel [0,0] started successfully
>>
>> Is this a problem of the hptrr device, or is da1 failing?
>>
>> Mit freundlichen Grüßen
>> Best regards,
>> Solon Lutz
>>
>> +-----------------------------------------------+
>> |  Pyro.Labs Berlin - Creativity for tomorrow   |
>> |  Wasgenstrasse 75/13 - 14129 Berlin, Germany  |
>> |  www.pyro.de - phone + 49 - 30 - 48 48 58 58  |
>> |  info@pyro.de - fax + 49 - 30 - 80 94 03 52   |
>> +-----------------------------------------------+
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
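P.S. for anyone wanting to try the chunked-copy workaround described above: dcfldd's split option does the chunking in a single invocation, but the idea can be sketched at small scale with plain dd. The file names and sizes below are made up for illustration only; the actual rescue job split a 10TB source into 100GB files.

```shell
#!/bin/sh
# Small-scale sketch of "split one huge stream into fixed-size chunks".
# Here a 4 MiB test file stands in for the 10 TB image, and 1 MiB
# chunks stand in for the 100 GB files (names are hypothetical).
dd if=/dev/zero of=source.img bs=1M count=4 2>/dev/null

i=0
while [ "$i" -lt 4 ]; do
    # Copy exactly one 1 MiB block, offset $i blocks into the source.
    dd if=source.img of="chunk.$i" bs=1M count=1 skip="$i" 2>/dev/null
    i=$((i + 1))
done

ls -l chunk.*
```

Writing the chunks as separate files gives ZFS a transaction-group boundary it can settle on between chunks, instead of one unbroken multi-terabyte write stream.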