Date: Sat, 4 Jan 2014 09:56:50 +0100
From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: ZFS command can block the whole ZFS subsystem!
Message-ID: <20140104095650.2c500d20@thor.walstatt.dyndns.org>
In-Reply-To: <4FB654C6DBC1479C943BD2C305A1C92E@multiplay.co.uk>
References: <20140103130021.30569db4@thor.walstatt.dyndns.org> <FC618C2B94D9425EAE5C11FEF2042F49@multiplay.co.uk> <20140103171457.0fbf0cd4@telesto> <4FB654C6DBC1479C943BD2C305A1C92E@multiplay.co.uk>
On Fri, 3 Jan 2014 17:04:00 -0000 "Steven Hartland"
<killing@multiplay.co.uk> wrote:

> ----- Original Message -----
> From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
> > On Fri, 3 Jan 2014 14:38:03 -0000
> > "Steven Hartland" <killing@multiplay.co.uk> wrote:
> >
> > > ----- Original Message -----
> > > From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
> > > >
> > > > For security reasons, I dumped a large file via "dd" onto a
> > > > 3 TB disk. The system is 11.0-CURRENT #1 r259667: Fri Dec 20
> > > > 22:43:56 CET 2013, amd64. The filesystem in question is a
> > > > single ZFS pool.
> > > >
> > > > Issuing the command
> > > >
> > > > rm dumpfile.txt
> > > >
> > > > and then hitting Ctrl-Z to suspend it and "bg" to continue it
> > > > in the background (I use FreeBSD's csh in that console) locks
> > > > up the entire command and, even worse, seems to block the pool
> > > > in question from being exported!
> > >
> > > I can't think of any reason why backgrounding a shell would
> > > export a pool.
> >
> > I sent the "rm" job into the background; I didn't say that implies
> > an export of the pool!
> >
> > I said that the pool cannot be exported once the bg command has
> > been issued.
>
> Sorry, I'm confused then, as you said "locks up the entire command
> and even worse - it seems to wind up the pool in question for being
> exported!"
>
> Which to me read like you were saying the pool ended up being
> exported.

I'm not a native English speaker. My intention was, to make it short:
remove the dummy file. Having issued the command in the foreground of
the terminal, I decided a second after hitting return to send it into
the background by suspending the rm command and then issuing "bg".
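The workflow described above (dump a file with dd, then remove it) can be
reproduced at a harmless scale. This is only a sketch of the sequence: the
tiny size, block count, and temporary directory are stand-ins, not the
2.7 TB dump file on the pool.

```shell
# Small-scale stand-in for the workflow in this thread: create a dump
# file with dd(1), check it, then remove it.  All names and sizes here
# are placeholders chosen for illustration.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/dumpfile.txt" bs=1024 count=16 2>/dev/null
size=$(wc -c < "$tmpdir/dumpfile.txt")   # expect 16 KiB
rm "$tmpdir/dumpfile.txt"
rmdir "$tmpdir"
echo "$size"
```

On a ZFS pool the rm returns quickly for small files; the pathological
behaviour in this thread only shows up when the file is close to the size
of the pool itself.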
>
> > > > I expect the command to go into the background, as every other
> > > > UNIX command does when Ctrl-Z is sent in the console.
> > > > Obviously, the ZFS-related parts of FreeBSD don't comply.
> > > >
> > > > The file has been removed from the pool, but the console is
> > > > still stuck at "^Z fg" (as I typed this in). The process list
> > > > tells me:
> > > >
> > > > top
> > > > 17790 root 1 20 0 8228K 1788K STOP 10 0:05 0.00% rm
> > > >
> > > > for the particular "rm" command issued.
> > >
> > > That's not backgrounded yet, otherwise it wouldn't be in the
> > > state STOP.
> >
> > As I said - the job never backgrounded; it locked up the terminal
> > and makes the whole pool unresponsive.
>
> Have you tried sending a continue signal to the process?

No, not intentionally. Since the operation started to slow down the
whole box and seemed to affect nearly every ZFS operation I attempted
(zpool status, zpool import of the faulty pool, zpool export), I
rebooted the machine. After the reboot, when ZFS came up, the drive
started working like crazy again and the system stalled while
recognizing the ZFS pools. I then did a hard reset, restarted in
single-user mode, exported the pool successfully, and rebooted. But
the moment I did "zpool import POOL", the heavy activity continued.

>
> > > > Now, having the file deleted, I'd like to export the pool for
> > > > further maintenance
> > >
> > > Are you sure the delete is complete? Also don't forget ZFS has
> > > TRIM on by default, so depending on support of the underlying
> > > devices you could be seeing deletes occurring.
> >
> > Quite sure it didn't! It has taken hours (~8 now) and the drive is
> > still working, although I tried to stop it.
>
> A delete of a file shouldn't take 8 hours, but you don't say how
> large the file actually is?

The drive has a capacity of ~2.7 TiB (a Western Digital 3 TB drive).
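Steven's suggestion of a continue signal can be tried safely on a
stand-in process before touching the real one. A minimal sketch, using
sleep(1) in place of the stuck rm (in the situation above the PID,
17790, would come from top or ps):

```shell
# Resume a stopped process with SIGCONT.  'sleep 60' stands in for the
# stuck rm; kill -STOP mimics what Ctrl-Z does to a foreground job.
sleep 60 &
pid=$!
kill -STOP "$pid"
sleep 1
stopped=$(ps -o state= -p "$pid" | tr -d ' ')   # "T" = stopped
kill -CONT "$pid"                               # resume, now in background
sleep 1
running=$(ps -o state= -p "$pid" | tr -d ' ')   # "S" or "R" again
kill "$pid"                                     # clean up the stand-in
echo "$stopped -> $running"
```

If `kill -CONT 17790` from another terminal had left the rm in STOP (or
moved it to an uninterruptible disk-wait state), that would have pointed
at the process being wedged in the kernel rather than merely suspended.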
The file I created was - do not laugh, please - 2.7 TB :-( From what I
have read about ZFS's copy-on-write technique in this thread and
elsewhere, I guess that is the culprit: there is no space left to
delete the file safely. By the way, the box is still working at 100%
on that drive :-( That's now > 12 hours.

>
> > > You can check that with gstat -d
> >
> > The command reports 100% activity on the drive. I exported the
> > pool in question in single-user mode and am now trying to import
> > it back while in multi-user mode.
>
> Sorry, you seem to be stating conflicting things:
> 1. The delete hasn't finished
> 2. The pool export hung
> 3. You have exported the pool
>

Not conflicting, but in my non-expert terminology not quite as
accurate and precise as you may expect.

Re item 1) I terminated (by the brute force of the mighty RESET
button) the copy command. It hasn't finished the operation on the pool
as far as I can see, but what is in progress now might be some kind of
recovery mechanism, not the rm command anymore.

Re 2) Yes, first it hung, then I reset the box, then in single-user
mode I did the export to avoid further interaction, then I tried to
import the pool again ...

Re 3) Yes, successfully after the reset. Now I have imported the pool
again, and the terminal in which I issued the command is stuck again
while the pool is under heavy load.

> What exactly is gstat -d reporting, can you paste the output please?
I think this is boring to look at - 100% activity - but here it is ;-)

dT: 1.047s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada2
   10    114    114    455   85.3      0      0    0.0      0      0    0.0  100.0| ada3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| cd0
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p3
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p4
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p5
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p6
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p7
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p8
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p9
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p10
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p11
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p12
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p13
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada0p14
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/boot
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gptid/c130298b-046a-11e0-b2d6-001d60a6fa74
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/root
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/swap
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gptid/fa3f37b1-046a-11e0-b2d6-001d60a6fa74
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var.tmp
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.src
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.obj
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.ports
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/data
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/compat
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/var.mail
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| gpt/usr.local
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada1p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada2p1
   10    114    114    455   85.3      0      0    0.0      0      0    0.0  100.0| ada3p1
    0      0      0      0    0.0      0      0    0.0      0      0    0.0    0.0| ada4p1

>
> > Shortly after issuing the command
> >
> > zpool import POOL00
> >
> > the terminal is stuck again,
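Output like the above is easier to scan when the idle rows are filtered
out. gstat(8) itself has no such filter, but its batch output can be
post-processed; a small sketch, fed here with trimmed sample rows from
the listing above (in live use the same awk program would read from
"gstat -bd 1"):

```shell
# Keep only gstat-style rows whose %busy column (the last numeric field
# before the '|') exceeds a threshold, and print the device names.
busy=$(printf '%s\n' \
  ' 0   0    0    0  0.0  0  0  0.0  0  0  0.0    0.0| ada0' \
  '10 114  114  455 85.3  0  0  0.0  0  0  0.0  100.0| ada3' \
  ' 0   0    0    0  0.0  0  0  0.0  0  0  0.0    0.0| ada4' \
  '10 114  114  455 85.3  0  0  0.0  0  0  0.0  100.0| ada3p1' \
  | awk -F'|' '{ n = split($1, f, " "); if (f[n] + 0 > 50) print f[n] "%", $2 }')
echo "$busy"
```

For the data above this leaves only ada3 and its partition ada3p1, the
drive holding the troubled pool.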
> > the drive is working at 100% for two hours now, and it seems the
> > great ZFS is deleting every block "per pedes" (one at a time). Is
> > this supposed to last days or a week?
>
> What controller and what drive?

Hardware is as follows:

CPU: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz (3201.89-MHz K8-class CPU)
real memory  = 34359738368 (32768 MB)
avail memory = 33252507648 (31712 MB)
ahci1: <Intel Patsburg AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb520000-0xfb5207ff irq 20 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich8: <AHCI channel> at channel 0 on ahci1
ahcich9: <AHCI channel> at channel 1 on ahci1
ahcich10: <AHCI channel> at channel 2 on ahci1
ahcich11: <AHCI channel> at channel 3 on ahci1
ahcich12: <AHCI channel> at channel 4 on ahci1
ahcich13: <AHCI channel> at channel 5 on ahci1
ahciem0: <AHCI enclosure management bridge> on ahci1

>
> What does the following report:
> sysctl kstat.zfs.misc.zio_trim

sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.unsupported: 507
kstat.zfs.misc.zio_trim.failed: 0

>
> > > > but that doesn't work with
> > > >
> > > > zpool export -f poolname
> > > >
> > > > This command is now also stuck, blocking the terminal and the
> > > > pool from further actions.
> > >
> > > If the delete hasn't completed and is stuck in the kernel this
> > > is to be expected.
> >
> > At this moment I do not want to imagine what will happen if I have
> > to delete several tens of terabytes. If the weird behaviour of the
> > current system can be extrapolated, then this is a no-go.
>
> As I'm sure you'll appreciate, that depends on whether the file is
> simply being unlinked or each sector is being erased; the answers to
> the above questions should help determine that :)

You're correct in that. But sometimes I'd appreciate having the
choice.
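The zio_trim counters above already answer Steven's TRIM question: only
"unsupported" is non-zero, so the drive is rejecting TRIM requests and
the sustained 100% activity cannot be TRIM-bound. A hedged sketch of
reading those counters in a script; the kstat names are taken from the
output above and only exist on FreeBSD builds with ZFS TRIM support, so
the sketch falls back to a message elsewhere:

```shell
# Decide from the zio_trim kstats whether TRIM work can explain disk
# activity.  On systems without these kstats (or without sysctl at
# all), fall back gracefully instead of failing.
if command -v sysctl >/dev/null 2>&1; then
    ok=$(sysctl -n kstat.zfs.misc.zio_trim.success 2>/dev/null)
    bad=$(sysctl -n kstat.zfs.misc.zio_trim.unsupported 2>/dev/null)
else
    ok=""; bad=""
fi
if [ -z "$ok" ]; then
    result="zio_trim kstats not available on this system"
elif [ "$ok" -eq 0 ] && [ "$bad" -gt 0 ]; then
    result="device rejects TRIM: activity is not TRIM-bound"
else
    result="TRIM active: $ok successful requests"
fi
echo "$result"
```

With success=0 and unsupported=507, as reported above, this would print
the "device rejects TRIM" branch.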
>
> Regards
> Steve

Regards,
Oliver