Date:      Wed, 4 Oct 2017 09:15:27 -0700
From:      javocado <javocado@gmail.com>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   lockup during zfs destroy
Message-ID:  <CAP1HOmQtU14X1EvwYMHQmOru9S4uyXep=n0pU4PL5z-+QnX02A@mail.gmail.com>

I am trying to destroy a dense, large filesystem and it's not going well.

Details:
- zpool is a raidz3 with 3 x 12 drive vdevs.
- target filesystem to be destroyed is ~2T with ~63M inodes.
- OS: FreeBSD 10.3 amd64 with 192 GB of RAM.
- 120 GB of swap (90 GB recently added as swap-on-disk)

What happened initially is that the system locked up after a few hours and I
had to reboot. Upon rebooting and starting zfs, I see sustained disk activity
in gstat *and* that the sustained activity is usually just 6 disks reading.
Two raidz3 vdevs are involved in the filesystem I am deleting, so there are 6
parity disks ... not sure if that is correlated or not.
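
(For reference, this is roughly how I've been watching the activity; "pool"
below is just a placeholder for my actual pool name:)

# per-disk view, physical providers only
gstat -p
# per-vdev / per-leaf view, refreshed every 5 seconds
zpool iostat -v pool 5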

At about the 1h40m mark of uptime I see things start to happen in top: a
sudden spike in load, and a drop in the amount of "Free" memory:
Mem: 23M Active, 32M Inact, 28G Wired, 24M Buf, 159G Free

It drops down under a GB, then fluctuates up and down until eventually it
reaches some small amount (41 MB). As this drop starts, I see gstat activity
on the zpool drives cease, and there is some light activity on the swap
devices, but not much. The amount of swap used is reported as very little,
anywhere from under 1 MB to 24 MB, yet swapinfo shows nothing used. After the
memory usage settles, the system eventually ends up in a locked state where:

- nothing is going on in gstat; the only non-zero number is the queue
length for the swap device, which is stuck at 4
- load drops to nothing, and occasionally I see the zfskern and zpool
processes stuck in the vmwait state
- the shell is unresponsive, but carriage returns register
- there are NO kernel messages of any kind on the console indicating a
problem or resource exhaustion
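
When it gets into this state I've been trying to snapshot memory/ARC usage
before the shell dies completely, with roughly these commands (all stock
FreeBSD, nothing exotic):

# ARC size and metadata usage
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.arc_meta_used
# kernel memory by UMA zone (the zio_* and dnode-related zones look relevant)
vmstat -z | head -40
# swap usage as the kernel sees it
swapinfo -h
# one batch-mode snapshot of top
top -b -d 1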

Finally, I cannot do this:
# zdb -dddd pool/filesystem  | grep DELETE_QUEUE
zdb: can't open 'pool/filesystem': Device busy
(presumably because it is pending destroy ...)

I had set:
vm.kmem_size="384G"
(and nothing else in loader)

but even removing that and setting more realistic figures like:
vm.kmem_size=200862670848
vm.kmem_size_max=200862670848
vfs.zfs.arc_max=187904819200

have not resulted in a different outcome, though I no longer see the
processes in vmwait; their state is just "-".

I've just lowered these to:
vm.kmem_size=198642237440
vm.kmem_size_max=198642237440
vfs.zfs.arc_max=190052302848

to see if that will make a difference.
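
For reference, this is roughly what /boot/loader.conf looks like right now. I
am also wondering about vfs.zfs.free_max_blocks, which (if my 10.3 build has
it) is supposed to cap how many blocks get freed per txg so a big destroy
can't consume all of memory in one pass; I have not set it yet, and the value
below is only a guess:

# /boot/loader.conf (current)
vm.kmem_size=198642237440
vm.kmem_size_max=198642237440
vfs.zfs.arc_max=190052302848
# possible addition, untested -- limit blocks freed per txg
vfs.zfs.free_max_blocks=100000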

No matter how many times I reboot (about 6 so far), I never make it past the
1h40m mark and this memory dip. I don't know if I'm making any progress or
just running into the same wall.

My questions:

- is this what it appears to be, memory exhaustion?
- if so, why isn't swap utilized?
- how would I configure my way past this hurdle?
- a filesystem has a DELETE_QUEUE ... does the zpool itself have a destroy
queue of some kind? I am trying to see whether the zpool is still working
through the destroy and how far along it is, but I do not know what to query
with zdb (the closest things I've found are sketched below)
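
The closest thing I've found for pool-level progress is the "freeing" pool
property (the async_destroy feature tracks space still to be reclaimed
there), plus the internal history log; if someone knows a better zdb
incantation I'm all ears:

# space still queued to be freed by the async destroy (should shrink over time)
zpool get freeing pool
# internal events, including destroy/free records
zpool history -i pool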

Thanks!


