Date:      Thu, 02 Sep 2021 10:10:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 258208] [zfs] locks up when using rollback or destroy on both 13.0-RELEASE & sysutils/openzfs port
Message-ID:  <bug-258208-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258208

            Bug ID: 258208
           Summary: [zfs] locks up when using rollback or destroy on both
                    13.0-RELEASE & sysutils/openzfs port
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: dch@freebsd.org

ZFS operations such as rollback or destroy deadlock FreeBSD. Subsequent
commands such as `mount -p` or `bectl list` also hang. Writing data to
files still works. dmesg, syslog, etc. are all empty.

## environment

Tried under the "default" 13.0-RELEASE-p3 zfs, and also under the "kmod"
(sysutils/openzfs port):

zfs-2.1.99-1
zfs-kmod-v2021073000-zfs_7eebcd2be

I will build CURRENT with debug, re-try this, and report back.

## pools

embiggen/koans dataset (zpool of 4 drive striped mirrors)
envy (nvme zpool)

  pool: embiggen
 state: ONLINE
  scan: scrub repaired 0B in 02:42:41 with 0 errors on Thu Aug 26 18:22:44 2021
config:

        NAME          STATE     READ WRITE CKSUM
        embiggen      ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/zfs0  ONLINE       0     0     0
            gpt/zfs1  ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            gpt/zfs2  ONLINE       0     0     0
            gpt/zfs3  ONLINE       0     0     0

errors: No known data errors

  pool: envy
 state: ONLINE
  scan: scrub repaired 0B in 00:07:20 with 0 errors on Thu Aug 26 15:47:42 2021
config:

        NAME        STATE     READ WRITE CKSUM
        envy        ONLINE       0     0     0
          gpt/envy  ONLINE       0     0     0

errors: No known data errors


## default 13.0-RELEASE zfs

- the parent process is not a zombie process
- not writing a coredump
- can't be attached to from lldb etc
- won't respond to `kill -CONT ...`

```
$ top -SjwbHzPp 47255
last pid: 20225;  load averages:  0.78,  0.81,  0.70  up 2+02:11:14    14:14:38
3744 threads:  10 running, 3693 sleeping, 6 stopped, 35 waiting
CPU 0:  2.6% user,  0.8% nice,  1.0% system,  0.0% interrupt, 95.6% idle
CPU 1:  3.0% user,  0.9% nice,  1.1% system,  0.6% interrupt, 94.3% idle
CPU 2:  3.2% user,  1.0% nice,  1.2% system,  0.0% interrupt, 94.6% idle
CPU 3:  3.3% user,  1.0% nice,  1.2% system,  0.0% interrupt, 94.6% idle
CPU 4:  3.4% user,  1.0% nice,  1.2% system,  0.1% interrupt, 94.4% idle
CPU 5:  3.5% user,  1.0% nice,  1.2% system,  0.1% interrupt, 94.3% idle
CPU 6:  3.5% user,  1.0% nice,  1.3% system,  1.1% interrupt, 93.0% idle
CPU 7:  3.4% user,  1.0% nice,  1.2% system,  0.0% interrupt, 94.3% idle
Mem: 2489M Active, 20G Inact, 472M Laundry, 66G Wired, 8074K Buf, 35G Free
ARC: 51G Total, 34G MFU, 9441M MRU, 15M Anon, 518M Header, 7401M Other
     36G Compressed, 69G Uncompressed, 1.93:1 Ratio
Swap: 252G Total, 252G Free

  PID   JID USERNAME    PRI NICE   SIZE    RES SWAP STATE    C   TIME    WCPU COMMAND
47255     0 dch          20    0  1316M   111M   0B STOP     4   0:00   0.00% beam.smp{10_dirty_io_sch}
47255     0 dch          20    0  1316M   111M   0B STOP     4   0:00   0.00% beam.smp{sys_sig_dispatc}

$ ps augx -p 47255
USER   PID %CPU %MEM     VSZ    RSS TT  STAT STARTED    TIME COMMAND
dch  47255  0.0  0.1 1347516 113316  1  TX   13:57   0:03.48 /usr/local/lib/erlang24/erts-12.0.3/bin/beam.smp -- -root /usr/local/lib/erlang24 -progname erl

$ sudo procstat -kk 47255
  PID    TID COMM                TDNAME              KSTACK
47255 574625 beam.smp            sys_sig_dispatc     mi_switch+0xc1 thread_suspend_switch+0xc0 thread_single+0x69c exit1+0xc1 sys_sys_exit+0xd amd64_syscall+0x10c fast_syscall_common+0xf8
47255 574653 beam.smp            10_dirty_io_sch     mi_switch+0xc1 _sleep+0x1cb rms_rlock_fallback+0x90 zfs_lookup+0x7e zfs_freebsd_cachedlookup+0x6b vfs_cache_lookup+0xad lookup+0x68c namei+0x487 kern_statat+0xcf sys_fstatat+0x2f amd64_syscall+0x10c fast_syscall_common+0xf8
```

This situation repeats after reboot, and shows a hanging zfs rollback or
similar command each time:


```
/sbin/zfs rollback -r embiggen/...@pristine
18990 168480 zfs                 -                   mi_switch+0xc1 _vm_page_busy_sleep+0x100 vm_page_sleep_if_busy+0x28 vm_object_page_remove+0xdf vn_pages_remove+0x4c zfs_rezget+0x35 zfs_resume_fs+0x258 zfs_ioc_rollback+0x158 zfsdev_ioctl_common+0x4e3 zfsdev_ioctl+0x143 devfs_ioctl+0xc7 vn_ioctl+0x1a4 devfs_ioctl_f+0x1e kern_ioctl+0x26d sys_ioctl+0xf6 amd64_syscall+0x10c fast_syscall_common+0xf8
```
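
For triage, kernel stacks like the ones above can be scanned mechanically for
threads sleeping inside ZFS code. The following is only a minimal sketch, not
part of the original report: it assumes `procstat -kk` output with one thread
per line (i.e. before mail line-wrapping), and the function name
`zfs_blocked_threads` and the symbol prefixes (`zfs_`, `zfsdev_`, `rms_`) are
illustrative choices.

```python
import re

# Stack frames look like "zfs_lookup+0x7e"; capture the symbol without offset.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+")


def zfs_blocked_threads(procstat_output, prefixes=("zfs_", "zfsdev_", "rms_")):
    """Return (pid, tid, thread name, matching frames) for every thread whose
    kernel stack contains a frame starting with one of `prefixes`.

    Assumes the COMM and TDNAME columns contain no whitespace.
    """
    hits = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header and any line not starting with numeric PID and TID.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, tid, tdname = int(fields[0]), int(fields[1]), fields[3]
        frames = FRAME_RE.findall(line)
        matched = [f for f in frames if f.startswith(prefixes)]
        if matched:
            hits.append((pid, tid, tdname, matched))
    return hits
```

Fed the output of `sudo procstat -kk 47255`, this would report only the
`10_dirty_io_sch` thread, since the `sys_sig_dispatc` thread is suspended in
`exit1` with no ZFS frames on its stack.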
