Date:      Sat, 29 Apr 2017 12:19:07 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-fs@FreeBSD.org
Subject:   [Bug 218954] [ZFS] Add a sysctl to toggle zfs_free_leak_on_eio
Message-ID:  <bug-218954-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954

            Bug ID: 218954
           Summary: [ZFS] Add a sysctl to toggle zfs_free_leak_on_eio
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Keywords: patch
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: fk@fabiankeil.de
                CC: freebsd-fs@FreeBSD.org

Created attachment 182174
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=182174&action=edit
sys/cddl: Add a sysctl to toggle zfs_free_leak_on_eio

The attached patch adds a sysctl to toggle zfs_free_leak_on_eio.

Setting the sysctl makes it possible to break a previously endless cycle
of ZFS accumulating checksum errors for metadata.
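
For reference, the usual FreeBSD idiom for exposing such a tunable is a
SYSCTL_INT() declaration next to the variable. The snippet below is only a
sketch of that idiom, not the attached patch; the description string is made
up and it assumes the variable can be treated as an int (upstream declares it
as boolean_t):

/*
 * Sketch of the common FreeBSD pattern for exposing a ZFS tunable under
 * vfs.zfs; illustration only, not the attached patch.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

extern int zfs_free_leak_on_eio;	/* assumed int; upstream uses boolean_t */

SYSCTL_DECL(_vfs_zfs);
SYSCTL_INT(_vfs_zfs, OID_AUTO, free_leak_on_eio, CTLFLAG_RWTUN,
    &zfs_free_leak_on_eio, 0,
    "Leak space on I/O errors while freeing instead of retrying forever");

With CTLFLAG_RWTUN the knob can be set at runtime with sysctl(8) as well as
at boot time from loader.conf.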

Before setting vfs.zfs.free_leak_on_eio=1:

fk@t520 ~ $zpool status cloudia2
  pool: cloudia2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 308K in 53h23m with 3358 errors on Sun Apr 16 20:33:26 2017
config:

        NAME                  STATE     READ WRITE CKSUM
        cloudia2              ONLINE       0     0   129
          label/cloudia2.eli  ONLINE       0     0   516

errors: 3362 data errors, use '-v' for a list

fk@t520 ~ $zpool status -v cloudia2
[..]
errors: Permanent errors have been detected in the following files:

        <0x186>:<0x28>
        <0x186>:<0x35>
        <0xffffffffffffffff>:<0x28>

Every five seconds the checksum counters increased.

zfsdbg-msg output:
2017 Apr 21 11:12:43: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:43: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:43: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:48: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:48: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:53: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:53: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:12:58: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:12:58: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0

fk@t520 ~ $zpool get all cloudia2
NAME      PROPERTY                       VALUE                          SOURCE
cloudia2  size                           2.98T                          -
cloudia2  capacity                       54%                            -
cloudia2  altroot                        -                              default
cloudia2  health                         ONLINE                         -
cloudia2  guid                           4205907112567218706            default
cloudia2  version                        -                              default
cloudia2  bootfs                         -                              default
cloudia2  delegation                     on                             default
cloudia2  autoreplace                    off                            default
cloudia2  cachefile                      -                              default
cloudia2  failmode                       wait                           default
cloudia2  listsnapshots                  off                            default
cloudia2  autoexpand                     off                            default
cloudia2  dedupditto                     0                              default
cloudia2  dedupratio                     1.00x                          -
cloudia2  free                           1.37T                          -
cloudia2  allocated                      1.62T                          -
cloudia2  readonly                       off                            -
cloudia2  comment                        -                              default
cloudia2  expandsize                     -                              -
cloudia2  freeing                        24.2G                          default
cloudia2  fragmentation                  32%                            -
cloudia2  leaked                         0                              default
[...]

After setting vfs.zfs.free_leak_on_eio=1:

zfsdbg-msg output:
2017 Apr 21 11:13:03: bptree index 0: traversing from min_txg=1 bookmark -1/40/0/5120
2017 Apr 21 11:13:06: freed 100000 blocks in 3050ms from free_bpobj/bptree txg 17892; err=-1
2017 Apr 21 11:13:07: bptree index 0: traversing from min_txg=1 bookmark -1/68/0/718
2017 Apr 21 11:13:08: bptree index 1: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 2: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 3: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 4: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 5: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 6: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: bptree index 7: traversing from min_txg=-1 bookmark 0/0/0/0
2017 Apr 21 11:13:08: freed 96110 blocks in 1927ms from free_bpobj/bptree txg 17893; err=0
2017 Apr 21 11:15:33: command: zpool clear cloudia2

The checksum error counters stopped incrementing, "freeing" went to 0,
and "leaked" went from 0 to 256M.

fk@t520 ~ $zpool get all cloudia2
NAME      PROPERTY                       VALUE                          SOURCE
cloudia2  size                           2.98T                          -
cloudia2  capacity                       53%                            -
cloudia2  altroot                        -                              default
cloudia2  health                         ONLINE                         -
cloudia2  guid                           4205907112567218706            default
cloudia2  version                        -                              default
cloudia2  bootfs                         -                              default
cloudia2  delegation                     on                             default
cloudia2  autoreplace                    off                            default
cloudia2  cachefile                      -                              default
cloudia2  failmode                       wait                           default
cloudia2  listsnapshots                  off                            default
cloudia2  autoexpand                     off                            default
cloudia2  dedupditto                     0                              default
cloudia2  dedupratio                     1.00x                          -
cloudia2  free                           1.39T                          -
cloudia2  allocated                      1.59T                          -
cloudia2  readonly                       off                            -
cloudia2  comment                        -                              default
cloudia2  expandsize                     -                              -
cloudia2  freeing                        0                              default
cloudia2  fragmentation                  32%                            -
cloudia2  leaked                         256M                           default
[...]

The difference on the receiving side confirmed that some space had been
recovered:

[fk@kendra ~]$ zfs list -r -p -t all dpool/ggated/cloudia2
NAME                                             USED          AVAIL          REFER  MOUNTPOINT
[...]
dpool/ggated/cloudia2@2017-04-21_10:37        9251840              -  1812645106176  -
dpool/ggated/cloudia2@2017-04-21_11:17        3950592              -  1800267106304  -

It's not obvious to me whether the 256M was really leaked,
but either way it looks like a clear win.

On another ZFS pool with the same issue, backed by a USB disk, all of the
space in "freeing" was supposedly "leaked", but there was a lot less of it
to begin with:

Before setting vfs.zfs.free_leak_on_eio=1:

fk@t520 /usr/src $zpool get all wde4
NAME  PROPERTY                       VALUE                          SOURCE
wde4  size                           1.81T                          -
wde4  capacity                       94%                            -
wde4  altroot                        -                              default
wde4  health                         ONLINE                         -
wde4  guid                           14402430966328721211           default
wde4  version                        -                              default
wde4  bootfs                         -                              default
wde4  delegation                     on                             default
wde4  autoreplace                    off                            default
wde4  cachefile                      -                              default
wde4  failmode                       wait                           default
wde4  listsnapshots                  off                            default
wde4  autoexpand                     off                            default
wde4  dedupditto                     0                              default
wde4  dedupratio                     1.00x                          -
wde4  free                           107G                           -
wde4  allocated                      1.71T                          -
wde4  readonly                       off                            -
wde4  comment                        -                              default
wde4  expandsize                     -                              -
wde4  freeing                        1.18M                          default
wde4  fragmentation                  23%                            -
wde4  leaked                         0                              default

After setting vfs.zfs.free_leak_on_eio=1:

fk@t520 /usr/src $zpool get all wde4
NAME  PROPERTY                       VALUE                          SOURCE
wde4  size                           1.81T                          -
wde4  capacity                       94%                            -
wde4  altroot                        -                              default
wde4  health                         ONLINE                         -
wde4  guid                           14402430966328721211           default
wde4  version                        -                              default
wde4  bootfs                         -                              default
wde4  delegation                     on                             default
wde4  autoreplace                    off                            default
wde4  cachefile                      -                              default
wde4  failmode                       wait                           default
wde4  listsnapshots                  off                            default
wde4  autoexpand                     off                            default
wde4  dedupditto                     0                              default
wde4  dedupratio                     1.00x                          -
wde4  free                           107G                           -
wde4  allocated                      1.71T                          -
wde4  readonly                       off                            -
wde4  comment                        -                              default
wde4  expandsize                     -                              -
wde4  freeing                        0                              default
wde4  fragmentation                  23%                            -
wde4  leaked                         1.18M                          default
[...]

The pool had been affected by the issue since 2015:
https://lists.freebsd.org/pipermail/freebsd-fs/2015-February/020845.html

Obtained from: ElectroBSD

--
You are receiving this mail because:
You are on the CC list for the bug.


