Date:      Thu, 26 May 2016 10:46:39 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 209759] [patch] Prevent deadlocks when paging on GELI-encrypted devices
Message-ID:  <bug-209759-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759

            Bug ID: 209759
           Summary: [patch] Prevent deadlocks when paging on
                    GELI-encrypted devices
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Keywords: patch
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: fk@fabiankeil.de

Created attachment 170670
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170670&action=edit
GELI: Use a dedicated uma zone for writes to onetime devices

The attached patch lets GELI use a dedicated uma zone for
writes to onetime devices as they are likely to originate
from the vm page daemon.

Without the patch the system could deadlock because the vm daemon
was waiting for pages to be written to disk, while GELI was waiting
for the vm daemon to make room for the buffer GELI needed to actually
write the pages:

     (kgdb) where
     #0  sched_switch (td=0xfffff800055bf9a0, newtd=0xfffff80002341000,
         flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1969
     #1  0xffffffff80962635 in mi_switch (flags=<value optimized out>,
         newtd=0x0) at /usr/src/sys/kern/kern_synch.c:455
     #2  0xffffffff809aaa3a in sleepq_wait (wchan=0x0, pri=0) at
         /usr/src/sys/kern/subr_sleepqueue.c:637
     #3  0xffffffff80962038 in _sleep (ident=<value optimized out>,
         lock=<value optimized out>, priority=<value optimized out>,
         wmesg=0xffffffff80e826ee "vmwait", sbt=0, pr=<value optimized out>,
         flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:229
     #4  0xffffffff80c1ac6b in vm_wait () at /usr/src/sys/vm/vm_page.c:2705
     #5  0xffffffff80c06a9f in kmem_back (object=0xffffffff8144d6f0,
         addr=18446741874805047296, size=69632, flags=<value optimized out>)
         at /usr/src/sys/vm/vm_kern.c:356
     #6  0xffffffff80c068d2 in kmem_malloc (vmem=0xffffffff813aa500,
         size=69632, flags=2) at /usr/src/sys/vm/vm_kern.c:316
     #7  0xffffffff80bfd7d6 in uma_large_malloc (size=69632, wait=2) at
         /usr/src/sys/vm/uma_core.c:1106
     #8  0xffffffff8092f614 in malloc (size=<value optimized out>,
         mtp=0xffffffff81b4d520, flags=0) at
         /usr/src/sys/kern/kern_malloc.c:513
     #9  0xffffffff81b4ab99 in g_eli_crypto_run (wr=0xfffff80002560040,
         bp=0xfffff80008a86d90) at
         /usr/src/sys/modules/geom/geom_eli/../../../geom/eli/g_eli_privacy.c:262
     #10 0xffffffff81b3e860 in g_eli_worker (arg=0xfffff80002560040) at
         /usr/src/sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:565
     #11 0xffffffff80910f5c in fork_exit (callout=0xffffffff81b3e0b0
         <g_eli_worker>, arg=0xfffff80002560040, frame=0xfffffe005005ec00)
         at /usr/src/sys/kern/kern_fork.c:1034
     #12 0xffffffff80c33f0e in fork_trampoline () at
         /usr/src/sys/amd64/amd64/exception.S:611
     #13 0x0000000000000000 in ?? ()
     (kgdb) p vm_cnt
     $16 = {v_swtch = 0, v_trap = 0, v_syscall = 0, v_intr = 0, v_soft = 0,
       v_vm_faults = 0, v_io_faults = 0, v_cow_faults = 0, v_cow_optim = 0,
       v_zfod = 0, v_ozfod = 0, v_swapin = 0, v_swapout = 0,
       v_swappgsin = 0, v_swappgsout = 0, v_vnodein = 0, v_vnodeout = 0,
       v_vnodepgsin = 0, v_vnodepgsout = 0, v_intrans = 0,
       v_reactivated = 0, v_pdwakeups = 22197, v_pdpages = 0,
       v_tcached = 0, v_dfree = 0, v_pfree = 0,
       v_tfree = 0, v_page_size = 4096, v_page_count = 247688,
       v_free_reserved = 372, v_free_target = 5320, v_free_min = 1609,
       v_free_count = 2, v_wire_count = 140735, v_active_count = 96194,
       v_inactive_target = 7980, v_inactive_count = 10756,
       v_cache_count = 0, v_pageout_free_min = 34,
       v_interrupt_free_min = 2, v_free_severe = 990,
       v_forks = 0, v_vforks = 0, v_rforks = 0, v_kthreads = 0,
       v_forkpages = 0, v_vforkpages = 0, v_rforkpages = 0,
       v_kthreadpages = 0, v_spare = 0xffffffff8144d5ac}

A sysctl is added to optionally use the zone for GELI writes
in general, without letting common writes cut into the reserve
kept for onetime writes. This may reduce latency for larger
writes, and since a couple of items have to be kept in the zone
anyway, the impact on the zone size is minor.

Initial testing seems to indicate that the sysctl could be
safely enabled by default in the future.

Currently a single zone with a somewhat humongous item size,
sufficient for all GELI writes, is used. While this may look
a bit wasteful, in practice we don't need a lot of items,
so this seems tolerable for now.

The best solution would probably be to only use the dedicated
uma zone for common writes if the size is above 65536 bytes,
the largest zone item size internally used by malloc().

Currently the zone isn't used for reads, as those are less
time-critical and usually small enough for malloc() to succeed
right away anyway.
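If that refinement were adopted, the allocation path might look
roughly like this (again only a sketch, not part of the patch;
g_eli_zone stands for the dedicated zone described above and
M_ELI for GELI's malloc(9) type):

```c
static void *
g_eli_alloc_data(struct bio *bp, size_t size, int onetime)
{

	/*
	 * Reads and small common writes keep using malloc(9), which
	 * serves requests up to 65536 bytes from its internal zones.
	 */
	if (bp->bio_cmd == BIO_READ || (!onetime && size <= 65536))
		return (malloc(size, M_ELI, M_WAITOK));
	/* Large or onetime writes go through the dedicated zone. */
	return (uma_zalloc(g_eli_zone,
	    onetime ? (M_NOWAIT | M_USE_RESERVE) : M_NOWAIT));
}
```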

Example length distribution when reproducing ElectroBSD with
-j4 and 1 GB of RAM:

  gpt/swap-ada1.eli                                   BIO_WRITE
           value  ------------- Distribution ------------- count
          < 4096 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             4965848
            8192 |@@@@@                                    943980
           12288 |@@                                       362668
           16384 |@                                        161485
           20480 |@                                        120939
           24576 |                                         87827
           28672 |                                         57402
           32768 |                                         40470
           36864 |                                         42243
           40960 |                                         28543
           45056 |                                         20347
           49152 |                                         15235
           53248 |                                         13450
           57344 |                                         9535
           61440 |                                         9952
           65536 |@                                        179360
           69632 |                                         0

  gpt/swap-ada1.eli                                   BIO_READ
           value  ------------- Distribution ------------- count
          < 4096 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4645114
            8192 |                                         0
           12288 |                                         0
           16384 |                                         3446
           20480 |                                         0

Note that the GELI overhead is not accounted for here
and only the results for the swap device are shown.

Zone use:

[fk@elektrobier3 ~]$ vmstat -z | egrep 'ITEM|eli' | column -t
ITEM    SIZE     LIMIT  USED  FREE  REQ       FAIL  SLEEP
g_eli:  172032,  0,     0,    14,   8077487,  0,    0

This includes writes to gpt/swap-ada1.eli and gpt/dpool-ada1.eli.

Discussion: while the zone served 8077487 memory requests in total,
14 items were sufficient to do so, so the zone only withheld
172032 * 14 bytes plus zone metadata from the rest of the system.

Obtained from: ElectroBSD

-- 
You are receiving this mail because:
You are the assignee for the bug.


