Date:      Sun, 26 Oct 2014 09:51:06 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 194606] New: filesystem deadlock on 10.1 and head when TRIM enabled at unmount after r268815, MFC of 268205
Message-ID:  <bug-194606-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194606

            Bug ID: 194606
           Summary: filesystem deadlock on 10.1 and head when TRIM enabled
                    at unmount after r268815, MFC of 268205
           Product: Base System
           Version: 10.1-RC2
          Hardware: Any
                OS: Any
            Status: Needs Triage
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: madpilot@FreeBSD.org
                CC: imp@FreeBSD.org

While performing some tests with NanoBSD (FreeBSD 10.1-RC3) on ALIX hardware, I
discovered a lockup when unmounting filesystems.

This hardware is a small motherboard that uses a CF card as its main storage.

I usually enable TRIM support on these. NanoBSD mounts filesystems read-only,
and I use scripts to mount/unmount filesystems when changes need to be saved.

I have seen a deadlock when unmounting. With a debugging kernel I got this:

root@qtest:~ [0]# umount /cfg
panic: detach with active requests
KDB: stack backtrace:
db_trace_self_wrapper(c0968053,c08ea7f0,c2d48800,c23d6bc8,c0536a16,...) at db_trace_self_wrapper+0x2d/frame 0xc23d6b98
kdb_backtrace(c09639e1,c09fa7e8,c095761d,c23d6c54,c095761d,...) at kdb_backtrace+0x30/frame 0xc23d6c00
vpanic(c09fa682,100,c095761d,c23d6c54,c23d6c54,...) at vpanic+0x80/frame 0xc23d6c24
kassert_panic(c095761d,c09575b3,c2d7acc0,4c7,c2d7acc0,...) at kassert_panic+0xe9/frame 0xc23d6c48
g_detach(c2d7acc0,4,c095725c,1c2,c09c8d5c,...) at g_detach+0x1d3/frame 0xc23d6c64
g_wither_washer(c09f7df4,0,c0956544,124,0,...) at g_wither_washer+0x109/frame 0xc23d6c90
g_run_events(0,c23d6d08,c095d42a,3dc,0,...) at g_run_events+0x40/frame 0xc23d6ccc
fork_exit(c05c4e60,0,c23d6d08) at fork_exit+0x7f/frame 0xc23d6cf4
fork_trampoline() at fork_trampoline+0x8/frame 0xc23d6cf4
--- trap 0, eip = 0, esp = 0xc23d6d40, ebp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100006 ]
Stopped at      kdb_enter+0x3d: movl    $0,kdb_why
db>

I played around with ddb and discovered this:

db> show geom 0xc2e98b40
consumer: 0xc2e98b40
  class:    VFS (0xc09c8d5c)
  geom:     ffs.ada0s3 (0xc3293600)
  provider: ada0s3 (0xc2e7e200)
  access:   r0w0e0
  flags:    0x0030
  nstart:   19
  nend:     18

This shows nstart != nend, whereas g_detach asserts that they are equal.
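
The invariant behind that assertion can be illustrated with a small userland
model. This is only a sketch under stated assumptions, not the kernel code:
the struct and function names (io_counters, io_start, io_done, io_pending,
can_detach) are hypothetical, and only the nstart/nend field names are taken
from the ddb output above. The idea is that every request started must be
completed before a detach is safe, so a consumer with nstart 19 and nend 18
still has one request in flight.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the two counters printed by
 * "show geom". This is NOT the real GEOM consumer structure.
 */
struct io_counters {
	unsigned int nstart;	/* requests issued */
	unsigned int nend;	/* requests completed */
};

/* Record that a request has been issued. */
static void
io_start(struct io_counters *c)
{
	c->nstart++;
}

/* Record that a request has completed. */
static void
io_done(struct io_counters *c)
{
	c->nend++;
}

/* Number of requests issued but not yet completed. */
static unsigned int
io_pending(const struct io_counters *c)
{
	return (c->nstart - c->nend);
}

/* In this model, detaching is only safe when nothing is in flight. */
static bool
can_detach(const struct io_counters *c)
{
	return (io_pending(c) == 0);
}
```

With the values from the VFS consumer above (nstart 19, nend 18), io_pending()
returns 1 and can_detach() returns false, which corresponds to the "detach with
active requests" panic. One more io_done() call would balance the counters.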

Going up the chain, I find that its providers also have nstart -
nend == 1:

db> show geom 0xc2e9b7c0
consumer: 0xc2e9b7c0
  class:    PART (0xc09c96b0)
  geom:     ada0 (0xc2e7e780)
  provider: ada0 (0xc2e7e500)
  access:   r2w0e0
  flags:    0x0030
  nstart:   1430
  nend:     1429
db> show geom 0xc2e7e500
provider: ada0 (0xc2e7e500)
  class:        DISK (0xc09c8890)
  geom:         ada0 (0xc2e7e580)
  mediasize:    4017807360
  sectorsize:   512
  stripesize:   0
  stripeoffset: 0
  access:       r2w0e0
  flags:         (0x0030)
  error:        0
  nstart:       2085
  nend:         2084
  consumer: 0xc2e9a700 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b480 (ada0), access=r0w0e0, flags=0x0030
  consumer: 0xc2e9b7c0 (ada0), access=r2w0e0, flags=0x0030

Having no idea how to debug further, I started testing various revisions and
finally discovered that the commit that broke it is r268815, which MFCed
r268205. Also, disabling TRIM on the FS "fixes" the problem, which seems to
confirm that this change is involved.

Since this depends on hardware support for TRIM, I have been unable to reproduce
this in VirtualBox. I'm sorry I'm unable to produce a test case.

I'm CCing imp, who committed r268815, hoping he has some more insight into
this.

This also affects head, obviously.

I'm available for any further testing or information needed.

Thanks in advance.

-- 
You are receiving this mail because:
You are the assignee for the bug.


