Date:      Wed, 31 Aug 2005 09:38:20 -0600
From:      Scott Long <scottl@samsco.org>
To:        Ben Kaduk <minimarmot@gmail.com>
Cc:        freebsd-current@freebsd.org, Kyle Brooks <captinsmock@columbus.rr.com>
Subject:   Re: panic after removing usb flash drive
Message-ID:  <4315CEEC.80100@samsco.org>
In-Reply-To: <47d0403c05083020044f6ac0be@mail.gmail.com>
References:  <1125452228.740.3.camel@arbitor.homelinux.com> <47d0403c05083020044f6ac0be@mail.gmail.com>

Ben Kaduk wrote:
> On 8/31/05, Kyle Brooks <captinsmock@columbus.rr.com> wrote:
> 
>>umass0: LEXAR MEDIA JUMPDRIVE2, rev 2.00/1.25, addr 2
>>umass0: at uhub4 port 6 (addr 2) disconnected
>>panic: vm_fault: fault on nofault entry, addr: deadc000
>>
>>kernel:
>>
>>FreeBSD 7.0-CURRENT #2: Mon Aug 29 00:39:21 UTC 2005
>>
>>problem:
>>
>>kernel panics when usb flash drive is removed
>>
>>backtrace:
>>
>>#0 doadump () at pcpu.h:165
>>#1 0xc068610e in boot (howto=260)
>>at /usr/src/sys/kern/kern_shutdown.c:397
>>#2 0xc0685b92 in panic (
>>fmt=0xc090e46c "vm_fault: fault on nofault entry, addr: %lx")
>>at /usr/src/sys/kern/kern_shutdown.c:553
>>#3 0xc0812de1 in vm_fault (map=0xc1060000, vaddr=3735928832,
>>fault_type=2 '\002', fault_flags=0)
>>at /usr/src/sys/vm/vm_fault.c:884
>>#4 0xc0888807 in trap_pfault (frame=0xe6a06bf0, usermode=0,
>>eva=3735929110)
>>at /usr/src/sys/i386/i386/trap.c:741
>>#5 0xc0888d04 in trap (frame=
>>{tf_fs = 8, tf_es = -1063649240, tf_ds = 40, tf_edi = -993875968,
>>tf_esi = -1014223872, tf_ebp = -425694000, tf_isp = -425694180, tf_ebx =
>>-1063640044, tf_edx = -993875900, tf_ecx = 0, tf_eax = -559038242,
>>tf_trapno = 12, tf_err = 2, tf_eip = -1069194040, tf_cs = 32, tf_eflags
>>= 66050, tf_esp = -1063640032, tf_ss = 0})
>>at /usr/src/sys/i386/i386/trap.c:442
>>#6 0xc08745ba in calltrap () at /usr/src/sys/i386/i386/exception.s:139
>>#7 0x00000008 in ?? ()
>>#8 0xc09a0028 in atdma_acpi_driver_mod ()
>>#9 0x00000028 in ?? ()
>>#10 0xc4c2a800 in ?? ()
>>#11 0xc38c2c00 in ?? ()
>>#12 0xe6a06cd0 in ?? ()
>>#13 0xe6a06c1c in ?? ()
>>#14 0xc09a2414 in xsoftc ()
>>#15 0xc4c2a844 in ?? ()
>>#16 0x00000000 in ?? ()
>>#17 0xdeadc0de in ?? ()
>>#18 0x0000000c in ?? ()
>>#19 0x00000002 in ?? ()
>>#20 0xc04564c8 in camisr (V_queue=0xc09a2414)
>>at /usr/src/sys/cam/cam_xpt.c:7066
>>#21 0xc066f84e in ithread_loop (arg=0xc356fa80)
>>at /usr/src/sys/kern/kern_intr.c:545
>>#22 0xc066e808 in fork_exit (callout=0xc066f665 <ithread_loop>, arg=0x0,
>>frame=0x0) at /usr/src/sys/kern/kern_fork.c:789
>>#23 0xc087461c in fork_trampoline ()
>>at /usr/src/sys/i386/i386/exception.s:208
>>
> 
> This is the expected behaviour

Panics are not acceptable or expected behaviour in any situation, btw.

> if you didn't unmount the filesystem on the 
> thumbdrive before removing it. There was some discussion on this a while ago 
> (but I don't seem to be able to find the exact posts), but the general idea 
> is that the kernel has no idea in what state the actual physical medium 
> (disc) is/was in after being pulled, and may have some stale buffers holding 
> data that got written to disk. It doesn't know what to do with this data, or 
> how to treat requests to that device, so it panics.
> 

I probably missed the earlier discussion that you are referring to, but
what you are saying here actually isn't true.  There are a number of 
problems:

1)  When the thumbdrive gets pulled, the umass driver gets told to
detach.  It tries to detach itself from CAM, but things don't get torn
down correctly because there is an open reference to the target in CAM
(because there is a mounted filesystem on the device).  umass trundles
along anyway and goes away, leaving lots of dangling pointers in CAM
that blow up on the next attempted I/O access.

Part of the problem here is that the umass driver is architected badly.
It creates a SIM, bus, and target instance for every umass device that
gets inserted.  When the device gets pulled, it tries to tear down
each of those instances all at once.  CAM simply wasn't designed for
this.  It was designed for the SIMs and buses to be long-lived objects
where only the targets (and luns) come and go.  Making umass fit this
model would involve turning it into two logical drivers.  One would be
a SIM that would attach to the root hub instance of each USB controller
and would treat the USB bus as a CAM bus.  The other would be a target
driver that gets created and destroyed on a per-device basis as those
devices come and go.  When a umass device gets plugged in, the USB
framework would tell the appropriate SIM to create a target instance.
When the device gets pulled, the framework would tell the SIM to detach
and destroy the target.  No dangling pointers would be left behind by
the SIM going away.  I have some prototype work in progress on this.
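
To make the lifetime model above concrete, here is a minimal user-space
sketch in C.  It is not CAM or umass code: every name in it (toy_sim,
toy_target, and the toy_* functions) is hypothetical, and reference
counting is only one assumed way of keeping a target alive while a
consumer such as a mounted filesystem still holds it.  The point it
illustrates is that the SIM stays put while targets come and go, and
that a detach with an open reference fails later I/O cleanly instead of
leaving a dangling pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_MAX_TARGETS 8

/* Hypothetical stand-in for a per-device target. */
struct toy_target {
    int refs;  /* one for the bus slot, plus one per open consumer */
    int gone;  /* set once the hardware has been unplugged */
};

/* Hypothetical stand-in for a long-lived SIM/bus pair. */
struct toy_sim {
    struct toy_target *slots[TOY_MAX_TARGETS];
};

/* Device plugged in: create a target in the first free slot.
 * Returns the slot index, or -1 if no slot or memory is available. */
int toy_target_attach(struct toy_sim *sim)
{
    for (int i = 0; i < TOY_MAX_TARGETS; i++) {
        if (sim->slots[i] == NULL) {
            struct toy_target *t = calloc(1, sizeof(*t));
            if (t == NULL)
                return -1;
            t->refs = 1;  /* the reference held by the bus slot */
            sim->slots[i] = t;
            return i;
        }
    }
    return -1;
}

/* A consumer (for example a mounted filesystem) takes a reference. */
struct toy_target *toy_target_open(struct toy_sim *sim, int slot)
{
    if (slot < 0 || slot >= TOY_MAX_TARGETS)
        return NULL;
    struct toy_target *t = sim->slots[slot];
    if (t == NULL || t->gone)
        return NULL;
    t->refs++;
    return t;
}

/* Drop one reference.  Returns 1 if this was the last one and the
 * target was freed, 0 otherwise. */
int toy_target_release(struct toy_target *t)
{
    assert(t != NULL && t->refs > 0);
    if (--t->refs == 0) {
        free(t);
        return 1;
    }
    return 0;
}

/* Device pulled: mark the target gone, clear the slot, and drop the
 * slot's reference.  The SIM itself is not torn down.
 * Returns 1 if the target was freed immediately, 0 otherwise. */
int toy_target_detach(struct toy_sim *sim, int slot)
{
    if (slot < 0 || slot >= TOY_MAX_TARGETS || sim->slots[slot] == NULL)
        return 0;
    struct toy_target *t = sim->slots[slot];
    t->gone = 1;
    sim->slots[slot] = NULL;
    return toy_target_release(t);
}

/* I/O against an open target: fails with ENXIO once the device is gone. */
int toy_target_io(const struct toy_target *t)
{
    return t->gone ? ENXIO : 0;
}
```

In this sketch, a caller that attaches a target, opens it, and then
detaches it while the reference is still held gets ENXIO from
toy_target_io() afterwards, and the memory is only freed when that last
reference is released with toy_target_release().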

2) Some filesystems, UFS in particular, assume that an I/O will never
fail.  Instead of checking the error status of the buf on completion,
they just continue on and assume that everything is fine.  If the
VM is trying to page in a vnode, for example, it'll think that
the operation succeeded, and then really bad things will happen.  I'm
not sure if the same problem exists in MSDOSFS because I don't have
any DOS filesystems except on USB, and the problem with umass stands
in the way of further testing.  In lieu of fixing umass, I might have to
create a synthetic md device to hold an msdos filesystem so that I can
test how it behaves.
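
The difference between ignoring and checking the completion status can
be sketched in plain C.  This is a user-space toy, not UFS or buffer
cache code: struct toy_buf, TOY_BUF_ERROR, and the toy_* functions are
all hypothetical names made up for illustration.  The unchecked variant
models the pattern described above, where the caller carries on as if
the read succeeded; the checked variant returns the error instead of
handing back stale data.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16
#define TOY_BUF_ERROR  0x1  /* hypothetical "this I/O failed" flag */

/* Hypothetical, heavily simplified I/O buffer. */
struct toy_buf {
    int  flags;                 /* TOY_BUF_ERROR is set on failure */
    int  error;                 /* errno-style code when the flag is set */
    char data[TOY_BLOCK_SIZE];  /* payload, only valid on success */
};

/* Simulated driver completion.  If the device is gone, the buffer is
 * flagged and its data is left untouched, so it may be stale. */
void toy_device_read(struct toy_buf *bp, int device_present)
{
    assert(bp != NULL);
    if (!device_present) {
        bp->flags |= TOY_BUF_ERROR;
        bp->error = ENXIO;
        return;
    }
    bp->flags &= ~TOY_BUF_ERROR;
    bp->error = 0;
    memset(bp->data, 'A', TOY_BLOCK_SIZE);
}

/* Ignores the completion status and always reports success, so a
 * failed read silently hands stale bytes to the caller. */
int toy_read_unchecked(struct toy_buf *bp, int device_present, char *out)
{
    toy_device_read(bp, device_present);
    memcpy(out, bp->data, TOY_BLOCK_SIZE);
    return 0;
}

/* Checks the completion status and propagates the failure upward
 * without touching the caller's buffer. */
int toy_read_checked(struct toy_buf *bp, int device_present, char *out)
{
    toy_device_read(bp, device_present);
    if (bp->flags & TOY_BUF_ERROR)
        return bp->error != 0 ? bp->error : EIO;
    memcpy(out, bp->data, TOY_BLOCK_SIZE);
    return 0;
}
```

With the device absent, toy_read_checked() returns ENXIO and leaves the
output buffer alone, while toy_read_unchecked() returns 0 and copies
whatever the buffer happened to contain from an earlier read.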

3) It's unknown if the VM system knows how to rationally deal with
failed I/O or how to propagate that kind of failure to the rest of the
kernel and/or applications.  What happens if you mmap a file, and then
the device holding the file goes away?  How do you let the application
know that its mmap is now invalid?  Send it a SIGSEGV (signal 11),
maybe?  How should the vnode pager deal with failure?  There are lots
of interesting problems here.
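
The application side of the mmap question can at least be probed from
user space.  The sketch below does not pull a device; as a stand-in it
maps one page of a temporary file, truncates the file to zero, and then
touches the page, which is the closest portable analogue I know of to
the backing store vanishing.  POSIX specifies SIGBUS for references to
mapped pages beyond the end of the object, but the code catches both
SIGBUS and SIGSEGV and simply reports which one arrived.  All toy_*
names and the /tmp path template are assumptions made for the example,
and nothing here says what the kernel does when a real device
disappears.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf toy_fault_env;
static volatile sig_atomic_t toy_fault_sig;

/* Record which signal arrived and jump back to the probe. */
static void toy_fault_handler(int sig)
{
    toy_fault_sig = sig;
    siglongjmp(toy_fault_env, 1);
}

/* Read one byte at p.  Returns 0 if the read succeeded, or the number
 * of the signal (SIGBUS or SIGSEGV) that the read raised. */
int toy_probe_byte(const volatile char *p)
{
    struct sigaction sa, old_bus, old_segv;

    assert(p != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toy_fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    toy_fault_sig = 0;
    if (sigsetjmp(toy_fault_env, 1) == 0) {
        char c = *p;  /* may fault; the handler jumps back here */
        (void)c;
    }

    /* Put the previous handlers back before returning. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return (int)toy_fault_sig;
}

/* Map one page of a temporary file, shrink the file to zero bytes,
 * then touch the page.  Returns the signal raised by that access,
 * 0 if it unexpectedly succeeded, or -1 if the setup failed. */
int toy_touch_after_truncate(void)
{
    char path[] = "/tmp/toy_mmap_XXXXXX";  /* hypothetical location */
    long page = sysconf(_SC_PAGESIZE);
    int fd, sig;
    char *p;

    if (page <= 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file lives on until fd is closed */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (toy_probe_byte(p) != 0 || ftruncate(fd, 0) != 0)
        sig = -1;  /* the mapping should be readable before truncation */
    else
        sig = toy_probe_byte(p);
    munmap(p, (size_t)page);
    close(fd);
    return sig;
}
```

Calling toy_touch_after_truncate() returns the signal number the
process received, which can then be compared against SIGBUS and
SIGSEGV.  Whether a vanished device should be reported to the
application the same way is exactly the open question above.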

In any case, the panic posted in the grandparent message implicates CAM 
and umass, which is what I would expect.  There may be more layers of
problems underneath it.

Scott


