From: Jeremy Chadwick
To: Kevin Oberman
Cc: freebsd-stable@freebsd.org
Date: Fri, 7 Nov 2008 14:01:02 -0800
Subject: Re: Problem with USB drive errors in recent 7-Stable
Message-ID: <20081107220102.GA14260@icarus.home.lan>
In-Reply-To: <20081107212148.1A47245010@ptavv.es.net>

On Fri, Nov 07, 2008 at
01:21:48PM -0800, Kevin Oberman wrote:
> I recently started getting errors on a fairly new USB-connected SATA
> drive. Aside from the errors, the system was locking up, as any process
> attempting to access the drive would hang in uninterruptible disk
> wait ("D" in ps). I could not shut down the system and had to power it
> off. (It's a laptop.) After a reboot, I tried to fsck it and that locked
> up, too. I was able to recover by telling fsck not to fix the truncated
> inode but to fix everything else. Then I ran fsck again and it
> successfully fixed the inode. This happened several times.
>
> I then bought a new drive and got identical behavior! It was not the
> drive. I rolled my kernel back to 9/13/08 and tried again. This time it
> just worked! No errors or lock-up.
>
> I suspect that there are two issues. One results in the lock-up when the
> disk has errors and the other causes the purported disk errors. The
> latter was introduced after 9/13/08. The kernel that produced the
> errors was from 10/21. I also ran a kernel from 10/8 which did not cause
> me problems, but I'm not sure that I used the USB drive with this
> kernel.
>
> I'll be building a 10/8 kernel later, after I have backed up some data
> from a failing drive (PATA, not USB, and SMART confirms that this
> disk is sick). I will try to track down exactly which change triggered
> this ugly behavior, but that will take a number of kernel builds, so it
> will take a while.
>
> Has anyone else seen this? Any ideas on which changes are the most
> likely cause? Could be USB, CAM, or something else, I guess.

Funny you should post this today: I just spent the past few days dealing
with this problem, specifically the kernel getting "stuck" when writing
to a umass/da device (in my case, USB flash drives).

When I say "stuck", I mean the kernel itself was still responsive:
Ctrl-T would report the status of the processes (each was shown in a
different state), but the processes had essentially hung.
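To spot processes stuck like this from a shell that is already open (and assuming the system disk itself is not the affected device), a minimal Python sketch could parse ps(1) output and list everything in "D" state, which FreeBSD's ps(1) documents as disk or other short-term uninterruptible wait. The helper names `parse_ps`, `disk_wait` and `snapshot` are hypothetical, and the sketch assumes the column layout produced by `ps -ax -o pid,state,wchan,comm`.

```python
import subprocess
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    state: str     # e.g. "D+", "Ss"; the first letter is the run state
    wchan: str     # what the process is sleeping on, "-" if nothing
    command: str


def parse_ps(output: str) -> List[Proc]:
    """Parse `ps -ax -o pid,state,wchan,comm` output, skipping the header."""
    procs = []
    for line in output.splitlines()[1:]:
        fields = line.split(None, 3)  # command may contain spaces
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        pid, state, wchan, command = fields
        procs.append(Proc(int(pid), state, wchan, command))
    return procs


def disk_wait(procs: List[Proc]) -> List[Proc]:
    """Return processes whose state starts with 'D' (uninterruptible wait)."""
    return [p for p in procs if p.state.startswith("D")]


def snapshot() -> List[Proc]:
    """Run ps(1) once and parse its output (not called automatically)."""
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid,state,wchan,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

For example, `for p in disk_wait(snapshot()): print(p.pid, p.wchan, p.command)` prints the PID, wait channel and command of each stuck process. The WCHAN column carries roughly the same kind of information Ctrl-T reports for the foreground process.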
Ctrl-Alt-Esc on the console dropped me to a db> prompt, so the machine
had not frozen or locked up; it was as if something in or around the
storage subsystem was spinning in a loop. IP traffic still worked as
well, but of course anything that accessed disks would hang. Rebooting
the box via Ctrl-Alt-Del didn't work, because shutdown got stuck
waiting for a bunch of PIDs to exit.

I switched the box to CURRENT (for a lot of reasons), and one of those
was to try out the new USB4BSD stack (called "USB2", not to be confused
with the USB 2.0 protocol). That simply induced a random kernel panic.
However, HPS is fairly certain he has found the issue, and it involves
the stack's interaction with bus_dma(9). Here's the thread:

http://lists.freebsd.org/pipermail/freebsd-current/2008-November/thread.html#235
http://lists.freebsd.org/pipermail/freebsd-current/2008-November/000220.html

I have not yet tried his patches (I just woke up), but I will in a short
while. So far I have a lot more faith in USB4BSD than in the old stack,
simply because there's active work going on in it.

(It's ironic that I encountered this issue while working on a document
describing how to put FreeBSD i386, amd64, and MS-DOS on a USB flash
drive, so one could install FreeBSD from it or boot MS-DOS for BIOS
upgrades.)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |