From owner-freebsd-arm@FreeBSD.ORG Wed Jan 21 17:15:17 2009
Return-Path:
Delivered-To: freebsd-arm@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
        by hub.freebsd.org (Postfix) with ESMTP id 3900C1065672;
        Wed, 21 Jan 2009 17:15:17 +0000 (UTC)
        (envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (bsdimp.com [199.45.160.85])
        by mx1.freebsd.org (Postfix) with ESMTP id DC4508FC08;
        Wed, 21 Jan 2009 17:15:16 +0000 (UTC)
        (envelope-from imp@bsdimp.com)
Received: from localhost (localhost [127.0.0.1])
        by harmony.bsdimp.com (8.14.2/8.14.1) with ESMTP id n0LHEMRC026015;
        Wed, 21 Jan 2009 10:14:22 -0700 (MST)
        (envelope-from imp@bsdimp.com)
Date: Wed, 21 Jan 2009 10:14:59 -0700 (MST)
Message-Id: <20090121.101459.2022307528.imp@bsdimp.com>
To: krassi@bulinfo.net
From: "M. Warner Losh"
In-Reply-To: <20090121.100533.-1955669401.imp@bsdimp.com>
References: <20090121.084023.188100520.imp@bsdimp.com>
        <4977500A.7060902@bulinfo.net>
        <20090121.100533.-1955669401.imp@bsdimp.com>
X-Mailer: Mew version 5.2 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: mav@FreeBSD.org, freebsd-arm@FreeBSD.org
Subject: Re: Mount root from SD card?
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the StrongARM Processor
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 21 Jan 2009 17:15:17 -0000

In message: <20090121.100533.-1955669401.imp@bsdimp.com>
            "M. Warner Losh" writes:
: In message: <4977500A.7060902@bulinfo.net>
:             Krassimir Slavchev writes:
: : -----BEGIN PGP SIGNED MESSAGE-----
: : Hash: SHA1
: :
: : M. Warner Losh wrote:
: : > In message: <4977236E.2020409@bulinfo.net>
: : >             Krassimir Slavchev writes:
: : > Boot with verbose messages is here:
: : >
: : > http://mnemonic.bulinfo.net/~krassi/ARM/arm.verbose
: : >
: : >> This looks very similar to the data corruption I saw when I had
: : >> enabled multiblock read. To track this down, we're going to have to
: : >> print the actual data returned for each sector...
: : >
: : >> Warner
: :
: :
: : Here is a dump of data right after the byte swapping in
: : at91_mci_read_done():
: :
: : http://mnemonic.bulinfo.net/~krassi/ARM/sd.dump
: :
: : and here is the first 1M of the SD card:
: :
: : http://mnemonic.bulinfo.net/~krassi/ARM/sd.bin
:
: Looks like we're getting some data corruption:
:
: CMD: 11 ARG 0 len 512
:
: ff ff ff ff fc 31 c0 8e c0 8e d8 8e d0 bc 00 7c
: 89 e6 bf 00 06 b9 00 01 f3 a5 89 fd b1 08 f3 ab
: fe 45 f2 e9 00 8a f6 46 bb 20 75 08 84 d2 78 07
: 80 4e bb 40 8a 56 ba 88 56 00 e8 fc 00 52 bb c2
: ...
:
: and then:
:
: CMD: 11 ARG 0 len 512
:
: 00 00 55 aa fc 31 c0 8e c0 8e d8 8e d0 bc 00 7c
: 89 e6 bf 00 06 b9 00 01 f3 a5 89 fd b1 08 f3 ab
: fe 45 f2 e9 00 8a f6 46 bb 20 75 08 84 d2 78 07
: 80 4e bb 40 8a 56 ba 88 56 00 e8 fc 00 52 bb c2
: ...
:
: So it looks like the first 4 bytes are corrupted on the read. If you
: look closely at the data on the device, you'll see that 'fc 31 c0 8e'
: are the first 4 bytes of the reads are the 'left over' data from prior
: data streams. This didn't used to be the case in the prior code
: before the recent changes. The only way we're going to find the bad
: change is to do a binary search on the svn changes to find out where
: we go off the rails. This problem seems familiar to me, but I can't
: quite put my finger on what the root-cause was last time I had it.

I should have said 'fc 31 c0 8e' are the first four bytes of the data
on the device, and 'ff ff ff ff' and '00 00 55 aa' are the leftover
data which is corrupting things.
The latter is actually the last 4 bytes of the block, which indicates
that our PMC usage has stopped too soon, or that we have leftover PMC
data from a previous "read" that didn't specify enough data to be
transferred. I suspect that we're sending a command down and not
expecting enough data. On other bridges we toss the data harmlessly.
On at91, the data is still in the FIFO for the mci device, so we see
it first on the next read. At least that's the theory that just
popped into my head, and also the root-cause that I now recall from
before when I saw similar problems...

Of course, given the number of transfers that had a lot of 'ff' in
them, maybe the PMC is transferring data that doesn't really exist
yet...

Warner
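
To make the stale-FIFO theory above concrete, here is a toy model in
Python. It is only a sketch under assumptions, not the at91_mci driver:
FifoModel, make_sector() and simulate() are made-up names, the sector
contents are zero apart from the first and last four bytes taken from
the dump, and the "previous read drained 4 bytes too few" step is a
guess at what could leave data behind.

```python
from collections import deque

BLOCK_SIZE = 512


class FifoModel:
    """Hypothetical controller FIFO that is never flushed between commands."""

    def __init__(self):
        self.fifo = deque()

    def card_sends(self, data):
        # The card pushes a whole block into the FIFO.
        self.fifo.extend(data)

    def host_reads(self, count):
        # The host drains exactly `count` bytes, oldest first.
        return bytes(self.fifo.popleft() for _ in range(count))


def make_sector():
    # Only the first and last four bytes come from the dump in this thread;
    # everything in between is left as zeros for the illustration.
    sector = bytearray(BLOCK_SIZE)
    sector[0:4] = bytes.fromhex("fc31c08e")
    sector[-4:] = bytes.fromhex("000055aa")
    return bytes(sector)


def simulate():
    sector = make_sector()
    ctrl = FifoModel()

    # Assumed earlier transfer: the card sends a full block, but the host
    # asks for 4 bytes too few, so the tail of the block stays queued.
    ctrl.card_sends(sector)
    ctrl.host_reads(BLOCK_SIZE - 4)

    # Next read of the same sector: the stale tail comes out first.
    ctrl.card_sends(sector)
    return ctrl.host_reads(BLOCK_SIZE)


if __name__ == "__main__":
    print(simulate()[:8].hex(" "))
```

Run as a script, this prints "00 00 55 aa fc 31 c0 8e", the same
shape as the second dump: four leftover bytes from the end of the
previous block, then the real start of the sector. A bridge that
discards unread data after each command would not show the shift.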