Date:      Fri, 4 Mar 2011 20:20:11 GMT
From:      Ian Lepore <freebsd@damnhippie.dyndns.org>
To:        freebsd-arm@FreeBSD.org
Subject:   Re: arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
Message-ID:  <201103042020.p24KKBL7007848@freefall.freebsd.org>

The following reply was made to PR arm/155214; it has been noted by GNATS.

From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: ticso@cicely.de
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern
 large SD cards
Date: Fri, 04 Mar 2011 13:10:12 -0700

 On Thu, 2011-03-03 at 00:52 +0100, Bernd Walter wrote:
 > On Wed, Mar 02, 2011 at 02:53:18PM -0700, Ian Lepore wrote:
 > > 
 > > >Number:         155214
 > > >Category:       arm
 > > >Synopsis:       [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
 > > >Confidential:   no
 > > >Severity:       serious
 > > >Priority:       medium
 > > >Responsible:    freebsd-arm
 > > >State:          open
 > > >Quarter:        
 > > >Keywords:       
 > > >Date-Required:
 > > >Class:          sw-bug
 > > >Submitter-Id:   current-users
 > > >Arrival-Date:   Wed Mar 02 22:10:10 UTC 2011
 > > >Closed-Date:
 > > >Last-Modified:
 > > >Originator:     Ian Lepore <freebsd@damnhippie.dyndns.org>
 > > >Release:        FreeBSD 8.2-RC3 arm
 > > >Organization:
 > > none
 > > >Environment:
 > > FreeBSD dvb 8.2-RC3 FreeBSD 8.2-RC3 #49: Tue Feb 15 22:52:14 UTC 2011     root@revolution.hippie.lan:/usr/obj/arm/usr/src/sys/DVB  arm
 > > 
 > > Included patch is against -current even though the problem was first seen on
 > > 8.2-RC3
 > > 
 > > The problem was seen on AT91RM9200 hardware, but presumably also affects the
 > > SAM9 series which uses the same driver code.
 > > 
 > > >Description:
 > > With the latest generation of large-capacity SD cards, write speeds as low as
 > > 20 kbytes/sec are seen.  These modern cards have erase-block sizes as large as 
 > > 8192K (compared to 32K typical on previous generations).  The at91_mci driver 
 > > does only single-sector IO; apparently this requires the SD card to internally 
 > > perform an expensive read-erase-modify-write cycle for each 512 byte block 
 > > written to the card.
 > 
 > The details of this problem are well known.
 > However, the RM9200 has many hardware problems to be worked around, and
 > so far no one has actually done so.
 > Your patch is quite large, so I would like to ask you explicitly:
 > Did you test your patch with an AT91RM9200 system?
 > You did enable multisector support for reading and (more important) for
 > writing?
 > But you didn't activate 4bit mode?
 > With 4bit mode there is no hardware bug, but when the driver was written
 > it was just done in a lazy way because activating 4bit on SD cards requires
 > special handling - in the meantime the SD layer itself was extracted and
 > has 4bit support, but the at91_mci driver was never updated to use that.
 > 
 > PS: I'm very pleased to see your work since SD write speed was a
 > major show stopper for some applications
 > 
 
 I made some time today to try 4-bit mode in the mci driver, using 
 8.2-RELEASE as a testbed.  I quickly determined that just enabling 
 4-bit mode results in corrupted read data severe enough to virtually 
 always cause "root mount error" at boot.  Occasionally it'll manage to 
 mount root but then lock up or panic during rc-file processing.  It 
 does this both with the original driver and with my patched driver 
 configured for single-block or multi-block operation.  
 
 After some experimenting to find the cause of the corrupted data, I 
 realized we're violating the SD spec by running the bus at 30 MHz -- 
 the spec says 25 MHz max until you use CMD6 to switch to high-speed 
 mode, if the card supports it.  Our next lower available speed is 
 15 MHz, and when I set that as the max speed, 4-bit works perfectly, 
 both in the original driver and with my patches in single- or 
 multi-block operation.  (In my patched driver I had to add a 
 controller reset following a multi-block read stop, similar to the one 
 after a multi-block write, to avoid occasional spurious data CRC 
 errors in 4-bit mode.  The data we want is read correctly; the CRC 
 error happens on the block that's still coming in as the stop command 
 is being issued.  I'm not sure why this only happens in 4-bit mode.) 
 
 Since we've been getting away with 30 MHz/1-bit for years, I surmise 
 that any card capable of delivering 25 MHz/4-bit is also capable of 
 doing 30 MHz/1-bit, even though that's a slight violation of the 
 spec.  But 30 MHz/4-bit appears to be enough of a violation that even 
 modern cards don't keep up.  (When looking at dumps of the corrupted 
 read data, an old card had a lot of corruption, with about 20% of the 
 data read wrong.  A modern card had just a few bits wrong out of 
 every few kbytes read.) 
 
 Since 15 MHz/4-bit is still twice the raw data throughput of 
 30 MHz/1-bit, I decided to do some crude benchmarking to see if it's 
 worth the trouble of making 4-bit work correctly.  The results appear 
 below.  In summary, there is definitely a benefit to using 4-bit 
 transfers, but the improvement isn't nearly as dramatic as the change 
 from single- to multi-block IO.  
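 
 The "twice the throughput" figure is just clock rate times bus width.  
 A minimal sketch of that arithmetic follows; the function name is 
 hypothetical, and the result is a raw upper bound that ignores command, 
 CRC, and inter-block overhead, so measured speeds will be well below it.  

```c
#include <assert.h>
#include <stdint.h>

/*
 * Raw peak data rate of the SD bus in bytes/sec: one bit per data line
 * per clock.  Ignores all protocol overhead (an idealized upper bound).
 */
static uint64_t
sd_raw_bytes_per_sec(uint32_t bus_hz, unsigned int bus_width_bits)
{
	/* Only the two bus widths discussed here are handled. */
	assert(bus_width_bits == 1 || bus_width_bits == 4);
	return ((uint64_t)bus_hz * bus_width_bits / 8u);
}
```

 With these assumptions, 30 MHz/1-bit gives 3,750,000 bytes/sec and 
 15 MHz/4-bit gives 7,500,000 bytes/sec.  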
 
 Supporting 4-bit transfers properly will require some changes in 
 dev/mmc.  It doesn't currently use CMD6 to switch to high-speed mode 
 at all.  I'm assuming that if we update it to do so, we'll have no 
 problem running at 30 MHz/4-bit.  There'll also need to be some fixes 
 in the routine that calculates the speed to run at, because right now 
 it doesn't account for the 25 MHz limit the spec imposes before 
 switching to high-speed mode (which is why we end up running at 
 30 MHz).  
 
 The mci driver will also need some updates to round the clock speed 
 requested by the upper layers down to the next lower supported speed, 
 but it would probably be good to have a bit of a hack in there as 
 well to allow 30 MHz operation in 1-bit mode, since folks have come to 
 expect that and it seems to work ok.  
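 
 The round-down step could look something like the sketch below.  This 
 is not the actual at91_mci code: the function names are hypothetical, 
 and it assumes the bus clock is derived from the master clock as 
 MCK / (2 * (CLKDIV + 1)) with an 8-bit divider field, which should be 
 checked against the datasheet.  A 60 MHz master clock is also an 
 assumption, chosen because it yields the 30 MHz and 15 MHz steps 
 mentioned above.  

```c
#include <assert.h>
#include <stdint.h>

#define MCI_CLKDIV_MAX	255u	/* assumed 8-bit divider field */

/* Bus clock for a divider value, assuming bus = mck / (2 * (clkdiv + 1)). */
static uint32_t
mci_bus_hz(uint32_t mck_hz, uint32_t clkdiv)
{
	return (mck_hz / (2u * (clkdiv + 1u)));
}

/*
 * Return the smallest divider whose bus clock does not exceed max_hz,
 * i.e. round the requested speed DOWN to the nearest supported one.
 * If even the largest divider is too fast, the largest divider is returned.
 */
static uint32_t
mci_pick_clkdiv(uint32_t mck_hz, uint32_t max_hz)
{
	uint32_t clkdiv;

	assert(mck_hz > 0 && max_hz > 0);
	for (clkdiv = 0; clkdiv < MCI_CLKDIV_MAX; clkdiv++) {
		if (mci_bus_hz(mck_hz, clkdiv) <= max_hz)
			break;
	}
	return (clkdiv);
}
```

 Under those assumptions a 25 MHz request selects divider 1 (15 MHz) 
 rather than divider 0 (30 MHz).  The 30 MHz/1-bit exception would be a 
 separate special case layered on top of this.  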
 
 About the benchmarks...
 
 I tested with two different cards, noted below by their erase block 
 sizes.  The card with the 32-block erase size is a SanDisk 512 MB card 
 from several years ago.  The card with the 8192-block erase size is a 
 SanDisk 2 GB card purchased recently.  The older card does not claim to 
 support high-speed mode; the newer card does (but of course we don't 
 switch the card to high-speed mode).  
 
 I tested each card with each combo of bus speed, bus width, and 
 single- versus multi-block IO.  All of the results below are with my 
 patched driver.  I also briefly tested the original unpatched 8.2 
 driver and found the results very much in line with the 1-block 
 results from my patched driver.  (The patched driver performs a little 
 better even in single-block mode, probably because it gets the same 
 work done with fewer interrupts.) 
 
 Read and write speeds are as reported by these commands:
 
   dd if=/dev/mmcsd0s2a of=/dev/null bs=1m count=10
   dd if=/dev/zero of=/dev/mmcsd0s2a bs=1m count=10
 
 Each test was run several times immediately after rebooting; median 
 values reported.  There were no writable filesystems mounted and 
 relatively little going on in the system in general, but I didn't get 
 fanatical about leveling the test conditions.  
 
 Erase/clock/bus/xfer size    Read bytes/sec   Write bytes/sec
 
   32/30MHz/1bit/1-block          864452          333324 
   32/15MHz/4bit/1-block          975780          346738 
 
 8192/30MHz/1bit/1-block          647241           24211 
 8192/15MHz/4bit/1-block          722659           24253 
 
   32/30MHz/1bit/64-block        2192806         1775660 
   32/15MHz/4bit/64-block        3075302         1775302 
 
 8192/30MHz/1bit/64-block        2133880         1503959 
 8192/15MHz/4bit/64-block        2947133         1753540 
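 
 To put the table in relative terms, a small helper like the following 
 can turn any two rows into a speedup factor.  The helper name is 
 hypothetical; it uses integer math and returns the ratio scaled by 100 
 so no floating point is needed.  

```c
#include <assert.h>
#include <stdint.h>

/*
 * Speedup of "after" over "before", scaled by 100 and truncated
 * (e.g. 140 means 1.40x).  Inputs are bytes/sec from the table.
 */
static uint64_t
speedup_x100(uint64_t after, uint64_t before)
{
	assert(before > 0);
	return (after * 100u / before);
}
```

 Fed with the figures above, this gives about 62x for writes on the 
 8192 card when going from 1-block to 64-block at 30MHz/1bit 
 (1503959 vs 24211), but only about 1.4x for 64-block reads on the 32 
 card when going from 30MHz/1bit to 15MHz/4bit (3075302 vs 2192806).  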
 
 
 Another crude little benchmark...  Right after booting I logged on as 
 root and did a vmstat -i, so this should roughly represent how many 
 interrupts it took to get booted and launch root's shell (all read 
 IO, there are no writable filesystems mounted, both done at 
 30 MHz/1-bit): 
 
 vmstat -i                  interrupt         total       rate
 original driver (1-block)  irq10: at91_mci0  42384       1284
 patched  driver (64-block) irq10: at91_mci0   1365         52
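 
 One plausible reading of the interrupt drop (about 31x here) is simply 
 that fewer commands are issued.  The sketch below counts the commands 
 needed to move a given number of bytes; the function name is 
 hypothetical, and it assumes 512-byte blocks and that every command 
 carries as many blocks as allowed, which real boot-time reads will not 
 always do.  

```c
#include <assert.h>
#include <stdint.h>

#define SD_BLOCK_BYTES	512u

/*
 * Number of data-transfer commands needed to move nbytes when each
 * command carries at most blocks_per_cmd 512-byte blocks.
 */
static uint64_t
sd_cmds_for_transfer(uint64_t nbytes, uint32_t blocks_per_cmd)
{
	uint64_t nblocks;

	assert(blocks_per_cmd > 0);
	/* Round up to whole blocks, then to whole commands. */
	nblocks = (nbytes + SD_BLOCK_BYTES - 1) / SD_BLOCK_BYTES;
	return ((nblocks + blocks_per_cmd - 1) / blocks_per_cmd);
}
```

 For a 10 MiB transfer like the dd tests, that is 20480 commands at 
 1 block each versus 320 commands at 64 blocks each.  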
 
 
 Based on the benchmark results, and the fact that I don't really have 
 the time to take on the dev/mmc changes right now, I think we should 
 adopt the multi-block patches and stick with 30 MHz/1-bit for now.  
 Maybe I can find some time later this year to get dev/mmc working 
 better with high-speed mode (without accidentally breaking the sdhci 
 world, which I don't know enough about right now).  
 
 


