Date: Fri, 4 Mar 2011 20:20:11 GMT
From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: freebsd-arm@FreeBSD.org
Subject: Re: arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
Message-ID: <201103042020.p24KKBL7007848@freefall.freebsd.org>

The following reply was made to PR arm/155214; it has been noted by GNATS.

From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: ticso@cicely.de
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: arm/155214: [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
Date: Fri, 04 Mar 2011 13:10:12 -0700

On Thu, 2011-03-03 at 00:52 +0100, Bernd Walter wrote:
> On Wed, Mar 02, 2011 at 02:53:18PM -0700, Ian Lepore wrote:
> >
> > >Number:         155214
> > >Category:       arm
> > >Synopsis:       [patch] MMC/SD IO slow on Atmel ARM with modern large SD cards
> > >Confidential:   no
> > >Severity:       serious
> > >Priority:       medium
> > >Responsible:    freebsd-arm
> > >State:          open
> > >Quarter:
> > >Keywords:
> > >Date-Required:
> > >Class:          sw-bug
> > >Submitter-Id:   current-users
> > >Arrival-Date:   Wed Mar 02 22:10:10 UTC 2011
> > >Closed-Date:
> > >Last-Modified:
> > >Originator:     Ian Lepore <freebsd@damnhippie.dyndns.org>
> > >Release:        FreeBSD 8.2-RC3 arm
> > >Organization:
> > none
> > >Environment:
> > FreeBSD dvb 8.2-RC3 FreeBSD 8.2-RC3 #49: Tue Feb 15 22:52:14 UTC 2011 root@revolution.hippie.lan:/usr/obj/arm/usr/src/sys/DVB arm
> >
> > Included patch is against -current even though the problem was first seen on
> > 8.2-RC3
> >
> > The problem was seen on AT91RM9200 hardware, but presumably also affects the
> > SAM9 series which uses the same driver code.
> >
> > >Description:
> > With the latest generation of large-capacity SD cards, write speeds as low as
> > 20 kbytes/sec are seen.  These modern cards have erase-block sizes as large as
> > 8192K (compared to 32K typical on previous generations).  The at91_mci driver
> > does only single-sector IO; apparently this requires the SD card to internally
> > perform an expensive read-erase-modify-write cycle for each 512 byte block
> > written to the card.
>
> The complete details of this problem are completely known.
> However the RM9200 has many hardware problems to be worked around and
> so far noone actually did.
> Your patch is quite large, so I would like to ask you explicitly:
> Did you test your patch with an AT91RM9200 system?
> You did enable multisector support for reading and (more important) for
> writing?
> But you didn't activate 4bit mode?
> With 4bit mode there is no hardware bug, but when the driver was written
> is was just done in a lazy way because activating 4bit on SD cards require
> special handling - in the meantime the SD layer itself was extracted and
> has 4bit support, but the at91_mci driver was never updated to use that.
>
> PS: I'm very pleased to see your work since SD write speed was a
> major show stopper for some applications
>

I made some time today to try 4-bit mode in the mci driver, using
8.2-RELEASE as a testbed.  I quickly determined that just enabling 4-bit
mode corrupts read data severely enough to cause "root mount error" at
boot virtually every time.  Occasionally it'll manage to mount root but
then lock up or panic during rc-file processing.  It does this both with
the original driver and with my patched driver configured for
single-block or multi-block operation.

After some experimenting to find the cause of the corrupted data, I
realized we're violating the SD spec by running the bus at 30MHz -- the
spec says 25MHz max until you use CMD6 to switch to high-speed mode, if
the card supports it.  Our next lower available speed is 15MHz, and when
I set that as the max speed, 4-bit works perfectly, both in the original
driver and with my patches in single- or multi-block operation.

(In my patched driver I had to add a controller reset following a
multi-block read stop, similar to the one after a multi-block write, to
avoid occasional spurious data CRC errors in 4-bit mode.  The data we
want is read correctly; the CRC error happens on the block that's still
coming in as the stop command is being issued.  I'm not sure why this
only happens in 4-bit mode.)
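
The clock arithmetic described above can be sketched as follows.  This is
only an illustration, not code from the at91_mci driver: it assumes the
controller derives the bus clock as MCK / (2 * (CLKDIV + 1)) from a 60MHz
master clock, which matches the 30MHz and 15MHz steps mentioned here but
is not stated in this message.  The function and constant names are
hypothetical.

```python
# Sketch only: MCK_HZ, the divider formula and all names are assumptions
# made for illustration; none of this is taken from the at91_mci driver.

MCK_HZ = 60_000_000             # assumed master clock (gives 30 and 15 MHz steps)
SD_DEFAULT_MAX_HZ = 25_000_000  # spec limit before CMD6 selects high-speed mode


def available_clocks(mck_hz, max_div=255):
    """Bus clocks reachable as MCK / (2 * (CLKDIV + 1)), fastest first."""
    return [mck_hz // (2 * (div + 1)) for div in range(max_div + 1)]


def pick_clock(requested_hz, mck_hz=MCK_HZ):
    """Round down to the fastest supported clock not above the request."""
    for hz in available_clocks(mck_hz):
        if hz <= requested_hz:
            return hz
    raise ValueError("request is below the slowest supported clock")


if __name__ == "__main__":
    # A request capped at the 25 MHz default-speed limit falls to 15 MHz.
    print(pick_clock(SD_DEFAULT_MAX_HZ))
    # Only a request of 30 MHz or more selects the 30 MHz step.
    print(pick_clock(30_000_000))
```

Under these assumptions there is no divider setting between 15MHz and
30MHz, which is why honouring the 25MHz limit means dropping all the way
to 15MHz.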

Since we've been getting away with 30MHz/1-bit for years, I surmise that
any card that is capable of delivering 25MHz/4-bit is also capable of
doing 30MHz/1-bit even though that's a slight violation of the spec.
But 30MHz/4-bit appears to be enough of a violation that even modern
cards don't keep up.  (When looking at dumps of the corrupted read data,
an old card had a lot of corruption, like 20% of the data was read
wrong.  A modern card had just a few bits wrong out of every few kbytes
read.)

Since 15MHz/4-bit is still twice the raw bus throughput of 30MHz/1-bit,
I decided to do some crude benchmarking to see if it's worth the trouble
of making 4-bit work correctly.  The results appear below.  In summary,
there is definitely a benefit to using 4-bit transfers, but the
improvement isn't nearly as dramatic as the change from single- to
multi-block IO.

Supporting 4-bit transfers properly will require some changes in
dev/mmc.  It doesn't currently use CMD6 to switch to high-speed mode at
all.  I'm assuming that if we update it to do so, we'll have no problem
running at 30MHz/4-bit.  There'll also need to be some fixes in the
routine that calculates the speed to run at, because right now it
doesn't account for the 25MHz speed limit set by the spec before
switching to high-speed (which is why we end up running at 30MHz).  The
mci driver will also need some updates to round the clock speed
requested by the upper layers down to the next lower supported speed,
but it would probably be good to have a bit of a hack in there as well
to allow 30MHz operation in 1-bit mode, since folks have come to expect
that and it seems to work ok.

About the benchmarks...  I tested with two different cards, noted below
by their erase block sizes.  The card with the 32-block erase size is a
SanDisk 512MB card from several years ago.  The card with the 8192-block
erase size is a SanDisk 2GB card purchased recently.
The older card does not claim to support high-speed mode; the newer card
does (but of course we don't switch the card to hs mode).  I tested each
card with each combo of bus speed, bus width, and single- versus
multi-block IO.

All of the results below are with my patched driver.  I also briefly
tested the original unpatched 8.2 driver and found the results very much
in line with the 1-block results from my patched driver.  (The patched
driver performs a little better even in single-block mode, probably
because it gets the same work done with fewer interrupts.)

Read and write speeds are as reported by these commands:

  dd if=/dev/mmcsd0s2a of=/dev/null bs=1m count=10
  dd if=/dev/zero of=/dev/mmcsd0s2a bs=1m count=10

Each test was run several times immediately after rebooting; median
values are reported.  There were no writable filesystems mounted and
relatively little going on in the system in general, but I didn't get
fanatical about leveling the test conditions.

Erase/clock/bus/xfer size    Read bytes/sec   Write bytes/sec
32/30MHz/1bit/1-block              864452            333324
32/15MHz/4bit/1-block              975780            346738
8192/30MHz/1bit/1-block            647241             24211
8192/15MHz/4bit/1-block            722659             24253

32/30MHz/1bit/64-block            2192806           1775660
32/15MHz/4bit/64-block            3075302           1775302
8192/30MHz/1bit/64-block          2133880           1503959
8192/15MHz/4bit/64-block          2947133           1753540

Another crude little benchmark...  right after booting I logged on as
root immediately and did a vmstat -i, so this should roughly represent
how many interrupts it took to get booted and launch root's shell (all
read IO, there are no writable filesystems mounted, both done at
30MHz/1-bit):

vmstat -i                   interrupt            total   rate
original driver (1-block)   irq10: at91_mci0     42384   1284
patched driver (64-block)   irq10: at91_mci0      1365     52

Based on the benchmark results, and the fact that I don't really have
the time to take on the dev/mmc changes right now, I think we should
adopt the multi-block patches and stick with 30MHz/1-bit for now.
Maybe I can find some time later this year to get dev/mmc working better with high-speed mode (without accidentally breaking the sdhci world, which I don't know enough about right now).
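
As a rough reading of the benchmark figures, the ratios behind that
conclusion can be computed directly.  The sketch below copies the
numbers from the table and the vmstat output in this message; the
dictionary layout and the ratio() helper are illustrative only and are
not part of the patch.

```python
# Figures are copied from the benchmark table (bytes/sec).  The layout
# and the ratio() helper are hypothetical, chosen only for illustration.
RESULTS = {
    # (erase size, clock, bus width, blocks per transfer): (read, write)
    (32, "30MHz", "1bit", 1): (864452, 333324),
    (32, "15MHz", "4bit", 1): (975780, 346738),
    (8192, "30MHz", "1bit", 1): (647241, 24211),
    (8192, "15MHz", "4bit", 1): (722659, 24253),
    (32, "30MHz", "1bit", 64): (2192806, 1775660),
    (32, "15MHz", "4bit", 64): (3075302, 1775302),
    (8192, "30MHz", "1bit", 64): (2133880, 1503959),
    (8192, "15MHz", "4bit", 64): (2947133, 1753540),
}
READ, WRITE = 0, 1


def ratio(new_key, old_key, column):
    """Throughput of one configuration relative to another."""
    return RESULTS[new_key][column] / RESULTS[old_key][column]


# Newer card, 30MHz/1-bit: 64-block writes versus single-block writes.
multi_vs_single_write = ratio(
    (8192, "30MHz", "1bit", 64), (8192, "30MHz", "1bit", 1), WRITE)

# Older card, 64-block reads: 15MHz/4-bit versus 30MHz/1-bit.
wide_vs_narrow_read = ratio(
    (32, "15MHz", "4bit", 64), (32, "30MHz", "1bit", 64), READ)

# Interrupt totals from the vmstat -i comparison.
interrupt_reduction = 42384 / 1365

if __name__ == "__main__":
    print(f"multi-block write gain: {multi_vs_single_write:.1f}x")
    print(f"4-bit read gain:        {wide_vs_narrow_read:.2f}x")
    print(f"interrupt reduction:    {interrupt_reduction:.1f}x")
```

On these figures multi-block writes on the newer card come out roughly
62 times faster, the 4-bit read gain is about 1.4 times, and the boot
takes about 31 times fewer interrupts.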