Date:      Wed, 24 Jun 2009 02:32:24 -0700
From:      freebsd@t41t.com
To:        FreeBSD-Questions@freebsd.org
Subject:   Re: you're not going to believe this.
Message-ID:  <20090624093223.GF3468@ece.pdx.edu>
In-Reply-To: <20090624010922.GA24335@thought.org>
References:  <20090622230729.GA20167@thought.org> <a9f4a3860906231222r65faaf1cia6b68186c79f4791@mail.gmail.com> <20090623201041.GA23561@thought.org> <20090623205944.GA43982@Grumpy.DynDNS.org> <20090624010922.GA24335@thought.org>

Gary Kline:
> Http://www.mydigitaldiscount.com/SPD/runcore-64gb-pata-mini-pci-e-pcie-ssd-for-asus-eee-pc-901-and-1000---backorder-runcore-64gb-pata-mini-pci-e-pcie-ssd-for-asus-eee-pc-901-and-1000--800008DB-1224129741.jsp

> ... statement that this device lasts ten years before it fails to
> hold state.

Roland Smith:
> The big difference is that it is much easier to tweak and change
> algorithms when doing it in software.

Wojciech Puchar:
> This flash chips have to emulate hard drive, which slows them down
> manyfold

> ... has acceptable lifetime/reliability, and uses less power/generates
> less heat than traditional platter HD ...

> [F]or example wear leveling and emulation small blocks requires moving
> of data within flash, this lowers both performance and lifetime.

I should know better, but I'm going to reply anyway.

First, be careful about statements like "10 years before it fails to hold
state." Usually that means if you write data to the device and put it on a
shelf, you've got 10 years before the data is unreadable. Being marketing
figures, these numbers are naturally stretched and inflated. Data retention
is strongly dependent on ambient temperature, among other things. More
to the point, that's a statistic you probably don't care about, because
who's going to buy a $200+ SSD and then leave it on a shelf
for a decade? The number you probably care about is how long _in active
use_ the drive will last, and that's probably _not_ 10 years. The primary
source of degradation (and eventually, failure) is writes, so minimizing
writes will probably extend the drive's life. NAND Flash, as used in SSDs,
is typically rated for (order of magnitude...) 10k write cycles. How many 
writes that gives you, once you put a bunch of chips together into an SSD 
and do wear leveling and all that, is anyone's guess. (The manufacturer 
probably knows, but won't tell you.)
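
To put rough numbers on that (every figure below is an assumption for
the sake of arithmetic, not a spec for the drive Gary linked): take
64 GB of NAND rated at 10k erase cycles, and assume wear leveling plus
block moves cost you a write-amplification factor of about 3.

    /*
     * Hypothetical back-of-envelope endurance estimate.  Every number
     * here is an illustrative assumption, not vendor data.
     */
    #include <stdio.h>

    int
    main(void)
    {
        double capacity_gb     = 64.0;     /* assumed raw capacity        */
        double cycles          = 10000.0;  /* assumed rated erase cycles  */
        double write_amp       = 3.0;      /* assumed write amplification */
        double host_gb_per_day = 10.0;     /* assumed daily host writes   */

        /* Host data the device can absorb before the cells wear out. */
        double total_host_gb = capacity_gb * cycles / write_amp;
        double days          = total_host_gb / host_gb_per_day;

        printf("~%.0f GB of host writes, ~%.1f years at %.0f GB/day\n",
            total_host_gb, days / 365.0, host_gb_per_day);
        return (0);
    }

With those made-up inputs you get a couple hundred terabytes of host
writes, i.e. decades at 10 GB/day. Change the write-amplification
factor or the daily write load and the answer swings wildly, which is
exactly why "anyone's guess" is the honest summary.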

Current NAND Flash chips do ECC and wear leveling transparently. Moving
a block carries a significant time cost, so it's usually done when a
block is already being erased. This eliminates half the time
because you already know half the data trivially (it's being erased),
and erase is already a long operation, so making it a little longer
is less noticeable. Implementing wear leveling in OS-level software
isn't feasible. As I mentioned, wear leveling happens within the chip,
so the OS doesn't even know a block swap has occurred. (As an extension
of this, the OS doesn't know what the write count is, per block.) The OS
doesn't have access to physical parameters of the Flash cells (parameters
the chip itself can measure on-the-fly) to know when a swap needs to
occur. Depending on implementation, the OS may not even realize when (or
how often) an ECC correction occurs. Wear leveling algorithms are anything 
but trivial, are usually closely guarded trade secrets, and depend
heavily on manufacturing process parameters that are themselves trade
secrets. (This is not to say a Flash-specific file system doesn't have
value... you can probably get a lot just by caching writes as long as
possible, and putting commonly-modified pieces of data near each other
in the address space so they can be written together when updating is
needed.)
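
As a toy illustration of the "cache writes as long as possible" idea
(purely hypothetical code, not taken from any real file system): buffer
dirty pages in RAM and only touch the device when the buffer fills or
on an explicit flush, so many logical writes collapse into a few
physical ones.

    /*
     * Minimal write-coalescing sketch.  The "device" is imaginary and
     * flush_cache() just models one contiguous write; the page size
     * and cache depth are assumed values.
     */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE   2048        /* assumed NAND page size        */
    #define CACHE_PAGES 4           /* flush after this many pages   */

    static char cache[CACHE_PAGES][PAGE_SIZE];
    static int  cached;             /* pages currently buffered      */
    static long flushes;            /* physical writes issued so far */

    /* Pretend to write the whole cache as one contiguous operation. */
    static void
    flush_cache(void)
    {
        if (cached == 0)
            return;
        flushes++;
        printf("flush #%ld: %d page(s) in one write\n", flushes, cached);
        cached = 0;
    }

    /* Buffer a page; only hit the "device" when the cache is full. */
    static void
    cached_write(const char *data)
    {
        if (cached == CACHE_PAGES)
            flush_cache();
        memcpy(cache[cached++], data, PAGE_SIZE);
    }

    int
    main(void)
    {
        char page[PAGE_SIZE];
        int i;

        memset(page, 0xAA, sizeof(page));
        for (i = 0; i < 10; i++)    /* ten logical writes ...         */
            cached_write(page);
        flush_cache();              /* ... become three device writes */
        return (0);
    }

A real flash-aware file system also has to worry about ordering, crash
consistency, and which data is actually hot, but the basic win is the
same: fewer, larger, better-placed writes.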

The SATA bridge does have a non-zero impact on read and write
times. However, that impact is nowhere near "manyfold" the inherent
read/write time. In fact, it's pretty close to negligible. Most of the
time is eaten up by multi-level cell sensing/placement, ECC correction,
and as mentioned above, wear leveling (for writes).
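
For a rough sense of scale (assumed, typical-ish figures, not
measurements of this or any particular drive): pushing a 4 KB page
across a 3 Gb/s SATA link takes on the order of 15 microseconds, while
programming an MLC NAND page commonly takes several hundred
microseconds and a block erase runs into the milliseconds. The bridge
and protocol overhead is a few percent of a write, not a multiplier.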

The lifetime and reliability of SSDs are less-than-or-equal-to the
lifetime and reliability of spinning magnetic drives, so don't buy an SSD
for that. Whether SSDs use less power is an open question. There's a lot
of data going either way. The last comparison I saw suggested spinning 
drives average less power than their SSD counterparts. In any event, it's 
not clear-cut yet. SSDs probably do generate less heat (but I've not seen 
data on that). Of course, the access time on an SSD is order(s) of 
magnitude less than for a spinning drive, and that's cause enough for 
lots of people to buy one.

And finally, wear leveling is just a fact of life with Flash. It's not a
symptom of emulating a spinning drive or some particular block size. Wear
leveling won't go away (and you won't gain back that part of the write
time) by inventing a non-SATA, Flash-specific HD interface that nobody
supports yet. In fact, Gary's link talks about a device with a PCIe
interface, so the whole issue of acting like a spinning drive isn't
applicable.




