Date:      Thu, 28 Jul 2011 03:32:34 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: Questions about erasing an ssd to restore performance under FreeBSD
Message-ID:  <20110728103234.GA33275@icarus.home.lan>
In-Reply-To: <FD3A11BEFD064193AA24C1DF09EDD719@multiplay.co.uk>
References:  <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <FD3A11BEFD064193AA24C1DF09EDD719@multiplay.co.uk>

On Thu, Jul 28, 2011 at 10:10:03AM +0100, Steven Hartland wrote:
> ----- Original Message ----- From: "Jeremy Chadwick"
> <freebsd@jdc.parodius.com>
>
>> [snipping parts about BIO_DELETE and details pertaining to ZFS,
>> hoping TRIM support gets added eventually, or possibly through GEOM
>> directly someday...]
> 
> Didn't realise there was a /dev/urandom, but /dev/random was very much
> limited, which reading the man page makes sense now, something to remember
> for next time :)

Well, on FreeBSD /dev/urandom is a symlink to /dev/random.  I've
discussed in the past why I use /dev/urandom instead of /dev/random (I
happen to work in a heterogeneous OS environment at work, where urandom
and random are different things).

I was mainly curious why you were using if=/some/actual/file rather than
if=/dev/urandom directly.  'tis okay, not of much importance.

> >Worth reading is this whitepaper, by the way.
> >
> >http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf
> >
> >By the way, your above dd is the first time I've seen an SSD write
> >1.8GBytes in 0.5 seconds.  Though I cannot rely entirely on benchmark
> >reviews, the one I just skimmed indicated a fresh drive of your model
> >tends to write, sequentially, at about 60MBytes/sec..
> 
> Hmm, I must have copied the wrong results there somewhere, here's
> the correct one which shows 180MB/s, which is still lower than the spec's
> 285MB/s but it's random data so not benefiting as much as it can from the
> compression on the SandForce controller, most definitely not 1.8GB/s ;-)
> 
> dd if=/data/test of=/ssd/test bs=1m
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes transferred in 5.542815 secs (189177506 bytes/sec)
> 
> As an update I've managed to get the drive back to full performance using
> Parted Magic boot cd, but using the manual process shown on the following
> page "instead" of using Disk Erase utility. Not sure why this didn't work
> yet.
> https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase

Okay, so it sounds like what happened -- if I understand correctly -- is
that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of
data copied to it.  It still had 60% free space available.  Afterwards, the
SSD performance for writes really plummeted (~20MByte/sec), but reads
were still decent.  Performing an actual ATA-level secure erase brought
the drive back to normal write performance (~190MByte/sec).
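
As a side note on the numbers: dd reports bytes per second, so the
"180MB/s" in the quoted text and the ~190MByte/sec above are the same
measurement expressed in binary and decimal megabytes.  A minimal sketch
of the conversion, using awk for the floating-point maths (the function
name rate_from_dd is made up for illustration, it is not part of dd):

```shell
#!/bin/sh
# Convert a dd result (bytes transferred, elapsed seconds) into
# decimal MB/s (10^6 bytes) and binary MiB/s (2^20 bytes).
# rate_from_dd is a hypothetical helper name, not part of dd.
rate_from_dd() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN {
        bps = bytes / secs
        printf "%.1f MB/s %.1f MiB/s\n", bps / 1000000, bps / 1048576
    }'
}

# Figures from the dd run quoted above.
rate_from_dd 1048576000 5.542815
```

With the quoted figures this prints "189.2 MB/s 180.4 MiB/s".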

If all of that is correct, then I would say the issue is that the
internal GC on the Corsair SSD in question sucks.  With 60% of the drive
still available, performance should not have dropped to such an abysmal
rate; the FTL and wear levelling should have, ideally, dealt with this
just fine.  But it didn't.

Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever,
that's an engineering discussion for elsewhere) lacks TRIM.  The
underlying filesystem is therefore unable to tell the drive "hey, these
LBAs aren't used any more, you can consider them free and erase the
NAND block once everything in it is unused" (NAND is erased a whole
block at a time, not a page at a time).  The FTL therefore has to treat
every LBA you've written to as still in use; erasing a block which still
held data the filesystem considers live would result in loss of data.

So in summary I'm not too surprised by this situation happening, but I
*AM* surprised at just how horrible writes became for you.  The white
paper I linked you goes over this to some degree -- it talks about how
everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks
or talks about how horribly they perform when very little free space is
available, or if the GC is badly implemented.  Maybe Corsair's GC is
badly implemented -- I don't know.

I would see if there are any F/W updates for that model of drive.  The
firmware controls the GC model/method.  Otherwise, if this issue is
reproducible, I'll add this model of Corsair SSD to my list of drives to
avoid.

> Obviously having to boot to an alternative OS is far from ideal, so could
> really do with a BSD solution that has the ability to secure erase the disk,
> to restore performance, given the lack of TRIM in ZFS.
> 
> Is this something that could be added to camcontrol or may be its already
> possible with "camcontrol cmd"?

Is it possible to accomplish Secure Erase via "camcontrol cmd" with
ada(4)?  Yes, but the procedure will be extremely painful, drawn out,
and very error-prone.

Given that you've followed the procedure on the Linux hdparm/ATA Secure
Erase web page, you're aware of the security and "locked" status one has
to deal with using password-protection to accomplish the erase.  hdparm
makes this easy because it's just a bunch of command-line flags; the
"heavy lifting" on the ATA layer is done elsewhere.  With "camcontrol
cmd", you get to submit the raw ATA CDB yourself, multiple times, at
different phases.  Just how familiar with the ATA protocol are you?  :-)

Why I sound paranoid: a typo could potentially "brick" your drive.  If
you issue a set-password on the drive, ***ALL*** LBA accesses (read and
write) return I/O errors from that point forward.  Make a typo in the
password, formulate the CDB wrong, whatever -- suddenly you have a drive
that you can't access or use any more because the password was wrong,
etc...  If the user doesn't truly understand what they're doing
(including the formulation of the CDB), then they're going to panic.

camcontrol and atacontrol could both be modified to do the heavy
lifting, making similar options/arguments that would mimic hdparm in
operation.  This would greatly diminish the risks, but the *EXACT
PROCEDURE* would need to be explained in the man page.  But keep reading
for why that may not be enough.

I've been in the situation where I've gone through the procedure you
followed on said web page, only to run into a quirk with the ATA/IDE
subsystem on Windows XP, requiring a power-cycle of the system.  The
secure erase finished, but I was panicking when I saw the drive spitting
out I/O errors on every LBA.  I realised that I needed to unlock the
drive using --security-unlock then disable security by using
--security-disable.  Once I did that it was fine.  The web page omits
that part, which matters in an emergency or when anomalies occur.  This
ordeal happened to me today, no joke, while tinkering with my new Intel
510 SSD.  So here's a better page:

http://tinyapps.org/docs/wipe_drives_hdparm.html
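
To make the order of operations concrete, here is a rough sketch of the
hdparm sequence plus the unlock/disable recovery described above.  It
only prints the commands (a dry run) and never touches a disk.  The
device path and password are placeholders, the function names are made
up, and the flags should be checked against your own hdparm man page
before anything is run for real:

```shell
#!/bin/sh
# Dry-run sketch of the hdparm secure-erase sequence, including the
# recovery steps.  Each step is printed, not executed.
# DEV and PASS are placeholders (assumptions), not real values.
DEV="${DEV:-/dev/sdX}"
PASS="${PASS:-PLACEHOLDER}"

secure_erase_plan() {
    # 1. Inspect the security state; the drive must not be "frozen".
    echo "hdparm -I $DEV"
    # 2. Set a temporary user password; this enables security.
    echo "hdparm --user-master u --security-set-pass $PASS $DEV"
    # 3. Issue the erase using that same password.
    echo "hdparm --user-master u --security-erase $PASS $DEV"
}

recovery_plan() {
    # If the drive returns I/O errors on every LBA afterwards,
    # unlock it first, then disable security again.
    echo "hdparm --user-master u --security-unlock $PASS $DEV"
    echo "hdparm --user-master u --security-disable $PASS $DEV"
}

secure_erase_plan
recovery_plan
```

Printing the plan first means a typo in the password or device shows up
on screen before it can lock a drive.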

Why am I pointing this out?  Because, in effect, an entire "HOW TO DO
THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be
added to camcontrol/atacontrol to ensure people don't end up with
"bricked" drives and blame FreeBSD.  Trust me, it will happen.  Give
users tools to shoot themselves in the foot and they will do so.

Furthermore, SCSI drives (which is what camcontrol has historically been
for up until recently) have a completely different secure erase CDB
command for them.  ATA has SECURITY ERASE UNIT, SCSI has SECURITY
INITIALIZE -- and in the SCSI realm, this feature is optional!  So
there's that error-prone issue as well.  Do you know how many times I've
issued "camcontrol inquiry" instead of "camcontrol identify" on my
ada(4)-based systems?  Too many.  Food for thought.  :-)
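
As an illustration of how a tool could guard against that sort of
mix-up, here is a small sketch that picks the camcontrol subcommand from
the device name.  It assumes the name maps cleanly onto the protocol
(ada* is ATA, da* is SCSI), which is not always true: a SATA disk behind
a SAS HBA shows up as da, for example.  cam_info_cmd is a hypothetical
helper and only prints the command it would run:

```shell
#!/bin/sh
# Pick the camcontrol subcommand from the device name:
# ATA disks (ada*) take "identify", SCSI disks (da*) take "inquiry".
# cam_info_cmd is a hypothetical helper; it prints the command only.
cam_info_cmd() {
    case "$1" in
        ada[0-9]*) echo "camcontrol identify $1" ;;
        da[0-9]*)  echo "camcontrol inquiry $1" ;;
        *)         echo "unknown device type: $1" >&2; return 1 ;;
    esac
}

cam_info_cmd ada0
cam_info_cmd da0
```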

Anyway, this is probably the only time you will ever find me saying
this, but: if improving camcontrol/atacontrol to accomplish the above is
what you want, patches are welcome.  I could try to spend some time on
this if there is great interest in the community for such (I'm more
familiar with atacontrol's code given my SMART work in the past), and I
do have an unused Intel 320-series SSD which I can test with.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



