Date:      Wed, 14 Dec 2011 01:26:24 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        "Patrick M. Hausen" <hausen@punkt.de>
Cc:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: Hot-changing a failed HDD with ahci.ko
Message-ID:  <20111214092624.GA96153@icarus.home.lan>
In-Reply-To: <B0A139EC-F6A3-48DA-A347-21A5ED0507BF@punkt.de>
References:  <B0A139EC-F6A3-48DA-A347-21A5ED0507BF@punkt.de>

On Wed, Dec 14, 2011 at 09:29:52AM +0100, Patrick M. Hausen wrote:
> Hi, all,
> 
> while most cheap servers with SATA disks are not really hot-plug
> capable, changing a failed disk (either gmirror or zfs) was possible
> without a reboot by executing e.g. if ad4 failed:
> 
> atacontrol detach ata2
> <change disks>
> atacontrol attach ata2
> 
> What is the proper equivalent for ahci, ada0 and camcontrol?

None is needed: yank the disk, reinsert it, wait a few seconds, done.
For validation, with full output, hardware details, etc., see:

http://koitsu.wordpress.com/2010/07/22/freebsd-and-zfs-hot-swapping-sata-disks-with-ahci/

I've made videos to demonstrate this as well, but need to edit them and
upload them.
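If you want to script the "wait a few seconds" step rather than eyeball it, here is a minimal sketch. It is not part of FreeBSD or of the procedure in the post above. `wait_for_disk` is a hypothetical helper name, and it assumes that `camcontrol devlist` lists each attached device's peripheral names in parentheses at the end of its line, e.g. `(ada0,pass0)`.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): poll `camcontrol devlist`
# until a device name such as "ada0" shows up after a disk swap.
# Usage: wait_for_disk DEVICE [TIMEOUT_SECONDS]
# Returns 0 once DEVICE is listed, 1 if TIMEOUT_SECONDS pass first.
wait_for_disk() {
    local dev="$1"
    local timeout="${2:-30}"
    local elapsed=0
    while :; do
        # Assumes devlist lines end in "(ada0,pass0)"; require a "(" or ","
        # before the name and a "," or ")" after it, so "ada1" does not
        # match "ada10".
        if camcontrol devlist | grep -Eq "[(,]${dev}[,)]"; then
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

For example, `wait_for_disk ada0 30` after reinserting the disk that used to be ada0 returns 0 as soon as the device is listed again. A non-zero return means the disk never reappeared within the timeout.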

> Stop unit commands seem not to work with SATA disks, so I
> tried:
> 
> <forcefully unplug "broken" disk>
> -> system logs about lost device, so far so good
> <insert new disk>
> camcontrol reset 1
> camcontrol devlist
> -> disk still not there
> camcontrol rescan 1
> -> command hangs
> <login to a second session, system still responsive>
> shutdown -r now
> -> system panics, eventually reboots

Before you yanked the disk, were any non-ZFS filesystems mounted?

This sounds similar to what happens if you yank a classic SATA disk
from a non-AHCI system, or under ata(4), without detaching it first, or,
on some systems, when SATA disks are yanked without a hot-swap
backplane.

> I can provide details about the panic if someone is interested,
> but maybe there is a proper procedure already, which I simply missed.
> 
> System is RELENG_8_2 amd64.
> ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf020-0xf03f mem 0xfb921000-0xfb9217ff irq 19 at device 31.2 on pci0
> ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
> ada0: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich1 bus 0 scbus2 target 0 lun 0
> ada1: <ST31000340NS SN05> ATA-8 SATA 1.x device
> ada1: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)

You might try booting RELENG_9 (which uses ahci.ko by default, so no
need to mess about) from a LiveCD or equivalent and attempting the same
thing.  I'm left wondering whether RELENG_8 (not a typo for the
RELENG_9 reference above) has changes that you do not have in
RELENG_8_2.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



