Date:      Fri, 17 Oct 2008 23:32:25 +0200
From:      Miroslav Lachman <000.fbsd@quip.cz>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: Recommendations for servers running SATA drives [hot-swap]
Message-ID:  <48F90469.7020503@quip.cz>
In-Reply-To: <20081017150616.GA24321@icarus.home.lan>
References:  <20080927202250.GA60980@icarus.home.lan> <48E0DB7E.20804@quip.cz> <1222699642.24339.12.camel@buffy.york.ac.uk> <48E0F36C.1080400@quip.cz> <20080929153220.GA11459@icarus.home.lan> <48F7964C.4060309@quip.cz> <20081016202322.GA2429@icarus.home.lan> <48F87C0E.8060404@quip.cz> <20081017120858.GA20746@icarus.home.lan> <48F89C8D.5020301@quip.cz> <20081017150616.GA24321@icarus.home.lan>

Jeremy Chadwick wrote:
> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
> 
>>Jeremy Chadwick wrote:
>>
>>>On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
>>>
>>>>Jeremy Chadwick wrote:
>>>>
>>>>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
>>>>>
>>>>>
>>>>>
>>>>>>Today I was replacing a disk in a Sun Fire X2100 M2, so I tried
>>>>>>hot-swapping. It was as you said: atacontrol detach ata3, replace
>>>>>>the HDD, atacontrol attach ata3, and the new disk is in the system.
>>>>>>I tried it 3 times to be sure that it was not a coincidence - no
>>>>>>panic was produced ;o)
>>>>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
>>>>>>7.0 i386 works.
>>>>>
>>>>>
>>>>>That's excellent news.  So it seems possibly the problem I was seeing
>>>>>was with "reinit" causing some sort of chaos.  I'll have to check things
>>>>>on my testbox here at home to see how I caused the panic last time.
>>>>>
>>>>>Thanks for providing feedback, as usual!  :-)
>>>>
>>>>Unfortunately there is one problem - I see a lot of interrupts after  
>>>>disk swapping (about 193k of atapci1)
>>>>
>>>>Interrupts
>>>>197k total
>>>>    ohci0 21
>>>>    ehci0 22
>>>>193k atapci1 23
>>>>2001 cpu0: time
>>>>  1 bge1 273
>>>>2001 cpu1: time
>>>
>>>
>>>Okay, so it looks like the interrupt rate on atapci1 after swapping is
>>>going crazy.  What you're showing there looks like heavily modified
>>>vmstat -i output.
>>
>>What is shown is manually cropped from systat -vm; I'll try vmstat -i
>>next time. ;)
>>
>>
>>>>Full output of systat -vm 2 is attached.
>>>>
>>>>top showed 50% interrupt (CPU state) and a load of 1 until I
>>>>rebooted the machine (I can provide MRTG graphs). The system was not
>>>>under production load, but almost idle. (I will put it into
>>>>production tomorrow).
>>>>After the reboot, everything is OK.
>>>
>>>
>>>And this box is running the ATA patch Andrey provided, yes?
>>
>>It is a clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
>>
>>
>>>>Can somebody test hot-swapping with SATA drives and confirm this   
>>>>behavior? (I can't test it now, because machine is in datacenter)
>>>
>>>
>>>I can test it on my P4SCE box.
>>>
>>>I'll check the interrupt rates after each step of the hot-swap to see
>>>if/when the problem starts.
>>
>>I'll check the interrupts next time too and will post results to this  
>>thread.
> 
> 
> As promised, here are notes from my testing:
> 
> 
> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
> set to "Auto", which was causing PATA emulation to happen on the SATA
> controller, e.g.  disk #0 == ata0-master, disk #1 == ata0-slave.
> 
> I changed the BIOS option from Auto to "SATA Enhanced", and now the
> disks show up on their own channels, e.g.  disk #0 == ata2-master, disk
> #1 == ata3-master.
> 
> Here's the applicable data.  Note that this kernel ***DOES*** include
> Andrey's ATA patch:
> 
> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008     root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX  i386
> 
> atapci1: <Intel ICH5 SATA150 controller> port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
> atapci1: [ITHREAD]
> ata2: <ATA channel 0> on atapci1
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci1
> ata3: [ITHREAD]
> 
> SATA controller is on IRQ 18.
> 
> ad4: 114473MB <Seagate ST3120026AS 3.05> at ata2-master SATA150
> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
> 
> ATA channel 2:
>     Master:  ad4 <ST3120026AS/3.05> Serial ATA v1.0
>     Slave:       no device present
> ATA channel 3:
>     Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
>     Slave:       no device present
> 
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838    49%    /
> devfs                 1       1         0   100%    /dev
> /dev/ad4s1e      507630      12    467008     0%    /tmp
> /dev/ad4s1f   108498334 2944826  96873642     3%    /usr
> /dev/ad4s1d     2008622   32360   1815574     2%    /var
> /dev/ad6s1d   236511738       4 217590796     0%    /hotswap
> 
> testbox# vmstat -i
> interrupt                          total       rate
> irq4: sio0                          1398         34
> irq6: fdc0                            10          0
> irq15: ata1                           58          1
> irq18: atapci1                       945         23
> irq23: em1                             8          0
> cpu0: timer                        80033       1952
> cpu1: timer                        79808       1946
> Total                             162260       3957
> 
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      2671         11
> 
> At this point I wanted to see what happened if I just reattached without
> any physical changes to the SATA bus.
> 
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
> Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
> Slave:       no device present
> 
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      2764          9
> testbox# mount /dev/ad6s1d /hotswap
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      2779          8
> 
> Now we're going to try detaching *without* umounting the filesystem,
> then reattaching to see what happens.  Based on what I've seen and
> others have reported in the past, this should panic the kernel.
> Supposedly this problem is fixed on CURRENT.
> 
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
> 
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
> Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
> Slave:       no device present
> 
> testbox# df -k
> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
> /dev/ad4s1a      507630  230182    236838    49%    /
> devfs                 1       1         0   100%    /dev
> /dev/ad4s1e      507630      12    467008     0%    /tmp
> /dev/ad4s1f   108498334 2944826  96873642     3%    /usr
> /dev/ad4s1d     2008622   32360   1815574     2%    /var
> /dev/ad6s1d   236511738       4 217590796     0%    /hotswap
> 
> testbox# ls -l /hotswap
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xc0
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc0503ca7
> stack pointer           = 0x28:0xe6310a5c
> frame pointer           = 0x28:0xe6310a5c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 795 (ls)
> [thread pid 795 tid 100043 ]
> Stopped at      dev2udev+0x11:  movl    0xc0(%eax),%eax
> 
> db> bt
> Tracing pid 795 tid 100043 td 0xc3dcc460
> dev2udev(3287166208,3861973668,3228755039,3861973872,3286025312,...) at dev2udev+17
> ufs_getattr(3861973664,3861973800,3227504003,3229700640,3861973664,...) at ufs_getattr+222
> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3288955040,3861973684,...) at VOP_GETATTR_APV+68
> vn_stat(3288955040,3861973908,3286230784,0,3286025312,...) at vn_stat+73
> kern_lstat(3286025312,135344488,0,3861974040,3861974064,...) at kern_lstat+147
> lstat(3286025312,3861974268,8,3861974328,3861974316,...) at lstat+43
> syscall(3861974328) at syscall+814
> Xint0x80_syscall() at Xint0x80_syscall+32
> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
> 
> Yup, there's the panic.  :-)
> 
> I rebooted the box from db, brought the system up in single-user, fsck'd
> all the disks/filesystems (no anomalies were found), and rebooted the
> box once more.
> 
> Now we're going to do everything properly: unmount /hotswap, detach,
> yank the disk and insert a new Maxtor hard disk, attach, and see what
> happens.
> 
> testbox# umount /hotswap
> testbox# atacontrol detach ata3
> subdisk6: detached
> ad6: detached
> 
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      1174          6
> 
> I've now removed the disk physically from the machine.  Let's check
> interrupts again.
> 
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      1185          4
> 
> Now the new Maxtor disk has been inserted.  LEDs for the SATA hot-swap
> backplane lit up for about 5-6 seconds, then went off.  Let's check
> interrupts at this point:
> 
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      1193          3
> 
> Now let's attach.  Note that there is no filesystem on this disk (it's
> completely blank), so there's nothing to mount.
> 
> testbox# atacontrol attach ata3
> ata3: [ITHREAD]
> ad6: 286188MB <Maxtor 6L300S0 BANC1G20> at ata3-master SATA150
> Master:  ad6 <Maxtor 6L300S0/BANC1G20> Serial ATA v1.0
> Slave:       no device present
> 
> And now we check interrupts:
> 
> testbox# vmstat -i | grep atapci1
> irq18: atapci1                      1258          2
> 
> Looks fine to me.
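
The panic in the quoted walkthrough came from detaching ata3 while /hotswap was still mounted from ad6. A small guard that refuses to go ahead when df still lists a filesystem on the disk might help avoid that by accident. The following is only a minimal sketch in Python: the function name mounted_from and the SAMPLE_DF constant are made up for illustration, the channel-to-disk mapping (ata3 -> ad6 here) has to be supplied by the caller, and mount points containing spaces are not handled.

```python
import re

# df -k output copied from the transcript above, used as sample input.
SAMPLE_DF = """\
Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
/dev/ad4s1a      507630  230182    236838    49%    /
devfs                 1       1         0   100%    /dev
/dev/ad4s1e      507630      12    467008     0%    /tmp
/dev/ad4s1f   108498334 2944826  96873642     3%    /usr
/dev/ad4s1d     2008622   32360   1815574     2%    /var
/dev/ad6s1d   236511738       4 217590796     0%    /hotswap
"""


def mounted_from(df_output, disk):
    """Return (device, mountpoint) pairs for filesystems that live on `disk`.

    `disk` is a bare device name such as "ad6". The negative lookahead keeps
    "ad6" from also matching a hypothetical "ad60".
    """
    pattern = re.compile(r"/dev/" + re.escape(disk) + r"(?!\d)")
    hits = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a regular df data row
        device, mountpoint = fields[0], fields[-1]
        if pattern.match(device):
            hits.append((device, mountpoint))
    return hits


if __name__ == "__main__":
    still_mounted = mounted_from(SAMPLE_DF, "ad6")
    if still_mounted:
        print("do not detach yet, still mounted:", still_mounted)
    else:
        print("nothing mounted from ad6")
```

On a live system the text would come from running df -k rather than from the sample constant, and the check would be done right before atacontrol detach.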

Thank you for your time, testing and reporting detailed results!
I will investigate my case sometime in the future (if time permits).
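
One note for whoever repeats the interrupt check: the rate column of vmstat -i appears to be an average since boot (the totals and rates quoted above are consistent with that), so it reacts slowly to a sudden interrupt storm. Comparing the total column of two samples gives the rate over just that interval. Below is a minimal sketch in Python; the names irq_total and rate_between are made up for illustration, and the 60-second interval in the demo is an assumption, because the thread does not record how much time passed between samples.

```python
# Two "vmstat -i | grep atapci1" lines copied from the transcript above.
BEFORE = "irq18: atapci1                      2671         11"
AFTER = "irq18: atapci1                      2764          9"


def irq_total(vmstat_output, device):
    """Return the 'total' column for `device` (e.g. "atapci1"), or None."""
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Data rows end with two numeric columns: total and rate.
        if len(fields) >= 3 and fields[-2].isdigit() and device in fields[:-2]:
            return int(fields[-2])
    return None


def rate_between(total_before, total_after, seconds):
    """Interrupts per second over the interval between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (total_after - total_before) / seconds


if __name__ == "__main__":
    before = irq_total(BEFORE, "atapci1")
    after = irq_total(AFTER, "atapci1")
    # 60.0 is a hypothetical interval, not a value taken from the thread.
    print(rate_between(before, after, 60.0))
```

Sampling once before the detach and once some time after the attach, with the real elapsed time, should make a jump like the 193k I saw stand out immediately.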

Miroslav Lachman


