Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 26 Oct 2008 05:53:35 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Miroslav Lachman <000.fbsd@quip.cz>
Cc:        Gavin Atkinson <gavin@FreeBSD.org>, freebsd-stable@FreeBSD.org
Subject:   Re: Recommendations for servers running SATA drives [hot-swap]
Message-ID:  <20081026125335.GA89389@icarus.home.lan>
In-Reply-To: <49046596.3080605@quip.cz>
References:  <1222699642.24339.12.camel@buffy.york.ac.uk> <48E0F36C.1080400@quip.cz> <20080929153220.GA11459@icarus.home.lan> <48F7964C.4060309@quip.cz> <20081016202322.GA2429@icarus.home.lan> <48F87C0E.8060404@quip.cz> <20081017120858.GA20746@icarus.home.lan> <48F89C8D.5020301@quip.cz> <20081017150616.GA24321@icarus.home.lan> <49046596.3080605@quip.cz>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Oct 26, 2008 at 01:41:58PM +0100, Miroslav Lachman wrote:
> Jeremy Chadwick wrote:
>> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
>>
>>> Jeremy Chadwick wrote:
>>>
>>>> On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
>>>>
>>>>> Jeremy Chadwick wrote:
>>>>>
>>>>>> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
>>>>>>
>>>>>>> Today I was replacing disk in one Sun Fire X2100 M2 so I 
>>>>>>> tried    hot-swapping. It was as you said: atacontrol detach 
>>>>>>> ata3, replace the HDD, atacontrol attach ata3 and new disk is 
>>>>>>> in the system. I tried it 3  times to be sure that it was not 
>>>>>>> coincidence - no panic was produced ;o)
>>>>>>> So in this case, hot-swapping on Sun Fire X2100 M2 with 
>>>>>>> FreeBSD 7.0 i386  works.
>>>>>>
>>>>>>
>>>>>> That's excellent news.  So it seems possibly the problem I was seeing
>>>>>> was with "reinit" causing some sort of chaos.  I'll have to check things
>>>>>> on my testbox here at home to see how I caused the panic last time.
>>>>>>
>>>>>> Thanks for providing feedback, as usual!  :-)
>>>>>
>>>>> Unfortunately there is one problem - I see a lot of interrupts 
>>>>> after  disk swapping (about 193k of atapci1)
>>>>>
>>>>> Interrupts
>>>>> 197k total
>>>>>    ohci0 21
>>>>>    ehci0 22
>>>>> 193k atapci1 23
>>>>> 2001 cpu0: time
>>>>>  1 bge1 273
>>>>> 2001 cpu1: time
>>>>
>>>>
>>>> Okay, so it looks like the interrupt rate on atapci1 after swapping is
>>>> going crazy.  What you're showing there looks like heavily modified
>>>> vmstat -i output.
>>>
>>> The shown is manually cropped from systat -vm, I'll try vmstat -i 
>>> next  time. ;)
>>>
>>>
>>>>> Full output of systat -vm 2 is attached.
>>>>>
>>>>> It is shown in top as 50% interrupt (CPU state) and load 1 until 
>>>>> I   rebooted the machine (I can provide MRTG graphs). The system 
>>>>> was not in  production load, but almost idle. (I will put it in 
>>>>> production tomorrow).
>>>>> After reboot, everything is OK.
>>>>
>>>>
>>>> And this box is running the ATA patch Andrey provided, yes?
>>>
>>> It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
>>>
>>>
>>>>> Can somebody test hot-swapping with SATA drives and confirm this  
>>>>>  behavior? (I can't test it now, because machine is in 
>>>>> datacenter)
>>>>
>>>>
>>>> I can test it on my P4SCE box.
>>>>
>>>> I'll check the interrupt rates after each step of the hot-swap to see
>>>> if/when the problem starts.
>>>
>>> I'll check the interrupts next time too and will post results to this 
>>>  thread.
>>
>>
>> As promised, here are notes from my testing:
>>
>>
>> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
>> set to "Auto", which was causing PATA emulation to happen on the SATA
>> controller, e.g.  disk #0 == ata0-master, disk #1 == ata0-slave.
>>
>> I changed the BIOS option from Auto to "SATA Enhanced", and now the
>> disks show up on their own channels, e.g.  disk #0 == ata2-master, disk
>> #1 == ata3-master.
>>
>> Here's the applicable data.  Note that this kernel ***DOES*** include
>> Andrey's ATA patch:
>>
>> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008     root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX  i386
>>
>> atapci1: <Intel ICH5 SATA150 controller> port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
>> atapci1: [ITHREAD]
>> ata2: <ATA channel 0> on atapci1
>> ata2: [ITHREAD]
>> ata3: <ATA channel 1> on atapci1
>> ata3: [ITHREAD]
>>
>> SATA controller is on IRQ 18.
>>
>> ad4: 114473MB <Seagate ST3120026AS 3.05> at ata2-master SATA150
>> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
>>
>> ATA channel 2:
>>     Master:  ad4 <ST3120026AS/3.05> Serial ATA v1.0
>>     Slave:       no device present
>> ATA channel 3:
>>     Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
>>     Slave:       no device present
>>
>> testbox# df -k
>> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
>> /dev/ad4s1a      507630  230182    236838    49%    /
>> devfs                 1       1         0   100%    /dev
>> /dev/ad4s1e      507630      12    467008     0%    /tmp
>> /dev/ad4s1f   108498334 2944826  96873642     3%    /usr
>> /dev/ad4s1d     2008622   32360   1815574     2%    /var
>> /dev/ad6s1d   236511738       4 217590796     0%    /hotswap
>>
>> testbox# vmstat -i
>> interrupt                          total       rate
>> irq4: sio0                          1398         34
>> irq6: fdc0                            10          0
>> irq15: ata1                           58          1
>> irq18: atapci1                       945         23
>> irq23: em1                             8          0
>> cpu0: timer                        80033       1952
>> cpu1: timer                        79808       1946
>> Total                             162260       3957
>>
>> testbox# umount /hotswap
>> testbox# atacontrol detach ata3
>> subdisk6: detached
>> ad6: detached
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      2671         11
>>
>> At this point I wanted to see what happened if I just reattached without
>> any physical changes to the SATA bus.
>>
>> testbox# atacontrol attach ata3
>> ata3: [ITHREAD]
>> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
>> Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
>> Slave:       no device present
>>
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      2764          9
>> testbox# mount /dev/ad6s1d /hotswap
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      2779          8
>>
>> Now we're going to try detaching *without* umounting the filesystem,
>> then reattaching to see what happens.  Based on what I've seen and
>> others have reported in the past, this should panic the kernel.
>> Supposedly this problem is fixed on CURRENT.
>>
>> testbox# atacontrol detach ata3
>> subdisk6: detached
>> ad6: detached
>>
>> testbox# atacontrol attach ata3
>> ata3: [ITHREAD]
>> ad6: 238474MB <WDC WD2500KS-00MJB0 02.01C03> at ata3-master SATA150
>> Master:  ad6 <WDC WD2500KS-00MJB0/02.01C03> Serial ATA II
>> Slave:       no device present
>>
>> testbox# df -k
>> Filesystem  1024-blocks    Used     Avail Capacity  Mounted on
>> /dev/ad4s1a      507630  230182    236838    49%    /
>> devfs                 1       1         0   100%    /dev
>> /dev/ad4s1e      507630      12    467008     0%    /tmp
>> /dev/ad4s1f   108498334 2944826  96873642     3%    /usr
>> /dev/ad4s1d     2008622   32360   1815574     2%    /var
>> /dev/ad6s1d   236511738       4 217590796     0%    /hotswap
>>
>> testbox# ls -l /hotswap
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0xc0
>> fault code              = supervisor read, page not present
>> instruction pointer     = 0x20:0xc0503ca7
>> stack pointer           = 0x28:0xe6310a5c
>> frame pointer           = 0x28:0xe6310a5c
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, def32 1, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 795 (ls)
>> [thread pid 795 tid 100043 ]
>> Stopped at      dev2udev+0x11:  movl    0xc0(%eax),%eax
>>
>> db> bt
>> Tracing pid 795 tid 100043 td 0xc3dcc460
>> dev2udev(3287166208,3861973668,3228755039,3861973872,3286025312,...) at dev2udev+17
>> ufs_getattr(3861973664,3861973800,3227504003,3229700640,3861973664,...) at ufs_getattr+222
>> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3288955040,3861973684,...) at VOP_GETATTR_APV+68
>> vn_stat(3288955040,3861973908,3286230784,0,3286025312,...) at vn_stat+73
>> kern_lstat(3286025312,135344488,0,3861974040,3861974064,...) at kern_lstat+147
>> lstat(3286025312,3861974268,8,3861974328,3861974316,...) at lstat+43
>> syscall(3861974328) at syscall+814
>> Xint0x80_syscall() at Xint0x80_syscall+32
>> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
>>
>> Yup, there's the panic.  :-)
>>
>> I rebooted the box from db, brought the system up in single-user, fsck'd
>> all the disks/filesystems (no anomalies were found), and rebooted the
>> box once more.
>>
>> Now we're going to do everything properly: unmount /hotswap, detach,
>> yank the disk and insert a new Maxtor hard disk, attach, and see what
>> happens.
>>
>> testbox# umount /hotswap
>> testbox# atacontrol detach ata3
>> subdisk6: detached
>> ad6: detached
>>
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      1174          6
>>
>> I've now removed the disk physically from the machine.  Let's check
>> interrupts again.
>>
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      1185          4
>>
>> Now the new Maxtor disk has been inserted.  LEDs for the SATA hot-swap
>> backplane lit up for about 5-6 seconds, then went off.  Let's check
>> interrupts at this point:
>>
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      1193          3
>>
>> Now let's attach.  Note that there is no filesystem on this disk (it's
>> completely blank), so there's nothing to mount.
>>
>> testbox# atacontrol attach ata3
>> ata3: [ITHREAD]
>> ad6: 286188MB <Maxtor 6L300S0 BANC1G20> at ata3-master SATA150
>> Master:  ad6 <Maxtor 6L300S0/BANC1G20> Serial ATA v1.0
>> Slave:       no device present
>>
>> And now we check interrupts:
>>
>> testbox# vmstat -i | grep atapci1
>> irq18: atapci1                      1258          2
>
>
> I played again with hot-swapping disks in Sun Fire X2100 M2 on FreeBSD  
> 7.0-RELEASE-p5 i386 without ATA patches.
> Both disks (ad4 + ad6) are in gmirror. There were high interrupts load  
> again! I tracked it to the point of pulling out the disk.
> Interrupt was OK after 'atacontrol detach', but rise up after disk was  
> removed. When the disk is inserted back (same disk), interrupts are  
> going to normal rate without need to reboot.
> I tried it three times and behavior was always the same.
> It can be related to the use of gmirror.

Interesting -- yeah, that's the one difference (besides hardware)
between your tests and mine: you're using gmirror, I'm not.

I'll take some time this weekend (or the upcoming weekend) to set up
gmirror on the above test box, and see if I can reproduce what you're
seeing.  This will be my first experience with gmirror.

> Side note: If the disk was detached by 'atacontrol detach ata2' without  
> removing from gmirror (without gmirror remove or gmirror deactivate) and  
> then pulled out + inserted back, it was automagically attached without  
> need of 'atacontrol attach ata2' and gmirror synchronization was  
> autostarted.

This could mean that gmirror is constantly "polling" the underlying
device layer for certain things, which might explain the high interrupt
rate you're seeing.  We should probably involve pjd@ in this discussion.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081026125335.GA89389>