From owner-freebsd-scsi@FreeBSD.ORG Sun May 29 16:01:22 2005
Date: Sun, 29 May 2005 18:01:16 +0200
From: Andreas Braukmann
To: freebsd-scsi@freebsd.org
Message-ID: <77BED263063C97B2EA9A75CB@[192.168.225.210]>
Subject: aac based raid, aaccli, enforce "optimal" state for "failed" containers

Hi there,

we've got a machine with 4 spindles connected to an Adaptec 2120S controller.
Adverse circumstances (supposedly bad cabling and the SCSI bus going haywire) led to one of the RAID volumes being marked as "failed". This is a RAID 10 container. The second volume (RAID-5 across all spindles) is fine.

Would it be possible (using aaccli, preferably) to reset the "failed" container to an "ok" state?

Thanks,
Andreas

From owner-freebsd-scsi@FreeBSD.ORG Sun May 29 21:09:57 2005
Date: Sun, 29 May 2005 13:45:40 -0700
From: Tim Spencer
To: freebsd-scsi@freebsd.org
Message-Id: <047FCAD3-439C-47EC-A4E4-2253A25CCB39@hungry.com>
Subject: isp driver + clustered NetApp failover = strangeness

Hey there!

I've got a pair of NetApp 940c heads that are exporting LUNs out to a bunch of FreeBSD hosts with qla2312 cards in them over a Brocade 2850 FC switch. Everything works great until I test out standby cluster failover on the NetApps.
To quote NetApp's manual:

"Port A on each target HBA operates as the active port, and Port B operates as a standby port. When the cluster is in normal operation, Port A provides access to local LUNs, and Port B is not available to the initiator. When one filer fails, Port B on the partner filer becomes active and provides access to the LUNs on the failed filer. The Port B assumes the WWPN of the Port A on the failed partner."

So, to me, it sounds like this _should_ work for our FreeBSD hosts, which don't support multipathing, and thus must use this sort of failover. When the failover happens, the WWPN moves over to port B on the other head, perhaps a link reset happens or something, and everything keeps going.

Well, it turns out that this is only partly true. If there is no I/O happening during the swap, then everything does seem to work out fine. But if there is I/O going on, then things quickly go downhill. I see this:

May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:35:58 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:36:50 toc2-db1 /kernel: (da0:isp0:0:1:0): isp0: watchdog timeout for handle 0x1f3

After this, sometimes the system locks up completely, and sometimes the system is operational, but anything that has to do with the filesystem in question hangs, etc.

So here's my question: Is this something that we can make work? I really don't know all that much about the lower levels of how Fibre Channel and the isp driver work, but it sounds like this ought to work. Is there anybody out there who knows more about the driver who might be willing to work on this? I can't guarantee anything, but our company does support FreeBSD development, and we might be able to swing some cash towards somebody who would be able to make this work.

Is there anything else that I can include to help figure out what is going wrong?
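
For anyone trying to reproduce this, one way to collect the relevant messages is to filter the kernel log for the two events quoted above. The following is only a rough sketch in Python: it assumes the log lines have exactly the shape shown in this message, and the names `LINE_RE`, `classify` and `summarize` are made up for illustration; they are not part of the isp driver or any FreeBSD tool.

```python
import re
from collections import Counter

# Assumed line shape, taken from the messages quoted above:
#   May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+/kernel:\s+"
    r"\((?P<periph>\w+):(?P<sim>\w+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\):\s+"
    r"(?P<msg>.*)$"
)

# The three lines from the report, used here as sample input.
SAMPLE = [
    "May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack",
    "May 28 19:35:58 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack",
    "May 28 19:36:50 toc2-db1 /kernel: (da0:isp0:0:1:0): isp0: watchdog timeout for handle 0x1f3",
]


def classify(msg):
    """Map a message body to a coarse event kind (hypothetical labels)."""
    if "Invalidating pack" in msg:
        return "invalidate"
    if "watchdog timeout" in msg:
        return "watchdog"
    return "other"


def summarize(lines):
    """Return (events, counts) for every line matching the assumed shape.

    events is a list of (timestamp, peripheral, sim, kind) tuples;
    counts is a Counter keyed by kind. Non-matching lines are skipped.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        events.append(
            (m.group("stamp"), m.group("periph"), m.group("sim"), classify(m.group("msg")))
        )
    counts = Counter(kind for _, _, _, kind in events)
    return events, counts


if __name__ == "__main__":
    events, counts = summarize(SAMPLE)
    for stamp, periph, sim, kind in events:
        print(stamp, periph, sim, kind)
    print(dict(counts))
```

Feeding it the lines of /var/log/messages instead of `SAMPLE` would give a timeline of pack invalidations and watchdog timeouts per device around a failover test, which may be easier to attach to a report than the raw log.
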
Below, I include dmesg from one of the hosts so you can see what sort of system is running this, but if you've got more things that I can do to diagnose this, let me know.

Thanks, and have fun!
-tspencer

: toc2-db2 []$; dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 4.11-STABLE #0: Wed May 25 05:39:38 GMT 2005
    root@:/usr/src/sys/compile/BSD4.11.GODSPEED-SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2786.13-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory = 3221094400 (3145600K bytes)
avail memory = 3134447616 (3060984K bytes)
Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
 cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
 cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
 cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000
 io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
Preloaded elf kernel "kernel" at 0x9f3d2000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 9 entries at 0x9f0fc410
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
IOAPIC #1 intpin 3 -> irq 2
IOAPIC #1 intpin 7 -> irq 7
IOAPIC #1 intpin 11 -> irq 10
pci0: on pcib0
pci0: (vendor=0x1028, dev=0x000c) at 4.0 irq 2
pci0: (vendor=0x1028, dev=0x0008) at 4.1 irq 7
pci0: (vendor=0x1028, dev=0x000d) at 4.2 irq 10
pci0: at 14.0
atapci0: port 0x8b0-0x8bf, 0x8d8-0x8db,0x8d0-0x8d7,0x8c8-0x8cb,0x8c0-0x8c7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: at 15.2 irq 5
isab0: at device 15.3 on pci0
isa0: on isab0
pcib1: on motherboard
IOAPIC #1 intpin 4 -> irq 11
pci1: on pcib1
fxp0: port 0xdcc0-0xdcff mem 0xfcf00000-0xfcf1ffff,0xfcf20000-0xfcf20fff irq 11 at device 8.0 on pci1
fxp0: Ethernet address 00:0e:0c:62:9e:17
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib2: on motherboard
IOAPIC #1 intpin 8 -> irq 13
pci2: on pcib2
isp0: port 0xcc00-0xccff mem 0xfcd00000-0xfcd00fff irq 13 at device 6.0 on pci2
isp0: bad execution throttle of 0- using 16
pcib3: on motherboard
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 13 -> irq 17
pci3: on pcib3
bge0: mem 0xfcb10000-0xfcb1ffff irq 16 at device 6.0 on pci3
bge0: Ethernet address: 00:11:43:34:7b:3f
miibus1: on bge0
brgphy0: on miibus1
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: mem 0xfcb00000-0xfcb0ffff irq 17 at device 8.0 on pci3
bge1: Ethernet address: 00:11:43:34:7b:40
miibus2: on bge1
brgphy1: on miibus2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pcib4: on motherboard
IOAPIC #1 intpin 14 -> irq 18
pci4: on pcib4
pcib8: at device 8.0 on pci4
pci5: on pcib8
aac0: mem 0xf0000000-0xf7ffffff irq 18 at device 8.1 on pci4
aac0: i960RX 100MHz, 118MB cache memory, optional battery present
aac0: Kernel 2.8-0, Build 6089, S/N 74a1d3
aac0: Supported Options=275c
pcib5: on motherboard
pci6: on pcib5
pcib6: on motherboard
pci7: on pcib6
pcib7: on motherboard
pci8: on pcib7
orm0: