From: Ryan Merrell
To: freebsd-questions@freebsd.org
Date: Mon, 6 Feb 2012 16:15:49 +0000
Subject: Multiple errors on server -- Where do I start looking?

I've run into some error messages on my server that are beyond my skill level to interpret, so I'm hoping some of you can help me out. I've already posted this on the forums at http://forums.freebsd.org/showthread.php?p=165258#post165258, but since this is affecting our business, I'm trying to reach a broader audience and hopefully get this resolved.
We have an Intel modular blade server. The chassis has two 3-disk RAID 5 arrays. The OS (FreeBSD 7.2) is installed on Volume 1, and Volume 2 is mounted at /usr. These two volumes are da0 and da1.

I got email notifications saying that the web host I run in a jail on this server was down. I tried to SSH into it, but that failed. When I pinged it, only 50% of the pings came back. So I logged in to the management blade and started a virtual KVM session to get into the blade. Once I was on the basehost blade, I ran "cat dmesg.today" and got a slew of errors. Here we go:

(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retries Exhausted

As mentioned above, our two volumes are da0 and da1. /dev also lists da2 and da3, but I have no idea what they are. How do I figure out what da3 is, and what do the error messages above say about it? Someone on the forum asked whether the two volumes are on the same controller. They are.

GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
Trying to mount root from ufs:/dev/da0s1a
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.

Was root unmounted? What's going on here? Obviously there's some issue with da0, which is mounted at /. The server had been up and running fine, so why am I seeing "Trying to mount root from ufs:/dev/da0s1a"?

pid 93248 (httpd), uid 80: exited on signal 10
pid 95624 (httpd), uid 80: exited on signal 10
pid 97956 (httpd), uid 80: exited on signal 10
pid 97935 (httpd), uid 80: exited on signal 10
pid 96603 (httpd), uid 80: exited on signal 10
pid 93210 (httpd), uid 80: exited on signal 10
pid 98246 (httpd), uid 80: exited on signal 10

This is apparently what's killing our web server: the Apache processes receive signal 10 (SIGBUS) and exit. Everything I've read says this happens when Apache tries to access memory that it shouldn't, or that doesn't exist. Could the da0 or da3 errors above cause a SIGBUS in httpd?

After that, the log goes back and repeats the first block of da3 errors many more times. The server was down for about 10 minutes, and then it just fixed itself. Oddly, the SIGBUS seems to kill off all the Apache child processes but not the parent process. So once the problem works itself out, Apache resumes normal operation without my having to restart the daemon or anything.

The management blade in the server chassis reports that all the hardware is fine. We have a second blade that boots off a second partition in Volume 1, and it has no problems at all. I'm at a loss here!

Ryan Merrell