FreeBSD Mail Archives

Date:      Mon, 6 Feb 2012 16:15:49 +0000
From:      Ryan Merrell <ryan.merrell@careerstep.com>
To:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Multiple errors on server -- Where do I start looking?
Message-ID:  <2187B4E2EDE5044CA48617AC0C8D6E1B0E2AFA38@MBX021-W3-CA-1.exch021.domain.local>


I've run into some error messages on my server that are beyond my ability to
interpret, so I'm hoping some of you can help me out. I've already posted
this on the forums at
http://forums.freebsd.org/showthread.php?p=165258#post165258 but since this
is affecting our business, I'm trying to reach out to a broader audience and
hopefully get this thing resolved.

We have an Intel modular blade server. The chassis has 2x 3-disk RAID(5)
arrays. Volume 1 is what the OS (FreeBSD 7.2) is installed on and Volume 2
is mounted at /usr. These two volumes are da0 and da1.

I got email notifications saying the web host I run in a jail hosted on this
server was down. I tried to SSH into it, but it failed. I pinged it and got
a 50% return rate. So I logged in to the management blade and started a
virtual KVM session to get into the blade. Once I was in the base host blade,
I ran cat on dmesg.today and got a slew of errors. Here we go:

(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retries Exhausted

As mentioned before, our two volumes are da0 and da1. /dev lists da2 and da3
as well, but I have no idea what they are. How do I figure out what da3 is,
and what do the above error messages say about it? Someone on the forum
asked me if the two volumes are on the same controller, and the answer is
yes, they are.
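
For anyone triaging a log like the one above, here is a minimal Python sketch that splits each CAM error line into its parts and decodes the READ(10) CDB. It assumes the usual CAM path layout of (device:controller:bus:target:lun) and the standard READ(10) byte layout; the function names and the SAMPLE text are illustrative only and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# A few lines copied from the dmesg excerpt in this message.
SAMPLE = """\
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Retries Exhausted
"""

# Assumed layout: (device:controller:bus:target:lun): message
CAM_LINE = re.compile(r"^\((\w+):(\w+):(\d+):(\d+):(\d+)\): (.*)$")


def parse_cam_line(line):
    """Return the fields of one CAM error line, or None if it does not match."""
    m = CAM_LINE.match(line.strip())
    if m is None:
        return None
    dev, ctrl, bus, target, lun, msg = m.groups()
    return {
        "device": dev,
        "controller": ctrl,
        "bus": int(bus),
        "target": int(target),
        "lun": int(lun),
        "message": msg,
    }


def decode_read10(cdb_text):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Bytes 2-5 hold the logical block address and bytes 7-8 the
    transfer length in blocks, both big-endian.
    """
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, blocks


def summarize(text):
    """Count matching log lines per (device, controller) pair."""
    counts = Counter()
    for line in text.splitlines():
        rec = parse_cam_line(line)
        if rec is not None:
            counts[(rec["device"], rec["controller"])] += 1
    return counts
```

Under those assumptions, the sample lines all point at target 6, LUN 1 on mpt0, and the failing command is a one-block read at block 0. Mapping that path back to a physical device is still a job for the system's own tools.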


GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
Trying to mount root from ufs:/dev/da0s1a
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.

Was root unmounted? What's going on here? Obviously there's some issue with
da0, which is mounted at /. The server has been up and running fine, so why
am I seeing "Trying to mount root from ufs:/dev/da0s1a"?

pid 93248 (httpd), uid 80: exited on signal 10
pid 95624 (httpd), uid 80: exited on signal 10
pid 97956 (httpd), uid 80: exited on signal 10
pid 97935 (httpd), uid 80: exited on signal 10
pid 96603 (httpd), uid 80: exited on signal 10
pid 93210 (httpd), uid 80: exited on signal 10
pid 98246 (httpd), uid 80: exited on signal 10

This is apparently what's killing our web server. Apache receives a signal 10
and quits. Everything I've read says it's an issue with Apache trying to
access RAM that it shouldn't or that doesn't exist. Is there something else
in the above da0 or da3 errors that would cause a SIGBUS on httpd?
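
To see how many processes died and from which signal, a short Python sketch like the following can tally the "exited on signal" lines. The signal table is a hand-written subset of FreeBSD's numbering (signal 10 is SIGBUS on FreeBSD, but not on every OS), and the function name is illustrative.

```python
import re
from collections import Counter

# Hand-copied subset of FreeBSD signal numbers; other systems number
# some of these differently, so do not rely on the local signal module.
FREEBSD_SIGNALS = {
    6: "SIGABRT",
    9: "SIGKILL",
    10: "SIGBUS",
    11: "SIGSEGV",
    15: "SIGTERM",
}

# Matches kernel lines such as:
#   pid 93248 (httpd), uid 80: exited on signal 10
EXIT_LINE = re.compile(r"^pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")


def tally_signal_exits(text):
    """Count signal exits per (program, signal name) pair."""
    counts = Counter()
    for line in text.splitlines():
        m = EXIT_LINE.match(line.strip())
        if m is None:
            continue
        _pid, prog, _uid, signo = m.groups()
        name = FREEBSD_SIGNALS.get(int(signo), "signal " + signo)
        counts[(prog, name)] += 1
    return counts
```

Run against the saved dmesg output, this gives one count per program and signal, which makes it easier to compare the timing of the httpd exits with the disk errors.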

After that, the log goes back and repeats that first block of da3 errors a
bunch more times. The server was down for about 10 minutes and then it just
fixed itself. It's weird because it seems the Apache child processes all get
killed off by the SIGBUS but the parent process doesn't, so once the problem
works itself out, it continues operating as normal without me having to
restart the daemon or anything.

The management blade in the server chassis is reporting that all the
hardware is fine. We have a second blade that boots off of a second
partition in Volume 1 and it doesn't have any problems at all.

I'm at a loss here!


Ryan Merrell



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2187B4E2EDE5044CA48617AC0C8D6E1B0E2AFA38>
