Date: Mon, 6 Feb 2012 16:15:49 +0000
From: Ryan Merrell <ryan.merrell@careerstep.com>
To: "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject: Multiple errors on server -- Where do I start looking?
Message-ID: <2187B4E2EDE5044CA48617AC0C8D6E1B0E2AFA38@MBX021-W3-CA-1.exch021.domain.local>

I've run into some error messages on my server that are beyond my ability to interpret, so I'm hoping some of you can help me out. I've already posted this on the forums at http://forums.freebsd.org/showthread.php?p=165258#post165258 but since this is affecting our business, I'm trying to reach out to a broader audience and hopefully get this thing resolved.

We have an Intel modular blade server. The chassis has 2x 3-disk RAID(5) arrays. Volume 1 is what the OS (FreeBSD 7.2) is installed on and Volume 2 is mounted at /usr. These two volumes are da0 and da1.

I got email notifications saying the web host I run in a jail hosted on this server was down. I try to SSH into it, but it fails. I ping it and I get a 50% return rate. So I log in to the management blade and start a virtual KVM session to get into the blade. Once I'm into the basehost blade, I cat dmesg.today and get a slew of errors. Here we go..

(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retrying Command (per Sense Data)
(da3:mpt0:0:6:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
(da3:mpt0:0:6:1): CAM Status: SCSI Status Error
(da3:mpt0:0:6:1): SCSI Status: Check Condition
(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b
(da3:mpt0:0:6:1): Logical unit not accessible, target port in standby state
(da3:mpt0:0:6:1): Retries Exhausted

As mentioned before, our two volumes are da0 and da1. /dev lists da2 and da3 as well, but I have no idea what they are. How do I figure out what da3 is and what do the above error messages say about it?
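
As a reading aid for the da3 lines above, here is a minimal Python sketch that pulls the fields out of them. It is only an illustration: the function names are hypothetical, and it assumes the usual CAM path form (periph:controller:bus:target:lun), the standard READ(10) CDB layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8), and that the numbers after "asc:" are the hexadecimal ASC and ASCQ.

```python
import re

# Hypothetical helpers for reading the kernel messages quoted above.
CAM_PATH_RE = re.compile(r"\((\w+):(\w+):(\d+):(\d+):(\d+)\)")
ASC_RE = re.compile(r"asc:([0-9a-fA-F]+),([0-9a-fA-F]+)")


def parse_cam_path(line):
    """Split a prefix such as '(da3:mpt0:0:6:1)' into its parts."""
    m = CAM_PATH_RE.search(line)
    if m is None:
        raise ValueError("no CAM path found")
    periph, controller, bus, target, lun = m.groups()
    return {
        "periph": periph,
        "controller": controller,
        "bus": int(bus),
        "target": int(target),
        "lun": int(lun),
    }


def decode_read10(cdb_text):
    """Decode the hex bytes printed after 'CDB:' for a READ(10) command."""
    cdb = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


def parse_asc(line):
    """Return (ASC, ASCQ) as integers from text such as 'asc:4,b'."""
    m = ASC_RE.search(line)
    if m is None:
        raise ValueError("no asc field found")
    return int(m.group(1), 16), int(m.group(2), 16)


if __name__ == "__main__":
    print(parse_cam_path("(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b"))
    print(decode_read10("28 0 0 0 0 0 0 0 1 0"))
    print(parse_asc("(da3:mpt0:0:6:1): ILLEGAL REQUEST asc:4,b"))
```

Read this way, the failing command is a one-block read of LBA 0 sent to target 6, LUN 1 on mpt0, and asc:4,b is ASC 0x04 / ASCQ 0x0b, the code behind the "target port in standby state" text. It does not by itself say what da3 is.
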

Someone on the forum asked me if the two volumes are on the same controller and the answer is yes, they are.

GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
Trying to mount root from ufs:/dev/da0s1a
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label for provider da0s1a is ufsid/4aeb03874c64d9f1.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.
GEOM_LABEL: Label for provider da0s1e is ufsid/4aeb0387d999941a.
GEOM_LABEL: Label for provider da1s1 is ufsid/4bd2077f23a6cc93.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label for provider da0s1f is ufsid/4aeb038766c4c807.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label for provider da0s1d is ufsid/4aeb038ae8ae24cf.
GEOM_LABEL: Label ufsid/4aeb03874c64d9f1 removed.
GEOM_LABEL: Label ufsid/4aeb0387d999941a removed.
GEOM_LABEL: Label ufsid/4aeb038766c4c807 removed.
GEOM_LABEL: Label ufsid/4aeb038ae8ae24cf removed.
GEOM_LABEL: Label ufsid/4bd2077f23a6cc93 removed.

Was root unmounted? What's going on here? Obviously there's some issue with da0, which is mounted at /. The server has been up and running fine, so why am I seeing "Trying to mount root from ufs:/dev/da0s1a"?

pid 93248 (httpd), uid 80: exited on signal 10
pid 95624 (httpd), uid 80: exited on signal 10
pid 97956 (httpd), uid 80: exited on signal 10
pid 97935 (httpd), uid 80: exited on signal 10
pid 96603 (httpd), uid 80: exited on signal 10
pid 93210 (httpd), uid 80: exited on signal 10
pid 98246 (httpd), uid 80: exited on signal 10

This is apparently what's killing our webserver. Apache receives a signal 10 and quits. Everything I've read says it's an issue with Apache trying to access RAM that it shouldn't or that doesn't exist.
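
For anyone wanting to tally these lines from a saved copy of dmesg.today, a small Python sketch follows. The names are hypothetical, and the signal table is hardcoded on the assumption that the log comes from FreeBSD, where signal 10 is SIGBUS and 11 is SIGSEGV (Python's own signal module would report the numbering of whatever OS the script runs on).

```python
import re
from collections import Counter

# Assumed FreeBSD signal numbering; only the two relevant entries are listed.
FREEBSD_SIGNALS = {10: "SIGBUS", 11: "SIGSEGV"}

EXIT_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")


def tally_signal_exits(log_text):
    """Count 'exited on signal N' lines per (program, signal name)."""
    counts = Counter()
    for m in EXIT_RE.finditer(log_text):
        program = m.group(2)
        number = int(m.group(4))
        name = FREEBSD_SIGNALS.get(number, "signal %d" % number)
        counts[(program, name)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "pid 93248 (httpd), uid 80: exited on signal 10\n"
        "pid 95624 (httpd), uid 80: exited on signal 10\n"
    )
    for (program, name), n in tally_signal_exits(sample).items():
        print(program, name, n)
```
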

Is there something else with the above da0 or da3 errors that would cause a SIGBUS on httpd?

Then after that it goes back and repeats that first block of da3 errors a bunch more times. The server was down for about 10 minutes and then it just fixed itself. It's weird because it seems the Apache child processes all get killed off by the SIGBUS but the parent process doesn't, so once the problem works itself out, it continues operations as normal without me having to restart the daemon or anything.

The management blade in the server chassis is reporting that all the hardware is fine. We have a second blade that boots off of a second partition in Volume 1 and it doesn't have any problems at all.

I'm at a loss here!

Ryan Merrell

This e-mail message is for the sole use of the intended recipient(s) and may contain privileged or confidential information. Unauthorized use, distribution, review or disclosure is prohibited. If you are not the intended recipient, please notify the sender immediately by reply email and destroy all copies of the original message.