Date:      Mon, 19 Nov 2007 03:55:42 +0100
From:      "n j" <nino80@gmail.com>
To:        "Randy Pratt" <bsd-unix@embarqmail.com>, rsmith@xs4all.nl,  kline@tao.thought.org
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Unexpected shutdown
Message-ID:  <92bcbda50711181855q174772e1q4c83cecde62f7dbe@mail.gmail.com>
In-Reply-To: <20071118164138.ebd3492c.bsd-unix@embarqmail.com>
References:  <92bcbda50711180451h5db8f4ady6e2d21da80d32548@mail.gmail.com> <20071118163747.36C5F4AB7D@mail.kaltimpost.net> <92bcbda50711181312l1dc6b26cteaad3c8db11e17b6@mail.gmail.com> <20071118164138.ebd3492c.bsd-unix@embarqmail.com>

Hello Randy, Roland, Gary,

> This is all anecdotal since I don't have any hard evidence to point
> to exactly one thing since I also swapped out a fan and reinserted
> connectors in the process.  My feeling is that it was hard
> drive heat-related so my suggestion is to do some poking around for hot
> spots, clogged fan filters and any other factors affecting temperatures.

I guess it is possible, if not even likely, that the shutdown was
temperature-related. I'll investigate the fan filters and clean some
dust if they're clogged. Still, the machine has a very low load
(0.1 - 0.2) most of the time and disk activity at the time of the
shutdown was not intensive, which leads me to believe that heat isn't
the cause, but who knows?

> UPS drivers can shut the system down, but you seemed to have ruled
> that out?

The UPS is present, but I never set up or configured anything (no
SNMP or any other agents) that would give the UPS permission to shut
down the machine. Besides, there are more machines on the same UPS
that continued to work just fine, so I guess the UPS is ruled out,
yes.

> It could be triggered by the acpi_thermal driver. Check system
> temperatures with sysctl or mbmon.

This is actually what I was looking for, even if it turns out not to
be the solution: a pointer to a useful port plus a pointer to reading
the temperatures with sysctl. That kind of thing makes -questions an
invaluable resource.

That remark led me to discover the following:

- kldstat shows acpi.ko loaded
- sysctl has no acpi thermal variables whatsoever!

which further led me to check for acpi thermal variables on another
FreeBSD 6.2 (non-Dell) server, and sure enough, they were there. So it
seems that acpi thermal is not working (is perhaps "blacklisted", a
term I noticed in the man page) on Dell PowerEdge (in this case PE 1750
as well as PE 750) servers. Can anyone verify this?
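
For anyone wanting to repeat the check on their own hardware, here is a
minimal Python sketch of it. It assumes the thermal variables live under
the hw.acpi.thermal prefix, as acpi_thermal(4) describes; the function
names thermal_oids and read_sysctl are hypothetical helpers made up for
this illustration, and read_sysctl is only meaningful on a FreeBSD host.

```python
import subprocess


def thermal_oids(sysctl_output):
    """Return {oid: value} for every hw.acpi.thermal.* line in sysctl output.

    Assumes the usual "name: value" layout printed by sysctl(8).
    """
    found = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Keep only well-formed lines under the thermal subtree.
        if sep and name.startswith("hw.acpi.thermal."):
            found[name] = value.strip()
    return found


def read_sysctl(tree="hw.acpi"):
    """Run sysctl(8) on the given subtree and return its standard output.

    Hypothetical helper: call it on the machine being diagnosed.
    """
    result = subprocess.run(["sysctl", tree], capture_output=True, text=True)
    return result.stdout
```

On a FreeBSD machine, thermal_oids(read_sysctl()) returning an empty
dict would match the symptom described above: acpi.ko is loaded, yet no
thermal variables are exported.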

> If the system both shut down *and* rebooted, I had the same inexplicable thing happen to me many times.

Actually, a small correction - the server shut down and stayed that
way until I turned it back on a couple of hours later. After that, the
server booted just fine and is up right now. It even survived 3 AM
tonight without shutting down.

Regards,
-- 
Nino

"Fact of life: intermittent bugs are hardest to debug."