From owner-freebsd-hackers@FreeBSD.ORG  Sat Oct  2 05:44:20 2004
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id CDAF016A4CE
	for <freebsd-hackers@freebsd.org>;
	Sat,  2 Oct 2004 05:44:20 +0000 (GMT)
Received: from w2xo.jcdurham.com (18.gibs5.xdsl.nauticom.net [209.195.184.19])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 5417843D54
	for <freebsd-hackers@freebsd.org>;
	Sat,  2 Oct 2004 05:44:20 +0000 (GMT)
	(envelope-from durham@jcdurham.com)
Received: from dhcp13.home.jcdurham.com (dhcp13.home.jcdurham.com
	[192.168.5.13])
	by w2xo.jcdurham.com (8.12.11/8.11.6) with ESMTP id i925iJL4086308;
	Sat, 2 Oct 2004 01:44:19 -0400 (EDT)
	(envelope-from durham@jcdurham.com)
From: Jim Durham <durham@jcdurham.com>
To: freebsd-hackers@freebsd.org
Date: Sat, 2 Oct 2004 01:44:17 -0400
User-Agent: KMail/1.7
References: <200410020334.i923YbYB000383@mail.cruzio.com>
In-Reply-To: <200410020334.i923YbYB000383@mail.cruzio.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200410020144.17936.durham@jcdurham.com>
Subject: Re: Sudden Reboots
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 02 Oct 2004 05:44:20 -0000

On Friday 01 October 2004 11:34 pm, Bruce R. Montague wrote:
>  Hi, re:
> > The odd thing was that it was happening at virtually
> > the same time every morning....
> > [...]
> > Then, they both just *stopped doing it by themselves* with no apparent
> > correlation to anything installed software-wise. Neither server has had
> > any problem for over a year now.
>
> * What was the external power situation, grounding,
> static situation, or other "noise"?  Was the UPS or
> power-conditioning OK? 

Same rack and same UPS as all the other Dell 2650 servers. Same Ethernet 
switches, etc.

> Any large radars nearby? :) 

Nope..

> Radars have actually been known to matter. I once
> knew a system that died like this and it turned out
> to be because it was mounted three floors above a
> loading dock... a ROM pin or somesuch was doing a
> great job as a vibration detector, whenever trucks
> backed into the dock hard.
>
> Which brings up the question, what's the cheapest/best
> way these days to actually monitor high-res
> sags/spikes on the line into a box? Decades ago
> it was a Dranetz meter; I see they're still around:
>   www.dranetz-bmi.com

You used to be able to get the power company to come out and put recording 
voltmeters on the line if you complained loudly enough..
>
> Does anyone have any such "line-monitor" unit that
> they particularly recommend as a good low-end buy?
>
>
> * Handwaving general remark about VM space overhead...
> Early virtual memory systems rapidly ran into the
> problem that all of physical memory became consumed
> by page tables. The solution was to page the page
> tables (which is why modern architectures support
> hierarchies of page tables). As systems become larger
> this solution typically becomes less-and-less
> effective, because each page in every _virtual_
> address space requires a page table entry. If you
> have many large address spaces, this requires many
> page table entries total (this acts as pressure to
> make pages larger). The page tables become large
> data structures; managing them (keeping parts in
> memory when needed) can become a bottleneck.  If you
> have other restrictions (the page tables have to fit
> in an address space segment, say, a kernel data
> segment), the virtual space allocated for this data
> structure can become exhausted. A kernel usually
> needs to have page tables that can map every page
> of physical memory, so for this page table, the more
> physical memory present, the larger the table.
>
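
To put rough numbers on the size argument above, here is a minimal sketch.
It assumes a flat, fully populated table with 4 KiB pages and 4-byte
entries (the classic 32-bit x86 layout); the function name and defaults
are mine for illustration, not anything from the original post.

```python
def flat_page_table_bytes(address_space_bytes, page_size=4096, pte_size=4):
    """Bytes needed for a flat table with one entry per virtual page.

    Assumptions (illustrative only): every virtual page has an entry,
    and each entry is pte_size bytes.
    """
    entries = -(-address_space_bytes // page_size)  # ceiling division
    return entries * pte_size


GIB = 2**30
MIB = 2**20

# One 4 GiB virtual address space with 4 KiB pages.
small_pages = flat_page_table_bytes(4 * GIB)

# The same address space with 4 MiB pages: far fewer entries, which is
# the "pressure to make pages larger" mentioned above.
large_pages = flat_page_table_bytes(4 * GIB, page_size=4 * MIB)

print(small_pages // MIB, "MiB with 4 KiB pages")
print(large_pages, "bytes with 4 MiB pages")
```

Under these assumptions one full address space costs 4 MiB of table, so a
hundred of them would need about 400 MiB if nothing were paged or sparse.
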
> Page tables are used because they allow a page table
> entry to be accessed via a simple addition based
> on most of the virtual address. This is fast.
>
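
The "simple addition" lookup described above can be sketched as follows.
This is a toy single-level table held in a Python list, assuming 4 KiB
pages; `translate` and the table layout are hypothetical names for
illustration, not how any particular kernel or MMU is written.

```python
PAGE_SHIFT = 12                 # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


def translate(page_table, vaddr):
    """Map a virtual address to a physical one via a flat page table.

    page_table[vpn] holds a physical frame number, or None if unmapped.
    The entry is found by indexing with the high bits of the address,
    which in hardware amounts to base + vpn * entry_size.
    """
    vpn = vaddr >> PAGE_SHIFT            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # byte offset within the page
    if vpn >= len(page_table) or page_table[vpn] is None:
        raise LookupError("page fault at 0x%x" % vaddr)
    return (page_table[vpn] << PAGE_SHIFT) | offset


# Toy table: page 0 unmapped, page 1 -> frame 7, page 2 -> frame 3.
table = [None, 7, 3]
print(hex(translate(table, 0x1234)))     # page 1, offset 0x234
```
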
> As address spaces grow above 32 bits, the potential
> size of the page tables becomes more important. For
> very large address spaces some form of "single-level
> store" or "inverted page table" scheme is often
> proposed. Instead of having a page table entry for
> each page of virtual address space, these systems
> have the equivalent of a page table entry for each
> page of _physical_ memory. All addresses are effectively
> disk-block+offset addresses; the virtual memory
> hardware does an associative search to locate the
> physical block in memory that corresponds to the
> disk-block. This requires more expensive hardware
> than a simple addition, but such systems only require
> a page table entry for every page of physical memory.
> These systems have been built from early days, but
> are typically not competitive with VM systems that
> require simple addition. (I think the IBM AS/400 is
> the only widely-used commercial hardware using this
> approach.) At some point address space growth, cheap
> associative lookup memories, and required page table
> size may make this approach competitive.
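
For contrast with a per-address-space table, the inverted scheme described
above can be sketched like this. It is a toy model, not any real machine's
design: a Python dict stands in for the associative search hardware, pages
are keyed by a hypothetical (address-space id, virtual page number) pair
rather than a disk-block address, and the class and method names are mine.

```python
class InvertedPageTable:
    """One entry per physical frame, regardless of virtual space size."""

    def __init__(self, num_frames):
        # frames[i] records which (asid, vpn) currently occupies frame i.
        self.frames = [None] * num_frames
        # Stand-in for the associative lookup: (asid, vpn) -> frame number.
        self.index = {}

    def map(self, asid, vpn, frame):
        """Place virtual page (asid, vpn) into the given physical frame."""
        old = self.frames[frame]
        if old is not None:
            del self.index[old]            # evict the previous occupant
        prev = self.index.get((asid, vpn))
        if prev is not None:
            self.frames[prev] = None       # page moved from another frame
        self.frames[frame] = (asid, vpn)
        self.index[(asid, vpn)] = frame

    def lookup(self, asid, vpn):
        """Return the frame holding the page, or None (a page fault)."""
        return self.index.get((asid, vpn))


ipt = InvertedPageTable(num_frames=4)
ipt.map(asid=1, vpn=0x10, frame=2)
ipt.map(asid=2, vpn=0x10, frame=3)       # same vpn, different address space
print(ipt.lookup(1, 0x10), ipt.lookup(2, 0x10), ipt.lookup(1, 0x11))
```

The table never grows past `num_frames` entries however many address
spaces exist; the price is that every lookup is a search rather than an
index.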

Yes, wow...you're dragging me back to CS-401 or whatever. We had a page fault 
indicating meter that you played around with different algorithms on and 
tried to get it to read lower. I think it was on a PDP-40. (Wow..am I old).
>

Thanks to all for the suggestions. I'm still not totally convinced it's 
hardware. Try googling for "FreeBSD Sudden Reboot" and you'll see a lot of 
the same syndrome. 

BTW, healthd is running on that box and shows this:

Temp.= 49.0, 41.5,  0.0; Rot.= 5113,    0,    0
 Vcore = 1.71, 0.00; Volt. = 3.34, 4.89, 12.04,  -1.78, -0.91

That's all well within limits. The two temps are processor and chipset. The rest 
is pretty self-explanatory. 

Once again...much appreciated, to all who commented.
-- 
-Jim