From: Jim Durham
To: freebsd-hackers@freebsd.org
Date: Sat, 2 Oct 2004 01:44:17 -0400
Subject: Re: Sudden Reboots

On Friday 01 October 2004 11:34 pm, Bruce R. Montague wrote:
> Hi, re:
> > The odd thing was that it was happening at virtually
> > the same time every morning....
> > [...]
> > Then, they both just *stopped doing it by themselves* with no apparent
> > correlation to anything installed software-wise. Neither server has had
> > any problem for over a year now.
>
> * What was the external power situation, grounding,
> static situation, or other "noise"? Was the UPS or
> power-conditioning OK?

Same rack, same UPS as all the other Dell 2650 servers. Same ethernet switches, etc.
Same UPS.

> Any large radars nearby? :)

Nope..

> Radars have actually been known to matter. I once
> knew a system that died like this and it turned out
> to be because it was mounted three floors above a
> loading dock... a ROM pin or somesuch was doing a
> great job as a vibration detector, whenever trucks
> backed into the dock hard.
>
> Which brings up the question, what's the cheapest/best
> way these days to actually monitor high-res
> sags/spikes/sags on the line into a box? Decades ago
> it was a Dranetz meter; I see they're still around:
> www.dranetz-bmi.com

You used to be able to get the power company to come out and put recording voltmeters on the line if you complained loudly enough..

> Does anyone have any such "line-monitor" unit that
> they particularly recommend as a good low-end buy?
>
> * Handwaving general remark about VM space overhead...
> Early virtual memory systems rapidly ran into the
> problem that all of physical memory became consumed
> by page tables. The solution was to page the page
> tables (which is why modern architectures support
> hierarchies of page tables). As systems become larger
> this solution typically becomes less and less
> effective, because each page in every _virtual_
> address space requires a page table entry. If you
> have many large address spaces, this requires many
> page table entries total (this acts as pressure to
> make pages larger). The page tables become large
> data structures; managing them (keeping parts in
> memory when needed) can become a bottleneck. If you
> have other restrictions (the page tables have to fit
> in an address space segment, say, a kernel data
> segment), the virtual space allocated for this data
> structure can become exhausted. A kernel usually
> needs to have page tables that can map every page
> of physical memory, so for this page table, the more
> physical memory present, the larger the table.
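A back-of-the-envelope sketch of the overhead described above, assuming classic 32-bit x86 parameters without PAE (4 KiB pages, 4-byte page table entries, i386-style two-level tables). For contrast it also models the inverted-page-table alternative discussed in the following paragraph. All function names and the 16-byte inverted-entry size are hypothetical, chosen only for illustration. The inverted lookup uses a linear scan to stand in for the associative search real hardware would do; actual designs use associative memory or hashing instead.

```python
from __future__ import annotations

# Assumed parameters: classic 32-bit x86 without PAE.
PAGE_SHIFT = 12                  # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
PTE_BYTES = 4                    # one 32-bit page table entry


def flat_table_bytes(va_bits: int) -> int:
    """Single-level table covering an entire virtual address space."""
    return (1 << (va_bits - PAGE_SHIFT)) * PTE_BYTES


def two_level_bytes(mapped_bytes: int) -> int:
    """i386-style: a 4 KiB page directory plus one 4 KiB page-table page
    for every 4 MiB region that is actually mapped."""
    entries_per_page = PAGE_SIZE // PTE_BYTES        # 1024 PTEs per page
    region = entries_per_page * PAGE_SIZE            # 4 MiB covered per table page
    tables = -(-mapped_bytes // region)              # ceiling division
    return PAGE_SIZE + tables * PAGE_SIZE


def kernel_map_bytes(phys_bytes: int) -> int:
    """PTEs needed just to map every page of physical memory."""
    return (phys_bytes // PAGE_SIZE) * PTE_BYTES


def inverted_table_bytes(phys_bytes: int, entry_bytes: int = 16) -> int:
    """One entry per physical frame, no matter how many address spaces exist.
    entry_bytes is a hypothetical size (tag + address-space id + chain link)."""
    return (phys_bytes // PAGE_SIZE) * entry_bytes


def forward_lookup(table: list[int], vaddr: int) -> int:
    # The high bits of the address index the table directly; in hardware
    # this is just base + index * PTE_BYTES.
    frame = table[vaddr >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1))


def inverted_lookup(ipt: list[tuple[int, int] | None], asid: int, vaddr: int) -> int:
    # One slot per physical frame holding (address-space id, virtual page).
    # The scan stands in for the associative search done in hardware.
    key = (asid, vaddr >> PAGE_SHIFT)
    for frame, owner in enumerate(ipt):
        if owner == key:
            return (frame << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1))
    raise LookupError("page fault: no frame holds this page")


if __name__ == "__main__":
    MiB = 1 << 20
    print(flat_table_bytes(32) // MiB)              # 4 (MiB per address space)
    print(two_level_bytes(64 * MiB) // 1024)        # 68 (KiB for 64 MiB mapped)
    print(kernel_map_bytes(2048 * MiB) // MiB)      # 2 (MiB of PTEs for 2 GiB RAM)
    print(inverted_table_bytes(2048 * MiB) // MiB)  # 8 (MiB, shared by all processes)

    table = [0] * 16
    table[3] = 7                                    # virtual page 3 -> frame 7
    print(hex(forward_lookup(table, 0x3ABC)))       # 0x7abc

    ipt: list[tuple[int, int] | None] = [None] * 8
    ipt[7] = (1, 3)                                 # frame 7 holds page 3 of asid 1
    print(hex(inverted_lookup(ipt, 1, 0x3ABC)))     # 0x7abc
```

The forward tables grow with the number and size of virtual address spaces (a fully populated flat table is 4 MiB per process), while the inverted table's size depends only on physical memory. That trade-off is the one weighed against the cost of the associative search.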
>
> Page tables are used because they allow a page table
> entry to be accessed via a simple addition based
> on most of the virtual address. This is fast.
>
> As address spaces grow above 32 bits, the potential
> size of the page tables becomes more important. For
> very large address spaces some form of "single-level
> store" or "inverted page table" scheme is often
> proposed. Instead of having a page table entry for
> each page of virtual address space, these systems
> have the equivalent of a page table entry for each
> page of _physical_ memory. All addresses are effectively
> disk-block+offset addresses; the virtual memory
> hardware does an associative search to locate the
> physical block in memory that corresponds to the
> disk-block. This requires more expensive hardware
> than a simple addition, but such systems only require
> a page table entry for every page of physical memory.
> These systems have been built from early days, but
> are typically not competitive with VM systems that
> require simple addition. (I think the IBM AS/400 is
> the only widely-used commercial hardware using this
> approach.) At some point address space growth, cheap
> associative lookup memories, and required page table
> size may make this approach competitive.

Yes, wow... you're dragging me back to CS-401 or whatever. We had a meter that showed page faults; you played around with different algorithms and tried to get it to read lower. I think it was on a PDP-40. (Wow.. am I old.)

Thanks to all for the suggestions. I'm still not totally convinced it's hardware. Try googling for "FreeBSD Sudden Reboot" and you'll see a lot of the same syndrome.

BTW, healthd is running on that box and shows this:

Temp.= 49.0, 41.5, 0.0; Rot.= 5113, 0, 0
Vcore = 1.71, 0.00; Volt. = 3.34, 4.89, 12.04, -1.78, -0.91

That's all well within limits. The two temps are the processor and the chipset. The rest is pretty self-explanatory.

Once again... much appreciated, all who commented.

--
-Jim