From owner-freebsd-questions@freebsd.org Thu Jun 1 15:21:45 2017
Message-ID: <56274.128.135.52.6.1496330503.squirrel@cosmo.uchicago.edu>
In-Reply-To: <20170601151425.GF2256@erix.ericsson.se>
References: <20170601235447.C98304@sola.nimnet.asn.au>
 <33501.128.135.52.6.1496329407.squirrel@cosmo.uchicago.edu>
 <20170601151425.GF2256@erix.ericsson.se>
Date: Thu, 1 Jun 2017 10:21:43 -0500 (CDT)
Subject: Re: Advice on kernel panics
From: "Valeri Galtsev"
To: freebsd-questions@freebsd.org
Reply-To: galtsev@kicp.uchicago.edu

On Thu, June 1, 2017 10:14 am, Raimo Niskanen wrote:
> On Thu, Jun 01, 2017 at 10:03:27AM -0500, Valeri Galtsev wrote:
>> On Thu, June 1, 2017 9:34 am, Ian Smith wrote:
>> > In freebsd-questions Digest, Vol 678, Issue 4, Message: 4
>> > On Thu, 1 Jun 2017 10:27:49 +0200 Raimo Niskanen wrote:
>> > > On Thu, Jun 01, 2017 at 12:10:30AM -0500, Doug McIntyre wrote:
>> > > > On Mon, May 29, 2017 at 11:20:43AM +0200, Raimo Niskanen wrote:
>> > > > > I have a server that panics about every 3 days and need some
>> > > > > advice on how to handle that.
>> > > >
>> > > > I'd expect it is some sort of hardware failure, as I would
>> > > > expect kernel panics more on the order of once a decade with
>> > > > FreeBSD. Ie. I've seen one or two on my hundred or so servers,
>> > > > but its pretty rare.
>> > > >
>> > > > Check and recheck your hardware items.
>> > >
>> > > I have removed one of four memory capsules - panicked again. Will
>> > > rotate through all of them...
>> > >
>> > > >
>> > > > Runup memtest86+. Check your drive hardware, turn on SMART
>> > > > checking.
>> > >
>> > > I have run memtest86+ over night - no errors found.
>> > >
>> > > I have installed smartmontools - no errors found, short and long
>> > > self tests on both disks run fine. zpool scrub repaired 0 errors
>> > > and has no known data errors.
>> >
>> > Everyone's suggesting hardware problems, and it's certainly
>> > worthwhile eliminating that possibility - but this could be a
>> > software/OS issue.
>>
>> I would agree with Ian, it can be software, though it is less likely.
>> I have seen a few times that SCSI attached external RAID (attached to
>> LSI SCSI HBA) was announcing change of its status (like rebuilt
>> finished or drive timed out/failed) which simultaneously with other
>> traffic on SCSI bus confused adapter and led to kernel panic.
>>
>> That said, I will first check hardware thoroughly. Andrea mentioned
>> aged PS under heavy load. And these are prime suspects. Of all
>> components electrolytic capacitors are the ones degraded most, may
>> even leak, and they don't filter ripple sufficiently, thus leading to
>> ripple beyond tolerable at high currents. So:
>>
>> 1. open the box, and inspect interior.
>> System board ("motherboard" is its jargon name for over 30 years):
>> inspect electrolytic capacitors around CPU(s), and those that filter
>> PCI (or PCI-X, or PCI-E) bus power leads. Any of them bulged, or even
>> have traces of leaked electrolyte (brown residue usually) - throw
>> away system board. The model of your box fall into the time span when
>> they used worst electrolytic capacitors.
>
> I did not think this machine was old, but it has apparently been a few
> years...

If it was manufactured less than 5 years ago, then I'm mistaken (I do not
follow Dell server models closely...)

>
>>
>> 2. re-seat all components (including expansion boards, memory, CPU is
>> less likely, but I would do that too), disconnect and reconnect all
>> connectors. Contacts, even gold plated, sometimes do oxidize
>
> Will try.
>
>>
>> 3. Get new power supply, not necessarily designed for this machine,
>> but with the same connectors to the system board, and with higher
>> power rating. disconnect box's own PS, and power it from new PS; see
>> if it stops failing (PSes do have electrolytic capacitors inside as
>> well; other components do not degrade but do not die totally, except
>> for ultra high frequency diodes and transistors, and very high
>> voltage diodes)
>>
>> Good luck!
>>
>> Valeri
>
> Thank you!
>
>
> --
>
> / Raimo Niskanen, Erlang/OTP, Ericsson AB
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"
>

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++