From: Mike Holloway
Date: Tue, 28 Feb 2006 16:56:07 -0600
To: Mike Holloway
Cc: freebsd-proliant@freebsd.org
Subject: Re: hpasmcli locks up a DL380G3
Message-Id: <55636C59-3ED3-4448-AB91-80ACCBC8CFB1@cisco.com>
List-Id: "Technical discussion of FreeBSD on HP ProLiant server platforms."
These are probably of no use for debugging, but here are a couple of
presumably non-maskable interrupts (NMIs) that were logged just as 2
servers rebooted (guess which one of these machines is in Australia):

Feb 28 13:28:36 host3 kernel: <<2<<>22>2>NN>NMNMIMMII I II SISIASA SA A 33330000,,, E, EI EISEIAISS AA S fAf ffff
Feb 28 13:28:36 host3 kernel:
Feb 28 13:28:36 host3 kernel: f
Feb 28 13:28:36 host3 kernel: f
Mar 1 06:30:51 host14 kernel: NMI INSMAI NINMSMAII 3II0SS3,AA0 ,E IESIAS A 3ff3f00f,
Mar 1 06:30:51 host14 kernel:
Mar 1 06:30:51 host14 kernel:
Mar 1 06:30:51 host14 kernel: <
Mar 1 06:30:51 host14 kernel: 2><
Mar 1 06:30:51 host14 kernel: 2,> EEIISSAA ffff
Mar 1 06:30:51 host14 kernel:

-mike

On Feb 28, 2006, at 4:38 PM, Mike Holloway wrote:

> >> Hi!
> >>
> >> Sorry for being late on this one; I found this while browsing around.
> >>
> >> Yes, I have had ONE machine lock up on me once: an older HP ProLiant
> >> DL380G1 UP. Just as you describe, it had been working great for a
> >> couple of weeks, then it suddenly froze when I started hpasmcli.
> >> I couldn't even ping the machine.
> >>
> >> This particular machine really is not doing anything, and as I believe
> >> it is still running (I have moved/changed jobs), I could probably
> >> recreate the lockup. I still have access to this machine, so if anyone
> >> wants me to try something, I can do it. The machine is 600 km away from
> >> me now, though, so if a lockup occurs it can take some time to get it
> >> power-cycled.
> >>
> >> Oh, 5.3 or 5.4 as I recall.
> >>
> >> Have you seen any other lockups, Greg?
> >
> >I haven't tempted fate that way yet. I always restart hpasmd
> >before using the client on a machine. This seems to avoid the
> >problem.
> >
> >Thanks for responding to my mail; you're the third person to confirm
> >the problem, which, given that it locks the machine up hard, is a very
> >serious one.
> >
> >best.
> >greg.
> >
>
> Besides the hpasmcli tool hanging just after the banner message,
> I've also experienced reboots caused by hpasmd, and have had to
> remove it completely from my test lab servers. I was able to find
> a scenario that would invariably cause the servers to reboot. I
> had hpasmd running on approximately 20 HP DL380 G4 servers, all
> running the same customized FreeBSD 6.0 release kernel on x86
> (Intel Xeon).
>
> All machines were configured to run hpasmcli -s "show temps;" from
> cron every 5 minutes, inside a Perl wrapper around hpasmcli (included
> below). If hpasmcli didn't exit within 45 seconds, the wrapper's
> SIGALRM handler would send SIGHUP to the wrapper's process group to
> kill hpasmcli. Within a few hours, a few machines would show the
> hpasmcli tool hanging and displaying only the banner message. Cron
> kept running and killing the hung hpasmcli tool every 5 minutes for
> some hours before I noticed. After commenting out the cron job and
> verifying that no hpasmcli processes existed, I could then stop
> hpasmd via the init script, which sends a TERM signal to the process
> followed by a KILL signal a couple of seconds later. Without
> exception, those servers would spontaneously reboot a few minutes
> (2-5) later. On servers where the hpasmcli tool hadn't yet hung, I
> could stop hpasmd with no ill effects to the system.
>
> John, are you still working on this very useful tool? I can
> provide access to a DL380 G4 if you need a platform to test on.
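>
>
> -mike
>
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> eval {
>     # On timeout, send SIGHUP (signal 1) to the process group whose ID
>     # is our PID (the negative PID selects a group). If this wrapper
>     # leads its group, that includes the hpasmcli child. HUP is ignored
>     # here while signalling so that the wrapper itself survives.
>     local $SIG{ALRM} = sub {
>         local $SIG{HUP} = 'IGNORE';
>         kill 1, (-$$);
>     };
>     alarm 45;    # give hpasmcli 45 seconds to finish
>     system("/usr/sbin/hpasmcli -s \"show temps;\"");
>     alarm 0;     # cancel the timer once hpasmcli has returned
> };
>
> # Restore the default HUP disposition before exiting.
> $SIG{HUP} = 'DEFAULT';
>
> exit 0;
> _______________________________________________
> freebsd-proliant@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-proliant
> To unsubscribe, send any mail to "freebsd-proliant-unsubscribe@freebsd.org"
>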
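The stop sequence attributed to the hpasmd init script (TERM, then KILL a couple of seconds later) can be sketched in Perl roughly as follows. This is a minimal illustration under stated assumptions, not the actual init script, whose contents are not part of this thread. `stop_process`, `spawn`, and the 2-second grace period are hypothetical names and values chosen for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(sleep time);

# Hypothetical helper: send TERM to $pid, wait up to $grace seconds,
# then send KILL if it is still alive. Returns 'TERM' if the process
# exited after TERM, 'KILL' if KILL was needed, or 'GONE' if it could
# not be signalled at all.
sub stop_process {
    my ($pid, $grace) = @_;
    kill('TERM', $pid) or return 'GONE';
    my $deadline = time() + $grace;
    while (time() < $deadline) {
        waitpid($pid, WNOHANG);              # reap it if it is our own child
        return 'TERM' unless kill(0, $pid);  # signal 0 only probes for existence
        sleep 0.1;
    }
    kill('KILL', $pid);
    waitpid($pid, 0);                        # returns at once if not our child
    return 'KILL';
}

# Demo only: fork an idle child, optionally ignoring TERM, and block
# until it reports that its signal disposition is in place.
sub spawn {
    my ($ignore_term) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $r;
        $SIG{TERM} = 'IGNORE' if $ignore_term;
        syswrite $w, 'x';                    # tell the parent we are ready
        close $w;
        sleep 1 while 1;                     # idle until signalled
    }
    close $w;
    sysread $r, my $buf, 1;                  # wait until the child is ready
    close $r;
    return $pid;
}

$| = 1;
print 'polite child:   ', stop_process(spawn(0), 2), "\n";
print 'stubborn child: ', stop_process(spawn(1), 2), "\n";
```

A child that keeps the default TERM action should stop at the first step. One that ignores TERM is only stopped by the KILL after the grace period. On a FreeBSD box the real script would look up hpasmd's PID rather than fork a child.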
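In the cron wrapper, `kill 1, (-$$)` signals the process group whose ID equals the wrapper's own PID. It therefore reaches hpasmcli only if the wrapper leads its process group, which depends on how the job is launched. A variant might put the command into its own process group and escalate from HUP to KILL if the group ignores HUP; a sketch follows. `run_with_timeout` and `wait_for` are hypothetical names. The demo runs harmless commands instead of hpasmcli. A process blocked inside the kernel may not die even on KILL, so the last wait is bounded rather than blocking forever.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(sleep time);

# Poll for child $pid to exit for up to $secs seconds.
# Returns its wait status ($?), or undef if it is still running.
sub wait_for {
    my ($pid, $secs) = @_;
    my $deadline = time() + $secs;
    while (1) {
        return $? if waitpid($pid, WNOHANG) == $pid;
        return if time() >= $deadline;   # undef in scalar context
        sleep 0.1;
    }
}

# Hypothetical helper: run @cmd in a new process group. After $timeout
# seconds send HUP to the whole group; after $grace more, send KILL.
# Returns (how, status), where how is 'exited', 'hup', 'killed' or 'stuck'.
sub run_with_timeout {
    my ($timeout, $grace, @cmd) = @_;
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        setpgrp(0, 0);                   # child: lead a new process group
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    setpgrp($pid, $pid);                 # parent too, to close the race; may fail harmlessly

    my $status = wait_for($pid, $timeout);
    return ('exited', $status) if defined $status;

    kill '-HUP', $pid;                   # negative signal name: signal the group
    $status = wait_for($pid, $grace);
    return ('hup', $status) if defined $status;

    kill '-KILL', $pid;
    $status = wait_for($pid, $grace);
    return ('killed', $status) if defined $status;
    return ('stuck', undef);             # could not be reaped within $grace
}

$| = 1;
my ($how) = run_with_timeout(5, 2, 'true');
print "true:         $how\n";
($how) = run_with_timeout(1, 2, 'sleep', '30');
print "sleep 30:     $how\n";
($how) = run_with_timeout(1, 2, 'sh', '-c', 'trap "" HUP; sleep 30');
print "HUP-ignoring: $how\n";
```

Putting the child in its own group means the wrapper never signals itself, so it does not need to juggle its own HUP disposition.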