Date:      Tue, 28 Feb 2006 16:56:07 -0600
From:      Mike Holloway <mikhollo@cisco.com>
To:        Mike Holloway <mikhollo@cisco.com>
Cc:        freebsd-proliant@freebsd.org
Subject:   Re: hpasmcli locks up a DL380G3
Message-ID:  <55636C59-3ED3-4448-AB91-80ACCBC8CFB1@cisco.com>
In-Reply-To: <A26FFDF4-5C23-4C09-8934-73914CD0F331@cisco.com>
References:  <A26FFDF4-5C23-4C09-8934-73914CD0F331@cisco.com>



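These are probably of no use for debugging, but here are a couple of
what are presumably non-maskable interrupt (NMI) messages, logged just
as two servers rebooted (guess which one of these machines is in
Australia). The output from several CPUs seems to be interleaved; each
one appears to be printing "<2>NMI ISA 30, EISA ff":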


Feb 28 13:28:36 host3 kernel: <<2<<>22>2>NN>NMNMIMMII I  II SISIASA  
SA A  33330000,,,  E, EI EISEIAISS AA S fAf ffff
Feb 28 13:28:36 host3 kernel:
Feb 28 13:28:36 host3 kernel: f
Feb 28 13:28:36 host3 kernel: f


Mar  1 06:30:51 host14 kernel: NMI INSMAI  NINMSMAII    
3II0SS3,AA0   ,E IESIAS A 3ff3f00f,
Mar  1 06:30:51 host14 kernel:
Mar  1 06:30:51 host14 kernel:
Mar  1 06:30:51 host14 kernel: <
Mar  1 06:30:51 host14 kernel: 2><
Mar  1 06:30:51 host14 kernel: 2,>  EEIISSAA  ffff
Mar  1 06:30:51 host14 kernel:


-mike


On Feb 28, 2006, at 4:38 PM, Mike Holloway wrote:

> >> Hi!
> >>
> >> Sorry for being late on this one, found this browsing around.
> >>
> >> Yes, I have had ONE machine lock up on me once.
> >> It was an older HP Proliant DL380G1 UP. Just as you describe, it had
> >> been working great for a couple of weeks, then suddenly froze when
> >> I started hpasmcli. I couldn't even ping the machine.
> >>
> >> This particular machine really is not doing anything, and as I
> >> believe it is still running (I have moved/changed jobs), I could
> >> probably recreate the lockup.
> >> I still have access to this machine, so if anyone wants me to try
> >> something, I can do it.
> >> The machine is 600 km away from me now, so if a lockup occurs it
> >> can take some time to get it power-cycled, though.
> >>
> >> Oh, it was FreeBSD 5.3 or 5.4, as I recall.
> >>
> >> Have you seen any other lockups, Greg?
> >
> >I haven't tempted fate that way yet. I always restart hpasmd
> >before using the client on a machine. This seems to avoid the problem.
> >
> >Thanks for responding to my mail. You're the third person to confirm
> >the problem, which, given that it locks the machine up hard, is a very
> >serious one.
> >
> >best.
> >greg.
>
>
> Besides the hpasmcli tool hanging just after the banner message,
> I've also experienced reboots caused by hpasmd, and have had to
> remove it completely from my test lab servers.  I was able to find
> a scenario which would invariably cause the servers to reboot.  I
> had hpasmd running on approximately 20 HP DL380 G4 servers, all
> running the same customized FreeBSD 6.0-RELEASE kernel on x86
> (Intel Xeon).
>
> All machines were configured to run hpasmcli -s "show temps;" every
> 5 minutes from cron, inside a Perl wrapper around hpasmcli (included
> below).  If hpasmcli didn't exit within 45 seconds, an ALRM signal
> made the wrapper send a HUP to its own process group, which killed
> hpasmcli.  Within a few hours, a few machines would show the hpasmcli
> tool hanging and only displaying the banner message.  Cron kept
> starting (and the wrapper kept killing) the hung hpasmcli tool every
> 5 minutes for hours before I noticed.  After commenting out the cron
> job and verifying that no hpasmcli processes existed, I could then
> stop hpasmd via the init script, which sends a TERM signal to the
> process followed by a KILL signal a couple of seconds later.
> Without exception, those servers would spontaneously reboot a few
> minutes (2-5) later.  On servers where the hpasmcli tool hadn't yet
> hung, I could stop hpasmd with no ill effects on the system.
>
>
> John, are you still working on this very useful tool?  I can  
> provide access to a DL380 G4 if you need a platform to test on.
>
>
> -mike
>
>
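> #!/usr/bin/perl
> use strict;
> use warnings;
>
> # Run "hpasmcli -s 'show temps;'", but give up after 45 seconds.
> eval {
>    local $SIG{ALRM} =
>    sub {
>       # Ignore HUP in this process, then send HUP to every process
>       # in our process group (negative PID), killing a hung hpasmcli.
>       # This assumes the wrapper is the leader of its process group.
>       local $SIG{HUP} = 'IGNORE';
>       kill 'HUP', -$$;
>    };
>    alarm 45;
>    # List form runs hpasmcli directly, without an intermediate shell.
>    system('/usr/sbin/hpasmcli', '-s', 'show temps;');
>    alarm 0;    # hpasmcli exited in time; cancel the pending alarm
> };
>
> $SIG{HUP} = 'DEFAULT';
>
> exit 0;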
> _______________________________________________
> freebsd-proliant@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-proliant
> To unsubscribe, send any mail to "freebsd-proliant-unsubscribe@freebsd.org"
>
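A possible variant of the quoted wrapper, offered only as a minimal sketch and not as the script used in the tests described above. The quoted wrapper sends HUP to its own process group, which only reaches hpasmcli if the wrapper happens to be that group's leader. This version instead forks the command into a new process group of its own and sends SIGKILL to that whole group on timeout. It assumes a POSIX system and Perl's core POSIX module; the sub name run_with_timeout is hypothetical. It only deals with a hung client and does nothing about the hpasmd reboots.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# run_with_timeout (hypothetical helper, not part of hpasm or any library):
# run @cmd in a new process group and SIGKILL the whole group if it has
# not exited within $timeout seconds.
# Returns the child's wait status (as in $?), or undef on timeout.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ($pid == 0) {
        # Child: become the leader of a new process group, then run the command.
        POSIX::setpgid(0, 0);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    # Parent: also set the child's group, in case the parent runs first.
    # This can fail harmlessly if the child has already called exec.
    POSIX::setpgid($pid, $pid);

    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);
        alarm 0;    # child exited in time; cancel the pending alarm
        1;
    } or do {
        die $@ unless $@ eq "timeout\n";    # re-raise unexpected errors
        $timed_out = 1;
    };

    if ($timed_out) {
        kill '-KILL', $pid;    # "-KILL": signal the whole process group $pid
        waitpid($pid, 0);      # reap the killed child
        return undef;
    }
    return $?;
}
```

For the cron job described above, the call would be run_with_timeout(45, '/usr/sbin/hpasmcli', '-s', 'show temps;'). An undefined return means the command was killed after the timeout; otherwise the value is the wait status, as in $?. A crontab entry such as "*/5 * * * * /usr/local/sbin/hpasm-temps.pl" (hypothetical path) would match the five-minute schedule.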


