Date:      Fri, 28 Sep 2012 13:54:37 +0400
From:      Anton Yuzhaninov <citrin@citrin.ru>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Problem with IPMI KCS driver
Message-ID:  <506573DD.2030808@citrin.ru>
In-Reply-To: <201208290825.44198.jhb@freebsd.org>
References:  <503DE2AB.6030702@citrin.ru> <201208290825.44198.jhb@freebsd.org>

On 29.08.2012 16:25, John Baldwin wrote:
> On Wednesday, August 29, 2012 5:36:43 am Anton Yuzhaninov wrote:
>> We use servers with Supermicro X8DTT-H motherboards and ran into the following problem:
>> when watchdogd is running, the server is rebooted by the IPMI watchdog several times per week.
>>
>> After some debugging I found that the IPMI driver sometimes enters an endless
>> loop, and watchdogd then has no chance to reset the watchdog timer.
>> In this situation top shows:
>>
>> PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> ...
>> 113 root          -16    -     0K    16K CPU4    4  17:18 99.17% ipmi0: kcs
>>
>> The endless loop is in the file /sys/dev/ipmi/ipmi_kcs.c, in the function
>> kcs_wait_for_obf():
>>
>>           int status, start = ticks;
>>
>>           status = INB(sc, KCS_CTL_STS);
>>           if (state == 0) {
>>                   /* WAIT FOR OBF = 0 */
>>                   while (ticks - start < MAX_TIMEOUT && status & KCS_STATUS_OBF) {
>>                           DELAY(100);
>>                           status = INB(sc, KCS_CTL_STS);
>>                   }
>>           } else {
>>                   /* WAIT FOR OBF = 1 */
>>                   while (ticks - start < MAX_TIMEOUT &&
>>                       !(status & KCS_STATUS_OBF)) {
>>                           DELAY(100);
>>                           status = INB(sc, KCS_CTL_STS);
>>                   }
>>           }
>>
>> It seems that this loop is intended to run for no more than MAX_TIMEOUT ticks,
>> but for some reason the timeout does not work and the loop runs until reboot.
>>
>> Questions:
>> 1. Is it correct to check ticks to implement a timeout here?
>> 2. How can this timeout be fixed?
>
> Hmm.  Can you try this:
>
> Index: kern/kern_clock.c
> ===================================================================
> --- kern/kern_clock.c	(revision 239819)
> +++ kern/kern_clock.c	(working copy)
> @@ -382,7 +382,7 @@
>   int	stathz;
>   int	profhz;
>   int	profprocs;
> -int	ticks;
> +volatile int	ticks;
>   int	psratio;
>
>   static DPCPU_DEFINE(int, pcputicks);	/* Per-CPU version of ticks. */
> @@ -469,7 +469,7 @@
>   hardclock(int usermode, uintfptr_t pc)
>   {
>
> -	atomic_add_int((volatile int *)&ticks, 1);
> +	atomic_add_int(&ticks, 1);
>   	hardclock_cpu(usermode);
>   	tc_ticktock(1);
>   	cpu_tick_calibration();
> Index: sys/kernel.h
> ===================================================================
> --- sys/kernel.h	(revision 239819)
> +++ sys/kernel.h	(working copy)
> @@ -63,7 +63,7 @@
>   extern int stathz;			/* statistics clock's frequency */
>   extern int profhz;			/* profiling clock's frequency */
>   extern int profprocs;			/* number of process's profiling */
> -extern int ticks;
> +extern volatile int ticks;
>
>   #endif /* _KERNEL */
>
>

With 'extern volatile int ticks' the infinite loop happens less often than
before, but it still happens.

The symptoms are the same:

$ ps -ax -o pid,comm,wchan,state,\%cpu | grep ipmi
   113 ipmi0: kcs    -      RL   100.0
  1317 watchdogd     ipmire Ds    0.0

DDB trace for pid 113:
Tracing pid 113 tid 100359 td 0xffffff0007913470
cpustop_handler() at cpustop_handler+0x37
ipi_nmi_handler() at ipi_nmi_handler+0x30
trap() at trap+0x345
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff809c6e64, rsp = 0xffffffff80fd1ec0, rbp = 0xffffff88425d4b30 ---
DELAY() at DELAY+0x64
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6
kcs_read_byte() at kcs_read_byte+0x7d
kcs_loop() at kcs_loop+0x372
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe

I can type 'cont' in DDB, wait for some time and break into DDB again; the
trace for pid 113 is the same every time.

The frame kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 points to
/usr/src/sys/dev/ipmi/ipmi_kcs.c:94, inside the "WAIT FOR OBF = 1" loop:

  91                 while (ticks - start < MAX_TIMEOUT &&
  92                     !(status & KCS_STATUS_OBF)) {
  93                         DELAY(100);
  94                         status = INB(sc, KCS_CTL_STS);
  95                 }

-- 
  Anton Yuzhaninov