Date:      Tue, 20 Apr 2010 16:36:14 +0530
From:      "C. Jayachandran" <c.jayachandran@gmail.com>
To:        Rui Paulo <rpaulo@freebsd.org>
Cc:        freebsd-mips@freebsd.org
Subject:   Re: SMP support for XLR processors.
Message-ID:  <j2q98a59be81004200406xca7ac107o6435c4e9c1b8b81f@mail.gmail.com>
In-Reply-To: <C0874E32-D2F2-4A29-B868-CADDE47EB0C3@freebsd.org>
References:  <w2z98a59be81004171540t2f0d5193nca2ec9e2540502e2@mail.gmail.com> <CFE92A18-C834-45C5-B18C-7F62437D1A2B@lakerest.net> <z2z98a59be81004190411hd4bee7e4t6e5eed3d3789180a@mail.gmail.com> <6BDB3874-D779-45A6-ABAE-4C331D78A189@lakerest.net> <y2m98a59be81004190657kce2488b0p86a725b1175cb14b@mail.gmail.com> <l2n98a59be81004200252lf1d0a372pfae8ac5f55440e58@mail.gmail.com> <7BEFA3F5-97AE-477C-9DD3-EF1C4B7DCEB0@freebsd.org> <BC57A6F0-4F2E-47F4-92BF-849AD18FC004@freebsd.org> <o2n98a59be81004200349yefc11499n4497544d6dbd9d0b@mail.gmail.com> <C0874E32-D2F2-4A29-B868-CADDE47EB0C3@freebsd.org>

On Tue, Apr 20, 2010 at 4:26 PM, Rui Paulo <rpaulo@freebsd.org> wrote:
> On 20 Apr 2010, at 11:49, C. Jayachandran wrote:
>
>> On Tue, Apr 20, 2010 at 4:03 PM, Rui Paulo <rpaulo@freebsd.org> wrote:
>>> On 20 Apr 2010, at 11:05, Rui Paulo wrote:
>>>
>>>> On 20 Apr 2010, at 10:52, C. Jayachandran wrote:
>>>>
>>>>> On Mon, Apr 19, 2010 at 7:27 PM, C. Jayachandran
>>>>> <c.jayachandran@gmail.com> wrote:
>>>>>> I have a possible cause for the panic with invariants - we should not
>>>>>> schedule the msgring threads unless the smp is completely up. I guess
>>>>>> we start getting message ring interrupts before the message ring
>>>>>> threads can be scheduled.  I am trying out some changes for this -
>>>>>> will send you a patch if this fixes it.
>>>>>
>>>>> I've attached a patch that should fix the issue. The cause was the way
>>>>> message ring threads are started on individual cores and the way
>>>>> interrupts are enabled in the core.  I've moved starting message ring
>>>>> threads on other cpus to be a SYSINIT after SMP is started.  I'd
>>>>> thought originally that it was due to some clash with the changes in
>>>>> HEAD - but looks like I was completely off-track there.
>>>>>
>>>>> Please let me know if you don't get multi-user with 32 cpus with this
>>>>> patch. There is still the original hang in buildworld, but that should
>>>>> be a bug elsewhere.
>>>>>
>>>>> I have a copy at http://sites.google.com/site/cjayachandran/files too
>>>>
>>>> This works perfectly, thanks!
>>>
>>> On further inspection, I noticed that the load avg is now 7.
>>>
>>> last pid:  1613;  load averages:  6.99,  6.97,  6.08    up 0+00:30:11  10:32:48
>>> 108 processes: 40 running, 24 sleeping, 44 waiting
>>> CPU:  0.0% user,  0.0% nice, 21.9% system,  0.0% interrupt, 78.1% idle
>>> Mem: 8444K Active, 6028K Inact, 37M Wired, 308K Cache, 6800K Buf, 3190M Free
>>> Swap:
>>>
>>>  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>>   10 root       32 171 ki31     0G     0G CPU0    0 263:26 2500.00% idle
>>>   17 root        1 -16    -     0K     0G CPU12   2   0:00 100.00% msg_intr12
>>>   15 root        1 -16    -     0K     0G CPU4    2   0:00 100.00% msg_intr4
>>>   16 root        1 -16    -     0K     0G CPU8    2   0:00 100.00% msg_intr8
>>>   20 root        1 -16    -     0K     0G CPU24   1   0:00 100.00% msg_intr24
>>>   19 root        1 -16    -     0K     0G CPU20   1   0:00 100.00% msg_intr20
>>>   21 root        1 -16    -     0K     0G CPU28   1   0:00 100.00% msg_intr28
>>>   18 root        1 -16    -     0K     0G CPU16   1   0:00 100.00% msg_intr16
>>>
>>> What are these msg_intrXX kprocs doing?
>>
>> They should really be sleeping unless there is a lot of network
>> traffic :)  The msg_intr threads are interrupt handlers which we run
>> one per core, in the first thread of each core.  They were modelled
>> after interrupt threads (in FreeBSD 6). Each one should be sleeping until
>> there is a message ring interrupt (which tells us that an IO has sent
>> data to our core over the message ring).
>>
>> Thanks for the report - I will look at the sleep logic.


I'm not seeing the issue here (my output is below for reference).  The rge
patch should not really make a difference, but it would be good to try
with that.  The only other differences I can think of between our configs
are MFS root/NFS root and rge0/rge1, but neither of these should affect
the message ring threads.  Can you send me the config you use?
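
For reference, the behaviour the msg_intr threads are meant to have
(sleep until a message ring interrupt arrives, then drain the ring) can
be sketched in userspace roughly as below.  This is only an illustration
using pthreads, not the kernel code: struct msgring, msgring_interrupt()
and msgring_demo() are made-up names for the sketch, and a condition
variable stands in for the real interrupt wakeup.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one core's message ring state. */
struct msgring {
    pthread_mutex_t lock;
    pthread_cond_t  wake;     /* signalled by the simulated interrupt */
    int             pending;  /* messages waiting to be drained */
    int             handled;  /* messages the handler has drained */
    bool            shutdown; /* set when no more interrupts will arrive */
};

/* Handler thread: blocks on the condition variable instead of spinning. */
void *msg_intr_thread(void *arg)
{
    struct msgring *mr = arg;

    pthread_mutex_lock(&mr->lock);
    for (;;) {
        /* Sleep while there is nothing to do. */
        while (mr->pending == 0 && !mr->shutdown)
            pthread_cond_wait(&mr->wake, &mr->lock);
        if (mr->pending == 0 && mr->shutdown)
            break;
        /* Drain everything that arrived while we were asleep. */
        mr->handled += mr->pending;
        mr->pending = 0;
    }
    pthread_mutex_unlock(&mr->lock);
    return NULL;
}

/* Simulated message ring interrupt: post one message, wake the handler. */
void msgring_interrupt(struct msgring *mr)
{
    pthread_mutex_lock(&mr->lock);
    mr->pending++;
    pthread_cond_signal(&mr->wake);
    pthread_mutex_unlock(&mr->lock);
}

/* Start the handler, deliver n interrupts, return how many were handled. */
int msgring_demo(int n)
{
    struct msgring mr = { .pending = 0, .handled = 0, .shutdown = false };
    pthread_t tid;

    pthread_mutex_init(&mr.lock, NULL);
    pthread_cond_init(&mr.wake, NULL);
    if (pthread_create(&tid, NULL, msg_intr_thread, &mr) != 0)
        return -1;

    for (int i = 0; i < n; i++)
        msgring_interrupt(&mr);

    /* Tell the handler to exit once the ring is empty. */
    pthread_mutex_lock(&mr.lock);
    mr.shutdown = true;
    pthread_cond_signal(&mr.wake);
    pthread_mutex_unlock(&mr.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&mr.wake);
    pthread_mutex_destroy(&mr.lock);
    return mr.handled;
}
```

In this model the handler uses no CPU between interrupts, which is what
a WAIT state with 0.00% WCPU in top corresponds to; a thread stuck at
100% would mean the wait condition is never being satisfied or never
being checked.  Build with -pthread.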

regards,
JC.


114 processes: 33 running, 29 sleeping, 52 waiting
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 7380K Active, 9252K Inact, 38M Wired, 8K Cache, 9200K Buf, 3160M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   10 root       32 171 ki31     0G     0G CPU0    0  61:46 3199.46% idle
 1802 rmi         1  44    0     0G     0G CPU1    1   0:00  0.29% top
   11 root       44 -68    -     0K     0G WAIT    0   0:00  0.00% intr
    0 root        3   8    0     0G     0G -       0   0:00  0.00% kernel
 1789 root        1  55    0     0G     0G sbwait  0   0:00  0.00% sshd
 1784 root        1  70    0     0G     0G ttyin   1   0:00  0.00% csh
    1 root        1  44    0     0G     0G wait   26   0:00  0.00% init
 1783 root        1  50    0     0G     0G wait    1   0:00  0.00% login
    6 root        1  -8    -     0K     0G mdwait  3   0:00  0.00% md0
 1792 rmi         1  44    0     0G     0G select  0   0:00  0.00% sshd
   13 root        1 -16    -     0K     0G WAIT    0   0:00  0.00% msg_intr0
   22 root        1 -16    -     0K     0G WAIT   28   0:00  0.00% msg_intr28
   12 root        1 -16    -     0K     0G -       2   0:00  0.00% yarrow
 1424 root        1  44    0     0G     0G select  3   0:00  0.00% syslogd
   19 root        1 -16    -     0K     0G WAIT   16   0:00  0.00% msg_intr16
   18 root        1 -16    -     0K     0G WAIT   12   0:00  0.00% msg_intr12
   21 root        1 -16    -     0K     0G WAIT   24   0:00  0.00% msg_intr24
   17 root        1 -16    -     0K     0G WAIT    8   0:00  0.00% msg_intr8
   20 root        1 -16    -     0K     0G WAIT   20   0:00  0.00% msg_intr20
 1793 rmi         1  44    0     0G     0G wait    0   0:00  0.00% sh
    2 root        1  -8    -     0K     0G -       4   0:00  0.00% g_event
    3 root        1  -8    -     0K     0G -       4   0:00  0.00% g_up
   16 root        1 -16    -     0K     0G WAIT    4   0:00  0.00% msg_intr4


