Date: Tue, 20 Apr 2010 16:36:14 +0530
From: "C. Jayachandran" <c.jayachandran@gmail.com>
To: Rui Paulo <rpaulo@freebsd.org>
Cc: freebsd-mips@freebsd.org
Subject: Re: SMP support for XLR processors.
Message-ID: <j2q98a59be81004200406xca7ac107o6435c4e9c1b8b81f@mail.gmail.com>
In-Reply-To: <C0874E32-D2F2-4A29-B868-CADDE47EB0C3@freebsd.org>
References: <w2z98a59be81004171540t2f0d5193nca2ec9e2540502e2@mail.gmail.com>
 <CFE92A18-C834-45C5-B18C-7F62437D1A2B@lakerest.net>
 <z2z98a59be81004190411hd4bee7e4t6e5eed3d3789180a@mail.gmail.com>
 <6BDB3874-D779-45A6-ABAE-4C331D78A189@lakerest.net>
 <y2m98a59be81004190657kce2488b0p86a725b1175cb14b@mail.gmail.com>
 <l2n98a59be81004200252lf1d0a372pfae8ac5f55440e58@mail.gmail.com>
 <7BEFA3F5-97AE-477C-9DD3-EF1C4B7DCEB0@freebsd.org>
 <BC57A6F0-4F2E-47F4-92BF-849AD18FC004@freebsd.org>
 <o2n98a59be81004200349yefc11499n4497544d6dbd9d0b@mail.gmail.com>
 <C0874E32-D2F2-4A29-B868-CADDE47EB0C3@freebsd.org>
On Tue, Apr 20, 2010 at 4:26 PM, Rui Paulo <rpaulo@freebsd.org> wrote:
> On 20 Apr 2010, at 11:49, C. Jayachandran wrote:
>
>> On Tue, Apr 20, 2010 at 4:03 PM, Rui Paulo <rpaulo@freebsd.org> wrote:
>>> On 20 Apr 2010, at 11:05, Rui Paulo wrote:
>>>
>>>> On 20 Apr 2010, at 10:52, C. Jayachandran wrote:
>>>>
>>>>> On Mon, Apr 19, 2010 at 7:27 PM, C. Jayachandran
>>>>> <c.jayachandran@gmail.com> wrote:
>>>>>> I have a possible cause for the panic with invariants - we should not
>>>>>> schedule the msgring threads unless the smp is completely up. I guess
>>>>>> we start getting message ring interrupts on before the message ring
>>>>>> threads can be scheduled.  I am trying out some changes for this -
>>>>>> will send you a patch if this fixes it.
>>>>>
>>>>> I've attached a patch that should fix the issue. The cause was the way
>>>>> message ring threads are started on individual cores and the way
>>>>> interrupts are enabled in the core.  I've moved starting message ring
>>>>> threads on other cpus to be a SYSINIT after SMP is started.  I'd
>>>>> thought originally that it was due to some clash with the changes in
>>>>> HEAD - but looks like I was completely off-track there.
>>>>>
>>>>> Please let me know if you don't get multi-user with 32 cpus with this
>>>>> patch. There is still the original hang in buildworld, but that should
>>>>> be a bug elsewhere
>>>>>
>>>>> I have a copy at http://sites.google.com/site/cjayachandran/files too
>>>>
>>>> This works perfectly, thanks!
>>>
>>> On further inspection, I noticed that the load avg is now 7.
>>>
>>> last pid:  1613;  load averages:  6.99,  6.97,  6.08    up 0+00:30:11  10:32:48
>>> 108 processes: 40 running, 24 sleeping, 44 waiting
>>> CPU:  0.0% user,  0.0% nice, 21.9% system,  0.0% interrupt, 78.1% idle
>>> Mem: 8444K Active, 6028K Inact, 37M Wired, 308K Cache, 6800K Buf, 3190M Free
>>> Swap:
>>>
>>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>>    10 root       32 171 ki31     0G     0G CPU0    0 263:26 2500.00% idle
>>>    17 root        1 -16    -     0K     0G CPU12   2   0:00 100.00% msg_intr12
>>>    15 root        1 -16    -     0K     0G CPU4    2   0:00 100.00% msg_intr4
>>>    16 root        1 -16    -     0K     0G CPU8    2   0:00 100.00% msg_intr8
>>>    20 root        1 -16    -     0K     0G CPU24   1   0:00 100.00% msg_intr24
>>>    19 root        1 -16    -     0K     0G CPU20   1   0:00 100.00% msg_intr20
>>>    21 root        1 -16    -     0K     0G CPU28   1   0:00 100.00% msg_intr28
>>>    18 root        1 -16    -     0K     0G CPU16   1   0:00 100.00% msg_intr16
>>>
>>> What are these msg_intrXX kprocs doing?
>>
>> They should really be sleeping unless there is a lot of network
>> traffic :)  The msg_intr threads are interrupt handlers which we run
>> one per core, in the first thread of each core.  They were modelled
>> after interrupt threads (in FreeBSD 6). This should be sleeping until
>> there is a message ring interrupt (which tells us that an IO has send
>> data to our core over the message ring).
>>
>> Thanks for the report - I will look at the sleep logic.

I'm not seeing the issue here (my output is below for reference). The
rge patch should not really make a difference, but it would be good to
try with it.
The only other difference I can think of between our configs is MFS
root vs. NFS root and rge0 vs. rge1, but none of these should affect
the message ring threads. Can you send me the config you use?

regards,
JC.

114 processes: 33 running, 29 sleeping, 52 waiting
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 7380K Active, 9252K Inact, 38M Wired, 8K Cache, 9200K Buf, 3160M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   10 root       32 171 ki31     0G     0G CPU0    0  61:46 3199.46% idle
 1802 rmi         1  44    0     0G     0G CPU1    1   0:00  0.29% top
   11 root       44 -68    -     0K     0G WAIT    0   0:00  0.00% intr
    0 root        3   8    0     0G     0G -       0   0:00  0.00% kernel
 1789 root        1  55    0     0G     0G sbwait  0   0:00  0.00% sshd
 1784 root        1  70    0     0G     0G ttyin   1   0:00  0.00% csh
    1 root        1  44    0     0G     0G wait   26   0:00  0.00% init
 1783 root        1  50    0     0G     0G wait    1   0:00  0.00% login
    6 root        1  -8    -     0K     0G mdwait  3   0:00  0.00% md0
 1792 rmi         1  44    0     0G     0G select  0   0:00  0.00% sshd
   13 root        1 -16    -     0K     0G WAIT    0   0:00  0.00% msg_intr0
   22 root        1 -16    -     0K     0G WAIT   28   0:00  0.00% msg_intr28
   12 root        1 -16    -     0K     0G -       2   0:00  0.00% yarrow
 1424 root        1  44    0     0G     0G select  3   0:00  0.00% syslogd
   19 root        1 -16    -     0K     0G WAIT   16   0:00  0.00% msg_intr16
   18 root        1 -16    -     0K     0G WAIT   12   0:00  0.00% msg_intr12
   21 root        1 -16    -     0K     0G WAIT   24   0:00  0.00% msg_intr24
   17 root        1 -16    -     0K     0G WAIT    8   0:00  0.00% msg_intr8
   20 root        1 -16    -     0K     0G WAIT   20   0:00  0.00% msg_intr20
 1793 rmi         1  44    0     0G     0G wait    0   0:00  0.00% sh
    2 root        1  -8    -     0K     0G -       4   0:00  0.00% g_event
    3 root        1  -8    -     0K     0G -       4   0:00  0.00% g_up
   16 root        1 -16    -     0K     0G WAIT    4   0:00  0.00% msg_intr4
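The msg_intr threads discussed in this thread are meant to block until a message ring interrupt arrives, which is why they sit in WAIT with 0.00% WCPU in the second listing instead of running at 100% as in the first. As a rough userland analogue only, assuming POSIX threads rather than the FreeBSD kernel's own interrupt-thread and sleep/wakeup primitives (the actual XLR code is not shown here), the sketch below has one handler thread that sleeps on a condition variable and is woken when a simulated interrupt posts work. All names (`struct msgring_thread`, `msgring_post_intr`, `msgring_stop`, `run_msgring_demo`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-core state for a message-ring handler thread. */
struct msgring_thread {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             pending;  /* interrupts posted but not yet handled */
    bool            stop;     /* set to ask the thread to exit */
    int             handled;  /* interrupts processed so far */
};

/* Simulated interrupt path: record pending work and wake the handler. */
static void msgring_post_intr(struct msgring_thread *t)
{
    pthread_mutex_lock(&t->lock);
    t->pending++;
    pthread_cond_signal(&t->cv);
    pthread_mutex_unlock(&t->lock);
}

/* Ask the handler to exit once it has drained all pending work. */
static void msgring_stop(struct msgring_thread *t)
{
    pthread_mutex_lock(&t->lock);
    t->stop = true;
    pthread_cond_signal(&t->cv);
    pthread_mutex_unlock(&t->lock);
}

/* Handler body: block while idle instead of spinning on an empty ring. */
static void *msgring_thread_main(void *arg)
{
    struct msgring_thread *t = arg;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        /* Re-check the predicate after every wakeup; spurious wakeups happen. */
        while (t->pending == 0 && !t->stop)
            pthread_cond_wait(&t->cv, &t->lock);
        if (t->pending == 0)
            break;            /* stop requested and nothing left to do */
        t->pending--;
        pthread_mutex_unlock(&t->lock);
        /* Draining the hardware message ring would happen here (omitted). */
        pthread_mutex_lock(&t->lock);
        t->handled++;
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Start one handler, post n simulated interrupts, return how many it handled. */
int run_msgring_demo(int n)
{
    struct msgring_thread t = { .pending = 0, .stop = false, .handled = 0 };
    pthread_t tid;

    pthread_mutex_init(&t.lock, NULL);
    pthread_cond_init(&t.cv, NULL);
    if (pthread_create(&tid, NULL, msgring_thread_main, &t) != 0) {
        pthread_cond_destroy(&t.cv);
        pthread_mutex_destroy(&t.lock);
        return -1;
    }
    for (int i = 0; i < n; i++)
        msgring_post_intr(&t);
    msgring_stop(&t);
    pthread_join(tid, NULL);

    assert(t.pending == 0);   /* everything posted was drained */
    pthread_cond_destroy(&t.cv);
    pthread_mutex_destroy(&t.lock);
    return t.handled;
}
```

Calling `run_msgring_demo(3)` from a `main()` (build with `-pthread`) posts three simulated interrupts and returns 3 once the handler has drained them. The `while` loop around `pthread_cond_wait()` is the part that matters for the symptom above: the thread re-checks its predicate after every wakeup, so it neither misses posted work nor burns CPU while the ring is empty.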