Date: Thu, 21 Apr 2016 15:16:09 +0100
From: Justin Clift <justin@postgresql.org>
To: freebsd-infiniband@freebsd.org
Subject: Kernel panic (page fault) on 10.3-STABLE with IB & VIMAGE modules
Message-ID: <210EB5F8-DEC1-4F5E-9CC7-003AF3784B50@postgresql.org>

Hi all,

Have been hitting a kernel panic (page fault) with the IB modules loaded
on 10.3-STABLE. (Compiled multiple times over the last few days, all
panicking.)

Spent several hours narrowing down the cause, and it's definitely a bad
interaction between the IB modules (unsure which) + the "VIMAGE" module.

I'll fill out a bug report in a bit. In the meantime, does the below have
any useful info in it that I can use for further investigation?

(commands taken from
https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html)

***********************************************************************************

root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
code segment      = base 0x0, limit 0xfffff, type 0x1b
                  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags  = interrupt enabled, resume, IOPL = 0
current process   = 12 (irq271: mlx4_core0)
trap number       = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807263d0 at kdb_backtrace+0x60
#1 0xffffffff806e8c76 at vpanic+0x126
#2 0xffffffff806e8b43 at panic+0x43
#3 0xffffffff80b8bf3b at trap_fatal+0x36b
#4 0xffffffff80b8c23d at trap_pfault+0x2ed
#5 0xffffffff80b8b8ba at trap+0x47a
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
#9 0xffffffff808fcc98 at ipoib_ib_completion+0x78
#10 0xffffffff80930c43 at mlx4_cq_completion+0x63
#11 0xffffffff80933d43 at mlx4_eq_int+0x2c3
#12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc
#13 0xffffffff806b35cb at intr_event_execute_handlers+0xab
#14 0xffffffff806b3a16 at ithread_loop+0x96
#15 0xffffffff806b104a at fork_exit+0x9a
#16 0xffffffff80b71dce at fork_trampoline+0xe
Uptime: 3m47s
Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93%

Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219         __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560         mb->m_pkthdr.rcvif = dev;
561         proto = *mtod(mb, uint16_t *);
562         m_adj(mb, IPOIB_ENCAP_LEN);
563
564         IPOIB_MTAP_PROTO(dev, mb, proto);
565         ipoib_demux(dev, mb, ntohs(proto));
566
567     repost:
568         if (has_srq) {
569                 if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id)))
Current language:  auto; currently minimal
(kgdb) list *0xffffffff807be1a2
0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976).
971         if (dispatch_policy == NETISR_DISPATCH_DIRECT) {
972                 nwsp = DPCPU_PTR(nws);
973                 npwp = &nwsp->nws_work[proto];
974                 npwp->nw_dispatched++;
975                 npwp->nw_handled++;
976                 netisr_proto[proto].np_handler(m);
977                 error = 0;
978                 goto out_unlock;
979         }
980
(kgdb) list *0xffffffff80b71892
0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238.
233         .type   calltrap,@function
234     calltrap:
235         movq    %rsp,%rdi
236         call    trap
237         MEXITCOUNT
238         jmp     doreti                  /* Handle any pending ASTs */
239
240         /*
241          * alltraps_noen entry point.  Unlike alltraps above, we want to
242          * leave the interrupts disabled.  This corresponds to
(kgdb) list *0xffffffff80b8b8ba
0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447).
442
443         KASSERT(cold || td->td_ucred != NULL,
444             ("kernel trap doesn't have ucred"));
445         switch (type) {
446         case T_PAGEFLT:                 /* page fault */
447                 (void) trap_pfault(frame, FALSE);
448                 goto out;
449
450         case T_DNA:
451                 KASSERT(!PCB_USER_FPU(td->td_pcb),
(kgdb)

***********************************************************************************

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi