Date:      Thu, 21 Apr 2016 15:16:09 +0100
From:      Justin Clift <justin@postgresql.org>
To:        freebsd-infiniband@freebsd.org
Subject:   Kernel panic (page fault) on 10.3-STABLE with IB & VIMAGE modules
Message-ID:  <210EB5F8-DEC1-4F5E-9CC7-003AF3784B50@postgresql.org>

Hi all,

I have been hitting a kernel panic (page fault) with the IB modules loaded
on 10.3-STABLE (compiled multiple times over the last few days, all
panicking).

Spent several hours narrowing down the cause, and it's definitely a bad
interaction between the IB modules (unsure which) + the "VIMAGE" module.

I'll file a bug report in a bit.  In the meantime, does the output below
contain any useful info that I can use for further investigation?  (Commands
taken from
https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html)

***********************************************************************************

root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq271: mlx4_core0)
trap number		= 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807263d0 at kdb_backtrace+0x60
#1 0xffffffff806e8c76 at vpanic+0x126
#2 0xffffffff806e8b43 at panic+0x43
#3 0xffffffff80b8bf3b at trap_fatal+0x36b
#4 0xffffffff80b8c23d at trap_pfault+0x2ed
#5 0xffffffff80b8b8ba at trap+0x47a
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
#9 0xffffffff808fcc98 at ipoib_ib_completion+0x78
#10 0xffffffff80930c43 at mlx4_cq_completion+0x63
#11 0xffffffff80933d43 at mlx4_eq_int+0x2c3
#12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc
#13 0xffffffff806b35cb at intr_event_execute_handlers+0xab
#14 0xffffffff806b3a16 at ithread_loop+0x96
#15 0xffffffff806b104a at fork_exit+0x9a
#16 0xffffffff80b71dce at fork_trampoline+0xe
Uptime: 3m47s
Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93%

Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560		mb->m_pkthdr.rcvif = dev;
561		proto = *mtod(mb, uint16_t *);
562		m_adj(mb, IPOIB_ENCAP_LEN);
563	
564		IPOIB_MTAP_PROTO(dev, mb, proto);
565		ipoib_demux(dev, mb, ntohs(proto));
566	
567	repost:
568		if (has_srq) {
569			if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id)))
Current language:  auto; currently minimal
(kgdb) list *0xffffffff807be1a2
0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976).
971		if (dispatch_policy == NETISR_DISPATCH_DIRECT) {
972			nwsp = DPCPU_PTR(nws);
973			npwp = &nwsp->nws_work[proto];
974			npwp->nw_dispatched++;
975			npwp->nw_handled++;
976			netisr_proto[proto].np_handler(m);
977			error = 0;
978			goto out_unlock;
979		}
980	
(kgdb) list *0xffffffff80b71892
0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238.
233		.type	calltrap,@function
234	calltrap:
235		movq	%rsp,%rdi
236		call	trap
237		MEXITCOUNT
238		jmp	doreti			/* Handle any pending ASTs */
239	
240		/*
241		 * alltraps_noen entry point.  Unlike alltraps above, we want to
242		 * leave the interrupts disabled.  This corresponds to
(kgdb) list *0xffffffff80b8b8ba
0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447).
442	
443			KASSERT(cold || td->td_ucred != NULL,
444			    ("kernel trap doesn't have ucred"));
445			switch (type) {
446			case T_PAGEFLT:			/* page fault */
447				(void) trap_pfault(frame, FALSE);
448				goto out;
449	
450			case T_DNA:
451				KASSERT(!PCB_USER_FPU(td->td_pcb),
(kgdb)

***********************************************************************************
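
The session above looks up only a few of the backtrace addresses by hand with
"list *ADDR". As a minimal sketch (not part of the original session), the
following Python turns every frame of a "KDB: stack backtrace" into the
matching kgdb "list" command, so all frames can be inspected in one pass. The
function name kgdb_list_commands and the regular expression are illustrative
assumptions based on the frame format shown above, not an existing tool.

```python
import re

# Matches frame lines such as:
#   "#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a"
# (format assumed from the backtrace in this message)
FRAME_RE = re.compile(r"^#\d+\s+(0x[0-9a-fA-F]+)\s+at\s+\S+\+0x[0-9a-fA-F]+$")


def kgdb_list_commands(backtrace_text):
    """Return one 'list *ADDR' command per frame line found in the text."""
    commands = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            commands.append("list *" + match.group(1))
    return commands


if __name__ == "__main__":
    sample = (
        "KDB: stack backtrace:\n"
        "#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62\n"
        "#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a\n"
        "Uptime: 3m47s\n"
    )
    # Print the commands so they can be pasted at the (kgdb) prompt.
    for command in kgdb_list_commands(sample):
        print(command)
```

The printed lines can be pasted at the (kgdb) prompt; lines that are not
frames (such as "Uptime:") are ignored.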

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



