Date:      Fri, 6 Jul 2018 09:52:24 +0200
From:      Niclas Zeising <zeising+freebsd@daemonic.se>
To:        Warner Losh <imp@bsdimp.com>, John Baldwin <jhb@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Hans Petter Selasky <hps@selasky.org>, Pete Wright <pete@nomadlogic.org>, "O. Hartmann" <ohartmann@walstatt.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: atomic changes break drm-next-kmod?
Message-ID:  <4797c607-c261-77f7-eccf-45056bf56694@daemonic.se>
In-Reply-To: <CANCZdfqGyANQ5uUz_Ebc3i5HDLvkWocDs=J2p5xuj=1OttGWYQ@mail.gmail.com>
References:  <bb2cac77-4bcd-c87c-9bc9-ce5f8ce1c726@nomadlogic.org> <845aca10-8c01-fa3b-087f-f957df4e7531@nomadlogic.org> <063ae5c3-0584-1284-dd9d-ab8b5790baf1@FreeBSD.org> <0bf8e57b-fdb4-4c1a-3d0d-a734f8187ca8@nomadlogic.org> <4c5411dd-9f6b-7245-6ade-e11040f74687@FreeBSD.org> <24f5d737-a205-6fcc-0a33-a84601d2ff7a@nomadlogic.org> <c459a76c-21a2-2510-54b1-d7edee6eaa1e@FreeBSD.org> <eb84c2ed-1cd8-794f-9d5e-9454edeba4e4@nomadlogic.org> <29ce4eab-6667-d2ca-b5d8-3deeef28f142@selasky.org> <df73594c-785a-663d-6c76-bf95466a7aa3@selasky.org> <20180705193646.GM5562@kib.kiev.ua> <5dc2a315-4b71-9ff0-0a37-576649e9144b@FreeBSD.org> <CANCZdfqGyANQ5uUz_Ebc3i5HDLvkWocDs=J2p5xuj=1OttGWYQ@mail.gmail.com>

On 07/06/18 00:02, Warner Losh wrote:
>
>
> On Thu, Jul 5, 2018 at 1:44 PM, John Baldwin <jhb@freebsd.org> wrote:
>
>     On 7/5/18 12:36 PM, Konstantin Belousov wrote:
>      > On Thu, Jul 05, 2018 at 09:12:24PM +0200, Hans Petter Selasky wrote:
>      >> On 07/05/18 20:59, Hans Petter Selasky wrote:
>      >>> On 07/05/18 19:48, Pete Wright wrote:
>      >>>>
>      >>>>
>      >>>> On 07/05/2018 10:10, John Baldwin wrote:
>      >>>>> On 7/3/18 5:10 PM, Pete Wright wrote:
>      >>>>>>
>      >>>>>> On 07/03/2018 15:56, John Baldwin wrote:
>      >>>>>>> On 7/3/18 3:34 PM, Pete Wright wrote:
>      >>>>>>>> On 07/03/2018 15:29, John Baldwin wrote:
>      >>>>>>>>> That seems like kgdb is looking at the wrong CPU.  Can you use
>      >>>>>>>>> 'info threads' and look for threads not stopped in 'sched_switch'
>      >>>>>>>>> and get their backtraces?  You could also just do 'thread apply
>      >>>>>>>>> all bt' and put that file at a URL if that is easiest.
>      >>>>>>>>>
>      >>>>>>>> sure thing John - here's a gist of "thread apply all bt"
>      >>>>>>>>
>      >>>>>>>>
>      >>>>>>>> https://gist.github.com/gem-pete/d8d7ab220dc8781f0827f965f09d43ed
>      >>>>>>> That doesn't look right at all.  Are you sure the kernel matches the
>      >>>>>>> vmcore?  Also, which kgdb version are you using?
>      >>>>>>>
>      >>>>>> yea i agree that doesn't look right at all.  here is my setup:
>      >>>>>>
>      >>>>>> $ which kgdb
>      >>>>>> /usr/bin/kgdb
>      >>>>>> $ kgdb
>      >>>>>> GNU gdb 6.1.1 [FreeBSD]
>      >>>>>> $ ls -lh /var/crash/vmcore.1
>      >>>>>> -rw-------  1 root  wheel   1.6G Jul  3 15:03 /var/crash/vmcore.1
>      >>>>>> $ ls -l /usr/lib/debug/boot/kernel/kernel.debug
>      >>>>>> -r-xr-xr-x  1 root  wheel  87840496 Jul  3 13:54
>      >>>>>> /usr/lib/debug/boot/kernel/kernel.debug
>      >>>>>>
>      >>>>>> and i invoke kgdb like so:
>      >>>>>> $ sudo kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.1
>      >>>>>>
>      >>>>>> here's a gist of my full gdb session:
>      >>>>>> http://termbin.com/krsn
>      >>>>>>
>      >>>>>> dunno - maybe i have a bad core dump?  regardless, more than happy to
>      >>>>>> help so let me know if i should try anything else or patches etc..
>      >>>>> Can you try installing gdb from ports and using /usr/local/bin/kgdb?
>      >>>>>
>      >>>>
>      >>>> that seems to have done the trick, at least the output looks more
>      >>>> encouraging.
>      >>>>
>      >>>>   --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>      >>>> KDB: enter: panic
>      >>>>
>      >>>> __curthread () at ./machine/pcpu.h:231
>      >>>> 231         __asm("movq %%gs:%1,%0" : "=r" (td)
>      >>>>
>      >>>>
>      >>>> here's my full kgdb session:
>      >>>> http://termbin.com/qa4f
>      >>>>
>      >>>> i don't see any threads not in "sched_switch" though :(
>      >>>
>      >>> Hi,
>      >>>
>      >>> The problem may be that the patch to enable atomic inlining of all
>      >>> macros forgot to set the SMP keyword which means SMP is not defined at
>      >>> all for KLD's so all non-kernel atomic usage is with MPLOCKED empty!
>      > Problem is that out-of-tree modules build does not have opt*.h files
>      > from the kernel.  UP config is a valid one, flipping some option's
>      > default value does not solve the problem.
>
>     Yes, but using the lock prefix in a generic module is ok (it will still
>     work, just not quite as fast) whereas the lack of lock is fatal on
>     SMP.  I would amend Hans' patch slightly to honor the opt_* setting
>     for KLD_TIED (but that is only true if KLD_TIED means "built as part of
>     a kernel build, so has valid opt_foo.h headers" and not
>     'a standalone module where someone put MODULES_TIED=1 on the command line
>     to make').
>
>
> I agree with this default. It's sensible to default to (a) the most
> popular thing and (b) thing that always works, especially when (a) and
> (b) are identical.
>
> Don't make me start the "Do we really need an SMP option, why not make
> it always on" thread :) The number of relevant uniprocessor x86 boxes
> that benefit from omitting SMP is so small as to be irrelevant, IMHO. A
> MP kernel runs just fine on them...
>
> Warner
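
For readers following the quoted discussion, here is a minimal sketch of the
default John describes: keep the lock prefix unless the module is tied to a
kernel build that is known to be uniprocessor. The DEMO_* names are
hypothetical stand-ins chosen for illustration; they are not the actual
macros or logic in FreeBSD's <machine/atomic.h>, and the instruction string
is only a placeholder for what an inline-asm atomic would emit.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative only: the DEMO_* names are hypothetical, not the real
 * FreeBSD macros.
 *
 * DEMO_KLD_TIED: module built as part of a kernel build, so the
 *                kernel's option headers (and DEMO_SMP) can be trusted.
 * DEMO_SMP:      that kernel was configured for multiprocessor use.
 */
#if defined(DEMO_KLD_TIED) && !defined(DEMO_SMP)
#define DEMO_MPLOCKED ""          /* known UP kernel: prefix may be dropped */
#else
#define DEMO_MPLOCKED "lock ; "   /* default: always correct, a bit slower on UP */
#endif

/* Instruction text an inline-asm atomic add would use in this sketch. */
#define DEMO_ATOMIC_ADD_INSN DEMO_MPLOCKED "addl %1,%0"

static const char *
demo_atomic_add_insn(void)
{
	return (DEMO_ATOMIC_ADD_INSN);
}

/* Returns 1 if the selected instruction carries the lock prefix. */
static int
demo_uses_lock_prefix(void)
{
	return (strncmp(demo_atomic_add_insn(), "lock", 4) == 0);
}

/* A standalone module (no DEMO_KLD_TIED) must always keep the prefix. */
static void
demo_check_default(void)
{
#if !defined(DEMO_KLD_TIED)
	assert(demo_uses_lock_prefix());
#endif
}
```

Compiled with neither macro defined, which corresponds to the out-of-tree
module case in this thread, the sketch selects the locked form. Only a build
that defines DEMO_KLD_TIED without DEMO_SMP drops the prefix.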

Where are we on this?
It is important to get it fixed, it's already been 4 days, which means 4
days of all modern FreeBSD desktop systems being broken, and possibly
other systems with kernel modules from ports as well.


Another question, how hard would it be to expose how the kernel was
built to modules built from ports, so that they can figure out stuff
like SMP and others, that might affect the module build?

Regards
-- 
Niclas


