Date:      Thu, 28 Nov 2013 09:56:10 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        pho@freebsd.org, freebsd-current@FreeBSD.org
Subject:   Re: panic: double fault with 11.0-CURRENT r258504
Message-ID:  <20131128075610.GJ59496@kib.kiev.ua>
In-Reply-To: <201311272111.rARLBZk9042868@gw.catspoiler.org>
References:  <20131127200050.GE59496@kib.kiev.ua> <201311272111.rARLBZk9042868@gw.catspoiler.org>

On Wed, Nov 27, 2013 at 01:11:35PM -0800, Don Lewis wrote:
> On 27 Nov, Konstantin Belousov wrote:
> > On Wed, Nov 27, 2013 at 11:35:19AM -0800, Don Lewis wrote:
> >> On 27 Nov, Konstantin Belousov wrote:
> >> > On Wed, Nov 27, 2013 at 11:02:57AM -0800, Don Lewis wrote:
> >> >> On 27 Nov, Konstantin Belousov wrote:
> >> >> > On Wed, Nov 27, 2013 at 10:33:30AM -0800, Don Lewis wrote:
> >> >> >> On 27 Nov, Konstantin Belousov wrote:
> >> >> >> > On Wed, Nov 27, 2013 at 09:41:36AM -0800, Don Lewis wrote:
> >> >> >> >> On 27 Nov, Konstantin Belousov wrote:
> >> >> >> >> > On Wed, Nov 27, 2013 at 02:49:12AM -0800, Don Lewis wrote:
> >> >> >> >> >> <http://people.freebsd.org/~truckman/doublefault2.JPG>
> >> >> >> >> >
> >> >> >> >> > What is the instruction at cpu_switch+0x9b?
> >> >> >> >>
> >> >> >> >> movl 0x8(%edx),%eax
> >> >> >> > So it is line 176 in swtch.s. Is the machine still in ddb, or did you
> >> >> >> > obtain the core? If yes, please print out the contents of the words at
> >> >> >> > 0xe4f62bb0 + 4, +8 (*), +16. Please print the contents of the word at
> >> >> >> > address (*) + 8.
> >> >> >>
> >> >> >> It is still in ddb.
> >> >> >>
> >> >> >> <http://people.freebsd.org/~truckman/doublefault3.JPG>, though not in
> >> >> >> the above order.
> >> >> > Uhm, sorry, I mistyped the last part of the instructions.
> >> >> >
> >> >> > The new thread pointer is 0xd2f4e000; there is nothing incriminating.
> >> >> > Please print the word at 0xd2f4e000+0x254 == 0xd2f4e254, which would be
> >> >> > the address of the new thread's pcb. It is the load from pcb + 8 that
> >> >> > faults.
> >> >>
> >> >> 0xf3d44d60
> >> > Again, the pointer looks fine, and its tail is 0xd60, which is correct for
> >> > the pcb offset in the last page of the thread stack.
> >> >
> >> > Please do 'show thread 0xd2f4e000' before trying the instructions below.
> >>
> >> Ok, see below:
> >>
> >> > What happens if you try to read the word at 0xf3d44d68?
> >>
> >> Nothing bad ...
> >>
> >> <http://people.freebsd.org/~truckman/doublefault4.JPG>
> >>
> > So the thread structure looks sane, the stack region is in place where
> > it is supposed to be, all the gathered data looks self-consistent. And,
> > the access to the faulted address from ddb does not fault.
> >
> > Thread stacks can only be invalidated when the process is swapped out and
> > the kernel stack is written to swap.  Your thread flags indicate that it is
> > in memory, and TDF_CANSWAP is not set.  I do not believe that our swapout
> > code would invalidate the stack mapping in such a situation; otherwise we
> > would already have too many complaints.
> >
> > Just in case, do you use swap on this box?
>
> I do.
>
> > And, as a last resort (I do understand that this sounds like giving up),
> > do you monitor the temperature of the CPUs?  BTW, which CPUs are these?
> > Please show the CPU identification lines from the boot dmesg.
>
> I don't monitor the temperature, but I do hear the CPU fan speed ramping
> up and down when I'm building ports like this.  Even though I'm pretty
> much keeping one core busy the whole time, the temperature must drop
> enough at times to let the fan speed drop.
>
> I can run math/mprime on this machine for a while to see if anything
> shows up.  I also have a very similar machine (same motherboard but
> different CPU) that I can move the drive over to and test.
>
> Here's the full dmesg.boot:
>
> Copyright (c) 1992-2013 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 11.0-CURRENT #63 r258614M: Tue Nov 26 00:29:01 PST 2013
>     dl@scratch.catspoiler.org:/usr/obj/usr/src/sys/GENERICSMB i386
> FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
> WARNING: WITNESS option enabled, expect reduced performance.
> CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ (2500.06-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x60fb1  Family = 0xf  Model = 0x6b  Stepping = 1
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x2001<SSE3,CX16>
>   AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>

The errata list for the Athlon 64 X2 is quite long.  Do you have the latest
BIOS?  I am not sure whether AMD provides standalone firmware update blocks
for their CPUs.  If any Linux distribution ships updates for AMD CPUs,
it might be useful to load the update with cpucontrol(8).  Even if we
do not hit a CPU bug, it would give me more certainty that we
are not chasing a ghost.

Other things to try, perhaps in vain, are compiling the kernel with gcc
or disabling SMP.

Peter, could you please try to reproduce the issue?  It does not look
like a random hardware failure, since in all cases it is the curthread
access which is faulting.  The issue has only been reported by Don, and
so far only for i386 SMP.