Date:      Sun, 9 Jun 2013 15:52:22 -0700
From:      Justin Hibbits <jhibbits@freebsd.org>
To:        Super Bisquit <superbisquit@gmail.com>
Cc:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: Strange panic on ppc64
Message-ID:  <CAHSQbTDOCj9hnBtoJ2SN63zDAo08ZekNx7nqgT1C957VHSZ1qg@mail.gmail.com>
In-Reply-To: <CA%2BWntOt%2BjhfgtjmoPLCLqoTymAMwfhLoEb4sdbJzFXULswOvfQ@mail.gmail.com>
References:  <CAHSQbTAZTc9puGaH0rbhyY11s0%2BL0xGjSabK1kj65UMm1t7j3w@mail.gmail.com> <51AF6661.3060007@freebsd.org> <CAHSQbTBjza0u7nZf4z%2BxpTCcWj-TW-ZigV2-CZexuBOYQX5=3A@mail.gmail.com> <CAHSQbTCvFXDZPsOnmogc0FkZeMXwOP6h40F2kFUu2s6UmffyPw@mail.gmail.com> <51B345BE.5030905@freebsd.org> <CAHSQbTDnwne3KJWN7xjcUw4PhF-uiD4B-4y1Lf90Bfou-2Ppvw@mail.gmail.com> <51B4A389.4020607@freebsd.org> <CAHSQbTACtejaRKiG4qScSV_EdTC8y_k5Qghx_FYebWzstBP61g@mail.gmail.com> <CA%2BWntOt%2BjhfgtjmoPLCLqoTymAMwfhLoEb4sdbJzFXULswOvfQ@mail.gmail.com>

On Sun, Jun 9, 2013 at 3:40 PM, Super Bisquit <superbisquit@gmail.com> wrote:

> See if you have enough time to ssh into the box. It may not be so stupid
> of an idea. You could monitor from another machine.  I'm not sure if the
> gnome desktop recording works with ssh but that may help. Again, this is
> just an idea.
>
>
> On Sun, Jun 9, 2013 at 5:21 PM, Justin Hibbits <jhibbits@freebsd.org> wrote:
>
>> On Sun, Jun 9, 2013 at 8:47 AM, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>>
>> >  On 06/08/13 17:33, Justin Hibbits wrote:
>> >
>> >
>> >
>> >
>> > On Sat, Jun 8, 2013 at 7:54 AM, Nathan Whitehorn <nwhitehorn@freebsd.org> wrote:
>> >
>> >>   On 06/08/13 09:21, Justin Hibbits wrote:
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Jun 5, 2013 at 9:47 AM, Justin Hibbits <jhibbits@freebsd.org> wrote:
>> >>
>> >>> Will do, when I get it panicking again.
>> >>>
>> >>> - Justin
>> >>>   On Jun 5, 2013 9:46 AM, "Nathan Whitehorn" <nwhitehorn@freebsd.org>
>> >>> wrote:
>> >>>
>> >>>> On 06/04/13 22:35, Justin Hibbits wrote:
>> >>>>
>> >>>>> After a string of seemingly random hangs, I added invariants (but not
>> >>>>> witness) to my custom kernel config, and I get the following panic,
>> >>>>> recreated from a fuzzy cell phone picture:
>> >>>>>
>> >>>>>
>> >>>>> [thread pid -1 tid 1006665719 ]
>> >>>>> Stopped at 0: illegal instruction 0
>> >>>>> db> panic: mutex ohci1 owned at
>> >>>>> /usr/home/chmeee/freebsd/head/sys/dev/usb/usb_transfer.c:2280
>> >>>>> cpuid = 0
>> >>>>> Uptime: 9h8m1s
>> >>>>> <my dump code>
>> >>>>> ...
>> >>>>> panic: msleep1
>> >>>>> cpu = 0
>> >>>>> KDB: enter: panic
>> >>>>> [ thread pid -1 tid 100665719 ]
>> >>>>> ....
>> >>>>>
>> >>>>> The first question I have is how the hell it got such a strange
>> >>>>> PID/TID.  Memory corruption is my guess: something is stomping on the
>> >>>>> pcpu or something.  I think these hangs have only happened since I
>> >>>>> added a lot more memory (up to 12G from 4G; Andreas Tobler was seeing
>> >>>>> hangs as well), so it might be something in the moea64 pmap code, but
>> >>>>> that's pure speculation on my part.  Then there are the other panic
>> >>>>> messages, owned mutex and panic in msleep1.  I enabled more trace
>> >>>>> code, so hopefully the next time it panics I can collect better data.
>> >>>>>
>> >>>>> - Justin
>> >>>>> _______________________________________________
>> >>>>> freebsd-ppc@freebsd.org mailing list
>> >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
>> >>>>> To unsubscribe, send any mail to "freebsd-ppc-unsubscribe@freebsd.org"
>> >>>>>
>> >>>>
>> >>>> Could you post the output from show reg? It looks like it tried to jump
>> >>>> to a null pointer there.
>> >>>> -Nathan
>> >>>>
>> >>>
>> >>  Well, it's hard to get that output, because I just hit that 'mutex
>> >> owned' panic, and here's the backtrace:
>> >>
>> >>
>> >>
>> >>  The mutex thing is spurious -- it was already panicking and then
>> >> panicked again trying to panic. Can you get the backtrace for the
>> >> original panic (it should be different) and the values of the registers?
>> >> -Nathan
>> >>
>> >
>> >  Here you go:
>> >
>> > [ thread pid -1 tid 1006665719 ]
>> > Stopped at      0:      illegal instruction 0
>> > db:0:kdb.enter.default> show reg
>> > r0                   0
>> > r1                   0
>> > r2            0xab63d0  M_MACTEMP
>> > r3            0xbb12e0
>> > r4            0x741f18  .ofwcall+0xa8
>> > r5                   0
>> > r6            0xa4f1a8
>> > r7                 0x1
>> > r8                 0x1
>> > r9            0xc10500  __pcpu
>> > r10          0x1c35ec0
>> > r11                  0
>> > r12         0x2000d032
>> > r13         0x342eb000
>> > r14         0x10014200
>> > r15         0xffffffffffffcb58
>> > r16                0x2
>> > r17                0x2
>> > r18         0xffffffffffffcb50
>> > r19                  0
>> > r20         0xc000000013231478
>> > r21         0xc00000014c0ce200
>> > r22                  0
>> > r23               0x64  dbsize+0x10
>> > r24         0xc00000014c0cdf70
>> > r25           0xb62cb8  smp_no_rendevous_barrier
>> > r26                  0
>> > r27           0x741f18  .ofwcall+0xa8
>> > r28           0x741f18  .ofwcall+0xa8
>> > r29         0x2000d032
>> > r30         0x9000000000001032
>> > r31           0xc0cad8  mac_labeled
>> > srr0          0x102ca4  k_trap+0x28
>> > srr1        0x9000000000001032
>> > lr            0x102c74  u_trap+0x10
>> > ctr         0xff846d78
>> > cr          0x2000f1b0
>> > xer                  0
>> > dar         0xfffffffffffffd60
>> > dsisr       0x42000000
>> > 0:      illegal instruction 0
>> > db:0:kdb.enter.default>  bt
>> > Tracing pid -1 tid 1006665719 td 0
>> >  (nothing)
>> >
>> >
>> > Well, that is all kinds of messed up. It appears to have halted while
>> > handling a userland trap due to an implicit branch caused by bad
>> > translations when it restores the kernel SRs. Could you see what 'show
>> > pcpu' does? Does that information look valid at all? I suspect it has
>> > become corrupted somehow.
>> > -Nathan
>> >
>> >
>> Here's the full log from dconschat, from bootup to panic.  Unfortunately,
>> not everything I wanted to print would print, and I can't type anything
>> once it panics, because it panics when reading the keyboard, so I have to
>> add everything as a ddb enter script.  Here's what I've added so far
>> (doesn't do everything as you can see from the transcript):
>>
>>     script kdb.enter.default=show reg; bt; show pcpu; ps; run lockinfo;
>> alltrace; show all procs; show files; show malloc; show allchains
>>
>> - Justin
>>
>
>
I do ssh into the box.  I can run a buildworld just fine, but as soon as it
finishes (and sometimes before it finishes) it panics.  If I don't put much
pressure on the VM (let it sit idle, read man pages, etc.), it can last for
days.  Under heavy building, like what I've been doing with poudriere (a
very nice piece of software, I might add), it crashes.  It could simply be
bad RAM, since it never crashed until I added my new RAM.  However, that's
nearly impossible to test accurately, so I'd like to rule out every other
possible cause before tackling that idea.

- Justin


