Date:      Tue, 28 Aug 2018 23:47:20 +0800
From:      Meowthink <meowthink@gmail.com>
To:        "karu.pruun" <karu.pruun@gmail.com>
Cc:        freebsd-hackers@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Help diagnose my Ryzen build problem (in progress)
Message-ID:  <CABnABoamgeDUMBXvGwHzgjKrQvHSXC8o3wVRhtu5hFsiLV%2BEaw@mail.gmail.com>

Hi Peeter,

On 8/28/18, karu.pruun <karu.pruun@gmail.com> wrote:
> On Mon, Aug 27, 2018 at 6:07 PM Meowthink <meowthink@gmail.com> wrote:
>
>> >> Unfortunately, that's for Ryzens family 17h model 00h-0fh, whereas my
>> >> Ryzen 5 2400G's model is 11h.
>> >>
>> >> On the microcode. It shall be updated through UEFI/BIOS updates. I
>> >> think mine is now PinnaclePI-AM4_1.0.0.4 with microcode patchlevel
>> >> 0x810100b.
>> >>
>> >> Seems like ... the only thing I can do is sit down and wait?
>> >
>> > The revision
>> >
>> > https://svnweb.freebsd.org/base/head/sys/x86/x86/cpu_machdep.c?r1=336763&r2=336762&pathrev=336763
>> >
>> > works around the mwait issue, i.e. it sets
>> >
>> > sysctl machdep.idle_mwait=0
>> > sysctl machdep.idle=hlt
>> >
>>
>> I think that shall not apply to 2400G, which is model 11h not 1h.
>> Here're what I have now:
>>
>> machdep.idle: acpi
>> machdep.idle_available: spin, mwait, hlt, acpi
>> machdep.idle_apl31: 0
>> machdep.idle_mwait: 1
>>
>> > Now it may or may not relate to your problem, but it appears that
>> > Ryzen 2400G also has another issue with HLT, see the DragonFly bug
>> > report
>> >
>> > https://bugs.dragonflybsd.org/issues/3131
>> >
>>
>> Thanks a lot for that info.
>> It's much easier to prove your problem, since it's reproducible. But
>> mine was so random to catch...
>> Anyway, it seems like the IRET issue [1] is still not fixed? I'm
>> highly doubt that my issue is this related because my system became
>> significantly more stable since I stop that irq storm from bluetooth
>> module - Though it still panics occasionally.
>> So could anybody tell, what's the difference between FreeBSD
>> workaround [2] and the DragonflyBSD one?
>>
>> > which AMD is aware of and is possibly working on, but it may not have
>> > appeared in the errata yet. The bug report says that until this is
>> > fixed, the workaround is to also disable HLT in cpu_idle. I am not
>> > sure what is the correct value for the sysctl on FreeBSD, perhaps
>> >
>> > sysctl machdep.idle=0
>> >
>> > or some other value?
>>
>> In the meantime, I have this microcode
>>
>> # cpucontrol -m 0x8b /dev/cpuctl0
>> MSR 0x8b: 0x00000000 0x0810100b
>>
>> Hence I should use mwait?
>> Still don't know what should I set. Any idea?
>
>
> If I was you, I'd play around with the sysctls mentioned above and see
> if it helps. Start with disabling both mwait and hlt, perhaps
>
> machdep.idle=spin
> machdep.idle_mwait=0
>
> (assuming that 'spin' means hlt will not used) and then if that does
> not lead to a panic, try enabling mwait. I can't test 2400G since I
> don't have it any more. I booted FreeBSD a couple of times but did not
> run it over long periods of time.

It works!
After hours and hours of different stress tests, I got 8 copies of gcc
built without any problem.

But it costs a lot of power and the fan becomes very annoying, so I
don't think I'll test long-term stability in this state.

machdep.idle: acpi -> spin
 - adds ~5W; maybe some deeper C states are disabled?
machdep.idle_mwait: 1 -> 0
 - adds another ~50W; the CPUs never go idle.

I tried setting machdep.idle_mwait to 1, and machdep.idle to mwait. Both
failed with panics when I started building gcc pass by pass.

I'm pretty sure mwait causes the problem, as I once got a panic
immediately after issuing the sysctl command (see the second dump
below).

So my next step will be hlt. Still need some time, though.
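
For anyone who wants to repeat these tests, here is a minimal sketch of
the three combinations discussed in this thread. idle_settings is a
hypothetical helper name, not anything shipped with FreeBSD; it only
prints the name=value pairs, so nothing changes until you feed them to
sysctl yourself as root. The hlt case assumes idle_mwait should stay 0,
as in the r336763 workaround quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the machdep.* settings for one idle method.
# The value pairs are the ones mentioned in this thread, nothing more.
idle_settings() {
    case "$1" in
        spin)
            # The combination that survived the gcc builds (power hungry).
            printf 'machdep.idle=spin\nmachdep.idle_mwait=0\n'
            ;;
        hlt)
            # What the r336763 workaround sets; untested here on a 2400G.
            printf 'machdep.idle=hlt\nmachdep.idle_mwait=0\n'
            ;;
        acpi)
            # The state this machine had before any changes.
            printf 'machdep.idle=acpi\nmachdep.idle_mwait=1\n'
            ;;
        *)
            echo "unknown idle method: $1" >&2
            return 1
            ;;
    esac
}
```

To apply one of them, something like
"idle_settings spin | while read -r kv; do sysctl "$kv"; done" should
do as root, and the same two lines can go into /etc/sysctl.conf to
survive a reboot.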

>
> Cheers
>
> Peeter
>
> --
>

Cheers,
meowthink

------------------------------------------------------------------------
machdep.idle=mwait

panic: ffs_syncvnode: syncing truncated data.
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80dcddc4 at ffs_syncvnode+0x5a4
#4 0xffffffff80dcc915 at ffs_fsync+0x25
#5 0xffffffff810ffcb2 at VOP_FSYNC_APV+0x82
#6 0xffffffff80bc3a62 at sched_sync+0x412
#7 0xffffffff80abd813 at fork_exit+0x83
#8 0xffffffff80f5cc7e at fork_trampoline+0xe

------------------------------------------------------------------------
machdep.idle_mwait=1

Fatal trap 9: general protection fault while in kernel mode
cpuid = 7; apic id = 07
instruction pointer     = 0x20:0xffffffff80e094fe
stack pointer           = 0x0:0xfffffe081e5df9e0
frame pointer           = 0x0:0xfffffe081e5dfa50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 17 (dom0)
trap number             = 9
panic: general protection fault
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80b414b7 at kdb_backtrace+0x67
#1 0xffffffff80afa9e7 at vpanic+0x177
#2 0xffffffff80afa863 at panic+0x43
#3 0xffffffff80f7c14f at trap_fatal+0x35f
#4 0xffffffff80f7b70e at trap+0x5e
#5 0xffffffff80f5bccc at calltrap+0x8
#6 0xffffffff80e07a17 at vm_pageout+0x87
#7 0xffffffff80abd813 at fork_exit+0x83
#8 0xffffffff80f5cc7e at fork_trampoline+0xe


