Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 21 Feb 2017 00:35:37 +0100
From:      Mateusz Guzik <mjguzik@gmail.com>
To:        Mark Millard <markmi@dsl-only.net>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>, mjg@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, svn-src-head@freebsd.org, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]
Message-ID:  <20170220233537.GB26759@dft-labs.eu>
In-Reply-To: <CD8044D2-C11A-4ABE-B72D-62BDDE302C7C@dsl-only.net>
References:  <2FD12B8F-2255-470A-98D4-2DCE9C7495F5@dsl-only.net> <20170220191044.GA8526@dft-labs.eu> <83428304-87BE-413C-BAB9-8FF218E7661C@dsl-only.net> <CD8044D2-C11A-4ABE-B72D-62BDDE302C7C@dsl-only.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
> On 2017-Feb-20, at 2:58 PM, Mark Millard <markmi@dsl-only.net> wrote:
> 
> > On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
> > 
> >> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
> >>> [Note: I experiment with clang based powerpc64 builds,
> >>> reporting problems that I find. Justin is familiar
> >>> with this, as is Nathan.]
> >>> 
> >>> I tried to update the PowerMac G5 (a so-called "Quad Core")
> >>> that I have access to from head -r312761 to -r313864 and
> >>> ended up with random panics and hang ups in fairly short
> >>> order after booting.
> >>> 
> >>> Some approximate bisecting for the kernel lead to:
> >>> (sometimes getting part way into a buildkernel attempt
> >>> for a different version before a failure happens)
> >>> 
> >>> -r313266: works (just before use of atomic_fcmpset)
> >>> vs.
> >>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
> >>> 
> >>> (I did not try -r313268 through -r313270 as the use was
> >>> gradually added.)
> >>> 
> >>> So I'm currently running a -r313864 world with a -r313266
> >>> kernel.
> >>> 
> >>> No kernel that I tried that was from before -r313266 had the
> >>> problems.
> >>> 
> >>> Any kernel that I tried that was from after -r313271 had the
> >>> problems.
> >>> 
> >>> Of course I did not try them all in other direction. :)
> >>> 
> >> 
> >> I found that spin mutexes were not properly handling this, fixed in
> >> r313996.
> >> 
> >> Locally I added a if (cpu_tick() % 2) return (0); snipped to amd64
> >> fcmpset to simulate failures. Everything works, while it would easily
> >> fail without the patch.
> >> 
> >> That said, I hope this concludes the 'missing check for not-reread value
> >> of failed fcmpset' saga.
> >> 
> >> -- 
> >> Mateusz Guzik <mjguzik gmail.com>
> > 
> > I tried to update from -r313864 to -r313999 in my amd64 context
> > (a VirtualBox machine under macOS) but it now crashes late in
> > the boot sequence (after it processes a dump if I make one but
> > before I can log in).
> > 
> > This update was via my usual explicit svnlite update; buildworld
> > buildkernel; etc. production style build of world and kernel,
> > including use of MALLOC_PRODUCTION.
> > 
> > The window shows:
> > 
> > _vm_map_lock+0xf
> > vm_map_wire+0x32
> > rtROMemObjNativeLockInMap+0x8c
> > rtROMemObjNativeLockUser+0x51
> > RTR0MemObjLockUserTag+0x231
> > vbglR0HGCMInternalPreprocessCall+0x65d
> > vbglR0HGCMInternalCall+0x17c
> > vgdrvIoCtl_HGCMCall+0x43f
> > VGDrvCommonIoCtl+0x261
> > vgdrvFreeBSDIOCtl+0x2cd
> > devfs_ioctl+0xae
> > VOP_IOCTL_APV+0x88
> > vn_ioctl+0x161
> > devfs_ioctl_f+0x1f
> > kern_ioctl+0x280
> > sys_ioctl+0x13f
> > amd64_syscall+0x397
> > Xfast_syscall+0xfb
> 
> More detail from booting with the -r313864 kernel.old
> and using kgdb on what the dump produced:
> 
> # kgdb kernel.debug /var/crash/vmcore.
> /var/crash/vmcore.0    /var/crash/vmcore.last
> # kgdb kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> <118>Starting vboxservice.
> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 18:37:45) release log
> <118>00:00:00.000120 main     Log opened 2017-02-20T22:38:46.348080000Z
> <118>00:00:00.000162 main     OS Product: FreeBSD
> <118>00:00:00.000171 main     OS Release: 12.0-CURRENT
> <118>00:00:00.000180 main     OS Version: FreeBSD 12.0-CURRENT  r313999M
> <118>00:00:00.000192 main     Executable: /usr/local/sbin/VBoxService
> <118>00:00:00.000194 main     Process ID: 609
> <118>00:00:00.000196 main     Package type: BSD_64BITS_GENERIC (OSE)
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0xd6
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80d4ebaf
> stack pointer           = 0x28:0xfffffe0122e2bef0
> frame pointer           = 0x28:0xfffffe0122e2bf00
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 609 (VBoxService)
> 



> #9  0xffffffff80eb6be1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
> #10 0xffffffff80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at /usr/src/sys/vm/vm_map.c:501

The function is:
void
_vm_map_lock(vm_map_t map, const char *file, int line)
{

        if (map->system_map)
                mtx_lock_flags_(&map->system_mtx, 0, file, line);
        else
                sx_xlock_(&map->lock, file, line);
        map->timestamp++;
}

system_map is at offset 0xd5, thus the faulting address of 0xd6 with map
address of 1 looks like the backtrace is corect. But this suggests the
bug is unrelated to my changes and there is a chance there is no bug in
the first place.

Please make sure that the virtualbox module is recompiled against proper
source tree. If the problem persists, please bisect. The range is not
big.

Off hand I don't see what can cause the failure in question (and chances
are there is no bug if kbi changed and the module was not recompiled).

> #11 0xffffffff80d51ea2 in vm_map_wire (map=<value optimized out>, start=4534272, end=4538368, flags=1) at /usr/src/sys/vm/vm_map.c:2534
> #12 0xffffffff8265291c in rtR0MemObjNativeLockInMap () from /boot/modules/vboxguest.ko
> #13 0xffffffff82652881 in rtR0MemObjNativeLockUser () from /boot/modules/vboxguest.ko
> #14 0xffffffff8264ec01 in RTR0MemObjLockUserTag () from /boot/modules/vboxguest.ko
> #15 0xffffffff82624afd in vbglR0HGCMInternalPreprocessCall () from /boot/modules/vboxguest.ko
> #16 0xffffffff8262411a in VbglR0HGCMInternalCall () from /boot/modules/vboxguest.ko
> #17 0xffffffff8261ec4f in vgdrvIoCtl_HGCMCall () from /boot/modules/vboxguest.ko
> #18 0xffffffff8261d221 in VGDrvCommonIoCtl () from /boot/modules/vboxguest.ko
> 

-- 
Mateusz Guzik <mjguzik gmail.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170220233537.GB26759>