Date:      Sun, 4 Sep 2016 11:39:58 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, stable@FreeBSD.org
Subject:   Re: X2APIC support
Message-ID:  <20160904083958.GD34394@zxy.spb.ru>
In-Reply-To: <4ba05c00-f737-f562-553d-a7fa59145768@FreeBSD.org>
References:  <20151212130615.GE70867@zxy.spb.ru> <20151212133513.GL82577@kib.kiev.ua> <20160901112724.GX88122@zxy.spb.ru> <20160901114500.GJ83214@kib.kiev.ua> <20160901121300.GZ88122@zxy.spb.ru> <4ba05c00-f737-f562-553d-a7fa59145768@FreeBSD.org>

On Sun, Sep 04, 2016 at 11:19:16AM +0300, Andriy Gapon wrote:

> On 01/09/2016 15:13, Slawa Olhovchenkov wrote:
> > DMAR: Found table at 0x79b32798
> > x2APIC available but disabled by DMAR table
> 
> > Event timer "LAPIC" quality 600
> > LAPIC: ipi_wait() us multiplier 1 (r 116268019 tsc 2200043851)
> > ACPI APIC Table: <ALASKA A M I >
> > Package ID shift: 5
> > L3 cache ID shift: 5
> > L2 cache ID shift: 1
> > L1 cache ID shift: 1
> > Core ID shift: 1
> > kernel trap 12 with interrupts disabled
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = ff
> 
> > fault virtual address   = 0x0
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80537e74
> > stack pointer           = 0x28:0xffffffff814b4a60
> > frame pointer           = 0x28:0xffffffff814b4a70
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = resume, IOPL = 0
> > current process         = 0 ()
> > trap number             = 12
> > panic: page fault
> > cpuid = 0
> > KDB: stack backtrace:
> > #0 0xffffffff805272e7 at kdb_backtrace+0x67
> > #1 0xffffffff804dd662 at vpanic+0x182
> > #2 0xffffffff804dd4d3 at panic+0x43
> > #3 0xffffffff807a3791 at trap_fatal+0x351
> > #4 0xffffffff807a3983 at trap_pfault+0x1e3
> > #5 0xffffffff807a2f0c at trap+0x26c
> > #6 0xffffffff80787ca1 at calltrap+0x8
> > #7 0xffffffff8083b52a at topo_probe+0x61a
> 
> Interesting.  Could you please do 'list *topo_probe+0x61a' in kgdb, so that I

(kgdb) list *topo_probe+0x61a
0xffffffff8083b52a is in topo_probe (/usr/src/sys/x86/x86/mp_x86.c:540).
535                                 topo_layers[layer].subtype);
536                     }
537             }
538
539             parent = &topo_root;
540             for (layer = 0; layer < nlayers; ++layer) {
541                     node_id = boot_cpu_id >> topo_layers[layer].id_shift;
542                     node = topo_find_node_by_hwid(parent, node_id,
543                         topo_layers[layer].type,
544                         topo_layers[layer].subtype);
Current language:  auto; currently minimal

> can see what code is being executed when the trap happens?  Also, disassembly of
> the function could be useful as well.

(kgdb) x/40i *topo_probe+0x600
0xffffffff8083b510 <topo_probe+1536>:   and    $0xf8,%al
0xffffffff8083b512 <topo_probe+1538>:   movslq -0x4(%r12),%rcx
0xffffffff8083b517 <topo_probe+1543>:   mov    %rbx,%rdi
0xffffffff8083b51a <topo_probe+1546>:   callq  0xffffffff80537e30 <topo_find_node_by_hwid>
0xffffffff8083b51f <topo_probe+1551>:   mov    %rax,%rbx
0xffffffff8083b522 <topo_probe+1554>:   mov    %rbx,%rdi
0xffffffff8083b525 <topo_probe+1557>:   callq  0xffffffff80537e70 <topo_promote_child>
0xffffffff8083b52a <topo_probe+1562>:   add    $0xc,%r12
0xffffffff8083b52e <topo_probe+1566>:   dec    %r14d
0xffffffff8083b531 <topo_probe+1569>:   jne    0xffffffff8083b500 <topo_probe+1520>
0xffffffff8083b533 <topo_probe+1571>:   movb   $0x1,0xffffffff80dfa664
0xffffffff8083b53b <topo_probe+1579>:   add    $0x68,%rsp
0xffffffff8083b53f <topo_probe+1583>:   pop    %rbx
0xffffffff8083b540 <topo_probe+1584>:   pop    %r12
0xffffffff8083b542 <topo_probe+1586>:   pop    %r13
0xffffffff8083b544 <topo_probe+1588>:   pop    %r14
0xffffffff8083b546 <topo_probe+1590>:   pop    %r15
0xffffffff8083b548 <topo_probe+1592>:   pop    %rbp
0xffffffff8083b549 <topo_probe+1593>:   retq
0xffffffff8083b54a <topo_probe+1594>:   nopw   0x0(%rax,%rax,1)
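
A side note on the two outputs above, offered as a reading of the data rather than a confirmed diagnosis: the return address 0xffffffff8083b52a immediately follows the call to topo_promote_child, and the faulting instruction pointer 0xffffffff80537e74 is 4 bytes past that function's entry at 0xffffffff80537e70. With a fault address of 0x0, this is consistent with topo_find_node_by_hwid returning NULL and that result being passed straight on. The userland sketch below models only the package layer. find_pkg and pkg_ids are hypothetical names, the shift is the "Package ID shift: 5" from the boot log, and 0xff is taken from the "apic id = ff" line on the assumption that boot_cpu_id picked up that value.

```c
#include <assert.h>
#include <stddef.h>

#define PKG_SHIFT 5	/* "Package ID shift: 5" from the boot log */

/*
 * Package IDs that exist on this machine, derived from the APIC IDs
 * reported by the firmware (0..58, shifted right by 5, gives 0 and 1).
 */
static const int pkg_ids[] = { 0, 1 };

/*
 * Toy lookup (hypothetical, not the kernel function): returns a pointer
 * to the matching package ID, or NULL if no such package exists.
 */
static const int *
find_pkg(int boot_cpu_id)
{
	size_t i;
	int hwid;

	assert(boot_cpu_id >= 0);	/* APIC IDs are non-negative */
	hwid = boot_cpu_id >> PKG_SHIFT;
	for (i = 0; i < sizeof(pkg_ids) / sizeof(pkg_ids[0]); i++)
		if (pkg_ids[i] == hwid)
			return (&pkg_ids[i]);
	return (NULL);
}
```

Here find_pkg(0) and find_pkg(58) find a package, while find_pkg(0xff) returns NULL, because 0xff >> 5 is 7 and no package 7 exists in this table.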


> Wait...
> Kostik, I see one strange thing which is common to both successful and
> unsuccessful configurations.  All "SMP: Added CPU..." lines have "AP" in them.

That holds for #1..#23.
There is no 'SMP: AP CPU #0 Launched!' line.

> It seems like the platform does not explicitly tell which CPU is the BSP,
> see cpu_add() function.  This can break quite a few assumptions.  And I am not
> even sure how the successful scenario works.

# mptable 

===============================================================================

MPTable

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000fd050
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.4
  checksum:                     0x27
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x000fcaa0
  signature:                    'PCMP'
  base table length:            1228
  version:                      1.4
  checksum:                     0x95
  OEM ID:                       'A M I'
  Product ID:                   'ALASKA'
  OEM table pointer:            0x00000000
  OEM table size:               0
  entry count:                  112
  local APIC address:           0xfee00000
  extended table length:        220
  extended table checksum:      72

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID Version State           Family  Model   Step    Flags
                 0       0x15    BSP, usable     6       15      1       0xbfebfbff
                 2       0x15    AP, usable      6       15      1       0xbfebfbff
                 4       0x15    AP, usable      6       15      1       0xbfebfbff
                 6       0x15    AP, usable      6       15      1       0xbfebfbff
                 8       0x15    AP, usable      6       15      1       0xbfebfbff
                10       0x15    AP, usable      6       15      1       0xbfebfbff
                16       0x15    AP, usable      6       15      1       0xbfebfbff
                18       0x15    AP, usable      6       15      1       0xbfebfbff
                20       0x15    AP, usable      6       15      1       0xbfebfbff
                22       0x15    AP, usable      6       15      1       0xbfebfbff
                24       0x15    AP, usable      6       15      1       0xbfebfbff
                26       0x15    AP, usable      6       15      1       0xbfebfbff
                32       0x15    AP, usable      6       15      1       0xbfebfbff
                34       0x15    AP, usable      6       15      1       0xbfebfbff
                36       0x15    AP, usable      6       15      1       0xbfebfbff
                38       0x15    AP, usable      6       15      1       0xbfebfbff
                40       0x15    AP, usable      6       15      1       0xbfebfbff
                42       0x15    AP, usable      6       15      1       0xbfebfbff
                48       0x15    AP, usable      6       15      1       0xbfebfbff
                50       0x15    AP, usable      6       15      1       0xbfebfbff
                52       0x15    AP, usable      6       15      1       0xbfebfbff
                54       0x15    AP, usable      6       15      1       0xbfebfbff
                56       0x15    AP, usable      6       15      1       0xbfebfbff
                58       0x15    AP, usable      6       15      1       0xbfebfbff


> Ah... I see that there is backup code in cpu_mp_start() where boot_cpu_id is
> set based on the current CPU's Local APIC ID.  I suspect then that this
> information is incorrect in the failing case.
> 
> Slawa,
> my guess can be checked by adding a printf to cpu_mp_start() right after
> boot_cpu_id assignment.

The system is now in early production and I can't reboot it often.
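
For reference, the check suggested above can be modelled in userland roughly as follows. This is a sketch under assumptions, not the kernel code: resolve_boot_cpu_id is a hypothetical name, -1 is assumed to mean "no CPU was flagged as BSP", and lapic_id stands in for whatever the current CPU's Local APIC reports.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model of the fallback: if no BSP was identified
 * (boot_cpu_id == -1, an assumption), use the current CPU's
 * Local APIC ID instead.
 */
static int
resolve_boot_cpu_id(int boot_cpu_id, int lapic_id)
{
	assert(lapic_id >= 0);	/* APIC IDs are non-negative */
	if (boot_cpu_id == -1)
		boot_cpu_id = lapic_id;
	/* The suggested diagnostic: print the value right after assignment. */
	printf("boot_cpu_id = %d (0x%x)\n", boot_cpu_id,
	    (unsigned int)boot_cpu_id);
	return (boot_cpu_id);
}
```

If the Local APIC ID read at that point were bogus (say 0xff), the printed value would show it directly, which is what the printf is meant to reveal.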

> > #8 0xffffffff8078fe81 at cpu_mp_start+0x1b1
> > #9 0xffffffff805382ca at mp_start+0x3a
> > #10 0xffffffff80465cd8 at mi_startup+0x118
> > #11 0xffffffff8028dfac at btext+0x2c
> > Uptime: 1s
> 
> 
> -- 
> Andriy Gapon


