Date: Mon, 7 Dec 2020 00:21:32 -0800
From: Mark Millard <marklmi@yahoo.com>
To: mmel@freebsd.org
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: ThunderX Panic after r368370
Message-ID: <4528E502-D007-48E5-B6A5-8E4376A2B05E@yahoo.com>
In-Reply-To: <BB5C4C3E-EDF6-4C3D-BEE1-F8B2989216E0@yahoo.com>
References: <1C3442ED-278E-45B8-9206-0DD24FCBC237@brickporch.com> <4331eee0-74a6-565c-3bec-0051415b2bc1@freebsd.org> <56F0E9EB-0B78-4B0B-830A-48F8AFC5ABE1@yahoo.com> <91654fc4-8734-d8a7-5309-0400f418438a@freebsd.org> <BB5C4C3E-EDF6-4C3D-BEE1-F8B2989216E0@yahoo.com>
On 2020-Dec-6, at 13:30, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Dec-6, at 03:51, Michal Meloun <meloun.michal at gmail.com> wrote:
>
> On 06.12.2020 10:47, Mark Millard wrote:
>>> On 2020-Dec-6, at 00:17, Michal Meloun <meloun.michal at gmail.com> wrote:
>>>> On 06.12.2020 3:21, Marcel Flores wrote:
>>>>> Hi All,
>>>>> Looks like the ThunderX started panicking at boot after r368370:
>>>>> https://reviews.freebsd.org/rS368370
>>>>> From a verbose boot, it looks like it bails in gic0 redistributor setup(?):
>>>>> gic0: CPU29 Re-Distributor woke up
>>>>> gic0: CPU24 enabled CPU interface via system registers
>>>>> gic0: CPU17 enabled CPU interface via system registers
>>>>> gic0: CPU29 enabled CPU interface via system registers
>>>>> done
>>>>> Full Verbose boot:
>>>>> https://gist.github.com/mesflores/f026122495c8494d041bce04d30b15bb
>>>>> I'm not really familiar with the details of the commit, but happy to test
>>>>> anything if anyone has any ideas.
>>>>
>>>>
>>>> Hi Marcel
>>>> are you able to get crashdump and do backtrace?
>>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html#kerneldebug-obtain
>>>> and
>>>> https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html
>>>> If not, I'll make some debug patch.
>>>>
>>>> It's weird, even though GIC is potentially affected by my patch, in this case the cpuid numbering was not changed.
>>> (I've no access to a ThunderX. I just looked for my own curiosity.
>>> Sorry if this is obvious and so is noise.)
>>> When I looked at the code it appeared to be the last "->" in
>>> the following that was dereferencing the nullptr value (via [x8]
>>> in assembler notation):
>>> static uint64_t
>>> its_cmd_prepare(struct its_cmd *cmd, struct its_cmd_desc *desc)
>>> {
>>>         uint64_t target;
>>>         uint8_t cmd_type;
>>>         u_int size;
>>>         cmd_type = desc->cmd_type;
>>>         target = ITS_TARGET_NONE;
>>>         switch (cmd_type) {
>>>         case ITS_CMD_MOVI: /* Move interrupt ID to another collection */
>>>                 target = desc->cmd_desc_movi.col->col_target;
>>> . . .
>>> In other words: it appeared to me that the above desc->cmd_desc_movi.col
>>> evaluated as 0 when used in what was reported.
>> This is very probably right analysis. But problem is that cmd_desc_movi.col should not be NULL, is initialized in its_cmd_movi from sc->sc_its_cols which should be allocated in gicv3_its_attach().
>>
>
> The following is unlikely to directly contribute to the
> specific problem's solution but documents an oddity that
> took my time while looking around related the problem.
>

. . .

I'm omitting the material about the "start" part of the
comment below. I have more directly useful material for
the problem later below.

> /*
>  * Note that `start` and the returned value from BIT_FFS_AT are
>  * 1-based bit indices.
>  */
> #define BIT_FFS_AT(_s, p, start) __extension__ ({ \
> . . .
>

. . .

Looks to me like fdt_cpuid's use in cpu_init_fdt is one of the
issues with what is added to each cpuset_domain[domain] :

        /* Skip boot CPU */
        if (__pcpu[0].pc_mpidr == (target_cpu & CPU_AFF_MASK))
                return (TRUE);
. . .
        fdt_cpuid++;
        /* Try to read the numa node of this cpu */
        if (vm_ndomains == 1 ||
            OF_getencprop(node, "numa-node-id", &domain, sizeof(domain)) <= 0)
                domain = 0;
        __pcpu[fdt_cpuid].pc_domain = domain;
        if (domain < MAXMEMDOM)
                CPU_SET(fdt_cpuid, &cpuset_domain[domain]);

fdt_cpuid's initial value can not be added by this code:
it is incremented first.
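A minimal standalone model of just the two quoted fragments
(not the kernel code) may make the increment-first point
concrete. Everything named model_* and MODEL_NCPU is
hypothetical, a single domain is represented as a plain
32-bit mask, and the boot CPU is assumed to be the first
node visited:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NCPU 32	/* hypothetical CPU limit for this model */

/* Hypothetical stand-in for the state the quoted code touches. */
struct model_state {
	unsigned next_id;	/* plays the role of fdt_cpuid */
	uint32_t domain_set;	/* plays the role of one cpuset_domain[] entry */
};

/* Models one callback invocation for one CPU node. */
static bool
model_cpu_init(struct model_state *st, bool is_boot_cpu)
{
	/* Skip boot CPU: returns before anything is added to the set. */
	if (is_boot_cpu)
		return (true);

	/* Incremented first, so the value on entry is never used as an index. */
	st->next_id++;
	assert(st->next_id < MODEL_NCPU);
	st->domain_set |= (uint32_t)1 << st->next_id;
	return (true);
}

/*
 * Visits ncpus nodes, treating node 0 as the boot CPU (an assumption
 * of this model), and returns the resulting set as a bit mask.
 */
uint32_t
model_enumerate(unsigned start_id, unsigned ncpus)
{
	struct model_state st = { .next_id = start_id, .domain_set = 0 };

	for (unsigned i = 0; i < ncpus; i++)
		model_cpu_init(&st, i == 0);
	return (st.domain_set);
}
```

In this model, whatever start_id is passed in, bit start_id
itself is never set: the lowest bit that can appear in the
mask is start_id + 1, and the boot CPU contributes no bit
at all.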
cpu_mp_start initializes fdt_cpuid via:

                fdt_cpuid = 1;
                ofw_cpu_early_foreach(cpu_init_fdt, true);

So fdt_cpuid==2 is the smallest value that can be added to
&cpuset_domain[domain] via that ofw_cpu_early_foreach call
that in turn calls cpu_init_fdt.

More than that, there is also the "Skip boot CPU" code that
avoids ever adding the boot CPU to a &cpuset_domain[domain] .

This matches up well with the logs showing the two "NULL"
lines in:

gicv3_its_attach: per domain cpus
gicv3_its_attach: NULL its col[0]
gicv3_its_attach: NULL its col[1]
gicv3_its_attach: new its col[2]
gicv3_its_attach: new its col[3]
. . .
gicv3_its_attach: new its col[29]
gicv3_its_attach: new its col[30]
gicv3_its_attach: new its col[31]

and the log's content just before the panic:

gicv3_its_bind_intr: Enter
gicv3_its_select_cpu: cpuset not empty
its_cmd_movi: isrc_cpu 0, col; 0
panic: data abort with spinlock held

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)