Date:      Fri, 28 Dec 2018 18:56:43 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        mmel@freebsd.org
Cc:        freebsd-emulation@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, ports-list freebsd <freebsd-ports@freebsd.org>, freebsd-arm <freebsd-arm@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>
Subject:   Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)
Message-ID:  <2E3F6196-4652-40D2-937F-8860B6005A35@yahoo.com>
In-Reply-To: <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>
References:  <FF9B4284-4E6B-4D36-86A0-18861B527AC0@yahoo.com> <865A13C8-9749-486E-9F79-5EEDDECBE621@yahoo.com> <0154C3AC-D85B-4FCF-BA63-454BC26BC1A2@yahoo.com> <A6A58CE3-062B-4B79-A8C2-ADFDAA04C6AF@yahoo.com> <13f5e4dd-33fb-2170-e31a-1b5d5f155869@freebsd.org> <ABA957EA-B8EE-4B8C-9C2F-B745BA652BF6@yahoo.com>

On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody is actively working on this, I think.
>>
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
>
> B) (A) is true even for varying the number of builders in parallel
> (so other builds also happening) and the number of jobs allowed per
> builder. It also fails for only one builder allowed only one process.
> (I get traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
>
> D) The problem is only observed for targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither my
> own builds nor when I looked at the package-building server
> history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL  kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
>             { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
>
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)
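
For readers following the trace, here is a minimal Python sketch that maps
the six values in the quoted kevent CALL record onto the parameter names
from the kevent(2) manual page. The helper name decode_kevent_call is made
up for illustration; it is not part of kdump or any FreeBSD tool, and it
only assumes the "name(arg,arg,...)" layout seen in the record above.

```python
import re

# Parameter names in the order given by the kevent(2) manual page.
KEVENT_PARAMS = ("kq", "changelist", "nchanges", "eventlist", "nevents", "timeout")


def decode_kevent_call(line):
    """Hypothetical helper: name the arguments of a kdump kevent CALL record."""
    # Accept both "kevent(...)" and the "kevent[N](...)" form that kdump -S prints.
    match = re.search(r"kevent(?:\[\d+\])?\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no kevent(...) call found in line")
    # int(tok, 0) accepts both hexadecimal ("0x400") and decimal ("0") tokens.
    values = [int(tok, 0) for tok in match.group(1).split(",")]
    if len(values) != len(KEVENT_PARAMS):
        raise ValueError("unexpected number of kevent arguments")
    return dict(zip(KEVENT_PARAMS, values))


line = "93172 qemu-arm-static CALL  kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)"
args = decode_kevent_call(line)
for name in KEVENT_PARAMS:
    print(f"{name:10} = {args[name]:#x}")
```

Read this way, the 0x2 is nchanges, which matches the two entries dumped
for struct kevent[]; 0x400 is nevents (room for 1024 returned events); and
the final 0 is a NULL timeout pointer, meaning kevent waits with no time
limit.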


The detail of the signal usage involved leading up to the hang-up,
starting from just before the "press return" for the "make FLAVOR=qt5"
command that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh       RET   execve[59] JUSTRETURN
 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh (to
in turn later run cmake via qemu-arm-static). (This was after the fork [for the
requested vfork].) So the interrupted nanosleep is from the close-down of the
other thread, the one that was in nanosleep when the execve happened.

There were no PSIG's and no sigreturn's prior to the kill according to the
kdump output.
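
The kind of check described above (looking for EINTR returns, PSIG records
and sigreturn calls) can be approximated with a small Python filter over
"kdump -H" text output. This is only a sketch: the function name
signal_related and the regular expression are assumptions based on the
column layout in the excerpt above (pid, thread id, command, record type,
rest), not a kdump feature.

```python
import re

# Assumed "kdump -H" layout: pid, thread id, command, record type, remainder.
RECORD = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+([A-Z]+)\s+(.*)$")


def signal_related(lines):
    """Hypothetical filter: return (kind, thread_id, text) for signal-related records."""
    hits = []
    for line in lines:
        match = RECORD.match(line)
        if match is None:
            continue  # skip lines that do not look like a kdump record
        _pid, tid, _cmd, kind, rest = match.groups()
        if kind == "PSIG":
            hits.append(("PSIG", int(tid), rest))
        elif kind in ("CALL", "RET") and rest.startswith("sigreturn"):
            hits.append(("sigreturn", int(tid), rest))
        elif kind == "RET" and re.search(r"\berrno 4\b", rest):
            # errno 4 is EINTR, "Interrupted system call"
            hits.append(("EINTR", int(tid), rest))
    return hits


sample = [
    " 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)",
    " 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call",
    ' 93172 100717 qemu-arm-static NAMI  "/bin/sh"',
    " 93172 100717 sh       RET   execve[59] JUSTRETURN",
    " 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)",
]
hits = signal_related(sample)
for kind, tid, text in hits:
    print(kind, tid, text)
```

On the five records shown earlier this reports only the nanosleep return,
and it is on thread 101706 rather than on thread 100717, the one doing the
execve.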


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



