Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Feb 2019 15:49:56 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        freebsd-hackers Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: head -r344018 based powerpc64 pmac_thermal hangup (stuck sleeping): some preliminary evidence [not as uniform as I initially saw]
Message-ID:  <AADCB7B3-C970-48B4-B20A-F6F78D848F86@yahoo.com>
In-Reply-To: <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>
References:  <40D1DDA1-10FB-4F2C-B38B-C7FED5795542@yahoo.com>

next in thread | previous in thread | raw e-mail | index | archive | help
[I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup.]

On 2019-Feb-25, at 14:46, Mark Millard <marklmi@yahoo.com> wrote:

> I adjusted what KTR_PROC does to just show just some of its pid 26
> (pmac_thermal) messages and adding some extra output as well. I'll
> list some of that output later. I'll note that beyond pmac_thermal
> the buf*daemon* threads also seem to be subject to being stuck
> sleeping in (offsets are for a specific build of mine):
>=20
> mi_switch+0x134 sleepq_switch+0x2ec sleepq_timedwait+0x48 _sleep+0x41c
>=20
>=20
> So far for pmac_thermal I've seen that until the failing case:
>=20
> sleepq_set_timeout_sbt was being given: sbt=3D=3D0xcccccbe0 pr=3D=3D0x0 =
flags=3D=3D0x100
> and in turn was using: prec=3D=3D0xcccccbe flags=3D=3D0x501 (of course =
the used
> td_sleeptimo varies).
>=20
> [I note that 16*0xcccccbe =3D=3D 0xcccccbe0, the original sbt value,
> not that I know yet if this matters.]
>=20
> But the sequence leading to failures is different:

I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep. I'm trying to see if I can observe a failure with
different value.

> sleepq_set_timeout_sbt was given: sbt=3D=3D0xfffffed8 pr=3D=3D0x0 =
flags=3D=3D0x100
> and in turn was using: prec=3D=3D0xfffffed flags=3D=3D0x501
>=20
> [I note that 16*0xfffffed =3D=3D 0xfffffed0, so less than the original
> sbt value, not that I know this matters at this point.]
>=20
> For sbt=3D=3D0xfffffed8, the callout to sleepq_timeout ends up with =
values
> like (a particular example):
>=20
> td_sleeptimo=3D0x470d360fe5 sbinuptime=3D0x46c869f6aa
>=20
> where the reporting code looks like:
>=20
> static void
> sleepq_timeout(void *arg)
> {
>        struct sleepqueue_chain *sc __unused;
>        struct sleepqueue *sq;
>        struct thread *td;
>        void *wchan;
>        int wakeup_swapper;
>        sbintime_t sbut; // HACK!!!
>=20
>        td =3D arg;
>        wakeup_swapper =3D 0;
>        if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
>        CTR3(KTR_PROC, "sleepq_timeout: thread %p (pid %ld, %s)",
>            (void *)td, (long)td->td_proc->p_pid, (void *)td->td_name);
>=20
>        thread_lock(td);
>=20
>        sbut=3D sbinuptime(); // HACK!!!
>        if (td->td_sleeptimo > sbut || td->td_sleeptimo =3D=3D 0) {
>                /*
>                 * The thread does not want a timeout (yet).
>                 */
>                if (26 =3D=3D td->td_proc->p_pid) // HACK!!!
>                        CTR5(KTR_PROC, "sleepq_timeout thread not want =
timeout yet: thread %p (pid %ld, %s) td_sleeptimo=3D%jx sbinuptime=3D%jx",=

>                            (void *)td, (long)td->td_proc->p_pid, (void =
*)td->td_name, (uintmax_t)td->td_sleeptimo, (uintmax_t)sbut);
> . . .
>=20
> So far sleepq_set_timeout_sbt being given sbt=3D=3D0xfffffed8 instead =
of
> sbt=3D=3D0xcccccbe0 seems to be an accurate indicator of if the =
problem will
> happen in sleepq_timeout. (But I've only a few examples so far.)
>=20

I've now seen examples of sbt=3D=3D0xfffffed8 that did not lead to
a hangup. So it is not a reliable predictor of the hang-up in
sleep.

It looks like the values are sometimes more varied than I'd
seen before as well.

> I'll note that the sleepq_timeout code for this case does not set up
> another callout to itself for later and the sleep then continues
> indefinitely.
>=20
> I've not yet gotten into finding evidence for why the callout to
> sleepq_timeout itself happens. Hopefully I can find some.
>=20
>=20
> An example of some modified KTR_PROC output is:
>=20
> . . .



=3D=3D=3D
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AADCB7B3-C970-48B4-B20A-F6F78D848F86>