Date:      Fri, 30 Sep 2005 10:24:43 +0100
From:      Antoine Pelisse <apelisse@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   freebsd-5.4-stable panics
Message-ID:  <61c746830509300224g3d79cbe4ve55e8b0b27004fc3@mail.gmail.com>
In-Reply-To: <61c746830509300215x7833746ew60896c4c1338ec65@mail.gmail.com>
References:  <da4a53d805092310237d732554@mail.gmail.com> <20050927140535.G50334@daemon.mistermishap.net> <20050927203128.S61419@fledge.watson.org> <cf6c78405092714227722d534@mail.gmail.com> <20050927222624.R34322@fledge.watson.org> <20050928134724.P56436@daemon.mistermishap.net> <20050929185538.R61419@fledge.watson.org> <20050929160945.A65402@daemon.mistermishap.net> <20050929212738.A34322@fledge.watson.org> <61c746830509300215x7833746ew60896c4c1338ec65@mail.gmail.com>

Hi Robert,

I don't think your patch is correct: the whole linked list can change
while the lock is released, so simply skipping that entry may not be
enough. I submitted a PR [1] for this a month ago, but nobody has taken
care of it yet.

Regards,
Antoine Pelisse

[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/84684
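Antoine's objection can be illustrated with a small user-space model. In the patched loop the process lock is dropped around SYSCTL_OUT(), and in that window another thread may exit and be unlinked from p_threads, so the saved td (and its next pointer) can be stale when the loop resumes; a NULL test on td_ksegrp does not guard against that. The sketch below shows one general technique for detecting such a change: a generation counter checked after the lock is retaken. Everything in it is hypothetical: model_proc, model_thread, the p_gen counter and the callbacks are stand-ins invented for illustration, not kernel structures, and this is not necessarily what PR kern/84684 proposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; NOT the FreeBSD kernel's structures. */
struct model_thread {
	int			 td_tid;
	struct model_thread	*td_next;	/* stands in for td_plist.tqe_next */
};

struct model_proc {
	struct model_thread	*p_threads;
	unsigned int		 p_gen;		/* hypothetical: bumped on every list change */
};

enum walk_result { WALK_DONE, WALK_RETRY };

/*
 * Stand-in for the work done with the process lock released
 * (SYSCTL_OUT() in the real loop): it may modify p->p_threads.
 */
typedef void (*unlocked_op)(struct model_proc *p, struct model_thread *td);

static enum walk_result
walk_threads(struct model_proc *p, unlocked_op op, size_t *visited)
{
	struct model_thread *td;
	unsigned int gen;

	assert(p != NULL && op != NULL && visited != NULL);
	*visited = 0;
	for (td = p->p_threads; td != NULL; td = td->td_next) {
		gen = p->p_gen;
		op(p, td);			/* "lock released" here */
		(*visited)++;
		if (p->p_gen != gen)
			return (WALK_RETRY);	/* td and td->td_next may be stale */
	}
	return (WALK_DONE);
}

/* Example op: leaves the list alone. */
static void
op_noop(struct model_proc *p, struct model_thread *td)
{
	(void)p;
	(void)td;
}

/* Example op: the next thread "exits" while the lock is dropped. */
static void
op_unlink_next(struct model_proc *p, struct model_thread *td)
{
	if (td->td_next != NULL) {
		td->td_next = td->td_next->td_next;
		p->p_gen++;
	}
}
```

With op_unlink_next the walk stops with WALK_RETRY after the first thread instead of following a link that changed underneath it; a caller would then restart or fail the request. Restarting can report some threads twice, which is one reason a proper fix may need more thought than this sketch.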
On 9/29/05, Robert Watson <rwatson@freebsd.org> wrote:
>
> On Thu, 29 Sep 2005, Rob Watt wrote:
>
> > On Thu, 29 Sep 2005, Robert Watson wrote:
> >
> >> Could you dump the contents of *td and *td->td_proc for me? I'm quite
> >> interested to know the value of td->td_proc->p_state, among other
> >> things. It would also help if you could generate a dump of the KSE
> >> group structures in td->td_proc->p_ksegrps and the threads in
> >> td->td_proc->p_threads.
> >
> > I've attached a file with many of the values you have asked for. We
> > looked at some of the threads referenced by td->td_proc->p_threads, but
> > we weren't sure we were walking the list correctly. Do you have any tips
> > for walking those thread lists?
> >
> >> Could you tell me if the program named by p->p_comm is linked against a
> >> threading library? If it's a custom app, you may already know, and if
> >> not, you can run ldd on the application to see what it is linked
> >> against.
> >
> > The program named by p->p_comm is linked against the pthreads library.
>
> This seems to be enough information to at least track this down a bit:
> td_ksegrp is NULL, rather than a corrupt value, which suggests that the
> thread is incompletely initialized. Other hints that this is the case
> are that td_critnest is 1 (as it is set when the thread is allocated)
> and the state is TDS_INACTIVE. Some other fields are set, though, such
> as td_oncpu, which is normally initialized to NOCPU.
>
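The reasoning in the previous paragraph amounts to a simple predicate over three fields. The following is a minimal sketch, assuming a trimmed, hypothetical model_thread struct that holds only those fields (it is not the kernel's struct thread); it encodes only the hints Robert lists and cannot explain the one inconsistent field, td_oncpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily trimmed stand-in for struct thread: only the
 * fields discussed above.  These are NOT the kernel's definitions.
 */
enum model_td_state { MODEL_TDS_INACTIVE, MODEL_TDS_RUNNING };

struct model_thread {
	void			*td_ksegrp;	/* 0x0 in the dump */
	enum model_td_state	 td_state;	/* TDS_INACTIVE in the dump */
	int			 td_critnest;	/* 1 in the dump */
};

/*
 * True when a thread shows all three hints of incomplete initialization:
 * no KSE group, inactive state, and a critical-section nesting count of 1
 * (the value a thread is given when it is allocated).
 */
static bool
model_thread_looks_uninitialized(const struct model_thread *td)
{
	assert(td != NULL);
	return (td->td_ksegrp == NULL &&
	    td->td_state == MODEL_TDS_INACTIVE &&
	    td->td_critnest == 1);
}
```

For the values in the dump (td_ksegrp = 0x0, td_state = TDS_INACTIVE, td_critnest = 1) the predicate returns true, while a thread with a KSE group attached does not match.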
> > (kgdb) p *td
> > $1 = {td_proc = 0xffffff004aa9f000, td_ksegrp = 0x0,
> >   td_plist = {tqe_next = 0xffffff00b4798000, tqe_prev = 0xffffff00a97ae010},
> >   td_kglist = {tqe_next = 0xffffff00b4798000, tqe_prev = 0xffffff00a97ae020},
> >   td_slpq = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7c10},
> >   td_lockq = {tqe_next = 0xffffff00a97ae000, tqe_prev = 0xffffffffb6797a70},
> >   td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff80608180},
> >   td_selq = {tqh_first = 0x0, tqh_last = 0xffffff00633112c0},
> >   td_sleepqueue = 0xffffff00382b0400, td_turnstile = 0xffffff00c1712900,
> >   td_umtxq = 0xffffff00d1207080,
> >   td_tid = 100253, td_flags = 16777216, td_inhibitors = 0, td_pflags = 128,
> >   td_dupfd = 0, td_wchan = 0x0, td_wmesg = 0x0,
> >   td_lastcpu = 2 '\002', td_oncpu = 2 '\002', td_owepreempt = 0 '\0',
> >   td_locks = 0, td_blocked = 0x0, td_ithd = 0x0, td_lockname = 0x0,
> >   td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0,
> >   td_intr_nesting_level = 0, td_pinned = 0, td_mailbox = 0x0,
> >   td_ucred = 0xffffff00ad18f200, td_standin = 0x0, td_upcall = 0x0,
> >   td_sticks = 0, td_uuticks = 0, td_usticks = 0, td_intrval = 0,
> >   td_oldsigmask = {__bits = {0, 0, 0, 0}},
> >   td_sigmask = {__bits = {4294967295, 4294967295, 4294967295, 4294967295}},
> >   td_siglist = {__bits = {0, 0, 0, 0}}, td_generation = 14,
> >   td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0},
> >   td_kflags = 0, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0,
> >   td_base_pri = 182 '\uffff', td_priority = 182 '\uffff',
> >   td_pcb = 0xffffffffb68dcd10, td_state = TDS_INACTIVE,
> >   td_retval = {1, 29309280},
> >   td_slpcallout = {c_links = {sle = {sle_next = 0x0},
> >       tqe = {tqe_next = 0x0, tqe_prev = 0xffffff001fac7d80}},
> >     c_time = 55907602, c_arg = 0xffffff0063311260,
> >     c_func = 0xffffffff802e32a0 <sleepq_timeout>, c_mtx = 0x0, c_flags = 16},
> >   td_frame = 0xffffffffb68dcc40,
> >   td_kstack_obj = 0xffffff0087f93d20, td_kstack = 18446744072477315072,
> >   td_kstack_pages = 4,
> >   td_altkstack_obj = 0x0, td_altkstack = 0, td_altkstack_pages = 0,
> >   td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582},
> >   td_sched = 0xffffff0063311488}
>
> I'm not familiar with the internals of the thread and KSE life cycle here,
> so I think we'll need to turn to those more familiar with it to
> understand which of two things may be going on:
>
> (1) Is td_ksegrp != NULL an invariant for a connected thread, one that
>     kern_proc relies on but that the thread code fails to maintain
>     safely?
>
> (2) Is td_ksegrp sometimes legitimately left NULL as part of the thread
>     life cycle, so that kern_proc incorrectly assumes it is never NULL
>     when hooked up to a thread?
>
> This suggests a possible work-around: simply test td_ksegrp for NULL in
> kern_proc to avoid the panic, while we work out whether an invariant is
> violated (or incorrectly assumed), which may require some serious
> thinking and a non-trivial solution. Something like the following might
> work in the meantime:
>
> Index: kern_proc.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/kern_proc.c,v
> retrieving revision 1.231
> diff -u -r1.231 kern_proc.c
> --- kern_proc.c 27 Sep 2005 18:03:15 -0000 1.231
> +++ kern_proc.c 29 Sep 2005 20:50:33 -0000
> @@ -882,6 +882,8 @@
>  	} else {
>  		_PHOLD(p);
>  		FOREACH_THREAD_IN_PROC(p, td) {
> +			if (td->td_ksegrp == NULL)
> +				continue;
>  			fill_kinfo_thread(td, &kinfo_proc);
>  			PROC_UNLOCK(p);
>  			error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
>
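The effect of the proposed work-around can be modelled in user space. This is a sketch under stated assumptions: the model_* types and model_report_threads() are hypothetical, a plain singly linked list stands in for the p_threads TAILQ, and copying the tid stands in for fill_kinfo_thread(), which presumably dereferences td_ksegrp and therefore faults on a half-initialized thread. It deliberately leaves out the PROC_UNLOCK()/SYSCTL_OUT() step, so it says nothing about whether the list stays valid while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; NOT the kernel's struct ksegrp / struct thread. */
struct model_ksegrp {
	int			 kg_pri;
};

struct model_thread {
	struct model_ksegrp	*td_ksegrp;
	int			 td_tid;
	struct model_thread	*td_next;	/* stands in for td_plist.tqe_next */
};

/*
 * Copy the tids of reportable threads into out[], skipping any thread
 * whose td_ksegrp is NULL, as the proposed patch does.  Returns the
 * number of tids written (at most max).
 */
static size_t
model_report_threads(const struct model_thread *head, int *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	for (const struct model_thread *td = head; td != NULL && n < max;
	    td = td->td_next) {
		if (td->td_ksegrp == NULL)
			continue;	/* half-initialized: skip, don't dereference */
		out[n++] = td->td_tid;
	}
	return (n);
}
```

With a three-thread list whose middle entry has td_ksegrp == NULL, only the first and third tids are reported; the cost of the work-around is that such a thread is silently missing from the sysctl output.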
> I'm going to forward your e-mail to the threads@ list and see if
> anyone there wants to talk some more about this. If you don't mind
> testing the above patch to see if this is a workable work-around, we may
> want to think about getting it committed in the meantime.
>
> Thanks,
>
> Robert N M Watson
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>


