Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 14 Mar 2017 23:45:57 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        Kristof Provost <kp@freebsd.org>
Cc:        Gleb Smirnoff <glebius@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r315136 - head/sys/netpfil/pf
Message-ID:  <1803226.Igex2bR0P8@ralph.baldwin.cx>
In-Reply-To: <7B1C8879-E636-4315-99A2-A258AB9AE500@FreeBSD.org>
References:  <201703120542.v2C5gvM4075391@repo.freebsd.org> <20170314215706.GB1072@FreeBSD.org> <7B1C8879-E636-4315-99A2-A258AB9AE500@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, March 15, 2017 10:26:39 AM Kristof Provost wrote:
> On 15 Mar 2017, at 6:57, Gleb Smirnoff wrote:
> > On Sun, Mar 12, 2017 at 05:42:57AM +0000, Kristof Provost wrote:
> > K> Log:
> > K>   pf: Fix incorrect rw_sleep() in pf_unload()
> > K>
> > K>   When we unload we don't hold the pf_rules_lock, so we cannot c=
all=20
> > rw_sleep()
> > K>   with it, because it would release a lock we do not hold. There=
's=20
> > no need for the
> > K>   lock either, so we can just tsleep().
> > K>
> > K>   While here also make the same change in pf_purge_thread(),=20
> > because it explicitly
> > K>   takes the lock before rw_sleep() and then immediately releases=
 it=20
> > afterwards.
> >
> > The correct change would to be grab lock in pf_unload(), exactly as=
=20
> > pf_purge_thread()
> > does. With your change you introduces a possible infinite sleep due=
 to=20
> > race, since
> > there is no timeout and no lock.
> >
> I must be missing something, because I don=E2=80=99t see the race, an=
d don=E2=80=99t=20
> see how we
> could end up with an infinite sleep.

You are ignoring interrupts and preemption.  Suppose you get an interru=
pt
after 'wakeup_one(pf_purge_thread)' and before 'tsleep(..., 0)' in
pf_unload().  If the interrupt preempts and results in the purge thread=

running and issuing its wakeup before the thread executing pf_unload()
resumes, then eventually when pf_unload() resumes it will do a tsleep()=
 with
no timeout that will never be awoken.

You obviously didn't test this in a debug kernel since there is a KASSE=
RT
explicitly to catch obvious tsleep races in _sleep():

        KASSERT(sbt !=3D 0 || mtx_owned(&Giant) || lock !=3D NULL,
            ("sleeping without a lock"));

You should fix this in the way that Gleb suggested.

Also, all kthreads/kprocs do a wakeup() inside of exit1() or kthread_ex=
it()
to allow you to wait for a kthread to exit when unloading a module.  Th=
e
general structure should be something like:

struct thread *my_thread;

void
thread_main(void *arg)
{

    LOCK(&mylock);
    while (!thread_quit) {
       UNLOCK(&mylock);
       /* do work */
       LOCK(&mylock);
       if (!thread_quit && no_work_to_do)
           lock_sleep(&some_wchan, &mylock, ...);
    }
    UNLOCK(&mylock);
    kthread_exit();
}

void
unload_handler(...)
{
    ...

    LOCK(&mylock);
    thread_quit =3D true;
    wakeup(&some_wchan);
    lock_sleep(my_thread, &mylock, ...);
    UNLOCK(&mylock);
}

void
load_handler(...)
{
    ...

    kthread_add(thread_main, arg, NULL, &my_thread, ...);
}

If you want to create a proc then you can use 'struct proc *my_proc'
and sleep on 'my_proc' instead (along with using kproc_exit(), though
kthread_exit() from the last thread in a kproc should call kproc_exit()=

for you).

--=20
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1803226.Igex2bR0P8>