Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 03 Apr 2014 18:26:56 +0900 (JST)
From:      Kohji Okuno <okuno.kohji@jp.panasonic.com>
To:        kostikbel@gmail.com
Cc:        freebsd-current@freebsd.org, okuno.kohji@jp.panasonic.com
Subject:   Re: kevent has bug?
Message-ID:  <20140403.182656.1696050559410663288.okuno.kohji@jp.panasonic.com>
In-Reply-To: <20140402174400.GR21331@kib.kiev.ua>
References:  <20140402120745.GN21331@kib.kiev.ua> <20140402164542.GC3270@funkthat.com> <20140402174400.GR21331@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Wed, 2 Apr 2014 20:44:00 +0300
> On Wed, Apr 02, 2014 at 09:45:43AM -0700, John-Mark Gurney wrote:
>> Konstantin Belousov wrote this message on Wed, Apr 02, 2014 at 15:07 +0300:
>> Well, it's not that its preventing waking up the waiter, but failing to
>> register the event on the knote because of the _INFLUX flag...
> Yes, I used the wrong terminology.
> 
>> 
>> > Patch below fixed your test case for me, also tools/regression/kqueue did
>> > not noticed a breakage.  I tried to describe the situation in the
>> > comment in knote().  Also, I removed unlocked check for the KN_INFLUX
>> > in knote, since it seems to be an optimization for rare case, and is
>> > the race on its own.
>> 
>> Comments below...
>> 
>> > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
>> > index b3fb23d..380f1ff 100644
>> > --- a/sys/kern/kern_event.c
>> > +++ b/sys/kern/kern_event.c
>> 
>> [...]
>> 
>> > @@ -1506,7 +1506,7 @@ retry:
>> >  			KQ_LOCK(kq);
>> >  			kn = NULL;
>> >  		} else {
>> > -			kn->kn_status |= KN_INFLUX;
>> > +			kn->kn_status |= KN_INFLUX | KN_SCAN;
>> >  			KQ_UNLOCK(kq);
>> >  			if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE)
>> >  				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
>> 
>> Is there a reason you don't add the KN_SCAN to the other cases in
>> kqueue_scan that set the _INFLUX flag?
> Other cases in kqueue_scan() which set influx do the detach and drop,
> so I do not see a need to ensure that note is registered.  Except I missed
> one case, which you pointed out.
> 
>> 
>> [...]
>> 
>> > @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
>> >  	 */
>> >  	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
>> >  		kq = kn->kn_kq;
>> > -		if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
>> > +		KQ_LOCK(kq);
>> > +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
>> > +			/*
>> > +			 * Do not process the influx notes, except for
>> > +			 * the influx coming from the kq unlock in the
>> > +			 * kqueue_scan().  In the later case, we do
>> > +			 * not interfere with the scan, since the code
>> > +			 * fragment in kqueue_scan() locks the knlist,
>> > +			 * and cannot proceed until we finished.
>> > +			 */
>> 
>> We might want to add a marker node, and reprocess the list from the
>> marker node, because this might introduce other races in the code too...
>> but the problem with that is that knote is expected to keep the list
>> locked throughout the call if called w/ it already locked, and so we
>> can't do that, without major work... :(
> Why ? If the knlist lock is not dropped, I do not see a need for the
> marker.  The patch does not introduce the sleep point for the KN_SCAN
> knotes anyway.
> 
>> 
>> I added a similar comment in knote_fork:
>>                  * XXX - Why do we skip the kn if it is _INFLUX?  Does this
>>                  * mean we will not properly wake up some notes?
>> 
>> and it looks like it was true...
>> 
>> So, upon reading the other _INFLUX cases, it looks like we should change
>> _SCAN to be, _CHANGING or something similar, and any place we don't end
>> up dropping the knote, we set this flag also...  Once such case is at
>> the end of kqueue_register, just before the label done_ev_add, where we
>> update the knote w/ new udata and other fields..  Or change the logic
>> of the flag, and set it for all the cases we are about to drop the
>> knote..
> So do you prefer KN_CHANGING instead of KN_SCAN ?  I do not have any
> objections against renaming the flag, but _CHANGING seems to not say
> anything about the flag intent.  I would say that KN_STABLE is more
> useful, or KN_INFLUX_NODEL, or whatever.
> 
> The done_ev_add case is indeed missed in my patch, thank you for noting.
> The case of EV_ADD does not need the KN_SCAN workaround, IMO, since the
> race is possible just by the nature of adding the knote.
> 
>> 
>> > +			KQ_UNLOCK(kq);
>> > +		} else if ((lockflags & KNF_NOKQLOCK) != 0) {
>> > +			kn->kn_status |= KN_INFLUX;
>> > +			KQ_UNLOCK(kq);
>> > +			error = kn->kn_fop->f_event(kn, hint);
>> >  			KQ_LOCK(kq);
>> 
>> I believe we can drop this unlock/lock pair as it's safe to hold the
>> KQ lock over f_event, we do that in knote_fork...
> The knote_fork() is for the special kinds of knote only, where we indeed know
> in advance that having the kqueue locked around f_event does not break things.
> 
> Updated patch below.
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index b3fb23d..fadb8fd 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> @@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
>  			continue;
>  		kq = kn->kn_kq;
>  		KQ_LOCK(kq);
> -		if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
> +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
>  			KQ_UNLOCK(kq);
>  			continue;
>  		}
> @@ -1174,7 +1174,7 @@ findkn:
>  	 * but doing so will not reset any filter which has already been
>  	 * triggered.
>  	 */
> -	kn->kn_status |= KN_INFLUX;
> +	kn->kn_status |= KN_INFLUX | KN_SCAN;
>  	KQ_UNLOCK(kq);
>  	KN_LIST_LOCK(kn);
>  	kn->kn_kevent.udata = kev->udata;
> @@ -1197,7 +1197,7 @@ done_ev_add:
>  	KQ_LOCK(kq);
>  	if (event)
>  		KNOTE_ACTIVATE(kn, 1);
> -	kn->kn_status &= ~KN_INFLUX;
> +	kn->kn_status &= ~(KN_INFLUX | KN_SCAN);
>  	KN_LIST_UNLOCK(kn);
>  
>  	if ((kev->flags & EV_DISABLE) &&
> @@ -1506,7 +1506,7 @@ retry:
>  			KQ_LOCK(kq);
>  			kn = NULL;
>  		} else {
> -			kn->kn_status |= KN_INFLUX;
> +			kn->kn_status |= KN_INFLUX | KN_SCAN;
>  			KQ_UNLOCK(kq);
>  			if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE)
>  				KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
> @@ -1515,7 +1515,8 @@ retry:
>  				KQ_LOCK(kq);
>  				KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal);
>  				kn->kn_status &=
> -				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX);
> +				    ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX |
> +				    KN_SCAN);
>  				kq->kq_count--;
>  				KN_LIST_UNLOCK(kn);
>  				influx = 1;
> @@ -1545,7 +1546,7 @@ retry:
>  			} else
>  				TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
>  			
> -			kn->kn_status &= ~(KN_INFLUX);
> +			kn->kn_status &= ~(KN_INFLUX | KN_SCAN);
>  			KN_LIST_UNLOCK(kn);
>  			influx = 1;
>  		}
> @@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
>  	 */
>  	SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
>  		kq = kn->kn_kq;
> -		if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
> +		KQ_LOCK(kq);
> +		if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
> +			/*
> +			 * Do not process the influx notes, except for
> +			 * the influx coming from the kq unlock in the
> +			 * kqueue_scan().  In the later case, we do
> +			 * not interfere with the scan, since the code
> +			 * fragment in kqueue_scan() locks the knlist,
> +			 * and cannot proceed until we finished.
> +			 */
> +			KQ_UNLOCK(kq);
> +		} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> +			kn->kn_status |= KN_INFLUX;
> +			KQ_UNLOCK(kq);
> +			error = kn->kn_fop->f_event(kn, hint);
>  			KQ_LOCK(kq);
> -			if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
> -				KQ_UNLOCK(kq);
> -			} else if ((lockflags & KNF_NOKQLOCK) != 0) {
> -				kn->kn_status |= KN_INFLUX;
> -				KQ_UNLOCK(kq);
> -				error = kn->kn_fop->f_event(kn, hint);
> -				KQ_LOCK(kq);
> -				kn->kn_status &= ~KN_INFLUX;
> -				if (error)
> -					KNOTE_ACTIVATE(kn, 1);
> -				KQ_UNLOCK_FLUX(kq);
> -			} else {
> -				kn->kn_status |= KN_HASKQLOCK;
> -				if (kn->kn_fop->f_event(kn, hint))
> -					KNOTE_ACTIVATE(kn, 1);
> -				kn->kn_status &= ~KN_HASKQLOCK;
> -				KQ_UNLOCK(kq);
> -			}
> +			kn->kn_status &= ~KN_INFLUX;
> +			if (error)
> +				KNOTE_ACTIVATE(kn, 1);
> +			KQ_UNLOCK_FLUX(kq);
> +		} else {
> +			kn->kn_status |= KN_HASKQLOCK;
> +			if (kn->kn_fop->f_event(kn, hint))
> +				KNOTE_ACTIVATE(kn, 1);
> +			kn->kn_status &= ~KN_HASKQLOCK;
> +			KQ_UNLOCK(kq);
>  		}
> -		kq = NULL;
>  	}
>  	if ((lockflags & KNF_LISTLOCKED) == 0)
>  		list->kl_unlock(list->kl_lockarg); 
> diff --git a/sys/sys/event.h b/sys/sys/event.h
> index bad8c9e..3b765c0 100644
> --- a/sys/sys/event.h
> +++ b/sys/sys/event.h
> @@ -207,6 +207,7 @@ struct knote {
>  #define KN_MARKER	0x20			/* ignore this knote */
>  #define KN_KQUEUE	0x40			/* this knote belongs to a kq */
>  #define KN_HASKQLOCK	0x80			/* for _inevent */
> +#define	KN_SCAN		0x100			/* flux set in kqueue_scan() */
>  	int			kn_sfflags;	/* saved filter flags */
>  	intptr_t		kn_sdata;	/* saved data field */
>  	union {

Hi,

I think, we should add KN_SCAN after knote_attach() in
kqueue_register(), too. What do you think about this?

Best regards,
 Kohji Okuno




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140403.182656.1696050559410663288.okuno.kohji>