From owner-freebsd-hackers@freebsd.org Wed Aug 5 14:27:54 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D38A29B274C for ; Wed, 5 Aug 2015 14:27:54 +0000 (UTC) (envelope-from pjalaber@gmail.com) Received: from mail-qg0-x22d.google.com (mail-qg0-x22d.google.com [IPv6:2607:f8b0:400d:c04::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8ED9A396; Wed, 5 Aug 2015 14:27:54 +0000 (UTC) (envelope-from pjalaber@gmail.com) Received: by qgeu79 with SMTP id u79so30897757qge.1; Wed, 05 Aug 2015 07:27:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=TLgDtLOiRw0BuAHrc4tyKCfnn1piKVf+ZSNDY3hkz24=; b=nTwbXH05Bcjv+CKKBKILWrJ4qKqPFUxnCw7/LUZhNWfbOcWtmR+Vwx2a0RTP7VkfcN G5qceFwvnZITeTJskBRWfLhQyIuEU4d1vq2o2R8OnIMVVo/YdAkyMVhrdcdknjsAlOeO ccQQUZICMYYjEleP3NALhcNTbkhgFrRylv+rLxFLXUGgPhcDHEfVMQ11aL7x1omxDPA8 BFsq/E3otEnrDbLy5oFmCY5MEG0a/d5fYRvvOYT0zpVoeMi0DHT4tus4YQc/HpZa/Zgb we/ZYlu4aiLguowlt2G3w3bNhGimx147Z47Rum39wQdXQypKmPhki7783w5TKhX5JzdD T7kg== MIME-Version: 1.0 X-Received: by 10.140.232.200 with SMTP id d191mr18774141qhc.87.1438784873611; Wed, 05 Aug 2015 07:27:53 -0700 (PDT) Received: by 10.140.92.247 with HTTP; Wed, 5 Aug 2015 07:27:53 -0700 (PDT) In-Reply-To: <2768515.JZVZhYiQVE@ralph.baldwin.cx> References: <2768515.JZVZhYiQVE@ralph.baldwin.cx> Date: Wed, 5 Aug 2015 16:27:53 +0200 Message-ID: Subject: Re: adaptive rwlock deadlock From: Philippe Jalaber To: John Baldwin Cc: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Aug 2015 14:27:55 -0000 2015-08-04 22:10 GMT+02:00 John Baldwin : > On Tuesday, July 07, 2015 12:10:19 PM Philippe Jalaber wrote: > > Hi, > > > > I am facing a strange problem using the network stack and adaptive > rwlocks > > running Freebsd 9.3. > > Basically I can reproduce the problem with 3 threads: > > > > 1) thread 1 has taken the rwlock of structure inpcb in exclusive mode in > > tcp_input.c. This thread also runs my own code and repeatedly takes a > > rwlock (called g_rwlock) in shared mode and releases it, until a shared > > object is marked not "busy" any more: > > > > rwlock(inp_lock); > > .... > > do { // thread is active waiting in the loop > > rlock(g_rwlock); > > o = find(); > > if ( o == NULL ) > > break; > > busy = o.busy; > > if (o != NULL && busy) > > runlock(g_rwlock); > > } while ( busy ); > > > > if ( o != NULL ) > > { > > // do something with o > > .... > > } > > runlock(g_rwlock); > > .... > > > > 2) thread 2 wants to set the shared object as "ready". So it tries to > take > > g_rwlock in exclusive mode and is blocked in _rw_wlock_hard@kern_rwlock.c > :815 > > "turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE)" because thread 1 > has > > already taken it in shared mode: > > > > wlock(g_rwlock); > > o = find(); > > if ( o != NULL ) > > o.busy = 1; > > wunlock(g_rwlock); > > > > // o is busy so work on it without any lock > > .... > > > > wlock(g_rwlock); // thread is blocked here > > o.busy = 0; > > maybe_delete(o); > > wunlock(g_rwlock); > > > > 3) thread 3 spins on the same inpcb rwlock than thread 1 in > > _rw_wlock_hard@kern_rwlock.c:721 "while ((struct > > thread*)RW_OWNER(rw->rw_lock) == owner && TD_IS_RUNNING(owner)) " > > > > > > My target machine has two cpus. > > Thread 1 is pinned to cpu 0. > > Thread 2 and Thread 3 are pinned to cpu 1. > > Thread 1 and Thread 2 have a priority of 28. > > Thread 3 has a priority of 127 > > > > Now what seems to happen is that when thread 1 calls runlock(g_rwlock), > it > > calls turnstile_broadcast@kern_rwlock.c:650, but thread 2 never regains > > control because thread 3 is spinning on the inpcb rwlock. Also the > > condition TD_IS_RUNNING(owner) is always true because thread 1 is active > > waiting in a loop. So the 3 threads deadlock. > > Note that if I compile the kernel without adaptive rwlocks it works > without > > any problem. > > A workaround is to add a call to "sched_relinquish(curthread)" in thread > 1 > > in the loop just after the call to runlock. > > It sounds like we are not forcing a preemption on CPU 1 in this case via > sched_add(). > > For SCHED_4BSD you could try the 'FULL_PREEMPTION' kernel option. > For ULE you can adjust 'preempt_thresh' on the fly, though I think the > default setting should actually still work. > > Can you use KTR or some such to determine if IPI_PREEMPT is being sent by > CPU 0 to CPU 1 in this case? > > > I am also wondering about the code in _rw_runlock after > > "turnstile_broadcast(ts, queue)". Isn't the flag RW_LOCK_WRITE_WAITERS > > definitely lost if the other thread which is blocked in turnstile_wait > > never regains control ? > > All the write waiters are awakened by a broadcast (as opposed to a signal > operation). They are on the run queue, not on the turnstile queue anymore, > so there aren't any write waiters left (the bit only tracks if there are > waiters on the turnstile). > > -- > John Baldwin > I Use ULE scheduler. Here's the KTR output using ktrdump on a vmcore after watchdog. 75447 ipi_selected: cpu: 1 ipi: fc 75446 stop_cpus() with 252 type 75445 ipi_cpu: cpu: 1 ipi: 2 75444 ipi_cpu: cpu: 1 ipi: 2 75443 ipi_cpu: cpu: 1 ipi: 2 75442 ipi_cpu: cpu: 1 ipi: 2 75441 ipi_cpu: cpu: 1 ipi: 2 .... 3862 ipi_cpu: cpu: 1 ipi: 2 3861 ipi_cpu: cpu: 1 ipi: 2 3860 ipi_cpu: cpu: 1 ipi: 2 3859 ipi_cpu: cpu: 1 ipi: 2 3858 ipi_cpu: cpu: 1 ipi: 2 3857 ipi_selected: cpu: 1 ipi: f3 3856 ipi_cpu: cpu: 1 ipi: 2 3855 ipi_cpu: cpu: 1 ipi: 2 3854 ipi_cpu: cpu: 1 ipi: 2 3853 ipi_selected: cpu: 0 ipi: f3 3852 ipi_cpu: cpu: 1 ipi: 2 3851 ipi_selected: cpu: 1 ipi: f3 3850 ipi_cpu: cpu: 1 ipi: 2 3849 ipi_cpu: cpu: 1 ipi: 2 3848 ipi_selected: cpu: 0 ipi: f3 3847 ipi_cpu: cpu: 1 ipi: 2 3846 ipi_cpu: cpu: 1 ipi: 2 3845 ipi_cpu: cpu: 1 ipi: 2 3844 ipi_cpu: cpu: 1 ipi: 2 3843 ipi_cpu: cpu: 1 ipi: 2 3842 ipi_cpu: cpu: 1 ipi: 2 3841 ipi_cpu: cpu: 1 ipi: 2 3840 ipi_cpu: cpu: 1 ipi: 2 3839 ipi_cpu: cpu: 1 ipi: 2 3838 ipi_cpu: cpu: 1 ipi: 2 3837 ipi_cpu: cpu: 1 ipi: 2 3836 ipi_cpu: cpu: 1 ipi: 2 3835 ipi_cpu: cpu: 0 ipi: 1 3834 ipi_cpu: cpu: 0 ipi: 1 3833 ipi_cpu: cpu: 0 ipi: 1 3832 ipi_cpu: cpu: 0 ipi: 1 3831 ipi_cpu: cpu: 0 ipi: 1 3830 ipi_cpu: cpu: 0 ipi: 1