From owner-freebsd-stable@FreeBSD.ORG Fri Jan 22 19:27:16 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D75D91065679; Fri, 22 Jan 2010 19:27:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 705A38FC1D; Fri, 22 Jan 2010 19:27:16 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAA+LWUuDaFvJ/2dsb2JhbADZL4IzggkE X-IronPort-AV: E=Sophos;i="4.49,325,1262581200"; d="scan'208";a="62589991" Received: from ganges.cs.uoguelph.ca ([131.104.91.201]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 22 Jan 2010 14:27:14 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by ganges.cs.uoguelph.ca (Postfix) with ESMTP id A1DD8FB808E; Fri, 22 Jan 2010 14:27:14 -0500 (EST) X-Virus-Scanned: amavisd-new at ganges.cs.uoguelph.ca Received: from ganges.cs.uoguelph.ca ([127.0.0.1]) by localhost (ganges.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ZntHnpHgZ2wX; Fri, 22 Jan 2010 14:27:13 -0500 (EST) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by ganges.cs.uoguelph.ca (Postfix) with ESMTP id 5C806FB8063; Fri, 22 Jan 2010 14:27:13 -0500 (EST) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o0MJbmD06729; Fri, 22 Jan 2010 14:37:48 -0500 (EST) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Fri, 22 Jan 2010 14:37:48 -0500 (EST) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Mikolaj Golub In-Reply-To: <86vdeuuo2y.fsf@zhuzha.ua1> Message-ID: References: <86ocl272mb.fsf@kopusha.onet> <86tyuqnz9x.fsf@zhuzha.ua1> <86zl4awmon.fsf@zhuzha.ua1> <86vdeywmha.fsf@zhuzha.ua1> <86vdeuuo2y.fsf@zhuzha.ua1> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org Subject: Re: FreeBSD NFS client/Linux NFS server issue X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Jan 2010 19:27:16 -0000 On Fri, 22 Jan 2010, Mikolaj Golub wrote: > > We have nonempty nm_bufq, nm_bufqiods = 1, but actually there is no nfsiod > thread run for this mount, which is wrong -- nm_bufq will not be emptied until > some other process starts writing to the nfsmount and starts nfsiod thread for > this mount. > > Reviewing the code how it could happen I see the following path. Could someone > confirm or disprove me? > > in nfs_bio.c:nfs_asyncio() we have: > > 1363 mtx_lock(&nfs_iod_mtx); > ... > 1374 /* > 1375 * Find a free iod to process this request. > 1376 */ > 1377 for (iod = 0; iod < nfs_numasync; iod++) > 1378 if (nfs_iodwant[iod]) { > 1379 gotiod = TRUE; > 1380 break; > 1381 } > 1382 > 1383 /* > 1384 * Try to create one if none are free. > 1385 */ > 1386 if (!gotiod) { > 1387 iod = nfs_nfsiodnew(); > 1388 if (iod != -1) > 1389 gotiod = TRUE; > 1390 } > > Let's consider situation when new nfsiod is created. > > nfs_nfsiod.c:nfs_nfsiodnew() before creating nfssvc_iod thread unlocks nfs_iod_mtx: > > 179 mtx_unlock(&nfs_iod_mtx); > 180 error = kthread_create(nfssvc_iod, nfs_asyncdaemon + i, NULL, RFHIGHPID, > 181 0, "nfsiod %d", newiod); > 182 mtx_lock(&nfs_iod_mtx); > > > And nfs_nfsiod.c:nfssvc_iod() do the followin: > > 226 mtx_lock(&nfs_iod_mtx); > ... > 238 nfs_iodwant[myiod] = curthread->td_proc; > 239 nfs_iodmount[myiod] = NULL; > ... > 244 error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT | PCATCH, > 245 "-", timo); > > Let's at this moment another nfs_asyncio() request for another nfsmount has > happened and this thread has locked nfs_iod_mtx. Then this thread will found > nfs_iodwant[iod] in "for" loop and will use it. When the first thread actually > has returned from nfs_nfsiodnew() it will insert buffer to nmp->nm_bufq but > nfsiod will process other nmp. > Ok, good catch, I think you've found the problem (or at least a race that might have caused it). > It looks like the fix for this situation would be to check nfs_iodwant[iod] > after nfs_nfsiodnew(): > > --- nfs_bio.c.orig 2010-01-22 15:38:02.000000000 +0000 > +++ nfs_bio.c 2010-01-22 15:39:58.000000000 +0000 > @@ -1385,7 +1385,7 @@ again: > */ > if (!gotiod) { > iod = nfs_nfsiodnew(); > - if (iod != -1) > + if ((iod != -1) && (nfs_iodwant[iod] == NULL)) > gotiod = TRUE; > } > Unfortunately, I don't think the above fixes the problem. If another thread that called nfs_asyncio() has "stolen" the this "iod", it will have set nfs_iodwant[iod] == NULL (set non-NULL at #238) and it will remain NULL until the other thread is done with it. If you instead make it: if (iod != -1 && nfs_iodwant[iod] != NULL) gotiod = TRUE; then I think it fixes your scenario above, but will break for the case where the mtx_lock(&nfs_iod_mtx) call in nfs_nfsnewiod() (#182) wins out over the one near the beginning of nfssvc_iod() (#226), since in that case, nfs_iodwant[iod] will still be NULL because it hasn't yet been set by nfssvc_iod() (#238). There should probably be some sort of 3 way handshake between the code in nfs_asyncio() after calling nfs_nfsnewiod() and the code near the beginning of nfssvc_iod(), but I think the following somewhat cheesy fix might do the trick: if (!gotiod) { iod = nfs_nfsiodnew(); if (iod != -1) { if (nfs_iodwant[iod] == NULL) { /* * Either another thread has acquired this * iod or I acquired the nfs_iod_mtx mutex * before the new iod thread did in * nfssvc_iod(). To be safe, go back and * try again after allowing another thread * to acquire the nfs_iod_mtx mutex. */ mtx_unlock(&nfs_iod_mtx); /* * So long as mtx_lock() implements some * sort of fairness, nfssvc_iod() should * get nfs_iod_mtx here and set * nfs_iodwant[iod] != NULL for the case * where the iod has not been "stolen" by * another thread for a different mount * point. */ mtx_lock(&nfs_iod_mtx); goto again; } gotiod = TRUE; } } Does anyone else have a better solution? (Mikolaj, could you by any chance test this? You can test yours, but I think it breaks.) rick