From owner-freebsd-stable@FreeBSD.ORG Fri Jan 22 20:37:54 2010
To: Rick Macklem
References: <86ocl272mb.fsf@kopusha.onet> <86tyuqnz9x.fsf@zhuzha.ua1> <86zl4awmon.fsf@zhuzha.ua1> <86vdeywmha.fsf@zhuzha.ua1> <86vdeuuo2y.fsf@zhuzha.ua1>
Organization: TOA Ukraine
From: Mikolaj Golub
Date: Fri, 22 Jan 2010 22:37:49 +0200
In-Reply-To: (Rick Macklem's
 message of "Fri\, 22 Jan 2010 14\:37\:48 -0500 \(EST\)")
Message-ID: <86my05x4de.fsf@kopusha.onet>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: FreeBSD NFS client/Linux NFS server issue
List-Id: Production branch of FreeBSD source code

On Fri, 22 Jan 2010 14:37:48 -0500 (EST) Rick Macklem wrote:

>> --- nfs_bio.c.orig	2010-01-22 15:38:02.000000000 +0000
>> +++ nfs_bio.c	2010-01-22 15:39:58.000000000 +0000
>> @@ -1385,7 +1385,7 @@ again:
>>  		 */
>>  		if (!gotiod) {
>>  			iod = nfs_nfsiodnew();
>> -			if (iod != -1)
>> +			if ((iod != -1) && (nfs_iodwant[iod] == NULL))
>>  				gotiod = TRUE;
>>  		}
>>
>
> Unfortunately, I don't think the above fixes the problem.
> If another thread that called nfs_asyncio() has "stolen" the this "iod",
> it will have set nfs_iodwant[iod] == NULL (set non-NULL at #238)
> and it will remain NULL until the other thread is done with it.

I see. I have missed this. Thanks.

>
> There should probably be some sort of 3 way handshake between
> the code in nfs_asyncio() after calling nfs_nfsnewiod() and the
> code near the beginning of nfssvc_iod(), but I think the following
> somewhat cheesy fix might do the trick:
>
> 	if (!gotiod) {
> 		iod = nfs_nfsiodnew();
> 		if (iod != -1) {
> 			if (nfs_iodwant[iod] == NULL) {
> 				/*
> 				 * Either another thread has acquired this
> 				 * iod or I acquired the nfs_iod_mtx mutex
> 				 * before the new iod thread did in
> 				 * nfssvc_iod(). To be safe, go back and
> 				 * try again after allowing another thread
> 				 * to acquire the nfs_iod_mtx mutex.
> 				 */
> 				mtx_unlock(&nfs_iod_mtx);
> 				/*
> 				 * So long as mtx_lock() implements some
> 				 * sort of fairness, nfssvc_iod() should
> 				 * get nfs_iod_mtx here and set
> 				 * nfs_iodwant[iod] != NULL for the case
> 				 * where the iod has not been "stolen" by
> 				 * another thread for a different mount
> 				 * point.
> 				 */
> 				mtx_lock(&nfs_iod_mtx);
> 				goto again;
> 			}
> 			gotiod = TRUE;
> 		}
> 	}
>
> Does anyone else have a better solution?
> (Mikolaj, could you by any chance test this? You can test yours, but I
> think it breaks.)

Unfortunately, we observed this only on our production servers. A week ago
we made some configuration changes as a workaround: we reconfigured cron not
to run the scripts simultaneously, and added cron scripts that just
periodically write a line to a file on the NFS share (to "unlock" it if it
is locked). We have not observed any problems since then, and we would not
like to experiment in production. If I manage to produce a good test case in
a test environment I will be able to test the patch, but I am not sure...

-- 
Mikolaj Golub
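
For readers following the thread, the unlock/relock retry in the quoted fix
can be illustrated in userland. This is only a rough sketch, not the kernel
code: it uses POSIX threads instead of the kernel mutex, and the names
iod_mtx, iodwant, idle_marker, worker_main and claim_new_worker are
hypothetical stand-ins chosen for illustration. It also assumes a single
claimer, so the "stolen by another thread" case cannot occur, and it calls
sched_yield() where the quoted comment relies on mtx_lock() fairness.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

#define NIOD 4	/* hypothetical number of worker slots */

static pthread_mutex_t iod_mtx = PTHREAD_MUTEX_INITIALIZER;
static void *iodwant[NIOD];	/* non-NULL: worker is idle and claimable */
static int idle_marker;		/* any non-NULL address serves as the marker */

/* New worker: announce "idle" under the mutex, then exit (sketch only). */
static void *
worker_main(void *arg)
{
	int iod = (int)(intptr_t)arg;

	pthread_mutex_lock(&iod_mtx);
	iodwant[iod] = &idle_marker;
	pthread_mutex_unlock(&iod_mtx);
	return (NULL);
}

/*
 * Start a worker for slot iod and claim it. Returns the number of
 * unlock/relock retries that were needed, or -1 if the thread could
 * not be created.
 */
int
claim_new_worker(int iod)
{
	pthread_t tid;
	int retries = 0;

	assert(iod >= 0 && iod < NIOD);
	pthread_mutex_lock(&iod_mtx);
	/* Creating the thread with the mutex held forces the race case. */
	if (pthread_create(&tid, NULL, worker_main,
	    (void *)(intptr_t)iod) != 0) {
		pthread_mutex_unlock(&iod_mtx);
		return (-1);
	}
	while (iodwant[iod] == NULL) {
		/* Worker has not announced itself yet: let it run, retry. */
		pthread_mutex_unlock(&iod_mtx);
		sched_yield();
		pthread_mutex_lock(&iod_mtx);
		retries++;
	}
	iodwant[iod] = NULL;	/* claim the worker */
	pthread_mutex_unlock(&iod_mtx);
	pthread_join(tid, NULL);
	return (retries);
}
```

In this sketch claim_new_worker(0) always needs at least one retry, because
the worker cannot set its slot until the claimer drops the mutex. Unlike the
"goto again" in the quoted fix, the loop here re-checks only the one slot and
does not rescan the other iods.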