From owner-freebsd-fs@FreeBSD.ORG Wed Mar 19 07:31:26 2014
Message-ID: <532947C9.9010607@FreeBSD.org>
Date: Wed, 19 Mar 2014 09:31:21 +0200
From: Alexander Motin
To: Rick Macklem
Cc: FreeBSD Filesystems
Subject: Re: review/test: NFS patch to use pagesize mbuf clusters
In-Reply-To: <2092082855.24699674.1395187057807.JavaMail.root@uoguelph.ca>

On 19.03.2014 01:57, Rick Macklem wrote:
> Alexander Motin wrote:
>> I ran several profiles on an em NIC with and without the patch. I can
>> confirm that without the patch m_defrag() is indeed called, while with
>> the patch it no longer is. But the profiler shows me that only a very
>> small amount of time (a few percent, or even a fraction of a percent)
>> is spent there. I can't measure the effect (my Core-i7 desktop test
>> system has only about 5% CPU load while serving a full 1Gbps of NFS
>> over the em), though I can't say for sure that there is no effect on
>> some low-end system.
>>
> Well, since m_defrag() creates a new list and bcopy()s the data, there
> is some overhead, although I'm not surprised it isn't that easy to
> measure. (I thought your server built entirely of SSDs might show a
> difference.)

I ran my test even from TMPFS, not SSD, but the em NIC mentioned above
is only 1Gbps, which is too slow to load the system heavily.
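To put the numbers in perspective: a 64K NFS reply built from 2K
clusters is a 32-fragment chain, which can exceed some NICs'
scatter/gather limits (especially with TSO) and force the driver into
m_defrag(); built from page-size (4K) clusters it is only 16 fragments.
Here is a rough sketch of the two allocation paths -- the helper name is
made up and this is not the actual patch code, just an illustration of
the real m_getcl()/m_getjcl() calls:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Hypothetical helper, for illustration only: build an mbuf chain of
 * 'len' bytes of external storage from either standard 2K clusters or
 * page-size (4K) clusters.  The page-size variant halves the number
 * of fragments the NIC has to scatter/gather.
 */
static struct mbuf *
alloc_reply_chain(int len, bool pagesize)
{
        struct mbuf *m, *top, **mp;

        top = NULL;
        mp = &top;
        while (len > 0) {
                if (pagesize)
                        m = m_getjcl(M_NOWAIT, MT_DATA, 0, MJUMPAGESIZE);
                else
                        m = m_getcl(M_NOWAIT, MT_DATA, 0);
                if (m == NULL) {
                        m_freem(top);   /* m_freem() tolerates NULL */
                        return (NULL);
                }
                m->m_len = MIN(len, (int)m->m_ext.ext_size);
                len -= m->m_len;
                *mp = m;
                mp = &m->m_next;
        }
        return (top);
}

When the chain above does not fit the hardware's segment limit, the
driver has to allocate a whole new chain and bcopy() every byte via
m_defrag() -- that copy is the overhead being discussed.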
> I am more concerned with the possibility of m_defrag() failing and the
> driver dropping the reply, forcing the client to do a fresh TCP
> connection and a retry of the RPC after a long timeout (1 minute or
> more). This will show up as "terrible performance" for users.
>
> Also, some drivers use m_collapse() instead of m_defrag(), and these
> will probably be "train wrecks". I get cases where reports of serious
> NFS problems get "fixed" by disabling TSO, and I was hoping this would
> work around that.

Yes, I accept that argument. I don't see much reason to cut continuous
data into small chunks.

>> I am also not very sure about replacing M_WAITOK with M_NOWAIT.
>> Instead of waiting a bit while the VM finds a cluster, NFSMCLGET()
>> will return a single mbuf; as a result, the chain of 2K clusters gets
>> replaced not with a chain of 4K clusters but with a chain of 256-byte
>> mbufs.
>>
> I hoped the comment in the patch would explain this.
>
> When I was testing (on a small i386 system), I succeeded in getting
> threads stuck sleeping on "btalloc" a couple of times when I used
> M_WAITOK for m_getjcl(). As far as I could see, this indicated that it
> had run out of kernel address space, but I'm not sure.
> --> That is why I used M_NOWAIT for m_getjcl().
>
> As for using MCLGET(..M_NOWAIT), the main reason for doing that was
> that I noticed the code does a drain on zone_mcluster if the
> allocation attempt for a cluster fails. For some reason, m_getcl() and
> m_getjcl() do not do this drain of the zone? I thought the drain might
> help in memory-constrained cases. To be honest, I've never been able
> to get a MCLGET(..M_NOWAIT) to fail during testing.

If that is true, I think it should be handled inside the allocation
code, not worked around here. Passing M_NOWAIT means that you agree to
get NULL back, but IMO you don't really want to cut 64K of data into
~200-byte pieces in any case, even when the system is low on memory,
since most NICs won't be able to send such a chain without defragging
it, and that will itself be problematic in a low-memory case. (A rough
sketch of this fallback follows below.)

-- 
Alexander Motin
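To make the fallback concrete, here is a rough sketch of the M_NOWAIT
allocation pattern under discussion. The function name is made up and
this is not the actual NFSMCLGET() macro from the patch; it only
illustrates the degradation path through the real m_getjcl(), m_get()
and MCLGET() calls:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

/*
 * Made-up name; not the actual NFSMCLGET() from the patch.  Try a
 * page-size cluster without sleeping, fall back to a 2K cluster, and
 * finally settle for the mbuf's own internal storage.
 */
static struct mbuf *
nfs_getcl_sketch(void)
{
        struct mbuf *m;

        m = m_getjcl(M_NOWAIT, MT_DATA, 0, MJUMPAGESIZE);
        if (m != NULL)
                return (m);     /* 4K (page-size) cluster attached */

        m = m_get(M_NOWAIT, MT_DATA);
        if (m == NULL)
                return (NULL);
        /*
         * Try to attach a 2K cluster; per the discussion above, this
         * path also drains the cluster zone on allocation failure.
         */
        MCLGET(m, M_NOWAIT);
        /*
         * If M_EXT is still clear here, both cluster allocations
         * failed and the caller is left with only MLEN (roughly 200+)
         * bytes of storage in the mbuf itself -- the tiny pieces the
         * reply would then be cut into.
         */
        return (m);
}

The last step is the problem case: if both cluster zones are exhausted,
a 64K reply ends up as a chain of a few hundred tiny mbufs, which is
exactly the kind of chain the transmit path would then have to defrag.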