Date:      Wed, 19 Mar 2014 09:31:21 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: review/test: NFS patch to use pagesize mbuf clusters
Message-ID:  <532947C9.9010607@FreeBSD.org>
In-Reply-To: <2092082855.24699674.1395187057807.JavaMail.root@uoguelph.ca>
References:  <2092082855.24699674.1395187057807.JavaMail.root@uoguelph.ca>

On 19.03.2014 01:57, Rick Macklem wrote:
> Alexander Motin wrote:
>> I ran several profiles on an em NIC with and without the patch. I
>> can confirm that without the patch m_defrag() is indeed called,
>> while with the patch it is not any more. But the profiler shows that
>> very little time (a few percent, or even a fraction of a percent) is
>> spent there. I can't measure the effect (my Core-i7 desktop test
>> system has only about 5% CPU load while serving full 1Gbps NFS over
>> the em), though I can't say for sure that the effect isn't there on
>> some low-end system.
>>
> Well, since m_defrag() creates a new list and bcopy()s the data, there
> is some overhead, although I'm not surprised it isn't that easy to measure.
> (I thought your server built entirely of SSDs might show a difference.)

I actually ran my test from TMPFS, not SSD, but the mentioned em NIC is 
only 1Gbps, which is too slow to reasonably load the system.
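
As an aside, a DTrace one-liner along these lines can confirm whether
m_defrag() is being hit (assuming the fbt provider is available;
Ctrl-C prints the count):

    dtrace -n 'fbt::m_defrag:entry { @calls = count(); }'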

> I am more concerned with the possibility of m_defrag() failing and the
> driver dropping the reply, forcing the client to do a fresh TCP connection
> and retry of the RPC after a long timeout (1 minute or more). This will
> show up as "terrible performance" for users.
>
> Also, some drivers use m_collapse() instead of m_defrag() and these
> will probably be "train wrecks". I see cases where reports of serious
> NFS problems get "fixed" by disabling TSO, and I was hoping this patch
> would work around that.

Yes, I accept that argument. I don't see much reason to cut contiguous 
data into small chunks.
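
For concreteness, the driver pattern in question looks roughly like
this (a sketch of a generic transmit path, not code from any particular
driver; the function name, the tag/map arguments and the 32-segment
limit are placeholders):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/mbuf.h>
#include <machine/bus.h>

/* Sketch: map an mbuf chain for DMA, compacting it if necessary. */
static int
xx_encap(bus_dma_tag_t tag, bus_dmamap_t map, struct mbuf **mp)
{
        bus_dma_segment_t segs[32];     /* hardware segment limit */
        int error, nsegs;

        error = bus_dmamap_load_mbuf_sg(tag, map, *mp, segs, &nsegs,
            BUS_DMA_NOWAIT);
        if (error == EFBIG) {
                /* Too many segments for the hardware; compact the chain. */
                struct mbuf *m = m_defrag(*mp, M_NOWAIT);
                /* Some drivers use m_collapse(*mp, M_NOWAIT, 32) here. */

                if (m == NULL) {
                        /*
                         * This is the failure mode described above: the
                         * reply is silently dropped and the client must
                         * recover via TCP reconnect and RPC retry.
                         */
                        m_freem(*mp);
                        *mp = NULL;
                        return (ENOBUFS);
                }
                *mp = m;
                error = bus_dmamap_load_mbuf_sg(tag, map, *mp, segs,
                    &nsegs, BUS_DMA_NOWAIT);
        }
        return (error);
}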

>> I am also not very sure about replacing M_WAITOK with M_NOWAIT.
>> Instead of waiting a bit while the VM finds a cluster, NFSMCLGET()
>> will return a single mbuf; as a result, the chain of 2K clusters is
>> replaced not with 4K clusters but with a chain of 256-byte mbufs.
>>
> I hoped the comment in the patch would explain this.
>
> When I was testing (on a small i386 system), I succeeded in getting
> threads stuck sleeping on "btalloc" a couple of times when I used
> M_WAITOK for m_getjcl(). As far as I could see, this indicated that
> it had run out of kernel address space, but I'm not sure.
> --> That is why I used M_NOWAIT for m_getjcl().
>
> As for using MCLGET(..M_NOWAIT), the main reason for doing that
> was I noticed that the code does a drain on zone_mcluster if this
> allocation attempt for a cluster fails. For some reason, m_getcl()
> and m_getjcl() do not do this drain of the zone?
> I thought the drain might help memory constrained cases.
> To be honest, I've never been able to get a MCLGET(..M_NOWAIT)
> to fail during testing.

If that is true, I think it should be handled inside the allocation 
code, not worked around here. Passing M_NOWAIT means that you agree to 
get NULL there, but IMO you don't really want to cut 64K of data into 
~200-byte pieces in any case, even if the system is in a low-memory 
condition, since at least most NICs won't be able to send it without 
defragging, which will also be problematic in the low-memory case.

-- 
Alexander Motin


