Date:      Fri, 03 Oct 2008 12:17:44 +0300
From:      Danny Braniss <danny@cs.huji.ac.il>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, Jeremy Chadwick <koitsu@freebsd.org>, freebsd-stable@freebsd.org, Claus Guttesen <kometen@gmail.com>
Subject:   Re: bad NFS/UDP performance 
Message-ID:  <E1KlgnA-000F6w-NT@cs1.cs.huji.ac.il>
In-Reply-To: <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org> 
References:  <E1Kj7NA-000FXz-3F@cs1.cs.huji.ac.il> <20080926081806.GA19055@icarus.home.lan> <E1Kj9bR-000H7t-0g@cs1.cs.huji.ac.il> <20080926095230.GA20789@icarus.home.lan> <E1KjEZw-000KkH-GP@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0809271114450.20117@fledge.watson.org> <E1KjY2h-0008GC-PP@cs1.cs.huji.ac.il> <b41c75520809290140i435a5f6dge5219cd03cad55fe@mail.gmail.com> <E1Klfac-000DzZ-Ie@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0810030910351.41647@fledge.watson.org> <E1KlgYe-000Es2-8u@cs1.cs.huji.ac.il> <alpine.BSF.1.10.0810031003440.41647@fledge.watson.org>

> 
> On Fri, 3 Oct 2008, Danny Braniss wrote:
> 
> >> OK, so it looks like this was almost certainly the rwlock change.  What 
> >> happens if you pretty much universally substitute the following in 
> >> udp_usrreq.c:
> >>
> >> Currently		Change to
> >> ---------		---------
> >> INP_RLOCK		INP_WLOCK
> >> INP_RUNLOCK		INP_WUNLOCK
> >> INP_RLOCK_ASSERT	INP_WLOCK_ASSERT
> >
> > I guess you were almost certainly correct :-) I did the global subst. on the 
> > udp_usrreq.c from 19/08, __FBSDID("$FreeBSD: src/sys/netinet/udp_usrreq.c,v 
> > 1.218.2.3 2008/08/18 23:00:41 bz Exp $"); and now udp is fine again!
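
(for completeness, the substitution was purely mechanical; something
like this should reproduce it, assuming FreeBSD sed's in-place syntax:

	sed -i .orig \
	    -e 's/INP_RLOCK/INP_WLOCK/g' \
	    -e 's/INP_RUNLOCK/INP_WUNLOCK/g' \
	    udp_usrreq.c

the first pattern also rewrites INP_RLOCK_ASSERT to INP_WLOCK_ASSERT,
so all three of the names above end up converted.)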
> 
> OK.  This is a change I'd rather not back out since it significantly improves 
> performance for many other UDP workloads, so we need to figure out why it's 
> hurting us so much here so that we know if there are reasonable alternatives.
> 
> Would it be possible for you to do a run of the workload with both kernels 
> using LOCK_PROFILING around the benchmark, and then we can compare lock 
> contention in the two cases?  What we often find is that relieving contention 
> at one point causes new contention at another point, and if the primitive used 
> at that point handles contention less well for whatever reason, performance 
> can be reduced rather than improved.  So maybe we're looking at an issue in 
> the dispatched UDP code from so_upcall?  Another less satisfying (and 
> fundamentally more difficult) answer might be "something to do with the 
> scheduler", but a bit more analysis may shed some light.

gladly, though I've never used LOCK_PROFILING, so some pointers would be
helpful.
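
from a quick look at the LOCK_PROFILING(9) man page, I assume the
procedure is roughly this (sysctl names as in 7.x, so correct me if
I have it wrong):

	# in the kernel config, then rebuild/reinstall the kernel:
	options LOCK_PROFILING

	# around the benchmark run:
	sysctl debug.lock.prof.reset=1
	sysctl debug.lock.prof.enable=1
	...run the NFS/UDP benchmark...
	sysctl debug.lock.prof.enable=0
	sysctl debug.lock.prof.stats > lock-prof.out	# output file name arbitrary

then do the same on the pre-rwlock kernel and diff the two stats dumps?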

as a side note, many years ago I tried NFS over TCP and performance was
really bad; I even remember NetApp telling us to drop TCP. Now, though,
things look rather better. I wonder what changed.
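
(for anyone wanting to reproduce the comparison: toggling the transport
is just a mount option, e.g., with a hypothetical server and export:

	mount -t nfs -o udp server:/export /mnt		# the slow case in this thread
	mount -t nfs -o tcp server:/export /mnt		# the case that now looks fine

correct me if there is a better way to force the transport.)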

danny
