Date:      Wed, 23 Feb 2011 12:34:39 -0500
From:      Ryan Stone <rysto32@gmail.com>
To:        freebsd-net <freebsd-net@freebsd.org>
Subject:   New device_polling algorithm
Message-ID:  <AANLkTikssob6OLbqrgG43ahO6V_gTc0pDp8RX3TMRNgL@mail.gmail.com>

I've put together a patch against HEAD that replaces the
device_polling algorithm (it should apply cleanly to stable/8 as well
-- nothing has changed in polling for some time).  The patch can be
found here:
http://people.freebsd.org/~rstone/kern_poll.diff

The new algorithm makes use of the feedback that is already provided
by the pollers (the current algorithm ignores it).  Each poller returns
a value indicating how much "work" the polling handler performed in
this call (typically the number of packets handled).  The new
algorithm tries to spend (100 - user_frac)% of CPU time handling
packets in the netisr thread.  This includes time in the pollers as
well as time spent on other netisr tasks.  It uses the feedback from
the pollers in two ways to achieve this:
- The feedback is used to decide whether it's worthwhile to do another
iteration of polling in this tick.  If no poller handles more than
count/2 packets in the current iteration, the algorithm concludes that
there isn't enough outstanding work to continue polling and another
iteration won't be scheduled until the next tick.  Note that this
means that polling iterations can be rescheduled again and again in a
tick if there are a lot of packets waiting, which is a new feature.
- The feedback is used to estimate how much time it will take to do
another iteration of polling.  The algorithm dynamically adjusts the
count parameter passed to each driver to try to ensure that polling
only uses as much CPU time as user_frac allots it.  This is
necessary to prevent the poller from rescheduling itself too often and
starving other threads, especially on single-core machines.

If you're on a multicore machine it might be a good idea to decrease
the sysctl kern.polling.user_frac.  This sysctl restricts how much CPU
time the poller is allowed to use on a single CPU.  Smaller values
mean less time for other tasks and more time for the poller.  The
poller won't necessarily use (100 - user_frac)% of a CPU.  That's the
maximum amount of time it's allowed to use, but if the pollers are
lightly loaded the poller will use significantly less time.  The
default is 50, which is reasonable for a uniprocessor system.  On a
multicore machine you may find this overly restrictive: on a dual-core
machine you could set it all the way down to 0 and still get the same
50-50 split of CPU time between the poller and everything else.

I've put SDT probes in various strategic places.  I have a simple
dtrace script that logs the data from the probes here:
http://people.freebsd.org/~rstone/device_polling.d

The script is just a replacement for some KTRs that we had at the same
places in our internal branch, so it currently doesn't do anything
fancy.  I've found the KTRs invaluable for debugging polling problems
in the past, though, so I think that it's worth sharing.

You might notice that the SDT probes log a "poller index", but it's
currently always 0.  I would like to extend the poller further to take
advantage of multiple netisrs so I've made sure that the probes are
ready for this, but I'll talk about my multi-polling ideas later on in
another thread.

Any comments and testing would be welcome.  We mostly only run our
code on machines with lem/em/ixgbe devices, so testing against other
drivers would be especially welcome.

Ryan


