Date:      Wed, 19 Aug 2009 19:24:46 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Manish Vachharajani <manishv@lineratesystems.com>
Cc:        freebsd-net@FreeBSD.org
Subject:   Re: Dropped vs. missed packets in the ixgbe driver
Message-ID:  <20090819183756.Y35058@delplex.bde.org>
In-Reply-To: <5bc218350908181535o7c5275dfn2f6647454cfac804@mail.gmail.com>
References:  <5bc218350908171524m5a46c3dbm3e6af625c51370d0@mail.gmail.com> <373149.52091.qm@web63907.mail.re1.yahoo.com> <5bc218350908181535o7c5275dfn2f6647454cfac804@mail.gmail.com>

On Tue, 18 Aug 2009, Manish Vachharajani wrote:

> So, in a nutshell, the question is:  should these drivers be reporting
> miss events as input errors in the ifnet struct as the bge driver
> does, or as drops in the ifnet struct, was there some conscious
> decision not to report miss events anywhere outside the debug and
> stats info, or am I just being silly and not seeing where the numbers
> are reported?

Certainly they should be no worse than bge in this area.  Even bge has
problems for the 5705_PLUS versions.  PLUS really means MINUS; 5705-
hardware is dumbed down so the IFIN_DROPS register is almost unusable,
but since the hardware is so bad drops are more likely than with better
hardware.  The unusability comes from the register being only 8 (?) bits
wide and being reset on every read, so you have to read it often to
ensure that it doesn't wrap.  But reading it (or any PCI register) is
very inefficient, so the read that is done often enough to work
(in bge_rxeof()) is only done if the "notyet" non-option is configured.
Resetting on every read of this and most or all other statistic
registers on 5705- hardware also completely breaks most or all bge
statistics in the bge statistics sysctl, due to the way sysctl(3) is
implemented: sysctl(3) always calls the sysctl syscall twice and uses
the results of the second call; both calls do a read at the lowest
level, so with registers that are reset on every read, the first call
resets the registers and the second call usually reads zero.  No history
is kept in the sysctl, so the sysctl also clobbers the statistics that
are maintained at the non-sysctl level (only collisions and ifin drops
for 5705-).  The non-sysctl level understands the reset and does keep
history, but this is defeated if the sysctl is used.
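
A minimal sketch of the reset-on-read problem described above.  This is
a toy model, not the bge driver's code: the cor_* names are hypothetical,
and the 8-bit width is the guess from the text.  It shows why a second
raw read returns zero and how a software accumulator keeps history as
long as it is the only reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a narrow statistics register that resets on read. */
struct cor_reg {
	uint8_t hw;		/* 8-bit hardware count; wraps silently at 256 */
};

/* Software history kept above the register. */
struct cor_soft {
	uint64_t total;
};

/* Hardware side: count n more drops (modulo 256, as the register would). */
static void
cor_hw_drop(struct cor_reg *r, unsigned n)
{
	assert(r != NULL);
	r->hw = (uint8_t)(r->hw + n);
}

/* Raw read: returns the count and, as a side effect, resets it to 0. */
static uint8_t
cor_read(struct cor_reg *r)
{
	uint8_t v;

	assert(r != NULL);
	v = r->hw;
	r->hw = 0;
	return (v);
}

/* Accumulating read: fold the raw value into the running total. */
static uint64_t
cor_poll(struct cor_reg *r, struct cor_soft *s)
{
	assert(s != NULL);
	s->total += cor_read(r);
	return (s->total);
}
```

Two back-to-back calls to cor_read() model the double sysctl call: the
first returns the count and the second returns 0.  Any cor_read() made
behind the back of cor_poll() also loses counts from the total, and
more than 255 events between reads are lost to wraparound either way.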

There may also still be a generic problem with intrq drops.  The ip
intrq length (sysctl net.inet.ip.intr_queue_maxlen) used to be too small
by default (32 IIRC).  Now it is larger by default (256), but 256 is
still small if you have multiple NICs with rx ring sizes of hundreds
or thousands.  Direct dispatch reduces this problem.  Further, if an
intrq drop actually occurs, then it is only reported in generic ip
statistics (net.inet.ip.intr_queue_drops); there is no sign of it in
ierrors and no way to determine which interface it happened on.  I use
1024 to ensure no drops with a single bge NIC.
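
The sizing argument can be put as rough arithmetic.  The sketch below
assumes a crude worst case that is not stated in the text: every NIC
hands a full rx ring to the queue before anything is drained.  The
function names and the ring sizes used with them are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Crude upper bound on a burst: each of `nics` interfaces queues a full
 * rx ring before the consumer runs.  An assumption for illustration only.
 */
static unsigned long
intrq_worst_case_burst(unsigned nics, unsigned long rx_ring_size)
{
	return ((unsigned long)nics * rx_ring_size);
}

/* True if such a burst would not fit in a queue of length maxlen. */
static bool
intrq_may_drop(unsigned long maxlen, unsigned nics,
    unsigned long rx_ring_size)
{
	assert(maxlen > 0);
	return (intrq_worst_case_burst(nics, rx_ring_size) > maxlen);
}
```

With a made-up ring size of 512, two NICs can burst 1024 packets, which
overflows a queue of 256 but fits in a queue of 1024.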

There is still a related design problem for intrq drops: packets that
will be dropped should not even be passed to upper layers, to avoid
unnecessary extra load on already-overloaded systems.  There is a related
inefficiency in IFF_MONITOR mode: checking for this should be the very
first thing in ether_input(), or at least before asking for cache
misses by looking at packet headers, but the check is after mounds of
code and at least 1 likely cache miss (for initializing etype unnecessarily
early).  Intrq drops would be efficient if they occurred near the start
of ether_input() too, but they occur much further up the stack than
the check for monitor mode.
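
A toy model of the ordering point, not the real ether_input(): the toy_*
types and functions are hypothetical, and a counter stands in for the
cache miss caused by touching the packet header.  It only shows that
checking the discard condition first avoids the header access for
packets that are thrown away anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_if {
	bool monitor;			/* stands in for IFF_MONITOR */
	unsigned long header_reads;	/* proxy for header cache misses */
	unsigned long discarded;
	unsigned long delivered;
};

struct toy_pkt {
	unsigned short etype;
};

/* Touching the header is the expensive step being counted. */
static unsigned short
toy_read_etype(struct toy_if *ifp, const struct toy_pkt *p)
{
	ifp->header_reads++;
	return (p->etype);
}

/* Late check: the header is read before the discard decision. */
static void
toy_input_late(struct toy_if *ifp, const struct toy_pkt *p)
{
	unsigned short etype;

	assert(ifp != NULL && p != NULL);
	etype = toy_read_etype(ifp, p);
	if (ifp->monitor) {
		ifp->discarded++;
		return;
	}
	(void)etype;			/* protocol dispatch would go here */
	ifp->delivered++;
}

/* Early check: discard before looking at the packet at all. */
static void
toy_input_early(struct toy_if *ifp, const struct toy_pkt *p)
{
	unsigned short etype;

	assert(ifp != NULL && p != NULL);
	if (ifp->monitor) {
		ifp->discarded++;
		return;
	}
	etype = toy_read_etype(ifp, p);
	(void)etype;			/* protocol dispatch would go here */
	ifp->delivered++;
}
```

With monitor set, both versions discard every packet, but only the late
version pays for a header read per packet.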

Bruce


