FreeBSD Mail Archives

Date:      Thu, 19 Oct 2006 12:16:22 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        Kip Macy <kip.macy@gmail.com>, freebsd-stable@freebsd.org, Jack Vogel <jfvogel@gmail.com>, freebsd-net <freebsd-net@freebsd.org>
Subject:   Re: em network issues
Message-ID:  <20061019110950.X75878@delplex.bde.org>
In-Reply-To: <20061018224233.GA1632@xor.obsecurity.org>
References:  <2a41acea0610181046k822afd1qcec4187dc8514187@mail.gmail.com> <b1fa29170610181523t6d240839i887632d6d7576762@mail.gmail.com> <2a41acea0610181531y732cd5sa7bf733cc445491c@mail.gmail.com> <20061018224233.GA1632@xor.obsecurity.org>

On Wed, 18 Oct 2006, Kris Kennaway wrote:

> I have been working with someone's system that has em shared with fxp,
> and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is
> enough to produce watchdog timeouts after a few seconds.
>
> As previously mentioned, changing the INTR_FAST to INTR_MPSAFE in the
> driver avoids this problem.  However, others are seeing sporadic
> watchdog timeouts at higher system load on non-shared em systems too.

em_intr_fast() has no locking whatsoever.  I would be very surprised
if it even seemed to work for SMP.  For UP, masking of CPU interrupts
(as is automatic in fast interrupt handlers) might provide sufficient
locking, but for many drivers fast wth interrupt handlers, whatever
locking is used by the fast interrupt handler must be used all over
the driver to protect data strutures that are or might be accessed by
the fast interrupt handler.  That means lots of intr_disable/enable()s
if the UP case is micro-optimized and lots of mtx_lock/unlock_spin()s
for the general case.  But em has no references to spinlocks or CPU
interrupt disabling.

em_intr() starts with EM_LOCK(), so it isn't obviously broken near its
first statement.

Very few operations are valid in fast interrupt handlers.  Locking
and fastness must be considered for every operation, not only in
the interrupt handler but in all data structures shared by the
interrupt handler.  For just the interrupt handler in em:

% static void
% em_intr_fast(void *arg)
% {
% 	struct adapter	*adapter = arg;

This is safe because it has no side effects and doesn't take long.

% 	struct ifnet	*ifp;
% 	uint32_t	reg_icr;
% 
% 	ifp = adapter->ifp;
%

This is safe provided other parts of the driver ensure that the interrupt
handler is not reached after adapter->ifp goes away.  Similarly for other
long-lived almost-const parts of *adapter.

% 	reg_icr = E1000_READ_REG(&adapter->hw, ICR);
%

This is safe provided reading the register doesn't change it.

% 	/* Hot eject?  */
% 	if (reg_icr == 0xffffffff)
% 		return;
% 
% 	/* Definitely not our interrupt.  */
% 	if (reg_icr == 0x0)
% 		return;
%

These are safe since we don't do anything with the result.

% 	/*
% 	 * Starting with the 82571 chip, bit 31 should be used to
% 	 * determine whether the interrupt belongs to us.
% 	 */
% 	if (adapter->hw.mac_type >= em_82571 &&
% 	    (reg_icr & E1000_ICR_INT_ASSERTED) == 0)
% 		return;
%

This is safe, as above.

% 	/*
% 	 * Mask interrupts until the taskqueue is finished running.  This is
% 	 * cheap, just assume that it is needed.  This also works around the
% 	 * MSI message reordering errata on certain systems.
% 	 */
% 	em_disable_intr(adapter);

Now that we start doing things, we have various races.

The above races to disable interrupts with other entries to this interrupt
handler, and may race with other parts of the driver.

After we disable driver interrupts.  There should be no more races with
other entries to this handler.  However, reg_icr may be stale at this
point even if we handled the race correctly.  The other entries may have
partly or completely handled the interrupt when we get back here (we
should have locked just before here, and then if the lock blocked waiting
for the other entries (which can only happen in the SMP case), we should
reread the status register to see if we still have anything to do, or
more importantly to see what we have to do now (extrascheduling of the
SWI handler would just wake time, but missing scheduling would break
things).

% 	taskqueue_enqueue(adapter->tq, &adapter->rxtx_task);
%

Safe provided the API is correctly implemented.  (AFAIK, the API only
has huge design errors.)

% 	/* Link status change */
% 	if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))
% 		taskqueue_enqueue(taskqueue_fast, &adapter->link_task);
%

As above, plus might miss this call if the status changed underneath us.

% 	if (reg_icr & E1000_ICR_RXO)
% 		adapter->rx_overruns++;

Race updating the counter.

Generally, fast interrupt handlers should avoid book-keeping like this,
since correct locking for it would poison large parts of the driver
with the locking required for the fast interrupt handler.  Perhaps
similarly for important things.  It's safe to read the status register
provided reading it doesn't change it.  Then it is safe to schedule
tasks based on the contents of the register provided we don't do
anything else and schedule enough tasks.  But don't disable interrupts
-- leave that to the task and make the task do nothing if it handled
everything for a previous scheduling.  This would result in the task
usually being scheduled when the interrupt is for us but not if it is
for another device.  The above doesn't try to do much more than this.
However, a fast interrupt handler needs to handle the usual case to
be worth having except on systems where there are lots of shared
interrupts.

% }

Bruce

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061019110950.X75878>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation