Date: Thu, 19 Oct 2006 12:16:22 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Kris Kennaway <kris@obsecurity.org> Cc: Kip Macy <kip.macy@gmail.com>, freebsd-stable@freebsd.org, Jack Vogel <jfvogel@gmail.com>, freebsd-net <freebsd-net@freebsd.org> Subject: Re: em network issues Message-ID: <20061019110950.X75878@delplex.bde.org> In-Reply-To: <20061018224233.GA1632@xor.obsecurity.org> References: <2a41acea0610181046k822afd1qcec4187dc8514187@mail.gmail.com> <b1fa29170610181523t6d240839i887632d6d7576762@mail.gmail.com> <2a41acea0610181531y732cd5sa7bf733cc445491c@mail.gmail.com> <20061018224233.GA1632@xor.obsecurity.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 18 Oct 2006, Kris Kennaway wrote: > I have been working with someone's system that has em shared with fxp, > and a simple fetch over the em (e.g. of a 10 GB file of zeroes) is > enough to produce watchdog timeouts after a few seconds. > > As previously mentioned, changing the INTR_FAST to INTR_MPSAFE in the > driver avoids this problem. However, others are seeing sporadic > watchdog timeouts at higher system load on non-shared em systems too. em_intr_fast() has no locking whatsoever. I would be very surprised if it even seemed to work for SMP. For UP, masking of CPU interrupts (as is automatic in fast interrupt handlers) might provide sufficient locking, but for many drivers fast wth interrupt handlers, whatever locking is used by the fast interrupt handler must be used all over the driver to protect data strutures that are or might be accessed by the fast interrupt handler. That means lots of intr_disable/enable()s if the UP case is micro-optimized and lots of mtx_lock/unlock_spin()s for the general case. But em has no references to spinlocks or CPU interrupt disabling. em_intr() starts with EM_LOCK(), so it isn't obviously broken near its first statement. Very few operations are valid in fast interrupt handlers. Locking and fastness must be considered for every operation, not only in the interrupt handler but in all data structures shared by the interrupt handler. For just the interrupt handler in em: % static void % em_intr_fast(void *arg) % { % struct adapter *adapter = arg; This is safe because it has no side effects and doesn't take long. % struct ifnet *ifp; % uint32_t reg_icr; % % ifp = adapter->ifp; % This is safe provided other parts of the driver ensure that the interrupt handler is not reached after adapter->ifp goes away. Similarly for other long-lived almost-const parts of *adapter. % reg_icr = E1000_READ_REG(&adapter->hw, ICR); % This is safe provided reading the register doesn't change it. % /* Hot eject? */ % if (reg_icr == 0xffffffff) % return; % % /* Definitely not our interrupt. */ % if (reg_icr == 0x0) % return; % These are safe since we don't do anything with the result. % /* % * Starting with the 82571 chip, bit 31 should be used to % * determine whether the interrupt belongs to us. % */ % if (adapter->hw.mac_type >= em_82571 && % (reg_icr & E1000_ICR_INT_ASSERTED) == 0) % return; % This is safe, as above. % /* % * Mask interrupts until the taskqueue is finished running. This is % * cheap, just assume that it is needed. This also works around the % * MSI message reordering errata on certain systems. % */ % em_disable_intr(adapter); Now that we start doing things, we have various races. The above races to disable interrupts with other entries to this interrupt handler, and may race with other parts of the driver. After we disable driver interrupts. There should be no more races with other entries to this handler. However, reg_icr may be stale at this point even if we handled the race correctly. The other entries may have partly or completely handled the interrupt when we get back here (we should have locked just before here, and then if the lock blocked waiting for the other entries (which can only happen in the SMP case), we should reread the status register to see if we still have anything to do, or more importantly to see what we have to do now (extrascheduling of the SWI handler would just wake time, but missing scheduling would break things). % taskqueue_enqueue(adapter->tq, &adapter->rxtx_task); % Safe provided the API is correctly implemented. (AFAIK, the API only has huge design errors.) % /* Link status change */ % if (reg_icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC)) % taskqueue_enqueue(taskqueue_fast, &adapter->link_task); % As above, plus might miss this call if the status changed underneath us. % if (reg_icr & E1000_ICR_RXO) % adapter->rx_overruns++; Race updating the counter. Generally, fast interrupt handlers should avoid book-keeping like this, since correct locking for it would poison large parts of the driver with the locking required for the fast interrupt handler. Perhaps similarly for important things. It's safe to read the status register provided reading it doesn't change it. Then it is safe to schedule tasks based on the contents of the register provided we don't do anything else and schedule enough tasks. But don't disable interrupts -- leave that to the task and make the task do nothing if it handled everything for a previous scheduling. This would result in the task usually being scheduled when the interrupt is for us but not if it is for another device. The above doesn't try to do much more than this. However, a fast interrupt handler needs to handle the usual case to be worth having except on systems where there are lots of shared interrupts. % } Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061019110950.X75878>