From: Ruslan Ermilov
Date: Thu, 8 Apr 2004 22:36:18 +0300
To: Bruce Evans
Cc: current@FreeBSD.org, Palle Girgensohn, net@FreeBSD.org
Subject: Re: sk ethernet driver: watchdog timeout
Message-ID: <20040408193618.GA1919@ip.net.ua>
In-Reply-To: <20040407235838.K11719@gamplex.bde.org>
References: <20240000.1079394807@palle.girgensohn.se> <3810000.1081299464@palle.girgensohn.se> <20040407235838.K11719@gamplex.bde.org>

On Thu, Apr 08, 2004 at
12:17:06AM +1000, Bruce Evans wrote:
[...]
> The following patch reduces the problem on A7V8X-E a little.  It limits
> the tx queue to 1 packet and fixes handling of the timeout on txeof.
> The first part probably makes the second part a no-op.  Without this,
> my A7V8X-E hangs on even light nfs activity (e.g., copying a 1MB file
> to nfs).  With it, it takes heavier nfs activity to hang (makeworld
> never completes, and a flood ping always hangs).
>
> I first suspected an interrupt-related bug, but the bug seems to be
> more hardware-specific.  Examination of the output queues shows that
> the tx sometimes just stops before processing all packets.  Resetting
> in sk_watchdog() doesn't always fix the problem, and the timeout usually
> stops firing after a couple of unsuccessful resets, giving a completely
> hung device.  But the problem may be related to interrupt timing, since
> it is much smaller under RELENG_4.  RELENG_4 hangs about as often
> without this hack as -current does with it.
>
> nv0 hangs similarly.  fxp0 just works.
>
> %%%
> Index: if_sk.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
> retrieving revision 1.78
> diff -u -2 -r1.78 if_sk.c
> --- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
> +++ if_sk.c	1 Apr 2004 07:33:58 -0000
> @@ -1830,4 +1830,9 @@
>  	SK_IF_LOCK(sc_if);
>
> +	if (sc_if->sk_cdata.sk_tx_cnt > 0) {
> +		SK_IF_UNLOCK(sc_if);
> +		return;
> +	}
> +
>  	idx = sc_if->sk_cdata.sk_tx_prod;
>
> @@ -1853,4 +1858,5 @@
>  		 */
>  		BPF_MTAP(ifp, m_head);
> +		break;
>  	}
>
> @@ -2000,5 +2031,4 @@
>  		sc_if->sk_cdata.sk_tx_cnt--;
>  		SK_INC(idx, SK_TX_RING_CNT);
> -		ifp->if_timer = 0;
>  	}
>
> @@ -2007,4 +2037,6 @@
>  	if (cur_tx != NULL)
>  		ifp->if_flags &= ~IFF_OACTIVE;
> +
> +	ifp->if_timer = (sc_if->sk_cdata.sk_tx_cnt == 0) ? 0 : 5;
>
>  	return;
> %%%
>

Always recharging the timer to 5 while there is still TX work left
is a bug.  With DEVICE_POLLING (yes, I have plans to add polling(4)
support for sk(4) too), sk_txeof() will be called periodically, and
if the card gets stuck, if_timer will never count down to zero and
sk_watchdog() will never be called.  Without DEVICE_POLLING, recharging
it to 5 once if_timer has reached 0 is still pointless: if if_timer is
0 while we are in sk_txeof(), we were called from sk_watchdog(), which
will reinit the card and both the RX and TX lists, making them empty.
Leaving if_timer at 5 _after_ the watchdog cleanup, with _no_ TX
activity at all, may cause a second (false) watchdog.

My version of the TX fixes (which also fixes resetting of IFF_OACTIVE):

%%%
Index: if_sk.c
===================================================================
RCS file: /home/ncvs/src/sys/pci/if_sk.c,v
retrieving revision 1.78
diff -u -p -r1.78 if_sk.c
--- if_sk.c	31 Mar 2004 12:35:51 -0000	1.78
+++ if_sk.c	8 Apr 2004 19:10:50 -0000
@@ -1998,14 +1998,14 @@ sk_txeof(sc_if)
 			sc_if->sk_cdata.sk_tx_chain[idx].sk_mbuf = NULL;
 		}
 		sc_if->sk_cdata.sk_tx_cnt--;
+		ifp->if_flags &= ~IFF_OACTIVE;
 		SK_INC(idx, SK_TX_RING_CNT);
-		ifp->if_timer = 0;
 	}
 
 	sc_if->sk_cdata.sk_tx_cons = idx;
 
-	if (cur_tx != NULL)
-		ifp->if_flags &= ~IFF_OACTIVE;
+	if (sc_if->sk_cdata.sk_tx_cnt == 0)
+		ifp->if_timer = 0;
 
 	return;
 }
%%%

We have been running the 3COM 3C940 card on 4.9 (and, as of today,
on 4.10-BETA) under a heavy TX load without any problems.

Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer