Date: Mon, 28 Jan 2002 12:40:50 -0800
From: Brooks Davis
To: hackers@freebsd.org
Subject: bge + hardware checksum hangs
Message-ID: <20020128124050.A13399@Odin.AC.HMC.Edu>

It looks like the TCP receive checksum issues weren't the only ones we had to contend with. I've got a couple of new iXsystems 2650's with 3Com 3C996-T's in them, and while running cvsup I get long hangs, usually resulting in a lost connection. When the machines recover I see watchdog timeout messages in /var/log/messages. The current system configuration is a bit weird in that I've got the NIC hooked up to a 10/100 hub, so I'm currently running 100 Mbps half-duplex.

Acting on the theory that hardware checksumming had already failed in some situations, I changed the BGE_CSUM_FEATURES define to 0, and so far things seem to be working. I'm in the middle of a ports cvsup, and I completed a cvsup over the 4.5 branch and tagging without a hitch. This seems to imply that at least TCP checksumming is broken across the board.
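For readers unfamiliar with the workaround, the following is a minimal user-space sketch of what setting BGE_CSUM_FEATURES to 0 amounts to. It is not the driver source. It assumes the driver advertises its checksum offload capabilities to the network stack through a flag mask, and every `SKETCH_`/`sketch_` name and flag value here is a hypothetical stand-in chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's checksum offload flags. */
#define SKETCH_CSUM_IP   0x0001u
#define SKETCH_CSUM_TCP  0x0002u
#define SKETCH_CSUM_UDP  0x0004u

/* Assumed stock setting: offer IP, TCP and UDP checksum offload. */
#define SKETCH_CSUM_FEATURES_STOCK \
    (SKETCH_CSUM_IP | SKETCH_CSUM_TCP | SKETCH_CSUM_UDP)

/* The workaround described in the message: offer no offload at all. */
#define SKETCH_CSUM_FEATURES_WORKAROUND 0u

/* Hypothetical, heavily reduced model of an interface structure. */
struct sketch_ifnet {
    uint32_t if_hwassist;   /* checksums the hardware claims to handle */
};

/* Model of attach time: the driver records what it will offload. */
void sketch_attach(struct sketch_ifnet *ifp, uint32_t csum_features)
{
    assert(ifp != NULL);
    ifp->if_hwassist = csum_features;
}

/*
 * Returns nonzero when the stack has to compute at least one of the
 * needed checksums in software because the hardware does not offer it.
 */
int sketch_stack_must_checksum(const struct sketch_ifnet *ifp,
                               uint32_t needed)
{
    assert(ifp != NULL);
    return (needed & ~ifp->if_hwassist) != 0;
}
```

With the stock mask, `sketch_stack_must_checksum(&ifp, SKETCH_CSUM_TCP)` is zero and the card is trusted with the TCP checksum. With the workaround mask it is nonzero, so the host CPU does the work, which is where the extra cost on a busy server would come from.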
The really odd thing is that I haven't had any real problems with local connections, only with cvsups and possibly one hang caused by a whole lot of console output over ssh. I've been able to do 10-minute netperf runs in both TCP_STREAM and TCP_RR modes to local hosts without any hangs.

Does anyone have any ideas other than the current disabling of hardware checksumming? That's probably fine for now, but it's really going to suck on the core NFS server for this cluster once we're up and running.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4