Date:      Thu, 12 Nov 2009 12:52:22 -0800
From:      Jack Vogel <jfvogel@gmail.com>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 82573 xfers pause, no watchdog timeouts, DCGDIS ineffective  (7.2-R)
Message-ID:  <2a41acea0911121252y81f365fo2982e43e3efdba4d@mail.gmail.com>
In-Reply-To: <20091112204736.GA29095@icarus.home.lan>
References:  <4AFC63B0.5020707@alaska.net> <20091112204736.GA29095@icarus.home.lan>

It is critically important on these systems that you have the latest BIOS
installed, so maybe that's the difference between you two.  I am going to
be putting out a new em driver to CURRENT soon, so trying that might be an
option as well.  It sounds like a hang; a management/OS race in the driver
is a possibility.

Jack


On Thu, Nov 12, 2009 at 12:47 PM, Jeremy Chadwick
<freebsd@jdc.parodius.com> wrote:

> On Thu, Nov 12, 2009 at 10:36:16AM -0900, Royce Williams wrote:
> > We have servers with dual 82573 NICs that work well during
> > low-throughput activity, but during high-volume activity, they pause
> > shortly after transfers start and do not recover.  Other sessions to
> > the system are not affected.
>
> Please define "low-throughput" and "high-volume" if you could; it might
> help folks determine where the threshold is for problems.
>
> > These systems are being repurposed, jumping from 6.3 to 7.2.  The same
> > system and its kin do not exhibit the symptom under 6.3-RELEASE-p13.
> > The symptoms appear under freebsd-updated 7.2-RELEASE GENERIC kernel
> > with no tuning.
> >
> > Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this
> > symptom.  The first system to be repurposed accepts DCGDIS with
> > 'Updated' and subsequent 'update not needed', with no relief.
> >
> > Notably, there are no watchdog timeout errors - unlike our various
> > Supermicro models still running FreeBSD 6.x.  All of our other 7.x
> > Supermicro flavors had already received the flash update and haven't
> > shown the symptom.
> >
> > Details follow.
> >
> > Kernel:
> >
> > rand# uname -a
> > FreeBSD rand.acsalaska.net 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0: Fri Oct  2 12:21:39 UTC 2009     root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386
> >
> > sysctls:
> >
> > rand# sysctl dev.em
> > dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
> > dev.em.0.%driver: em
> > dev.em.0.%location: slot=0 function=0
> > dev.em.0.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
> > dev.em.0.%parent: pci13
> > dev.em.0.debug: -1
> > dev.em.0.stats: -1
> > dev.em.0.rx_int_delay: 0
> > dev.em.0.tx_int_delay: 66
> > dev.em.0.rx_abs_int_delay: 66
> > dev.em.0.tx_abs_int_delay: 66
> > dev.em.0.rx_processing_limit: 100
> > dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
> > dev.em.1.%driver: em
> > dev.em.1.%location: slot=0 function=0
> > dev.em.1.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
> > dev.em.1.%parent: pci14
> > dev.em.1.debug: -1
> > dev.em.1.stats: -1
> > dev.em.1.rx_int_delay: 0
> > dev.em.1.tx_int_delay: 66
> > dev.em.1.rx_abs_int_delay: 66
> > dev.em.1.tx_abs_int_delay: 66
> > dev.em.1.rx_processing_limit: 100
> >
> > kenv:
> >
> > rand# kenv | grep smbios | egrep -v 'socket|serial|uuid|tag|0123456789'
> > smbios.bios.reldate="03/05/2008"
> > smbios.bios.vendor="Phoenix Technologies LTD"
> > smbios.bios.version="6.00"
> > smbios.chassis.maker="Supermicro"
> > smbios.planar.maker="Supermicro"
> > smbios.planar.product="PDSMi "
> > smbios.planar.version="PCB Version"
> > smbios.system.maker="Supermicro"
> > smbios.system.product="PDSMi"
> >
> >
> > The system is not yet in production, so I can invasively abuse it if
> > needed.  The other systems are in production under 6.3-RELEASE-p13 and
> > can also be inspected.
> >
> > Any pointers appreciated.
> >
> > Royce
>
> For what it's worth as a comparison base:
>
> We use the following Supermicro SuperServers, and can confirm that no
> such issues occur for us using either RELENG_6 or RELENG_7 on the
> following hardware:
>
> Supermicro SuperServer 5015B-MTB - amd64 - Intel 82573V + Intel 82573L
> Supermicro SuperServer 5015M-T+B - amd64 - Intel 82573V + Intel 82573L
> Supermicro SuperServer 5015M-T+B - amd64 - Intel 82573V + Intel 82573L
> Supermicro SuperServer 5015M-T+B - i386  - Intel 82573V + Intel 82573L
> Supermicro SuperServer 5015M-T+B - i386  - Intel 82573V + Intel 82573L
>
> The 5015B-MTB system presently runs RELENG_8 -- no issues there either.
>
> Relevant server configuration and network setup details:
>
> - All machines use pf(4).
> - All emX devices are configured for autoneg.
> - All emX devices use RXCSUM, TXCSUM, and TSO4.
> - We do not use polling.
> - All machines use both NICs simultaneously at all times.
> - All machines connected to an HP ProCurve 2626 switch (100mbit,
>  full-duplex ports, all autoneg).
> - We do not use Jumbo frames.
> - No add-in cards (PCI, PCI-X, nor PCIe) are used in the systems.
> - All of the systems had DCGDIS.EXE run on them; no EEPROM settings
>  were changed, indicating the from-the-Intel-factory MANC register
>  in question was set properly.
>
> Relevant throughput details per box:
>
> - em0 pushes ~600-1000kbit/sec at all times.
> - em1 pushes ~100-200kbit/sec at all times.
> - During nightly maintenance (backups), em1 pushes ~2-3mbit/sec
>  for a variable amount of time.
> - For a full level 0 backup (which I've done numerous times), em1
>  pushes 60-70mbit/sec without issues.
>
> I've compared your sysctl dev.em output to that of our 5015M-T+B systems
> (which use the PDSMi+, not the PDSMi, but whatever), and ours is 100%
> identical.
>
> All of our 5015M-T+B systems are using BIOS 1.3, and the 5015B-MTB
> system is using BIOS 1.30.
>
> If you'd like, I can provide the exact BIOS settings we use on the
> machines in question; they do deviate from the factory defaults a slight
> bit, but none of the adjustments are "tweaks" for performance or
> otherwise (just disabling things which we don't use, etc.).
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.              PGP: 4BD6C0CB |
>
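
The sysctl comparison Jeremy describes above can be repeated between any
two hosts. The following is a minimal sketch, not taken from the thread:
it assumes the output of `sysctl dev.em` from each host has been saved as
text with one "oid: value" pair per line. The function names parse_sysctl
and diff_sysctl are hypothetical, and the value 64 in the usage example is
invented purely to show a mismatch.

```python
def parse_sysctl(text):
    """Parse saved `sysctl dev.em` output into a dict of {oid: value}."""
    values = {}
    for line in text.splitlines():
        # Each line is expected to look like "dev.em.0.tx_int_delay: 66".
        oid, sep, value = line.strip().partition(": ")
        if sep and oid.startswith("dev."):
            values[oid] = value
    return values


def diff_sysctl(ours, theirs):
    """Return {oid: (our_value, their_value)} for every oid that differs.

    An oid missing on one side is reported with None for that side.
    """
    differences = {}
    for oid in sorted(set(ours) | set(theirs)):
        if ours.get(oid) != theirs.get(oid):
            differences[oid] = (ours.get(oid), theirs.get(oid))
    return differences


if __name__ == "__main__":
    host_a = "dev.em.0.rx_int_delay: 0\ndev.em.0.tx_int_delay: 66\n"
    host_b = "dev.em.0.rx_int_delay: 0\ndev.em.0.tx_int_delay: 64\n"
    for oid, (a, b) in diff_sysctl(parse_sysctl(host_a),
                                   parse_sysctl(host_b)).items():
        print(f"{oid}: {a} != {b}")
```

An empty result means the two hosts report identical dev.em settings.
Per-host fields such as %parent will normally differ between NICs and can
be filtered out before comparing.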


