Date: Mon, 11 Oct 2010 16:16:04 -0700
From: Pyun YongHyeon <pyunyh@gmail.com>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-current@freebsd.org
Subject: Re: recent bge(4) changes causing problems
Message-ID: <20101011231604.GI4607@michelle.cdnetworks.com>
In-Reply-To: <20101011225331.GA2829@troutmask.apl.washington.edu>
References: <20101011225331.GA2829@troutmask.apl.washington.edu>
--7qSK/uQB79J36Y4o
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Oct 11, 2010 at 03:53:31PM -0700, Steve Kargl wrote:
> It seems recent changes to the bge driver are causing
> some problems with my hardware where the watchdog is
> now timing out.
>
> /var/log/messages contains
>
> 14:23:14 kernel: SMP: AP CPU #1 Launched!
> 14:23:14 kernel: Trying to mount root from ufs:/dev/ad6s1a
> 14:23:15 kernel: bge1: link state changed to UP
> 14:23:15 lpd[1190]: lpd startup: logging=0
> 14:23:15 ntpd[1224]: ntpd 4.2.4p5-a (1)
> 14:23:15 kernel: bge0: link state changed to UP
> 14:23:24 ntpd[1225]: time reset -0.677316 s
> 14:23:24 ntpd[1225]: kernel time sync status change 2001
> 14:31:01 kernel: bge0: watchdog timeout -- resetting
> 14:31:01 kernel: bge0: link state changed to DOWN
> 14:31:02 kernel: Limiting icmp unreach response from 613 to 200 packets/sec
> 14:31:04 ntpd[1225]: sendto(140.142.2.8) (fd=22): No route to host
> 14:31:04 kernel: bge0: link state changed to UP
> 14:31:30 kernel: Limiting icmp unreach response from 205 to 200 packets/sec
> 14:31:31 kernel: Limiting icmp unreach response from 203 to 200 packets/sec
> 15:40:11 su: kargl to root on /dev/pts/0
> 15:40:35 kernel: bge0: link state changed to DOWN
> 15:40:38 kernel: bge0: link state changed to UP
>
> The last 2 bge messages are from me manually using
> ifconfig to bring my net connect back to life.
>
> troutmask:kargl[206] sysctl -a | grep bge.0
> dev.bge.0.%desc: Broadcom Gigabit Ethernet Controller, ASIC rev. 0x002100
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=9 function=0 handle=\_SB_.PCI0.GOLA.GLAN
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1648 subvendor=0x14e4 subdevice=0x1644 class=0x020000
> dev.bge.0.%parent: pci2
> dev.bge.0.forced_collapse: 0
> dev.bge.0.forced_udpcsum: 0
> dev.bge.0.stats.FramesDroppedDueToFilters: 0
> dev.bge.0.stats.DmaWriteQueueFull: 0
> dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
> dev.bge.0.stats.NoMoreRxBDs: 0
> dev.bge.0.stats.InputDiscards: 0
> dev.bge.0.stats.InputErrors: 0
> dev.bge.0.stats.RecvThresholdHit: 325
> dev.bge.0.stats.DmaReadQueueFull: 0
> dev.bge.0.stats.DmaReadHighPriQueueFull: 0
> dev.bge.0.stats.SendDataCompQueueFull: 0
> dev.bge.0.stats.RingSetSendProdIndex: 469
> dev.bge.0.stats.RingStatusUpdate: 330
> dev.bge.0.stats.Interrupts: 330
> dev.bge.0.stats.AvoidedInterrupts: 0
> dev.bge.0.stats.SendThresholdHit: 0
> dev.bge.0.stats.rx.ifHCInOctets: 569452
> dev.bge.0.stats.rx.Fragments: 0
> dev.bge.0.stats.rx.UnicastPkts: 497
> dev.bge.0.stats.rx.MulticastPkts: 1
> dev.bge.0.stats.rx.FCSErrors: 0
> dev.bge.0.stats.rx.AlignmentErrors: 0
> dev.bge.0.stats.rx.xonPauseFramesReceived: 0
> dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
> dev.bge.0.stats.rx.ControlFramesReceived: 0
> dev.bge.0.stats.rx.xoffStateEntered: 0
> dev.bge.0.stats.rx.FramesTooLong: 0
> dev.bge.0.stats.rx.Jabbers: 0
> dev.bge.0.stats.rx.UndersizePkts: 0
> dev.bge.0.stats.rx.inRangeLengthError: 0
> dev.bge.0.stats.rx.outRangeLengthError: 0
> dev.bge.0.stats.tx.ifHCOutOctets: 39056
> dev.bge.0.stats.tx.Collisions: 0
> dev.bge.0.stats.tx.XonSent: 0
> dev.bge.0.stats.tx.XoffSent: 0
> dev.bge.0.stats.tx.flowControlDone: 0
> dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
> dev.bge.0.stats.tx.SingleCollisionFrames: 0
> dev.bge.0.stats.tx.MultipleCollisionFrames: 0
> dev.bge.0.stats.tx.DeferredTransmissions: 0
> dev.bge.0.stats.tx.ExcessiveCollisions: 0
> dev.bge.0.stats.tx.LateCollisions: 0
> dev.bge.0.stats.tx.UnicastPkts: 468
> dev.bge.0.stats.tx.MulticastPkts: 0
> dev.bge.0.stats.tx.BroadcastPkts: 1
> dev.bge.0.stats.tx.CarrierSenseErrors: 0
> dev.bge.0.stats.tx.Discards: 0
> dev.bge.0.stats.tx.Errors: 0
> dev.bge.0.wake: 0
>
> In the time that it's taken me to compose this message
> the timeout has fire again.
>
> 15:47:01 kernel: Limiting icmp unreach response from 662 to 200 packets/sec
> 15:47:02 kernel: Limiting icmp unreach response from 446 to 200 packets/sec
> 15:47:03 kernel: Limiting icmp unreach response from 436 to 200 packets/sec
> 15:47:04 kernel: Limiting icmp unreach response from 464 to 200 packets/sec
> 15:47:05 kernel: Limiting icmp unreach response from 438 to 200 packets/sec
> 15:47:06 kernel: Limiting icmp unreach response from 445 to 200 packets/sec
> 15:47:07 kernel: bge0: watchdog timeout -- resetting
> 15:47:07 kernel: bge0: link state changed to DOWN
> 15:47:07 kernel: Limiting icmp unreach response from 439 to 200 packets/sec
> 15:47:08 kernel: Limiting icmp unreach response from 330 to 200 packets/sec
> 15:47:11 kernel: bge0: link state changed to UP
> 15:47:12 kernel: Limiting icmp unreach response from 214 to 200 packets/sec
> 15:47:13 kernel: Limiting icmp unreach response from 202 to 200 packets/sec
> 15:47:14 kernel: Limiting icmp unreach response from 238 to 200 packets/sec
> 15:49:42 kernel: bge0: link state changed to DOWN
> 15:49:44 kernel: bge0: link state changed to UP
>
> I not seen these icmp unreach response messages.
>

The icmp unreach has nothing to do with bge(4). Check whether a
server that listens on a UDP port is still alive on your box.

What worries me is the bge(4) watchdog timeouts. It looks like your
controller is a BCM5704. I also have a bge(4) regression report from
marius on sparc64. He said r213945 seemed to cause the issue and
I'm working on it.
Could you also try the attached patch?

--7qSK/uQB79J36Y4o
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="bge.rxprod.patch"

Index: sys/dev/bge/if_bge.c
===================================================================
--- sys/dev/bge/if_bge.c	(revision 213695)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -1619,9 +1619,6 @@
 	CSR_WRITE_4(sc, BGE_RX_STD_RCB_MAXLEN_FLAGS, rcb->bge_maxlen_flags);
 	CSR_WRITE_4(sc, BGE_RX_STD_RCB_NICADDR, rcb->bge_nicaddr);
 
-	/* Reset the standard receive producer ring producer index. */
-	bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, 0);
-
 	/*
 	 * Initialize the jumbo RX producer ring control
 	 * block. We set the 'ring disabled' bit in the
@@ -1665,6 +1662,9 @@
 		bge_writembx(sc, BGE_MBX_RX_MINI_PROD_LO, 0);
 	}
 
+	/* Reset the standard receive producer ring producer index. */
+	bge_writembx(sc, BGE_MBX_RX_STD_PROD_LO, 0);
+
 	/*
 	 * The BD ring replenish thresholds control how often the
 	 * hardware fetches new BD's from the producer rings in host

--7qSK/uQB79J36Y4o--
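
When testing a patch like the one attached, it can help to count the watchdog resets and the ICMP rate-limit bursts in /var/log/messages before and after. The following is only a minimal sketch and not part of the original message: the function name `summarize` and the `SAMPLE` list are hypothetical, and the regular expressions assume the log line format quoted in this thread.

```python
import re
from collections import Counter

# Assumed format: "14:31:01 kernel: bge0: watchdog timeout -- resetting"
WATCHDOG_RE = re.compile(r"kernel: (bge\d+): watchdog timeout -- resetting")
# Assumed format: "kernel: Limiting icmp unreach response from 613 to 200 packets/sec"
ICMP_RE = re.compile(
    r"kernel: Limiting icmp unreach response from (\d+) to (\d+) packets/sec"
)


def summarize(lines):
    """Count watchdog resets per interface and collect ICMP burst rates."""
    resets = Counter()
    icmp_rates = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = ICMP_RE.search(line)
        if match:
            # First number is the observed rate before limiting.
            icmp_rates.append(int(match.group(1)))
    return resets, icmp_rates


# Hypothetical sample, copied from the log lines quoted in this thread.
SAMPLE = [
    "14:31:01 kernel: bge0: watchdog timeout -- resetting",
    "14:31:01 kernel: bge0: link state changed to DOWN",
    "14:31:02 kernel: Limiting icmp unreach response from 613 to 200 packets/sec",
    "15:47:07 kernel: bge0: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    resets, icmp_rates = summarize(SAMPLE)
    print(dict(resets), icmp_rates)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `summarize(open("/var/log/messages"))`.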