FreeBSD Mail Archives

Date:      Wed, 21 Nov 2007 10:37:08 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        chrcoluk@gmail.com
Cc:        pyunyh@gmail.com, oleg.lomaka@gmail.com, freebsd-stable@FreeBSD.org
Subject:   Re: any hope for nfe/msk?
Message-ID:  <200711211837.lALIb8gB065394@gw.catspoiler.org>
In-Reply-To: <3aaaa3a0711211019h5cb8da70te146c3a7e3a556ca@mail.gmail.com>

On 21 Nov, Chris wrote:
> On 07/11/2007, Pyun YongHyeon <pyunyh@gmail.com> wrote:
>> On Wed, Nov 07, 2007 at 02:28:00PM +0200, Oleg Lomaka wrote:
>>  > Hello,
>>  >
>>  > Pyun YongHyeon wrote:
>>  > >On Thu, Nov 01, 2007 at 10:59:48AM +0200, Oleg Lomaka wrote:
>>  > > > Hello,
>>  > > >
>>  > > > Pyun YongHyeon wrote:
>>  > > > >On Tue, Oct 30, 2007 at 04:01:04PM +0200, Oleg Lomaka wrote:
>>  > > > >
>>  > > > >[...]
>>  > > > >
>>  > > > > > I had RxFIFO overrun again :(
>>  > > > > > from dmesg:
>>  > > > > > msk0: Rx FIFO overrun!
>>  > > > >
>>  > > > >[...]
>>  > > > >
>>  > > > >Please try attached patch again. Sorry for the trouble.
>>  > > > >After applying the patch, please show me the verbose dmesg output
>>  > > > >related to the msk(4)/PHY driver.
>>  > > > >
>>  > > > >Thanks for testing.
>>  > > > >
>>  > > > pcib1: <MPTable PCI-PCI bridge> irq 16 at device 28.0 on pci0
>>  > > > pcib1:   domain            0
>>  > > > pcib1:   secondary bus     2
>>  > > > pcib1:   subordinate bus   2
>>  > > > pcib1:   I/O decode        0x2000-0x2fff
>>  > > > pcib1:   memory decode     0xd0100000-0xd01fffff
>>  > > > pcib1:   no prefetched decode
>>  > > > pci2: <PCI bus> on pcib1
>>  > > > pci2: domain=0, physical bus=2
>>  > > > found-> vendor=0x11ab, dev=0x4352, revid=0x14
>>  > > >        domain=0, bus=2, slot=0, func=0
>>  > > >        class=02-00-00, hdrtype=0x00, mfdev=0
>>  > > >        cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
>>  > > >        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>>  > > >        intpin=a, irq=11
>>  > > >        powerspec 2  supports D0 D1 D2 D3  current D0
>>  > > >        MSI supports 2 messages, 64 bit
>>  > > >        map[10]: type Memory, range 64, base 0xd0100000, size 14, enabled
>>  > > > pcib1: requested memory range 0xd0100000-0xd0103fff: good
>>  > > >        map[18]: type I/O Port, range 32, base 0x2000, size  8, enabled
>>  > > > pcib1: requested I/O range 0x2000-0x20ff: in range
>>  > > > pcib1: slot 0 INTA routed to irq 16
>>  > > > mskc0: <Marvell Yukon 88E8038 Gigabit Ethernet> port 0x2000-0x20ff mem
>>  > > > 0xd0100000-0xd0103fff irq 16 at device 0.0 on pci2
>>  > > > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd0100000
>>  > > > mskc0: MSI count : 2
>>  > > > mskc0: RAM buffer size : 4KB
>>  > > > mskc0: Port 0 : Rx Queue 2KB(0x00000000:0x000007ff)
>>  > > > mskc0: Port 0 : Tx Queue 2KB(0x00000800:0x00000fff)
>>  > > > msk0: <Marvell Technology Group Ltd. Yukon FE Id 0xb7 Rev 0x01> on mskc0
>>  > > > msk0: bpf attached
>>  > > > msk0: Ethernet address: 00:1b:24:0e:bc:26
>>  > > > miibus0: <MII bus> on msk0
>>  > > > e1000phy0: <Marvell 88E3082 10/100 Fast Ethernet PHY> PHY 0 on miibus0
>>  > > > e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>>  > > > ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
>>  > > > mskc0: [MPSAFE]
>>  > > > mskc0: [FILTER]
>>  > > >
>>  > >
>>  > >So far all looks good to me. If you encounter watchdog timeouts
>>  > >or Rx FIFO overruns let me know.
>>  > >
>>  > >
>>  >
>>  > Got it again:
>>  > msk0: Rx FIFO overrun!
>>  > I believe this happens under heavy CPU usage. I had Firefox
>>  > compiling and was viewing pictures on a remote Windows box using
>>  > rdesktop, and after a few minutes the network froze.
>>
>> If it only happens under heavy system load, it's probably normal. If
>> the system is too busy to serve other jobs, msk(4) may not be able to
>> receive more packets because its receive buffer is full. msk(4)
>> should probably just count the overrun errors without printing the
>> message, which would save CPU cycles.
>> Btw, did you also see watchdog timeout errors?
>>
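Pyun's suggestion is to count Rx FIFO overruns instead of printing a console message for each one. The Python sketch below is a user-space model of a common middle ground, not msk(4) code: every overrun is counted, but a message is emitted at most once per interval. The class name `RxStats`, the method `note_overrun`, and the 10-second interval are hypothetical choices made for illustration.

```python
import time


class RxStats:
    """Hypothetical per-port counters; not the real msk(4) softc layout."""

    def __init__(self, interval=10.0):
        self.fifo_overruns = 0      # every overrun is counted
        self.interval = interval    # minimum seconds between console messages
        self.last_report = None     # time of the last emitted message
        self.messages = []          # what would have gone to the console

    def note_overrun(self, now=None):
        """Count one Rx FIFO overrun; log only if the interval has elapsed."""
        if now is None:
            now = time.monotonic()
        self.fifo_overruns += 1
        if self.last_report is not None and now - self.last_report < self.interval:
            return False            # counted silently, no message
        self.last_report = now
        self.messages.append(f"msk0: Rx FIFO overrun! ({self.fifo_overruns} total)")
        return True


if __name__ == "__main__":
    stats = RxStats(interval=10.0)
    # A burst of 100 overruns over about 2.5 seconds, then one more later.
    for i in range(100):
        stats.note_overrun(now=1000.0 + i * 0.025)
    stats.note_overrun(now=1015.0)
    print(stats.fifo_overruns, stats.messages)
```

Dropping the message entirely, as Pyun proposes, corresponds to deleting the `messages.append` branch and exposing only the counter.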
>>  > But it looks like I didn't lose any packets :). Take a look at the
>>  > ping statistics... funny...
>>
>> I guess something is wrong here. The latency is unacceptable. However,
>> I have no idea why the ICMP echo responses take so long. Are you
>> using any power saving mechanism (powerd, cpufreq, etc.)?
>>
>>  > tdevil% ping 10.1.1.254
>>  > PING 10.1.1.254 (10.1.1.254): 56 data bytes
>>  > 64 bytes from 10.1.1.254: icmp_seq=0 ttl=64 time=35926.404 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=1 ttl=64 time=34925.694 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=2 ttl=64 time=33924.729 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=3 ttl=64 time=32923.814 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=4 ttl=64 time=31922.833 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=5 ttl=64 time=30921.878 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=6 ttl=64 time=29920.923 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=7 ttl=64 time=28919.960 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=8 ttl=64 time=27919.009 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=9 ttl=64 time=26918.042 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=10 ttl=64 time=25917.078 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=11 ttl=64 time=24916.115 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=12 ttl=64 time=23915.144 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=13 ttl=64 time=22914.192 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=14 ttl=64 time=21913.214 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=15 ttl=64 time=20912.278 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=16 ttl=64 time=19911.330 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=17 ttl=64 time=18910.375 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=18 ttl=64 time=17909.419 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=19 ttl=64 time=16853.821 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=20 ttl=64 time=15854.710 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=21 ttl=64 time=14701.312 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=22 ttl=64 time=13701.003 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=23 ttl=64 time=12700.052 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=24 ttl=64 time=11699.098 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=25 ttl=64 time=10698.148 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=36 ttl=64 time=0.463 ms
>>  > 64 bytes from 10.1.1.254: icmp_seq=37 ttl=64 time=0.379 ms
>>  >
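The round-trip times above shrink by roughly 1000 ms per sequence number. Assuming ping sent one request per second with no drift (its default interval; an assumption, since real send times vary slightly), adding seq × 1000 ms to each RTT estimates when each reply arrived relative to the first request. The small Python sketch below does that arithmetic on the values copied from the output. Replies 0 through 25 all land between roughly 35.7 s and 35.9 s, which is consistent with replies being queued and then delivered together once the receive path recovered, rather than each one being independently slow. It also shows that seq 26 through 35 never appear in the output.

```python
# (seq, rtt_ms) pairs copied from the ping output above.
REPLIES = [
    (0, 35926.404), (1, 34925.694), (2, 33924.729), (3, 32923.814),
    (4, 31922.833), (5, 30921.878), (6, 29920.923), (7, 28919.960),
    (8, 27919.009), (9, 26918.042), (10, 25917.078), (11, 24916.115),
    (12, 23915.144), (13, 22914.192), (14, 21913.214), (15, 20912.278),
    (16, 19911.330), (17, 18910.375), (18, 17909.419), (19, 16853.821),
    (20, 15854.710), (21, 14701.312), (22, 13701.003), (23, 12700.052),
    (24, 11699.098), (25, 10698.148), (36, 0.463), (37, 0.379),
]

# Assumption: requests were sent exactly 1 s apart (ping's default interval).
INTERVAL_MS = 1000.0


def arrival_times(replies, interval_ms=INTERVAL_MS):
    """Estimated arrival time of each reply, in ms after the first request."""
    return {seq: seq * interval_ms + rtt for seq, rtt in replies}


def missing_seqs(replies):
    """Sequence numbers between the first and last reply that never came back."""
    seen = {seq for seq, _ in replies}
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


if __name__ == "__main__":
    t = arrival_times(REPLIES)
    burst = [t[s] for s in range(26)]
    print(f"seq 0-25 arrived between {min(burst) / 1000:.3f} s "
          f"and {max(burst) / 1000:.3f} s")
    print("no reply for seq", missing_seqs(REPLIES))
```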
>>
>> --
>> Regards,
>> Pyun YongHyeon
>> _______________________________________________
>> freebsd-stable@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>
> 
> I started having problems with the nfe driver, which I was using on
> 6.2-STABLE with polling enabled; the entire system was lagging, even
> when idle.  I have now upgraded the box in question to 7.0-BETA3 and
> kept using the nfe driver.
> 
> irq22: nfe0 ehci0                1652548         20
> 
> It hasn't been under heavy load yet since the upgrade.
> 
> ehci0: <EHCI (generic) USB 2.0 controller>
> 
> I have no local access, so I cannot disable USB in the BIOS. If I
> build a new kernel with ehci removed from the kernel config, will this
> stop the interrupt sharing and let me use nfe reasonably without
> polling? I think polling itself has been causing me problems (I use NFS).
> 
> Is nfe still being developed? These are known, existing problems, but
> the page below has not been updated for a while, so I am curious
> whether it is dead in the water now.
> 
> http://www.f.csce.kyushu-u.ac.jp/~shigeaki/software/freebsd-nfe.html
> 
> Chris

I've also seen weird problems on a machine that shares an interrupt
between nfe and ehci.  I'm hoping that this recent commit to -CURRENT
fixes the problem.  I'm planning on trying it on my 7.0-BETA machine in
the next day or so.

scottl      2007-11-21 04:03:51 UTC

  FreeBSD src repository

  Modified files:
    sys/amd64/amd64      intr_machdep.c 
    sys/i386/i386        intr_machdep.c 
    sys/ia64/ia64        interrupt.c 
    sys/powerpc/powerpc  intr_machdep.c 
    sys/sparc64/sparc64  intr_machdep.c 
  Log:
  Extend critical section coverage in the low-level interrupt handlers to
  include the ithread scheduling step.  Without this, a preemption might
  occur in between the interrupt getting masked and the ithread getting
  scheduled.  Since the interrupt handler runs in the context of curthread,
  the scheduler might see it as having such a low priority on a busy system
  that it doesn't get to run for a _long_ time, leaving the interrupt stranded
  in a disabled state.  The only way that the preemption can happen is by
  a fast/filter handler triggering a scheduling event earlier in the handler,
  so this problem can only happen for cases where an interrupt is being
  shared by both a fast/filter handler and an ithread handler.  Unfortunately,
  it seems to be common for this sharing to happen with network and USB
  devices, for example.  This fixes many of the mysterious TCP session
  timeouts and NIC watchdogs that were being reported.  Many thanks to Sam
  Leffler for getting to the bottom of this problem.
  
  Reviewed by: jhb, jeff, silby
  
  Revision  Changes    Path
  1.35      +1 -1      src/sys/amd64/amd64/intr_machdep.c
  1.30      +1 -1      src/sys/i386/i386/intr_machdep.c
  1.62      +1 -1      src/sys/ia64/ia64/interrupt.c
  1.14      +1 -1      src/sys/powerpc/powerpc/intr_machdep.c
  1.28      +1 -1      src/sys/sparc64/sparc64/intr_machdep.c
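The commit log describes an ordering problem: a filter handler's scheduling event is deferred by a critical section, but the old code left that critical section before the ithread was scheduled, so the deferred preemption could fire while the interrupt was masked and no ithread was queued to unmask it. The Python sketch below is a toy, single-threaded model of that ordering, not kernel code. `ToyInterruptPath`, `run_filter_handler` and `stranded` are hypothetical names; `critical_enter`/`critical_exit` only mimic the idea of FreeBSD's critical sections (preemption deferred while the nesting count is non-zero), and the step order is a simplified reading of the log message rather than the actual `intr_machdep.c` code.

```python
class ToyInterruptPath:
    """Toy model of the handler ordering described in the commit log."""

    def __init__(self, extended):
        self.extended = extended        # True = ordering after the commit
        self.critnest = 0               # >0 means preemption is deferred
        self.preempt_pending = False
        self.interrupt_masked = False
        self.ithread_scheduled = False
        self.events = []                # state snapshots taken at preemption

    def critical_enter(self):
        self.critnest += 1

    def critical_exit(self):
        self.critnest -= 1
        if self.critnest == 0 and self.preempt_pending:
            self.preempt()

    def preempt(self):
        # curthread loses the CPU here; on a busy system it may not get it
        # back for a long time, so record the state it leaves behind.
        self.preempt_pending = False
        self.events.append(("preempted", self.interrupt_masked,
                            self.ithread_scheduled))

    def run_filter_handler(self):
        # The fast/filter handler triggers a scheduling event.
        if self.critnest > 0:
            self.preempt_pending = True  # deferred until critical_exit()
        else:
            self.preempt()

    def handle_shared_interrupt(self):
        self.critical_enter()
        self.run_filter_handler()
        self.interrupt_masked = True     # mask the source until the ithread runs
        if not self.extended:
            self.critical_exit()         # old: pending preemption fires here
        self.ithread_scheduled = True    # schedule the ithread for the other handler
        if self.extended:
            self.critical_exit()         # new: fires only after scheduling

    def stranded(self):
        """True if preemption hit while masked but before the ithread was queued."""
        return any(masked and not sched for _, masked, sched in self.events)


if __name__ == "__main__":
    for extended in (False, True):
        path = ToyInterruptPath(extended)
        path.handle_shared_interrupt()
        print("extended" if extended else "original", path.events, path.stranded())
```

In the "original" run the preemption snapshot shows the interrupt masked with no ithread scheduled, which is the stranded window the log describes; with the extended critical section the same preemption only happens after the ithread is already queued.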

