Date:      Wed, 4 Oct 2006 12:31:54 +0200
From:      Guy Brand <gb@isis.u-strasbg.fr>
To:        freebsd-stable@freebsd.org
Subject:   Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]
Message-ID:  <20061004103154.GK1276@isis.u-strasbg.fr>
In-Reply-To: <20060930011904.GA62626@nowhere>
References:  <451AA7B1.5080202@samsco.org> <20060927191402.GB932@turion.vk2pj.dyndns.org> <20060927210349.GG14975@tnn.dglawrence.com> <451AEB02.2090806@samsco.org> <002201c6e290$45ece980$b3db87d4@multiplay.co.uk> <451BD89F.8080203@samsco.org> <451C1F6D.2020302@mail.uni-mainz.de> <7.0.1.0.0.20060928152807.17bbe448@sentex.net> <451C271A.9040904@samsco.org> <20060930011904.GA62626@nowhere>

Craig Boston (craig@feniz.gank.org) on 29/09/2006 at 20:19 wrote:

> One thing this patch definitely did do though, is break the nvidia
> driver pretty badly.  Couldn't keep the X server running for more than a
> minute before it froze solid.  Lots of Xid: blah blah blah messages.
> Yes I remembered to rebuild the kernel module ;)

  Hi,


  Since rebuilding to 6.2-PRERELEASE (FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG i386), I have a problem on a box where
  em shares an IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                          total       rate
  irq1: atkbd0                           5          0
  irq14: ata0                           47          0
  irq16: nvidia0 em+                 86545        185
  irq17: fwohci0                         7          0
  irq21: twe0                         6426         13
  cpu0: timer                       927735       1986
  Total                            1020765       2185
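  To pick out shared interrupt lines in a table like this one, a small
  script can flag IRQ entries that name more than one device. This is a
  minimal sketch that assumes the column layout shown above (which looks
  like "vmstat -i" output); the function name shared_irqs is
  hypothetical, and a truncated name such as "em+" is kept as printed
  rather than guessed at.

```python
import re

# The interrupt table above, copied verbatim (assumed to be "vmstat -i" output).
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           5          0
irq14: ata0                           47          0
irq16: nvidia0 em+                 86545        185
irq17: fwohci0                         7          0
irq21: twe0                         6426         13
cpu0: timer                       927735       1986
Total                            1020765       2185
"""

# Matches "irqN: dev [dev ...]   total   rate"; device names are captured lazily
# so the two trailing numeric columns are not swallowed into the names.
LINE_RE = re.compile(r"^irq(\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq: [device names]} for IRQ lines that list more than one device."""
    shared = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header, cpu0 timer and Total lines do not match
        irq, names = int(m.group(1)), m.group(2).split()
        if len(names) > 1:
            shared[irq] = names
    return shared


if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(f"irq{irq} is shared by: {', '.join(names)}")
```

  On this table only irq16 is reported, listing nvidia0 and em+.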

  I can freeze the box by starting Firefox under X, which reloads a few
  tabs I keep open in my session. This is perfectly reproducible. In the
  logs, I first see:

    Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010597
    Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel 00000000
    Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010598
    Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 00010599
    Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059a
    Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059b
    Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059c
    Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059d
    Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059e
    Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0001059f
    Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0

  then come the watchdogs:

    Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
    Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
    Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
    Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a4
    Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a5
    Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
    Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
    Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
    Oct  2 16:49:49 mojito kernel: em0: link state changed to UP
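  One way to check how closely the two kinds of messages interleave is
  to merge them on their timestamps and measure, for each em0 watchdog
  timeout, the time since the last NVRM Xid message. A minimal sketch,
  assuming the syslog prefix shown above (it carries no year, so 2006 is
  assumed) and using an excerpt of the log; the helper names are
  hypothetical, and the result only shows timing proximity, not a cause.

```python
from datetime import datetime

# Excerpt of the log above (link-state lines omitted).
LOG = """\
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a0
Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a2
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
"""


def parse_events(text, year=2006):
    """Yield (timestamp, kind) for NVRM Xid and em0 watchdog lines, in log order."""
    for line in text.splitlines():
        # The syslog prefix "Oct  2 16:48:52" is 15 characters and has no year.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "NVRM: Xid" in line:
            yield stamp, "xid"
        elif "em0: watchdog timeout" in line:
            yield stamp, "watchdog"


def gaps_after_xid(text):
    """Seconds from the most recent Xid to each watchdog timeout (None if no Xid yet)."""
    last_xid, gaps = None, []
    for stamp, kind in parse_events(text):
        if kind == "xid":
            last_xid = stamp
        elif last_xid is None:
            gaps.append(None)
        else:
            gaps.append((stamp - last_xid).total_seconds())
    return gaps


if __name__ == "__main__":
    print(gaps_after_xid(LOG))
```

  On this excerpt, gaps_after_xid(LOG) returns [4.0, 6.0, 0.0]: each
  watchdog timeout in it follows an Xid message within a few seconds.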

  and the box ends up frozen less than a minute later. The traffic on
  the Intel card can be low (pinging a host for a few dozen seconds),
  medium (reloading a few pages in Firefox tabs) or high (downloading
  several ISO images from our local FTP mirror): whatever I do, if both
  nvidia and em0 are in use, the box freezes.

  Note that I cannot freeze the box with several simultaneous big
  downloads or while tarring up a lot of files, as long as X is NOT
  running. So I suspect an issue with the IRQ shared by nvidia and em.

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such
  problem. The "DEBUG" kernel config is GENERIC plus the WITNESS options
  (they do not help in this case).

  I bisected RELENG_6 by checkout date to find which changeset
  introduced the problem. The results are:

    #*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
    # OK
    ...

    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
    # OK
    #
    #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
    # BROKEN
    ...

    #*default release=cvs tag=RELENG_6
    # BROKEN
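  The search above narrows a window between a known-good and a
  known-broken checkout date. A minimal sketch of automating the choice
  of midpoints, assuming a hypothetical is_broken(date) callback that
  stands for the manual cycle (check out sources at that date, build and
  boot the kernel, try to freeze the box); it prints a supfile line in
  the same format as above for each step.

```python
from datetime import datetime, timedelta

# Date format used in the supfile "*default" lines above.
FMT = "%Y.%m.%d.%H.%M.%S"


def supfile_line(when):
    """Render a *default line that checks out RELENG_6 as of `when`."""
    return f"*default release=cvs tag=RELENG_6 date={when.strftime(FMT)}"


def bisect_dates(good, bad, is_broken, resolution=timedelta(minutes=1)):
    """Shrink the [good, bad] window until it is no wider than `resolution`.

    `is_broken(date)` is a hypothetical callback standing in for the manual
    step: check out at that date, build and boot the kernel, try to freeze it.
    """
    if not good < bad:
        raise ValueError("the known-good date must precede the known-bad one")
    while bad - good > resolution:
        # Midpoint, truncated to whole seconds like the supfile dates.
        mid = (good + (bad - good) / 2).replace(microsecond=0)
        broken = is_broken(mid)
        print(supfile_line(mid))
        print("# BROKEN" if broken else "# OK")
        if broken:
            bad = mid
        else:
            good = mid
    return good, bad


if __name__ == "__main__":
    # Simulation only: pretend everything from the first culprit commit on is broken.
    culprit = datetime(2006, 8, 8, 9, 19, 25)
    print(bisect_dates(datetime(2006, 8, 8, 9, 12, 56),
                       datetime(2006, 8, 8, 9, 21, 0),
                       lambda d: d >= culprit))
```

  With that simulated predicate, starting from the OK date
  2006.08.08.09.12.56 and the BROKEN date 2006.08.08.09.21.00, the
  window shrinks to one minute after three builds.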

  According to the sys commit logs, the culprit commits are:

  glebius     2006-08-08 09:19:25 UTC
  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/dev/em           if_em.c
  Log:
  Sync with HEAD. This includes the following changes in chronological
  order:

  o Significant performance improvements. The interrupt handler
    schedules work to a private taskqueue. The em_rxeof() function
    runs lockless.
    rev. 1.98 - 1.101 by scottl.
    rev. 1.103 by mux
    rev. 1.106 by glebius, from Andrey V. Elsukov <bu7cher yandex.ru>
    rev. 1.116 by glebius
  o Style cleanups:
    - rev. 1.102, 1.108, 1.109 by glebius
    - rev. 1.124 by pdeuskar
  o Vendor merges:
    - Merged with vendor driver version 5.1.5 by Jack Vogel.
      rev. 1.115 by glebius
    - Merged with vendor driver version 6.0.5 by Jack Vogel.
      rev. 1.123 by glebius
  o Various fixes:
    - Invalid use of BUS_DMA_ALLOCNOW.
      rev. 1.104 by scott, 1.121 by yongari
    - Link state handling cleanup.
      rev. 1.110 by glebius
    - Fix if_baudrate handling.
      rev. 1.111 by glebius
    - Honor IFF_DRV_OACTIVE in em_start_locked().
      rev. 1.117 by yongari
    - Protect EEPROM access with the driver lock.
      rev. 1.118 by yongari
    - Fix link flap on SIOCGIFADDR.
      rev. 1.119 by yongari
    - Fix DMA map handling in em_encap().
      rev. 1.120,1.122 by yongari

  Revision   Changes      Path
  1.65.2.17  +1587 -1443  src/sys/dev/em/if_em.c
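  The first log entry says the interrupt handler now schedules its work
  to a private taskqueue. The following is a user-space model of that
  general pattern only, not the driver code: a Python thread stands in
  for the taskqueue thread, and FakeNic, intr_handler, rx_task and
  taskqueue_thread are hypothetical names. It assumes, as is typical
  for this pattern, that the handler masks the device interrupt and the
  deferred task unmasks it after draining the receive ring.

```python
import queue
import threading


class FakeNic:
    """Hypothetical stand-in for a NIC: a receive ring plus an interrupt mask."""

    def __init__(self):
        self.rx_ring = queue.SimpleQueue()
        self.intr_enabled = True
        self.delivered = []


def intr_handler(nic, tasks):
    """'Interrupt context': do the minimum and defer the real work."""
    if not nic.intr_enabled:
        return False          # masked: this device does not claim the interrupt
    nic.intr_enabled = False  # mask further interrupts from this device
    tasks.put(rx_task)        # schedule the receive work on the private queue
    return True


def rx_task(nic):
    """Deferred work: drain the receive ring, then unmask the interrupt."""
    while True:
        try:
            nic.delivered.append(nic.rx_ring.get_nowait())
        except queue.Empty:
            break
    nic.intr_enabled = True


def taskqueue_thread(nic, tasks):
    """Private worker thread: run queued tasks until it receives None."""
    while (task := tasks.get()) is not None:
        task(nic)


if __name__ == "__main__":
    nic, tasks = FakeNic(), queue.Queue()
    for pkt in range(5):
        nic.rx_ring.put(pkt)
    intr_handler(nic, tasks)  # one interrupt covers all five queued frames
    worker = threading.Thread(target=taskqueue_thread, args=(nic, tasks))
    worker.start()
    tasks.put(None)           # stop the worker once pending tasks are done
    worker.join()
    print(nic.delivered)
```

  The point of the design is that the handler itself stays short; the
  cost is that received frames are only processed once the task thread
  gets to run.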


  glebius     2006-08-08 09:20:26 UTC
  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/dev/em           LICENSE README if_em.h if_em_hw.c
                         if_em_hw.h if_em_osdep.h
  Log:
  Sync with HEAD, merging vendor driver updates 5.1.5 and 6.0.5 by Jack Vogel.

  Revision  Changes     Path
  1.3.2.1   +1 -1       src/sys/dev/em/LICENSE
  1.10.2.1  +71 -30     src/sys/dev/em/README
  1.32.2.3  +133 -157   src/sys/dev/em/if_em.h
  1.16.2.2  +3186 -906  src/sys/dev/em/if_em_hw.c
  1.15.2.3  +712 -48    src/sys/dev/em/if_em_hw.h
  1.14.2.2  +46 -15     src/sys/dev/em/if_em_osdep.h


  I confirmed this by building a kernel from 2006.08.08.09.21.00, which
  shows the problem, and a kernel from 2006.08.08.09.18.00, which works
  like a charm.

  I don't know whether this is related to the em watchdog timeouts
  reported in this thread. Let me know if I can do anything useful to
  help fix this issue.

-- 
  bug



