Date:      Tue, 20 Feb 2007 10:45:35 +0900
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        "Andrey V. Elsukov" <bu7cher@yandex.ru>
Cc:        "Bjoern A. Zeeb" <bzeeb-lists@lists.zabbadoz.net>, FreeBSD current mailing list <current@freebsd.org>, Sergey Zaharchenko <doublef-ctm@yandex.ru>
Subject:   Re: nve related LOR triggered by lots of small packets, and a hard hang
Message-ID:  <20070220014535.GC912@cdnetworks.co.kr>
In-Reply-To: <45D98B62.1060402@yandex.ru>
References:  <20070110120731.GA1515@shark.localdomain> <20070210171130.D47107@maildrop.int.zabbadoz.net> <45D98B62.1060402@yandex.ru>

On Mon, Feb 19, 2007 at 02:34:58PM +0300, Andrey V. Elsukov wrote:
 > Bjoern A. Zeeb wrote:
 > >>: lock order reversal:
 > >>:  1st 0xc3629f00 inp (tcpinp) @ 
 > >>/src/usr.src/sys/netinet/tcp_usrreq.c:801
 > >>:  2nd 0xc0a9feec tcp (tcp) @ /src/usr.src/sys/netinet/tcp_input.c:626
 > >
 > >I added this as LOR ID 200 to the LOR page:
 > >    http://sources.zabbadoz.net/freebsd/lor.html#200
 > Hi, All.
 > 
 > I have this LOR and deadlock on my notebook with recent CURRENT.
 > 
 > My hardware is detected by nve(4) as "NVIDIA nForce MCP13 Networking 
 > Adapter":
 > 
 > nve0: <NVIDIA nForce MCP13 Networking Adapter> port 0x30b8-0x30bf mem 
 > 0xc0007000-0xc0007fff irq 5 at device 20.0 on pci0
 > nve0: Ethernet address 00:90:f5:4f:18:1b
 > miibus0: <MII bus> on nve0
 > rlphy0: <RTL8201L 10/100 media interface> PHY 1 on miibus0
 > rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > nve0: using obsoleted if_watchdog interface
 > 
 > With nfe(4) as "NVIDIA nForce 430 MCP13 Networking Adapter":
 > 
 > nfe0: <NVIDIA nForce 430 MCP13 Networking Adapter> port 0x30b8-0x30bf 
 > mem 0xc0007000-0xc0007fff irq 5 at device 20.0 on pci0
 > miibus0: <MII bus> on nfe0
 > rlphy0: <RTL8201L 10/100 media interface> PHY 1 on miibus0
 > rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 > nfe0: using obsoleted if_watchdog interface
 > 
 > DDB message:
 > lock order reversal:
 >  1st 0xc2ec2480 inp (tcpinp) @ /usr/src/sys/netinet/tcp_usrreq.c:801
 >  2nd 0xc07bbc6c tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:638
 > KDB: stack backtrace:
 > db_trace_self_wrapper(c06ee408) at db_trace_self_wrapper+0x25
 > kdb_backtrace(0,ffffffff,c077d708,c077d730,c072f944,...) at 
 > kdb_backtrace+0x29
 > witness_checkorder(c07bbc6c,9,c06fcd51,27e) at witness_checkorder+0x586
 > _mtx_lock_flags(c07bbc6c,0,c06fcd51,27e,0,...) at _mtx_lock_flags+0x84
 > tcp_input(c2def300,14,c0759b80,835115ac,0,...) at tcp_input+0x432
 > ip_input(c2def300) at ip_input+0x5c9
 > netisr_dispatch(2,c2def300,0,c2be2000,c2df0800,...) at 
 > netisr_dispatch+0x58
 > ether_demux(c2be2000,c2def300,c2def300,c314da10,d3ee075c,...) at 
 > ether_demux+0x28a
 > ether_input(c2be2000,c2def300,c2b53cd8,0,c31605fb,...) at 
 > ether_input+0x21e
 > nve_ospacketrx(c2b53c00,d3ee0798,1,0,0,...) at nve_ospacketrx+0xa4
 > UpdateReceiveDescRingData(c315fc64,c315fc84,c315fd24,c315fd50,c315fd68,...) 
 > at UpdateReceiveDescRingData+0x2f8
 > nve_osalloc(c2b80940,d4152010,c2b53c00,c315fbcc,c315fc64,...) at 
 > nve_osalloc
 > _end(0,c2c5c008,3065766e,0,0,...) at 0xc2b80900
 > _end(c2b80940,d4152010,c2b53c00,c315fbcc,c315fc64,...) at 0xc2af3630
 > _end(0,c2c5c008,3065766e,0,0,...) at 0xc2b80900
 > < ..too many strings ..>
 > 
 > db> where
 > Tracing pid 1045 tid 100068 td 0xc2dea510
 > kdb_enter(c06bbc68) at kdb_enter+0x2b
 > witness_checkorder(c07bbc6c,9,c06fcd51,27e) at witness_checkorder+0x599
 > _mtx_lock_flags(c07bbc6c,0,c06fcd51,27e,0,...) at _mtx_lock_flags+0x84
 > tcp_input(c2def300,14,c0759b80,835115ac,0,...) at tcp_input+0x432
 > ip_input(c2def300) at ip_input+0x5c9
 > netisr_dispatch(2,c2def300,0,c2be2000,c2df0800,...) at 
 > netisr_dispatch+0x58
 > ether_demux(c2be2000,c2def300,c2def300,c314da10,d3ee075c,...) at 
 > ether_demux+0x28a
 > ether_input(c2be2000,c2def300,c2b53cd8,0,c31605fb,...) at 
 > ether_input+0x21e
 > nve_ospacketrx(c2b53c00,d3ee0798,1,0,0,...) at nve_ospacketrx+0xa4
 > UpdateReceiveDescRingData(c315fc64,c315fc84,c315fd24,c315fd50,c315fd68,...) 
 > at UpdateReceiveDescRingData+0x2f8
 > nve_osalloc(c2b80940,d4152010,c2b53c00,c315fbcc,c315fc64,...) at 
 > nve_osalloc
 > _end(0,c2c5c008,3065766e,0,0,...) at 0xc2b80900
 > _end(c2b80940,d4152010,c2b53c00,c315fbcc,c315fc64,...) at 0xc2af3630
 > 
 > < ..too many strings ..>
 > 
 > db> ps
 >   pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
 >  1045  1043  1043    21  R       CPU 0               vsftpd
 > 
 > db> show allpcpu
 > Current CPU: 0
 > 
 > cpuid        = 0
 > curthread    = 0xc2dea510: pid 1045 "vsftpd"
 > curpcb       = 0xd3ee0d90
 > fpcurthread  = none
 > idlethread   = 0xc2a14510: pid 10 "idle"
 > APIC ID      = 0
 > currentldt   = 0x50
 > spin locks held:
 > 
 > db> show locks
 > exclusive sleep mutex inp (tcpinp) r = 0 (0xc2ec2480) locked @ 
 > /usr/src/sys/netinet/tcp_usrreq.c:801
 > 
 > db> show alllocks
 > Process 1045 (vsftpd) thread 0xc2dea510 (100068)
 > exclusive sleep mutex inp (tcpinp) r = 0 (0xc2ec2480) locked @ 
 > /usr/src/sys/netinet/tcp_usrreq.c:801
 > Process 21 (irq5: nvidia0 nve0) thread 0xc2ac2510 (100023)
 > exclusive sleep mutex tcp r = 0 (0xc07bbc6c) locked @ 
 > /usr/src/sys/netinet/tcp_input.c:638
 > 
 > db> call boot(0)
 > Waiting (max 60 seconds) for system process `vnlru' to stop...Sleeping 
 > on "suspkt" with the following non-sleepable locks held:
 > exclusive sleep mutex inp (tcpinp) r = 0 (0xc2ec2480) locked @ 
 > /usr/src/sys/netinet/tcp_usrreq.c:801
 > KDB: enter: witness_warn
 > [thread pid 1045 tid 100068 ]
 > Stopped at      kdb_enter+0x2b: nop
 > 
 > db> panic
 > panic: from debugger
 > Uptime: 7m18s
 > Physical memory: 434 MB
 > Dumping 47 MB: 32 16
 > Dump complete
 > 
 > If anybody is interested in helping me resolve this problem, I can easily 
 > reproduce the deadlock: simply downloading a big file (an avi file, 
 > for example) from the FTP server on my notebook results in a deadlock.
 > 
 > With nfe(4) I don't get this deadlock, but nfe is useless for me: it 
 > often displays the message "nfe0: watchdog timeout".
 > 

Because your dmesg for nfe(4) shows the 'obsoleted if_watchdog interface'
message, I think you've used the stock nfe(4) on CURRENT.
Try the overhauled nfe(4) at the following URLs.

http://people.freebsd.org/~yongari/nfe/if_nfe.c
http://people.freebsd.org/~yongari/nfe/if_nfereg.h
http://people.freebsd.org/~yongari/nfe/if_nfevar.h

The new nfe(4) has several protections against the watchdog timeout errors
reported for the driver. One user reported TSO-related issues with
the new driver, so if you encounter strange errors with it,
please turn off the TSO capability (e.g. # ifconfig nfe0 -tso).
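
As for the LOR itself: the report in the quoted trace means WITNESS had
already seen the tcp lock taken before an inp lock, and then saw the
vsftpd thread ask for tcp (in tcp_input(), reached through
nve_ospacketrx()) while still holding the inp lock taken in tcp_usrreq.c.
A toy sketch of that kind of order check, in Python, is below. It is only
an illustration under my own assumptions: OrderChecker, demo() and the
thread/lock labels are made up for the example, and witness(4) itself is
considerably more involved than this.

```python
class OrderChecker:
    """Toy lock-order checker (hypothetical; NOT how witness(4) is implemented)."""

    def __init__(self):
        self.seen = set()   # (held, acquired) pairs observed so far
        self.held = {}      # thread label -> list of locks it currently holds

    def acquire(self, thread, lock):
        """Record an acquisition and return the reversals it causes, if any."""
        stack = self.held.setdefault(thread, [])
        # A reversal: we hold h and want lock, but lock-before-h was seen earlier.
        reversals = [(h, lock) for h in stack if (lock, h) in self.seen]
        self.seen.update((h, lock) for h in stack)
        stack.append(lock)
        return reversals

    def release(self, thread, lock):
        self.held[thread].remove(lock)


def demo():
    w = OrderChecker()
    # Usual receive path in the interrupt thread: tcp first, then inp.
    w.acquire("irq5", "tcp")
    w.acquire("irq5", "inp")
    w.release("irq5", "inp")
    w.release("irq5", "tcp")
    # vsftpd: inp is already held when the driver calls back into tcp_input().
    w.acquire("vsftpd", "inp")
    return w.acquire("vsftpd", "tcp")


if __name__ == "__main__":
    for first, second in demo():
        print("lock order reversal: 1st %s, 2nd %s" % (first, second))
```

The second acquire() in the vsftpd part is the one that gets flagged,
which matches the "1st inp, 2nd tcp" order in the DDB message. If the
interrupt thread holds tcp and wants the same inp at that moment, neither
side can make progress, which is consistent with the hang.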

 > -- 
 > WBR, Andrey V. Elsukov

-- 
Regards,
Pyun YongHyeon


