Date:      Mon, 22 Nov 2004 17:11:54 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        Sten Spans <sten@blinkenlights.nl>
Cc:        freebsd-alpha@FreeBSD.org
Subject:   Re: alpha and em mtu
Message-ID:  <200411221711.54916.jhb@FreeBSD.org>
In-Reply-To: <Pine.SOC.4.61.0411222147180.10997@tea.blinkenlights.nl>
References:  <Pine.SOC.4.61.0411142153430.26307@tea.blinkenlights.nl> <200411221432.42028.jhb@FreeBSD.org> <Pine.SOC.4.61.0411222147180.10997@tea.blinkenlights.nl>

On Monday 22 November 2004 04:15 pm, Sten Spans wrote:
> On Mon, 22 Nov 2004, John Baldwin wrote:
> > On Sunday 21 November 2004 07:35 am, Sten Spans wrote:
> >>> Does this panic go
> >>> away if you use a different MTU btw?
> >>
> >> I've tried running
> >>
> >> i=1; while true; do echo $i; ifconfig em0 mtu $i; let i++; sleep 2; done
> >>
> >> and on the client:
> >> while true; do echo bla | telnet alpha 22; sleep 1; done
> >>
> >> this caused no crashes with mtu 1-1500.
> >>
> >> But:
> >> deepthought# ifconfig em0 mtu 1666
> >> deepthought# tcp_input: ip 0xfffffc0018cdb00e is misaligned
> >> deepthought# ifconfig em0 mtu 1564
> >> deepthought# tcp_input: ip 0xfffffc001857c80e is misaligned
> >> deepthought# ifconfig em0 mtu 1532
> >> deepthought# tcp_input: ip 0xfffffc001859300e is misaligned
> >>
> >> If it has to be 8 bytes aligned then it's off by 4, doesn't
> >> seem to be vlanmtu though.
>
> erm, that would be 2.
>
> > Ok, this is helpful I think.  (Big MTU -> panic.)
>
> Another thing is :
>
> deepthought# ifconfig em0 mtu 9000
> sten@ford:~$ ping -s 8000 intern.dt
> PING intern.deepthought.blinkenlights.nl (192.168.1.3) 8000(8028) bytes of data.
> 8008 bytes from intern.deepthought.blinkenlights.nl (192.168.1.3): icmp_seq=1 ttl=64 time=1.19 ms
> 8008 bytes from intern.deepthought.blinkenlights.nl (192.168.1.3): icmp_seq=2 ttl=64 time=0.756 ms
>
> 21:59:12.587494 IP intern.ford > intern.deepthought.blinkenlights.nl: icmp 8008: echo request seq 1
> 21:59:12.588223 IP intern.deepthought.blinkenlights.nl > intern.ford: icmp 8008: echo reply seq 1
> 21:59:13.587730 IP intern.ford > intern.deepthought.blinkenlights.nl: icmp 8008: echo request seq 2
>
> So ICMP does work, which makes me think that the
> problem is TCP-specific. I've also tried disabling all
> the SACK/TCP sysctls, but that didn't seem to help.
> And I've tried connecting from a box with MTU 1500,
> but that also caused the same panic.
>
>
> I'll get an sk card soonish which will allow me to double
> check this panic with another nic. Although I would not guess
> that the panic is driver specific. Which makes me wonder why
> lo0 does work:
> deepthought# ifconfig lo0 mtu 1501
> deepthought# telnet 127.0.0.1 22
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> SSH-2.0-OpenSSH_3.8.1p1 FreeBSD-20040419
>
> > The next step is probably
> > to start walking up the stack determining where the pointer starts off
> > and how it ends up misaligned.  Can you use gdb to figure out the source
> > file/line of the previous stack frame before tcp_input()?
>
> sure:
>
> db> trace
> tcp_input() at tcp_input+0x3a4
> ip_input() at ip_input+0x9fc
> netisr_processqueue() at netisr_processqueue+0xac
> swi_net() at swi_net+0xf0
> ithread_loop() at ithread_loop+0x1d4
> fork_exit() at fork_exit+0x100
> exception_return() at exception_return
> --- root of call graph ---
>
> (gdb) l *tcp_input+0x3a4
> 0xfffffc00004cd054 is in tcp_input (/usr/src/sys/netinet/tcp_input.c:554).
> 549
> 550             /*
> 551              * Check that TCP offset makes sense,
> 552              * pull out TCP options and adjust length.             XXX
> 553              */
> 554             off = th->th_off << 2;
> 555             if (off < sizeof (struct tcphdr) || off > tlen) {
> 556                     tcpstat.tcps_rcvbadoff++;
> 557                     goto drop;
> 558             }
> (gdb) l *ip_input+0x9fc
> 0xfffffc00004c355c is in ip_input (/usr/src/sys/netinet/ip_input.c:739).
> 734             /*
> 735              * Switch out to protocol's input routine.
> 736              */
> 737             ipstat.ips_delivered++;
> 738
> 739             (*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
> 740             return;
> 741     bad:
> 742             m_freem(m);
> 743     }
> (gdb) l *netisr_processqueue+0xac
> 0xfffffc00004ad45c is in netisr_processqueue (/usr/src/sys/net/netisr.c:233).
> 228
> 229             for (;;) {
> 230                     IF_DEQUEUE(ni->ni_queue, m);
> 231                     if (m == NULL)
> 232                             break;
> 233                     ni->ni_handler(m);
> 234             }
> 235     }

Hmm, so can you check here to see if the 'm' pointer in this routine is 
misaligned?  If so, then this may be a driver bug.
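
As a rough aid for that check, here is a small sketch that does the
alignment arithmetic on the three "ip ... is misaligned" addresses quoted
above. It is only an illustration, not code from the kernel or the em
driver. The helper names are made up, and the 2048-byte boundary and the
2-byte pad (the value FreeBSD calls ETHER_ALIGN) are assumptions about
what might be going on, not something this thread has confirmed.

```python
# Alignment arithmetic on the addresses from the quoted console messages.
# Assumptions (not confirmed in this thread): receive buffers start on a
# 2048-byte boundary and the frame carries a plain 14-byte Ethernet header.

ETHER_HDR_LEN = 14        # destination MAC + source MAC + ethertype
ASSUMED_BUF_ALIGN = 2048  # hypothetical buffer alignment, for illustration
ASSUMED_PAD = 2           # hypothetical pad placed before the frame

# Copied from the "tcp_input: ip ... is misaligned" lines above.
REPORTED_IP_ADDRS = [
    0xfffffc0018cdb00e,
    0xfffffc001857c80e,
    0xfffffc001859300e,
]


def misalignment(addr, boundary):
    """Bytes by which addr lies past the previous multiple of boundary."""
    return addr % boundary


def is_aligned(addr, boundary):
    """True if addr is an exact multiple of boundary."""
    return misalignment(addr, boundary) == 0


def describe(addr):
    """Summarise the alignment of one reported IP header address."""
    return {
        "addr": hex(addr),
        "mod4": misalignment(addr, 4),
        "mod8": misalignment(addr, 8),
        # Where the frame would start if the IP header sits right after
        # a 14-byte Ethernet header.
        "frame_start": hex(addr - ETHER_HDR_LEN),
        "frame_start_buf_aligned": is_aligned(addr - ETHER_HDR_LEN,
                                              ASSUMED_BUF_ALIGN),
        # Would shifting the frame by the assumed pad fix the IP header?
        "aligned_with_pad": is_aligned(addr + ASSUMED_PAD, 4),
    }


if __name__ == "__main__":
    for a in REPORTED_IP_ADDRS:
        print(describe(a))
```

All three addresses end in 0x0e, which is 14, the length of an Ethernet
header. That would fit a frame copied to the very start of an aligned
buffer with no pad in front of it, but that is a guess from the numbers
alone; the 'm' pointer and its data pointer still need to be inspected.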

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


