Date:      Fri, 12 Mar 2010 21:57:45 -0800
From:      Steven Noonan <steven@uplinklabs.net>
To:        yongari@freebsd.org
Cc:        freebsd-net@freebsd.org
Subject:   Re: kern/144689: [re] TCP transfer corruption using if_re
Message-ID:  <f488382f1003122157i12968043h31c8020007f7e8a1@mail.gmail.com>
In-Reply-To: <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com>
References:  <201003121754.o2CHsH7V065932@freefall.freebsd.org> <f488382f1003121619y17780ed9x52765b9a9133fb2@mail.gmail.com> <f488382f1003121624j34a8aee8kc127e82c08c3fe37@mail.gmail.com>

On Fri, Mar 12, 2010 at 4:24 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Mar 12, 2010 at 4:19 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> On Fri, Mar 12, 2010 at 9:54 AM, <yongari@freebsd.org> wrote:
>>> Synopsis: [re] TCP transfer corruption using if_re
>>>
>>> State-Changed-From-To: open->feedback
>>> State-Changed-By: yongari
>>> State-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>> State-Changed-Why:
>>> This looks like Rx checksum offloading issue. Would you try
>>> disabling Rx checksum offloading and test it again?
>>> #ifconfig re0 -rxcsum
>>> Also show me dmesg output(re(4) related part).
>>>
>>>
>>> Responsible-Changed-From-To: freebsd-net->yongari
>>> Responsible-Changed-By: yongari
>>> Responsible-Changed-When: Fri Mar 12 17:53:37 UTC 2010
>>> Responsible-Changed-Why:
>>> Mine.
>>>
>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=144689
>>>
>>
>> Hmm. Disabling Rx checksum offloading helped for one clone process,
>> but then this showed up in dmesg during my second test (it seems to be
>> doing this regularly for some reason):
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>>
>> And no, the cable isn't loose or something. It just decides to take
>> the interface down and put it back up.
>>
>> Here's the rest of 'dmesg | grep re0':
>>
>> firewire0: <IEEE1394(FireWire) bus> on fwohci0
>> dcons_crom0: <dcons configuration ROM> on firewire0
>> fwe0: <Ethernet over FireWire> on firewire0
>> fwip0: <IP over FireWire> on firewire0
>> firewire0: 1 nodes, maxhop <= 0 cable IRM irm(0)  (me)
>> firewire0: bus manager 0
>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> cardbus0
>> re0: Chip rev. 0x10000000
>> re0: MAC rev. 0x00000000
>> miibus1: <MII bus> on re0
>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> re0: [FILTER]
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: detached
>> re0: <RealTek 8169/8169S/8169SB(L)/8110S/8110SB(L) Gigabit Ethernet>
>> port 0x1200-0x12ff mem 0x88000000-0x880001ff irq 18 at device 0.0 on
>> cardbus0
>> re0: Chip rev. 0x10000000
>> re0: MAC rev. 0x00000000
>> miibus1: <MII bus> on re0
>> re0: Ethernet address: 00:18:4d:6e:c0:29
>> re0: [FILTER]
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: PHY read failed
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>>
>> - Steven
>>
>
> I should note that the connection was _lost_ during the second test above.
>
> I also tested again, and it looks like it added another "re0: PHY read
> failed" before silently dropping the connection.
>
> - Steven
>
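The dmesg output quoted above is long enough that the flap pattern is
hard to eyeball. A minimal sketch, assuming the log text is available
as a string, of tallying the re0 events that appear in it. The function
name tally_re0_events and the sample input are hypothetical and only
for illustration; the event strings are the ones shown in the log.

```python
from collections import Counter

# Event strings exactly as they appear in the quoted dmesg output.
RE0_EVENTS = (
    "link state changed to DOWN",
    "link state changed to UP",
    "PHY read failed",
    "detached",
)


def tally_re0_events(dmesg_text: str) -> Counter:
    """Count re0 link flaps and PHY errors in dmesg text.

    Leading mail-quote markers ('>' and spaces) are stripped so the
    function also works on text pasted from a quoted reply.
    """
    counts = Counter()
    for line in dmesg_text.splitlines():
        line = line.lstrip("> ").strip()
        if not line.startswith("re0: "):
            continue
        event = line[len("re0: "):]
        if event in RE0_EVENTS:
            counts[event] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample, not the full log from this thread.
    sample = """\
>> re0: link state changed to DOWN
>> re0: link state changed to UP
>> re0: PHY read failed
>> re0: PHY read failed
>> firewire0: bus manager 0
"""
    for event, n in tally_re0_events(sample).items():
        print(f"{n:4d}  {event}")
```

Running the same tally before and after a change (for example,
toggling rxcsum) gives a rough way to compare how often the link
flaps, under the assumption that the two runs cover similar traffic.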

I took a couple of captures with Wireshark on the client end: one with
rxcsum enabled on the machine running git-daemon, and one with rxcsum
disabled.

http://www.uplinklabs.net/~tycho/files/git-cap-norxcsum.bz2
http://www.uplinklabs.net/~tycho/files/git-cap.bz2

Obviously, you can look at the data yourself and make more sense of
it, but here are things I noticed in the captures:

With rxcsum:
- There are some silent problems that occur in the middle of the
capture. Client-to-server: 'TCP ACKed lost segment' a few times, then
'TCP previous segment lost'. This happens multiple times during the
capture (before 'git-upload-pack' starts sending data).
- Occasional 'TCP window update's. Wireshark highlights these in black
for reasons unknown to me; they seem like they would be normal.
- The server calls 'git-upload-pack' and we start seeing a large
number of client-to-server TCP RSTs. The connection then gets closed
because data corruption is detected in the transfer.

Without rxcsum:
- About the same number of client-to-server 'TCP ACKed lost segment's.
- 'git-upload-pack' kicks in and things get _really_ hairy. 'TCP Dup
ACK' detected by the client many many times.
- Finally, a series of 'TCP retransmission's from server to client
happen (which is where the connection hangs).
- I closed the connection, which triggered the final two 'TCP RST's.

Also, I forgot to note in my original report that I checked for packet
loss using a ping flood, and only one of the 1.5 million packets sent
was lost. But I'm not sure whether anything is checksumming the data
in these packets, so they could be coming back with perfectly valid
ICMP headers but corrupted data.
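On the checksumming question: per RFC 792, the ICMP checksum is
computed over the whole ICMP message, header and data, using the
Internet checksum of RFC 1071. A minimal sketch of that calculation
follows. The helper names (internet_checksum, build_echo_request,
verify) and the identifier, sequence, and payload values are
hypothetical, and this says nothing about which layer on a given host
actually verifies the field.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)  # covers the data too
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def verify(message: bytes) -> bool:
    """A message with a correct checksum field sums to zero."""
    return internet_checksum(message) == 0


if __name__ == "__main__":
    packet = build_echo_request(0x1234, 1, bytes(range(56)))
    damaged = bytearray(packet)
    damaged[-1] ^= 0x01  # flip one bit in the data, not the header
    print("intact :", verify(packet))
    print("damaged:", verify(bytes(damaged)))
```

A single flipped bit in the data is caught, but the checksum is weak:
swapping two 16-bit words, for instance, leaves it unchanged. A clean
ping flood therefore does not by itself rule out every kind of
corruption.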

- Steven


