Date:      Tue, 12 Jun 2007 10:19:49 -0400
From:      Bill Moran <wmoran@collaborativefusion.com>
To:        freebsd-net@freebsd.org
Subject:   Weird "ignoring syn" problem
Message-ID:  <20070612101949.646dcaa5.wmoran@collaborativefusion.com>

This one has got me pretty befuddled.

We're seeing some really odd behaviour with FreeBSD ignoring SYN packets.
I've been trying to diagnose this for a couple of weeks now, and my current
guess is that there's something wrong with the em driver.  Here's a summary
of what I've established and ruled out so far:
*) I've done my best to eliminate other network components as the problem.
   My theory at this point is that it can't possibly be any other network
   hardware, based on the tcpdump output shown below.
*) The problem occurred on both FreeBSD 6.1 and FreeBSD 6.2-p3.
*) The problem does not appear to be tied to CPU usage -- the CPU is nearly
   idle when the problem occurs.
*) I can now reproduce it pretty easily, so I'll know when it's fixed.
*) The system exhibiting the problem is running 15 jails, but they are
   idle 95% of the time.  The problem initially occurred inside one of
   the jails, but I just recreated it outside the jail (on the host) and
   it's _easier_ to reproduce outside the jail.
*) The problem occurred with both the GENERIC and the SMP kernel (this is a
   dual-CPU, hyperthreaded system).
*) I've tested, and the behavior occurs both with a dynamically generated
   file (from PHP) and with a static file.

The nature of the beast is that we've got a SOAP application running under
Apache and PHP.  This application is subject to many requests in rapid
succession, such that load can be simulated by the following loop:

while true; do fetch http://192.168.121.250/test.php; done
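
A minimal sketch, in Python rather than sh, of a variant of the loop above
that times only the TCP connect, so that a stalled handshake can be told
apart from a slow response.  The function names, the 2.5-second threshold
and the sample durations are assumptions made for illustration; none of them
come from the test setup itself.

```python
import socket
import time

def time_connect(host, port, timeout=60.0):
    """Return the seconds taken to complete a TCP handshake with host:port."""
    start = time.monotonic()
    # create_connection returns once the three-way handshake has finished
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start

def slow_connects(durations, threshold=2.5):
    """Return (index, seconds) for every connect at or above the threshold."""
    return [(i, d) for i, d in enumerate(durations) if d >= threshold]

# Hypothetical sample: attempts 2 and 4 stalled during the handshake.
sample = [0.0004, 0.0003, 3.0006, 0.0005, 6.2009]
print(slow_connects(sample))
```

Against the real server, the durations could be collected with something
like [time_connect("192.168.121.250", 80) for _ in range(1000)] and then
passed to slow_connects().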

The problem is that occasionally, the Apache server machine just ignores
SYN packets.  Take the following tcpdump output for example:

13:34:17.312296 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690201156 0,sackOK,eol>
13:34:20.312398 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690204156 0,sackOK,eol>
13:34:23.512626 IP web04-v100.cust00.pitbpa1.priv.collaborativefusion.com.54808 > anchor-is00.is.pitbpa1.priv.collaborativefusion.com.http: S 2645061726:2645061726(0) win 65535 <mss 1380,nop,wscale 1,nop,nop,timestamp 2690207356 0,sackOK,eol>
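
The spacing between the repeated SYNs in the capture above can be worked out
with a short script.  This is a sketch only: it assumes the tcpdump line
format shown here, ignores captures that cross midnight, and uses shortened
host names in the sample for brevity.  The timestamps and sequence number
are taken from the trace.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the timestamp, source, destination and initial sequence number
# of a SYN line in the tcpdump format shown above.
SYN_LINE = re.compile(r"^(\d\d:\d\d:\d\d\.\d{6}) IP (\S+) > (\S+): S (\d+):")

def syn_gaps(lines):
    """Map (src, dst, seq) to the gaps in seconds between repeated SYNs."""
    seen = defaultdict(list)
    for line in lines:
        m = SYN_LINE.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            seen[(m.group(2), m.group(3), m.group(4))].append(stamp)
    return {
        key: [round((b - a).total_seconds(), 6)
              for a, b in zip(stamps, stamps[1:])]
        for key, stamps in seen.items()
    }

# Host names shortened for the example (an assumption); times are real.
trace = [
    "13:34:17.312296 IP web04.54808 > anchor.http: S 2645061726:2645061726(0) win 65535",
    "13:34:20.312398 IP web04.54808 > anchor.http: S 2645061726:2645061726(0) win 65535",
    "13:34:23.512626 IP web04.54808 > anchor.http: S 2645061726:2645061726(0) win 65535",
]
for key, gaps in syn_gaps(trace).items():
    print(key, gaps)
```

For this trace the gaps come out at about 3.0 and 3.2 seconds, all for the
same source port and sequence number, which is what marks the second and
third packets as retransmissions of the first.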

This is the _only_ traffic on port 80 during the test.  It looks like the
kernel has ignored the initial SYN packet and two duplicates.  I've seen it
take as long as 45 seconds to establish a connection, and this causes
ugly performance problems, as well as frequent timeouts on the client end.
The only clue I've found so far is this output from netstat -s.

        153099 syncache entries added
                6184 retransmitted
                6491 dupsyn
                0 dropped
                150923 completed
                0 bucket overflow
                0 cache overflow
                235 reset
                1941 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
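
To put the counters above in proportion, here is a rough sketch that parses
the syncache block and does some simple arithmetic on it.  It only works
with the numbers as reported and makes no claim about what each counter
means inside the kernel; the function name is made up for the example, and
the zero-valued counters are left out of the sample for brevity.

```python
import re

def parse_syncache(text):
    """Parse the syncache block of `netstat -s` into {counter_name: value}."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters

stats = parse_syncache("""
        153099 syncache entries added
                6184 retransmitted
                6491 dupsyn
                150923 completed
                235 reset
                1941 stale
""")
added = stats["syncache entries added"]
for name in ("retransmitted", "dupsyn", "stale"):
    print(f"{name}: {100.0 * stats[name] / added:.2f}% of entries added")

# Sum of the completed, reset and stale counters, compared with "added".
accounted = stats["completed"] + stats["reset"] + stats["stale"]
print("accounted for:", accounted, "of", added)
```

On these figures, roughly 4% of the entries added were retransmitted, and
completed + reset + stale adds up to exactly the 153099 entries added.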

Unfortunately, I've been unable to determine how to fix the problem.  Any
advice is welcome.

Details (dmesg output from the affected host):
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE-p3 #2: Thu Jun  7 21:37:54 UTC 2007
    root@is00:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2992.71-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100000<NX,LM>
  Logical CPUs per core: 2
real memory  = 2147221504 (2047 MB)
avail memory = 2096107520 (1999 MB)
ACPI APIC Table: <DELL   PE BKC  >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic1: WARNING: intbase 32 != expected base 24
ioapic2: Changing APIC ID to 10
ioapic2: WARNING: intbase 64 != expected base 56
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
ioapic2 <Version 2.0> irqs 64-87 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <DELL PE BKC> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
amr0: <LSILogic MegaRAID 1.53> mem 0xd80f0000-0xd80fffff,0xdfde0000-0xdfdfffff irq 46 at device 14.0 on pci2
amr0: delete logical drives supported by controller
amr0: <LSILogic PERC 4e/Si> Firmware 521X, BIOS H430, 256MB RAM
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xecc0-0xecff mem 0xdfbe0000-0xdfbfffff irq 37 at device 11.0 on pci3
em0: Ethernet address: 00:04:23:c8:ff:f4
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xec80-0xecbf mem 0xdfbc0000-0xdfbdffff irq 38 at device 11.1 on pci3
em1: Ethernet address: 00:04:23:c8:ff:f5
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
em2: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xdcc0-0xdcff mem 0xdf8e0000-0xdf8fffff irq 64 at device 7.0 on pci6
em2: Ethernet address: 00:13:72:4f:71:23
pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci7: <ACPI PCI bus> on pcib7
em3: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xccc0-0xccff mem 0xdf6e0000-0xdf6fffff irq 65 at device 8.0 on pci7
em3: Ethernet address: 00:13:72:4f:71:24
pcib8: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci8: <ACPI PCI bus> on pcib8
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xace0-0xacff irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xacc0-0xacdf irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xaca0-0xacbf irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: <Intel 82801EB/R (ICH5) USB 2.0 controller> mem 0xdff00000-0xdff003ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: <Intel 82801EB/R (ICH5) USB 2.0 controller> on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
uhub4: vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2
uhub4: multiple transaction translators
uhub4: 2 ports with 2 removable, self powered
pcib9: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci9: <ACPI PCI bus> on pcib9
pci9: <unknown> at device 5.0 (no driver attached)
pci9: <unknown> at device 5.1 (no driver attached)
pci9: <unknown> at device 5.2 (no driver attached)
atapci0: <SiI 0680 UDMA133 controller> port 0xbcf0-0xbcf7,0xbce4-0xbce7,0xbcd8-0xbcdf,0xbcd0-0xbcd3,0xbc70-0xbc7f mem 0xdf3fec00-0xdf3fecff irq 23 at device 6.0 on pci9
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
pci9: <display, VGA> at device 13.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci1
ata1: <ATA channel 1> on atapci1
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xec000-0xeffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
kbd2 at ukbd0
ums0: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
Timecounters tick every 1.000 msec
acd0: CDROM <TEAC CD-ROM CD-224E-N/3.AB> at ata0-master UDMA33
device_attach: afd0 attach returned 6
acd1: CDROM <VIRTUALCDROM DRIVE/> at ata2-slave PIO3
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 34680MB (71024640 sectors) RAID 1 (optimal)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/amrd0s1a


-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023


