From owner-freebsd-stable@FreeBSD.ORG Thu Dec 3 04:50:32 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFBDA106566B for ; Thu, 3 Dec 2009 04:50:32 +0000 (UTC) (envelope-from kasahara@nc.kyushu-u.ac.jp) Received: from elvenbow.cc.kyushu-u.ac.jp (unknown [IPv6:2001:200:905:1314::80]) by mx1.freebsd.org (Postfix) with ESMTP id A54038FC12 for ; Thu, 3 Dec 2009 04:50:31 +0000 (UTC) Received: from localhost (kasahara@localhost [IPv6:::1]) by elvenbow.cc.kyushu-u.ac.jp (8.14.3/8.14.3) with ESMTP id nB34oTUc008552 for ; Thu, 3 Dec 2009 13:50:30 +0900 (JST) (envelope-from kasahara@nc.kyushu-u.ac.jp) Date: Thu, 03 Dec 2009 13:50:29 +0900 (JST) Message-Id: <20091203.135029.693112347142133127.kasahara@nc.kyushu-u.ac.jp> To: freebsd-stable@freebsd.org From: Yoshiaki Kasahara X-Fingerprint: CDA2 B6B6 6796 0DD3 9D80 2602 E909 4623 A15E A074 X-URL: http://www.nc.kyushu-u.ac.jp/~kasahara/ X-Mailer: Mew version 6.2.50 on Emacs 23.1.50 / Mule 6.0 (HANACHIRUSATO) Mime-Version: 1.0 Content-Type: Multipart/Mixed; boundary="--Next_Part(Thu_Dec__3_13_50_29_2009_711)--" Content-Transfer-Encoding: 7bit Subject: 8.0R TCP connection stall with large Send-Q X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Dec 2009 04:50:32 -0000 ----Next_Part(Thu_Dec__3_13_50_29_2009_711)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hello, Recently I updated my main PC from 7.2(7-STABLE around Oct 8th actually) to 8.0R (amd64). After that, I noticed that the Apache HTTP server (2.2.13) on the PC started to behave strangely. When I tried to see a page on the PC with a lot of thumbnails (more than 30 or 40 maybe), the page stopped loading and many thumbnail images were missing (not downloaded properly). It never happened until upgrade (of course I recompiled httpd and related libraries). After some investigation, I noticed that TCP connections stalled with large Send-Q. % netstat -n (just after a browser hit a page with many thumbnails) Active Internet connections Proto Recv-Q Send-Q Local Address Foreign Address (state) tcp6 0 14767 2001:200:905:131.80 2001:200:905:15f.1369 ESTABLISHED tcp6 0 13793 2001:200:905:131.80 2001:200:905:15f.1368 ESTABLISHED tcp6 0 13148 2001:200:905:131.80 2001:200:905:15f.1367 ESTABLISHED tcp6 0 16749 2001:200:905:131.80 2001:200:905:15f.1366 ESTABLISHED tcp6 0 15136 2001:200:905:131.80 2001:200:905:15f.1365 ESTABLISHED tcp6 0 17390 2001:200:905:131.80 2001:200:905:15f.1364 ESTABLISHED (omit other lines) Soon the state changed to FIN_WAIT_1, and never gone until I close the web browser. The example is IPv6, but it also happened with IPv4. It seems that the size of queues are almost the same as the size of thumbnail files. Also I cannot csup the source tree on this PC now, because the connection stall in the same way. Qpopper stalls when my mailbox contains some emails (I'm not sure about the exact threshold). Proto Recv-Q Send-Q Local Address Foreign Address (state) tcp4 0 897 133.5.7.92.110 202.228.xxx.xxx.38167 ESTABLISHED tcp4 0 18053 133.5.6.4.54003 59.106.xxx.xxx.5999 ESTABLISHED On the other hand, I can SCP dozens of photos to the PC without problems, and downloading a single large file from the PC via HTTP is also fine. I copied some HTML pages with many thumbnails to other PC running 8.0-RC2 (i386), but I couldn't reproduce the problem. The former PC uses vge and latter uses rl, so it might be a NIC driver issue?? I'll try to replace NIC cards when I have time.... vge0: flags=8843 metric 0 mtu 1500 options=1b ether 00:02:2a:dd:05:41 inet 133.5.6.4 netmask 0xffffff00 broadcast 133.5.6.255 inet6 fe80::202:2aff:fedd:541%vge0 prefixlen 64 scopeid 0x1 media: Ethernet autoselect (100baseTX ) status: active vge1: flags=8843 metric 0 mtu 1500 options=1b ether 00:02:2a:dd:0c:31 inet 133.5.7.92 netmask 0xffffff00 broadcast 133.5.7.255 inet6 fe80::202:2aff:fedd:c31%vge1 prefixlen 64 scopeid 0x2 inet6 2001:200:905:1407:202:2aff:fedd:c31 prefixlen 64 autoconf media: Ethernet autoselect (100baseTX ) status: active vge2: flags=8843 metric 0 mtu 1500 options=1b ether 00:02:2a:dd:00:7d inet 133.69.133.98 netmask 0xfffffff0 broadcast 133.69.133.111 inet6 fe80::202:2aff:fedd:7d%vge2 prefixlen 64 scopeid 0x3 inet6 2001:200:905:1314::80 prefixlen 64 inet6 2001:200:905:1314:202:2aff:fedd:7d prefixlen 64 autoconf media: Ethernet autoselect (100baseTX ) status: active vge0@pci0:6:1:0: class=0x020000 card=0x01101106 chip=0x31191106 rev=0x11 hdr=0x00 vendor = 'VIA Technologies Inc' device = ''Velocity' Gigabit Ethernet Controllers (VT6120/VT6121/VT6122)' class = network subclass = ethernet vge1@pci0:6:2:0: class=0x020000 card=0x01101106 chip=0x31191106 rev=0x11 hdr=0x00 vendor = 'VIA Technologies Inc' device = ''Velocity' Gigabit Ethernet Controllers (VT6120/VT6121/VT6122)' class = network subclass = ethernet vge2@pci0:6:3:0: class=0x020000 card=0x01101106 chip=0x31191106 rev=0x11 hdr=0x00 vendor = 'VIA Technologies Inc' device = ''Velocity' Gigabit Ethernet Controllers (VT6120/VT6121/VT6122)' class = network subclass = ethernet % netstat -ni Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll vge0 1500 00:02:2a:dd:05:41 2124119 0 2158388 3 0 vge0 1500 133.5.6.0/24 133.5.6.4 1958844 - 2157884 - - vge0 1500 fe80:1::202:2 fe80:1::202:2aff: 0 - 1 - - vge1 1500 00:02:2a:dd:0c:31 1005409 0 32166 0 0 vge1 1500 133.5.7.0/24 133.5.7.92 431605 - 11761 - - vge1 1500 fe80:2::202:2 fe80:2::202:2aff: 0 - 1452 - - vge1 1500 2001:200:905: 2001:200:905:1407 0 - 16518 - - vge2 1500 00:02:2a:dd:00:7d 563715 0 855146 2 0 vge2 1500 133.69.133.96 133.69.133.98 445915 - 3658 - - vge2 1500 fe80:3::202:2 fe80:3::202:2aff: 0 - 1875 - - vge2 1500 2001:200:905: 2001:200:905:1314 0 - 65850 - - vge2 1500 2001:200:905: 2001:200:905:1314 0 - 0 - - I believe these Oerrs were due to watchdog timeout like this: Dec 2 13:12:04 elvenbow kernel: vge2: watchdog timeout Dec 2 13:12:04 elvenbow kernel: vge2: link state changed to DOWN Dec 2 13:12:06 elvenbow kernel: vge2: link state changed to UP Dec 3 02:50:27 elvenbow kernel: vge0: watchdog timeout Dec 3 02:50:27 elvenbow kernel: vge0: link state changed to DOWN Dec 3 02:50:30 elvenbow kernel: vge0: link state changed to UP Such timeouts were observed before upgrade, and no noticable effect to the network connection so far. I tried ifconfig vge2 -rxcsum -txcsum but had no effect. I'm using pf, but disabling pf also had no effect. Any ideas to narrow down the cause? Please let me know if you need additional information. I attached dmesg.boot. I'm sorry it is not a verbose output because still I'm compiling all the installed ports.... Regards, -- Yoshiaki Kasahara Research Institute for Information Technology, Kyushu University kasahara@nc.kyushu-u.ac.jp ----Next_Part(Thu_Dec__3_13_50_29_2009_711)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="dmesg.boot" Copyright (c) 1992-2009 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 8.0-RELEASE #0: Thu Nov 26 14:33:01 JST 2009 root@elvenbow.cc.kyushu-u.ac.jp:/usr/obj/usr/src/sys/GENERIC Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (3000.10-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0xf43 Stepping = 3 Features=0xbfebfbff Features2=0x649d AMD Features=0x20100800 TSC: P-state invariant real memory = 2147483648 (2048 MB) avail memory = 2053185536 (1958 MB) ACPI APIC Table: WARNING: Non-uniform processors. WARNING: Using suboptimal topology. ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 7ff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: at device 1.0 on pci0 pci1: on pcib1 vgapci0: mem 0xf4000000-0xf7ffffff,0xd0000000-0xd7ffffff,0xf8000000-0xf8ffffff irq 16 at device 0.0 on pci1 hdac0: mem 0xf3afc000-0xf3afffff irq 16 at device 27.0 on pci0 hdac0: HDA Driver Revision: 20090624_0136 hdac0: [ITHREAD] pcib2: at device 28.0 on pci0 pci5: on pcib2 pcib3: at device 28.1 on pci0 pci4: on pcib3 pcib4: at device 28.2 on pci0 pci3: on pcib4 pcib5: at device 28.3 on pci0 pci2: on pcib5 uhci0: port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0 uhci0: [ITHREAD] uhci0: LegSup = 0x0f30 usbus0: on uhci0 uhci1: port 0xd000-0xd01f irq 19 at device 29.1 on pci0 uhci1: [ITHREAD] uhci1: LegSup = 0x0f30 usbus1: on uhci1 uhci2: port 0xd400-0xd41f irq 18 at device 29.2 on pci0 uhci2: [ITHREAD] uhci2: LegSup = 0x0f30 usbus2: on uhci2 uhci3: port 0xd800-0xd81f irq 16 at device 29.3 on pci0 uhci3: [ITHREAD] uhci3: LegSup = 0x0f30 usbus3: on uhci3 ehci0: mem 0xf3afb800-0xf3afbbff irq 23 at device 29.7 on pci0 ehci0: [ITHREAD] usbus4: EHCI version 1.0 usbus4: on ehci0 pcib6: at device 30.0 on pci0 pci6: on pcib6 vge0: port 0xb800-0xb8ff mem 0xf3b00000-0xf3b000ff irq 22 at device 1.0 on pci6 miibus0: on vge0 ciphy0: PHY 1 on miibus0 ciphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto vge0: WARNING: using obsoleted if_watchdog interface vge0: Ethernet address: 00:02:2a:dd:05:41 vge0: [ITHREAD] vge1: port 0xb400-0xb4ff mem 0xf3b00400-0xf3b004ff irq 18 at device 2.0 on pci6 miibus1: on vge1 ciphy1: PHY 1 on miibus1 ciphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto vge1: WARNING: using obsoleted if_watchdog interface vge1: Ethernet address: 00:02:2a:dd:0c:31 vge1: [ITHREAD] vge2: port 0xb000-0xb0ff mem 0xf3b00800-0xf3b008ff irq 19 at device 3.0 on pci6 miibus2: on vge2 ciphy2: PHY 1 on miibus2 ciphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto vge2: WARNING: using obsoleted if_watchdog interface vge2: Ethernet address: 00:02:2a:dd:00:7d vge2: [ITHREAD] isab0: at device 31.0 on pci0 isa0: on isab0 atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 31.1 on pci0 ata0: on atapci0 ata0: [ITHREAD] ata1: on atapci0 ata1: [ITHREAD] atapci1: port 0xec00-0xec07,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc0f mem 0xf3afbc00-0xf3afbfff irq 19 at device 31.2 on pci0 atapci1: [ITHREAD] atapci1: AHCI called from vendor specific driver atapci1: AHCI v1.00 controller with 4 1.5Gbps ports, PM not supported ata2: on atapci1 ata2: [ITHREAD] ata3: on atapci1 ata3: [ITHREAD] ata4: on atapci1 ata4: [ITHREAD] ata5: on atapci1 ata5: [ITHREAD] pci0: at device 31.3 (no driver attached) acpi_button0: on acpi0 atrtc0: port 0x70-0x71 irq 8 on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] fdc0: port 0x3f0-0x3f1,0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart0: [FILTER] ppc0: port 0x378-0x37f irq 7 on acpi0 ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode ppc0: [ITHREAD] ppbus0: on ppc0 plip0: on ppbus0 plip0: [ITHREAD] lpt0: on ppbus0 lpt0: [ITHREAD] lpt0: Interrupt-driven port ppi0: on ppbus0 cpu0: on acpi0 est0: on cpu0 est: CPU supports Enhanced Speedstep, but is not recognized. est: cpu_vendor GenuineIntel, msr f2d00000f2d device_attach: est0 attach returned 6 p4tcc0: on cpu0 orm0: at iomem 0xc0000-0xcefff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounter "TSC" frequency 3000099607 Hz quality 800 Timecounters tick every 1.000 msec usbus0: 12Mbps Full Speed USB v1.0 usbus1: 12Mbps Full Speed USB v1.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 12Mbps Full Speed USB v1.0 usbus4: 480Mbps High Speed USB v2.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 ugen4.1: at usbus4 uhub4: on usbus4 acd0: DVDR at ata0-master UDMA33 ad4: 239372MB at ata2-master SATA150 ad6: 305245MB at ata3-master SATA150 hdac0: HDA Codec #2: Realtek ALC880 pcm0: at cad 2 nid 1 on hdac0 pcm1: at cad 2 nid 1 on hdac0 pcm2: at cad 2 nid 1 on hdac0 uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered uhub2: 2 ports with 2 removable, self powered uhub3: 2 ports with 2 removable, self powered GEOM: ad4s1: geometry does not match label (255h,63s != 16h,63s). GEOM: ad4s1: media size does not match label. GEOM: ad6s1: geometry does not match label (255h,63s != 16h,63s). GEOM: ad6s1: media size does not match label. GEOM_MIRROR: Device mirror/gm0s1 launched (2/2). GEOM: ad6s2: geometry does not match label (255h,63s != 16h,63s). Root mount waiting for: usbus4 Root mount waiting for: usbus4 Root mount waiting for: usbus4 uhub4: 8 ports with 8 removable, self powered Trying to mount root from ufs:/dev/mirror/gm0s1a acd0: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 sks=0x40 0x00 0x01 (probe0:ata0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 (probe0:ata0:0:0:0): CAM Status: SCSI Status Error (probe0:ata0:0:0:0): SCSI Status: Check Condition (probe0:ata0:0:0:0): NOT READY csi:0,0,bb,0 asc:3a,0 (probe0:ata0:0:0:0): Medium not present (probe0:ata0:0:0:0): Unretryable error cd0 at ata0 bus 0 target 0 lun 0 cd0: Removable CD-ROM SCSI-0 device cd0: 33.000MB/s transfers cd0: Attempt to query device size failed: NOT READY, Medium not present ugen0.2: at usbus0 ums0: on usbus0 ums0: 5 buttons and [XYZ] coordinates ID=0 ----Next_Part(Thu_Dec__3_13_50_29_2009_711)----