From owner-freebsd-stable@FreeBSD.ORG Sat Nov 13 08:29:31 2004
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id CDF8916A4CE
    for ; Sat, 13 Nov 2004 08:29:31 +0000 (GMT)
Received: from mail.frombach.com (frombach.com [208.179.193.6])
    by mx1.FreeBSD.org (Postfix) with SMTP id 1D38043D4C
    for ; Sat, 13 Nov 2004 08:29:30 +0000 (GMT)
    (envelope-from zoltan@frombach.com)
Received: (qmail 93380 invoked by uid 0); 13 Nov 2004 08:29:37 -0000
Received: from 24.24.201.219 by www.frombach.com (envelope-from , uid 0)
    with qmail-scanner-1.24 (clamscan: 0.80. spamassassin: 3.0.1.
    Clear:RC:0(24.24.201.219):SA:0(1.0/8.0):.
    Processed in 3.274257 secs); 13 Nov 2004 08:29:37 -0000
X-Spam-Status: No, hits=1.0 required=8.0
X-Spam-Level: +
Received: from unknown (HELO p4) (zoltan@frombach.com@24.24.201.219)
    by frombach.com with SMTP; 13 Nov 2004 08:29:34 -0000
Message-ID: <000401c4c95a$e6287ff0$e001a8c0@p4>
From: "Zoltan Frombach"
To:
Date: Sat, 13 Nov 2004 00:29:31 -0800
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
cc: freebsd-stable@freebsd.org
Subject: Re: sshd stops accepting connections
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 13 Nov 2004 08:29:31 -0000

> Today I suddenly couldn't log in via ssh to a server I upgraded to
> FreeBSD 5.3-RELEASE 4 days ago. When I tried to connect to port 22 using
> telnet(1) the following just happened:
>
> [simon at zaphod:~] telnet 192.168.3.2 22
> Trying 192.168.3.2...
> Connected to jet.nitro.dk.
> Escape character is '^]'.
> Connection closed by foreign host.
>
> The server had been running FreeBSD 5.2.1 for a while without
> problems.
...

I had the exact same problem yesterday! I installed FreeBSD 5.3-RELEASE about a week ago, and on the night of Nov. 11th I noticed that sshd2 had stopped accepting connections: it closed every connection immediately. Everything else on the server seemed to work just fine. I use standard Unix authentication, nothing fancy at all, and I installed SSH2 from ports. I had to call the colo center and ask them to reset my server. After it rebooted, SSH2 started to work again. Examining the log files, I noticed the following lines:

Nov 11 13:45:10 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2928095
Nov 11 13:49:52 www kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Nov 11 13:49:54 www kernel: Limiting closed port RST response from 212 to 200 packets/sec
Nov 11 13:49:55 www kernel: Limiting closed port RST response from 226 to 200 packets/sec
Nov 11 13:49:58 www kernel: Limiting closed port RST response from 223 to 200 packets/sec
Nov 11 13:50:00 www kernel: Limiting closed port RST response from 225 to 200 packets/sec
Nov 11 13:50:01 www kernel: Limiting closed port RST response from 224 to 200 packets/sec
Nov 11 13:50:03 www kernel: Limiting closed port RST response from 226 to 200 packets/sec
Nov 11 13:50:04 www kernel: Limiting closed port RST response from 223 to 200 packets/sec
Nov 11 13:50:07 www kernel: Limiting closed port RST response from 226 to 200 packets/sec
Nov 11 13:50:08 www kernel: Limiting closed port RST response from 223 to 200 packets/sec
Nov 11 13:50:10 www kernel: Limiting closed port RST response from 225 to 200 packets/sec
Nov 11 13:50:11 www kernel: Limiting closed port RST response from 224 to 200 packets/sec
Nov 11 13:50:13 www kernel: Limiting closed port RST response from 226 to 200 packets/sec
Nov 11 13:50:14 www kernel: Limiting closed port RST response from 233 to 200 packets/sec
Nov 11 13:50:17 www kernel: Limiting closed port RST response from 216 to 200 packets/sec
Nov 11 13:50:18 www kernel: Limiting closed port RST response from 223 to 200 packets/sec
Nov 11 13:50:20 www kernel: Limiting closed port RST response from 215 to 200 packets/sec
Nov 11 13:50:21 www kernel: Limiting closed port RST response from 233 to 200 packets/sec
Nov 11 13:50:23 www kernel: Limiting closed port RST response from 225 to 200 packets/sec
Nov 11 13:50:25 www kernel: Limiting closed port RST response from 211 to 200 packets/sec
Nov 11 13:50:27 www kernel: Limiting closed port RST response from 225 to 200 packets/sec
Nov 11 13:50:29 www kernel: Limiting closed port RST response from 225 to 200 packets/sec
Nov 11 13:50:31 www kernel: Limiting closed port RST response from 211 to 200 packets/sec
Nov 11 13:50:33 www kernel: Limiting closed port RST response from 224 to 200 packets/sec
Nov 11 13:50:35 www kernel: Limiting closed port RST response from 205 to 200 packets/sec
Nov 11 13:50:37 www kernel: Limiting closed port RST response from 224 to 200 packets/sec
Nov 11 13:50:51 www last message repeated 4 times
Nov 11 13:50:54 www kernel: Limiting closed port RST response from 222 to 200 packets/sec
Nov 11 13:50:58 www kernel: Limiting closed port RST response from 216 to 200 packets/sec
Nov 11 13:51:00 www kernel: Limiting closed port RST response from 208 to 200 packets/sec

Because of the maxproc message, I then compiled a new kernel with maxusers set to 1024. (I had used the GENERIC kernel up to this point.) Since I was building a new kernel anyway, I commented out some drivers that I don't use, like some SCSI devices and some ISA network interfaces. The new kernel seems to work great. However, today (Friday) I had another weird encounter. This afternoon, for several minutes, I was unable to connect to the server at all: all TCP connections appeared to hang indefinitely!
But ping worked and was as fast as always. I kept trying to get in via SSH2, and finally I was able to log in (it took about 2 minutes to get the login prompt, while ping times were normal). After switching to root with su, I ran top to see what was going on, but I never got any output. The system was apparently so busy with something that top could not run. I had to force-close that connection. For several minutes I tried to log in again via SSH2; at this point I just wanted to issue a reboot command. When I was about to give up, suddenly, after about 5 minutes, the login prompt appeared and I was able to log in. Since then EVERYTHING has been working fine. I didn't even have to reboot, and the server is still running fine! I saw only these lines in the log file:

Nov 12 16:14:27 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2416335
Nov 12 16:35:51 www kernel: Limiting icmp unreach response from 276 to 200 packets/sec

It seems to me that shortly after the WRITE_DMA warning (about 4 to 20 minutes later) all resources (processes, I guess) were consumed. The first time, this somehow caused sshd2 to stop accepting new connections. By the second time, I had greatly increased the maxproc limit by setting maxusers to 1024 in the kernel, so nothing really failed, but about 20 minutes after the WRITE_DMA warning the system became very unresponsive for at least 5 minutes. And then it just cured itself.

I am very curious what is causing the WRITE_DMA warning... I'm willing to install any patches to track this down. Can anyone provide me with some patches?

Zoltan

PS: Some info about my system:

uname -a
FreeBSD www.xxxxxxxx.com 5.3-RELEASE FreeBSD 5.3-RELEASE #0: Fri Nov 12 01:07:41 PST 2004 xxx@www.xxxxxxxx.com:/usr/obj/usr/src/sys/XXXXXXXX i386

dmesg
Waiting (max 60 seconds) for system process `hpt_wt' to stop...done
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE #0: Fri Nov 12 01:07:41 PST 2004
    tss@www.frombach.com:/usr/obj/usr/src/sys/FROMBACH
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2806.38-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff
  Hyperthreading: 2 logical CPUs
real memory  = 1056899072 (1007 MB)
avail memory = 1023688704 (976 MB)
ACPI APIC Table:
ioapic0: Changing APIC ID to 2
ioapic0 irqs 0-23 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: on acpi0
acpi_tz0: on acpi0
acpi_button0: on acpi0
acpi_button1: on acpi0
pcib0: port 0x10e0-0x10ff,0x1000-0x10df,0x480-0x48f,0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xd0000000-0xd7ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
isab0: at device 2.0 on pci0
isa0: on isab0
atapci0: port 0x4000-0x400f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 2.5 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
ohci0: mem 0xe1104000-0xe1104fff irq 20 at device 3.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: mem 0xe1100000-0xe1100fff irq 21 at device 3.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: on ohci1
usb1: USB revision 1.0
uhub1: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ohci2: mem 0xe1101000-0xe1101fff irq 22 at device 3.2 on pci0
ohci2: [GIANT-LOCKED]
usb2: OHCI version 1.0, legacy support
usb2: on ohci2
usb2: USB revision 1.0
uhub2: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: at device 3.3 (no driver attached)
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe000-0xe07f mem 0xe1103000-0xe110307f irq 17 at device 9.0 on pci0
miibus0: on xl0
bmtphy0: <3c905B 10/100 internal PHY> on miibus0
bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:50:04:76:49:e7
fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
orm0: at iomem 0xc0000-0xcbfff on isa0
pmtimer0 on isa0
atkbdc0: at port 0x64,0x60 on isa0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: parallel port not found.
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2806375656 Hz quality 800
Timecounters tick every 10.000 msec
ad0: 78167MB [158816/16/63] at ata0-master UDMA133
acd0: CDROM at ata1-master UDMA33
Mounting root from ufs:/dev/ad0s1a
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled

my kernel config:

# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.413.2.6.2.2 2004/10/24 18:02:52 scottl Exp $

machine i386
#cpu I486_CPU
#cpu I586_CPU
cpu I686_CPU
ident XXXXXXXX
maxusers 1024

options PMAP_SHPGPERPROC=400
options KVA_PAGES=384

# To statically compile in device wiring instead of /boot/device.hints
#hints "GENERIC.hints" # Default places to look for devices.

options SCHED_4BSD # 4BSD scheduler
options INET # InterNETworking
#options INET6 # IPv6 communications protocols
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options MD_ROOT # MD is a potential root device
#options NFSCLIENT # Network Filesystem Client
#options NFSSERVER # Network Filesystem Server
#options NFS_ROOT # NFS usable as /, requires NFSCLIENT
#options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_GPT # GUID Partition Tables.
options COMPAT_43 # Compatible with BSD 4.3 [KEEP THIS!]
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options SCSI_DELAY=15000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options AHC_REG_PRETTY_PRINT # Print register bitfields in debug
                             # output. Adds ~128k to driver.
options AHD_REG_PRETTY_PRINT # Print register bitfields in debug
                             # output. Adds ~215k to driver.
options ADAPTIVE_GIANT # Giant mutex is adaptive.

device apic # I/O APIC

# Bus support. Do not remove isa, even if you have no isa slots
device isa
#device eisa
device pci

# Floppy drives
device fdc

# ATA and ATAPI devices
device ata
device atadisk # ATA disk drives
#device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
#device atapifd # ATAPI floppy drives
#device atapist # ATAPI tape drives
options ATA_STATIC_ID # Static device numbering

# SCSI Controllers
#device ahb # EISA AHA1742 family
#device ahc # AHA2940 and onboard AIC7xxx devices
#device ahd # AHA39320/29320 and onboard AIC79xx devices
#device amd # AMD 53C974 (Tekram DC-390(T))
#device isp # Qlogic family
#device mpt # LSI-Logic MPT-Fusion
#device ncr # NCR/Symbios Logic
#device sym # NCR/Symbios Logic (newer chipsets + those of `ncr')
#device trm # Tekram DC395U/UW/F DC315U adapters

#device adv # Advansys SCSI adapters
#device adw # Advansys wide SCSI adapters
#device aha # Adaptec 154x SCSI adapters
#device aic # Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
#device bt # Buslogic/Mylex MultiMaster SCSI adapters

#device ncv # NCR 53C500
#device nsp # Workbit Ninja SCSI-3
#device stg # TMC 18C30/18C50

# SCSI peripherals
device scbus # SCSI bus (required for SCSI)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)

# RAID controllers interfaced to the SCSI subsystem
#device amr # AMI MegaRAID
#device asr # DPT SmartRAID V, VI and Adaptec SCSI RAID
#device ciss # Compaq Smart RAID 5*
#device dpt # DPT Smartcache III, IV - See NOTES for options
#device hptmv # Highpoint RocketRAID 182x
#device iir # Intel Integrated RAID
#device ips # IBM (Adaptec) ServeRAID
#device mly # Mylex AcceleRAID/eXtremeRAID
#device twa # 3ware 9000 series PATA/SATA RAID

# RAID controllers
#device aac # Adaptec FSA RAID
#device aacp # SCSI passthrough for aac (requires CAM)
#device ida # Compaq Smart RAID
#device mlx # Mylex DAC960 family
#device pst # Promise Supertrak SX6000
#device twe # 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse

device vga # VGA video card driver

#device splash # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device sc

# Enable this for the pcvt (VT220 compatible) console driver
#device vt
#options XSERVER # support for X server on a vt console
#options FAT_CURSOR # start with block cursor

device agp # support several AGP chipsets

# Floating point support - do not disable.
device npx

# Power management support (see NOTES for more options)
#device apm
# Add suspend/resume support for the i8254.
device pmtimer

# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
#device cbb # cardbus (yenta) bridge
#device pccard # PC Card (16-bit) bus
#device cardbus # CardBus (32-bit) bus

# Serial (COM) ports
device sio # 8250, 16[45]50 based serial ports

# Parallel port
device ppc
device ppbus # Parallel port bus (required)
#device lpt # Printer
#device plip # TCP/IP over parallel
device ppi # Parallel port interface device
#device vpo # Requires scbus and da

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to the sio and/or ppc drivers):
#device puc

# PCI Ethernet NICs.
device de # DEC/Intel DC21x4x (``Tulip'')
device em # Intel PRO/1000 adapter Gigabit Ethernet Card
device ixgb # Intel PRO/10GbE Ethernet Card
device txp # 3Com 3cR990 (``Typhoon'')
device vx # 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device miibus # MII bus support
device bfe # Broadcom BCM440x 10/100 Ethernet
device bge # Broadcom BCM570xx Gigabit Ethernet
device dc # DEC/Intel 21143 and various workalikes
device fxp # Intel EtherExpress PRO/100B (82557, 82558)
device lge # Level 1 LXT1001 gigabit ethernet
device nge # NatSemi DP83820 gigabit ethernet
device pcn # AMD Am79C97x PCI 10/100 (precedence over 'lnc')
device re # RealTek 8139C+/8169/8169S/8110S
device rl # RealTek 8129/8139
device sf # Adaptec AIC-6915 (``Starfire'')
device sis # Silicon Integrated Systems SiS 900/SiS 7016
device sk # SysKonnect SK-984x & SK-982x gigabit Ethernet
device ste # Sundance ST201 (D-Link DFE-550TX)
device ti # Alteon Networks Tigon I/II gigabit Ethernet
device tl # Texas Instruments ThunderLAN
device tx # SMC EtherPower II (83c170 ``EPIC'')
device vge # VIA VT612x gigabit ethernet
device vr # VIA Rhine, Rhine II
device wb # Winbond W89C840F
device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'')

# ISA Ethernet NICs. pccard NICs included.
#device cs # Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
#device ed # NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device ex # Intel EtherExpress Pro/10 and Pro/10+
#device ep # Etherlink III based cards
#device fe # Fujitsu MB8696x based cards
#device ie # EtherExpress 8/16, 3C507, StarLAN 10 etc.
#device lnc # NE2100, NE32-VL Lance Ethernet cards
#device sn # SMC's 9000 series of Ethernet chips
#device xe # Xircom pccard Ethernet

# ISA devices that use the old ISA shims
#device le

# Wireless NIC cards
#device wlan # 802.11 support
#device an # Aironet 4500/4800 802.11 wireless NICs.
#device awi # BayStack 660 and others
#device wi # WaveLAN/Intersil/Symbol 802.11 wireless NICs.
#device wl # Older non 802.11 Wavelan wireless NIC.

# Pseudo devices.
device loop # Network loopback
device mem # Memory and kernel memory devices
device io # I/O device
device random # Entropy device
device ether # Ethernet support
device sl # Kernel SLIP
device ppp # Kernel PPP
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
#device faith # IPv6-to-IPv4 relaying (translation)

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
device bpf # Berkeley packet filter

# USB support
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device usb # USB Bus (required)
#device udbp # USB Double Bulk Pipe devices
device ugen # Generic
device uhid # "Human Interface Devices"
device ukbd # Keyboard
#device ulpt # Printer
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse
#device urio # Diamond Rio 500 MP3 player
#device uscanner # Scanners
# USB Ethernet, requires mii
#device aue # ADMtek USB Ethernet
#device axe # ASIX Electronics USB Ethernet
#device cue # CATC USB Ethernet
#device kue # Kawasaki LSI USB Ethernet
#device rue # RealTek RTL8150 USB Ethernet

# FireWire support
#device firewire # FireWire bus code
#device sbp # SCSI over FireWire (Requires scbus and da)
#device fwe # Ethernet over FireWire (non-standard!)

my make.conf file:

CPUTYPE?=p4
#NO_CPU_CFLAGS= true # Don't add -march= to CFLAGS automatically
#NO_CPU_COPTFLAGS=true # Don't add -march= to COPTFLAGS automatically
#COPTFLAGS= -O -pipe # Yes, this line is commented out!
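
The "4 to 20 minutes" delay between each WRITE_DMA warning and the next kernel message can be checked from the log timestamps quoted in this message. What follows is only a minimal Python sketch under stated assumptions: the four log lines are copied from the excerpts above, the year 2004 is assumed because syslog lines carry no year, and the function names (parse_time, gaps_after_write_dma) are illustrative, not part of any FreeBSD tool. It also assumes an English/C locale so that "Nov" parses as a month name.

```python
from datetime import datetime

# Log lines copied from the excerpts in this message.
# Syslog timestamps carry no year, so 2004 is an assumption.
LOG = """\
Nov 11 13:45:10 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2928095
Nov 11 13:49:52 www kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Nov 12 16:14:27 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2416335
Nov 12 16:35:51 www kernel: Limiting icmp unreach response from 276 to 200 packets/sec
"""


def parse_time(line, year=2004):
    # The first 15 characters hold the timestamp, e.g. "Nov 11 13:45:10".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def gaps_after_write_dma(log_text):
    """Seconds from each WRITE_DMA warning to the next non-WRITE_DMA line."""
    gaps = []
    pending = None  # timestamp of the most recent unmatched WRITE_DMA warning
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if "WRITE_DMA" in line:
            pending = parse_time(line)
        elif pending is not None:
            gaps.append(int((parse_time(line) - pending).total_seconds()))
            pending = None
    return gaps


if __name__ == "__main__":
    for seconds in gaps_after_write_dma(LOG):
        print(f"{seconds // 60} min {seconds % 60} s")
```

For the two incidents quoted above this gives gaps of 282 and 1284 seconds, that is 4 min 42 s and 21 min 24 s. A correlation in two samples does not by itself show that the disk timeout caused the process exhaustion.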