From: Len Sassaman
Date: Wed, 19 Nov 2003 21:26:10 -0800
To: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org
Subject: Help request: problems with a 5.1 server and large numbers of ssh users.

Hi folks,

I have a problem, and I am unable to find previous discussions of it. Any pointers or clues would be much appreciated.

I have a FreeBSD 5.1 server that needs to be able to handle several thousand simultaneous ssh sessions from distinct users. (I am using FreeBSD 5.1 because I need to be able to support ldap authentication.)

Hardware info:

CPU: AMD Athlon(tm) MP 2000+ (1666.74-MHz 686-class CPU)
real memory  = 1610088448 (1535 MB)
avail memory = 1558822912 (1486 MB)

My version of ssh is 3.6.1p2 patched to address the security concerns. (I am not using 3.7.1p because it dropped support for password authentication with PAM, and I cannot assume keyboard-interactive authentication will be present in my users' clients.)
All of these users are doing ssh port forwarding, and are not assigned ptys.

I have not modified login.conf in any way -- the defaults of "no limits" remain.

The kernel tunables in /boot/loader.conf are set to:

kern.maxfiles="49312"
kern.maxproc="24656"
kern.maxprocperuid="11094"
kern.ipc.maxsockets="24656"
kern.ipc.somaxconn="8192"

The kernel is compiled with NMBCLUSTERS=65536 and maxusers=0 (which defaults to 384).

The problem is that after about 150 users log in (300ish sshd sessions, since I am using privsep), incoming connections start getting dropped. i.e.,

bash-2.05b# ssh -v localhost
OpenSSH_3.6.1p2, SSH protocols 1.5/2.0, OpenSSL 0x0090701f
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Rhosts Authentication disabled, originating port will not be trusted.
debug1: Connecting to localhost [::1] port 22.
socket: Protocol not supported
debug1: Connecting to localhost [127.0.0.1] port 22.
debug1: Connection established.
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_dsa type -1
ssh_exchange_identification: Connection closed by remote host
debug1: Calling cleanup 0x805f010(0x0)
bash-2.05b#

It is my intuition from this behavior that the sshd master process listening for connections is unable to spawn a new process to complete the authentication step, and thus the connection is being dropped.

There is no information of use in dmesg, nor in the system logs. (I've cranked up LogLevel to DEBUG3 in sshd_config).

I have a RedHat Linux server running the 2.4.18-3smp kernel on a dual Athlon MP 1800+ and 2048MB RAM that is known to handle 1000 users without issue -- so I have to believe the FreeBSD box, though not as beefy hardware-wise, should be able to do better than a few hundred users.

I believe this to be some sort of resource limit issue, but I have addressed everything I could think of.

Here's the sysctl vm.zone output:

vm.zone:
ITEM           SIZE    LIMIT    USED   FREE  REQUESTS

FFS2 dinode:    256,       0,   1089,    21,     1359
FFS1 dinode:    128,       0,      0,     0,        0
FFS inode:      144,       0,   1089,    59,     1359
SWAPMETA:       276,  121576,      0,     0,        0
unpcb:          140,   65548,    329,    63,    31364
ripcb:          228,   49317,      0,     0,        0
syncache:       136,   15370,      0,    58,    36747
tcptw:           48,   49385,   3812,   255,    89831
tcpcb:          360,   49313,   1048,    63,   195072
inpcb:          228,   49317,   4921,    94,   195072
udpcb:          228,   49317,      1,    33,   114497
socket:         256,   49320,   1383,   102,   340934
KNOTE:           64,       0,      0,   124,   114453
PIPE:           176,       0,    622,    68,    17402
DIRHASH:       1024,       0,    138,     6,      138
NAMEI:         1024,       0,      9,    11,   451791
VNODEPOLL:       76,       0,      0,     0,        0
VNODE:          292,       0,   1473,    35,     1473
g_bio:          144,       0,    259,    49,   186276
VMSPACE:        256,       0,    424,    26,    11035
UPCALL:          44,       0,      0,     0,        0
KSE:             64,       0,    496,    62,      496
KSEGRP:         120,       0,    496,    62,      496
THREAD:         292,       0,    496,    11,      496
PROC:           480,       0,    461,    35,    11074
Files:           60,       0,   6051,   153, 89241268
65536:        65536,       0,      3,     3,        3
32768:        32768,       0,      3,     3,       32
16384:        16384,       0,     56,    22,     1733
8192:          8192,       0,      2,     4,       50
4096:          4096,       0,    736,    44,    11965
2048:          2048,       0,     71,     5,   359215
1024:          1024,       0,    408,    20,   284756
512:            512,       0,    102,    18,    43908
256:            256,       0,   5166,    84,   131327
128:            128,       0,   6784,   253,   535182
64:              64,       0,   3032,    68,    87489
32:              32,       0,   2155,   182,   211243
16:              16,       0,   4485,   295,   844397
DP fakepg:       72,       0,      0,     0,        0
PV ENTRY:        28, 5324340, 347293, 45827,  7291251
MAP ENTRY:       60,       0,  27235,    89,   560399
KMAP ENTRY:      60,   32802,    817,   107,    22218
MAP:            176,       0,      8,    38,        6
VM OBJECT:      148,       0,  18751,   122,   318636
UMA Buckets:    512,       0,    741,     1,        0
UMA Hash:       128,       0,      1,    30,        0
UMA Slabs:       34,       0,   1092,     8,        0
UMA Zones:      284,       0,     48,     8,        0

Am I missing anything?

Thanks,

Len
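
One way to test the resource-limit theory against vm.zone figures like those above is to flag any zone whose USED count is close to its LIMIT. The following is only a minimal Python sketch and not part of the setup described in this message: the function names and the 80% threshold are hypothetical choices, a LIMIT of 0 is assumed to mean "no limit set", and the sample rows are copied from the output above.

```python
def parse_vm_zone(text):
    """Parse `sysctl vm.zone` style text into a list of zone dicts."""
    zones = []
    for line in text.splitlines():
        # Zone names may contain spaces, so split on the last colon.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # no colon: column header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != 5 or not all(f.isdigit() for f in fields):
            continue  # e.g. the bare "vm.zone:" line
        size, limit, used, free, requests = (int(f) for f in fields)
        zones.append({
            "name": name.strip(),
            "size": size,
            "limit": limit,
            "used": used,
            "free": free,
            "requests": requests,
        })
    return zones


def zones_near_limit(zones, threshold=0.8):
    """Return zones whose USED/LIMIT ratio is at or above `threshold`.

    Assumption: LIMIT == 0 means the zone has no limit, so it is skipped.
    """
    return [
        z for z in zones
        if z["limit"] > 0 and z["used"] / z["limit"] >= threshold
    ]


# A few rows copied from the vm.zone output in this message.
SAMPLE = """\
vm.zone:
ITEM           SIZE    LIMIT    USED   FREE  REQUESTS

tcpcb:          360,   49313,   1048,    63,   195072
inpcb:          228,   49317,   4921,    94,   195072
socket:         256,   49320,   1383,   102,   340934
PROC:           480,       0,    461,    35,    11074
"""

zones = parse_vm_zone(SAMPLE)
for z in zones:
    if z["limit"] > 0:
        print(f'{z["name"]}: {z["used"] / z["limit"]:.1%} of limit')
    else:
        print(f'{z["name"]}: no limit set')

flagged = zones_near_limit(zones)
print("zones at or above 80% of their limit:", [z["name"] for z in flagged])
```

To check a live system, the same functions could be fed the text of `sysctl vm.zone` captured while connections are being dropped. A check like this only covers limits that appear in vm.zone, so an empty result does not rule out other limits.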