From owner-freebsd-questions Wed Jul 17 16:21:49 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA00245
          for questions-outgoing; Wed, 17 Jul 1996 16:21:49 -0700 (PDT)
Received: from fw.tabula.com (fw.tabula.com [204.160.137.2]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA00240
          for ; Wed, 17 Jul 1996 16:21:42 -0700 (PDT)
Received: by fw.tabula.com (4.1/SMI-4.1) id AA10564; Wed, 17 Jul 96 16:21:38 PDT
Received: from tab012.tabula.com(204.119.64.12) by fw.tabula.com via smap (V1.3) id sma010562; Wed Jul 17 16:21:17 1996
Received: by tabula.com (5.x/SMI-SVR4) id AA05770; Wed, 17 Jul 1996 16:19:48 -0700
Date: Wed, 17 Jul 1996 16:19:46 -0700 (PDT)
From: Thor Clark
To: "T. William Wells"
Cc: freebsd-questions@freebsd.org
Subject: Re: system hangs? after resetting rtq_reallyold
In-Reply-To: <4sg66v$9e@twwells.com>
Message-Id:
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On 16 Jul 1996, T. William Wells wrote:

> In article ,
> Thor Clark wrote:
> : Am I out of space for new processes, or out of mbufs, or?
>
> You may also have user processes that grow without limit. See
> if you can get a systat or other *stat running on a virtual
> console when the problem occurs; it might show interesting
> results.

Thanks for the pointers. I can't seem to shake this one. I've combed
the mailing list and found 3-4 others who've had exactly the same
problem, but I haven't seen a solution anywhere, and no bug has been
reported. I'll put as much information as I can here. Am I missing
something simple? My apologies for the length of this post. Any ideas
or tests cheerfully tried.

problem:

After a few hours of activity, the system will not start new processes
of any kind. It will respond to pings, but that's about all - no
telnet, login, http, etc.
Background and interactive processes continue to run, but killing off
interactive processes does not have any effect on the system - the only
recourse is to physically reboot. This is now happening every ~3 hours.
No kernel panic occurs, and no kernel messages are ever logged. The
system has never recovered on its own (down once for > 12 hours).

system:

2.1 Release (from cd) - kernel recompiled, installed with

    maxusers 128
    options "NMBCLUSTERS=2048"
    options "OPEN_MAX=256"
    options "CHILD_MAX=256"

16M, IDE, ASUS P55TP4, 3C509 ethernet

runs CERN3.0 httpd, sends out a lot of mail, a few minor background
processes, and a lot of short-term, cpu intensive scripts

I ran an fstat | wc -l a few seconds before the last lockup, it
returned 347. Generally this is about 200-250. Is this high?

I'll try pretty much anything at this point...

Thanks
-Thor Clark

(logs below)

Some data:

from (sysctl -a)

kern.maxvnodes = 2813
kern.maxproc = 2068
kern.maxfiles = 4136
kern.maxfilesperproc = 4136
kern.maxprocperuid = 2067

(top) - while machine is locked
(I've never seen load > 2 when system is ok)

load averages: 5.70, 3.08, 1.64    16:56:53
51 processes: 1 running, 45 sleeping, 1 stopped, 4 zombie
Cpu states: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
Memory: 7552K Active, 2668K Inact, 2708K Wired, 1104K Cache, 64K Free
Swap: 82M Total, 63M Free, 24% Inuse

(vmstat -w 5) - while machine is locked

 procs       memory                 page    disks        faults       cpu
 r  b w   avm   fre flt  re  pi  po  fr   sr f0 w0 w2   in   sy  cs us sy  id
 5  9 0 31328   608   0   0   0   0   0    0  0  0  0  229   13   3  0  0 100
 6 10 0 55660  1484  23   2   9  31  14 1784  0 40  0  902 1808  62  4 12  84
 6 10 0 55660  1476   1   0   0   0   0    0  0  0  0  287 5163  59  8 15  77
 6 10 0 55660  1384   5   1   2   0   4    0  0  2  0  273   38   7  0  1  99
 7 11 0 51820  1020  20   0   5   0  11    0  0  5  0  371   24  10  0  2  97
 7 12 0 51328   904  10   0   2   0   6    0  0  4  2  331   13   6  0  2  98
 7 12 0 44692   904   0   0   0   0   0    0  0  0  0  235   11   3  0  1  99
 7 12 0 40436   904   0   0   0   0   0    0  0  0  0  229   11   3  0  0 100
 7 13 0 35856   908   1   0   0   0   0    0  0  1  0  241   15   4  0  0 100
 7 13 0 31476   860  17   3   6   0  26    0  0 13  0  417   75  20  0  3  97
 7 14 0 35696  1060   4   0   2   0  15    0  0  3  0  272   22   7  0  2  98

(iostat -w 1) - while machine is locked

 tin tout sps tps msps sps tps msps sps tps msps us ni sy in  id
 355  753   0   0  0.0   0   0  0.0  16   1  0.0  3  0 10  0  87
   0    0   0   0  0.0   0   0  0.0   0   0  0.0  0  0  1  0  99
   0  627   0   0  0.0   0   0  0.0   0   0  0.0  0  0  1  0  99
   0    0   0   0  0.0   0   0  0.0   0   0  0.0  0  0  0  0 100
   0  141   0   0  0.0   0   0  0.0   0   0  0.0  1  0  0  1  98

(tout #'s are characters written by top)

So, near as I can tell, everything slows down, and no new processes are
created... don't know what else to try.
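
Since the fstat | wc -l figure above was only caught by hand a few seconds before one lockup, one option might be to log the same counts on a timer so there is a trail leading up to the next hang. The following is only a minimal sketch: the function names (count_or_na, sample_line, monitor) and the log path in the usage note are made up for illustration, and it assumes ps and fstat are on the PATH. Line counts include each command's header line, as in the 347 figure quoted earlier. Once the machine stops starting new processes this loop will presumably stall as well, so the useful part is whatever was written just before that.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from
# the original post.

# Print how many lines a command writes to stdout, or "n/a" when the
# command is not installed on this machine.
count_or_na() {
    command -v "$1" >/dev/null 2>&1 || { echo "n/a"; return 0; }
    "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Emit one timestamped sample: process count and open-file count.
# Both counts include the header line printed by ps and fstat.
sample_line() {
    printf '%s procs=%s fstat=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(count_or_na ps ax)" \
        "$(count_or_na fstat)"
}

# Append samples to a log file.
#   $1 = log file, $2 = seconds between samples (default 60),
#   $3 = number of samples (default 0, meaning run until killed)
monitor() {
    log=$1
    interval=${2:-60}
    count=${3:-0}
    i=0
    while :; do
        sample_line >> "$log"
        i=$((i + 1))
        if [ "$count" -ne 0 ] && [ "$i" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, after sourcing the file, `monitor /var/tmp/hang.log 60 &` would append one line per minute (the path is just a placeholder). The output of netstat -m could be appended in the same way to watch mbuf usage, which was the other suspect raised in the quoted reply.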