From owner-freebsd-current Mon Apr 15 03:00:48 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3)
	id DAA05149 for current-outgoing; Mon, 15 Apr 1996 03:00:48 -0700 (PDT)
Received: from bunyip.cc.uq.oz.au (pp@bunyip.cc.uq.oz.au [130.102.2.1])
	by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id DAA05124
	for ; Mon, 15 Apr 1996 03:00:27 -0700 (PDT)
Received: from bunyip.cc.uq.oz.au by bunyip.cc.uq.oz.au
	id <23176-0@bunyip.cc.uq.oz.au>; Mon, 15 Apr 1996 20:00:21 +1000
Received: from orion.devetir.qld.gov.au by pandora.devetir.qld.gov.au
	(8.6.10/DEVETIR-E0.3a) with ESMTP id SAA00868
	for ; Mon, 15 Apr 1996 18:54:25 +1000
Received: from localhost by orion.devetir.qld.gov.au (8.6.10/DEVETIR-0.3)
	id SAA14153; Mon, 15 Apr 1996 18:56:00 +1000
Message-Id: <199604150856.SAA14153@orion.devetir.qld.gov.au>
To: freebsd-current@freebsd.org
cc: syssgm@devetir.qld.gov.au
Subject: Re: Just how stable is current
Date: Mon, 15 Apr 1996 18:55:59 +1000
From: Stephen McKay
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Ollivier Robert thinks:

>It seems that J Wunsch said:
>> > Yes, I know that this is a bad question to ask, but....
>>
>> Mine's from the Easter weekend, and i can't complain.
>
>Mine is from Tuesday and is running fine. -CURRENT has been very stable
>for me for at least 3 weeks (if not more).

Not all of us are happy campers. I have a -current kernel from January 9
which works well for me, and I have had various problems with all kernels
built since. My hardware is modest: a 16MHz 386SX with 4MB of RAM, NFS
for all source and object files, and vnconfig swap + real swap totalling
16MB.

I have 3 problems:

1) NFS problem:

My January 9 kernel will work properly as an NFS client with any server
using the 8KB maximum transfer size over UDP. More recent kernels won't.
I get severe performance degradation that I assume comes from lots of
retries and timeouts, even though I can't find them in nfsstat. Many
processes hang for long periods in sbwait, nfsrcvlk and similar network
states. OK, overruns are a common problem with PC network cards,
especially in slow machines. However, setting the maximum transfer size
to 1KB does not cure the problem (or maybe it just moves the problem
elsewhere). Switching to TCP transport produced a total cure, but TCP
mounts are not available on all servers.
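For concreteness, the experiments above correspond roughly to client
mounts like these (server name and paths are invented for illustration;
-r/-w set the NFS read/write sizes and -T selects TCP transport -- see
mount_nfs(8) rather than trusting this sketch):

    # default 8KB transfers over UDP (the problem case)
    mount_nfs server:/usr/src /usr/src

    # clamp transfers to 1KB over UDP (no cure here)
    mount_nfs -r 1024 -w 1024 server:/usr/src /usr/src

    # TCP transport (the total cure, where the server supports it)
    mount_nfs -T server:/usr/src /usr/src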
2) Processes with negative resident size:

On Friday, I started a "make all" of -current and snapped this (some
boring processes deleted):

UID   PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TT       TIME COMMAND
  0     0     0   0 -18  0    0    0 sched  DLs  ??    0:03.33 (swapper)
  0     1     0  12  10  0  392    0 wait   IWs  ??    0:00.33 /sbin/init --
  0     2     0  75 -18  0    0   12 psleep DL   ??   11:54.65 (pagedaemon)
  0     3     0  32  28  0    0   12 psleep DL   ??    2:52.65 (vmdaemon)
  0     4     0   5  29  0    0   12 update DL   ??    0:14.13 (update)
...
  0  2177  2176   9  10  5  340   -4 wait   IWN  p0    0:02.70 make
  0  2179  2177  38  10  5  452    0 wait   IWN  p0    0:00.36 /bin/sh -ec for entry in include lib bin games gnu libexec sbin
  0  2190  2179  75  10  5  308   -4 wait   IWN  p0    0:02.29 make all DIRPRFX
  0  2192  2190 107  10  5  452   -4 wait   IWN  p0    0:00.33 /bin/sh -ec for entry in csu/i386 libc libcompat libcom_err libc
  0  2195  2192  32  10  5 2840    8 wait   IWN  p0    1:12.30 make all DIRPRFX
  0  2233  2195 135  10  5  216   16 wait   IWN  p0    0:00.99 cc -O2 -DLIBC_RCS -DSYSLIBC_RCS -D__DBINTERFACE_PRIVATE -DPOSIX_
  0  2238  2233 109  65  5  848 1004 -      RN   p0    0:17.92 /usr/libexec/cc1 /tmp/cc002233.i -quiet -dumpbase bt_open.c -O2
  0   147     1  48   3  0  156   -4 ttyin  IWs+ v0    0:00.49 /usr/libexec/getty Pc ttyv0

RSS < 0 may be a cosmetic flaw, or it may be seriously buggering the VM
system. I don't know yet, but I'm valiantly struggling through the VM
code. :-)

3) Madly spinning processes:

This morning the scene was:

UID   PID  PPID CPU PRI NI  VSZ  RSS WCHAN  STAT TT       TIME COMMAND
  0  4796  4399 131  10  5  308   -4 wait   IWN  ??    0:01.85 make all DIRPRFX
  0  4798  4796  87  10  5  452   -4 wait   IWN  ??    0:00.72 /bin/sh -ec for entry in as awk bc cc cpio cvs dc dialog diff di
  0  4990  4798 135  10  5  312   -4 wait   IWN  ??    0:01.98 make all DIRPRFX
  0  4992  4990 149  10  5  452   -4 wait   IWN  ??    0:00.39 /bin/sh -ec for entry in libgroff libdriver libbib groff troff n
  0  5011  4992 210  90  5  344   20 -      RN   ?? 3509:56.22 make all DIR

All but one process had reasonable amounts of time accrued. Some even
had normal resident memory. :-)

vmstat -s revealed (sorry, I don't know what's irrelevant here):

  3010564 cpu context switches
 69486232 device interrupts
  2658782 software interrupts
371029200 traps
  1002815 system calls
    86889 swap pager pageins
   195866 swap pager pages paged in
    57630 swap pager pageouts
    82118 swap pager pages paged out
   115789 vnode pager pageins
   238148 vnode pager pages paged in
        0 vnode pager pageouts
        0 vnode pager pages paged out
    41415 page daemon wakeups
 27543608 pages examined by the page daemon
    15642 pages reactivated
   158113 copy-on-write faults
   262888 zero fill pages zeroed
      253 intransit blocking page faults
367919662 total VM faults taken
   514357 pages freed
    39851 pages freed by daemon
   368305 pages freed by exiting processes
      286 pages active
       68 pages inactive
        9 pages in VM cache
      313 pages wired down
       13 pages free
     4096 bytes per page
   550001 total name lookups
          cache hits (77% pos + 2% neg) system 2% per-directory
          deletions 0%, falsehits 4%, toolong 0%

367919662 VM faults over 2.5 days equates to about 1700 per second (the
arithmetic is in the P.S.). This is far in excess of what the machine
can fetch from disk, so they can only be "soft" faults (where the pages
really are there, but the VM system was hoping you didn't need them any
more and was going to free them soon), or some total failure to provide
the needed page at all, causing make to fault again immediately on
returning to user mode. That make process has only 5 resident pages (or
is it 6 :-)), but lots of memory was available for my shell, telnetd,
etc. when I logged in. It isn't lack of real memory that caused this.

Now, for the final twist before the audience can return to the
comfortable normalcy of their own lives: I stopped the whole process
group with SIGSTOP (the commands are sketched in the P.P.S.), and noted
that all processes went from RSS -4 to 8, presumably because the u area
had faulted in. I waited all day (just because I had real work :-)), and
found that the problem make process was eventually reduced to 8KB, like
the others. Then I restarted them with SIGCONT, and blow me down if they
didn't just up and carry on like nothing had happened. The problem make
exited (presumably after finishing successfully), and the compilation is
proceeding normally as I write.

Thanks to all who have bothered to read this far. I shall be consulting
the special texts of the masters (sys/vm/*.[hc]) for enlightenment, but
expect to be beaten to the answer by more knowledgeable persons.

Stephen.
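P.S. The fault rate arithmetic, for anyone checking along at home: 2.5
days is 216000 seconds, and bc(1) (truncating at its default scale)
agrees with the rough figure above:

    $ echo '367919662 / (2.5 * 86400)' | bc
    1703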
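P.P.S. A sketch of the stop/restart experiment, for anyone who wants to
repeat it. The process group ID is an assumption (taken here to be the
top make's PID, 4796); check it with ps -j first, and see kill(1) for
the -s and -- syntax:

    ps -j | grep make        # find the process group ID of the stuck build
    kill -s STOP -- -4796    # SIGSTOP every process in the group
    ps -axl | grep make      # RSS climbs from -4 to 8 as the u areas fault in
    kill -s CONT -- -4796    # SIGCONT; the build carries on where it left off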