Date:      Wed, 15 Oct 1997 21:32:03 +0930
From:      Mike Smith <mike@smith.net.au>
To:        hackers@freebsd.org
Subject:   Odd out-of-swap condition; ideas?
Message-ID:  <199710151202.VAA00264@word.smith.net.au>

** Firstly, please note that this is on a 2.2 of around February vintage; 
** if this is known-and-fixed, say no more than that and we will proceed 
** to negotiating an upgrade.


We have a system in the field that is showing an odd out-of-swap 
condition.  What's most odd is that it appears to involve a leak of 
some sort, where swap remains attached to a process even though the 
process doesn't appear to require it.

Some background for the following:

 - The 'idl' processes are running under the Linux ABI emulation.  These
   suckers do *lots* of filesystem work; the 'temp' allocation class
   gets at least twice as much work as any other in the system.
 - The 'exptd' process hits the hardware directly (it has IOPL set).
 - Both of the above are started using 'su' out of system startup 
   scripts, so they inherit either the daemon or default resource 
   limits; in this case they should be limited to 64M max size.
 - All of the 'ps' output is from 'ps alxmwww', trimmed to keep the most
   interesting fields and processes.
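
To confirm which limits a process started this way actually inherits, 
something like the following could be run from the same 'su' invocation.  
It is only a sketch: it assumes a Python interpreter is available on the 
box, it uses the standard 'resource' module, and picking RLIMIT_DATA and 
RLIMIT_STACK as the limits of interest is a guess, since "max size" above 
doesn't say which limit is meant.  describe_limit is a made-up helper name.

```python
import resource


def describe_limit(name, rlimit):
    """Return a one-line description of the soft/hard values of one limit."""
    soft, hard = resource.getrlimit(rlimit)

    def fmt(value):
        # getrlimit reports these limits in bytes; RLIM_INFINITY means no limit.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024}K"

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # Assumption: datasize and stacksize are the limits worth checking here.
    for name, rlimit in (("datasize", resource.RLIMIT_DATA),
                         ("stacksize", resource.RLIMIT_STACK)):
        print(describe_limit(name, rlimit))
```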

Here is some relevant data shortly after startup:

Tue Oct 14 06:32:01 GMT 1997
Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0s1b    131072     5228   125780     4%    Interleaved
PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
 -6  0  4884  916 biowai D    con-  26:10.80 .../bin.linux/idl analysis_init
 69  0  4476 2196 -      R     ??    0:00.72 .../bin.linux/idl display_init
  2  0  3684 3476 select Ss    ??    0:47.29 /usr/X11R6/bin/X -auth ...
 18  0  2592  996 pause  S     ??    4:50.06 exptd: experiment ...

All looks pretty happy.  After a little while the display gets some 
more work done, and grows a bit:

Tue Oct 14 19:38:46 GMT 1997
Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0s1b    131072    64096    66912    49%    Interleaved
PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
 -6  0 30560 4152 biowai D     ??   54:18.29 .../bin.linux/idl display_init
 10  0  4884 1212 wait   S    con- 205:33.58 .../bin.linux/idl analysis_init
  2  0  3708 2260 select Ss    ??    3:21.22 /usr/X11R6/bin/X -auth ...
 -6  0  2592  760 biowai D     ??  153:04.58 exptd: experiment: ...

Ok, that's not unreasonable, but note the amount of swap in use; it's 
starting to look a bit suspicious.  There's nothing like that much in 
toto in the VSZ column.  A little bit later we see:

Tue Oct 14 21:13:42 GMT 1997
Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0s1b    131072   128116     2892    98%    Interleaved
PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
  2  0 39220 4132 select S     ??   61:35.05 .../bin.linux/idl display_init
 18  0  4908 1020 pause  S    con- 226:26.83 .../bin.linux/idl analysis_init
  2  0  3708 2444 select Ss    ??    3:35.28 /usr/X11R6/bin/X -auth ...
 74  0  2592 1916 -      R     ??  170:51.50 exptd: experiment: ...

Whoa, where'd it all go?  On the next pass (10 seconds later) ps died 
because it couldn't allocate any memory.  At this point various things 
were failing (there is normally lots of fork/exec activity), but the 
system struggled along.  The analysis process eventually died, which let 
a single pass run:

Tue Oct 14 21:18:24 GMT 1997
Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0s1b    131072   130212      796    99%    Interleaved
 PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
  -6  0 39236 5188 biowai D     ??   61:57.12 .../bin.linux/idl display_init
   2  0  3708 2388 select Ss    ??    3:36.05 /usr/X11R6/bin/X -auth ...

Note that just about everything else is gone, and still no swap left.  
Then, eventually the display dies too, and immediately all is well 
again:

Device      1K-blocks     Used    Avail Capacity  Type
/dev/sd0s1b    131072     9696   121312     7%    Interleaved
PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
 -6  0  4240  836 biowai D     ??    0:00.23 .../bin.linux/idl analysis_init
  2  0  3708 2436 select Ss    ??    3:38.57 /usr/X11R6/bin/X -auth ...
 -6  0   944  464 biowai D     ??    0:00.02 exptd: experiment: ...

(The analysis and experiment were resurrected by their startup scripts)

The conclusion reached from this is that the display process has 
somehow managed to own a lot of swap that does not show up as attached 
to it (its VSZ peaks at 39236K in the output above).
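
As a rough cross-check of the numbers above, here is a minimal Python 
sketch that sums the VSZ column of each trimmed 'ps' listing and compares 
it with the swap "Used" figure.  The figures are copied from the output 
above; the function names and the label on the last (undated) snapshot 
are made up for illustration.  Since the listings are trimmed, and VSZ 
also counts resident and file-backed pages, the difference is only a 
loose indication and not a real accounting.

```python
# Swap "Used" (1K blocks) and VSZ values (KB) copied from the listings above.
# The ps listings are trimmed, so the sums cover only the processes shown.
# The last label is invented: that snapshot carries no timestamp.
SNAPSHOTS = [
    ("Oct 14 06:32:01", 5228, [4884, 4476, 3684, 2592]),
    ("Oct 14 19:38:46", 64096, [30560, 4884, 3708, 2592]),
    ("Oct 14 21:13:42", 128116, [39220, 4908, 3708, 2592]),
    ("Oct 14 21:18:24", 130212, [39236, 3708]),
    ("after display exit", 9696, [4240, 3708, 944]),
]


def unaccounted_swap(swap_used_kb, vsz_kb):
    """Swap in use beyond the total VSZ of the listed processes (may be negative)."""
    return swap_used_kb - sum(vsz_kb)


def report(snapshots):
    """Build one summary line per snapshot."""
    lines = []
    for label, used, vsz in snapshots:
        excess = unaccounted_swap(used, vsz)
        lines.append(
            f"{label:<20} used={used:>7}K  vsz_sum={sum(vsz):>6}K  excess={excess:>7}K"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report(SNAPSHOTS)))
```

For the 21:18:24 snapshot this gives an excess of 87268K, i.e. more swap 
in use than the two listed processes could account for even if every 
page of both were swapped out.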

Any ideas?  Suggested explanations?  Upgrading this system will be a 
little difficult (it is in remote eastern Germany), but will be 
undertaken if a fix is likely.

Thanks,
mike








