Date:      Wed, 13 May 2009 14:02:40 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: More data on 7.2-RELEASE "hangs"
Message-ID:  <200905131402.41104.jhb@freebsd.org>
In-Reply-To: <20090513142806.V17646@hub.org>
References:  <20090513040719.D17646@hub.org> <200905131252.15171.jhb@freebsd.org> <20090513142806.V17646@hub.org>

On Wednesday 13 May 2009 1:44:55 pm Marc G. Fournier wrote:
> On Wed, 13 May 2009, John Baldwin wrote:
> 
> > Well, you had a whole lot of page faults and other VM activity, plus 500k
> > syscalls.  The 'w' is a count of swapped processes, so basically your box is
> > swapping a whole lot it seems.  I think your box is just overloaded.
> 
> I knew I was going to regret posting that :(
> 
> What I posted was what vmstat 5 shows after the issue *starts*, not what 
> it normally looks like ... right now, after 10 hours of uptime, and all 
> the same processes running, it looks like:
> 
> io# vmstat 5 (10 hours uptime now)
>   procs      memory      page                    disks     faults         cpu
>   r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
>   0 1 0  10477M   301M  3503  13   1   2  3620 286   0   0  331 45491 4566 26  8 66
>   0 1 0  10430M   305M   278   7   0   0   550   0  18   0  186 19243 2917 4  3 93
>   1 1 0  10474M   295M   511   0   0   0   359   0  91   0  253 11632 3516 7  3 90
>   0 1 0  10447M   310M   819   3   0   0  1473   0  14   0  143 29575 2486 8  3 89
>   0 1 0  10558M   295M  5008  18  13   5  4128   0 121   0  345 24212 4215 16  7 77
> 
> Right now, IO is running ~775 processes ... at the time of the vmstat I 
> provided earlier, it was up to 1400 processes ... since there are only 5 
> minutes between script runs, something is causing it to go from zero swap 
> -> high swap within a very short period of time, but since things get 
> badly locked up when it happens, I can't isolate where ...
> 
> I've got the following two ps outputs at the time of the high paging:
> 
> /bin/ps -aucxHl -O jid > ps-long.out
> /bin/ps -aux -O jid > ps-short.out

Perhaps do 'sort -n -k6 < ps-short.out' to find which processes have large
virtual memory sizes?  Something is using a lot of memory and causing your
box to thrash.
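
A hedged sketch of that idea, in case the column position is in doubt: the
"-O jid" option may shift where VSZ lands in the output, so this version
looks up the VSZ column from the header line instead of hard-coding a field
number. The helper name top_vsz is made up for illustration, and it assumes
the first line of the file is the ps header and that no field before VSZ
contains spaces.

```shell
#!/bin/sh
# top_vsz FILE [N]: show the N (default 10) largest processes by VSZ
# from saved ps output. Hypothetical helper, not part of ps(1).
top_vsz() {
    awk '
        # Header line: remember which field is labelled VSZ, then skip it.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "VSZ") col = i; next }
        # Data lines: prefix each line with its VSZ value so sort can key on it.
        col     { print $col, $0 }
    ' "$1" | sort -rn -k1,1 | head -n "${2:-10}"
}
```

Running "top_vsz ps-short.out" would then print the ten largest entries,
each prefixed with its VSZ value. If the header has no VSZ column, nothing
is printed.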

-- 
John Baldwin


