Date:      Fri, 23 Jul 1999 17:37:05 -0700 (PDT)
From:      Doug <Doug@gorean.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: PMAP_SHPGPERPROC: related to pagedaemon?
Message-ID:  <Pine.BSF.4.05.9907231722070.29551-100000@dt011n65.san.rr.com>
In-Reply-To: <199907232334.QAA28303@apollo.backplane.com>

On Fri, 23 Jul 1999, Matthew Dillon wrote:

> :> 	Using two -current machines, both dated 7/16 I got the following
> :> message in my log file, which I think explains the weird spontaneous 
> :> reboots I've been getting.
> :> 
> :> /kernel: pmap_collect: collecting pv entries -- suggest increasing PMAP_SHPGPERPROC
> :
> :...
> :
> :	I've increased the PMAP_SHPGPERPROC setting in the kernel config
> :file to 400 and recompiled just in case the system panics again, however
> :with it set to 300 as it is now it recovers ok. Once again, any thoughts
> :or suggestions welcome.
> :
> :Thanks,
> :
> :Doug
> 
>     Very weird.  The system should definitely not be spontaneously rebooting
>     due to this, at least not without generating a panic message.  The
>     pmap_collect message is just a warning.

	*Nod* I am not 100% _sure_ that this is the reason for the panic,
however in more than one case this was the last message in the log,
followed very closely (never more than a minute or two) by the first
message of the boot. Given that it takes so long for our machines to do
the memory POST test, there is at least a correlative relationship. Also,
prior to increasing the limit to 300 today, one of the machines that I was
testing on entered this state that I now know to be massive pagedaemon
chugging, and then locked up and eventually panicked while I was working
on it.

>     I'm not sure what could be causing the pageout daemon overhead. 

	I sent a followup to DG's post re our apache configuration, which
is my most likely candidate.

>     It
>     sounds like it ought to be the pmap_collect() function (i386/i386/pmap.c)
>     but the only way we could tell for sure would be for you to compile up
>     a profiled kernel which you may not want to do on a production system.

	Well, the way we have the network set up, if one box blows up there
are 2-3 others there to take the load, so if you think that it will really
be beneficial I can probably swing permission to do this. Give me precise
steps to follow re how to set it up and how to test it and I'll do what I
can on Monday. I'm familiar with stuff like gdb, DDB, etc. but have never
done any profiling.

>     The failure case for the pmap stuff occurs when you have a lot of 
>     processes sharing a lot of data, usually via mmap, where the dataset
>     fits in memory.  Thus the system would run out of pv entries before
>     running out of physical memory.

	Yeah, the boxes aren't even close to swapping, with c. 200M of
physical RAM free.
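
	As a rough illustration of the failure case described above, here
is a minimal sketch of the arithmetic. It assumes the pv entry limit is
computed as PMAP_SHPGPERPROC * maxproc plus the number of physical pages,
with a high-water mark at about 90% of that, which is how I recall
i386/i386/pmap.c doing it; check your own tree before relying on it. The
function names and every number below (maxproc, page counts, process
counts) are hypothetical and are not measurements from these machines.

```python
# Sketch only: the formulas are an assumption about how the kernel sizes
# the pv entry pool, and all example numbers are hypothetical.

def pv_entry_limit(shpgperproc, maxproc, physical_pages):
    """Assumed pool size: shared pages per process times maxproc,
    plus one entry per physical page."""
    return shpgperproc * maxproc + physical_pages


def pv_entry_high_water(limit):
    """Assumed warning threshold: roughly 90% of the pool."""
    return 9 * (limit // 10)


def pv_entries_needed(processes, shared_pages_per_process):
    """Each process mapping a shared page needs its own pv entry."""
    return processes * shared_pages_per_process


if __name__ == "__main__":
    maxproc = 1000                               # hypothetical
    physical_pages = 512 * 1024 * 1024 // 4096   # 512M of RAM, 4K pages
    demand = pv_entries_needed(150, 2500)        # 150 procs, ~10M shared each

    for shpgperproc in (200, 300, 400):
        limit = pv_entry_limit(shpgperproc, maxproc, physical_pages)
        high = pv_entry_high_water(limit)
        state = "over" if demand > high else "under"
        print(f"PMAP_SHPGPERPROC={shpgperproc}: limit={limit} "
              f"high_water={high} demand={demand} ({state})")
```

With these made-up numbers the demand exceeds the high-water mark at 200
but not at 400, even though the shared data itself fits in RAM many times
over, which is the point: the pool is exhausted by the number of mappings,
not by the amount of memory in use.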

>     In regards to your CGI execution:  One thing to look out for is what is
>     called a cascade failure.

	*Nod* I appreciate the warning. It's to solve this exact problem
that we're setting up this new network of boxes to do nothing other than
processing miva CGI. Unfortunately we don't have that kind of fine-grained
control over the scripts themselves, since the idea here is to allow the
customers to throw up whatever they want without interaction from us.
However I have a feeling that the actual panic is caused by this kind of
cascade failure, or symptoms that arise out of it. Now that I finally have
a complete 'ps' output while it's locked up I will have a better idea what
limits to set if increasing PMAP_SHPGPERPROC doesn't do the trick.

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
                -- Will Rogers


