Date:      Thu, 30 Dec 1999 19:37:26 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Tom <tom@uniserve.com>
Cc:        Peter Wemm <peter@netplex.com.au>, freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject:   Re: softupdates and debug.max_softdeps 
Message-ID:  <199912310337.TAA79239@apollo.backplane.com>
References:   <Pine.BSF.4.02A.9912301820230.9644-100000@shell.uniserve.ca>


:
:  I also don't think "sync" is a fix either.  I expect "sync" to reclaim
:unused space.  For instance, the file system currently shows 9 GB in use
:with "df", but there is only about 5 GB actually present on the disk.  I
:ran "sync", and I expected "df" to report about 5GB used, but it doesn't
:seem to change anything.  I'm going to try sync again tomorrow once the
:unreclaimed space is about 30GB or so, and see if it does anything.

    Try lots of syncs ... like one a second :-).  One sync won't do it.
    But what we really want to do is make the thing crash and hopefully
    (with the serial console maybe) get a panic message.

    What is most likely occurring is that the kernel is running out of
    some memory pool.  If that is what is occurring, it should generate
    a panic message prior to rebooting.

    A couple of other things you can do:

	Compile up the kernel with DDB configured so the system drops into
	DDB instead of panicking (only do this if you have access to 
	the console).  Then you should be able to 'trace' and 'ps' prior
	to typing 'panic' <return> manually (type as many <return>s as
	necessary after that but be careful, you don't want to interrupt
	a kernel dump if the kernel has started one!).

	Using several local xterms with a large back buffer configured,
	ssh to the machine under test and set up a couple of csh while(1)
	loops to look at various kernel resources, e.g.

	while (1)
	    vmstat -z; vmstat -m
	end

	The reason you use a local xterm in which you ssh to the remote
	machine is so the xterm doesn't disappear on you when the remote
	machine crashes :-).
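
	When watching one pool in the scrolling output, it can help to
	keep only that pool's line and stamp it with the time.  The
	following is a minimal sketch in sh (not csh), not part of the
	original advice: the function name stamp_pool is hypothetical,
	and because the column layout of vmstat -m differs between
	releases the matching line is printed unparsed.

```shell
#!/bin/sh
# stamp_pool NAME
# Reads text on stdin and prints every line that contains NAME as a
# whole word, prefixed with the current time (HH:MM:SS).
# Hypothetical helper; intended use: vmstat -m | stamp_pool inodedep
stamp_pool() {
    pool=$1
    now=$(date '+%H:%M:%S')
    # The line is passed through as-is, so no assumption is made
    # about which column holds which counter.
    grep -w -- "$pool" | while IFS= read -r line; do
        printf '%s %s\n' "$now" "$line"
    done
}

# Example driver (commented out so the file can be sourced safely):
# sample once a second until interrupted or the machine goes away.
# while :; do vmstat -m | stamp_pool inodedep; sleep 1; done
```

	Run from a local xterm over ssh as described above, the last
	lines left in the scrollback show how large the pool had grown
	just before the crash.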

    A tail -f /var/log/messages will probably *NOT* spit out the panic 
    message quickly enough, but a true serial console (not just a getty
    running on the port) should spit it out just fine.


:  One thing that is interesting is that the following sysctl variables are
:always zero:
:
:debug.blk_limit_push: 0
:debug.ino_limit_push: 0
:debug.blk_limit_hit: 0
:debug.ino_limit_hit: 0
:debug.rush_requests: 0
:
:  So it doesn't look like softupdates is rushing things out.

    These aren't very useful unless you only have a tiny bit of main 
    memory.  For all practical purposes the limit is rarely reached
    (which is probably why it's buggy when it *is* reached).

:  "vmstat -m" is showing that the storage for "inodedep" is steadily
:increasing.
:
:  I _think_ I need to increase tick_delay, so when the max_softdeps limit
:is finally hit, syncer gets run for a while and clean things up.

    tick_delay will probably not have much of an effect.

    Look at the vmstat -m output carefully as you run the test (as suggested
    above).  Bad things happen if you run the kernel out of KVM, and that
    can happen even if you have plenty of normal RAM.  There are *TWO* limits
    involved.  There is the limit for the memory pool you are observing,
    and there is a global limit on the grand total which is nominally 
    2x the per-pool limit.  If either limit is reached the machine is hosed.
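
    The arithmetic behind the two limits can be sketched as follows.
    This is an illustration only, in sh: the function name check_pools
    is hypothetical, the numbers are supplied by hand in whatever unit
    you read off vmstat -m, and the factor of 2 is the nominal figure
    given above, not a value read from the kernel.

```shell
#!/bin/sh
# check_pools LIMIT USED...
# LIMIT is the per-pool limit; each USED is one pool's current usage,
# in the same unit.  The global limit is assumed to be 2 * LIMIT.
# Hypothetical helper for illustration; it does not query the kernel.
check_pools() {
    limit=$1
    shift
    total=0
    pool_hit=0
    for used in "$@"; do
        total=$((total + used))
        if [ "$used" -ge "$limit" ]; then
            pool_hit=1
        fi
    done
    if [ "$pool_hit" -eq 1 ]; then
        echo "pool limit reached"
    fi
    if [ "$total" -ge $((2 * limit)) ]; then
        echo "global limit reached"
    elif [ "$pool_hit" -eq 0 ]; then
        echo "ok"
    fi
    return 0
}
```

    For example, "check_pools 100 90 90 30" reports the global limit
    even though no single pool has reached 100, which is the case that
    is easy to miss when watching only inodedep.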

:Tom
:Uniserve

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

