Date:      Tue, 12 Dec 2000 18:09:00 -0800 (PST)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        freebsd-stable@FreeBSD.ORG
Cc:        Kachun Lee <kachun@pathlink.com>
Subject:   Re: Extreme high load with 12/7 4-releng
Message-ID:  <200012130209.eBD290M79194@earth.backplane.com>
References:  <200012120230.SAA32402@pathlink.net> <200012121801.KAA42878@pathlink.net> <200012122138.NAA69074@pathlink.net> <200012122231.eBCMVE353411@earth.backplane.com>

    Well, I've found a bunch of problems.  I'm not sure exactly which one
    Kachun is hitting, but I think it's likely he's hitting one of them.

    * I removed --page_shortage in the 'pageout daemon flushing dirty page'
      case.  I shouldn't have.  This can cause the pageout daemon to 
      free up way too many clean pages.

    * The pageout daemon skips pages for vnodes it can't lock.  BIG mistake.
      This results in completely non-optimal paging operation. 

      It turns out that if the pageout daemon is woken up from a vm_fault,
      which is quite common, it is highly likely that the vm_fault will be
      holding a vnode lock and be in the middle of an I/O when the pageout
      daemon runs, causing the pageout daemon to ignore the vnode the vm_fault
      is sitting on.  If you have a lot of processes doing I/O, a lot of
      vnodes get ignored.

      The lock-skipping code was originally put in to prevent the pageout
      daemon from deadlocking in a low-memory situation, and to prevent it
      from locking up on dead NFS nodes.  However, with the low-memory
      deadlock fixes I recently committed, I think we may be able to 
      safely lock the vnode in the pageout daemon now.

    * The pageout daemon reorders pages it had to 'skip'.  The main culprit
      is when it decides it can't lock the vnode.  The reordering for this
      case only occurs for dirty pages which results in fragmentation of the
      queue ordering.  Additionally, it gives dirty pages 'triple priority'...
      they get moved to the end of the inactive queue when they are skipped,
      and they get moved to the end of the inactive queue again when they are
      successfully cleaned.  This causes originally dirty pages to stick
      around much too long.
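
    To make the --page_shortage point in the first item concrete, here is
    a toy model of an inactive-queue scan.  It is only a sketch and not the
    kernel's vm_pageout code: the function, the struct, and the
    count_laundered switch are all made up for illustration, and the queue
    is reduced to an array of dirty flags.  The only thing it shows is how
    the accounting changes the number of clean pages that get freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record: what one scan pass did. */
struct scan_result {
    int freed_clean;   /* clean pages freed outright */
    int laundered;     /* dirty pages for which a flush was started */
};

/*
 * Toy scan of an inactive queue (NOT the real vm_pageout_scan()).
 * dirty[i] says whether page i is dirty.  The scan stops once
 * page_shortage reaches zero.  count_laundered selects whether starting
 * a flush on a dirty page also counts against page_shortage.
 */
struct scan_result
toy_inactive_scan(const bool *dirty, size_t npages, int page_shortage,
    int maxlaunder, bool count_laundered)
{
    struct scan_result r = { 0, 0 };

    assert(dirty != NULL);
    for (size_t i = 0; i < npages && page_shortage > 0; i++) {
        if (!dirty[i]) {
            /* Clean page: free it right away. */
            r.freed_clean++;
            --page_shortage;
        } else if (maxlaunder > 0) {
            /* Dirty page: start a flush, limited by maxlaunder. */
            r.laundered++;
            --maxlaunder;
            if (count_laundered)
                --page_shortage;
        }
        /* Dirty page with maxlaunder used up: left where it is. */
    }
    return r;
}
```

    With a queue that alternates dirty and clean pages, a shortage of 4,
    and no effective maxlaunder limit, this model frees 2 clean pages when
    flushes are counted and 4 clean pages when they are not.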

    I'm not certain why Kachun isn't having a problem with 4.1.1, because most
    of these problems are at least a year old.  But I can see how recent
    low-memory handling changes might have exacerbated the existing problems.
    The --page_shortage issue in particular really hoses the inactive scan
    when maxlaunder is not sufficient to clean the dirty pages.

    To give you an idea of the difference in performance, running a program
    on my test box to iterate through a huge (3xMain-memory) file via mmap,
    alternately touching 8K and accessing 8K, resulted in long system stalls
    and a piddly pageout rate of maybe 2MB/sec.  To disk.
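
    A test of that shape can be sketched roughly as follows.  This is not
    the actual test program, just a minimal reconstruction from the
    description: it assumes "touching" means writing to a chunk and
    "accessing" means only reading it, and the names mmap_walk,
    make_zero_file and CHUNK are invented here.  To reproduce the paging
    load, the file would have to be several times the size of main memory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 8192      /* 8K, as in the description */

/* Helper (hypothetical): create a zero-filled file of the given size. */
int
make_zero_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, size);
    close(fd);
    return rc;
}

/*
 * Walk a file through a shared mapping, dirtying every even-numbered
 * 8K chunk and only reading every odd-numbered one.  Returns the number
 * of chunks dirtied, or -1 on error.  The byte sum of the chunks that
 * were only read is stored through *sum so the reads are not dead code.
 */
long
mmap_walk(const char *path, unsigned long *sum)
{
    struct stat st;

    assert(path != NULL && sum != NULL);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_SHARED, fd, 0);
    close(fd);          /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long dirtied = 0;
    unsigned long acc = 0;
    for (size_t off = 0, i = 0; off < len; off += CHUNK, i++) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        if (i % 2 == 0) {
            for (size_t j = 0; j < n; j++)
                p[off + j]++;           /* write: dirties the pages */
            dirtied++;
        } else {
            for (size_t j = 0; j < n; j++)
                acc += p[off + j];      /* read only */
        }
    }
    *sum = acc;
    munmap(p, len);
    return dirtied;
}
```

    Calling make_zero_file() with a size well above physical memory and
    then mmap_walk() on the same path gives a mix of dirty and clean
    pages for the pageout daemon to deal with.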

    When I made the pageout daemon block in the vnode lock rather than skip
    the vnode, fixed --page_shortage, got rid of the inactive queue
    reordering, and got rid of the artificial maxlaunder limitation,
    the paging rate went up to 24 MB/sec and the system no longer stalled.
    No stalling whatsoever.  The system ran like clockwork despite having
    no free memory.

    I am going to make patch sets available for -current and -stable later
    tonight for testing.  The changes are straightforward but serious, so
    it could be up to two weeks before they get into -stable.  I intend to
    commit them to -current later this week, maybe Thursday.  Some serious
    review is needed to ensure that the vnode locking change in the
    pagedaemon does not screw it up when you have things like dead NFS nodes
    floating around.

    The patches actually remove a whole lot of code... the result is smaller
    than the original :-).  That's always nice!

    I found another serious issue related to the update daemon's
    synchronization when combined with pages dirtied via mmap() (when mmap
    is used normally, without MAP_NOSYNC).  We really need the incremental
    syncing feature, but unless someone else wants to do it, it may be a while
    before I can get to it.
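
    For reference, the difference between a normal shared mapping and a
    MAP_NOSYNC one is just a flag to mmap().  The sketch below is only an
    illustration: map_path_shared is a made-up helper, and the fallback
    that defines MAP_NOSYNC as 0 is an assumption so that the code still
    builds on systems that lack the flag, where it behaves like a plain
    shared mapping.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0    /* assumption: no such flag here, plain MAP_SHARED */
#endif

/*
 * Hypothetical helper: create/extend the file at path to len bytes and
 * map it shared and writable.  If nosync is nonzero, MAP_NOSYNC is
 * added, asking the system not to flush dirtied pages gratuitously.
 * Returns MAP_FAILED on any error.
 */
void *
map_path_shared(const char *path, size_t len, int nosync)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    int flags = MAP_SHARED | (nosync ? MAP_NOSYNC : 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    close(fd);          /* the mapping stays valid after close */
    return p;
}
```

    Pages dirtied through a mapping made with nosync set to 0 are the
    "used normally" case that the update daemon has to sync.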

						-Matt


