Date:      Thu, 7 Jan 1999 19:20:54 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Alfred Perlstein <bright@hotjobs.com>
Cc:        Terry Lambert <tlambert@primenet.com>, dyson@iquest.net, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901080320.TAA36935@apollo.backplane.com>

:um, this doesn't give us a growing MFS and I may be naive about this but
:consider:
:
:FFS requests a block passing in a buffer, the buffer is switched out from
:under the attached vnode and stored in the free list (or possibly the
:'locked' list), a buffer from the Memory device is put in its place and
:the vnode is marked as such.
:
:FFS has the block for a while. 
:
:Eventually the block is taken off the LRU list because of the nature of
:the LRU queue.  Anything removing a block must check the mark to see if
:it's MFS-backed.

    Which buffer?  The one MFS passed back or the original one that was
    replaced?  I assume you mean that the original buffer is freed and
    we are now talking about the one MFS passed back, which is currently
    under the control of FFS, is no longer being used, and is eventually
    ready to be freed again.

:In fact this could be a callback function, called in general when ANY
:buffers are reused allowing for other flexibilities, however, this
:callback may already be in place as the "flush to backing store"
:call that's done for traditional devices under FFS.

    In order for the callback to work, especially if you intend this
    mechanism to work across VFS layers, the original 'source' of the
    buffer must be recorded in the vm_page_t.  Otherwise the callback
    doesn't know who to call.
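
    A minimal sketch of what "recording the source" could look like.
    Every name here (struct page, struct page_source, page_reuse) is
    hypothetical and is not the actual vm_page_t layout; the point is
    only that the page itself carries a pointer back to the layer that
    supplied it, so whoever reuses the page knows whom to call.

```c
#include <assert.h>
#include <stddef.h>

struct page;

/* Hypothetical record of the layer that originally supplied a page. */
struct page_source {
    const char *name;                                      /* e.g. "mfs" */
    void (*reclaim)(struct page_source *, struct page *);  /* reuse callback */
    int reclaimed;                                         /* pages handed back */
};

/* Hypothetical page structure; only the back-pointer matters here. */
struct page {
    struct page_source *source;   /* NULL if no layer wants a callback */
};

/*
 * Called by whichever layer is about to reuse the page.  Returns 1 if a
 * source was notified, 0 if the page had no recorded source.
 */
static int page_reuse(struct page *p)
{
    assert(p != NULL);
    if (p->source == NULL)
        return 0;
    p->source->reclaim(p->source, p);
    return 1;
}

/* Example callback: the source counts the page and detaches itself. */
static void mfs_reclaim(struct page_source *src, struct page *p)
{
    src->reclaimed++;
    p->source = NULL;
}
```

    Without the source pointer, page_reuse() would have no way to find
    mfs_reclaim() once the page has crossed a VFS layer.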

:At this point a buffer must be reattached to this vnode, it can be brought
:over from the free list, or perhaps the original buffer could have been
:placed on the 'locked' list (is this still around?)

    Anything put on a 'free' list is gone.  Or you are mis-defining the 
    function of the 'free' list... it isn't really a free list.  

    The problem with renaming a page isn't with the page being ripped out
    from the upper VFS layer, but with the fact that the lower VFS layer
    is removing the page from its own map and thus 'loses' track of it -
    something a vm_alias would solve neatly.

						-Matt

:Maybe this is impossible with what we have, or doesn't make sense sorry.
:I _really_ need to UTSL more. :)
:
:-Alfred
:
:btw, what you and John Dyson are working on sounds truly awesome.  I
:really hope you guys consider publishing a paper on it sometime because at
:that point FreeBSD will have moved so far from 4.4BSD, that the reference
:books become very far removed from the actual underlying system.

    That is an excellent idea.  The VFS stuff we are flame festing over now
    is not something under immediate development... I'm spending the next
    3 months cleaning up the existing VM system first, but something is
    going to happen at some point down the line.  The current situation
    cannot be scaled or extended easily and it is also way too easy for
    programmers to make mistakes -- VOP_GETPAGES and VOP_PUTPAGES alone have
    so many assumed side effects (as to how objects and pages are locked
    and what state they should be in on return) that it's a wonder there 
    aren't more bugs.

    I could hack in vm_alias's in about two days, but that doesn't mean I
    should.  I figure by the time I'm done fixing the VM system, the proper
    course of action to take in regards to VFS/BIO will be more apparent.

    ( By 'fixing' I mean mainly removing low memory deadlocks, low memory
    special cases, and removing cross dependency 'bypasses' and special
    cases from the vm_pager and vm_object APIs ).

    I'll include the README that lists what has been fixed so far
    (and will be committed on the 15th or 16th after the tree is split).

						-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    


    * Complete replacement of swap pager (vm/swap_pager.c)

	The swap pager has been completely replaced.  The new pager uses the
	new blist bitmap allocator and is able to allocate and deallocate swap
	from its bitmap without blocking anywhere.  Additionally, the new pager
	is able to avoid memory deadlock situations and as a consequence we
	have simplified a number of other areas of the VM system.

	Also vm/vm_swap.c was changed... the swap device block size is now
	PAGE_SIZE'd.  This simplifies code throughout both modules.

    * Addition of bitmap management module, kern/subr_blist.c

	Used by the swap system.  (the old rlist module has been deprecated).
	Could be used for other things.
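
	To illustrate the general idea of a bitmap allocator that never
	blocks, here is a deliberately tiny sketch.  It is a flat 64-bit
	bitmap, not the data structure in kern/subr_blist.c, and all of
	the names (bitmap_pool, bm_alloc, bm_free, NBLOCKS, BLK_NONE) are
	made up for the example.  Allocation either succeeds immediately
	or returns a failure marker; it never sleeps.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS   64u               /* assumed pool size: one 64-bit word */
#define BLK_NONE  ((uint32_t)-1)    /* returned when no run is available */

struct bitmap_pool {
    uint64_t used;                  /* bit i set => block i is allocated */
};

/* Build a mask of 'count' low bits, handling the full-width case. */
static uint64_t bm_mask(uint32_t count)
{
    return (count >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << count) - 1);
}

/* First-fit allocation of 'count' contiguous blocks; never blocks. */
static uint32_t bm_alloc(struct bitmap_pool *bp, uint32_t count)
{
    if (count == 0 || count > NBLOCKS)
        return BLK_NONE;
    for (uint32_t start = 0; start + count <= NBLOCKS; start++) {
        uint64_t mask = bm_mask(count) << start;
        if ((bp->used & mask) == 0) {
            bp->used |= mask;
            return start;
        }
    }
    return BLK_NONE;
}

/* Release a run previously returned by bm_alloc(). */
static void bm_free(struct bitmap_pool *bp, uint32_t start, uint32_t count)
{
    assert(count > 0 && start + count <= NBLOCKS);
    uint64_t mask = bm_mask(count) << start;
    assert((bp->used & mask) == mask);  /* freeing unallocated blocks is a bug */
    bp->used &= ~mask;
}
```

	Because neither routine waits on anything, a caller in a
	low-memory path can use them without risking a deadlock; it only
	has to handle BLK_NONE.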

    * ripped out vm_object_t->paging_offset

	This field was hacked in all over the source to optimize out a 
	single swap_pager_copy() call in vm/vm_map.c.  I've ripped out
	the optimization because it really doesn't improve performance
	with the new swapper.

    * added vm_page_t->swapblk

	The swapblk for resident pages is stored in the vm_page_t rather than
	in swap metadata.  This field can also be used by other pagers, not
	just the swap pager.

    * removed low-memory checks in a couple of places

	There are a few places, such as in vm/vm_fault.c, where the system
	will stall a process if memory is low.  The problem is that if you
	have a memory-hogging process this tends to lock up all other 
	processes, making it impossible to log in to the machine or fork/exec
	new programs.  The result is an effective lockup.

    * getpbuf()/relpbuf() - added subsystem limits

	A new argument has been added, a pointer to an integer counter which
	is decremented on getpbuf() and incremented on relpbuf().  getpbuf()
	will block if the counter is 0.  This is on top of blocking when the
	global buffer pool is exhausted.  

	This feature is required to prevent any one subsystem from hogging
	pbufs, which can lock up the machine in a low-memory situation
	(or lock up the machine, period).
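
	The counter scheme can be sketched in user space roughly as
	follows.  This is not the kernel code: try_getpbuf() and
	put_pbuf() are hypothetical stand-ins that return 0 at the points
	where getpbuf() would block, the pool size is made up, and
	treating a NULL counter as "no subsystem limit" is an assumption
	of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define GLOBAL_PBUFS 4              /* assumed size of the shared pool */

struct pbuf_pool {
    int global_free;                /* buffers left in the shared pool */
};

/*
 * Take one buffer on behalf of a subsystem.  Returns 1 on success and
 * 0 where the kernel version would sleep instead.
 */
static int try_getpbuf(struct pbuf_pool *pool, int *subsys_count)
{
    if (subsys_count != NULL && *subsys_count == 0)
        return 0;                   /* this subsystem is at its limit */
    if (pool->global_free == 0)
        return 0;                   /* shared pool is exhausted */
    if (subsys_count != NULL)
        (*subsys_count)--;          /* decremented on get */
    pool->global_free--;
    return 1;
}

/* Return one buffer and give the subsystem its slot back. */
static void put_pbuf(struct pbuf_pool *pool, int *subsys_count)
{
    assert(pool->global_free < GLOBAL_PBUFS);
    pool->global_free++;
    if (subsys_count != NULL)
        (*subsys_count)++;          /* incremented on release */
}
```

	With two subsystems each limited to 2 of the 4 buffers, one
	subsystem hitting its limit still leaves buffers available to
	the other.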

    * Fixed madvise(). 

	madvise() was badly broken, but people didn't notice
	it because it wasn't actually trying to free pages immediately so
	processes had a chance to recover from its mistakes.

	At the moment madvise() really tries to free the page, but we will 
	probably back off and just clean the page and move it into the cache
	after testing is complete.

    * Major revamping of vm/vm_pageout.c

	Fixed a number of blocking and deadlock situations in pageout.c,
	mainly related to the swapper and to the vnode pager.

    * Major revamping of vm/vm_page.c

	vm_page_free() has been revamped along with a bunch of other routines.
	Also, added pager callbacks vm_pager_page_inserted() and 
	vm_pager_page_removed() and shoehorned them into vm_page_insert()
	and vm_page_free() and such.  vm_page_remove()'s functionality has
	changed and it is now a static function.

	vm_page_alloc() has been revamped.  Removed unnecessary inlining of
	code.  We now formally free cache pages before reusing them (also
	necessary since the mechanism of freeing a page has changed).

	Added vm_await() and vm_page_asleep() functions - will be used later.

    * Major revamping of MFS filesystem code.

	Now supports VOP_FREEBLKS and handles low-memory conditions better 
	as a side effect of changes made elsewhere.  Also added protection
	of MFS queue at splbio().

    * Added device-block-to-page-block and page-block-to-device-block
      conversions to sys/param.h

    * Added u_daddr_t to sys/types.h - unsigned version of daddr_t (used by
      new swap code)

    * Greatly simplified vm_object_t's swap-related fields, making the 
      structure a little smaller.

    * Simplified vm_page_t->hashq.  Changed the doubly-linked list to a
      singly-linked list and doubled the size of the hash table (without
      doubling the storage).  This change also simplifies a bunch of
      critical-path code.

    * Removed vm_object_t->page_hint.  It was slowing things down instead of
      speeding things up.

    * Inlined a number of critical vm_pager routines.
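
      As an illustration of the device-block-to-page-block and
      page-block-to-device-block conversions listed above, a sketch
      under assumed sizes (512-byte device blocks, 4096-byte pages).
      The macro names below are invented for the example and are not
      necessarily the ones added to sys/param.h.

```c
#include <assert.h>

#define SK_DEV_BSHIFT  9    /* assumed: 512-byte device blocks */
#define SK_PAGE_SHIFT  12   /* assumed: 4096-byte pages */

/* A page must hold a whole number of device blocks for the shifts to work. */
static_assert(SK_PAGE_SHIFT >= SK_DEV_BSHIFT,
              "page size must be at least the device block size");

/* Page blocks -> device blocks. */
#define sk_pgtodb(pg)  ((pg) << (SK_PAGE_SHIFT - SK_DEV_BSHIFT))

/* Device blocks -> page blocks (a partial page is truncated). */
#define sk_dbtopg(db)  ((db) >> (SK_PAGE_SHIFT - SK_DEV_BSHIFT))
```

      With these sizes one page block corresponds to eight device
      blocks, which is why a PAGE_SIZE'd swap block size removes a
      lot of shifting from the swap code.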

			     ----- OTHER CHANGES -----

    * Added M_ASLEEP functionality to malloc (this is for later)

    * Changed malloc flag M_KERNEL to M_USE_RESERVE

    * Fixed uipc_usrreq.c sorflush() call to make sure it's a socket - it
      might not be.

    * vm_meter does not count device objects (such as /dev/mem), because 
      these really skew the results and make vmstat less useful.






