From owner-freebsd-hackers Thu Oct 30 15:03:38 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id PAA18477 for hackers-outgoing; Thu, 30 Oct 1997 15:03:38 -0800 (PST) (envelope-from owner-freebsd-hackers)
Received: from usr03.primenet.com (tlambert@usr03.primenet.com [206.165.6.203]) by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id PAA18466 for ; Thu, 30 Oct 1997 15:03:34 -0800 (PST) (envelope-from tlambert@usr03.primenet.com)
Received: (from tlambert@localhost) by usr03.primenet.com (8.8.5/8.8.5) id QAA04930; Thu, 30 Oct 1997 16:01:14 -0700 (MST)
From: Terry Lambert
Message-Id: <199710302301.QAA04930@usr03.primenet.com>
Subject: Re: help with fstat?
To: karpen@ocean.campus.luth.se (Mikael Karpberg)
Date: Thu, 30 Oct 1997 23:01:12 +0000 (GMT)
Cc: tlambert@primenet.com, freebsd-hackers@FreeBSD.ORG
In-Reply-To: <199710301129.MAA10740@ocean.campus.luth.se> from "Mikael Karpberg" at Oct 30, 97 12:29:30 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Sender: owner-freebsd-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Well, it's not for sure that the pages used in a MADV_SEQUENTIAL reading
> in a process will not be used again, is it?  I might back up a few bytes
> in parsing text, for example, but ALMOST be sequential, and then it might
> be a good idea to hint the system anyway.  That would easily be solved
> with three pages, though, if one page is enough read ahead.

You will maintain a lookahead buffer in your code to do this, or you
will ensure that you *never* back over a page boundary (at least not one
that's not in a read-ahead page chain, which you probably can't know),
or... you won't lie to the system via madvise.

Alternately, you agree to pay heinous paging overhead each time you go
back on your promise to the VM system.  8-) 8-).

> But the real case of where it will be reused is, actually, if many
> processes access the file after each other, or almost simultaneously.
Well, if you look at the code, there are reference instances which are
divorced, so I think this will not be a problem.

> That might be the case for something like a loaded webserver where the
> speed of a read might matter a lot.

If you can only make the flag apply to the shared object instead of the
referencing object, then you'd be right.

Most likely, you would not use MADV_SEQUENTIAL in that case: you'd save
the flagging for a case like "cp" (which currently does not use mmap()
because of a legacy "fix", and does not call madvise() to flag it
MADV_SEQUENTIAL anyway).

I.e., you mark things sequential only if you promise they will be
accessed that way, and you don't make promises you can't keep (promises
you can't keep is what INN did before the msync() fixes took place).

> It might be mmaping and writing a whole bunch of index.html copies
> a second, accessing them sequentially, in which case it is likely to use
> MADV_SEQUENTIAL, no?

No... at least not if the system is loaded above the amount of physical
RAM.  And if it's loaded above the amount of physical RAM + swap, you
are utterly screwed.

> It's a very good thing if it doesn't trash those pages right away,
> then.

Do you want the pages cached behind you, or are you promising to access
them sequentially?  You can have one or the other.

Either you say "I will not use these pages again, and, oh yes, I want
read-ahead from the get-go even though I have not triggered slow-start
sequential access recognition" (which should *also* set OBJ_SEQUENTIAL,
btw!), OR you say "I may need these later".

The whole issue here is process vs. system locality of reference.  The
whole issue with per-vnode working set quotas is to prevent fast process
locality from stomping slow system locality to death.

If I'm running 5 xterms, each with a copy of /bin/sh, I should favor the
executable images used by 5 processes over the data images used by one
when I'm deciding whose page gets stolen to satisfy an "I want a page"
request.
> But less accessed pages will be very happily discarded right
> away.  They will not be moved back in the free-queue all the
> time, because they are not accessed again.  So they WILL be
> truly discarded.
>
> Now, this might not be completely correct, but don't I have a point, Terry?

I think there is still a need for a quota.  The need is *NOT* the result
of the MADV_SEQUENTIAL case (which is specific enough that it can be
tweaked to be "sort of optimal" relatively easily).

In reality, when ld or some other program randomly accesses a working
set larger than physical RAM, it does so quickly enough (it's an I/O
bound process -- its soft priority will be kicked up) that it will
basically force everyone else's clean-but-going-to-be-reused pages (oh,
like the text images backing a running program) out of core to back the
faulted pages.

You can demonstrate this by using an mmap'ing ld to link a kernel while
you are running from an xterm, and trying to select another window.  You
have to page:

o	The X server's mouse code
o	The mouse cursor bitmap
o	The xterm you are moving from for LeaveNotify
o	The window manager for EnterNotify
o	The xterm again for FocusChanged
o	The xterm's cursor change code
o	The window manager again for FocusChanged (window manager window)
o	The window manager for LeaveNotify (out of one xterm frame)
o	The window manager for EnterNotify (into another)
o	The window manager for LeaveNotify (out of the second xterm frame)
o	The new xterm for EnterNotify
o	The window manager and new xterm for FocusChanged
o	The new xterm's cursor change code

Now you are ready to type:

o	The xterm's keyboard handling code
o	The shell on the other end of the pty
o	The xterm's display handling code
o	The X server's font for that xterm, plus the GC, plus the
	colormap, plus...

Etc.

Each one of these event boundary transitions is a full transit of the
run queue by the scheduler.
Each page involved (after the ld has thrashed them all out of core and
swap) is a disk access (tsleep() -- another run queue transition) for
however many code pages are involved (X itself is 8-10M -- how big is
Motif?).

The interactive response basically goes in the toilet when a process is
allowed to create a large virtual address space and basically displace
all other clean pages to the end of the LRU, and discard them from
there.

Such processes need to be whacked on the knuckles.  I'm up for any
suggestions you have to do the whacking, if you think it's possible
without a working set quota...


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.