From owner-freebsd-arch Sun Feb 16 18:36:13 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D216437B401 for ; Sun, 16 Feb 2003 18:36:10 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 17D4843FCB for ; Sun, 16 Feb 2003 18:36:10 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1H2ZqU63137 for freebsd-arch@freebsd.org; Sun, 16 Feb 2003 21:35:52 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Sun, 16 Feb 2003 21:35:52 -0500 From: Bosko Milekic To: freebsd-arch@freebsd.org Subject: mb_alloc cache balancer / garbage collector Message-ID: <20030216213552.A63109@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I've finally gotten around to implementing the cache balancer/garbage collector for the mbuf allocator. Actually, it's been sitting in a local tree for a while but I finally got one of my -CURRENT machines back up and was able to debug it. Here's a little about what it does right now: - Gets woken up if it is detected that at least one of the caches has a number of mbufs or clusters lower than the tunable low-watermark. - Gets woken up if it is detected (on free) that the global cache has more objects in it than the tunable high-limit watermark (actually, I forgot to add this wakeup in the patch I post below but it's trivial to add). - Checks the per-CPU caches and global cache and if there are fewer than low-watermark objects, replenishes them.
- Checks the global cache and if the number of objects is above the tunable limit it frees a chunk of memory back to the system, without interfering with simultaneous network buffer allocations (it doesn't lock up the per-cpu caches while doing this) and without increasing mb_free() latency at all (because the lazy freeing is done from a kproc context). Soon, as we whack this thing around, I hope to implement some auto-tuning algorithms and have the daemon auto-tune its watermarks, maximizing performance but also allowing the rest of the system to recover unused physical pages more efficiently. What does this mean for us in the long term? One of the things it means is that we continue to have high-performance, scalable network buffer allocation while also being able to free resources to the rest of the system. The auto-tuning mechanism can be made as complicated as we would like it to be, as all the computations would be done from the context of the mbufd kproc, and not in any critical allocation paths. What does this mean for us in the really short term? It means that we can finally make all M_DONTWAIT allocations NOT interface with the VM subsystem at all. Why is this good in the really really short term? For one, you can have network device drivers call the mbuf code without Giant because they'll know for a fact that Giant will never be needed down the line. Since the cache balancer will replenish caches when they're under a low watermark, assuming a well-tuned system, no noticeable impact will be felt on mbuf allocations and deallocations. The patch is: http://people.freebsd.org/~bmilekic/code/mbufd.patch I don't think it's quite ready for commit yet. I have to clean up a few minor things (need to make sure that it's totally safe to dynamically change the watermark sysctls in all cases, for one), and test a little longer (so far, so good). Feedback is welcome.
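[Editor's note: a minimal userland sketch of the watermark decision described above. All names and values here are hypothetical stand-ins, not taken from the mbufd.patch itself.]

```c
#include <stddef.h>

/* Hypothetical tunables standing in for the real watermark sysctls. */
#define CACHE_LOWAT   64    /* replenish a cache that drains below this */
#define CACHE_HIWAT  512    /* lazily free back to the system above this */

enum balance_action { BAL_NONE, BAL_REPLENISH, BAL_RELEASE };

/*
 * The decision the balancer kproc would make for one cache on each
 * wakeup: replenish caches that have drained below the low watermark,
 * and release a chunk of the global cache back to the system once it
 * has grown past the high limit.  Because the freeing happens from the
 * kproc's own context, mb_free() latency is unaffected.
 */
static enum balance_action
balance_cache(size_t nfree)
{
    if (nfree < CACHE_LOWAT)
        return (BAL_REPLENISH);
    if (nfree > CACHE_HIWAT)
        return (BAL_RELEASE);
    return (BAL_NONE);
}
```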
In particular, if anyone is familiar with clever cache auto-tuning algorithms, that person's input would be really valuable. Regards, -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Feb 16 18:45:12 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A024637B401 for ; Sun, 16 Feb 2003 18:45:10 -0800 (PST) Received: from smtp4.server.rpi.edu (smtp4.server.rpi.edu [128.113.2.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id D77E043F75 for ; Sun, 16 Feb 2003 18:45:09 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp4.server.rpi.edu (8.12.7/8.12.7) with ESMTP id h1H2j8b6006388; Sun, 16 Feb 2003 21:45:08 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <200302150905.08387.wes@softweyr.com> References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302141733.29304.wes@softweyr.com> <200302150905.08387.wes@softweyr.com> Date: Sun, 16 Feb 2003 21:45:07 -0500 To: Wes Peters From: Garance A Drosihn Subject: Re: syslog.conf syntax change (multiple program/host specifications) Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="iso-8859-1" ; format="flowed" Content-Transfer-Encoding: quoted-printable X-RPI-Spam-Score: -1.3 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SIGNATURE_SHORT_DENSE,SPAM_PHRASE_03_05 X-Scanned-By: MIMEDefang 2.28 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 9:05 AM +0000 2/15/03, Wes Peters wrote: >On Saturday 15 February 2003, Garance A Drosihn wrote: > > Based 
on a few minutes of testing, I think newsyslog will > > pretty much do the right thing if you call it as: > > >> newsyslog -Fr /var/log/somefilename >> > > The '-r' is just so newsyslog doesn't turn around and send > > a signal back to syslogd. I'm still tempted to add a '-R'. > >I'll give that a try Tuesday, or maybe Monday if I get my >current box running this weekend. Okay. > > I would add some default rotate-action to newsyslog, which >> would be used if -R is specified and the file is not listed >> in the newsyslog.conf file. > >Sounds good to me. Are you going to look into that? I'll >definitely want your changes to newsyslog to go along with >my changes to syslog.conf. ;^) Assuming we do not get too much snow tomorrow (Monday), I'll try to write up something then. In a separate message on 2/15/03, Wes Peters wrote: >On Saturday 15 February 2003, Thomas Quinot wrote: > > Le 2003-02-14, Wes Peters écrivait : > > > To this end I've implemented another feature, 'N' for > > > newsyslog. When the file size limit is reached, newsyslog > > > is run with the log filename as the only argument. The > > > size limitation in syslog.conf and newsyslog.conf should > > > agree or you won't get what you expect. > > > > Well, precisely for this reason it would seem even nicer to > > me to delegate the size limitation to newsyslog as well > > (perhaps rebuilding a tool similar to daemontools' multilog > > based on code shared with newsyslog). > >That's a better answer than incorporating multilog with all its >djb licensing warts, but still costs another process for every >log file you want to size-limit. > >Garance, did you get this one? Do you want to look at this? I believe this issue would be handled by the "force" option (either '-Fr' for now, or '-R' & handling once I do that). So, my assumption is that there is nothing additional I need to do here. Let me know if I'm missing something.
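[Editor's note: for illustration, the size check that a multilog-style pipe tool of the kind quoted above would make after each write might look like the following. The function name and units are invented for this sketch, not taken from newsyslog.]

```c
/*
 * Hypothetical helper for a newsyslog-like pipe tool: after each write,
 * decide whether the log file has reached its configured size limit and
 * should be rotated.  A limit of 0 or less means "no size-based
 * rotation", roughly mirroring a '*' in the newsyslog.conf size field.
 */
static int
needs_rotation(long long cursize_bytes, long long limit_kb)
{
    if (limit_kb <= 0)
        return (0);
    return (cursize_bytes >= limit_kb * 1024LL);
}
```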
-- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Feb 16 22:41:36 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C38D137B401 for ; Sun, 16 Feb 2003 22:41:35 -0800 (PST) Received: from cirb503493.alcatel.com.au (c18609.belrs1.nsw.optusnet.com.au [210.49.80.204]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3E42043F3F for ; Sun, 16 Feb 2003 22:41:34 -0800 (PST) (envelope-from peterjeremy@optushome.com.au) Received: from cirb503493.alcatel.com.au (localhost.alcatel.com.au [127.0.0.1]) by cirb503493.alcatel.com.au (8.12.5/8.12.5) with ESMTP id h1H6fVLZ063366; Mon, 17 Feb 2003 17:41:32 +1100 (EST) (envelope-from jeremyp@cirb503493.alcatel.com.au) Received: (from jeremyp@localhost) by cirb503493.alcatel.com.au (8.12.6/8.12.5/Submit) id h1H6fUrJ063365; Mon, 17 Feb 2003 17:41:30 +1100 (EST) Date: Mon, 17 Feb 2003 17:41:30 +1100 From: Peter Jeremy To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217064130.GA62020@cirb503493.alcatel.com.au> References: <20030216213552.A63109@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030216213552.A63109@unixdaemons.com> User-Agent: Mutt/1.4i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, Feb 16, 2003 at 09:35:52PM -0500, Bosko Milekic wrote: > I've finally gotten around to implementing the cache balancer/garbage > collector for the mbuf allocator.
Actually, it's been sitting in a > local tree for a while but I finally got one of my -CURRENT machines > back up and was able to debug it. Excellent. > For one, you can have network device drivers call the mbuf code > without Giant because they'll know for a fact that Giant will never be > needed down the line. Since the cache balancer will replenish caches > when they're under a low watermark, assuming a well-tuned system, no > noticeable impact will be felt on mbuf allocations and deallocations. My only concern is that replenishment is reliant on scheduling a process (kernel thread) whilst allocation occurs both at interrupt level and during normal process operation. Is it possible for a heavily loaded system (and a heavy traffic spike) to totally empty the mbuf cache in the interval between the low watermark being reached and the allocator actually running? If so, what happens? Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Feb 16 23:25:57 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7ECBC37B401 for ; Sun, 16 Feb 2003 23:25:56 -0800 (PST) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id DA0D643FBD for ; Sun, 16 Feb 2003 23:25:53 -0800 (PST) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id h1H7PqnN029060 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Sun, 16 Feb 2003 23:25:53 -0800 (PST) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <316301c2d655$cdfb2df0$52557f42@errno.com> From: "Sam Leffler" To: "Peter Jeremy" , "Bosko Milekic" Cc: References: <20030216213552.A63109@unixdaemons.com>
<20030217064130.GA62020@cirb503493.alcatel.com.au> Subject: Re: mb_alloc cache balancer / garbage collector Date: Sun, 16 Feb 2003 23:25:52 -0800 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > > For one, you can have network device drivers call the mbuf code > > without Giant because they'll know for a fact that Giant will never be > > needed down the line. Since the cache balancer will replenish caches > > when they're under a low watermark, assuming a well-tuned system, no > > noticeable impact will be felt on mbuf allocations and deallocations. > > My only concern is that replenishment is reliant on scheduling a process > (kernel thread) whilst allocation occurs both at interrupt level and > during normal process operation. Is it possible for a heavily loaded > system (and a heavy traffic spike) to totally empty the mbuf cache in > the interval between the low watermark being reached and the allocator > actually running? If so, what happens? > With kernel preemption this should be less of an issue. Presumably the balancer thread runs with high enough priority to take preemptive control quickly.
Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun Feb 16 23:34:46 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7D36537B401 for ; Sun, 16 Feb 2003 23:34:45 -0800 (PST) Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31]) by mx1.FreeBSD.org (Postfix) with ESMTP id D14C243F3F for ; Sun, 16 Feb 2003 23:34:44 -0800 (PST) (envelope-from des@ofug.org) Received: by flood.ping.uio.no (Postfix, from userid 2602) id BEE22536E; Mon, 17 Feb 2003 08:34:43 +0100 (CET) X-URL: http://www.ofug.org/~des/ X-Disclaimer: The views expressed in this message do not necessarily coincide with those of any organisation or company with which I am or have been affiliated. To: Bosko Milekic Cc: freebsd-arch@freebsd.org Subject: Re: mb_alloc cache balancer / garbage collector From: Dag-Erling Smorgrav Date: Mon, 17 Feb 2003 08:34:43 +0100 In-Reply-To: <20030216213552.A63109@unixdaemons.com> (Bosko Milekic's message of "Sun, 16 Feb 2003 21:35:52 -0500") Message-ID: User-Agent: Gnus/5.090014 (Oort Gnus v0.14) Emacs/21.2 (i386--freebsd) References: <20030216213552.A63109@unixdaemons.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic writes: > What does this mean for us on the long term? One of the things it > means is that we continue to have a high performance scalable network > buffer allocations but while also being able to free resources to the > rest of the system. Does this render nmbclusters obsolete? 
DES -- Dag-Erling Smorgrav - des@ofug.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 1:16:49 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F2EA437B401 for ; Mon, 17 Feb 2003 01:16:48 -0800 (PST) Received: from phk.freebsd.dk (phk.freebsd.dk [212.242.86.175]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0A46943F75 for ; Mon, 17 Feb 2003 01:16:48 -0800 (PST) (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by phk.freebsd.dk (8.12.6/8.12.6) with ESMTP id h1H9Gk6E053593; Mon, 17 Feb 2003 09:16:46 GMT (envelope-from phk@phk.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.12.6/8.12.6) with ESMTP id h1H9GjOx024809; Mon, 17 Feb 2003 10:16:46 +0100 (CET) (envelope-from phk@phk.freebsd.dk) To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector From: phk@phk.freebsd.dk In-Reply-To: Your message of "Sun, 16 Feb 2003 21:35:52 EST." <20030216213552.A63109@unixdaemons.com> Date: Mon, 17 Feb 2003 10:16:45 +0100 Message-ID: <24808.1045473405@critter.freebsd.dk> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message <20030216213552.A63109@unixdaemons.com>, Bosko Milekic writes: > I've finally gotten around to implementing the cache balancer/garbage > collector for the mbuf allocator. I talked with Jeff about something slightly similar for UMA: I would like to be able to say "try to always keep N items available" on a zone. 
For things like struct bio, the exception path is pretty drastic and GEOM allocates them M_NOWAIT and in small swarms, so being able to specify a moderate low-water mark would make sense. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 2:29:21 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DBB8937B401 for ; Mon, 17 Feb 2003 02:29:20 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2422343F85 for ; Mon, 17 Feb 2003 02:29:20 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h1HATIa25230; Mon, 17 Feb 2003 05:29:18 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Mon, 17 Feb 2003 05:29:18 -0500 (EST) From: Jeff Roberson To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector In-Reply-To: <20030216213552.A63109@unixdaemons.com> Message-ID: <20030217052758.E85957-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Sun, 16 Feb 2003, Bosko Milekic wrote: > > I've finally gotten around to implementing the cache balancer/garbage > collector for the mbuf allocator. Actually, it's been sitting in a > local tree for a while but I finally got one of my -CURRENT machines > back up and was able to debug it. 
> Bosko, this is great stuff. This leads me to wonder though, are we ever going to unify mb alloc and uma? It seems that it would make sense to do so. If the performance is not as good with UMA then it may make sense to keep mb_alloc. Especially now that it can reclaim memory. Have you looked at catching the low memory callback to drain your caches? Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 4:24:44 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9B67F37B401 for ; Mon, 17 Feb 2003 04:24:42 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id B866643FB1 for ; Mon, 17 Feb 2003 04:24:41 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1HCOKV64292; Mon, 17 Feb 2003 07:24:20 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 07:24:20 -0500 From: Bosko Milekic To: Jeff Roberson Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217072420.A64237@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <20030217052758.E85957-100000@mail.chesapeake.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030217052758.E85957-100000@mail.chesapeake.net>; from jroberson@chesapeake.net on Mon, Feb 17, 2003 at 05:29:18AM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 05:29:18AM -0500, Jeff Roberson wrote: > > On Sun, 16 Feb 2003, Bosko Milekic wrote: > > > > > I've finally 
gotten around to implementing the cache balancer/garbage > > collector for the mbuf allocator. Actually, it's been sitting in a > > local tree for a while but I finally got one of my -CURRENT machines > > back up and was able to debug it. > > > > Bosko, this is great stuff. This leads me to wonder though, are we ever > going to unify mb alloc and uma? It seems that it would make sense to do > so. If the performance is not as good with UMA then it may make sense to > keep mb_alloc. Especially now that it can reclaim memory. Have you > looked at catching the low memory callback to drain your caches? I looked at unifying the allocator but there are several problems that make doing it pretty tough. Chief among these are the optimizations that we perform in mb_alloc. Notably, mbufs and mbuf clusters share the same cache lock. Also, I have routines that are able to allocate an mbuf and a cluster in one shot in one function call without dropping the cache lock in between. Similarly, m_getm() can allocate a large number of mbufs and clusters in one shot without - in the best and hopefully common case - dropping any cache lock in between. Although UMA is really good, from looking at it I doubt that I would be able to make these kinds of optimizations without ripping into/out of it pretty hard. > Cheers, > Jeff -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future."
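[Editor's note: the one-lock optimization described above can be sketched with a small userland model. This is not the real mb_alloc code; the structure, field names, and counters are invented for illustration, and the real mutex calls are shown only as comments.]

```c
#include <stddef.h>

/* Toy model of a per-CPU cache where mbufs and clusters share one lock. */
struct pcpu_cache {
    int nmbufs;             /* free mbufs on this cache's freelist */
    int nclusters;          /* free clusters on this cache's freelist */
    int lock_acquisitions;  /* counts "mtx_lock" calls, for illustration */
};

/* Example instance: a cache holding two mbufs and two clusters. */
static struct pcpu_cache cache0 = { 2, 2, 0 };

/*
 * Allocate an mbuf with a cluster attached in one shot: because both
 * freelists hang off the same cache, a single lock acquisition covers
 * both objects, instead of one lock round-trip per allocator call.
 */
static int
cache_alloc_mbuf_cluster(struct pcpu_cache *c)
{
    /* mtx_lock(&c->lock) in a real allocator */
    c->lock_acquisitions++;
    if (c->nmbufs == 0 || c->nclusters == 0) {
        /* mtx_unlock(&c->lock); caller would fall back or fail */
        return (-1);
    }
    c->nmbufs--;
    c->nclusters--;
    /* mtx_unlock(&c->lock) */
    return (0);
}
```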
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 5:24:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 00FE637B401 for ; Mon, 17 Feb 2003 05:24:33 -0800 (PST) Received: from melusine.cuivre.fr.eu.org (melusine.cuivre.fr.eu.org [62.212.105.185]) by mx1.FreeBSD.org (Postfix) with ESMTP id 13B0643F75 for ; Mon, 17 Feb 2003 05:24:32 -0800 (PST) (envelope-from thomas@cuivre.fr.eu.org) Received: by melusine.cuivre.fr.eu.org (Postfix, from userid 1000) id 456CB2C3D2; Mon, 17 Feb 2003 14:24:30 +0100 (CET) Date: Mon, 17 Feb 2003 14:24:30 +0100 From: Thomas Quinot To: Garance A Drosihn Cc: Wes Peters , arch@FreeBSD.ORG Subject: Re: syslog.conf syntax change (multiple program/host specifications) Message-ID: <20030217132430.GA46806@melusine.cuivre.fr.eu.org> Reply-To: Thomas Quinot References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302141733.29304.wes@softweyr.com> <200302150905.08387.wes@softweyr.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.4i X-message-flag: WARNING! Using Outlook can damage your computer. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Le 2003-02-17, Garance A Drosihn écrivait : > I believe this issue would be handled by the "force" option > (either '-Fr' for now, or '-R' & handling once I do > that). So, my assumption is that there is nothing additional > I need to do here. Let me know if I'm missing something. 
My suggestion is to not change syslogd at all, and to use a modified version of newsyslog (or a new tool sharing some code with it) as a pipe destination for syslogd, which requires new development (code to copy stdin to a file, monitoring whether the rotating condition is met on that file, and rotating it when appropriate). Thomas. -- Thomas.Quinot@Cuivre.FR.EU.ORG To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 6:43:46 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 077A537B401 for ; Mon, 17 Feb 2003 06:43:45 -0800 (PST) Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5A56D43F75 for ; Mon, 17 Feb 2003 06:43:44 -0800 (PST) (envelope-from gallatin@cs.duke.edu) Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30]) by duke.cs.duke.edu (8.12.6/8.12.6) with ESMTP id h1HEhh8I020580 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Mon, 17 Feb 2003 09:43:43 -0500 (EST) Received: (from gallatin@localhost) by grasshopper.cs.duke.edu (8.11.6/8.9.1) id h1HEhcB42929; Mon, 17 Feb 2003 09:43:38 -0500 (EST) (envelope-from gallatin@cs.duke.edu) From: Andrew Gallatin MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15952.62746.260872.18687@grasshopper.cs.duke.edu> Date: Mon, 17 Feb 2003 09:43:38 -0500 (EST) To: Bosko Milekic Cc: freebsd-arch@freebsd.org Subject: Re: mb_alloc cache balancer / garbage collector In-Reply-To: <20030216213552.A63109@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe:
List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic writes: > What does this mean for us in the really short term? It means that we > can finally make all M_DONTWAIT allocations NOT interface with the VM > subsystem at all. Why is this good in the really really short term? > For one, you can have network device drivers call the mbuf code > without Giant because they'll know for a fact that Giant will never be > needed down the line. Since the cache balancer will replenish caches Not to detract from your work, but M_DONTWAIT allocations have been Giant-free since Alan made the VM system Giant-free for kernel-map allocations in early January. The long-term implications of this work look very exciting. Drew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 6:45: 5 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9946A37B401 for ; Mon, 17 Feb 2003 06:45:03 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id DC5C243F93 for ; Mon, 17 Feb 2003 06:45:02 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1HEick64583; Mon, 17 Feb 2003 09:44:38 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 09:44:38 -0500 From: Bosko Milekic To: Peter Jeremy Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217094438.A64558@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <20030217064130.GA62020@cirb503493.alcatel.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030217064130.GA62020@cirb503493.alcatel.com.au>; from peterjeremy@optushome.com.au on 
Mon, Feb 17, 2003 at 05:41:30PM +1100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 05:41:30PM +1100, Peter Jeremy wrote: > > For one, you can have network device drivers call the mbuf code > > without Giant because they'll know for a fact that Giant will never be > > needed down the line. Since the cache balancer will replenish caches > > when they're under a low watermark, assuming a well-tuned system, no > > noticeable impact will be felt on mbuf allocations and deallocations. > > My only concern is that replenishment is reliant on scheduling a process > (kernel thread) whilst allocation occurs both at interrupt level and > during normal process operation. Is it possible for a heavily loaded > system (and a heavy traffic spike) to totally empty the mbuf cache in > the interval between the low watermark being reached and the allocator > actually running? If so, what happens? This is a legit concern. We try to avoid having this happen by: 1) Running the daemon at sufficient priority (PVM) 2) Properly tuning the watermarks However, it should be noted that even if this does happen right now, it's OK because I haven't yet instrumented the M_DONTWAIT case to NOT touch the VM code. That is, in the version of the patch I posted, M_DONTWAIT is still allowed to allocate a page from VM if it can't find anything in the cache(s). There is no foreseeable reason that it should not be allowed to keep doing that in the long term. The reason we wanted to remove VM allocations in the M_DONTWAIT case is so that we could safely lock down parts of network device drivers and remove the requirement for them to keep Giant across mbuf allocations. However, I think that perhaps kmem_malloc() no longer requires Giant (I know kmem_free() does)...
so perhaps we could just leave things as they are, in which case you don't have to worry about the spikes except that, of course, making the VM calls makes the allocation more expensive so you still want to make sure you properly tune the watermarks[*]. [*] Speaking of which, I'm still looking for clever auto-tuning algorithms. > Peter -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 6:56:45 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7F5A037B401 for ; Mon, 17 Feb 2003 06:56:43 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id C9AE743F93 for ; Mon, 17 Feb 2003 06:56:42 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1HEuL564622; Mon, 17 Feb 2003 09:56:21 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 09:56:21 -0500 From: Bosko Milekic To: Dag-Erling Smorgrav Cc: freebsd-arch@freebsd.org Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217095621.C64558@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from des@ofug.org on Mon, Feb 17, 2003 at 08:34:43AM +0100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 08:34:43AM +0100, Dag-Erling Smorgrav wrote: > Bosko Milekic writes: > > What 
does this mean for us on the long term? One of the things it > > means is that we continue to have a high performance scalable network > > buffer allocations but while also being able to free resources to the > > rest of the system. > > Does this render nmbclusters obsolete? Heh. Another good question. Right now, no. One could argue though that it's now technically OK to remove that limit but, personally, I think I would still argue that we should keep it. Several reasons: 1) It's good to cap the amount of virtual address space reserved for network buffers; we've seen over the years that a lot of resource-exhausting DoS attacks relied on a code path in the network code that could be used to exhaust system resources by over-allocating to network buffers. At least the virtual address cap allows us to eventually level out and - now with the cache balancer/garbage collector - recover completely. 2) A few optimizations, notably the one regarding mbuf cluster reference counts, rely on mbuf clusters coming from a contiguous virtual address map. Then you can do things like keep an array of reference counters for clusters and index into it based on the virtual address of the cluster for reference counting. I know I've been knocking my head against my desk trying to figure out if there's a better way to do reference counting while maintaining the same level of performance (and I'm sure others have too, judging from countless discussions and the number of times this code has changed), but we haven't had that G-dly revelation yet. :-) > DES > -- > Dag-Erling Smorgrav - des@ofug.org -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future."
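[Editor's note: the contiguous-map trick in point 2 can be illustrated with the sketch below. MCLBYTES is genuinely the kernel's mbuf cluster size macro (2048 bytes on the platforms of the day); the map size, array, and function names are invented for this example.]

```c
#include <stddef.h>
#include <stdint.h>

#define MCLBYTES   2048   /* mbuf cluster size, as in the kernel */
#define NCLUSTERS  8      /* illustrative size of the cluster map */

/* Stand-in for the contiguous virtual address range clusters come from. */
static char cluster_map[NCLUSTERS * MCLBYTES];

/* One reference counter per possible cluster, in a flat array. */
static uint32_t cl_refcnt[NCLUSTERS];

/*
 * Because every cluster lives inside one contiguous map, its counter is
 * found by pointer arithmetic alone: no per-object header and no hash
 * lookup, just a subtraction and a divide by the cluster size.  This is
 * exactly why the scheme depends on keeping the map contiguous, and
 * hence on keeping a cap like nmbclusters.
 */
static uint32_t *
cluster_refcnt(void *cl)
{
    size_t idx = (size_t)((char *)cl - cluster_map) / MCLBYTES;
    return (&cl_refcnt[idx]);
}
```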
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 6:59: 4 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E02B37B401 for ; Mon, 17 Feb 2003 06:59:03 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id B9CE943F75 for ; Mon, 17 Feb 2003 06:59:02 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1HEwgw64659; Mon, 17 Feb 2003 09:58:42 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 09:58:42 -0500 From: Bosko Milekic To: Andrew Gallatin Cc: freebsd-arch@freebsd.org Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217095842.D64558@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <15952.62746.260872.18687@grasshopper.cs.duke.edu>; from gallatin@cs.duke.edu on Mon, Feb 17, 2003 at 09:43:38AM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 09:43:38AM -0500, Andrew Gallatin wrote: > > Bosko Milekic writes: > > What does this mean for us in the really short term? It means that we > > can finally make all M_DONTWAIT allocations NOT interface with the VM > > subsystem at all. Why is this good in the really really short term? > > For one, you can have network device drivers call the mbuf code > > without Giant because they'll know for a fact that Giant will never be > > needed down the line. 
Since the cache balancer will replenish caches > > Not to detract from your work, but M_DONTWAIT allocations have been > Giant-free since Alan made the VM system Giant-free for kernel-map > allocations in early January. > > The long-term implications of this work look very exciting. Yep, just noticed that. Oh well, all the better for us! I haven't actually instrumented M_DONTWAIT allocations to NOT dip into the VM code in the patch so I guess I won't have to after all. > Drew -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 9:42:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6BD1E37B401 for ; Mon, 17 Feb 2003 09:42:31 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id B2E1043F3F for ; Mon, 17 Feb 2003 09:42:30 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1HHgSSJ097183; Mon, 17 Feb 2003 09:42:28 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1HHgSOq097182; Mon, 17 Feb 2003 09:42:28 -0800 (PST) Date: Mon, 17 Feb 2003 09:42:28 -0800 (PST) From: Matthew Dillon Message-Id: <200302171742.h1HHgSOq097182@apollo.backplane.com> To: Bosko Milekic Cc: Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> Sender: 
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG The work looks great, but I have to say I have grave reservations over any sort of thread-based cleanup / balancing code for a memory allocation subsystem. The only advantage that I can see is that you get good L1 cache effects, but that is counterbalanced by a number of severe disadvantages which have taken a long time to clear up in other subsystems which use separate threads (pageout daemon, buf daemon, syncer, pmap garbage collection from the pageout daemon, etc). Most of these daemons have very good reasons for needing a thread, but I can't think of any reason why a straight memory allocator would *require* a thread. Wouldn't it be easier and more scaleable to implement the hysteresis on the fly? It sounds like it ought to be simple... you have a sysctl to set the per-cpu free cache size and hysteresis (for example, 32[8], aka upon reaching 32 free 32 - 8 = 24 to the global cache, keeping 8). Overflow goes into a global pool. Active systems do not usually bounce from 0 to the maximum number of mbufs and back again, over and over again. Instead they tend to have smaller swings and 'drift' towards the edges, so per-cpu hysteresis should not have to exceed 10% of the total available buffer space in order to reap the maximum locality of reference and mutex benefit. Even in a very heavily loaded system I would expect something like 128[64] to be sufficient. This sort of hysteresis could be implemented trivially in the main mbuf freeing code without any need for a thread and would have the same performance / L1 cache characteristics. Additionally, on-the-fly hysteresis would be able to handle extreme situations that a thread could not (such as extreme swings), and on-the-fly hysteresis can scale in severe or extreme situations while a thread cannot. 
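[Editor's note: Matt's 32[8] notation can be read as "when the per-CPU free list reaches 32, flush 32 - 8 = 24 objects to the global pool, keeping 8." A minimal sketch of that on-the-fly hysteresis in the free path, with invented names and plain counters standing in for the real free lists:]

```c
#include <assert.h>

#define PCPU_MAX  32   /* the "32" in 32[8]: trigger point */
#define PCPU_KEEP  8   /* the "[8]": objects retained locally */

static int pcpu_free_cnt;   /* objects on this CPU's free list */
static int global_free_cnt; /* objects in the global pool */

static void
mb_free_one(void)
{
    pcpu_free_cnt++;
    if (pcpu_free_cnt >= PCPU_MAX) {
        /* Overflow: move the excess to the global pool in one bulk
         * transfer, so the global mutex is taken once per 24 frees. */
        global_free_cnt += pcpu_free_cnt - PCPU_KEEP;
        pcpu_free_cnt = PCPU_KEEP;
    }
}
```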
The same argument could also be applied to UMA, btw. -Matt Matthew Dillon : : :On Mon, Feb 17, 2003 at 09:43:38AM -0500, Andrew Gallatin wrote: :> :> Bosko Milekic writes: :> > What does this mean for us in the really short term? It means that we :> > can finally make all M_DONTWAIT allocations NOT interface with the VM :> > subsystem at all. Why is this good in the really really short term? :> > For one, you can have network device drivers call the mbuf code :> > without Giant because they'll know for a fact that Giant will never be :> > needed down the line. Since the cache balancer will replenish caches :> :> Not to detract from your work, but M_DONTWAIT allocations have been :> Giant-free since Alan made the VM system Giant-free for kernel-map :> allocations in early January. :> :> The long-term implications of this work look very exciting. : : Yep, just noticed that. Oh well, all the better for us! I haven't : actually instrumented M_DONTWAIT allocations to NOT dip into the VM : code in the patch so I guess I won't have to after all. 
: :> Drew : :-- :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 10: 7: 6 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 209AF37B401 for ; Mon, 17 Feb 2003 10:07:04 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6461943F75 for ; Mon, 17 Feb 2003 10:07:03 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0222.cvx40-bradley.dialup.earthlink.net ([216.244.42.222] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18kpfM-0001jR-00; Mon, 17 Feb 2003 10:06:57 -0800 Message-ID: <3E512464.2D37555B@mindspring.com> Date: Mon, 17 Feb 2003 10:05:24 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Dillon Cc: Bosko Milekic , Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a44b733e3ac878e81b06f067328b3be9c0350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Matthew Dillon wrote: > The work looks great, but I have to say I have grave reservations > over any sort of thread-based cleanup / balancing code for a memory > allocation 
subsystem. The only advantage that I can see is that you > get good L1 cache effects, but that is counterbalanced by a number of > severe disadvantages which have taken a long time to clear up in other > subsystems which use separate threads (pageout daemon, buf daemon, > syncer, pmap garbage collection from the pageout daemon, etc). Most of > these daemons have very good reasons for needing a thread, but I can't > think of any reason why a straight memory allocator would *require* > a thread. The classic Sequent paper on this uses a two level garbage collector, but avoids the MACH "mark and sweep" style approach with a separate GC process, by maintaining GC information dynamically, and coalescing when it becomes possible (when blocks are released to the coalesce-to-page layer, it performs its accounting at that time). See the paper (1993): http://citeseer.nj.nec.com/484408.html > Wouldn't it be easier and more scaleable to implement the hysteresis on > the fly? It sounds like it ought to be simple... you have a sysctl > to set the per-cpu free cache size and hysteresis (for example, 32[8], > aka upon reaching 32 free 32 - 8 = 24 to the global cache, keeping 8). > Overflow goes into a global pool. Active systems do not usually > bounce from 0 to the maximum number of mbufs and back again, over > and over again. Instead they tend to have smaller swings and 'drift' > towards the edges, so per-cpu hysteresis should not have to exceed > 10% of the total available buffer space in order to reap the maximum > locality of reference and mutex benefit. Even in a very heavily loaded > system I would expect something like 128[64] to be sufficient. This > sort of hysteresis could be implemented trivially in the main mbuf > freeing code without any need for a thread and would have the same > performance / L1 cache characteristics. 
Additionally, on-the-fly > hysteresis would be able to handle extreme situations that a thread > could not (such as extreme swings), and on-the-fly hysteresis can > scale in severe or extreme situations while a thread cannot. > > The same argument could also be applied to UMA, btw. The one drawback in your approach is that you could end up hitting the global pool on each allocation, in the worst case. It's better to bound the transfer sizes, and add a third layer, as Sequent did in the Dynix allocator. By bounding the transfer sizes to some multiple number of allocation units, you get to amortize the cost. With a simple hysteresis, you effectively end up implementing a sliding, rather than a fixed window size, and that makes the reclaimer vastly more complicated than it needs to be (FWIW). If you wanted to use that paper reference as a reference to look for more recent work (the McKenney/Slingwine paper is 10 years old, now), there's also plenty of more recent work, though most of it is in the context of NUMA/iNUMA systems, which seem to be bad words around here these days... 
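[Editor's note: the amortization argument above - bound transfers to a fixed multiple of allocation units so the global layer is visited once per batch rather than per object - can be illustrated with a toy model. This is not Dynix or mb_alloc code; BUCKET and simulate_allocs are invented for the sketch.]

```c
#include <assert.h>

#define BUCKET 16   /* fixed transfer size between layers */

/*
 * Simulate n allocations from an initially empty per-CPU cache and
 * return how many trips to the global pool (i.e., global-lock
 * acquisitions) were needed.  With bounded bucket transfers the cost
 * is n / BUCKET trips instead of n.
 */
static int
simulate_allocs(int n)
{
    int cached = 0, visits = 0;

    for (int i = 0; i < n; i++) {
        if (cached == 0) {      /* refill: one global visit per BUCKET */
            visits++;
            cached = BUCKET;
        }
        cached--;               /* allocation served from the cache */
    }
    return (visits);
}
```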
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 12:41:55 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 349FE37B401 for ; Mon, 17 Feb 2003 12:41:52 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7094A43FAF for ; Mon, 17 Feb 2003 12:41:51 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1HKfR366265; Mon, 17 Feb 2003 15:41:27 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 15:41:27 -0500 From: Bosko Milekic To: Matthew Dillon Cc: Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217154127.A66206@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200302171742.h1HHgSOq097182@apollo.backplane.com>; from dillon@apollo.backplane.com on Mon, Feb 17, 2003 at 09:42:28AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 09:42:28AM -0800, Matthew Dillon wrote: > The work looks great, but I have to say I have grave reservations > over any sort of thread-based cleanup / balancing code for a memory > allocation subsystem. 
The only advantage that I can see is that you > get good L1 cache effects, but that is counterbalanced by a number of > severe disadvantages which have taken a long time to clear up in other > subsystems which use separate threads (pageout daemon, buf daemon, > syncer, pmap garbage collection from the pageout daemon, etc). Most of > these daemons have very good reasons for needing a thread, but I can't > think of any reason why a straight memory allocator would *require* > a thread. > > Wouldn't it be easier and more scaleable to implement the hysteresis on > the fly? It sounds like it ought to be simple... you have a sysctl > to set the per-cpu free cache size and hysteresis (for example, 32[8], > aka upon reaching 32 free 32 - 8 = 24 to the global cache, keeping 8). > Overflow goes into a global pool. Active systems do not usually > bounce from 0 to the maximum number of mbufs and back again, over > and over again. Instead they tend to have smaller swings and 'drift' > towards the edges, so per-cpu hysteresis should not have to exceed > 10% of the total available buffer space in order to reap the maximum > locality of reference and mutex benefit. Even in a very heavily loaded > system I would expect something like 128[64] to be sufficient. This > sort of hysteresis could be implemented trivially in the main mbuf > freeing code without any need for a thread and would have the same > performance / L1 cache characteristics. Additionally, on-the-fly > hysteresis would be able to handle extreme situations that a thread > could not (such as extreme swings), and on-the-fly hysteresis can > scale in severe or extreme situations while a thread cannot. The allocator does do some hysteresis for what concerns the per-CPU caches on the fly. It will move a bucket over to the global cache if the pcpu cache has gone above the high watermark and we're freeing. It's pretty easy to teach it to move more than a single bucket, too, if we find that it's worthwhile. 
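[Editor's note: the free-path behaviour Bosko describes - migrate one bucket to the global cache when a free pushes the per-CPU cache over its high watermark - looks roughly like this. The bucket size and watermark values are made up, and plain counters stand in for the real bucket lists.]

```c
#include <assert.h>

#define BUCKET_SIZE  4
#define HIGH_WMARK  12

static int pcpu_objs;   /* objects cached on this CPU  */
static int global_objs; /* objects in the global cache */

static void
mb_free(void)
{
    pcpu_objs++;
    if (pcpu_objs > HIGH_WMARK) {
        /* Over the high watermark while freeing: hand one full
         * bucket to the global cache (the code could be taught to
         * move more than one, as noted above). */
        pcpu_objs   -= BUCKET_SIZE;
        global_objs += BUCKET_SIZE;
    }
}
```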
Perhaps tuning the watermark code to do this more efficiently is worth looking at. What the daemon does is replenish the per-CPU caches (if necessary) in one shot without imposing the overhead on the allocation path. That is, it'll move a bunch of buckets over to the per-CPU caches if they are under-populated; doing that from the main allocation path is theoretically possible but tends to produce high spiking in latency. So the daemon is basically a compromise between doing it in the allocation/free path on-the-fly, and doing it from a parallel thread. Additionally, the daemon will empty part of the global cache to the VM, if it needs to - this process is relatively expensive and also produces irregularities in performance, particularly if you decide to do it in the main free path. One of the things I really wanted to focus on was significantly minimizing any VM interactions during network buffer allocations and frees. Once you start minimizing the number of times you'll be going back to the VM with watermarks, you inevitably increase the number of checks you have to do in the regular free case. If you then minimize the number of checks and computations to determine when to flush to VM and when not to, you often end up flushing too often. So it's a tradeoff, really. Perhaps you'll eventually converge to some 'reasonable' compromise, but if you can do it from a thread scheduled in parallel, then it's even easier as long as, as you say, there are no complicated issues to deal with because of the fact that suddenly you have this daemon which runs in parallel and modifies the behavior of your allocations. In this case, though, the allocator was designed with the idea that freeing and balancing would be implemented from a kproc scheduled in parallel anyway, so I hope that those complexities are a non-issue. So, in summary, the daemon here is not the only thing doing the balancing; it's a "compromise," if you will, of both models. 
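[Editor's note: a rough shape of one balancer pass as described above - top up under-populated per-CPU caches from the global cache, then lazily trim a global cache that has grown past its high limit back toward the watermark average, returning the surplus to the VM. Everything here is a simplified stand-in; the real code moves buckets under the appropriate locks.]

```c
#include <assert.h>

#define NCPU        2
#define LOW_WMARK   8
#define HIGH_WMARK 16
#define TARGET     ((LOW_WMARK + HIGH_WMARK) / 2)  /* watermark average */

static int pcpu_cache[NCPU];
static int global_cache;
static int freed_to_vm;   /* objects lazily returned to the VM */

static void
mbufd_pass(void)
{
    /* Replenish any per-CPU cache that fell below the low watermark. */
    for (int c = 0; c < NCPU; c++) {
        if (pcpu_cache[c] < LOW_WMARK) {
            int want = TARGET - pcpu_cache[c];
            if (want > global_cache)
                want = global_cache;   /* can't move what we don't have */
            pcpu_cache[c] += want;
            global_cache  -= want;
        }
    }
    /* Trim the global cache back to the target, freeing the excess. */
    if (global_cache > HIGH_WMARK) {
        freed_to_vm += global_cache - TARGET;
        global_cache = TARGET;
    }
}
```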
As for "extreme swings," I would tend to think that it's the contrary. Like, say you have a huge spike in allocations; with the current model, you'll be able to handle it even in the extreme case and you'll be able to recover via the kproc. If you have a series of huge spikes, then this model may in fact even work out better for you because, due to scheduling, you may defer all attempts to balance caches until the end of the spike, so you may actually avoid ping-ponging of buckets from cache to cache because you won't be relying on the spike data to balance in the long term. Anyway, all this is pretty theoretical talk. My intention is to tune this thing and further evaluate performance based on the requirements of real life applications. > The same argument could also be applied to UMA, btw. > > -Matt > Matthew Dillon > -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 16: 0:42 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D7ED537B401 for ; Mon, 17 Feb 2003 16:00:39 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 301E243F75 for ; Mon, 17 Feb 2003 16:00:39 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1I00bSJ000433; Mon, 17 Feb 2003 16:00:37 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1I00bvl000432; Mon, 17 Feb 2003 16:00:37 -0800 (PST) Date: Mon, 17 Feb 2003 16:00:37 -0800 (PST) From: Matthew Dillon 
Message-Id: <200302180000.h1I00bvl000432@apollo.backplane.com> To: Bosko Milekic Cc: Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : What the daemon does is replenish the per-CPU caches (if necessary) in : one shot without imposing the overhead on the allocation path. That : is, it'll move a bunch of buckets over to the per-CPU caches if they : are under-populated; doing that from the main allocation path is : theoretically possible but tends to produce high spiking in latency. : So what the daemon basically is is a compromise between doing it in : the allocation/free path on-the-fly, and doing it from a parallel : thread. Additionally, the daemon will empty part of the global cache :... Hmm. Well, you can also replenish the per-CPU caches in-bulk on the fly. You simply pull in more than one buffer and you will reap the same overhead benefits in the allocation path. If you depend on a thread to do this then you can create a situation where a chronic buffer shortage in the per-cpu cache can occur if the thread doesn't get cpu quickly enough, resulting in non-optimal operation. In other words, while it may seem you are saving latency in the critical path (the network trying to allocate a buffer), I think you might actually be creating a situation where instead of latency you wind up with a critical shortage. I don't think VM interaction is that big a deal. The VM system has a notion of a 'shortage' and a 'severe shortage'. 
When you are allocating mbufs from the global VM system into the per-cpu cache you simply allocate up to into the cache or until the VM system gets low (but not severely low) on memory. The hysteresis does not have to be much to reap the benefits and mitigate the overhead of the global mutex(es)... just 5 or 10 mbufs would mitigate global mutex overhead to the point where it becomes irrelevant. By creating a thread you are introducing more moving parts, and like a physical system these moving parts are going to interact with each other. Remember, the VM system is *already* trying to ensure that enough free pages exist in the system. If you have a second thread eating memory in large globs it is far more likely that you will destabilize the pageout daemon and create an oscillation between the two threads (pageout daemon and your balancer). This might not turn up in benchmarks (which tend to focus on just one subsystem), but it could lead to some pretty nasty degenerate cases under heavy general loads. I think it is far better to let the VM system do its job and pull the mbufs in on-the-fly in smaller chunks which are less likely to destabilize the pageout daemon. This can be exacerbated if your balancing thread is given a high priority. So you have the potential to starve the mbuf system if the balancing thread is too LOW a priority, and the potential to destabilize the VM system if the balancing thread is too HIGH a priority. Also, it seems to me that VM overheads are better addressed in the UMA subsystem, not in a leaf allocation subsystem. 
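[Editor's note: the alternative Matt proposes - bulk refill directly from the allocation path, stopping early when the VM reports a (non-severe) shortage - can be sketched as below. vm_shortage() is a stand-in predicate invented for the example, not the real kernel interface.]

```c
#include <assert.h>

#define REFILL 10   /* bulk refill size */

static int vm_pages_low;   /* 1 when the VM reports a shortage */
static int pcpu_cached;    /* objects in this CPU's cache */

static int
vm_shortage(void)
{
    return (vm_pages_low);
}

/*
 * Pull up to REFILL objects into the per-CPU cache on the fly,
 * backing off as soon as the VM signals a shortage; returns how
 * many objects were actually obtained.
 */
static int
refill_on_the_fly(void)
{
    int got = 0;

    while (got < REFILL && !vm_shortage()) {
        pcpu_cached++;   /* one kmem allocation in the real code */
        got++;
    }
    return (got);
}
```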
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 16:24:46 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E770937B401 for ; Mon, 17 Feb 2003 16:24:42 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 245BE43FAF for ; Mon, 17 Feb 2003 16:24:42 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I0OII67187; Mon, 17 Feb 2003 19:24:18 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 19:24:18 -0500 From: Bosko Milekic To: Matthew Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217192418.A67144@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200302180000.h1I00bvl000432@apollo.backplane.com>; from dillon@apollo.backplane.com on Mon, Feb 17, 2003 at 04:00:37PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 04:00:37PM -0800, Matthew Dillon wrote: > : What the daemon does is replenish the per-CPU caches (if necessary) in > : one shot without imposing the overhead on the allocation path. 
That > : is, it'll move a bunch of buckets over to the per-CPU caches if they > : are under-populated; doing that from the main allocation path is > : theoretically possible but tends to produce high spiking in latency. > : So what the daemon basically is is a compromise between doing it in > : the allocation/free path on-the-fly, and doing it from a parallel > : thread. Additionally, the daemon will empty part of the global cache > :... > > Hmm. Well, you can also replentish the per-CPU caches in-bulk on the fly. > You simply pull in more then one buffer and you will reap the same > overhead benefits in the allocation path. If you depend on a thread > to do this then you can create a situation where a chronic buffer shortage > in the per-cpu cache can occur if the thread doesn't get cpu quickly > enough, resulting in non-optimal operation. In otherwords, while it > may seem you are saving latency in the critical path (the network trying > to allocate a buffer), I think you might actually be creating a situation > where instead of latency you wind up with a critical shortage. Hmm, not quite. You'd need to look at the code; there is no shortage situation created here. As I said, the model I employ is not a purely balance-everything-from-the-daemon model. It is a compromise. In other words, if you can't get an object from the per-CPU cache, you'll try to get an object from the global cache. If you can get an object from the global cache, you'll take it and move a bucket of objects from the global cache to the per-CPU cache for future use. If you can't get an object from the global cache either, it's OK, you'll allocate from VM. The difference comes in the free case where you'll free the object to the bucket, wherever the bucket is sitting (usually this will be your per-CPU cache but it may be, in the non-common case, the global cache). 
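[Editor's note: the three-step fallback just described - per-CPU cache, then global cache (take one object and migrate a bucket for future use), then the VM - in simplified form. Counters stand in for the real bucket lists, and the names are invented for the sketch.]

```c
#include <assert.h>

#define BUCKET 4   /* objects migrated per global-cache hit */

static int pcpu_objs, global_objs;
static int vm_allocs;   /* how often we fell through to the VM */

static int
mb_alloc(void)
{
    if (pcpu_objs > 0) {               /* 1: per-CPU cache */
        pcpu_objs--;
        return (0);
    }
    if (global_objs > 0) {             /* 2: global cache */
        int move = global_objs < BUCKET ? global_objs : BUCKET;
        global_objs -= move;           /* migrate a bucket over... */
        pcpu_objs   += move - 1;       /* ...and consume one object */
        return (0);
    }
    vm_allocs++;                       /* 3: fall back to the VM */
    return (0);
}
```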
You'll never flush any of the caches back to the VM or move anything more than a bucket of objects between caches in the allocation/deallocation cases. The daemon takes care of that when it can. So, you don't have a resource situation no matter what. > I don't think VM interaction is that big a deal. The VM system has a > notion of a 'shortage' and a 'severe shortage'. When you are allocating > mbufs from the global VM system into the per-cpu cache you simply > allocate up to into the cache or until the VM system gets > low (but not severely low) on memory. The hysteresis does not have to > be much to reap the benefits and mitigate the overhead of the global > mutex(es)... just 5 or 10 mbufs would mitigate global mutex overhead > to the point where it becomes irrelevant. I already pretty much do this. If I really need to, I *will* _allocate_ up to a bucket of mbufs or clusters from VM. A "bucket" right now is PAGE_SIZE-worth, but that's modifiable. > By creating a thread you are introducing more moving parts, and like > a physical system these moving parts are going to ineract with each > other. Remember, the VM system is *already* trying to ensure that > enough free pages exist in the system. If you have a second thread > eating memory in large globs it is far more likely that you will > destabilize the pageout daemon and create an oscillation between the > two threads (pageout daemon and your balancer). This might not turn up > in benchmarks (which tend to focus on just one subsystem), but it could > lead to some pretty nasty degenerate cases under heavy general loads. > I think it is far better to let the VM system do its job and pull the > mbufs in on-the-fly in smaller chunks which are less likely to destabilize > the pageout daemon. This will not happen in the common case. The one exception is if your caches are not balanced or are too low. 
Assuming that the watermarks are tuned properly you should always have about the average of the watermarks in your caches; if you don't, all the daemon will do is replenish them to that value. Once that's done, it won't do anymore replenishing unless you go low again. Further, if you spike and then return back to normal, the free code will end up moving buckets of objects back to the general cache and the daemon will only free back to the VM from the global cache and, again, it won't free everything, but just enough to bring back the general cache number of objects to the average of the watermarks. So, you can still allocate from the VM in your allocation paths if you need to, but instead of wasting time allocating a bunch of buckets, setting up your free object lists, etc.etc., you'll only allocate one bucket and let the daemon do the rest. Also, keep in mind that the maps for mbufs and clusters are finite, so no matter what you do, you're not going to be able to go beyond the size of those maps. The corner cases you're probably thinking of are those where the rest of the system is strapped for memory and your mbuf daemon may be holding on to too much. The thing is that the daemon should not be over-allocating large chunks unless the caches are really low anyway (you can set the low watermark, keep that in mind), and further, in the extreme case, you could even have the VM system wakeup the daemon to drain ALL the caches in seriously extreme situations (but those are really corner cases in which case you're probably screwed anyway). > This can be exasperated... made even worse, if your balancing thread is > given a high priority. So you have the potential to starve the mbuf > system if the balancing thread is too LOW a priority, and the potential > to destabilize the VM system if the balancing thread is too HIGH a > priority. > > Also, it seems to me that VM overheads are better addressed in the > UMA subsystem, not in a leaf allocation subsystem. 
Again, this is not a leaf-allocation subsystem any more than the UMA allocator is. Both interface directly with kmem_malloc/kmem_free. > -Matt -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 16:27:30 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D7CB337B401 for ; Mon, 17 Feb 2003 16:27:29 -0800 (PST) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3BBD343FA3 for ; Mon, 17 Feb 2003 16:27:29 -0800 (PST) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id h1I0ROnN032783 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Mon, 17 Feb 2003 16:27:24 -0800 (PST) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <019101c2d6e4$828cd370$52557f42@errno.com> From: "Sam Leffler" To: "Matthew Dillon" , "Bosko Milekic" Cc: "Andrew Gallatin" , References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> Subject: Re: mb_alloc cache balancer / garbage collector Date: Mon, 17 Feb 2003 16:27:24 -0800 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="Windows-1252" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4807.1700 X-MimeOLE: Produced By 
Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > By creating a thread you are introducing more moving parts, and like > a physical system these moving parts are going to ineract with each > other. Remember, the VM system is *already* trying to ensure that > enough free pages exist in the system. If you have a second thread > eating memory in large globs it is far more likely that you will > destabilize the pageout daemon and create an oscillation between the > two threads (pageout daemon and your balancer). Good point. Trying to reason about the behaviour of the system w/ two (or more) threads trying to do the same/similar work is too hard. Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 16:30:16 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DA4F737B401 for ; Mon, 17 Feb 2003 16:30:14 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1FEDC43F93 for ; Mon, 17 Feb 2003 16:30:14 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I0Tqt67252; Mon, 17 Feb 2003 19:29:52 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 19:29:52 -0500 From: Bosko Milekic To: Matthew Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217192952.A67225@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> 
<20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20030217192418.A67144@unixdaemons.com>; from bmilekic@unixdaemons.com on Mon, Feb 17, 2003 at 07:24:18PM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 07:24:18PM -0500, Bosko Milekic wrote: > This will not happen in the common case. The one exception is if your > caches are not balanced or are too low. Assuming that the watermarks > are tuned properly you should always have about the average of the > watermarks in your caches; if you don't, all the daemon will do is > replenish them to that value. Once that's done, it won't do anymore > replenishing unless you go low again. Further, if you spike and then > return back to normal, the free code will end up moving buckets of > objects back to the general cache and the daemon will only free back > to the VM from the global cache and, again, it won't free everything, > but just enough to bring back the general cache number of objects to > the average of the watermarks. So, you can still allocate from the VM > in your allocation paths if you need to, but instead of wasting time > allocating a bunch of buckets, setting up your free object lists, > etc.etc., you'll only allocate one bucket and let the daemon do the > rest. One more thing I forgot to add that may help clear this up: the less the daemon runs, the better. Right now I can see how many times it ran by looking at a sysctl-exported counter, mbuf_daemon_ran. I can see that it only runs once to populate the caches and then only runs if I forcibly spike and return back to 'normal' steady activity and if I change the watermarks to force it to run. 
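The replenish-to-the-average policy quoted above can be modeled in a few lines. This is a minimal userland sketch under stated assumptions: the watermark values, the target being the exact average of the two watermarks, and all identifiers (`lowm`, `highm`, `replenish_amount`, `drain_amount`) are illustrative and do not come from the actual mbufd patch.

```c
#include <assert.h>

/* Illustrative tunables; the real watermarks are sysctl-settable. */
static const int lowm  = 128;   /* low watermark  */
static const int highm = 512;   /* high watermark */

struct cache { int nobjs; };    /* stand-in for a per-CPU or global cache */

/* The daemon aims for the average of the two watermarks. */
static int target(void) { return (lowm + highm) / 2; }

/* How many objects the daemon would add to a cache that has gone low. */
static int replenish_amount(const struct cache *c)
{
    return (c->nobjs < lowm) ? target() - c->nobjs : 0;
}

/* How many objects the daemon would free back to the VM from the
 * global cache once it exceeds the high watermark: only enough to
 * come back down to the average, never everything. */
static int drain_amount(const struct cache *global)
{
    return (global->nobjs > highm) ? global->nobjs - target() : 0;
}
```

A cache sitting anywhere between the watermarks is left alone, which is why, in the steady state described above, the daemon stays idle.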
We should probably eventually have it track and detect an acceleration of running occurrences and then, according to that, change the watermarks if it starts to increase (i.e., if it starts to run too much). As I said, in the normal case, it shouldn't run often. This thing doesn't wake up every N ticks. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 17: 1:15 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8EE7137B401 for ; Mon, 17 Feb 2003 17:01:13 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id E935543F85 for ; Mon, 17 Feb 2003 17:01:12 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1I11ASJ001133; Mon, 17 Feb 2003 17:01:12 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1I11AWr001132; Mon, 17 Feb 2003 17:01:10 -0800 (PST) Date: Mon, 17 Feb 2003 17:01:10 -0800 (PST) From: Matthew Dillon Message-Id: <200302180101.h1I11AWr001132@apollo.backplane.com> To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> 
Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : One more thing I forgot to add that may help clear this up: the less : the daemon runs, the better. Right now I can see how many times it : ran by looking at a sysctl-exported counter, mbuf_daemon_ran. I can : see that it only runs once to populate the caches and then only runs : if I forcibly spike and return back to 'normal' steady activity and if : I change the watermarks to force it to run. : : We should probably eventually have it track and detect an acceleration : of running occurances and then, according to that, change the : watermarks if it starts to increase (i.e., if it starts to run too : much). As I said, in the normal case, it shouldn't run often. This : thing doesn't wake up every N ticks. : :-- :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org I guess I still don't understand the point of the daemon. The per-cpu caches are limited (in your patch) to 512 mbufs / 128 clusters. This represents very little memory even if you multiply by ncpus. We shouldn't have to 'balance' anything. Who cares if there are 511 mbufs sitting on cpu 0's cache that aren't being used? These numbers are going to be tuned for the machine (for example, based on the amount of main memory), and are far smaller than the total possible. The only case that matters is if a per-cpu cache gets blown up by an inordinate number of frees being done to it. That is, when the mbuf or cluster count exceeds mbuf_limit or clust_limit. Why is the daemon preferable for handling this case versus freeing a bunch (like 8 or 16) mbufs/clusters on the fly at the time of the free when the per-cpu cache exceeds the limit? I don't see any advantage to having the daemon at all, and I see several disadvantages. 
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 17:33:34 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C89FE37B401 for ; Mon, 17 Feb 2003 17:33:31 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0678D43FA3 for ; Mon, 17 Feb 2003 17:33:31 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I1X6v67791; Mon, 17 Feb 2003 20:33:06 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 20:33:06 -0500 From: Bosko Milekic To: Matthew Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217203306.A67720@unixdaemons.com> References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200302180101.h1I11AWr001132@apollo.backplane.com>; from dillon@apollo.backplane.com on Mon, Feb 17, 2003 at 05:01:10PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 05:01:10PM -0800, Matthew Dillon wrote: > > : One more thing I forgot to add that may help clear this up: the less > : the daemon runs, 
the better. Right now I can see how many times it > : ran by looking at a sysctl-exported counter, mbuf_daemon_ran. I can > : see that it only runs once to populate the caches and then only runs > : if I forcibly spike and return back to 'normal' steady activity and if > : I change the watermarks to force it to run. > : > : We should probably eventually have it track and detect an acceleration > : of running occurances and then, according to that, change the > : watermarks if it starts to increase (i.e., if it starts to run too > : much). As I said, in the normal case, it shouldn't run often. This > : thing doesn't wake up every N ticks. > : > :-- > :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org > > I guess I still don't understand the point of the daemon. The per-cpu > caches are limited (in your patch) to 512 mbufs / 128 clusters. This > represents very little memory even if you multiply by ncpus. We shouldn't > have to 'balance' anything. Who cares if there are 511 mbufs sitting > on cpu 0's cache that aren't being used? These numbers are going to be > tuned for the machine (for example, based on the amount of main memory), > and are far smaller then the total possible. I never said that those (totally arbitrary, by the way) numbers are ideal. In fact, I think they should be changed. > The only case that matters is if a per-cpu cache gets blown up by an > inordinate number of frees being done to it. That is, when the mbuf > or cluster count exceeds mbuf_limit or clust_limit. > > Why is the daemon more preferable for handling this case verses freeing > a bunch (like 8 or 16) mbufs/clusters on the fly at the time of the > free when the per-cpu cache exceeds the limit? I don't see any advantage > to having the daemon at all, and I see several disadvantages. You can't just 'free' a bunch of mbufs back to the VM. You free them wherever you got them from (usually your pcpu cache). 
If you exceed mbuf_limit on your pcpu cache you'll migrate a bucket over to the global cache, which is what you want. However, if your global cache becomes too 'blown up' as you say, then you may want to recover the unused physical pages. Doing that directly from the free has several disadvantages. It can be expensive in more ways than one: for one, the VM call itself is extra overhead. Secondly, sometimes freeing a page means traversing the cache until you hit a page worth of free mbufs to free, so even though you may really need to free a page you'll never actually get to freeing it unless you start traversing the list of buckets in the cache; and that's expensive for a simple free, common case or not. By doing the freeing from the kproc context you're not interfering with parallel allocations, but you're also not taking longer than it takes to just cache the data being freed for the free case. That's a big advantage. By having the kproc also fill the pcpu caches according to the configurable watermarks, you're ensuring that a certain number of objects is cached and ready for immediate allocations, again without taking longer than it takes to just retrieve the object being allocated from the cache for the allocation case. I think that your argument regarding having to worry about the daemon interfering with the VM system is reasonable, but I think that what's been left out is that in the case of mbufd, the behavior is entirely determined by the watermarks, which are variable. The good news is that if mbufd is interfering too much, the watermarks can be modified so that it interferes less. With that said, I don't see how mbufd will eat away at the VM system. It doesn't try to replace the VM system, it just tries to avoid having to go to it for 95% of network buffer allocations. Perhaps I can address your concerns if you give me a specific example where you think the daemon is doing a bad thing, then I can work on fixing that. 
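The division of labor described here (the free stays cheap, one bucket migrates to the global cache at the limit, and page recovery is deferred to the kproc) can be sketched with a small counter model. This is hypothetical illustration code; `OBJS_PER_BUCKET`, the structures, and the function name are invented and do not mirror the mb_alloc sources:

```c
#include <assert.h>

#define OBJS_PER_BUCKET 4   /* real buckets hold a PAGE_SIZE worth of mbufs */

struct pcpu_cache   { int nobjs; int limit; };  /* limit plays mbuf_limit */
struct global_cache { int nobjs; };

/* Free path: cache the object locally; if that pushes the per-CPU cache
 * over its limit, hand one bucket's worth of objects to the global cache.
 * Note there is no VM interaction here at all -- recovering pages from an
 * oversized global cache is left to the kproc, off the free path. */
static void cache_free(struct pcpu_cache *c, struct global_cache *g)
{
    c->nobjs++;
    if (c->nobjs > c->limit) {
        c->nobjs -= OBJS_PER_BUCKET;
        g->nobjs += OBJS_PER_BUCKET;
    }
}
```

The free itself is a constant-time counter update plus, occasionally, a bucket hand-off; the expensive list traversal and kmem_free work never appears in this path.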
I think for corner cases it would even make sense to explicitly lower the watermarks (thus forcing the daemon to drain the caches) directly from the VM, if that's really determined to be an issue. > -Matt > Matthew Dillon > -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 17:38:33 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A75737B401 for ; Mon, 17 Feb 2003 17:38:32 -0800 (PST) Received: from rwcrmhc51.attbi.com (rwcrmhc51.attbi.com [204.127.198.38]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7161F43F85 for ; Mon, 17 Feb 2003 17:38:31 -0800 (PST) (envelope-from julian@elischer.org) Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by rwcrmhc51.attbi.com (rwcrmhc51) with ESMTP id <20030218013830051007dat7e>; Tue, 18 Feb 2003 01:38:30 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id RAA17270; Mon, 17 Feb 2003 17:38:29 -0800 (PST) Date: Mon, 17 Feb 2003 17:38:27 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: Matthew Dillon , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector In-Reply-To: <20030217192418.A67144@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 17 Feb 2003, Bosko Milekic wrote: [...] 
Bosko, If I have one NIC bound to one CPU (a future capability, say) and another bound to a second, and there is a stream of packets from NIC1 to NIC2 (we are routing) at (say) 30,000 packets per second, what is the path by which those 30,000 packets make their way from CPU2's cache of mbufs, back to CPU1 to be used again? (each second). We'll imagine there are no packets going the other way. (maybe they take a different route). To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 19:14:49 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BAE6137B401 for ; Mon, 17 Feb 2003 19:14:47 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0746743F75 for ; Mon, 17 Feb 2003 19:14:47 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I3Dnc67975; Mon, 17 Feb 2003 22:13:49 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 22:13:49 -0500 From: Bosko Milekic To: Julian Elischer Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217221349.A67942@unixdaemons.com> References: <20030217192418.A67144@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from julian@elischer.org on Mon, Feb 17, 2003 at 05:38:27PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 05:38:27PM -0800, Julian Elischer wrote: > On Mon, 17 Feb 2003, Bosko Milekic wrote: > [...] 
> Bosko, If I have one NIC bound to one CPU (a future capability say,) > and another bound to a second, and there is a stream of packets fron > NIC1 to NIC2 (we are routing) at (say) 30,000 packets per second, > what is the path by which those 30,000 packets make their way from > CPU2's cache of mbufs, back to CPU1 to be used again? (each second). > We'll imagine there are no packets going the other way. (maybe they take > a different route). If the mbufs are allocated from CPU1's cache then they'll most likely be freed back to CPU1's cache. The way it works is that the mbuf is freed back to its bucket and so to whichever cache the bucket is sitting in. Most likely the bucket will not have migrated caches so you're going to be using CPU1's cache in this scenario, since that's the cache you're allocating from. This is probably not ideal when you do something like bind the NIC to a CPU but it is better than having the freeing thread free to its own cache, in which case you'd have a serious unbalancing of caches going on. I'm not sure how performance would be impacted either way, but I guess I can't say until we actually bind the NICs each to their own CPUs and measure. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." 
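The "freed back to its bucket, and so to whichever cache the bucket is sitting in" rule can be shown with a tiny model. All structure and function names here are invented for the sketch; the point is only that the destination of a free is decided by the bucket's current owner, not by the CPU doing the freeing:

```c
#include <assert.h>

struct cache { int id; int nobjs; };

struct bucket {
    struct cache *owner;   /* the cache this bucket currently sits in */
    int nfree;             /* free objects held by the bucket */
};

struct mbuf_sim { struct bucket *bkt; };   /* each mbuf knows its bucket */

/* Freeing returns the object to its bucket, hence to the bucket's owning
 * cache -- even if another CPU performs the free.  This is what keeps
 * CPU1's cache replenished in the NIC1 -> NIC2 routing example above. */
static void mb_free_sim(struct mbuf_sim *m)
{
    m->bkt->nfree++;
    m->bkt->owner->nobjs++;
}
```

In the real allocator this cross-CPU free implies taking the owning cache's lock, which is the contention question raised later in the thread.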
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 19:45:53 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A9AED37B401 for ; Mon, 17 Feb 2003 19:45:50 -0800 (PST) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id 38F9643F3F for ; Mon, 17 Feb 2003 19:45:49 -0800 (PST) (envelope-from julian@elischer.org) Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc02.attbi.com (sccrmhc02) with ESMTP id <200302180345480020032booe>; Tue, 18 Feb 2003 03:45:48 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id TAA18135; Mon, 17 Feb 2003 19:45:47 -0800 (PST) Date: Mon, 17 Feb 2003 19:45:45 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector In-Reply-To: <20030217221349.A67942@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 17 Feb 2003, Bosko Milekic wrote: > > On Mon, Feb 17, 2003 at 05:38:27PM -0800, Julian Elischer wrote: > > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > [...] > > Bosko, If I have one NIC bound to one CPU (a future capability say,) > > and another bound to a second, and there is a stream of packets fron > > NIC1 to NIC2 (we are routing) at (say) 30,000 packets per second, > > what is the path by which those 30,000 packets make their way from > > CPU2's cache of mbufs, back to CPU1 to be used again? (each second). > > We'll imagine there are no packets going the other way. 
(maybe they take > > a different route). > > If the mbufs are allocated from CPU1's cache then they'll most likely > be freed back to CPU1's cache. The way it works is that the mbuf is > freed back to its bucket and so to whichever cache the bucket is > sitting in. Most likely the bucket will not have migrated caches so > you're going to be using CPU1's cache in this scenario, since that's > the cache you're allocating from. This is probably not ideal when you > do something like bind the NIC to a CPU but it is better than having > the freing thread free to its own cache, in which case you'd have a > serious debalancing of caches going on. I'm not sure how performance > would be impacted either way, but I guess I can't say until we > actually bind the NICs each to their own CPUs and measure. So this means that CPU2 is freeing into a cache belonging to CPU1. This means that somewhere there must be a lock involved.. I thought that part of this was that we were trying to avoid using locks.. Is there a separation between the structures that accept the buffer from CPU2 and those that CPU1 gets them from? What mitigates the lock contention between CPU2 and CPU1? > > -- > Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org > > "If we open a quarrel between the past and the present, we shall > find that we have lost the future." 
> To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 20: 4:25 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A52A337B401 for ; Mon, 17 Feb 2003 20:04:22 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id C2E8443F85 for ; Mon, 17 Feb 2003 20:04:21 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I43RV68254; Mon, 17 Feb 2003 23:03:27 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 23:03:27 -0500 From: Bosko Milekic To: Julian Elischer Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217230327.A68207@unixdaemons.com> References: <20030217221349.A67942@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from julian@elischer.org on Mon, Feb 17, 2003 at 07:45:45PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 07:45:45PM -0800, Julian Elischer wrote: > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > On Mon, Feb 17, 2003 at 05:38:27PM -0800, Julian Elischer wrote: > > > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > > [...] > > > Bosko, If I have one NIC bound to one CPU (a future capability say,) > > > and another bound to a second, and there is a stream of packets fron > > > NIC1 to NIC2 (we are routing) at (say) 30,000 packets per second, > > > what is the path by which those 30,000 packets make their way from > > > CPU2's cache of mbufs, back to CPU1 to be used again? 
(each second). > > > We'll imagine there are no packets going the other way. (maybe they take > > > a different route). > > > > If the mbufs are allocated from CPU1's cache then they'll most likely > > be freed back to CPU1's cache. The way it works is that the mbuf is > > freed back to its bucket and so to whichever cache the bucket is > > sitting in. Most likely the bucket will not have migrated caches so > > you're going to be using CPU1's cache in this scenario, since that's > > the cache you're allocating from. This is probably not ideal when you > > do something like bind the NIC to a CPU but it is better than having > > the freing thread free to its own cache, in which case you'd have a > > serious debalancing of caches going on. I'm not sure how performance > > would be impacted either way, but I guess I can't say until we > > actually bind the NICs each to their own CPUs and measure. > > So this means that CPU2 is freeing into a cache belonging to CPU1. > This means that somewhere there must be a lock involved.. I thought that > part of this was that we were trying to avoid using locks.. > Is there a separation between the structures that accept the buffer from > CPU2 and those that CPU1 gets them from? What mitigates the lock > contention between CPU2 and CPU1? Right, it basically means that in this scenario we degenerate to a single cache. The structure to which the mbuf is freed is called a "bucket" and right now a "bucket" keeps a PAGE_SIZE worth of mbufs. The idea is that you can move these buckets around from cache to cache, even if they're not totally full. 
once you allow objects allocated from one cache to be freed to whatever cache the CPU currently tampering with them/consuming them owns, then you risk severely unbalancing your caches for regular non-CPU-bound applications. So for instance you could have something like this happen:

- CPU 1 exhausts its cache.
- CPU 2 blows up its cache because freeing is suddenly done to buckets which are now being migrated to CPU 2's cache.
- CPU 2 migrates buckets to the global cache because suddenly its cache has more than high-watermark mbufs.
- CPU 1 takes buckets back from the global cache because its cache is exhausted.

Again, this is OK if you assume that the only users of the allocator are those CPU-bound threads. Once you start introducing other non-CPU-bound threads into the picture you may find that it's better to just keep the current behavior. This is why I can't say for sure how performance will fare in either case. We really have to wait until we're ready, then implement it, then tune and measure. When designing and implementing this you can't get the best case to consistently occur for all scenarios. If we had one fixed model/way to do things then we could easily tune to make the "best case" fit that model but there are always going to be tradeoffs. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 20:21:15 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F046137B401 for ; Mon, 17 Feb 2003 20:21:13 -0800 (PST) Received: from sccrmhc02.attbi.com (sccrmhc02.attbi.com [204.127.202.62]) by mx1.FreeBSD.org (Postfix) with ESMTP id DA81343FAF for ; Mon, 17 Feb 2003 20:21:12 -0800 (PST) (envelope-from julian@elischer.org) Received: from interjet.elischer.org (12-232-168-4.client.attbi.com[12.232.168.4]) by sccrmhc02.attbi.com (sccrmhc02) with ESMTP id <200302180421110020032smhe>; Tue, 18 Feb 2003 04:21:12 +0000 Received: from localhost (localhost.elischer.org [127.0.0.1]) by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id UAA18360; Mon, 17 Feb 2003 20:21:08 -0800 (PST) Date: Mon, 17 Feb 2003 20:21:07 -0800 (PST) From: Julian Elischer To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector In-Reply-To: <20030217230327.A68207@unixdaemons.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, 17 Feb 2003, Bosko Milekic wrote: > > On Mon, Feb 17, 2003 at 07:45:45PM -0800, Julian Elischer wrote: > > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > Right, it basically means that in this scenario we degenerate to a > single cache. The structure to which the mbuf is freed is called a > "bucket" and right now a "bucket" keeps a PAGE_SIZE worth of mbufs. > The idea is that you can move these buckets around from cache to > cache, even if they're not totally full. 
In the scenario that you > describe (which by the way is still inexistent), assuming that we > determine that it's really worth doing the binding of the threads to > individual CPUs (I'm not quite convinced that it is, ... yet), in that was a contrived example, however I can imagine many cases where the networking thread runs on one CPU, and tries to stay there due to affinity issues, which means that the fielding of interrupts and hence filling of mbufs, is left to the other CPU. I'm not saying that NICs need to be bound to processors (though if they were part of the processor unit as in some older SUN boxes that might make sense) but I am saying that I think that the producer and consumer might quite easily be constantly on different CPUs. Here's another example. One of the things that we will be doing in threads is the ability to bind a thread to a CPU. If that thread opens a socket, and starts receiving stuff then the 'consumer' is now locked to one CPU. Now let's make that thread also be using about 100% of that CPU. The other CPU is idle and therefore probably the producer is going to run there. It is true that "on average" things should even out but it is also very easy to make scenarios where this isn't true. Or, two processes doing some set of transactions with each other. (both using lots of CPU). "On average" the producer and the consumer are going to be on different CPUs. It still seems odd to me that the consumer has to pass it back to the producer's CPU because "on average" it will require a locking cycle of some sort. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 20:33: 6 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B16C337B401 for ; Mon, 17 Feb 2003 20:33:03 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0A59043FB1 for ; Mon, 17 Feb 2003 20:33:03 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1I4W6t68519; Mon, 17 Feb 2003 23:32:06 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Mon, 17 Feb 2003 23:32:06 -0500 From: Bosko Milekic To: Julian Elischer Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030217233206.A68495@unixdaemons.com> References: <20030217230327.A68207@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: ; from julian@elischer.org on Mon, Feb 17, 2003 at 08:21:07PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 08:21:07PM -0800, Julian Elischer wrote: > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > On Mon, Feb 17, 2003 at 07:45:45PM -0800, Julian Elischer wrote: > > > On Mon, 17 Feb 2003, Bosko Milekic wrote: > > > > Right, it basically means that in this scenario we degenerate to a > > single cache. The structure to which the mbuf is freed is called a > > "bucket" and right now a "bucket" keeps a PAGE_SIZE worth of mbufs. > > The idea is that you can move these buckets around from cache to > > cache, even if they're not totally full. 
In the scenario that you > > describe (which by the way is still inexistent), assuming that we > > determine that it's really worth doing the binding of the threads to > > individual CPUs (I'm not quite convinced that it is, ... yet), in > > that was a contrived example, however I can imagine many cases where the > networking thread runs on one CPU, and tries to stay there due to > affinity issues, which means that the fielding of interrupts and hense > filling of mbufs, is left to the other CPU. > > I'm not saing that NICS need to be bound to processors (though if they > were part of the processor unit as in some older SUN boxes that might > make sense) but I am saying that I think that the producer and consumer > might quite easily be constantly on different CPUs. > > Here's another example. One of the things that we will be doing in > threads is the ability to bind a thread to a CPU. If that thread opens a > socket, and starts receiving stuff then the 'consumer' is now locked to > one CPU. Now let's make that thread also be using about 100% of that > CPU. The other CPU is idle and therefore probably the producer is going > to run there. It is true that "on average" things should even out but it > is also very easy to make scenarios where this isn't true. > > Or, two processes doing some set of transactions with each other. > (both usning lots of CPU). > "On average" the producer and the consumre are going to be on different > CPUs. It stilll seems odd to me that the consumer has to pass it back > to the producer's CPU because "on average" it will require a locking > cycle of some sort. Hmmm, to be perfectly honest with you, both of your examples are good examples. I guess what we'd have to do, at least eventually, is modify the code to, when freeing, also migrate the bucket over to the local CPU. Then future frees that involve objects going to the same bucket will need the consuming CPU's cache lock and won't need to contend with the producing CPU cache lock. 
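A minimal userland sketch of the bucket-migration idea discussed above; the names (mb_bucket, mb_cache) and the two-CPU layout are illustrative, not the mb_alloc patch's actual code. On free, if the object's bucket currently lives in another CPU's cache, the whole bucket is moved to the freeing CPU's cache, so later frees to the same bucket only need the local cache lock:

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 2

struct mb_bucket {
    int               owner;   /* CPU cache this bucket currently lives in */
    int               nfree;   /* free mbufs held by the bucket            */
    struct mb_bucket *next;
};

struct mb_cache {
    struct mb_bucket *buckets; /* per-CPU list; a real cache also has a lock */
};

static struct mb_cache cache[NCPU];

static void
bucket_unlink(struct mb_cache *c, struct mb_bucket *b)
{
    struct mb_bucket **pp;

    for (pp = &c->buckets; *pp != NULL; pp = &(*pp)->next)
        if (*pp == b) {
            *pp = b->next;
            return;
        }
}

static void
bucket_link(struct mb_cache *c, struct mb_bucket *b, int cpu)
{
    b->next    = c->buckets;
    c->buckets = b;
    b->owner   = cpu;
}

/*
 * Free one mbuf back to its bucket.  If the bucket belongs to another
 * CPU's cache, migrate the whole bucket to the freeing (consuming) CPU.
 */
static void
mb_free(struct mb_bucket *b, int curcpu)
{
    if (b->owner != curcpu) {
        bucket_unlink(&cache[b->owner], b);
        bucket_link(&cache[curcpu], b, curcpu);
    }
    b->nfree++;
}
```

Note the trade-off Bosko raises: the migration only pays off when the consumer stays put; with a wandering consumer, buckets ping-pong between caches.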
However, as I already mentioned, it seems to me that this will only really work if you have a strict consumer/producer relationship where the consumer strictly sits on one CPU and the producer on another. The thing is that we don't know how often those cases are going to arise and whether we're warranted in making the change. Either way, I think we need to lock things down, make the modifications, boot two separate kernels (one that implements each variation) and whack away at it. I don't want to ignore this, but I'd like to put it aside for now until we're in a position that will allow us to look at it with more data on hand. Either way, I don't think this counters the advantages of the kproc (which is the original subject of this thread). In fact, it is worth noting that if we do notice that most consumers/producers are on different CPUs in most cases, and we do make the change above, the mbufd kproc can actually help in moving the objects to and from the global cache faster, and with less cache ping-ponging going on (because the recycle-moves would be made in larger chunks) and so less often. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org "If we open a quarrel between the past and the present, we shall find that we have lost the future." 
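The balancing pass Bosko describes for the mbufd kproc (replenish per-CPU caches that fall below the low watermark, lazily drain the global cache above the high watermark, all from kproc context so allocation/free latency is untouched) might look roughly like the following. This is a counter-level sketch with invented watermark values, not the posted patch:

```c
#include <assert.h>

#define NCPU 2

static int pcpu_cnt[NCPU];      /* objects cached per CPU            */
static int global_cnt;          /* objects in the global cache       */
static int vm_pages_freed;      /* objects handed back to the system */

static int low_wm  = 8;         /* replenish a cache below this      */
static int high_wm = 32;        /* drain the global cache above this */

/* One pass of the balancer.  In the real design this runs in the
 * mbufd kproc after a wakeup, never in the alloc/free fast path. */
static void
balance_once(void)
{
    int cpu, want;

    for (cpu = 0; cpu < NCPU; cpu++) {
        want = low_wm - pcpu_cnt[cpu];
        if (want > 0 && global_cnt > 0) {
            if (want > global_cnt)
                want = global_cnt;
            global_cnt    -= want;
            pcpu_cnt[cpu] += want;
        }
    }
    if (global_cnt > high_wm) {
        /* Lazy free back to the VM, away from the free path. */
        vm_pages_freed += global_cnt - high_wm;
        global_cnt      = high_wm;
    }
}
```

The real code moves page-sized buckets rather than single objects, and takes the per-CPU cache locks only briefly while migrating.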
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 20:58:32 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBB9637B401 for ; Mon, 17 Feb 2003 20:58:27 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 115E543F3F for ; Mon, 17 Feb 2003 20:58:27 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1I4wQSJ048764; Mon, 17 Feb 2003 20:58:26 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1I4wQiA048763; Mon, 17 Feb 2003 20:58:26 -0800 (PST) Date: Mon, 17 Feb 2003 20:58:26 -0800 (PST) From: Matthew Dillon Message-Id: <200302180458.h1I4wQiA048763@apollo.backplane.com> To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> <20030217203306.A67720@unixdaemons.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :> I guess I still don't understand the point of the daemon. The per-cpu :> caches are limited (in your patch) to 512 mbufs / 128 clusters. 
This :> represents very little memory even if you multiply by ncpus. We shouldn't :> have to 'balance' anything. Who cares if there are 511 mbufs sitting :> on cpu 0's cache that aren't being used? These numbers are going to be :> tuned for the machine (for example, based on the amount of main memory), :> and are far smaller then the total possible. : : I never said that those (totally arbitrary, by the way) numbers are : ideal. In fact, I think they should be changed. I can see adjusting them dynamically in an attempt to avoid hitting the hysteresis points too often, up to a point, but changing the numbers doesn't change the associated issues. I expect the defaults you have chosen to work fairly well across a broad range. You wouldn't want to make the numbers arbitrarily large just to avoid hysteresis, it would unbalance the rest of the system. Nor is it a good idea to just assume that your garbage collection thread can magically solve all the degenerate cases that pop up under varying load conditions. The per-cpu maximums have to be fairly low relative to availability in the global queue or you will have our memory subsystem going in circles from thread to thread trying to shove memory around. :> The only case that matters is if a per-cpu cache gets blown up by an :> inordinate number of frees being done to it. That is, when the mbuf :> or cluster count exceeds mbuf_limit or clust_limit. :> :> Why is the daemon more preferable for handling this case verses freeing :> a bunch (like 8 or 16) mbufs/clusters on the fly at the time of the :> free when the per-cpu cache exceeds the limit? I don't see any advantage :> to having the daemon at all, and I see several disadvantages. : : You can't just 'free' a bunch of mbufs back to the VM. You free them : wherever you got them from (usually your pcpu cache). If you exceed : mbuf_limit on your pcpu cache you'll migrate a bucket over to the : global cache, which is what you want. 
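The per-CPU overflow handling Dillon has in mind (freeing "a bunch" back to the global pool on the fly when a per-CPU cache hits its limit) can be sketched as below, using the 32[8] figures he proposes elsewhere in the thread. The constants and names are illustrative only:

```c
#include <assert.h>

#define PCPU_MAX  32   /* per-CPU cache ceiling              */
#define PCPU_KEEP 8    /* what remains after an overflow dump */

static int pcpu_free;    /* this CPU's cached free mbufs */
static int global_free;  /* global pool                  */

/* Free an mbuf to the per-CPU cache; on reaching the ceiling, dump
 * all but PCPU_KEEP objects to the global pool in one bulk move, so
 * the global-pool lock is amortized over many frees. */
static void
mb_free_hysteresis(void)
{
    pcpu_free++;
    if (pcpu_free >= PCPU_MAX) {
        global_free += pcpu_free - PCPU_KEEP;
        pcpu_free    = PCPU_KEEP;
    }
}
```

With this shape there is no thread at all: the hysteresis work happens in whatever context performed the 32nd free.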
However if your global cache : becomes too 'blown up' as you say, then you may want to recover the : unused physical pages. Doing that directly from the free has several : disadvantages; : It can be expensive in more ways than one; for one, the VM call : itself is extra overhead. Secondly, sometimes freeing a page means : traversing the cache until you hit a page worth of free mbufs to free, : so even though you may really need to free a page you'll never : actually get to freeing it unless you start traversing the list of : buckets in the cache; and that's expensive for a simple free - common : case or not. Remember you are talking about two memory subsystems here. There was a suggestion a little while back in the thread that a better solution might be to integrate the mbuf allocator with UMA. That's really my main point. Use UMA and solve the global cache -> global VM issue in UMA. I have to disagree with your idea of 'expense'. At the point where freeing things on-the-fly becomes 'too expensive' your kernel thread will *already* be overloaded and messing up the system in other ways. Here's an example: Let's say we have an extreme mbuf load. Not so much in allocations, but in the *rate* of allocation and the *rate* of freeing. Now let's say you hit a hysteresis point. With the thread idea you wake up your thread and continue on your merry way. You are assuming that your thread will be able to handle it. But this may not be true. Now let's say you are doing things on the fly and hit the hysteresis point. What will happen now is rather simple: Once you go over the upper bound you need to free mbufs until you hit the lower bound. You want to free more than one at a time for efficiency, but you *don't* need to free all the mbufs at once. What you do is simply free, say, 5 mbufs at a time for every call to free an mbuf until the levels drop to the lower bound. In other words, latency can be fully controlled with an on-the-fly solution because it is fully self-pacing. 
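Dillon's self-pacing scheme can be sketched in a few lines, which is essentially his point. This is a simulation with invented bounds, not his code: every free that finds the cache above the upper bound also pushes a small fixed batch back toward the VM, until the lower bound is reached, so the drain cost is spread across the callers and no single free pays for the whole drain:

```c
#include <assert.h>

#define FREE_BATCH 5     /* extra objects released per free call */

static int cache_cnt;             /* objects sitting in the cache     */
static int upper = 20, lower = 10;/* hysteresis bounds (illustrative) */
static int released;              /* objects handed back to the VM    */

static void
mb_free_paced(void)
{
    int n;

    cache_cnt++;                  /* cache the freed object */
    if (cache_cnt > upper) {
        /* Over the upper bound: release at most FREE_BATCH extra
         * objects now; later frees continue the drain toward 'lower'. */
        n = cache_cnt - lower;
        if (n > FREE_BATCH)
            n = FREE_BATCH;
        cache_cnt -= n;
        released  += n;
    }
}
```

Because each call does a bounded amount of extra work, the latency added to any single free is capped by FREE_BATCH, which is the "fully self-pacing" property claimed above.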
Now let's go back and look at the thread. Let's say something gets unbalanced and you hit your upper bound again, and start the thread going. How many mbufs is the thread going to free at once? Is it going to free the entire wad required to get back to the lower bound? How will this affect the latency of other processes? Of the pageout daemon, for example, or even of user processes which until your thread started running were doing a fair job draining the TCP and UDP buffers they've been processing. Unlike the on-the-fly method you can't really 'pace' the thread, because of the huge overhead in going to sleep every few milliseconds versus the overhead of freeing the mbufs. In other words, the question becomes: How do you intend to control the latency your thread is now causing in the system? I can pace the on-the-fly method trivially... in like four lines of code. How do you solve the same problem with your thread? It isn't as simple as giving it a fixed priority that is less than X and greater than Y. : By doing the freeing from the kproc context you're not interfering : with parallel allocations but you're also not taking longer than it : takes to just cache the data being freed for the free case. That's a : big advantage. By having the kproc also fill the pcpu caches I disagree with this. I don't see how the thread can possibly make a difference vis-à-vis parallel allocations. They work approximately the same either way. In making this statement you are assuming that your thread is getting cpu cycles that magically don't interfere with anything else going on in the system. I don't think you can make this statement without some more analysis. If you agree that dynamically adjusting the hysteresis points results in fewer thread wakeups, those same adjustments will also result in fewer 'extra' on-the-fly actions. 
: according the the configurable watermarks you're ensuring to have a : certain number of objects cached and ready for immediate allocations, : again without taking longer than it takes to just retrieve the object : being allocated from the cache for the allocation case. This is far from certain. You are again assuming that your thread is able to operate in a fixed period of time, without interfering with other things going on (like user processes which are draining TCP buffers and freeing mbufs back to the caches), to provide this assurance. : Perhaps I can address your concerns if you give me a specific example : where you think the daemon is doing a bad thing, then I can work on : fixing that. I think for corner cases it would even make sense to : explicitly lower the watermarks (thus forcing the daemon to drain the : caches) directly from the VM, if that's really determined to be an : issue. :... :-- :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org Well, Julian's example seemed pretty good, but it's not actually what I am worried about the most. What I am worried about the most is an effect I saw on BEST Internet's heavily loaded machines quite often, especially the old Challenge L's. The effect I am worried about is when system disk and/or network and/or cpu load becomes high enough to create artificial slowdowns in apparently unrelated processes. These slowdowns then lead to an increase in buffered data (like TCP data) and processes completing their work less quickly, leading to more processes as new connections come into the machine, and the whole thing spiraling out of control. The advantage of doing things on the fly is that you can 'smooth the curve'. That is, you approach the point of unusability rather than fall over a cliff where suddenly the machine is dead. It took an insane amount of effort to make the pageout daemon work that way, and I'm afraid that your little process will require at least as much work to achieve the same result. 
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 21:11:24 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 68FA737B401 for ; Mon, 17 Feb 2003 21:11:23 -0800 (PST) Received: from smtp-relay.omnis.com (smtp-relay.omnis.com [216.239.128.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9807D43F3F for ; Mon, 17 Feb 2003 21:11:18 -0800 (PST) (envelope-from wes@softweyr.com) Received: from softweyr.homeunix.net (66-75-151-22.san.rr.com [66.75.151.22]) by smtp-relay.omnis.com (Postfix) with ESMTP id EF55C4310F; Mon, 17 Feb 2003 21:11:16 -0800 (PST) From: Wes Peters Organization: Softweyr To: Peter Jeremy Subject: Re: syslog.conf syntax change (multiple program/host specifications) Date: Tue, 18 Feb 2003 05:11:15 +0000 User-Agent: KMail/1.5 Cc: arch@FreeBSD.ORG References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302150918.09807.wes@softweyr.com> <20030215204503.GA56102@cirb503493.alcatel.com.au> In-Reply-To: <20030215204503.GA56102@cirb503493.alcatel.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200302180511.15013.wes@softweyr.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Saturday 15 February 2003 20:45, Peter Jeremy wrote: > On Sat, Feb 15, 2003 at 09:18:09AM +0000, Wes Peters wrote: > > >Right, only I don't have anywhere near the filesystem-foo to implement > > such a change. 
I suppose it could be done relatively straightforward > > by allowing the original leading disk blocks to be marked unused and > > making an offset to the beginning of the file field in the inode, > > that counts bytes to skip into the first truly allocated block. > > This doesn't sound too difficult - we can already free blocks from the > end of a file so it shouldn't be too difficult to free blocks from the > beginning of a file. Adding a start-of-file offset to I/O operations > is almost a mechanical operation. The only hard part would be finding > space in the inode for another off_t. > > The downside of this is that there would be an upper limit on the total > number of bytes that can be written to the file (ie when you run out of > triple-indirect blocks). You could avoid this by dropping unused > blocks at the front and shifting the remaining blocks forwards in the > inode. (Probably as groups of blocks to avoid the need to move block > pointers within indirect blocks). This would also reduce the size of > the offset from off_t to enough to represent an indirect block of > bytes. Yeah, if you "re-normalize" the file every time you truncate at the beginning, you don't need a full off_t, just an offset up to blocksize. Not much of a difference, and I have no idea how the re-normalization would affect performance; it's certainly not zero cost. The alternative is to use an off_t and only re-normalize when necessary, which would be the simpler solution. -- Where am I, and what am I doing in this handbasket? 
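The start-of-file-offset scheme being discussed could be modeled as below. This is a toy userland sketch with made-up field names, not FFS code: the inode keeps a byte offset into its first allocated block, truncating at the front only bumps that offset, and "re-normalizing" drops whole leading blocks so the offset stays below the block size (which is why a full off_t isn't needed):

```c
#include <assert.h>

#define BLKSIZE 512

/* Toy inode: 'soff' is the bytes to skip into the first allocated
 * block; 'blocks_freed' counts leading blocks returned to the FS. */
struct toy_inode {
    long size;          /* logical file size            */
    long soff;          /* start-of-file offset         */
    long blocks_freed;  /* whole leading blocks dropped */
};

/* Truncate 'nbytes' from the front of the file. */
static void
trunc_front(struct toy_inode *ip, long nbytes)
{
    ip->size -= nbytes;
    ip->soff += nbytes;
    /* Re-normalize: free whole leading blocks so soff < BLKSIZE. */
    ip->blocks_freed += ip->soff / BLKSIZE;
    ip->soff          = ip->soff % BLKSIZE;
}
```

Re-normalizing on every front-truncate (as above) keeps the offset small; the alternative mentioned, a full off_t re-normalized only when block pointers run out, trades inode space for fewer block moves.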
Wes Peters wes@softweyr.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 21:17:23 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9B15637B405 for ; Mon, 17 Feb 2003 21:17:21 -0800 (PST) Received: from smtp-relay.omnis.com (smtp-relay.omnis.com [216.239.128.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id CFD6443FBF for ; Mon, 17 Feb 2003 21:17:20 -0800 (PST) (envelope-from wes@softweyr.com) Received: from softweyr.homeunix.net (66-75-151-22.san.rr.com [66.75.151.22]) by smtp-relay.omnis.com (Postfix) with ESMTP id B4DEC4362F; Mon, 17 Feb 2003 21:16:01 -0800 (PST) From: Wes Peters Organization: Softweyr To: Garance A Drosihn Subject: Re: syslog.conf syntax change (multiple program/host specifications) Date: Tue, 18 Feb 2003 05:16:00 +0000 User-Agent: KMail/1.5 Cc: arch@FreeBSD.ORG References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302150905.08387.wes@softweyr.com> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200302180516.00673.wes@softweyr.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Monday 17 February 2003 02:45, Garance A Drosihn wrote: > > In a separate message on 2/15/03, Wes Peters wrote: > >On Saturday 15 February 2003, Thomas Quinot wrote: > > > Le 2003-02-14, Wes Peters =E9crivait : > > > > To this end I've implemented another feature, 'N' for > > > > newsyslog. When the file size limit is reached, newsyslog > > > > is run with the log filename as the only argument. The > > > > size limitation in syslog.conf and newsyslog.conf should > > > > agree or you won't get what you expect. 
> > > > > > Well, precisely for this reason it would seem even nicer to > > > me to delegate the size limitation to newsyslog as well > > > perhaps rebuilding a tool similar to daemontool's multilog > > > based on code shared with newsyslog). > > > >That's a better answer than incorporating multilog with all its > >djb licensing warts, but still costs another process for every > >log file you want to size-limit. > > > >Garance, did you get this one? Do you want to look at this? > > I believe this issue would be handled by the "force" option > (either '-Fr' for now, or '-R' & handling once I do > that). So, my assumption is that there is nothing additional > I need to do here. Let me know if I'm missing something. Oh, Thomas was asking for a process that reads stdin and rotates the data among log files in the way newsyslog does. The "right" way to do this would be to extract the file rotation code into a shared library (librotate - hahaha) and write a simple program to implement the pipe functionality. It sounds straightforward, I can look into it if you're too busy or not interested. Or buried under snow. ;^) -- Where am I, and what am I doing in this handbasket? 
Wes Peters wes@softweyr.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 21:54:53 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8A1C937B401 for ; Mon, 17 Feb 2003 21:54:51 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 67E3F43F75 for ; Mon, 17 Feb 2003 21:54:50 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h1I5sgQb011220; Mon, 17 Feb 2003 21:54:42 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h1I5sc3H011219; Mon, 17 Feb 2003 21:54:38 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Mon, 17 Feb 2003 21:54:38 -0800 From: David Schultz To: Matthew Dillon Cc: Bosko Milekic , Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030218055438.GA10838@HAL9000.homeunix.com> Mail-Followup-To: Matthew Dillon , Bosko Milekic , Andrew Gallatin , freebsd-arch@FreeBSD.ORG References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200302171742.h1HHgSOq097182@apollo.backplane.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake Matthew Dillon : > Wouldn't it be easier and more scaleable to implement the hysteresis on > the fly? 
It sounds like it ought to be simple... you have a sysctl > to set the per-cpu free cache size and hysteresis (for example, 32[8], > aka upon reaching 32 free 32 - 8 = 24 to the global cache, keeping 8). > Overflow goes into a global pool. Active systems do not usually > bounce from 0 to the maximum number of mbufs and back again, over > and over again. Instead they tend to have smaller swings and 'drift' > towards the edges, so per-cpu hysteresis should not have to exceed > 10% of the total available buffer space in order to reap the maximum > locality of reference and mutex benefit. Even in a very heavily loaded > system I would expect something like 128[64] to be sufficient. This > sort of hysteresis could be implemented trivially in the main mbuf > freeing code without any need for a thread and would have the same > performance / L1 cache characteristics. Additionally, on-the-fly > hysteresis would be able to handle extreme situations that a thread > could not (such as extreme swings), and on-the-fly hysteresis can > scale in severe or extreme situations while a thread cannot. FWIW, I believe Sun's slab allocator does essentially what you describe, including the adjustment of per-CPU caches on the fly. However, instead of having a sysctl for the size of the per-cpu caches, they dynamically tune the sizes within a certain range every 15 seconds by monitoring contention of the lock on the global cache. Apparently this tends to stabilize very quickly. Take a look at Jeff Bonwick's magazine allocator paper. The way they keep down the overhead of managing per-CPU caches on the fly is quite clever. http://www.usenix.org/events/usenix01/bonwick.html BTW, this is *not* the original slab allocator paper; it covers extensions to it that add, among other things, per-CPU caches. To give you an idea of how big Solaris' per-CPU caches are, the ranges are described in the following table from _Solaris_Internals_. 
As I mentioned, they are occasionally adjusted within these ranges. Keep in mind that this is for a generic memory allocator, though, and not an mbuf allocator.

Object Size Range    Min PCPU Cache Size    Max PCPU Cache Size
0-63                 15                     143
64-127               7                      95
128-255              3                      47
256-511              1                      31
512-1023             1                      15
1024-2047            1                      7
2048-16383           1                      3
16384-               1                      1

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon Feb 17 22:32:13 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 89CDB37B401 for ; Mon, 17 Feb 2003 22:32:12 -0800 (PST) Received: from smtp-relay.omnis.com (smtp-relay.omnis.com [216.239.128.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7E8DD43FEA for ; Mon, 17 Feb 2003 22:32:08 -0800 (PST) (envelope-from wes@softweyr.com) Received: from softweyr.homeunix.net (66-75-151-22.san.rr.com [66.75.151.22]) by smtp-relay.omnis.com (Postfix) with ESMTP id 5225243A1A; Mon, 17 Feb 2003 22:28:19 -0800 (PST) From: Wes Peters Organization: Softweyr To: "Sam Leffler" , "Peter Jeremy" , "Bosko Milekic" Subject: Re: mb_alloc cache balancer / garbage collector Date: Tue, 18 Feb 2003 06:28:18 +0000 User-Agent: KMail/1.5 Cc: References: <20030216213552.A63109@unixdaemons.com> <20030217064130.GA62020@cirb503493.alcatel.com.au> <316301c2d655$cdfb2df0$52557f42@errno.com> In-Reply-To: <316301c2d655$cdfb2df0$52557f42@errno.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200302180628.18590.wes@softweyr.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Monday 17 February 2003 07:25, Sam Leffler wrote: > > > > My only concern is that replenishment is reliant on scheduling a > > process (kernel thread) whilst 
allocation occurs both at interrupt > > level and during normal process operation. Is it possible for a > > heavily loaded system (and a heavy traffic spike) to totally empty > > the mbuf cache in the interval between the low watermark being > > reached and the allocator actually running? If so, what happens? > > With kernel preemption this should be less of an issue. Presumably the > balancer thread runs with high enough priority to take preemptive > control quickly. Is this an area in which inversion-proof semaphores might be helpful? -- Where am I, and what am I doing in this handbasket? Wes Peters wes@softweyr.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 6:40:20 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 364A937B401 for ; Tue, 18 Feb 2003 06:40:15 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id D48F843F3F for ; Tue, 18 Feb 2003 06:40:13 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1IEdk569662; Tue, 18 Feb 2003 09:39:46 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Tue, 18 Feb 2003 09:39:46 -0500 From: Bosko Milekic To: Matthew Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030218093946.A69621@unixdaemons.com> References: <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> 
<20030217203306.A67720@unixdaemons.com> <200302180458.h1I4wQiA048763@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200302180458.h1I4wQiA048763@apollo.backplane.com>; from dillon@apollo.backplane.com on Mon, Feb 17, 2003 at 08:58:26PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Mon, Feb 17, 2003 at 08:58:26PM -0800, Matthew Dillon wrote: > : You can't just 'free' a bunch of mbufs back to the VM. You free them > : wherever you got them from (usually your pcpu cache). If you exceed > : mbuf_limit on your pcpu cache you'll migrate a bucket over to the > : global cache, which is what you want. However if your global cache > : becomes too 'blown up' as you say, then you may want to recover the > : unused physical pages. Doing that directly from the free has several > : disadvantages; > : It can be expensive in more ways than one; for one, the VM call > : itself is extra overhead. Secondly, sometimes freeing a page means > : traversing the cache until you hit a page worth of free mbufs to free, > : so even though you may really need to free a page you'll never > : actually get to freeing it unless you start traversing the list of > : buckets in the cache; and that's expensive for a simple free - common > : case or not. > > Remember you are talking about two memory subsystems here. There > was a suggestion a little while back in the thread that a better > solution might be to integrate the mbuf allocator with UMA. That's > really my main point. Use UMA and solve the global cache -> global > VM issue in UMA. I've looked at integrating these with the general all-purpose system allocator (UMA). I ran into several issues that are not, to my knowledge, easily solved without ripping into UMA pretty badly. I've mentionned these before. 
One of the issues is the keep-cache-lock across grouped (m_getcl(), m_getm()) allocations and grouped de-allocations (m_freem()) for as long as possible. The other issue has to do with keeping the common allocation and free cases down to one function call. Further, the mbuf code does special things like call drain routines when completely exhausted and although I'm not 100% certain, I can almost guarantee that making sure these work right with UMA is going to take a lot of ripping into it. I'd like to avoid ripping into a general-purpose allocator that I think needs to have less rather than more application-specific complexities. > I have to disagree with your idea of 'expense'. At the point where > freeing things on-the-fly becomes 'too expensive' your kernel thread > will *already* be overloaded and messing up the system in other ways. > > Here's an example: Lets say we have an extreme mbuf load. Not so > much in allocations, but in the *rate* of allocation and the *rate* > of freeing. Now lets say you hit a hysteresis point. With the > thread idea you wakeup your thread and continue on your merry way. > You are assuming that your thread will be able to handle it. But > this may not be true. > > Now lets say you are doing things on the fly and hit the hysteresis > point. What will happen now is rather simple: Once you go over the > upper bound you need to free mbufs until you hit the lower bound. > You want to free more then one at a time for efficiency, but you *don't* > need to free all the mbufs at once. What you do is simply > free, say, 5 mbufs at a time for every call to free an mbuf until > the levels drop to the lower bound. In otherwords, latency can be > fully controlled with an on-the-fly solution because it is fully > self-pacing. > > Now lets go back and look at the thread. Lets say something gets > unbalanced and you hit your upper bound again, and start the thread > going. How many mbufs is the thread going to free at once? 
Is it > going to free the entire wad required to get back to the lower bound? > How will this effect the latency of other processes? Of the pageout > daemon, for example, or even of user processes which until your thread > started running were doing a fair job draining the TCP and UDP > buffers they've been processing. Unlike the on-the-fly method you > can't really 'pace' the thread, because of the huge overhead in > going to sleep every few milliseconds verses the overhead of freeing > the mbufs. > > In otherwords, the question becomes: How do you intend to control > the latency your thread is now causing in the system. I can pace > the on-the-fly method trivially... in like four lines of code. How > do you solve the same problem with your thread? It isn't as simple > as giving it a fixed priority that is less then X and greater then Y. > > : By doing the freeing from the kproc context you're not interfering > : with parallel allocations but you're also not taking longer than it > : takes to just cache the data being freed for the free case. That's a > : big advantage. By having the kproc also fill the pcpu caches > > I disagree with this. I don't see how the thread can possibly > make a difference vis-a-vie parallel allocations. They work > approximately the same either way. In making this statement you > are assuming that your thread is getting cpu cycles that magically > don't interfere with anything else going on in the system. I > don't think you can make this statement without some more analysis. > > If you agree that dynamically adjusting the hysteresis points > results in fewer thread wakeups, those same adjustments will also > result in fewer 'extra' on-the-fly actions. > > : according the the configurable watermarks you're ensuring to have a > : certain number of objects cached and ready for immediate allocations, > : again without taking longer than it takes to just retrieve the object > : being allocated from the cache for the allocation case. 
> > This is far from certain. You are again assuming that your thread > is able to operate in a fixed period of time, without interfering > with other things going on (like user processes which are draining > TCP buffers and freeing mbufs back to the caches) to provide > this assurance. Yes, you're right, but the difference is that in most cases, with the kproc, you'll minimize the cost of most of the allocations and frees because the kproc will have done the moving in less time. However, you seem to bring up a good corner-case example. I still think that for network buffer allocations, and well-tuned watermarks, this situation won't be encountered often and, when it is, it can be remedied by careful adjusting of watermarks. Sure, the adjusting of the watermarks would influence the on-the-fly case as well. But, the kproc case has other advantages for the common-case that this corner-case you bring up ignores. Notably, in the common case (when you don't have huge sweep-frees followed by huge sweep allocations going on) the kproc minimizes the number of times the main alloc/free code has to go to VM. I understand that the pageout daemon probably employs an algorithm that can get de-stabilized by large shifting of memory from one subsystem to another. However, my argument is that the effect of moving slightly larger chunks of memory for network buffers is more good than bad. There are more common cases than there are corner cases and for the corner cases I think that I could work out a decent recovery mechanism (the kproc could be temporarily 'turned off,' for example). Here's what I think I'll do in order to get what I'm sure we both want immediately without slowing down progress. I'm going to implement the on-the-fly freeing to VM case (this is pretty trivial). I'll present that and we can get that into the tree (so that we can at least recover resources following network spikes). 
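The on-the-fly freeing Bosko commits to implementing here is the self-pacing batch scheme from Matt's quoted text: once the cache crosses its upper watermark, each subsequent free call also pushes a small fixed batch back to the system until the cache drains to the lower watermark. A minimal userland sketch of that pacing logic follows; the names, watermark values, and batch size are hypothetical, not taken from the actual mbufd.patch:

```c
#include <assert.h>

#define CACHE_HIWAT 512  /* upper hysteresis bound (hypothetical value) */
#define CACHE_LOWAT 256  /* lower hysteresis bound (hypothetical value) */
#define FREE_BATCH    5  /* surplus objects released per free call */

static int cache_count;  /* objects currently sitting in the cache */
static int freed_to_vm;  /* objects handed back to the system */
static int draining;     /* set once we cross the upper watermark */

/*
 * Free one object into the cache.  Once the cache crosses the upper
 * watermark, each subsequent free also pushes a small batch back to
 * the system until the cache drains to the lower watermark, so the
 * cost of draining is spread across many calls instead of being paid
 * all at once.
 */
static void
mb_free_paced(void)
{
	cache_count++;
	if (cache_count >= CACHE_HIWAT)
		draining = 1;
	if (draining) {
		int n = FREE_BATCH;
		while (n-- > 0 && cache_count > CACHE_LOWAT) {
			cache_count--;   /* stand-in for returning memory to VM */
			freed_to_vm++;
		}
		if (cache_count <= CACHE_LOWAT)
			draining = 0;
	}
}
```

Since each call does at most FREE_BATCH extra units of work, the scheme bounds per-call latency while still converging to the lower watermark, which is exactly the "fully self-pacing" property Matt claims for the on-the-fly approach.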
I'll keep the kproc code here and try to tune it to demonstrate eventually that it does the right thing and that corner cases are minimized. I'll also try varying the number of objects per bucket, especially in the cluster case, and see where we go from there. Keep in mind that because this is a specific network-buffer allocator, we may be able to get away with moving larger chunks of objects from a kproc without necessarily incurring all the bad effects of general-purpose allocation systems. > : Perhaps I can address your concerns if you give me a specific example > : where you think the daemon is doing a bad thing, then I can work on > : fixing that. I think for corner cases it would even make sense to > : explicitly lower the watermarks (thus forcing the daemon to drain the > : caches) directly from the VM, if that's really determined to be an > : issue. > :... > :-- > :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org > > > Well, Julian's example seemed pretty good, but it's not actually what > I am worried about the most. What I am worried about the most is an > effect I saw on BEST Internet's heavily loaded machines quite often, > especially the old Challenge L's. The effect I am worried about is > when system disk and/or network and/or cpu load becomes high enough > to create artificial slowdowns in apparently unrelated processes. > These slowdowns then lead to an increase in buffered data (like TCP > data) and processes completing their work less quickly, leading to > more processes as new connections come into the machine, and the > whole thing spiraling out of control. > > The advantage of doing things on the fly is that you can 'smooth the > curve'. That is, you approach the point of unusability rather > than fall over a cliff and suddenly the machine is dead.
It took > an insane amount of effort to make the pageout daemon work that > way and I'm afraid that your little process will require at least > as much work to achieve the same result. It's an interesting corner case, but instead of completely trashing the kproc idea (which does gain us something in common cases by minimizing interactions with VM), I'll see if I can tune it to react properly. I'll look at what kind of gains we can get from more conservative moves from the kproc vis-a-vis larger buckets. It's easy to tune these things without ripping anything else apart, specifically because network buffers are allocated in their own special way. > -Matt > Matthew Dillon > Matt, thanks for still reading the lists and remaining concerned. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 9:33:51 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5077C37B401 for ; Tue, 18 Feb 2003 09:33:50 -0800 (PST) Received: from melusine.cuivre.fr.eu.org (melusine.cuivre.fr.eu.org [62.212.105.185]) by mx1.FreeBSD.org (Postfix) with ESMTP id 213C943FBD for ; Tue, 18 Feb 2003 09:33:49 -0800 (PST) (envelope-from thomas@cuivre.fr.eu.org) Received: by melusine.cuivre.fr.eu.org (Postfix, from userid 1000) id 9AF722C3D2; Tue, 18 Feb 2003 18:33:47 +0100 (CET) Date: Tue, 18 Feb 2003 18:33:47 +0100 From: Thomas Quinot To: Wes Peters Cc: Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: syslog.conf syntax change (multiple program/host specifications) Message-ID: <20030218173347.GE43307@melusine.cuivre.fr.eu.org> Reply-To: Thomas Quinot References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302150905.08387.wes@softweyr.com> <200302180516.00673.wes@softweyr.com> Mime-Version: 1.0 Content-Type: text/plain; 
charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200302180516.00673.wes@softweyr.com> User-Agent: Mutt/1.4i X-message-flag: WARNING! Using Outlook can damage your computer. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Le 2003-02-18, Wes Peters écrivait : > the data among log files in the way newsyslog does. The "right" > way to do this would be to extract the file rotation code into a > shared library (librotate - hahaha) and write a simple program to Yes. That and the code to check whether a given file has met its rotation condition. -- Thomas.Quinot@Cuivre.FR.EU.ORG To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 9:57:51 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7337337B401 for ; Tue, 18 Feb 2003 09:57:47 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD75843F75 for ; Tue, 18 Feb 2003 09:57:45 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1IHvjSJ051830; Tue, 18 Feb 2003 09:57:45 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1IHvjaC051829; Tue, 18 Feb 2003 09:57:45 -0800 (PST) Date: Tue, 18 Feb 2003 09:57:45 -0800 (PST) From: Matthew Dillon Message-Id: <200302181757.h1IHvjaC051829@apollo.backplane.com> To: Bosko Milekic Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <15952.62746.260872.18687@grasshopper.cs.duke.edu> 
<20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> <20030217203306.A67720@unixdaemons.com> <200302180458.h1I4wQiA048763@apollo.backplane.com> <20030218093946.A69621@unixdaemons.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG : I've looked at integrating these with the general all-purpose system : allocator (UMA). I ran into several issues that are not, to my : knowledge, easily solved without ripping into UMA pretty badly. I've : mentioned these before. One of the issues is the keep-cache-lock : across grouped (m_getcl(), m_getm()) allocations and grouped : de-allocations (m_freem()) for as long as possible. The other issue : has to do with keeping the common allocation and free cases down to : one function call. Further, the mbuf code does special things like : call drain routines when completely exhausted and although I'm not : 100% certain, I can almost guarantee that making sure these work : right with UMA is going to take a lot of ripping into it. I'd like : to avoid ripping into a general-purpose allocator that I think needs : to have less rather than more application-specific complexities. Let's separate out the pure efficiency issues from special feature support. The cache locking issue is really just an efficiency issue, easily solved with a little work on UMA. Something like this, for example: void **uma_lock = NULL; /* * use of *uma_lock is entirely under the control of UMA. It * can release, block, and reobtain, release and obtain another * lock, or not use it at all (leave it NULL).
The only * requirement is that you call uma_cache_unlock(&uma_lock) * after you are through and that you not block in between UMA * operations. */ uma_cache_free(&uma_lock, ...) ... etc uma_cache_alloc(&uma_lock, ...) ... etc uma_cache_unlock(&uma_lock); Which would allow UMA to maintain a lock through a set of operations, at its sole discretion. If the lock were made a real mutex then we could even allow the caller to block in between UMA operations by msleep()ing on it. I've used this trick on a more global basis on embedded systems... the 'uma_lock' equivalent actually winds up being part of the task structure allowing it to be used universally by multiple subsystems. (which, by the way, would allow one to get rid of the mutex argument to msleep() if it were done that way in FreeBSD). The mbuf draining issue is more of an issue. : Yes, you're right, but the difference is that in most cases, with the : kproc, you'll minimize the cost of most of the allocations and frees : because the kproc will have done the moving in less time. Why would the kproc minimize the cost of the allocations? Try to estimate the efficiency of the following three methods: * The kproc allocating 200 mbufs per scheduled wakeup and the client then making 200 allocations via the local cpu cache. (2 Context switches for every 200 allocations) * The client making 200 allocations via the local cpu cache, the local cpu cache running out, and the allocator doing a bulk allocation of 20 mbufs at a time. (1 VM/global mutex interaction for every 20 allocations). * The kproc uses idle cycles to pre-allocate N mbufs in the per-cpu cache(s). (potentially no overhead if idle cycles are available) I would argue that the kproc method only exceeds the on the fly method if the system has lots of idle cycles for the kproc to run in. Under heavy loads, the on-the-fly method is going to win hands down (in my opinion). 
Under light loads we shouldn't care if we are slightly less efficient since we would become more efficient as the load increases. Consider the tuning you would have to do under heavy loads to minimize the number of kproc wakeups. And, also, note that if your goal is for the kproc to never have to wake up then you are talking about a situation where the on-the-fly mechanism would equivalently not have to resort to the global cache. The on-the-fly mechanism is trivially tunable, the kproc mechanism is not. : I understand that the pageout daemon probably employs an algorithm : that can get de-stabilized by large shifting of memory from one : subsystem to another. However, my argument is that the effect of : moving slightly larger chunks of memory for network buffers is more : good than bad. There are more common cases than there are corner : cases and for the corner cases I think that I could work out a decent : recovery mechanism (the kproc could be temporarily 'turned off,' for : example). I agree as long as the phrase is 'slightly larger chunks...'. But that same argument applies to on-the-fly allocation from the global cache, and as I point out above when you have a kproc you still have to decide how long (how much latency) to allow that kproc to introduce, which limits how many mbufs it should try to allocate from the global cache, right? : Here's what I think I'll do in order to get what I'm sure we both want : immediately without slowing down progress. I'm going to implement the : on-the-fly freeing to VM case (this is pretty trivial). I'll present : that and we can get that into the tree (so that we can at least : recover resources following network spikes). I'll keep the kproc code : here and try to tune it to demonstrate eventually that it does the : right thing and that corner cases are minimized. I'll also try : varying the number of objects per bucket, especially in the cluster : case, and see where we go from there.
Keep in mind that because this : is a specific network-buffer allocator, we may be able to get away : with moving larger chunks of objects from a kproc without necessarily : incurring all the bad effects of general-purpose allocation systems. :... : It's an interesting corner case, but instead of completely trashing : the kproc idea (which does gain us something in common cases by : minimizing interactions with VM), I'll see if I can tune it to react : properly. I'll look at what kind of gains we can get from more : conservative moves from the kproc vis-a-vis larger buckets. It's easy : to tune these things without ripping anything else apart, specifically : because network buffers are allocated in their own special way. : : Matt, thanks for still reading the lists and remaining concerned. : :-- :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org I think it is well worth you implementing both and making them switchable with sysctl's (simply by adjusting two different sets of hysteresis levels, for example). Then you can test both under load to see if the kproc is worth it. It might well turn out that the kproc is a good idea but that on-the-fly allocation and deallocation is necessary to handle degenerate situations. Or it might turn out that the kproc creates more problems than it solves. Or it might turn out that the on-the-fly allocation and deallocation code is so close to the kproc code in regard to efficiency that there is no real reason to have the kproc. Or it might turn out that the kproc's best use is to recover memory after the machine has finished doing some real hard networking work and is now becoming more idle. Obviously my opinion is heavily weighted towards on-the-fly. At the same time I see no reason why you can't develop your kproc idea and even commit it. You are, after all, the person who is taking the time to work on it.
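Matt's suggestion of making the two schemes switchable "simply by adjusting two different sets of hysteresis levels" amounts to selecting a watermark pair at run time. A rough sketch of what that selection could look like follows; the structure, values, and the knob name are all invented for illustration, not from the actual patch:

```c
#include <assert.h>

/*
 * Two hysteresis configurations selectable at run time (a stand-in
 * for a sysctl knob).  With the "kproc" set the watermarks are far
 * apart, so the daemon balances in bulk; with the "on-the-fly" set
 * they sit close together, so each allocation/free call pays a small
 * self-pacing cost instead.  All values are illustrative only.
 */
struct mb_watermarks {
	int low;   /* replenish the cache below this */
	int high;  /* start draining the cache above this */
};

static const struct mb_watermarks wm_kproc      = {  64, 1024 };
static const struct mb_watermarks wm_on_the_fly = { 448,  512 };

static int mbuf_use_kproc = 1;  /* hypothetical sysctl: 1 = kproc mode */

static const struct mb_watermarks *
mb_current_watermarks(void)
{
	return (mbuf_use_kproc ? &wm_kproc : &wm_on_the_fly);
}
```

Because the allocation and free paths only ever consult the current watermark pair, flipping the knob switches the system's behavior without touching the fast-path code, which is what makes the load-testing comparison Matt proposes cheap to do.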
-Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 10:49: 9 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 423F137B401 for ; Tue, 18 Feb 2003 10:49:04 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 507F543F75 for ; Tue, 18 Feb 2003 10:49:03 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1IImak70615; Tue, 18 Feb 2003 13:48:36 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Tue, 18 Feb 2003 13:48:36 -0500 From: Bosko Milekic To: Matthew Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector Message-ID: <20030218134836.A70583@unixdaemons.com> References: <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> <20030217203306.A67720@unixdaemons.com> <200302180458.h1I4wQiA048763@apollo.backplane.com> <20030218093946.A69621@unixdaemons.com> <200302181757.h1IHvjaC051829@apollo.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <200302181757.h1IHvjaC051829@apollo.backplane.com>; from dillon@apollo.backplane.com on Tue, Feb 18, 2003 at 09:57:45AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tue, Feb 18, 2003 at 09:57:45AM -0800, Matthew Dillon wrote: > : I've looked at integrating these with the 
general all-purpose system > : allocator (UMA). I ran into several issues that are not, to my > : knowledge, easily solved without ripping into UMA pretty badly. I've > : mentionned these before. One of the issues is the keep-cache-lock > : across grouped (m_getcl(), m_getm()) allocations and grouped > : de-allocations (m_freem()) for as long as possible. The other issue > : has to do with keeping the common allocation and free cases down to > : one function call. Further, the mbuf code does special things like > : call drain routines when completely exhausted and although I'm not > : 100% certain, I can almost guarantee that making sure these work > : right with UMA is going to take a lot of ripping into it. I'd like > : to avoid ripping into a general-purpose allocator that I think needs > : to have less rather than more application-specific complexities. > > Lets separate out the pure efficiency issues from special feature > support. The cache locking issue is really just an efficiency issue, > easily solved with a little work on UMA. Something like this for > example: > > void **uma_lock = NULL; > > /* > * use of *uma_lock is entirely under the control of UMA. It > * can release block and reobtain, release and obtain another > * lock, or not use it at all (leave it NULL). The only > * requirement is that you call uma_cache_unlock(&uma_lock) > * after you are through and that you not block in between UMA > * operations. > */ > uma_cache_free(&uma_lock, ...) ... etc > uma_cache_alloc(&uma_lock, ...) ... etc > > uma_cache_unlock(&uma_lock); > > Which would allow UMA to maintain a lock through a set of operations, > at its sole discretion. If the lock were made a real mutex then we > could even allow the caller to block in between UMA operations by > msleep()ing on it. I've used this trick on a more global basis on > embedded systems... 
the 'uma_lock' equivalent actually winds up being > part of the task structure allowing it to be used universally by > multiple subsystems. (which, by the way, would allow one to get > rid of the mutex argument to msleep() if it were done that way > in FreeBSD). It's not quite that simple. You would also have to teach it how to drop the lock if one of the allocations fails (or if it has to go to another cache) and how to tell the caller that it has done that. That means that you'd be introducing more modifications to the API and making it more complicated than it should be (see the MBP_PERSIST{,ENT} implementation for the mbuf allocator). In most cases, you don't need to do the grouped-cache-lock thing, which is why I think that it's not worth complicating UMA just so the mbuf code can use it. The fact that the mbuf code uses it is due to the way the mbuf object itself works. That is, there are situations in which you only allocate an mbuf, and situations where you need both the mbuf and the cluster. You want both situations to be fast and effectively cost one lock/unlock in the common case. > The mbuf draining issue is more of an issue. It is. So is keeping the common case down to one function call without removing the generality of UMA. I have to keep bringing this one up; if we suddenly start to increase the number of function calls required to allocate (and CONFIGURE) an mbuf, then we'll also be quadrupling the number of function calls needed to allocate an mbuf _and_ a cluster (and CONFIGURE them). This influences overall performance more than one may think. There's also the reference counting issue. We've been through this before, actually, on more than one occasion. > : Yes, you're right, but the difference is that in most cases, with the > : kproc, you'll minimize the cost of most of the allocations and frees > : because the kproc will have done the moving in less time. > > Why would the kproc minimize the cost of the allocations?
> > Try to estimate the efficiency of the following three methods: > > * The kproc allocating 200 mbufs per scheduled wakeup and the > client then making 200 allocations via the local cpu cache. > > (2 Context switches for every 200 allocations) > > * The client making 200 allocations via the local cpu cache, > the local cpu cache running out, and the allocator doing a bulk > allocation of 20 mbufs at a time. > > (1 VM/global mutex interaction for every 20 allocations). Actually, it's more than that. There are supporting structures required for every bucket-worth you allocate. So you need to allocate those supporting structures as well. > * The kproc uses idle cycles to pre-allocate N mbufs in the per-cpu > cache(s). > > (potentially no overhead if idle cycles are available) > > > I would argue that the kproc method only exceeds the on the fly > method if the system has lots of idle cycles for the kproc to run in. > Under heavy loads, the on-the-fly method is going to win hands down > (in my opinion). Under light loads we shouldn't care if we are > slightly less efficiency since we would become more efficient as the > load increases. > > Consider the tuning you would have to do under heavy loads to minimize > the number of kproc wakeups. And, also, note that if your goal is > for the kproc to never have to wakeup then you are talking about a > situation where the on-the-fly mechanism would equivalently not have > to resort to the global cache. The on-the-fly mechanism is trivially > tunable, the kproc mechanism is not. > > : I understand that the pageout daemon probably employs an algorithm > : that can get de-stabilized by large shifting of memory from one > : subsystem to another. However, my argument is that the effect of > : moving slightly larger chunks of memory for network buffers is more > : good than bad. 
There are more common cases than there are corner > : cases and for the corner cases I think that I could work out a decent > : recovery mechanism (the kproc could be temporarily 'turned off,' for > : example). > > I agree as long as the phrase is 'slight larger chunks...'. But > that same argument applies to on-the-fly allocation from the global > cache, and as I point out above when you have a kproc you still have > to decide how long (how much latency) to allow that kproc to > introduce, which limits how many mbufs it should try to allocate from > the global cache, right? > > : Here's what I think I'll do in order to get what I'm sure we both want > : immediately without slowing down progress. I'm going to implement the > : on-the-fly freeing to VM case (this is pretty trivial). I'll present > : that and we can get that into the tree (so that we can at least > : recover resources following network spikes). I'll keep the kproc code > : here and try to tune it to demonstrate eventually that it does the > : right thing and that corner cases are minimized. I'll also try > : varying the number of objects per bucket, especially in the cluster > : case, and see where we go from there. Keep in mind that because this > : is a specific network-buffer allocator, we may be able to get away > : with moving larger chunks of objects from a kproc without necessarily > : incurring all the bad effects of general-purpose allocation systems. > :... > : It's an interesting corner case, but instead of completely trashing > : the kproc idea (which does gain us something in common cases by > : minimizing interactions with VM), I'll see if I can tune it to react > : properly. I'll look at what kind of gains we can get from more > : conservative moves from the kproc vis-a-vis larger buckets. It's easy > : to tune these things without ripping anything else apart, specifically > : because network buffers are allocated in their own special way. 
> : > : Matt, thanks for still reading the lists and remaining concerned. > : > :-- > :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org > > I think it is well worth you implementing both and making them > switchable with sysctl's (simply by adjusting two different sets of > hysteresis levels, for example). Then you can test both under load > to see if the kproc is worth it. It might well turn out that the > kproc is a good idea but that on-the-fly allocation and deallocation > is necessary to handle degenerate situations. Or it might turn out > that the kproc creates more problems than it solves. Or it might turn > out that the on-the-fly allocation and deallocation code is so close > to the kproc code in regard to efficiency that there is no real > reason to have the kproc. Or it might turn out that the kproc's best > use is to recover memory after the machine has finished doing some > real hard networking work and is now becoming more idle. > > Obviously my opinion is heavily weighted towards on-the-fly. At > the same time I see no reason why you can't develop your kproc idea and > even commit it. You are, after all, the person who is taking the > time to work on it. Hmmmmmm... both! The ideal situation would be to have the kproc run in not-too-loaded situations but once the load gets high, recover through the on-the-fly code. Now the problem is shifted to determining when we're "not-too-loaded" (admittedly, this is not as easy as it sounds, as "load" is not purely defined by the state of network buffers). I'll implement the on-the-fly case and commit that, barring other disagreements, and take it from there because it's extremely important for me to have the system free resources back after a spike at this stage, at the very least. With that said, does anyone disagree with this approach?
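The hybrid Bosko floats here — let the kproc do the balancing while load is light, but fall back to on-the-fly draining once pressure gets out of hand — can be roughed out as a free path with two escape levels. Everything below is an invented illustration (including the crude "busy" test, which is exactly the hard part Bosko flags):

```c
#include <assert.h>
#include <stdbool.h>

#define HIWAT 512   /* hypothetical per-cache upper watermark */
#define LOWAT 256   /* hypothetical per-cache lower watermark */

static int cache_count;    /* objects sitting in the cache */
static int kproc_wakeups;  /* times we deferred draining to the daemon */
static int inline_frees;   /* objects released directly, on the fly */

/*
 * Crude stand-in for a load estimate; a real kernel would look at
 * allocation rates, run queues, free memory, and so on.
 */
static bool
system_is_busy(void)
{
	return (cache_count > 2 * HIWAT);
}

/*
 * Hybrid free path: above the upper watermark, hand the draining work
 * to the kproc while load is light (modeled here as just counting the
 * wakeup; the daemon itself is elided), but drain inline once the
 * cache has grown far past the watermark.
 */
static void
mb_free_hybrid(void)
{
	cache_count++;
	if (cache_count <= HIWAT)
		return;
	if (!system_is_busy()) {
		kproc_wakeups++;  /* daemon would drain at its leisure */
	} else {
		while (cache_count > LOWAT) {
			cache_count--;
			inline_frees++;
		}
	}
}
```

The sketch shows the shape of the decision, not a tuned policy: the open question in the thread is precisely what signal should replace system_is_busy().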
> -Matt > Matthew Dillon > -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 14:13:27 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F414E37B401 for ; Tue, 18 Feb 2003 14:13:25 -0800 (PST) Received: from smtp1.server.rpi.edu (smtp1.server.rpi.edu [128.113.2.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1A61043F85 for ; Tue, 18 Feb 2003 14:13:25 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp1.server.rpi.edu (8.12.7/8.12.7) with ESMTP id h1IMDN3q029605; Tue, 18 Feb 2003 17:13:23 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <200302180516.00673.wes@softweyr.com> References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302150905.08387.wes@softweyr.com> <200302180516.00673.wes@softweyr.com> Date: Tue, 18 Feb 2003 17:13:22 -0500 To: Wes Peters From: Garance A Drosihn Subject: Re: syslog.conf syntax change (multiple program/host specifications) Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-RPI-Spam-Score: -1.6 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SIGNATURE_SHORT_DENSE,SPAM_PHRASE_00_01 X-Scanned-By: MIMEDefang 2.28 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 5:16 AM +0000 2/18/03, Wes Peters wrote: >On Monday 17 February 2003 02:45, Garance A Drosihn wrote: > > > > I believe this issue would be handled by the "force" option >> (either '-Fr' for now, or '-R' & handling once I do >> that). So, my assumption is that there is nothing additional >> I need to do here. Let me know if I'm missing something. 
> >Oh, Thomas was asking for a process that reads stdin and rotates >the data among log files in the way newsyslog does. The "right" >way to do this would be to extract the file rotation code into a >shared library (librotate - hahaha) and write a simple program to >implement the pipe functionality. It sounds straightforward, I >can look into it if you're too busy or not interested. Eh. Sounds like too much work for too little payback, imo. >Or buried under snow. ;^) At the time of my earlier message, they were still predicting "maybe 1 to 3 inches" for us. We ended up with something like 12-15 inches. Not as bad as other places, but certainly enough to disrupt my earlier plans... I hope to look into the newsyslog changes later tonight. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 20:58: 2 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7767A37B405 for ; Tue, 18 Feb 2003 20:58:00 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id C95DD43F75 for ; Tue, 18 Feb 2003 20:57:59 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1J4vx8h000971; Tue, 18 Feb 2003 20:57:59 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1J4vxes000970; Tue, 18 Feb 2003 20:57:59 -0800 (PST) Date: Tue, 18 Feb 2003 20:57:59 -0800 (PST) From: Matthew Dillon Message-Id: <200302190457.h1J4vxes000970@apollo.backplane.com> To: Bosko Milekic Cc: 
freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> <20030217203306.A67720@unixdaemons.com> <200302180458.h1I4wQiA048763@apollo.backplane.com> <20030218093946.A69621@unixdaemons.com> <200302181757.h1IHvjaC051829@apollo.backplane.com> <20030218134836.A70583@unixdaemons.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :> :> void **uma_lock = NULL; :> :> /* :> * use of *uma_lock is entirely under the control of UMA. It :> * can release block and reobtain, release and obtain another :> * lock, or not use it at all (leave it NULL). The only :> * requirement is that you call uma_cache_unlock(&uma_lock) :> * after you are through and that you not block in between UMA :> * operations. :> */ :> uma_cache_free(&uma_lock, ...) ... etc :> uma_cache_alloc(&uma_lock, ...) ... etc :> :> uma_cache_unlock(&uma_lock); :> : It's not quite that simple. You would also have to teach it how to : drop the lock if one of the allocations fails (or if it has to go to : another cache) and how to tell the caller that it has done that. :... :Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org I think you missed the double pointer. It's void **uma_lock, not void *uma_lock. i.e. UMA can use *uma_lock for whatever it wants, including dropping and reobtaining, or just dropping, or whatever. Then you could call the uma allocator a whole bunch of times with virtually no overhead. 
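Matt's double-pointer idea — an opaque cookie the allocator may fill in on the first call and the caller hands back on every subsequent call until a final unlock — can be demonstrated in a self-contained userland sketch. The uma_cache_* names follow the quoted proposal, but the bodies (a counting "lock" and a toy object pool) are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* A toy cache protected by a lock whose acquisitions we can count. */
static int lock_acquires;
static int lock_held;
static int cache_objects = 32;

static void *
cache_lock_handle(void)
{
	lock_acquires++;
	lock_held = 1;
	return (&cache_objects);  /* any non-NULL token serves as the cookie */
}

/*
 * Allocate one object.  *cookie is entirely under the allocator's
 * control: NULL means "no lock held yet", so the lock is taken on the
 * first call and then simply reused on subsequent calls.
 */
static int
uma_cache_alloc(void **cookie)
{
	if (*cookie == NULL)
		*cookie = cache_lock_handle();
	if (cache_objects == 0)
		return (-1);  /* a real allocator might drop the lock here */
	cache_objects--;
	return (0);
}

/* Caller is done with the grouped operations; release the lock. */
static void
uma_cache_unlock(void **cookie)
{
	if (*cookie != NULL) {
		lock_held = 0;
		*cookie = NULL;
	}
}
```

Four back-to-back allocations then cost one lock acquisition instead of four, while the allocator stays free to drop and retake the lock internally since it owns the cookie — which is the point of the double indirection.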
Another alternative is to simply add a mutex pointer to the current thread and allow *any* major kernel API to use it to cache an obtained mutex in order to streamline multiple calls. It would be a very powerful efficiency mechanism but would also require a mindset change on the part of kernel developers. -Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 21: 5:19 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 98BDB37B401 for ; Tue, 18 Feb 2003 21:05:18 -0800 (PST) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2759643F3F for ; Tue, 18 Feb 2003 21:05:18 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.12.6/8.12.6) with ESMTP id h1J55G8h001041; Tue, 18 Feb 2003 21:05:16 -0800 (PST) (envelope-from dillon@apollo.backplane.com) Received: (from dillon@localhost) by apollo.backplane.com (8.12.6/8.12.6/Submit) id h1J55GXt001040; Tue, 18 Feb 2003 21:05:16 -0800 (PST) Date: Tue, 18 Feb 2003 21:05:16 -0800 (PST) From: Matthew Dillon Message-Id: <200302190505.h1J55GXt001040@apollo.backplane.com> To: David Schultz Cc: Bosko Milekic , Andrew Gallatin , freebsd-arch@FreeBSD.ORG Subject: Re: mb_alloc cache balancer / garbage collector References: <20030216213552.A63109@unixdaemons.com> <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030218055438.GA10838@HAL9000.homeunix.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG :To give you an idea of how big Solaris' per-CPU caches 
are, the :ranges are described in the following table from :_Solaris_Internals_. As I mentioned, they are occasionally :adjusted within these ranges. Keep in mind that this is for a :generic memory allocator, though, and not an mbuf allocator. :
:Object Size Range    Min PCPU Cache Size    Max PCPU Cache Size
:0-63                 15                     143
:64-127               7                      95
:128-255              3                      47
:256-511              1                      31
:512-1023             1                      15
:1024-2047            1                      7
:2048-16383           1                      3
:16384-               1                      1
Interesting. They are using fairly low object counts, which is what I would expect since the only real goal is to minimize global contention. It doesn't take much to reap the benefit. -Matt Matthew Dillon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue Feb 18 21:53:31 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0F95137B401 for ; Tue, 18 Feb 2003 21:53:30 -0800 (PST) Received: from smtp-relay.omnis.com (smtp-relay.omnis.com [216.239.128.27]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9C2FE43F85 for ; Tue, 18 Feb 2003 21:53:29 -0800 (PST) (envelope-from wes@softweyr.com) Received: from softweyr.homeunix.net (66-75-151-22.san.rr.com [66.75.151.22]) by smtp-relay.omnis.com (Postfix) with ESMTP id CC32743A2D; Tue, 18 Feb 2003 21:47:37 -0800 (PST) From: Wes Peters Organization: Softweyr To: Thomas Quinot Subject: Re: syslog.conf syntax change (multiple program/host specifications) Date: Wed, 19 Feb 2003 05:47:36 +0000 User-Agent: KMail/1.5 Cc: Garance A Drosihn , arch@FreeBSD.ORG References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302180516.00673.wes@softweyr.com> <20030218173347.GE43307@melusine.cuivre.fr.eu.org> In-Reply-To: <20030218173347.GE43307@melusine.cuivre.fr.eu.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: 
<200302190547.36891.wes@softweyr.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Tuesday 18 February 2003 17:33, Thomas Quinot wrote: > Le 2003-02-18, Wes Peters écrivait : > > the data among log files in the way newsyslog does. The "right" > > way to do this would be to extract the file rotation code into a > > shared library (librotate - hahaha) and write a simple program to > > Yes. That and the code to check whether a given file has met its > rotation condition. That's trivial, stat(2) or fstat(2) and compare against the size limit. I've got that now. ;^) -- Where am I, and what am I doing in this handbasket? Wes Peters wes@softweyr.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed Feb 19 10:16:57 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5908437B401 for ; Wed, 19 Feb 2003 10:16:56 -0800 (PST) Received: from melusine.cuivre.fr.eu.org (melusine.cuivre.fr.eu.org [62.212.105.185]) by mx1.FreeBSD.org (Postfix) with ESMTP id 345D543F85 for ; Wed, 19 Feb 2003 10:16:55 -0800 (PST) (envelope-from thomas@cuivre.fr.eu.org) Received: by melusine.cuivre.fr.eu.org (Postfix, from userid 1000) id CEEA82C3D1; Wed, 19 Feb 2003 19:16:52 +0100 (CET) Date: Wed, 19 Feb 2003 19:16:52 +0100 From: Thomas Quinot To: Wes Peters Cc: Thomas Quinot , Garance A Drosihn , arch@FreeBSD.ORG Subject: Re: syslog.conf syntax change (multiple program/host specifications) Message-ID: <20030219181652.GA52680@melusine.cuivre.fr.eu.org> Reply-To: Thomas Quinot References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302180516.00673.wes@softweyr.com> <20030218173347.GE43307@melusine.cuivre.fr.eu.org> <200302190547.36891.wes@softweyr.com> 
Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <200302190547.36891.wes@softweyr.com> User-Agent: Mutt/1.4i X-message-flag: WARNING! Using Outlook can damage your computer. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Le 2003-02-19, Wes Peters écrivait : > That's trivial, stat(2) or fstat(2) and compare against the size > limit. I've got that now. ;^) Well, newsyslog.conf allows more sophisticated rotation conditions than just size checking. -- Thomas.Quinot@Cuivre.FR.EU.ORG To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 6:51:55 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 75EC737B401 for ; Thu, 20 Feb 2003 06:51:53 -0800 (PST) Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by mx1.FreeBSD.org (Postfix) with ESMTP id A7BF443FBD for ; Thu, 20 Feb 2003 06:51:49 -0800 (PST) (envelope-from ru@whale.sunbay.crimea.ua) Received: from whale.sunbay.crimea.ua (root@localhost) by whale.sunbay.crimea.ua (8.12.6/8.12.6/Sunbay) with SMTP id h1KEphhA094296 for ; Thu, 20 Feb 2003 16:51:43 +0200 (EET) (envelope-from ru@whale.sunbay.crimea.ua) Received: from whale.sunbay.crimea.ua (ru@localhost [127.0.0.1]) by whale.sunbay.crimea.ua (8.12.6/8.12.6/Sunbay) with ESMTP id h1KEpgHR094282 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO); Thu, 20 Feb 2003 16:51:43 +0200 (EET) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.12.6/8.12.6/Submit) id h1KEpg5P094277; Thu, 20 Feb 2003 16:51:42 +0200 (EET) Date: Thu, 20 Feb 2003 16:51:42 +0200 From: Ruslan Ermilov To: 
Takahashi Yoshihiro Cc: arch@freebsd.org Subject: Re: hw.machine on PC98 Message-ID: <20030220145142.GA93982@sunbay.com> References: <200112141527.fBEFRF594757@freefall.freebsd.org> <20011214173056.A94075@sunbay.com> <20011217.233630.74667853.yosihiro@cc.kogakuin.ac.jp> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Qxx1br4bt0+wmkIi" Content-Disposition: inline In-Reply-To: <20011217.233630.74667853.yosihiro@cc.kogakuin.ac.jp> User-Agent: Mutt/1.5.1i Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG --Qxx1br4bt0+wmkIi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Dec 17, 2001 at 11:36:30PM +0900, Takahashi Yoshihiro wrote: > In article <20011214173056.A94075@sunbay.com> > Ruslan Ermilov writes: > > > What do you think about making hw.machine display "pc98" > > on PC98's? > > *I* think that hw.machine should be "pc98" on PC-98. > Any progress in this area any time soon? 
Cheers, -- Ruslan Ermilov Sysadmin and DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age --Qxx1br4bt0+wmkIi Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (FreeBSD) iD8DBQE+VOt+Ukv4P6juNwoRAnXyAJ4zBNLSvTi5fTo4DBYYghAmgFqGMACfZuQ2 1EkbY76DSa4H1tJA9o5J9LY= =//77 -----END PGP SIGNATURE----- --Qxx1br4bt0+wmkIi-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 19:26:19 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 505F837B401 for ; Thu, 20 Feb 2003 19:26:18 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id CC36F43F85 for ; Thu, 20 Feb 2003 19:26:17 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h1L3QB9l054528 for ; Thu, 20 Feb 2003 19:26:11 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h1L3QBli054527 for arch@FreeBSD.ORG; Thu, 20 Feb 2003 19:26:11 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Thu, 20 Feb 2003 19:26:11 -0800 From: David Schultz To: arch@FreeBSD.ORG Subject: UFS quota reference count overflow Message-ID: <20030221032611.GA54489@HAL9000.homeunix.com> Mail-Followup-To: arch@FreeBSD.ORG Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: 
List-Unsubscribe: X-Loop: FreeBSD.ORG It seems that with the advent of multi-terabyte drives, some people want to be able to set inode quotas above 65535. That is presently a problem, because when a given user manages to rack up more than 64K of active or cached vnodes, struct dquot's dq_cnt reference counter overflows and bad things happen. Are there any objections to applying the following patch to -CURRENT and (later) -STABLE? Am I correct to assume that there probably aren't any modules that would have ABI compatibility issues? Index: sys/ufs/ufs/quota.h =================================================================== RCS file: /home/ncvs/src/sys/ufs/ufs/quota.h,v retrieving revision 1.15 diff -u -u -r1.15 quota.h --- sys/ufs/ufs/quota.h 1999/12/29 04:55:05 1.15 +++ sys/ufs/ufs/quota.h 2003/02/20 06:50:04 @@ -122,8 +122,7 @@ LIST_ENTRY(dquot) dq_hash; /* hash list */ TAILQ_ENTRY(dquot) dq_freelist; /* free list */ u_int16_t dq_flags; /* flags, see below */ - u_int16_t dq_cnt; /* count of active references */ - u_int16_t dq_spare; /* unused spare padding */ + u_int32_t dq_cnt; /* count of active references */ u_int16_t dq_type; /* quota type of this dquot */ u_int32_t dq_id; /* identifier this applies to */ struct ufsmount *dq_ump; /* filesystem that this is taken from */ While I'm at it, maybe I should group the two remaining 16-bit fields, in case gcc doesn't... 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 20:36:11 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A76AC37B401 for ; Thu, 20 Feb 2003 20:36:01 -0800 (PST) Received: from smtp4.server.rpi.edu (smtp4.server.rpi.edu [128.113.2.4]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A7F343FA3 for ; Thu, 20 Feb 2003 20:36:00 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp4.server.rpi.edu (8.12.7/8.12.7) with ESMTP id h1L4ZwqX008189; Thu, 20 Feb 2003 23:35:58 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <200302150905.08387.wes@softweyr.com> References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302141733.29304.wes@softweyr.com> <200302150905.08387.wes@softweyr.com> Date: Thu, 20 Feb 2003 23:35:56 -0500 To: Wes Peters From: Garance A Drosihn Subject: Re: syslog.conf syntax change (multiple program/host specifications) Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-RPI-Spam-Score: -2.5 () IN_REP_TO,PATCH_UNIFIED_DIFF,QUOTED_EMAIL_TEXT,REFERENCES,SIGNATURE_SHORT_DENSE,SPAM_PHRASE_00_01 X-Scanned-By: MIMEDefang 2.28 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Offshoot of the "syslog.conf syntax change" thread. At 9:05 AM +0000 2/15/03, Wes Peters wrote: >On Saturday 15 February 2003 06:48, Garance A Drosihn wrote: > > >> I would add some default rotate-action to newsyslog, which >> would be used if -R is specified and the file is not listed >> in the newsyslog.conf file. > >Sounds good to me. Are you going to look into that? 
I'll >definitely want your changes to newsyslog to go along with >my changes to syslog.conf. ;^) What follows is part #1 of what I plan to do. This adds the notion of a "default rotation action" to newsyslog. This action will *only* be significant when newsyslog is run with a specific list of filenames. The patch picks some plausible behavior for that default action, but users can set a different default-action by adding a line to newsyslog.conf which uses <default> as the filename. This should cause no change in behavior for the standard usage of newsyslog (ie, run without listing specific filenames). I'd like to commit this on Sunday night (assuming we don't get ANOTHER snow storm this weekend...), unless there's some feedback. I'll soon have a second update, which will implement "-R requestor", which Wes could then take advantage of with his syslog changes. Updates to the man page will come later. I don't want to start writing that until I'm sure everyone's comfortable with the ideas and implementation. I expect to have more changes to newsyslog after these two, so also let me know if there's "other things" you'd like to see. 
Index: newsyslog.c =================================================================== RCS file: /home/ncvs/src/usr.sbin/newsyslog/newsyslog.c,v retrieving revision 1.49 diff -u -r1.49 newsyslog.c --- newsyslog.c 21 Dec 2002 22:27:26 -0000 1.49 +++ newsyslog.c 21 Feb 2003 04:18:55 -0000 @@ -87,9 +87,12 @@ int permissions; /* File permissions on the log */ int flags; /* CE_COMPACT, CE_BZCOMPACT, CE_BINARY */ int sig; /* Signal to send */ + int def_cfg; /* Using the <default> rule for this file */ struct conf_entry *next;/* Linked list pointer */ }; +#define DEFAULT_MARKER "<default>" + int archtodir = 0; /* Archive old logfiles to other directory */ int verbose = 0; /* Print out what's going on */ int needroot = 1; /* Root privs are necessary */ @@ -109,11 +112,14 @@ static char *son(char *p); static char *missing_field(char *p, char *errline); static void do_entry(struct conf_entry * ent); +static void free_entry(struct conf_entry *ent); +static struct conf_entry *init_entry(const char *fname, + struct conf_entry *src_entry); static void PRS(int argc, char **argv); static void usage(void); static void dotrim(char *log, const char *pid_file, int numdays, int falgs, - int perm, int owner_uid, int group_gid, int sig); -static int log_trim(char *log); + int perm, int owner_uid, int group_gid, int sig, int def_cfg); +static int log_trim(char *log, int def_cfg); static void compress_log(char *log, int dowait); static void bzcompress_log(char *log, int dowait); static int sizefile(char *file); @@ -152,7 +158,7 @@ } } p = p->next; - free((char *) q); + free_entry(q); q = p; } while (wait(NULL) > 0 || errno == EINTR) @@ -160,6 +166,77 @@ return (0); } +static struct conf_entry * +init_entry(const char *fname, struct conf_entry *src_entry) +{ + struct conf_entry *tempwork; + + if (verbose > 4) + printf("\t--> [creating entry for %s]\n", fname); + + tempwork = malloc(sizeof(struct conf_entry)); + if (tempwork == NULL) + err(1, "malloc of conf_entry for %s", fname); + + tempwork->log = 
strdup(fname); + if (tempwork->log == NULL) + err(1, "strdup for %s", fname); + + if (src_entry != NULL) { + tempwork->pid_file = NULL; + if (src_entry->pid_file) + tempwork->pid_file = strdup(src_entry->pid_file); + tempwork->uid = src_entry->uid; + tempwork->gid = src_entry->gid; + tempwork->numlogs = src_entry->numlogs; + tempwork->size = src_entry->size; + tempwork->hours = src_entry->hours; + tempwork->trim_at = src_entry->trim_at; + tempwork->permissions = src_entry->permissions; + tempwork->flags = src_entry->flags; + tempwork->sig = src_entry->sig; + tempwork->def_cfg = src_entry->def_cfg; + } else { + /* Initialize as a "do-nothing" entry */ + tempwork->pid_file = NULL; + tempwork->uid = NONE; + tempwork->gid = NONE; + tempwork->numlogs = 1; + tempwork->size = -1; + tempwork->hours = -1; + tempwork->trim_at = (time_t)0; + tempwork->permissions = 0; + tempwork->flags = 0; + tempwork->sig = SIGHUP; + tempwork->def_cfg = 0; + } + tempwork->next = NULL; + + return (tempwork); +} + +static void +free_entry(struct conf_entry *ent) +{ + + if (ent == NULL) + return; + + if (ent->log != NULL) { + if (verbose > 4) + printf("\t--> [freeing entry for %s]\n", ent->log); + free(ent->log); + ent->log = NULL; + } + + if (ent->pid_file != NULL) { + free(ent->pid_file); + ent->pid_file = NULL; + } + + free(ent); +} + static void do_entry(struct conf_entry * ent) { @@ -223,7 +300,7 @@ } dotrim(ent->log, pid_file, ent->numlogs, ent->flags, ent->permissions, ent->uid, ent->gid, - ent->sig); + ent->sig, ent->def_cfg); } else { if (verbose) printf("--> skipping\n"); @@ -279,7 +356,7 @@ { fprintf(stderr, - "usage: newsyslog [-Fnrv] [-f config-file] [-a directory]\n"); + "usage: newsyslog [-Fnrv] [-f config-file] [-a directory] [ filename ... 
]\n"); exit(1); } @@ -293,13 +370,13 @@ FILE *f; char line[BUFSIZ], *parse, *q; char *cp, *errline, *group; - char **p; - struct conf_entry *first, *working; + char **given; + struct conf_entry *defconf, *first, *working, *worklist; struct passwd *pass; struct group *grp; int eol; - first = working = NULL; + defconf = first = working = worklist = NULL; if (strcmp(conf, "-")) f = fopen(conf, "r"); @@ -331,27 +408,55 @@ errline); *parse = '\0'; + /* + * If newsyslog was run with a list of specific filenames, + * then this line of the config file should be skipped if + * it is NOT one of those given files (except that we do + * want any line that defines the action). + * + * XXX - note that CE_GLOB processing is *NOT* done when + * trying to match a filename given on the command! + */ if (*files) { - for (p = files; *p; ++p) - if (strcmp(*p, q) == 0) - break; - if (!*p) + if (strcasecmp(DEFAULT_MARKER, q) != 0) { + for (given = files; *given; ++given) { + if (strcmp(*given, q) == 0) + break; + } + if (!*given) + continue; + } + if (verbose > 2) + printf("\t+ Matched entry %s\n", q); + } else { + /* + * If no files were specified on the command line, + * then we can skip any line which defines the + * default action. 
+ */ + if (strcasecmp(DEFAULT_MARKER, q) == 0) { + if (verbose > 2) + printf("\t+ Ignoring entry for %s\n", + q); continue; + } } - if (!first) { - if ((working = malloc(sizeof(struct conf_entry))) == - NULL) - err(1, "malloc"); - first = working; + working = init_entry(q, NULL); + if (strcasecmp(DEFAULT_MARKER, q) == 0) { + if (defconf != NULL) { + warnx("Ignoring duplicate entry for %s!", q); + free_entry(working); + continue; + } + defconf = working; } else { - if ((working->next = malloc(sizeof(struct conf_entry))) - == NULL) - err(1, "malloc"); - working = working->next; + if (!first) + first = working; + else + worklist->next = working; + worklist = working; } - if ((working->log = strdup(q)) == NULL) - err(1, "strdup"); q = parse = missing_field(sob(++parse), errline); parse = son(parse); @@ -527,9 +632,57 @@ } free(errline); } - if (working) - working->next = (struct conf_entry *) NULL; (void) fclose(f); + + /* + * The entire config file has been processed. If there were + * no specific files given on the run command, then the work + * of this routine is done. + */ + if (*files == NULL) + return (first); + + /* + * If the program was given a specific list of files to process, + * it may be that some of those files were not listed in the + * config file. Those unlisted files should get the default + * rotation action. First, create the default-rotation action + * if none was found in the config file. + */ + if (defconf == NULL) { + working = init_entry(DEFAULT_MARKER, NULL); + working->numlogs = 3; + working->size = 50; + working->permissions = S_IRUSR|S_IWUSR; + defconf = working; + } + + for (given = files; *given; ++given) { + for (working = first; working; working = working->next) { + if (strcmp(*given, working->log) == 0) + break; + } + if (working != NULL) + continue; + if (verbose > 2) + printf("\t+ No entry for %s (will use %s)\n", + *given, DEFAULT_MARKER); + /* + * This given file was not found in the config file. 
+ * Add another item on to our work list, based on the + * default entry. + */ + working = init_entry(*given, defconf); + if (!first) + first = working; + else + worklist->next = working; + /* This is a file that was *not* found in config file */ + working->def_cfg = 1; + worklist = working; + } + + free_entry(defconf); return (first); } @@ -544,7 +697,7 @@ static void dotrim(char *log, const char *pid_file, int numdays, int flags, int perm, - int owner_uid, int group_gid, int sig) + int owner_uid, int group_gid, int sig, int def_cfg) { char dirpart[MAXPATHLEN], namepart[MAXPATHLEN]; char file1[MAXPATHLEN], file2[MAXPATHLEN]; @@ -659,8 +812,10 @@ (void) chown(zfile2, owner_uid, group_gid); } } - if (!noaction && !(flags & CE_BINARY)) - (void) log_trim(log); /* Report the trimming to the old log */ + if (!noaction && !(flags & CE_BINARY)) { + /* Report the trimming to the old log */ + (void) log_trim(log, def_cfg); + } if (!_numdays) { if (noaction) @@ -691,9 +846,11 @@ if (fchown(fd, owner_uid, group_gid)) err(1, "can't chmod new log file"); (void) close(fd); - if (!(flags & CE_BINARY)) - if (log_trim(tfile)) /* Add status message */ + if (!(flags & CE_BINARY)) { + /* Add status message to new log file */ + if (log_trim(tfile, def_cfg)) err(1, "can't add status message to log"); + } if (noaction) printf("chmod %o %s...\n", perm, log); @@ -759,14 +916,18 @@ /* Log the fact that the logs were turned over */ static int -log_trim(char *log) +log_trim(char *log, int def_cfg) { FILE *f; + const char *xtra; if ((f = fopen(log, "a")) == NULL) return (-1); - fprintf(f, "%s %s newsyslog[%d]: logfile turned over\n", - daytime, hostname, (int) getpid()); + xtra = ""; + if (def_cfg) + xtra = " using <default> rule"; + fprintf(f, "%s %s newsyslog[%d]: logfile turned over%s\n", + daytime, hostname, (int) getpid(), xtra); if (fclose(f) == EOF) err(1, "log_trim: fclose:"); return (0); -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org 
Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 21:33:17 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EEC1337B401 for ; Thu, 20 Feb 2003 21:33:16 -0800 (PST) Received: from smtp3.server.rpi.edu (smtp3.server.rpi.edu [128.113.2.3]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4C07B43FB1 for ; Thu, 20 Feb 2003 21:33:16 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp3.server.rpi.edu (8.12.7/8.12.7) with ESMTP id h1L5XF0H020207; Fri, 21 Feb 2003 00:33:15 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302141733.29304.wes@softweyr.com> <200302150905.08387.wes@softweyr.com> Date: Fri, 21 Feb 2003 00:33:14 -0500 To: Wes Peters From: Garance A Drosihn Subject: Re: NEWSYSLOG changes Cc: arch@FreeBSD.ORG Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-RPI-Spam-Score: -0.8 () IN_REP_TO,REFERENCES,SIGNATURE_SHORT_DENSE,SPAM_PHRASE_00_01 X-Scanned-By: MIMEDefang 2.28 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 11:35 PM -0500 2/20/03, Garance A Drosihn wrote: >Offshoot of the "syslog.conf syntax change" thread. Arg. My previous message was supposed to go out with this new subject, instead of just repeating the subject of the thread "it was an offshoot of". I won't repost the whole thing, but if you're interested in newsyslog changes then check my most-recent message with the subject "Re: syslog.conf syntax change (multiple program/host..." Sorry about that... 
-- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 22:59:32 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CE44237B401 for ; Thu, 20 Feb 2003 22:59:31 -0800 (PST) Received: from sasami.jurai.net (sasami.jurai.net [66.92.160.223]) by mx1.FreeBSD.org (Postfix) with ESMTP id F38B443F3F for ; Thu, 20 Feb 2003 22:59:30 -0800 (PST) (envelope-from winter@jurai.net) Received: from sasami.jurai.net (sasami.jurai.net [66.92.160.223]) by sasami.jurai.net (8.12.6/8.12.5) with ESMTP id h1L6xJvA057592; Fri, 21 Feb 2003 01:59:24 -0500 (EST) (envelope-from winter@jurai.net) Date: Fri, 21 Feb 2003 01:59:19 -0500 (EST) From: "Matthew N. Dodd" To: David Schultz Cc: arch@FreeBSD.ORG Subject: Re: UFS quota reference count overflow In-Reply-To: <20030221032611.GA54489@HAL9000.homeunix.com> Message-ID: <20030221015809.W66355@sasami.jurai.net> References: <20030221032611.GA54489@HAL9000.homeunix.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 20 Feb 2003, David Schultz wrote: > Am I correct to assume that there probably aren't any modules that would > have ABI compatability issues? Won't systems with existing quotas be forced to recreate their quota files? -- | Matthew N. Dodd | '78 Datsun 280Z | '75 Volvo 164E | FreeBSD/NetBSD | | winter@jurai.net | 2 x '84 Volvo 245DL | ix86,sparc,pmax | | http://www.jurai.net/~winter | For Great Justice! 
| ISO8802.5 4ever | To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 23: 0:27 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id EC1E537B401 for ; Thu, 20 Feb 2003 23:00:25 -0800 (PST) Received: from sasami.jurai.net (sasami.jurai.net [66.92.160.223]) by mx1.FreeBSD.org (Postfix) with ESMTP id F192243FAF for ; Thu, 20 Feb 2003 23:00:24 -0800 (PST) (envelope-from winter@jurai.net) Received: from sasami.jurai.net (sasami.jurai.net [66.92.160.223]) by sasami.jurai.net (8.12.6/8.12.5) with ESMTP id h1L70NvA057632; Fri, 21 Feb 2003 02:00:24 -0500 (EST) (envelope-from winter@jurai.net) Date: Fri, 21 Feb 2003 02:00:23 -0500 (EST) From: "Matthew N. Dodd" To: David Schultz Cc: arch@FreeBSD.ORG Subject: Re: UFS quota reference count overflow In-Reply-To: <20030221015809.W66355@sasami.jurai.net> Message-ID: <20030221015953.R66355@sasami.jurai.net> References: <20030221032611.GA54489@HAL9000.homeunix.com> <20030221015809.W66355@sasami.jurai.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 21 Feb 2003, Matthew N. Dodd wrote: > On Thu, 20 Feb 2003, David Schultz wrote: > > Am I correct to assume that there probably aren't any modules that would > > have ABI compatability issues? > > Won't systems with existing quotas be forced to recreate their quota > files? Feh, I should know better than to reply to email this late at night. :) -- | Matthew N. Dodd | '78 Datsun 280Z | '75 Volvo 164E | FreeBSD/NetBSD | | winter@jurai.net | 2 x '84 Volvo 245DL | ix86,sparc,pmax | | http://www.jurai.net/~winter | For Great Justice! 
| ISO8802.5 4ever | To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu Feb 20 23:38:52 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A2B5C37B401 for ; Thu, 20 Feb 2003 23:38:51 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id C969843FAF for ; Thu, 20 Feb 2003 23:38:50 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h1L7cn9l055339; Thu, 20 Feb 2003 23:38:49 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h1L7cmXx055338; Thu, 20 Feb 2003 23:38:48 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Thu, 20 Feb 2003 23:38:48 -0800 From: David Schultz To: "Matthew N. Dodd" Cc: arch@FreeBSD.ORG Subject: Re: UFS quota reference count overflow Message-ID: <20030221073848.GA55314@HAL9000.homeunix.com> Mail-Followup-To: "Matthew N. Dodd" , arch@FreeBSD.ORG References: <20030221032611.GA54489@HAL9000.homeunix.com> <20030221015809.W66355@sasami.jurai.net> <20030221015953.R66355@sasami.jurai.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030221015953.R66355@sasami.jurai.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake Matthew N. Dodd : > On Fri, 21 Feb 2003, Matthew N. Dodd wrote: > > On Thu, 20 Feb 2003, David Schultz wrote: > > > Am I correct to assume that there probably aren't any modules that would > > > have ABI compatability issues? 
> > > > Won't systems with existing quotas be forced to recreate their quota > > files? > > Feh, I should know better than to reply to email this late at night. Heh, you should see some of the things I've posted at 2, 3, or even 6 AM. Just in case anyone else has the same concern, the on-disk structure is separate. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 5:15:49 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4AA6C37B405 for ; Fri, 21 Feb 2003 05:15:48 -0800 (PST) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 05EB243F3F for ; Fri, 21 Feb 2003 05:15:47 -0800 (PST) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.6/8.12.3) with ESMTP id h1LDFi3Y061136; Fri, 21 Feb 2003 06:15:45 -0700 (MST) (envelope-from imp@bsdimp.com) Date: Fri, 21 Feb 2003 06:15:11 -0700 (MST) Message-Id: <20030221.061511.127813063.imp@bsdimp.com> To: winter@jurai.net Cc: dschultz@uclink.Berkeley.EDU, arch@FreeBSD.ORG Subject: Re: UFS quota reference count overflow From: "M. Warner Losh" In-Reply-To: <20030221015809.W66355@sasami.jurai.net> References: <20030221032611.GA54489@HAL9000.homeunix.com> <20030221015809.W66355@sasami.jurai.net> X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message: <20030221015809.W66355@sasami.jurai.net> "Matthew N. 
Dodd" writes: : On Thu, 20 Feb 2003, David Schultz wrote: : > Am I correct to assume that there probably aren't any modules that would : > have ABI compatability issues? : : Won't systems with existing quotas be forced to recreate their quota : files? Little endian filesystems would be fine. Big endian file systems would need to regenerate things, no? Warner To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 5:55:38 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3311037B401 for ; Fri, 21 Feb 2003 05:55:37 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 99E2243F75 for ; Fri, 21 Feb 2003 05:55:36 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h1LDtR9l058317; Fri, 21 Feb 2003 05:55:27 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h1LDtRlI058316; Fri, 21 Feb 2003 05:55:27 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Fri, 21 Feb 2003 05:55:27 -0800 From: David Schultz To: "M. Warner Losh" Cc: winter@jurai.net, arch@FreeBSD.ORG Subject: Re: UFS quota reference count overflow Message-ID: <20030221135527.GA58235@HAL9000.homeunix.com> Mail-Followup-To: "M. 
Warner Losh" , winter@jurai.net, arch@FreeBSD.ORG References: <20030221032611.GA54489@HAL9000.homeunix.com> <20030221015809.W66355@sasami.jurai.net> <20030221.061511.127813063.imp@bsdimp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030221.061511.127813063.imp@bsdimp.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Thus spake M. Warner Losh : > In message: <20030221015809.W66355@sasami.jurai.net> > "Matthew N. Dodd" writes: > : On Thu, 20 Feb 2003, David Schultz wrote: > : > Am I correct to assume that there probably aren't any modules that would > : > have ABI compatability issues? > : > : Won't systems with existing quotas be forced to recreate their quota > : files? > > Little endian filesystems would be fine. Big endian file systems > would need to regenerate things, no? The patch changes struct dquot, which is not written to the quota file, so that shouldn't be an issue. (dquot just has a reference count and a few memory pointers.) You seem to be thinking of struct dqblk, which stores the actual quotas. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 7:10:12 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 18D4C37B401 for ; Fri, 21 Feb 2003 07:10:10 -0800 (PST) Received: from angelica.unixdaemons.com (angelica.unixdaemons.com [209.148.64.135]) by mx1.FreeBSD.org (Postfix) with ESMTP id 54D3243F93 for ; Fri, 21 Feb 2003 07:10:09 -0800 (PST) (envelope-from hiten@angelica.unixdaemons.com) Received: from angelica.unixdaemons.com (hiten@localhost.unixdaemons.com [127.0.0.1]) by angelica.unixdaemons.com (8.12.7/8.12.1) with ESMTP id h1LFA8eG060571 for ; Fri, 21 Feb 2003 10:10:08 -0500 (EST) Received: (from hiten@localhost) by angelica.unixdaemons.com (8.12.7/8.12.1/Submit) id h1LFA7ul060570 for FreeBSD-arch@FreeBSD.ORG; Fri, 21 Feb 2003 10:10:07 -0500 (EST) (envelope-from hiten) Date: Fri, 21 Feb 2003 10:10:07 -0500 From: Hiten Pandya To: FreeBSD-arch@FreeBSD.ORG Subject: Mbuf flags cleanup proposal Message-ID: <20030221151007.GA60348@unixdaemons.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4i X-Operating-System: FreeBSD i386 X-Public-Key: http://www.pittgoth.com/~hiten/pubkey.asc X-URL: http://www.unixdaemons.com/~hiten X-PGP: http://pgp.mit.edu:11371/pks/lookup?search=Hiten+Pandya&op=index Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Gang. 
I have a proposal to rename the current mbuf subsystem flag bits as follows: mbuf (sys/mbuf.h) flags: M_TRYWAIT -> MB_TRYWAIT M_DONTWAIT -> MB_DONTWAIT mbchain(9) (sys/mchain) flags: MB_MSYSTEM -> MBC_MSYSTEM MB_MUSER -> MBC_MUSER MB_MINLINE -> MBC_MINLINE MB_MZERO -> MBC_MZERO MB_MCUSTOM -> MBC_MCUSTOM This would also be beneficial for the various mbuf(9) and mbchain(9) routines. The following are the reasons why I think it should be done: - Less confusion. - Less mistakes in future. Any reasonable objections/comments? Cheers. -- Hiten Pandya (hiten@unixdaemons.com, hiten@uk.FreeBSD.org) http://www.unixdaemons.com/~hiten/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 9: 6:44 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B975137B401 for ; Fri, 21 Feb 2003 09:06:42 -0800 (PST) Received: from ebb.errno.com (ebb.errno.com [66.127.85.87]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2C5AA43F85 for ; Fri, 21 Feb 2003 09:06:42 -0800 (PST) (envelope-from sam@errno.com) Received: from melange (melange.errno.com [66.127.85.82]) (authenticated bits=0) by ebb.errno.com (8.12.5/8.12.1) with ESMTP id h1LH6fnN051641 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO); Fri, 21 Feb 2003 09:06:41 -0800 (PST) (envelope-from sam@errno.com) X-Authentication-Warning: ebb.errno.com: Host melange.errno.com [66.127.85.82] claimed to be melange Message-ID: <122c01c2d9cb$9add0640$52557f42@errno.com> From: "Sam Leffler" To: "Hiten Pandya" , References: <20030221151007.GA60348@unixdaemons.com> Subject: Re: Mbuf flags cleanup proposal Date: Fri, 21 Feb 2003 09:06:41 -0800 Organization: Errno Consulting MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express
5.50.4807.1700 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG > Gang. I have a proposal to rename the current mbuf subsystem > flag bits as follows: > > mbuf (sys/mbuf.h) flags: > > M_TRYWAIT -> MB_TRYWAIT > M_DONTWAIT -> MB_DONTWAIT > > mbchain(9) (sys/mchain) flags: > > MB_MSYSTEM -> MBC_MSYSTEM > MB_MUSER -> MBC_MUSER > MB_MINLINE -> MBC_MINLINE > MB_MZERO -> MBC_MZERO > MB_MCUSTOM -> MBC_MCUSTOM > > This would also be beneficial for the various mbuf(9) > and mbchain(9) routines. The following are the reasons > why I think it should be done: > > - Less confusion. > - Less mistakes in future. > > Any reasonable objections/comments? This would mean breaking compatibility with other releases and other bsd systems unless you left compatibility shims in place. The intent is to enforce the right flags by checking them at runtime. This should eliminate the "less mistakes in future" case. I don't consider less confusion a valid argument; since these are all just #define's there is no compile-time enforcement and unless you define the flags to have separate values you're back where you were before. But if you make them separate values then you've got nothing different than what's already proposed. I suggest that this issue has been resolved and you should leave it alone. 
Sam To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 9:25:27 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 512F537B401 for ; Fri, 21 Feb 2003 09:25:26 -0800 (PST) Received: from harmony.village.org (rover.bsdimp.com [204.144.255.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7E75443F75 for ; Fri, 21 Feb 2003 09:25:25 -0800 (PST) (envelope-from imp@bsdimp.com) Received: from localhost (warner@rover2.village.org [10.0.0.1]) by harmony.village.org (8.12.6/8.12.3) with ESMTP id h1LHPO3Y063357; Fri, 21 Feb 2003 10:25:24 -0700 (MST) (envelope-from imp@bsdimp.com) Date: Fri, 21 Feb 2003 10:24:46 -0700 (MST) Message-Id: <20030221.102446.48202444.imp@bsdimp.com> To: hiten@unixdaemons.com Cc: FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal From: "M. Warner Losh" In-Reply-To: <20030221151007.GA60348@unixdaemons.com> References: <20030221151007.GA60348@unixdaemons.com> X-Mailer: Mew version 2.1 on Emacs 21.2 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG In message: <20030221151007.GA60348@unixdaemons.com> Hiten Pandya writes: : Gang. I have a proposal to rename the current mbuf subsystem No. 
Warner To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 10:53:32 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A01DF37B406 for ; Fri, 21 Feb 2003 10:53:30 -0800 (PST) Received: from heron.mail.pas.earthlink.net (heron.mail.pas.earthlink.net [207.217.120.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id D36AF43FD7 for ; Fri, 21 Feb 2003 10:53:29 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from dialup-209.245.134.181.dial1.sanjose1.level3.net ([209.245.134.181] helo=mindspring.com) by heron.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18mIIa-0007Qo-00; Fri, 21 Feb 2003 10:53:28 -0800 Message-ID: <3E5673E7.F3F1FA4F@mindspring.com> Date: Fri, 21 Feb 2003 10:45:59 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Hiten Pandya Cc: FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal References: <20030221151007.GA60348@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a4618b3c353e223a8afe8b3a5d5bd2980e387f7b89c61deb1d350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Hiten Pandya wrote: > Gang. I have a proposal to rename the current mbuf subsystem > flag bits as follows: > > mbuf (sys/mbuf.h) flags: > > M_TRYWAIT -> MB_TRYWAIT > M_DONTWAIT -> MB_DONTWAIT [ ... ] > Any reasonable objections/comments? 1) This seems to move away from integration of UVA and the mbuf allocator. 2) It seems to me that all code should be moving to non-blocking interfaces, and blocking interfaces should be deprecated. 
3) "TRYWAIT" is really useless; either I can depend on blocking until the request is satisfied, or I can't; if I can't, then I might as well not have the extra complication of "wait a little bit, and then fail anyway": it doesn't make my code any less complicated. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 12: 8:27 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DB82237B401 for ; Fri, 21 Feb 2003 12:08:25 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0085F43FA3 for ; Fri, 21 Feb 2003 12:08:25 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1LK7h979382; Fri, 21 Feb 2003 15:07:43 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Fri, 21 Feb 2003 15:07:43 -0500 From: Bosko Milekic To: Terry Lambert Cc: Hiten Pandya , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal Message-ID: <20030221150743.A79345@unixdaemons.com> References: <20030221151007.GA60348@unixdaemons.com> <3E5673E7.F3F1FA4F@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3E5673E7.F3F1FA4F@mindspring.com>; from tlambert2@mindspring.com on Fri, Feb 21, 2003 at 10:45:59AM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG For What It's Worth: On Fri, Feb 21, 2003 at 10:45:59AM -0800, Terry Lambert wrote: > Hiten Pandya wrote: > > Gang. 
I have a proposal to rename the current mbuf subsystem > > flag bits as follows: > > > > mbuf (sys/mbuf.h) flags: > > > > M_TRYWAIT -> MB_TRYWAIT > > M_DONTWAIT -> MB_DONTWAIT > > [ ... ] > > > Any reasonable objections/comments? > > 1) This seems to move away from integration of UVA and the > mbuf allocator. You mean UMA. And this has absolutely nothing to do with it. There are several other reasons that the move is most likely not going to happen (they are cited at the top of the most recent subr_mbuf.c in the comments). I wish people read those more often and stopped blindly recommending where and to what things should move towards, without providing reasonable solutions to the problems that actually arise when TRYING to do what they suggest. In other words, talk is cheap, dirt cheap. > 2) It seems to me that all code should be moving to non-blocking > interfaces, and blocking interfaces should be deprecated. > > 3) "TRYWAIT" is really useless; either I can depend on blocking > until the request is satisfied, or I can't; if I can't, then > I might as well not have the extra complication of "wait a > little bit, and then fail anyway": it doesn't make my code > any less complicated. The behavior has absolutely nothing to do with your code. It's not an API thing, it's the default behavior of the allocator which tries to wait at least a little bit to see if it can recover during an exhaustion before giving up. I agree with you when you say that the networking code[1] should be taught to deal with failure (however "drastic" its dealing with it is is irrelevant), but that should not be sufficient reason to preclude the allocator from at least trying harder when it's in a tight spot. 
[1] This is what I infer you're saying from this post and this earlier one: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333614+0+archive/2003/ \ freebsd-arch/20030126.freebsd-arch > -- Terry As for the flags renaming, although I would like to see it, Sam made a good point in the first response to this post. Now let's just let it die and move on to bigger and better things. -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 12:10:43 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6B2F437B401 for ; Fri, 21 Feb 2003 12:10:42 -0800 (PST) Received: from smtp1.server.rpi.edu (smtp1.server.rpi.edu [128.113.2.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 08CC043F93 for ; Fri, 21 Feb 2003 12:10:41 -0800 (PST) (envelope-from drosih@rpi.edu) Received: from [128.113.24.47] (gilead.netel.rpi.edu [128.113.24.47]) by smtp1.server.rpi.edu (8.12.7/8.12.7) with ESMTP id h1LKAZ3q007293; Fri, 21 Feb 2003 15:10:36 -0500 Mime-Version: 1.0 X-Sender: drosih@mail.rpi.edu Message-Id: In-Reply-To: <002f01c2d9e0$79434a70$140aa8c0@corp.exodus.net> References: <20030210114930.GB90800@melusine.cuivre.fr.eu.org> <200302141733.29304.wes@softweyr.com> <200302150905.08387.wes@softweyr.com> <002f01c2d9e0$79434a70$140aa8c0@corp.exodus.net> Date: Fri, 21 Feb 2003 15:10:34 -0500 To: "Todd Wagner" From: Garance A Drosihn Subject: Re: NEWSYSLOG changes Cc: Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-RPI-Spam-Score: -1.9 () IN_REP_TO,QUOTED_EMAIL_TEXT,REFERENCES,SIGNATURE_SHORT_DENSE,SPAM_PHRASE_01_02 X-Scanned-By: MIMEDefang 2.28 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG At 11:36 AM 
-0800 2/21/03, Todd Wagner wrote: >Garance wrote: > > I expect to have more changes to newsyslog after these two, > > so also let me know if there's "other things" you'd like > > to see. > >I'd like to see the ability to rotate the log off with a >timestamp, instead of within a rotation. > >See PR bin/30654. Actually, I have also wanted something like that. I'll look into your PR. I might not do it quite the same way though. Thanks for the patch, it'll get me thinking about it. -- Garance Alistair Drosehn = gad@gilead.netel.rpi.edu Senior Systems Programmer or gad@freebsd.org Rensselaer Polytechnic Institute or drosih@rpi.edu To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 15:21: 3 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4A49737B401 for ; Fri, 21 Feb 2003 15:20:59 -0800 (PST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6F05A43F75 for ; Fri, 21 Feb 2003 15:20:58 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0179.cvx21-bradley.dialup.earthlink.net ([209.179.192.179] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18mMTJ-0007Ob-00; Fri, 21 Feb 2003 15:20:50 -0800 Message-ID: <3E56B3F5.9EF3F9FE@mindspring.com> Date: Fri, 21 Feb 2003 15:19:17 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Hiten Pandya , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal References: <20030221151007.GA60348@unixdaemons.com> <3E5673E7.F3F1FA4F@mindspring.com> <20030221150743.A79345@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: 
b1a02af9316fbb217a47c185c03b154d40683398e744b8a4e5a9249e0b3597340f9139e4c0c151f4548b785378294e88350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > > 1) This seems to move away from integration of UVA and the > > mbuf allocator. > > You mean UMA. Yeah, UMA or whatever. > And this has absolutely nothing to do with it. There > are several other reasons that the move is most likely not going to > happen (they are cited at the top of the most recent subr_mbuf.c in > the comments). I wish people read those more often and stopped > blindly recommending where and to what things should move towards, > without providing reasonable solutions to the problems that actually > arise when TRYING to do what they suggest. In other words, talk is > cheap, dirt cheap. You want the gloves off, we can take the gloves off. Let me point out that I did *not* participate in the discussion where people were giving you crap for your hysteresis refill idea, just because you failed to articulate the fact that the hysteresis applied only to the garbage collection, and the people who were giving you crap failed to read your code, and not your email. I'm aware of the code, and what you failed to say about it, and what the people condemning it failed to read about it. Let me also point out that I've avoided commenting on the use of Horde-style algorithms in the code, once the project committed to that path. So, in general, I don't comment on your mbuf allocator. Finally, let me point out that the idea of *not* integrating the various allocators to a single, underlying allocation framework is incredibly naive. The Rodney King idea of "Can't we all just get along?" is inherently bad. You cannot be egalitarian and let everyone use whatever the hell model they want.
That's what's wrong with the whole FreeBSD SMP effort, which can't decide whether it's locking code, critical paths, both, neither, and so you get these massive, heavy-weight locking implementations that can't decide if reentrancy is good, because it means you don't have to rewrite code, or reentrancy is bad, because it means that code that needs to be rewritten never is. In other words, I know that people have been giving you some unjustified crap about the code, and that my posting might have looked enough like it that you gave a knee-jerk response. With that in mind... Even if it's not your personal intent to address the interface layering issues that prevent the code being integrated, it's a bad idea to add yet-more-frobs to make the job harder for someone who *is* willing to do that scut work to do the integration at some point in the future. This is a fundamental issue of project philosophy, and it should *NOT* be decided by fiat, it should be decided by the project, as a whole, or, minimally, the architecture board, whoever they are. > > 2) It seems to me that all code should be moving to non-blocking > > interfaces, and blocking interfaces should be deprecated. > > > > 3) "TRYWAIT" is really useless; either I can depend on blocking > > until the request is satisfied, or I can't; if I can't, then > > I might as well not have the extra complication of "wait a > > little bit, and then fail anyway": it doesn't make my code > > any less complicated. > > The behavior has absolutely nothing to do with your code. It's not an > API thing, it's the default behavior of the allocator which tries to > wait at least a little bit to see if it can recover during an > exhaustion before giving up.
I agree with you when you say that the > networking code[1] should be taught to deal with failure (however > "drastic" its dealing with it is is irrelevant), but that should not be > sufficient reason to preclude the allocator from at least trying > harder when it's in a tight spot. In for a penny, in for a pound... 8-(. I am fundamentally philosophically opposed to the idea of code that will do its best only if you ask it to, and by default will only make a half-assed attempt, and in neither case is it willing to commit to doing the job it claims it is capable of doing. An API is a *CONTRACT*. I realize that the real reason this is there is that "TRYWAIT" *really* means "It's OK to sleep, because we have a context on which to sleep available to us", and the reason it's "TRYWAIT" instead of "WAIT" is that what's *really* being said is "Wait for resources, but in the absence of a working contention resolution protocol in a low resource situation, try to fake one by backing off, and hope that's good enough". > [1] This is what I infer you're saying from this post and this > earlier one: > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333614+0+archive/2003/ \ > freebsd-arch/20030126.freebsd-arch You need to reread the posting. It is *precisely*, to quote you, "sufficient reason to preclude the allocator from at least trying harder when it's in a tight spot". There's no benefit to putting "try-to-worm-out-of-a-tight-spot" code in every little subsystem. There should be a system-wide strategy for "deal with being in a tight spot". This is morally equivalent to "rape-proofing" a small section of the sidewalk from the bus station to your house, and then claiming it's now safe to walk home from the bus station. The literature, in general, tells us that the correct strategy is "shed load"; that is, fail to service requests, and, if necessary, indicate that failure to whoever cares enough that they will not retry if not given an indicator.
Further, it tells us to do this *as early as possible*, so that we don't waste time partially processing a request that we will be unable to complete processing on at the last minute. Dijkstra called this the "Banker's Algorithm"; it has to do with precommitting *all* resources, and if that can't be done, don't *waste time* starting a job that can't be completed. > As for the flags renaming, although I would like to see it, Sam made > a good point in the first response to this post. Now let's just let > it die and move on to bigger and better things. I'm all for renaming, if we don't care about integration at any point in the future. You are claiming that we don't. If you are right, then separate the namespace overlap, and be done with it. If not, bow to the pressure and admit that integration is a future goal, rather than coming up with reasons why it's not possible to do it, ever. Frankly, I think most of the contention is that people *want* the integration, but you and Jeff are building your own little kingdoms. What the rename amounts to, in that case, is that you would like to build a fence, on the theory that "good fences make good neighbors". I understand your arguments against integration, and I understand *the* argument against integration: that you and Jeff would have to hammer out a mutually acceptable philosophy, going forward, and that that is volunteer work for which neither of you is willing to volunteer, so it's not getting done. What I'm telling you is that the general population of the project *wants* integration. They *do not* want to have to learn multiple allocation API's to use in their code, and they *do not* want to have to live with additional complexity that makes it harder to find someone to maintain the code in the future, if you, Jeff, or both get hit by a bus.
In fact, if one of you *were* to get hit by a bus, we would *definitely* end up getting the integration we want, because the other code would rot for lack of a maintainer who has bought into a general allocation philosophy held by the project. If you are going to build something, at least try to build something that will outlive you; that's not possible in a vacuum, without a general buy-in. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 17:18:20 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7429A37B401 for ; Fri, 21 Feb 2003 17:18:14 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9482D43FD7 for ; Fri, 21 Feb 2003 17:18:13 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1M1HSh80701; Fri, 21 Feb 2003 20:17:28 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Fri, 21 Feb 2003 20:17:28 -0500 From: Bosko Milekic To: Terry Lambert Cc: Hiten Pandya , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal Message-ID: <20030221201728.A80661@unixdaemons.com> References: <20030221151007.GA60348@unixdaemons.com> <3E5673E7.F3F1FA4F@mindspring.com> <20030221150743.A79345@unixdaemons.com> <3E56B3F5.9EF3F9FE@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3E56B3F5.9EF3F9FE@mindspring.com>; from tlambert2@mindspring.com on Fri, Feb 21, 2003 at 03:19:17PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I'm not going to engage in a talk about integration 
and egalitarianism and whatever-else-you-feel-like-talking-about-today. You have some good ideas, you have some good experience, and I have seen some good code of yours (admittedly, not recently, though). That still doesn't make you right about everything. In FreeBSD (notice I said "FreeBSD," not TerryBSD, SunOS, Linux, or whatever else), network buffer allocations have for the longest time been done separately and for good enough reason; as a result, FreeBSD has adopted some fairly serious optimizations for what concerns the way they (and supporting structures) are allocated. "My" allocator (and by the way, it's wrong to call it _mine_ because it really is the result of a number of different people who have worked on it) maintains those optimizations (for lack of finding equally simple and well-performing replacements) while at the same time taking advantage of parallel processing in the kernel. Similarly, UMA is a great allocator, and it does similar things for general-purpose allocations in the system. If you ask me whether or not mbuf allocations can be made to use UMA? The answer is yes. If you ask me whether performance is going to be better? I don't know for sure, but I can tell you that in order to solve the issues I bring up it's going to be difficult, and I _do_ know that if you don't solve them, performance is going to suck, comparatively speaking. Solving them would require what I think is relatively serious modification to UMA which, in my opinion anyway, would uglify [sic] it. I *have* looked at it, and I think that the fact that UMA does allocations for all objects using the same techniques is great and - I can't speak for Jeff - but *I* wouldn't want to hack at it just so that we can get the optimizations/solutions we currently have for mbuf allocations. And, you know what?
If *you* think it's worth it, why don't *YOU* do it and waste hours on end to, finally, get something that _maybe_ performs as well at the expense of an uglier (not-so-general-and-simple-anymore) allocator? On Fri, Feb 21, 2003 at 03:19:17PM -0800, Terry Lambert wrote: > Bosko Milekic wrote: > > > 1) This seems to move away from integration of UVA and the > > > mbuf allocator. > > > > You mean UMA. > > Yeah, UMA or whatever. > > > And this has absolutely nothing to do with it. There > > are several other reasons that the move is most likely not going to > > happen (they are cited at the top of the most recent subr_mbuf.c in > > the comments). I wish people read those more often and stopped > > blindly recommending where and to what things should move towards, > > without providing reasonable solutions to the problems that actually > > arise when TRYING to do what they suggest. In other words, talk is > > cheap, dirt cheap. > > You want the gloves off, we can take the gloves off. > > Let me point out that I did *not* participate in the discussion > where people were giving you crap for your hysteresis refill idea, > just because you failed to articulate the fact that the hysteresis > applied only to the garbage collection, and the people who were > giving you crap failed to read your code, and not your email. I'm > aware of the code, and what you failed to say about it, and what > the people condemning it failed to read about it. > > Let me also point out that I've avoided commenting on the use of > Horde-style algorithms in the code, once the project committed to > that path. So, in general, I don't comment on your mbuf allocator. > > Finally, let me point out that the idea of *not* integrating the > various allocators to a single, underlying allocation framework is > incredibly naive. The Rodney King idea of "Can't we all just get > along?" is inherently bad. You cannot be egalitarian and let > everyone use whatever the hell model they want.
That's what's wrong > with the whole FreeBSD SMP effort, which can't decide whether it's > locking code, critical paths, both, neither, and so you get these > massive, heavy-weight locking implementations that can't decide if > reentrancy is good, because it means you don't have to rewrite code, > or reentrancy is bad, because it means that code that needs to be > rewritten never is. > > > In other words, I know that people have been giving you some > unjustified crap about the code, and that my posting might > have looked enough like it that you gave a knee-jerk response. > > > With that in mind... > > Even if it's not your personal intent to address the interface > layering issues that prevent the code being integrated, it's a > bad idea to add yet-more-frobs to make the job harder for someone > who *is* willing to do that scut work to do the integration at > some point in the future. > > This is a fundamental issue of project philosophy, and it should > *NOT* be decided by fiat, it should be decided by the project, as > a whole, or, minimally, the architecture board, whoever they are. > > > > > 2) It seems to me that all code should be moving to non-blocking > > > interfaces, and blocking interfaces should be deprecated. > > > > > > 3) "TRYWAIT" is really useless; either I can depend on blocking > > > until the request is satisfied, or I can't; if I can't, then > > > I might as well not have the extra complication of "wait a > > > little bit, and then fail anyway": it doesn't make my code > > > any less complicated. > > > > The behavior has absolutely nothing to do with your code. It's not an > > API thing, it's the default behavior of the allocator which tries to > > wait at least a little bit to see if it can recover during an > > exhaustion before giving up.
I agree with you when you say that the > > networking code[1] should be taught to deal with failure (however > > "drastic" its dealing with it is is irrelevant), but that should not be > > sufficient reason to preclude the allocator from at least trying > > harder when it's in a tight spot. > > In for a penny, in for a pound... 8-(. > > I am fundamentally philosophically opposed to the idea of code > that will do its best only if you ask it to, and by default will > only make a half-assed attempt, and in neither case is it willing > to commit to doing the job it claims it is capable of doing. > > An API is a *CONTRACT*. > > > I realize that the real reason this is there is that "TRYWAIT" > *really* means "It's OK to sleep, because we have a context on > which to sleep available to us", and the reason it's "TRYWAIT" > instead of "WAIT" is that what's *really* being said is "Wait > for resources, but in the absence of a working contention resolution > protocol in a low resource situation, try to fake one by backing > off, and hope that's good enough". > > > > [1] This is what I infer you're saying from this post and this > > earlier one: > > http://docs.freebsd.org/cgi/getmsg.cgi?fetch=333614+0+archive/2003/ \ > > freebsd-arch/20030126.freebsd-arch > > You need to reread the posting. It is *precisely*, to quote you, > > "sufficient reason to preclude the allocator from at least > trying harder when it's in a tight spot" > > There's no benefit to putting "try-to-worm-out-of-a-tight-spot" > code in every little subsystem. There should be a system-wide > strategy for "deal with being in a tight spot". > > This is morally equivalent to "rape-proofing" a small section of the > sidewalk from the bus station to your house, and then claiming it's > now safe to walk home from the bus station.
> > The literature, in general, tells us that the correct strategy is > "shed load"; that is, fail to service requests, and, if necessary, > indicate that failure to whoever cares enough that they will not > retry if not given an indicator. > > Further, it tells us to do this *as early as possible*, so that we > don't waste time partially processing a request that we will be > unable to complete processing on at the last minute. > > Dijkstra called this the "Banker's Algorithm"; it has to do with > precommitting *all* resources, and if that can't be done, don't > *waste time* starting a job that can't be completed. > > > > As for the flags renaming, although I would like to see it, Sam made > > a good point in the first response to this post. Now let's just let > > it die and move on to bigger and better things. > > I'm all for renaming, if we don't care about integration at any > point in the future. You are claiming that we don't. If you are > right, then separate the namespace overlap, and be done with it. > If not, bow to the pressure and admit that integration is a future > goal, rather than coming up with reasons why it's not possible to > do it, ever. > > Frankly, I think most of the contention is that people *want* > the integration, but you and Jeff are building your own little > kingdoms. What the rename amounts to, in that case, is that you > would like to build a fence, on the theory that "good fences make > good neighbors". > > I understand your arguments against integration, and I understand > *the* argument against integration: that you and Jeff would have to > hammer out a mutually acceptable philosophy, going forward, and > that that is volunteer work for which neither of you is willing to > volunteer, so it's not getting done. > > What I'm telling you is that the general population of the project > *wants* integration.
They *do not* want to have to learn multiple > allocation API's to use in their code, and they *do not* want to > have to live with additional complexity that makes it harder to > find someone to maintain the code in the future, if you, Jeff, or both > get hit by a bus. In fact, if one of you *were* to get hit by a > bus, we would *definitely* end up getting the integration we want, > because the other code would rot for lack of a maintainer who has > bought into a general allocation philosophy held by the project. > > If you are going to build something, at least try to build something > that will outlive you; that's not possible in a vacuum, without a > general buy-in. > > > -- Terry > -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 20:14:49 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1DC4037B401 for ; Fri, 21 Feb 2003 20:14:46 -0800 (PST) Received: from bluejay.mail.pas.earthlink.net (bluejay.mail.pas.earthlink.net [207.217.120.218]) by mx1.FreeBSD.org (Postfix) with ESMTP id E23E243FAF for ; Fri, 21 Feb 2003 20:14:44 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0402.cvx22-bradley.dialup.earthlink.net ([209.179.199.147] helo=mindspring.com) by bluejay.mail.pas.earthlink.net with asmtp (SSLv3:RC4-MD5:128) (Exim 3.33 #1) id 18mR3f-0000kU-00; Fri, 21 Feb 2003 20:14:40 -0800 Message-ID: <3E56F8DE.5453DB88@mindspring.com> Date: Fri, 21 Feb 2003 20:13:18 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Bosko Milekic Cc: Hiten Pandya , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal References: <20030221151007.GA60348@unixdaemons.com> <3E5673E7.F3F1FA4F@mindspring.com>
<20030221150743.A79345@unixdaemons.com> <3E56B3F5.9EF3F9FE@mindspring.com> <20030221201728.A80661@unixdaemons.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-ELNK-Trace: b1a02af9316fbb217a47c185c03b154d40683398e744b8a483b8a2de0555cd969da4bda0e6d22851350badd9bab72f9c350badd9bab72f9c350badd9bab72f9c Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Bosko Milekic wrote: > In FreeBSD (notice I said "FreeBSD," not TerryBSD, SunOS, Linux, or > whatever else), network buffer allocations have for the longest time > been done separately and for good enough reason; as a result, FreeBSD > has adopted some fairly serious optimizations for what concerns the > way they (and supporting structures) are allocated. The mbufs, historically, were allocated out of a zalloci zone, which was the only allocator type (again, historically) to support allocation at interrupt time. The argument that this allocation was separate has more to do with all interrupt allocation being separate, than it does with a special case for mbufs alone. Nevertheless, it's generally true that the first thing I do when converting a FreeBSD kernel for embedded processing in network equipment is "replace the mbuf allocator". Generally, I use a machdep.c KVA preallocated freelist (FWIW), which is a heck of a lot faster than anything that's ever been committed to the FreeBSD CVS repository. As such, there are some good arguments for separation, or at least layered abstraction, of the mbuf allocator. > "My" allocator > (and by the way, it's wrong to call it _mine_ because it really is the > result of a number of different people who have worked on it) > maintains those optimizations (for lack of finding equally simple and > well-performing replacements) while at the same time taking advantage > of parallel processing in the kernel.
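[The "machdep.c KVA preallocated freelist" Terry mentions is not shown anywhere in the thread. Purely for illustration, that style of allocator can be sketched in userland as follows; all names and sizes are invented, and a real interrupt-safe version would of course need locking or per-CPU lists:]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a preallocated-freelist allocator: a static
 * arena is threaded onto a singly-linked free list once at startup,
 * after which allocation and free are O(1) pointer pops and pushes
 * with no general-purpose allocator in the path.
 */
#define NBUFS 64	/* invented arena size */
#define BUFSZ 256	/* invented buffer size */

union fl_buf {
	union fl_buf *next;	/* valid only while on the free list */
	unsigned char data[BUFSZ];
};

static union fl_buf arena[NBUFS];
static union fl_buf *freelist;

static void
fl_init(void)
{
	int i;

	freelist = NULL;
	for (i = 0; i < NBUFS; i++) {	/* thread every buffer onto the list */
		arena[i].next = freelist;
		freelist = &arena[i];
	}
}

static void *
fl_alloc(void)
{
	union fl_buf *b = freelist;

	if (b == NULL)
		return (NULL);		/* exhausted: fail rather than block */
	freelist = b->next;
	return (b);
}

static void
fl_free(void *p)
{
	union fl_buf *b = p;

	b->next = freelist;
	freelist = b;
}
```

[The speed claim comes from there being no size lookup and no fallback path: exhaustion simply fails, which is also why such a scheme suits embedded network gear with a known worst-case load.]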
I am an extremely vocal advocate of parallelization of code paths, and have been since I "rescued" Jack Vogel's 1995 SMP code from oblivion, and Peter and Steve Passe took that code and made it the basis of the FreeBSD SMP project. I fully support your efforts in this regard, and, among other things, that support has taken the form of self-censorship of public criticism, for the most part. I understand that other people have worked on the code, but you are the one who has consistently championed the code. I'm sorry if that led me to call it "your allocator" unjustly. > Similarly, UMA is a great > allocator, and it does similar things for general-purpose allocations > in the system. If you ask me whether or not mbuf allocations can be > made to use UMA? The answer is yes. If you ask me whether > performance is going to be better? I don't know for sure, but I can > tell you that in order to solve the issues I bring up it's going to be > difficult, and I _do_ know that if you don't solve them, performance > is going to suck, comparatively speaking. I understand this, as well. To my mind, it is a matter of will, on the part of you and Jeff, where some of the internals of Jeff's code need to have hooks made available for the additional processing the mbuf code needs to do. Let me say that I believe that this *will* happen, sooner or later, and that any change that would make this more difficult later is going to have to be backed out. Better that it never went in, as arguing to change something recently committed is very difficult, for social rather than technical reasons. > Solving them would require what I think is relatively serious > modification to UMA which, in my opinion anyway, would uglify [sic] > it. I understand this as well; what I don't understand is the unwillingness to discuss it, or to do the "uglification" anyway.
> I *have* looked at it, and I think that the fact that UMA does > allocations for all objects using the same techniques is great and - I > can't speak for Jeff - but *I* wouldn't want to hack at it just so > that we can get the optimizations/solutions we currently have for mbuf > allocations. And, you know what? If *you* think it's worth it, why > don't *YOU* do it and waste hours on end to, finally, get something > that _maybe_ performs as well at the expense of an uglier > (not-so-general-and-simple-anymore) allocator. I think it's possible to do -- as long as there are not changes that preclude it, such as renaming manifest constants to be use specific, etc.. Changes such as you were proposing. I'm not willing to do it today... I don't think Jeff's code is finished enough, yet, and I have confidence that he will return to it, after his diversion into a new scheduler is done. Until he does, the excessive locking in the current code is evil. I could take care of that, but he has expressed a reluctance to allow his statistics to become snapshots, rather than exact values, and I have not wanted to step on his toes over that. I will say that, if someone is willing to commit the code, I'm willing to do the abstraction work. For me, it's trivially easy to do this type of work (in fact, I had planned on submitting patches soon which did the necessary indirection for boot-time selection of an arbitrary scheduler from a list of loaded modules, in the near future, which is a task with a similar level of complexity). Don't think that your work is unappreciated, but do realize that I'm not attacking the manifest constant rename proposal out of a sense of "trying to jump on a winning bandwagon", or out of a sense of "trying to interfere with progress", like some others seem to be (or they would have read the code, and realized some of the hysteresis and thrashing problems they thought might happen are not, in fact, possible, even with a GC thread).
I genuinely believe that unification of the interfaces in the future is the right direction. Anything that abstracts complexity for other kernel programmers is good. Regards, -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 20:34:26 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C889237B401 for ; Fri, 21 Feb 2003 20:34:22 -0800 (PST) Received: from tesla.distributel.net (nat.MTL.distributel.NET [66.38.181.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0CBE943FCB for ; Fri, 21 Feb 2003 20:34:22 -0800 (PST) (envelope-from bmilekic@unixdaemons.com) Received: (from bmilekic@localhost) by tesla.distributel.net (8.11.6/8.11.6) id h1M4XaW81594; Fri, 21 Feb 2003 23:33:36 -0500 (EST) (envelope-from bmilekic@unixdaemons.com) Date: Fri, 21 Feb 2003 23:33:31 -0500 From: Bosko Milekic To: Terry Lambert Cc: Hiten Pandya , FreeBSD-arch@FreeBSD.ORG Subject: Re: Mbuf flags cleanup proposal Message-ID: <20030221233331.A81541@unixdaemons.com> References: <20030221151007.GA60348@unixdaemons.com> <3E5673E7.F3F1FA4F@mindspring.com> <20030221150743.A79345@unixdaemons.com> <3E56B3F5.9EF3F9FE@mindspring.com> <20030221201728.A80661@unixdaemons.com> <3E56F8DE.5453DB88@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <3E56F8DE.5453DB88@mindspring.com>; from tlambert2@mindspring.com on Fri, Feb 21, 2003 at 08:13:18PM -0800 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Now that we're getting back into technical discussion mode... 
On Fri, Feb 21, 2003 at 08:13:18PM -0800, Terry Lambert wrote: > Bosko Milekic wrote: > > In FreeBSD (notice I said "FreeBSD," not TerryBSD, SunOS, Linux, or > > whatever else), network buffer allocations have for the longest time > > been done separately and for good enough reason; as a result, FreeBSD > > has adopted some fairly serious optimizations for what concerns the > > way they (and supporting structures) are allocated. > > The mbufs, historically, were allocated out of a zalloci zone, > which was the only allocator type (again, historically) to support > allocation at interrupt time. The argument that this allocation > was separate has more to do with all interrupt allocation being > separate, than it does with a special case for mbufs alone. Actually, they weren't really. I'm fairly confident that they never were (at least not from 2.2.2 and on). I believe that this was a local change you had, because the mbufs and clusters were always really allocated their own way, from their own map, as long as I've been working on the code (and probably even before that). > Nevertheless, it's generally true that the first thing I do when > converting a FreeBSD kernel for embedded processing in network > equipment is "replace the mbuf allocator". Generally, I use a > machdep.c KVA preallocated freelist (FWIW), which is a heck of a > lot faster than anything that's ever been committed to the FreeBSD > CVS repository. As such, there are some good arguments for separation, > or at least layered abstraction, of the mbuf allocator. [...] > > Similarly, UMA is a great > > allocator, and it does similar things for general-purpose allocations > > in the system. If you ask me whether or not mbuf allocations can be > > made to use UMA? The answer is yes. If you ask me whether > > performance is going to be better?
I don't know for sure, but I can > > tell you that in order to solve the issues I bring up it's going to be > > difficult, and I _do_ know that if you don't solve them, performance > > is going to suck, comparatively speaking. > > I understand this, as well. To my mind, it is a matter of will, on > the part of you and Jeff, where some of the internals of Jeff's code > need to have hooks made available for the additional processing the > mbuf code needs to do. > > Let me say that I believe that this *will* happen, sooner or later, > and that any change that would make this more difficult later is > going to have to be backed out. Better that it never went in, as > arguing to change something recently committed is very difficult, > for social, rather than technical reasons. The mbuf allocator currently is not that complicated, really. My argument is that I really don't think that the effort to hack those hooks into UMA only to tinker with them so that performance is as good for things like generating refcounts for clusters on-the-fly and struggling to keep the common allocation case (including initialization of fields) down to a single function is worth it. I think that in the end it's all going to boil down to ~same performance at the expense of UMA being suddenly more complicated to maintain and understand, which was what you initially wanted to avoid for current and future maintainers of the code. And, granted, you would have a valid basis to argue the opposite if the APIs really were that different and if each allocator was really that entrenched in its own "philosophy," but the fact is that the APIs have always been similar with the exception of ONE behavior - that is that the mbuf code doesn't indefinitely block with M_{,TRY}WAIT... but you've already agreed (I think) that indefinite blocking is not good anyway, right? :-) In any case, that behavior is easily modifiable without trashing the allocator. 
Seriously, just s/M_TRYWAIT/M_WAIT/ and change the cv_timedwait() call in subr_mbuf.c to an indefinite block on the cv and, poof, you're done (again, not that I think that would be the correct thing to do). > > Solving them would require what I think is relatively serious > > modification to UMA which, in my opinion anyway, would uglify [sic] > > it. > > I understand this as well; what I don't understand is the unwillingness > to discuss it, or to do the "uglification" anyway. Dude, seriously, try it! > > I *have* looked at it, and I think that the fact that UMA does > > allocations for all objects using the same techniques is great and - I > > can't speak for Jeff - but *I* wouldn't want to hack at it just so > > that we can get the optimizations/solutions we currently have for mbuf > > allocations. And, you know what? If *you* think it's worth it, why > > don't *YOU* do it and waste hours on end to, finally, get something > > that _maybe_ performs as well at the expense of an uglier > > (not-so-general-and-simple-anymore) allocator. > > I think it's possible to do -- as long as there are not changes > that preclude it, such as renaming manifest constants to be use > specific, etc.. Changes such as you were proposing. Actually, I stick to what I said initially which is that one has nothing to do with the other... whether or not the allocations suddenly start coming from A or B in the background isn't going to change the API. That is, to allocate an mbuf you'll keep doing m_get(), etc. All it may change is the behavior, and this only slightly. [...] 
> Regards, > -- Terry Regards, -- Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri Feb 21 22:50:48 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B99B137B401 for ; Fri, 21 Feb 2003 22:50:47 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id DCEF743FD7 for ; Fri, 21 Feb 2003 22:50:46 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h1M6okf98405 for ; Sat, 22 Feb 2003 01:50:46 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sat, 22 Feb 2003 01:50:45 -0500 (EST) From: Jeff Roberson To: arch@freebsd.org Subject: More buf cache locking. Message-ID: <20030222013233.T88110-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I need to be able to pass the vnode interlock in as the lockmgr interlock in cases where we are pulling buffers off of dirty/clean lists. To support this BUF_LOCK* has to grow another argument. I am going to make a pass through the tree to add interlock where appropriate. Also, there is currently a lock for bufs on the wchan and prio fields. This is the buftimelock. This has to go away so that the vnode interlock can be used. To facilitate this I'm going to extend lockmgr to accept another pair of arguments that specify the wchan and prio. I intend to make a new function that has two extra arguments and then define an inline that supports the old semantics. If anyone objects, please let me know. 
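[For concreteness, the shape Jeff is proposing — a new lockmgr entry point that also carries the sleep channel and priority (so the wchan/prio fields no longer need buftimelock), plus an inline preserving the old argument list — might look roughly like the following. The types are placeholders and the body is a stub that merely records its arguments; this is not the actual sys/lockmgr.h code:]

```c
#include <assert.h>
#include <stddef.h>

struct lock { int lk_dummy; };	/* placeholder types for the sketch */
struct mtx;
struct thread;

static const void *last_wchan;	/* what the stub would have slept on */
static int last_prio;

/* New entry point: callers supply the wchan and priority explicitly. */
static int
lockmgr_chan(struct lock *lkp, unsigned int flags, struct mtx *interlkp,
    struct thread *td, const void *wchan, int prio)
{
	(void)flags;
	(void)interlkp;
	(void)td;
	last_wchan = wchan;	/* stand-in for the real sleep */
	last_prio = prio;
	(void)lkp;
	return (0);
}

/*
 * Inline with the old semantics: existing callers compile unchanged
 * and sleep on the lock itself at a default priority.
 */
static inline int
lockmgr(struct lock *lkp, unsigned int flags, struct mtx *interlkp,
    struct thread *td)
{
	return (lockmgr_chan(lkp, flags, interlkp, td, lkp, 0));
}
```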
Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
From owner-freebsd-arch Sat Feb 22 19:10:19 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 0A64037B401 for ; Sat, 22 Feb 2003 19:10:18 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 531BA43F85 for ; Sat, 22 Feb 2003 19:10:16 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h1N3AFE02208 for ; Sat, 22 Feb 2003 22:10:15 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sat, 22 Feb 2003 22:10:15 -0500 (EST) From: Jeff Roberson To: arch@FreeBSD.ORG Subject: Re: More buf cache locking. (patch) In-Reply-To: <20030222013233.T88110-100000@mail.chesapeake.net> Message-ID: <20030222220659.D1116-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Ok, I did this a little cleaner than I thought I would. I got rid of the buftimelock and acquire the lock's interlock outside of lockmgr. Then I pass a flag into lockmgr to tell it I've already got its interlock. I then added an interlock parameter to BUF_LOCK and BUF_TIMELOCK. The user still has to specify LK_INTERLOCK. This smp safes all of the queueing of bufs onto and off of vnode buf lists. The free list queueing is already safe. I'm testing this code right now on my laptop.
I'll throw it on a few machines around here for a while before I commit as well. Since this was much lower impact than I thought it would be I'm not going to wait more than three or four days to commit assuming there are no objections. Cheers, Jeff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat Feb 22 19:10:43 2003 Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CE6D537B401 for ; Sat, 22 Feb 2003 19:10:42 -0800 (PST) Received: from mail.chesapeake.net (chesapeake.net [205.130.220.14]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1F78143FBD for ; Sat, 22 Feb 2003 19:10:42 -0800 (PST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost) by mail.chesapeake.net (8.11.6/8.11.6) with ESMTP id h1N3Afl02335 for ; Sat, 22 Feb 2003 22:10:41 -0500 (EST) (envelope-from jroberson@chesapeake.net) Date: Sat, 22 Feb 2003 22:10:41 -0500 (EST) From: Jeff Roberson To: arch@FreeBSD.ORG Subject: Re: More buf cache locking. (patch) In-Reply-To: <20030222220659.D1116-100000@mail.chesapeake.net> Message-ID: <20030222221028.J1116-100000@mail.chesapeake.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG I forgot, the patch is at: http://www.chesapeake.net/~jroberson/bcache.diff To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message
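[As a closing illustration of the LK_INTERLOCK hand-off Jeff describes above (the caller enters holding the vnode interlock, which lockmgr releases for it), here is a much-simplified userland analogue using POSIX mutexes. This is not the kernel code: the real lockmgr also drops the interlock before sleeping rather than after acquiring, and all names here are invented:]

```c
#include <assert.h>
#include <pthread.h>

/* Stand-ins for the vnode interlock and a per-buf lock. */
static pthread_mutex_t vnode_interlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Analogue of BUF_LOCK(bp, LK_EXCLUSIVE | LK_INTERLOCK, interlock):
 * enter holding 'interlock'; return holding the buf lock, with
 * 'interlock' released on the caller's behalf.
 */
static void
buf_lock_interlocked(pthread_mutex_t *interlock)
{
	pthread_mutex_lock(&buf_lock);
	pthread_mutex_unlock(interlock);
}
```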