From: "Justin C. Walker"
To: freebsd-hackers@freebsd.org
Date: Tue, 23 Sep 2003 20:54:01 -0700
Subject: Re: mbuf doubts

I'm not an expert on all BSD-derived stacks and the way mbufs are defined and used in each, but:

On Tuesday, September 23, 2003, at 07:12 PM, Giovanni P. Tirloni wrote:

> struct mbuf *m;
>
> 1. Normal mbuf using m->M_databuf

M_databuf is the beginning of the data area in an ordinary (non-header) mbuf.

> 2. Normal mbuf with external storage (cluster?) in m->m_hdr->mh_data

mh_data *always* points to the beginning of valid data or available space.
The bit M_EXT indicates whether mh_data points into the external storage or into the area beginning at M_databuf.

> 3. Header mbuf using m->m_pktdat;

This is used to access the data in an mbuf when the M_PKTHDR bit is set in the m_flags word. This is because part of the space in this lead mbuf is taken up with local information pertaining to the packet and its handling. I'm not entirely clear on how it's used.

> 4. Header mbuf with ext. storage (cluster?) in m->m_ext->ext_buf

This points to the external storage buffer. It can be a cluster, or it can be some other data area. I believe the distinction is made based on the field ext_free in the m_ext structure (if non-null, it points to a routine to free the data, and thus the external storage is *not* a cluster).

> Other questions:
> 1. When using ext. storage is the space allocated by M_databuf
> wasted?

Yes.

> 2. How does the system decide that 256 bytes for each mbuf isn't enough
> and that it needs an mbuf cluster? Isn't chaining useful there?

There is a constant (MINCLSIZE) that the system uses to decide when to allocate a cluster and when to use a chain of normal mbufs. If the amount of data is at least MINCLSIZE, it opts for a cluster. Note that the value of MINCLSIZE can have a noticeable effect on the performance of both the system and the network, so the choice of this value can be important. It is normally set to a value that goes to clusters when two mbufs won't suffice.

> 3. How does changing MSIZE affect the whole thing?

Significantly :-}. This is a gnarly subject. You have to balance wasted space, time, and other subtle details (typical packet sizes vs. mbuf size; time spent dealing with chains vs. time spent dealing with clusters; ...). At one point, for example, packet sizes on the Internet were strongly "bi-modal" (small packets for telnet; max-sized packets for ftp). More recently, I suspect that this has changed, but I don't know what the distribution looks like now.

> 4. What about MCLBYTES?

Same set of issues.
AIX, for example, has a "power-of-2" collection of mbuf pools, and tries to allocate from the best-fitting pool for the requested size, bumping up at most two levels when that pool is empty. Other BSD-derived stacks stick with a single cluster size, generally 2048 bytes; this makes jumbo Ethernet packets kind of expensive.

Check out Wright and Stevens, "TCP/IP Illustrated, Volume 2: The Implementation", Addison-Wesley, 1995. Chapter 2 is a fairly in-depth discussion of the above. It deals with a long-dead version of BSD, but the fundamentals have not changed that much. In addition, the book is a very well-done code walkthrough of the networking code in BSD (again, from long ago, but the "bones" are good).

Regards,

Justin

-- 
Justin C. Walker, Curmudgeon-At-Large  *
Institute for General Semantics        |   It's not whether you win or lose...
                                       |     It's whether *I* win or lose.
*--------------------------------------*-------------------------------*