From owner-freebsd-current Thu Jul 4 13:59:47 2002
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E094037B400;
	Thu, 4 Jul 2002 13:59:39 -0700 (PDT)
Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C2B1243E09;
	Thu, 4 Jul 2002 13:59:35 -0700 (PDT)
	(envelope-from gallatin@cs.duke.edu)
Received: from grasshopper.cs.duke.edu (grasshopper.cs.duke.edu [152.3.145.30])
	by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id QAA07652;
	Thu, 4 Jul 2002 16:59:32 -0400 (EDT)
Received: (from gallatin@localhost)
	by grasshopper.cs.duke.edu (8.11.6/8.9.1) id g64Kx2g27620;
	Thu, 4 Jul 2002 16:59:02 -0400 (EDT)
	(envelope-from gallatin@cs.duke.edu)
From: Andrew Gallatin
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15652.46870.463359.853754@grasshopper.cs.duke.edu>
Date: Thu, 4 Jul 2002 16:59:02 -0400 (EDT)
To: Bosko Milekic
Cc: "Kenneth D. Merry", current@FreeBSD.ORG, net@FreeBSD.ORG
Subject: virtually contig jumbo mbufs (was Re: new zero copy sockets snapshot)
In-Reply-To: <20020620134723.A22954@unixdaemons.com>
References: <20020618223635.A98350@panzer.kdm.org>
	<20020619090046.A2063@panzer.kdm.org>
	<20020619120641.A18434@unixdaemons.com>
	<15633.17238.109126.952673@grasshopper.cs.duke.edu>
	<20020619233721.A30669@unixdaemons.com>
	<15633.62357.79381.405511@grasshopper.cs.duke.edu>
	<20020620114511.A22413@unixdaemons.com>
	<15634.534.696063.241224@grasshopper.cs.duke.edu>
	<20020620134723.A22954@unixdaemons.com>
X-Mailer: VM 6.75 under 21.1 (patch 12) "Channel Islands" XEmacs Lucid
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Bosko Milekic writes:
 > > One question.  I've observed some really anomalous behaviour under
 > > -stable with my Myricom GM driver (2Gb/s + 2Gb/s link speed, Dual 1GHz
 > > PIII).  When I use 4K mbufs for receives, the best speed I see is
 > > about 1300Mb/sec.  However, if I use private 9K physically contiguous
 > > buffers I see 1850Mb/sec (iperf TCP).
 > >
 > > The obvious conclusion is that there's a lot of overhead in setting up
 > > the DMA engines, but that's not the case; we have a fairly quick chain
 > > DMA engine.  I've provided a "control" by breaking my contiguous
 > > buffers down into 4K chunks so that I do the same number of DMAs in
 > > both cases, and I still see ~1850 Mb/sec for the 9K buffers.
 > >
 > > A coworker suggested that the problem was that when doing copyouts to
 > > userspace, the PIII was doing speculative reads and loading the cache
 > > with the next page.  However, we then start copying from a totally
 > > different address using discontiguous buffers, so we effectively take
 > > 2x the number of cache misses we'd need to.  Does that sound
 > > reasonable to you?  I need to try malloc'ing virtually contiguous and
 > > physically discontiguous buffers & see if I get the same (good)
 > > performance...
 >
 >   I believe that the Intel chips do "virtual page caching" and that the
 > logic that does the virtual -> physical address translation sits between
 > the L2 cache and RAM.  If that is indeed the case, then your idea of
 > testing with virtually contiguous pages is a good one.
 >   Unfortunately, I don't know if the PIII is doing speculative
 > cache-loads, but it could very well be the case.  If it is, and if in
 > fact the chip does caching based on virtual addresses, then providing it
 > with virtually contiguous address space may yield better results.  If
 > you try this, please let me know.  I'm extremely interested in seeing
 > the results!
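For reference, a minimal sketch of the two allocation strategies compared
in the numbers below, in rough FreeBSD 4.x kernel style.  The
gm_alloc_jumbo_* names and GM_JUMBO_LEN are hypothetical (not from the
actual GM driver); the contigmalloc() arguments follow the if_ti-style
"page-aligned, anywhere below 4GB" pattern.

/*
 * Sketch only: two ways to back a 9K jumbo receive buffer.
 * Function names and GM_JUMBO_LEN are hypothetical.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>

#define GM_JUMBO_LEN	(9 * 1024)

/* Physically contiguous, the way bge/if_ti back their jumbo buffers. */
static void *
gm_alloc_jumbo_contig(void)
{
	return (contigmalloc(GM_JUMBO_LEN, M_DEVBUF, M_NOWAIT,
	    0, 0xffffffff, PAGE_SIZE, 0));
}

/*
 * Virtually contiguous only: plain kernel malloc, so the underlying
 * pages need not be physically adjacent.
 */
static void *
gm_alloc_jumbo_virt(void)
{
	return (malloc(GM_JUMBO_LEN, M_DEVBUF, M_NOWAIT));
}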
contigmalloc'ed private jumbo mbufs (same as bge, if_ti, etc):

% iperf -c ugly-my -l 32k -fm
------------------------------------------------------------
Client connecting to ugly-my, TCP port 5001
TCP window size: 0.2 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.3 port 1031 connected with 192.168.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2137 MBytes  1792 Mbits/sec

malloc'ed, physically discontiguous private jumbo mbufs:

% iperf -c ugly-my -l 32k -fm
------------------------------------------------------------
Client connecting to ugly-my, TCP port 5001
TCP window size: 0.2 MByte (default)
------------------------------------------------------------
[  3] local 192.168.1.3 port 1029 connected with 192.168.1.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2131 MBytes  1788 Mbits/sec

So I'd be willing to believe that the 4 Mbits/sec loss was due to the
extra overhead of setting up two additional DMAs.  It looks like this
idea would work.

Drew
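A footnote on the "two additional DMAs": either flavour of buffer can be
handed to a chain-DMA engine one physical page (or less) at a time, so a
page-aligned 9K buffer costs three descriptors when its pages are
scattered versus one when it is physically contiguous.  A rough sketch,
assuming a hypothetical gm_dma_seg descriptor format (vtophys() is the
standard FreeBSD VA-to-PA helper; everything else here is made up for
illustration):

/*
 * Sketch only: walk a virtually contiguous buffer and emit one DMA
 * segment per physical page.  gm_dma_seg and gm_fill_sg() are
 * hypothetical; vtophys() is real.
 */
#include <sys/param.h>
#include <vm/vm.h>		/* for vtophys */
#include <vm/pmap.h>		/* for vtophys */

struct gm_dma_seg {
	u_int32_t	ds_addr;	/* physical address of segment */
	u_int32_t	ds_len;		/* segment length in bytes */
};

static int
gm_fill_sg(caddr_t buf, int len, struct gm_dma_seg *sg)
{
	int nsegs = 0;

	while (len > 0) {
		/* bytes remaining in the current physical page */
		int chunk = PAGE_SIZE - ((vm_offset_t)buf & PAGE_MASK);

		if (chunk > len)
			chunk = len;
		sg[nsegs].ds_addr = vtophys((vm_offset_t)buf);
		sg[nsegs].ds_len = chunk;
		nsegs++;
		buf += chunk;
		len -= chunk;
	}
	return (nsegs);	/* page-aligned 9K buffer => 3 segments */
}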