From owner-freebsd-performance@FreeBSD.ORG Tue Apr 1 11:10:10 2008
Date: Tue, 1 Apr 2008 11:41:29 +0100
From: Rui Paulo
To: Anthony Pankov
Message-ID: <20080401104128.GA1194@fnop.net>
References: <1333421734.20080328201458@mail.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1333421734.20080328201458@mail.ru>
User-Agent: Mutt/1.5.17 (2007-11-01)
Sender: Rui Paulo
Cc: freebsd-net@freebsd.org, performance@freebsd.org
Subject: Re: packet delay because of blackhole
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 01 Apr 2008 11:10:10 -0000

On Fri, Mar 28, 2008 at 08:14:58PM +0300, Anthony Pankov wrote:
> Just for somebody's convenience.
>
> While analyzing a client<->server HTTPS conversation, a one second delay in
> packet exchange was discovered (strongly reproducible):
>
> Sample:
> N   time
> 6   0.002303  10.28.4.14  10.28.4.50  SSL    Client Hello
> 7   0.106710  10.28.4.50  10.28.4.14  TCP    443 > 1447 [ACK] Seq=1 Ack=103 Win=65535 Len=0
> 8   1.045712  10.28.4.50  10.28.4.14  TLSv1  Server Hello, Certificate, Server Hello Done
>
> Another sample:
> 10  0.011722  10.28.4.14  10.28.4.50  TLSv1  Application Data
> 11  0.115933  10.28.4.50  10.28.4.14  TCP    443 > 1442 [ACK] Seq=839 Ack=519 Win=65466 Len=0
> 12  1.054037  10.28.4.50  10.28.4.14  TLSv1  Application Data
>
> The reason for the delay is a sysctl tcp.blackhole value greater than 0, much to my surprise.
>
> So, turning tcp.blackhole to 0 eliminates any delay (strongly reproducible).
>
> System: FreeBSD 6_2_stable

I'm not sure how a performance penalty can induce a cache miss, and it's very processor specific. So your best bet is to profile the kernel.

Regards,
--
Rui Paulo

From owner-freebsd-performance@FreeBSD.ORG Tue Apr 1 14:03:49 2008
Date: Tue, 1 Apr 2008 18:05:29 +0400
From: Anthony Pankov
Message-ID: <1493139437.20080401180529@mail.ru>
To: Rui Paulo
In-Reply-To: <20080401104128.GA1194@fnop.net>
References: <1333421734.20080328201458@mail.ru> <20080401104128.GA1194@fnop.net>
Cc: freebsd-net@freebsd.org, performance@freebsd.org
Subject: Re[2]: packet delay because of blackhole
Reply-To: Anthony Pankov

Hello Rui,

Tuesday, April 01, 2008, 2:41:29 PM, you wrote:

RP> On Fri, Mar 28, 2008 at 08:14:58PM +0300, Anthony Pankov wrote:
>> Just for somebody's convenience.
>>
>> While analyzing a client<->server HTTPS conversation, a one second delay in
>> packet exchange was discovered (strongly reproducible):
>>
>> Sample:
>> N   time
>> 6   0.002303  10.28.4.14  10.28.4.50  SSL    Client Hello
>> 7   0.106710  10.28.4.50  10.28.4.14  TCP    443 > 1447 [ACK] Seq=1 Ack=103 Win=65535 Len=0
>> 8   1.045712  10.28.4.50  10.28.4.14  TLSv1  Server Hello, Certificate, Server Hello Done
>>
>> Another sample:
>> 10  0.011722  10.28.4.14  10.28.4.50  TLSv1  Application Data
>> 11  0.115933  10.28.4.50  10.28.4.14  TCP    443 > 1442 [ACK] Seq=839 Ack=519 Win=65466 Len=0
>> 12  1.054037  10.28.4.50  10.28.4.14  TLSv1  Application Data
>>
>> The reason for the delay is a sysctl tcp.blackhole value greater than 0, much to my surprise.
>>
>> So, turning tcp.blackhole to 0 eliminates any delay (strongly reproducible).
>>
>> System: FreeBSD 6_2_stable

RP> I'm not sure how a performance penalty can induce a cache miss, and
RP> it's very processor specific. So your best bet is to profile the
RP> kernel.

RP> Regards,

I don't fully understand which cache miss you mean.

I'll try to be clearer.
During a client<->server HTTPS conversation there is a packet delay of about 900 ms (see "Sample" and "Another sample"). The delay appears once per conversation at a random place (between packets 7 and 8 in "Sample", between 11 and 12 in "Another sample").

Of course, it does not depend on the SSL session cache, SSL negotiation or any other apache/mod_ssl/OpenSSL setting or performance; otherwise I would have written to another mailing list.

I disabled all my sysctl tuning, one setting at a time, with no effect. But when I set tcp.blackhole to zero, everything became fine: the maximum delay between packets is 6 ms.

It is strange, so I've reported it to everyone.
I suggest keeping tcp.blackhole=0 and using a firewall for protection. Anyone who raises the tcp.blackhole value should dump packets and make sure that there is no strange delay between them.

It is most likely a FreeBSD network issue.

P.S. "Another sample" is not a continuation of "Sample"; it is a dump of a different transaction.

--
Best regards,
Anthony mailto:ap00@mail.ru

From owner-freebsd-performance@FreeBSD.ORG Wed Apr 2 04:51:46 2008
Message-ID: <47F30B96.6000605@tomjudge.com>
Date: Tue, 01 Apr 2008 23:29:10 -0500
From: Tom Judge
To: Josh Paetzel
References: <24adbbc00803231521h78844f26q77c48573f82408b9@mail.gmail.com> <200803240327.01211.josh@tcbug.org> <24adbbc00803240422m5b04b485s5df2f406aa89dc2b@mail.gmail.com> <200803241040.53346.josh@tcbug.org>
In-Reply-To: <200803241040.53346.josh@tcbug.org>
Cc: freebsd-performance@freebsd.org, Daniel Andersson
Subject: Re: Tuning: 100mbit faster, gbit slower.
X-List-Received-Date: Wed, 02 Apr 2008 04:51:46 -0000

Josh Paetzel wrote:
>
> This is on RELENG_6_3
>
> net.inet.tcp.sendspace=262144
> net.inet.tcp.recvspace=262144
> kern.ipc.maxsockbuf=1048576
> ifconfig em0 mtu 9014 (You'll need a switch that supports jumbo frames to do
> this)
>

Daniel, you should note that all devices on your network also need to have the same MTU configured, and that jumbo frames are only supported on gigE connections, so if you have any 100/10 devices you cannot use jumbo frames.

> iperf shows wire traffic around 969 mbps and FTP runs at 110 Megs/sec
> scp/sftp appears to be cpu bound at 45 Megs/sec, and NFS with TCP mounts and
> send/receive packets set to 16384 manages about 90 Megs/sec.
>

Hi Josh,

If you are running SCP/SFTP over your internal LAN and are not worried about the security of the data in the session, only the authentication, then you may want to check out the HPN patches for OpenSSH (I believe they are available as an option in the openssh port), which re-enable the cipher 'none'. The latest patch set also introduces multi-threaded crypto, so that ssh is not bound by the performance of a single CPU on multi-CPU/core systems.

We run the HPN patches here (the older set with no multi-threading support) and we can saturate gigE with a single scp/sftp/ssh connection between two reasonably spec'd boxes. (Our MTU is 8192, as this is more compatible with most hardware [NICs and switches]; most of the hardware we have will only do 9000-byte jumbo frames.)

Tom

From owner-freebsd-performance@FreeBSD.ORG Thu Apr 3 13:01:59 2008
Message-ID: <47F4D0DD.2040809@fsn.hu>
Date: Thu, 03 Apr 2008 14:43:09 +0200
From: Attila Nagy
To: freebsd-performance@freebsd.org
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu>
In-Reply-To: <479F02A7.9020607@fsn.hu>
Cc: JINMEI Tatuya / 神明達哉, bind-users@isc.org
Subject: Bad bind performance with FreeBSD 7

On 01/29/08 11:40, Attila Nagy wrote:
> ps: I have another problem. I've recently switched from a last year
> 6-STABLE to 7-STABLE and got pretty bad results on the same machine
> with the same bind (9.4).
> The graphs are here:
> http://picasaweb.google.com/nagy.attila/20080129Fbsd6vs7Bind

The problem still persists and now I can provide some profiling info, made with HWPMC.
The samples were collected from identical machines; the only difference is the OS (FreeBSD 6-STABLE and FreeBSD 7-STABLE, amd64).

Here is FreeBSD/amd64 6-STABLE (compiled yesterday):

granularity: each sample hit covers 4 byte(s) for 0.00% of 158108.00 seconds

  %   cumulative      self               self    total
 time    seconds   seconds    calls   ms/call  ms/call  name
  8.2   12909.00  12909.00        0   100.00%           SHA256_Transform [1]
  4.7   20408.00   7499.00        0   100.00%           kern_select [2]
  3.9   26594.00   6186.00        0   100.00%           swi_net [3]
  3.3   31829.00   5235.00        0   100.00%           sopoll [4]
  3.3   37059.00   5230.00        0   100.00%           syscall [5]
  3.3   42280.00   5221.00        0   100.00%           bcopy [6]
  3.2   47297.00   5017.00        0   100.00%           critical_exit [7]
  3.1   52274.00   4977.00        0   100.00%           Xfast_syscall [8]
  2.8   56748.00   4474.00        0   100.00%           spinlock_exit [9]
  2.8   61176.00   4428.00        0   100.00%           DELAY [10]
  2.5   65101.00   3925.00        0   100.00%           netisr_poll [11]
  2.1   68346.00   3245.00        0   100.00%           bge_rxeof [12]
  1.8   71126.00   2780.00        0   100.00%           copyout [13]
  1.7   73888.00   2762.00        0   100.00%           _mtx_lock_sleep [14]
  1.7   76641.00   2753.00        0   100.00%           soreceive [15]
  1.6   79245.00   2604.00        0   100.00%           rn_match [16]
  1.6   81769.00   2524.00        0   100.00%           selrecord [17]
  1.5   84187.00   2418.00        0   100.00%           netisr_pollmore [18]
  1.5   86526.00   2339.00        0   100.00%           copyin [19]
  1.5   88843.00   2317.00        0   100.00%           uma_zfree_arg [20]
  1.4   91011.00   2168.00        0   100.00%           uma_zalloc_arg [21]
  1.3   93118.00   2107.00        0   100.00%           soo_poll [22]
  1.1   94793.00   1675.00        0   100.00%           bge_poll [23]
  1.0   96428.00   1635.00        0   100.00%           spinlock_enter [24]

And here is FreeBSD/amd64 7-STABLE (also compiled yesterday):

granularity: each sample hit covers 4 byte(s) for 0.00% of 204813.00 seconds

  %   cumulative      self               self    total
 time    seconds   seconds    calls   ms/call  ms/call  name
  9.5   19395.00  19395.00        0   100.00%           _mtx_lock_sleep [1]
  7.1   33844.00  14449.00        0   100.00%           SHA256_Transform [2]
  4.2   42408.00   8564.00        0   100.00%           DELAY [3]
  3.5   49583.00   7175.00        0   100.00%           kern_select [4]
  3.4   56565.00   6982.00        0   100.00%           syscall [5]
  3.2   63104.00   6539.00        0   100.00%           sopoll_generic [6]
  2.9   69125.00   6021.00        0   100.00%           Xfast_syscall [7]
  2.7   74615.00   5490.00        0   100.00%           bcopy [8]
  2.6   79947.00   5332.00        0   100.00%           swi_net [9]
  2.2   84515.00   4568.00        0   100.00%           critical_exit [10]
  1.9   88438.00   3923.00        0   100.00%           netisr_poll [11]
  1.8   92179.00   3741.00        0   100.00%           spinlock_exit [12]
  1.8   95882.00   3703.00        0   100.00%           copyout [13]
  1.7   99343.00   3461.00        0   100.00%           _thread_lock_flags [14]
  1.6  102581.00   3238.00        0   100.00%           uma_zfree_arg [15]
  1.4  105358.00   2777.00        0   100.00%           spinlock_enter [16]
  1.2  107897.00   2539.00        0   100.00%           rn_match [17]
  1.2  110369.00   2472.00        0   100.00%           soreceive_generic [18]
  1.2  112814.00   2445.00        0   100.00%           intr_event_schedule_thread [19]
  1.2  115194.00   2380.00        0   100.00%           uma_zalloc_arg [20]
  1.1  117417.00   2223.00        0   100.00%           netisr_pollmore [21]
  1.0  119444.00   2027.00        0   100.00%           selrecord [22]
  1.0  121450.00   2006.00        0   100.00%           copyin [23]
  1.0  123397.00   1947.00        0   100.00%           cpu_switch [24]

Here is the full output:
http://people.fsn.hu/~bra/freebsd/bind94-performance-fbsd6vs7-20080403/

At first glance it seems that a lot more time is spent in _mtx_lock_sleep in FreeBSD 7 than in FreeBSD 6...

--
Attila Nagy                                   e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)                 phone: +3630 306 6758
http://www.fsn.hu/

From owner-freebsd-performance@FreeBSD.ORG Thu Apr 3 13:22:04 2008
Received: from hater.haters.org (hater.cmotd.com [192.168.3.125]) by
blah.sun-fish.com (Postfix) with ESMTP id 146781B10EBB; Thu, 3 Apr 2008 15:21:55 +0200 (CEST)
Message-ID: <47F4D9F2.9070200@moneybookers.com>
Date: Thu, 03 Apr 2008 16:21:54 +0300
From: Stefan Lambrev
To: Attila Nagy
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47F4D0DD.2040809@fsn.hu>
In-Reply-To: <47F4D0DD.2040809@fsn.hu>
Cc: 達哉, freebsd-performance@freebsd.org, JINMEI Tatuya / 神明, bind-users@isc.org
Subject: Re: Bad bind performance with FreeBSD 7

Greetings,

Attila Nagy wrote:
> On 01/29/08 11:40, Attila Nagy wrote:
>> ps: I have another problem. I've recently switched from a last year
>> 6-STABLE to 7-STABLE and got pretty bad results on the same machine
>> with the same bind (9.4).
>> The graphs are here:
>> http://picasaweb.google.com/nagy.attila/20080129Fbsd6vs7Bind
> The problem still persists and now I can provide some profiling info,
> made with HWPMC.
>

Sorry if you have already answered this question, but at least I can't find it in the thread.
What scheduler are you using on RELENG_7?
Did you check with both schedulers (ULE/4BSD) to see which one works better for you?
Also, are you sure that you are serving the same number of requests? I see that the 6.x image shows CPU usage from Aug 2007 and the 7.x image is from Jan 2008. Is it possible that you have more requests and that's why your CPU usage increased?

--
Best Wishes,
Stefan Lambrev
ICQ# 24134177

From owner-freebsd-performance@FreeBSD.ORG Thu Apr 3 13:54:54 2008
Message-ID: <47F4E1A1.2020500@fsn.hu>
Date: Thu, 03 Apr 2008 15:54:41 +0200
From: Attila Nagy
To: Stefan Lambrev
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47F4D0DD.2040809@fsn.hu> <47F4D9F2.9070200@moneybookers.com>
In-Reply-To: <47F4D9F2.9070200@moneybookers.com>
Cc: 達哉, freebsd-performance@freebsd.org, JINMEI Tatuya / 神明, bind-users@isc.org
Subject: Re: Bad bind performance with FreeBSD 7

On 2008.04.03. 15:21, Stefan Lambrev wrote:
> Greetings,
>
> Attila Nagy wrote:
>> On 01/29/08 11:40, Attila Nagy wrote:
>>> ps: I have another problem. I've recently switched from a last year
>>> 6-STABLE to 7-STABLE and got pretty bad results on the same machine
>>> with the same bind (9.4).
>>> The graphs are here:
>>> http://picasaweb.google.com/nagy.attila/20080129Fbsd6vs7Bind
>> The problem still persists and now I can provide some profiling info,
>> made with HWPMC.
>>
> Sorry if you have already answered this question, but at least I can't find it
> in the thread.
> What scheduler are you using on RELENG_7?
> Did you check with both schedulers (ULE/4BSD) to see which one works
> better for you?
> Also, are you sure that you are serving the same number of requests? I see
> that the 6.x image shows CPU usage from
> Aug 2007 and the 7.x image is from Jan 2008. Is it possible that you
> have more requests and that's why your CPU usage increased?

As for the pictures: GENERIC kernels, so 4BSD on both versions (6 and 7).
As for the profiling info: 4BSD on 6, ULE on 7 (because both were upgraded yesterday, and ULE is now the default in RELENG_7).

The pictures are from the same timeframe (what Aug 2007 refers to is the time when the OS was compiled). The two machines were behind a per-packet load balancer, so yes: at least in pps, they got exactly the same traffic. (Of course it is possible that one machine could serve an answer directly from the cache while the other had to go out, but I started them at the same time, so I think this effect was minimized.)
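
The two flat profiles posted earlier in this thread can be compared by expressing each function's self samples as a share of the profile total. The following is a minimal Python sketch and not part of the original mails; it assumes the "seconds" columns of the HWPMC/gprof output are sample counts in the same unit on both systems, and the names TOTAL, SELF and share() are made up for illustration.

```python
# Figures copied from the two flat profiles posted in this thread.
# Assumption: the "seconds" columns are comparable sample counts.
TOTAL = {"6-STABLE": 158108.0, "7-STABLE": 204813.0}

SELF = {
    "_mtx_lock_sleep": {"6-STABLE": 2762.0, "7-STABLE": 19395.0},
    "SHA256_Transform": {"6-STABLE": 12909.0, "7-STABLE": 14449.0},
    "DELAY": {"6-STABLE": 4428.0, "7-STABLE": 8564.0},
    "kern_select": {"6-STABLE": 7499.0, "7-STABLE": 7175.0},
}


def share(function, release):
    """Return the percentage of all samples attributed to `function`."""
    return 100.0 * SELF[function][release] / TOTAL[release]


def compare():
    """Return (name, share on 6-STABLE, share on 7-STABLE) tuples."""
    return [
        (name, share(name, "6-STABLE"), share(name, "7-STABLE"))
        for name in SELF
    ]


if __name__ == "__main__":
    for name, old, new in compare():
        print(f"{name:20s} {old:5.1f}% -> {new:5.1f}%")
```

Working with shares rather than raw counts matters here because the two runs collected a different total number of samples, so the raw self columns are not directly comparable.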

From owner-freebsd-performance@FreeBSD.ORG Thu Apr 3 15:22:49 2008
Message-ID: <47F4F63E.80703@fsn.hu>
Date: Thu, 03 Apr 2008 17:22:38 +0200
From: Attila Nagy
To: JINMEI Tatuya / 神明達哉
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47A614E9.4030501@fsn.hu> <47A77A13.6010802@fsn.hu> <47B1D2F4.5070304@fsn.hu> <47B2DD62.6020507@fsn.hu> <47BAE0B3.4090004@fsn.hu>
Cc: freebsd-performance@freebsd.org, bind-users@isc.org
Subject: Re: max-cache-size doesn't work with 9.5.0b1

Sorry again for the long delay, I've got other work to do, and our 9.4 servers work fine (at least on FreeBSD 6, though, see the other -performance- problem)...

On 02/20/08 04:30, JINMEI Tatuya / 神明達哉 wrote:
> At Tue, 19 Feb 2008 14:59:15 +0100,
> Attila Nagy wrote:
>
>>> Okay, then please try this patch with '-n 1' (note: this patch doesn't
>>> contain the memory statistics hack via the HTTP interface, but I don't
>>> think we need it for this test).
>
> [...]
>
>> (max-cache-size still 32M)
>
> Hmm, then the mutex locks dynamically generated are also irrelevant.
>
> I've also tried to reproduce the problem in a similar environment
> (BIND 9.5.0b1 with threads on FreeBSD 7.0RC1/AMD64, cache-only
> configuration, using a real query sample), unsuccessfully. One
> significant difference is that I disabled SMP in the kernel (it was
> very unstable with the SMP support for some unknown reason), but I
> doubt this is the key for the difference of the named behavior.
>

I don't know why I am the only one to see this.

> BTW, is this reproducible on FreeBSD 6.x? If so, then I'd like to
> see what happens if you specify some small value of datasize
> (e.g. 512MB) and have named abort when malloc() fails with the "X"
> _malloc_options. (This option doesn't seem to work for FreeBSD 7.x,
> at least at the moment).
>

Yes, it's the same, even when a different (libpthreads, KSE) threading library is in use.
I've recompiled named with the following in main():
./work/bind-9.5.0b2/bin/named/main.c: _malloc_options="X";

And set cache-size to 32MB.

At:
21664 bind 4 20 0 819M 819M kserel 0 5:32 0.00% named.950
I pressed CTRL-C:
mem.c:1114: REQUIRE((((ctx) != ((void *)0)) && (((const isc__magic_t *)(ctx))->magic == ((('M') << 24 | ('e') << 16 | ('m') << 8 | ('C')))))) failed.

> Some other questions:
> - can we see your named.conf? If you specify non-default
>   configuration options, that might be the reason for, or related to,
>   this problem.
>

Of course (see at the end).

> - does your named produce lot of log messages? If so, it might also
>   be a reason (simply because it relies on standard libraries).
>

grep named ns20080403.log | wc -l
1930006
For today (17 hours and 18 minutes of logs).
Is this a lot?

Config (normally max-cache-size is about 2400M):
-hmm I haven't tried to change cleaning-interval, it was needed because the default cache housekeeping effectively stopped the ns during the cleanup-

options {
    directory "/etc/bind";
    tcp-clients 256;
    recursive-clients 8192;
    max-cache-size 32M;
    minimal-responses yes;
    pid-file "/var/run/named.pid";
    cleaning-interval 15;
    allow-query-cache { any; };
    allow-query { any; };
    allow-recursion { any; };
};

controls {
    inet * port 953 allow { } keys { "rndc-key"; };
};

key "rndc-key" {
    algorithm hmac-md5;
    secret
};

logging {
    channel syslog-ng {
        syslog local5;
        severity info;
        print-category yes;
        print-severity yes;
    };
    category default { syslog-ng; };
    category config { syslog-ng; };
    category xfer-in { syslog-ng; };
    category xfer-out { syslog-ng; };
    category notify { syslog-ng; };
    category security { syslog-ng; };
    category update { syslog-ng; };
    category lame-servers { syslog-ng; };
    category update-security { syslog-ng; };
};

zone "10.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "16.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "17.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "18.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "19.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "20.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "21.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "22.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "23.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "24.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "25.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "26.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "27.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "28.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "29.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "30.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "31.172.in-addr.arpa" in { type master; file "db/db.rfc1918"; };
zone "168.192.in-addr.arpa" in { type master; file "db/db.rfc1918"; };

--
Attila Nagy                                   e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)                 phone: +3630 306 6758
http://www.fsn.hu/

From owner-freebsd-performance@FreeBSD.ORG Thu Apr 3 17:46:54 2008
Date: Thu, 03 Apr 2008 10:46:53 -0700
From: JINMEI Tatuya / 神明達哉
To: Attila Nagy
In-Reply-To: <47F4F63E.80703@fsn.hu>
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47A614E9.4030501@fsn.hu> <47A77A13.6010802@fsn.hu> <47B1D2F4.5070304@fsn.hu> <47B2DD62.6020507@fsn.hu> <47BAE0B3.4090004@fsn.hu> <47F4F63E.80703@fsn.hu>
Cc: freebsd-performance@freebsd.org, bind-users@isc.org
Subject: Re: max-cache-size doesn't work with 9.5.0b1
X-List-Received-Date: Thu, 03 Apr 2008 17:46:54 -0000

At Thu, 03 Apr 2008 17:22:38 +0200,
Attila Nagy wrote:

> Sorry again for the long delay, I've got other work to do, and our 9.4
> servers work fine (at least on FreeBSD 6, though, see the other
> -performance- problem)...

No problem, I understand testing a beta version cannot be high-priority work.

> > BTW, is this reproducible on FreeBSD 6.x? If so, then I'd like to
> > see what happens if you specify some small value of datasize
> > (e.g. 512MB) and have named abort when malloc() fails with the "X"
> > _malloc_options. (This option doesn't seem to work for FreeBSD 7.x,
> > at least at the moment).
> >
> Yes, it's the same, even when a different (libpthreads, KSE)
> threading library is in use.
> I've recompiled named with the following in main():
> ./work/bind-9.5.0b2/bin/named/main.c: _malloc_options="X";
>
> And set cache-size to 32MB.
>
> At:
> 21664 bind 4 20 0 819M 819M kserel 0 5:32 0.00% named.950
> I pressed CTRL-C:
> mem.c:1114: REQUIRE((((ctx) != ((void *)0)) && (((const isc__magic_t
> *)(ctx))->magic == ((('M') << 24 | ('e') << 16 | ('m') << 8 | ('C'))))))
> failed.

Hmm, this is odd in two respects:

1. The "X" malloc option doesn't seem to work as expected. I expected a call to malloc() to trigger an assertion failure (within the malloc library) at a much earlier stage. Does it change if you try the alternative debugging approach I mentioned before? That is:
   - create a symbolic link from "/etc/malloc.conf" to "X":
     # ln -s X /etc/malloc.conf
   - start named with a moderate limitation of virtual memory size, e.g.
     # /usr/bin/limits -v 384m $path_to_named/named

2. Whether or not it's related to this max-cache-size issue, the assertion failure in mem.c wasn't an expected result; this is likely to be a bug anyway.
   If the process dumped a core, can you show the stack backtrace of it?
   (gdb) thread apply all bt full

> > Some other questions:
> > - can we see your named.conf? If you specify non-default
> >   configuration options, that might be the reason for, or related to,
> >   this problem.
> >
> Of course (see at the end).
>
> > - does your named produce lot of log messages? If so, it might also
> >   be a reason (simply because it relies on standard libraries).
> >
> grep named ns20080403.log | wc -l
> 1930006
> For today (17 hours and 18 minutes of logs).
> Is this a lot?

This means about 31 log messages per second. This may not be extremely frequent, but if some memory is lost for every log message, I guess it could be a reason for the memory growing at the high rate we've seen. What if you change the channel setting from:

> channel syslog-ng {
>     syslog local5;
>     severity info;
>     print-category yes;
>     print-severity yes;
> };

to this one?

channel syslog-ng {
    null;
    severity info;
    print-category yes;
    print-severity yes;
};

BTW,

> -hmm I haven't tried to change cleaning-interval, it was needed because
> the default cache housekeeping effectively stopped the ns during the
> cleanup-

This doesn't matter for 9.5. It doesn't perform periodic cleaning regardless of the value of cleaning-interval.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
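
The figure of about 31 log messages per second in the message above follows from the line count and the logging period quoted from Attila's mail. A small Python check of that arithmetic, not part of the original mails, with function and constant names chosen only for illustration:

```python
LOG_LINES = 1930006          # output of: grep named ns20080403.log | wc -l
HOURS, MINUTES = 17, 18      # logging period quoted in the thread


def messages_per_second(lines, hours, minutes):
    """Average number of log lines written per second over the period."""
    seconds = hours * 3600 + minutes * 60
    return lines / seconds


if __name__ == "__main__":
    rate = messages_per_second(LOG_LINES, HOURS, MINUTES)
    print(f"{rate:.1f} log messages per second")
```

This is only an average over the whole period; it says nothing about bursts, which would need per-second counts from the log timestamps.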
From owner-freebsd-performance@FreeBSD.ORG Fri Apr 4 12:44:27 2008
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0B4D4106574B for ; Fri, 4 Apr 2008 12:44:25 +0000 (UTC) (envelope-from fw@deneb.enyo.de)
Received: from mail.enyo.de (mail.enyo.de [IPv6:2001:14b0:202:1::a7]) by mx1.freebsd.org (Postfix) with ESMTP id B4E0C8FC14 for ; Fri, 4 Apr 2008 12:44:22 +0000 (UTC) (envelope-from fw@deneb.enyo.de)
Received: from deneb.vpn.enyo.de ([212.9.189.177] helo=deneb.enyo.de) by mail.enyo.de with esmtp id 1JhlH4-0001z5-Ey; Fri, 04 Apr 2008 14:44:06 +0200
Received: from fw by deneb.enyo.de with local (Exim 4.69) (envelope-from ) id 1JhlH3-00010x-IW; Fri, 04 Apr 2008 14:44:05 +0200
From: Florian Weimer
To: JINMEI Tatuya / =?utf-8?B?56We5piO6YGU5ZOJ?=
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47A614E9.4030501@fsn.hu> <47A77A13.6010802@fsn.hu> <47B1D2F4.5070304@fsn.hu> <47B2DD62.6020507@fsn.hu> <47BAE0B3.4090004@fsn.hu>
Date: Fri, 04 Apr 2008 14:44:05 +0200
In-Reply-To: ("JINMEI Tatuya / =?utf-8?B?56We5piO6YGU5ZOJ?= "'s message of "Wed, 20 Feb 2008 13:51:02 -0800")
Message-ID: <87abk9yelm.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Fri, 04 Apr 2008 13:14:37 +0000
Cc: Attila Nagy , freebsd-performance@freebsd.org, bind-users@isc.org
Subject: Re: max-cache-size doesn't work with 9.5.0b1
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 04 Apr 2008 12:44:27 -0000

* JINMEI Tatuya / 神明達哉:

> Then the named process will eventually abort itself with a core dump
> due to malloc failure.
> Please show us the stack trace at that point.
> Hopefully it will reveal the malloc call that keeps consuming memory.

I've successfully used a backtrace()-instrumented malloc() to track
down difficult memory leaks.  backtrace() is necessary because it
allows you to see past malloc() wrappers.  (backtrace() seems to be
part of libexecinfo on FreeBSD.)

From owner-freebsd-performance@FreeBSD.ORG Fri Apr 4 13:31:52 2008
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3D611065677 for ; Fri, 4 Apr 2008 13:31:52 +0000 (UTC) (envelope-from kris@FreeBSD.org)
Received: from weak.local (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 29D708FC26; Fri, 4 Apr 2008 13:31:50 +0000 (UTC) (envelope-from kris@FreeBSD.org)
Message-ID: <47F62DC5.5010703@FreeBSD.org>
Date: Fri, 04 Apr 2008 15:31:49 +0200
From: Kris Kennaway
User-Agent: Thunderbird 2.0.0.12 (Macintosh/20080213)
MIME-Version: 1.0
To: Attila Nagy
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47F4D0DD.2040809@fsn.hu> <47F4D9F2.9070200@moneybookers.com> <47F4E1A1.2020500@fsn.hu>
In-Reply-To: <47F4E1A1.2020500@fsn.hu>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "JINMEI Tatuya / 神明"@FreeBSD.ORG, =?UTF-8?B?6YGU5ZOJ?= , freebsd-performance@freebsd.org, bind-users@isc.org, Stefan Lambrev
Subject: Re: Bad bind performance with FreeBSD 7
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 04 Apr 2008 13:31:52 -0000

Attila Nagy wrote:
> On 2008.04.03. 15:21, Stefan Lambrev wrote:
>> Greetings,
>>
>> Attila Nagy wrote:
>>> On 01/29/08 11:40, Attila Nagy wrote:
>>>> ps: I have another problem.
>>>> I've recently switched from last year's
>>>> 6-STABLE to 7-STABLE and got pretty bad results on the same machine
>>>> with the same bind (9.4).
>>>> The graphs are here:
>>>> http://picasaweb.google.com/nagy.attila/20080129Fbsd6vs7Bind
>>> The problem still persists and now I can provide some profiling info,
>>> made by HWPMC.
>>>
>>>
>> Sorry if you already answered this question, but at least I can't find it
>> in the thread.
>> What scheduler are you using on RELENG_7?
>> Did you check with both schedulers (ule/4bsd) to see which one works
>> better for you?
>> Also are you sure that you service the same number of requests - I see
>> that the 6.x image shows CPU usage from
>> Aug 2007 and 7.x image is from Jan 2008 ... is it possible that you
>> have more requests and that's why your CPU usage increased?
> As for the pictures: GENERIC kernels, so 4BSD on both versions (6 and 7).
> As for the profiling info: 4BSD on 6, ULE on 7 (because both were
> upgraded yesterday, and ULE is now default in RELENG_7)
>
> The pictures are from the same timeframe (what aug 2007 refers to is the
> time when the OS was compiled), the two machines were behind a per
> packet load balancer, so yes: at least in pps, they've got exactly the
> same traffic (of course it was possible that one machine could serve
> the answer directly from the cache, while the other had to go out, but
> I've started them at the same time, so I think this effect was minimized).

User time is much greater so named is doing much more work for some
reason.  It doesn't appear that this is a kernel problem.  Verify that
the config is identical, and that you are not overloading it (bind
doesn't scale beyond 4 threads).

Kris

From owner-freebsd-performance@FreeBSD.ORG Fri Apr 4 14:34:56 2008
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22D5C106564A for ; Fri, 4 Apr 2008 14:34:56 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id D190D8FC1A for ; Fri, 4 Apr 2008 14:34:55 +0000 (UTC) (envelope-from bra@fsn.hu)
Received: from japan.t-online.private (people [192.168.2.4]) by people.fsn.hu (Postfix) with ESMTP id 7C170B2E06; Fri, 4 Apr 2008 16:34:43 +0200 (CEST)
Message-ID: <47F63C82.7060502@fsn.hu>
Date: Fri, 04 Apr 2008 16:34:42 +0200
From: Attila Nagy
User-Agent: Thunderbird 2.0.0.12 (X11/20080326)
MIME-Version: 1.0
To: =?UTF-8?B?SklOTUVJIFRhdHV5YSAvIOelnuaYjumBlOWTiQ==?=
References: <475B0F3E.5070100@fsn.hu> <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47A614E9.4030501@fsn.hu> <47A77A13.6010802@fsn.hu> <47B1D2F4.5070304@fsn.hu> <47B2DD62.6020507@fsn.hu> <47BAE0B3.4090004@fsn.hu> <47F4F63E.80703@fsn.hu>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-performance@freebsd.org, bind-users@isc.org
Subject: Re: max-cache-size doesn't work with 9.5.0b1
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 04 Apr 2008 14:34:56 -0000

On 04/03/08 19:46, JINMEI Tatuya / 神明達哉 wrote:
> Hmm, this is odd in two points:
> 1. the "X" malloc option doesn't seem to work as expected.  I expected
>    that a call to malloc() would trigger an assertion failure (within the
>    malloc library) at a much earlier stage.  Does it change if you try
>    the alternative debugging approach I mentioned before?
>    That is:
>    - create a symbolic link from "/etc/malloc.conf" to "X":
>      # ln -s X /etc/malloc.conf
>    - start named with a moderate limitation of virtual memory size, e.g.
>      # /usr/bin/limits -v 384m $path_to_named/named
>
> 2. Whether or not it's related to this max-cache-size issue, the assertion
>    failure in mem.c wasn't an expected result; this is likely to be a
>    bug anyway.  If the process dumped a core, can you show the
>    stack backtrace of it?
>    (gdb) thread apply all bt full
>
>
No effect, the process grows happily. I don't have a core dump.

> This means about 31 log messages per second.  This may not be
> extremely frequent, but if some memory is lost for every log message,
> I guess it could be a reason for the growing memory at the high rate
> we've seen.
>
> What if you change the channel setting from:
>
>
I've added this, so now the server doesn't log much (after start, nothing):

    category default { null; };

The memory usage still grows.

--
Attila Nagy                                   e-mail: Attila.Nagy@fsn.hu
Free Software Network (FSN.HU)                 phone: +3630 306 6758
http://www.fsn.hu/

From owner-freebsd-performance@FreeBSD.ORG Fri Apr 4 17:12:01 2008
Return-Path:
Delivered-To: freebsd-performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 28B4D106564A for ; Fri, 4 Apr 2008 17:12:01 +0000 (UTC) (envelope-from Jinmei_Tatuya@isc.org)
Received: from mon.jinmei.org (mon.jinmei.org [IPv6:2001:4f8:3:36::162]) by mx1.freebsd.org (Postfix) with ESMTP id 0FAD58FC27 for ; Fri, 4 Apr 2008 17:12:01 +0000 (UTC) (envelope-from Jinmei_Tatuya@isc.org)
Received: from dhcp-191.sql1.isc.org (unknown [IPv6:2001:4f8:3:bb:1584:aac1:f118:4b12]) by mon.jinmei.org (Postfix) with ESMTP id C1A2433C2E; Fri, 4 Apr 2008 10:12:00 -0700 (PDT)
Date: Fri, 04 Apr 2008 10:12:00 -0700
Message-ID:
From: JINMEI Tatuya / =?ISO-2022-JP?B?GyRCP0BMQEMjOkgbKEI=?=
To: Attila Nagy
In-Reply-To: <47F63C82.7060502@fsn.hu>
References: <475B0F3E.5070100@fsn.hu>
 <479DFE74.8030004@fsn.hu> <479F02A7.9020607@fsn.hu> <47A614E9.4030501@fsn.hu> <47A77A13.6010802@fsn.hu> <47B1D2F4.5070304@fsn.hu> <47B2DD62.6020507@fsn.hu> <47BAE0B3.4090004@fsn.hu> <47F4F63E.80703@fsn.hu> <47F63C82.7060502@fsn.hu>
User-Agent: Wanderlust/2.14.0 (Africa) Emacs/22.1 Mule/5.0 (SAKAKI)
MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka")
Content-Type: text/plain; charset=US-ASCII
X-Mailman-Approved-At: Fri, 04 Apr 2008 17:55:58 +0000
Cc: freebsd-performance@freebsd.org, bind-users@isc.org
Subject: Re: max-cache-size doesn't work with 9.5.0b1
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 04 Apr 2008 17:12:01 -0000

At Fri, 04 Apr 2008 16:34:42 +0200,
Attila Nagy wrote:

> No effect, the process grows happily. I don't have a core dump.

Hmm, sorry, then I have no further ideas for chasing the problem.  A few
points that may help:

- can you show the diff you applied to bin/named/main.c when you added
  the _malloc_options setting?
- regarding the core dump, you may have to set the
  "kern.sugid_coredump" sysctl variable to 1, and make sure that the
  directory where named resides (which should be under /etc/bind/ with
  your config) is writable for the uid of the process.  Or, simply run
  named as root without specifying -u for debugging purposes.
- is it possible for me to get access to the test machine, or to get
  the query pattern so that I can reproduce the problem myself?  As
  I stated before, I tried to reproduce it on my box, which has a
  similar system environment, but I failed to see the problem.

Thanks,

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.