From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 12 13:08:01 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id C4F711065675; Sun, 12 Sep 2010 13:08:01 +0000 (UTC) Date: Sun, 12 Sep 2010 13:08:01 +0000 From: Alexander Best To: Jilles Tjoelker Message-ID: <20100912130801.GA23538@freebsd.org> References: <4C8A81D9.5020905@rawbw.com> <20100910194600.GB60815@stack.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100910194600.GB60815@stack.nl> Cc: Yuri , freebsd-hackers@freebsd.org Subject: Re: Why I can't trace linux process's childs with truss? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Sep 2010 13:08:01 -0000 On Fri Sep 10 10, Jilles Tjoelker wrote: > On Fri, Sep 10, 2010 at 12:07:05PM -0700, Yuri wrote: > > I am trying to get the log of all system calls that skype makes with > > truss -f /usr/local/share/skype/skype > > For some reason the resulting log only has the leading process calls and > > nothing from it's 8 childs. > > Truss doesn't show any 'cloned' processes. Is this a bug in truss that > > it doesn't follow 'cloned' processes? > > > Is there any workaround or other way I can debug skype? strace doesn't > > work on amd64. > > I am primarily interested why it can't read /dev/video0 device, created > > by webcamd. > > Try using ktrace instead of truss. You will need devel/linux_kdump from > ports to decode the resulting ktrace.out. there's a PR related to this "issue" [1]. so is truss missing this functionality or is this in fact a feature, because truss mustn't be used on any non-FreeBSD executable? if that is the case i vote to add a CAVEATS section to the truss(1) manual so people rather use ktrace in combination with linux_kdump. cheers. 
alex [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=150262 > > Alternatively, if you're familiar with dtrace, you could try that. > > -- > Jilles Tjoelker -- a13x From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 12 15:27:03 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B3F11065673; Sun, 12 Sep 2010 15:27:03 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id A946C8FC1E; Sun, 12 Sep 2010 15:27:02 +0000 (UTC) Received: by eyx24 with SMTP id 24so2810271eyx.13 for ; Sun, 12 Sep 2010 08:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=m6j26NU88iPwMai1zisxeFGeJ9x6JoUNitGUgojjdNk=; b=RdxK/vCAnxNJMQYHKNdcMsz0O1ErxDnkvXOIp9a1fogsEdyEbp10aLbIpYqs2qDo+d AtkUFzJl/PC+ARLpIYy1AB7qRKp48YRKGit8uaCkp7JxvVUW+Xo4n7OlgBa2ATGsNVk/ 03J6S+E6Re9Bfr3i3rNdeOSCDWUHQCMZtrAvs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=pNpssS96zWm8ZtSeSqAHpGCzpapvYy/3YMyyZja0U5BhCGL7mQlml92g88pnHzvW7m AIjBneP1rp9j4841IcXFMUXPen6RhW/Pnv0EvRnvoT0co+g/QOMAdv6LXlZxCHlhjv6N XbXRlC0nZ+/LEvJTvzHyaxmtxn2BKaEPUhO50= MIME-Version: 1.0 Received: by 10.213.22.139 with SMTP id n11mr1022763ebb.21.1284303669383; Sun, 12 Sep 2010 08:01:09 -0700 (PDT) Received: by 10.14.120.146 with HTTP; Sun, 12 Sep 2010 08:01:09 -0700 (PDT) In-Reply-To: <20100912130801.GA23538@freebsd.org> References: <4C8A81D9.5020905@rawbw.com> <20100910194600.GB60815@stack.nl> <20100912130801.GA23538@freebsd.org> Date: Sun, 12 Sep 2010 17:01:09 +0200 Message-ID: From: Mateusz Guzik To: Alexander Best Content-Type: text/plain; 
charset=ISO-8859-1 Cc: Yuri , Jilles Tjoelker , freebsd-hackers@freebsd.org Subject: Re: Why I can't trace linux process's childs with truss? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Sep 2010 15:27:03 -0000 On Sun, Sep 12, 2010 at 3:08 PM, Alexander Best wrote: > there's a PR related to this "issue" [1]. so is truss missing this > functionality or is this in fact a feature, because truss musn't be used on > any non freebsd executable? > Actually truss handles linux processes just fine, except for their children. :) A Linux process can create a child using the linux_clone syscall, but truss does not handle that case, and this may be the problem that Yuri reported (since no log was provided, I can only guess). This trivial patch should fix this: http://student.agh.edu.pl/~mjguzik/truss-linux-forks.patch Tested on this simple program: http://student.agh.edu.pl/~mjguzik/fork.c If it still does not work, the log generated by truss would be helpful. Regards, -- Mateusz Guzik From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 12 15:40:54 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 685B7106566B; Sun, 12 Sep 2010 15:40:54 +0000 (UTC) Date: Sun, 12 Sep 2010 15:40:54 +0000 From: Alexander Best To: Mateusz Guzik Message-ID: <20100912154054.GA42409@freebsd.org> References: <4C8A81D9.5020905@rawbw.com> <20100910194600.GB60815@stack.nl> <20100912130801.GA23538@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Cc: Yuri , Jilles Tjoelker , freebsd-hackers@freebsd.org Subject: Re: Why I can't trace linux process's childs with truss? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Sep 2010 15:40:54 -0000 On Sun Sep 12 10, Mateusz Guzik wrote: > On Sun, Sep 12, 2010 at 3:08 PM, Alexander Best wrote: > > there's a PR related to this "issue" [1]. so is truss missing this > > functionality or is this in fact a feature, because truss musn't be used on > > any non freebsd executable? > > > > Actually truss handles linux processes just fine, except for their children. :) > Linux process can create a child using linux_clone syscall, but truss does not > handle that case and this can be the problem that Yuri reported (since > no log was > provided, I can only guess). > > This trivial patch should fix this: > http://student.agh.edu.pl/~mjguzik/truss-linux-forks.patch > > Tested on this simple program: > http://student.agh.edu.pl/~mjguzik/fork.c > > If it still does not work, log generated by truss would be helfpul. looking good. could you post that patch as a followup to yuri's PR? hope it gets committed soon. :) cheers. 
alex > > Regards, > -- > Mateusz Guzik -- a13x From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 13 15:10:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2B3D2106584C for ; Mon, 13 Sep 2010 15:10:41 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1F0418FC15 for ; Mon, 13 Sep 2010 15:10:41 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id C3E1346C08; Mon, 13 Sep 2010 11:10:40 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 07C168A050; Mon, 13 Sep 2010 11:10:40 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Mon, 13 Sep 2010 10:11:18 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100911060704.55B611065670@hub.freebsd.org> In-Reply-To: <20100911060704.55B611065670@hub.freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009131011.19089.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 13 Sep 2010 11:10:40 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Simon Subject: Re: MCE Decoding - MCA: Bank 8, Status 0xcc0031800001009f/0xc8000980000200cf X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Mon, 13 Sep 2010 15:10:46 -0000 On Saturday, September 11, 2010 1:40:28 am Simon wrote: > Hello, > > Can someone please help me decode these two errors on FreeBSD 8.1-R: > > MCA: Bank 8, Status 0xcc0031800001009f > MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000 > MCA: Vendor "GenuineIntel", ID 0x106a5, APIC ID 16 > MCA: CPU 0 COR (198) OVER RD channel ?? memory error > MCA: Address 0x1b6188d80 > MCA: Misc 0x72ae242000000084 > > MCA: Bank 8, Status 0xc8000980000200cf > MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000 > MCA: Vendor "GenuineIntel", ID 0x106a5, APIC ID 16 > MCA: CPU 0 COR (38) OVER MS channel ?? memory error > MCA: Misc 0x72ae242000000140 HARDWARE ERROR. This is *NOT* a software problem! Please contact your hardware vendor CPU 0 BANK 8 MISC 72ae242000000084 ADDR 1b6188d80 MCG status: MCi status: Error overflow MCi_MISC register valid MCi_ADDR register valid MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR Transaction: Memory read error Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 198 Memory transaction Tracker ID (RTId): 84 Memory DIMM ID of error: 0 Memory channel ID of error: 0 Memory ECC syndrome: 72ae2420 STATUS cc0031800001009f MCGSTATUS 0 MCGCAP 1c09 APICID 10 SOCKETID 0 CPUID Vendor Intel Family 6 Model 26 HARDWARE ERROR. This is *NOT* a software problem! 
Please contact your hardware vendor CPU 0 BANK 8 MISC 72ae242000000140 MCG status: MCi status: Error overflow MCi_MISC register valid MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR Transaction: Memory scrubbing error Memory ECC error occurred during scrub Memory corrected error count (CORE_ERR_CNT): 38 Memory transaction Tracker ID (RTId): 40 Memory DIMM ID of error: 0 Memory channel ID of error: 0 Memory ECC syndrome: 72ae2420 STATUS c8000980000200cf MCGSTATUS 0 MCGCAP 1c09 APICID 10 SOCKETID 0 CPUID Vendor Intel Family 6 Model 26 You have some corrected memory errors (198+38 = 236) in the first DIMM (on the SuperMicro boards we have at work, it would correspond to the DIMM slot labeled P1_DIMM1A). In my experience I would just ignore them unless the count gets much higher (say 10000+ / per hour). -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 13 21:28:32 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E36AA106566B for ; Mon, 13 Sep 2010 21:28:32 +0000 (UTC) (envelope-from cronfy@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 77FF28FC08 for ; Mon, 13 Sep 2010 21:28:32 +0000 (UTC) Received: by bwz20 with SMTP id 20so323618bwz.13 for ; Mon, 13 Sep 2010 14:28:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:received:from:date :message-id:subject:to:content-type; bh=OFS21fCPafcBZA8E2xRo2cgr6nGRr74PMqwUVHh7aXE=; b=pLGPdmyuIP2HcmKEzUqd+j8s2kV6rIilRuQ2irVZnHk1gJPO2yVCVnfomQxXV7qebj mI0reWWefZXVM7BuU1C4a/jiGn9kz41qafYPC7x6ysdptPhFRvqGTl/FWq31p9OfEsno Fz5kXMQTQ3z7vovH+g56hIAlmaou0eLXjwBiU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:from:date:message-id:subject:to:content-type; 
b=OO55HqOXfZW9IWZcYTGVALKo8fWUxb2qFcWME4jm/RGxyuyZq8TJ4x6RHp6Y8SgPSB ktdw7MmV6Dv+eGMZ2oVY8Mp8ZsozgrpXrU9OyDhwKAjOOzH3H0C0FMq6KEY0+g1+sbEC Z6YX85rbqGHudqPlWBLMvF3FmRcNLtCY58On0= Received: by 10.204.85.90 with SMTP id n26mr3623589bkl.109.1284411465116; Mon, 13 Sep 2010 13:57:45 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.99.197 with HTTP; Mon, 13 Sep 2010 13:57:15 -0700 (PDT) From: cronfy Date: Tue, 14 Sep 2010 00:57:15 +0400 Message-ID: To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: is vfs.lookup_shared unsafe in 7.3? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Sep 2010 21:28:33 -0000 Hello, Trying to overcome high server load (sudden peaks of 15%us/85%sy, LA > 40, very slow lstat() at these moments; it looks like some kind of lock contention), I enabled vfs.lookup_shared=1 on two servers today. One runs a FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and the other runs FreeBSD-7.3 csup'ed and built Jul 16 2010. The server with the fresher kernel is running nicely and does not show high load anymore. But on the second server it did not help. Moreover, after a few hours of running with vfs.lookup_shared=1 I noticed processes stuck in the "ufs" state. I tried to kill them with no luck. Disabling vfs.lookup_shared froze the whole system. So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable between 16 Jul and 9 Sep (is that the reason why the first system is still running?), or should I expect that it will freeze soon too? Thanks in advance! 
-- // cronfy From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 14 11:34:10 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D9B510656A3 for ; Tue, 14 Sep 2010 11:34:10 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 290088FC0A for ; Tue, 14 Sep 2010 11:34:09 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1OvTm1-0005cu-Nf for freebsd-hackers@freebsd.org; Tue, 14 Sep 2010 13:34:05 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 14 Sep 2010 13:34:05 +0200 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 14 Sep 2010 13:34:05 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-hackers@freebsd.org From: Ivan Voras Date: Tue, 14 Sep 2010 13:33:58 +0200 Lines: 13 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.9) Gecko/20100518 Thunderbird/3.0.4 In-Reply-To: X-Enigmail-Version: 1.0.1 Subject: Re: is vfs.lookup_shared unsafe in 7.3? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2010 11:34:10 -0000 On 09/13/10 22:57, cronfy wrote: > Hello, > > Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA> > 40, very slow lstat() at these moments, looks like some kind of lock > contention) I enabled vfs.lookup_shared=1 on two servers today. One is > FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is > FreeBSD-7.3 csup'ed and built Jul 16 2010. The important thing you missed is *where* the supposed lock contention is. If you have lots of processes in "ufs" state, there are other things that can help you, such as increasing vfs.ufs.dirhash_maxmem. From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 14 12:40:39 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 15E16106564A for ; Tue, 14 Sep 2010 12:40:39 +0000 (UTC) (envelope-from cronfy@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 99F4C8FC13 for ; Tue, 14 Sep 2010 12:40:38 +0000 (UTC) Received: by bwz15 with SMTP id 15so231136bwz.13 for ; Tue, 14 Sep 2010 05:40:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:received:in-reply-to :references:from:date:message-id:subject:to:content-type :content-transfer-encoding; bh=xIRs/8CLMx8Y6X31HRmvyrj3QNuwH7F6hIElIkaBp0Y=; b=bNbc8aCm77SStPdXFmHM/cEfo2mKUeltWVUmQHtf5fZSAsG/HtXkjmNO6yw9FPWMXv z1BQBZ4/lRENjNzzp+1LERD0UU/CnClk7rew5Xe3hW4Pno/3d+yi9kMIq+WAtJcCBsM3 1njqhGIqmcMGYmBFlgf6N6XdMRsceN2c96WuM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; 
h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=JHwDqRtikl5IiURjofoeZ+73i/TM9P+/9k/wfdkVDbDr95eNDkv8/FhT8cweXl82Kg Milw81+HT8Svb9mKoaIwb6oXbswnZPCi7QR0G7GOIP20nqr/ZKU0I0lr2W9qUgw6XzKu 1lFSb7eVhhwG01Oq/4D4foSNpm46d4KgwvnhE= Received: by 10.204.76.140 with SMTP id c12mr4388021bkk.7.1284468037314; Tue, 14 Sep 2010 05:40:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.99.197 with HTTP; Tue, 14 Sep 2010 05:40:07 -0700 (PDT) In-Reply-To: References: From: cronfy Date: Tue, 14 Sep 2010 16:40:07 +0400 Message-ID: To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: is vfs.lookup_shared unsafe in 7.3? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 14 Sep 2010 12:40:39 -0000 >> Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA> >> 40, very slow lstat() at these moments, looks like some kind of lock >> contention) I enabled vfs.lookup_shared=1 on two servers today. One is >> FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is >> FreeBSD-7.3 csup'ed and built Jul 16 2010. > > The important think you missed is *where* is the supposed lock contention. > If you have lots of processes in "ufs" state, there are other things that > can help you, such as increasing vfs.ufs.dirhash_maxmem. Before I changed vfs.lookup_shared I did increase vfs.ufs.dirhash_maxmem to 16M. It filled in ~5 minutes, but even while it was not full, the server was not running better. Usually there is a very small number of processes in ufs state (they do not even show up in top). The processes I was talking about were, I suspect, a consequence of enabling vfs.lookup_shared. 
I also enabled hwpmc to examine the system at the moments of high load, but did not have a chance to use it. What I am afraid of now is that the server that has been running nicely so far may crash; that's why I am asking about the stability of vfs.lookup_shared in 7.3. At svn.freebsd.org I see a couple of commits in stable/7/sys/fs/ and ufs/ over the last 2 months that could change the behaviour, and this may be the reason why one system is running stable and the other was not. But I am not sure about it, so I am asking experienced people here :) -- // cronfy From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 14:02:15 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A2BCE106564A for ; Wed, 15 Sep 2010 14:02:15 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2E55E8FC14 for ; Wed, 15 Sep 2010 14:02:15 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1OvsHX-0003Jg-KI for freebsd-hackers@freebsd.org; Wed, 15 Sep 2010 16:44:15 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id 1E5B61CC1E; Wed, 15 Sep 2010 16:44:16 +0300 (EEST) Date: Wed, 15 Sep 2010 16:44:15 +0300 From: Andrey Simonenko To: freebsd-hackers@freebsd.org Message-ID: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 06-Jan-2007 23:14:37) X-Date: 2010-09-15 16:44:15 X-Connected-IP: 10.18.52.101:21881 X-Message-Linecount: 48 
X-Body-Linecount: 36 X-Message-Size: 2282 X-Body-Size: 1769 Subject: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 14:02:15 -0000 Hello, I have questions about the mutex implementation in the kern/kern_mutex.c and sys/mutex.h files (current versions of these files): 1. Is the following statement correct for a volatile pointer or integer variable: if a volatile variable is updated by the compare-and-set instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current value of such a variable can be read without any special instruction (e.g. v = val)? I checked the assembler code generated for a function with "v = val" and "val = v" like statements, for a volatile variable and for a plain variable, and found differences: on ia64 "v = val" was implemented by ld.acq and "val = v" was implemented by st.rel; on mips and sparc64 the assembler code can have a different order of lines for volatile and plain variables (depending on the code of the function). 2. Suppose there is a default (sleep) mutex and adaptive mutexes are enabled. A thread tries to obtain the lock quickly and fails, _mtx_lock_sleep() is called, it gets the address of the mutex's current owner thread and checks whether that owner thread is running (on another CPU). How does _mtx_lock_sleep() know that that thread still exists (lines 311-337 in kern_mutex.c)? When adaptive mutexes were implemented there was explicit locking around the adaptive mutex code. When turnstiles were introduced in the mutex code, that locking logic was changed. 3. Why is there no memory barrier in mtx_init()? 
If another thread (on another CPU) finds that mutex is initialized using mtx_initialized() then it can mtx_lock() it and mtx_lock() it second time, as a result mtx_recurse field will be increased, but its value still can be uninitialized on architecture with relaxed memory ordering model. Thanks. From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 15:46:03 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEC83106566C for ; Wed, 15 Sep 2010 15:46:03 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 96CB28FC14 for ; Wed, 15 Sep 2010 15:46:03 +0000 (UTC) Received: by iwn34 with SMTP id 34so278483iwn.13 for ; Wed, 15 Sep 2010 08:46:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=4F657KLvZXEkdXaD6MM/T61VWuvtyoffYvYV9GE9Rro=; b=Jt+pH0aJUr0mG/5fzLhHMdFx2kKJEfeF4oqyBpE6dkd/eIDd2eEkddm9zsQCuCkKu8 fTA6doOsoDG90dvNQjOrTtHTxaLfphg0CsDV1wUv/VIKcZAXOvG6MCf4zgY0NtEmZ0sK gXCivjhIifG43MYmK1rkzgIWyPWsbk7t72XgU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=uLvQVTVq1HsnbPdYY5YBmvGvOF8Wsf7FMwTGZ0F1SYvO5mE6zpUp6jqwX9PptiehFk zNECRduuIa+CJ7WtCHcFUg1LrtK6HYNQXeEGbTTam7FsmeFvy5Pl5tB+4jHA+JPlfJXm +uL8iiBfRc10T3smFov5h8BgHltKwl2PsaSRA= MIME-Version: 1.0 Received: by 10.231.58.198 with SMTP id i6mr1922724ibh.43.1284565560750; Wed, 15 Sep 2010 08:46:00 -0700 (PDT) Received: by 10.231.130.34 with HTTP; Wed, 15 Sep 2010 08:46:00 -0700 (PDT) In-Reply-To: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> References: 
<20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> Date: Wed, 15 Sep 2010 08:46:00 -0700 Message-ID: From: Matthew Fleming To: Andrey Simonenko Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 15:46:03 -0000 I'll take a stab at answering these... On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko wrote: > Hello, > > I have questions about mutex implementation in kern/kern_mutex.c > and sys/mutex.h files (current versions of these files): > > 1. Is the following statement correct for a volatile pointer or integer >   variable: if a volatile variable is updated by the compare-and-set >   instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current >   value of such variable can be read without any special instruction >   (e.g. v = val)? > >   I checked Assembler code for a function with "v = val" and "val = v" >   like statements generated for volatile variable and simple variable >   and found differences: on ia64 "v = val" was implemented by ld.acq and >   "val = v" was implemented by st.rel; on mips and sparc64 Assembler code >   can have different order of lines for volatile and simple variable >   (depends on the code of a function). I think this depends somewhat on the hardware and what you mean by "current" value. If you want a value that is not in-flux, then something like atomic_cmpset_ptr() setting to the current value is needed, so that you force any other atomic_cmpset to fail. 
However, since there is no explicit lock involved, there is no strong meaning for "current" value and a read that does not rely on a value cached in a register is likely sufficient. While the "volatile" keyword in C has no explicit hardware meaning, it often means that a load from memory (or, presumably, L1-L3 cache) is required. > 2. Let there is a default (sleep) mutex and adaptive mutexes is enabled. >   A thread tries to obtain lock quickly and fails, _mtx_lock_sleep() >   is called, it gets the address of the current mutex's owner thread >   and checks whether that owner thread is running (on another CPU). >   How does _mtx_lock_sleep() know that that thread still exists >   (lines 311-337 in kern_mutex.c)? > >   When adaptive mutexes was implemented there was explicit locking >   around adaptive mutexes code.  When turnstile in mutex code was >   implemented that's locking logic was changed. It appears that it's possible for the thread pointer to be recycled between fetching the value of owner and looking at TD_IS_RUNNING. On actual hardware, this race is unlikely to occur due to the time it takes for a thread to release a lock and perform all of thread exit code before the struct thread is returned to the uma zone. However, even once returned to the uma zone on many FreeBSD implementations the access is safe as the address of the thread is still dereferenceable, due to the implementation of uma zones. On e.g. AIX this issue was different because the address range for threads was determined at compile time (one giant table) and the array only grew, never shrank, so the thread pointer was always valid and would be recycled at first opportunity. It appears to me, from a strict correctness standpoint, that the use of uma_zalloc/uma_zfree for thread objects is not safe. But from a practical implementation POV, the unsafe access in kern_mutex.c will not cause trouble in the absence of a hypervisor controlling when virtual CPUs get runtime. > 3. 
Why there is no any memory barrier in mtx_init()?  If another thread >   (on another CPU) finds that mutex is initialized using mtx_initialized() >   then it can mtx_lock() it and mtx_lock() it second time, as a result >   mtx_recurse field will be increased, but its value still can be >   uninitialized on architecture with relaxed memory ordering model. It seems to me that it's generally a programming error to rely on the return of mtx_initialized(), as there is no serialization with e.g. a thread calling mtx_destroy(). A fully correct serialization model would require that a single thread initialize the mtx and then create any worker threads that will use the mtx. Cheers, matthew From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 18:54:32 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A23DD106566C for ; Wed, 15 Sep 2010 18:54:32 +0000 (UTC) (envelope-from PHeyman@adaranet.com) Received: from barracuda.adaranet.com (smtp.adaranet.com [72.5.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 80AAA8FC14 for ; Wed, 15 Sep 2010 18:54:32 +0000 (UTC) X-ASG-Debug-ID: 1284576871-506119a40001-P5m3U7 Received: from SJ-EXCH-1.adaranet.com ([10.10.1.29]) by barracuda.adaranet.com with ESMTP id P3HKNALWk1viV6wp for ; Wed, 15 Sep 2010 11:54:31 -0700 (PDT) X-Barracuda-Envelope-From: PHeyman@adaranet.com Received: from SJ-EXCH-1.adaranet.com ([fe80::7042:d8c2:5973:c523]) by SJ-EXCH-1.adaranet.com ([fe80::7042:d8c2:5973:c523%14]) with mapi; Wed, 15 Sep 2010 11:54:31 -0700 From: Paul Heyman X-Barracuda-BBL-IP: fe80::7042:d8c2:5973:c523 X-Barracuda-RBL-IP: fe80::7042:d8c2:5973:c523 To: "freebsd-hackers@freebsd.org" Date: Wed, 15 Sep 2010 11:53:16 -0700 X-ASG-Orig-Subj: Crash dump on HP Proliant G6 broken as of V8.0 Thread-Topic: Crash dump on HP Proliant G6 broken as of V8.0 Thread-Index: AQHLVGJZfuEva5LEmEq7OF4hw9IBSpMTQSawgAAYjsOAAAwKSg== 
Message-ID: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A9F@SJ-EXCH-1.adaranet.com> References: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A95@SJ-EXCH-1.adaranet.com>, <32AB5C9615CC494997D9ABB1DB12783C024C8DE83F@SJ-EXCH-1.adaranet.com>, <32AB5C9615CC494997D9ABB1DB12783C024C8C5A9C@SJ-EXCH-1.adaranet.com> In-Reply-To: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A9C@SJ-EXCH-1.adaranet.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Barracuda-Connect: UNKNOWN[10.10.1.29] X-Barracuda-Start-Time: 1284576871 X-Barracuda-URL: http://172.16.10.203:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at adaranet.com Cc: Patrick Mahan Subject: Crash dump on HP Proliant G6 broken as of V8.0 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 18:54:32 -0000 ALL, The crash dump worked fine in V7.3. I am debugging a crash dump problem on an HP Proliant G6 which uses a SATA drive connected to a CISS RAID controller. I have tried this on an x86 box using a non-RAID ATA/SATA disk controller and it works well. I noticed that in V8.0 there is a new SCSI operating method. In the v7.3 version there was only CISS_TRANSPORT_METHOD_SIMPLE, but in v8.0 a CISS_TRANSPORT_METHOD_PERF method has been added. These methods have different function calls in ciss_poll_request. The dump command starts with a call to dadump. This function will set up a struct ccb_scsiio structure. This is done by calling scsi_read_write. Then the meat of the dump happens when it calls xpt_polled_action, which manages and simulates interrupt functionality that is working fine. The disk operations work fine except during a crash dump. 
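The polled path described above (queue a request, then spin calling a poll routine instead of waiting for an interrupt) can be illustrated with a minimal sketch. This is not the CAM or ciss code: `Request`, `fake_poll()` and `polled_wait()` are hypothetical names invented for the illustration, and the limit is counted in poll attempts rather than in real time, which is an assumption made to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical I/O request; not a real CAM structure."""
    polls_until_done: int  # simulated latency in poll calls; 0 means "never completes"
    done: bool = False


def fake_poll(rq: Request) -> None:
    """Stand-in for a driver poll routine: notices a completion without an interrupt."""
    if rq.polls_until_done > 0:
        rq.polls_until_done -= 1
        if rq.polls_until_done == 0:
            rq.done = True


def polled_wait(rq: Request, max_polls: int) -> bool:
    """Poll until the request completes or max_polls attempts are used up.

    Returns True on completion, False on timeout.
    """
    for _ in range(max_polls):
        fake_poll(rq)
        if rq.done:
            return True
    return False


if __name__ == "__main__":
    print(polled_wait(Request(polls_until_done=3), max_polls=10))  # completes
    print(polled_wait(Request(polls_until_done=0), max_polls=10))  # times out
```

In a sketch like this, a request whose completion is never observed by the poll routine can only end through the attempt limit, which is why the poll routine and the completion bookkeeping have to agree on what "done" means.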
I have turned debug on for CISS and CAMDEBUG to debug this problem.

In xpt_polled_action (cam_xpt.c) we get past the first polling loop at line 3013, as both devq->send_opening and dev->ccbq.dev_openings are > 0 (256 and 254). But we do get stuck in the second one at line 3025. We eventually time out setting start_ccb->ccb_h.status to CAM_CMD_TIMEOUT. The timeout is set with DA_DEFAULT_TIMEOUT (scsi_da.c) which is set to 60, and is used in the call to scsi_read_write.

Here is the debug trace:

Dumping 1240 MB:
ciss_cam_action_io: XPT_SCSI_IO 0:0:0
ciss_get_request: called
ciss_start: post command 150 tag 600
ciss_map_request: called
ciss_request_map_helper: called
ciss_cam_poll: called
ciss_perf_done: completed command 150
ciss_perf_done: completed command 150
ciss_complete: called
ciss_unmap_request: called
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
ciss_release_request: called
ciss_complete: called
ciss_unmap_request: called
ciss0: WARNING: completing non-busy request
ciss_cam_complete: called
_ciss_report_request: called
ciss_cam_complete: SCSI_STATUS_OK
. . . . after about 60 seconds
ciss0: WARNING: completing non-busy request
ciss0: WARNING: completed command with no submitter
ciss_unmap_request: called
. . .
This goes on forever Thanks Paul Paul Heyman pheyman@adaranetworks.com From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 19:09:57 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6649D1065696 for ; Wed, 15 Sep 2010 19:09:54 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C3C8B8FC1A for ; Wed, 15 Sep 2010 19:09:53 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 4AA5A46C20; Wed, 15 Sep 2010 15:09:53 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 4FECA8A04F; Wed, 15 Sep 2010 15:09:52 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Wed, 15 Sep 2010 15:09:49 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009151509.49728.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 15 Sep 2010 15:09:52 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: cronfy Subject: Re: is vfs.lookup_shared unsafe in 7.3? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 19:09:57 -0000 On Monday, September 13, 2010 4:57:15 pm cronfy wrote: > Hello, > > Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA > > 40, very slow lstat() at these moments, looks like some kind of lock > contention) I enabled vfs.lookup_shared=1 on two servers today. One is > FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is > FreeBSD-7.3 csup'ed and built Jul 16 2010. > > The server with more fresh kernel is running nice and does not show > high load anymore. But on the second server it did not help. More, > after a few hours of work with vfs.lookup_shared=1 I noticed processes > stucked in "ufs" state. I tried to kill them with no luck. Disabling > vfs.lookup_shared freezed the whole system. > > So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable > between 16 Jul and 9 Sep (is it the reason why first system is still > running?), or should I expect that it will freeze in a near time too? > > Thanks in advance! No, 7.3 has a bug that can cause these hangs that is probably made worse by vfs.lookup_shared=1, but can occur even if it is disabled. You want these fixes applied (in order, one of them reverts part of another): Author: jhb Date: Fri Jul 16 20:23:24 2010 New Revision: 210173 URL: http://svn.freebsd.org/changeset/base/210173 Log: When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. 
If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case. The softupdates code also toggles LK_NOSHARE in one function to close a race with snapshots. Fix this code to grab the interlock while fiddling with lk_flags. Modified: stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c stable/7/sys/fs/cd9660/cd9660_vfsops.c stable/7/sys/fs/udf/udf_vfsops.c stable/7/sys/ufs/ffs/ffs_softdep.c stable/7/sys/ufs/ffs/ffs_vfsops.c Author: jhb Date: Fri Aug 20 20:33:13 2010 New Revision: 211532 URL: http://svn.freebsd.org/changeset/base/211532 Log: MFC: Use VN_LOCK_AREC() and VN_LOCK_ASHARE() rather than manipulating lockmgr lock flags directly. Modified: stable/7/sys/fs/nwfs/nwfs_node.c stable/7/sys/fs/pseudofs/pseudofs_vncache.c stable/7/sys/fs/smbfs/smbfs_node.c stable/7/sys/gnu/fs/xfs/FreeBSD/xfs_freebsd_iget.c stable/7/sys/kern/vfs_lookup.c Author: jhb Date: Fri Aug 20 20:58:57 2010 New Revision: 211533 URL: http://svn.freebsd.org/changeset/base/211533 Log: Revert 210173 as it did not properly fix the bug. It assumed that the VI_LOCK() for a given vnode was used as the internal interlock for that vnode's v_lock lockmgr lock. This is not the case. Instead, add dedicated routines to toggle the LK_NOSHARE and LK_CANRECURSE flags. These routines lock the lockmgr lock's internal interlock to synchronize the updates to the flags member with other threads attempting to acquire the lock. The VN_LOCK_A*() macros now invoke these routines, and the softupdates code uses these routines to temporarly enable recursion on buffer locks. 
Reviewed by: kib Modified: stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c stable/7/sys/fs/cd9660/cd9660_vfsops.c stable/7/sys/fs/udf/udf_vfsops.c stable/7/sys/kern/kern_lock.c stable/7/sys/sys/lockmgr.h stable/7/sys/sys/vnode.h stable/7/sys/ufs/ffs/ffs_softdep.c stable/7/sys/ufs/ffs/ffs_vfsops.c -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 19:22:17 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D89BF106567A for ; Wed, 15 Sep 2010 19:22:17 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8ED698FC12 for ; Wed, 15 Sep 2010 19:22:17 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2D61B46C20; Wed, 15 Sep 2010 15:22:17 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 2B0808A03C; Wed, 15 Sep 2010 15:22:16 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Wed, 15 Sep 2010 15:22:15 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009151522.15593.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 15 Sep 2010 15:22:16 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Andrey Simonenko , Matthew Fleming Subject: Re: 
Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 19:22:17 -0000 On Wednesday, September 15, 2010 11:46:00 am Matthew Fleming wrote: > I'll take a stab at answering these... > > On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko > wrote: > > Hello, > > > > I have questions about mutex implementation in kern/kern_mutex.c > > and sys/mutex.h files (current versions of these files): > > > > 1. Is the following statement correct for a volatile pointer or integer > > variable: if a volatile variable is updated by the compare-and-set > > instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current > > value of such variable can be read without any special instruction > > (e.g. v = val)? > > > > I checked Assembler code for a function with "v = val" and "val = v" > > like statements generated for volatile variable and simple variable > > and found differences: on ia64 "v = val" was implemented by ld.acq and > > "val = v" was implemented by st.rel; on mips and sparc64 Assembler code > > can have different order of lines for volatile and simple variable > > (depends on the code of a function). > > I think this depends somewhat on the hardware and what you mean by > "current" value. > > If you want a value that is not in-flux, then something like > atomic_cmpset_ptr() setting to the current value is needed, so that > you force any other atomic_cmpset to fail. However, since there is no > explicit lock involved, there is no strong meaning for "current" value > and a read that does not rely on a value cached in a register is > likely sufficient. While the "volatile" keyword in C has no explicit > hardware meaning, it often means that a load from memory (or, > presumably, L1-L3 cache) is required. 
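As a rough userland illustration of the point made above (a plain read only yields a snapshot of the variable, and it is the compare-and-set that detects whether that snapshot has gone stale), here is a minimal sketch using C11 atomics. It does not use the kernel's atomic_cmpset_ptr() and is not taken from kern_mutex.c; the lock word layout, LOCK_UNOWNED and try_acquire() are hypothetical names made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 means unowned, anything else is an owner id. */
#define LOCK_UNOWNED ((uintptr_t)0)

/*
 * Try to take the lock for `self`.  Returns true on success.  On
 * failure, *seen holds the value that was observed in the lock word.
 */
static bool
try_acquire(_Atomic uintptr_t *lock, uintptr_t self, uintptr_t *seen)
{
	/* Plain snapshot; it may already be stale when we act on it. */
	uintptr_t v = atomic_load_explicit(lock, memory_order_relaxed);

	if (v != LOCK_UNOWNED) {
		*seen = v;
		return (false);
	}
	/* The CAS succeeds only if the word still equals the snapshot. */
	if (atomic_compare_exchange_strong_explicit(lock, &v, self,
	    memory_order_acquire, memory_order_relaxed))
		return (true);
	*seen = v;	/* a failed CAS stores the current value in v */
	return (false);
}
```

A caller that gets false back would normally re-read and retry (or block), which is why a stale snapshot is harmless here: the worst case is one extra trip around the loop.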
Actually, all we care about is getting a consistent snapshot of the value of the lock cookie at some point in time. For that 'v = val' works fine. The value may certainly be stale, but the mutex code handles these races in two ways:

1) If MTX_CONTESTED is not set, then the lock cookie value can change at any time to either be unlocked, locked by another thread, or to become contested. If any of those actions occur, then the attempt to set the MTX_CONTESTED bit via atomic_cmpset() in _mtx_lock_sleep() will fail, causing the code to retry its loop until it successfully sets MTX_CONTESTED or it notices a different lock cookie state.

2) Once MTX_CONTESTED is set, the value of the lock cookie will not be changed unless the associated turnstile chain is locked. This means that once we have locked the turnstile chain and verified that MTX_CONTESTED is set (or successfully set the bit), we can call turnstile_wait() to block, assured that the owner of the lock will resume this thread via turnstile_wakeup() when it releases the lock.

> > 2. Let there is a default (sleep) mutex and adaptive mutexes is enabled. > > A thread tries to obtain lock quickly and fails, _mtx_lock_sleep() > > is called, it gets the address of the current mutex's owner thread > > and checks whether that owner thread is running (on another CPU). > > How does _mtx_lock_sleep() know that that thread still exists > > (lines 311-337 in kern_mutex.c)? > > > > When adaptive mutexes was implemented there was explicit locking > > around adaptive mutexes code. When turnstile in mutex code was > > implemented that's locking logic was changed. > > It appears that it's possible for the thread pointer to be recycled > between fetching the value of owner and looking at TD_IS_RUNNING. On > actual hardware, this race is unlikely to occur due to the time it > takes for a thread to release a lock and perform all of thread exit > code before the struct thread is returned to the uma zone.
However, > even once returned to the uma zone on many FreeBSD implementations the > access is safe as the address of the thread is still dereferenceable, > due to the implementation of uma zones. > > On e.g. AIX this issue was different because the address range for > threads was determined at compile time (one giant table) and the array > only grew, never shrank, so the thread pointer was always valid and > would be recycled at first opportunity. > > It appears to me, from a strict correctness standpoint, that the use > of uma_zalloc/uma_zfree for thread objects is not safe. But from a > practical implementation POV, the unsafe access in kern_mutex.c will > not cause trouble in the absence of a hypervisor controlling when > virtual CPUs get runtime. Yes, it is a known "accepted" race. This does probably warrant a comment to say as much. One could perhaps remove the race by using the owning thread's td_cpu to do a pcpu_find() and comparing pc_curthread against the cached 'owner' value instead. I think even in that case you can still be subject to the same theoretical race however if a HV prevented you from running in between setting 'owner' and dereferencing 'owner->td_oncpu'. However, I might actually prefer switching to the 'pc_curthread' approach only because it does less work on each spin. > > 3. Why there is no any memory barrier in mtx_init()? If another thread > > (on another CPU) finds that mutex is initialized using mtx_initialized() > > then it can mtx_lock() it and mtx_lock() it second time, as a result > > mtx_recurse field will be increased, but its value still can be > > uninitialized on architecture with relaxed memory ordering model. > > It seems to me that it's generally a programming error to rely on the > return of mtx_initialized(), as there is no serialization with e.g. a > thread calling mtx_destroy(). 
A fully correct serialization model > would require that a single thread initialize the mtx and then create > any worker threads that will use the mtx. Yes, it is the caller's job to not expose a mtx until after it has been initialized. A memory barrier in mtx_init() can't solve all those races. If you put an object containing a mutex on a global queue and only invoke mtx_init() after dropping the global lock protecting the global queue, no amount of memory barriers in mtx_init() will save you. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 20:24:50 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A305B1065672; Wed, 15 Sep 2010 20:24:50 +0000 (UTC) (envelope-from nox@jelal.kn-bremen.de) Received: from smtp.kn-bremen.de (gelbbaer.kn-bremen.de [78.46.108.116]) by mx1.freebsd.org (Postfix) with ESMTP id 62EC48FC15; Wed, 15 Sep 2010 20:24:50 +0000 (UTC) Received: by smtp.kn-bremen.de (Postfix, from userid 10) id 306C21E007A9; Wed, 15 Sep 2010 22:07:18 +0200 (CEST) Received: from triton8.kn-bremen.de (noident@localhost [127.0.0.1]) by triton8.kn-bremen.de (8.14.4/8.14.3) with ESMTP id o8FK6ZpK039590; Wed, 15 Sep 2010 22:06:35 +0200 (CEST) (envelope-from nox@triton8.kn-bremen.de) Received: (from nox@localhost) by triton8.kn-bremen.de (8.14.4/8.14.3/Submit) id o8FK6ZUR039589; Wed, 15 Sep 2010 22:06:35 +0200 (CEST) (envelope-from nox) From: Juergen Lock Date: Wed, 15 Sep 2010 22:06:34 +0200 To: hackers@freebsd.org Message-ID: <20100915200634.GA38314@triton8.kn-bremen.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) X-Mailman-Approved-At: Wed, 15 Sep 2010 20:41:01 +0000 Cc: Doug Rabson Subject: So I got "The D Programming Language" (and: threaded .xz compression) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 20:24:50 -0000 (that's this book: http://amazon.com/exec/obidos/ASIN/0321635361/modecdesi-20 Author's homepage: http://erdani.com/ ) ...and finally played with the language a bit. I've posted some notes about getting dmd 2.048 running on FreeBSD (that's the D 2.0 compiler + runtime + phobos libs), debugging with gdb head and Doug Rabson's D-aware debugger ngdb [1], and my first (useful) hack, a threaded .xz compressor, here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=117243 (more links in there; also read the followups...) [1] ngdb announce message: http://lists.freebsd.org/pipermail/freebsd-current/2009-August/011071.html Cheeers, Juergen From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 15 21:43:32 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EECF3106564A; Wed, 15 Sep 2010 21:43:32 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 65A7C8FC08; Wed, 15 Sep 2010 21:43:32 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8FLhF2c022234; Wed, 15 Sep 2010 23:43:30 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8FLhE9p022233; Wed, 15 Sep 2010 23:43:14 +0200 (CEST) (envelope-from olli) Date: Wed, 15 Sep 2010 23:43:14 +0200 (CEST) Message-Id: <201009152143.o8FLhE9p022233@lurza.secnetix.de> From: Oliver Fromme To: freebsd-hackers@FreeBSD.ORG, wblock@wonkity.com, mav@FreeBSD.ORG In-Reply-To: X-Newsgroups: list.freebsd-hackers User-Agent: tin/1.8.3-20070201 ("Scotasay") (UNIX) 
(FreeBSD/6.4-PRERELEASE-20080904 (i386)) MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Wed, 15 Sep 2010 23:43:31 +0200 (CEST) Cc: Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 15 Sep 2010 21:43:33 -0000 Warren Block wrote: > [...] > 8. Alexander Motin has an updated CAM version of the ATA system which > will eventually replace the existing one. In -CURRENT, anyway. He was > kind enough to look at my event handler. My understanding is that he is > looking at implementing the head parking/standby mechanism in that new > code. The patch below will work with the new CAM ATA driver (i.e. ada(4) disks). It adds a sysctl, so you can switch the spin-down off if you're going to just reboot: # sysctl kern.cam.ada.spindown_shutdown=0 This patch applies to stable/8, but I think it should work with current, too (I haven't tried because I don't have a machine running HEAD that has ada(4) disks). Best regards Oliver --- ata_da.c.orig 2010-05-23 18:16:33.000000000 +0200 +++ ata_da.c 2010-09-15 22:48:03.000000000 +0200 @@ -79,7 +79,8 @@ ADA_FLAG_CAN_TRIM = 0x080, ADA_FLAG_OPEN = 0x100, ADA_FLAG_SCTX_INIT = 0x200, - ADA_FLAG_CAN_CFA = 0x400 + ADA_FLAG_CAN_CFA = 0x400, + ADA_FLAG_CAN_POWERMGT = 0x800 } ada_flags; typedef enum { @@ -180,6 +181,10 @@ #define ADA_DEFAULT_SEND_ORDERED 1 #endif +#ifndef ADA_DEFAULT_SPINDOWN_SHUTDOWN +#define ADA_DEFAULT_SPINDOWN_SHUTDOWN 1 +#endif + /* * Most platforms map firmware geometry to actual, but some don't. If * not overridden, default to nothing. 
@@ -191,6 +196,7 @@ static int ada_retry_count = ADA_DEFAULT_RETRY; static int ada_default_timeout = ADA_DEFAULT_TIMEOUT; static int ada_send_ordered = ADA_DEFAULT_SEND_ORDERED; +static int ada_spindown_shutdown = ADA_DEFAULT_SPINDOWN_SHUTDOWN; SYSCTL_NODE(_kern_cam, OID_AUTO, ada, CTLFLAG_RD, 0, "CAM Direct Access Disk driver"); @@ -203,6 +209,9 @@ SYSCTL_INT(_kern_cam_ada, OID_AUTO, ada_send_ordered, CTLFLAG_RW, &ada_send_ordered, 0, "Send Ordered Tags"); TUNABLE_INT("kern.cam.ada.ada_send_ordered", &ada_send_ordered); +SYSCTL_INT(_kern_cam_ada, OID_AUTO, spindown_shutdown, CTLFLAG_RW, + &ada_spindown_shutdown, 0, "Spin down upon shutdown"); +TUNABLE_INT("kern.cam.ada.spindown_shutdown", &ada_spindown_shutdown); /* * ADA_ORDEREDTAG_INTERVAL determines how often, relative @@ -665,6 +674,8 @@ softc->flags |= ADA_FLAG_CAN_48BIT; if (cgd->ident_data.support.command2 & ATA_SUPPORT_FLUSHCACHE) softc->flags |= ADA_FLAG_CAN_FLUSHCACHE; + if (cgd->ident_data.support.command2 & ATA_SUPPORT_POWERMGT) + softc->flags |= ADA_FLAG_CAN_POWERMGT; if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ && cgd->inq_flags & SID_CmdQue) softc->flags |= ADA_FLAG_CAN_NCQ; @@ -1222,6 +1233,57 @@ /*getcount_only*/0); cam_periph_unlock(periph); } + + if (ada_spindown_shutdown == 0) + return; + + DELAY(500000); + + TAILQ_FOREACH(periph, &adadriver.units, unit_links) { + union ccb ccb; + + /* If we paniced with lock held - not recurse here. */ + if (cam_periph_owned(periph)) + continue; + cam_periph_lock(periph); + softc = (struct ada_softc *)periph->softc; + /* + * We only spin-down the drive if it is capable of it.. + */ + if ((softc->flags & ADA_FLAG_CAN_POWERMGT) == 0) { + cam_periph_unlock(periph); + continue; + } + + /* XXX Hide this behind bootverbose? 
*/ + xpt_print(periph->path, "spin-down\n"); + + xpt_setup_ccb(&ccb.ccb_h, periph->path, CAM_PRIORITY_NORMAL); + + ccb.ccb_h.ccb_state = ADA_CCB_DUMP; + cam_fill_ataio(&ccb.ataio, + 1, + adadone, + CAM_DIR_NONE, + 0, + NULL, + 0, + ada_default_timeout*1000); + + ata_28bit_cmd(&ccb.ataio, ATA_STANDBY_IMMEDIATE, 0, 0, 0); + xpt_polled_action(&ccb); + + if ((ccb.ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) + xpt_print(periph->path, "Spin-down disk failed\n"); + + if ((ccb.ccb_h.status & CAM_DEV_QFRZN) != 0) + cam_release_devq(ccb.ccb_h.path, + /*relsim_flags*/0, + /*reduction*/0, + /*timeout*/0, + /*getcount_only*/0); + cam_periph_unlock(periph); + } } #endif /* _KERNEL */ -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd Python is executable pseudocode. Perl is executable line noise. 
From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 00:12:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 78F42106564A for ; Thu, 16 Sep 2010 00:12:28 +0000 (UTC) (envelope-from PMahan@adaranet.com) Received: from barracuda.adaranet.com (smtp.adaranet.com [72.5.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 5CE288FC16 for ; Thu, 16 Sep 2010 00:12:28 +0000 (UTC) X-ASG-Debug-ID: 1284595010-50611bbc0001-P5m3U7 Received: from SJ-EXCH-1.adaranet.com ([10.10.1.29]) by barracuda.adaranet.com with ESMTP id EWSJLDgI0sj1uFlI for ; Wed, 15 Sep 2010 16:56:50 -0700 (PDT) X-Barracuda-Envelope-From: PMahan@adaranet.com Received: from mycroft.adaranet.com (10.10.24.100) by SJ-EXCH-1.adaranet.com (10.10.1.29) with Microsoft SMTP Server (TLS) id 8.1.240.5; Wed, 15 Sep 2010 16:56:49 -0700 Message-ID: <4C915E4F.9030006@adaranet.com> X-Barracuda-BBL-IP: nil Date: Wed, 15 Sep 2010 17:01:19 -0700 From: Patrick Mahan User-Agent: Thunderbird 2.0.0.23 (X11/20091021) MIME-Version: 1.0 To: X-ASG-Orig-Subj: odd issues with DDB vs GDB Content-Type: multipart/mixed; boundary="------------090702000608020704010105" X-Barracuda-Connect: UNKNOWN[10.10.1.29] X-Barracuda-Start-Time: 1284595010 X-Barracuda-URL: http://172.16.10.203:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at adaranet.com Subject: odd issues with DDB vs GDB X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 00:12:28 -0000 --------------090702000608020704010105 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit All, I am trying to debug a system hang occurring on my HP Proliant G6 running some of our kernel software. 
I am seeing that under certain test loads, the system will hang completely: no keyboard, no console, etc. I suspect it is some of the kernel code that I have inherited that contains a lot of locking (lots of data structures, each having their own mutex lock (sleepable)).

I rebuilt the kernel to include the following:

options KDB
options DDB
options GDB
options MUTEX_NOINLINE
options MUTEX_DEBUG
options WITNESS
options WITNESS_SKIPSPIN
options SW_WATCHDOG # Enable to force us into the debugger on a hang

This places me in the kernel DDB debugger. The backtrace shown by DDB makes a lot of sense; it is showing we are blocked in _mtx_lock_flags()+0x6f. Great, so I go to enable GDB -

db> gdb
Step to enter the remote GDB backend.
db> s
$T0510:a6f86c80fff*";thread:186c0;#62

gdb kernel.debug
Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) target remote 10.10.29.111:7028
Remote debugging using 10.10.29.111:7028
0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint
gdb>

So right away I am somewhat suspicious as it is showing me a completely different entry point.
DDB showed

Tracing pid 0 tid 100032 td 0xffffff0002668390
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c

But GDB is showing the above. A backtrace (bt) in GDB does not show the same stack signature. I have attached the complete log for those who are interested.

Is there a reason for the wide difference between DDB and GDB? Am I invoking gdb incorrectly?

Thanks for the education, as always!

Patrick

--------------090702000608020704010105
Content-Type: text/plain; name="kernel_debug_prob.txt" Content-Transfer-Encoding: 8bit Content-Disposition: inline; filename="kernel_debug_prob.txt"

Debugging a system hang. Enabled watchdog(4), built kernel with KDB, DDB and GDB. I am trying to debug this via remote GDB but what DDB shows for a stack trace and what GDB shows are two separate animals.

External serial port setup with the following in /boot/loader.conf

console="comconsole vidconsole"
comconsole_speed=9600
hint.uart.0.flags="0x90"

Serial is accessed via a cyclades ACS console server. 'telnet 10.10.29.111 70XX' where XX is the physical port number.
System comes up fine, testing is initiated, eventually the system hangs and the watchdog fires dropping us into DDB - DDB output db> trace Tracing pid 0 tid 100032 td 0xffffff0002668390 breakpoint() at breakpoint+0x5 kdb_enter() at kdb_enter+0x52 watchdog_fire() at watchdog_fire+0xda hardclock() at hardclock+0x73 lapic_handle_timer() at lapic_handle_timer+0x120 Xtimerint() at Xtimerint+0x8c --- interrupt, rip = 0xffffffff80688532, rsp = 0xffffff800011e460, rbp = 0xffffff800011e4c0 --- _mtx_lock_sleep() at _mtx_lock_sleep+0x92 _mtx_lock_flags() at _mtx_lock_flags+0x6f VCDgetWithIIFremote() at VCDgetWithIIFremote+0x3f ProcessDataPkt() at ProcessDataPkt+0x3dc ip_input() at ip_input+0xa24 netisr_dispatch_src() at netisr_dispatch_src+0xe3 netisr_dispatch() at netisr_dispatch+0x20 gif_input() at gif_input+0x324 in_gif_input() at in_gif_input+0x28f encap4_input() at encap4_input+0x1b8 ip_input() at ip_input+0xd1a netisr_dispatch_src() at netisr_dispatch_src+0xe3 netisr_dispatch() at netisr_dispatch+0x20 ether_demux() at ether_demux+0x1f3 ether_input() at ether_input+0x4ab em_rxeof() at em_rxeof+0x410 em_handle_que() at em_handle_que+0x6f taskqueue_run() at taskqueue_run+0xbb taskqueue_thread_loop() at taskqueue_thread_loop+0x33 fork_exit() at fork_exit+0xba fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffff800011ed30, rbp = 0 --- db>gdb Step to enter the remote GDB backend. db>s ^] telnet> quit # # Enter the debugger via remote gdb # gdb kernel.debug Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/ GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. 
This GDB was configured as "amd64-marcel-freebsd"... (gdb) target remote 10.10.29.111:7028 Remote debugging using 10.10.29.111:7028 0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361 warning: Unable to find dynamic linker breakpoint function. GDB will be unable to debug shared library initializers and track explicitly loaded dynamic code. warning: shared library handler failed to enable breakpoint (gdb) bt #0 0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361 #1 0xffffffff8064c4da in _cv_wait (cvp=0xffffff800011e340, lock=0xffffffff80a9cd1d) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_condvar.c:102 #2 0xffffffff8064bd33 in tvtohz (tv=0x2668390) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_clock.c:371 #3 0xffffffff80988cf0 in lapic_handle_timer (frame=0xffffff800011e3b0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/amd64/amd64/local_apic.c:792 #4 0xffffffff809816ac in Xinvlpg () at apic_vector.S:146 #5 0xffffff0107bff3a0 in ?? () #6 0xffffff0107bff3a0 in ?? () #7 0x0000000000000004 in ?? () #8 0xffffff0002668390 in ?? () #9 0x0000000000000943 in ?? () #10 0xffffff800011e5e4 in ?? () #11 0x0000000000000004 in ?? () #12 0xffffff0002668000 in ?? () #13 0xffffff800011e4c0 in ?? () #14 0x000000000afe0014 in ?? () #15 0x0000000000000006 in ?? 
()
#16 0xffffffff806dfd30 in taskqueue_thread_loop (arg=0xffffff0002668000) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:359
#17 0xffffffff8068811f in atomic_cmpset_long (dst=0x7bff300, exp=0xffffffff80a9bc70, src=0x9430011e530) at atomic.h:158
#18 0xffffffff8063b6cf in VAagingTimer (dummy=0xffffff0107bff388) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/ipr/virtual_circuits.c:2389
#19 0xffffffff8062998c in ProcessDataPkt (socklyr=0x0, iif=0xffffff010798de00, protocol=0x6, src_addr={s_addr = 0xafe0014}, dst_addr={s_addr = 0xafa001b}, src_port=0x1f90, dst_port=0x402, tcp_flags=0x12, pkt=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/ipr/mpvc_forward.c:227
#20 0xffffffff807aa6c4 in ip_input (m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/ip_input.c:1032
#21 0xffffffff80778d43 in netisr_dispatch_src (proto=0x1, source=0x0, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:934
#22 0xffffffff80779060 in netisr_start_swi (cpuid=0xffffff00, pc=0xffffffff8104eee0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:1034
#23 0xffffffff8076ff14 in gif_ioctl (ifp=0xffffff00026bd800, cmd=0x20011e790, data=0xffffffff8076ff14 "ÉÃfff\220ff\220ff\220UH\211åH\201ì\220") at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_gif.c:694
#24 0xffffffff8079af8f in gif_validate4 (ip=0xffffffff807a67f4, sc=0xffffff0003ad8700, ifp=0x1449ba01c0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/in_gif.c:396
#25 0xffffffff807a5c38 in encap6_input (mp=0xffffff0002668390, offp=0x1400000002, proto=0x4) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/netinet/ip_encap.c:206
#26 0xffffffff807aa9ba in __bswap16 (_x=0x0) at endian.h:135
#27 0xffffffff80778d43 in netisr_dispatch_src (proto=0x1, source=0x0, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:934
#28 0xffffffff80779060 in netisr_start_swi (cpuid=0xffffffff, pc=0xffffff800011ea10) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/netisr.c:1034
#29 0xffffffff8076bc83 in ether_demux (ifp=0xffffff00026f6800, m=0xffffff0003ad8700) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_ethersubr.c:911
#30 0xffffffff8076ba4b in ether_demux (ifp=0xffffff0003ad8700, m=0xffffff800011ead0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/net/if_ethersubr.c:778
#31 0xffffffff8038aa70 in em_rxeof (rxr=0xffffff0002719c00, count=0x63, done=0x0) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/dev/e1000/if_em.c:4188
#32 0xffffffff8038360f in em_handle_que (context=0xffffff80003fc000, pending=0x1) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/dev/e1000/if_em.c:1451
#33 0xffffffff806df78b in taskqueue_drain (queue=0xffffff80004006e0, task=0x100000001) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:256
#34 0xffffffff806dfd63 in taskqueue_thread_loop () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_taskqueue.c:375
#35 0x0000034380d52f40 in ?? ()
#36 0xffffff80004006e0 in ?? ()
#37 0xffffff0002711c00 in ?? ()
#38 0xffffff80004006e0 in ?? ()
#39 0xffffff800011ec70 in ?? ()
#40 0xffffffff8066b08a in fork_exit (callout=0xffffffff806df78b , arg=0xffffff800011ebc0, frame=0xffffff0002711c00) at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/kern_fork.c:856
Previous frame identical to this frame (corrupt stack?)

I also did an "info threads" (output omitted)

Here is thread 100032 as gdb sees it.

392 Thread 100032 0xffffffff806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361

while ddb saw

Tracing pid 0 tid 100032 td 0xffffff0002668390

Why can I not see the stack correctly in gdb?
--------------090702000608020704010105-- From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 00:49:02 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.ORG Received: by hub.freebsd.org (Postfix, from userid 1233) id 20EF2106567A; Thu, 16 Sep 2010 00:49:02 +0000 (UTC) Date: Thu, 16 Sep 2010 00:49:02 +0000 From: Alexander Best To: Oliver Fromme Message-ID: <20100916004902.GA46401@freebsd.org> References: <201009152143.o8FLhE9p022233@lurza.secnetix.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <201009152143.o8FLhE9p022233@lurza.secnetix.de> Cc: freebsd-hackers@FreeBSD.ORG, mav@FreeBSD.ORG Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 00:49:02 -0000 On Wed Sep 15 10, Oliver Fromme wrote: > Warren Block wrote: > > [...] > > 8. Alexander Motin has an updated CAM version of the ATA system which > > will eventually replace the existing one. In -CURRENT, anyway. He was > > kind enough to look at my event handler. My understanding is that he is > > looking at implementing the head parking/standby mechanism in that new > > code. > > The patch below will work with the new CAM ATA driver > (i.e. ada(4) disks). It adds a sysctl, so you can switch > the spin-down off if you're going to just reboot: > # sysctl kern.cam.ada.spindown_shutdown=0 i haven't tested your patch yet, but i don't think deciding whether to spin down the hdd should be decided merely from the sysctl value. the hdd should spindown when a shutdown has been issued and not spindown, if a reboot has been issued. 
either people have the sysctl set to 1, in which case a reboot will cause a spindown (which isn't healthy for the hdd)
...or people will set it to 0, in which case everything remains just the way it is.

imo the sysctl should stay, but should have a different meaning. if it is set to 1 (which should be the default) a shutdown will issue a spindown; a reboot won't.
if for some reason people want the current behavior back (no spindown even during a shutdown) they need to set it to 0.

deciding whether freebsd reboots or shuts down cannot be done from a script, since users might use the reboot or halt commands, in which case (if i'm not mistaken) all shutdown scripts get skipped.

cheers.
alex

>
> This patch applies to stable/8, but I think it should
> work with current, too (I haven't tried because I don't
> have a machine running HEAD that has ada(4) disks).
>
> Best regards
> Oliver
>
>
>
>
> --
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
> Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
> secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
> chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
>
> FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd
>
> Python is executable pseudocode. Perl is executable line noise.
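A minimal C sketch of the semantics proposed above, under stated assumptions. The flag names `SK_HOWTO_HALT` and `SK_HOWTO_POWEROFF` and the function `sk_should_spin_down` are hypothetical and exist only for illustration; they are not the reboot(2) constants and not part of the ada(4) patch. The sketch assumes the shutdown path can tell a halt or power-off apart from a plain reboot, and it treats halt like power-off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "howto" bits invented for this sketch (not the reboot(2) values). */
#define SK_HOWTO_REBOOT   0x0   /* plain reboot: no bit set */
#define SK_HOWTO_HALT     0x1
#define SK_HOWTO_POWEROFF 0x2

/*
 * sysctl_value == 0: never spin down (the behavior before the patch).
 * sysctl_value != 0: spin down only when halting or powering off,
 *                    never on a plain reboot.
 */
static bool
sk_should_spin_down(int sysctl_value, int howto)
{
    if (sysctl_value == 0)
        return (false);
    return ((howto & (SK_HOWTO_HALT | SK_HOWTO_POWEROFF)) != 0);
}
```

With the sysctl at its proposed default of 1, `sk_should_spin_down(1, SK_HOWTO_REBOOT)` is false and `sk_should_spin_down(1, SK_HOWTO_POWEROFF)` is true, so no wrapper script has to flip the value before a reboot.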
-- a13x From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 01:01:20 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id C7B9E1065679; Thu, 16 Sep 2010 01:01:20 +0000 (UTC) Date: Thu, 16 Sep 2010 01:01:20 +0000 From: Alexander Best To: freebsd-hackers@freebsd.org Message-ID: <20100916010120.GA49997@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="W/nzBZO5zC0uMSeA" Content-Disposition: inline Subject: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 01:01:20 -0000 --W/nzBZO5zC0uMSeA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline hi there, after discovering PR #114082 i noticed that with CPUTYPE not being defined in make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's empty. the following patch should take care of the problem. i also added the same logik to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, it should still introduce a cleaner behaviour. cheers. alex -- a13x --W/nzBZO5zC0uMSeA Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="bsd.cpu.mk-and-kern.pre.mk.diff" diff --git a/share/mk/bsd.cpu.mk b/share/mk/bsd.cpu.mk index e3ad18b..fa7fb32 100644 --- a/share/mk/bsd.cpu.mk +++ b/share/mk/bsd.cpu.mk @@ -6,6 +6,7 @@ .if !defined(CPUTYPE) || empty(CPUTYPE) _CPUCFLAGS = +NO_CPU_CFLAGS = . if ${MACHINE_ARCH} == "i386" MACHINE_CPU = i486 . 
elif ${MACHINE_ARCH} == "amd64" diff --git a/sys/conf/kern.pre.mk b/sys/conf/kern.pre.mk index d4bdc1f..9929176 100644 --- a/sys/conf/kern.pre.mk +++ b/sys/conf/kern.pre.mk @@ -23,6 +23,10 @@ NM?= nm OBJCOPY?= objcopy SIZE?= size +.if !defined(CPUTYPE) || empty(CPUTYPE) +_CPUCFLAGS = +NO_CPU_COPTFLAGS = +.endif .if ${CC:T:Micc} == "icc" COPTFLAGS?= -O .else --W/nzBZO5zC0uMSeA-- From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 01:12:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 60AB8106564A for ; Thu, 16 Sep 2010 01:12:28 +0000 (UTC) (envelope-from PMahan@adaranet.com) Received: from barracuda.adaranet.com (smtp.adaranet.com [72.5.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 481278FC08 for ; Thu, 16 Sep 2010 01:12:28 +0000 (UTC) X-ASG-Debug-ID: 1284595102-50611bbe0001-P5m3U7 Received: from SJ-EXCH-1.adaranet.com ([10.10.1.29]) by barracuda.adaranet.com with ESMTP id dvUHndTDQGjm2ENe for ; Wed, 15 Sep 2010 16:58:22 -0700 (PDT) X-Barracuda-Envelope-From: PMahan@adaranet.com Received: from mycroft.adaranet.com (10.10.24.100) by SJ-EXCH-1.adaranet.com (10.10.1.29) with Microsoft SMTP Server (TLS) id 8.1.240.5; Wed, 15 Sep 2010 16:58:22 -0700 Message-ID: <4C915EAC.8020509@adaranet.com> X-Barracuda-BBL-IP: nil Date: Wed, 15 Sep 2010 17:02:52 -0700 From: Patrick Mahan User-Agent: Thunderbird 2.0.0.23 (X11/20091021) MIME-Version: 1.0 To: X-ASG-Orig-Subj: [Fwd: Crash dump on HP Proliant G6 broken as of V8.0] Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: UNKNOWN[10.10.1.29] X-Barracuda-Start-Time: 1284595102 X-Barracuda-URL: http://172.16.10.203:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at adaranet.com Subject: [Fwd: Crash dump on HP Proliant G6 broken as of V8.0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 01:12:28 -0000

Forwarding for a colleague,

Patrick

-------- Original Message --------
Subject: Crash dump on HP Proliant G6 broken as of V8.0
Date: Wed, 15 Sep 2010 11:53:16 -0700
From: Paul Heyman
To: freebsd-hackers@freebsd.org
CC: Patrick Mahan
References: <32AB5C9615CC494997D9ABB1DB12783C024C8C5A95@SJ-EXCH-1.adaranet.com>,<32AB5C9615CC494997D9ABB1DB12783C024C8DE83F@SJ-EXCH-1.adaranet.com>,<32AB5C9615CC494997D9ABB1DB12783C024C8C5A9C@SJ-EXCH-1.adaranet.com>

ALL,

The crash dump worked fine in V7.3.

I am debugging a crash dump problem on an HP Proliant G6, which uses a SATA drive connected to a CISS RAID controller. I have tried this on an x86 box using a non-RAID ATA/SATA disk controller and it works well.

I noticed that in V8.0 there is a new SCSI operating method. In v7.3 there was only CISS_TRANSPORT_METHOD_SIMPLE, but in v8.0 a CISS_TRANSPORT_METHOD_PERF method has been added. These methods use different function calls in ciss_poll_request.

The dump command starts with a call to dadump. This function sets up a struct ccb_scsiio by calling scsi_read_write. The meat of the dump then happens when it calls xpt_polled_action, which manages and simulates interrupt functionality that is working fine. The disk operations work fine except during a crash dump.

I have turned debug on for CISS and CAMDEBUG to debug this problem. In xpt_polled_action (cam_xpt.c) we get past the first polling loop at line 3013, as both devq->send_opening and dev->ccbq.dev_openings are > 0 (256 and 254). But we do get stuck in the second one at line 3025. We eventually time out, setting start_ccb->ccb_h.status to CAM_CMD_TIMEOUT. The timeout is set with DA_DEFAULT_TIMEOUT (scsi_da.c), which is set to 60, and is used in the call to scsi_read_write.
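To make the shape of such a polled wait concrete, here is a small, self-contained C sketch. It is not the code in cam_xpt.c: the types and names (`sk_request`, `sk_poll`, `sk_polled_wait`) are hypothetical, and the "hardware" is simulated by a counter. It only illustrates the pattern described above, where the caller polls for completion and marks the request as timed out if the status never leaves the in-progress state.

```c
#include <assert.h>

enum sk_status { SK_INPROG, SK_COMPLETE, SK_TIMEOUT };

/* Hypothetical request; polls_until_done <= 0 means "never completes". */
struct sk_request {
    enum sk_status status;
    int polls_until_done;
};

/* Simulated controller poll: completes the request after N calls. */
static void
sk_poll(struct sk_request *req)
{
    if (req->polls_until_done > 0 && --req->polls_until_done == 0)
        req->status = SK_COMPLETE;
}

/* Poll until the status changes or the tick budget runs out. */
static enum sk_status
sk_polled_wait(struct sk_request *req, int timeout_ticks)
{
    while (timeout_ticks-- > 0) {
        sk_poll(req);
        if (req->status != SK_INPROG)
            return (req->status);
    }
    req->status = SK_TIMEOUT;   /* nobody updated the status in time */
    return (req->status);
}
```

In this model the wait returns `SK_TIMEOUT` whenever the poll routine never updates the status the loop is watching, which is one way a loop like this can spin for its whole budget even though the lower layer reports completions.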
Here is the debug trace: Dumping 1240 MB: ciss_cam_action_io: XPT_SCSI_IO 0:0:0 ciss_get_request: called ciss_start: post command 150 tag 600 ciss_map_request: called ciss_request_map_helper: called ciss_cam_poll: called ciss_perf_done: completed command 150 ciss_perf_done: completed command 150 ciss_complete: called ciss_unmap_request: called ciss_cam_complete: called _ciss_report_request: called ciss_cam_complete: SCSI_STATUS_OK ciss_release_request: called ciss_complete: called ciss_unmap_request: called ciss0: WARNING: completing non-busy request ciss_cam_complete: called _ciss_report_request: called ciss_cam_complete: SCSI_STATUS_OK . . . . after about 60 seconds ciss0: WARNING: completing non-busy request ciss0: WARNING: completed command with no submitter ciss_unmap_request: called . . . This goes on forever Thanks Paul Paul Heyman pheyman@adaranetworks.com From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 02:37:36 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DEEBF106566C; Thu, 16 Sep 2010 02:37:36 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 880148FC14; Thu, 16 Sep 2010 02:37:36 +0000 (UTC) Received: by iwn34 with SMTP id 34so774086iwn.13 for ; Wed, 15 Sep 2010 19:37:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=9IksdNRsHV2L49om2obd0n78exIv00WjA9NH9KFrdhI=; b=K2iRRGLdVIM3JfuU29yySB+pxQc6UNVJ5CKnuOkpBqibln5vakfX0Ujd0z4cC0+IMg eV26FplHn2VSvSwGs7k0kccwzk6iow534HPYrVGV+4tMVSwdFil5zvxY5MFVFMRWHin3 g9RXfdtPFOC47kxIf7nLzcxwdtFS5LNXWQ338= DomainKey-Signature: a=rsa-sha1; c=nofws; 
d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=VsHJ6HVq3oeWiO6F+ax7oy9E5OJNI8gAadm/lSlU5Vn3xryfpHczj6GN39RTezuo4P ccOS/O6dr0KprtclUjJG0lZ9NZUBDrbP59sMuXdBwl8pLafvGQUtcKPy39av9SF9eM1i swKx4m9NE8EoehvPkXeEN0VnSXOROiCl9RDEw= MIME-Version: 1.0 Received: by 10.231.152.143 with SMTP id g15mr2684794ibw.76.1284604655652; Wed, 15 Sep 2010 19:37:35 -0700 (PDT) Sender: yanegomi@gmail.com Received: by 10.231.11.133 with HTTP; Wed, 15 Sep 2010 19:37:35 -0700 (PDT) In-Reply-To: <20100916004902.GA46401@freebsd.org> References: <201009152143.o8FLhE9p022233@lurza.secnetix.de> <20100916004902.GA46401@freebsd.org> Date: Wed, 15 Sep 2010 19:37:35 -0700 X-Google-Sender-Auth: jIrTdal_NsRjXlp-o-lqsIx6KQU Message-ID: From: Garrett Cooper To: Alexander Best Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org, mav@freebsd.org, Oliver Fromme Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 02:37:37 -0000

On Wed, Sep 15, 2010 at 5:49 PM, Alexander Best wrote:
> On Wed Sep 15 10, Oliver Fromme wrote:
>> Warren Block wrote:
>>  > [...]
>>  > 8. Alexander Motin has an updated CAM version of the ATA system which
>>  > will eventually replace the existing one.  In -CURRENT, anyway.  He was
>>  > kind enough to look at my event handler.  My understanding is that he is
>>  > looking at implementing the head parking/standby mechanism in that new
>>  > code.
>>
>> The patch below will work with the new CAM ATA driver
>> (i.e. ada(4) disks).  It adds a sysctl, so you can switch
>> the spin-down off if you're going to just reboot:
>> # sysctl kern.cam.ada.spindown_shutdown=0
>
> i haven't tested your patch yet, but i don't think deciding whether to spin
> down the hdd should be decided merely from the sysctl value.
>
> the hdd should spindown when a shutdown has been issued and not spindown,
> if a reboot has been issued.
>
> either people have the sysctl set to 1 in which case a reboot will cause a
> spindown (which isn't healthy for the hdd)
> ...or people will set it to 0 in which case everything remains just the way it
> is.
>
> imo the sysctl should stay, but shuld have a different meaning. if it is set to
> 1 (which should be the default) a shutdown will issue a spindown; a reboot
> won't.
> if for some reason people want back the current behavior (no spindown even
> during a shutdown) they need to set it to 0.

Agreed. Spinning down at reboot isn't smart and seems like a good way to kill a disk quicker.

> deciding whether freebsd reboots or shuts down cannot be done from a script,
> since users might use the reboot or halt commands in which case (if i'm not
> mistaken) all shutdown scripts get skipped.

I'm not so sure of that statement, in particular because halt(8), reboot(8), and shutdown(8) send SIGTERM to processes (unless you use halt -q / reboot -q ... there might be some other scenarios I'm not envisioning here).
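For readers wondering what a process can do with that SIGTERM, here is a minimal POSIX C sketch of a handler that records the signal so the main loop can clean up. The names `sk_got_sigterm` and `sk_install_sigterm_handler` are hypothetical. Whether the signal is actually delivered on every halt or reboot path is exactly the point under discussion, so treat delivery as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler; only sig_atomic_t is safe to touch there. */
static volatile sig_atomic_t sk_got_sigterm = 0;

static void
sk_on_sigterm(int signo)
{
    (void)signo;
    sk_got_sigterm = 1;
}

/* Install the handler; returns 0 on success, -1 on failure, like sigaction(2). */
static int
sk_install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sk_on_sigterm;
    sigemptyset(&sa.sa_mask);
    return (sigaction(SIGTERM, &sa, NULL));
}
```

A daemon would call `sk_install_sigterm_handler()` once at startup and check `sk_got_sigterm` in its main loop, doing its own cleanup there rather than relying on a shutdown script being run.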
Thanks, -Garrett From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 07:17:54 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9015E1065670; Thu, 16 Sep 2010 07:17:54 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 4EC888FC19; Thu, 16 Sep 2010 07:17:53 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id 9F7D91FFC34; Thu, 16 Sep 2010 07:17:52 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id 70D5884550; Thu, 16 Sep 2010 09:17:52 +0200 (CEST) From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= To: Garrett Cooper References: <201009152143.o8FLhE9p022233@lurza.secnetix.de> <20100916004902.GA46401@freebsd.org> Date: Thu, 16 Sep 2010 09:17:52 +0200 In-Reply-To: (Garrett Cooper's message of "Wed, 15 Sep 2010 19:37:35 -0700") Message-ID: <86mxri17j3.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Alexander Best , mav@freebsd.org, Oliver Fromme , freebsd-hackers@freebsd.org Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 07:17:54 -0000 Garrett Cooper writes: > Agreed. Spinning down at reboot isn't smart and seems like a good way > to kill a disk quicker. *not* spinning down at halt is far worse. Most modern disks are rated for hundreds of thousands of load-unload cycles, but far fewer emergency unloads (which is what happens when the drive loses power while still spinning). 
DES --=20 Dag-Erling Sm=C3=B8rgrav - des@des.no From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 07:54:19 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 292FA1065673 for ; Thu, 16 Sep 2010 07:54:19 +0000 (UTC) (envelope-from cronfy@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id 828AB8FC1B for ; Thu, 16 Sep 2010 07:54:18 +0000 (UTC) Received: by bwz15 with SMTP id 15so1692964bwz.13 for ; Thu, 16 Sep 2010 00:54:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:received:in-reply-to :references:from:date:message-id:subject:to:cc:content-type :content-transfer-encoding; bh=OYHEdS863mPnBUZrwgDQFTXSraAuIWi8EB+7DDTN0Lo=; b=jnVYTQYFL7fiuHbzo84ckt3cSxArUrYUejMpKRxHuWQmT1MlH/12wCFkFw7lSKxIj9 To/tcLuUC4fB1yIKjobXZasTcKllZ8sh9ItqZl+LlkHIWWBxgfXETxcPxU8Mlx1wayTb ZEvt6yjhs/CPilWBlOm29SgIpqiUIsdsUJx6s= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; b=ILpALBYbfO1m0sLFVOjl6RMBzr8bFUFBjsBIu04Q+tbR4yNa0fjX/2u5BIcARLzbKo qlJD/1uK4DbgmVT2FHI9VGlv9VBFCMn/I0r8Ubcg+LBYl4KsZknA068T4lEm0uUu7up3 /ET5wEXpdpPwGifwlZJRx9oJeUP5F68t01sPs= Received: by 10.204.82.18 with SMTP id z18mr2231732bkk.125.1284623657408; Thu, 16 Sep 2010 00:54:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.204.99.197 with HTTP; Thu, 16 Sep 2010 00:53:47 -0700 (PDT) In-Reply-To: <201009151509.49728.jhb@freebsd.org> References: <201009151509.49728.jhb@freebsd.org> From: cronfy Date: Thu, 16 Sep 2010 11:53:47 +0400 Message-ID: To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: is vfs.lookup_shared unsafe in 7.3? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 07:54:19 -0000

>> Hello,
>>
>> Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA >
>> 40, very slow lstat() at these moments, looks like some kind of lock
>> contention) I enabled vfs.lookup_shared=1 on two servers today. One is
>> FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is
>> FreeBSD-7.3 csup'ed and built Jul 16 2010.
>>
>> The server with more fresh kernel is running nice and does not show
>> high load anymore. But on the second server it did not help. More,
>> after a few hours of work with vfs.lookup_shared=1 I noticed processes
>> stucked in "ufs" state. I tried to kill them with no luck. Disabling
>> vfs.lookup_shared freezed the whole system.
>>
>> So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable
>> between 16 Jul and 9 Sep (is it the reason why first system is still
>> running?), or should I expect that it will freeze in a near time too?
>>
>> Thanks in advance!
>
> No, 7.3 has a bug that can cause these hangs that is probably made worse by
> vfs.lookup_shared=1, but can occur even if it is disabled. You want
> these fixes applied (in order, one of them reverts part of another):

Thank you for the fix and for the explanation, that's exactly what I wanted to know. Just to be sure: do these patches completely fix the bug with hangs (even without vfs.lookup_shared=1)?
--=20 // cronfy From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 08:41:24 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEBC11065673; Thu, 16 Sep 2010 08:41:24 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 4DE158FC08; Thu, 16 Sep 2010 08:41:24 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8G8f7CZ047727; Thu, 16 Sep 2010 10:41:23 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8G8f7Q2047725; Thu, 16 Sep 2010 10:41:07 +0200 (CEST) (envelope-from olli) From: Oliver Fromme Message-Id: <201009160841.o8G8f7Q2047725@lurza.secnetix.de> To: arundel@FreeBSD.ORG (Alexander Best) Date: Thu, 16 Sep 2010 10:41:07 +0200 (CEST) In-Reply-To: <20100916004902.GA46401@freebsd.org> X-Mailer: ELM [version 2.5 PL8] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Thu, 16 Sep 2010 10:41:23 +0200 (CEST) Cc: freebsd-hackers@FreeBSD.ORG, mav@FreeBSD.ORG Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 08:41:25 -0000 Alexander Best wrote: > On Wed Sep 15 10, Oliver Fromme wrote: > > Warren Block wrote: > > > [...] > > > 8. Alexander Motin has an updated CAM version of the ATA system which > > > will eventually replace the existing one. In -CURRENT, anyway. 
He was
> > > kind enough to look at my event handler. My understanding is that he is
> > > looking at implementing the head parking/standby mechanism in that new
> > > code.
> >
> > The patch below will work with the new CAM ATA driver
> > (i.e. ada(4) disks). It adds a sysctl, so you can switch
> > the spin-down off if you're going to just reboot:
> > # sysctl kern.cam.ada.spindown_shutdown=0
>
> i haven't tested your patch yet, but i don't think deciding whether to spin
> down the hdd should be decided merely from the sysctl value.

It was the simplest and least intrusive way to introduce some means to switch it on and off. Of course there might be better ways to do it. You're welcome to submit your own patch.

> the hdd should spindown when a shutdown has been issued and not spindown,
> if a reboot has been issued.

Right. That's why my shutdown wrapper script sets the sysctl to 0 when the -r option is present (I've had that wrapper script for ages, for different reasons).

Also, there are cases where it is completely impossible to decide automatically whether the disks should be spun down or not. For example, if the admin issues a shutdown -h (halt), there's no way for the OS to know in advance whether the admin is going to switch the machine off or reboot to multi-user. So there must be a way for the user to forcibly enable/disable the spindown feature. I think a sysctl is the most appropriate way to do that, isn't it?

Actually, my plan is to have a mask of two bits for the sysctl (the default value would be 3):
- bit 0: enable (1) or disable (0) spindown
- bit 1: automatic (1) or manual (0) setting

With the default setting (i.e. bit 1 == 1), at shutdown time some facility would look at the reboot(2) "howto" flags and then set bit 0 to either 0 or 1. There are several ways to handle that. For example, init(8) could be modified to pass the "howto" value to rc.shutdown (which could be useful for other purposes, too). Then a standard rc.d script could handle the spindown sysctl. The advantage of that solution would be maximum flexibility, because the actual logic is implemented in an rc.d script.

> deciding whether freebsd reboots or shuts down cannot be done from a script,
> since users might use the reboot or halt commands in which case (if i'm not
> mistaken) all shutdown scripts get skipped.

Right, which is why it is a rather bad idea to use halt(8) or reboot(8), except in an emergency. Actually I think the manpages and handbook should strongly discourage it, and recommend using shutdown(8) or init(8) instead, both of which send a signal to PID 1 by default, so rc.shutdown is executed properly.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"If Java had true garbage collection, most programs would delete themselves upon execution."
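The two-bit mask proposed in the message above can be sketched in C as follows. All names here (`SK_SPINDOWN_ENABLE`, `SK_SPINDOWN_AUTO`, `SK_HOWTO_*`, `sk_resolve_mask`, `sk_spin_down`) are hypothetical, and the howto bits are placeholders, not the reboot(2) values. The policy that both halt and power-off count as "spin down" in automatic mode is an assumption made for the sketch; the message itself leaves that decision to an rc.d script.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_SPINDOWN_ENABLE 0x1  /* bit 0: spin down (1) or not (0) */
#define SK_SPINDOWN_AUTO   0x2  /* bit 1: automatic (1) or manual (0) */

/* Hypothetical "howto" bits invented for this sketch. */
#define SK_HOWTO_HALT      0x1
#define SK_HOWTO_POWEROFF  0x2

/*
 * Automatic mode: recompute bit 0 from the howto flags.
 * Manual mode: leave bit 0 exactly as the admin set it.
 */
static int
sk_resolve_mask(int mask, int howto)
{
    if ((mask & SK_SPINDOWN_AUTO) == 0)
        return (mask);
    if (howto & (SK_HOWTO_HALT | SK_HOWTO_POWEROFF))
        return (mask | SK_SPINDOWN_ENABLE);
    return (mask & ~SK_SPINDOWN_ENABLE);
}

/* Final decision the driver would act on. */
static bool
sk_spin_down(int mask, int howto)
{
    return ((sk_resolve_mask(mask, howto) & SK_SPINDOWN_ENABLE) != 0);
}
```

With the default mask of 3, a plain reboot resolves to 2 (no spindown) and a power-off resolves to 3 (spindown). A manual value of 1 or 0 forces the behavior regardless of how the system goes down.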
-- Robert Sewell From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 11:59:52 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6576F1065672 for ; Thu, 16 Sep 2010 11:59:52 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 32C408FC1E for ; Thu, 16 Sep 2010 11:59:52 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id C17A246B8A; Thu, 16 Sep 2010 07:59:51 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6EE038A03C; Thu, 16 Sep 2010 07:59:50 -0400 (EDT) From: John Baldwin To: cronfy Date: Thu, 16 Sep 2010 07:59:49 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <201009151509.49728.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009160759.49179.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 16 Sep 2010 07:59:50 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-hackers@freebsd.org Subject: Re: is vfs.lookup_shared unsafe in 7.3? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 11:59:52 -0000 On Thursday, September 16, 2010 3:53:47 am cronfy wrote: > >> Hello, > >> > >> Trying to overtake high server load (sudden peaks of 15%us/85%sy, LA > > >> 40, very slow lstat() at these moments, looks like some kind of lock > >> contention) I enabled vfs.lookup_shared=1 on two servers today. One is > >> FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and other is > >> FreeBSD-7.3 csup'ed and built Jul 16 2010. > >> > >> The server with more fresh kernel is running nice and does not show > >> high load anymore. But on the second server it did not help. More, > >> after a few hours of work with vfs.lookup_shared=1 I noticed processes > >> stucked in "ufs" state. I tried to kill them with no luck. Disabling > >> vfs.lookup_shared freezed the whole system. > >> > >> So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable > >> between 16 Jul and 9 Sep (is it the reason why first system is still > >> running?), or should I expect that it will freeze in a near time too? > >> > >> Thanks in advance! > > > > No, 7.3 has a bug that can cause these hangs that is probably made worse by > > vfs.lookup_shared=1, but can occur even if it is disabled. You want > > these fixes applied (in order, one of them reverts part of another): > > Thank you for the fix and for the explanation, that's exactly what I > wanted to know. Just to be sure: do these patches completely fix the > bug with hangs (even without vfs.lookup_shared=1)? Yes. 
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 12:38:02 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A128210656A5 for ; Thu, 16 Sep 2010 12:38:02 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 72E6F8FC19 for ; Thu, 16 Sep 2010 12:38:02 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 0B43B46B5C; Thu, 16 Sep 2010 08:38:02 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 323D38A03C; Thu, 16 Sep 2010 08:38:01 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Thu, 16 Sep 2010 08:15:18 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <4C915E4F.9030006@adaranet.com> In-Reply-To: <4C915E4F.9030006@adaranet.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201009160815.18679.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 16 Sep 2010 08:38:01 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Patrick Mahan Subject: Re: odd issues with DDB vs GDB X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 12:38:02 -0000 On Wednesday, September 15, 2010 8:01:19 
pm Patrick Mahan wrote: > All, > > I am trying to debug a system hang occurring on my HP Proliant G6 running some of our > kernel software. I am seeing that under certain test loads, the system will hang-up > complete, no keyboard, no console, etc. I suspect it is some of the kernel code that > I have inherited that contains a lot of locking (lots of data structure, each having > their own mutex lock (sleepable)). You need to use 'kgdb' rather than 'gdb' on kernel.debug. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 12:38:07 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5DA661065670; Thu, 16 Sep 2010 12:38:06 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5F75C8FC0C; Thu, 16 Sep 2010 12:38:06 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 01BE346B7E; Thu, 16 Sep 2010 08:38:06 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 85BDF8A04F; Thu, 16 Sep 2010 08:38:04 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Thu, 16 Sep 2010 08:22:24 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100916010120.GA49997@freebsd.org> In-Reply-To: <20100916010120.GA49997@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201009160822.24460.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 16 Sep 2010 08:38:05 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 
tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: Alexander Best Subject: Re: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 12:38:07 -0000 On Wednesday, September 15, 2010 9:01:20 pm Alexander Best wrote: > hi there, > > after discovering PR #114082 i noticed that with CPUTYPE not being defined in > make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. > the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's > empty. > > the following patch should take care of the problem. i also added the same > logic to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, > it should still introduce a cleaner behaviour. Does the trailing whitespace break anything? In the past we have had a non-empty default CPU CFLAGS (e.g. using '-mtune=pentiumpro' on i386 at one point IIRC) which this change would break. Unless the trailing whitespace is causing non-cosmetic problems I'd probably just leave it as it is. Also, if we were to go with this approach, I would not have changed kern.pre.mk at all, but set both NO_CPU_CFLAGS and NO_CPU_COPTFLAGS in bsd.cpu.mk when CPUTYPE was empty.
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 12:41:24 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 305861065698; Thu, 16 Sep 2010 12:41:24 +0000 (UTC) Date: Thu, 16 Sep 2010 12:41:24 +0000 From: Alexander Best To: John Baldwin Message-ID: <20100916124124.GA52106@freebsd.org> References: <20100916010120.GA49997@freebsd.org> <201009160822.24460.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201009160822.24460.jhb@freebsd.org> Cc: freebsd-hackers@freebsd.org Subject: Re: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 12:41:24 -0000 On Thu Sep 16 10, John Baldwin wrote: > On Wednesday, September 15, 2010 9:01:20 pm Alexander Best wrote: > > hi there, > > > > after discovering PR #114082 i noticed that with CPUTYPE not being defined in > > make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. > > the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's > > empty. > > > > the following patch should take care of the problem. i also added the same > > logic to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, > > it should still introduce a cleaner behaviour. > > Does the trailing whitespace break anything? In the past we have had a > non-empty default CPU CFLAGS (e.g. using '-mtune=pentiumpro' on i386 at one > point IIRC) which this change would break. Unless the trailing whitespace > is causing non-cosmetic problems I'd probably just leave it as it is.
the PR claims that a few ports are having problems with trailing whitespaces during ./configure, but personally i haven't experienced any problems. however i don't use the port system a lot so i'm not really able to comment on that. cheers. alex > > Also, if we were to go with this approach, I would not have changed > kern.pre.mk at all, but set both NO_CPU_CFLAGS and NO_CPU_COPTFLAGS in > bsd.cpu.mk when CPUTYPE was empty. > > -- > John Baldwin -- a13x From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 14:10:00 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 148871065672; Thu, 16 Sep 2010 14:10:00 +0000 (UTC) (envelope-from tijl@coosemans.org) Received: from mailrelay001.isp.belgacom.be (mailrelay001.isp.belgacom.be [195.238.6.51]) by mx1.freebsd.org (Postfix) with ESMTP id 7B5728FC08; Thu, 16 Sep 2010 14:09:59 +0000 (UTC) X-Belgacom-Dynamic: yes X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aj4FAIe7kUxbsVM9/2dsb2JhbACUMY1icsFXhUEE Received: from 61.83-177-91.adsl-dyn.isp.belgacom.be (HELO kalimero.tijl.coosemans.org) ([91.177.83.61]) by relay.skynet.be with ESMTP; 16 Sep 2010 15:40:09 +0200 Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org [127.0.0.1]) by kalimero.tijl.coosemans.org (8.14.4/8.14.4) with ESMTP id o8GDe8sH004587; Thu, 16 Sep 2010 15:40:08 +0200 (CEST) (envelope-from tijl@coosemans.org) From: Tijl Coosemans To: freebsd-hackers@freebsd.org Date: Thu, 16 Sep 2010 15:40:01 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.1-PRERELEASE; KDE/4.4.5; i386; ; ) References: <201009160841.o8G8f7Q2047725@lurza.secnetix.de> In-Reply-To: <201009160841.o8G8f7Q2047725@lurza.secnetix.de> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart10116848.qj6mstj0Su"; protocol="application/pgp-signature"; micalg=pgp-sha256 Content-Transfer-Encoding: 7bit Message-Id: 
<201009161540.08029.tijl@coosemans.org> Cc: Alexander Best , mav@freebsd.org, Oliver Fromme Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 14:10:00 -0000 --nextPart10116848.qj6mstj0Su Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable On Thursday 16 September 2010 10:41:07 Oliver Fromme wrote: > Alexander Best wrote: >> On Wed Sep 15 10, Oliver Fromme wrote: >>> The patch below will work with the new CAM ATA driver >>> (i.e. ada(4) disks). It adds a sysctl, so you can switch >>> the spin-down off if you're going to just reboot: >>> # sysctl kern.cam.ada.spindown_shutdown=0 >> >> the hdd should spindown when a shutdown has been issued and not spindown, >> if a reboot has been issued. > > Right. That's why my shutdown wrapper script sets the sysctl > to 0 when the -r option is present (I've got that wrapper > script for ages, for different reasons). > > Also, there are cases where it is completely impossible to > decide automatically whether the disks should be spun down > or not. For example, if the admin issues a shutdown -h > (halt), there's no way for the OS to know in advance whether > the admin is going to switch the machine off or reboot to > multi-user. So there must be a way for the user to forcibly > enable/disable the spindown feature. I think a sysctl is > the most appropriate way to do that, isn't it? I would just spin down the disk in case of a halt. An unwanted spin down is harmless compared to an emergency shutdown and usually the intention is to power off rather than reboot. Part of your patch modifies ada_shutdown.
That function already gets the reboot(2) howto flags passed to it, so you could test for (howto & (RB_HALT | RB_POWEROFF)) != 0 before issuing the STANDBY command. There's no need to make this more complicated with a sysctl that can override this in my opinion. Also command2 should be command1 in this line: + if (cgd->ident_data.support.command2 & ATA_SUPPORT_POWERMGT) --nextPart10116848.qj6mstj0Su Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iF4EABEIAAYFAkySHjcACgkQfoCS2CCgtispsgD+LN0j62if3uUa43YFwYM0CeQv NPOutTmV6xb7ynDC3JsA/2abG7cabPUjYNCbXzQWwjjvOwSM3eDDS9aq/RA9R0Ov =tIod -----END PGP SIGNATURE----- --nextPart10116848.qj6mstj0Su-- From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 14:10:42 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A7B81065670; Thu, 16 Sep 2010 14:10:42 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 764198FC15; Thu, 16 Sep 2010 14:10:41 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8GEAMuA029068; Thu, 16 Sep 2010 16:10:37 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8GEAM1n029066; Thu, 16 Sep 2010 16:10:22 +0200 (CEST) (envelope-from olli) From: Oliver Fromme Message-Id: <201009161410.o8GEAM1n029066@lurza.secnetix.de> To: tijl@coosemans.org (Tijl Coosemans) Date: Thu, 16 Sep 2010 16:10:22 +0200 (CEST) In-Reply-To: <201009161540.08029.tijl@coosemans.org> X-Mailer: ELM [version 2.5 PL8] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist:
Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Thu, 16 Sep 2010 16:10:37 +0200 (CEST) Cc: freebsd-hackers@freebsd.org, mav@freebsd.org, Alexander Best Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 14:10:42 -0000 Tijl Coosemans wrote: > On Thursday 16 September 2010 10:41:07 Oliver Fromme wrote: > > Also, there are cases where it is completely impossible to > > decide automatically whether the disks should be spun down > > or not. For example, if the admin issues a shutdown -h > > (halt), there's no way for the OS to know in advance whether > > the admin is going to switch the machine off or reboot to > > multi-user. So there must be a way for the user to forcibly > > enable/disable the spindown feature. I think a sysctl is > > the most appropriate way to do that, isn't it? > > I would just spin down the disk in case of a halt. An unwanted spin > down is harmless compared to an emergency shutdown and usually the > intention is to power off rather than reboot. Is it? When I intend to power-off, I use shutdown -p, not shutdown -h. Quite often (but not always) when I halt a machine, I'm going to reboot to multi-user, not power off. In that case I certainly wouldn't want to spin the drives down and have them spun up immediately afterwards. I don't think that wear&tear caused by that procedure is completely insignificant (although it's certainly less of a problem than emergency unloads). For that reason I definitely want to have a way to disable the spindown function manually. > Part of your patch modifies ada_shutdown. 
That function already gets > the reboot(2) howto flags passed to it, so you could test for > (howto & (RB_HALT | RB_POWEROFF)) != 0 before issuing the STANDBY > command. Right, good point. I didn't notice because the shutdown function in ad(4) doesn't get the howto flag, so I assumed (without checking) that ada(4) doesn't get it either. > There's no need to make this more complicated with a sysctl > that can override this in my opinion. I'm afraid I have to disagree (see above). Apart from that, there's nothing complicated at all about a sysctl. > Also command2 should be command1 in this line: > > + if (cgd->ident_data.support.command2 & ATA_SUPPORT_POWERMGT) Oops ... You're right. Thanks for pointing that out. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M. Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung: secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün- chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd "I made up the term 'object-oriented', and I can tell you I didn't have C++ in mind." 
-- Alan Kay, OOPSLA '97 From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 14:57:22 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AF75A1065679; Thu, 16 Sep 2010 14:57:22 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id 4E7EB8FC21; Thu, 16 Sep 2010 14:57:22 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.4/8.14.4) with ESMTP id o8GEg04S068467; Thu, 16 Sep 2010 08:42:00 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.4/8.14.4/Submit) with ESMTP id o8GEg0Fi068464; Thu, 16 Sep 2010 08:42:00 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Thu, 16 Sep 2010 08:42:00 -0600 (MDT) From: Warren Block To: Alexander Best In-Reply-To: <20100916004902.GA46401@freebsd.org> Message-ID: References: <201009152143.o8FLhE9p022233@lurza.secnetix.de> <20100916004902.GA46401@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.6 (wonkity.com [127.0.0.1]); Thu, 16 Sep 2010 08:42:00 -0600 (MDT) Cc: freebsd-hackers@FreeBSD.ORG, mav@FreeBSD.ORG, Oliver Fromme Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 14:57:22 -0000 On Thu, 16 Sep 2010, Alexander Best wrote: > On Wed Sep 15 10, Oliver Fromme wrote: >> Warren Block wrote: >> > [...] >> > 8. 
Alexander Motin has an updated CAM version of the ATA system which >> > will eventually replace the existing one. In -CURRENT, anyway. He was >> > kind enough to look at my event handler. My understanding is that he is >> > looking at implementing the head parking/standby mechanism in that new >> > code. >> >> The patch below will work with the new CAM ATA driver >> (i.e. ada(4) disks). It adds a sysctl, so you can switch >> the spin-down off if you're going to just reboot: >> # sysctl kern.cam.ada.spindown_shutdown=0 > > i haven't tested your patch yet, but i don't think deciding whether to spin > down the hdd should be decided merely from the sysctl value. > > the hdd should spindown when a shutdown has been issued and not spindown, > if a reboot has been issued. It's been a while, but the problem I found when comparing the NetBSD code was that there didn't appear to be a way to tell from within the FreeBSD driver whether it was a shutdown or reboot. From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 15:06:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3072106567A for ; Thu, 16 Sep 2010 15:06:46 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from agogare.doit.wisc.edu (agogare.doit.wisc.edu [144.92.197.211]) by mx1.freebsd.org (Postfix) with ESMTP id 7AFD28FC22 for ; Thu, 16 Sep 2010 15:06:46 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from avs-daemon.smtpauth2.wiscmail.wisc.edu by smtpauth2.wiscmail.wisc.edu (Sun Java(tm) System Messaging Server 7u2-7.05 32bit (built Jul 30 2009)) id <0L8U00600HZ93V00@smtpauth2.wiscmail.wisc.edu> for freebsd-hackers@freebsd.org; Thu, 16 Sep 2010 10:06:45 -0500 (CDT) Received: from comporellon.tachypleus.net ([unknown] [76.210.68.10]) by smtpauth2.wiscmail.wisc.edu (Sun Java(tm) System Messaging Server 7u2-7.05 
32bit (built Jul 30 2009)) with ESMTPSA id <0L8U00HVUHZ8TI40@smtpauth2.wiscmail.wisc.edu> for freebsd-hackers@freebsd.org; Thu, 16 Sep 2010 10:06:45 -0500 (CDT) Date: Thu, 16 Sep 2010 10:06:44 -0500 From: Nathan Whitehorn In-reply-to: To: freebsd-hackers@freebsd.org Message-id: <4C923284.20304@freebsd.org> X-Spam-Report: AuthenticatedSender=yes, SenderIP=76.210.68.10 X-Spam-PmxInfo: Server=avs-9, Version=5.6.0.2009776, Antispam-Engine: 2.7.2.376379, Antispam-Data: 2010.9.16.145715, SenderIP=76.210.68.10 X-Enigmail-Version: 1.0.1 References: <201009152143.o8FLhE9p022233@lurza.secnetix.de> <20100916004902.GA46401@freebsd.org> User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.12) Gecko/20100909 Thunderbird/3.0.7 Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 15:06:46 -0000 On 09/16/10 09:42, Warren Block wrote: > On Thu, 16 Sep 2010, Alexander Best wrote: > >> On Wed Sep 15 10, Oliver Fromme wrote: >>> Warren Block wrote: >>> > [...] >>> > 8. Alexander Motin has an updated CAM version of the ATA system which >>> > will eventually replace the existing one. In -CURRENT, anyway. >>> He was >>> > kind enough to look at my event handler. My understanding is that >>> he is >>> > looking at implementing the head parking/standby mechanism in that >>> new >>> > code. >>> >>> The patch below will work with the new CAM ATA driver >>> (i.e. ada(4) disks). It adds a sysctl, so you can switch >>> the spin-down off if you're going to just reboot: >>> # sysctl kern.cam.ada.spindown_shutdown=0 >> >> i haven't tested your patch yet, but i don't think deciding whether >> to spin >> down the hdd should be decided merely from the sysctl value. 
>> >> the hdd should spindown when a shutdown has been issued and not >> spindown, >> if a reboot has been issued. > > It's been a while, but the problem I found when comparing the NetBSD > code was that there didn't appear to be a way to tell from within the > FreeBSD driver whether it was a shutdown or reboot. Register a shutdown event handler? The second argument can be tested against RB_HALT to determine what is happening. -Nathan From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 15:42:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2DD421065781; Thu, 16 Sep 2010 15:42:28 +0000 (UTC) (envelope-from tijl@coosemans.org) Received: from mailrelay004.isp.belgacom.be (mailrelay004.isp.belgacom.be [195.238.6.170]) by mx1.freebsd.org (Postfix) with ESMTP id 35FB58FC16; Thu, 16 Sep 2010 15:42:26 +0000 (UTC) X-Belgacom-Dynamic: yes X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aj4FADnXkUxbsVM9/2dsb2JhbACUJI1icsJuhUEE Received: from 61.83-177-91.adsl-dyn.isp.belgacom.be (HELO kalimero.tijl.coosemans.org) ([91.177.83.61]) by relay.skynet.be with ESMTP; 16 Sep 2010 17:42:25 +0200 Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org [127.0.0.1]) by kalimero.tijl.coosemans.org (8.14.4/8.14.4) with ESMTP id o8GFgOQm005501; Thu, 16 Sep 2010 17:42:24 +0200 (CEST) (envelope-from tijl@coosemans.org) From: Tijl Coosemans To: freebsd-hackers@freebsd.org Date: Thu, 16 Sep 2010 17:42:18 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.1-PRERELEASE; KDE/4.4.5; i386; ; ) References: <201009161410.o8GEAM1n029066@lurza.secnetix.de> In-Reply-To: <201009161410.o8GEAM1n029066@lurza.secnetix.de> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart1394869.hKoHA25AgX"; protocol="application/pgp-signature"; micalg=pgp-sha256 Content-Transfer-Encoding: 7bit Message-Id: <201009161742.24228.tijl@coosemans.org> Cc: 
Alexander Best , mav@freebsd.org, Oliver Fromme Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 15:42:28 -0000 --nextPart1394869.hKoHA25AgX Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable On Thursday 16 September 2010 16:10:22 Oliver Fromme wrote: > Tijl Coosemans wrote: >> I would just spin down the disk in case of a halt. An unwanted spin >> down is harmless compared to an emergency shutdown and usually the >> intention is to power off rather than reboot. > > Is it? When I intend to power-off, I use shutdown -p, not > shutdown -h. Quite often (but not always) when I halt a > machine, I'm going to reboot to multi-user, not power off. Hmm, I suppose support for power off is ubiquitous nowadays. It used to be that halt meant: bring the system in a state where we can safely cut the power. In that case it makes sense to let halt spin down the disks. If you intend to reboot why not explicitly reboot rather than halt? Also, to go from single to multi user mode you can just exit(1) the shell. > In that case I certainly wouldn't want to spin the drives > down and have them spun up immediately afterwards. I don't > think that wear&tear caused by that procedure is completely > insignificant (although it's certainly less of a problem > than emergency unloads). > > For that reason I definitely want to have a way to disable > the spindown function manually. Ok, I'm soft on the sysctl really, it wouldn't hurt anyone. Although, if the intention is to just override the default behaviour at the time of shutdown you might as well just add an option to halt(8). A "don't spin down disks" option would fit in with the other options there.
--nextPart1394869.hKoHA25AgX Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) iF4EABEIAAYFAkySOt8ACgkQfoCS2CCgtivkLwD/cjNQVg2WjEC0GxsxQBQZZdLW tGouE291l49ypQZ4DGIA/j9rGCo+idLc+CeGLeYhG7X1ES9Z8d4zSZqwg3Nl5mpp =XBCi -----END PGP SIGNATURE----- --nextPart1394869.hKoHA25AgX-- From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 16:19:29 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1777106566C; Thu, 16 Sep 2010 16:19:29 +0000 (UTC) (envelope-from olli@lurza.secnetix.de) Received: from lurza.secnetix.de (lurza.secnetix.de [IPv6:2a01:170:102f::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2BAA38FC08; Thu, 16 Sep 2010 16:19:28 +0000 (UTC) Received: from lurza.secnetix.de (localhost [127.0.0.1]) by lurza.secnetix.de (8.14.3/8.14.3) with ESMTP id o8GGJA1T035380; Thu, 16 Sep 2010 18:19:26 +0200 (CEST) (envelope-from oliver.fromme@secnetix.de) Received: (from olli@localhost) by lurza.secnetix.de (8.14.3/8.14.3/Submit) id o8GGJAmv035378; Thu, 16 Sep 2010 18:19:10 +0200 (CEST) (envelope-from olli) From: Oliver Fromme Message-Id: <201009161619.o8GGJAmv035378@lurza.secnetix.de> To: tijl@coosemans.org (Tijl Coosemans) Date: Thu, 16 Sep 2010 18:19:10 +0200 (CEST) In-Reply-To: <201009161742.24228.tijl@coosemans.org> X-Mailer: ELM [version 2.5 PL8] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.3.5 (lurza.secnetix.de [127.0.0.1]); Thu, 16 Sep 2010 18:19:26 +0200 (CEST) Cc: freebsd-hackers@freebsd.org, mav@freebsd.org, Alexander Best Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical 
Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 16:19:30 -0000 Tijl Coosemans wrote: > On Thursday 16 September 2010 16:10:22 Oliver Fromme wrote: > > Tijl Coosemans wrote: > > > I would just spin down the disk in case of a halt. An unwanted spin > > > down is harmless compared to an emergency shutdown and usually the > > > intention is to power off rather than reboot. > > > > Is it? When I intend to power-off, I use shutdown -p, not > > shutdown -h. Quite often (but not always) when I halt a > > machine, I'm going to reboot to multi-user, not power off. > > Hmm, I suppose support for power off is ubiquitous nowadays. It used to > be that halt meant: bring the system in a state where we can safely cut > the power. In that case it makes sense to let halt spin down the disks. > If you intend to reboot why not explicitly reboot rather than halt? For example, I use shutdown -h in order to swap disks that are not hot-swappable, or other kind of hardware work that can be done while the machine is switched on. Of course, in that particular case the disk which is about to be swapped out should be spun down, while the others should not. But that's not a problem because I can use atacontrol(8) and camcontrol(8) to spin down a specific disk drive manually. > Also, to go from single to multi user mode you can just exit(1) the > shell. Yes, of course, that's a different matter. I've updated the patch for ada(4). It includes a bug fix (command1 vs. command2) and uses the howto flags passed to the shutdown function. Thanks again for pointing these out. 
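For anyone skimming the thread, the decision the updated patch makes at shutdown can be sketched in standalone C. This is only an illustration, not code from the patch: should_spin_down() is a made-up helper, and the SKETCH_RB_* constants are local stand-ins that I believe match the RB_* values in <sys/reboot.h>; they are redefined here only so the snippet compiles outside the kernel.

```c
#include <stdbool.h>

/*
 * Assumption: local stand-ins for the RB_* flags from <sys/reboot.h>.
 * The values are believed to match FreeBSD's, but the logic below does
 * not depend on the exact numbers.
 */
#define SKETCH_RB_AUTOBOOT	0x0000	/* plain reboot, no flags set */
#define SKETCH_RB_HALT		0x0008	/* halt without rebooting */
#define SKETCH_RB_POWEROFF	0x4000	/* power the machine off */

/*
 * Hypothetical helper (not in the patch): spin the disks down only when
 * the sysctl-style switch is on AND the shutdown is a halt or power-off.
 */
static bool
should_spin_down(int spindown_enabled, int howto)
{
	if (spindown_enabled == 0)
		return (false);
	return ((howto & (SKETCH_RB_HALT | SKETCH_RB_POWEROFF)) != 0);
}
```

With the switch left at its default of 1, a plain reboot leaves the disks spinning, while halt and power-off both spin them down; setting it to 0 disables the spin-down in every case.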
Best regards
   Oliver

--- ata_da.c.orig	2010-05-23 18:16:33.000000000 +0200
+++ ata_da.c	2010-09-16 17:21:10.000000000 +0200
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #endif /* _KERNEL */
 
@@ -79,7 +80,8 @@
 	ADA_FLAG_CAN_TRIM	= 0x080,
 	ADA_FLAG_OPEN		= 0x100,
 	ADA_FLAG_SCTX_INIT	= 0x200,
-	ADA_FLAG_CAN_CFA	= 0x400
+	ADA_FLAG_CAN_CFA	= 0x400,
+	ADA_FLAG_CAN_POWERMGT	= 0x800
 } ada_flags;
 
 typedef enum {
@@ -180,6 +182,10 @@
 #define ADA_DEFAULT_SEND_ORDERED 1
 #endif
 
+#ifndef ADA_DEFAULT_SPINDOWN_SHUTDOWN
+#define ADA_DEFAULT_SPINDOWN_SHUTDOWN 1
+#endif
+
 /*
  * Most platforms map firmware geometry to actual, but some don't. If
  * not overridden, default to nothing.
@@ -191,6 +197,7 @@
 static int ada_retry_count = ADA_DEFAULT_RETRY;
 static int ada_default_timeout = ADA_DEFAULT_TIMEOUT;
 static int ada_send_ordered = ADA_DEFAULT_SEND_ORDERED;
+static int ada_spindown_shutdown = ADA_DEFAULT_SPINDOWN_SHUTDOWN;
 
 SYSCTL_NODE(_kern_cam, OID_AUTO, ada, CTLFLAG_RD, 0,
     "CAM Direct Access Disk driver");
@@ -203,6 +210,9 @@
 SYSCTL_INT(_kern_cam_ada, OID_AUTO, ada_send_ordered, CTLFLAG_RW,
     &ada_send_ordered, 0, "Send Ordered Tags");
 TUNABLE_INT("kern.cam.ada.ada_send_ordered", &ada_send_ordered);
+SYSCTL_INT(_kern_cam_ada, OID_AUTO, spindown_shutdown, CTLFLAG_RW,
+    &ada_spindown_shutdown, 0, "Spin down upon shutdown");
+TUNABLE_INT("kern.cam.ada.spindown_shutdown", &ada_spindown_shutdown);
 
 /*
  * ADA_ORDEREDTAG_INTERVAL determines how often, relative
@@ -665,6 +675,8 @@
 		softc->flags |= ADA_FLAG_CAN_48BIT;
 	if (cgd->ident_data.support.command2 & ATA_SUPPORT_FLUSHCACHE)
 		softc->flags |= ADA_FLAG_CAN_FLUSHCACHE;
+	if (cgd->ident_data.support.command1 & ATA_SUPPORT_POWERMGT)
+		softc->flags |= ADA_FLAG_CAN_POWERMGT;
 	if (cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ &&
 	    cgd->inq_flags & SID_CmdQue)
 		softc->flags |= ADA_FLAG_CAN_NCQ;
@@ -1222,6 +1234,58 @@
 			    /*getcount_only*/0);
 		cam_periph_unlock(periph);
 	}
+
+	if (ada_spindown_shutdown == 0 ||
+	    (howto & (RB_HALT | RB_POWEROFF)) == 0)
+		return;
+
+	DELAY(500000);
+
+	TAILQ_FOREACH(periph, &adadriver.units, unit_links) {
+		union ccb ccb;
+
+		/* If we paniced with lock held - not recurse here. */
+		if (cam_periph_owned(periph))
+			continue;
+		cam_periph_lock(periph);
+		softc = (struct ada_softc *)periph->softc;
+		/*
+		 * We only spin-down the drive if it is capable of it..
+		 */
+		if ((softc->flags & ADA_FLAG_CAN_POWERMGT) == 0) {
+			cam_periph_unlock(periph);
+			continue;
+		}
+
+		/* XXX Hide this behind bootverbose? */
+		xpt_print(periph->path, "spin-down\n");
+
+		xpt_setup_ccb(&ccb.ccb_h, periph->path, CAM_PRIORITY_NORMAL);
+
+		ccb.ccb_h.ccb_state = ADA_CCB_DUMP;
+		cam_fill_ataio(&ccb.ataio,
+		    1,
+		    adadone,
+		    CAM_DIR_NONE,
+		    0,
+		    NULL,
+		    0,
+		    ada_default_timeout*1000);
+
+		ata_28bit_cmd(&ccb.ataio, ATA_STANDBY_IMMEDIATE, 0, 0, 0);
+		xpt_polled_action(&ccb);
+
+		if ((ccb.ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP)
+			xpt_print(periph->path, "Spin-down disk failed\n");
+
+		if ((ccb.ccb_h.status & CAM_DEV_QFRZN) != 0)
+			cam_release_devq(ccb.ccb_h.path,
+			    /*relsim_flags*/0,
+			    /*reduction*/0,
+			    /*timeout*/0,
+			    /*getcount_only*/0);
+		cam_periph_unlock(periph);
+	}
 }
 
 #endif /* _KERNEL */

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"I have stopped reading Stephen King novels. Now I just read C code instead."
-- Richard A.
O'Keefe From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 17:32:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC3FE106567A for ; Thu, 16 Sep 2010 17:32:28 +0000 (UTC) (envelope-from PMahan@adaranet.com) Received: from barracuda.adaranet.com (smtp.adaranet.com [72.5.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 7C3618FC18 for ; Thu, 16 Sep 2010 17:32:28 +0000 (UTC) X-ASG-Debug-ID: 1284657190-506121d90001-P5m3U7 Received: from SJ-EXCH-1.adaranet.com ([10.10.1.29]) by barracuda.adaranet.com with ESMTP id uITb2gqaTP51pqht; Thu, 16 Sep 2010 10:13:10 -0700 (PDT) X-Barracuda-Envelope-From: PMahan@adaranet.com Received: from mycroft.adaranet.com (10.10.24.100) by SJ-EXCH-1.adaranet.com (10.10.1.29) with Microsoft SMTP Server (TLS) id 8.1.240.5; Thu, 16 Sep 2010 10:13:10 -0700 Message-ID: <4C925133.4060309@adaranet.com> X-Barracuda-BBL-IP: nil Date: Thu, 16 Sep 2010 10:17:39 -0700 From: Patrick Mahan User-Agent: Thunderbird 2.0.0.23 (X11/20091021) MIME-Version: 1.0 To: John Baldwin X-ASG-Orig-Subj: Re: odd issues with DDB vs GDB References: <4C915E4F.9030006@adaranet.com> <201009160815.18679.jhb@freebsd.org> In-Reply-To: <201009160815.18679.jhb@freebsd.org> Content-Type: text/plain; charset="iso-8859-15"; format=flowed Content-Transfer-Encoding: 7bit X-Barracuda-Connect: UNKNOWN[10.10.1.29] X-Barracuda-Start-Time: 1284657190 X-Barracuda-URL: http://172.16.10.203:8000/cgi-mod/mark.cgi X-Virus-Scanned: by bsmtpd at adaranet.com Cc: "freebsd-hackers@freebsd.org" Subject: Re: odd issues with DDB vs GDB X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 17:32:28 -0000 John Baldwin wrote: > On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan 
wrote: >> All, >> >> I am trying to debug a system hang occurring on my HP Proliant G6 running some of our >> kernel software. I am seeing that under certain test loads, the system will hang-up >> complete, no keyboard, no console, etc. I suspect it is some of the kernel code that >> I have inherited that contains a lot of locking (lots of data structure, each having >> their own mutex lock (sleepable)). > > You need to use 'kgdb' rather than 'gdb' on kernel.debug. > Doh! *-( I'm so used to gdb even though I use kgdb for looking at crash dumps. Thanks, Patrick From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 17:33:09 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52D7F1065673 for ; Thu, 16 Sep 2010 17:33:09 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id CC2858FC16 for ; Thu, 16 Sep 2010 17:33:08 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1OwIKZ-00052m-8B; Thu, 16 Sep 2010 20:33:07 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id 9A4B81CC1E; Thu, 16 Sep 2010 20:33:07 +0300 (EEST) Date: Thu, 16 Sep 2010 20:33:07 +0300 From: Andrey Simonenko To: Matthew Fleming Message-ID: <20100916173307.GA1994@pm513-1.comsys.ntu-kpi.kiev.ua> References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 06-Jan-2007 23:14:37) X-Date: 2010-09-16 20:33:07 X-Connected-IP: 
10.18.52.101:59547 X-Message-Linecount: 121 X-Body-Linecount: 105 X-Message-Size: 6165 X-Body-Size: 5370 Cc: freebsd-hackers@freebsd.org Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 17:33:09 -0000 On Wed, Sep 15, 2010 at 08:46:00AM -0700, Matthew Fleming wrote: > I'll take a stab at answering these... > > On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko > wrote: > > Hello, > > > > I have questions about mutex implementation in kern/kern_mutex.c > > and sys/mutex.h files (current versions of these files): > > > > 1. Is the following statement correct for a volatile pointer or integer > > variable: if a volatile variable is updated by the compare-and-set > > instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current > > value of such variable can be read without any special instruction > > (e.g. v = val)? > > > > I checked Assembler code for a function with "v = val" and "val = v" > > like statements generated for volatile variable and simple variable > > and found differences: on ia64 "v = val" was implemented by ld.acq and > > "val = v" was implemented by st.rel; on mips and sparc64 Assembler code > > can have different order of lines for volatile and simple variable > > (depends on the code of a function). > > I think this depends somewhat on the hardware and what you mean by > "current" value. "Current" value means that the value of a variable read by one thread is equal to the value of this variable successfully updated by another thread by the compare-and-set instruction. As I understand from the kernel source code, atomic_cmpset_ptr() allows to update a variable in a way that all other CPUs will invalidate corresponding cache lines that contain the value of this variable. 
The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m->mtx_lock (volatile) with current thread pointer, all other functions that update m->mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used). > > If you want a value that is not in-flux, then something like > atomic_cmpset_ptr() setting to the current value is needed, so that > you force any other atomic_cmpset to fail. However, since there is no > explicit lock involved, there is no strong meaning for "current" value > and a read that does not rely on a value cached in a register is > likely sufficient. While the "volatile" keyword in C has no explicit > hardware meaning, it often means that a load from memory (or, > presumably, L1-L3 cache) is required. The "volatile" keyword here and all questions are related to the base C compiler, current version and currently supported architectures in FreeBSD. Yes, here under "volatile" I want to say that the value of a variable is not cached in a register and it is referenced by its address in all commands. There are some places in the kernel where a variable is updated in something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));" and that variable is not "volatile", but the compiler generates correct Assembler code. So "volatile" is not a requirement for all cases. > > > 2. Let there is a default (sleep) mutex and adaptive mutexes is enabled. > > A thread tries to obtain lock quickly and fails, _mtx_lock_sleep() > > is called, it gets the address of the current mutex's owner thread > > and checks whether that owner thread is running (on another CPU). > > How does _mtx_lock_sleep() know that that thread still exists > > (lines 311-337 in kern_mutex.c)? 
> >
> > When adaptive mutexes was implemented there was explicit locking
> > around adaptive mutexes code. When turnstile in mutex code was
> > implemented that's locking logic was changed.
>
> It appears that it's possible for the thread pointer to be recycled
> between fetching the value of owner and looking at TD_IS_RUNNING. On
> actual hardware, this race is unlikely to occur due to the time it
> takes for a thread to release a lock and perform all of thread exit
> code before the struct thread is returned to the uma zone. However,
> even once returned to the uma zone on many FreeBSD implementations the
> access is safe as the address of the thread is still dereferenceable,
> due to the implementation of uma zones.

I checked exactly this scenario; that's why I asked this question, to
verify my understanding.

>
> > 3. Why there is no any memory barrier in mtx_init()? If another thread
> > (on another CPU) finds that mutex is initialized using mtx_initialized()
> > then it can mtx_lock() it and mtx_lock() it second time, as a result
> > mtx_recurse field will be increased, but its value still can be
> > uninitialized on architecture with relaxed memory ordering model.
>
> It seems to me that it's generally a programming error to rely on the
> return of mtx_initialized(), as there is no serialization with e.g. a
> thread calling mtx_destroy(). A fully correct serialization model
> would require that a single thread initialize the mtx and then create
> any worker threads that will use the mtx.

I agree that this should not happen in practice. Another thread can get
a pointer to a just-initialized mutex and begin to work with it, so
mtx_initialized() is not a requirement. My point is only that when
mtx_init() has finished in one thread, that does not by itself mean the
newly initialized mutex is ready to be used by another thread.

Thank you for the answers.
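The concern in question 3 is the usual publication problem: the stores that initialize an object and the store that makes the object reachable can become visible to another CPU in a different order unless something orders them. A minimal userland sketch using C11 atomics follows. It is an illustration only, not the kernel's mtx_init() path; `struct demo_lock`, `demo_publish()` and `demo_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock-like object; the field names are illustrative only. */
struct demo_lock {
	unsigned int owner;
	unsigned int recurse;
};

static struct demo_lock the_lock;

/* Pointer through which other threads find the lock; NULL until published. */
static _Atomic(struct demo_lock *) published;

/*
 * Initializing thread: write the fields first, then publish the pointer
 * with release semantics so the field stores cannot be reordered after it.
 */
static void
demo_publish(void)
{
	the_lock.owner = 0;
	the_lock.recurse = 0;
	atomic_store_explicit(&published, &the_lock, memory_order_release);
}

/*
 * Any other thread: the acquire load pairs with the release store above,
 * so a non-NULL result implies the initialized fields are visible too.
 */
static struct demo_lock *
demo_lookup(void)
{
	return (atomic_load_explicit(&published, memory_order_acquire));
}
```

A reader that gets a non-NULL pointer from demo_lookup() may rely on `recurse` being zero. With a plain store and a plain load in place of the release/acquire pair, that guarantee would not hold on a weakly ordered machine.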
From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 18:02:36 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5A78D1065672 for ; Thu, 16 Sep 2010 18:02:36 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id DC9ED8FC17 for ; Thu, 16 Sep 2010 18:02:35 +0000 (UTC) Received: by wyb33 with SMTP id 33so2135362wyb.13 for ; Thu, 16 Sep 2010 11:02:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=KabAr2bqs7bh6gCNdZI1LiSGjqw1JKfmkzEmIKaVSBY=; b=Z1LuLy/zTiK2km6AA6WTrzC5LAUAYZEK6QVtsMeziuCDOJav8j6UYnxnRhb1nSf4Q1 8FX5PjhUDAnNJHS7TGNnZ0gzgHKoi6yUE3VkJpGu8IaJrz7JVSVjr0PFwI4D/pMdoMF3 x5ldIo1NM5oDkLiUiFgouATiKAfN/mUSa1ptc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=BYx1Iq1QUR9JFR4J/vFEWFNnfVnJhgl50YvjVBR+vDmTA82XgpFQuQs3ZSNhFNgM5g boBartZaCkP02iCqpRfEjHAnWLFWmGcj8m7olznk1wG8+8+b5TVDWSeh/TRk2PgG5Htc k5ohRKQBactYA3efSvA0HaXQkbgzKf4OoKJOQ= Received: by 10.216.21.204 with SMTP id r54mr3001019wer.95.1284658266706; Thu, 16 Sep 2010 10:31:06 -0700 (PDT) Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk [87.81.140.128]) by mx.google.com with ESMTPS id p82sm2001464weq.3.2010.09.16.10.31.05 (version=SSLv3 cipher=RC4-MD5); Thu, 16 Sep 2010 10:31:05 -0700 (PDT) Date: Thu, 16 Sep 2010 18:31:03 +0100 From: RW To: freebsd-hackers@freebsd.org Message-ID: <20100916183103.20c70a5a@gumby.homeunix.com> In-Reply-To: <86mxri17j3.fsf@ds4.des.no> References: 
<201009152143.o8FLhE9p022233@lurza.secnetix.de> <20100916004902.GA46401@freebsd.org> <86mxri17j3.fsf@ds4.des.no> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.1) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 18:02:36 -0000

On Thu, 16 Sep 2010 09:17:52 +0200
Dag-Erling Smørgrav wrote:

> Garrett Cooper writes:
> > Agreed. Spinning down at reboot isn't smart and seems like a good
> > way to kill a disk quicker.
>
> *not* spinning down at halt is far worse. Most modern disks are rated
> for hundreds of thousands of load-unload cycles, but far fewer
> emergency unloads (which is what happens when the drive loses power
> while still spinning).

As I understand it, wear from spinning down used to come from the head
actually scraping the disk surface as it lost lift; parking placed the
head on a disposable area, but modern drives take the head off the disk
altogether.

When Hitachi was specifying 300,000 unloads, they said that in testing
the drives were still working at 1,000,000; someone quoted 600,000 as
the current spec. At these levels you can be spinning the drives down
and up every few minutes for the normal lifetime of the drive.

Even on very old drives I doubt reboots are much of a problem; they're
rare on servers. On laptops and desktops they're rare compared to
shutdowns and suspends.
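The "every few minutes for the normal lifetime" estimate can be checked with simple arithmetic. The sketch below assumes one load/unload cycle every five minutes, around the clock; that interval is an assumption, not a figure from the thread, and `years_until_rated()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Years of continuous operation before 'rated_cycles' load/unload cycles
 * are used up, at one cycle every 'minutes_per_cycle' minutes.
 */
static double
years_until_rated(unsigned long rated_cycles, double minutes_per_cycle)
{
	const double minutes_per_year = 60.0 * 24.0 * 365.0;

	return ((double)rated_cycles * minutes_per_cycle / minutes_per_year);
}
```

Under that assumption, 300,000 cycles last roughly 2.9 years, 600,000 roughly 5.7 years, and 1,000,000 roughly 9.5 years of nonstop cycling.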
From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 18:16:08 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B14BC1065674 for ; Thu, 16 Sep 2010 18:16:08 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7EFE88FC1C for ; Thu, 16 Sep 2010 18:16:08 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id D323646B52; Thu, 16 Sep 2010 14:16:07 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8A9128A03C; Thu, 16 Sep 2010 14:16:06 -0400 (EDT) From: John Baldwin To: Andrey Simonenko Date: Thu, 16 Sep 2010 14:16:05 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <20100916173307.GA1994@pm513-1.comsys.ntu-kpi.kiev.ua> In-Reply-To: <20100916173307.GA1994@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009161416.05759.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Thu, 16 Sep 2010 14:16:06 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-hackers@freebsd.org, Matthew Fleming Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 18:16:08 -0000 On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: > On Wed, Sep 15, 2010 at 08:46:00AM -0700, Matthew Fleming wrote: > > I'll take a stab at answering these... > > > > On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko > > wrote: > > > Hello, > > > > > > I have questions about mutex implementation in kern/kern_mutex.c > > > and sys/mutex.h files (current versions of these files): > > > > > > 1. Is the following statement correct for a volatile pointer or integer > > > variable: if a volatile variable is updated by the compare-and-set > > > instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current > > > value of such variable can be read without any special instruction > > > (e.g. v = val)? > > > > > > I checked Assembler code for a function with "v = val" and "val = v" > > > like statements generated for volatile variable and simple variable > > > and found differences: on ia64 "v = val" was implemented by ld.acq and > > > "val = v" was implemented by st.rel; on mips and sparc64 Assembler code > > > can have different order of lines for volatile and simple variable > > > (depends on the code of a function). > > > > I think this depends somewhat on the hardware and what you mean by > > "current" value. > > "Current" value means that the value of a variable read by one thread > is equal to the value of this variable successfully updated by another > thread by the compare-and-set instruction. As I understand from the kernel > source code, atomic_cmpset_ptr() allows to update a variable in a way that > all other CPUs will invalidate corresponding cache lines that contain > the value of this variable. That is not true. 
It is likely true on x86, but it is certainly not true on other architectures such as sparc64 where a write may be held in a store buffer for an indeterminate amount of time (and note that some lock releases are simple stores with a "rel" memory barrier). All that we require is that if the value is stale, the atomic_cmpset() that attempts to set MTX_CONTESTED will fail. > The mtx_owned(9) macro uses this property, mtx_owned() does not use anything > special to compare the value of m->mtx_lock (volatile) with current thread > pointer, all other functions that update m->mtx_lock of unowned mutex use > compare-and-set instruction. Also I cannot find anything special in > generated Assembler code for volatile variables (except for ia64 where > acquire loads and release stores are used). No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is "older" than a write it has performed to that word. > > If you want a value that is not in-flux, then something like > > atomic_cmpset_ptr() setting to the current value is needed, so that > > you force any other atomic_cmpset to fail. However, since there is no > > explicit lock involved, there is no strong meaning for "current" value > > and a read that does not rely on a value cached in a register is > > likely sufficient. 
While the "volatile" keyword in C has no explicit > > hardware meaning, it often means that a load from memory (or, > > presumably, L1-L3 cache) is required. > > The "volatile" keyword here and all questions are related to the base C > compiler, current version and currently supported architectures in FreeBSD. > Yes, here under "volatile" I want to say that the value of a variable is > not cached in a register and it is referenced by its address in all > commands. > > There are some places in the kernel where a variable is updated in > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));" > and that variable is not "volatile", but the compiler generates correct > Assembler code. So "volatile" is not a requirement for all cases. Hmm, I suspect that many of those places actually do use volatile. The various lock cookies (mtx_lock, etc.) are declared volatile in the structure. Otherwise the compiler would be free to conclude that 'v = value;' is a loop invariant and move it out of the loop which would break. Given that, the construct you referred to does in fact require 'value' to be volatile. 
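The retry loop under discussion can be sketched in userland C. This is a minimal illustration, not kernel code: it assumes GCC or Clang, whose `__sync_bool_compare_and_swap` builtin stands in here for the kernel's atomic_cmpset_int(), and `cmpset_int()` and `add_with_cmpset()` are hypothetical names.

```c
#include <assert.h>

/*
 * Stand-in for the kernel's atomic_cmpset_int(): store 'newval' in *p only
 * if *p still equals 'expect'; return nonzero on success.  The builtin is
 * a GCC/Clang extension and is an assumption of this sketch.
 */
static int
cmpset_int(volatile unsigned int *p, unsigned int expect, unsigned int newval)
{
	return (__sync_bool_compare_and_swap(p, expect, newval));
}

/* Add 'delta' to *value with the do/while retry loop quoted above. */
static unsigned int
add_with_cmpset(volatile unsigned int *value, unsigned int delta)
{
	unsigned int v;

	do {
		/*
		 * Because *value is volatile-qualified, this load is
		 * performed again on every iteration and cannot be hoisted
		 * out of the loop as an invariant.
		 */
		v = *value;
	} while (!cmpset_int(value, v, v + delta));
	return (v + delta);
}
```

If another thread changes `*value` between the load and the compare-and-set, the compare-and-set fails and the loop retries with the freshly loaded value.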
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 19:00:37 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEDB31065670 for ; Thu, 16 Sep 2010 19:00:37 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id 9DAAF8FC12 for ; Thu, 16 Sep 2010 19:00:37 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o8GJ0ap9029969 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 16 Sep 2010 12:00:36 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C92694D.1070705@feral.com> Date: Thu, 16 Sep 2010 12:00:29 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.6 (ns1.feral.com [192.168.221.1]); Thu, 16 Sep 2010 12:00:37 -0700 (PDT) Subject: race conditions for destroying and opening a dev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 19:00:37 -0000 Has anyone seen this scenario before? I am seeing it in RELENG_7, but the code in question exists through to head. Thread 1: (kgdb) where #0 sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000, flags=Variable "flags" is not available. 
) at ../../../kern/sched_ule.c:1944 #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:497 #3 0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at ../../../kern/subr_sleepqueue.c:615 #4 0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60, lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn", timo=100) at ../../../kern/kern_synch.c:228 #5 0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:874 #6 0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at ../../../kern/kern_conf.c:916 #7 0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at ../../../geom/geom_dev.c:438 #8 0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164 #9 0xffffffff80351f1c in g_event_procbody () at ../../../geom/geom_kern.c:141 #10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0 , arg=0x0, frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829 #11 0xffffffff805a747e in fork_trampoline () at ../../../amd64/amd64/exception.S:564 #12 0x0000000000000000 in ?? () This thread is waiting on the threadcount to go away- i.e., the last close of the device to occur ("da16" in this case). Thread 2: (kgdb) where #0 sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380, flags=Variable "flags" is not available. ) at ../../../kern/sched_ule.c:1944 #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at ../../../kern/kern_synch.c:450 #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:497 #3 0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at ../../../kern/subr_sleepqueue.c:580 #4 0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040, tid=18446742976810240640, opts=Variable "opts" is not available. 
) at ../../../kern/kern_sx.c:562 #5 0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0, file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154 #6 0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1, fmt=Variable "fmt" is not available. ) at ../../../geom/geom_dev.c:196 #7 0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at ../../../fs/devfs/devfs_vnops.c:902 #8 0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80, a=0xffffff841dea88b0) at vnode_if.c:371 #9 0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00, flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available. ) at vnode_if.h:199 #10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80, path=0x5114a0
, pathseg=Variable "pathseg" is not available. ) at ../../../kern/vfs_syscalls.c:1054 #11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at ../../../amd64/amd64/trap.c:911 #12 0xffffffff805a723b in Xfast_syscall () at ../../../amd64/amd64/exception.S:349 #13 0x00000008009a219c in ?? () This thread was opening the device, bumped the refcount, but then wedged on the geom topology lock ..... the refcount field is protected under devmtx.... Anyone seen this? I'm half inclined to either add in CDP_SCHED_DTR when one calls destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more toward the latter. Any thoughts on this? From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 19:11:06 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 68F4F106564A for ; Thu, 16 Sep 2010 19:11:06 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id F3B758FC17 for ; Thu, 16 Sep 2010 19:11:05 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8GJAvVF028136 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 16 Sep 2010 22:10:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o8GJAvVR012684; Thu, 16 Sep 2010 22:10:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8GJAv9U012683; Thu, 16 Sep 2010 22:10:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 16 Sep 2010 22:10:57 +0300 From: Kostik Belousov 
To: Matthew Jacob Message-ID: <20100916191057.GF2389@deviant.kiev.zoral.com.ua> References: <4C92694D.1070705@feral.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="8TaQrIeukR7mmbKf" Content-Disposition: inline In-Reply-To: <4C92694D.1070705@feral.com> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org Subject: Re: race conditions for destroying and opening a dev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 19:11:06 -0000

--8TaQrIeukR7mmbKf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Sep 16, 2010 at 12:00:29PM -0700, Matthew Jacob wrote:
>
> Has anyone seen this scenario before? I am seeing it in RELENG_7, but
> the code in question exists through to head.
>
> Thread 1:
>
> (kgdb) where
> #0 sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000,
> flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at
> ../../../kern/kern_synch.c:450
> #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at
> ../../../kern/subr_sleepqueue.c:497
> #3 0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at
> ../../../kern/subr_sleepqueue.c:615
> #4 0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60,
> lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn",
> timo=100) at ../../../kern/kern_synch.c:228
> #5 0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at
> ../../../kern/kern_conf.c:874
> #6 0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at
> ../../../kern/kern_conf.c:916
> #7 0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at
> ../../../geom/geom_dev.c:438
> #8 0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164
> #9 0xffffffff80351f1c in g_event_procbody () at
> ../../../geom/geom_kern.c:141
> #10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0
> , arg=0x0,
> frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829
> #11 0xffffffff805a747e in fork_trampoline () at
> ../../../amd64/amd64/exception.S:564
> #12 0x0000000000000000 in ?? ()
>
> This thread is waiting on the threadcount to go away- i.e., the last
> close of the device to occur ("da16" in this case).
>
> Thread 2:
>
> (kgdb) where
> #0 sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380,
> flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1 0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at
> ../../../kern/kern_synch.c:450
> #2 0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at
> ../../../kern/subr_sleepqueue.c:497
> #3 0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at
> ../../../kern/subr_sleepqueue.c:580
> #4 0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040,
> tid=18446742976810240640, opts=Variable "opts" is not available.
> ) at ../../../kern/kern_sx.c:562
> #5 0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0,
> file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154
> #6 0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1,
> fmt=Variable "fmt" is not available.
> ) at ../../../geom/geom_dev.c:196
> #7 0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at
> ../../../fs/devfs/devfs_vnops.c:902
> #8 0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80,
> a=0xffffff841dea88b0) at vnode_if.c:371
> #9 0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00,
> flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available.
> ) at vnode_if.h:199
> #10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80,
> path=0x5114a0
, pathseg=Variable
> "pathseg" is not available.
> ) at ../../../kern/vfs_syscalls.c:1054
> #11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at
> ../../../amd64/amd64/trap.c:911
> #12 0xffffffff805a723b in Xfast_syscall () at
> ../../../amd64/amd64/exception.S:349
> #13 0x00000008009a219c in ?? ()
>
> This thread was opening the device, bumped the refcount, but then wedged
> on the geom topology lock .....
>
> the refcount field is protected under devmtx....
>
> Anyone seen this?
>
> I'm half inclined to either add in CDP_SCHED_DTR when one calls
> destroy_dev, or make dev_refthread look at CDP_ACTIVE, leaning more
> toward the latter.
>
> Any thoughts on this?

And who owns the topology lock? Is it thread 1?

destroy_devl() clears si_devsw for the departing cdev, and *refthread()
checks si_devsw against NULL as an indicator of device destruction in
progress.

I think that this situation is what destroy_dev_sched(9) was created for.

--8TaQrIeukR7mmbKf
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkySa8AACgkQC3+MBN1Mb4jKNwCgv30TrKYWhEeXq1KmjAP516a4
AxAAoKkXX9pQeQkkTIxWtC0V8662YWhb
=gNHJ
-----END PGP SIGNATURE-----

--8TaQrIeukR7mmbKf--

From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 19:45:50 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0BBDD1065672 for ; Thu, 16 Sep 2010 19:45:50 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 6A5ED8FC08 for ; Thu, 16 Sep 2010 19:45:49 +0000 (UTC) Received: by iwn34 with SMTP id 34so1514022iwn.13 for ; Thu, 16 Sep 2010 12:45:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=HOzJqFNga1KhNkzgOYbMTNziS2nPm+kCrDKKKHokiSs=; b=IVcXYD+SW90n01lylQwKdIVnY/tUyKlmAx/JFan6hHlhGKhdV6mLP7ITsu43EcyPw7 irOL1OvDC4ucSc3A9gBtOncar4TUmRjAg40lnUNaCdpzoGVAhKwGLI8gvZFhxoCxhOif dtN1sPz5M2LkIUkrzWM7oNzxYHlJyeGCPOA8I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=QG6o9Txg51rKZGkruImg/68QpGiCohZLee2HGsX05Tru7mhrqkG4eqSGYh5eWA/o+z pOnH50WKIWGv4il5GI8HtPORslGWAQqKud1wCEcnN4TFicPlKLLvE1p7hZrR0Wcp7HFM XRfof4KZyixodEoCEY4AquOlzEWsneF6/x02E= MIME-Version: 1.0 Received: by 10.231.31.129 with SMTP id y1mr3938081ibc.45.1284666348448; Thu, 16 Sep 2010 12:45:48 -0700 (PDT) Received: by 10.231.187.71 with HTTP; Thu, 16 Sep 2010 12:45:48 -0700 (PDT) In-Reply-To: <4C92694D.1070705@feral.com> References: <4C92694D.1070705@feral.com> Date: Thu, 16 Sep 2010 12:45:48 -0700 Message-ID: From: Matthew Fleming To: Matthew Jacob Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: race conditions for destroying and opening a dev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 19:45:50 -0000

On Thu, Sep 16, 2010 at 12:00 PM, Matthew Jacob wrote:
>
> Has anyone seen this scenario before? I am seeing it in RELENG_7, but the
> code in question exists through to head.
>
> Thread 1:
>
> (kgdb) where
> #0  sched_switch (td=0xffffff003a04ea80, newtd=0xffffff00210b4000,
> flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at
> ../../../kern/kern_synch.c:450
> #2  0xffffffff80402399 in sleepq_switch (wchan=0xffffff8413b50b60) at
> ../../../kern/subr_sleepqueue.c:497
> #3  0xffffffff80402e8c in sleepq_timedwait (wchan=0xffffff8413b50b60) at
> ../../../kern/subr_sleepqueue.c:615
> #4  0xffffffff803b682d in _sleep (ident=0xffffff8413b50b60,
> lock=0xffffffff80b0ee00, priority=76, wmesg=0xffffffff806583bb "devdrn",
> timo=100) at ../../../kern/kern_synch.c:228
> #5  0xffffffff8037640c in destroy_devl (dev=0xffffff003aaf0000) at
> ../../../kern/kern_conf.c:874
> #6  0xffffffff80376759 in destroy_dev (dev=0xffffff003aaf0000) at
> ../../../kern/kern_conf.c:916
> #7  0xffffffff8034c939 in g_dev_orphan (cp=0xffffff003a544800) at
> ../../../geom/geom_dev.c:438
> #8  0xffffffff803506a0 in g_run_events () at ../../../geom/geom_event.c:164
> #9  0xffffffff80351f1c in g_event_procbody () at
> ../../../geom/geom_kern.c:141
> #10 0xffffffff8038a73a in fork_exit (callout=0xffffffff80351eb0
> , arg=0x0,
> frame=0xffffff8413b50c80) at ../../../kern/kern_fork.c:829
> #11 0xffffffff805a747e in fork_trampoline () at
> ../../../amd64/amd64/exception.S:564
> #12 0x0000000000000000 in ?? ()
>
> This thread is waiting on the threadcount to go away- i.e., the last close
> of the device to occur ("da16" in this case).
>
> Thread 2:
>
> (kgdb) where
> #0  sched_switch (td=0xffffff009bb4ca80, newtd=0xffffff003af43380,
> flags=Variable "flags" is not available.
> ) at ../../../kern/sched_ule.c:1944
> #1  0xffffffff803b6091 in mi_switch (flags=1, newtd=0x0) at
> ../../../kern/kern_synch.c:450
> #2  0xffffffff80402399 in sleepq_switch (wchan=0xffffffff80b0e040) at
> ../../../kern/subr_sleepqueue.c:497
> #3  0xffffffff80402f84 in sleepq_wait (wchan=0xffffffff80b0e040) at
> ../../../kern/subr_sleepqueue.c:580
> #4  0xffffffff803b5385 in _sx_xlock_hard (sx=0xffffffff80b0e040,
> tid=18446742976810240640, opts=Variable "opts" is not available.
> ) at ../../../kern/kern_sx.c:562
> #5  0xffffffff803b5731 in _sx_xlock (sx=0xffffffff80b0e040, opts=0,
> file=0xffffffff80652d27 "../../../geom/geom_dev.c", line=196) at sx.h:154
> #6  0xffffffff8034d1bc in g_dev_open (dev=0xffffff003aaf0000, flags=1,
> fmt=Variable "fmt" is not available.
> ) at ../../../geom/geom_dev.c:196
> #7  0xffffffff80333741 in devfs_open (ap=0xffffff841dea88b0) at
> ../../../fs/devfs/devfs_vnops.c:902
> #8  0xffffffff80601daf in VOP_OPEN_APV (vop=0xffffffff8089fb80,
> a=0xffffff841dea88b0) at vnode_if.c:371
> #9  0xffffffff80467246 in vn_open_cred (ndp=0xffffff841dea8a00,
> flagp=0xffffff841dea894c, cmode=Variable "cmode" is not available.
> ) at vnode_if.h:199
> #10 0xffffffff80463770 in kern_open (td=0xffffff009bb4ca80, path=0x5114a0
>
, pathseg=Variable "pathseg" is not
> available.
> ) at ../../../kern/vfs_syscalls.c:1054
> #11 0xffffffff805c599e in syscall (frame=0xffffff841dea8c80) at
> ../../../amd64/amd64/trap.c:911
> #12 0xffffffff805a723b in Xfast_syscall () at
> ../../../amd64/amd64/exception.S:349
> #13 0x00000008009a219c in ?? ()
>
> This thread was opening the device, bumped the refcount, but then wedged on
> the geom topology lock .....
>
> the refcount field is protected under devmtx....
>
> Anyone seen this?
>
> I'm half inclined to either add in CDP_SCHED_DTR when one calls destroy_dev,
> or make dev_refthread look at CDP_ACTIVE, leaning more toward the latter.
>
> Any thoughts on this?

We had a similar bug at Isilon, but in our case it was in cam/scsi/scsi_pass.c::passcleanup() calling destroy_dev(). We switched it to destroy_dev_sched() to fix the si_threadcount deadlock.

Cheers,
matthew

From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 20:12:08 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A98DF1065695 for ; Thu, 16 Sep 2010 20:12:08 +0000 (UTC) (envelope-from mj@feral.com) Received: from ns1.feral.com (ns1.feral.com [192.67.166.1]) by mx1.freebsd.org (Postfix) with ESMTP id 5D6EA8FC12 for ; Thu, 16 Sep 2010 20:12:08 +0000 (UTC) Received: from [192.168.221.2] (remotevpn [192.168.221.2]) by ns1.feral.com (8.14.3/8.14.3) with ESMTP id o8GKC6Qm030388 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 16 Sep 2010 13:12:07 -0700 (PDT) (envelope-from mj@feral.com) Message-ID: <4C927A10.1080202@feral.com> Date: Thu, 16 Sep 2010 13:12:00 -0700 From: Matthew Jacob Organization: Feral Software User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org References: <4C92694D.1070705@feral.com> In-Reply-To: Content-Type: 
text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-4.2.6 (ns1.feral.com [192.168.221.1]); Thu, 16 Sep 2010 13:12:07 -0700 (PDT) Subject: Re: race conditions for destroying and opening a dev X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 20:12:08 -0000 kostik, matthew- thanks mucho! From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 16 22:47:09 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3CCA0106564A; Thu, 16 Sep 2010 22:47:09 +0000 (UTC) (envelope-from wblock@wonkity.com) Received: from wonkity.com (wonkity.com [67.158.26.137]) by mx1.freebsd.org (Postfix) with ESMTP id EB5C48FC19; Thu, 16 Sep 2010 22:47:08 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.4/8.14.4) with ESMTP id o8GMl5RD070096; Thu, 16 Sep 2010 16:47:05 -0600 (MDT) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.4/8.14.4/Submit) with ESMTP id o8GMl4xJ070093; Thu, 16 Sep 2010 16:47:05 -0600 (MDT) (envelope-from wblock@wonkity.com) Date: Thu, 16 Sep 2010 16:47:04 -0600 (MDT) From: Warren Block To: Oliver Fromme In-Reply-To: <201009161619.o8GGJAmv035378@lurza.secnetix.de> Message-ID: References: <201009161619.o8GGJAmv035378@lurza.secnetix.de> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.6 (wonkity.com [127.0.0.1]); Thu, 16 Sep 2010 16:47:05 -0600 (MDT) Cc: freebsd-hackers@freebsd.org, mav@freebsd.org, Tijl Coosemans , Alexander Best Subject: Re: 
Summary: Re: Spin down HDD after disk sync or before power off X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Sep 2010 22:47:09 -0000 On Thu, 16 Sep 2010, Oliver Fromme wrote: > I've updated the patch for ada(4). It includes a bug fix > (command1 vs. command2) and uses the howto flags passed to > the shutdown function. Thanks again for pointing these out. Works perfectly on a system here. Thanks! From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 03:24:34 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A09B106566C; Fri, 17 Sep 2010 03:24:34 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-7.mit.edu (DMZ-MAILSEC-SCANNER-7.MIT.EDU [18.7.68.36]) by mx1.freebsd.org (Postfix) with ESMTP id B3BF48FC1A; Fri, 17 Sep 2010 03:24:33 +0000 (UTC) X-AuditID: 12074424-b7b2bae000005b3f-87-4c92df56e785 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-7.mit.edu (Symantec Brightmail Gateway) with SMTP id 23.BA.23359.65FD29C4; Thu, 16 Sep 2010 23:24:06 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id o8H3OWJk009314; Thu, 16 Sep 2010 23:24:32 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id o8H3OUor018959 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 16 Sep 2010 23:24:32 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id o8H3OThN013379; Thu, 16 Sep 2010 23:24:29 -0400 (EDT) Date: Thu, 16 Sep 2010 23:24:29 -0400 (EDT) From: Benjamin 
Kaduk To: John Baldwin In-Reply-To: <201009161416.05759.jhb@freebsd.org> Message-ID: References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <20100916173307.GA1994@pm513-1.comsys.ntu-kpi.kiev.ua> <201009161416.05759.jhb@freebsd.org> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Brightmail-Tracker: AAAAAA== Cc: freebsd-hackers@freebsd.org Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 03:24:34 -0000 On Thu, 16 Sep 2010, John Baldwin wrote: > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: > >> The mtx_owned(9) macro uses this property, mtx_owned() does not use anything >> special to compare the value of m->mtx_lock (volatile) with current thread >> pointer, all other functions that update m->mtx_lock of unowned mutex use >> compare-and-set instruction. Also I cannot find anything special in >> generated Assembler code for volatile variables (except for ia64 where >> acquire loads and release stores are used). > > No, mtx_owned() is just not harmed by the races it loses. You can certainly > read a stale value of mtx_lock in mtx_owned() if some other thread owns the > lock or has just released the lock. However, we don't care, because in both > of those cases, mtx_owned() returns false. What does matter is that > mtx_owned() can only return true if we currently hold the mutex. This works > because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the > same time, and 2) even CPUs that hold writes in store buffers will snoop their > store buffer for local reads on that CPU. 
That is, a given CPU will never > read a stale value of a memory word that is "older" than a write it has > performed to that word. Sorry for the naive question, but would you mind expounding a bit on what keeps the thread from migrating to a different CPU and getting a stale value there? (I can imagine a couple possible mechanisms, but don't know enough to know which one(s) are the real ones.) Thanks, Ben Kaduk From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 08:14:41 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A619A1065670; Fri, 17 Sep 2010 08:14:41 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 393368FC16; Fri, 17 Sep 2010 08:14:39 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA18677; Fri, 17 Sep 2010 11:14:37 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OwW5c-0008B2-V0; Fri, 17 Sep 2010 11:14:37 +0300 Message-ID: <4C93236B.4050906@freebsd.org> Date: Fri, 17 Sep 2010 11:14:35 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: freebsd-hackers@freebsd.org X-Enigmail-Version: 1.1.2 Content-Type: multipart/mixed; boundary="------------030602010507080304070903" Cc: Jeff Roberson Subject: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 08:14:41 -0000 This is a multi-part message in MIME 
format. --------------030602010507080304070903 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit I've been investigating interaction between zfs and uma for a while. You might remember that there is a noticeable fragmentation in zfs uma zones when uma use is not enabled for actual data/metadata buffers. I also noticed that when uma use is enabled for data/metadata buffers (zio.use_uma=1) amount of memory reserved in free items of zfs uma zones becomes really huge. And this is despite the fact that the vast majority of the data/metadata zone have items with sizes that are multiples of page size. This couldn't really be because of fragmentation. Further checks show that the free items are accumulated in per-cpu cache buckets. uz_count for those buckets starts with 1, but over time, during bursts of activity, it grows up to maximum of 128. Problem with those buckets is that they are not drained on low memory conditions and uz_count never goes down. So, after a while, I observe about 300 free items (on a mere two core system) cached in 4 per-cpu buckets for a single zone with 128KB item size. That's 30MB right there. For all data and metadata zones the number goes as high as 500MB on my machine with 4GB physical RAM. This seems like a bit too much to me. Although keeping free items around improves performance, it does consume memory too. And the fact that that memory is not freed on lowmem condition makes the situation worse. So, I decided to take a look at how they handle this situation in (Open)Solaris. There is this good book: http://books.google.com/books?id=r_cecYD4AKkC&printsec=frontcover Please see section 6.2.4.5 on page 225 and table 6-11 on page 226. And also this code: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#971 It makes sense to me to limit size of per-cpu buckets depending on item size. I even wrote a little bit hackish patch [attached]. 
But I didn't go far as they did in Solaris, so minimum bucket size limit is 4. But perhaps it would make sense to not use the cache at all starting with certain size. Another attached hack removes zio zones that have items larger than page size, but not multiple of page size. Internally they would still consume multiple of page size per item, so we potentially can have two zones that use the same number of pages per zone, but with different item size. With the patch they are collapsed into a single zone. -- Andriy Gapon --------------030602010507080304070903 Content-Type: text/plain; name="uma-uz_count_max.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="uma-uz_count_max.diff" ZGlmZiAtLWdpdCBhL3N5cy92bS91bWFfY29yZS5jIGIvc3lzL3ZtL3VtYV9jb3JlLmMKaW5k ZXggM2ZjNWI4YS4uM2I4Mzg0YiAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9jb3JlLmMKKysr IGIvc3lzL3ZtL3VtYV9jb3JlLmMKQEAgLTE3OSw5ICsxNzksMTIgQEAgc3RydWN0IHVtYV9i dWNrZXRfem9uZSB7CiAJaW50CQl1YnpfZW50cmllczsKIH07CiAKLSNkZWZpbmUJQlVDS0VU X01BWAkxMjgKKyNkZWZpbmUJQlVDS0VUX1NJWkVfVEhSRVNIT0xECTEzMTA3MgorI2RlZmlu ZQlCVUNLRVRfTUFYCQkxMjgKIAogc3RydWN0IHVtYV9idWNrZXRfem9uZSBidWNrZXRfem9u ZXNbXSA9IHsKKwl7IE5VTEwsICI0IEJ1Y2tldCIsIDQgfSwKKwl7IE5VTEwsICI4IEJ1Y2tl dCIsIDggfSwKIAl7IE5VTEwsICIxNiBCdWNrZXQiLCAxNiB9LAogCXsgTlVMTCwgIjMyIEJ1 Y2tldCIsIDMyIH0sCiAJeyBOVUxMLCAiNjQgQnVja2V0IiwgNjQgfSwKQEAgLTE4OSw3ICsx OTIsNyBAQCBzdHJ1Y3QgdW1hX2J1Y2tldF96b25lIGJ1Y2tldF96b25lc1tdID0gewogCXsg TlVMTCwgTlVMTCwgMH0KIH07CiAKLSNkZWZpbmUJQlVDS0VUX1NISUZUCTQKKyNkZWZpbmUJ QlVDS0VUX1NISUZUCTIKICNkZWZpbmUJQlVDS0VUX1pPTkVTCSgoQlVDS0VUX01BWCA+PiBC VUNLRVRfU0hJRlQpICsgMSkKIAogLyoKQEAgLTE0NjMsNiArMTQ2NiwxMyBAQCB6b25lX2N0 b3Iodm9pZCAqbWVtLCBpbnQgc2l6ZSwgdm9pZCAqdWRhdGEsIGludCBmbGFncykKIAkJem9u ZS0+dXpfY291bnQgPSBrZWctPnVrX2lwZXJzOwogCWVsc2UKIAkJem9uZS0+dXpfY291bnQg PSBCVUNLRVRfTUFYOworCisJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX1NJWkVfVEhS RVNIT0xEIC8gem9uZS0+dXpfc2l6ZTsKKwlpZiAoem9uZS0+dXpfY291bnRfbWF4ID4gQlVD 
S0VUX01BWCkKKwkJem9uZS0+dXpfY291bnRfbWF4ID0gQlVDS0VUX01BWDsKKwllbHNlIGlm ICh6b25lLT51el9jb3VudF9tYXggPCAoMSA8PCBCVUNLRVRfU0hJRlQpKQorCQl6b25lLT51 el9jb3VudF9tYXggPSAxIDw8IEJVQ0tFVF9TSElGVDsKKwogCXJldHVybiAoMCk7CiB9CiAK QEAgLTIwNzYsNyArMjA4Niw3IEBAIHphbGxvY19zdGFydDoKIAljcml0aWNhbF9leGl0KCk7 CiAKIAkvKiBCdW1wIHVwIG91ciB1el9jb3VudCBzbyB3ZSBnZXQgaGVyZSBsZXNzICovCi0J aWYgKHpvbmUtPnV6X2NvdW50IDwgQlVDS0VUX01BWCkKKwlpZiAoem9uZS0+dXpfY291bnQg PCB6b25lLT51el9jb3VudF9tYXgpCiAJCXpvbmUtPnV6X2NvdW50Kys7CiAKIAkvKgpkaWZm IC0tZ2l0IGEvc3lzL3ZtL3VtYV9pbnQuaCBiL3N5cy92bS91bWFfaW50LmgKaW5kZXggNzcx MzU5My4uNmQ4MWUzZCAxMDA2NDQKLS0tIGEvc3lzL3ZtL3VtYV9pbnQuaAorKysgYi9zeXMv dm0vdW1hX2ludC5oCkBAIC0zMzAsNiArMzMwLDcgQEAgc3RydWN0IHVtYV96b25lIHsKIAl1 X2ludDY0X3QJdXpfc2xlZXBzOwkvKiBUb3RhbCBudW1iZXIgb2YgYWxsb2Mgc2xlZXBzICov CiAJdWludDE2X3QJdXpfZmlsbHM7CS8qIE91dHN0YW5kaW5nIGJ1Y2tldCBmaWxscyAqLwog CXVpbnQxNl90CXV6X2NvdW50OwkvKiBIaWdoZXN0IHZhbHVlIHViX3B0ciBjYW4gaGF2ZSAq LworCXVpbnQxNl90CXV6X2NvdW50X21heDsJLyogSGlnaGVzdCB2YWx1ZSB1el9jb3VudCBj YW4gaGF2ZSAqLwogCiAJLyoKIAkgKiBUaGlzIEhBUyB0byBiZSB0aGUgbGFzdCBpdGVtIGJl Y2F1c2Ugd2UgYWRqdXN0IHRoZSB6b25lIHNpemUK --------------030602010507080304070903 Content-Type: text/plain; name="zfs-zio-zones.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="zfs-zio-zones.diff" ZGlmZiAtLWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9m cy96ZnMvemlvLmMgYi9zeXMvY2RkbC9jb250cmliL29wZW5zb2xhcmlzL3V0cy9jb21tb24v ZnMvemZzL3ppby5jCmluZGV4IDhkZGY3Y2QuLjM0MGY2NzYgMTAwNjQ0Ci0tLSBhL3N5cy9j ZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1vbi9mcy96ZnMvemlvLmMKKysrIGIv c3lzL2NkZGwvY29udHJpYi9vcGVuc29sYXJpcy91dHMvY29tbW9uL2ZzL3pmcy96aW8uYwpA QCAtMTIxLDEwICsxMjEsMTEgQEAgemlvX2luaXQodm9pZCkKIAkJCWFsaWduID0gU1BBX01J TkJMT0NLU0laRTsKIAkJfSBlbHNlIGlmIChQMlBIQVNFKHNpemUsIFBBR0VTSVpFKSA9PSAw KSB7CiAJCQlhbGlnbiA9IFBBR0VTSVpFOworI2lmIDAKIAkJfSBlbHNlIGlmIChQMlBIQVNF KHNpemUsIHAyID4+IDIpID09IDApIHsKIAkJCWFsaWduID0gcDIgPj4gMjsKKyNlbmRpZgog 
CQl9Ci0KIAkJaWYgKGFsaWduICE9IDApIHsKIAkJCWNoYXIgbmFtZVszNl07CiAJCQkodm9p ZCkgc3ByaW50ZihuYW1lLCAiemlvX2J1Zl8lbHUiLCAodWxvbmdfdClzaXplKTsK --------------030602010507080304070903-- From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 12:32:35 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1213A106564A; Fri, 17 Sep 2010 12:32:35 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D1EAC8FC08; Fri, 17 Sep 2010 12:32:33 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA21728; Fri, 17 Sep 2010 15:32:32 +0300 (EEST) (envelope-from avg@freebsd.org) Message-ID: <4C935FDF.4040909@freebsd.org> Date: Fri, 17 Sep 2010 15:32:31 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100909 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Andre Oppermann References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> In-Reply-To: <4C935F56.4030903@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, Jeff Roberson Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 12:32:35 -0000 on 17/09/2010 15:30 Andre Oppermann said the following: > Having a general solutions for that is appreciated. Maybe the size > of the free per-cpu buckets should be specified when setting up the > UMA zone. Of certain frequently re-used elements we may want to > cache more, other less. 
This kind of flexibility seems like a very good idea.

-- Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 12:39:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 512131065781 for ; Fri, 17 Sep 2010 12:39:46 +0000 (UTC) (envelope-from alex.coulson@charthouse.co.uk) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id E10BF8FC20 for ; Fri, 17 Sep 2010 12:39:45 +0000 (UTC) Received: by wyb33 with SMTP id 33so3087918wyb.13 for ; Fri, 17 Sep 2010 05:39:45 -0700 (PDT) Received: by 10.216.165.209 with SMTP id e59mr705541wel.58.1284725751134; Fri, 17 Sep 2010 05:15:51 -0700 (PDT) Received: from [192.168.10.127] (host81-149-4-164.in-addr.btopenworld.com [81.149.4.164]) by mx.google.com with ESMTPS id n17sm2606299weq.30.2010.09.17.05.15.49 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 17 Sep 2010 05:15:50 -0700 (PDT) From: Alex Coulson Date: Fri, 17 Sep 2010 13:15:47 +0100 Message-Id: <2E7772A2-D0C2-474F-9101-DC782F58BC4F@charthouse.co.uk> To: freebsd-hackers@freebsd.org Mime-Version: 1.0 (Apple Message framework v1081) X-Mailer: Apple Mail (2.1081) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Nanobsd - Freebsd7.2 - Can't enable core dump X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 12:39:46 -0000

I have a soekris net4801 running nanobsd which is freezing randomly (between 10min->2hours), and does not create a crash dump file when it fails (and nothing interesting in the messages log).
The following kernel options are enabled:

> makeoptions DEBUG=-g
> options KDTRACE_HOOKS
> options KDB
> options DDB
> options KDB_UNATTENDED
> options KDB_TRACE

rc.conf

> dumpdev="/dev/da0s1b"
> savecore="YES"

swapinfo

> Device          1K-blocks     Used    Avail Capacity
> /dev/da0s1b        500720        0   500720     0%

db> call doadump

> Physical memory: 247 MB
> Dumping 35 MB:ucom0: ucomreadcb: TIMEOUT
> Aborting dump due to I/O error.
> status == 0x4, scsi status == 0x0
>
> ** DUMP FAILED (ERROR 5) **
> = 0x1d

Any help would be appreciated!

Alex Coulson

From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 12:56:53 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4224E106566C for ; Fri, 17 Sep 2010 12:56:53 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) by mx1.freebsd.org (Postfix) with ESMTP id A757D8FC1D for ; Fri, 17 Sep 2010 12:56:52 +0000 (UTC) Received: (qmail 14323 invoked from network); 17 Sep 2010 12:24:36 -0000 Received: from unknown (HELO [62.48.0.92]) ([62.48.0.92]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 17 Sep 2010 12:24:36 -0000 Message-ID: <4C935F56.4030903@freebsd.org> Date: Fri, 17 Sep 2010 14:30:14 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.9) Gecko/20100825 Thunderbird/3.1.3 MIME-Version: 1.0 To: Andriy Gapon References: <4C93236B.4050906@freebsd.org> In-Reply-To: <4C93236B.4050906@freebsd.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, Jeff Roberson Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 
17 Sep 2010 12:56:53 -0000 On 17.09.2010 10:14, Andriy Gapon wrote: > > I've been investigating interaction between zfs and uma for a while. > You might remember that there is a noticeable fragmentation in zfs uma zones > when uma use is not enabled for actual data/metadata buffers. > > I also noticed that when uma use is enabled for data/metadata buffers > (zio.use_uma=1) amount of memory reserved in free items of zfs uma zones becomes > really huge. And this is despite the fact that the vast majority of the > data/metadata zone have items with sizes that are multiples of page size. > This couldn't really be because of fragmentation. > > Further checks show that the free items are accumulated in per-cpu cache > buckets. uz_count for those buckets starts with 1, but over time, during bursts > of activity, it grows up to maximum of 128. > Problem with those buckets is that they are not drained on low memory conditions > and uz_count never goes down. > > So, after a while, I observe about 300 free items (on a mere two core system) > cached in 4 per-cpu buckets for a single zone with 128KB item size. > That's 30MB right there. > For all data and metadata zones the number goes as high as 500MB on my machine > with 4GB physical RAM. > This seems like a bit too much to me. > > Although keeping free items around improves performance, it does consume memory > too. And the fact that that memory is not freed on lowmem condition makes the > situation worse. Interesting. We may run into related issues with excessive mbuf (cluster) caching in the per-cpu buckets as well. Having a general solutions for that is appreciated. Maybe the size of the free per-cpu buckets should be specified when setting up the UMA zone. Of certain frequently re-used elements we may want to cache more, other less. 
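
The idea of tying the per-CPU bucket limit to the item size can be sketched as follows. This is a minimal standalone illustration modelled on the attached uma-uz_count_max.diff, not the UMA code itself: the three constants mirror that patch, while the helper name bucket_count_max() is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the attached patch; treat them as tunables. */
#define BUCKET_SIZE_THRESHOLD 131072  /* target bytes cached per bucket */
#define BUCKET_MAX            128     /* upper bound on items per bucket */
#define BUCKET_SHIFT          2       /* smallest bucket holds 1 << 2 items */

/*
 * Hypothetical helper: the largest per-CPU bucket size that a zone with
 * items of item_size bytes would be allowed to grow to.
 */
static unsigned
bucket_count_max(size_t item_size)
{
	size_t n;

	assert(item_size > 0);
	n = BUCKET_SIZE_THRESHOLD / item_size;
	if (n > BUCKET_MAX)
		n = BUCKET_MAX;                 /* small items: keep the old cap */
	else if (n < (1u << BUCKET_SHIFT))
		n = 1u << BUCKET_SHIFT;         /* large items: never below 4 */
	return ((unsigned)n);
}
```

With these constants a zone of 128KB items is capped at 4 items per bucket rather than 128, while a zone of 4KB items may still cache 32.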
-- Andre From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 15:23:48 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 792801065672 for ; Fri, 17 Sep 2010 15:23:48 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 48F188FC12 for ; Fri, 17 Sep 2010 15:23:48 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id F210146BA0; Fri, 17 Sep 2010 11:23:47 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1B4BC8A04F; Fri, 17 Sep 2010 11:23:47 -0400 (EDT) From: John Baldwin To: Benjamin Kaduk Date: Fri, 17 Sep 2010 09:02:18 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <201009161416.05759.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009170902.18748.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 17 Sep 2010 11:23:47 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-hackers@freebsd.org Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 
15:23:48 -0000 On Thursday, September 16, 2010 11:24:29 pm Benjamin Kaduk wrote: > On Thu, 16 Sep 2010, John Baldwin wrote: > > > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: > > > >> The mtx_owned(9) macro uses this property, mtx_owned() does not use anything > >> special to compare the value of m->mtx_lock (volatile) with current thread > >> pointer, all other functions that update m->mtx_lock of unowned mutex use > >> compare-and-set instruction. Also I cannot find anything special in > >> generated Assembler code for volatile variables (except for ia64 where > >> acquire loads and release stores are used). > > > > No, mtx_owned() is just not harmed by the races it loses. You can certainly > > read a stale value of mtx_lock in mtx_owned() if some other thread owns the > > lock or has just released the lock. However, we don't care, because in both > > of those cases, mtx_owned() returns false. What does matter is that > > mtx_owned() can only return true if we currently hold the mutex. This works > > because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the > > same time, and 2) even CPUs that hold writes in store buffers will snoop their > > store buffer for local reads on that CPU. That is, a given CPU will never > > read a stale value of a memory word that is "older" than a write it has > > performed to that word. > > Sorry for the naive question, but would you mind expounding a bit on what > keeps the thread from migrating to a different CPU and getting a stale > value there? (I can imagine a couple possible mechanisms, but don't know > enough to know which one(s) are the real ones.) The memory barriers in the thread_lock() / thread_unlock() pair of a context switch ensure that any writes posted by the thread before it performs a context switch will be visible on the "new" CPU before the thread resumes execution. 
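
The argument about mtx_owned() can be illustrated with a small userland model built on C11 atomics. This is a sketch under assumptions, not the code in kern/kern_mutex.c: every toy_* name is invented for the example, the lock word simply holds an owner id, and contested or recursive locking is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_UNOWNED ((uintptr_t)0)

struct toy_mtx {
	_Atomic uintptr_t lock;  /* owner's thread id, or TOY_UNOWNED */
};

/* Acquire: compare-and-set from "unowned" to our id, acquire ordering. */
static bool
toy_trylock(struct toy_mtx *m, uintptr_t tid)
{
	uintptr_t expected = TOY_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(&m->lock, &expected,
	    tid, memory_order_acquire, memory_order_relaxed));
}

/* Release: a plain store with release ordering. */
static void
toy_unlock(struct toy_mtx *m)
{
	atomic_store_explicit(&m->lock, TOY_UNOWNED, memory_order_release);
}

/*
 * Ownership test: a relaxed load is enough. A stale value can only be
 * "unowned" or another thread's id, and both compare unequal to tid,
 * so a lost race still yields false. Only the caller's own earlier
 * store can make this return true.
 */
static bool
toy_owned(struct toy_mtx *m, uintptr_t tid)
{
	return (atomic_load_explicit(&m->lock, memory_order_relaxed) == tid);
}
```

The design point is that the ownership check needs no barrier of its own: the only value that makes it return true is one the calling thread wrote itself.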
-- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 17:42:45 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 858E71065694; Fri, 17 Sep 2010 17:42:45 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1EBBF8FC15; Fri, 17 Sep 2010 17:42:45 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1OwexQ-0004Vx-89; Fri, 17 Sep 2010 20:42:44 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id B27531CC1E; Fri, 17 Sep 2010 20:42:44 +0300 (EEST) Date: Fri, 17 Sep 2010 20:42:44 +0300 From: Andrey Simonenko To: John Baldwin Message-ID: <20100917174244.GA2570@pm513-1.comsys.ntu-kpi.kiev.ua> References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <20100916173307.GA1994@pm513-1.comsys.ntu-kpi.kiev.ua> <201009161416.05759.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201009161416.05759.jhb@freebsd.org> User-Agent: Mutt/1.5.20 (2009-06-14) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 06-Jan-2007 23:14:37) X-Date: 2010-09-17 20:42:44 X-Connected-IP: 10.18.52.101:43526 X-Message-Linecount: 105 X-Body-Linecount: 87 X-Message-Size: 5808 X-Body-Size: 4945 Cc: freebsd-hackers@freebsd.org, Matthew Fleming Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 17 Sep 2010 17:42:45 -0000 On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote: > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: > > > "Current" value means that the value of a variable read by one thread > > is equal to the value of this variable successfully updated by another > > thread by the compare-and-set instruction. As I understand from the kernel > > source code, atomic_cmpset_ptr() allows to update a variable in a way that > > all other CPUs will invalidate corresponding cache lines that contain > > the value of this variable. > > That is not true. It is likely true on x86, but it is certainly not true on > other architectures such as sparc64 where a write may be held in a store > buffer for an indeterminate amount of time (and note that some lock releases > are simple stores with a "rel" memory barrier). All that we require is that > if the value is stale, the atomic_cmpset() that attempts to set MTX_CONTESTED > will fail. I missed _release_lock_quick() call in _mtx_unlock_sleep(). > > > The mtx_owned(9) macro uses this property, mtx_owned() does not use anything > > special to compare the value of m->mtx_lock (volatile) with current thread > > pointer, all other functions that update m->mtx_lock of unowned mutex use > > compare-and-set instruction. Also I cannot find anything special in > > generated Assembler code for volatile variables (except for ia64 where > > acquire loads and release stores are used). > > No, mtx_owned() is just not harmed by the races it loses. You can certainly > read a stale value of mtx_lock in mtx_owned() if some other thread owns the > lock or has just released the lock. However, we don't care, because in both > of those cases, mtx_owned() returns false. What does matter is that > mtx_owned() can only return true if we currently hold the mutex. 
This works > because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the > same time, and 2) even CPUs that hold writes in store buffers will snoop their > store buffer for local reads on that CPU. That is, a given CPU will never > read a stale value of a memory word that is "older" than a write it has > performed to that word. Looks like I understand the logic why mtx_owned() works correctly when mtx_lock is present in CPU cache or is absent in CPU cache. The mtx_lock value definitely can say whether lock is held by the current thread, but it cannot say whether it is unowned or is owned by another thread. Let me ask another one question about memory barriers and thread migration. Let a thread locked a mutex, modified shared data protected by this mutex and was migrated from CPU1 to CPU2 (mutex is still locked). In this scenario just migrated thread will not see stale data for a mutex itself (the m->mtx_lock value) and for shared data on CPU2 because when it was migrated from CPU1 there was at least one unlock call for some another mutex that had release semantics and appropriate memory barrier instruction was run implicitly or explicitly. As a result this "rel" memory barrier made all modifications from CPU1 visible on another CPUs. When CPU2 switched to just migrated thread there was at least on lock call for some another mutex with acquire semantics, so "rel/acq" memory barriers pair works here together. (Also I consider case when CPU2 did not work with that mutex, but worked with its memory before. Some thread on CPU2 could allocate some memory, worked with it and freed it. Later the same part of memory was allocated by a thread on CPU1 for mutex). Is the above written description correct? Such logic of memory barriers is described in detail in Sparc v9 documentation book in MEMBAR instruction description. Actually MEMBAR with appropriate masks is used in atomic.h for this architecture. 
As I understand it, the same logic for memory barriers (atomic_..._rel and atomic_..._acq) applies to all other architectures. Otherwise I do not understand how the mtx_lock() and mtx_unlock() pair can protect data and ensure that a thread that locked a mutex will see correct (not stale) data protected by this mutex.

> > There are some places in the kernel where a variable is updated in
> > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));"
> > and that variable is not "volatile", but the compiler generates correct
> > Assembler code. So "volatile" is not a requirement for all cases.
>
> Hmm, I suspect that many of those places actually do use volatile. The
> various lock cookies (mtx_lock, etc.) are declared volatile in the structure.
> Otherwise the compiler would be free to conclude that 'v = value;' is a loop
> invariant and move it out of the loop which would break. Given that, the
> construct you referred to does in fact require 'value' to be volatile.

I checked the assembler code for these functions:

kern/subr_msgbuf.c:msgbuf_addchar()
vm/vm_map.c:vmspace_free()

Thank you for your answers.
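For illustration, a minimal userland sketch of the retry loop discussed above, written with C11 atomics rather than the kernel's atomic(9) primitives. struct obj, ref_drop_cas() and ref_drop_fetchadd() are hypothetical names. In C11 the atomic type itself forces a fresh read on every iteration, which is the job "volatile" does for the kernel construct; the second function shows that a plain fetch-and-subtract expresses the same reference drop without a loop.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
struct obj {
	atomic_int refcnt;
};

/*
 * Drop one reference with a compare-and-swap retry loop.
 * Returns 1 if the caller dropped the last reference.
 */
static int
ref_drop_cas(struct obj *o)
{
	int old;

	old = atomic_load_explicit(&o->refcnt, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(&o->refcnt, &old,
	    old - 1, memory_order_acq_rel, memory_order_relaxed))
		;	/* on failure 'old' was reloaded with the current value */
	return (old == 1);
}

/* Same effect with a single fetch-and-subtract, no loop needed. */
static int
ref_drop_fetchadd(struct obj *o)
{

	return (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_acq_rel) == 1);
}
```

If the counter were a plain int read through an ordinary assignment, the compiler would be entitled to hoist that read out of the loop, which is the failure mode the quoted reply warns about.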
From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 17 21:11:23 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8C1441065670 for ; Fri, 17 Sep 2010 21:11:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 56AD78FC08 for ; Fri, 17 Sep 2010 21:11:23 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id D29E946B66; Fri, 17 Sep 2010 17:11:22 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id EBA198A03C; Fri, 17 Sep 2010 17:11:21 -0400 (EDT) From: John Baldwin To: Andrey Simonenko Date: Fri, 17 Sep 2010 17:11:21 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100819; KDE/4.4.5; amd64; ; ) References: <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <201009161416.05759.jhb@freebsd.org> <20100917174244.GA2570@pm513-1.comsys.ntu-kpi.kiev.ua> In-Reply-To: <20100917174244.GA2570@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201009171711.21307.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 17 Sep 2010 17:11:21 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-hackers@freebsd.org, Matthew Fleming Subject: Re: Questions about mutex implementation in kern/kern_mutex.c X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to 
FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Sep 2010 21:11:23 -0000 On Friday, September 17, 2010 1:42:44 pm Andrey Simonenko wrote: > On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote: > > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: > > > The mtx_owned(9) macro uses this property, mtx_owned() does not use anything > > > special to compare the value of m->mtx_lock (volatile) with current thread > > > pointer, all other functions that update m->mtx_lock of unowned mutex use > > > compare-and-set instruction. Also I cannot find anything special in > > > generated Assembler code for volatile variables (except for ia64 where > > > acquire loads and release stores are used). > > > > No, mtx_owned() is just not harmed by the races it loses. You can certainly > > read a stale value of mtx_lock in mtx_owned() if some other thread owns the > > lock or has just released the lock. However, we don't care, because in both > > of those cases, mtx_owned() returns false. What does matter is that > > mtx_owned() can only return true if we currently hold the mutex. This works > > because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the > > same time, and 2) even CPUs that hold writes in store buffers will snoop their > > store buffer for local reads on that CPU. That is, a given CPU will never > > read a stale value of a memory word that is "older" than a write it has > > performed to that word. > > Looks like I understand the logic why mtx_owned() works correctly when > mtx_lock is present in CPU cache or is absent in CPU cache. The mtx_lock > value definitely can say whether lock is held by the current thread, but > it cannot say whether it is unowned or is owned by another thread. > > Let me ask another one question about memory barriers and thread migration. 
> > Let a thread locked a mutex, modified shared data protected by this mutex > and was migrated from CPU1 to CPU2 (mutex is still locked). In this scenario > just migrated thread will not see stale data for a mutex itself (the > m->mtx_lock value) and for shared data on CPU2 because when it was migrated > from CPU1 there was at least one unlock call for some another mutex that had > release semantics and appropriate memory barrier instruction was run > implicitly or explicitly. As a result this "rel" memory barrier made all > modifications from CPU1 visible on another CPUs. When CPU2 switched to just > migrated thread there was at least on lock call for some another mutex with > acquire semantics, so "rel/acq" memory barriers pair works here together. > (Also I consider case when CPU2 did not work with that mutex, but worked > with its memory before. Some thread on CPU2 could allocate some memory, > worked with it and freed it. Later the same part of memory was allocated > by a thread on CPU1 for mutex). > > Is the above written description correct? Yes. > > > There are some places in the kernel where a variable is updated in > > > something like "do { v = value; } while (!atomic_cmpset_int(&value, ...));" > > > and that variable is not "volatile", but the compiler generates correct > > > Assembler code. So "volatile" is not a requirement for all cases. > > > > Hmm, I suspect that many of those places actually do use volatile. The > > various lock cookies (mtx_lock, etc.) are declared volatile in the structure. > > Otherwise the compiler would be free to conclude that 'v = value;' is a loop > > invariant and move it out of the loop which would break. Given that, the > > construct you referred to does in fact require 'value' to be volatile. 
> > I checked Assembler code for these functions: > > kern/subr_msgbuf.c:msgbuf_addchar() > vm/vm_map.c:vmspace_free() They may happen to accidentally work because atomic_cmpset() clobbers all of memory, but these should be marked volatile. Index: vm/vm_map.c =================================================================== --- vm/vm_map.c (revision 212801) +++ vm/vm_map.c (working copy) @@ -343,10 +343,7 @@ if (vm->vm_refcnt == 0) panic("vmspace_free: attempt to free already freed vmspace"); - do - refcnt = vm->vm_refcnt; - while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1)); - if (refcnt == 1) + if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1) vmspace_dofree(vm); } Index: vm/vm_map.h =================================================================== --- vm/vm_map.h (revision 212801) +++ vm/vm_map.h (working copy) @@ -237,7 +237,7 @@ caddr_t vm_taddr; /* (c) user virtual address of text */ caddr_t vm_daddr; /* (c) user virtual address of data */ caddr_t vm_maxsaddr; /* user VA at max stack growth */ - int vm_refcnt; /* number of references */ + volatile int vm_refcnt; /* number of references */ /* * Keep the PMAP last, so that CPU-specific variations of that * structure on a single architecture don't result in offset Index: sys/msgbuf.h =================================================================== --- sys/msgbuf.h (revision 212801) +++ sys/msgbuf.h (working copy) @@ -38,7 +38,7 @@ #define MSG_MAGIC 0x063062 u_int msg_magic; u_int msg_size; /* size of buffer area */ - u_int msg_wseq; /* write sequence number */ + volatile u_int msg_wseq; /* write sequence number */ u_int msg_rseq; /* read sequence number */ u_int msg_cksum; /* checksum of contents */ u_int msg_seqmod; /* range for sequence numbers */ -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 04:01:09 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP 
id 9F356106564A; Sat, 18 Sep 2010 04:01:09 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-4.mit.edu (DMZ-MAILSEC-SCANNER-4.MIT.EDU [18.9.25.15]) by mx1.freebsd.org (Postfix) with ESMTP id 304138FC12; Sat, 18 Sep 2010 04:01:08 +0000 (UTC) X-AuditID: 1209190f-b7bf7ae00000628e-91-4c9439882650 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-4.mit.edu (Symantec Brightmail Gateway) with SMTP id 07.91.25230.889349C4; Sat, 18 Sep 2010 00:01:12 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id o8I417ZW026321; Sat, 18 Sep 2010 00:01:07 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id o8I415po019277 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sat, 18 Sep 2010 00:01:07 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id o8I414dO023384; Sat, 18 Sep 2010 00:01:04 -0400 (EDT) Date: Sat, 18 Sep 2010 00:01:04 -0400 (EDT) From: Benjamin Kaduk To: kientzle@freebsd.org, kaiw@freebsd.org In-Reply-To: <20100829201050.GA60715@stack.nl> Message-ID: References: <20100829201050.GA60715@stack.nl> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Brightmail-Tracker: AAAAAA== Cc: freebsd-hackers@freebsd.org, Jilles Tjoelker Subject: Re: ar(1) format_decimal failure is fatal? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 04:01:09 -0000 On Sun, 29 Aug 2010, Jilles Tjoelker wrote: > On Sat, Aug 28, 2010 at 07:08:34PM -0400, Benjamin Kaduk wrote: >> [...] 
>> building static egacy library
>> ar: fatal: Numeric user ID too large
>> *** Error code 70
>
>> This error appears to be coming from
>> lib/libarchive/archive_write_set_format_ar.c , which seems to only have
>> provisions for outputting a user ID in AR_uid_size = 6 columns.
[...]
>> It looks like this macro was so defined in version 1.1 of that file, with
>> commit message "'ar' format support for libarchive, contributed by Kai
>> Wang.". This doesn't make it terribly clear whether the 'ar' format
>> mandates this length, or if it is an implementation decision, so I get to
>> ask: what reasoning (if any) was behind this choice? Would anything break
>> if it was bumped up to a larger size? Are there other options for a
>> workaround in my AFS environment?
>
> I wonder if the uid/gid fields are useful at all for ar archives. Ar
> archives are usually not extracted, and when they are, the current
> user's values seem good enough. The uid/gid also prevent exactly
> reproducible builds (together with the timestamp).

GNU binutils has recently (well, March 2009) added a -D ("deterministic") argument to ar(1) which sets the timestamp, uid, and gid to zero, and the mode to 644. If that argument is not given, linux's ar(1) happily uses my 8-digit uid as-is; the manual page seems to imply that it will handle 15 or 16 digits in that field. Solaris' ar(1) caps large uids at 600001. On OS X, the value is wrapped at some power of two less than 2^26, showing up in the archive as 271 (33554703 = 271 + 2^25). In none of the cases I tried was a large uid a fatal error; I'm not really convinced that it should be fatal for FreeBSD.
Poking at the source, it seems this stems from usr.bin/ar/write.c's use of the AC() macro, defined in ar.h:

#define AC(CALL) do {					\
	if ((CALL))					\
		bsdar_errc(bsdar, EX_SOFTWARE, 0, "%s",	\
		    archive_error_string(a));		\
} while (0)

archive_write_header() is always called within this macro, and the relevant implementation (archive_write_ar_header() in libarchive/archive_write_set_format_ar.c) immediately returns ARCHIVE_WARN if the format_decimal() call fails.

Other places in the libarchive code actually use the distinction between ARCHIVE_OK, ARCHIVE_WARN, and ARCHIVE_FATAL (and friends); I think that it would be pretty easy to modify format_decimal() (and probably its cousins) to use that convention instead of just -1 and 0. It already does a reasonable thing in the case of overflow (write the maximum value), it just does not distinguish between the different possible errors.

I propose that format_{decimal,octal}() return ARCHIVE_FAILED for negative input, and ARCHIVE_WARN for overflow. archive_write_ar_header() can then catch ARCHIVE_WARN from the format_foo functions and continue on, propagating the ARCHIVE_WARN return value at the end of its execution instead of bailing immediately. ar/write.c would also need to be changed, calling archive_write_header without the AC macro and dealing with the ARCHIVE_WARN return value case, presumably by writing archive_error_string(a) to stderr and continuing.

Would (one of) you be willing to review a patch to that effect?
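A minimal sketch of the proposal above, not the actual libarchive code: sketch_format_decimal() and sketch_write_ids() are hypothetical names, and the SK_* constants are local stand-ins for ARCHIVE_OK, ARCHIVE_WARN and ARCHIVE_FAILED so that the example compiles without archive.h. The formatter reports overflow and negative input with different codes, and the caller keeps going on a warning while remembering the worst status seen.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-ins for ARCHIVE_OK / ARCHIVE_WARN / ARCHIVE_FAILED. */
enum { SK_OK = 0, SK_WARN = -20, SK_FAILED = -25 };

/*
 * Write v as left-justified, space-padded decimal into the s-byte field p
 * (no NUL terminator). Overflow fills the field with 9s and returns SK_WARN;
 * negative input fills it with 0s and returns SK_FAILED. Assumes s > 0.
 */
static int
sketch_format_decimal(int64_t v, char *p, int s)
{
	char digits[20];	/* INT64_MAX has 19 decimal digits */
	int i, len = 0;

	if (v < 0) {
		memset(p, '0', (size_t)s);
		return (SK_FAILED);
	}
	do {
		digits[len++] = (char)('0' + v % 10);
		v /= 10;
	} while (v > 0);
	if (len > s) {
		memset(p, '9', (size_t)s);	/* largest value that fits */
		return (SK_WARN);
	}
	for (i = 0; i < len; i++)
		p[i] = digits[len - 1 - i];
	memset(p + len, ' ', (size_t)(s - len));
	return (SK_OK);
}

/*
 * Fill two 6-column id fields. A warning does not stop the header from
 * being written; it is propagated at the end. A hard failure aborts.
 */
static int
sketch_write_ids(int64_t uid, int64_t gid, char *uid_field, char *gid_field)
{
	int r, ret = SK_OK;

	r = sketch_format_decimal(uid, uid_field, 6);
	if (r == SK_FAILED)
		return (r);
	if (r < ret)
		ret = r;
	r = sketch_format_decimal(gid, gid_field, 6);
	if (r == SK_FAILED)
		return (r);
	if (r < ret)
		ret = r;
	return (ret);
}
```

With this shape, a caller such as ar(1) could print the warning text and continue when it gets the warning code, and treat only the failure code as an error.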
Thanks, Ben Kaduk From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 08:02:33 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C363D1065670; Sat, 18 Sep 2010 08:02:33 +0000 (UTC) (envelope-from kientzle@freebsd.org) Received: from monday.kientzle.com (99-115-135-74.uvs.sntcca.sbcglobal.net [99.115.135.74]) by mx1.freebsd.org (Postfix) with ESMTP id D027D8FC19; Sat, 18 Sep 2010 08:02:32 +0000 (UTC) Received: from [10.123.2.180] (DIR-655 [192.168.1.65]) by monday.kientzle.com (8.14.3/8.14.3) with ESMTP id o8I7Ops0073710; Sat, 18 Sep 2010 07:24:51 GMT (envelope-from kientzle@freebsd.org) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: Tim Kientzle In-Reply-To: Date: Sat, 18 Sep 2010 00:24:51 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20100829201050.GA60715@stack.nl> To: Benjamin Kaduk X-Mailer: Apple Mail (2.1081) Cc: freebsd-hackers@freebsd.org, kaiw@freebsd.org, Jilles Tjoelker Subject: Re: ar(1) format_decimal failure is fatal? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 08:02:33 -0000 On Sep 17, 2010, at 9:01 PM, Benjamin Kaduk wrote: > On Sun, 29 Aug 2010, Jilles Tjoelker wrote: >=20 >> On Sat, Aug 28, 2010 at 07:08:34PM -0400, Benjamin Kaduk wrote: >>> [...] >>> building static egacy library >>> ar: fatal: Numeric user ID too large >>> *** Error code 70 >>=20 >>> This error appears to be coming from >>> lib/libarchive/archive_write_set_format_ar.c , which seems to only = have >>> provisions for outputting a user ID in AR_uid_size =3D 6 columns. > [...] 
>>> It looks like this macro was so defined in version 1.1 of that file, with
>>> commit message "'ar' format support for libarchive, contributed by Kai
>>> Wang.". This doesn't make it terribly clear whether the 'ar' format
>>> mandates this length, or if it is an implementation decision...

There's no official standard for the ar format, only old conventions and compatibility with legacy implementations.

>> I wonder if the uid/gid fields are useful at all for ar archives. Ar
>> archives are usually not extracted, and when they are, the current
>> user's values seem good enough. The uid/gid also prevent exactly
>> reproducible builds (together with the timestamp).
>
> GNU binutils has recently (well, March 2009) added a -D ("deterministic") argument to ar(1) which sets the timestamp, uid, and gid to zero, and the mode to 644. If that argument is not given, linux's ar(1) happily uses my 8-digit uid as-is; the manual page seems to imply that it will handle 15 or 16 digits in that field.

Please send me a small example file... I don't think I've seen this format variant. Maybe we can extend our ar(1) to support this variant.

Personally, I wonder if it wouldn't make sense to just always force the timestamp, uid, and gid to zero. I find it hard to believe anyone is using ar(1) as a general-purpose archiving tool. Of course, it should be trivial to add -D support to our ar(1).

> I propose that format_{decimal,octal}() return ARCHIVE_FAILED for negative input, and ARCHIVE_WARN for overflow. archive_write_ar_header() can then catch ARCHIVE_WARN from the format_foo functions and continue on, propagating the ARCHIVE_WARN return value at the end of its execution ...

This sounds entirely reasonable to me. I personally don't see much advantage to distinguishing negative versus overflow, but certainly have no objections to that part. Definitely ar(1) should not abort on a simple ARCHIVE_WARN.

> Would (one of) you be willing to review a patch to that effect?
Happy to do so.=20 Cheers, Tim From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 11:23:40 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 559B2106566C for ; Sat, 18 Sep 2010 11:23:40 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9ADA68FC08 for ; Sat, 18 Sep 2010 11:23:39 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA05606 for ; Sat, 18 Sep 2010 14:23:37 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OwvW5-000CwJ-7j for freebsd-hackers@FreeBSD.org; Sat, 18 Sep 2010 14:23:37 +0300 Message-ID: <4C94A138.8050905@icyb.net.ua> Date: Sat, 18 Sep 2010 14:23:36 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 11:23:40 -0000 Here's a small patch that adds support for printing stack trace in form of frame addresses when KDB_TRACE is enabled, but there is no debugger backend configured. The patch is styled after "cheap" variant of stack_ktr. What do you think (useful/useless, correct, etc) ? 
--- a/sys/kern/subr_kdb.c +++ b/sys/kern/subr_kdb.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -295,10 +296,16 @@ void kdb_backtrace(void) { + struct stack st; + int i; - if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) { - printf("KDB: stack backtrace:\n"); + printf("KDB: stack backtrace:\n"); + if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) kdb_dbbe->dbbe_trace(); + else { + stack_save(&st); + for (i = 0; i < st.depth; i++) + printf("#%d %p\n", i, (void*)(uintptr_t)st.pcs[i]); } } -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 11:23:48 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAC9C1065697; Sat, 18 Sep 2010 11:23:48 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C44A18FC1F; Sat, 18 Sep 2010 11:23:48 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 75C8546B2D; Sat, 18 Sep 2010 07:23:48 -0400 (EDT) Date: Sat, 18 Sep 2010 12:23:48 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Andre Oppermann In-Reply-To: <4C935F56.4030903@freebsd.org> Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-hackers@freebsd.org, Jeff Roberson , Andriy Gapon Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 11:23:49 -0000 On Fri, 17 Sep 2010, Andre Oppermann wrote: >> Although keeping free items around 
improves performance, it does consume >> memory too. And the fact that that memory is not freed on lowmem condition >> makes the situation worse. > > Interesting. We may run into related issues with excessive mbuf (cluster) > caching in the per-cpu buckets as well. > > Having a general solutions for that is appreciated. Maybe the size of the > free per-cpu buckets should be specified when setting up the UMA zone. Of > certain frequently re-used elements we may want to cache more, other less. I've been keeping a vague eye out for this over the last few years, and haven't spotted many problems in production machines I've inspected. You can use the umastat tool in the tools tree to look at the distribution of memory over buckets (etc) in UMA manually. It would be nice if it had some automated statistics on fragmentation however. Short-lived fragmentation is likely, and isn't an issue, so what you want is a tool that monitors over time and reports on longer-lived fragmentation. The main fragmentation issue we've had in the past has been due to mbuf+cluster caching, which prevented mbufs from being freed usefully in some cases. Jeff's ongoing work on variable-sized mbufs would entirely eliminate that problem... 
Robert From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 11:27:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C5BF1065670; Sat, 18 Sep 2010 11:27:46 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8CB628FC17; Sat, 18 Sep 2010 11:27:45 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA05660; Sat, 18 Sep 2010 14:27:44 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Owva3-000Cwf-Oi; Sat, 18 Sep 2010 14:27:43 +0300 Message-ID: <4C94A22F.1070608@freebsd.org> Date: Sat, 18 Sep 2010 14:27:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Robert Watson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 11:27:46 -0000 on 18/09/2010 14:23 Robert Watson said the following: > I've been keeping a vague eye out for this over the last few years, and haven't > spotted many problems in production machines I've inspected. You can use the > umastat tool in the tools tree to look at the distribution of memory over > buckets (etc) in UMA manually. 
It would be nice if it had some automated > statistics on fragmentation however. Short-lived fragmentation is likely, and > isn't an issue, so what you want is a tool that monitors over time and reports > on longer-lived fragmentation. > > The main fragmentation issue we've had in the past has been due to mbuf+cluster > caching, which prevented mbufs from being freed usefully in some cases. Jeff's > ongoing work on variable-sized mbufs would entirely eliminate that problem... Robert, just in case, this thread is not about fragmentation, it's about per-cpu buckets, number of items in them and size of the items. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 11:30:44 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7930106566C; Sat, 18 Sep 2010 11:30:44 +0000 (UTC) (envelope-from rwatson@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id B11DC8FC08; Sat, 18 Sep 2010 11:30:44 +0000 (UTC) Received: from [127.0.0.1] (rhee.cl.cam.ac.uk [128.232.1.202]) by cyrus.watson.org (Postfix) with ESMTPSA id BE70246B2D; Sat, 18 Sep 2010 07:30:43 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: "Robert N. M. 
Watson" In-Reply-To: <4C94A22F.1070608@freebsd.org> Date: Sat, 18 Sep 2010 12:30:41 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <52AE93F3-D15F-40C9-A9CA-07F30C803B81@freebsd.org> References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C94A22F.1070608@freebsd.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) Cc: freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 11:30:45 -0000 On 18 Sep 2010, at 12:27, Andriy Gapon wrote: > on 18/09/2010 14:23 Robert Watson said the following: >> I've been keeping a vague eye out for this over the last few years, = and haven't >> spotted many problems in production machines I've inspected. You can = use the >> umastat tool in the tools tree to look at the distribution of memory = over >> buckets (etc) in UMA manually. It would be nice if it had some = automated >> statistics on fragmentation however. Short-lived fragmentation is = likely, and >> isn't an issue, so what you want is a tool that monitors over time = and reports >> on longer-lived fragmentation. >>=20 >> The main fragmentation issue we've had in the past has been due to = mbuf+cluster >> caching, which prevented mbufs from being freed usefully in some = cases. Jeff's >> ongoing work on variable-sized mbufs would entirely eliminate that = problem... >=20 > just in case, this thread is not about fragmentation, it's about = per-cpu > buckets, number of items in them and size of the items. Those issues are closely related, and in particular, wanted to point = Andre at umastat since he's probably not aware of it.. 
:-)

Robert

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 12:49:13 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 39C7F106566B; Sat, 18 Sep 2010 12:49:13 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.29.23]) by mx1.freebsd.org (Postfix) with ESMTP id BC0848FC0C; Sat, 18 Sep 2010 12:49:12 +0000 (UTC) Received: from [87.79.159.189] (helo=r500.local) by smtprelay01.ispgateway.de with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1Owweq-0004gU-7a; Sat, 18 Sep 2010 14:36:44 +0200 Date: Sat, 18 Sep 2010 14:35:16 +0200 From: Fabian Keil To: Robert Watson Message-ID: <20100918143516.3568f40e@r500.local> In-Reply-To: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; amd64-portbld-freebsd9.0) X-PGP-KEY-URL: http://www.fabiankeil.de/gpg-keys/freebsd-listen-2008-08-18.asc Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/j0IIO6G0OvbQXQCJQewu8.K"; protocol="application/pgp-signature" X-Df-Sender: 775067 Cc: freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 12:49:13 -0000

--Sig_/j0IIO6G0OvbQXQCJQewu8.K
Content-Type: multipart/mixed; boundary="MP_/V45ylbNW9Sv144uke8uKVXp"

--MP_/V45ylbNW9Sv144uke8uKVXp
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Robert Watson wrote:

> On Fri, 17 Sep 2010, Andre Oppermann wrote:
>
> >> Although keeping free items around improves performance, it does consume
> >> memory too. And the fact that that memory is not freed on lowmem condition
> >> makes the situation worse.
> >
> > Interesting. We may run into related issues with excessive mbuf (cluster)
> > caching in the per-cpu buckets as well.
> >
> > Having a general solution for that is appreciated. Maybe the size of the
> > free per-cpu buckets should be specified when setting up the UMA zone. Of
> > certain frequently re-used elements we may want to cache more, others less.
>
> I've been keeping a vague eye out for this over the last few years, and
> haven't spotted many problems in production machines I've inspected. You can
> use the umastat tool in the tools tree to look at the distribution of memory
> over buckets (etc) in UMA manually.

Doesn't build for me on amd64:

fk@r500 /usr/src/tools/tools/umastat $make
Warning: Object directory not changed from original /usr/src/tools/tools/umastat
cc -O2 -pipe -fno-omit-frame-pointer -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign -c umastat.c
cc1: warnings being treated as errors
umastat.c: In function 'uma_print_bucketlist':
umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
umastat.c: In function 'uma_print_cache':
umastat.c:245: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
umastat.c:246: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
umastat.c: In function 'main':
umastat.c:416: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
umastat.c:418: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
umastat.c:420: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
umastat.c:426: warning: dereferencing type-punned pointer will break strict-aliasing rules
umastat.c:429: warning: dereferencing type-punned pointer will break strict-aliasing rules
*** Error code 1

Stop in /usr/src/tools/tools/umastat.

The attached patch seems to work around the problem, I'm not sure if
the casts to void* are better than decreasing the WARN level, though ...

Fabian

--MP_/V45ylbNW9Sv144uke8uKVXp
Content-Type: text/x-patch
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename=0001-Work-around-umastat-build-failures-on-amd64.patch

From b84b5cf4f24b6886b5db9885f5bea707dcfb11e8 Mon Sep 17 00:00:00 2001
From: Fabian Keil
Date: Sat, 18 Sep 2010 13:55:54 +0200
Subject: [PATCH] Work around umastat build failures on amd64.

---
 tools/tools/umastat/umastat.c | 16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/tools/umastat/umastat.c b/tools/tools/umastat/umastat.c
index 3c9fe0e..639bf80 100644
--- a/tools/tools/umastat/umastat.c
+++ b/tools/tools/umastat/umastat.c
@@ -230,7 +230,7 @@ uma_print_bucketlist(kvm_t *kvm, struct bucketlist *bucketlist,
 	}
 
 	printf("\n");
-	printf("%s}; // total cnt %llu, total entries %llu\n", spaces,
+	printf("%s}; // total cnt %ju, total entries %ju\n", spaces,
 	    total_cnt, total_entries);
 }
 
@@ -242,8 +242,8 @@ uma_print_cache(kvm_t *kvm, struct uma_cache *cache, const char *name,
 	int ret;
 
 	printf("%s%s[%d] = {\n", spaces, name, cpu);
-	printf("%s uc_frees = %llu;\n", spaces, cache->uc_frees);
-	printf("%s uc_allocs = %llu;\n", spaces, cache->uc_allocs);
+	printf("%s uc_frees = %ju;\n", spaces, cache->uc_frees);
+	printf("%s uc_allocs = %ju;\n", spaces, cache->uc_allocs);
 
 	if (cache->uc_freebucket != NULL) {
 		ret = kread(kvm, cache->uc_freebucket, &ub, sizeof(ub), 0);
@@ -412,20 +412,20 @@ main(int argc, char *argv[])
 			}
 			printf(" Zone {\n");
 			printf(" uz_name = \"%s\";\n", name);
-			printf(" uz_allocs = %llu;\n",
+			printf(" uz_allocs = %ju;\n",
 			    uzp_userspace->uz_allocs);
-			printf(" uz_frees = %llu;\n",
+			printf(" uz_frees = %ju;\n",
 			    uzp_userspace->uz_frees);
-			printf(" uz_fails = %llu;\n",
+			printf(" uz_fails = %ju;\n",
 			    uzp_userspace->uz_fails);
 			printf(" uz_fills = %u;\n",
 			    uzp_userspace->uz_fills);
 			printf(" uz_count = %u;\n",
 			    uzp_userspace->uz_count);
-			uma_print_bucketlist(kvm, (struct bucketlist *)
+			uma_print_bucketlist(kvm, (void *)
 			    &uzp_userspace->uz_full_bucket,
 			    "uz_full_bucket", " ");
-			uma_print_bucketlist(kvm, (struct bucketlist *)
+			uma_print_bucketlist(kvm, (void *)
 			    &uzp_userspace->uz_free_bucket,
 			    "uz_free_bucket", " ");
 
-- 
1.7.2.3

--MP_/V45ylbNW9Sv144uke8uKVXp--

--Sig_/j0IIO6G0OvbQXQCJQewu8.K
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iEYEARECAAYFAkyUsgkACgkQBYqIVf93VJ16SACfcwYSHrh0IoqMUFODzDrJ9RQZ
9voAoIqzNCiBLm9dpxXbGh0l8WHJEsg2
=MVkL
-----END PGP SIGNATURE-----

--Sig_/j0IIO6G0OvbQXQCJQewu8.K--

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 13:29:14 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 982C61065670; Sat, 18 Sep 2010 13:29:14 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AB1CE8FC15; Sat, 18 Sep 2010 13:29:13 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA07099; Sat, 18 Sep 2010 16:29:12 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with
esmtp (Exim 4.34 (FreeBSD)) id 1OwxTb-000D5K-LP; Sat, 18 Sep 2010 16:29:11 +0300 Message-ID: <4C94BEA7.6040504@freebsd.org> Date: Sat, 18 Sep 2010 16:29:11 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: "Robert N. M. Watson" References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C94A22F.1070608@freebsd.org> <52AE93F3-D15F-40C9-A9CA-07F30C803B81@freebsd.org> In-Reply-To: <52AE93F3-D15F-40C9-A9CA-07F30C803B81@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 13:29:14 -0000

on 18/09/2010 14:30 Robert N. M. Watson said the following:
> Those issues are closely related, and in particular, I wanted to point Andre at
> umastat since he's probably not aware of it.. :-)

I didn't know about the tool either, so thanks!
But I perceived the issues as quite opposite: small items vs huge items.
-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 13:52:53 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C36281065673 for ; Sat, 18 Sep 2010 13:52:53 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 9B6398FC15 for ; Sat, 18 Sep 2010 13:52:53 +0000 (UTC) Received: from [127.0.0.1] (rhee.cl.cam.ac.uk [128.232.1.202]) by cyrus.watson.org (Postfix) with ESMTPSA id 9B05146B09; Sat, 18 Sep 2010 09:52:52 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: "Robert N. M. Watson" In-Reply-To: <20100918143516.3568f40e@r500.local> Date: Sat, 18 Sep 2010 14:52:51 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <20100918143516.3568f40e@r500.local> To: Fabian Keil X-Mailer: Apple Mail (2.1081) Cc: freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 13:52:53 -0000

On 18 Sep 2010, at 13:35, Fabian Keil wrote:

> Doesn't build for me on amd64:
>
> fk@r500 /usr/src/tools/tools/umastat $make
> Warning: Object directory not changed from original /usr/src/tools/tools/umastat
> cc -O2 -pipe -fno-omit-frame-pointer -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign -c umastat.c
> cc1: warnings being treated as errors
> umastat.c: In function 'uma_print_bucketlist':
> umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
> umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
> umastat.c: In function 'uma_print_cache':
> umastat.c:245: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
> umastat.c:246: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
> umastat.c: In function 'main':
> umastat.c:416: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
> umastat.c:418: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
> umastat.c:420: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
> umastat.c:426: warning: dereferencing type-punned pointer will break strict-aliasing rules
> umastat.c:429: warning: dereferencing type-punned pointer will break strict-aliasing rules
> *** Error code 1
>
> Stop in /usr/src/tools/tools/umastat.
>
> The attached patch seems to work around the problem, I'm not sure if
> the casts to void* are better than decreasing the WARN level, though ...

This is a 32-bit/64-bit issue. Probably all pointer printing should be converted to %p, and large integer types to %ju and %jd, perhaps with a cast first to intmax_t or uintmax_t if required.
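
A minimal sketch of that suggestion, assuming a C99 libc. The helper name format_counter is made up for illustration and does not appear in umastat.c. It casts the 64-bit counter to uintmax_t so that %ju matches the argument type on both 32-bit and 64-bit platforms:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from umastat.c): format "name = value;"
 * for a 64-bit counter.  The cast to uintmax_t makes the argument
 * match the %ju conversion regardless of how uint64_t is defined.
 */
int
format_counter(char *buf, size_t len, const char *name, uint64_t value)
{
	return (snprintf(buf, len, "%s = %ju;", name, (uintmax_t)value));
}
```

Pointers would be handled the same way in spirit: cast to void * and print with %p, whose output format is implementation-defined.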
Robert= From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 15:30:52 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B44B106566C for ; Sat, 18 Sep 2010 15:30:52 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id EDACD8FC17 for ; Sat, 18 Sep 2010 15:30:51 +0000 (UTC) Received: by iwn34 with SMTP id 34so3412455iwn.13 for ; Sat, 18 Sep 2010 08:30:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=qEzrj+z/ZxgA+AlwhN+SwIBRyayXTx5qDaFIep448oY=; b=s+Oo90rnNJh+oFuhuAhfjBWjfo0nmpLI44fZa1agXtlVS4AFCKSHdTN9yqQKP3+V+l bxsdgVOO7ZMzo65O7HlaaKM5E51sfiusKcmXzLudqnQl8cKSDpdkHN9cvB/ZqoMWAJWF WyxjWKgPn++VDEBeqfpzSB84LOgD7CcyUS9uI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=GtPl3u1AZTDtIMCNqKL1gU/UMVs53mGQOBv7ihx7wzRbEGiCxt7rwjR/Hd759/00pb 0nHzyuDIzHQEPJZJ5J4xi+DCEP+E7pe3jIn3p3K3AUWPBBAI79aSysx3q16wyo7bzH4E ysYJtTQG7lE8fExMYhjYfVC1aX1qfAf5QbBSA= MIME-Version: 1.0 Received: by 10.231.193.135 with SMTP id du7mr6087557ibb.176.1284823851253; Sat, 18 Sep 2010 08:30:51 -0700 (PDT) Sender: yanegomi@gmail.com Received: by 10.231.11.133 with HTTP; Sat, 18 Sep 2010 08:30:51 -0700 (PDT) In-Reply-To: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <20100918143516.3568f40e@r500.local> Date: Sat, 18 Sep 2010 08:30:51 -0700 X-Google-Sender-Auth: qg-NKjrSM8pHGrs2OZblCgKXerA Message-ID: From: Garrett Cooper To: "Robert N. M. 
Watson" Content-Type: multipart/mixed; boundary=00504501751940c0ea04908a5de9 Cc: freebsd-hackers@freebsd.org, Fabian Keil Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 15:30:52 -0000

--00504501751940c0ea04908a5de9
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Sat, Sep 18, 2010 at 6:52 AM, Robert N. M. Watson wrote:
>
> On 18 Sep 2010, at 13:35, Fabian Keil wrote:
>
>> Doesn't build for me on amd64:
>>
>> fk@r500 /usr/src/tools/tools/umastat $make
>> Warning: Object directory not changed from original /usr/src/tools/tools/umastat
>> cc -O2 -pipe -fno-omit-frame-pointer -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign -c umastat.c
>> cc1: warnings being treated as errors
>> umastat.c: In function 'uma_print_bucketlist':
>> umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64_t'
>> umastat.c:234: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64_t'
>> umastat.c: In function 'uma_print_cache':
>> umastat.c:245: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
>> umastat.c:246: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u_int64_t'
>> umastat.c: In function 'main':
>> umastat.c:416: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
>> umastat.c:418: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
>> umastat.c:420: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'u_int64_t'
>> umastat.c:426: warning: dereferencing type-punned pointer will break strict-aliasing rules
>> umastat.c:429: warning: dereferencing type-punned pointer will break strict-aliasing rules
>> *** Error code 1
>>
>> Stop in /usr/src/tools/tools/umastat.
>>
>> The attached patch seems to work around the problem, I'm not sure if
>> the casts to void* are better than decreasing the WARN level, though ...
>
> This is a 32-bit/64-bit issue. Probably all pointers printing should be converted to %p, and large integer types to %ju and %jd, perhaps with a cast first to intmax_t or uintmax_t if required.

All types were explicitly declared as u_int64_t, so I'd try this
instead with PRIu64. Very few spots in the code today use void * (and
the ones that do interface with kvm_read(3)).

FWIW, kvm_read taking the second argument as unsigned long instead of
void* seems a bit inconsistent:

ssize_t kvm_read(kvm_t *kd, unsigned long addr, void *buf, size_t nbytes);
ssize_t kvm_write(kvm_t *kd, unsigned long addr, const void *buf, size_t nbytes);

but that's a different topic to look at later, if it really matters to anyone.
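
The PRIu64 approach can be sketched as follows. This is an illustrative fragment modelled on the "total cnt" line of the build log, with format_totals as a hypothetical helper rather than a function from umastat.c. PRIu64 comes from <inttypes.h> and expands to the conversion specifier that matches uint64_t on the target platform, so no cast is needed:

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from umastat.c): format two 64-bit totals
 * using the PRIu64 macro, which is concatenated into the format string
 * at compile time.
 */
int
format_totals(char *buf, size_t len, uint64_t total_cnt,
    uint64_t total_entries)
{
	return (snprintf(buf, len,
	    "total cnt %" PRIu64 ", total entries %" PRIu64,
	    total_cnt, total_entries));
}
```

Compared with the %ju variant, this keeps the argument types untouched at the cost of a slightly less readable format string.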
Thanks, -Garrett --00504501751940c0ea04908a5de9 Content-Type: application/octet-stream; name="umastat-64bit.diff" Content-Disposition: attachment; filename="umastat-64bit.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_ge8mva1o0 SW5kZXg6IHVtYXN0YXQuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09Ci0tLSB1bWFzdGF0LmMJKHJldmlzaW9uIDIxMjIy MykKKysrIHVtYXN0YXQuYwkod29ya2luZyBjb3B5KQpAQCAtMzYsNiArMzYsNyBAQAogI2luY2x1 ZGUgPHZtL3VtYV9pbnQuaD4KIAogI2luY2x1ZGUgPGVyci5oPgorI2luY2x1ZGUgPGludHR5cGVz Lmg+CiAjaW5jbHVkZSA8a3ZtLmg+CiAjaW5jbHVkZSA8bWVtc3RhdC5oPgogI2luY2x1ZGUgPHN0 ZGlvLmg+CkBAIC0yMzAsOCArMjMxLDggQEAKIAl9CiAKIAlwcmludGYoIlxuIik7Ci0JcHJpbnRm KCIlc307ICAvLyB0b3RhbCBjbnQgJWxsdSwgdG90YWwgZW50cmllcyAlbGx1XG4iLCBzcGFjZXMs Ci0JICAgIHRvdGFsX2NudCwgdG90YWxfZW50cmllcyk7CisJcHJpbnRmKCIlc307ICAvLyB0b3Rh bCBjbnQgJSJQUkl1NjQiLCB0b3RhbCBlbnRyaWVzICUiUFJJdTY0IlxuIiwKKwkgICAgc3BhY2Vz LCB0b3RhbF9jbnQsIHRvdGFsX2VudHJpZXMpOwogfQogCiBzdGF0aWMgdm9pZApAQCAtMjQyLDgg KzI0Myw4IEBACiAJaW50IHJldDsKIAogCXByaW50ZigiJXMlc1slZF0gPSB7XG4iLCBzcGFjZXMs IG5hbWUsIGNwdSk7Ci0JcHJpbnRmKCIlcyAgdWNfZnJlZXMgPSAlbGx1O1xuIiwgc3BhY2VzLCBj YWNoZS0+dWNfZnJlZXMpOwotCXByaW50ZigiJXMgIHVjX2FsbG9jcyA9ICVsbHU7XG4iLCBzcGFj ZXMsIGNhY2hlLT51Y19hbGxvY3MpOworCXByaW50ZigiJXMgIHVjX2ZyZWVzID0gJSJQUkl1NjQi O1xuIiwgc3BhY2VzLCBjYWNoZS0+dWNfZnJlZXMpOworCXByaW50ZigiJXMgIHVjX2FsbG9jcyA9 ICUiUFJJdTY0IjtcbiIsIHNwYWNlcywgY2FjaGUtPnVjX2FsbG9jcyk7CiAKIAlpZiAoY2FjaGUt PnVjX2ZyZWVidWNrZXQgIT0gTlVMTCkgewogCQlyZXQgPSBrcmVhZChrdm0sIGNhY2hlLT51Y19m cmVlYnVja2V0LCAmdWIsIHNpemVvZih1YiksIDApOwpAQCAtNDEyLDExICs0MTMsMTEgQEAKIAkJ CX0KIAkJCXByaW50ZigiICBab25lIHtcbiIpOwogCQkJcHJpbnRmKCIgICAgdXpfbmFtZSA9IFwi JXNcIjtcbiIsIG5hbWUpOwotCQkJcHJpbnRmKCIgICAgdXpfYWxsb2NzID0gJWxsdTtcbiIsCisJ CQlwcmludGYoIiAgICB1el9hbGxvY3MgPSAlIlBSSXU2NCI7XG4iLAogCQkJICAgIHV6cF91c2Vy c3BhY2UtPnV6X2FsbG9jcyk7Ci0JCQlwcmludGYoIiAgICB1el9mcmVlcyA9ICVsbHU7XG4iLAor 
CQkJcHJpbnRmKCIgICAgdXpfZnJlZXMgPSAlIlBSSXU2NCI7XG4iLAogCQkJICAgIHV6cF91c2Vy c3BhY2UtPnV6X2ZyZWVzKTsKLQkJCXByaW50ZigiICAgIHV6X2ZhaWxzID0gJWxsdTtcbiIsCisJ CQlwcmludGYoIiAgICB1el9mYWlscyA9ICUiUFJJdTY0IjtcbiIsCiAJCQkgICAgdXpwX3VzZXJz cGFjZS0+dXpfZmFpbHMpOwogCQkJcHJpbnRmKCIgICAgdXpfZmlsbHMgPSAldTtcbiIsCiAJCQkg ICAgdXpwX3VzZXJzcGFjZS0+dXpfZmlsbHMpOwo= --00504501751940c0ea04908a5de9-- From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 16:11:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 49AC61065672 for ; Sat, 18 Sep 2010 16:11:28 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id B04698FC08 for ; Sat, 18 Sep 2010 16:11:27 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8IGBNxk080340 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 18 Sep 2010 19:11:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o8IGBNAr037222; Sat, 18 Sep 2010 19:11:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8IGBMjR037221; Sat, 18 Sep 2010 19:11:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 18 Sep 2010 19:11:22 +0300 From: Kostik Belousov To: Mateusz Guzik Message-ID: <20100918161122.GU2389@deviant.kiev.zoral.com.ua> References: <4C8A81D9.5020905@rawbw.com> <20100910194600.GB60815@stack.nl> <20100912130801.GA23538@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; 
protocol="application/pgp-signature"; boundary="Wlbg71WMOPzcvmIn" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_20, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Alexander Best , freebsd-hackers@freebsd.org, Jilles Tjoelker , Yuri Subject: Re: Why I can't trace linux process's childs with truss? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 16:11:28 -0000

--Wlbg71WMOPzcvmIn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Sep 12, 2010 at 05:01:09PM +0200, Mateusz Guzik wrote:
> On Sun, Sep 12, 2010 at 3:08 PM, Alexander Best wrote:
> > there's a PR related to this "issue" [1]. so is truss missing this
> > functionality or is this in fact a feature, because truss mustn't be used on
> > any non freebsd executable?
> >
>
> Actually truss handles linux processes just fine, except for their children. :)
> Linux process can create a child using linux_clone syscall, but truss does not
> handle that case and this can be the problem that Yuri reported (since no log was
> provided, I can only guess).
>
> This trivial patch should fix this:
> http://student.agh.edu.pl/~mjguzik/truss-linux-forks.patch

This is too trivial, IMO. linux_clone() does not necessarily cause a new
process to be created, I think.

>
> Tested on this simple program:
> http://student.agh.edu.pl/~mjguzik/fork.c
>
> If it still does not work, log generated by truss would be helpful.
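
To illustrate the objection above: on Linux, clone(2) called with CLONE_THREAD places the new task in the caller's thread group, so the caller gets a thread rather than a separate process. The sketch below only shows that flag test. The EX_ names and ex_clone_is_thread are hypothetical, the flag values are the ones documented for Linux, and how FreeBSD's Linux emulation or truss should act on the result is an assumption left open here:

```c
#include <stdbool.h>

/* Linux clone(2) flag values; EX_ prefix marks them as local to this sketch. */
#define	EX_CLONE_VM	0x00000100UL	/* parent and child share memory */
#define	EX_CLONE_THREAD	0x00010000UL	/* child joins caller's thread group */

/*
 * Hypothetical helper: true when the flags request a new thread in the
 * calling process instead of a separate process.
 */
bool
ex_clone_is_thread(unsigned long flags)
{
	return ((flags & EX_CLONE_THREAD) != 0);
}
```

A tracer that follows children would presumably need a check of this kind before treating every linux_clone() return as a fork.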
--Wlbg71WMOPzcvmIn Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkyU5KoACgkQC3+MBN1Mb4gHLwCgmhPxYKiowkOfNguiKSZ3pY6X cBAAn2eQ4uOkvtH2s58PkJls7s3SbipN =VDPi -----END PGP SIGNATURE----- --Wlbg71WMOPzcvmIn-- From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 17:31:13 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D8A1106566B; Sat, 18 Sep 2010 17:31:13 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 2B7D58FC19; Sat, 18 Sep 2010 17:31:12 +0000 (UTC) Received: by qwg5 with SMTP id 5so2980286qwg.13 for ; Sat, 18 Sep 2010 10:31:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=5DkWPnfo/teCp6QxaDxKazLg/ZQNeSCiESarYWsymPM=; b=C5FX06akGjS2VNzJjyvJdpQpIHq/GtSWW3z//1LafHhyR9nVzZA6btFkPgClY5/Pg3 nPrMIv9B8xWJSpvPPR9WLq4GzrVtB+PIRMf1UK1qcP205ekWpXjSkhWsP2qntaWnWVjd CNQ9aOmzyWaKw0yR3YsmTi+wN7sR4N8b+dRnw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=sUq+BDtpUqjh58s81S5qPnsIaVkpGdID32s38dppqaEbhcsL35kSTnX6tdcCXIdQHX MF/Df4IZvgjs2NEwM1Ysd+ksGpe7XHdDHplapZ2k0v99WVXGNif2TfsudzSHooBQPkb4 MjkDUJOSQXwehnPU7DpPLTHvNgJCQIYcwFh7I= MIME-Version: 1.0 Received: by 10.229.11.18 with SMTP id r18mr4405897qcr.281.1284829283744; Sat, 18 Sep 2010 10:01:23 -0700 (PDT) Received: by 10.229.19.206 with HTTP; Sat, 18 Sep 2010 10:01:23 -0700 (PDT) In-Reply-To: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> 
<20100918143516.3568f40e@r500.local> Date: Sat, 18 Sep 2010 21:01:23 +0400 Message-ID: From: pluknet To: "Robert N. M. Watson" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org, Fabian Keil Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 17:31:13 -0000 On 18 September 2010 17:52, Robert N. M. Watson wrote= : > > On 18 Sep 2010, at 13:35, Fabian Keil wrote: > >> Doesn't build for me on amd64: >> >> fk@r500 /usr/src/tools/tools/umastat $make >> Warning: Object directory not changed from original /usr/src/tools/tools= /umastat >> cc -O2 -pipe =A0-fno-omit-frame-pointer -std=3Dgnu99 -fstack-protector -= Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wst= rict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wn= o-pointer-sign -c umastat.c >> cc1: warnings being treated as errors >> umastat.c: In function 'uma_print_bucketlist': >> umastat.c:234: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 3 has type 'uint64_t' >> umastat.c:234: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 4 has type 'uint64_t' >> umastat.c: In function 'uma_print_cache': >> umastat.c:245: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 3 has type 'u_int64_t' >> umastat.c:246: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 3 has type 'u_int64_t' >> umastat.c: In function 'main': >> umastat.c:416: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 2 has type 'u_int64_t' >> umastat.c:418: warning: format '%llu' expects type 'long long unsigned i= nt', but argument 2 has type 'u_int64_t' >> umastat.c:420: warning: format '%llu' 
expects type 'long long unsigned i= nt', but argument 2 has type 'u_int64_t' >> umastat.c:426: warning: dereferencing type-punned pointer will break str= ict-aliasing rules >> umastat.c:429: warning: dereferencing type-punned pointer will break str= ict-aliasing rules >> *** Error code 1 >> >> Stop in /usr/src/tools/tools/umastat. >> >> The attached patch seems to work around the problem, I'm not sure if >> the casts to void* are better than decreasing the WARN level, though ... > > This is a 32-bit/64-bit issue. Probably all pointers printing should be c= onverted to %p, and large integer types to %ju and %jd, perhaps with a cast= first to intmax_t or uintmax_t if required. > FYI, There is a PR 146119 about sort of fixing that issues. --=20 wbr, pluknet From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 18:26:38 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0798E1065673 for ; Sat, 18 Sep 2010 18:26:38 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id ACB4B8FC0A for ; Sat, 18 Sep 2010 18:26:37 +0000 (UTC) Received: by qwg5 with SMTP id 5so2998504qwg.13 for ; Sat, 18 Sep 2010 11:26:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=O06q6jKsHAn9hWr71J0lIm64cO/vjmlV00nTX8p5rRM=; b=cMUvxEpR7zZ0LIEGv5VpyhTjhEvIn9QvMQtVSw1pUxE4grBqU+JAtTSUjUi5rHRuZB OQrWKKlTIGreV9IPZSq1EriYQ2jXexq4vFMKWyUIfMxau8ThbphTa4JcUPklMI7BaTMw CAwFyAAoQgAiBEMRNKoCTv17LuPz0yOWpXyLE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date 
:x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=mOUDoK9yWEwiBaiyMTK+qMdpA/SObvX6JujEIsQOEu3wXf0IBVUE9DTamSLH43RgFk XPyUV/pKQ9/LlqGZiBWmWEg78+VjVSadScvUshSvhLMAlqlK3ppvhhzEYios8qy9xENe IONrda3j8MUjFjj+QtzWJ3LZUIEOkapHzIf90= MIME-Version: 1.0 Received: by 10.224.28.145 with SMTP id m17mr4477091qac.196.1284834396771; Sat, 18 Sep 2010 11:26:36 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.229.235.143 with HTTP; Sat, 18 Sep 2010 11:26:36 -0700 (PDT) In-Reply-To: <4C94A138.8050905@icyb.net.ua> References: <4C94A138.8050905@icyb.net.ua> Date: Sat, 18 Sep 2010 20:26:36 +0200 X-Google-Sender-Auth: 6GFItXLAkcVaIihHlDlIKx2a62U Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 18:26:38 -0000

2010/9/18 Andriy Gapon :
>
> Here's a small patch that adds support for printing stack trace in form of frame
> addresses when KDB_TRACE is enabled, but there is no debugger backend configured.
> The patch is styled after "cheap" variant of stack_ktr.
>
> What do you think (useful/useless, correct, etc) ?
>
> --- a/sys/kern/subr_kdb.c
> +++ b/sys/kern/subr_kdb.c
> @@ -37,6 +37,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  #include
> @@ -295,10 +296,16 @@
>  void
>  kdb_backtrace(void)
>  {
> +       struct stack st;
> +       int i;
>
> -       if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL) {
> -               printf("KDB: stack backtrace:\n");
> +       printf("KDB: stack backtrace:\n");
> +       if (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL)
>                 kdb_dbbe->dbbe_trace();
> +       else {
> +               stack_save(&st);
> +               for (i = 0; i < st.depth; i++)
> +                       printf("#%d %p\n", i, (void*)(uintptr_t)st.pcs[i]);
>         }
>  }

You have to eventually wrap this logic within the 'STACK' option
(opt_stack.h for the check) because stack_save() will be ineffective
otherwise. STACK should be mandatory for DDB I guess, but it is not
for KDB.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 18:41:25 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63793106566B for ; Sat, 18 Sep 2010 18:41:25 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7A4D88FC15 for ; Sat, 18 Sep 2010 18:41:24 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA10346; Sat, 18 Sep 2010 21:41:23 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Ox2Li-000DRY-MV; Sat, 18 Sep 2010 21:41:22 +0300 Message-ID: <4C9507D1.3010008@icyb.net.ua> Date: Sat, 18 Sep 2010 21:41:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Attilio Rao References: <4C94A138.8050905@icyb.net.ua> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 18:41:25 -0000 on 18/09/2010 21:26 Attilio Rao said the following: > > You have to eventually wrap this logic within the 'STACK' option > (opt_stack.h for the check) because stack_save() will be uneffective > otherwise. STACK should be mandatory for DDB I guess, but it is not > for KDB. Thank you for the tip! BTW, why is this under an option? 
It seems like something like this won't add much to kernel size and won't affect
performance at all.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 18:55:08 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2D1771065679 for ; Sat, 18 Sep 2010 18:55:08 +0000 (UTC) (envelope-from freebsd-hackers@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id D9D1A8FC08 for ; Sat, 18 Sep 2010 18:55:07 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1Ox2Yz-000830-3E for freebsd-hackers@freebsd.org; Sat, 18 Sep 2010 20:55:05 +0200 Received: from k.saper.info ([91.121.151.35]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sat, 18 Sep 2010 20:55:05 +0200 Received: from saper by k.saper.info with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sat, 18 Sep 2010 20:55:05 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-hackers@freebsd.org From: Marcin Cieslak Date: Sat, 18 Sep 2010 18:03:42 +0000 (UTC) Organization: http://saper.info Lines: 9 Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <20100918143516.3568f40e@r500.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: k.saper.info User-Agent: slrn/0.9.9p1 (FreeBSD) Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 18:55:08 -0000

> FWIW, kvm_read taking the second argument as unsigned long instead of
> void* seems a bit inconsistent:

I think it is done on purpose, since an address in
the kernel address space has nothing to do with pointers for mere userland
mortals. We shouldn't bother the compiler with aliasing and other stuff in the
case of kernel addresses.

//Marcin

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 19:00:27 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E1A21065670; Sat, 18 Sep 2010 19:00:27 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8DB778FC1B; Sat, 18 Sep 2010 19:00:26 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id WAA10537; Sat, 18 Sep 2010 22:00:24 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Ox2e8-000DT7-Ly; Sat, 18 Sep 2010 22:00:24 +0300 Message-ID: <4C950C48.6020600@freebsd.org> Date: Sat, 18 Sep 2010 22:00:24 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Attilio Rao References: <4C94A138.8050905@icyb.net.ua> <4C9507D1.3010008@icyb.net.ua> In-Reply-To: <4C9507D1.3010008@icyb.net.ua> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 19:00:27 -0000

on 18/09/2010 21:41 Andriy Gapon said the following:
> on 18/09/2010 21:26 Attilio Rao said the following:
>>
>> You have to eventually wrap this
logic within the 'STACK' option >> (opt_stack.h for the check) because stack_save() will be uneffective >> otherwise. STACK should be mandatory for DDB I guess, but it is not >> for KDB. > > Thank you for the tip! > BTW, why is this under an option? > It seems like something like this won't add much to kernel size and won't affect > performance at all. > Oh, wow, and I totally overlooked stack_print(). Should have read stack(9) from the start. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 20:30:06 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7BDDF1065679; Sat, 18 Sep 2010 20:30:06 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 956078FC18; Sat, 18 Sep 2010 20:30:05 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA11471; Sat, 18 Sep 2010 23:30:03 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Ox42t-000DZ7-8B; Sat, 18 Sep 2010 23:30:03 +0300 Message-ID: <4C95214A.3070600@freebsd.org> Date: Sat, 18 Sep 2010 23:30:02 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Attilio Rao References: <4C94A138.8050905@icyb.net.ua> <4C9507D1.3010008@icyb.net.ua> <4C950C48.6020600@freebsd.org> In-Reply-To: <4C950C48.6020600@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 20:30:06 -0000

on 18/09/2010 22:00 Andriy Gapon said the following:
> Oh, wow, and I totally overlooked stack_print().
> Should have read stack(9) from the start.

New patch.  Hope this is better.
I don't like that the printf is duplicated, but couldn't figure out a way to
combine pre-processor and C conditions.

--- a/sys/kern/subr_kdb.c
+++ b/sys/kern/subr_kdb.c
@@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include

 #include
@@ -300,6 +301,15 @@ kdb_backtrace(void)
                printf("KDB: stack backtrace:\n");
                kdb_dbbe->dbbe_trace();
        }
+#ifdef STACK
+       else {
+               struct stack st;
+
+               printf("KDB: stack backtrace:\n");
+               stack_save(&st);
+               stack_print(&st);
+       }
+#endif
 }

 /*

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 20:35:49 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6CA0E106566C; Sat, 18 Sep 2010 20:35:49 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 04D6A8FC0C; Sat, 18 Sep 2010 20:35:48 +0000 (UTC) Received: by qyk31 with SMTP id 31so1950493qyk.13 for ; Sat, 18 Sep 2010 13:35:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=5RFx50P4F4C6tu0HojTsa89DCFh42GPVILYc75KuQHs=; b=w0/0x/ih92CRV8qeHO/45gBHwYKV2pYhhJJ3LpADcjsvIbDgZaDcEtDcTPR8kPZ1fK /1MTA+oMozoFjA0v5K9bLB4jh8uHWG7iN6S9axbuobAKPmcbTuQ3/brsZsQ4eUDq4KrK HqDZ1npW4n6y3xafOb1VbUyHmVx5JRLu8V4m8= DomainKey-Signature: a=rsa-sha1; c=nofws;
d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=lmnBJERnwcmnZKtNrXuUbt7+0eYlVQ6WqFQnVFJOohi9Ic4OkxNpeFpDlzBhdmXxLx +Yho5469QPb5nxmHjaisxsjxrM+hudJVqZ89q4wAEMcogp/LLfKwnxyLStakl3zk1eAL kTIu+U7U2coTp4YUuG9HijZrcSshWPEWEDHUk= MIME-Version: 1.0 Received: by 10.224.54.13 with SMTP id o13mr4609286qag.9.1284842148123; Sat, 18 Sep 2010 13:35:48 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.229.235.143 with HTTP; Sat, 18 Sep 2010 13:35:48 -0700 (PDT) In-Reply-To: <4C95214A.3070600@freebsd.org> References: <4C94A138.8050905@icyb.net.ua> <4C9507D1.3010008@icyb.net.ua> <4C950C48.6020600@freebsd.org> <4C95214A.3070600@freebsd.org> Date: Sat, 18 Sep 2010 22:35:48 +0200 X-Google-Sender-Auth: IZi5OPjEjC5t-Tks2jaEQwJa2gI Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 20:35:49 -0000 2010/9/18 Andriy Gapon : > on 18/09/2010 22:00 Andriy Gapon said the following: >> Oh, wow, and I totally overlooked stack_print(). >> Should have read stack(9) from the start. > > New patch. =C2=A0Hope this is better. > I don't like that the printf is duplicated, but couldn't figure out a way= to > combine pre-processor and C conditions. 
>
> --- a/sys/kern/subr_kdb.c
> +++ b/sys/kern/subr_kdb.c
> @@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  #include
> @@ -300,6 +301,15 @@ kdb_backtrace(void)
>                 printf("KDB: stack backtrace:\n");
>                 kdb_dbbe->dbbe_trace();
>         }
> +#ifdef STACK
> +       else {
> +               struct stack st;
> +
> +               printf("KDB: stack backtrace:\n");
> +               stack_save(&st);
> +               stack_print(&st);
> +       }
> +#endif
>  }
>
>  /*
>

It is still missing the check on opt_stack.h.

Besides, I'd reconsider having the KDB_TRACE explanation in the ddb(4) manpage
(right now it is rightly there because it is DDB specific only, as
long as it offers the backend, but with your change it is a global
functionality. Not sure if it is worth changing, but you may
have other opinions).

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 20:49:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A4C5A106564A; Sat, 18 Sep 2010 20:49:46 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id BACC48FC15; Sat, 18 Sep 2010 20:49:45 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA11682; Sat, 18 Sep 2010 23:49:44 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Ox4Lv-000DaR-VM; Sat, 18 Sep 2010 23:49:44 +0300 Message-ID: <4C9525E7.3030804@freebsd.org> Date: Sat, 18 Sep 2010 23:49:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100912 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Attilio Rao References: <4C94A138.8050905@icyb.net.ua> <4C9507D1.3010008@icyb.net.ua> <4C950C48.6020600@freebsd.org> <4C95214A.3070600@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 20:49:46 -0000 on 18/09/2010 23:35 Attilio Rao said the following: > It is still missing checking on opt_stack.h Yes, thanks, fixed it in my tree. 
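
For reference, a minimal userland sketch of one way to keep a single copy of the "KDB: stack backtrace:" printf while still guarding the fallback with STACK. It is an illustration under stated assumptions, not the committed change: the STACK define stands in for what opt_stack.h would provide in a kernel build, struct stack, stack_save() and stack_print() are simplified stubs for the stack(9) routines, and kdb_backtrace_sketch(), HAVE_STACK_FALLBACK, fake_trace() and trace_calls are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: in a kernel build this would come from #include "opt_stack.h". */
#define STACK 1

/* Stub of the debugger backend descriptor; only the trace hook is modeled. */
struct kdb_dbbe {
	void (*dbbe_trace)(void);
};
static struct kdb_dbbe *kdb_dbbe;	/* NULL means no backend is selected */

#ifdef STACK
#define HAVE_STACK_FALLBACK 1
/* Simplified stand-ins for the stack(9) type and routines. */
struct stack {
	int depth;
};

static void
stack_save(struct stack *st)
{
	assert(st != NULL);
	st->depth = 0;		/* a real implementation walks the frames */
}

static void
stack_print(const struct stack *st)
{
	printf("(%d frames captured)\n", st->depth);
}
#else
#define HAVE_STACK_FALLBACK 0
#endif

/* Hypothetical backend hook, used only to exercise the backend path. */
static int trace_calls;

static void
fake_trace(void)
{
	trace_calls++;
}

/* Returns 1 if the backend traced, 2 if the fallback ran, 0 if neither. */
static int
kdb_backtrace_sketch(void)
{
	int have_backend;

	have_backend = (kdb_dbbe != NULL && kdb_dbbe->dbbe_trace != NULL);
	if (!have_backend && !HAVE_STACK_FALLBACK)
		return (0);	/* nothing can print a trace: stay silent */

	printf("KDB: stack backtrace:\n");	/* single copy of the header */
	if (have_backend) {
		kdb_dbbe->dbbe_trace();
		return (1);
	}
#ifdef STACK
	{
		struct stack st;

		stack_save(&st);
		stack_print(&st);
	}
	return (2);
#else
	return (0);	/* not reached: handled by the early return above */
#endif
}
```

The idea is to fold the pre-processor condition into a constant (HAVE_STACK_FALLBACK) so that the decision whether to print the header is ordinary C, and only the fallback body itself stays under #ifdef.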
> Besides, I'd reconsider having KDB_TRACE explanation in ddb(4) manpage > (right now it is rightly there because it is DDB specific only, as > long as it offers the backend, but with your change it is a global > functionality. Not sure if it worths changing it but however you may > have more opinions). It seems that we don't have kdb(4) ? -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 20:51:11 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83177106566C; Sat, 18 Sep 2010 20:51:11 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id 1B57A8FC2C; Sat, 18 Sep 2010 20:51:10 +0000 (UTC) Received: by qyk31 with SMTP id 31so1956175qyk.13 for ; Sat, 18 Sep 2010 13:51:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=ExSZDlKSbl6vAcqzIjhguElOQXDg1wxncMi1z93+Zd4=; b=thvwQZec1fNAui/3tO5ZLmwHy6vALiraLea6idT26CvQJJaG5wCBR0WHkLFT+7DW+H G2ebHn96oZ+cJALId/Cdw1m94kNJ0tLjwknmaQJ411LMFk9R8fknZiz/x6ldqBF5DJ0C CDRmeu2mgp5ufn9Z1KW/N/Yi0elsQEQccntNI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=lI+v0WQPgxxCii/u2KySEaMQONlrFoWfSNs+o0A7+gH3tFN3YaNZ9uSVnDgzGuRQ5e hWuaRONwF3cjqA0pSuQcjAm4hZvw+F3Zp+JeBy9DQHgq8M7mbkmi2lxOvkqx0fmBu033 L8offdM3t3FM5r/Cl2PHJzT84crGfuuo39Qog= MIME-Version: 1.0 Received: by 10.229.65.159 with SMTP id j31mr4307475qci.212.1284843070303; Sat, 18 Sep 2010 13:51:10 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.229.235.143 with HTTP; Sat, 18 Sep 2010 13:51:10 -0700 
(PDT) In-Reply-To: <4C9525E7.3030804@freebsd.org> References: <4C94A138.8050905@icyb.net.ua> <4C9507D1.3010008@icyb.net.ua> <4C950C48.6020600@freebsd.org> <4C95214A.3070600@freebsd.org> <4C9525E7.3030804@freebsd.org> Date: Sat, 18 Sep 2010 22:51:10 +0200 X-Google-Sender-Auth: CBHq4fhJDb-0P_PwGC6Pi7xl26k Message-ID: From: Attilio Rao To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-hackers@freebsd.org Subject: Re: KDB_TRACE and no backend X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 20:51:11 -0000

2010/9/18 Andriy Gapon :
> on 18/09/2010 23:35 Attilio Rao said the following:
>> It is still missing checking on opt_stack.h
>
> Yes, thanks, fixed it in my tree.
>
>> Besides, I'd reconsider having KDB_TRACE explanation in ddb(4) manpage
>> (right now it is rightly there because it is DDB specific only, as
>> long as it offers the backend, but with your change it is a global
>> functionality. Not sure if it worths changing it but however you may
>> have more opinions).
>
> It seems that we don't have kdb(4) ?
>

We don't, and we really should. I'd really like a kernel section
describing the whole kdb infrastructure and kdbe hooks.
That could actually be flagged as a janitor task, if someone wants to
take over and document the whole layer.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 22:13:24 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75DCE1065672; Sat, 18 Sep 2010 22:13:24 +0000 (UTC) (envelope-from perryh@pluto.rain.com) Received: from agora.rdrop.com (agora.rdrop.com [IPv6:2607:f678:1010::34]) by mx1.freebsd.org (Postfix) with ESMTP id 541688FC1B; Sat, 18 Sep 2010 22:13:24 +0000 (UTC) Received: from agora.rdrop.com (66@localhost [127.0.0.1]) by agora.rdrop.com (8.13.1/8.12.7) with ESMTP id o8IMDNtr026215 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sat, 18 Sep 2010 15:13:23 -0700 (PDT) (envelope-from perryh@pluto.rain.com) Received: (from uucp@localhost) by agora.rdrop.com (8.13.1/8.12.9/Submit) with UUCP id o8IMDNbH026214; Sat, 18 Sep 2010 15:13:23 -0700 (PDT) Received: from fbsd61 by pluto.rain.com (4.1/SMI-4.1-pluto-M2060407) id AA15753; Sat, 18 Sep 10 15:09:25 PDT Date: Sat, 18 Sep 2010 15:09:19 -0700 From: perryh@pluto.rain.com To: kientzle@freebsd.org Message-Id: <4c95388f.vSPICvvA6A5bgvDR%perryh@pluto.rain.com> References: <20100829201050.GA60715@stack.nl> In-Reply-To: User-Agent: nail 11.25 7/29/05 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: ar(1) format_decimal failure is fatal? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 22:13:24 -0000 Tim Kientzle wrote: > Personally, I wonder if it wouldn't make sense to just always > force the timestamp, uid, and gid to zero .. uid and gid, OK. Timestamp, no. 
It is not that rare to need to find out which version of some .o is in a particular .a file, usually in connection with debugging some obscure failure. For that matter, aren't there some versions of make(1) that can check whether an archive member is up to date by examining the timestamp in the archive? From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 18 22:42:12 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9C8C106566B; Sat, 18 Sep 2010 22:42:11 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id AF8728FC12; Sat, 18 Sep 2010 22:42:11 +0000 (UTC) Received: by pzk7 with SMTP id 7so1109817pzk.13 for ; Sat, 18 Sep 2010 15:42:11 -0700 (PDT) Received: by 10.142.232.19 with SMTP id e19mr4134137wfh.254.1284848144219; Sat, 18 Sep 2010 15:15:44 -0700 (PDT) Received: from [10.0.1.198] (udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id o17sm9676882wal.21.2010.09.18.15.15.40 (version=SSLv3 cipher=RC4-MD5); Sat, 18 Sep 2010 15:15:42 -0700 (PDT) Date: Sat, 18 Sep 2010 12:16:49 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Robert Watson In-Reply-To: Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Sat, 18 Sep 2010 22:47:56 +0000 Cc: freebsd-hackers@freebsd.org, Jeff Roberson , Andre Oppermann , Andriy Gapon Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 18 Sep 2010 22:42:12 -0000 On Sat, 18 Sep 
2010, Robert Watson wrote:

>
> On Fri, 17 Sep 2010, Andre Oppermann wrote:
>
>>> Although keeping free items around improves performance, it does consume
>>> memory too.  And the fact that that memory is not freed on lowmem
>>> condition makes the situation worse.
>>
>> Interesting.  We may run into related issues with excessive mbuf (cluster)
>> caching in the per-cpu buckets as well.
>>
>> Having a general solutions for that is appreciated.  Maybe the size of the
>> free per-cpu buckets should be specified when setting up the UMA zone.  Of
>> certain frequently re-used elements we may want to cache more, other less.
>
> I've been keeping a vague eye out for this over the last few years, and
> haven't spotted many problems in production machines I've inspected.  You can
> use the umastat tool in the tools tree to look at the distribution of memory
> over buckets (etc) in UMA manually.  It would be nice if it had some
> automated statistics on fragmentation however.  Short-lived fragmentation is
> likely, and isn't an issue, so what you want is a tool that monitors over
> time and reports on longer-lived fragmentation.

Not specifically in reaction to Robert's comment but I would like to add
my thoughts to this notion of resource balancing in buckets.  I really
prefer not to do any specific per-zone tuning except in extreme cases.
This is because quite often the decisions we make don't apply to some
class of machines or workloads.  I would instead prefer to keep the
algorithm adaptable.

I like the idea of weighting the bucket decisions by the size of the
item.  Obviously this has some flaws with compound objects but in the
general case it is good.  We should consider increasing the cost of
bucket expansion based on the size of the item.  Right now buckets are
expanded fairly readily.

We could also consider decreasing the default bucket size for a zone
based on vm pressure and use.
Right now there is no downward pressure on bucket size, only upward
based on trips to the slab layer.

Additionally we could make a last-ditch flush mechanism that runs on each
cpu in turn and flushes some or all of the buckets in per-cpu caches.
Presently that is not done due to synchronization issues.  It can't be
done from a central place.  It could be done with a callout mechanism or
a for loop that binds to each core in succession.

I believe the combination of these approaches would go a long way toward
solving the problem and should be relatively little new code.  It should
also preserve the adaptable nature of the system without penalizing
resource-heavy systems.  I would be happy to review patches from anyone
who wishes to undertake it.

>
> The main fragmentation issue we've had in the past has been due to
> mbuf+cluster caching, which prevented mbufs from being freed usefully in some
> cases.  Jeff's ongoing work on variable-sized mbufs would entirely eliminate
> that problem...

I'm going to get back to this as soon as infiniband gets to a useful
state for doing high performance network testing.  This is only because
I have no 10gigE but do have ib and have funding to cover working on it.
I hope to have some results and activity on this front by the end of the
year.  I know it has been a long time coming.

Thanks,
Jeff

>
> Robert
>
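
The three ideas in the message above (bucket size weighted by item size, downward pressure under memory shortage, and a flush that visits each CPU in turn) can be sketched roughly as a userland simulation. None of this is UMA code: struct zone_sim, bucket_cap(), zone_grow(), zone_shrink(), zone_flush_all() and every constant are hypothetical, and the per-CPU byte budget is only one possible way to make expansion cost more for large items. A real kernel version would also have to bind to each CPU (or use a callout) before touching that CPU's cache; here the loop just indexes an array.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU		4	/* hypothetical CPU count for the simulation */
#define BUCKET_MIN	1	/* hypothetical floor, items per bucket */
#define BUCKET_MAX	128	/* hypothetical ceiling, items per bucket */
#define CACHE_BUDGET	4096	/* hypothetical per-CPU byte budget */

struct zone_sim {
	size_t	item_size;	/* bytes per item */
	int	bucket_limit;	/* items currently allowed per bucket */
	int	cached[NCPU];	/* free items held in each per-CPU cache */
};

/* Larger items get a smaller ceiling: limit * item_size stays in budget. */
static int
bucket_cap(size_t item_size)
{
	size_t cap;

	assert(item_size > 0);
	cap = CACHE_BUDGET / item_size;
	if (cap < BUCKET_MIN)
		cap = BUCKET_MIN;
	if (cap > BUCKET_MAX)
		cap = BUCKET_MAX;
	return ((int)cap);
}

/* Upward pressure: called when an allocation had to go to the slab layer. */
static void
zone_grow(struct zone_sim *z)
{
	if (z->bucket_limit < bucket_cap(z->item_size))
		z->bucket_limit++;
}

/* Downward pressure: called when the VM system reports a shortage. */
static void
zone_shrink(struct zone_sim *z)
{
	z->bucket_limit /= 2;
	if (z->bucket_limit < BUCKET_MIN)
		z->bucket_limit = BUCKET_MIN;
}

/* Last-ditch flush: visit every CPU in turn; returns bytes released. */
static size_t
zone_flush_all(struct zone_sim *z)
{
	size_t freed = 0;
	int cpu;

	for (cpu = 0; cpu < NCPU; cpu++) {
		/* Kernel code would bind to 'cpu' here before draining. */
		freed += (size_t)z->cached[cpu] * z->item_size;
		z->cached[cpu] = 0;
	}
	return (freed);
}
```

With these made-up numbers a zone of 2048-byte items can never grow past two items per bucket, while a zone of 16-byte items can reach the full 128, so the policy stays adaptive without any per-zone tuning knobs.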