From owner-freebsd-afs@FreeBSD.ORG Tue May 25 16:26:47 2010
Message-ID: <4BFBFA49.8040900@janh.de>
Date: Tue, 25 May 2010 18:26:49 +0200
From: Jan Henrik Sylvester
To: Benjamin Kaduk
Cc: afs-list freebsd
Subject: Re: AFS on FreeBSD 8?
List-Id: The Andrew File System and FreeBSD

On 05/20/10 19:24, Benjamin Kaduk wrote:
> On Thu, 20 May 2010, Jan Henrik Sylvester wrote:
>
>> I would have tried to narrow down the circumstances that cause
>> deadlocks, if both of the machines in front of me were not crashing so
>> often...
>
> I fixed a bug that was causing very quick deadlocks on my system, in
> revision 42a280f50daf6e4dc65873150c4738aacf2c3a86 ( Wed, 19 May 2010
> 10:39:35 +0000 (03:39 -0700)). Now the most common failure mode I am
> seeing is kernel panics that seem to be due to some form of memory
> corruption.

I repeated my tests again with a build from half an hour ago, once on
SMP, once with kern.smp.disabled=1. Nothing changed: Copying a 2162
bytes file works fine, copying a 256004096 bytes file locks afs.
cmdebug does not return anything.

Trying to reboot, I get "init: some processes would not die; ps axl
advised". After "All buffers synced", afs got a message for me:

afs: WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB...
RxEvent... UnmaskRxkSignals... RxListener... osi_StopListener:
rxk.ListenerPid ffffff0071a9c000 {2nd try: ffffff005cf3460}
WARNING: not all blocks freed: large 0 small 1
All allocated tables... done

Nothing new, unfortunately.

I could try different file sizes... maybe around the cache size --
would that be interesting? (I have "/afs:/var/openafs/cache:100000" in
/usr/local/etc/openafs/cacheinfo. That is in KB, isn't it?)

Cheers,
Jan Henrik

From owner-freebsd-afs@FreeBSD.ORG Tue May 25 16:29:32 2010
Date: Tue, 25 May 2010 12:29:28 -0400 (EDT)
From: "Matt W. Benjamin"
To: Jan Henrik Sylvester
Message-ID: <2114226057.2439.1274804967983.JavaMail.root@thunderbeast.private.linuxbox.com>
In-Reply-To: <4BFBFA49.8040900@janh.de>
Cc: afs-list freebsd, Benjamin Kaduk
Subject: Re: AFS on FreeBSD 8?

Hi,

If you have ddb and whatnot, it would be helpful to enter the debugger
during the hang, run "show alllocks", and paste backtraces of
interesting threads.

Matt

----- "Jan Henrik Sylvester" wrote:

> On 05/20/10 19:24, Benjamin Kaduk wrote:
> > On Thu, 20 May 2010, Jan Henrik Sylvester wrote:
> >
> >> I would have tried to narrow down the circumstances that cause
> >> deadlocks, if both of the machines in front of me were not crashing so
> >> often...
> >
> > I fixed a bug that was causing very quick deadlocks on my system, in
> > revision 42a280f50daf6e4dc65873150c4738aacf2c3a86 ( Wed, 19 May 2010
> > 10:39:35 +0000 (03:39 -0700)). Now the most common failure mode I am
> > seeing is kernel panics that seem to be due to some form of memory
> > corruption.
>
> I repeated my tests again with a build from half an hour ago, once on
> SMP, once with kern.smp.disabled=1. Nothing changed: Copying a 2162
> bytes file works fine, copying a 256004096 bytes file locks afs.
> cmdebug does not return anything.
>
> Trying to reboot, I get "init: some processes would not die; ps axl
> advised".
> After "All buffers synced", afs got a message for me:
>
> afs: WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB...
> RxEvent... UnmaskRxkSignals... RxListener... osi_StopListener:
> rxk.ListenerPid ffffff0071a9c000 {2nd try: ffffff005cf3460}
> WARNING: not all blocks freed: large 0 small 1
> All allocated tables... done
>
> Nothing new, unfortunately.
>
> I could try different file sizes... maybe around the cache size --
> would that be interesting? (I have "/afs:/var/openafs/cache:100000" in
> /usr/local/etc/openafs/cacheinfo. That is in KB, isn't it?)
>
> Cheers,
> Jan Henrik
> _______________________________________________
> freebsd-afs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-afs
> To unsubscribe, send any mail to "freebsd-afs-unsubscribe@freebsd.org"

-- 
Matt Benjamin

The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

From owner-freebsd-afs@FreeBSD.ORG Wed May 26 04:02:03 2010
Date: Wed, 26 May 2010 00:01:58 -0400 (EDT)
From: Benjamin Kaduk
To: Jan Henrik Sylvester
In-Reply-To: <4BFBFA49.8040900@janh.de>
Cc: afs-list freebsd
Subject: Re: AFS on FreeBSD 8?

On Tue, 25 May 2010, Jan Henrik Sylvester wrote:

>
> I could try different file sizes... maybe around the cache size -- would that
> be interesting? (I have "/afs:/var/openafs/cache:100000" in
> /usr/local/etc/openafs/cacheinfo. That is in KB, isn't it?)

It is in KB, yes.

I have some latent curiosity as to whether the cache size is a
threshold for the hang, but don't have any ideas as to why that might
be the case. No reason to do it unless you're testing other things
(again) anyway, I think.

-Ben Kaduk

From owner-freebsd-afs@FreeBSD.ORG Wed May 26 05:08:15 2010
Date: Tue, 25 May 2010 23:37:32 -0500
From: Derrick Brashear
To: Benjamin Kaduk
Cc: Jan Henrik Sylvester, afs-list freebsd
Subject: Re: AFS on FreeBSD 8?

If the cache size is doing it, then it's going to be the needmorespace
flush.

Derrick

On May 25, 2010, at 11:01 PM, Benjamin Kaduk wrote:

> On Tue, 25 May 2010, Jan Henrik Sylvester wrote:
>
>>
>> I could try different file sizes... maybe around the cache size --
>> would that be interesting? (I have "/afs:/var/openafs/cache:100000"
>> in /usr/local/etc/openafs/cacheinfo. That is in KB, isn't it?)
>
> It is in KB, yes.
>
> I have some latent curiosity as to whether the cache size is a
> threshold for the hang, but don't have any ideas as to why that
> might be the case. No reason to do it unless you're testing other
> things (again) anyway, I think.
>
> -Ben Kaduk

From owner-freebsd-afs@FreeBSD.ORG Wed May 26 10:14:22 2010
Message-ID: <4BFCF47F.6060106@janh.de>
Date: Wed, 26 May 2010 12:14:23 +0200
From: Jan Henrik Sylvester
To: Benjamin Kaduk
Content-Type:
 text/plain; charset=ISO-8859-1; format=flowed
Cc: afs-list freebsd, Derrick Brashear
Subject: Re: AFS on FreeBSD 8?

On 05/26/10 06:01, Benjamin Kaduk wrote:
> On Tue, 25 May 2010, Jan Henrik Sylvester wrote:
>
>>
>> I could try different file sizes... maybe around the cache size --
>> would that be interesting? (I have "/afs:/var/openafs/cache:100000" in
>> /usr/local/etc/openafs/cacheinfo. That is in KB, isn't it?)
>
> It is in KB, yes.
>
> I have some latent curiosity as to whether the cache size is a
> threshold for the hang, but don't have any ideas as to why that might be
> the case. No reason to do it unless you're testing other things (again)
> anyway, I think.

About Matt's advice: Unfortunately, I have never used ddb and I do not
even know what "whatnot" is, sorry. Maybe I will learn debugging some
day, but not now. I could try to get another account for that afs --
would you want to try to reproduce it against that afs server?

I tried to copy files starting with 1KB and doubling the size. 1KB up
to 64KB copied fine. For 128KB, I got "Permission denied". Trying "ls"
on the afs directory, I got all the file names and for each of them a
"Permission denied". I did "kdestroy", "kinit", and "aklog", which
reenabled afs access. Of the 128KB, only 64KB got copied (I checked
that it was indeed the first 64KB of the file). 256KB copied fine
(after the kdestroy-kinit-aklog-cycle). 512KB did not give an error,
but only 448KB were copied. 1MB gave me "Permission denied" again and
nothing was copied. 1MB copied fine after another
kdestroy-kinit-aklog-cycle. 2MB did not give an error, but only 1664KB
were copied. 4MB gave me "Permission denied" and after "kdestroy", the
machine froze completely.

It does not seem to be the cache and there is not only locking going
on but also corruption. (All this on SMP. I could try again with SMP
disabled.)

Cheers,
Jan Henrik
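
The doubling-size copy test described in the last message could be
scripted roughly as follows. This is only a sketch under stated
assumptions, not the procedure actually used in the thread: the
function names, the default 1KB to 4MB range, and the use of random
file contents are all hypothetical choices. It copies each test file
into a destination directory and reports how many leading bytes of the
copy match the source, which is the check reported above for the
truncated 128KB copy.

```python
import os
import shutil
import tempfile


def matching_prefix_len(path_a, path_b, chunk=64 * 1024):
    """Return the number of leading bytes that are identical in both files."""
    matched = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a or not b:
                # One of the files ended: everything so far matched.
                return matched
            if a == b:
                matched += len(a)
                continue
            # Chunks differ (or one is shorter): count the common prefix.
            for x, y in zip(a, b):
                if x != y:
                    break
                matched += 1
            return matched


def doubling_copy_test(dest_dir, start=1024, stop=4 * 1024 * 1024):
    """Copy files of size start, 2*start, ..., stop into dest_dir.

    Returns a list of (size, status, bytes_intact) tuples. The sizes and
    the file naming scheme are assumptions made for this sketch.
    """
    results = []
    with tempfile.TemporaryDirectory() as scratch:
        size = start
        while size <= stop:
            src = os.path.join(scratch, "test_%d.bin" % size)
            dst = os.path.join(dest_dir, os.path.basename(src))
            with open(src, "wb") as f:
                f.write(os.urandom(size))
            try:
                shutil.copyfile(src, dst)
                status = "ok"
            except OSError as exc:  # e.g. "Permission denied"
                status = "error: %s" % exc.strerror
            try:
                intact = matching_prefix_len(src, dst)
            except OSError:
                intact = 0  # destination missing or unreadable
            results.append((size, status, intact))
            size *= 2
    return results
```

Calling doubling_copy_test() with a directory inside /afs (the exact
path depends on the cell and is not given in the thread) and printing
the returned tuples would show, per size, whether the copy raised an
error and how much of the file arrived intact. A status of "ok" with
bytes_intact smaller than the size corresponds to the silent
truncation seen for 512KB and 2MB.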