From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 01:56:47 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CACFD25A for ; Sun, 3 Mar 2013 01:56:47 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id 852B4DA9 for ; Sun, 3 Mar 2013 01:56:46 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r231ug5f045500 for ; Sat, 2 Mar 2013 19:56:43 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Sat Mar 2 19:56:43 2013 Message-ID: <5132ADD4.8050507@denninger.net> Date: Sat, 02 Mar 2013 19:56:36 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: Musings on ZFS Backup strategies References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> In-Reply-To: <5130EB8A.7060706@gmail.com> X-Enigmail-Version: 1.5 X-Antivirus: avast! (VPS 130302-1, 03/02/2013), Outbound message X-Antivirus-Status: Clean Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 01:56:47 -0000 Quoth Ben Morrow: > I don't know what medium you're backing up to (does anyone use tape any > more?) but when backing up to disk I much prefer to keep the backup in > the form of a filesystem rather than as 'zfs send' streams. 
> One reason for this is that I believe that new versions of the ZFS code
> are more likely to be able to correctly read old versions of the
> filesystem than old versions of the stream format; this may not be
> correct any more, though.
>
> Another reason is that it means I can do 'rolling snapshot' backups. I
> do an initial dump like this
>
>     # zpool is my working pool
>     # bakpool is a second pool I am backing up to
>
>     zfs snapshot -r zpool/fs@dump
>     zfs send -R zpool/fs@dump | zfs recv -vFd bakpool
>
> That pipe can obviously go through ssh or whatever to put the backup on
> a different machine. Then to make an increment I roll forward the
> snapshot like this
>
>     zfs rename -r zpool/fs@dump dump-old
>     zfs snapshot -r zpool/fs@dump
>     zfs send -R -I @dump-old zpool/fs@dump | zfs recv -vFd bakpool
>     zfs destroy -r zpool/fs@dump-old
>     zfs destroy -r bakpool/fs@dump-old
>
> (Notice that the increment starts at a snapshot called @dump-old on the
> send side but at a snapshot called @dump on the recv side. ZFS can
> handle this perfectly well, since it identifies snapshots by UUID, and
> will rename the bakpool snapshot as part of the recv.)
>
> This brings the filesystem on bakpool up to date with the filesystem on
> zpool, including all snapshots, but never creates an increment with more
> than one backup interval's worth of data in. If you want to keep more
> history on the backup pool than the source pool, you can hold off on
> destroying the old snapshots, and instead rename them to something
> unique. (Of course, you could always give them unique names to start
> with, but I find it more convenient not to.)

Uh, I see a potential problem here.

What if the zfs send | zfs recv command fails for some reason before
completion? I have noted that zfs recv is atomic -- if it fails for any
reason the entire receive is rolled back like it never happened.

But you then destroy the old snapshot, and the next time this runs the
new gets rolled down.
It would appear that there's an increment missing, never to be seen again. What gets lost in that circumstance? Anything changed between the two times -- and silently at that? (yikes!) -- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 04:23:15 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C53D0A38 for ; Sun, 3 Mar 2013 04:23:15 +0000 (UTC) (envelope-from mauzo@anubis.morrow.me.uk) Received: from isis.morrow.me.uk (isis.morrow.me.uk [204.109.63.142]) by mx1.freebsd.org (Postfix) with ESMTP id 85FB82B5 for ; Sun, 3 Mar 2013 04:23:15 +0000 (UTC) Received: from anubis.morrow.me.uk (host86-177-98-144.range86-177.btcentralplus.com [86.177.98.144]) (Authenticated sender: mauzo) by isis.morrow.me.uk (Postfix) with ESMTPSA id BDA544504E; Sun, 3 Mar 2013 04:23:07 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.7.4 isis.morrow.me.uk BDA544504E DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=morrow.me.uk; s=dkim201101; t=1362284588; bh=oSxkz3ZESlKXiLXFzNvF/c+7pKhbofHkyViW+0Xd+6U=; h=Date:From:To:Subject:References:In-Reply-To; b=kki9bgf2Ee1DFYiOChRnDJk5/YqJAPqPNva7sfT0snMdibLPZtIjQzDn3LaWsseFy KISwObWe7cO+5xaMgPPeB05+/dVYn1VSJ/tP0qCIXA6ZN0+qRofECnrpkkQihSL9Nr U8VAp28OHGSYarUi0LCK8fdPwCeSwQMwW3Zfci9M= X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.6 at isis.morrow.me.uk Received: by anubis.morrow.me.uk (Postfix, from userid 5001) id B42E69EBB; Sun, 3 Mar 2013 04:23:05 +0000 (GMT) Date: Sun, 3 Mar 2013 04:23:05 +0000 From: Ben Morrow To: karl@denninger.net, freebsd-stable@freebsd.org Subject: Re: Musings on ZFS Backup strategies Message-ID: <20130303042301.GA54356@anubis.morrow.me.uk> References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<5132ADD4.8050507@denninger.net> X-Newsgroups: gmane.os.freebsd.stable Organization: morrow.me.uk User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 04:23:15 -0000

Quoth Karl Denninger:
> Quoth Ben Morrow:
> > I don't know what medium you're backing up to (does anyone use tape any
> > more?) but when backing up to disk I much prefer to keep the backup in
> > the form of a filesystem rather than as 'zfs send' streams. One reason
> > for this is that I believe that new versions of the ZFS code are more
> > likely to be able to correctly read old versions of the filesystem than
> > old versions of the stream format; this may not be correct any more,
> > though.
> >
> > Another reason is that it means I can do 'rolling snapshot' backups. I
> > do an initial dump like this
> >
> >     # zpool is my working pool
> >     # bakpool is a second pool I am backing up to
> >
> >     zfs snapshot -r zpool/fs@dump
> >     zfs send -R zpool/fs@dump | zfs recv -vFd bakpool
> >
> > That pipe can obviously go through ssh or whatever to put the backup on
> > a different machine. Then to make an increment I roll forward the
> > snapshot like this
> >
> >     zfs rename -r zpool/fs@dump dump-old
> >     zfs snapshot -r zpool/fs@dump
> >     zfs send -R -I @dump-old zpool/fs@dump | zfs recv -vFd bakpool
> >     zfs destroy -r zpool/fs@dump-old
> >     zfs destroy -r bakpool/fs@dump-old
> >
> > (Notice that the increment starts at a snapshot called @dump-old on the
> > send side but at a snapshot called @dump on the recv side. ZFS can
> > handle this perfectly well, since it identifies snapshots by UUID, and
> > will rename the bakpool snapshot as part of the recv.)
> >
> > This brings the filesystem on bakpool up to date with the filesystem on
> > zpool, including all snapshots, but never creates an increment with more
> > than one backup interval's worth of data in. If you want to keep more
> > history on the backup pool than the source pool, you can hold off on
> > destroying the old snapshots, and instead rename them to something
> > unique. (Of course, you could always give them unique names to start
> > with, but I find it more convenient not to.)
>
> Uh, I see a potential problem here.
>
> What if the zfs send | zfs recv command fails for some reason before
> completion? I have noted that zfs recv is atomic -- if it fails for any
> reason the entire receive is rolled back like it never happened.
>
> But you then destroy the old snapshot, and the next time this runs the
> new gets rolled down. It would appear that there's an increment
> missing, never to be seen again.

No, if the recv fails my backup script aborts and doesn't delete the old
snapshot. Cleanup then means removing the new snapshot and renaming the
old back on the source zpool; in my case I do this by hand, but it could
be automated given enough thought. (The names of the snapshots on the
backup pool don't matter; they will be cleaned up by the next successful
recv.)

> What gets lost in that circumstance? Anything changed between the two
> times -- and silently at that? (yikes!)

It's impossible to recv an incremental stream on top of the wrong
snapshot (identified by UUID, not by its current name), so nothing can
get silently lost. A 'zfs recv -F' will find the correct starting
snapshot on the destination filesystem (assuming it's there) regardless
of its name, and roll forward to the state as of the end snapshot. If a
recv succeeds you can be sure nothing up to that point has been missed.
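
The abort-and-clean-up behaviour described above could be scripted
roughly as follows. This is a minimal sketch, not the actual backup
script: it reuses the zpool/fs and bakpool names from the commands quoted
earlier, the function name rolling_backup is made up for illustration,
and it assumes that a failed or truncated stream makes zfs recv exit
non-zero, because a plain sh pipeline reports only the status of its last
command.

```shell
#!/bin/sh
# Sketch of one rolling-increment run with failure handling.
# SRC and DST default to the names used in the thread; both are assumptions.
SRC=${SRC:-zpool/fs}
DST=${DST:-bakpool}

rolling_backup() {
    # Roll the previous marker snapshot aside, then take a new one.
    zfs rename -r "$SRC@dump" dump-old || return 1
    if ! zfs snapshot -r "$SRC@dump"; then
        # Could not snapshot: put the marker back and give up.
        zfs rename -r "$SRC@dump-old" dump
        return 1
    fi

    # The status tested here is that of zfs recv, the last command
    # in the pipeline.
    if zfs send -R -I @dump-old "$SRC@dump" | zfs recv -vFd "$DST"; then
        # Success: the old marker is no longer needed on either side.
        zfs destroy -r "$SRC@dump-old"
        zfs destroy -r "$DST/${SRC#*/}@dump-old"
    else
        # Failure: remove the new snapshot and rename the old one back,
        # so the next run starts from what the backup pool really holds.
        zfs destroy -r "$SRC@dump"
        zfs rename -r "$SRC@dump-old" dump
        return 1
    fi
}
```

Running rolling_backup from cron and alerting on a non-zero exit status
would keep a failed run from passing unnoticed.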
The worst that can happen is if you mistakenly delete the snapshot on the source pool that marks the end of the last successful recv on the backup pool; in that case you have to take an increment from further back (which will therefore be a larger incremental stream than it needed to be). The very worst case is if you end up without any snapshots in common between the source and backup pools, and you have to start again with a full dump. Ben From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 04:57:47 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C37FEE17 for ; Sun, 3 Mar 2013 04:57:47 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id 6370036E for ; Sun, 3 Mar 2013 04:57:46 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r234vikk052956 for ; Sat, 2 Mar 2013 22:57:44 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Sat Mar 2 22:57:44 2013 Message-ID: <5132D843.4000800@denninger.net> Date: Sat, 02 Mar 2013 22:57:39 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: Musings on ZFS Backup strategies References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <20130303042301.GA54356@anubis.morrow.me.uk> In-Reply-To: <20130303042301.GA54356@anubis.morrow.me.uk> X-Enigmail-Version: 1.5 X-Antivirus: avast! 
(VPS 130302-1, 03/02/2013), Outbound message X-Antivirus-Status: Clean Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 04:57:47 -0000

On 3/2/2013 10:23 PM, Ben Morrow wrote:
> Quoth Karl Denninger:
>> Quoth Ben Morrow:
>>> I don't know what medium you're backing up to (does anyone use tape any
>>> more?) but when backing up to disk I much prefer to keep the backup in
>>> the form of a filesystem rather than as 'zfs send' streams. One reason
>>> for this is that I believe that new versions of the ZFS code are more
>>> likely to be able to correctly read old versions of the filesystem than
>>> old versions of the stream format; this may not be correct any more,
>>> though.
>>>
>>> Another reason is that it means I can do 'rolling snapshot' backups. I
>>> do an initial dump like this
>>>
>>>     # zpool is my working pool
>>>     # bakpool is a second pool I am backing up to
>>>
>>>     zfs snapshot -r zpool/fs@dump
>>>     zfs send -R zpool/fs@dump | zfs recv -vFd bakpool
>>>
>>> That pipe can obviously go through ssh or whatever to put the backup on
>>> a different machine. Then to make an increment I roll forward the
>>> snapshot like this
>>>
>>>     zfs rename -r zpool/fs@dump dump-old
>>>     zfs snapshot -r zpool/fs@dump
>>>     zfs send -R -I @dump-old zpool/fs@dump | zfs recv -vFd bakpool
>>>     zfs destroy -r zpool/fs@dump-old
>>>     zfs destroy -r bakpool/fs@dump-old
>>>
>>> (Notice that the increment starts at a snapshot called @dump-old on the
>>> send side but at a snapshot called @dump on the recv side. ZFS can
>>> handle this perfectly well, since it identifies snapshots by UUID, and
>>> will rename the bakpool snapshot as part of the recv.)
>>>
>>> This brings the filesystem on bakpool up to date with the filesystem on
>>> zpool, including all snapshots, but never creates an increment with more
>>> than one backup interval's worth of data in. If you want to keep more
>>> history on the backup pool than the source pool, you can hold off on
>>> destroying the old snapshots, and instead rename them to something
>>> unique. (Of course, you could always give them unique names to start
>>> with, but I find it more convenient not to.)
>> Uh, I see a potential problem here.
>>
>> What if the zfs send | zfs recv command fails for some reason before
>> completion? I have noted that zfs recv is atomic -- if it fails for any
>> reason the entire receive is rolled back like it never happened.
>>
>> But you then destroy the old snapshot, and the next time this runs the
>> new gets rolled down. It would appear that there's an increment
>> missing, never to be seen again.
> No, if the recv fails my backup script aborts and doesn't delete the old
> snapshot. Cleanup then means removing the new snapshot and renaming the
> old back on the source zpool; in my case I do this by hand, but it could
> be automated given enough thought. (The names of the snapshots on the
> backup pool don't matter; they will be cleaned up by the next successful
> recv.)

I was concerned that if the one you rolled to "old" gets killed without
the backup being successful then you're screwed as you've lost the
context. I presume that zfs recv will properly set the exit code
non-zero if something's wrong (I would hope so!)

>> What gets lost in that circumstance? Anything changed between the two
>> times -- and silently at that? (yikes!)
> It's impossible to recv an incremental stream on top of the wrong
> snapshot (identified by UUID, not by its current name), so nothing can
> get silently lost. A 'zfs recv -F' will find the correct starting
> snapshot on the destination filesystem (assuming it's there) regardless
> of its name, and roll forward to the state as of the end snapshot. If a
> recv succeeds you can be sure nothing up to that point has been missed.

Ah, ok. THAT I did not understand. So the zfs recv process checks what
it's about to apply the delta against, and if it can't find a consistent
place to start it garfs rather than screw you. That's good. As long as
it gets caught I can live with it. Recovery isn't a terrible pain in the
butt so long as it CAN be recovered. It's the potential for silent
failures that scares the bejeezus out of me for all the obvious reasons.

> The worst that can happen is if you mistakenly delete the snapshot on
> the source pool that marks the end of the last successful recv on the
> backup pool; in that case you have to take an increment from further
> back (which will therefore be a larger incremental stream than it needed
> to be). The very worst case is if you end up without any snapshots in
> common between the source and backup pools, and you have to start again
> with a full dump.
>
> Ben

Got it.

That's not great in that it could force a new "full copy", but it's also
not the end of the world. In my case I am already automatically taking
daily and 4-hour snaps, keeping a week's worth around, which is more
than enough time to be able to obtain a consistent place to go from.
That should be ok then.

I think I'm going to play with this and see what I think of it. One
thing that is very attractive to this design is to have the receiving
side be a mirror, then to rotate to the vault copy run a scrub (to
insure that both members are consistent at a checksum level), break the
mirror and put one in the vault, replacing it with the drive coming FROM
the vault, then do a zpool replace and allow it to resilver into the
other drive. You now have the two in consistent state again locally if
the pool pukes and one in the vault in the event of a fire or other
"entire facility is toast" event.

The only risk that makes me uncomfortable doing this is that the pool is
always active when the system is running. With UFS backup disks it's
not -- except when being actually written to they're unmounted, and this
materially decreases the risk of an insane adapter scribbling the
drives, since there is no I/O at all going to them unless mounted.
While the backup pool would be nominally idle it is probably
more-exposed to a potential scribble than the UFS-mounted packs would be.

The two times in my career I've gotten hosed by this my operative theory
is that something went wrong in the adapter code and it decided that
cache RAM pages "belonged" to a different disk than they really belonged
to. That's the only explanation I can come up with that makes sense; in
both cases it resulted in effectively complete destruction of the data
on all mounted drives in the array.
-- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 05:26:39 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 980C6400 for ; Sun, 3 Mar 2013 05:26:39 +0000 (UTC) (envelope-from regnauld@x0.dk) Received: from moof.catpipe.net (moof.catpipe.net [194.28.252.64]) by mx1.freebsd.org (Postfix) with ESMTP id 59F8B665 for ; Sun, 3 Mar 2013 05:26:38 +0000 (UTC) Received: from localhost (moof.catpipe.net [194.28.252.64]) by localhost.catpipe.net (Postfix) with ESMTP id 9501F4CE954; Sun, 3 Mar 2013 06:17:57 +0100 (CET) Received: from moof.catpipe.net ([194.28.252.64]) by localhost (moof.catpipe.net [194.28.252.64]) (amavisd-new, port 10024) with ESMTP id NAsVN+2u44u2; Sun, 3 Mar 2013 06:17:57 +0100 (CET) Received: from macbook.bluepipe.net (unknown [122.248.97.220]) (Authenticated sender: relayuser) by moof.catpipe.net (Postfix) with ESMTPA id AA2874CE952; Sun, 3 Mar 2013 06:17:56 +0100 (CET) Received: by macbook.bluepipe.net (Postfix, from userid 1001) id 8EFF1222F8C; Sun, 3 Mar 2013 11:47:54 +0630 (MMT) Date: Sun, 3 Mar 2013 11:47:54 +0630 From: Phil Regnauld To: Karl Denninger Subject: Re: Musings on ZFS Backup strategies Message-ID: <20130303051754.GE21613@macbook.bluepipe.net> References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <20130303042301.GA54356@anubis.morrow.me.uk> <5132D843.4000800@denninger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5132D843.4000800@denninger.net> X-Operating-System: Darwin 12.2.1 x86_64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sun, 03 Mar 2013 05:26:39 -0000

Karl Denninger (karl) writes:
>
> I think I'm going to play with this and see what I think of it. One
> thing that is very attractive to this design is to have the receiving
> side be a mirror, then to rotate to the vault copy run a scrub (to
> insure that both members are consistent at a checksum level), break the
> mirror and put one in the vault, replacing it with the drive coming FROM
> the vault, then do a zpool replace and allow it to resilver into the
> other drive. You now have the two in consistent state again locally if
> the pool pukes and one in the vault in the event of a fire or other
> "entire facility is toast" event.

That's one solution.

> The only risk that makes me uncomfortable doing this is that the pool is
> always active when the system is running. With UFS backup disks it's
> not -- except when being actually written to they're unmounted, and this
> materially decreases the risk of an insane adapter scribbling the
> drives, since there is no I/O at all going to them unless mounted.
> While the backup pool would be nominally idle it is probably
> more-exposed to a potential scribble than the UFS-mounted packs would be.
Could "zpool export" in between syncs on the target, assuming that's not your root pool :) Cheers, Phil From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 06:24:34 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4DBD7CD7 for ; Sun, 3 Mar 2013 06:24:34 +0000 (UTC) (envelope-from mauzo@anubis.morrow.me.uk) Received: from isis.morrow.me.uk (isis.morrow.me.uk [204.109.63.142]) by mx1.freebsd.org (Postfix) with ESMTP id 19E2078E for ; Sun, 3 Mar 2013 06:24:33 +0000 (UTC) Received: from anubis.morrow.me.uk (host86-177-98-144.range86-177.btcentralplus.com [86.177.98.144]) (Authenticated sender: mauzo) by isis.morrow.me.uk (Postfix) with ESMTPSA id A6FF14504E for ; Sun, 3 Mar 2013 06:24:32 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.7.4 isis.morrow.me.uk A6FF14504E DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=morrow.me.uk; s=dkim201101; t=1362291872; bh=nYMrs2s0YHAcQypBvUA/B4gE7ly4CRlpC9l9RuMNm3U=; h=Date:From:To:Subject:References:In-Reply-To; b=P4DrdWqnji7Kn+xVKfW6NQOUKYr5STEJQE5+BdsQk5z+wk9oDLADwRWwV7wH+eFw2 FGatTa7ljOJ/KWiHYvquCuFEL5TaoZo0g+zKkVhSF4qf6wN0ec+/m9HKz2Kb0ScnM1 n6xZzeFx1xZghjgPOh97TKCbfv8+ny2fGTQ7MH/8= X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.6 at isis.morrow.me.uk Received: by anubis.morrow.me.uk (Postfix, from userid 5001) id 70D209EE5; Sun, 3 Mar 2013 06:24:30 +0000 (GMT) Date: Sun, 3 Mar 2013 06:24:30 +0000 From: Ben Morrow To: freebsd-stable@freebsd.org Subject: Re: Musings on ZFS Backup strategies Message-ID: <20130303062426.GA55406@anubis.morrow.me.uk> References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <20130303042301.GA54356@anubis.morrow.me.uk> <5132D843.4000800@denninger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130303051754.GE21613@macbook.bluepipe.net> X-Newsgroups: gmane.os.freebsd.stable 
Organization: morrow.me.uk User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 06:24:34 -0000

Quoth Phil Regnauld:
> >
> > The only risk that makes me uncomfortable doing this is that the pool is
> > always active when the system is running. With UFS backup disks it's
> > not -- except when being actually written to they're unmounted, and this
> > materially decreases the risk of an insane adapter scribbling the
> > drives, since there is no I/O at all going to them unless mounted.
> > While the backup pool would be nominally idle it is probably
> > more-exposed to a potential scribble than the UFS-mounted packs would be.
>
> Could "zpool export" in between syncs on the target, assuming that's not
> your root pool :)

If I were feeling paranoid I might be tempted to not only keep the pool
exported when not in use, but to 'zpool offline' one half of the mirror
while performing the receive, then put it back online and allow it to
resilver before exporting the whole pool again. I'm not sure if there's
any way to wait for the resilver to finish except to poll 'zpool
status', though.
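
For what it is worth, the import/offline/receive/online/export cycle
could be scripted along these lines. A rough sketch only: the device name
ada3 and both function names are hypothetical, and it assumes that
'zpool status' prints the phrase "resilver in progress" while a resilver
is running, so the wait is done by polling and grepping for that phrase.

```shell
#!/bin/sh
# Sketch only: POOL, SPARE and POLL are illustrative defaults.
POOL=${POOL:-bakpool}
SPARE=${SPARE:-ada3}    # hypothetical mirror half kept offline during recv
POLL=${POLL:-60}        # seconds between 'zpool status' checks

wait_for_resilver() {
    # Poll until the status output no longer reports a running resilver.
    while zpool status "$POOL" | grep -q 'resilver in progress'; do
        sleep "$POLL"
    done
}

guarded_receive() {
    # "$@" is whatever command feeds the pool, e.g. a send | recv wrapper.
    zpool import "$POOL" || return 1
    if ! zpool offline "$POOL" "$SPARE"; then
        zpool export "$POOL"
        return 1
    fi
    "$@"
    status=$?
    # Bring the spare half back and let it catch up before exporting.
    zpool online "$POOL" "$SPARE" && wait_for_resilver
    zpool export "$POOL"
    return "$status"
}
```

Something like 'guarded_receive sh /path/to/backup-script' would wrap an
existing receive step; the script path is a placeholder. If the status
output does not show the resilver immediately after 'zpool online', the
loop could exit too early, so a short initial delay may be prudent.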
Ben From owner-freebsd-stable@FreeBSD.ORG Sun Mar 3 15:52:43 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 1C2B435E for ; Sun, 3 Mar 2013 15:52:43 +0000 (UTC) (envelope-from smithi@nimnet.asn.au) Received: from sola.nimnet.asn.au (paqi.nimnet.asn.au [115.70.110.159]) by mx1.freebsd.org (Postfix) with ESMTP id 679F9D4B for ; Sun, 3 Mar 2013 15:52:41 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by sola.nimnet.asn.au (8.14.2/8.14.2) with ESMTP id r23FqSdY041074; Mon, 4 Mar 2013 02:52:28 +1100 (EST) (envelope-from smithi@nimnet.asn.au) Date: Mon, 4 Mar 2013 02:52:28 +1100 (EST) From: Ian Smith To: Darren Pilgrim Subject: Re: Building RELENG_9 (or RELENG_9_*) on a small machine? In-Reply-To: <5131035A.2050803@bluerosetech.com> Message-ID: <20130304001653.U32142@sola.nimnet.asn.au> References: <512EB5BD.1020803@bluerosetech.com> <20130228014635.GB70215@glenbarber.us> <512F23AC.1000003@bluerosetech.com> <20130302042828.J32142@sola.nimnet.asn.au> <5131035A.2050803@bluerosetech.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Mar 2013 15:52:43 -0000 On Fri, 1 Mar 2013 11:36:58 -0800, Darren Pilgrim wrote: > On 2013-03-01 10:50, Ian Smith wrote: > > At 256MB - the > > minimum earlier that completed installation without disabling CTL - swap > > often sat at ~14MB but blew out to around 165MB building those huge llvm > > libraries - cc1plus 332M, 173M resident was one top I snipped, but I > > can't say I caught the biggest. 
>
> I had top running throughout a build and saw cc1plus reach 460 MB with 171MB
> resident, at that point CPU was down to about 4% and the system was hammering
> swap. As a testament to FreeBSD's robustness, I could still log in via ssh,
> start screened shells, and generally conduct admin tasks while cc1plus beat
> the crap out of my VM.

Even start and stop other services. After 15 years I expect nothing
less; even so I try not to fill swap :)

> > Then I added 128MB (to 384MB) and repeated the first buildworld (incl.
> > clang) expecting huge savings as it'd only touched swap to about 12MB a
> > time or two, mostly having 100MB+ of free memory .. wow, down to 7h02m!

Well I was going to bring mine straight up to 768MB, but having a spare
512MB stick and in the interest of science, I tried again with 512MB and
nothing in src.conf, but still CAM CTL disabled. After a couple of hours
swap was tickled up to 608K and got no higher, finishing at 544K which I
suspect was VM getting a bit nervous and making plans at most; still at
544K next day.

As mentioned but more precisely, with 256MB buildworld took 7:39:38 with
lots of swapping - but only 3:31:49 WITHOUT_CLANG=y, with no swapping -
and then with 384MB, 7:02:10 and no more than 14MB swap seen. With 512MB
and sub-1MB swap touched it took 7:00:49; the extra 128MB RAM shaved a
whole minute 21 seconds off buildworld, 0.32% faster, whoopee!

> For 9.x, I changed my notes to "256 MB to run, 768 MB to build". For 8.x,
> the numbers were 192 MB to run, 512 MB to build.

Without CAM CTL I just managed to install 9.1 in 128MB; with it, the
install took 256MB. If adrian@ succeeds defaulting CAM CTL off it'll
improve matters ~35MB.

From the above I'd say 384MB without CAM CTL or 512MB with will build
world with minimal swapping, and 768MB will be fine - though I don't
know about building bigger ports in that?

I'm not moving my $wholelife to 9.1 till packages become again
available, for obvious reasons.
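
The buildworld times quoted above can be compared with a little shell
arithmetic. A small sketch; to_seconds is a helper invented here, and
only the 384MB and 512MB figures from this message are plugged in.

```shell
#!/bin/sh
# Convert H:MM:SS to seconds; awk reads "02" as decimal 2, not octal.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

t384=$(to_seconds 7:02:10)    # buildworld with 384MB
t512=$(to_seconds 7:00:49)    # buildworld with 512MB
saved=$((t384 - t512))

echo "saved ${saved}s"        # prints: saved 81s
awk -v s="$saved" -v t="$t384" \
    'BEGIN { printf "%.2f%% faster\n", 100 * s / t }'   # prints: 0.32% faster
```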
Any idea what happens both time- and memorywise from building world with
clang rather than gcc? Hopefully All That Code is more efficient?

> > Here at least, building llvm libs and clang doubles buildworld time! and
> > extends /usr/obj from 675MB to 1GB.
>
> I'll be doing `make buildkernel buildworld` ET and size comparisons between
> RELENG_8_3 and RELENG_9_1 when I test out my buildbox. I'd like to gather
> memory usage metrics as well, if someone knows some tricks for that. My
> current approach is somewhat crude. :) If there's interest I'll follow up
> with the results here.

Interest here. My sampling was very crude also, just appending:
'nice top -nS -ores | head -n 11 | grep -v "^$" ; sleep 60'
to a log. Doesn't catch max size or res except by chance, but quite good
enough to track movements in swap. I guess parsing the tail of vmstat
might be the go?

cheers, Ian

From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 13:49:50 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 464F3349 for ; Mon, 4 Mar 2013 13:49:50 +0000 (UTC) (envelope-from isabell@issyl0.co.uk) Received: from mail-vc0-f169.google.com (mail-vc0-f169.google.com [209.85.220.169]) by mx1.freebsd.org (Postfix) with ESMTP id 034B1193E for ; Mon, 4 Mar 2013 13:49:49 +0000 (UTC) Received: by mail-vc0-f169.google.com with SMTP id n10so3359304vcn.14 for ; Mon, 04 Mar 2013 05:49:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:sender:x-originating-ip:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :x-gm-message-state; bh=DioQTY+wgCO7jSel/CTscrjyiF2Xu6dBG4Yow/s5vw8=; b=pwhsnn7TwWyyzAtlYT4U18MlkZMh87lNU23r3qwnUhDcYo0sn4nwWTGnYxHjvku/Ub oA5UMWCIrfGcNSmVBJszE6ThUVMG6qP1R3KnP2wDsTu/J+usYtL0gKCixfVmqyzaa3gz w4w6+wMSI5bEJU9O6CnFtMAgn3MemFvftRrRzvB2Nh69uB6qcKnVwbjT71fu4skZSt2f
Aeury108Eg8N6PnIamnWFZVyvAboz1ngPv6OtDzeWSyYjXaRVdzn6aoC+6DX0+mzupXp nwbUQHp/yNvaCUsi8EGlYVBGXLTaWJC8PvmOLlZ90IgOLzfqe/+umrWQZj8ELRgoEPN3 CqBQ== MIME-Version: 1.0 X-Received: by 10.220.116.5 with SMTP id k5mr7718621vcq.55.1362404988940; Mon, 04 Mar 2013 05:49:48 -0800 (PST) Sender: isabell@issyl0.co.uk Received: by 10.220.250.138 with HTTP; Mon, 4 Mar 2013 05:49:48 -0800 (PST) X-Originating-IP: [2.29.6.249] Date: Mon, 4 Mar 2013 13:49:48 +0000 X-Google-Sender-Auth: 7LX9Z-YX-bFUn-7rILEEDB9MoqY Message-ID: Subject: FreeBSD Quarterly Status Report, July-September 2012. From: Isabell Long To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQn4JF8ZnFxDgFiYscP2dX6EFf7tZEzHFWZo+FUhnOHZdTCIQEHLRwZhxnR3EPdqp+4tITwP Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 13:49:50 -0000 FreeBSD Quarterly Status Report, July-September 2012. Introduction This report covers FreeBSD-related projects between July and September 2012. This is the third of the four reports planned for 2012. Highlights from this quarter include successful participation in Google Summer of Code, major work in areas of the source and ports trees, and a Developer Summit attended by over 30 developers. Thanks to all the reporters for the excellent work! This report contains 12 entries and we hope you enjoy reading it. 
__________________________________________________________________ Projects * FreeBSD on Altera FPGAs * Native iSCSI Target * Parallel rc.d execution FreeBSD Team Reports * FreeBSD Bugbusting Team * FreeBSD Foundation * The FreeBSD Core Team Kernel * FreeBSD on ARMv6/ARMv7 Documentation * The FreeBSD Japanese Documentation Project Ports * KDE/FreeBSD * Ports Collection Miscellaneous * FreeBSD Developer Summit, Cambridge, UK FreeBSD in Google Summer of Code * Google Summer of Code 2012 __________________________________________________________________ FreeBSD Bugbusting Team URL: http://www.FreeBSD.org/support.html#gnats URL: https://wiki.freebsd.org/BugBusting Contact: Eitan Adler Contact: Gavin Atkinson Contact: Oleksandr Tymoshenko In August, Eitan Adler (eadler@) and Oleksandr Tymoshenko (gonzo@) joined the Bugmeister team. At the same time, Remko Lodder and Volker Werth stepped down. We extend our thanks to Volker and Remko for their work in the past, and welcome Oleksandr and Eitan. Eitan and Oleksandr have been working hard on migrating from GNATS, and have made significant progress on evaluating new software, and creating scripts to export data from GNATS. The bugbusting team continue work on trying to make the contents of the GNATS PR database cleaner, more accessible and easier for committers to find and resolve PRs, by tagging PRs to indicate the areas involved, and by ensuring that there is sufficient info within each PR to resolve each issue. As always, anybody interested in helping out with the PR queue is welcome to join us in #freebsd-bugbusters on EFnet. We are always looking for additional help, whether your interests lie in triaging incoming PRs, generating patches to resolve existing problems, or simply helping with the database housekeeping (identifying duplicate PRs, ones that have already been resolved, etc). This is a great way of getting more involved with FreeBSD! Open tasks: 1. Further research into tools suitable to replace GNATS. 2. 
Get more users involved with triaging PRs as they come in. 3. Assist committers with closing PRs. __________________________________________________________________ FreeBSD Developer Summit, Cambridge, UK URL: https://wiki.freebsd.org/201208DevSummit Contact: Robert Watson At the end of August, there was an "off-season" Developer Summit held in Cambridge, UK at the University of Cambridge Computer Laboratory. This was a three-day event, with a documentation summit scheduled for the day before. The three days of the main event were split into three sessions, with two tracks in each. Some of them even involved ARM developers from the neighborhood, which proved to be productive and led to further engagement between the FreeBSD community and ARM. The schedule was finalized on the first day, spawning a plethora of topics to discuss, followed by splitting into groups. A short summary from each of the groups was presented in the final session and then published at the event's home page on the FreeBSD wiki. This summit contributed greatly to arriving at a tentative plan for throwing the switch to make clang the default compiler on HEAD. This was further discussed on the mailing list, and has now happened, bringing us one big step closer to a GPL-free FreeBSD 10. As part of the program, an afternoon of short talks from researchers in the Cambridge Computer Laboratory involved either operating systems work in general or FreeBSD in particular. Robert Watson showed off a tablet running FreeBSD on a MIPS-compatible soft-core processor running on an Altera FPGA. In association with the event, a dinner was hosted by St. John's College and co-sponsored by Google and the FreeBSD Foundation. The day after the conference, a trip was organized to Bletchley Park, which was celebrating Turing's centenary in 2012.
__________________________________________________________________ FreeBSD Foundation URL: http://www.freebsdfoundation.org/press/2012Jul-newsletter.shtml Contact: Deb Goodkin The Foundation hosted and sponsored the Cambridge FreeBSD developer summit in August 2012. We were represented at the following conferences: OSCON July 2012, Texas LinuxFest, and Ohio LinuxFest. We negotiated/supervised Foundation funded projects: Distributed Security Audit Logging, Capsicum Component Framework, Native iSCSI Target Scoping, and Growing UFS Filesystems Online. We negotiated, supervised, and funded hardware needs for FreeBSD co-location centers. We welcomed Kirk McKusick to our board of directors. He took over the responsibility of managing our investments. We visited companies to discuss their FreeBSD use and to help facilitate collaboration with the Project. We managed the FreeBSD vendor community mailing list and meetings. We created a high quality FreeBSD 9 brochure to help promote FreeBSD. We published our semi-annual newsletter that highlighted Foundation funded projects, travel grants for developers, conferences sponsored and other ways the Foundation supported the FreeBSD Project. We hired a technical writer to help with FreeBSD marketing/promotional material. We began work on redesigning our website. __________________________________________________________________ FreeBSD on Altera FPGAs URL: http://www.cl.cam.ac.uk/research/security/ctsrd/ URL: http://www.cl.cam.ac.uk/research/security/ctsrd/cheri.html Contact: Brooks Davis Contact: Robert Watson Contact: Bjoern Zeeb In the course of developing the CHERI processor as part of the CTSRD project, SRI International's Computer Science Laboratory and the University of Cambridge Computer Laboratory have developed support for a number of general purpose IP cores for Altera FPGAs including the Altera Triple Speed Ethernet (ATSE) MAC core, the Altera University Program SD Card core, and the Altera JTAG UART.
We have also added support for general access to memory mapped devices on the Avalon bus via the avgen bus. We have implemented both nexus and flattened device tree (FDT) attachments for these devices. In addition to these soft cores we have developed support for the Terasic multi-touch LCD and are working to provide support for the Terasic HDMI Transmitter Daughter Card. Both of these work with common development and/or reference boards for Altera FPGAs. They do require additional IP cores which we plan to release to the open source community in the near future. With the exception of the ATSE and HDMI drivers we have merged all of these changes to FreeBSD-CURRENT. We anticipate that these drivers will be useful for users who wish to run FreeBSD on either hard or soft core CPUs on Altera FPGAs. This work has been sponsored by DARPA, AFRL, and Google. __________________________________________________________________ FreeBSD on ARMv6/ARMv7 Contact: freebsd-arm mailing list Support for the ARMv6 and ARMv7 architectures has been merged from the project branch to HEAD. This code covers the following parts: * General ARMv6/ARMv7 kernel bits (pmap, cache, assembler routines, etc...) * ARM Generic Interrupt Controller driver * Improved thread-local storage for cpus >=ARMv6 * Driver for SMSC LAN95XX and LAN8710A ethernet controllers * Marvell MV78x60 support (multiuser, ARMADA XP kernel config) * TI OMAP4 and AM335x support (multiuser, no GPU or graphics support, kernel configs for Pandaboard and Beaglebone) * LPC32x0 support (multiuser, frame buffer works with SSD1289 LCD controller. Embedded Artists EA3250 kernel config) This work was a result of a joint effort by many people, including but not limited to: Grzegorz Bernacki (gber@), Aleksander Dutkowski, Ben R.
Gray (bgray@), Olivier Houchard (cognet@), Rafal Jaworowski (raj@) and Semihalf team, Tim Kientzle (kientzle@), Jakub Wojciech Klama (jceel@), Ian Lepore (ian@), Warner Losh (imp@), Damjan Marion (dmarion@), Lukasz Plachno, Stanislav Sedov (stas@), Mark Tinguely and Andrew Turner (andrew@). Thanks to all, who contributed by submitting code, testing and giving valuable advice. Open tasks: 1. More hardware bring-ups and more drivers 2. Finish SMP support 3. VFP/NEON support __________________________________________________________________ Google Summer of Code 2012 URL: http://www.freebsd.org/projects/summerofcode.html URL: https://wiki.freebsd.org/SummerOfCode2012 Contact: FreeBSD Summer of Code Administrators Over the Summer of 2012, FreeBSD were once again granted a place to participate in the Google Summer of Code program. We received a total of 32 project proposals, and were ultimately given 15 slots for university students to work on open source projects mentored by existing FreeBSD developers. We were able to accept a wide spread of proposals, covering both the base system and the ports infrastructure. We had students working on file systems, file integrity checking, and parallelization in the ports collection. Students worked on kernel infrastructure, including one project to support CPU resource limits on users, processes and jails, and one student improving the BSD callout(9) and timer facilities. Two students worked on the ARM platform, widely used in embedded systems and smart phones; one student worked on a significant cleanup and improvements to the Flattened Device Tree implementation code, while the other ported FreeBSD to the OMAP3-based BeagleBoard-xM device. One student worked on improving IPv6 support in userland tools, whilst another worked on BIOS emulation for the BHyVE BSD-licensed hypervisor, new in FreeBSD 10. Other students worked on EFI boot support, userland lock profiling and an automated kernel crash reporting system. 
Overall, a significant proportion of the code produced has been or will be integrated into FreeBSD in one form or another. All of the work is available in our Summer Of Code Subversion repository, and some of the work has already been merged back into the main repositories. FreeBSD is once again grateful to Google for being selected to participate in Summer of Code 2012. __________________________________________________________________ KDE/FreeBSD URL: http://FreeBSD.kde.org URL: http://FreeBSD.kde.org/area51.php Contact: KDE FreeBSD The KDE/FreeBSD team have continued to improve the experience of KDE software and Qt under FreeBSD. The latest round of improvements include: * Fixes for building Qt with libc++ and C++11 * Fixes for Solid-related crashes * Fix battery detection in battery monitor plasmoid The team has also made many releases and upstreamed many fixes and patches. The latest round of releases include: * KDE SC: 4.9.1 (area51) and 4.8.4 (ports) * Qt: 4.8.3 (area51) * PyQt: 4.9.4 (area51); QScintilla 2.6.2 (area51); SIP: 4.13.3 (area51) * Calligra: 2.4.3, 2.5-RC2, 2.5.0, 2.5.1, 2.5.2 (area51) and 2.4.3, 2.5.0, 2.5.1 (ports) * Amarok: 2.6.0 (area51) * CMake: 2.8.9 (ports) * Digikam (and KIPI-plugins): 2.7.0, 2.8.0, 2.9.0 (area51) and 2.7.0, 2.9.0 (ports) * QtCreator: 2.6.0-beta (area51) * many smaller ports The team is always looking for more testers and porters so please contact us at kde@FreeBSD.org and visit our home page at http://FreeBSD.kde.org. Open tasks: 1. Please see 2012 Q4 Status Report 2. Updating out-of-date ports, see PortScout for a list __________________________________________________________________ Native iSCSI Target Contact: Edward Tomasz Napieral/a During the July-September time period, the Native iSCSI Target project was officially started under sponsorship from the FreeBSD Foundation.
By the end of September I had written ctld(8), the userspace part of the target, responsible for handling configuration, accepting incoming connections, performing authentication and iSCSI parameter negotiation, and handing off connections to the kernel. For the time being, I've reused some parts of protocol-handling code from the istgt project; since ctld(8) only handles the Login phase, the code can be rewritten in a much simpler and shorter way in the future. __________________________________________________________________ Parallel rc.d execution URL: https://github.com/buganini/rcexecr URL: https://github.com/kil/rcorder Contact: Kuan-Chung Chiu Contact: Kilian There are two implementations to make rc.d execution parallel. Compared to Kil's rcorder, rcexecr brings more concurrency and provides more flexibility than the older "early_late_divider" mechanism but requires a more invasive /etc patch. Both implementations have a switch to toggle parallel execution. Further modification/integration needs more discussion. Open tasks: 1. Refine /etc/rc.d/* to eliminate unnecessary waiting. __________________________________________________________________ Ports Collection URL: http://www.FreeBSD.org/ports/ URL: http://www.freebsd.org/doc/en_US.ISO8859-1/articles/contributing-ports/ URL: http://portsmon.freebsd.org/index.html URL: http://www.freebsd.org/portmgr/index.html URL: http://blogs.freebsdish.org/portmgr/ URL: http://www.twitter.com/freebsd_portmgr/ URL: http://www.facebook.com/portmgr Contact: Thomas Abthorpe Contact: Port Management Team The ports tree approaches 24,000 ports, while the PR count is still above 1000. In Q3 we added 2 new committers and took in two commit bits for safe keeping. The Ports Management team performed multiple -exp runs, verifying how base system updates may affect the ports tree, as well as providing QA runs for major ports updates.
Beat Gaetzi took over the role of sending out fail mails, a role that Pav Lucistnik had previously held. Beat also undertook the task of converting the Ports tree from CVS to Subversion. Florent Thoumie stepped down from his role on portmgr; he was instrumental in maintaining the legacy pkg_* code. Open tasks: 1. Most ports PRs are assigned; we now need to focus on testing, committing and closing. __________________________________________________________________ The FreeBSD Core Team Contact: Core Team Along with the change in the Core Team membership, several related roles changed hands. Gabor Pali assumed the role of core secretary from Gavin Atkinson, and David Chisnall replaced Robert Watson as liaison to the FreeBSD Foundation. The Core Team felt there was no longer a need for a formal security team liaison, so that role was retired. In the third quarter, the Core Team granted access for 2 new committers and took 2 commit bits into safekeeping. The Core Team worked with the Port Management Team and Cluster Administrators to set a date to stop providing CVS exports for the ports repository, which is February 28, 2013. In the meantime, the CVS export for 9.1-RELEASE was restored. __________________________________________________________________ The FreeBSD Japanese Documentation Project URL: http://www.FreeBSD.org/ja/ URL: http://www.jp.FreeBSD.org/doc-jp/ Contact: Hiroki Sato Contact: Ryusuke Suzuki Web page (htdocs): Newsflash and some other updates in the English version were translated to keep them up-to-date. In particular, "security incident on FreeBSD infrastructure" was translated and published in a timely manner. FreeBSD Handbook: A big update was made to the "advanced-networking" chapter. With this update, merging translation results from the handbook in the local repository of the Japanese documentation project into the main repository was completed. This chapter is still outdated and needs more work. The other sections have also constantly been updated.
Especially, new subsection "Using pkgng for Binary Package Management" was added to "ports" section and "Using subversion" subsection was added to "mirrors" section. Article: Some progress was made in "Writing FreeBSD Problem Reports" and "Writing FreeBSD Problem Reports" articles. Open tasks: 1. Further translation work of outdated documents in the ja_JP.eucJP subtree. __________________________________________________________________ From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 13:54:10 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DA272775 for ; Mon, 4 Mar 2013 13:54:10 +0000 (UTC) (envelope-from isabell@issyl0.co.uk) Received: from mail-vc0-f176.google.com (mail-vc0-f176.google.com [209.85.220.176]) by mx1.freebsd.org (Postfix) with ESMTP id 9F365199B for ; Mon, 4 Mar 2013 13:54:10 +0000 (UTC) Received: by mail-vc0-f176.google.com with SMTP id fk10so3383149vcb.35 for ; Mon, 04 Mar 2013 05:54:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:sender:x-originating-ip:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding:x-gm-message-state; bh=TAF5gStlzh2lRTRynJnPHC8g/JD6grypvGawb0QOpBE=; b=hUIPVscQ5RYXdg125aOOpJ7HUIngWyIemhigIXCPCs3RJe88fgaYbSK4gC2Io/9iuw AF65FX6t73oK3UNMZd+9QD9bEpKgHVCr5NV45IAa+qWlvbeq56nXMHo7wX1q2JSIOHmc Zs1kMs+Voc+2FOmAG1VtLYA2QN0KvMnbjrrZ3dCbHGSrf6YvuE/cz73c55dlAF52vGHx S8XQZ0GugtyjGqLQaEbgPXriWFZb0IXq8RBf9lchUgzwMzkp+dj3xjj5u5qNqn2lFQ5H KQH35D3yYVWIGlKjcV0X/so1UChsQ9Nq/BNi0tHGEPlxqa7jNC7MXi0V1jAsThb7Rim2 cFqA== MIME-Version: 1.0 X-Received: by 10.220.242.73 with SMTP id lh9mr7566953vcb.49.1362405243743; Mon, 04 Mar 2013 05:54:03 -0800 (PST) Sender: isabell@issyl0.co.uk Received: by 10.220.250.138 with HTTP; Mon, 4 Mar 2013 05:54:03 -0800 (PST) X-Originating-IP: [2.29.6.249] Date: Mon, 4 Mar 2013 
13:54:03 +0000 X-Google-Sender-Auth: OHpE0Y1ylrkCvNGJ-qAPGcvxdCs Message-ID: Subject: FreeBSD Quarterly Status Report, October-December 2012. From: Isabell Long To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQlp0GgqFUEJJGtShspdwVs4SSry+4N88kmc8P3fU3OLwRJU6O4lNbaagnD+FNW4wdqX+VQi Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 13:54:10 -0000 FreeBSD Quarterly Status Report, October-December 2012. Introduction This report covers FreeBSD-related projects between October and December 2012. This is the last of four reports planned for 2012. Highlights from this status report include a very successful EuroBSDCon 2012 conference and associated FreeBSD Developer Summit, both held in Warsaw, Poland. Other highlights are several projects related to the FreeBSD port to the ARM architecture, extending support for platforms, boards and CPUs, improvements to the performance of the pf(4) firewall, and a new native iSCSI target. Thanks to all the reporters for the excellent work! This report contains 27 entries and we hope you enjoy reading it. The deadline for submissions covering the period between January and March 2013 is April 21st, 2013. 
__________________________________________________________________ Projects * BHyVe * Native iSCSI Target * NFS Version 4 * pxe_http -- booting FreeBSD from apache * UEFI * Unprivileged install and image creation Userland Programs * BSD-licenced patch(1) * bsdconfig(8) FreeBSD Team Reports * FreeBSD Core Team * FreeBSD Documentation Engineering * FreeBSD Foundation * Postmaster Kernel * AMD GPUs kernel-modesetting support * Common Flash Interface (CFI) driver improvements * SMP-Friendly pf(4) * Unmapped I/O Documentation * The FreeBSD Japanese Documentation Project Architectures * Compiler improvements for FreeBSD/ARMv6 * FreeBSD on AARCH64 * FreeBSD on BeagleBone * FreeBSD on Raspberry Pi Ports * FreeBSD Haskell Ports * KDE/FreeBSD * Ports Collection * Xfce Miscellaneous * EuroBSDcon 2012 * FreeBSD Developer Summit, Warsaw __________________________________________________________________ AMD GPUs kernel-modesetting support URL: https://wiki.FreeBSD.org/AMD_GPU URL: http://people.FreeBSD.org/~kib/misc/ttm.1.patch Contact: Alexander Kabaev Contact: Jean-Sébastien Pédron Contact: Konstantin Belousov Jean-Sébastien Pédron started to port the AMD GPUs driver from Linux to FreeBSD 10-CURRENT in January 2013. This work is based on a previous effort by Alexander Kabaev. Konstantin Belousov provided the initial port of the TTM memory manager. As of this writing, the driver is building but the tested device fails to attach. Status updates will be posted to the FreeBSD wiki. __________________________________________________________________ BHyVe URL: https://wiki.FreeBSD.org/BHyVe URL: http://www.bhyve.org/ Contact: Neel Natu Contact: Peter Grehan BHyVe is a type-2 hypervisor for FreeBSD/amd64 hosts with Intel VT-x and EPT CPU support. The bhyve project branch was merged into CURRENT on Jan 18. Work is progressing on performance, ease of use, AMD SVM support, and being able to run non-FreeBSD operating systems. Open tasks: 1. Booting Linux/*BSD/Windows 2.
Moving the codebase to a more modular design consisting of a small base and loadable modules 3. Various hypervisor features such as suspend/resume/live migration/sparse disk support __________________________________________________________________ BSD-licenced patch(1) URL: http://code.google.com/p/bsd-patch/ Contact: Pedro Giffuni Contact: Gabor Kovesdan Contact: Xin Li For a while, FreeBSD has been using a very old version of GNU patch that is partially under the GPLv2. The original GNU patch utility is based on an initial implementation by Larry Wall that was not actually copyleft. OpenBSD made many enhancements to an older non-copyleft version of patch; this version was later adopted and further refined by DragonFlyBSD and NetBSD, but there was no centralized development of the tool and FreeBSD kept working independently. In less than a week we took the version in DragonFlyBSD and adapted the FreeBSD enhancements to make it behave nearer to the version used natively in FreeBSD. Most of the work was done by Pedro Giffuni, adapting patches from sepotvin@ and ed@, and additional contributions were made by Christoph Mallon, Gabor Kovesdan and Xin Li. As a result of this we now have a new version of patch committed in head/usr.bin/patch that you can try by using WITH_BSD_PATCH in your builds. The new patch(1) doesn't support the FreeBSD-specific -I and -S options, which don't seem necessary. In GNU patch -I actually means 'ignore whitespaces' and we now support it too. Open tasks: 1. Testing. A lot more testing. __________________________________________________________________ bsdconfig(8) URL: http://svnweb.FreeBSD.org/base/head/usr.sbin/bsdconfig/ URL: http://freshports.org/sysutils/bsdconfig/ URL: http://druidbsd.sf.net/download/bsdconfig/ Contact: Devin Teske bsdconfig(8) is actively being developed in HEAD under the WITH_BSDCONFIG build-requirement.
Snapshots are occasionally taken and made available through the ports system to make testing on 9.0-RELEASE or higher easier on the testers. Currently HEAD is far beyond the version 0.7.3 sitting in ports. Upcoming changes will push this to version 0.8 bringing in the necessary frameworks required for in-depth package management and distribution maintenance (read: one step closer to full 1.0 release). __________________________________________________________________ Common Flash Interface (CFI) driver improvements Contact: Brooks Davis The Common Flash Interface provides a common programming interface for a wide range of NOR flash devices commonly found in embedded systems. I have developed a number of improvements to the cfi(4) device when used on Intel StrataFlash parts. Unnecessary erase cycles are now avoided, devices that require single word writes only write changed words, and multi-word writes are supported for Intel and Sharp devices. Additionally the timeout code has been reworked and no longer imposes unneeded latency on operations taking less than 100us. With all of these changes streaming write speed has improved by more than an order of magnitude. Once these changes are reviewed they will be committed to HEAD. This work was sponsored by DARPA and AFRL. __________________________________________________________________ Compiler improvements for FreeBSD/ARMv6 Contact: Andrew Turner The FreeBSD/ARM architecture is now supported by the in-tree clang compiler. ARM EABI support is now available for both clang and gcc along with the older and less documented OABI. There are several outstanding issues; once they are fixed, EABI will be made the default. Open tasks: 1. Test EABI builds 2. Fix exception handling for EABI 3. Test clang builds 4. Get clang to work natively on EABI-based ARM system. Currently it works only as a cross-compiler for ARM EABI.
__________________________________________________________________ EuroBSDcon 2012 URL: http://2012.eurobsdcon.org/ URL: http://www.youtube.com/user/eurobsdcon Contact: EuroBSDcon Organizers Contact: Gabor Pali The 11th European BSD Conference took place in Warsaw, Poland at the Warsaw University of Technology with a large number of visitors. It started up with two tracks of tutorials, featuring FreeNAS, pfSense, DTrace, PF, development of NetBSD drivers, and an overall introduction to the FreeBSD operating system given by Kirk McKusick. There we also had opening and closing keynotes, supplemented with 22 talks on different topics related to FreeBSD, OpenBSD, NetBSD, FreeNAS and PC-BSD: BHyVe, configuration management with puppet, improvements in the OpenBSD cryptographic framework, tuning ZFS, server load balancing in DNS, running FreeBSD on embedded systems, e.g. MIPS and ARM, and challenges in identity management and authentication. The conference also had a dedicated track presented by the attendees of the FreeBSD developer summit and open to all, where one could learn more about what is happening currently in the Project: results of Google Summer of Code 2012, architectural changes in the FreeBSD documentation tree, ILNP, advancements in package building and development of pkg(8), and a status report on the USB stack. __________________________________________________________________ FreeBSD Core Team Contact: Core Team In the fourth quarter, the Core Team granted access for 7 new committers, and took 1 commit bit in for safekeeping. The Core Team oversaw the response to the security incident in November in cooperation with the security team, port managers, and cluster administrators. For more information on the fallout and response see the official announcement. As a result, 9.1-RELEASE was delayed until late December and was released with a limited set of binary packages.
The Core Team continues to work with developers to rebuild, review, and restore the package building infrastructure along with redports/QAT. __________________________________________________________________ FreeBSD Developer Summit, Warsaw URL: http://wiki.FreeBSD.org/201210DevSummit Contact: Gabor Pali We had 53 FreeBSD developers and invited guests attending the FreeBSD Developer Summit organized as part of EuroBSDcon 2012 in Warsaw, Poland at the Warsaw University of Technology. This year EuroBSDcon organizers again offered us their generous support in helping with keeping the event running smoothly, helping with registrations, renting the venue, and providing food for keeping attendees satisfied and happy. The Warsaw developer summit spanned 3 days and had 9 working groups on various topics. We refined last year's layout, inherited from the Canadian summits, because it had worked well but could use some further improvements. On both the first and second days, we ran the working groups, ranging from the standard matters, discussing issues with the USB stack, the compiler toolchain, the Ports Collection, or the documentation to some experimental ones, e.g. arranging an operating systems course focusing on FreeBSD. In addition to this, similarly to last year, one of the working groups was about gathering vendors to present their ideas and engage in discussion with the developers on their needs from the Project. Finally, on the third day, there were a number of exciting work-in-progress reports given in a dedicated Developer Summit track at the main conference. Photos and slides for most of the talks are available on the home page of the summit.
__________________________________________________________________ FreeBSD Documentation Engineering URL: http://www.FreeBSD.org/internal/doceng.html Contact: Glen Barber Contact: Marc Fonvieille Contact: Gábor Kövesdán Contact: Hiroki Sato The translations/, projects/ and user/ directories of the doc repository have been opened with the announced policies in effect. These branches are now actively used for translations work, editing the upcoming printed version of the Handbook, and some doc infrastructure improvements. The next phase of the infrastructure improvements is in progress. It will migrate to real XML tools (with the exception of Jade) for validation and rendering. At the same time, the DocBook schema will be updated to 4.5. After long discussions, Google Analytics has been enabled on FreeBSD.org webpages but access to statistical data has to be solicited from the Documentation Engineering Team on an individual and one time basis. Since July, we have added two doc committers and one translator. Open tasks: 1. Help the ongoing work on the printed edition of the Handbook. 2. Finish the migration to XML tools. __________________________________________________________________ FreeBSD Foundation Contact: Deb Goodkin A strong year-end fundraising campaign led to raising $770,000 in 2012. Thank you to everyone who made a donation to support FreeBSD! We published our year-end newsletter that highlighted everything we did to support the FreeBSD Project and community during the second half of the year. We were a Gold Sponsor for EuroBSDCon. We also attended the conference and developer summit. Erwin Lansing organized and chaired the Ports and Package Summit and Vendor Summit at EuroBSDCon 2012. We attended the MeetBSD developer summit in November 2012. George Neville-Neil organized and the Foundation sponsored the Bay Area Vendor Summit in November 2012. We were represented at LISA.
Kirk McKusick taught a tutorial and gave a keynote at EuroBSDCon 2012, and Justin Gibbs gave a talk at ZFS Day, October 2012. We talked to DNS server software vendors and participated in discussions on our DNS implementation, specifically with regard to DNSSEC validation, at CENTR Tech September 2012 (Amsterdam, the Netherlands) and EuroBSDCon. We visited companies to discuss their FreeBSD use and to help facilitate collaboration with the Project. Robert Watson published in ACM Queue and Communications of the ACM: A decade of OS access-control extensibility and Kirk McKusick published in ACM Queue and Communications of the ACM: Disks from the Perspective of a File System. We negotiated/supervised Foundation funded projects: porting FreeBSD to the Efika ARM platform, Capsicum Component Framework, Native iSCSI Target implementation, and UEFI. We negotiated/supervised/funded hardware needs in FreeBSD co-location centers. Many board members provided support for recovery efforts following the security compromise of FreeBSD.org systems in late 2012. We completed negotiation and provided legal counsel for the new website privacy policy for the FreeBSD Project. We are now an industrial partner in the Cambridge/Imperial/Edinburgh EPSRC REMS project on the Rigorous Engineering of Mainstream Systems. We coordinated the Foundation's discussion of Jira/Java; the conclusion was to continue to be supportive of OpenJDK and not restart proprietary JDK support. We implemented a donor management database to help with our fundraising efforts. We also began working on automating the donation process. We started the Faces of FreeBSD Series where we share the story of a Foundation grant recipient periodically. This allows us to spotlight people who received Foundation funding to work on development projects, run conferences, travel to conferences, and advocate for FreeBSD. We hired two technical staff members.
__________________________________________________________________ FreeBSD Haskell Ports URL: http://wiki.FreeBSD.org/Haskell URL: https://github.com/FreeBSD-haskell/FreeBSD-haskell/ Contact: Gábor PÁLI Contact: Ashish SHUKLA We are proud to announce that the FreeBSD Haskell Team has updated the Haskell Platform to 2012.4.0.0, GHC to 7.4.2 as well as updated existing ports to their latest stable versions. All Haskell ports are also updated to use the new OPTIONS framework, and now, building with dynamic libraries (DYNAMIC) is on by default. GHC also uses GCC 4.6 and binutils 2.22 from ports. We also added a number of new Haskell ports, and their count in the FreeBSD Ports tree is now 368. Open tasks: 1. Test GHC to work with clang/LLVM. 2. Commit pending Haskell ports to the FreeBSD Ports tree. 3. Add more ports to the Ports Collection. __________________________________________________________________ FreeBSD on AARCH64 URL: https://github.com/zxombie/aarch64-freebsd-sandbox URL: http://www.arm.com/products/tools/models/fast-models/foundation-model.php Contact: Andrew Turner Work has started on porting FreeBSD to AARCH64, ARM's new 64-bit architecture, using the ARMv8 Foundation Model software. GCC and binutils have been ported to FreeBSD and work started on kernel initialization, including MMU setup. Open tasks: 1. Get the MMU working 2. Get system register documentation from ARM 3. Port clang AArch64 to FreeBSD 4. Bring the code into a FreeBSD project branch __________________________________________________________________ FreeBSD on BeagleBone Contact: Tim Kientzle Contact: Oleksandr Tymoshenko Contact: Damjan Marion Contact: Brett Wynkoop FreeBSD on BeagleBone is benefiting from the general work on ARM stability being done by many people, and is proving to be a nice testbed for our ARMv7 support. All ongoing work is happening now directly in -CURRENT and we expect it to be in pretty good shape by the time 10.0 ships.
The network driver is now pretty stable; the system should be useful as a small network device. Occasional system snapshots are being built and advertised for people to test. Ask on freebsd-arm@ if you'd like to try the newest one. Open tasks: 1. We need someone to finish the USB driver. Ask if you'd like to take this over. 2. MMCSD performance is still rather poor. 3. There's been discussion of how to improve the GPIO configuration and pinmux handling to simplify hardware experimentation. If we had more people to help build drivers, we could start supporting some of the BeagleBone capes. 4. Mostly we just need people to use it and report any issues they encounter. __________________________________________________________________ FreeBSD on Raspberry Pi Contact: Oleksandr Tymoshenko FreeBSD is running on Raspberry Pi and supports the following peripherals: * USB controller * SDHC controller * Network * Framebuffer (HDMI and composite) * GPIO * VCHI interface Videocore tests (OpenGL, video decoding, audio, display access) work with the current VCHI driver implementation. Open tasks: 1. Add DMA mode support to the USB driver. Some proof-of-concept code is done but more work is required to finish it. 2. Re-implement the VCHI driver with more FreeBSD-friendly locking. 3. Implement more drivers: SPI, PWM, audio. __________________________________________________________________ Google Summer of Code 2013 URL: http://www.FreeBSD.org/projects/summerofcode.html URL: http://google-opensource.blogspot.co.uk/2013/02/flip-bits-not-burgers-google-summer-of.html URL: http://www.google-melange.com/gsoc/homepage/google/gsoc2013 URL: http://en.wikipedia.org/wiki/Google_Summer_of_Code Contact: FreeBSD Summer of Code Administrators Since 2005 Google has run its yearly Summer of Code program, in which Google awards stipends to students who successfully complete projects with participating Open Source organisations.
FreeBSD has participated in GSoC every year since its inception and, with the announcement that Google will once again run the program in 2013, hopes to participate once more. Google has not yet opened the application period for mentoring organisations, but once it does, FreeBSD plans to apply. Assuming that we are successful in our application to participate, we will publish a large list of ideas for possible projects shortly after. Students may then apply to do one of those projects, or suggest their own idea for a project. After the application period, FreeBSD will discover how many student slots we have been allocated, at which point successful students will take some time to plan their project, gather required information and discuss their plans with their mentors, before having around 12 weeks to develop their code. In the eight years of FreeBSD's participation in Google Summer of Code, approximately 150 students have successfully completed projects with us, covering a wide spread of areas of both the source and ports trees. Of these, 22 students continued participating with FreeBSD and subsequently became full FreeBSD committers, many later going on to mentor Summer of Code students themselves. Whether FreeBSD has been successful in being selected to be a participating organisation in Google Summer of Code 2013 should be announced in early April. __________________________________________________________________ KDE/FreeBSD URL: http://FreeBSD.kde.org URL: http://FreeBSD.kde.org/area51.php Contact: KDE FreeBSD The KDE/FreeBSD team has continued to improve the experience of KDE software and Qt under FreeBSD. The latest round of improvements includes: * Fix handling of Removable property in solid engine * Fix management of backlight with UPower (requires acpi_video(4)) * Installing spell-checking dictionaries with a dependency of KDE-locale ports The team has also made many releases and upstreamed many fixes and patches.
The latest round of releases includes: * KDE SC: 4.9.2 (area51) * PyQt: 4.9.5 (area51); SIP: 4.14 (area51) * KDevelop: 4.4.0, 4.4.1 (area51); KDevPlatform: 1.4.0, 1.4.1 (area51) * Calligra: 2.5.3, 2.5.4 (area51) * CMake: 2.8.10.1 * Many smaller ports The team is always looking for more testers and porters so please contact us at kde@FreeBSD.org and visit our home page at http://FreeBSD.kde.org. Open tasks: 1. Updating out-of-date ports, see PortScout for a list __________________________________________________________________ Native iSCSI Target Contact: Edward Tomasz Napierała During the October-December time period, the Native iSCSI Target project progressed to the working prototype stage. Most of this time was spent writing the kernel-based part, an iSCSI frontend to the CAM Target Layer. The frontend handles the iSCSI Full Feature phase after ctld(8) hands off the connection. The istgt-derived code in ctld(8) was rewritten from scratch; now it's much shorter and more readable. The ctladm(8) utility gained iSCSI-specific subcommands to handle tasks such as listing iSCSI sessions or forcing disconnection. The target works correctly with the FreeBSD initiator. __________________________________________________________________ NFS Version 4 Contact: Rick Macklem The NFSv4.1 client, including support for pNFS for the Files Layout only, has now been committed to head/current. Work on NFSv4.1 server support has just been started and will hopefully be ready for head/current this summer. The client side disk caching of delegated files is progressing and the code is under projects/nfsv4-packrats in the subversion repository. Someone is working on server side referrals and, as such, I hope this might make it into 10.0 as well.
__________________________________________________________________ Ports Collection URL: http://www.FreeBSD.org/ports/ URL: http://www.FreeBSD.org/doc/en_US.ISO8859-1/articles/contributing-ports/ URL: http://portsmon.FreeBSD.org/index.html URL: http://www.FreeBSD.org/portmgr/index.html URL: http://blogs.FreeBSDish.org/portmgr/ URL: http://www.twitter.com/freebsd_portmgr/ URL: http://www.facebook.com/portmgr Contact: Thomas Abthorpe Contact: Port Management Team The ports tree crossed the threshold of 24,000 ports, while the PR count is still close to 1600. In Q4 we added five new committers and took in two commit bits for safe keeping. In the tradition of recruiting new portmgr@ at conferences, we added Bernhard Froehlich to our ranks. He is the one responsible for redports.org. Pav Lucistnik stepped down from his role on portmgr; he was one of our principals doing -exp runs and well known for sending failmails. In the well publicised compromise, the pointyhat machines were broken into and subsequently taken down, isolated and sanitised. As a pre-emptive move redports/QAT were also taken down. Work is under way to restore the services. Mark Linimon began a from-scratch test install on one of his own spare machines with the purpose of documenting all the missing steps from the portbuild article. While doing so, he further overhauled the codebase to both make it easier to install, and to further refactor it in light of a security review (still ongoing at time of this writing). Once this is complete, the next task will be to reinstall all existing machines from scratch. Open tasks: 1. Most ports PRs are assigned; we now need to focus on testing, committing and closing. __________________________________________________________________ Postmaster Contact: David Wolfskill The postmaster team has expanded, with the addition of Florian Smeets (flo@FreeBSD.org).
We have implemented a Mailman "handler" to drop duplicate messages when both copies are sent to the same list (under both the "long" (e.g., "freebsd-current") and "short" (e.g., "current") names). We have created several new mailing lists: * freebsd-course: educational course on FreeBSD * freebsd-numerics: Discussions of high quality implementation of libm functions. * freebsd-snapshots: FreeBSD Development Snapshot Announcements * freebsd-tcltk: FreeBSD-specific Tcl/Tk discussions We have also removed old mailing lists: * freebsd-binup * freebsd-www (merged into freebsd-doc) __________________________________________________________________ pxe_http -- booting FreeBSD from apache URL: http://svnweb.FreeBSD.org/base/user/sbruno/pxe_http_head/ Contact: Sean Bruno Currently works with VirtualBox VMs and the Apache 2.2 port. Open tasks: 1. Lots and lots of compile warnings exist with clang and gcc. This really needs to be investigated. 2. Better support for other webservers. Currently needs Apache to work. 3. Needs another pass at basic documentation. Current documentation is actually quite good from the original 4. Network stack needs audit. I'm not sure if the HTTP/TCP/UDP/IP code is original or based on something else. __________________________________________________________________ SMP-Friendly pf(4) Contact: Gleb Smirnoff The project is aimed at moving the pf(4) packet filter out from under a single mutex, as well as generally improving the FreeBSD port. The project has reached its main goal. pf(4) is no longer covered by a single mutex, and contention on the network stack in pf(4) is now very low. The code is production ready. The projects/pf branch has been merged to the head branch and will be available in 10.0-RELEASE.
__________________________________________________________________ The FreeBSD Japanese Documentation Project URL: http://www.FreeBSD.org/ja/ URL: http://www.jp.FreeBSD.org/doc-jp/ Contact: Hiroki Sato Contact: Ryusuke Suzuki The ja_JP.eucJP subtree has constantly been updated since the last status report. In the FreeBSD Handbook, translation work of the "users" section has been completed. "linuxemu" and "serialcomms" were updated and the subsection "Subversion mirror site" was newly added to the "mirrors" section. Open tasks: 1. Further translation work of outdated documents in the ja_JP.eucJP subtree. __________________________________________________________________ UEFI URL: https://wiki.FreeBSD.org/UEFI URL: http://svnweb.FreeBSD.org/base/projects/uefi/ Contact: Benno Rice There is code in the projects/uefi branch that can build a working 64-bit loader for UEFI. This loader can load a kernel and boot to a mountroot prompt on a serial console on a system with <= 1GB of RAM. Full multiuser has not yet been tested. Work is progressing towards having a working syscons. The issue preventing boot on systems with > 1GB of RAM has not yet been found. UEFI-compatible boot media can be generated using in-tree tools; however, there are issues with detecting the CD filesystem and using it as the load default. The 64-bit UEFI loader can load a 32-bit kernel but currently cannot hand over to it due to a lack of code to switch to 32-bit mode. Further research is required into Secure Boot. __________________________________________________________________ Unmapped I/O URL: http://people.FreeBSD.org/~kib/misc/unmapped.13.patch Contact: Jeff Roberson Contact: Konstantin Belousov A well-known performance problem of FreeBSD on large SMP hardware is the need to invalidate the TLB for all CPUs when instantiating and destroying the VMIO buffers.
Invalidation is performed by broadcasting an inter-processor interrupt, which disrupts the execution path of each CPU and induces latency on the request itself. Since processing most I/O requests requires creating buffers to hold the data in the kernel, TLB invalidation becomes an obstacle for I/O scalability on many-CPU machines. The work done for flushing the TLBs is especially pointless since most mappings created are not used for anything but copying the data back and forth between usermode and the kernel page cache. Most architectures have already established facilities to perform such copies using much faster techniques, for instance, the direct map on amd64, or specially reserved per-CPU page frames or TLB entries on other architectures. Jeff Roberson unified the machine-specific parts of busdma(9), making a common set of low-level functions available on each architecture. This was committed as r246713. The end result is that new types of load functions can be added in a single, machine-independent place. In particular, it is easy to modify the drivers to accept the 'unmapped' bio requests, which list the VM pages for the device DMA engine instead of the virtual address of the kernel buffer. Konstantin Belousov developed the changes for the buffer cache which allow the VMIO buffers to not map the referenced pages, and used the feature for UFS. Per-architecture pmap_copy_pages(9) methods were added to facilitate fast copying between user I/O buffers and pages of unmapped buffers. The unmapped buffers create the unmapped bio requests for the drivers, support for which was made possible by Jeff's patch. Tests show that even on a small 4-core machine, the system time for reading files on UFS is reduced by 30%. Open tasks: 1. Test the patch, in particular, on non-x86 architectures.
__________________________________________________________________ Unprivileged install and image creation Contact: Brooks Davis In order to make it easier to build releases and embedded system disk images I have been adding infrastructure to allow the install and packaging stages of the FreeBSD build process to run without root privilege. To this end I have added two options to the toplevel build system: The -DDB_FROM_SRC option allows the install to proceed when the required set of passwd and group entries does not match the host system. The -DNO_ROOT option causes files to be installed as the running user and for metadata such as owner, group, suid bits, and file flags to be logged in a ${DESTDIR}/METALOG file. This work required the import of NetBSD's mtree and the addition of a number of features from NetBSD to install. I have added all FreeBSD features to NetBSD's mtree and imported it as nmtree. Before FreeBSD 10.0 is released I will replace our version. I have also added all required features to install. Changes to makefs were required to parse the contents of the METALOG file. These new features required importing new versions of the pwcache(3) and vis(3) APIs from NetBSD into libc. In addition to modifying the build infrastructure to use the new features of mtree and install, I corrected a number of cases of files being installed by programs other than install or being installed more than once. A few known instances of duplicate directories in the output exist, but the results are usable in some contexts. I plan to MFC these changes as far back as the stable/8 branch to make it possible to build all supported releases without root privilege. This work was sponsored by DARPA and AFRL. Open tasks: 1. Add support for -DNO_ROOT to src/release/Makefile so that releases can be built without root privilege. 2. Create a tool to install partition tables and file system images in disk image files without the use of mdctl, gpart, and dd.
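As a rough illustration of what the METALOG described above makes possible, the following sketch lists the entries whose logged mode includes the setuid bit, without needing root or an installed tree with real ownership. It assumes the log uses mtree-style lines of the form "./path type=file uname=root gname=wheel mode=04555"; the function name metalog_setuid and the idea of auditing the log this way are illustrative assumptions, not part of the work reported here.

```shell
#!/bin/sh
# Hypothetical helper: print the paths in an mtree-style METALOG (read from
# stdin) whose logged mode has the setuid bit set.
# Assumption: each entry is one line, "PATH key=value key=value ...".
metalog_setuid() {
    awk '{
        mode = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^mode=/) mode = substr($i, 6)
        # The setuid bit lives in the fourth octal digit from the right.
        if (length(mode) >= 4) {
            d = substr(mode, length(mode) - 3, 1)
            if (d ~ /[4-7]/) print $1
        }
    }'
}
```

It might be run as: metalog_setuid < "${DESTDIR}/METALOG"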
__________________________________________________________________ Xfce URL: https://wiki.FreeBSD.org/Xfce Contact: A major update has been made to Thunar (file manager for the Xfce Desktop Environment). The 1.6.x series introduces many improvements, most noticeably tab support, and performance has been improved. Open tasks: 1. Try to fix HSTS (HTTP Strict Transport Security) feature in Midori with Vala 0.12.1 (works fine with Vala >= 0.14.x) 2. Replace libxfce4gui (deprecated and not maintained by upstream) by libxfce4ui in order to enhance support for Xfce >= 4.10. 3. Test core and plugins (panel, Thunar) with GLib >= 4.32 (to replace deprecated and removed functions introduced since GLib 2.30). 4. Fix gtk-xfce-engine with Gtk+ >= 3.6. __________________________________________________________________ From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 16:05:59 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7695BFDE for ; Mon, 4 Mar 2013 16:05:59 +0000 (UTC) (envelope-from pierre@guinoiseau.eu) Received: from mail.poildetroll.net (tritus.poildetroll.net [IPv6:2a01:4f8:160:72a3::6:1]) by mx1.freebsd.org (Postfix) with ESMTP id 0BB84332 for ; Mon, 4 Mar 2013 16:05:59 +0000 (UTC) Received: from tritus.poildetroll.net (tritus.poildetroll.net [IPv6:2a01:4f8:160:72a3::6:1]) by mail.poildetroll.net (Postfix) with ESMTP id C7386AA15 for ; Mon, 4 Mar 2013 17:05:57 +0100 (CET) X-Virus-Scanned: amavisd-new at poildetroll.net Received: from mail.poildetroll.net ([IPv6:2a01:4f8:160:72a3::6:1]) by tritus.poildetroll.net (mail.poildetroll.net [IPv6:2a01:4f8:160:72a3::6:1]) (amavisd-new, port 10024) with LMTP id qA0Af7YCH-NI for ; Mon, 4 Mar 2013 17:05:56 +0100 (CET) Received: from kyleck.poildetroll.net (master.obiwankeno.be [IPv6:2a01:4f8:160:72a3::7:1]) by mail.poildetroll.net (Postfix) with SMTP id AB92AAA0D for ; Mon, 4 Mar 2013 17:05:56 +0100
(CET) Date: Mon, 4 Mar 2013 17:05:56 +0100 From: Pierre Guinoiseau To: freebsd-stable@freebsd.org Subject: Re: FreeBSD 9.1 - openldap slapd lockups, mutex problems Message-ID: <20130304160556.GF83171@kyleck.poildetroll.net> References: <50FEB684.9040201@egr.msu.edu> <20130214021356.GJ46275@kyleck.poildetroll.net> <20130214091945.GZ38901@e-Gitt.NET> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="GdbWtwDHkcXqP16f" Content-Disposition: inline In-Reply-To: <20130214091945.GZ38901@e-Gitt.NET> X-Operating-System: FreeBSD User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 16:05:59 -0000 --GdbWtwDHkcXqP16f Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, I've tested it in a 8.3R jail on a 9.1R host, same setup, and the problem is still there. So it may be a kernel bug on 9.1R. On 14/02/2013 10:19:45, Oliver Brandmueller wrote: > Hi, > > On Thu, Feb 14, 2013 at 03:13:57AM +0100, Pierre Guinoiseau wrote: > > > I have seen openldap spin the cpu and even run out of memory to get > > > killed on some of our test systems running ~9.1-rel with zfs. > [...] > > I've the same problem too, inside a jail, stored on ZFS. I've tried various > > tuning in slapd.conf, but none fixed the problem. While hanging, db_stat -c > > shows that all locks are being used, I've tried to set the limit really high, > > far more than normally needed, but it didn't help. I may have the same problem > > with amavisd-new but I've to verify that to be sure the symptoms are similar. > > I have amd64 9.1-STABLE r245456 (about Jan 15) running.
I have openldap > openldap-server-2.4.33_2 running, depending on libltdl-2.4.2 and > db46-4.6.21.4 . > > The system is zfs only (for the local filesystems, where openldap is > running - it has several NFS mounts for other purposes though). It's up > and running for about a month now (29 days) and never showed any > problematic behaviour regarding to slapd. > > I have ~10 SEARCH requests per seconds avg and only minor > ADD/MODIFY/DELETE operations. It has several binds und unbinds, about > 1/10th of the requests. It runs in slurpd slave mode for my master LDAP. > > zroot/var/db runs with compression=off, dedup=off, zroot is a mirrored > pool on 2 Intel SATA SSD drives inside a GPT partition. Swap is on a ZFS > zvol. > > - Oliver > > > -- > | Oliver Brandmueller http://sysadm.in/ ob@sysadm.in | > | Ich bin das Internet. Sowahr ich Gott helfe. | --GdbWtwDHkcXqP16f Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlE0xmQACgkQJikNJSAyef/XkQCgvWkxaOLDhQ9QP/Kv4Cf9U6OI 0uIAn3bNrMm/y/k2VblShovtePqmnQP+ =CuOo -----END PGP SIGNATURE----- --GdbWtwDHkcXqP16f-- From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 16:07:40 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 73A7C176 for ; Mon, 4 Mar 2013 16:07:40 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-bk0-f51.google.com (mail-bk0-f51.google.com [209.85.214.51]) by mx1.freebsd.org (Postfix) with ESMTP id 0539436F for ; Mon, 4 Mar 2013 16:07:39 +0000 (UTC) Received: by mail-bk0-f51.google.com with SMTP id ik5so2514539bkc.10 for ; Mon, 04 Mar 2013 08:07:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type
:content-transfer-encoding; bh=LhhCXONZ4ZSz0nmMfkq6k7xWhim+LDrw+y6YJhE+Dfs=; b=eNvBmBl5NUZyNdTxGfCNdUxbtm1RLzFaRYJVXZyAWvdKo4TKGRL+CxsXiBBMPgcDYp ca/8jrJPYW6OGCJ2IsFKs75AJhnZ6ZZJSXy4/Qde7aQEcceCrWdgz7lsKtq+ebkaggwd 3TXF4pMDN+SrE2TNUR+62YDvLyi2D7nwVa1qDY42zenc3i+xe9Idpfl2yd70JcZsHLTy r1SA/9y8WvoAPinkKdODLdz4qnY64Tfak+PwuKh5k4Gsq6Rx/bDA5epSlTZ8CzGRq4z5 Qz0UeHna+qaXk120mxCPCIUOGl0YtMyMHdbDg1RuIfjSs5o0S7dL5CaX6y+MzGl0vqgY CAkA== X-Received: by 10.205.123.138 with SMTP id gk10mr7748655bkc.49.1362413253322; Mon, 04 Mar 2013 08:07:33 -0800 (PST) Received: from [192.168.1.128] ([91.196.229.122]) by mx.google.com with ESMTPS id x10sm6028582bkv.13.2013.03.04.08.07.31 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 04 Mar 2013 08:07:32 -0800 (PST) Message-ID: <5134C6C2.9020009@gmail.com> Date: Mon, 04 Mar 2013 18:07:30 +0200 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:19.0) Gecko/20100101 Firefox/19.0 SeaMonkey/2.16 MIME-Version: 1.0 To: David Magda Subject: Re: Musings on ZFS Backup strategies References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <2B318078-F863-4415-8DAE-94EE4431BF4C@ee.ryerson.ca> In-Reply-To: <2B318078-F863-4415-8DAE-94EE4431BF4C@ee.ryerson.ca> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 16:07:40 -0000 02.03.2013 03:12, David Magda: > > On Mar 1, 2013, at 12:55, Volodymyr Kostyrko wrote: > >> Yes, I'm working with backups the same way, I wrote a simple script that synchronizes two filesystems between distant servers. I also use the same script to synchronize bushy filesystems (with hundred thousands of files) where rsync produces a too big load for synchronizing. 
>> >> https://github.com/kworr/zfSnap/commit/08d8b499dbc2527a652cddbc601c7ee8c0c23301 > > There are quite a few scripts out there: > > http://www.freshports.org/search.php?query=zfs A lot of them require python or ruby, and none of them manages synchronizing snapshots over network. > For file level copying, where you don't want to walk the entire tree, here is the "zfs diff" command: > >> zfs diff [-FHt] snapshot [snapshot|filesystem] >> >> Describes differences between a snapshot and a successor dataset. The >> successor dataset can be a later snapshot or the current filesystem. >> >> The changed files are displayed including the change type. The change >> type is displayed using a single character. If a file or directory >> was renamed, the old and the new names are displayed. > > http://www.freebsd.org/cgi/man.cgi?query=zfs > > This allows one to get a quick list of files and directories, then use tar/rsync/cp/etc. to do the actual copy (where the destination does not have to be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.). I know that, but I see no reason to revert to file-based sync if I can do block-based. -- Sphinx of black quartz, judge my vow.
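For anyone who does want the file-level route quoted above, here is a minimal sketch of turning "zfs diff" output into a copy list. It assumes the -H flag gives tab-separated output with the change type (+, -, M, R) in the first field, and that rename lines carry the old name in the second field and the new name in the third. The function name changed_paths is made up for illustration, and the snapshot names are borrowed from the earlier example in this thread. Deletions are deliberately ignored, and names that zfs diff escapes (unusual characters) are not decoded.

```shell
#!/bin/sh
# Hypothetical filter: read "zfs diff -H" output on stdin and print the
# paths that would need copying to bring a file-level mirror up to date.
# Removed entries ("-") are skipped; they need separate handling.
changed_paths() {
    awk -F '\t' '
        $1 == "+" || $1 == "M" { print $2 }   # added or modified
        $1 == "R"              { print $3 }   # renamed: copy the new name
    '
}
```

It could be fed with something like "zfs diff -H zpool/fs@dump-old zpool/fs@dump | changed_paths", with the resulting list handed to tar, rsync or cp. Modified directories show up in the list too, so the copy tool has to cope with that.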
From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 17:05:26 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A4581C24 for ; Mon, 4 Mar 2013 17:05:26 +0000 (UTC) (envelope-from dmagda@ee.ryerson.ca) Received: from eccles.ee.ryerson.ca (eccles.ee.ryerson.ca [141.117.1.2]) by mx1.freebsd.org (Postfix) with ESMTP id 73CDD88A for ; Mon, 4 Mar 2013 17:05:25 +0000 (UTC) Received: from webmail.ee.ryerson.ca (eccles [172.16.1.2]) by eccles.ee.ryerson.ca (8.14.4/8.14.4) with ESMTP id r24H4PTI075344; Mon, 4 Mar 2013 12:04:25 -0500 (EST) (envelope-from dmagda@ee.ryerson.ca) Received: from 206.108.127.2 (SquirrelMail authenticated user dmagda) by webmail.ee.ryerson.ca with HTTP; Mon, 4 Mar 2013 12:04:25 -0500 Message-ID: <1e4c24a68e76a279eaf4dc4f7c0156d3.squirrel@webmail.ee.ryerson.ca> In-Reply-To: <5134C6C2.9020009@gmail.com> References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <2B318078-F863-4415-8DAE-94EE4431BF4C@ee.ryerson.ca> <5134C6C2.9020009@gmail.com> Date: Mon, 4 Mar 2013 12:04:25 -0500 Subject: Re: Musings on ZFS Backup strategies From: "David Magda" To: "Volodymyr Kostyrko" User-Agent: SquirrelMail/1.4.22 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 17:05:26 -0000 On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote: > 02.03.2013 03:12, David Magda: >> There are quite a few scripts out there: >> >> http://www.freshports.org/search.php?query=zfs > > A lot of them require python or ruby, and none of them manages > synchronizing snapshots over network. 
Yes, but I think it is worth considering the creation of snapshots, and the transfer of snapshots, as two separate steps. Treating them independently (perhaps in two different scripts) helps prevent breakage in one from affecting the other. Snapshots are not backups (IMHO), but they are handy for users and sysadmins in the simple case of accidentally deleted files. If your network access / copying breaks or is slow for some reason, at least you still have copies locally. The same applies if you're having issues with the machine that hosts your remote pool. By keeping the snapshots going separately, once any problems with the network or remote server are solved, you can use them to incrementally sync up the remote pool. You can simply run the remote-sync scripts more often to do the catch up. It's just an idea, and everyone has different needs. I often find it handy to keep different steps in different scripts that are loosely coupled. >> This allows one to get a quick list of files and directories, then use >> tar/rsync/cp/etc. to do the actual copy (where the destination does not >> have to be ZFS: e.g., NFS, ext4, Lustre, HDFS, etc.). > > I know that but I see no reason in reverting to file-based synch if I > can do block-based. Sure. I just thought I'd mention it in the thread in case others need that functionality and were not aware of "zfs diff". Not everyone does or can do pool-to-pool backups.
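To make the "catch up later" idea concrete, here is a small sketch of the one decision a separate transfer script has to make: which snapshot to use as the base of the next incremental send. It assumes two plain-text lists of snapshot names (only the part after the "@"), the local one sorted oldest first; such lists might come from "zfs list -H -d 1 -t snapshot -o name -s creation" with the dataset prefix stripped. The function name latest_common and the file layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for a transfer script that runs independently of the
# script creating snapshots.
# $1: file of local snapshot names (the part after "@"), oldest first
# $2: file of snapshot names present on the remote pool
# Prints the newest local snapshot that the remote side already has, which
# is the natural base for the next incremental "zfs send -I".
# Prints nothing when there is no common snapshot (a full send is needed).
latest_common() {
    grep -Fx -f "$2" "$1" | tail -n 1
}
```

If the network or the remote machine was down for a few days, the local list simply grows; the next run still finds the last snapshot both sides share and sends everything after it in one stream.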
From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 17:23:51 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 627262BD for ; Mon, 4 Mar 2013 17:23:51 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-bk0-f49.google.com (mail-bk0-f49.google.com [209.85.214.49]) by mx1.freebsd.org (Postfix) with ESMTP id D9DB395A for ; Mon, 4 Mar 2013 17:23:50 +0000 (UTC) Received: by mail-bk0-f49.google.com with SMTP id w11so2551862bku.36 for ; Mon, 04 Mar 2013 09:23:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=sk5W3+3nvRe71nugnSyyifOd4kSdYPZ/Ub+d7GJViBA=; b=zMA0KF4uJlGEZjiP7tV8LcXC33/jrrBOg02BBliQiw9MGyaR1HKMeg/af6yXr5+VxF M1c8Hbjc0V1Ed/jZNhyi7osurfsq2BQ/VqUdiKcDIFmA4pGUGTUGw2XZRYRRC4tI+dHJ jKgskDuOSCAsGAnl8ibq9LXYz9uyrQh6Ikn8feikcCbJzmNSI8/1MiwgCUpMmpRU3p2d jhTtm+We+C09TJsOer5aPWxgkOjmzu+pmfYY0hfo1q+JDHoqSoj7VCoTKQ9mchB1GMdT QP8MJCregp3133efBQXksF8gi+eM0MizHhOeAErdb16PrFrMHCD+XgYDvUNxfG0by6bQ u+Fg== X-Received: by 10.204.141.17 with SMTP id k17mr7773232bku.67.1362417824038; Mon, 04 Mar 2013 09:23:44 -0800 (PST) Received: from [192.168.1.128] ([91.196.229.122]) by mx.google.com with ESMTPS id x18sm6123629bkw.4.2013.03.04.09.23.42 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 04 Mar 2013 09:23:43 -0800 (PST) Message-ID: <5134D89E.3050004@gmail.com> Date: Mon, 04 Mar 2013 19:23:42 +0200 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:19.0) Gecko/20100101 Firefox/19.0 SeaMonkey/2.16 MIME-Version: 1.0 To: David Magda Subject: Re: Musings on ZFS Backup strategies References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <2B318078-F863-4415-8DAE-94EE4431BF4C@ee.ryerson.ca> <5134C6C2.9020009@gmail.com> 
<1e4c24a68e76a279eaf4dc4f7c0156d3.squirrel@webmail.ee.ryerson.ca> In-Reply-To: <1e4c24a68e76a279eaf4dc4f7c0156d3.squirrel@webmail.ee.ryerson.ca> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 17:23:51 -0000 04.03.2013 19:04, David Magda: > On Mon, March 4, 2013 11:07, Volodymyr Kostyrko wrote: >> 02.03.2013 03:12, David Magda: >>> There are quite a few scripts out there: >>> >>> http://www.freshports.org/search.php?query=zfs >> >> A lot of them require python or ruby, and none of them manages >> synchronizing snapshots over network. > > Yes, but I think it is worth considering the creation of snapshots, and > the transfer of snapshots, as two separate steps. By treating them > independently (perhaps in two different scripts), it helps prevent the > breakage in one from affecting the other. Exactly. My script is just an addition to zfSnap or any other tool that manages snapshots. Currently it does nothing more than compare the lists of available snapshots and transfer them over the network. > Snapshots are not backups (IMHO), but they are handy for users and > sysadmins for the simple situations of accidentally files. If your network > access / copying breaks or is slow for some reason, at least you have > simply copies locally. Similarly if you're having issues with the machine > that keeps your remove pool. Yes, I addressed exactly this by adding the ability to restart a transfer from any point, or to just not care: once initialized, the process is autonomous, and in case of failure everything is rolled back to the last known good snapshot. I also added the ability to compress and rate-limit traffic.
> By keeping the snapshots going separately, once any problems with the > network or remote server are solved, you can use them to incrementally > sync up the remote pool. You can simply run the remote-sync scripts more > often to do the catch up. > > It's just an idea, and everyone has different needs. I often find it handy > to keep different steps in different scripts that are loosely coupled. I just tried to give another use for snapshots. Or least the way to simplify things in one specific situation. -- Sphinx of black quartz, judge my vow. From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 21:24:12 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 202BD1DC; Mon, 4 Mar 2013 21:24:12 +0000 (UTC) (envelope-from ken@kdm.org) Received: from nargothrond.kdm.org (nargothrond.kdm.org [70.56.43.81]) by mx1.freebsd.org (Postfix) with ESMTP id 274B21617; Mon, 4 Mar 2013 21:24:10 +0000 (UTC) Received: from nargothrond.kdm.org (localhost [127.0.0.1]) by nargothrond.kdm.org (8.14.2/8.14.2) with ESMTP id r24LOAGb019883; Mon, 4 Mar 2013 14:24:10 -0700 (MST) (envelope-from ken@nargothrond.kdm.org) Received: (from ken@localhost) by nargothrond.kdm.org (8.14.2/8.14.2/Submit) id r24LO8VF019882; Mon, 4 Mar 2013 14:24:08 -0700 (MST) (envelope-from ken) Date: Mon, 4 Mar 2013 14:24:08 -0700 From: "Kenneth D. 
Merry" To: Adrian Chadd Subject: Re: 9.1 minimal ram requirements Message-ID: <20130304212408.GA19842@nargothrond.kdm.org> References: <1356218834151-5771583.post@n5.nabble.com> <50D644E5.9070801@martenvijn.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2i Cc: Marten Vijn , Sergey Kandaurov , freebsd-stable@freebsd.org, jakub_lach@mailplus.pl X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 21:24:12 -0000 I just checked in a change to HEAD (247814) that compiles CTL in GENERIC but disables it by default. (i.e. it uses no memory) You can re-enable it with the existing loader tunable. i.e. set kern.cam.ctl.disable=0 in /boot/loader.conf and it will be enabled. Ken On Wed, Feb 27, 2013 at 18:26:28 -0800, Adrian Chadd wrote: > Hi Ken, > > I'd like to fix this for 9.2 and -HEAD. > > Would you mind if I disabled CTL in GENERIC (but still build it as a > module) until you've fixed the initial RAM reservation that it > requires? > > Thanks, > > > > Adrian > > > On 22 December 2012 22:32, Adrian Chadd wrote: > > Ken, > > > > Does CAM CTL really need to pre-allocate 35MB of RAM at startup? > > > > > > > > Adrian > > > > On 22 December 2012 16:45, Sergey Kandaurov wrote: > >> On 23 December 2012 03:40, Marten Vijn wrote: > >>> On 12/23/2012 12:27 AM, Jakub Lach wrote: > >>>> > >>>> Guys, I've heard about some absurd RAM requirements > >>>> for 9.1, has anybody tested it? > >>>> > >>>> e.g. > >>>> > >>>> http://forums.freebsd.org/showthread.php?t=36314 > >>> > >>> > >>> jup, I can comfirm this with nanobsd (cross) compiled > >>> for my soekris net4501 which has 64 MB mem: > >>> > >>> from dmesg: real memory = 67108864 (64 MB) > >>> > >>> while the same config compiled against a 9.0 tree still works... 
> >>> > >> > >> This (i.e. the "kmem_map too small" message seen with kernel memory > >> shortage) could be due to CAM CTL ('device ctl' added in 9.1), which is > >> quite a big kernel memory consumer. > >> Try to disable CTL in loader with kern.cam.ctl.disable=1 to finish boot. > >> A longer term workaround could be to postpone those memory allocations > >> until the first call to CTL. > >> > >> # cam ctl init allocates roughly 35 MB of kernel memory at once > >> # three memory pools, somewhat under M_DEVBUF, and memory disk > >> # devbuf takes 1022K with kern.cam.ctl.disable=1 > >>
> >> Type     InUse  MemUse  HighUse  Requests  Size(s)
> >> devbuf     213  20366K        -       265  16,32,64,128,256,512,1024,2048,4096
> >> ctlmem    5062  10113K        -      5062  64,2048
> >> ctlblk     200    800K        -       200  4096
> >> ramdisk      1   4096K        -         1
> >> ctlpool    532    138K        -       532  16,512
> >> > >> -- > >> wbr, > >> pluknet > >> _______________________________________________ > >> freebsd-stable@freebsd.org mailing list > >> http://lists.freebsd.org/mailman/listinfo/freebsd-stable > >> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 21:32:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A83B9587 for ; Mon, 4 Mar 2013 21:32:52 +0000 (UTC) (envelope-from bsd-src@helfman.org) Received: from mail-ve0-f175.google.com (mail-ve0-f175.google.com [209.85.128.175]) by mx1.freebsd.org (Postfix) with ESMTP id 6AE301692 for ; Mon, 4 Mar 2013 21:32:52 +0000 (UTC) Received: by mail-ve0-f175.google.com with SMTP id cy12so5202322veb.34 for ; Mon, 04 Mar 2013 13:32:51 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type 
:x-gm-message-state; bh=vMm6eqPiioRef9pt1e1DmMl6WjF1cjxn9q4o2FVfevU=; b=poNCM3pWUhQp8cjM5jPLaKJPVY1AwIG9PY5abGai3oDowcqQ4YYkNCdyTFuVDn6pqH inh0dBtWTtU/YaN+Ipna1PGaGfddiVE17rgX+arhyMTBr0Fe/VQQ6zqQ1x63Wq8nhQL6 CsKDfiW9VIq+fr6FY6sq2/NE8zJpynFyQ7ZuC/fGLqIfiuuQqsSN5iv5cUPpCR6T2dyt 4gYm00TdSPmB3I+0JDcahOBYNDXkedDYOl2vZ7gvLUCp3X5jponk7UrrefyNlLRWfgtw 98AGeg19swDHQxrcHUW7wEiXCfahqpCnLIfY20Sqss14mvaQ/ymMwmceSg/jdr8+/R5l FV/g== MIME-Version: 1.0 X-Received: by 10.220.149.11 with SMTP id r11mr8290206vcv.44.1362432771698; Mon, 04 Mar 2013 13:32:51 -0800 (PST) Sender: bsd-src@helfman.org Received: by 10.58.30.79 with HTTP; Mon, 4 Mar 2013 13:32:51 -0800 (PST) In-Reply-To: References: Date: Mon, 4 Mar 2013 13:32:51 -0800 X-Google-Sender-Auth: x6DoRbhGAMEia236_u7s2f0ok40 Message-ID: Subject: Re: GNATS now available via rsync From: Jason Helfman To: "Simon L. B. Nielsen" X-Gm-Message-State: ALoCoQkt0DALb5In2TSV2u+4gONwomf8XXKR8Yc1ddtXZVyVlSNIiH2RUGGrCzZShu4dbMv0xSK9 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-stable@freebsd.org, FreeBSD Ports List X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 21:32:52 -0000 On Sun, Dec 23, 2012 at 1:51 PM, Simon L. B. Nielsen wrote: > Hey, > > The GNATS database can now be mirrored using rsync from: > > rsync://bit0.us-west.freebsd.org/FreeBSD-bit/gnats/ > > I expect that URL to be permanent, at least while GNATS is still > alive. At a later point there will be more mirrors (a us-east will be > the first) and I will find a place to publish the mirror list. > > On a side note, GNATS changes aren't mirrored to the old CVSup system > right now, as cvsupd broke on FreeBSD 10.0, which the hosts running > GNATS is running. 
There are no current plans from clusteradm@'s side to > fix this now that an alternative way to get GNATS exists and cvsup is > deprecated long term anyway. > I have supplied an update to reflect this change in the Committer's Guide here: http://www.freebsd.org/doc/en/articles/committers-guide/gnats.html -jgh -- Jason Helfman | FreeBSD Committer jgh@FreeBSD.org | http://people.freebsd.org/~jgh | The Power to Serve From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 22:48:33 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 7C637D53 for ; Mon, 4 Mar 2013 22:48:33 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id 43915F7 for ; Mon, 4 Mar 2013 22:48:32 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r24MmNFO027226 for ; Mon, 4 Mar 2013 16:48:23 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Mon Mar 4 16:48:23 2013 Message-ID: <513524B2.6020600@denninger.net> Date: Mon, 04 Mar 2013 16:48:18 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: ZFS "stalls" -- and maybe we should be talking about defaults? X-Enigmail-Version: 1.5 X-Antivirus: avast! 
(VPS 130304-0, 03/04/2013), Outbound message X-Antivirus-Status: Clean Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 22:48:33 -0000 Well now this is interesting. I have converted a significant number of filesystems to ZFS over the last week or so and have noted a few things. A couple of them aren't so good. The subject machine in question has 12GB of RAM and dual Xeon 5500-series processors. It also has an ARECA 1680ix in it with 2GB of local cache and the BBU for it. The ZFS spindles are all exported as JBOD drives. I set up four disks under GPT, each with a single labeled freebsd-zfs partition; the providers are then geli-encrypted and added to the pool. When the same disks were running on UFS filesystems they were set up as a 0+1 RAID array under the ARECA adapter, exported as a single unit, GPT labeled as a single pack and then gpart-sliced and newfs'd under UFS+SU. Since I previously ran UFS filesystems on this config I know what performance level I achieved with it, and the entire system had been running flawlessly set up that way for the last couple of years. Presently the machine is running 9.1-Stable, r244942M. Immediately after the conversion I set up a second pool to play with backup strategies to a single drive and ran into a problem. The disk I used for that testing is one that previously was in the rotation and is also known good. I began to get EXTENDED stalls with zero I/O going on, some lasting for 30 seconds or so. The system was not frozen but anything that touched I/O would lock until it cleared. Dedup is off, incidentally. My first thought was that I had a bad drive, cable or other physical problem. 
However, searching for that proved fruitless -- there was nothing being logged anywhere -- not in the SMART data, not by the adapter, not by the OS. Nothing. Sticking a digital storage scope on the +5V and +12V rails didn't disclose anything interesting with the power in the chassis; it's stable. Further, swapping the only disk that had changed (the new backup volume) with a different one didn't change behavior either. The last straw was when I was able to reproduce the stalls WITHIN the original pool against the same four disks that had been running flawlessly for two years under UFS, and still couldn't find any evidence of a hardware problem (not even ECC-corrected data returns.) All the disks involved are completely clean -- zero sector reassignments, the drive-specific log is clean, etc. Attempting to cut back the ARECA adapter's aggressiveness (buffering, etc) on the theory that I was tickling something in its cache management algorithm that was pissing it off proved fruitless as well, even when I shut off ALL caching and NCQ options. I also set vfs.zfs.prefetch_disable=1 to no effect. Hmmmm... Last night after reading the ZFS Tuning wiki for FreeBSD I went on a lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=2000000000), set vfs.zfs.write_limit_override to 1024000000 (1GB) and rebooted. The problem instantly disappeared and I cannot provoke its return even with multiple full-bore snapshot and rsync filesystem copies running while a scrub is being done. I'm pinging between being I/O and processor (geli) limited now in normal operation and slamming the I/O channel during a scrub. It appears that performance is roughly equivalent to, maybe a bit less than, what it was with UFS+SU -- but it's fairly close. 
The operating theory I have at the moment is that the ARC cache was in some way getting into a near-deadlock situation with other memory demands on the system (there IS a Postgres server running on this hardware although it's a replication server and not taking queries -- nonetheless it does grab a chunk of RAM) leading to the stalls. Limiting its grab of RAM appears to have resolved the contention issue. I was unable to catch it actually running out of free memory, although it was consistently into the low five-digit free page count, and the kernel never garfed on the console about resource exhaustion -- other than a bitch about swap stalling (the infamous "more than 20 seconds" message.) Page space in use near the time in question (I could not get a display while locked as it went to I/O and froze) was not zero, but pretty close to it (a few thousand blocks.) That the system was driven into light paging does appear to be significant and indicative of some sort of memory contention issue, as under operation with UFS filesystems this machine has never been observed to allocate page space. Has anyone seen anything like this before, and if so.... is this a case of bad defaults or some bad behavior between various kernel memory allocation contention sources? This isn't exactly a resource-constrained machine running x64 code with 12GB of RAM and two quad-core processors in it! 
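For anyone wanting to try the same two limits, a minimal sketch follows. It assumes both tunables are set from /boot/loader.conf (the message only says they were set before a reboot, not where), and the function name zfs_loader_tunables is made up for illustration. It only prints the candidate lines; nothing is written to disk.

```shell
#!/bin/sh
# Sketch only: print loader.conf-style lines for the two ZFS limits
# mentioned above. zfs_loader_tunables is a hypothetical helper name,
# and placing these in /boot/loader.conf is an assumption.

# zfs_loader_tunables ARC_MAX_BYTES WRITE_LIMIT_BYTES
zfs_loader_tunables() {
    printf 'vfs.zfs.arc_max="%s"\n' "$1"
    printf 'vfs.zfs.write_limit_override="%s"\n' "$2"
}

# Values taken from the message: roughly 2GB ARC cap, roughly 1GB write limit.
zfs_loader_tunables 2000000000 1024000000
```

Review the printed lines before copying them anywhere, and change one value at a time if the goal is to find out which of the two actually matters.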
-- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC From owner-freebsd-stable@FreeBSD.ORG Mon Mar 4 22:49:53 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4A469E85 for ; Mon, 4 Mar 2013 22:49:53 +0000 (UTC) (envelope-from marck@rinet.ru) Received: from woozle.rinet.ru (woozle.rinet.ru [195.54.192.68]) by mx1.freebsd.org (Postfix) with ESMTP id B930D118 for ; Mon, 4 Mar 2013 22:49:52 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by woozle.rinet.ru (8.14.5/8.14.5) with ESMTP id r24Mnplp036014 for ; Tue, 5 Mar 2013 02:49:51 +0400 (MSK) (envelope-from marck@rinet.ru) Date: Tue, 5 Mar 2013 02:49:51 +0400 (MSK) From: Dmitry Morozovsky To: freebsd-stable@FreeBSD.org Subject: carp on stable/9: is there a way to keep jumbo? (fwd) Message-ID: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) X-NCC-RegID: ru.rinet X-OpenPGP-Key-ID: 6B691B03 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (woozle.rinet.ru [0.0.0.0]); Tue, 05 Mar 2013 02:49:51 +0400 (MSK) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2013 22:49:53 -0000 Colleagues, sorry, sent to the wrong list (my only excuse is possibly that I'm trying to make HAST based on carp...) 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ ---------- Forwarded message ---------- Date: Tue, 5 Mar 2013 02:31:51 From: Dmitry Morozovsky To: freebsd-fs@freebsd.org Subject: carp on stable/9: is there a way to keep jumbo? Dear colleagues, yes, I know glebius@ overhauled carp in -current, but I'm a bit nervous about deploying a bleeding-edge system on a NAS/SAN ;) So, my question is about the current state of carp in stable/9: while building an HA pair I found that carp interfaces lose jumbo capabilities:
root@cthulhu4:~# ifconfig | grep mtu
em0: flags=8943 metric 0 mtu 9000
em1: flags=8943 metric 0 mtu 9000
lo0: flags=8049 metric 0 mtu 16384
lagg0: flags=8943 metric 0 mtu 9000
carp0: flags=49 metric 0 mtu 1500
carp1: flags=49 metric 0 mtu 1500
root@cthulhu4:~# ifconfig carp1 mtu 9000
ifconfig: ioctl (set mtu): Invalid argument
Is it unavoidable at the moment, or am I missing something obvious? Thanks! 
-- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: marck@FreeBSD.org ] ------------------------------------------------------------------------ *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** ------------------------------------------------------------------------ _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 00:33:08 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A7DAD11B for ; Tue, 5 Mar 2013 00:33:08 +0000 (UTC) (envelope-from prvs=1776ac14af=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 3C9BA633 for ; Tue, 5 Mar 2013 00:33:07 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002541918.msg for ; Tue, 05 Mar 2013 00:33:00 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 05 Mar 2013 00:33:00 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1776ac14af=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-stable@freebsd.org Message-ID: <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> From: "Steven Hartland" To: "Karl Denninger" , References: <513524B2.6020600@denninger.net> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Date: Tue, 5 Mar 2013 00:33:01 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 00:33:08 -0000 What does zfs-stats -a show when you're having the stall issue? You can also use zpool iostat to show per-disk I/O statistics, which may help identify a single failing disk, e.g. zpool iostat -v 1 Also, have you investigated which of the two sysctls you changed fixed it, or does it require both? Regards Steve ----- Original Message ----- From: "Karl Denninger" To: Sent: Monday, March 04, 2013 10:48 PM Subject: ZFS "stalls" -- and maybe we should be talking about defaults? Well now this is interesting. I have converted a significant number of filesystems to ZFS over the last week or so and have noted a few things. A couple of them aren't so good. The subject machine in question has 12GB of RAM and dual Xeon 5500-series processors. It also has an ARECA 1680ix in it with 2GB of local cache and the BBU for it. The ZFS spindles are all exported as JBOD drives. I set up four disks under GPT, have a single freebsd-zfs partition added to them, are labeled and the providers are then geli-encrypted and added to the pool. When the same disks were running on UFS filesystems they were set up as a 0+1 RAID array under the ARECA adapter, exported as a single unit, GPT labeled as a single pack and then gpart-sliced and newfs'd under UFS+SU. 
Since I previously ran UFS filesystems on this config I know what the performance level I achieved with that, and the entire system had been running flawlessly set up that way for the last couple of years. Presently the machine is running 9.1-Stable, r244942M Immediately after the conversion I set up a second pool to play with backup strategies to a single drive and ran into a problem. The disk I used for that testing is one that previously was in the rotation and is also known good. I began to get EXTENDED stalls with zero I/O going on, some lasting for 30 seconds or so. The system was not frozen but anything that touched I/O would lock until it cleared. Dedup is off, incidentally. My first thought was that I had a bad drive, cable or other physical problem. However, searching for that proved fruitless -- there was nothing being logged anywhere -- not in the SMART data, not by the adapter, not by the OS. Nothing. Sticking a digital storage scope on the +5V and +12V rails didn't disclose anything interesting with the power in the chassis; it's stable. Further, swapping the only disk that had changed (the new backup volume) with a different one didn't change behavior either. The last straw was when I was able to reproduce the stalls WITHIN the original pool against the same four disks that had been running flawlessly for two years under UFS, and still couldn't find any evidence of a hardware problem (not even ECC-corrected data returns.) All the disks involved are completely clean -- zero sector reassignments, the drive-specific log is clean, etc. Attempting to cut back the ARECA adapter's aggressiveness (buffering, etc) on the theory that I was tickling something in its cache management algorithm that was pissing it off proved fruitless as well, even when I shut off ALL caching and NCQ options. I also set vfs.zfs.prefetch_disable=1 to no effect. Hmmmm... 
Last night after reading the ZFS Tuning wiki for FreeBSD I went on a lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=2000000000), set vfs.zfs.write_limit_override to 1024000000 (1GB) and rebooted. The problem instantly disappeared and I cannot provoke its return even with multiple full-bore snapshot and rsync filesystem copies running while a scrub is being done. I'm pinging between being I/O and processor (geli) limited now in normal operation and slamming the I/O channel during a scrub. It appears that performance is roughly equivalent, maybe a bit less, than it was with UFS+SU -- but it's fairly close. The operating theory I have at the moment is that the ARC cache was in some way getting into a near-deadlock situation with other memory demands on the system (there IS a Postgres server running on this hardware although it's a replication server and not taking queries -- nonetheless it does grab a chunk of RAM) leading to the stalls. Limiting its grab of RAM appears to have resolved the contention issue. I was unable to catch it actually running out of free memory although it was consistently into the low five-digit free page count and the kernel never garfed on the console about resource exhaustion -- other than a bitch about swap stalling (the infamous "more than 20 seconds" message.) Page space in use near the time in question (I could not get a display while locked as it went to I/O and froze) was not zero, but pretty close to it (a few thousand blocks.) That the system was driven into light paging does appear to be significant and indicative of some sort of memory contention issue as under operation with UFS filesystems this machine has never been observed to allocate page space. Anyone seen anything like this before and if so.... is this a case of bad defaults or some bad behavior between various kernel memory allocation contention sources? 
This isn't exactly a resource-constrained machine running x64 code with 12GB of RAM and two quad-core processors in it! -- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 01:21:31 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0D5C17E2 for ; Tue, 5 Mar 2013 01:21:31 +0000 (UTC) (envelope-from prvs=1776ac14af=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 9328A75B for ; Tue, 5 Mar 2013 01:21:30 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002542736.msg for ; Tue, 05 Mar 2013 01:21:29 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 05 Mar 2013 01:21:29 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1776ac14af=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-stable@FreeBSD.org Message-ID: From: "Steven Hartland" To: "Dmitry Morozovsky" , References: Subject: Re: carp 
on stable/9: is there a way to keep jumbo? (fwd) Date: Tue, 5 Mar 2013 01:21:31 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 01:21:31 -0000 You might want to try:- http://blog.multiplay.co.uk/dropzone/freebsd/carp-mtu.patch Be warned it doesn't do any validation so if you use it against physical interfaces with a smaller MTU things will likely go badly wrong, hell they may go badly wrong anyway as its just a very quick and dirty hack ;-) Regards Steve ----- Original Message ----- From: "Dmitry Morozovsky" To: Sent: Monday, March 04, 2013 10:49 PM Subject: carp on stable/9: is there a way to keep jumbo? (fwd) > Collegaues, > > sorry, sent to the wrong list (the only escuse for me is possibly that I'm > trying to make HAST base on carp...) > > -- > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > ------------------------------------------------------------------------ > > ---------- Forwarded message ---------- > Date: Tue, 5 Mar 2013 02:31:51 > From: Dmitry Morozovsky > To: freebsd-fs@freebsd.org > Subject: carp on stable/9: is there a way to keep jumbo? 
> > Dear collesagues, > > yes, I know glebius@ overhauled carp in -current, but I'm a bit nervous to > deploy bleeding edge system on a NAS/SAN ;) > > So, my question is about current state of carp in stable/9: building HA pair I > found that carp interfaces lose jumbo capabilities: > > root@cthulhu4:~# ifconfig | grep mtu > em0: flags=8943 metric 0 mtu 9000 > em1: flags=8943 metric 0 mtu 9000 > lo0: flags=8049 metric 0 mtu 16384 > lagg0: flags=8943 metric 0 mtu 9000 > carp0: flags=49 metric 0 mtu 1500 > carp1: flags=49 metric 0 mtu 1500 > root@cthulhu4:~# ifconfig carp1 mtu 9000 > ifconfig: ioctl (set mtu): Invalid argument > > Is it unavoidable at the moment, or am I missing something obvious? > > Thanks! > > -- > Sincerely, > D.Marck [DM5020, MCK-RIPE, DM3-RIPN] > [ FreeBSD committer: marck@FreeBSD.org ] > ------------------------------------------------------------------------ > *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru *** > ------------------------------------------------------------------------ > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 02:08:02 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 83C88C65 for ; Tue, 5 Mar 2013 02:08:02 +0000 (UTC) (envelope-from freebsd@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 28ABE883 for ; Tue, 5 Mar 2013 02:08:02 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.6/8.14.5) with ESMTP id r2527kwu040755; Mon, 4 Mar 2013 18:07:46 -0800 (PST) (envelope-from freebsd@pki2.com) Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? From: Dennis Glatting To: Karl Denninger In-Reply-To: <513524B2.6020600@denninger.net> References: <513524B2.6020600@denninger.net> Content-Type: text/plain; charset="ISO-8859-1" Date: Mon, 04 Mar 2013 18:07:46 -0800 Message-ID: <1362449266.92708.8.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: r2527kwu040755 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: freebsd@pki2.com Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 02:08:02 -0000 I get stalls with 256GB of RAM with arc_max=64G (my limit is usually 25%) on a 64-core system with 20 new 3TB Seagate disks under LSI2008 chips without much load. Interestingly, pbzip2 consistently creates a problem on a volume whereas gzip does not. Here, stalls happen across several systems; however, I have had fewer problems under 8.3 than under 9.1. If I go to hardware RAID5 (LSI2008 -- same chips: IR vs IT) I don't have a problem. 
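The 25% rule of thumb mentioned above is simple arithmetic, and a rough sketch of it follows. The helper name arc_max_for is hypothetical, and reading the RAM size from the hw.physmem sysctl is an assumption about how one might feed it on FreeBSD; the message itself does not say how the 64G figure was derived beyond "usually 25%".

```shell
#!/bin/sh
# Sketch only: compute an ARC cap as a percentage of physical memory.
# arc_max_for is a hypothetical helper, not an existing tool.

# arc_max_for PHYSMEM_BYTES PERCENT -> prints the cap in bytes
arc_max_for() {
    echo $(( $1 * $2 / 100 ))
}

# 256GB of RAM at 25%, as in the message; this prints 68719476736 (64GB).
arc_max_for $((256 * 1024 * 1024 * 1024)) 25

# On a live FreeBSD system one could instead try:
#   arc_max_for "$(sysctl -n hw.physmem)" 25
```

The printed value is in bytes, which is the same form as the vfs.zfs.arc_max=2000000000 setting quoted elsewhere in this thread.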
On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote:
> Well now this is interesting.
>
> I have converted a significant number of filesystems to ZFS over the
> last week or so and have noted a few things. A couple of them aren't so
> good.
>
> The subject machine in question has 12GB of RAM and dual Xeon
> 5500-series processors. It also has an ARECA 1680ix in it with 2GB of
> local cache and the BBU for it. The ZFS spindles are all exported as
> JBOD drives. I set up four disks under GPT; each has a single
> freebsd-zfs partition and is labeled, and the providers are then
> geli-encrypted and added to the pool. When the same disks were running
> on UFS filesystems they were set up as a 0+1 RAID array under the ARECA
> adapter, exported as a single unit, GPT labeled as a single pack and
> then gpart-sliced and newfs'd under UFS+SU.
>
> Since I previously ran UFS filesystems on this config I know what
> performance level I achieved with that, and the entire system had been
> running flawlessly set up that way for the last couple of years.
> Presently the machine is running 9.1-Stable, r244942M
>
> Immediately after the conversion I set up a second pool to play with
> backup strategies to a single drive and ran into a problem. The disk I
> used for that testing is one that previously was in the rotation and is
> also known good. I began to get EXTENDED stalls with zero I/O going on,
> some lasting for 30 seconds or so. The system was not frozen but
> anything that touched I/O would lock until it cleared. Dedup is off,
> incidentally.
>
> My first thought was that I had a bad drive, cable or other physical
> problem. However, searching for that proved fruitless -- there was
> nothing being logged anywhere -- not in the SMART data, not by the
> adapter, not by the OS. Nothing. Sticking a digital storage scope on
> the +5V and +12V rails didn't disclose anything interesting with the
> power in the chassis; it's stable. Further, swapping the only disk that
> had changed (the new backup volume) with a different one didn't change
> behavior either.
>
> The last straw was when I was able to reproduce the stalls WITHIN the
> original pool against the same four disks that had been running
> flawlessly for two years under UFS, and still couldn't find any evidence
> of a hardware problem (not even ECC-corrected data returns.) All the
> disks involved are completely clean -- zero sector reassignments, the
> drive-specific log is clean, etc.
>
> Attempting to cut back the ARECA adapter's aggressiveness (buffering,
> etc) on the theory that I was tickling something in its cache management
> algorithm that was pissing it off proved fruitless as well, even when I
> shut off ALL caching and NCQ options. I also set
> vfs.zfs.prefetch_disable=1 to no effect. Hmmmm...
>
> Last night after reading the ZFS Tuning wiki for FreeBSD I went on a
> lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=2000000000), set
> vfs.zfs.write_limit_override to 1024000000 (1GB) and rebooted.
>
> The problem instantly disappeared and I cannot provoke its return even
> with multiple full-bore snapshot and rsync filesystem copies running
> while a scrub is being done.
>
> I'm pinging between being I/O and processor (geli) limited now in normal
> operation and slamming the I/O channel during a scrub. It appears that
> performance is roughly equivalent, maybe a bit less, than it was with
> UFS+SU -- but it's fairly close.
>
> The operating theory I have at the moment is that the ARC cache was in
> some way getting into a near-deadlock situation with other memory
> demands on the system (there IS a Postgres server running on this
> hardware although it's a replication server and not taking queries --
> nonetheless it does grab a chunk of RAM) leading to the stalls.
> Limiting its grab of RAM appears to have resolved the contention
> issue. I was unable to catch it actually running out of free memory
> although it was consistently into the low five-digit free page count and
> the kernel never garfed on the console about resource exhaustion --
> other than a bitch about swap stalling (the infamous "more than 20
> seconds" message.) Page space in use near the time in question (I could
> not get a display while locked as it went to I/O and froze) was not
> zero, but pretty close to it (a few thousand blocks.) That the system
> was driven into light paging does appear to be significant and
> indicative of some sort of memory contention issue as under operation
> with UFS filesystems this machine has never been observed to allocate
> page space.
>
> Anyone seen anything like this before and if so.... is this a case of
> bad defaults or some bad behavior between various kernel memory
> allocation contention sources?
>
> This isn't exactly a resource-constrained machine running x64 code with
> 12GB of RAM and two quad-core processors in it!
>

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 02:48:37 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
    by hub.freebsd.org (Postfix) with ESMTP id 540723B1
    for ; Tue, 5 Mar 2013 02:48:37 +0000 (UTC)
    (envelope-from karl@denninger.net)
Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7])
    by mx1.freebsd.org (Postfix) with ESMTP id F3D9B9A5
    for ; Tue, 5 Mar 2013 02:48:36 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r252mZZA038927
    for ; Mon, 4 Mar 2013 20:48:35 -0600 (CST)
    (envelope-from karl@denninger.net)
Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL);
    Mon Mar 4 20:48:35 2013
Message-ID: <51355CFE.7080405@denninger.net>
Date: Mon, 04 Mar 2013 20:48:30 -0600
From: Karl Denninger
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk>
In-Reply-To: <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk>
X-Enigmail-Version: 1.5
X-Antivirus: avast! (VPS 130304-2, 03/04/2013), Outbound message
X-Antivirus-Status: Clean
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 05 Mar 2013 02:48:37 -0000

On 3/4/2013 6:33 PM, Steven Hartland wrote:
> What does zfs-stats -a show when you're having the stall issue?
>
> You can also use zpool iostat to show individual disk I/O stats,
> which may help identify a single failing disk, e.g.
>     zpool iostat -v 1
>
> Also have you investigated which of the two sysctls you changed
> fixed it or does it require both?
>
> Regards
> Steve

I caught it with systat -vm running (which displays raw I/O stats on the
bottom) and when it locks there's no I/O to ANY spindle (there are six
online spindles in the box plus two backup volumes that are normally
unmounted.)

Note that the machine is not booting from ZFS -- it is booting from and
has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
like a single "da0" drive to the OS) and that drive stalls as well when
it freezes.
It's definitely a kernel thing when it happens as the OS would otherwise
not have locked (just I/O to the user partitions) -- but it does. You
can't do anything while it's frozen -- anything that wants I/O hangs
until it unfreezes. I have zero errors logged in the OS pack for both
drives in that mirror as well and again none in the RAID adapter either.

I'm not sure which tunable stopped it as I changed both at the same
time. Unfortunately both of those tunables can only be changed in
/boot/loader.conf, not dynamically, so trying to figure out where the
wall is on this is going to be a lot of fun.

This is a machine I can futz with provided that I give reasonable notice
and it's off-hours; it has a sister system that I can play with at will,
up to and including destroying it. I'm going to take one of the backup
volumes, detach it and use that as a "seed" to effectively replicate the
environment on the other box and see if I can isolate this.

I've got close to a dozen machines in this basic configuration in the
field; they're slightly-older Xeon-series CPUs but work exceptionally
well -- this is my first foray into zfs and I need to understand what's
going on as stalls like this in production are not good for obvious
reasons.
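
For the isolation exercise described above, a /boot/loader.conf fragment along these lines might be used, with one setting disabled per reboot. This is only a sketch: the values are the ones reported in this thread, the choice of which line to disable first is arbitrary, and the tunable names should be checked against the running release (for example with "sysctl vfs.zfs") before relying on them.

```
# /boot/loader.conf -- sketch for finding out which tunable stops the stalls.
# Values are those reported in this thread; change only one per reboot.

# Cap the ARC at roughly 2GB.
vfs.zfs.arc_max="2000000000"

# Roughly 1GB write limit; commented out for this boot, re-enable on the next.
#vfs.zfs.write_limit_override="1024000000"
```

If the stalls come back with only arc_max set, swapping which line is commented out on the following reboot would show whether write_limit_override alone is enough.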
--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 02:58:49 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
    by hub.freebsd.org (Postfix) with ESMTP id DEB0A812
    for ; Tue, 5 Mar 2013 02:58:49 +0000 (UTC)
    (envelope-from karl@denninger.net)
Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7])
    by mx1.freebsd.org (Postfix) with ESMTP id 92698A1D
    for ; Tue, 5 Mar 2013 02:58:48 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r252wmRx039416
    for ; Mon, 4 Mar 2013 20:58:48 -0600 (CST)
    (envelope-from karl@denninger.net)
Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL);
    Mon Mar 4 20:58:48 2013
Message-ID: <51355F64.4040409@denninger.net>
Date: Mon, 04 Mar 2013 20:58:44 -0600
From: Karl Denninger
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com>
In-Reply-To: <1362449266.92708.8.camel@btw.pki2.com>
X-Enigmail-Version: 1.5
X-Antivirus: avast! (VPS 130304-2, 03/04/2013), Outbound message
X-Antivirus-Status: Clean
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 05 Mar 2013 02:58:49 -0000

Stick this in /boot/loader.conf and see if your lockups go away:

vfs.zfs.write_limit_override=1024000000

I've got a "sentinel" running that watches for zero-bandwidth zpool
iostat 5s that has been running for close to 12 hours now and with the
two tunables I changed it doesn't appear to be happening any more.

This system always has small-ball write I/Os going to it as it's a
postgresql "hot standby" mirror backing a VERY active system and is
receiving streaming logdata from the primary at a colocation site, so
the odds of it ever experiencing an actual zero for I/O (unless there's
a connectivity problem) are pretty remote.

If it turns out that the write_limit_override tunable is the one
responsible for stopping the hangs I can drop the ARC limit tunable
although I'm not sure I want to; I don't see much if any performance
penalty from leaving it where it is and if the larger cache isn't
helping anything then why use it? I'm inclined to stick an SSD in the
cabinet as a cache drive instead of dedicating RAM to this -- even
though it's not AS fast as RAM it's still MASSIVELY quicker than getting
data off a rotating plate of rust.

Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all?

On 3/4/2013 8:07 PM, Dennis Glatting wrote:
> I get stalls with 256GB of RAM with arc_max=64G (my limit is usually 25%
> ) on a 64 core system with 20 new 3TB Seagate disks under LSI2008 chips
> without much load. Interestingly pbzip2 consistently created a problem
> on a volume whereas gzip does not.
--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 03:25:19 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    by hub.freebsd.org (Postfix) with ESMTP id 9E8DFE16
    for ; Tue, 5 Mar 2013 03:25:19 +0000 (UTC)
    (envelope-from prvs=1776ac14af=killing@multiplay.co.uk)
Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
    by mx1.freebsd.org (Postfix) with ESMTP id 20FECBE8
    for ; Tue, 5 Mar 2013 03:25:18 +0000 (UTC)
Received: from r2d2 ([46.65.172.4])
    by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23])
    (MDaemon PRO v10.0.4) with ESMTP id md50002544721.msg
    for ; Tue, 05 Mar 2013 03:25:16 +0000
X-Spam-Processed: mail1.multiplay.co.uk, Tue, 05 Mar 2013 03:25:16 +0000
    (not processed: message from valid local sender)
X-MDDKIM-Result: neutral (mail1.multiplay.co.uk)
X-MDRemoteIP: 46.65.172.4
X-Return-Path: prvs=1776ac14af=killing@multiplay.co.uk
X-Envelope-From: killing@multiplay.co.uk
X-MDaemon-Deliver-To: freebsd-stable@freebsd.org
Message-ID: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>
From: "Steven Hartland"
To: "Karl Denninger" ,
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net>
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Date: Tue, 5 Mar 2013 03:25:18 -0000
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 05 Mar 2013 03:25:19 -0000

----- Original Message ----- From: "Karl Denninger"
> Stick this in /boot/loader.conf and see if your lockups go away:
>
> vfs.zfs.write_limit_override=1024000000

...

> If it turns out that the write_limit_override tunable is the one
> responsible for stopping the hangs I can drop the ARC limit tunable
> although I'm not sure I want to; I don't see much if any performance
> penalty from leaving it where it is and if the larger cache isn't
> helping anything then why use it? I'm inclined to stick an SSD in the
> cabinet as a cache drive instead of dedicating RAM to this -- even
> though it's not AS fast as RAM it's still MASSIVELY quicker than getting
> data off a rotating plate of rust.

Now, interesting you should say that: I've seen a stall recently on a
ZFS-only box running on 6 x SSD RAIDZ2.

The stall was caused by a fairly large mysql import, with nothing else
running.

When it happened I thought the machine had wedged, but minutes (not
seconds) later, everything sprang into action again.

> Am I correct that a ZFS filesystem does NOT use the VM buffer cache
> at all?

Correct

Regards
Steve

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 03:39:32 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
    by hub.freebsd.org (Postfix) with ESMTP id 67A0BFD8
    for ; Tue, 5 Mar 2013 03:39:32 +0000 (UTC)
    (envelope-from karl@denninger.net)
Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7])
    by mx1.freebsd.org (Postfix) with ESMTP id 2EB91CCC
    for ; Tue, 5 Mar 2013 03:39:31 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r253dVVU041211
    for ; Mon, 4 Mar 2013 21:39:31 -0600 (CST)
    (envelope-from karl@denninger.net)
Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL);
    Mon Mar 4 21:39:31 2013
Message-ID: <513568EE.80006@denninger.net>
Date: Mon, 04 Mar 2013 21:39:26 -0600
From: Karl Denninger
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
CC: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>
In-Reply-To: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>
X-Enigmail-Version: 1.5
X-Antivirus: avast!
    (VPS 130304-2, 03/04/2013), Outbound message
X-Antivirus-Status: Clean
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 05 Mar 2013 03:39:32 -0000

On 3/4/2013 9:25 PM, Steven Hartland wrote:
> ----- Original Message ----- From: "Karl Denninger"
>
>> Stick this in /boot/loader.conf and see if your lockups go away:
>>
>> vfs.zfs.write_limit_override=1024000000
> ...
>
>> If it turns out that the write_limit_override tunable is the one
>> responsible for stopping the hangs I can drop the ARC limit tunable
>> although I'm not sure I want to; I don't see much if any performance
>> penalty from leaving it where it is and if the larger cache isn't
>> helping anything then why use it? I'm inclined to stick an SSD in the
>> cabinet as a cache drive instead of dedicating RAM to this -- even
>> though it's not AS fast as RAM it's still MASSIVELY quicker than getting
>> data off a rotating plate of rust.
>
> Now, interesting you should say that: I've seen a stall recently on a
> ZFS-only box running on 6 x SSD RAIDZ2.
>
> The stall was caused by a fairly large mysql import, with nothing else
> running.
>
> When it happened I thought the machine had wedged, but minutes (not
> seconds) later, everything sprang into action again.

That's exactly what I can reproduce here; the stalls are anywhere from a
few seconds to well north of a half-minute. It looks like the machine
is hung -- but it is not.

The machine in question normally runs with zero swap allocated but it
always has 1.5GB of shared memory allocated to Postgres
("shared_buffers = 1500MB" in its config file).

I wonder if the ARC cache management code is misbehaving when shared
segments are in use?
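
Stalls like these can be logged with a small watcher in the spirit of the "sentinel" mentioned earlier in the thread. The following is only a sketch, not the script actually used: the function names are hypothetical, and it assumes the usual seven-column "zpool iostat" layout (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth) with zero printed as a bare "0".

```python
import subprocess
from typing import Iterable, Iterator, Optional, Tuple


def parse_sample(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (pool, read_bw, write_bw) for a data row, or None otherwise."""
    fields = line.split()
    # Header and separator rows are rejected because their fourth column
    # (read operations) does not start with a digit.
    if len(fields) != 7 or not fields[3][0].isdigit():
        return None
    return fields[0], fields[5], fields[6]


def zero_bandwidth_samples(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sample_number, pool) for samples with no read or write bandwidth."""
    count = 0
    for line in lines:
        sample = parse_sample(line)
        if sample is None:
            continue
        count += 1
        pool, read_bw, write_bw = sample
        if read_bw == "0" and write_bw == "0":
            yield count, pool


def watch(pool: str, interval: int = 5) -> None:
    """Run `zpool iostat <pool> <interval>` and report zero-bandwidth samples."""
    proc = subprocess.Popen(
        ["zpool", "iostat", pool, str(interval)],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for number, name in zero_bandwidth_samples(proc.stdout):
        print(f"sample {number}: no I/O on {name}")
```

Calling watch("tank") (the pool name is a placeholder) would print a line for every five-second sample with no traffic. On a box that always receives a trickle of writes, such as the hot standby described in this thread, any such line is a likely stall.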
--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 03:52:51 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
    by hub.freebsd.org (Postfix) with ESMTP id 0C6D6214
    for ; Tue, 5 Mar 2013 03:52:51 +0000 (UTC)
    (envelope-from dg@pki2.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
    by mx1.freebsd.org (Postfix) with ESMTP id 9DC53D52
    for ; Tue, 5 Mar 2013 03:52:50 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
    by btw.pki2.com (8.14.6/8.14.5) with ESMTP id r253qZIx079117;
    Mon, 4 Mar 2013 19:52:35 -0800 (PST)
    (envelope-from dg@pki2.com)
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
From: Dennis Glatting
To: Karl Denninger
In-Reply-To: <51355F64.4040409@denninger.net>
References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net>
Content-Type: text/plain; charset="ISO-8859-1"
Date: Mon, 04 Mar 2013 19:52:35 -0800
Message-ID: <1362455555.62624.11.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: r253qZIx079117
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: dg@pki2.com
Cc: freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 05 Mar 2013 03:52:51 -0000

On Mon, 2013-03-04 at 20:58 -0600, Karl Denninger wrote:
> Stick this in /boot/loader.conf and see if your lockups go away:
>
> vfs.zfs.write_limit_override=1024000000
>

K.
> I've got a "sentinal" running that watches for zero-bandwidth zpool > iostat 5s that has been running for close to 12 hours now and with the > two tunables I changed it doesn't appear to be happening any more. > I've also done this as well as top and systat -vmstat. Disk I/O stops but the system lives through top, system, and the network. However, if I try to login the login won't complete. All of my systems are hardware RAID1 for the OS (LSI and Areca) and typically a separate disk for swap. All other disks are ZFS. > This system always has small-ball write I/Os going to it as it's a > postgresql "hot standby" mirror backing a VERY active system and is > receiving streaming logdata from the primary at a colocation site, so > the odds of it ever experiencing an actual zero for I/O (unless there's > a connectivity problem) is pretty remote. > I am doing multi TB sorts and GB database loads. > If it turns out that the write_limit_override tunable is the one > responsible for stopping the hangs I can drop the ARC limit tunable > although I'm not sure I want to; I don't see much if any performance > penalty from leaving it where it is and if the larger cache isn't > helping anything then why use it? I'm inclined to stick an SSD in the > cabinet as a cache drive instead of dedicating RAM to this -- even > though it's not AS fast as RAM it's still MASSIVELY quicker than getting > data off a rotating plate of rust. > I forgot to mention that on my three 8.3 systems they occasionally offline a disk (one or two a week, total). I simply online the disk and after resilver all is well. There are ~40 disks across those three systems. Of my 9.1 systems three are busy but with smaller number of disks (about eight across two volumes (RAIDz2 and mirror). I also have a ZFS-on-Linux (CentOS) system for play (about 12 disks). It did not exhibit problems when it was in use but it did teach me a lesson on the evils of dedup. 
:) > Am I correct that a ZFS filesystem does NOT use the VM buffer cache at all? > Dunno. > On 3/4/2013 8:07 PM, Dennis Glatting wrote: > > I get stalls with 256GB of RAM with arc_max=64G (my limit is usually 25% > > ) on a 64 core system with 20 new 3TB Seagate disks under LSI2008 chips > > without much load. Interestingly pbzip2 consistently created a problem > > on a volume whereas gzip does not. > > > > Here, stalls happen across several systems however I have had less > > problems under 8.3 than 9.1. If I go to hardware RAID5 (LSI2008 -- same > > chips: IR vs IT) I don't have a problem. > > > > > > > > > > On Mon, 2013-03-04 at 16:48 -0600, Karl Denninger wrote: > >> Well now this is interesting. > >> > >> I have converted a significant number of filesystems to ZFS over the > >> last week or so and have noted a few things. A couple of them aren't so > >> good. > >> > >> The subject machine in question has 12GB of RAM and dual Xeon > >> 5500-series processors. It also has an ARECA 1680ix in it with 2GB of > >> local cache and the BBU for it. The ZFS spindles are all exported as > >> JBOD drives. I set up four disks under GPT, have a single freebsd-zfs > >> partition added to them, are labeled and the providers are then > >> geli-encrypted and added to the pool. When the same disks were running > >> on UFS filesystems they were set up as a 0+1 RAID array under the ARECA > >> adapter, exported as a single unit, GPT labeled as a single pack and > >> then gpart-sliced and newfs'd under UFS+SU. > >> > >> Since I previously ran UFS filesystems on this config I know what the > >> performance level I achieved with that, and the entire system had been > >> running flawlessly set up that way for the last couple of years. > >> Presently the machine is running 9.1-Stable, r244942M > >> > >> Immediately after the conversion I set up a second pool to play with > >> backup strategies to a single drive and ran into a problem. 
The disk I > >> used for that testing is one that previously was in the rotation and is > >> also known good. I began to get EXTENDED stalls with zero I/O going on, > >> some lasting for 30 seconds or so. The system was not frozen but > >> anything that touched I/O would lock until it cleared. Dedup is off, > >> incidentally. > >> > >> My first thought was that I had a bad drive, cable or other physical > >> problem. However, searching for that proved fruitless -- there was > >> nothing being logged anywhere -- not in the SMART data, not by the > >> adapter, not by the OS. Nothing. Sticking a digital storage scope on > >> the +5V and +12V rails didn't disclose anything interesting with the > >> power in the chassis; it's stable. Further, swapping the only disk that > >> had changed (the new backup volume) with a different one didn't change > >> behavior either. > >> > >> The last straw was when I was able to reproduce the stalls WITHIN the > >> original pool against the same four disks that had been running > >> flawlessly for two years under UFS, and still couldn't find any evidence > >> of a hardware problem (not even ECC-corrected data returns.) All the > >> disks involved are completely clean -- zero sector reassignments, the > >> drive-specific log is clean, etc. > >> > >> Attempting to cut back the ARECA adapter's aggressiveness (buffering, > >> etc) on the theory that I was tickling something in its cache management > >> algorithm that was pissing it off proved fruitless as well, even when I > >> shut off ALL caching and NCQ options. I also set > >> vfs.zfs.prefetch_disable=1 to no effect. Hmmmm... > >> > >> Last night after reading the ZFS Tuning wiki for FreeBSD I went on a > >> lark and limited the ARC cache to 2GB (vfs.zfs.arc_max=2000000000), set > >> vfs.zfs.write_limit_override to 1024000000 (1GB) and rebooted. 
> >>
> >> The problem instantly disappeared and I cannot provoke its return even
> >> with multiple full-bore snapshot and rsync filesystem copies running
> >> while a scrub is being done.
> >>
> >> I'm pinging between being I/O and processor (geli) limited now in normal
> >> operation and slamming the I/O channel during a scrub. It appears that
> >> performance is roughly equivalent to, maybe a bit less than, what it
> >> was with UFS+SU -- but it's fairly close.
> >>
> >> The operating theory I have at the moment is that the ARC cache was in
> >> some way getting into a near-deadlock situation with other memory
> >> demands on the system (there IS a Postgres server running on this
> >> hardware although it's a replication server and not taking queries --
> >> nonetheless it does grab a chunk of RAM) leading to the stalls.
> >> Limiting its grab of RAM appears to have resolved the contention
> >> issue. I was unable to catch it actually running out of free memory
> >> although it was consistently into the low five-digit free page count and
> >> the kernel never garfed on the console about resource exhaustion --
> >> other than a bitch about swap stalling (the infamous "more than 20
> >> seconds" message.) Page space in use near the time in question (I could
> >> not get a display while locked as it went to I/O and froze) was not
> >> zero, but pretty close to it (a few thousand blocks.) That the system
> >> was driven into light paging does appear to be significant and
> >> indicative of some sort of memory contention issue as under operation
> >> with UFS filesystems this machine has never been observed to allocate
> >> page space.
> >>
> >> Anyone seen anything like this before and if so.... is this a case of
> >> bad defaults or some bad behavior between various kernel memory
> >> allocation contention sources?
> >>
> >> This isn't exactly a resource-constrained machine running x64 code with
> >> 12GB of RAM and two quad-core processors in it!
> >> > > > > _______________________________________________ > > freebsd-stable@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > > > > > > %SPAMBLOCK-SYS: Matched [@freebsd.org+], message ok > -- Dennis Glatting From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 03:54:43 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 95DA331D for ; Tue, 5 Mar 2013 03:54:43 +0000 (UTC) (envelope-from dg@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 44072D6D for ; Tue, 5 Mar 2013 03:54:43 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.6/8.14.5) with ESMTP id r253sTT7079185; Mon, 4 Mar 2013 19:54:29 -0800 (PST) (envelope-from dg@pki2.com) Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
From: Dennis Glatting To: Steven Hartland In-Reply-To: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk> References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <8C68812328E3483BA9786EF15591124D@multiplay.co.uk> Content-Type: text/plain; charset="ISO-8859-1" Date: Mon, 04 Mar 2013 19:54:29 -0800 Message-ID: <1362455669.62624.12.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: r253sTT7079185 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: dg@pki2.com Cc: freebsd-stable@freebsd.org, Karl Denninger X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 03:54:43 -0000 On Tue, 2013-03-05 at 03:25 +0000, Steven Hartland wrote: > ----- Original Message ----- > From: "Karl Denninger" > > > Stick this in /boot/loader.conf and see if your lockups goes away: > > > > vfs.zfs.write_limit_override=1024000000 > ... > > > If it turns out that the write_limit_override tunable is the one > > responsible for stopping the hangs I can drop the ARC limit tunable > > although I'm not sure I want to; I don't see much if any performance > > penalty from leaving it where it is and if the larger cache isn't > > helping anything then why use it? I'm inclined to stick an SSD in the > > cabinet as a cache drive instead of dedicating RAM to this -- even > > though it's not AS fast as RAM it's still MASSIVELY quicker than getting > > data off a rotating plate of rust. > > Now interesting you should say that I've seen a stall recently on ZFS > only box running on 6 x SSD RAIDZ2. > > The stall was caused by fairly large mysql import, with nothing else > running. 
> > Then it happened I thought the machine had wedged, but minutes (not > seconds) later, everything sprung into action again. > I've seen this too. > > Am I correct that a ZFS filesystem does NOT use the VM buffer cache > > at all? > > Correct > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. > > In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 > or return the E.mail to postmaster@multiplay.co.uk. > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Dennis Glatting From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 04:01:46 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BA6EE46D for ; Tue, 5 Mar 2013 04:01:46 +0000 (UTC) (envelope-from prvs=1776ac14af=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 442BFDAD for ; Tue, 5 Mar 2013 04:01:45 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002545216.msg for ; Tue, 05 Mar 2013 04:01:45 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 05 Mar 2013 04:01:45 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1776ac14af=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk 
X-MDaemon-Deliver-To: freebsd-stable@freebsd.org Message-ID: <19488839B29A476C8F731EB8F419AF94@multiplay.co.uk> From: "Steven Hartland" To: "Karl Denninger" References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <8C68812328E3483BA9786EF15591124D@multiplay.co.uk> <513568EE.80006@denninger.net> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? Date: Tue, 5 Mar 2013 04:01:47 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 04:01:46 -0000 ----- Original Message ----- From: "Karl Denninger" >> Then it happened I thought the machine had wedged, but minutes (not >> seconds) later, everything sprung into action again. > > That's exactly what I can reproduce here; the stalls are anywhere from a > few seconds to well north of a half-minute. It looks like the machine > is hung -- but it is not. Out of interest when this happens for you is syncer using lots of CPU? If its anything like my stalls you'll need top loaded prior to the fact. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 04:11:19 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 9A2065DB for ; Tue, 5 Mar 2013 04:11:19 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id 6ECDADF5 for ; Tue, 5 Mar 2013 04:11:18 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r254BGTe042625 for ; Mon, 4 Mar 2013 22:11:17 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Mon Mar 4 22:11:17 2013 Message-ID: <5135705F.6090406@denninger.net> Date: Mon, 04 Mar 2013 22:11:11 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <8C68812328E3483BA9786EF15591124D@multiplay.co.uk> <513568EE.80006@denninger.net> <19488839B29A476C8F731EB8F419AF94@multiplay.co.uk> In-Reply-To: <19488839B29A476C8F731EB8F419AF94@multiplay.co.uk> X-Enigmail-Version: 1.5 X-Antivirus: avast! 
(VPS 130304-2, 03/04/2013), Outbound message X-Antivirus-Status: Clean Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 04:11:19 -0000 On 3/4/2013 10:01 PM, Steven Hartland wrote: > ----- Original Message ----- From: "Karl Denninger" >>> Then it happened I thought the machine had wedged, but minutes (not >>> seconds) later, everything sprung into action again. >> >> That's exactly what I can reproduce here; the stalls are anywhere from a >> few seconds to well north of a half-minute. It looks like the machine >> is hung -- but it is not. > > Out of interest when this happens for you is syncer using lots of CPU? > > If its anything like my stalls you'll need top loaded prior to the fact. > > Regards > Steve Don't know. But the CPU is getting hammered when it happens because I am geli-encrypting all my drives and as a consequence it is not at all uncommon for the load average to be north of 10 when the system is under heavy I/O load. System response is fine right up until it stalls. 
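For anyone wanting to catch one of these stalls as it happens, the "sentinel" idea mentioned earlier in the thread (watching zpool iostat at 5-second intervals for zero bandwidth) could be sketched roughly as below. This is only an illustration, not the script actually in use: the function names and the pool name "tank" are made up, and it assumes the usual seven-column zpool iostat layout (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth) with an idle interval printed as 0 in both bandwidth columns.

```python
import subprocess
import sys


def zero_bandwidth(line, pool):
    """Return True if a `zpool iostat` data line for `pool` shows no I/O.

    Assumed layout (7 whitespace-separated columns):
    pool, alloc, free, read ops, write ops, read bw, write bw.
    Header, separator and other-pool lines return False.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != pool:
        return False
    return fields[5] == "0" and fields[6] == "0"


def longest_stall(lines, pool):
    """Length of the longest run of consecutive zero-bandwidth samples."""
    longest = current = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 7 or fields[0] != pool:
            continue  # headers and separators do not break a run
        if zero_bandwidth(line, pool):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def watch(pool, interval=5):
    """Stream `zpool iostat <pool> <interval>` and log idle samples."""
    cmd = ["zpool", "iostat", pool, str(interval)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if zero_bandwidth(line, pool):
                print("zero-bandwidth sample:", line.strip(), file=sys.stderr)
```

Like top, something along the lines of watch("tank") has to be running before the stall begins. On a box that always has a trickle of writes going to it, several idle samples in a row are the interesting signal, which is what longest_stall() counts when fed captured output.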
I'm going to put some effort into trying to isolate exactly what is going on here in the coming days since I happen to have a spare box in an identical configuration that I can afford to lock up without impacting anyone doing real work :-) -- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 05:05:57 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 636EA7C for ; Tue, 5 Mar 2013 05:05:57 +0000 (UTC) (envelope-from mauzo@anubis.morrow.me.uk) Received: from isis.morrow.me.uk (isis.morrow.me.uk [204.109.63.142]) by mx1.freebsd.org (Postfix) with ESMTP id 42457F7C for ; Tue, 5 Mar 2013 05:05:56 +0000 (UTC) Received: from anubis.morrow.me.uk (host86-177-98-144.range86-177.btcentralplus.com [86.177.98.144]) (Authenticated sender: mauzo) by isis.morrow.me.uk (Postfix) with ESMTPSA id 52167450D5 for ; Tue, 5 Mar 2013 05:05:49 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.7.4 isis.morrow.me.uk 52167450D5 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=morrow.me.uk; s=dkim201101; t=1362459949; bh=zTxULzH7AnUObu/hT2FGwKswOn2+suFODX6YEZ/Iqp0=; h=Date:From:To:Subject:References:In-Reply-To; b=1FOHP8GCxLwB1bGQQ2+FV7jgfgPijKoTBITUvgk7H8qr8XU3giFGF4schd0YwzAEv KLkuk3hu5dHITrY5SJqdzpexWRbhrbfpR/adpqhaA6Ae60uCYGKPUxE1rNnMN9imBI ejIBf1+NHapZIKdfHKWrcGb2D5h4bHVHmu5HS8hk= X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.6 at isis.morrow.me.uk Received: by anubis.morrow.me.uk (Postfix, from userid 5001) id 15CAD932B; Tue, 5 Mar 2013 05:05:47 +0000 (GMT) Date: Tue, 5 Mar 2013 05:05:47 +0000 From: Ben Morrow To: freebsd-stable@freebsd.org Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130305050539.GA52821@anubis.morrow.me.uk> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <51355CFE.7080405@denninger.net> X-Newsgroups: gmane.os.freebsd.stable Organization: morrow.me.uk User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 05:05:57 -0000

Quoth Karl Denninger :
>
> Note that the machine is not booting from ZFS -- it is booting from and
> has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks
> like a single "da0" drive to the OS) and that drive stalls as well when
> it freezes. It's definitely a kernel thing when it happens as the OS
> would otherwise not have locked (just I/O to the user partitions) -- but
> it does.

Is it still the case that mixing UFS and ZFS can cause problems, or were
they all fixed? I remember a while ago (before the arc usage monitoring
code was added) there were a number of reports of serious problems
running an rsync from UFS to ZFS.

If you can it might be worth trying your scratch machine booting from
ZFS. Probably the best way is to leave your swap partition where it is
(IMHO it's not worth trying to swap onto a zvol) and convert the UFS
partition into a separate zpool to boot from. You will also need to
replace the boot blocks; assuming you're using GPT you can do this with
gpart bootcode -p /boot/gptzfsboot -i .
Ben From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 05:32:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 2CE8482F for ; Tue, 5 Mar 2013 05:32:52 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id D910D109 for ; Tue, 5 Mar 2013 05:32:51 +0000 (UTC) Received: from omta05.emeryville.ca.mail.comcast.net ([76.96.30.43]) by qmta01.emeryville.ca.mail.comcast.net with comcast id 7RqP1l0060vp7WLA1hYqG6; Tue, 05 Mar 2013 05:32:50 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta05.emeryville.ca.mail.comcast.net with comcast id 7hYp1l00F1t3BNj8RhYpep; Tue, 05 Mar 2013 05:32:49 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4572573A31; Mon, 4 Mar 2013 21:32:49 -0800 (PST) Date: Mon, 4 Mar 2013 21:32:49 -0800 From: Jeremy Chadwick To: Ben Morrow Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130305053249.GA38107@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130305050539.GA52821@anubis.morrow.me.uk> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362461570; bh=nFa860JsFsflesZ/PCtuoEUZ3aENOQ8OxwIyC96Ncbg=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=By9xfZg+4LonIoHhi/a/XCBjGkuu83fqPy36pow11791eB8xZbI82NPo+SsbCbLXn EeT5Zu/VustLEfJmp6kTrPEEdZE9mCvR19yYaHxja2xBGKIyr6ECYgXyW/l2nre5AF lKx97ya0eXBqm5Lud7nRj1/HO3UXBxr5oEwC/YZ7biOaKufl7d77FQcewdKWEbcqac 2LCEBGvn+dPTh8AZNaIsFEgwWXaLfeyQ4qvv9qHKBnAi6vWUMs6giz1VcIu9ANnVK8 e+AmeC6gKRXmAPXvRMXuCiPy5QtGuGHcRX6jZPCncd0A8v/XfkV/UDD0S2vhyvJxB4 /3+ZOrqAwMiBw== Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 05:32:52 -0000 On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote: > Quoth Karl Denninger : > > > > Note that the machine is not booting from ZFS -- it is booting from and > > has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks > > like a single "da0" drive to the OS) and that drive stalls as well when > > it freezes. It's definitely a kernel thing when it happens as the OS > > would otherwise not have locked (just I/O to the user partitions) -- but > > it does. > > Is it still the case that mixing UFS and ZFS can cause problems, or were > they all fixed? I remember a while ago (before the arc usage monitoring > code was added) there were a number of reports of serious probles > running an rsync from UFS to ZFS. 
This problem still exists on stable/9. The behaviour manifests itself
as fairly bad performance (I cannot remember if stalling or if just
throughput rates were awful). I can only speculate as to what the root
cause is, but my guess is that it has something to do with the two
caching systems (UFS vs. ZFS ARC) fighting over large sums of memory.

The advice I've given people in the past is: if you do a LOT of I/O
between UFS and ZFS on the same box, it's time to move to 100% ZFS.
That said, I still do not recommend ZFS for a root filesystem (this
biting people still happens even today), and swap-on-ZFS is a huge
no-no.

I will note that I myself use pure UFS+SU (not SUJ) for my main OS
installation (that means /, swap, /var, /tmp, and /usr) on a dedicated
SSD, while everything else is ZFS raidz1 (no dedup, no compression;
won't ever enable these until that thread priority problem is fixed on
FreeBSD). However, when I was migrating from gmirror+UFS+SU to ZFS, I
witnessed what I described in my 1st and 2nd paragraphs. What userland
utilities were used (rsync vs. cp) made no difference; the problem is
in the kernel.

Footnote about this thread:

This thread contains all sorts of random pieces of information about
systems, with very little actual detail in them (barring the symptoms,
which are always useful to know!). For example, just because your
machine has 8 cores and 12GB of RAM doesn't mean jack squat if some
software in the kernel is designed "oddly". Reworded: throwing more
hardware at a problem solves nothing.

The most useful thing (for me) that I found was deep within the thread,
a few words along the lines of "De-dup isn't used". What about
compression, and if it's *ever* been enabled on the filesystem (even if
not presently enabled)? It matters. All this matters.
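As a rough way of answering the compression/dedup question for every dataset at once, something like the sketch below could be run over the tab-separated output of zfs get. It is an illustration only: the function name and POOL are placeholders, and the reading of compressratio is an assumption on my part (a ratio other than 1.00x on a dataset whose compression is currently off suggests data was written while it was on).

```python
def datasets_of_interest(zfs_get_output):
    """Map dataset name -> properties hinting at compression or dedup.

    Expects the tab-separated text produced by something like:
        zfs get -rH -o name,property,value \
            compression,compressratio,dedup POOL
    """
    flagged = {}
    for line in zfs_get_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a name/property/value row
        name, prop, value = parts
        if value == "-":
            continue  # property does not apply to this dataset type
        if prop == "compressratio":
            interesting = value != "1.00x"
        else:
            interesting = value != "off"
        if interesting:
            flagged.setdefault(name, []).append(f"{prop}={value}")
    return flagged
```

An empty result would mean nothing is enabled now and no dataset reports compressed data; anything listed is worth mentioning when reporting a stall. For dedup history the pool-wide dedupratio from zpool get would be the analogous thing to look at.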
I see lots of end-users talking about these problems, but (barring
Steven) literally no "kernel people" who are "in the know" about ZFS
mentioning how said users can get them (devs) info that can help track
this down. Those devs live on freebsd-fs@ and freebsd-hackers@, and not
too many read freebsd-stable@.

Step back for a moment and look at this anti-KISS configuration:

- Hardware RAID controller involved (Areca 1680ix)
- Hardware RAID controller has its own battery-backed cache (2GB)
- Therefore arcmsr(4) is involved -- revision of driver/OS build
  matters here, ditto with firmware version
- 4 disks are involved, models unknown
- Disks are GPT and are partitioned, and ZFS refers to the partitions
  not the raw disk -- this matters (honest, it really does; the ZFS
  code handles things differently with raw disks)
- Providers are GELI-encrypted

Now ask yourself if any dev is really going to tackle this one given the
above mess.

My advice would be to get rid of the hardware RAID (go with Intel ICHxx
or ESBx on-board with AHCI), use raw disks for ZFS (if 4096-byte sector
disks use the gnop(8) method, which is a one-time thing), and get rid of
GELI. If you can reproduce the problem there 100% of the time, awesome,
it's a clean/clear setup for someone to help investigate.

--
| Jeremy Chadwick jdc@koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 05:40:40 2013 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B7CEDABD for ; Tue, 5 Mar 2013 05:40:40 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 5B269162 for ; Tue, 5 Mar 2013 05:40:40 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.5/8.14.5) with ESMTP id r255ecCC083743; Tue, 5 Mar 2013 00:40:38 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.5/8.14.4/Submit) id r255ecEC083742; Tue, 5 Mar 2013 00:40:38 -0500 (EST) (envelope-from wollman) Date: Tue, 5 Mar 2013 00:40:38 -0500 (EST) From: Garrett Wollman Message-Id: <201303050540.r255ecEC083742@hergotha.csail.mit.edu> To: killing@multiplay.co.uk Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
In-Reply-To: <8C68812328E3483BA9786EF15591124D@multiplay.co.uk> References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> Organization: none X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Tue, 05 Mar 2013 00:40:39 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 05:40:40 -0000 In article <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>, killing@multiplay.co.uk writes: >Now interesting you should say that I've seen a stall recently on ZFS >only box running on 6 x SSD RAIDZ2. > >The stall was caused by fairly large mysql import, with nothing else >running. > >Then it happened I thought the machine had wedged, but minutes (not >seconds) later, everything sprung into action again. I have certainly seen what you might describe as "stalls", caused, so far as I can tell, by kernel memory starvation. I've seen it take as much as a half an hour to recover from these (which is too long for my users). Right now I have the ARC limited to 64 GB (on a 96 GB file server) and that has made it more stable, but it's still not behaving quite as I would like, and I'm looking to put more memory into the system (to be used for non-ARC functions). Looking at my munin graphs, I find that backups in particular put very heavy pressure on, doubling the UMA allocations over steady-state, and this takes about four or five hours to climb back down. See for an example. Some of the stalls are undoubtedly caused by internal fragmentation rather than actual data in use. 
(Solaris used to have this issue, and some hooks were added to allow some amount of garbage collection with the cooperation of the filesystem.) -GAWollman From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 09:13:01 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id F3BC1722 for ; Tue, 5 Mar 2013 09:13:00 +0000 (UTC) (envelope-from prvs=1776ac14af=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 9105ECC1 for ; Tue, 5 Mar 2013 09:13:00 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002549346.msg for ; Tue, 05 Mar 2013 09:12:58 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Tue, 05 Mar 2013 09:12:58 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1776ac14af=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-stable@freebsd.org Message-ID: <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" , "Ben Morrow" References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Date: Tue, 5 Mar 2013 09:12:47 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 09:13:01 -0000 ----- Original Message ----- From: "Jeremy Chadwick" To: "Ben Morrow" Cc: Sent: Tuesday, March 05, 2013 5:32 AM Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? > On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote: >> Quoth Karl Denninger : >> > >> > Note that the machine is not booting from ZFS -- it is booting from and >> > has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks >> > like a single "da0" drive to the OS) and that drive stalls as well when >> > it freezes. It's definitely a kernel thing when it happens as the OS >> > would otherwise not have locked (just I/O to the user partitions) -- but >> > it does. >> >> Is it still the case that mixing UFS and ZFS can cause problems, or were >> they all fixed? I remember a while ago (before the arc usage monitoring >> code was added) there were a number of reports of serious probles >> running an rsync from UFS to ZFS. > > This problem still exists on stable/9. The behaviour manifests itself > as fairly bad performance (I cannot remember if stalling or if just > throughput rates were awful). I can only speculate as to what the root > cause is, but my guess is that it has something to do with the two > caching systems (UFS vs. ZFS ARC) fighting over large sums of memory. In our case we have no UFS, so this isn't the cause of the stalls. 
Spec here is
* 64GB RAM
* LSI 2008
* 8.3-RELEASE
* Pure ZFS
* Trigger MySQL doing a DB import, nothing else running.
* 4K disk alignment

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 09:27:02 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 06421CC5 for ; Tue, 5 Mar 2013 09:27:02 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta08.emeryville.ca.mail.comcast.net (qmta08.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:80]) by mx1.freebsd.org (Postfix) with ESMTP id DF22FD92 for ; Tue, 5 Mar 2013 09:27:01 +0000 (UTC) Received: from omta11.emeryville.ca.mail.comcast.net ([76.96.30.36]) by qmta08.emeryville.ca.mail.comcast.net with comcast id 7lT11l0020mlR8UA8lT1qF; Tue, 05 Mar 2013 09:27:01 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta11.emeryville.ca.mail.comcast.net with comcast id 7lT01l0081t3BNj8XlT0Md; Tue, 05 Mar 2013 09:27:01 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 58E0A73A31; Tue, 5 Mar 2013 01:27:00 -0800 (PST) Date: Tue, 5 Mar 2013 01:27:00 -0800 From: Jeremy Chadwick To: Steven Hartland Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Message-ID: <20130305092700.GA43045@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362475621; bh=dYYv9BAeH+HWOmMEGRGBsAvlgYtOhLkfcnuVO5BYspQ=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=XbaVRiCBLxh+Jc6Zdu0n5wKjMt3lAPmeRpqI0phbgQIt4Pt8sRMcXUwFeJovjw2El oKMQ3fx79olshuCIrJ4K6uvvmwWbQJO4mm781+05++73/qF9nQ0fEqE5aAKPmLi2eq m1354A4KrG2NqDVYqiv9oAdED13X1LtcIbG7No/CtouQ0oiieEb0WEcpsr1u7a6n0s ehR1vXDgnalEoE+rxlmIaVz6apGkH7m3x+8bX1BWMzrVtmUXps6l4VDw7ChIzti6Ye qAysqwwWnH51dcTstkgLne20fTSZXm5g8NZfMd/wvrQRqzOzS74C4w1Ca33/K3MF3o 5R+89YDJxkuDQ== Cc: Ben Morrow , freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 09:27:02 -0000 On Tue, Mar 05, 2013 at 09:12:47AM -0000, Steven Hartland wrote: > > ----- Original Message ----- From: "Jeremy Chadwick" > > To: "Ben Morrow" > Cc: > Sent: Tuesday, March 05, 2013 5:32 AM > Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? > > > >On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote: > >>Quoth Karl Denninger : > >>> > Note that the machine is not booting from ZFS -- it is > >>booting from and > >>> has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks > >>> like a single "da0" drive to the OS) and that drive stalls as well when > >>> it freezes. 
It's definitely a kernel thing when it happens as the OS
> >>> would otherwise not have locked (just I/O to the user partitions) -- but
> >>> it does.
> >>
> >>Is it still the case that mixing UFS and ZFS can cause problems, or were
> >>they all fixed? I remember a while ago (before the arc usage monitoring
> >>code was added) there were a number of reports of serious problems
> >>running an rsync from UFS to ZFS.
> >
> >This problem still exists on stable/9. The behaviour manifests itself
> >as fairly bad performance (I cannot remember if stalling or if just
> >throughput rates were awful). I can only speculate as to what the root
> >cause is, but my guess is that it has something to do with the two
> >caching systems (UFS vs. ZFS ARC) fighting over large sums of memory.
>
> In our case we have no UFS, so this isn't the cause of the stalls.
> Spec here is
> * 64GB RAM
> * LSI 2008
> * 8.3-RELEASE
> * Pure ZFS
> * Trigger MySQL doing a DB import, nothing else running.
> * 4K disk alignment

1. Is compression enabled? Has it ever been enabled (on any fs) in the
past (barring pool being destroyed + recreated)?

2. Is dedup enabled? Has it ever been enabled (on any fs) in the past
(barring pool being destroyed + recreated)?

I can speculate day and night about what could cause this kind of issue,
honestly. The possibilities are quite literally infinite, and all of
them require folks deeply familiar with both FreeBSD's ZFS as well as
very key/major parts of the kernel (ranging from VM to interrupt
handlers to I/O subsystem). (This next comment isn't for you, Steve,
you already know this :-) ) The way different pieces of the kernel
interact with one another is fairly complex; the kernel is not simple.

Things I think that might prove useful:

* Describing the stall symptoms; what all does it impact? Can you
  switch VTYs on console when it's happening? Network I/O (e.g. SSH'd
  into the same box and just holding down a letter) showing stalls
  then catching up?
Things of this nature. * How long the stall is in duration (ex. if there's some way to roughly calculate this using "date" in a shell script) * Contents of /etc/sysctl.conf and /boot/loader.conf (re: "tweaking" of the system) * "sysctl -a | grep zfs" before and after a stall -- do not bother with those "ARC summaries" scripts please, at least not for this * "vmstat -z" before and after a stall * "vmstat -m" before and after a stall * "vmstat -s" before and after a stall * "vmstat -i" before, after, AND during a stall Basically, every person who experiences this problem needs to treat every situation uniquely -- no "me too" -- and try to find reliable 100% test cases for it. That's the only way bugs of this nature (i.e. of a complex nature) get fixed. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 11:09:48 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 209103C9 for ; Tue, 5 Mar 2013 11:09:48 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 736F775C for ; Tue, 5 Mar 2013 11:09:47 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA20941; Tue, 05 Mar 2013 13:09:41 +0200 (EET) (envelope-from avg@FreeBSD.org) Message-ID: <5135D275.3050500@FreeBSD.org> Date: Tue, 05 Mar 2013 13:09:41 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130220 Thunderbird/17.0.3 MIME-Version: 1.0 To: Jeremy Chadwick Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> In-Reply-To: <20130305053249.GA38107@icarus.home.lan> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-stable@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 11:09:48 -0000 Completely unrelated to the main thread: on 05/03/2013 07:32 Jeremy Chadwick said the following: > That said, I still do not recommend ZFS for a root filesystem Why? > (this biting people still happens even today) What exactly? > - Disks are GPT and are *partitioned, and ZFS refers to the partitions > not the raw disk -- this matters (honest, it really does; the ZFS > code handles things differently with raw disks) Not on FreeBSD as far I can see. P.S. I completely agree with your suggestions on simplifying the setup and gathering objective information for the purpose of debugging the issue. I also completely agree that "me too"-ing is not very useful (and often completely incorrect) for the complex problems like this one. Thank you. 
-- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 12:56:09 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A2039417 for ; Tue, 5 Mar 2013 12:56:09 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id 54EB8CB6 for ; Tue, 5 Mar 2013 12:56:08 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r25Cu7ps066311 for ; Tue, 5 Mar 2013 06:56:07 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Tue Mar 5 06:56:07 2013 Message-ID: <5135EB62.6060006@denninger.net> Date: Tue, 05 Mar 2013 06:56:02 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> <20130305092700.GA43045@icarus.home.lan> In-Reply-To: <20130305092700.GA43045@icarus.home.lan> X-Enigmail-Version: 1.5 X-Antivirus: avast! 
(VPS 130304-2, 03/04/2013), Outbound message X-Antivirus-Status: Clean Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 12:56:09 -0000 On 3/5/2013 3:27 AM, Jeremy Chadwick wrote: > On Tue, Mar 05, 2013 at 09:12:47AM -0000, Steven Hartland wrote: >> ----- Original Message ----- From: "Jeremy Chadwick" >> >> To: "Ben Morrow" >> Cc: >> Sent: Tuesday, March 05, 2013 5:32 AM >> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? >> >> >>> On Tue, Mar 05, 2013 at 05:05:47AM +0000, Ben Morrow wrote: >>>> Quoth Karl Denninger : >>>>>> Note that the machine is not booting from ZFS -- it is >>>> booting from and >>>>> has its swap on a UFS 2-drive mirror (handled by the disk adapter; looks >>>>> like a single "da0" drive to the OS) and that drive stalls as well when >>>>> it freezes. It's definitely a kernel thing when it happens as the OS >>>>> would otherwise not have locked (just I/O to the user partitions) -- but >>>>> it does. >>>> Is it still the case that mixing UFS and ZFS can cause problems, or were >>>> they all fixed? I remember a while ago (before the arc usage monitoring >>>> code was added) there were a number of reports of serious probles >>>> running an rsync from UFS to ZFS. >>> This problem still exists on stable/9. The behaviour manifests itself >>> as fairly bad performance (I cannot remember if stalling or if just >>> throughput rates were awful). I can only speculate as to what the root >>> cause is, but my guess is that it has something to do with the two >>> caching systems (UFS vs. ZFS ARC) fighting over large sums of memory. >> In our case we have no UFS, so this isn't the cause of the stalls. 
>> Spec here is >> * 64GB RAM >> * LSI 2008 >> * 8.3-RELEASE >> * Pure ZFS >> * Trigger MySQL doing a DB import, nothing else running. >> * 4K disk alignment > 1. Is compression enabled? Has it ever been enabled (on any fs) in the > past (barring pool being destroyed + recreated)? > > 2. Is dedup enabled? Has it ever been enabled (on any fs) in the past > (barring pool being destroyed + recreated)? > > I can speculate day and night about what could cause this kind of issue, > honestly. The possibilities are quite literally infinite, and all of > them require folks deeply familiar with both FreeBSD's ZFS as well as > very key/major parts of the kernel (ranging from VM to interrupt > handlers to I/O subsystem). (This next comment isn't for you, Steve, > you already know this :-) ) The way different pieces of the kernel > interact with one another is fairly complex; the kernel is not simple. > > Things I think that might prove useful: > > * Describing the stall symptoms; what all does it impact? Can you > switch VTYs on console when its happening? Network I/O (e.g. SSH'd > into the same box and just holding down a letter) showing stalls > then catching up? Things of this nature. When it happens on my system anything that is CPU-bound continues to execute. I can switch consoles and network I/O also works. If I have an iostat running at the time all I/O counters go to and remain at zero while the stall is occurring, but the process that is producing the iostat continues to run and emit characters whether it is a ssh session or on the physical console. The CPUs are running and processing, but all threads block if they attempt access to the disk I/O subsystem, irrespective of the portion of the disk I/O subsystem they attempt to access (e.g. UFS, swap or ZFS) I therefore cannot start any new process that requires image activation. > * How long the stall is in duration (ex. if there's some way to > roughly calculate this using "date" in a shell script) They're variable. 
Some last fractions of a second and are not really all that noticeable unless you happen to be paying CLOSE attention. Some last a few (5 or so) seconds. The really bad ones last long enough that the kernel throws the message "swap_pager: indefinite wait buffer".

The machine in the general sense never pages. It contains 12GB of RAM but historically (prior to ZFS being put into service) always showed "0" for a "pstat -s", although it does have a 20g raw swap partition (to /dev/da0s1b, not to a zpool) allocated.

During the stalls I cannot run a pstat (I tried; it stalls) but when it unlocks I find that there is swap allocated, albeit not a ridiculous amount. ~20,000 pages or so have made it to the swap partition. This is not behavior that I had seen before on this machine prior to the stall problem, and with the two tuning tweaks discussed here I'm now up to 48 hours without any allocation to swap (or any stalls.)

> * Contents of /etc/sysctl.conf and /boot/loader.conf (re: "tweaking"
>   of the system)

/boot/loader.conf:

kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256
geom_eli_load="YES"
sound_load="YES"
#
# Limit to physical CPU count for threads
#
kern.geom.eli.threads=8
#
# ZFS Prefetch does help, although you'd think it would not due to the adapter
# doing it already. Wrong guess; it's good for 2x the performance.
# We limit the ARC to 2GB of RAM and the TXG write limit to 1GB.
#
#vfs.zfs.prefetch_disable="1"
vfs.zfs.arc_max=2000000000
vfs.zfs.write_limit_override=1024000000
--------------------------------

The first three are required for Postgres.

The geli thread limit has been found to provide better performance under heavy load, as the system will otherwise start 16 threads per geli-attached provider since the CPUs support hyperthreading. The two ZFS-related entries at the end, if present, stop the stalls.
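For a sense of scale: the two ZFS tunables in the loader.conf above are written as round decimal byte counts, so they come out a little under the binary "2GB" and "1GB" that the comment mentions. A minimal Python sketch of that arithmetic follows; the helper name to_gib and the dictionary are purely illustrative and not part of any FreeBSD tool.

```python
# Convert the two loader.conf values quoted above from bytes to GiB.
# The helper name and the dictionary are illustrative only.
GIB = 1024 ** 3  # bytes in one binary gigabyte


def to_gib(nbytes):
    """Return nbytes expressed in GiB."""
    return nbytes / GIB


tunables = {
    "vfs.zfs.arc_max": 2000000000,
    "vfs.zfs.write_limit_override": 1024000000,
}

for name, value in tunables.items():
    print(f"{name} = {value} bytes = {to_gib(value):.2f} GiB")
```

The difference is small and almost certainly irrelevant to the stalls themselves; it only matters when comparing these settings against figures reported in binary units.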
Geli is not used on the boot pack; da0 is an old-style MBR disk that is physically comprised of two 300MB drives in a mirror managed by the adapter. Swap resides on the traditional "b" slice of that pack; it is a reasonably-standard "old-style" setup in that regard with separate root, /home, /var and /usr slices.

sysctl.conf contains:

# $FreeBSD: src/etc/sysctl.conf,v 1.8 2003/03/13 18:43:50 mux Exp $
#
# This file is read when going to multi-user and its contents piped thru
# ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
#
# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
#
# tuning for PostgreSQL
#
kern.ipc.shm_use_phys=1
kern.ipc.shmmax=4096000000
kern.ipc.shmall=1000000
kern.ipc.semmsl=512
kern.ipc.semmap=256
#
# IP Performance
#
kern.ipc.somaxconn=4096
kern.ipc.nmbclusters=32768
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
net.inet.tcp.inflight.enable=1
#
# Tune for asshole (DDOS) resistance
#
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.icmplim=10
net.inet.tcp.imcp_may_rst=0
net.inet.tcp.drop_synfin=1
net.inet.tcp.msl=7500
#
# Maxfiles
#
kern.maxfiles=65535

I suspect (but can't yet prove) that wiring shared memory is likely involved in this. That makes a BIG difference in Postgres performance, but I can certainly see where a misbehaving ARC cache could "think" that the (rather large) shared segment that Postgres has (it currently allocates 1.5G of shared memory and wires it) can or might "get out of the way."

But it most-certainly won't with kern.ipc.shm_use_phys set.

In normal operation that Postgres server is a hot-spare replication machine that connects to Asheville; in the event of a catastrophic failure there it would be promoted and the load would shift here.
> * "sysctl -a | grep zfs" before and after a stall -- do not bother
>   with those "ARC summaries" scripts please, at least not for this
> * "vmstat -z" before and after a stall
> * "vmstat -m" before and after a stall
> * "vmstat -s" before and after a stall
> * "vmstat -i" before, after, AND during a stall
>
> Basically, every person who experiences this problem needs to treat
> every situation uniquely -- no "me too" -- and try to find reliable 100%
> test cases for it. That's the only way bugs of this nature (i.e.
> of a complex nature) get fixed.

I am fortunate enough to have an identical machine that's "cold" in the rack and will work on spinning that up today; I'm going to attach another pack to the backup and allow it to resilver, then use that "in anger" to restore the spare box. I'm quite sure I can reproduce the workload that causes the stalls; populating the backup pack as a separate zfs pool (with zfs send | zfs recv) was what led to it happening here originally.

With that said I've got more than 24 hours on the box that exhibited the problem with the two tunables in /boot/loader.conf and a sentinel process that is doing a zpool iostat 5 looking for more than one "all zeros" I/O line sequentially.

It hasn't happened since I stuck those two lines in there, and at this point two nightly backup runs have gone to completion along with some fairly heavy user I/O last evening, which was plenty of load to provoke the misbehavior previously.
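The sentinel itself is not shown in the message. As a rough sketch only, the check described above (more than one consecutive "all zeros" line from zpool iostat 5) could be written in Python as below. Everything here is an assumption for illustration: the seven-column layout (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth), the function names, the pool name "tank", and the sample lines, which are made up and are not output from the affected machine.

```python
def io_columns(line):
    """Return the four I/O columns of a data line, or None for headers."""
    fields = line.split()
    # Assumed layout: pool alloc free read-ops write-ops read-bw write-bw.
    # Header and separator rows do not start their I/O fields with a digit.
    if len(fields) != 7 or not all(f[0].isdigit() for f in fields[3:]):
        return None
    return fields[3:]


def zero_runs(lines, threshold=2):
    """Yield the length of each run of >= threshold all-zero samples."""
    run = 0
    for line in lines:
        cols = io_columns(line)
        if cols is None:
            continue  # skip header and separator rows
        if all(c == "0" for c in cols):
            run += 1
            continue
        if run >= threshold:
            yield run
        run = 0
    if run >= threshold:
        yield run


# Made-up sample input, for illustration only.
sample = """\
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        1.20T   600G     12     40   1.1M   3.2M
tank        1.20T   600G      0      0      0      0
tank        1.20T   600G      0      0      0      0
tank        1.20T   600G      0      0      0      0
tank        1.20T   600G      5     18   400K   1.5M
tank        1.20T   600G      0      0      0      0
tank        1.20T   600G      7     22   512K   2.0M
""".splitlines()

INTERVAL = 5  # seconds, matching "zpool iostat 5"
for n in zero_runs(sample):
    print(f"{n} consecutive all-zero samples (~{n * INTERVAL}s)")
```

To watch a live system, the same generator could be fed sys.stdin with zpool iostat 5 piped into the script. Multiplying the run length by the sampling interval also gives a rough stall duration, which is one of the data points requested earlier in the thread. A completely idle pool produces all-zero lines too, so this is only meaningful while a known workload is running.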
-- -- Karl Denninger /The Market Ticker ®/ Cuda Systems LLC From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 14:10:57 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0C13EB0A for ; Tue, 5 Mar 2013 14:10:57 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ee0-f44.google.com (mail-ee0-f44.google.com [74.125.83.44]) by mx1.freebsd.org (Postfix) with ESMTP id 90CD8192 for ; Tue, 5 Mar 2013 14:10:56 +0000 (UTC) Received: by mail-ee0-f44.google.com with SMTP id l10so4656177eei.17 for ; Tue, 05 Mar 2013 06:10:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=NIalEIDpvWOUHsIVgYhDi1W3HuabM43KbdU6LzZihbY=; b=l2hqcm7GwxoOmaWXGVZRNueqg9olNIF8FOJfi5z8/1gp34HMmVfybZWi9e9TR5gJkV FI2Yo7juC9NPASaM0aUXk5H6bVW4Fcb0DsKxazarD89jPbRommY/LCq0yh/bGfHwFeMA ULT44GUodu0VE43QWV0AHN1oDYQBjsY26KJt1L/jU5x63EASfjK0s2vKfqspRwWnv+e9 2R/GJYd/HbvKzCfnWi1L+v794kRl5aawVYEsIJXzztcNc7RRRXMZ3wAjqgNkb73gX2e2 ZnxFNr++d31hxYQfC9+V5pNBO6Rprod0DjWqPvWvLWSq35zZG/1oeME6CHd9nrZSMPd/ up1w== X-Received: by 10.14.207.73 with SMTP id m49mr71243646eeo.24.1362492655243; Tue, 05 Mar 2013 06:10:55 -0800 (PST) Received: from [192.168.1.129] (schavemaker.nl. 
[213.84.84.186]) by mx.google.com with ESMTPS id o3sm37608293eem.15.2013.03.05.06.10.53 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 05 Mar 2013 06:10:54 -0800 (PST) Message-ID: <5135FCEC.7090105@gmail.com> Date: Tue, 05 Mar 2013 15:10:52 +0100 From: Johan Hendriks User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable Subject: make_dev_physpath_alias Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 14:10:57 -0000

Hello all.

I have a Supermicro 16-bay box with an LSI 9211-8i card. We use it for temporary data storage, and we wanted to try LZ4 compression.

After updating the source tree to r247839 and doing a make buildworld cycle, everything works fine, but at boot time we get some warnings:

make_dev_physpath_alias: WARNING - Unable to alias gptid/281951f4-a996-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@1/elmdesc@Slot_01/gptid/281951f4-a996-11e1-83eb-00259061b51a - path too long

Do I need to worry?

Gr
johan

Here is my full dmesg:

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-STABLE #0 r247839: Tue Mar 5 13:17:25 CET 2013 root@backup.neuteboom.com:/usr/obj/usr/src/sys/KRNL amd64 CPU: Intel(R) Xeon(R) CPU E31220 @ 3.10GHz (3093.04-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x206a7 Family = 0x6 Model = 0x2a Stepping = 7 Features=0xbfebfbff Features2=0x1fbae3ff AMD Features=0x28100800 AMD Features2=0x1 TSC: P-state invariant, performance statistics real memory = 34359738368 (32768 MB) avail memory = 33075630080 (31543 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs FreeBSD/SMP: 1 package(s) x 4 core(s) cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 2 cpu2 (AP): APIC ID: 4 cpu3 (AP): APIC ID: 6 ioapic0 irqs 0-23 on motherboard kbd1 at kbdmux0 ctl: CAM Target Layer loaded acpi0: on motherboard acpi0: Power Button (fixed) acpi0: reservation of 67, 1 (4) failed cpu0: on acpi0 cpu1: on acpi0 cpu2: on acpi0 cpu3: on acpi0 hpet0: iomem 0xfed00000-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer "HPET" frequency 14318180 Hz quality 550 Event timer "HPET1" frequency 14318180 Hz quality 440 Event timer "HPET2" frequency 14318180 Hz quality 440 Event timer "HPET3" frequency 14318180 Hz quality 440 Event timer "HPET4" frequency 14318180 Hz quality 440 atrtc0: port 0x70-0x77 irq 8 on acpi0 atrtc0: Warning: Couldn't map I/O. 
Event timer "RTC" frequency 32768 Hz quality 0 attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: irq 19 at device 6.0 on pci0 pci1: on pcib1 mps0: port 0xe000-0xe0ff mem 0xf7600000-0xf7603fff,0xf7580000-0xf75bffff irq 19 at device 0.0 on pci1 mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd mps0: IOCCapabilities: 1285c em0: port 0xf020-0xf03f mem 0xf7800000-0xf781ffff,0xf7825000-0xf7825fff irq 20 at device 25.0 on pci0 em0: Using an MSI interrupt em0: Ethernet address: 00:25:90:75:c8:09 ehci0: mem 0xf7824000-0xf78243ff irq 16 at device 26.0 on pci0 usbus0: EHCI version 1.0 usbus0 on ehci0 pcib2: irq 16 at device 28.0 on pci0 pci2: on pcib2 pcib3: irq 16 at device 28.4 on pci0 pci3: on pcib3 em1: port 0xd000-0xd01f mem 0xf7700000-0xf771ffff,0xf7720000-0xf7723fff irq 16 at device 0.0 on pci3 em1: Using MSIX interrupts with 3 vectors em1: Ethernet address: 00:25:90:75:c8:08 ehci1: mem 0xf7823000-0xf78233ff irq 23 at device 29.0 on pci0 usbus1: EHCI version 1.0 usbus1 on ehci1 pcib4: at device 30.0 on pci0 pci4: on pcib4 vgapci0: mem 0xf5000000-0xf5ffffff,0xf7000000-0xf7003fff,0xf6800000-0xf6ffffff irq 19 at device 3.0 on pci4 isab0: at device 31.0 on pci0 isa0: on isab0 ahci0: port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f mem 0xf7822000-0xf78227ff irq 19 at device 31.2 on pci0 ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported ahcich0: at channel 0 on ahci0 ahcich1: at channel 1 on ahci0 ahcich2: at channel 2 on ahci0 ahcich4: at channel 4 on ahci0 ahcich5: at channel 5 on ahci0 pci0: at device 31.3 (no driver attached) acpi_button0: on acpi0 acpi_tz0: on acpi0 acpi_tz1: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on 
atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: model IntelliMouse Explorer, device ID 4 uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 uart2: <16550 or compatible> port 0x3e8-0x3ef irq 10 on acpi0 orm0: at iomem 0xc0000-0xc7fff,0xce000-0xcefff on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ppc0: cannot reserve I/O port range est0: on cpu0 p4tcc0: on cpu0 est1: on cpu1 p4tcc1: on cpu1 est2: on cpu2 p4tcc2: on cpu2 est3: on cpu3 p4tcc3: on cpu3 ZFS filesystem version: 5 ZFS storage pool version: features support (5000) Timecounters tick every 1.000 msec usbus0: 480Mbps High Speed USB v2.0 usbus1: 480Mbps High Speed USB v2.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ada0 at ahcich0 bus 0 scbus1 target 0 lun 0 ada0: ATA-8 SATA 2.x device ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada0: Command Queueing enabled ada0: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C) ada0: Previously was known as ad4 ada1 at ahcich1 bus 0 scbus2 target 0 lun 0 ada1: ATA-8 SATA 2.x device ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) ada1: Command Queueing enabled ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C) ada1: Previously was known as ad6 ses0 at mps0 bus 0 scbus0 target 8 lun 0 ses0: Fixed Enclosure Services SCSI-5 device ses0: 600.000MB/s transfers ses0: Command Queueing enabled ses0: SCSI-3 ENC Device ses0: probe9,pass1,da0: Element descriptor: 'Slot 01' ses0: probe9,pass1,da0: SAS Device Slot Element: 1 Phys at Slot 0 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ec ses0: probe10,pass2,da1: Element descriptor: 'Slot 02' ses0: probe10,pass2,da1: SAS Device Slot Element: 1 Phys at Slot 1 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff 
addr 500304800182f4ed ses0: probe11,pass3,da2: Element descriptor: 'Slot 03' ses0: probe11,pass3,da2: SAS Device Slot Element: 1 Phys at Slot 2 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ee ses0: probe12,pass4,da3: Element descriptor: 'Slot 04' ses0: probe12,pass4,da3: SAS Device Slot Element: 1 Phys at Slot 3 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ef ses0: probe13,pass5,da4: Element descriptor: 'Slot 05' ses0: probe13,pass5,da4: SAS Device Slot Element: 1 Phys at Slot 4 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff addr 500304800182f4f0 ses0: probe14,pass6,da5: Element descriptor: 'Slot 06' ses0: probe14,pass6,da5: SAS Device Slot Element: 1 Phys at Slot 5 ses0: phy 0: SATA device ses0: phy 0: parent 500304800182f4ff addr 500304800182f4f1 SMP: AP CPU #3 Launched! SMP: AP CPU #1 Launched! SMP: AP CPU #2 Launched! Timecounter "TSC-low" frequency 1546522006 Hz quality 1000 da4 at mps0 bus 0 scbus0 target 13 lun 0 da4: Fixed Direct Access SCSI-6 device da4: 600.000MB/s transfers da4: Command Queueing enabled da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) da3 at mps0 bus 0 scbus0 target 12 lun 0 da3: Fixed Direct Access SCSI-6 device da3: 600.000MB/s transfers da3: Command Queueing enabled da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) da0 at mps0 bus 0 scbus0 target 9 lun 0 da0: Fixed Direct Access SCSI-6 device da0: 600.000MB/s transfers da0: Command Queueing enabled da0: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) da5 at mps0 bus 0 scbus0 target 14 lun 0 da5: Fixed Direct Access SCSI-6 device da5: 600.000MB/s transfers da5: Command Queueing enabled da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) da2 at mps0 bus 0 scbus0 target 11 lun 0 da2: Fixed Direct Access SCSI-6 device da2: 600.000MB/s transfers da2: Command Queueing enabled da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) da1 at 
mps0 bus 0 scbus0 target 10 lun 0 da1: Fixed Direct Access SCSI-6 device da1: 600.000MB/s transfers da1: Command Queueing enabled da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) uhub0: 2 ports with 2 removable, self powered uhub1: 2 ports with 2 removable, self powered make_dev_physpath_alias: WARNING - Unable to alias gptid/281951f4-a996-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@1/elmdesc@Slot_01/gptid/281951f4-a996-11e1-83eb-00259061b51a - path too long make_dev_physpath_alias: WARNING - Unable to alias gptid/87f6d404-a997-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@2/elmdesc@Slot_02/gptid/87f6d404-a997-11e1-83eb-00259061b51a - path too long make_dev_physpath_alias: WARNING - Unable to alias gptid/affdfc28-a997-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@3/elmdesc@Slot_03/gptid/affdfc28-a997-11e1-83eb-00259061b51a - path too long make_dev_physpath_alias: WARNING - Unable to alias gptid/c8eab51d-a997-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@4/elmdesc@Slot_04/gptid/c8eab51d-a997-11e1-83eb-00259061b51a - path too long ugen0.2: at usbus0 uhub2: on usbus0 ugen1.2: at usbus1 uhub3: on usbus1 make_dev_physpath_alias: WARNING - Unable to alias gptid/e4b1f963-a997-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@5/elmdesc@Slot_05/gptid/e4b1f963-a997-11e1-83eb-00259061b51a - path too long make_dev_physpath_alias: WARNING - Unable to alias gptid/faa2b1ca-a997-11e1-83eb-00259061b51a to enc@n500304800182f4fd/type@0/slot@6/elmdesc@Slot_06/gptid/faa2b1ca-a997-11e1-83eb-00259061b51a - path too long Root mount waiting for: usbus1 usbus0 uhub2: 6 ports with 6 removable, self powered uhub3: 6 ports with 6 removable, self powered ugen0.3: at usbus0 ums0: on usbus0 ums0: 3 buttons and [Z] coordinates ID=0 ukbd0: on usbus0 kbd2 at ukbd0 Trying to mount root from ufs:/dev/ada0p2 [rw]... 
em0: link state changed to UP thanks all From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 15:22:54 2013 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id C4B5CFDA for ; Tue, 5 Mar 2013 15:22:54 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 81563831 for ; Tue, 5 Mar 2013 15:22:54 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.80.1 (FreeBSD)) (envelope-from ) id 1UCthc-0003yW-Aa; Tue, 05 Mar 2013 10:22:52 -0500 Date: Tue, 5 Mar 2013 10:22:52 -0500 From: Gary Palmer To: Garrett Wollman Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? Message-ID: <20130305152252.GA52706@in-addr.com> References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <201303050540.r255ecEC083742@hergotha.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201303050540.r255ecEC083742@hergotha.csail.mit.edu> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: stable@freebsd.org, killing@multiplay.co.uk X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 15:22:54 -0000 On Tue, Mar 05, 2013 at 12:40:38AM -0500, Garrett Wollman wrote: > In article <8C68812328E3483BA9786EF15591124D@multiplay.co.uk>, > killing@multiplay.co.uk writes: > > >Now interesting you should say that I've seen a stall recently on ZFS > >only box running on 6 x SSD RAIDZ2. > > > >The stall was caused by fairly large mysql import, with nothing else > >running. 
> > > >Then it happened I thought the machine had wedged, but minutes (not > >seconds) later, everything sprung into action again. > > I have certainly seen what you might describe as "stalls", caused, so > far as I can tell, by kernel memory starvation. I've seen it take as > much as a half an hour to recover from these (which is too long for my > users). Right now I have the ARC limited to 64 GB (on a 96 GB file > server) and that has made it more stable, but it's still not behaving > quite as I would like, and I'm looking to put more memory into the > system (to be used for non-ARC functions). Looking at my munin > graphs, I find that backups in particular put very heavy pressure on, > doubling the UMA allocations over steady-state, and this takes about > four or five hours to climb back down. See > for an example. > > Some of the stalls are undoubtedly caused by internal fragmentation > rather than actual data in use. (Solaris used to have this issue, and > some hooks were added to allow some amount of garbage collection with > the cooperation of the filesystem.) Just as a note: a page I read in the past few months pointed out that having a huge ARC may not always be in the best interests of the system.
Some operation on the filesystem (I forget what, apologies) caused the system to churn through the ARC and discard most of it, while regular I/O was blocked. Unfortunately I cannot remember where I found that page now, and I don't appear to have bookmarked it. From what has been said in this thread I'm not convinced that people are hitting this issue; however, I would like to raise it for consideration. Regards, Gary From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 15:34:27 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4100C92F for ; Tue, 5 Mar 2013 15:34:27 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id 8C2888E4 for ; Tue, 5 Mar 2013 15:34:26 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UCtse-0005Qx-NU for freebsd-stable@freebsd.org; Tue, 05 Mar 2013 16:34:18 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UCtse-00024p-NA for freebsd-stable@freebsd.org; Tue, 05 Mar 2013 16:34:16 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-stable@freebsd.org Subject: Re: make_dev_physpath_alias References: <5135FCEC.7090105@gmail.com> Date: Tue, 05 Mar 2013 16:34:15 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <5135FCEC.7090105@gmail.com> User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: 0.8 X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.1 X-Scan-Signature:
b18fd8143cf8f96f7d4d8fb8e2c1cc4e X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 15:34:27 -0000 On Tue, 05 Mar 2013 15:10:52 +0100, Johan Hendriks wrote: > Hello all. > > I have a Supermicro 16 bay box with an LSI 9211-8i card. > We use it for temp data storage, and we wanted to try the lz4 > compression. > > After updating the source tree to r247839 and doing a make buildworld > cycle all works fine. > But at boot time we get some warnings. > make_dev_physpath_alias: WARNING - Unable to alias > gptid/281951f4-a996-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@1/elmdesc@Slot_01/gptid/281951f4-a996-11e1-83eb-00259061b51a > - path too long > > Do I need to worry? Only if you want to use the paths it can't create. Ronald. > > Gr > johan > > Here is my full dmesg > > Copyright (c) 1992-2013 The FreeBSD Project. > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 > The Regents of the University of California. All rights > reserved. > FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 9.1-STABLE #0 r247839: Tue Mar 5 13:17:25 CET 2013 > root@backup.neuteboom.com:/usr/obj/usr/src/sys/KRNL amd64 > CPU: Intel(R) Xeon(R) CPU E31220 @ 3.10GHz (3093.04-MHz K8-class CPU) > Origin = "GenuineIntel" Id = 0x206a7 Family = 0x6 Model = 0x2a > Stepping = 7 > Features=0xbfebfbff > Features2=0x1fbae3ff > AMD Features=0x28100800 > AMD Features2=0x1 > TSC: P-state invariant, performance statistics > real memory = 34359738368 (32768 MB) > avail memory = 33075630080 (31543 MB) > Event timer "LAPIC" quality 600 > ACPI APIC Table: > FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs > FreeBSD/SMP: 1 package(s) x 4 core(s) > cpu0 (BSP): APIC ID: 0 > cpu1 (AP): APIC ID: 2 > cpu2 (AP): APIC ID: 4 > cpu3 (AP): APIC ID: 6 > ioapic0 irqs 0-23 on motherboard > kbd1 at kbdmux0 > ctl: CAM Target Layer loaded > acpi0: on motherboard > acpi0: Power Button (fixed) > acpi0: reservation of 67, 1 (4) failed > cpu0: on acpi0 > cpu1: on acpi0 > cpu2: on acpi0 > cpu3: on acpi0 > hpet0: iomem 0xfed00000-0xfed003ff on acpi0 > Timecounter "HPET" frequency 14318180 Hz quality 950 > Event timer "HPET" frequency 14318180 Hz quality 550 > Event timer "HPET1" frequency 14318180 Hz quality 440 > Event timer "HPET2" frequency 14318180 Hz quality 440 > Event timer "HPET3" frequency 14318180 Hz quality 440 > Event timer "HPET4" frequency 14318180 Hz quality 440 > atrtc0: port 0x70-0x77 irq 8 on acpi0 > atrtc0: Warning: Couldn't map I/O. 
> Event timer "RTC" frequency 32768 Hz quality 0 > attimer0: port 0x40-0x43,0x50-0x53 irq 0 on acpi0 > Timecounter "i8254" frequency 1193182 Hz quality 0 > Event timer "i8254" frequency 1193182 Hz quality 100 > Timecounter "ACPI-fast" frequency 3579545 Hz quality 900 > acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 > pcib0: port 0xcf8-0xcff on acpi0 > pci0: on pcib0 > pcib1: irq 19 at device 6.0 on pci0 > pci1: on pcib1 > mps0: port 0xe000-0xe0ff mem > 0xf7600000-0xf7603fff,0xf7580000-0xf75bffff irq 19 at device 0.0 on pci1 > mps0: Firmware: 15.00.00.00, Driver: 14.00.00.01-fbsd > mps0: IOCCapabilities: > 1285c > em0: port 0xf020-0xf03f mem > 0xf7800000-0xf781ffff,0xf7825000-0xf7825fff irq 20 at device 25.0 on pci0 > em0: Using an MSI interrupt > em0: Ethernet address: 00:25:90:75:c8:09 > ehci0: mem 0xf7824000-0xf78243ff irq > 16 at device 26.0 on pci0 > usbus0: EHCI version 1.0 > usbus0 on ehci0 > pcib2: irq 16 at device 28.0 on pci0 > pci2: on pcib2 > pcib3: irq 16 at device 28.4 on pci0 > pci3: on pcib3 > em1: port 0xd000-0xd01f mem > 0xf7700000-0xf771ffff,0xf7720000-0xf7723fff irq 16 at device 0.0 on pci3 > em1: Using MSIX interrupts with 3 vectors > em1: Ethernet address: 00:25:90:75:c8:08 > ehci1: mem 0xf7823000-0xf78233ff irq > 23 at device 29.0 on pci0 > usbus1: EHCI version 1.0 > usbus1 on ehci1 > pcib4: at device 30.0 on pci0 > pci4: on pcib4 > vgapci0: mem > 0xf5000000-0xf5ffffff,0xf7000000-0xf7003fff,0xf6800000-0xf6ffffff irq 19 > at device 3.0 on pci4 > isab0: at device 31.0 on pci0 > isa0: on isab0 > ahci0: port > 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f > mem 0xf7822000-0xf78227ff irq 19 at device 31.2 on pci0 > ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported > ahcich0: at channel 0 on ahci0 > ahcich1: at channel 1 on ahci0 > ahcich2: at channel 2 on ahci0 > ahcich4: at channel 4 on ahci0 > ahcich5: at channel 5 on ahci0 > pci0: at device 31.3 (no driver attached) > acpi_button0: 
on acpi0 > acpi_tz0: on acpi0 > acpi_tz1: on acpi0 > atkbdc0: port 0x60,0x64 irq 1 on acpi0 > atkbd0: irq 1 on atkbdc0 > kbd0 at atkbd0 > atkbd0: [GIANT-LOCKED] > psm0: irq 12 on atkbdc0 > psm0: [GIANT-LOCKED] > psm0: model IntelliMouse Explorer, device ID 4 > uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 > uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0 > uart2: <16550 or compatible> port 0x3e8-0x3ef irq 10 on acpi0 > orm0: at iomem 0xc0000-0xc7fff,0xce000-0xcefff on isa0 > sc0: at flags 0x100 on isa0 > sc0: VGA <16 virtual consoles, flags=0x300> > vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 > ppc0: cannot reserve I/O port range > est0: on cpu0 > p4tcc0: on cpu0 > est1: on cpu1 > p4tcc1: on cpu1 > est2: on cpu2 > p4tcc2: on cpu2 > est3: on cpu3 > p4tcc3: on cpu3 > ZFS filesystem version: 5 > ZFS storage pool version: features support (5000) > Timecounters tick every 1.000 msec > usbus0: 480Mbps High Speed USB v2.0 > usbus1: 480Mbps High Speed USB v2.0 > ugen0.1: at usbus0 > uhub0: on usbus0 > ugen1.1: at usbus1 > uhub1: on usbus1 > ada0 at ahcich0 bus 0 scbus1 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus2 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > ses0 at mps0 bus 0 scbus0 target 8 lun 0 > ses0: Fixed Enclosure Services SCSI-5 device > ses0: 600.000MB/s transfers > ses0: Command Queueing enabled > ses0: SCSI-3 ENC Device > ses0: probe9,pass1,da0: Element descriptor: 'Slot 01' > ses0: probe9,pass1,da0: SAS Device Slot Element: 1 Phys at Slot 0 > ses0: phy 0: SATA device > ses0: phy 0: parent 
500304800182f4ff addr 500304800182f4ec > ses0: probe10,pass2,da1: Element descriptor: 'Slot 02' > ses0: probe10,pass2,da1: SAS Device Slot Element: 1 Phys at Slot 1 > ses0: phy 0: SATA device > ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ed > ses0: probe11,pass3,da2: Element descriptor: 'Slot 03' > ses0: probe11,pass3,da2: SAS Device Slot Element: 1 Phys at Slot 2 > ses0: phy 0: SATA device > ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ee > ses0: probe12,pass4,da3: Element descriptor: 'Slot 04' > ses0: probe12,pass4,da3: SAS Device Slot Element: 1 Phys at Slot 3 > ses0: phy 0: SATA device > ses0: phy 0: parent 500304800182f4ff addr 500304800182f4ef > ses0: probe13,pass5,da4: Element descriptor: 'Slot 05' > ses0: probe13,pass5,da4: SAS Device Slot Element: 1 Phys at Slot 4 > ses0: phy 0: SATA device > ses0: phy 0: parent 500304800182f4ff addr 500304800182f4f0 > ses0: probe14,pass6,da5: Element descriptor: 'Slot 06' > ses0: probe14,pass6,da5: SAS Device Slot Element: 1 Phys at Slot 5 > ses0: phy 0: SATA device > ses0: phy 0: parent 500304800182f4ff addr 500304800182f4f1 > SMP: AP CPU #3 Launched! > SMP: AP CPU #1 Launched! > SMP: AP CPU #2 Launched! 
> Timecounter "TSC-low" frequency 1546522006 Hz quality 1000 > da4 at mps0 bus 0 scbus0 target 13 lun 0 > da4: Fixed Direct Access SCSI-6 device > da4: 600.000MB/s transfers > da4: Command Queueing enabled > da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > da3 at mps0 bus 0 scbus0 target 12 lun 0 > da3: Fixed Direct Access SCSI-6 device > da3: 600.000MB/s transfers > da3: Command Queueing enabled > da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > da0 at mps0 bus 0 scbus0 target 9 lun 0 > da0: Fixed Direct Access SCSI-6 device > da0: 600.000MB/s transfers > da0: Command Queueing enabled > da0: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > da5 at mps0 bus 0 scbus0 target 14 lun 0 > da5: Fixed Direct Access SCSI-6 device > da5: 600.000MB/s transfers > da5: Command Queueing enabled > da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > da2 at mps0 bus 0 scbus0 target 11 lun 0 > da2: Fixed Direct Access SCSI-6 device > da2: 600.000MB/s transfers > da2: Command Queueing enabled > da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > da1 at mps0 bus 0 scbus0 target 10 lun 0 > da1: Fixed Direct Access SCSI-6 device > da1: 600.000MB/s transfers > da1: Command Queueing enabled > da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C) > uhub0: 2 ports with 2 removable, self powered > uhub1: 2 ports with 2 removable, self powered > make_dev_physpath_alias: WARNING - Unable to alias > gptid/281951f4-a996-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@1/elmdesc@Slot_01/gptid/281951f4-a996-11e1-83eb-00259061b51a > - path too long > make_dev_physpath_alias: WARNING - Unable to alias > gptid/87f6d404-a997-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@2/elmdesc@Slot_02/gptid/87f6d404-a997-11e1-83eb-00259061b51a > - path too long > make_dev_physpath_alias: WARNING - Unable to alias > gptid/affdfc28-a997-11e1-83eb-00259061b51a to > 
enc@n500304800182f4fd/type@0/slot@3/elmdesc@Slot_03/gptid/affdfc28-a997-11e1-83eb-00259061b51a > - path too long > make_dev_physpath_alias: WARNING - Unable to alias > gptid/c8eab51d-a997-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@4/elmdesc@Slot_04/gptid/c8eab51d-a997-11e1-83eb-00259061b51a > - path too long > ugen0.2: at usbus0 > uhub2: > on usbus0 > ugen1.2: at usbus1 > uhub3: > on usbus1 > make_dev_physpath_alias: WARNING - Unable to alias > gptid/e4b1f963-a997-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@5/elmdesc@Slot_05/gptid/e4b1f963-a997-11e1-83eb-00259061b51a > - path too long > make_dev_physpath_alias: WARNING - Unable to alias > gptid/faa2b1ca-a997-11e1-83eb-00259061b51a to > enc@n500304800182f4fd/type@0/slot@6/elmdesc@Slot_06/gptid/faa2b1ca-a997-11e1-83eb-00259061b51a > - path too long > Root mount waiting for: usbus1 usbus0 > uhub2: 6 ports with 6 removable, self powered > uhub3: 6 ports with 6 removable, self powered > ugen0.3: at usbus0 > ums0: rev 1.10/0.01, addr 3> on usbus0 > ums0: 3 buttons and [Z] coordinates ID=0 > ukbd0: rev 1.10/0.01, addr 3> on usbus0 > kbd2 at ukbd0 > Trying to mount root from ufs:/dev/ada0p2 [rw]... 
> em0: link state changed to UP > > > thanks all > > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 15:55:36 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 6B453627 for ; Tue, 5 Mar 2013 15:55:36 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.78]) by mx1.freebsd.org (Postfix) with ESMTP id B3252A2C for ; Tue, 5 Mar 2013 15:55:34 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1UCuDD-0007As-Uh; Tue, 05 Mar 2013 16:55:33 +0100 Received: from [81.21.138.17] (helo=ronaldradial.versatec.local) by smtp.greenhost.nl with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1UCuDD-00032g-PH; Tue, 05 Mar 2013 16:55:31 +0100 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: "freebsd-stable@freebsd.org" , "Johan Hendriks" Subject: Re: make_dev_physpath_alias References: <5135FCEC.7090105@gmail.com> <51361204.2040405@gmail.com> Date: Tue, 05 Mar 2013 16:55:30 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <51361204.2040405@gmail.com> User-Agent: Opera Mail/12.14 (Win32) X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: 0.8 X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.1 X-Scan-Signature: 908c7028e6cdb5aff7650a93df4e53f5 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD 
source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 15:55:36 -0000 On Tue, 05 Mar 2013 16:40:52 +0100, Johan Hendriks wrote: > Ronald Klop wrote: >> On Tue, 05 Mar 2013 15:10:52 +0100, Johan Hendriks >> wrote: >> >>> Hello all. >>> >>> I have a Supermicro 16 bay box with an LSI 9211-8i card. >>> We use it for temp data storage, and we wanted to try the lz4 >>> compression. >>> >>> After updating the source tree to r247839 and doing a make buildworld >>> cycle all works fine. >>> But at boot time we get some warnings. >>> make_dev_physpath_alias: WARNING - Unable to alias >>> gptid/281951f4-a996-11e1-83eb-00259061b51a to >>> enc@n500304800182f4fd/type@0/slot@1/elmdesc@Slot_01/gptid/281951f4-a996-11e1-83eb-00259061b51a >>> - path too long >>> >>> Do I need to worry? >> >> Only if you want to use the paths it can't create. >> >> Ronald. >> > Thanks for the quick response. > Do you know why it cannot create the path? > The system is installed normally, why is it trying to use path names it > cannot use? No, I am not familiar with the details of this. What I see is that it tries to make aliases for the device in /dev. So there already is a name for the device you can use, but the alias (which might be a symlink with a convenient name) is too long to be a valid device name (hence the "path too long" warning), so it is not created. The code in 9-stable is in sys/kern/kern_conf.c line 995. NB: I re-added the mailing list to the addresses. Other people might know more. Regards, Ronald. > > gr > Johan > >>> Gr >>> johan >>> >>> Here is my full dmesg >>> >>> Copyright (c) 1992-2013 The FreeBSD Project. >>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, >>> 1994 >>> The Regents of the University of California. All rights >>> reserved. >>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>>> em0: link state changed to UP >>> >>> >>> thanks all >>> >>> >>> >>> _______________________________________________ >>> freebsd-stable@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable >>> To unsubscribe, send any mail to >>> "freebsd-stable-unsubscribe@freebsd.org" >> _______________________________________________ >> freebsd-stable@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to >> "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 18:01:49 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 04180635 for ; Tue, 5 Mar 2013 18:01:49 +0000 (UTC) (envelope-from kron24@gmail.com) Received: from mail-ee0-f41.google.com (mail-ee0-f41.google.com [74.125.83.41]) by mx1.freebsd.org (Postfix) with ESMTP id 96EC72B0 for ; Tue, 5 Mar 2013 18:01:48 +0000 (UTC) Received: by mail-ee0-f41.google.com with SMTP id c13so5035435eek.28 for ; Tue, 05 Mar 2013 10:01:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=lB4FpaI54J0Cy2faY+YaOsR86eFYwOeYmu9Lb8/YbG8=; b=xraFWeD7Z5Ktf64hLpOauVryOusp85NXdlpQ9xSuLq+DQ829DTnotlcnaWOlAtI6I9 8bbU3Uh4Pyt+EttAEkslqC/V/EnEJjdPDHb6wgWKC3rSUEFIEU1p275oZLA1Lt3+lW7s +RmSzSKbQAs8wwH3du6NB+6Aj7x5Nv5shORAPr2Q5eFhiq2YIFvjFxYFMIPStJ3rcIli 5F8M3ttr9JkzTHBsFagrp5m4ggdg51a0qRLeeQEe8DfQEkugC/BhZV9xWGbDDhqJe+CI UVivEFU71kT4I/LgDpF58jZOC3QST4R8cL9Sve/bRtiq84Zr/69/DTm8wXCejaQjWorI PdGA== X-Received: by 10.14.207.200 with SMTP id n48mr73242949eeo.4.1362506501982; Tue, 05 Mar 2013 10:01:41 -0800 (PST) Received: from nbvk.local (uidzr185150.sattnet.cz. 
[212.96.185.150]) by mx.google.com with ESMTPS id d47sm38646122eem.9.2013.03.05.10.01.40 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 05 Mar 2013 10:01:41 -0800 (PST) Message-ID: <51363303.5080300@gmail.com> Date: Tue, 05 Mar 2013 19:01:39 +0100 From: kron User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130304 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: reproducible "panic: page fault" with clang-compiled nvidia-driver Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 18:01:49 -0000 Hello, I have 100% reproducible "page fault" kernel panics on 9-STABLE (FreeBSD 9.1-STABLE #0 r247842M) It needs two conditions together: 1. nvidia-driver built by clang 2. nvidia_load="YES" in loader.conf On system startup I get (for example): #1 0xffffffff80473164 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448 #2 0xffffffff804735a4 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:636 #3 0xffffffff806c1925 in trap_fatal (frame=, eva=) at /usr/src/sys/amd64/amd64/trap.c:878 #4 0xffffffff806c1cc3 in trap_pfault (frame=0x0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:735 #5 0xffffffff806c130e in trap (frame=0xffffff83e43e6710) at /usr/src/sys/amd64/amd64/trap.c:463 #6 0xffffffff806ab893 in calltrap () at exception.S:228 #7 0xffffffff804b5d05 in propagate_priority (td=) at /usr/src/sys/kern/subr_turnstile.c:277 #8 0xffffffff804b6563 in turnstile_wait (ts=, owner=, queue=) at /usr/src/sys/kern/subr_turnstile.c:743 #9 0xffffffff80461e08 in _mtx_lock_sleep (m=0xffffffff80abf800, tid=18446741874843076896, opts=, file=0xfffffe000c830d80 "Hw\xa4\200\xff\xff\xff\xff", line=0) at /usr/src/sys/kern/kern_mutex.c:471 #10 0xffffffff8039132e in 
usbd_do_request_flags (udev=0xfffffe000c3a1000, mtx=0x0, req=0xffffff83e43e6990, data=, flags=, actlen=0x0, timeout=255) at /usr/src/sys/dev/usb/usb_request.c:705 #11 0xffffffff8039150f in usbd_req_reset_port (udev=0xfffffe000c3a1000, mtx=0x0, port=1 '\001') at /usr/src/sys/dev/usb/usb_request.c:1674 #12 0xffffffff8038def3 in uhub_explore (udev=0xfffffe000c3a1000) at /usr/src/sys/dev/usb/usb_hub.c:424 #13 0xffffffff8037647a in usb_bus_explore (pm=) at /usr/src/sys/dev/usb/controller/usb_controller.c:359 #14 0xffffffff803903ff in usb_process (arg=0xffffff8000b15db0) at /usr/src/sys/dev/usb/usb_process.c:169 #15 0xffffffff80447645 in fork_exit ( callout=0xffffffff803902f0 , arg=0xffffff8000b15db0, frame=0xffffff83e43e6b00) at /usr/src/sys/kern/kern_fork.c:988 #16 0xffffffff806abdce in fork_trampoline () at exception.S:602 #17 0x0000000000000000 in ?? () The "caller" below _mtx_lock_sleep varies - I saw kern_intr.c, tty.c, and maybe others besides usb_process.c pasted above. If I build nvidia-driver with gcc or postpone nvidia loading to a later moment (e.g. via rc.local) or both, the system boots OK. I've found a related thread (I think) in freebsd-current@: "sysctl -a causes kernel trap 12" [1]. Unfortunately, it ended without a clear conclusion. Is any developer interested in this?
I can crash on demand :-) BR Oli [1] http://lists.freebsd.org/pipermail/freebsd-current/2013-January/038969.html From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 18:17:52 2013 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 56B5CB0C; Tue, 5 Mar 2013 18:17:52 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-f51.google.com (mail-qa0-f51.google.com [209.85.216.51]) by mx1.freebsd.org (Postfix) with ESMTP id 0B5A1402; Tue, 5 Mar 2013 18:17:51 +0000 (UTC) Received: by mail-qa0-f51.google.com with SMTP id cr7so2085555qab.10 for ; Tue, 05 Mar 2013 10:17:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=+zDQG7mzu/vM8X99OkKgTJVOjVjvxnqd1a+n89P9WM4=; b=HfYg8O6HqMp/D+hlwSuXkxUvDiKULMuqqCBijgnk7F/71ZzVcQwkpiiGqBLdPOBCCz 2nat+U1PqpQpJTD7m45+5OZEGBePOuOKmRgaNi2Pfs1TDUEM3iivNaYje5kIi1zg0ZzN etQ1h8EFLet7GFOx1BF80QhHUK6QA4VVO0sBDyTWtLHsljls36qZdI5Xj0ibq7J3YtMO qSS34BETqj3K5RSCgmvpz78V7l/J2R77JQix+gLclk3UbSiH1yor1mwpLx7jxhEehgKR T975Yv2nuY3R8FcIE+65/ylJQ4Xe7x4HhbJ+xKM74BN4j5UaGU0vqX3D4IfxemS4gO5k +qvQ== MIME-Version: 1.0 X-Received: by 10.49.94.208 with SMTP id de16mr13530224qeb.22.1362507471205; Tue, 05 Mar 2013 10:17:51 -0800 (PST) Received: by 10.49.106.233 with HTTP; Tue, 5 Mar 2013 10:17:50 -0800 (PST) In-Reply-To: <20130305152252.GA52706@in-addr.com> References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <201303050540.r255ecEC083742@hergotha.csail.mit.edu> <20130305152252.GA52706@in-addr.com> Date: Tue, 5 Mar 2013 10:17:50 -0800 Message-ID: Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
From: Freddie Cash To: Gary Palmer Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Steven Hartland , stable@freebsd.org, Garrett Wollman X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 18:17:52 -0000 On Tue, Mar 5, 2013 at 7:22 AM, Gary Palmer wrote: > Just as a note that there was a page I read in the past few months > that pointed out that having a huge ARC may not always be in the best > interests of the system. Some operation on the filesystem (I forget > what, apologies) caused the system to churn through the ARC and discard > most of it, while regular I/O was blocked > Huh. What timing. I've been fighting with our largest ZFS box (128 GB of RAM, 16 CPU cores, 2x SSD for SLOG, 2x SSD for L2ARC, 45x 2 TB HD for pool in 6-drive raidz2 vdevs) for the past week trying to figure out why ZFS send/recv just hangs after a while. Everything is stuck in "D" in "ps ax" output, and top shows the l2arc_feed_ thread using 100% of one CPU. Even removing the L2ARC devices from the pool doesn't help, just slows the amount of time until the "hang". ARC was configured for 120 GB, with arc_meta_limit set to 90 GB. Yes, dedup and compression are enabled (it's a backups storage box, and we get over 5x combined dedup/compress ratio). After several hours of running, the ARC and wired would get up to 100+ GB, and the box would spend most of its time "spinning", with almost 0 I/O to the pool (only a few KB/s of reads in "zpool iostat 1" or "gstat"). ZFS send/recv would eventually complete, but what used to take 15-20 minutes would take 6-8 hours to complete.
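As a rough sketch of the ARC configuration described above: assuming the limits are applied through /boot/loader.conf with the vfs.zfs.arc_max and vfs.zfs.arc_meta_limit tunables, and that "GB" here means GiB (neither is stated in the message), the two lines could be generated as plain byte counts like this. The helper name gb_to_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in GiB to a plain byte count.
gb_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Sizes taken from the message; the GiB interpretation is an assumption.
arc_max_gb=120
arc_meta_gb=90

# Print lines in loader.conf syntax (the tunable names are assumed to be
# the ones the message refers to).
printf 'vfs.zfs.arc_max="%s"\n' "$(gb_to_bytes "$arc_max_gb")"
printf 'vfs.zfs.arc_meta_limit="%s"\n' "$(gb_to_bytes "$arc_meta_gb")"
```

Other sizes can be tried by changing the two variables; loader.conf tunables are read at boot, so a change of this kind would only take effect after a reboot.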
I've reduced the ARC to only 32 GB, with arc_meta set to 28 GB, and things are running much smoother now (50-200 MB/s writes for 3-5 seconds every 10s), and send/recv is back down to 10-15 minutes. Who would have thought "too much RAM" would be an issue? Will play with this over the next couple of days with different ARC max settings to see where the problems start. All of our ZFS boxes until this one had under 64 GB of RAM. (And we had issues with dedupe enabled on boxes with too little RAM, as in under 32 GB.) -- Freddie Cash fjwcash@gmail.com From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 18:29:10 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8C31DE40 for ; Tue, 5 Mar 2013 18:29:10 +0000 (UTC) (envelope-from david@catwhisker.org) Received: from albert.catwhisker.org (m209-73.dsl.rawbw.com [198.144.209.73]) by mx1.freebsd.org (Postfix) with ESMTP id 5326C66B for ; Tue, 5 Mar 2013 18:29:10 +0000 (UTC) Received: from albert.catwhisker.org (localhost [127.0.0.1]) by albert.catwhisker.org (8.14.6/8.14.6) with ESMTP id r25IT4Oi032113; Tue, 5 Mar 2013 10:29:04 -0800 (PST) (envelope-from david@albert.catwhisker.org) Received: (from david@localhost) by albert.catwhisker.org (8.14.6/8.14.6/Submit) id r25IT4A5032112; Tue, 5 Mar 2013 10:29:04 -0800 (PST) (envelope-from david) Date: Tue, 5 Mar 2013 10:29:04 -0800 From: David Wolfskill To: kron Subject: Re: reproducible "panic: page fault" with clang-compiled nvidia-driver Message-ID: <20130305182904.GI13861@albert.catwhisker.org> References: <51363303.5080300@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NyChO5MpGs3JHJbz" Content-Disposition: inline In-Reply-To: <51363303.5080300@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org 
X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 18:29:10 -0000 --NyChO5MpGs3JHJbz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 05, 2013 at 07:01:39PM +0100, kron wrote: > Hello, > > I have 100% reproducible "page fault" kernel panics on 9-STABLE > (FreeBSD 9.1-STABLE #0 r247842M) > > It needs two conditions together: > 1. nvidia-driver built by clang > 2. nvidia_load="YES" in loader.conf Hmmm... I don't see a problem; I am running: FreeBSD g1-235.catwhisker.org 9.1-STABLE FreeBSD 9.1-STABLE #393 r247828M/247839: Tue Mar 5 05:11:27 PST 2013 root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY i386 using nvidia-driver built by clang -- e.g.: ... ===> Building for nvidia-driver-310.32 ===> src (all) @ -> /usr/src/sys machine -> /usr/src/sys/i386/include x86 -> /usr/src/sys/x86/include ... clang -O2 -pipe -fno-strict-aliasing -DNV_VERSION_STRING=\"310.32\" -D__KERNEL__ -DNVRM -Wno-unused-function -Wuninitialized -O -UDEBUG -U_DEBUG -DNDEBUG -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I. -I. -I@ -I@/contrib/altq -fno-common -mno-aes -mno-avx -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -c nvidia_ctl.c clang -O2 -pipe -fno-strict-aliasing -DNV_VERSION_STRING=\"310.32\" -D__KERNEL__ -DNVRM -Wno-unused-function -Wuninitialized -O -UDEBUG -U_DEBUG -DNDEBUG -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I. -I.
-I@ -I@/contrib/altq -fno-common -mno-aes -mno-avx -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -std=iso9899:1999 -Qunused-arguments -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -c nvidia_dev.c .... Referring to , the most recent commits to stable/9 (in most-recent-first sequence) are: * r247850 - stable/9/sys/dev/mxge * r247846 - stable/9/libexec/rtld-elf * r247828 - in stable: 8/sys/dev/mfi 9/sys * r247827 - in stable: 8/share/man/man4 8/ * r247803 - stable/9/sbin/devd * r247802 - stable/9/sbin/devd so I believe that your sources @r247842M should be equivalent to mine @r247828M. (The cause for the "M" flag in my case is the change I made locally to src/sys/conf/newvers.sh to report the SVN revision a bit differently.) Note that I'm running i386; that may be a difference. > ... Peace, david -- David H. Wolfskill david@catwhisker.org Taliban: Evil men with guns afraid of truth from a 14-year old girl. See http://www.catwhisker.org/~david/publickey.gpg for my public key.
--NyChO5MpGs3JHJbz Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlE2OW8ACgkQmprOCmdXAD3kCgCbBw0Wlnja838U/aorB50BLkWd EkEAn0on/1n7rnEDd1NomTm7LXTORT70 =Kh9J -----END PGP SIGNATURE----- --NyChO5MpGs3JHJbz-- From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 22:09:39 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7920B1B2 for ; Tue, 5 Mar 2013 22:09:39 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:48]) by mx1.freebsd.org (Postfix) with ESMTP id 5DCCBF9B for ; Tue, 5 Mar 2013 22:09:38 +0000 (UTC) Received: from omta10.emeryville.ca.mail.comcast.net ([76.96.30.28]) by qmta05.emeryville.ca.mail.comcast.net with comcast id 7xWM1l0010cQ2SLA5y9d16; Tue, 05 Mar 2013 22:09:37 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta10.emeryville.ca.mail.comcast.net with comcast id 7y9c1l00C1t3BNj8Wy9c6S; Tue, 05 Mar 2013 22:09:36 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 348B173A31; Tue, 5 Mar 2013 14:09:36 -0800 (PST) Date: Tue, 5 Mar 2013 14:09:36 -0800 From: Jeremy Chadwick To: Andriy Gapon Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130305220936.GA54718@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5135D275.3050500@FreeBSD.org> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362521377; bh=9tEkI7sf4HHRtV5mRCQvuEVF4VrOaz8kR3bM7crWW9Q=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=ZLRMo1erUwpejESZPYN4+JxFztsyk69c4v+NWRp2ELXgKmaFiECuQplz/VncioP1s UYIwET1VgF7q7Q6cm/83pEdCiaolcVdBXMsO2nLk7BuMT4fBZ89UdSNLry1x42e83j PnkQj524oarnAaiyneew9wzLT8MCvDVqRnSDEJt9mKq7uSAzK6seSVQnb2AKs0k7H/ DLMIJjbjWHnnvnRJ9Xggthafyfq8Lo7vrHuk0gK8mfLDgunvs2BxwNQbUNTh+xgRJz 5zCWfCNTwpKX21uEcf7g5s8MGFq91A1Jp1tgyiu7z9OoEkM2DJv5hmOaPbJJkQI2I/ Z4kHu5ZKKhvvQ== Cc: freebsd-stable@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 22:09:39 -0000 On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote: > Completely unrelated to the main thread: > > on 05/03/2013 07:32 Jeremy Chadwick said the following: > > That said, I still do not recommend ZFS for a root filesystem > > Why? Too long a history of problems with it and weird edge cases (keep reading); the last thing an administrator wants to deal with is a system where the root filesystem won't mount/can't be used. It makes recovery or problem-solving (i.e. the server is not physically accessible given geographic distances) very difficult. Are there still issues booting from raidzX or stripes or root pools with multiple vdevs? What about with cache or log devices? 
My point/opinion: UFS for a root filesystem is guaranteed to work without any fiddling about and, barring drive failures or controller issues, is (again, my opinion) a lot more risk-free than ZFS-on-root. I say that knowing lots of people use ZFS-on-root, which is great -- I just wonder how many of them have tested all the crazy scenarios and then tried to boot from things. > > (this biting people still happens even today) > > What exactly? http://lists.freebsd.org/pipermail/freebsd-questions/2013-February/249363.html http://lists.freebsd.org/pipermail/freebsd-questions/2013-February/249387.html http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072398.html The last one got solved: http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072406.html http://lists.freebsd.org/pipermail/freebsd-stable/2013-February/072408.html I know factually you're aware of the zpool.cache ordeal (which may or may not be the cause of the issue shown in the 2nd URL above), but my point is that still at this moment in time -- barring someone using a stable/9 ISO for installation -- there still seem to be issues. Things on the mailing lists which go unanswered/never provide closure of this nature are numerous, and that just adds to my concern. > > - Disks are GPT and are *partitioned, and ZFS refers to the partitions > > not the raw disk -- this matters (honest, it really does; the ZFS > > code handles things differently with raw disks) > > Not on FreeBSD as far I can see. My statement comes from here (first line in particular): http://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248697.html If this is wrong/false, then this furthers my point about kernel folks who are in-the-know needing to chime in and help stop the misinformation. The rest of us are just end-users, often misinformed. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 22:18:36 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D7FB633B; Tue, 5 Mar 2013 22:18:36 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-oa0-f49.google.com (mail-oa0-f49.google.com [209.85.219.49]) by mx1.freebsd.org (Postfix) with ESMTP id 69C97FF0; Tue, 5 Mar 2013 22:18:36 +0000 (UTC) Received: by mail-oa0-f49.google.com with SMTP id j6so11835386oag.36 for ; Tue, 05 Mar 2013 14:18:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=Whqh69ZBV/NE/ZRsJ3BlwIFWpDTrRxncyZ9JkT6Dw/M=; b=mWnsz5Wdf89sU39Bmp78m7bpy0qzsTfxQJK7/YNTppNSrT+MLOk2YERT0FgHoz3kuj Bit/dv3OppCydKCkqm408x10EX4ozbxRxA5CaaJXOPS9/IuPeYNWra7xA658C5HfEC4K 0WCTjvu+qMO1RPXIT26u542TNTAOTJTcxTy0orHDPkgAXYnKK16ay/UHvefVId3uaxF0 aT8Gle9TTVvRR8xhRiXZaJ1kttgRkWcyVuZex9W6wfEL2FY+DQ+NHsLMTkB6rBVAnIgD bJllzVtWdZQ655eP2dT+1GGl5bnVk2QSLoJOBF9d2LvHQYYvstKFU7r1y+FwhH8KXYKE Vhig== MIME-Version: 1.0 X-Received: by 10.60.172.237 with SMTP id bf13mr21283067oec.83.1362521910486; Tue, 05 Mar 2013 14:18:30 -0800 (PST) Received: by 10.76.0.197 with HTTP; Tue, 5 Mar 2013 14:18:30 -0800 (PST) In-Reply-To: <20130305220936.GA54718@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> Date: Tue, 5 Mar 2013 14:18:30 -0800 Message-ID: Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
From: Freddie Cash To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD Stable , Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 22:18:36 -0000 On Tue, Mar 5, 2013 at 2:09 PM, Jeremy Chadwick wrote: > On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote: > > > > - Disks are GPT and are *partitioned, and ZFS refers to the partitions > > > not the raw disk -- this matters (honest, it really does; the ZFS > > > code handles things differently with raw disks) > > > > Not on FreeBSD as far I can see. > > My statement comes from here (first line in particular): > > > http://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248697.html > > If this is wrong/false, then this furthers my point about kernel folks > who are in-the-know needing to chime in and help stop the > misinformation. The rest of us are just end-users, often misinformed. > This has been false from the very first import of ZFS into FreeBSD 7-STABLE. Pawel even mentions that GEOM allows the use of the cache on partitions with ZFS somewhere around that time frame. Considering he did the initial import of ZFS into FreeBSD, I don't think you can find a more canonical answer. :) This is one of the biggest differences between the Solaris-based ZFS and the FreeBSD-based ZFS. It's too bad this mis-information has basically become a meme. 
:( -- Freddie Cash fjwcash@gmail.com From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 22:42:26 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 634C275C for ; Tue, 5 Mar 2013 22:42:26 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta01.emeryville.ca.mail.comcast.net (qmta01.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:16]) by mx1.freebsd.org (Postfix) with ESMTP id 3C431167 for ; Tue, 5 Mar 2013 22:42:26 +0000 (UTC) Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51]) by qmta01.emeryville.ca.mail.comcast.net with comcast id 7pJk1l00J16AWCUA1yiSXw; Tue, 05 Mar 2013 22:42:26 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta06.emeryville.ca.mail.comcast.net with comcast id 7yiR1l00C1t3BNj8SyiR0d; Tue, 05 Mar 2013 22:42:25 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 2688073A31; Tue, 5 Mar 2013 14:42:25 -0800 (PST) Date: Tue, 5 Mar 2013 14:42:25 -0800 From: Jeremy Chadwick To: Freddie Cash Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130305224225.GA55551@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362523346; bh=PQP8KkKjlOarKOwyUKlz5Vtun06WEsF3zkGAhx2gnyU=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=FfRu3zkUbR8im3dXwFoe2sQcrYRaiwX+LhH+9ml8rEWovFhk8HSKVV7GW9GR9MjbT Bftz3yMcgTkrDwLuFsWQGooOXu5jSSShJPZx++q1maflLqgk8u/TSd+jhixuvjXqHZ QP3N+B2SqRciQI6EeUDmvNciDYU6OgctKko98gsqTYNHsLz55eS0RwKZWydVdpBD4q MyZ+CGaz1aAqftHjOVWPzYDiiomEj/GitazcovR0mF3JY1+gDxYqIfaj6tnzeOriTo EkHlcecEeqKzJFD7/pvRFelC3pwWsmG/ZvnwSbMut+VmpgQdwS4u0/GabzMIO9A92A 8EAH+k2/iENyw== Cc: FreeBSD Stable , Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 22:42:26 -0000 On Tue, Mar 05, 2013 at 02:18:30PM -0800, Freddie Cash wrote: > On Tue, Mar 5, 2013 at 2:09 PM, Jeremy Chadwick wrote: > > > On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote: > > > > > > - Disks are GPT and are *partitioned, and ZFS refers to the partitions > > > > not the raw disk -- this matters (honest, it really does; the ZFS > > > > code handles things differently with raw disks) > > > > > > Not on FreeBSD as far I can see. 
> > > > My statement comes from here (first line in particular): > > > > > > http://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248697.html > > > > If this is wrong/false, then this furthers my point about kernel folks > > who are in-the-know needing to chime in and help stop the > > misinformation. The rest of us are just end-users, often misinformed. > > This has been false from the very first import of ZFS into FreeBSD > 7-STABLE. Pawel even mentions that GEOM allows the use of the cache on > partitions with ZFS somewhere around that time frame. Considering he did > the initial import of ZFS into FreeBSD, I don't think you can find a more > canonical answer. :) > > This is one of the biggest differences between the Solaris-based ZFS and > the FreeBSD-based ZFS. This is good (excellent) information to know -- thank you for clearing that up. > It's too bad this mis-information has basically become a meme. :( Such is the case with FreeBSD's ZFS in general, solely because of the fact that the number of people who can answer the deep technical questions are few. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 23:03:05 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 4274DD99; Tue, 5 Mar 2013 23:03:05 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id BD98422F; Tue, 5 Mar 2013 23:03:04 +0000 (UTC) Received: from digsys200-136.pip.digsys.bg (digsys200-136.pip.digsys.bg [193.68.136.200]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r25N2rK7058038 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 6 Mar 2013 01:02:54 +0200 (EET) (envelope-from daniel@digsys.bg) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? From: Daniel Kalchev In-Reply-To: <20130305220936.GA54718@icarus.home.lan> Date: Wed, 6 Mar 2013 01:02:49 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1499) Cc: freebsd-stable@FreeBSD.org, Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 23:03:05 -0000 On Mar 6, 2013, at 12:09 AM, Jeremy Chadwick wrote: > I say that knowing lots of people use ZFS-on-root, which is great -- I > just wonder how many of them have tested all the crazy scenarios and > then tried to boot from things. 
I have verified that ZFS-on-root works reliably in all of the following scenarios: single disk, one mirror vdev, many mirror vdevs, raidz. Haven't found the time to test many raidz vdevs, I admit. :) Combined with "boot environments" (that can be served many different ways), ZFS on root is short of a miracle. ZFS on FreeBSD has some issues, mostly with huge installations and defaults/tuning, but not really with ZFS-on-root. Of course, if for example, you follow stable, you should be prepared with alternative boot media that supports the current zpool/zfs versions. But this is small cost to pay. Daniel From owner-freebsd-stable@FreeBSD.ORG Tue Mar 5 23:08:38 2013 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 82FD510B; Tue, 5 Mar 2013 23:08:38 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id DB92A26D; Tue, 5 Mar 2013 23:08:37 +0000 (UTC) Received: from digsys200-136.pip.digsys.bg (digsys200-136.pip.digsys.bg [193.68.136.200]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r25N8Wrb058505 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Wed, 6 Mar 2013 01:08:33 +0200 (EET) (envelope-from daniel@digsys.bg) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
From: Daniel Kalchev In-Reply-To: Date: Wed, 6 Mar 2013 01:08:28 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <22720B5D-7BCA-41BC-B1E8-A2ACB2ADB795@digsys.bg> References: <513524B2.6020600@denninger.net> <1362449266.92708.8.camel@btw.pki2.com> <51355F64.4040409@denninger.net> <201303050540.r255ecEC083742@hergotha.csail.mit.edu> <20130305152252.GA52706@in-addr.com> To: Freddie Cash X-Mailer: Apple Mail (2.1499) Cc: Garrett Wollman , stable@freebsd.org, Steven Hartland X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2013 23:08:38 -0000 On Mar 5, 2013, at 8:17 PM, Freddie Cash wrote: > > ZFS send/recv would eventually complete, but what used to take 15-20 > minutes would take 6-8 hours to complete. > > I've reduced the ARC to only 32 GB, with arc_meta set to 28 GB, and things > are running much smoother now (50-200 MB/s writes for 3-5 seconds every > 10s), and send/recv is back down to 10-15 minutes. > > Who would have thought "too much RAM" would be an issue? > > Will play with this over the next couple of days with different ARC max > settings to see where the problems start. All of our ZFS boxes until this > one had under 64 GB of RAM. (And we had issues with dedupe enabled on > boxes with too little RAM, as in under 32 GB.) I have an archive box running a very similar setup to yours, but with 72GB of RAM. I have set both arc_max and arc_meta_limit to 64GB, with no issues. I am still doing a very complex snapshot reordering between two pools. One of the pools has dedup enabled (which prompted me to add RAM), with a dedup ratio of over 10x, and there are still no issues or any stalling.
The other pool has both dedup and compression for some filesystems. My only issue is that replacing a drive in either pool takes a few days (6-drive vdevs of 3TB drives). Perhaps the memory indexing/search algorithms are inefficient? Daniel From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 00:42:23 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 05D639D0; Wed, 6 Mar 2013 00:42:23 +0000 (UTC) (envelope-from prvs=1777d4b2b8=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 508B773F; Wed, 6 Mar 2013 00:42:22 +0000 (UTC) Received: from r2d2 ([46.65.172.4]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50002567538.msg; Wed, 06 Mar 2013 00:42:15 +0000 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 06 Mar 2013 00:42:15 +0000 (not processed: message from valid local sender) X-MDDKIM-Result: neutral (mail1.multiplay.co.uk) X-MDRemoteIP: 46.65.172.4 X-Return-Path: prvs=1777d4b2b8=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <404FB08ACE6E47318A468ABFCE335FA8@multiplay.co.uk> From: "Steven Hartland" To: "Daniel Kalchev" , "Jeremy Chadwick" References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Date: Wed, 6 Mar 2013 00:42:17 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: freebsd-stable@FreeBSD.org, Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 00:42:23 -0000 ----- Original Message ----- From: "Daniel Kalchev" > On Mar 6, 2013, at 12:09 AM, Jeremy Chadwick wrote: > >> I say that knowing lots of people use ZFS-on-root, which is great -- I >> just wonder how many of them have tested all the crazy scenarios and >> then tried to boot from things. > > I have verified that ZFS-on-root works reliably in all of the following > scenarios: single disk, one mirror vdev, many mirror vdevs, raidz. > Haven't found the time to test many raidz vdevs, I admit. :) One thing to watch out for is the available BIOS boot disks. If you try to do a large RAIDZ with lots of disks as your root pool you're likely to run into problems, not because of any ZFS issue but simply because the disks the BIOS sees, and hence tries to boot, may not be what you expect. It won't necessarily hit you when you first install either: add more disks at a later date to a multi-controller LSI 2008 machine and you can end up not being able to specify the correct set of disks in the BIOS. Yes, learned that one the hard way :( For larger storage boxes we've taken to using two SSDs, partitioned and used as both boot and ZIL; as neither requires a massive amount of space, they are a nice fit together. > Combined with "boot environments" (that can be served many different > ways), ZFS on root is short of a miracle.
> > ZFS on FreeBSD has some issues, mostly with huge installations and > defaults/tuning, but not really with ZFS-on-root. > > Of course, if for example, you follow stable, you should be prepared > with alternative boot media that supports the current zpool/zfs versions. > But this is small cost to pay. For anyone looking to do a ZFS-only install I would definitely recommend they look at: http://mfsbsd.vx.sk/ With this little gem plus a custom script for our environment, it takes a few minutes from boot to an installed machine. It's also our go-to "rescue" disk; forget messing around with the standard ISOs and their rescue option, which never worked for me when I needed it. This is a fully working OS with all the tools you'll want when things go wrong, and if there is something missing it's easy to compile and build your own version. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 00:56:17 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 64E5AFE8 for ; Wed, 6 Mar 2013 00:56:17 +0000 (UTC) (envelope-from mauzo@anubis.morrow.me.uk) Received: from isis.morrow.me.uk (isis.morrow.me.uk [204.109.63.142]) by mx1.freebsd.org (Postfix) with ESMTP id 42D6A7DD for ; Wed, 6 Mar 2013 00:56:16 +0000 (UTC) Received: from anubis.morrow.me.uk (host86-177-98-144.range86-177.btcentralplus.com [86.177.98.144]) (Authenticated sender: mauzo) by isis.morrow.me.uk (Postfix) with ESMTPSA id D5F87450D5 for ; Wed, 6 Mar 2013 00:56:15 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.7.4 isis.morrow.me.uk D5F87450D5 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=morrow.me.uk; s=dkim201101; t=1362531376; bh=pyAacx+B1QQpvze9CuUqgkKUoMRFQskh4IpWPtovAOU=; h=Date:From:To:Subject:References:In-Reply-To; b=RJ8yiuX3yeP4XsR8TtNvI0aJmUkn/xxATmYZcW+2xSCGuDmZxdaSBH9D4mh2oL1L7 WXoSR7stq3v0MDuNkzueC+fcAoGS+bfEfrU8IhKmyiQO8qCdZ5ZDyi3gT307D9zgu+ PdKdEjpkmthWMhJEx8GArXkrMAYD2stNdNBKSVcE= X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.97.6 at isis.morrow.me.uk Received: by anubis.morrow.me.uk (Postfix, from userid 5001) id 1E9F89542; Wed, 6 Mar 2013 00:56:12 +0000 (GMT) Date: Wed, 6 Mar 2013 00:56:12 +0000 From: Ben Morrow To: freebsd-stable@freebsd.org Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130306005607.GA61190@anubis.morrow.me.uk> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <404FB08ACE6E47318A468ABFCE335FA8@multiplay.co.uk> X-Newsgroups: gmane.os.freebsd.stable Organization: morrow.me.uk User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 00:56:17 -0000 Quoth "Steven Hartland" : > ----- Original Message ----- > From: "Daniel Kalchev" > > On Mar 6, 2013, at 12:09 AM, Jeremy Chadwick wrote: > > > >> I say that knowing lots of people use ZFS-on-root, which is great -- I > >> just wonder how many of them have tested all the crazy scenarios and > >> then tried to boot from things. > > > > I have verified that ZFS-on-root works reliably in all of the following > > scenarios: single disk, one mirror vdev, many mirror vdevs, raidz. > > Haven't found the time to test many raidz vdevs, I admit. :) > > One thing to watch out for is the available BIOS boot disks. If you try > to do a large RAIDZ with lots of disk as you root pool your likely to > run into problems not because of any ZFS issue but simply because the > disks the BIOS sees and hence tries to boot may not be what you expect. IIRC the Sun documentation recommends keeping the root pool separate from the data pools in any case. 
Ben From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 05:08:11 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0F49A605 for ; Wed, 6 Mar 2013 05:08:11 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta12.emeryville.ca.mail.comcast.net (qmta12.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:44:76:96:27:227]) by mx1.freebsd.org (Postfix) with ESMTP id E164F106 for ; Wed, 6 Mar 2013 05:08:10 +0000 (UTC) Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27]) by qmta12.emeryville.ca.mail.comcast.net with comcast id 7px51l0080b6N64AC58An2; Wed, 06 Mar 2013 05:08:10 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta03.emeryville.ca.mail.comcast.net with comcast id 85891l00K1t3BNj8P5892m; Wed, 06 Mar 2013 05:08:09 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4FA9873A31; Tue, 5 Mar 2013 21:08:09 -0800 (PST) Date: Tue, 5 Mar 2013 21:08:09 -0800 From: Jeremy Chadwick To: Karl Denninger Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130306050809.GA61727@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> <20130305092700.GA43045@icarus.home.lan> <5135EB62.6060006@denninger.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5135EB62.6060006@denninger.net> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362546490; bh=thyNgTRE7QedHJP1E1QIQLxh2Rx4FEPr4hQ3GTmXpiw=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=fcKHjVk7+tBDLDPbIRNjWAlsRQpIQ2bzFMa5WVDcpwgviWrU8vD72nE+zSufvnvBm jT57JHN7OhkznULsGugzmwCpnxJXtvJu+tVu5lJpLgGRASe5cwIub7PS/8N44aNtRb ngVI57dOfEnf9TK3N4UmQBTwucu+IQEjR5lcXo8nV3GRWeXukn8Y2KOXVNDaPKYdJ6 JBrXpEpKzzlVkSCmALWV+lEUk9qdpaQWZfk7sBP9y3Hk+iPYthgQWsO+BcLccH0ElN 4qnBOO7TSJeeb1x30Go81UdvunU4Xeoq8IkOqE5HKNX3Hr+EHP99LDMvc7iGGfRyNF uoREB9UM/KP1A== Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 05:08:11 -0000 On Tue, Mar 05, 2013 at 06:56:02AM -0600, Karl Denninger wrote: > { I've snipped lots of text. For those who are reading this follow-up } > { and wish to read the snipped portions, please see this URL: } > { http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072696.html } > > 1. Is compression enabled? Has it ever been enabled (on any fs) in the > > past (barring pool being destroyed + recreated)? > > > > 2. Is dedup enabled? Has it ever been enabled (on any fs) in the past > > (barring pool being destroyed + recreated)? No answers to questions #1 and #2? 
(Edit: see below, I believe it's implied neither are used) > > * Describing the stall symptoms; what all does it impact? Can you > > switch VTYs on console when its happening? Network I/O (e.g. SSH'd > > into the same box and just holding down a letter) showing stalls > > then catching up? Things of this nature. > When it happens on my system anything that is CPU-bound continues to > execute. I can switch consoles and network I/O also works. Okay, it sounds like compression and dedup aren't in use/have never been used. The stalling problem with compression and dedup (e.g. if you use either of these features, and it worsens if you use both) results in a full/hard system stall where *everything* is impacted, and has been explained in the past (2nd URL has the explanation): http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012718.html http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html > If I have an iostat running at the time all I/O counters go to and > remain at zero while the stall is occurring, but the process that is > producing the iostat continues to run and emit characters whether it > is a ssh session or on the physical console. What kind of an iostat? iostat(8) or zpool iostat? (Edit: last paragraph of this response says "zpool iostat", which is not the same thing as iostat) Why not gstat(8), e.g. gstat -I500ms, as well? This provides the I/O statistics at a deeper layer, not the ZFS layer. Do the numbers actually change **while the system is stalling**? The answer matters greatly, because it would help indicate if some kernel API requests for I/O statistics are also blocking, or if only *actual I/O (e.g. read() and write() requests)* are blocking. > The CPUs are running and processing, but all threads block if they > attempt access to the disk I/O subsystem, irrespective of the portion > of the disk I/O subsystem they attempt to access (e.g. 
UFS, swap or > ZFS) I therefore cannot start any new process that requires image > activation. And now you'll need to provide a full diagram of your disk and controller device tree, along with all partitions, slices, and filesystem types. It's best to draw this in ASCII in a tree-like diagram. It will take you 15-20 minutes to do. What's even more concerning: This thread is about ZFS, yet you're saying applications block when they attempt to do I/O to a filesystem ***other than ZFS***. There must be some kind of commonality here, i.e. a single controller is driving both the ZFS and UFS disks, or something along those lines. If there isn't, then there is something within the kernel I/O subsystem that is doing this. Like I said: very deep, very knowledgeable kernel folks are the only ones who can fix this. > > * How long the stall is in duration (ex. if there's some way to > > roughly calculate this using "date" in a shell script) > They're variable. Some last fractions of a second and are not really > all that noticeable unless you happen to be paying CLOSE attention. > Some last a few (5 or so) seconds. The really bad ones last long enough > that the kernel throws the message "swap_pager: indefinite wait buffer". The message "swap_pager: indefinite wait buffer" indicates that some part of the VM is trying to offload pages of memory to swap via standard I/O write requests, and those writes have not come back within kern.hz*20 seconds. That's a very, very long time. > The machine in the general sense never pages. It contains 12GB of RAM > but historically (prior to ZFS being put into service) always showed "0" > for a "pstat -s", although it does have a 20g raw swap partition (to > /dev/da0s1b, not to a zpool) allocated. The swap_pager message implies otherwise. It may be that the programs you're using poll at intervals of, say, 1 second, and swap-out + swap-in occurs very quickly so you never see it. 
(Edit: next quoted paragraph shows that there ARE pages of memory hitting swap, so "never pages" is false). I do not know the VM subsystem well enough to know what the criteria are for offloading pages of memory to swap -- but it's obviously happening. It may be due to memory pressure, or it may be due to "pages which have not been touched in a long while" -- again, I do not know. This is where "vmstat -s" would be useful. Possibly Alan Cox knows. > During the stalls I cannot run a pstat (I tried; it stalls) but when it > unlocks I find that there is swap allocated, albeit not a ridiculous > amount. ~20,000 pages or so have made it to the swap partition. This is > not behavior that I had seen before on this machine prior to the stall > problem, and with the two tuning tweaks discussed here I'm now up to 48 > hours without any allocation to swap (or any stalls.) This would fall under the same category as your above statement, re: that any kind of I/O blocks until "something" gets released. The whole thing smells of some kind of global mutex or semaphore, which then makes me think of Giant, except that's mostly gone. > > * Contents of /etc/sysctl.conf and /boot/loader.conf (re: "tweaking" > > of the system) > /boot/loader.conf: > > {snip} > > vfs.zfs.arc_max=2000000000 > vfs.zfs.write_limit_override=1024000000 > > {snip} > > The two ZFS-related entries at the end, if present, stop the stalls. I'd like to know which of the two "stops the stalls". The former limits ARC size (at least on FreeBSD it does; when I used Solaris last, the same tunable on Solaris was a "recommendation" rather than a hard limit), while the latter limits overall "write bandwidth" (for lack of a better term). If the former is what addresses the issue, then memory fragmentation or some ARC-related bug is the cause (again I'm speculating). Again: only low-level kernel folks are going to be able to work this one out, with your help. I am at a loss for this problem.
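A side note on the two tunables quoted above, for anyone who rewrites them with size suffixes: as far as I know the K/M/G/T suffixes are read as powers of 1024 (treat that as an assumption and check it on your release), so "2G" is not the same number as 2000000000, nor "1G" the same as 1024000000. A minimal sh sketch of the arithmetic; to_bytes is a made-up helper name for illustration, not a system utility:

```sh
#!/bin/sh
# Convert a size such as "2G", "512M" or "2000000000" to bytes.
# Assumption: K/M/G/T suffixes mean powers of 1024.
# to_bytes is a hypothetical helper, not part of FreeBSD.
to_bytes() {
    val=$1
    num=${val%[KkMmGgTt]}          # strip one trailing suffix letter, if any
    case $val in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *[Tt]) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;      # plain byte count, passed through
    esac
}
```

Under that assumption, "to_bytes 2G" gives a value roughly 7% larger than the 2000000000 quoted above, which matters if you are comparing before/after behaviour of the ARC limit.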
To me, in your case, it sounds like you have a multitude of ZFS and UFS disks on the same controller, and it may be that the **controller** is "wedging" on all these I/O requests. I don't use arcmsr(4), but I don't know how to prove if it's arcmsr(4) doing this. Part of me wonders if folks experiencing this are hitting some kind of memory bus limit or something along those lines, and since ZFS tends to shove everything into the ARC then periodically (vfs.zfs.txg.timeout) flush gigantic amounts to disk, I wonder if there's some contention between different drivers/pieces (arcmsr vs. zfs vs. VM vs. ???) causing the issue. Irrelevant comment: you should use human-readable values for those tunables, for legibility; (make sure you use quotes, and do so consistently throughout loader.conf), ex.: vfs.zfs.arc_max="2G" vfs.zfs.write_limit_override="1G" > {snip} > > sysctl.conf contains: > > {snip} > > net.inet.tcp.imcp_may_rst=0 Irrelevant comment: typo in the MIB name here; surprised you haven't seen messages about this on your system consoles ("unknown oid"). > I suspect (but can't yet prove) that wiring shared memory is likely > involved in this. That makes a BIG difference in Postgres performance, > but I can certainly see where a misbehaving ARC cache could "think" that > the (rather large) shared segment that Postgres has (it currently > allocates 1.5G of shared memory and wires it) can or might "get out of > the way." Remove pgsql from the picture and see if you can reproduce the problem. Like I said: a dedicated test box would do you well. :-) FreeBSD's classic shm_xxx(3) stuff has always been painful, in my experience. I had the wonderful pleasure of dealing with it when it came to PHP/PECL's APC, and found that the mmap(2) mechanism works significantly better (and I don't have to futz with stupid sysctls). But this comment does not solve or do you any good. 
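Going back to the earlier observation that all I/O counters sit at zero during a stall: that could be watched for mechanically. A rough sh/awk sketch, assuming single-pool "zpool iostat" output with seven columns (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth); detect_stall and the pool name "tank" are hypothetical names used only for illustration:

```sh
#!/bin/sh
# Read "zpool iostat <pool> <interval>" style lines on stdin and report when
# N consecutive samples show zero operations and zero bandwidth.
# Assumptions: one pool per sample, seven whitespace-separated columns.
# detect_stall is a hypothetical helper name.
detect_stall() {
    awk -v need="${1:-2}" '
        # A data line whose four I/O columns are all exactly "0".
        NF == 7 && $4 == "0" && $5 == "0" && $6 == "0" && $7 == "0" {
            if (++zeros == need)
                print "possible stall: " zeros " consecutive idle samples"
            next
        }
        # Any other data line (numeric ops column) resets the counter;
        # header and separator lines are ignored.
        NF == 7 && $4 ~ /^[0-9]/ { zeros = 0 }
    '
}
```

Something like "zpool iostat tank 5 | detect_stall 2" would be one way to run it. A genuinely idle pool also prints zeros, so a hit only means "look closer", not "stalled".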
> {snip} > > I'm quite sure I can reproduce the workload that causes the stalls; > populating the backup pack as a separate zfs pool (with zfs send | zfs > recv) was what led to it happening here originally. > > With that said I've got more than 24 hours on the box that exhibited the > problem with the two tunables in /boot/loader.conf and a sentinal > process that is doing a zpool iostat 5 looking for more than one "all > zeros" I/O line sequentially. > > It hasn't happened since I stuck those two lines in there and at this > point two nightly backup runs have gone to completion along with some > fairly heavy user I/O last evening which was plenty of load to provoke > the misbehavior previously. I wish you had just added one of those lines instead of both. Even with just those 2 lines, the possibilities of cause are still extremely many. My entire gut feeling at this point is that there's some kind of controller (as in firmware or driver-level) nonsense going on. You're going to need that test box up and reproducing the problem, and then (I hate to tell you this) you're probably going to have to hire someone from the FreeBSD Project -- as in pay them hourly -- to figure this out. Otherwise, I found this post of Freddie's to be interesting: http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072702.html This is all I can say with regards to this thread at this point. I have absolutely nothing else of worth to add. Anything else I'd say would just be negative/condescending (upon ZFS) and would do no one any good. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 05:16:15 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B440077A for ; Wed, 6 Mar 2013 05:16:15 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:64]) by mx1.freebsd.org (Postfix) with ESMTP id 2CA6B140 for ; Wed, 6 Mar 2013 05:16:14 +0000 (UTC) Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta07.emeryville.ca.mail.comcast.net with comcast id 856o1l0030EPchoA75GEVt; Wed, 06 Mar 2013 05:16:14 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta01.emeryville.ca.mail.comcast.net with comcast id 85GD1l00s1t3BNj8M5GEgm; Wed, 06 Mar 2013 05:16:14 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id E13C873A31; Tue, 5 Mar 2013 21:16:13 -0800 (PST) Date: Tue, 5 Mar 2013 21:16:13 -0800 From: Jeremy Chadwick To: Karl Denninger Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130306051613.GA62470@icarus.home.lan> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> <20130305092700.GA43045@icarus.home.lan> <5135EB62.6060006@denninger.net> <20130306050809.GA61727@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130306050809.GA61727@icarus.home.lan> User-Agent: Mutt/1.5.21 (2010-09-15) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net; s=q20121106; t=1362546974; bh=aPHThh7B1po5lOHmXLXALa4xc/FmB0Qyz5p+1agt+wo=; h=Received:Received:Received:Date:From:To:Subject:Message-ID: MIME-Version:Content-Type; b=luSFAec6lE1CVJE3XxutoT5zhd3t80LmjmMTosJF1ZXfAKxrQjbqF67tjI2PH4cdP F70C8aRhWs7pEgx45XZ7w41pRbuvfWbwEfd+WhP9VozV+mqoDaIAvz/c1N0YxBrQeQ vK0yN0fOMMF2LoF8JHxVFYRBfIjOPwXxEVEv7M7lmOqYHRbJjGwRsVI1m+Jv0TamGR hb33b1jXatYyaoLJa/GS1hwqeOHgV2xGJlHxVljzLDJ7I1dypEFihX2OFd1NEFD1iv jPnAP6A+9jooZeSTV+5udtY6b0ax5GRC4p3HI6o5YnZXBv8lpratbZDZIE6F7BcaMt ghnSEbSQcziJA== Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 05:16:15 -0000 On Tue, Mar 05, 2013 at 09:08:09PM -0800, Jeremy Chadwick wrote: > > > * How long the stall is in duration (ex. if there's some way to > > > roughly calculate this using "date" in a shell script) > > They're variable. Some last fractions of a second and are not really > > all that noticeable unless you happen to be paying CLOSE attention. > > Some last a few (5 or so) seconds. The really bad ones last long enough > > that the kernel throws the message "swap_pager: indefinite wait buffer". 
> > The message "swap_pager: indefinite wait buffer" indicates that some > part of the VM is trying to offload pages of memory to swap via standard > I/O write requests, and those writes have not come back within kern.hz*20 > seconds. That's a very, very long time. Two clarification points: 1. The timeout value is passed to msleep(9) and is literally kern.hz*20. Per sys/vm/swap_pager.c: if (msleep(mreq, VM_OBJECT_MTX(object), PSWP, "swread", hz*20)) { printf( "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n", bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount); How that's interpreted is documented in msleep(9): The parameter timo specifies a timeout for the sleep. If timo is not 0, then the thread will sleep for at most timo / hz seconds. If the timeout expires, then the sleep function will return EWOULDBLOCK. So a timo of hz*20 works out to a 20-second timeout, whatever kern.hz is set to. 2. The message appears to be for swap I/O *reads*, not writes; at least that's what the "swread" STATE string (you know, what you see in top(1)) implies. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Mountain View, CA, US | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 09:05:26 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1F9F9541 for ; Wed, 6 Mar 2013 09:05:26 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42]) by mx1.freebsd.org (Postfix) with ESMTP id D1579B58 for ; Wed, 6 Mar 2013 09:05:25 +0000 (UTC) Received: from smtp2.u-psud.fr (localhost [127.0.0.1]) by localhost (MTA) with SMTP id 66C0634B66D for ; Wed, 6 Mar 2013 09:55:27 +0100 (CET) Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160]) by smtp2.u-psud.fr (MTA) with ESMTP id 3992B34B643 for ; Wed, 6 Mar 2013 09:55:27 +0100 (CET) Message-ID: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> Subject: Strange reboot since 9.1 From: Loïc Blot To: "freebsd-stable@freebsd.org" Date: Wed, 06 Mar 2013 09:55:23 +0100 Organization: UNIX Experience FR Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 09:05:26 -0000 Hello, Since FreeBSD 9.1 I have been having strange problems. Some servers reboot instantly, without any kernel panic. At first I thought it was a problem with my KVM system, but one of my FreeBSD installs on a Dell R210 has the same problem. The servers concerned so far are: - Monitoring server - LDAP test server - Some other servers, randomly (not in production). Then I thought it was a problem with my FreeBSD install, so I downloaded the ISO again, but the problem was still there.
After that I tried another thing: installing 9.0 and upgrading to 9.1, but I get the same problem. How can I get information about this problem? Thanks in advance. -- Best regards, Loïc BLOT, Engineering UNIX Systems, Security and Networks http://www.unix-experience.fr From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 09:15:16 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9BEF696F for ; Wed, 6 Mar 2013 09:15:16 +0000 (UTC) (envelope-from jiansong.liu@gmail.com) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com [IPv6:2a00:1450:400c:c05::235]) by mx1.freebsd.org (Postfix) with ESMTP id 3E896C26 for ; Wed, 6 Mar 2013 09:15:16 +0000 (UTC) Received: by mail-wi0-f181.google.com with SMTP id hm6so229546wib.2 for ; Wed, 06 Mar 2013 01:15:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type :content-transfer-encoding; bh=O0PnX8BKKwfRkYcxUI0yuQpXiAHW0JKMCKw5efxuPgg=; b=auTFzp9ku+dyEFPbuvqTJpjZlA8AGO8V+02ExqUKCuejmR8dSxdrFGV5wxHBpm9FTc pko4NDTLBNtQrxQ4LGuNva8spvfXJh1Hccfa/XEpTm3TkaXWBUMG+XFXJ8/G9pD39MSf jMUU3u6H2NUCH1XzCNO50fx+O1gSzEut1QfdtagULeRLM+MdWWFGhR0SaDDK3RuGOkDf KLem339fXzzkstWsG+bao61wiG8X9DDOk7JtHqgRUaQhIthzw5fxy8L4OC+IjOfbIQiY dqRMADvKElHw5DmNjdqNdA+9lErUSlLOjDDCGqYNn7eohCsvboXHDPYMDL/eGhlU7o7Q ONlw== X-Received: by 10.180.105.229 with SMTP id gp5mr19536411wib.10.1362561267089; Wed, 06 Mar 2013 01:14:27 -0800 (PST) MIME-Version: 1.0 Sender: jiansong.liu@gmail.com Received: by 10.194.109.2 with HTTP; Wed, 6 Mar 2013 01:14:06 -0800 (PST) In-Reply-To: References: From: Jiansong Liu Date: Wed, 6 Mar 2013 17:14:06 +0800 X-Google-Sender-Auth: 9-wcJaCrGPHTBPVzMZsmRWV9PJA Message-ID: Subject: Re: bce0: bce_pulse(): Warning: bootcode thinks driver is absent!
(bc_state = 0x00004006) To: Marc Fournier Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 09:15:16 -0000 Hi Marc, My DELL PE2950 was running 9.1-STABLE r246126M, which is definitely earlier than Feb 5th; I use it as a VirtualBox host (4.2.6). Yesterday it ran into a problem similar to the one you described, but I didn't get so lucky: ping is dead. Then I updated it to r247836, and it happened again today. There are two warning messages after boot: bce1: promiscuous mode enabled bce0: promiscuous mode enabled bce0: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00004006) bce1: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x00004006) I can't turn promiscuous mode off by typing "ifconfig bce0 -promisc". Any comments are appreciated, thanks. Best regards, Jiansong On Fri, Mar 1, 2013 at 12:07 PM, Marc Fournier wrote: > > Hi … > > Running a kernel updated on the 24th, I just experienced a total hang of the ethernet … the funny thing is that when I went to the remote console, the network suddenly came up on its own, after being down for about 2 hours ... > > I had originally thought it had to do with VirtualBox, since up until now, the only boxes I'd see exhibiting this were running a VirtualBox VPS, but this server doesn't have one running … > > Has anyone seen this before?
> > The odd thing is that the bce driver code hasn't been modified since Nov 17th, 2012, according to the FBSDID tag in if_bce.c … so its not a change to the driver itself … > > I was having some odd issues with another server that has VirtualBox, where similar would happen … I reverted back to code around Feb 5th, and it seemed to have gone away … am trying to see if I can narrow it down to a specific date, but maybe the error message has more meaning for someone else … ? > > Thx … > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 09:18:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 8A861BFA for ; Wed, 6 Mar 2013 09:18:52 +0000 (UTC) (envelope-from dnaeon@gmail.com) Received: from mail-wg0-f44.google.com (mail-wg0-f44.google.com [74.125.82.44]) by mx1.freebsd.org (Postfix) with ESMTP id 094A4CBF for ; Wed, 6 Mar 2013 09:18:51 +0000 (UTC) Received: by mail-wg0-f44.google.com with SMTP id dr12so6971906wgb.11 for ; Wed, 06 Mar 2013 01:18:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=0YkeAgCRAimy9P+/FxaNGJYMc3FMkpYQILwIkeHanMw=; b=pdZUD4bTrAR2jMcrxte6UQySSGoCJwq1WVJYBYwRuixKtjKuc8JELZUy0NfUbojGuf W1OYHXHUhE1KHyDQbaqK3XoCc5o6PJdEWXmJiQL5SSp7HWkKeWYpLM0jsV7cnpNdExxY KIeZvHOx+ah1nWq4p/uXAVR71Skm47mPeAMWV80OK4llG3NLW7nJYGSJgME9Jib8yxk6 7tzFEvssSm2Kj4cfqpWk79ItNMFwi6tPTjIBib9zq27yTRDCkTUrbFF+lNbAHNSnYAb7 W5XohFAcHXLgpmDatvoEDXpDlQfUQSNeG76pPSxTur9OJbpLQG2uzev22R4QrWe7dMys UNsQ== MIME-Version: 1.0
X-Received: by 10.204.8.207 with SMTP id i15mr10708777bki.19.1362561530682; Wed, 06 Mar 2013 01:18:50 -0800 (PST) Received: by 10.204.165.197 with HTTP; Wed, 6 Mar 2013 01:18:50 -0800 (PST) In-Reply-To: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> Date: Wed, 6 Mar 2013 11:18:50 +0200 Message-ID: Subject: Re: Strange reboot since 9.1 From: Marin Atanasov Nikolov To: loic.blot@unix-experience.fr Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 09:18:52 -0000 On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot wrote: > Hello, > Hi, > Since FreeBSD 9.1 I have strange problems with the distribution. Some > servers are rebooting without any kernel panic, instanly. First i > thought it's a problem with my KVM system, but one of my FreeBSD under a > Dell R210 have the same problem. > The servers concerned are now: > - Monitoring server > - LDAP test server > - Some other servers, randomly (not in production). > First i thought it's a problem with my FreeBSD install, then i download > another time the ISO but the problem was already here. After i try > another thing, install 9.0 and upgrade to 9.1 but same problem. > How can i get informations about this problem ? > > I've had similar issues with one of my FreeBSD systems. My system had spontaneous reboots without any kernel panic, without any clear evidence of why it happened. After a lot of trials and tests the root cause appeared to be the number of ZFS snapshots I had, which was more than 1K on an 8G system.
Upgrading from 9.0 to 9.1 didn't solve the issue; clearly I had to do some cleanup of the ZFS snapshots, and since then it's been more than a month without any reboots. A few pointers that you could use -- get these systems monitored and keep an eye on the monitoring system -- CPU usage, memory, processes, network traffic, etc. I noticed that my system was running low on free memory, and that later led me to the ZFS snapshots clue. So, my advice is to first get these systems monitored and watch for anything unusual happening. Then investigate further. Good luck. Regards, Marin > Thanks for advance. > -- > Best regards, > > Loïc BLOT, Engineering > UNIX Systems, Security and Networks > http://www.unix-experience.fr > > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" -- Marin Atanasov Nikolov dnaeon AT gmail DOT com http://www.unix-heaven.org/ From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 09:59:53 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3C879AFF; Wed, 6 Mar 2013 09:59:53 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) by mx1.freebsd.org (Postfix) with ESMTP id 9DA20E40; Wed, 6 Mar 2013 09:59:52 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id r269xjQU054738 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 6 Mar 2013 11:59:48 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <51371391.8040405@digsys.bg> Date: Wed, 06 Mar 2013 11:59:45 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130304 Thunderbird/17.0.3
MIME-Version: 1.0 To: Steven Hartland Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> <404FB08ACE6E47318A468ABFCE335FA8@multiplay.co.uk> In-Reply-To: <404FB08ACE6E47318A468ABFCE335FA8@multiplay.co.uk> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jeremy Chadwick , FreeBSD-STABLE Mailing List , Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 09:59:53 -0000 On 06.03.13 02:42, Steven Hartland wrote: > > ----- Original Message ----- From: "Daniel Kalchev" >> On Mar 6, 2013, at 12:09 AM, Jeremy Chadwick wrote: >> >>> I say that knowing lots of people use ZFS-on-root, which is great -- I >>> just wonder how many of them have tested all the crazy scenarios and >>> then tried to boot from things. >> >> I have verified that ZFS-on-root works reliably in all of the following >> scenarios: single disk, one mirror vdev, many mirror vdevs, raidz. >> Haven't found the time to test many raidz vdevs, I admit. :) > > One thing to watch out for is the available BIOS boot disks. If you try > to do a large RAIDZ with lots of disk as you root pool your likely to > run into problems not because of any ZFS issue but simply because the > disks the BIOS sees and hence tries to boot may not be what you expect. A prudent system administrator should understand this issue and verify that whatever (boot) architecture they come up with, is supported by their particular hardware and firmware. This is no different for ZFS than for any other case. 
The 2nd-stage ZFS boot loader in FreeBSD could in fact end up with its own drive detection code one day, which would eliminate its dependence on the BIOS altogether. For relatively small systems, where the administrator might be careless enough not to consider all scenarios, today's BIOSes already provide support for enough devices (e.g. most motherboards provide 4-6 SATA ports etc). Using separate boot pools of just a few devices is what I do for large storage boxes too, mostly because I want to be able to fiddle with the data disks without worrying that it might impact the OS. Just make sure the BIOS does see these in the drive list it creates. That is, don't put the boot disks at the last positions in your chassis :) -- use the on-board SATA slots that are scanned first -- sadly, almost every vendor places those drives inside the chassis, which makes it very inconvenient if one of the drives dies. Daniel From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 10:49:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 94942ADD; Wed, 6 Mar 2013 10:49:52 +0000 (UTC) (envelope-from ler@lerctr.org) Received: from thebighonker.lerctr.org (lrosenman-1-pt.tunnel.tserv8.dal1.ipv6.he.net [IPv6:2001:470:1f0e:3ad::2]) by mx1.freebsd.org (Postfix) with ESMTP id 4F64EF4; Wed, 6 Mar 2013 10:49:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lerctr.org; s=lerami; h=Message-ID:Subject:To:From:Date:Content-Transfer-Encoding:Content-Type:MIME-Version; bh=tAQAYEVg02koOwrKlJnKERvuAHEpjqEJwR5tvv+9A4Y=; b=cX9zzBg965zk3HjU7EobP/ny6yH/vXnO3ZDlN14lSalE+mBVrrjjnjAcXI3sokd8xdUzeqwk7OAzoVqPuCK6FMVTAAmZHdd77ukCMNswrVyWzTp6fhGOJ2ZLwNOxHKJWe+CMdeUA+JLPTa6fn3TwbTVNdDpglGOcq7/RAFHSOnM=; Received: from localhost.lerctr.org ([127.0.0.1]:55686 helo=webmail.lerctr.org) by thebighonker.lerctr.org with esmtpa (Exim 4.80.1 (FreeBSD)) (envelope-from ) id
1UDBuv-000HNw-R0; Wed, 06 Mar 2013 04:49:51 -0600 Received: from cpe-72-182-19-162.austin.res.rr.com ([72.182.19.162]) by webmail.lerctr.org with HTTP (HTTP/1.1 POST); Wed, 06 Mar 2013 04:49:48 -0600 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Date: Wed, 06 Mar 2013 04:49:48 -0600 From: Larry Rosenman To: , Subject: Fwd: Re: zfs send/recv invalid data Message-ID: <46d966dd574cd8097d4972213c73e9be@webmail.lerctr.org> X-Sender: ler@lerctr.org User-Agent: Roundcube Webmail/0.8.5 X-Spam-Score: -3.5 (---) X-LERCTR-Spam-Score: -3.5 (---) X-Spam-Report: SpamScore (-3.5/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.628 X-LERCTR-Spam-Report: SpamScore (-3.5/5.0) ALL_TRUSTED=-1, BAYES_00=-1.9, RP_MATCHES_RCVD=-0.628 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 10:49:52 -0000 I forgot to add current/stable to the list TL;DR: there seems(!) to be something(!) unclean about an ssh path between an 8.3-STABLE(r247820) and 10.0-CURRENT(r247826) box such that a zfs send stream is corrupted in transit. below is the thread from -fs about it, with sshd configs from both sides. If I copy the stream it works, but piping through ssh does NOT. -------- Original Message -------- Subject: Re: zfs send/recv invalid data Date: 2013-03-06 04:46 From: Larry Rosenman To: Steven Hartland Cc: Ronald Klop , On 2013-03-06 02:38, Steven Hartland wrote: > ----- Original Message ----- From: "Larry Rosenman" >>>>>>>> I received an "invalid data" in a zfs send (from 8.3) / zfs >>>>>>>> recv (to 10.0) of a -R -I stream. >>>>>>>> What data do I need to gather to figure out what side and >>>>>>>> what's wrong? >>>>>>>> I've already started zpool scrubs on both sides. 
>>>>>>>> I can insert a tee to grab the stream on either/both sides if >>>>>>>> that would help. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Is the problem repeatable or is it just a network glitch? >>>>>>> Ronald. >>>>>> Repeatable....... >>>>> Here is the exact error message: >>>>> receiving incremental stream of vault/home/ctr@2013-03-05-test3 >>>>> into zroot/backups/TBH/home/ctr@2013-03-05-test3 >>>>> cannot receive incremental stream: invalid backup stream >>>>> this is the script I'm running: >>>>> #!/bin/sh >>>>> DATE=`date "+%Y-%m-%d-BUG-REPRO"` >>>>> DATE2=`date -v "-1d" "+%Y-%m-%d"` >>>>> # snap the source >>>>> ssh root@tbh.lerctr.org zfs snapshot -r vault@${DATE} >>>>> # zfs copy the source to here. >>>>> ssh root@tbh.lerctr.org "zfs send -R -D -I vault@${DATE2} >>>>> vault@${DATE} | \ >>>>> tee /tmp/backup.stream.send.${DATE} | \ >>>>> ssh home.lerctr.org \"tee /tmp/backup.stream.receive.${DATE} >>>>> | zfs recv -u -v -d zroot/backups/TBH\"" >>>>> # make sure we NEVER allow the backup stuff to automount. >>>>> /sbin/zfs list -H -t filesystem -r zroot/backups/TBH| \ >>>>> awk '{printf "/sbin/zfs set canmount=noauto %s\n",$1}' | sh >>>>> both streams are in http://www.lerctr.org/~ler/ZFS_RECV >>>> Your send and receive sides differ, which indicates your ssh >>>> shell my not be clean. >>>> Looking at the receive side its got what looks like a mail >>>> message appended. >>>> I suspect if you manually copy the receive copy to the 10 machine >>>> and >>>> the receive it will work fine. >>> we're copying mail files........ >>> and it still fails.... >>> >> I've put more example send/recv files in that directory. >> we're copying home dirs, which include lots of mail. >> (this one is my wife's) >> Ideas? >> I *CAN* give access to both sides via ssh..... > The copy of the data stream on both sides should be identical > though and its not, which leads me to believe something is > corrupting the data on the way. 
Try the following:- > >> From source:- > zfs send -R -D -I vault@${DATE2} vault@${DATE} > test.stream > scp test.stream home.lerctr.org:~/ >> From target: > zfs recv -u -v -d zroot/backups/TBH < test.stream > If this works then there is something unclean about your ssh > shell. > Regards > Steve > send side: # zfs send -R -D -I vault@2013-03-05 vault@2013-03-06 >/tmp/send.stream # openssl md5 /tmp/send.stream MD5(/tmp/send.stream)= 9cd1d73ea8411f1c222bc90e7bea3d33 # scp /tmp/send.stream home:/tmp/send.stream send.stream 100% 1180MB 2.5MB/s 07:44 # uname -a FreeBSD thebighonker.lerctr.org 8.3-STABLE FreeBSD 8.3-STABLE #54 r247820: Mon Mar 4 18:08:11 CST 2013 root@thebighonker.lerctr.org:/usr/obj/usr/src/sys/THEBIGHONKER amd64 # Receive side: # uname -a FreeBSD borg.lerctr.org 10.0-CURRENT FreeBSD 10.0-CURRENT #124 r247826: Mon Mar 4 19:59:08 CST 2013 root@borg.lerctr.org:/usr/obj/usr/src/sys/BORG-DTRACE amd64 # openssl md5 /tmp/send.stream MD5(/tmp/send.stream)= 9cd1d73ea8411f1c222bc90e7bea3d33 # zfs recv -F -u -v -d zroot/backups/TBH < /tmp/send.stream # So, you are correct that something(tm) is unclean about the ssh path. adding -current and -stable for diagnosing ssh issue. sshd config on the 8.3-STABLE box: # cat /etc/ssh/sshd_config # $OpenBSD: sshd_config,v 1.87 2012/07/10 02:19:15 djm Exp $ # $FreeBSD: stable/8/crypto/openssh/sshd_config 247521 2013-03-01 02:06:04Z des $ # This is the sshd server system-wide configuration file. See # sshd_config(5) for more information. # This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin # The strategy used for options in the default sshd_config shipped with # OpenSSH is to specify options with their default value where # possible, but leave them commented. Uncommented options override the # default value. # Note that some of FreeBSD's defaults differ from OpenBSD's, and # FreeBSD has a few additional options. 
#Port 22 #AddressFamily any #ListenAddress 0.0.0.0 #ListenAddress :: # Disable legacy (protocol version 1) support in the server for new # installations. In future the default will change to require explicit # activation of protocol 1 Protocol 2 # HostKey for protocol version 1 #HostKey /etc/ssh/ssh_host_key # HostKeys for protocol version 2 #HostKey /etc/ssh/ssh_host_rsa_key #HostKey /etc/ssh/ssh_host_dsa_key # Lifetime and size of ephemeral version 1 server key #KeyRegenerationInterval 1h #ServerKeyBits 1024 # Logging # obsoletes QuietMode and FascistLogging #SyslogFacility AUTH #LogLevel INFO # Authentication: #LoginGraceTime 2m PermitRootLogin yes #StrictModes yes #MaxAuthTries 6 #MaxSessions 10 #RSAAuthentication yes #PubkeyAuthentication yes # The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2 # but this is overridden so installations will only check .ssh/authorized_keys #AuthorizedKeysFile .ssh/authorized_keys #AuthorizedPrincipalsFile none # For this to work you will also need host keys in /etc/ssh/ssh_known_hosts #RhostsRSAAuthentication no # similar for protocol version 2 #HostbasedAuthentication no # Change to yes if you don't trust ~/.ssh/known_hosts for # RhostsRSAAuthentication and HostbasedAuthentication #IgnoreUserKnownHosts no # Don't read the user's ~/.rhosts and ~/.shosts files #IgnoreRhosts yes # Change to yes to enable built-in password authentication. #PasswordAuthentication no #PermitEmptyPasswords no # Change to no to disable PAM authentication #ChallengeResponseAuthentication yes # Kerberos options #KerberosAuthentication no #KerberosOrLocalPasswd yes #KerberosTicketCleanup yes #KerberosGetAFSToken no # GSSAPI options #GSSAPIAuthentication no #GSSAPICleanupCredentials yes # Set this to 'no' to disable PAM authentication, account processing, # and session processing. If this is enabled, PAM authentication will # be allowed through the ChallengeResponseAuthentication and # PasswordAuthentication. 
Depending on your PAM configuration, # PAM authentication via ChallengeResponseAuthentication may bypass # the setting of "PermitRootLogin without-password". # If you just want the PAM account and session checks to run without # PAM authentication, then enable this but set PasswordAuthentication # and ChallengeResponseAuthentication to 'no'. #UsePAM yes #AllowAgentForwarding yes #AllowTcpForwarding yes #GatewayPorts no #X11Forwarding yes #X11DisplayOffset 10 #X11UseLocalhost yes #PrintMotd yes #PrintLastLog yes #TCPKeepAlive yes #UseLogin no #UsePrivilegeSeparation sandbox #PermitUserEnvironment no #Compression delayed ClientAliveInterval 120 ClientAliveCountMax 200000 #UseDNS yes #PidFile /var/run/sshd.pid #MaxStartups 10 #PermitTunnel no #ChrootDirectory none #VersionAddendum FreeBSD-20120901 # no default banner path #Banner none # override default of no subsystems Subsystem sftp /usr/libexec/sftp-server # Disable HPN tuning improvements. #HPNDisabled no # Buffer size for HPN to non-HPN connections. #HPNBufferSize 2048 # TCP receive socket buffer polling for HPN. Disable on non autotuning kernels. #TcpRcvBufPoll yes # Allow the use of the NONE cipher. #NoneEnabled no # Example of overriding settings on a per-user basis #Match User anoncvs # X11Forwarding no # AllowTcpForwarding no # ForceCommand cvs server # sshd config on the 10.0-CURRENT: # cat /etc/ssh/sshd_config # $OpenBSD: sshd_config,v 1.87 2012/07/10 02:19:15 djm Exp $ # $FreeBSD: head/crypto/openssh/sshd_config 240075 2012-09-03 16:51:41Z des $ # This is the sshd server system-wide configuration file. See # sshd_config(5) for more information. # This sshd was compiled with PATH=/usr/bin:/bin:/usr/sbin:/sbin # The strategy used for options in the default sshd_config shipped with # OpenSSH is to specify options with their default value where # possible, but leave them commented. Uncommented options override the # default value. 
# Note that some of FreeBSD's defaults differ from OpenBSD's, and # FreeBSD has a few additional options. #Port 22 #AddressFamily any #ListenAddress 0.0.0.0 #ListenAddress :: # The default requires explicit activation of protocol 1 #Protocol 2 # HostKey for protocol version 1 #HostKey /etc/ssh/ssh_host_key # HostKeys for protocol version 2 #HostKey /etc/ssh/ssh_host_rsa_key #HostKey /etc/ssh/ssh_host_dsa_key #HostKey /etc/ssh/ssh_host_ecdsa_key # Lifetime and size of ephemeral version 1 server key #KeyRegenerationInterval 1h #ServerKeyBits 1024 # Logging # obsoletes QuietMode and FascistLogging #SyslogFacility AUTH #LogLevel INFO # Authentication: #LoginGraceTime 2m PermitRootLogin yes #StrictModes yes #MaxAuthTries 6 #MaxSessions 10 #RSAAuthentication yes #PubkeyAuthentication yes # The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2 # but this is overridden so installations will only check .ssh/authorized_keys AuthorizedKeysFile .ssh/authorized_keys #AuthorizedPrincipalsFile none # For this to work you will also need host keys in /etc/ssh/ssh_known_hosts #RhostsRSAAuthentication no # similar for protocol version 2 #HostbasedAuthentication no # Change to yes if you don't trust ~/.ssh/known_hosts for # RhostsRSAAuthentication and HostbasedAuthentication #IgnoreUserKnownHosts no # Don't read the user's ~/.rhosts and ~/.shosts files #IgnoreRhosts yes # Change to yes to enable built-in password authentication. #PasswordAuthentication no #PermitEmptyPasswords no # Change to no to disable PAM authentication #ChallengeResponseAuthentication yes # Kerberos options #KerberosAuthentication no #KerberosOrLocalPasswd yes #KerberosTicketCleanup yes #KerberosGetAFSToken no # GSSAPI options #GSSAPIAuthentication no #GSSAPICleanupCredentials yes # Set this to 'no' to disable PAM authentication, account processing, # and session processing. 
If this is enabled, PAM authentication will # be allowed through the ChallengeResponseAuthentication and # PasswordAuthentication. Depending on your PAM configuration, # PAM authentication via ChallengeResponseAuthentication may bypass # the setting of "PermitRootLogin without-password". # If you just want the PAM account and session checks to run without # PAM authentication, then enable this but set PasswordAuthentication # and ChallengeResponseAuthentication to 'no'. #UsePAM yes #AllowAgentForwarding yes #AllowTcpForwarding yes #GatewayPorts no #X11Forwarding yes #X11DisplayOffset 10 #X11UseLocalhost yes #PrintMotd yes #PrintLastLog yes #TCPKeepAlive yes #UseLogin no #UsePrivilegeSeparation sandbox #PermitUserEnvironment no #Compression delayed ClientAliveInterval 120 ClientAliveCountMax 200000 #UseDNS yes #PidFile /var/run/sshd.pid #MaxStartups 10 #PermitTunnel no #ChrootDirectory none #VersionAddendum FreeBSD-20120901 # no default banner path #Banner none # override default of no subsystems Subsystem sftp /usr/libexec/sftp-server # Disable HPN tuning improvements. #HPNDisabled no # Buffer size for HPN to non-HPN connections. #HPNBufferSize 2048 # TCP receive socket buffer polling for HPN. Disable on non autotuning kernels. #TcpRcvBufPoll yes # Allow the use of the NONE cipher. #NoneEnabled no # Example of overriding settings on a per-user basis #Match User anoncvs # X11Forwarding no # AllowTcpForwarding no # ForceCommand cvs server # Ideas from the ssh folks? 
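One way to narrow down where the stream gets damaged would be to compare the two `tee` captures from the script byte for byte. Anything the remote side prints on a non-interactive session (for instance from a shell startup file) should show up as extra or shifted bytes. A minimal sketch follows; `compare_streams` is a hypothetical helper name, and the paths in the comments are the ones used in the script quoted earlier.

```shell
#!/bin/sh
# compare_streams (hypothetical helper): compare two captured stream files.
# Prints "identical" and returns 0 when they match; otherwise prints both
# sizes plus cmp's report of the first difference, and returns 1.
compare_streams() {
    if cmp -s "$1" "$2"; then
        echo "identical"
        return 0
    fi
    echo "sizes: $(wc -c < "$1" | tr -d ' ') vs $(wc -c < "$2" | tr -d ' ') bytes"
    cmp "$1" "$2"
    return 1
}

# Example, after copying the sender's capture to the receiving host:
#   compare_streams /tmp/backup.stream.send.$DATE /tmp/backup.stream.receive.$DATE
# A quick check for stray output on the ssh path itself (should print 0):
#   ssh home.lerctr.org /usr/bin/true | wc -c
```

If the byte offset reported by `cmp` is small, the extra data arrived at the start of the session; a large offset would point at something injected mid-stream.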
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 (c) E-Mail: ler@lerctr.org US Mail: 430 Valona Loop, Round Rock, TX 78681-3893 From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 11:02:51 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A7A23519 for ; Wed, 6 Mar 2013 11:02:51 +0000 (UTC) (envelope-from gondim@bsdinfo.com.br) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [186.193.48.13]) by mx1.freebsd.org (Postfix) with ESMTP id 281681AF for ; Wed, 6 Mar 2013 11:02:50 +0000 (UTC) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) by zeus.linuxinfo.com.br (Postfix) with ESMTP id 2DD17466A475 for ; Wed, 6 Mar 2013 07:53:22 -0300 (BRT) X-Virus-Scanned: amavisd-new at zeus.linuxinfo.com.br Received: from zeus.linuxinfo.com.br ([127.0.0.1]) by zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8CbMpTmN6Bif for ; Wed, 6 Mar 2013 07:53:19 -0300 (BRT) Received: from MacBook-de-Gondim-2.local (unknown [186.193.54.69]) by zeus.linuxinfo.com.br (Postfix) with ESMTPSA id DD73F466A458 for ; Wed, 6 Mar 2013 07:53:19 -0300 (BRT) Message-ID: <51372090.3050009@bsdinfo.com.br> Date: Wed, 06 Mar 2013 07:55:12 -0300 From: Marcelo Gondim User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130216 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: Strange reboot since 9.1 References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 11:02:51 -0000 Em 06/03/13 06:18, Marin Atanasov 
Nikolov escreveu: > On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot wrote: > >> Hello, >> > Hi, > > >> Since FreeBSD 9.1 I have strange problems with the distribution. Some >> servers are rebooting without any kernel panic, instanly. First i >> thought it's a problem with my KVM system, but one of my FreeBSD under a >> Dell R210 have the same problem. >> The servers concerned are now: >> - Monitoring server >> - LDAP test server >> - Some other servers, randomly (not in production). >> First i thought it's a problem with my FreeBSD install, then i download >> another time the ISO but the problem was already here. After i try >> another thing, install 9.0 and upgrade to 9.1 but same problem. >> How can i get informations about this problem ? >> >> > I've had similar issues with one of my FreeBSD systems. My system had > spontaneous reboots without any kernel panic, without any clear evidence of > why it happened. > > After a lot of trials and tests the root cause appeared to be the amount of > ZFS snapshots I had, which were more than 1K on a 8G system. > > Upgrading from 9.0 to 9.1 didn't solve the issue, as clearly I had to do > some cleanup of the ZFS snapshots and since then it's more than a month > without any reboots. > > Few pointers that you could use -- get these systems monitored and keep an > eye on the monitoring system -- CPU usage, memory, processes, network > traffic, etc.. I've noticed that my system was running low on free memory > and that later led me to the ZFS snapshots clue. > > So, my advise is to get first these systems monitored and watch for > anything unusual happening. Then further investigate. > > Good luck. > > Regards, > Marin > > >> Thanks for advance. >> -- >> Best regards, >> >> Loïc BLOT, Engineering >> UNIX Systems, Security and Networks >> http://www.unix-experience.fr >> I have the same problem, but I'm using UFS and FreeBSD 9.1-STABLE with dumpdev enabled (dumpdev="AUTO"). After the spontaneous reboots, there is nothing in /var/crash.
Spontaneous reboots always happen between 00:00 am and 09:00 am. FreeBSD rt01.xxx.com 9.1-STABLE FreeBSD 9.1-STABLE #14 r247497: Thu Feb 28 21:32:09 BRT 2013 root@rt01.xxx.com:/usr/obj/usr/src/sys/INTNET amd64 hw.machine: amd64 hw.model: Intel(R) Xeon(R) CPU E5606 @ 2.13GHz hw.ncpu: 8 hw.byteorder: 1234 hw.physmem: 8509702144 hw.usermem: 7911686144 Handle 0x0003, DMI type 2, 16 bytes Base Board Information Manufacturer: Intel Corporation Product Name: S5500BC Version: E25124-453 Serial Number: BZBZ04800361 Asset Tag: .................... Features: Board is a hosting board Board is replaceable Location In Chassis: Not Specified Chassis Handle: 0x0004 Type: Motherboard Contained Object Handles: 0 This motherboard have 2 CPU processors. []'s Gondim From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 12:38:22 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C9582E9F for ; Wed, 6 Mar 2013 12:38:22 +0000 (UTC) (envelope-from gondim@bsdinfo.com.br) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [186.193.48.13]) by mx1.freebsd.org (Postfix) with ESMTP id 6EC2C83A for ; Wed, 6 Mar 2013 12:38:21 +0000 (UTC) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) by zeus.linuxinfo.com.br (Postfix) with ESMTP id C280C466A474 for ; Wed, 6 Mar 2013 09:36:26 -0300 (BRT) X-Virus-Scanned: amavisd-new at zeus.linuxinfo.com.br Received: from zeus.linuxinfo.com.br ([127.0.0.1]) by zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qLg2YlzLnuyO for ; Wed, 6 Mar 2013 09:36:22 -0300 (BRT) Received: from MacBook-de-Gondim-2.local (unknown [186.193.48.8]) by zeus.linuxinfo.com.br (Postfix) with ESMTPSA id 84254466A458 for ; Wed, 6 Mar 2013 09:36:17 -0300 (BRT) Message-ID: <513738B2.5090805@bsdinfo.com.br> Date: Wed, 06 Mar 2013 09:38:10 -0300 From: Marcelo Gondim User-Agent: 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130216 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: Strange reboot since 9.1 References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <51372090.3050009@bsdinfo.com.br> In-Reply-To: <51372090.3050009@bsdinfo.com.br> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 12:38:22 -0000 Em 06/03/13 07:55, Marcelo Gondim escreveu: > Em 06/03/13 06:18, Marin Atanasov Nikolov escreveu: >> On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot >> wrote: >> >>> Hello, >>> >> Hi, >> >> >>> Since FreeBSD 9.1 I have strange problems with the distribution. Some >>> servers are rebooting without any kernel panic, instanly. First i >>> thought it's a problem with my KVM system, but one of my FreeBSD >>> under a >>> Dell R210 have the same problem. >>> The servers concerned are now: >>> - Monitoring server >>> - LDAP test server >>> - Some other servers, randomly (not in production). >>> First i thought it's a problem with my FreeBSD install, then i download >>> another time the ISO but the problem was already here. After i try >>> another thing, install 9.0 and upgrade to 9.1 but same problem. >>> How can i get informations about this problem ? >>> >>> >> I've had similar issues with one of my FreeBSD systems. My system had >> spontaneous reboots without any kernel panic, without any clear >> evidence of >> why it happened. >> >> After a lot of trials and tests the root cause appeared to be the >> amount of >> ZFS snapshots I had, which were more than 1K on a 8G system. 
>> >> Upgrading from 9.0 to 9.1 didn't solve the issue, as clearly I had to do >> some cleanup of the ZFS snapshots and since then it's more than a month >> without any reboots. >> >> Few pointers that you could use -- get these systems monitored and >> keep an >> eye on the monitoring system -- CPU usage, memory, processes, network >> traffic, etc.. I've noticed that my system was running low on free >> memory >> and that later led me to the ZFS snapshots clue. >> >> So, my advise is to get first these systems monitored and watch for >> anything unusual happening. Then further investigate. >> >> Good luck. >> >> Regards, >> Marin >> >> >>> Thanks for advance. >>> -- >>> Best regards, >>> >>> Loïc BLOT, Engineering >>> UNIX Systems, Security and Networks >>> http://www.unix-experience.fr >>> > I have same problem but I'm using UFS and FreeBSD 9.1-STABLE with > dumdev enabled (dumpdev="AUTO"). After spontaneous reboots, nothing in > /var/crash. > Spontaneous reboots always happen between 00:00 am and 09:00 am. > > FreeBSD rt01.xxx.com 9.1-STABLE FreeBSD 9.1-STABLE #14 r247497: Thu > Feb 28 21:32:09 BRT 2013 root@rt01.xxx.com:/usr/obj/usr/src/sys/INTNET > amd64 > > hw.machine: amd64 > hw.model: Intel(R) Xeon(R) CPU E5606 @ 2.13GHz > hw.ncpu: 8 > hw.byteorder: 1234 > hw.physmem: 8509702144 > hw.usermem: 7911686144 > > Handle 0x0003, DMI type 2, 16 bytes > Base Board Information > Manufacturer: Intel Corporation > Product Name: S5500BC > Version: E25124-453 > Serial Number: BZBZ04800361 > Asset Tag: .................... > Features: > Board is a hosting board > Board is replaceable > Location In Chassis: Not Specified > Chassis Handle: 0x0004 > Type: Motherboard > Contained Object Handles: 0 > > This motherboard have 2 CPU processors. 
My last log: boot time Wed Mar 6 03:14 boot time Wed Mar 6 02:29 boot time Tue Mar 5 04:32 boot time Mon Mar 4 08:16 boot time Mon Mar 4 07:09 boot time Mon Mar 4 05:54 boot time Mon Mar 4 05:14 boot time Mon Mar 4 04:33 boot time Mon Mar 4 04:29 boot time Mon Mar 4 04:10 boot time Mon Mar 4 04:01 boot time Mon Mar 4 03:22 boot time Sun Mar 3 05:55 boot time Sat Mar 2 08:02 boot time Sat Mar 2 07:54 boot time Sat Mar 2 07:11 boot time Sat Mar 2 05:33 boot time Sat Mar 2 05:09 boot time Sat Mar 2 04:56 boot time Sat Mar 2 04:19 boot time Sat Mar 2 04:13 boot time Sat Mar 2 04:04 boot time Sat Mar 2 03:27 boot time Sat Mar 2 03:20 boot time Sat Mar 2 02:51 boot time Sat Mar 2 02:40 From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 13:58:19 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1FD6F7BD for ; Wed, 6 Mar 2013 13:58:19 +0000 (UTC) (envelope-from service.info@institutoptique.fr) Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42]) by mx1.freebsd.org (Postfix) with ESMTP id A59ECC26 for ; Wed, 6 Mar 2013 13:58:18 +0000 (UTC) Received: from smtp2.u-psud.fr (localhost [127.0.0.1]) by localhost (MTA) with SMTP id 3CB8334D7BB; Wed, 6 Mar 2013 14:58:11 +0100 (CET) Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160]) by smtp2.u-psud.fr (MTA) with ESMTP id 0EE7434D79B; Wed, 6 Mar 2013 14:58:11 +0100 (CET) Message-ID: <1362578286.16808.16.camel@iMac-LBlot.domain.iogs> Subject: Re: Strange reboot since 9.1 From: Service Info To: Marin Atanasov Nikolov Date: Wed, 06 Mar 2013 14:58:06 +0100 In-Reply-To: References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> Organization: Institut Optique Graduate School Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org 
X-Mailman-Version: 2.1.14 Precedence: list Reply-To: service.info@institutoptique.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 13:58:19 -0000 Hi Marin, I don't use ZFS on this system, only UFS2+J :) My LDAP server reboots more often when I compile a program (yesterday it happened while I was compiling samba36), so I think it happens when the server is under load (my monitoring server runs 750 NRPE sensors + MRTG under 50 switches all the time, and Snort). But neither the CPU nor the memory is heavily used: CPU: 8.4% user, 0.0% nice, 0.6% system, 0.0% interrupt, 91.0% idle Mem: 709M Active, 606M Inact, 885M Wired, 92M Cache, 826M Buf, 5599M Free -- Best regards, Loïc BLOT UNIX Systems, Security and Network 01.64.53.31.54 Laboratoire Charles Fabry, CNRS On Wednesday, 06 March 2013 at 11:18 +0200, Marin Atanasov Nikolov wrote: > > > > On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot > wrote: > Hello, > > > Hi, > > > Since FreeBSD 9.1 I have strange problems with the > distribution. Some > servers are rebooting without any kernel panic, instanly. > First i > thought it's a problem with my KVM system, but one of my > FreeBSD under a > Dell R210 have the same problem. > The servers concerned are now: > - Monitoring server > - LDAP test server > - Some other servers, randomly (not in production). > First i thought it's a problem with my FreeBSD install, then i > download > another time the ISO but the problem was already here. After i > try > another thing, install 9.0 and upgrade to 9.1 but same > problem. > How can i get informations about this problem ? > > > > I've had similar issues with one of my FreeBSD systems. My system had > spontaneous reboots without any kernel panic, without any clear > evidence of why it happened. > > > After a lot of trials and tests the root cause appeared to be the > amount of ZFS snapshots I had, which were more than 1K on a 8G system.
> > > > Upgrading from 9.0 to 9.1 didn't solve the issue, as clearly I had to > do some cleanup of the ZFS snapshots and since then it's more than a > month without any reboots. > > > Few pointers that you could use -- get these systems monitored and > keep an eye on the monitoring system -- CPU usage, memory, processes, > network traffic, etc.. I've noticed that my system was running low on > free memory and that later led me to the ZFS snapshots clue. > > > So, my advise is to get first these systems monitored and watch for > anything unusual happening. Then further investigate. > > Good luck. > > Regards, > Marin > > > Thanks for advance. > -- > Best regards, > > Loïc BLOT, Engineering > UNIX Systems, Security and Networks > http://www.unix-experience.fr > > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to > "freebsd-stable-unsubscribe@freebsd.org" > > > > -- > Marin Atanasov Nikolov > > dnaeon AT gmail DOT com > http://www.unix-heaven.org/ From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 18:02:41 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D24A4CBB; Wed, 6 Mar 2013 18:02:41 +0000 (UTC) (envelope-from mike.jakubik@intertainservices.com) Received: from mail.intertainservices.com (mail.intertainservices.com [69.77.177.114]) by mx1.freebsd.org (Postfix) with ESMTP id 8A690B90; Wed, 6 Mar 2013 18:02:41 +0000 (UTC) Received: from [172.16.10.200] (unknown [172.16.10.200]) by mail.intertainservices.com (Postfix) with ESMTPSA id 0073E5644C; Wed, 6 Mar 2013 13:02:29 -0500 (EST) Message-ID: <1362592949.4699.6.camel@mjakubik.localdomain> Subject: Re: 9.1 AMD64 multitasking efficiency low From: Mike Jakubik To: CeDeROM Date: Wed, 06 Mar 2013 13:02:29 -0500 In-Reply-To: References: 
<201302130844.45388.c47g@gmx.at> <201302131321.49429.c47g@gmx.at> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 (3.6.3-2.fc18) Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-intertainservices-MailScanner-Information: Please contact the ISP for more information X-intertainservices-MailScanner-ID: 0073E5644C.A022A X-intertainservices-MailScanner: Found to be clean X-intertainservices-MailScanner-From: mike.jakubik@intertainservices.com X-Spam-Status: No Cc: freebsd-stable@freebsd.org, freebsd-emulation@freebsd.org, Bernhard Fröhlich X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 18:02:41 -0000 On Wed, 2013-02-13 at 13:30 +0100, CeDeROM wrote: > On Wed, Feb 13, 2013 at 1:21 PM, Christian Gusenbauer wrote: > > It has something to do with the drive. I've just connected my external drive > > to the Intel controller and copied some GB of data around without performance > > impacts! So my new WDC drive works on both the JMicron and the Intel > > controller. > > On the other hand these drivers work very well on other operating > systems like Windows and Linux, so I would rather suspect some > SCSI/CAM/SATA issues on the FreeBSD side...? > I have the same issues on my system. Whenever heavy I/O occurs, such as extracting a large tar, my desktop (Xorg + KDE4) becomes completely unusable; I cannot even type anything into an already opened text editor.
From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 18:17:43 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 616D0302; Wed, 6 Mar 2013 18:17:43 +0000 (UTC) (envelope-from jlh@FreeBSD.org) Received: from caravan.chchile.org (caravan.chchile.org [178.32.125.136]) by mx1.freebsd.org (Postfix) with ESMTP id 2A6FED33; Wed, 6 Mar 2013 18:17:42 +0000 (UTC) Received: by caravan.chchile.org (Postfix, from userid 1000) id 389D6B583B; Wed, 6 Mar 2013 18:02:23 +0000 (UTC) Date: Wed, 6 Mar 2013 19:02:22 +0100 From: Jeremie Le Hen To: freebsd-stable@FreeBSD.org Subject: gdb broken on 9.1/amd64? Message-ID: <20130306180222.GC5939@caravan.chchile.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Cc: jlh@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 18:17:43 -0000 Hi guys, (Can you please Cc: me on reply, I am not subscribed to this list.) I have two different stable/9.1 amd64 machines which show the following error message upon gdb startup: /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled. Can you check if you have the same problem (same or other arch)? Thanks. Example below: root@ingwe:~ # sleep 3600 & [1] 521 root@ingwe:~ # gdb -p 521 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. 
Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd". Attaching to process 521 /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) n /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: internal-error: legacy_fetch_link_map_offsets called without legacy link_map support enabled. A problem internal to GDB has been detected, further debugging may prove unreliable. Create a core file of GDB? (y or n) n Reading symbols from /bin/sleep...(no debugging symbols found)...done. Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.7 Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done. Loaded symbols for /libexec/ld-elf.so.1 0x00000008059f188c in nanosleep () from /lib/libc.so.7 (gdb) bt #0 0x00000008059f188c in nanosleep () from /lib/libc.so.7 #1 0x0000000000400883 in ?? () #2 0x00000000004006f1 in ?? () #3 0x0000000801890000 in ?? () #4 0x0000000000000000 in ?? () -- Jeremie Le Hen Scientists say the world is made up of Protons, Neutrons and Electrons. They forgot to mention Morons. 
From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 19:47:43 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id D584F24D; Wed, 6 Mar 2013 19:47:43 +0000 (UTC) (envelope-from graham@menhennitt.com.au) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by mx1.freebsd.org (Postfix) with ESMTP id 568A720E; Wed, 6 Mar 2013 19:47:42 +0000 (UTC) Received: from maxwell.mencon.com.au (c122-107-224-152.mckinn3.vic.optusnet.com.au [122.107.224.152]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r26JlXvY009519; Thu, 7 Mar 2013 06:47:35 +1100 Received: from starker.mencon.com.au (starker.mencon.com.au [203.2.73.75]) by maxwell.mencon.com.au (Postfix) with ESMTP id A94AC60B9; Thu, 7 Mar 2013 06:47:33 +1100 (EST) Message-ID: <51379D55.10400@menhennitt.com.au> Date: Thu, 07 Mar 2013 06:47:33 +1100 From: Graham Menhennitt User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130111 Thunderbird/17.0.2 MIME-Version: 1.0 To: freebsd-stable@freebsd.org, jlh@freebsd.org Subject: Re: gdb broken on 9.1/amd64? 
References: <20130306180222.GC5939@caravan.chchile.org> In-Reply-To: <20130306180222.GC5939@caravan.chchile.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=D4sfsYtj c=1 sm=1 a=BopLGWBBcmsA:10 a=UfmE7BA1PSwA:10 a=8nJEP1OIZ-IA:10 a=Cpq1HDflAAAA:8 a=oGrv3NR43wIA:10 a=t_AnLRby-K93o8eGRbEA:9 a=wPNLvfGTeEIA:10 a=BQwmYAONLMrTGJtfTy08BQ==:117 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 19:47:43 -0000 On 07/03/2013 05:02, Jeremie Le Hen wrote: > I have two different stable/9.1 amd64 machines which show the following > error message upon gdb startup: > > /usr/src/gnu/usr.bin/gdb/libgdb/../../../../contrib/gdb/gdb/solib-svr4.c:1444: > internal-error: legacy_fetch_link_map_offsets called without legacy > link_map support enabled. > > Can you check if you have the same problem (same or other arch)? > Thanks. > Yep. Same arch, same version -> same error. 
Graham From owner-freebsd-stable@FreeBSD.ORG Wed Mar 6 20:51:04 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B2DAE532; Wed, 6 Mar 2013 20:51:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 1C79178F; Wed, 6 Mar 2013 20:51:03 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r26KoxF8017158; Wed, 6 Mar 2013 22:50:59 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.0 kib.kiev.ua r26KoxF8017158 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r26KoxWw017157; Wed, 6 Mar 2013 22:50:59 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 6 Mar 2013 22:50:59 +0200 From: Konstantin Belousov To: Jeremie Le Hen Subject: Re: gdb broken on 9.1/amd64? 
Message-ID: <20130306205059.GQ3794@kib.kiev.ua> References: <20130306180222.GC5939@caravan.chchile.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="MXxcbiX/Q4+iy5U7" Content-Disposition: inline In-Reply-To: <20130306180222.GC5939@caravan.chchile.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: freebsd-stable@FreeBSD.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2013 20:51:04 -0000 --MXxcbiX/Q4+iy5U7 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Mar 06, 2013 at 07:02:22PM +0100, Jeremie Le Hen wrote: > root@ingwe:~ # gdb -p 521 Try to specify the executable binary on the command line. 
--MXxcbiX/Q4+iy5U7 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRN6wyAAoJEJDCuSvBvK1BslEP/3QklCqOKiqAvEBfWvTAGN97 IS1ASTn79u+EtYznsNg3GeedfI6vhTKCTXxJPCXXAtDEFrx8376O+5d52mGZVcu3 e8lwv7DP9bg/p+f5MmNj3JdLfPlHQIMrxwpJIaM3Zo700sS9BUtZh8eyEgA0h6PL uRAzdQV0TnEIHmlRqGFbsTZ4m5OQHZVA0GGQILTujLggPsWTexyYZo4/TIIPIDwl CrM/Bob9g2We/JOXbEu3kyFQO5F0ss5i/fZOxDSLyAUGEhCMfr+Ihb/bvtWsSMhk vFsH0hd7jlTq/M59DldzCTsGytWjDFsisyF6DOUfLdpl/5NN6k8zucxm1wg4tEv+ B6ndl4wJaw2nD+aT8qUaPfR+MsfUAvJGY7UmAWXrhXyix89gWSmvqxDG5w3jdwYg Ij3XI80nSC+AfqO+GOWc+6UyXGbspIyMQAxMw2Q84Hh8xBRb6fRq+bdeii3xYMGO CAWmQ4FEv/GSlfeSNt27I6IRkSq4a420wwykiw8hKatkvBOE4EKMAi5sX9Gvvwp4 8sNvtgjLahvDHeQzxCUSJ9vyNWe2p1gncjgNplQvSUjMrSXFmjhw5MhI08NlV5xD YXFNIWTZc0nYxzilwdEQMx+Yjfl3awdm8H0jKPK/6cYcxpzGycoixevnPhIWjXzz bliQv3WWzfNTU+4iP74o =pSpn -----END PGP SIGNATURE----- --MXxcbiX/Q4+iy5U7-- From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 06:55:44 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1767B5FA for ; Thu, 7 Mar 2013 06:55:44 +0000 (UTC) (envelope-from jmg@h2.funkthat.com) Received: from h2.funkthat.com (gate2.funkthat.com [208.87.223.18]) by mx1.freebsd.org (Postfix) with ESMTP id C054F2A7 for ; Thu, 7 Mar 2013 06:55:43 +0000 (UTC) Received: from h2.funkthat.com (localhost [127.0.0.1]) by h2.funkthat.com (8.14.3/8.14.3) with ESMTP id r276taoZ052460 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 6 Mar 2013 22:55:36 -0800 (PST) (envelope-from jmg@h2.funkthat.com) Received: (from jmg@localhost) by h2.funkthat.com (8.14.3/8.14.3/Submit) id r276ta7w052459; Wed, 6 Mar 2013 22:55:36 -0800 (PST) (envelope-from jmg) Date: Wed, 6 Mar 2013 22:55:36 -0800 From: John-Mark Gurney To: Karl Denninger Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? 
Message-ID: <20130307065536.GA50035@funkthat.com> Mail-Followup-To: Karl Denninger , freebsd-stable@freebsd.org References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <545CD2ABE3D146F2B91963ADF6090CDE@multiplay.co.uk> <20130305092700.GA43045@icarus.home.lan> <5135EB62.6060006@denninger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5135EB62.6060006@denninger.net> User-Agent: Mutt/1.4.2.3i X-Operating-System: FreeBSD 7.2-RELEASE i386 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.2.2 (h2.funkthat.com [127.0.0.1]); Wed, 06 Mar 2013 22:55:36 -0800 (PST) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 06:55:44 -0000 Karl Denninger wrote this message on Tue, Mar 05, 2013 at 06:56 -0600: > When it happens on my system anything that is CPU-bound continues to > execute. I can switch consoles and network I/O also works. If I have > an iostat running at the time all I/O counters go to and remain at zero > while the stall is occurring, but the process that is producing the > iostat continues to run and emit characters whether it is a ssh session > or on the physical console. > > The CPUs are running and processing, but all threads block if they > attempt access to the disk I/O subsystem, irrespective of the portion of > the disk I/O subsystem they attempt to access (e.g. 
UFS, swap or ZFS) I > therefore cannot start any new process that requires image activation. Since it seems like there is a thread that is spinning... Has anyone thought to modify kgdb to mlockall its memory and run it against the current system (kgdb /boot/kernel/kernel /dev/mem), and then when the thread goes busy, use kgdb to see where it's spinning? Just a thought... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 06:59:26 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id AAAE1719 for ; Thu, 7 Mar 2013 06:59:26 +0000 (UTC) (envelope-from bc979@lafn.org) Received: from zoom.lafn.org (zoom.lafn.org [108.92.93.123]) by mx1.freebsd.org (Postfix) with ESMTP id 7D1832CE for ; Thu, 7 Mar 2013 06:59:26 +0000 (UTC) Received: from [10.0.1.2] (static-71-177-216-148.lsanca.fios.verizon.net [71.177.216.148]) (authenticated bits=0) by zoom.lafn.org (8.14.3/8.14.2) with ESMTP id r276xJg9096567 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) for ; Wed, 6 Mar 2013 22:59:20 -0800 (PST) (envelope-from bc979@lafn.org) From: Doug Hardie Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: Sanity Check on Mac Mini Message-Id: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> Date: Wed, 6 Mar 2013 22:59:18 -0800 To: "freebsd-stable@freebsd.org List" Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) X-Mailer: Apple Mail (2.1499) X-Virus-Scanned: clamav-milter 0.97 at zoom.lafn.org X-Virus-Status: Clean X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 06:59:26 -0000 I have a new Mac Mini and have
encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up. 1. Downloaded from current today via svnweb.freebsd.org: sys/dev/bge/if_bgereg.h sys/dev/bge/if_bge.c sys/dev/mii/brgphy.c I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch. 2. Put those on a flash drive. 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source. 4. Copy the files from 1 above from flash over the files on the disk. 5. Rebuild the kernel and install it. Thanks, -- Doug From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 07:22:08 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 804B4B7F for ; Thu, 7 Mar 2013 07:22:08 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 1680A391 for ; Thu, 7 Mar 2013 07:22:07 +0000 (UTC) Received: from server.rulingia.com (c220-239-237-213.belrs5.nsw.optusnet.com.au [220.239.237.213]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id r277Loed053569 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 7 Mar 2013 18:21:50 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id r277LjpT011114 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 7 Mar 2013 18:21:45 +1100 (EST) (envelope-from peter@server.rulingia.com)
Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id r277LjAf011113; Thu, 7 Mar 2013 18:21:45 +1100 (EST) (envelope-from peter) Date: Thu, 7 Mar 2013 18:21:45 +1100 From: Peter Jeremy To: Karl Denninger Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? Message-ID: <20130307072145.GA2923@server.rulingia.com> References: <513524B2.6020600@denninger.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="zYM0uCDKw75PZbzx" Content-Disposition: inline In-Reply-To: <513524B2.6020600@denninger.net> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 07:22:08 -0000 --zYM0uCDKw75PZbzx Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2013-Mar-04 16:48:18 -0600, Karl Denninger wrote: >The subject machine in question has 12GB of RAM and dual Xeon >5500-series processors. It also has an ARECA 1680ix in it with 2GB of >local cache and the BBU for it. The ZFS spindles are all exported as >JBOD drives. I set up four disks under GPT, have a single freebsd-zfs >partition added to them, are labeled and the providers are then >geli-encrypted and added to the pool. What sort of disks? SAS or SATA? >also known good. I began to get EXTENDED stalls with zero I/O going on, >some lasting for 30 seconds or so. The system was not frozen but >anything that touched I/O would lock until it cleared. Dedup is off, >incidentally. When the system has stalled: - Do you see very low free memory? - What happens to all the different CPU utilisation figures? Do they all go to zero? 
Do you get high system or interrupt CPU (including going to 1 core's worth)? - What happens to interrupt load? Do you see any disk controller interrupts? Would you be able to build a kernel with WITNESS (and WITNESS_SKIPSPIN) and see if you get any errors when stalls happen? On 2013-Mar-05 14:09:36 -0800, Jeremy Chadwick wrote: >On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote: >> Completely unrelated to the main thread: >> >> on 05/03/2013 07:32 Jeremy Chadwick said the following: >> > That said, I still do not recommend ZFS for a root filesystem >> Why? >Too long a history of problems with it and weird edge cases (keep >reading); the last thing an administrator wants to deal with is a system >where the root filesystem won't mount/can't be used. It makes >recovery or problem-solving (i.e. the server is not physically accessible >given geographic distances) very difficult. I've had lots of problems with a gmirrored UFS root as well. The biggest issue is that gmirror has no audit functionality so you can't verify that both sides of a mirror really do have the same data. >My point/opinion: UFS for a root filesystem is guaranteed to work >without any fiddling about and, barring drive failures or controller >issues, is (again, my opinion) a lot more risk-free than ZFS-on-root. AFAIK, you can't boot from anything other than a single disk (i.e. no graid).
-- Peter Jeremy --zYM0uCDKw75PZbzx Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlE4QAkACgkQ/opHv/APuId2wQCgs8WOllSrjKtPxNbBDDqtW9wG Tz8An26LiYxeg46x2+kr6cT9dgakLkKN =vgwF -----END PGP SIGNATURE----- --zYM0uCDKw75PZbzx-- From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 10:27:55 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E7AE47FA for ; Thu, 7 Mar 2013 10:27:55 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42]) by mx1.freebsd.org (Postfix) with ESMTP id 907DBFA7 for ; Thu, 7 Mar 2013 10:27:55 +0000 (UTC) Received: from smtp2.u-psud.fr (localhost [127.0.0.1]) by localhost (MTA) with SMTP id A1F0D34D888; Thu, 7 Mar 2013 11:27:42 +0100 (CET) Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160]) by smtp2.u-psud.fr (MTA) with ESMTP id 44B0E34D849; Thu, 7 Mar 2013 11:27:42 +0100 (CET) Message-ID: <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> Subject: Re: Strange reboot since 9.1 From: Loïc Blot To: Marin Atanasov Nikolov Date: Thu, 07 Mar 2013 11:27:37 +0100 In-Reply-To: References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> Organization: UNIX Experience FR Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 10:27:56 -0000 Hello, I have enabled dumpdev="AUTO" and ran kgdb after a reboot.
Here is the backtrace: root@freebsd-server> kgdb GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1927 1927 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1927 #1 0xffffffff808f2d46 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485 #2 0xffffffff8092ba72 in sleepq_timedwait (wchan=0xffffffff81222400, pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:658 #3 0xffffffff808f332f in _sleep (ident=0xffffffff81222400, lock=0x0, priority=Variable "priority" is not available. ) at /usr/src/sys/kern/kern_synch.c:246 #4 0xffffffff80b429db in scheduler (dummy=Variable "dummy" is not available. ) at /usr/src/sys/vm/vm_glue.c:788 #5 0xffffffff8089c047 in mi_startup () at /usr/src/sys/kern/init_main.c:277 #6 0xffffffff802b526c in btext () at /usr/src/sys/amd64/amd64/locore.S:81 #7 0x0000000000000001 in ?? () #8 0xffffffff81240f80 in tdq_cpu () #9 0xffffffff812228a0 in proc0 () #10 0x0000000000000000 in ?? () #11 0xffffffff81529b90 in ?? () #12 0xffffffff81529b38 in ?? () #13 0xfffffe00051c8000 in ?? () #14 0xffffffff8091352e in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/sched_ule.c:1921 Previous frame inner to this frame (corrupt stack?) (kgdb) bt f #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, flags=Variable "flags" is not available. 
) at /usr/src/sys/kern/sched_ule.c:1927 __res = 2 __s = Variable "__s" is not available. -- Best regards, Loïc BLOT, Engineering UNIX Systems, Security and Networks http://www.unix-experience.fr On Wednesday 06 March 2013 at 11:18 +0200, Marin Atanasov Nikolov wrote: > > > > On Wed, Mar 6, 2013 at 10:55 AM, Loïc Blot > wrote: > Hello, > > > Hi, > > > Since FreeBSD 9.1 I have strange problems with the > distribution. Some > servers are rebooting without any kernel panic, instanly. > First i > thought it's a problem with my KVM system, but one of my > FreeBSD under a > Dell R210 have the same problem. > The servers concerned are now: > - Monitoring server > - LDAP test server > - Some other servers, randomly (not in production). > First i thought it's a problem with my FreeBSD install, then i > download > another time the ISO but the problem was already here. After i > try > another thing, install 9.0 and upgrade to 9.1 but same > problem. > How can i get informations about this problem ? > > > > I've had similar issues with one of my FreeBSD systems. My system had > spontaneous reboots without any kernel panic, without any clear > evidence of why it happened. > > > After a lot of trials and tests the root cause appeared to be the > amount of ZFS snapshots I had, which were more than 1K on a 8G system. > > > > Upgrading from 9.0 to 9.1 didn't solve the issue, as clearly I had to > do some cleanup of the ZFS snapshots and since then it's more than a > month without any reboots. > > > Few pointers that you could use -- get these systems monitored and > keep an eye on the monitoring system -- CPU usage, memory, processes, > network traffic, etc.. I've noticed that my system was running low on > free memory and that later led me to the ZFS snapshots clue. > > > So, my advise is to get first these systems monitored and watch for > anything unusual happening. Then further investigate. > > Good luck. > > Regards, > Marin > > > Thanks for advance.
> -- > Best regards, > > Loïc BLOT, Engineering > UNIX Systems, Security and Networks > http://www.unix-experience.fr > > > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to > "freebsd-stable-unsubscribe@freebsd.org" > > > > -- > Marin Atanasov Nikolov > > dnaeon AT gmail DOT com > http://www.unix-heaven.org/ From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 11:23:14 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0325B7B3 for ; Thu, 7 Mar 2013 11:23:14 +0000 (UTC) (envelope-from gkontos.mail@gmail.com) Received: from mail-pb0-f52.google.com (mail-pb0-f52.google.com [209.85.160.52]) by mx1.freebsd.org (Postfix) with ESMTP id D464624E for ; Thu, 7 Mar 2013 11:23:13 +0000 (UTC) Received: by mail-pb0-f52.google.com with SMTP id ma3so310735pbc.25 for ; Thu, 07 Mar 2013 03:23:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=v/UF+pU/rrBqlo/i5kCRFyClAf6QIXZkSPo8pOMgP48=; b=HlD3zixPyZmT01CMtN9eX3aaC+p4wwZePcVCd143t7vydxqfTxrmKdh82SU3kxMjJs jmUetsOfdXk655BCniCY5gEQO+jIgBJM6tqLkTd4q3kCWqwi9pRGEml1Zstqx1e4jNHu AA4Tz+e1iHnhBM8Ue8AGG3N+7RN4dlW+hi1AqdR/eyXAnOKxhOmaPzUaxYNeUCTCVPbk TfcI3XcSpI2KksriJjHOQ5Vhrepk0aOKOVHsBr7yf18EoseZ4iuySZBEZV1NEbIPsEh0 o7PpOPDEswc8KL+rBqlAXyQF11mos3oEmnBKRWy8vfXgpJdSwGPYpd7lXIJI7sdk2HzU KNtg== MIME-Version: 1.0 X-Received: by 10.66.26.83 with SMTP id j19mr2298577pag.81.1362655393441; Thu, 07 Mar 2013 03:23:13 -0800 (PST) Received: by 10.68.147.136 with HTTP; Thu, 7 Mar 2013 03:23:13 -0800 (PST) In-Reply-To: <5134D89E.3050004@gmail.com> References: <5130BA35.5060809@denninger.net> <5130EB8A.7060706@gmail.com> <2B318078-F863-4415-8DAE-94EE4431BF4C@ee.ryerson.ca> 
<5134C6C2.9020009@gmail.com> <1e4c24a68e76a279eaf4dc4f7c0156d3.squirrel@webmail.ee.ryerson.ca> <5134D89E.3050004@gmail.com> Date: Thu, 7 Mar 2013 13:23:13 +0200 Message-ID: Subject: Re: Musings on ZFS Backup strategies From: George Kontostanos To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 11:23:14 -0000 I have found that the use of mbuffer really speeds up the differential transfer process:

#!/bin/sh

export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:

pool="zroot"
destination="tank"
host="1.2.3.4"
# snapshot name prefix (assumed value)
type="daily"
today=`date +"$type-%Y-%m-%d"`
yesterday=`date -v -1d +"$type-%Y-%m-%d"`

# create today snapshot
snapshot_today="$pool@$today"
# look for a snapshot with this name
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_today$" > /dev/null; then
    echo " snapshot, $snapshot_today, already exists"
    exit 1
else
    echo " taking todays snapshot, $snapshot_today" | sendmail root
    zfs snapshot -r $snapshot_today
fi

# look for yesterday snapshot
snapshot_yesterday="$pool@$yesterday"
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_yesterday$" > /dev/null; then
    echo " yesterday snapshot, $snapshot_yesterday, exists lets proceed with backup"
    # the pipeline's exit status is that of ssh, i.e. of the remote zfs receive;
    # keep yesterday's snapshot if the transfer did not succeed
    if ! zfs send -R -i $snapshot_yesterday $snapshot_today | mbuffer -q -v 0 -s 128k -m 1G | ssh root@$host "mbuffer -s 128k -m 1G | zfs receive -Fd $destination" > /dev/null; then
        echo " backup failed, keeping $snapshot_yesterday" | sendmail root
        exit 1
    fi
    echo " backup complete destroying yesterday snapshot" | sendmail root
    zfs destroy -r $snapshot_yesterday
    echo "Backup done" | sendmail root
    exit 0
else
    echo " missing yesterday snapshot aborting, $snapshot_yesterday"
    exit 1
fi

-- George Kontostanos --- http://www.aisecure.net From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 12:55:35 2013 Return-Path: Delivered-To:
freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id E28253B7 for ; Thu, 7 Mar 2013 12:55:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 17962823 for ; Thu, 7 Mar 2013 12:55:34 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id OAA18653; Thu, 07 Mar 2013 14:55:32 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1UDaM7-0009y9-Ux; Thu, 07 Mar 2013 14:55:31 +0200 Message-ID: <51388E42.5040500@FreeBSD.org> Date: Thu, 07 Mar 2013 14:55:30 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130220 Thunderbird/17.0.3 MIME-Version: 1.0 To: loic.blot@unix-experience.fr Subject: Re: Strange reboot since 9.1 References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> In-Reply-To: <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 12:55:35 -0000 on 07/03/2013 12:27 Loïc Blot said the following: > Hello, > i have enabled dumpdev="AUTO" and run kgdb after a reboot. > Here is the backtrace: > > root@freebsd-server> kgdb It's a stack trace of the first thread in your live running system. You need to read kgdb(1), inspect your /var/crash directory and pass a proper vmcore file, if any, to kgdb. 
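The step described above (look in /var/crash, then hand a vmcore file to kgdb) could be sketched roughly as follows. This is only an illustration under assumptions: it supposes the crash directory is /var/crash, that dumps are saved there as vmcore.N, and that the matching kernel is /boot/kernel/kernel. The helper name latest_vmcore is hypothetical, not part of FreeBSD.

```shell
#!/bin/sh
# latest_vmcore [dir]: print the vmcore.N file with the highest numeric
# suffix found in dir (assumed default: /var/crash).
# Returns non-zero and prints nothing if no such file exists.
latest_vmcore() {
    dir=${1:-/var/crash}
    best=""
    best_n=-1
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue          # skip an unmatched glob
        n=${f##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # ignore non-numeric suffixes
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$f
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Possible use (kernel path is an assumption about the installation):
#   core=$(latest_vmcore) && kgdb /boot/kernel/kernel "$core"
```

If the function prints nothing, no dump was saved, and running kgdb without arguments would again only show the live system rather than the crash.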
> GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you > are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "amd64-marcel-freebsd"... > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1927 > 1927 cpuid = PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1927 > #1 0xffffffff808f2d46 in mi_switch (flags=260, newtd=0x0) > at /usr/src/sys/kern/kern_synch.c:485 > #2 0xffffffff8092ba72 in sleepq_timedwait (wchan=0xffffffff81222400, > pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:658 > #3 0xffffffff808f332f in _sleep (ident=0xffffffff81222400, lock=0x0, > priority=Variable "priority" is not available. > ) at /usr/src/sys/kern/kern_synch.c:246 > #4 0xffffffff80b429db in scheduler (dummy=Variable "dummy" is not > available. > ) at /usr/src/sys/vm/vm_glue.c:788 > #5 0xffffffff8089c047 in mi_startup () > at /usr/src/sys/kern/init_main.c:277 > #6 0xffffffff802b526c in btext () > at /usr/src/sys/amd64/amd64/locore.S:81 > #7 0x0000000000000001 in ?? () > #8 0xffffffff81240f80 in tdq_cpu () > #9 0xffffffff812228a0 in proc0 () > #10 0x0000000000000000 in ?? () > #11 0xffffffff81529b90 in ?? () > #12 0xffffffff81529b38 in ?? () > #13 0xfffffe00051c8000 in ?? () > #14 0xffffffff8091352e in sched_switch (td=0x0, newtd=0x0, > flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1921 > Previous frame inner to this frame (corrupt stack?) > (kgdb) bt f > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > flags=Variable "flags" is not available. 
> ) at /usr/src/sys/kern/sched_ule.c:1927 > __res = 2 > __s = Variable "__s" is not available. > -- Andriy Gapon From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 13:12:58 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id D4000849 for ; Thu, 7 Mar 2013 13:12:58 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42]) by mx1.freebsd.org (Postfix) with ESMTP id 653588F0 for ; Thu, 7 Mar 2013 13:12:58 +0000 (UTC) Received: from smtp2.u-psud.fr (localhost [127.0.0.1]) by localhost (MTA) with SMTP id B178234D83A for ; Thu, 7 Mar 2013 14:12:50 +0100 (CET) Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160]) by smtp2.u-psud.fr (MTA) with ESMTP id 8191934D835 for ; Thu, 7 Mar 2013 14:12:50 +0100 (CET) Message-ID: <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> Subject: Re: Strange reboot since 9.1 From: =?ISO-8859-1?Q?Lo=EFc?= Blot To: freebsd-stable@freebsd.org Date: Thu, 07 Mar 2013 14:12:45 +0100 In-Reply-To: <51388E42.5040500@FreeBSD.org> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> Organization: UNIX Experience FR Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 13:12:58 -0000 Hi Andriy, thanks for your help. here is the stack backtrace (i have 11 core.txt files, and each has this crash). 
(cat /var/crash/core.txt.11) panic: page fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff809208a6 at kdb_backtrace+0x66 #1 0xffffffff808ea8be at panic+0x1ce #2 0xffffffff80bd8240 at trap_fatal+0x290 #3 0xffffffff80bd857d at trap_pfault+0x1ed #4 0xffffffff80bd8b9e at trap+0x3ce #5 0xffffffff80bc315f at calltrap+0x8 #6 0xffffffff80a861d5 at udp_input+0x475 #7 0xffffffff80a043dc at ip_input+0xac #8 0xffffffff809adafb at netisr_dispatch_src+0x20b #9 0xffffffff809a35cd at ether_demux+0x14d #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 #11 0xffffffff809adafb at netisr_dispatch_src+0x20b #12 0xffffffff80438fd7 at bce_intr+0x487 #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 #14 0xffffffff808c0076 at ithread_loop+0xa6 #15 0xffffffff808bb9ef at fork_exit+0x11f #16 0xffffffff80bc368e at fork_trampoline+0xe Uptime: 2h6m59s Dumping 1177 out of 8162 MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92% I can't read vmcore.11 only with this option: kgdb -d /var/crash/vmcore.11 I read man and thought i must use kgdb -c /var/crash/vmcore.11 but it's not a suitable image. (kgdb: couldn't find a suitable kernel image) This servers uses UDP packets, for SNMP requests (> 10000/h), NTP (a little), Syslog (that's all i remember). -- Best regards, Loïc BLOT, Engineering UNIX Systems, Security and Networks http://www.unix-experience.fr Le jeudi 07 mars 2013 à 14:55 +0200, Andriy Gapon a écrit : > on 07/03/2013 12:27 Loïc Blot said the following: > > Hello, > > i have enabled dumpdev="AUTO" and run kgdb after a reboot. > > Here is the backtrace: > > > > root@freebsd-server> kgdb > > It's a stack trace of the first thread in your live running system. > You need to read kgdb(1), inspect your /var/crash directory and pass a proper > vmcore file, if any, to kgdb. > > > GNU gdb 6.1.1 [FreeBSD] > > Copyright 2004 Free Software Foundation, Inc. 
> > GDB is free software, covered by the GNU General Public License, and you > > are > > welcome to change it and/or distribute copies of it under certain > > conditions. > > Type "show copying" to see the conditions. > > There is absolutely no warranty for GDB. Type "show warranty" for > > details. > > This GDB was configured as "amd64-marcel-freebsd"... > > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > > flags=Variable "flags" is not available. > > ) at /usr/src/sys/kern/sched_ule.c:1927 > > 1927 cpuid = PCPU_GET(cpuid); > > (kgdb) bt > > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > > flags=Variable "flags" is not available. > > ) at /usr/src/sys/kern/sched_ule.c:1927 > > #1 0xffffffff808f2d46 in mi_switch (flags=260, newtd=0x0) > > at /usr/src/sys/kern/kern_synch.c:485 > > #2 0xffffffff8092ba72 in sleepq_timedwait (wchan=0xffffffff81222400, > > pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:658 > > #3 0xffffffff808f332f in _sleep (ident=0xffffffff81222400, lock=0x0, > > priority=Variable "priority" is not available. > > ) at /usr/src/sys/kern/kern_synch.c:246 > > #4 0xffffffff80b429db in scheduler (dummy=Variable "dummy" is not > > available. > > ) at /usr/src/sys/vm/vm_glue.c:788 > > #5 0xffffffff8089c047 in mi_startup () > > at /usr/src/sys/kern/init_main.c:277 > > #6 0xffffffff802b526c in btext () > > at /usr/src/sys/amd64/amd64/locore.S:81 > > #7 0x0000000000000001 in ?? () > > #8 0xffffffff81240f80 in tdq_cpu () > > #9 0xffffffff812228a0 in proc0 () > > #10 0x0000000000000000 in ?? () > > #11 0xffffffff81529b90 in ?? () > > #12 0xffffffff81529b38 in ?? () > > #13 0xfffffe00051c8000 in ?? () > > #14 0xffffffff8091352e in sched_switch (td=0x0, newtd=0x0, > > flags=Variable "flags" is not available. > > ) at /usr/src/sys/kern/sched_ule.c:1921 > > Previous frame inner to this frame (corrupt stack?) 
> > (kgdb) bt f > > #0 sched_switch (td=0xffffffff812228a0, newtd=0xfffffe00051c8000, > > flags=Variable "flags" is not available. > > ) at /usr/src/sys/kern/sched_ule.c:1927 > > __res = 2 > > __s = Variable "__s" is not available. > > > > From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 14:06:28 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3D9F5EB1 for ; Thu, 7 Mar 2013 14:06:28 +0000 (UTC) (envelope-from gondim@bsdinfo.com.br) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [186.193.48.13]) by mx1.freebsd.org (Postfix) with ESMTP id EA513C23 for ; Thu, 7 Mar 2013 14:06:27 +0000 (UTC) Received: from zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) by zeus.linuxinfo.com.br (Postfix) with ESMTP id 90524466A474 for ; Thu, 7 Mar 2013 11:04:26 -0300 (BRT) X-Virus-Scanned: amavisd-new at zeus.linuxinfo.com.br Received: from zeus.linuxinfo.com.br ([127.0.0.1]) by zeus.linuxinfo.com.br (zeus.linuxinfo.com.br [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id p_nQ62pBGKRj for ; Thu, 7 Mar 2013 11:04:24 -0300 (BRT) Received: from MacBook-de-Gondim-2.local (unknown [186.193.48.8]) by zeus.linuxinfo.com.br (Postfix) with ESMTPSA id 50418466A45A for ; Thu, 7 Mar 2013 11:04:20 -0300 (BRT) Message-ID: <51389ED5.6030207@bsdinfo.com.br> Date: Thu, 07 Mar 2013 11:06:13 -0300 From: Marcelo Gondim User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20130216 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: Strange reboot since 9.1 References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> In-Reply-To: <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: 
freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 14:06:28 -0000 Em 07/03/13 10:12, Loïc Blot escreveu: > Hi Andriy, > thanks for your help. > > here is the stack backtrace (i have 11 core.txt files, and each has this > crash). (cat /var/crash/core.txt.11) > > panic: page fault > cpuid = 0 > KDB: stack backtrace: > #0 0xffffffff809208a6 at kdb_backtrace+0x66 > #1 0xffffffff808ea8be at panic+0x1ce > #2 0xffffffff80bd8240 at trap_fatal+0x290 > #3 0xffffffff80bd857d at trap_pfault+0x1ed > #4 0xffffffff80bd8b9e at trap+0x3ce > #5 0xffffffff80bc315f at calltrap+0x8 > #6 0xffffffff80a861d5 at udp_input+0x475 > #7 0xffffffff80a043dc at ip_input+0xac > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b > #9 0xffffffff809a35cd at ether_demux+0x14d > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > #12 0xffffffff80438fd7 at bce_intr+0x487 > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > #14 0xffffffff808c0076 at ithread_loop+0xa6 > #15 0xffffffff808bb9ef at fork_exit+0x11f > #16 0xffffffff80bc368e at fork_trampoline+0xe > Uptime: 2h6m59s > Dumping 1177 out of 8162 > MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92% > > I can't read vmcore.11 only with this option: > > kgdb -d /var/crash/vmcore.11 > > I read man and thought i must use kgdb -c /var/crash/vmcore.11 but it's > not a suitable image. (kgdb: couldn't find a suitable kernel image) > > This servers uses UDP packets, for SNMP requests (> 10000/h), NTP (a > little), Syslog (that's all i remember). 
Hi, Look this http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html []'s Gondim From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 14:32:32 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 3A9EA786 for ; Thu, 7 Mar 2013 14:32:32 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) by mx1.freebsd.org (Postfix) with ESMTP id EB534E28 for ; Thu, 7 Mar 2013 14:32:31 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.6/8.13.1) with ESMTP id r27EWNlU074373 for ; Thu, 7 Mar 2013 08:32:23 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] [192.168.1.40] by Spamblock-sys (LOCAL); Thu Mar 7 08:32:23 2013 Message-ID: <5138A4C1.5090503@denninger.net> Date: Thu, 07 Mar 2013 08:31:29 -0600 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: freebsd-stable@freebsd.org Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? References: <513524B2.6020600@denninger.net> <20130307072145.GA2923@server.rulingia.com> In-Reply-To: <20130307072145.GA2923@server.rulingia.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="----enig2DAJKLTVDKROLFATBSLSO" X-Antivirus: avast! 
(VPS 130307-0, 03/07/2013), Outbound message
X-Antivirus-Status: Clean
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Mar 2013 14:32:32 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
------enig2DAJKLTVDKROLFATBSLSO
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 3/7/2013 1:21 AM, Peter Jeremy wrote:
> On 2013-Mar-04 16:48:18 -0600, Karl Denninger wrote:
>> The subject machine in question has 12GB of RAM and dual Xeon
>> 5500-series processors. It also has an ARECA 1680ix in it with 2GB of
>> local cache and the BBU for it. The ZFS spindles are all exported as
>> JBOD drives. I set up four disks under GPT, have a single freebsd-zfs
>> partition added to them, are labeled and the providers are then
>> geli-encrypted and added to the pool.
> What sort of disks? SAS or SATA?

SATA. They're clean; they report no errors, no retries, no corrected
data (ECC) etc. They also have been running for a couple of years under
UFS+SU without problems. This isn't new hardware; it's an in-service
system.

>> also known good. I began to get EXTENDED stalls with zero I/O going on,
>> some lasting for 30 seconds or so. The system was not frozen but
>> anything that touched I/O would lock until it cleared. Dedup is off,
>> incidentally.
> When the system has stalled:
> - Do you see very low free memory?

Yes. Effectively zero.

> - What happens to all the different CPU utilisation figures? Do they
> all go to zero? Do you get high system or interrupt CPU (including
> going to 1 core's worth)?

No, they start to fall.
This is a bad piece of data to trust though because I am
geli-encrypting the spindles, so falling CPU doesn't mean the CPU is
actually idle (since with no I/O there is nothing going through geli.)
I'm working on instrumenting things sufficiently to try to peel that
off -- I suspect the kernel is spinning on something, but the trick is
finding out what it is.

> - What happens to interrupt load? Do you see any disk controller
> interrupts?

None.

>
> Would you be able to build a kernel with WITNESS (and WITNESS_SKIPSPIN)
> and see if you get any errors when stalls happen.

If I have to. That's easy to do on the test box -- on the production
one, not so much.

> On 2013-Mar-05 14:09:36 -0800, Jeremy Chadwick wrote:
>> On Tue, Mar 05, 2013 at 01:09:41PM +0200, Andriy Gapon wrote:
>>> Completely unrelated to the main thread:
>>>
>>> on 05/03/2013 07:32 Jeremy Chadwick said the following:
>>>> That said, I still do not recommend ZFS for a root filesystem
>>> Why?
>> Too long a history of problems with it and weird edge cases (keep
>> reading); the last thing an administrator wants to deal with is a system
>> where the root filesystem won't mount/can't be used. It makes
>> recovery or problem-solving (i.e. the server is not physically accessible
>> given geographic distances) very difficult.
> I've had lots of problems with a gmirrored UFS root as well. The
> biggest issue is that gmirror has no audit functionality so you
> can't verify that both sides of a mirror really do have the same data.

I have root on a 2-drive RAID mirror (done in the controller) and that
has been fine. The controller does scrubs on a regular basis
internally. The problem is that if it gets a clean read that is
different (e.g. no ECC indications, etc) it doesn't know which is the
correct copy. The good news is that hasn't happened yet :-)

The risk of this happening as my data store continues to expand is one
of the reasons I want to move toward ZFS, but not necessarily for the
boot drives.
For the data store, however....

>> My point/opinion: UFS for a root filesystem is guaranteed to work
>> without any fiddling about and, barring drive failures or controller
>> issues, is (again, my opinion) a lot more risk-free than ZFS-on-root.
> AFAIK, you can't boot from anything other than a single disk (ie no
> graid).

Where I am right now is this:

1. I *CANNOT* reproduce the spins on the test machine with Postgres
stopped in any way. Even with multiple ZFS send/recv copies going on
and the load average north of 20 (due to all the geli threads), the
system doesn't stall or produce any notable pauses in throughput. Nor
does the system RAM allocation get driven hard enough to force paging.
This is with NO tuning hacks in /boot/loader.conf. I/O performance is
both stable and solid.

2. WITH Postgres running as a connected hot spare (identical to the
production machine), allocating ~1.5G of shared, wired memory, running
the same synthetic workload in (1) above I am getting SMALL versions of
the misbehavior. However, while system RAM allocation gets driven
pretty hard and reaches down toward 100MB in some instances it doesn't
get driven hard enough to allocate swap. The "burstiness" is very
evident in the iostat figures with spates getting into the single digit
MB/sec range from time to time but it's not enough to drive the system
to a full-on stall.

There's pretty-clearly a bad interaction here between Postgres wiring
memory and the ARC, when the latter is left alone and allowed to do what
it wants. I'm continuing to work on replicating this on the test
machine... just not completely there yet.
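One knob commonly discussed for this kind of pressure between the ARC and wired application memory is the vfs.zfs.arc_max loader tunable. The sketch below only shows the arithmetic for picking a cap. The 12GB and ~1.5G figures come from this thread; the 2048MB of headroom and the helper name arc_max_line are assumptions, and nothing here has been verified as a fix for the stalls described.

```shell
#!/bin/sh
# arc_max_line TOTAL_MB WIRED_MB HEADROOM_MB
# Print a /boot/loader.conf line capping the ARC at whatever is left after
# the wired application memory and some OS headroom. (Hypothetical helper.)
arc_max_line() {
    total_mb=$1
    wired_mb=$2
    headroom_mb=$3
    cap_mb=$((total_mb - wired_mb - headroom_mb))
    if [ "$cap_mb" -le 0 ]; then
        echo "nothing left for the ARC with these figures" >&2
        return 1
    fi
    # Emit the cap in bytes.
    printf 'vfs.zfs.arc_max="%s"\n' "$((cap_mb * 1024 * 1024))"
}

# 12GB of RAM and ~1.5G wired by Postgres (both from the thread),
# plus 2048MB of headroom (an assumed figure).
arc_max_line 12288 1536 2048
```

The printed line would go into /boot/loader.conf and take effect at the next boot; whether a cap is appropriate at all is exactly the "defaults" question this thread is about.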
--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

------enig2DAJKLTVDKROLFATBSLSO
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJROKTyAAoJEGAtiW4Ft0U9jpUIAMaSwgkbF+gK/8mc2RmERB5G
y55vazdtFRxYF0PF4//Fjs8XJXtYwEpW/ORgFofuMPz5/q1pGmn7r04TP4Zs9hxb
lTNWIoGsfhoYvlVCKuMYzRCSeOMHtgYW4xikzXRSyEdPhN6eHzQBDsm91LnnUaB1
30eFsKXT3FVRheOTNSgnLZG6ywxIJq3inf0x56H3Jayw+voV3fF5BeqYVOH7Wd1E
+l4ShlW+C3ysvcyskqRxfNjC2t7lcSI3iV6JB46KbmvmArigGwrz+OKJx55tuUYB
Jl+vopzcM7WdzwYylro65UyGFU1CCg7BQXexKOkU1JM/qYdGxg404Mv7HpaUZcc=
=mRGs
-----END PGP SIGNATURE-----

------enig2DAJKLTVDKROLFATBSLSO--

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 14:43:19 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 42213D4C for ; Thu, 7 Mar 2013 14:43:19 +0000 (UTC) (envelope-from rjk@wintek.com)
Received: from local.wintek.com (local.wintek.com [72.12.201.234]) by mx1.freebsd.org (Postfix) with ESMTP id 09AB2EB6 for ; Thu, 7 Mar 2013 14:43:18 +0000 (UTC)
Received: from rjk.wintek.local (172.28.1.248) by local.wintek.com (172.28.1.234) with Microsoft SMTP Server (TLS) id 8.1.436.0; Thu, 7 Mar 2013 09:42:10 -0500
Message-ID: <5138A742.3090200@wintek.com>
Date: Thu, 7 Mar 2013 09:42:10 -0500
From: Richard Kuhns
Organization: Wintek Corporation
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130220 Thunderbird/17.0.3
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Subject: Re: Sanity Check on Mac Mini
References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org>
In-Reply-To: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org>
X-Enigmail-Version: 1.5.1
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: rjk@wintek.com
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Mar 2013 14:43:19 -0000

On 03/07/13 01:59, Doug Hardie wrote:
> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>
> 1. Downloaded from current today via svnweb.freebsd.org:
> sys/dev/bge/if_bgereg.h
> sys/dev/bge/if_bge.c
> sys/dev/mii/brgphy.c
>
> I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>
> 2. Put those on a flash drive.
>
> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>
> 4. Copy the files from 1 above from flash over the files on the disk.
>
> 5. Rebuild the kernel and install it.
>
> Thanks,
>
> -- Doug

That's worked for me 3 times now.
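Steps 4 and 5 of the quoted procedure could look roughly like the following. This is a sketch only: the flash mount point /mnt/flash, the assumption that the files sit on the flash drive under the same sys/... paths, the GENERIC kernel config, and the helper name stage_driver_files are all illustrative, and the build commands are printed rather than run.

```shell
#!/bin/sh
# stage_driver_files SRC_ROOT DEST_ROOT
# Copy the three updated driver files from SRC_ROOT (e.g. the flash drive)
# over the matching paths under DEST_ROOT (e.g. /usr/src).
# (Hypothetical helper; assumes SRC_ROOT mirrors the sys/... layout.)
stage_driver_files() {
    src_root=$1
    dest_root=$2
    for rel in sys/dev/bge/if_bgereg.h sys/dev/bge/if_bge.c sys/dev/mii/brgphy.c; do
        if [ ! -f "$src_root/$rel" ]; then
            echo "missing $src_root/$rel" >&2
            return 1
        fi
        mkdir -p "$dest_root/${rel%/*}" || return 1
        cp "$src_root/$rel" "$dest_root/$rel" || return 1
    done
}

# Intended use on the Mini, printed instead of executed:
cat <<'EOF'
stage_driver_files /mnt/flash /usr/src
cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now
EOF
```

Stopping at the first missing file keeps a half-copied set of driver sources from being built into the kernel.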
-- Richard Kuhns My Desk: 765-269-8541 Wintek Corporation Internet Support: 765-269-8503 427 N 6th Street Consulting: 765-269-8504 Lafayette, IN 47901-2211 Accounting: 765-269-8502 From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 15:39:06 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E5B4C18E for ; Thu, 7 Mar 2013 15:39:06 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42]) by mx1.freebsd.org (Postfix) with ESMTP id 65C9418F for ; Thu, 7 Mar 2013 15:39:06 +0000 (UTC) Received: from smtp2.u-psud.fr (localhost [127.0.0.1]) by localhost (MTA) with SMTP id B89C934D5DD for ; Thu, 7 Mar 2013 16:38:59 +0100 (CET) Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160]) by smtp2.u-psud.fr (MTA) with ESMTP id 688AE34D872 for ; Thu, 7 Mar 2013 16:38:59 +0100 (CET) Message-ID: <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> Subject: Re: Strange reboot since 9.1 From: =?ISO-8859-1?Q?Lo=EFc?= Blot To: freebsd-stable@freebsd.org Date: Thu, 07 Mar 2013 16:38:54 +0100 In-Reply-To: <51389ED5.6030207@bsdinfo.com.br> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> Organization: UNIX Experience FR Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 15:39:07 -0000 Hi Marcelo, thanks. 
Here is a better trace:

---------------------------------

kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80a84414
stack pointer = 0x28:0xffffff822fc267a0
frame pointer = 0x28:0xffffff822fc26830
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq265: bce0)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff80a861d5 at udp_input+0x475
#7 0xffffffff80a043dc at ip_input+0xac
#8 0xffffffff809adafb at netisr_dispatch_src+0x20b
#9 0xffffffff809a35cd at ether_demux+0x14d
#10 0xffffffff809a38a4 at ether_nh_input+0x1f4
#11 0xffffffff809adafb at netisr_dispatch_src+0x20b
#12 0xffffffff80438fd7 at bce_intr+0x487
#13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
#14 0xffffffff808c0076 at ithread_loop+0xa6
#15 0xffffffff808bb9ef at fork_exit+0x11f
#16 0xffffffff80bc368e at fork_trampoline+0xe
Uptime: 27m20s
Dumping 1265 out of 8162 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%

#0 doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224 pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt f
#0 doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
No locals.
#1 0xffffffff808ea3a1 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
        _ep = Variable "_ep" is not available.
(kgdb) bt
#0 doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1 0xffffffff808ea3a1 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:448
#2 0xffffffff808ea897 in panic (fmt=0x1
) at /usr/src/sys/kern/kern_shutdown.c:636
#3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:857
#4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:773
#5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0)
    at /usr/src/sys/amd64/amd64/trap.c:456
#6 0xffffffff80bc315f in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000,
    ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20,
    udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
#8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable "off" is not available.
) at /usr/src/sys/netinet/udp_usrreq.c:618
#9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00)
    at /usr/src/sys/netinet/ip_input.c:760
#10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
) at /usr/src/sys/net/netisr.c:1013
#11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000,
    m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
#12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not available.
) at /usr/src/sys/net/if_ethersubr.c:759
#13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable "source" is not available.
) at /usr/src/sys/net/netisr.c:1013
#14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available.
) at /usr/src/sys/dev/bce/if_bce.c:6903
#15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is not available.
) at /usr/src/sys/kern/kern_intr.c:1262
#16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
    at /usr/src/sys/kern/kern_intr.c:1275
#17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0 ,
    arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
    at /usr/src/sys/kern/kern_fork.c:992
#18 0xffffffff80bc368e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000001 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000002 in ?? ()
#44 0xffffffff81241c00 in tdq_cpu ()
#45 0xfffffe0005501000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0xffffff822fc266d0 in ?? ()
#48 0xffffff822fc26678 in ?? ()
#49 0xfffffe019ed11470 in ?? ()
#50 0xffffffff8091352e in sched_switch (td=0x0, newtd=0xfffffe00057424e0,
    flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1921
Previous frame inner to this frame (corrupt stack?)

---------------------------------

--
Best regards,
Loïc BLOT, Engineering UNIX Systems, Security and Networks
http://www.unix-experience.fr

Le jeudi 07 mars 2013 à 11:06 -0300, Marcelo Gondim a écrit :
> Em 07/03/13 10:12, Loïc Blot escreveu:
> > Hi Andriy,
> > thanks for your help.
> >
> > here is the stack backtrace (i have 11 core.txt files, and each has this
> > crash).
(cat /var/crash/core.txt.11) > > > > panic: page fault > > cpuid = 0 > > KDB: stack backtrace: > > #0 0xffffffff809208a6 at kdb_backtrace+0x66 > > #1 0xffffffff808ea8be at panic+0x1ce > > #2 0xffffffff80bd8240 at trap_fatal+0x290 > > #3 0xffffffff80bd857d at trap_pfault+0x1ed > > #4 0xffffffff80bd8b9e at trap+0x3ce > > #5 0xffffffff80bc315f at calltrap+0x8 > > #6 0xffffffff80a861d5 at udp_input+0x475 > > #7 0xffffffff80a043dc at ip_input+0xac > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #9 0xffffffff809a35cd at ether_demux+0x14d > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #12 0xffffffff80438fd7 at bce_intr+0x487 > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > > #14 0xffffffff808c0076 at ithread_loop+0xa6 > > #15 0xffffffff808bb9ef at fork_exit+0x11f > > #16 0xffffffff80bc368e at fork_trampoline+0xe > > Uptime: 2h6m59s > > Dumping 1177 out of 8162 > > MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92% > > > > I can't read vmcore.11 only with this option: > > > > kgdb -d /var/crash/vmcore.11 > > > > I read man and thought i must use kgdb -c /var/crash/vmcore.11 but it's > > not a suitable image. (kgdb: couldn't find a suitable kernel image) > > > > This servers uses UDP packets, for SNMP requests (> 10000/h), NTP (a > > little), Syslog (that's all i remember). 
> Hi, > > Look this > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html > > []'s > Gondim > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 16:38:29 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 5E208D11 for ; Thu, 7 Mar 2013 16:38:29 +0000 (UTC) (envelope-from jdc@koitsu.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [IPv6:2001:558:fe2d:43:76:96:30:48]) by mx1.freebsd.org (Postfix) with ESMTP id 407C4618 for ; Thu, 7 Mar 2013 16:38:29 +0000 (UTC) Received: from omta03.emeryville.ca.mail.comcast.net ([76.96.30.27]) by qmta05.emeryville.ca.mail.comcast.net with comcast id 8gaH1l0030b6N64A5geUZe; Thu, 07 Mar 2013 16:38:28 +0000 Received: from koitsu.strangled.net ([67.180.84.87]) by omta03.emeryville.ca.mail.comcast.net with comcast id 8geT1l0091t3BNj8PgeT2g; Thu, 07 Mar 2013 16:38:28 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 5377073A31; Thu, 7 Mar 2013 08:38:27 -0800 (PST) Date: Thu, 7 Mar 2013 08:38:27 -0800 From: Jeremy Chadwick To: Lo?c Blot Subject: Re: Strange reboot since 9.1 Message-ID: <20130307163827.GA96983@icarus.home.lan> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> User-Agent: Mutt/1.5.21 (2010-09-15) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 s=q20121106; t=1362674308;
 bh=K4Xb6DLItwNqpxuV151M48zsSe+7W+GrtjN1bHr0gYE=;
 h=Received:Received:Received:Date:From:To:Subject:Message-ID:
 MIME-Version:Content-Type;
 b=dWpF0p2rjab+MABbPJ1qAJoSip8ssAzes8JPDjHW5lhK98PMWu945fPzYhryjJswI
 cthEdQNodjqYHhWEOKbUQxjQiN/JIuRY0Zp08RPbq2btkxzBibpEkIvCc999+lh4+A
 o3oYGkQWDeWPDwZafsUaVjrjwdScEaHs09J4/mynO7GvxIXw1MFcTZEYbWPBvNqLex
 rT3m3lWcbUrKSB7ccpcUrlgLI5RsiGpd7WUAmBG41mv0RGEsNGQWWg380ZK6+QzPdv
 fwInZkULUfM01pMKaQUfHoe0zAcJc0ZsBQv6sTk/NEZJCZ1rxMxXX/sS2c498Z0tr6
 US6nCYGXZnF6g==
Cc: freebsd-stable@freebsd.org, yongari@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Mar 2013 16:38:29 -0000

On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> Hi Marcelo, thanks. Here is a better trace:
>
> ---------------------------------
>
> kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x0
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff80a84414
> stack pointer = 0x28:0xffffff822fc267a0
> frame pointer = 0x28:0xffffff822fc26830
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 12 (irq265: bce0)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> #1 0xffffffff808ea8be at panic+0x1ce
> #2 0xffffffff80bd8240 at trap_fatal+0x290
> #3 0xffffffff80bd857d at trap_pfault+0x1ed
> #4 0xffffffff80bd8b9e at trap+0x3ce
> #5 0xffffffff80bc315f at calltrap+0x8
> #6 0xffffffff80a861d5 at udp_input+0x475
> #7 0xffffffff80a043dc at ip_input+0xac
> #8 0xffffffff809adafb at netisr_dispatch_src+0x20b
> #9 0xffffffff809a35cd at ether_demux+0x14d
> #10 0xffffffff809a38a4 at ether_nh_input+0x1f4
> #11 0xffffffff809adafb at netisr_dispatch_src+0x20b
> #12 0xffffffff80438fd7 at bce_intr+0x487
> #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> #14 0xffffffff808c0076 at ithread_loop+0xa6
> #15 0xffffffff808bb9ef at fork_exit+0x11f
> #16 0xffffffff80bc368e at fork_trampoline+0xe
> Uptime: 27m20s
> Dumping 1265 out of 8162
> MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92%
>
> #0 doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224 pcpu.h: No such file or directory.
>     in pcpu.h
> (kgdb) bt f
> #0 doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> No locals.
> #1 0xffffffff808ea3a1 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> _ep = Variable "_ep" is not available.
> (kgdb) bt
> #0 doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1 0xffffffff808ea3a1 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2 0xffffffff808ea897 in panic (fmt=0x1 )
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is
> not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:857
> #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0,
>     usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
> #5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0)
>     at /usr/src/sys/amd64/amd64/trap.c:456
> #6 0xffffffff80bc315f in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:228
> #7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000,
>     ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20,
>     udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252
> #8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable
> "off" is not available.
> ) at /usr/src/sys/netinet/udp_usrreq.c:618
> #9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00)
>     at /usr/src/sys/netinet/ip_input.c:760
> #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable
> "source" is not available.
> ) at /usr/src/sys/net/netisr.c:1013
> #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000,
>     m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940
> #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not
> available.
> ) at /usr/src/sys/net/if_ethersubr.c:759
> #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable
> "source" is not available.
> ) at /usr/src/sys/net/netisr.c:1013
> #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available.
> ) at /usr/src/sys/dev/bce/if_bce.c:6903
> #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is
> not available.
> ) at /usr/src/sys/kern/kern_intr.c:1262
> #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
>     at /usr/src/sys/kern/kern_intr.c:1275
> #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0
> , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
>     at /usr/src/sys/kern/kern_fork.c:992
> #18 0xffffffff80bc368e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:602
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000001 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000002 in ?? ()
> #44 0xffffffff81241c00 in tdq_cpu ()
> #45 0xfffffe0005501000 in ?? ()
> #46 0x0000000000000000 in ?? ()
> #47 0xffffff822fc266d0 in ?? ()
> #48 0xffffff822fc26678 in ?? ()
> #49 0xfffffe019ed11470 in ?? ()
> #50 0xffffffff8091352e in sched_switch (td=0x0,
>     newtd=0xfffffe00057424e0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1921
> Previous frame inner to this frame (corrupt stack?)
>
> ---------------------------------
>
> --
> Best regards,
>
> Loïc BLOT, Engineering
> UNIX Systems, Security and Networks
> http://www.unix-experience.fr
>
>
> On Thursday, 07 March 2013 at 11:06 -0300, Marcelo Gondim wrote:
> > On 07/03/13 10:12, Loïc Blot wrote:
> > > Hi Andriy,
> > > thanks for your help.
> > >
> > > here is the stack backtrace (i have 11 core.txt files, and each has this
> > > crash). (cat /var/crash/core.txt.11)
> > >
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > > #1 0xffffffff808ea8be at panic+0x1ce
> > > #2 0xffffffff80bd8240 at trap_fatal+0x290
> > > #3 0xffffffff80bd857d at trap_pfault+0x1ed
> > > #4 0xffffffff80bd8b9e at trap+0x3ce
> > > #5 0xffffffff80bc315f at calltrap+0x8
> > > #6 0xffffffff80a861d5 at udp_input+0x475
> > > #7 0xffffffff80a043dc at ip_input+0xac
> > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > #9 0xffffffff809a35cd at ether_demux+0x14d
> > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4
> > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > #12 0xffffffff80438fd7 at bce_intr+0x487
> > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> > > #14 0xffffffff808c0076 at ithread_loop+0xa6
> > > #15 0xffffffff808bb9ef at fork_exit+0x11f
> > > #16 0xffffffff80bc368e at fork_trampoline+0xe
> > > Uptime: 2h6m59s
> > > Dumping 1177 out of 8162
> > > MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92%
> > >
> > > I can't read vmcore.11 only with this option:
> > >
> > > kgdb -d /var/crash/vmcore.11
> > >
> > > I read man and thought i must use kgdb -c /var/crash/vmcore.11 but it's
> > > not a suitable image. (kgdb: couldn't find a suitable kernel image)
> > >
> > > This servers uses UDP packets, for SNMP requests (> 10000/h), NTP (a
> > > little), Syslog (that's all i remember).
> > Hi,
> >
> > Look this
> > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
> >
> > []'s
> > Gondim

CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it
looks to me the issue is there. He may have some advice.

In the meantime, can you please provide full output from the following
commands:

- dmesg
- pciconf -lvbc

Thanks.
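
For anyone following along, a minimal sketch of how the output of those
two commands could be gathered into files for attaching to a reply. The
output directory /tmp/bce-diag and the collect helper are assumptions
made up for illustration, not something from this thread; only dmesg and
pciconf -lvbc are the commands actually requested above.

```sh
#!/bin/sh
# Hypothetical output location; override by setting OUTDIR beforehand.
OUTDIR="${OUTDIR:-/tmp/bce-diag}"
mkdir -p "$OUTDIR"

# collect NAME COMMAND [ARGS...]
# Runs COMMAND and stores stdout and stderr in $OUTDIR/NAME.txt.
# If COMMAND is not installed, a short note is written instead.
collect() {
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$OUTDIR/$name.txt" 2>&1 || echo "warning: $1 exited non-zero" >&2
    else
        echo "skipped: $1 not found on this system" > "$OUTDIR/$name.txt"
    fi
}

collect dmesg dmesg
collect pciconf pciconf -lvbc
```

Run as root on the affected machine, this should leave dmesg.txt and
pciconf.txt in the chosen directory, which can then be pasted or attached.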
-- 
| Jeremy Chadwick                                     jdc@koitsu.org |
| UNIX Systems Administrator                  http://jdc.koitsu.org/ |
| Mountain View, CA, US                                              |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 16:41:04 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id C109FED8;
 Thu, 7 Mar 2013 16:41:04 +0000 (UTC)
 (envelope-from loic.blot@unix-experience.fr)
Received: from smtp2.u-psud.fr (smtp2.u-psud.fr [129.175.33.42])
 by mx1.freebsd.org (Postfix) with ESMTP id A4120649;
 Thu, 7 Mar 2013 16:41:03 +0000 (UTC)
Received: from smtp2.u-psud.fr (localhost [127.0.0.1])
 by localhost (MTA) with SMTP id E287534D5EB;
 Thu, 7 Mar 2013 17:40:56 +0100 (CET)
Received: from [10.117.40.5] (dns.institutoptique.fr [129.175.196.160])
 by smtp2.u-psud.fr (MTA) with ESMTP id 657DC34D60B;
 Thu, 7 Mar 2013 17:40:56 +0100 (CET)
Message-ID: <1362674451.16808.51.camel@iMac-LBlot.domain.iogs>
Subject: Re: Strange reboot since 9.1
From: Loïc Blot
To: Jeremy Chadwick
Date: Thu, 07 Mar 2013 17:40:51 +0100
In-Reply-To: <20130307163827.GA96983@icarus.home.lan>
References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs>
 <1362652057.16808.23.camel@iMac-LBlot.domain.iogs>
 <51388E42.5040500@FreeBSD.org>
 <1362661965.16808.36.camel@iMac-LBlot.domain.iogs>
 <51389ED5.6030207@bsdinfo.com.br>
 <1362670734.16808.48.camel@iMac-LBlot.domain.iogs>
 <20130307163827.GA96983@icarus.home.lan>
Organization: UNIX Experience FR
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.6.3
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Cc: freebsd-stable@freebsd.org, yongari@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: loic.blot@unix-experience.fr
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Mar 2013 16:41:04 -0000

Here is pciconf -lbcv

hostb0@pci0:0:0:0: class=0x060000 card=0x02a51028 chip=0xd1308086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor DMI'
    class      = bridge
    subclass   = HOST-PCI
    cap 05[60] = MSI supports 2 messages, vector masks
    cap 10[90] = PCI-Express 1 root port max data 128(128) link x4(x4)
    cap 01[e0] = powerspec 3 supports D0 D3 current D0
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 000d[150] = unknown 1
    ecap 000b[160] = unknown 0
pcib1@pci0:0:3:0: class=0x060400 card=0x02a51028 chip=0xd1388086 rev=0x11 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Core Processor PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
    cap 0d[40] = PCI Bridge card=0x02a51028
    cap 05[60] = MSI supports 2 messages, vector masks
    cap 10[90] = PCI-Express 2 root port max data 256(256) link x8(x16)
    cap 01[e0] = powerspec 3 supports D0 D3 current D0
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 000d[150] = unknown 1
    ecap 000b[160] = unknown 0
none0@pci0:0:8:0: class=0x088000 card=0x00000000 chip=0xd1558086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor System Management Registers'
    class      = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none1@pci0:0:8:1: class=0x088000 card=0x00000000 chip=0xd1568086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor Semaphore and Scratchpad Registers'
    class      = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none2@pci0:0:8:2: class=0x088000 card=0x00000000 chip=0xd1578086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor System Control and Status Registers'
    class      = base peripheral
    cap 10[40] = PCI-Express 2 root endpoint max data 128(128) link x0(x0)
    ecap 000b[100] = unknown 0
none3@pci0:0:8:3: class=0x088000 card=0x00000000 chip=0xd1588086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor Miscellaneous Registers'
    class      = base peripheral
none4@pci0:0:16:0: class=0x088000 card=0x00000000 chip=0xd1508086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor QPI Link'
    class      = base peripheral
none5@pci0:0:16:1: class=0x088000 card=0x00000000 chip=0xd1518086 rev=0x11 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Core Processor QPI Routing and Protocol Registers'
    class      = base peripheral
ehci0@pci0:0:26:0: class=0x0c0320 card=0x02a51028 chip=0x3b3c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5 Series/3400 Series Chipset USB2 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xdf0fa000, size 1024, enabled
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
pcib2@pci0:0:28:0: class=0x060400 card=0x02a51028 chip=0x3b428086 rev=0x05 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '5 Series/3400 Series Chipset PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
    cap 10[40] = PCI-Express 2 root port max data 128(128) link x4(x4)
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0x02a51028
    cap 01[a0] = powerspec 2 supports D0 D3 current D0
ehci1@pci0:0:29:0: class=0x0c0320 card=0x02a51028 chip=0x3b348086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5 Series/3400 Series Chipset USB2 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xdf0fc000, size 1024, enabled
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
pcib3@pci0:0:30:0: class=0x060401 card=0x02a51028 chip=0x244e8086 rev=0xa5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
    cap 0d[50] = PCI Bridge card=0x02a51028
isab0@pci0:0:31:0: class=0x060100 card=0x02a51028 chip=0x3b148086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '3400 Series Chipset LPC Interface Controller'
    class      = bridge
    subclass   = PCI-ISA
    cap 09[e0] = vendor (length 16) Intel cap 1 version 1
ahci0@pci0:0:31:2: class=0x010601 card=0x02a51028 chip=0x3b228086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '5 Series/3400 Series Chipset 6 port SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
    bar   [10] = type I/O Port, range 32, base 0xecd0, size 8, enabled
    bar   [14] = type I/O Port, range 32, base 0xecc8, size 4, enabled
    bar   [18] = type I/O Port, range 32, base 0xecd8, size 8, enabled
    bar   [1c] = type I/O Port, range 32, base 0xeccc, size 4, enabled
    bar   [20] = type I/O Port, range 32, base 0xece0, size 32, enabled
    bar   [24] = type Memory, range 32, base 0xdf0fe000, size 2048, enabled
    cap 05[80] = MSI supports 1 message enabled with 1 message
    cap 01[70] = powerspec 3 supports D0 D3 current D0
    cap 12[a8] = SATA Index-Data Pair
    cap 13[b0] = PCI Advanced Features: FLR TP
mpt0@pci0:1:0:0: class=0x010000 card=0x1f0e1028 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'SAS1068E PCI-Express Fusion-MPT SAS'
    class      = mass storage
    subclass   = SCSI
    bar   [10] = type I/O Port, range 32, base 0xfc00, size 256, disabled
    bar   [14] = type Memory, range 64, base 0xdf2ec000, size 16384, enabled
    bar   [1c] = type Memory, range 64, base 0xdf2f0000, size 65536, enabled
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 10[68] = PCI-Express 1 endpoint max data 256(4096) link x8(x8)
    cap 05[98] = MSI supports 1 message, 64 bit
    cap 11[b0] = MSI-X supports 1 message in map 0x14 enabled
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
bce0@pci0:2:0:0: class=0x020000 card=0x02a51028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5716 Gigabit Ethernet'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
    cap 01[48] = powerspec 3 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
    cap 11[a0] = MSI-X supports 9 messages in map 0x10
    cap 10[ac] = PCI-Express 2 endpoint max data 128(512) link x4(x4)
    ecap 0003[100] = Serial 1 0026b9fffe767f1a
    ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0004[150] = unknown 1
    ecap 0002[160] = VC 1 max VC0
bce1@pci0:2:0:1: class=0x020000 card=0x02a51028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5716 Gigabit Ethernet'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xdc000000, size 33554432, enabled
    cap 01[48] = powerspec 3 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
    cap 11[a0] = MSI-X supports 9 messages in map 0x10
    cap 10[ac] = PCI-Express 2 endpoint max data 128(512) link x4(x4)
    ecap 0003[100] = Serial 1 0026b9fffe767f1b
    ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0004[150] = unknown 1
    ecap 0002[160] = VC 1 max VC0
vgapci0@pci0:3:3:0: class=0x030000 card=0x02a51028 chip=0x0532102b rev=0x0a hdr=0x00
    vendor     = 'Matrox Graphics, Inc.'
    device     = 'MGA G200eW WPCM450'
    class      = display
    subclass   = VGA
    bar   [10] = type Prefetchable Memory, range 32, base 0xd9800000, size 8388608, enabled
    bar   [14] = type Memory, range 32, base 0xde7fc000, size 16384, enabled
    bar   [18] = type Memory, range 32, base 0xde800000, size 8388608, enabled
    cap 01[dc] = powerspec 1 supports D0 D3 current D0

and now dmesg

CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz (2394.04-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x106e5 Family = 6 Model = 1e Stepping = 5
  Features=0xbfebfbff
  Features2=0x98e3fd
  AMD Features=0x28100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8219975680 (7839 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 2
 cpu2 (AP): APIC ID: 4
 cpu3 (AP): APIC ID: 6
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
atrtc0: port 0x70-0x7f irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: port 0x40-0x5f irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
Event timer "HPET3" frequency 14318180 Hz quality 440
Event timer "HPET4" frequency 14318180 Hz quality 440
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 3.0 on pci0
pci1: on pcib1
mpt0: port 0xfc00-0xfcff mem 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci1
mpt0: MPI Version=1.5.18.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 1 Active Volume (2 Max)
mpt0: 2 Hidden Drive Members (14 Max)
pci0: at device 8.0 (no driver attached)
pci0: at device 8.1 (no driver attached)
pci0: at device 8.2 (no driver attached)
pci0: at device 8.3 (no driver attached)
pci0: at device 16.0 (no driver attached)
pci0: at device 16.1 (no driver attached)
ehci0: mem 0xdf0fa000-0xdf0fa3ff irq 22 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
pcib2: at device 28.0 on pci0
pci2: on pcib2
bce0: mem 0xda000000-0xdbffffff irq 16 at device 0.0 on pci2
miibus0: on bce0
brgphy0: PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce0: Ethernet address: 00:26:b9:76:7f:1a
bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.0.11); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.5)
Coal (RX:6,6,18,18; TX:20,20,80,80)
bce1: mem 0xdc000000-0xddffffff irq 17 at device 0.1 on pci2
miibus1: on bce1
brgphy1: PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce1: Ethernet address: 00:26:b9:76:7f:1b
bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.0.11); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.5)
Coal (RX:6,6,18,18; TX:20,20,80,80)
ehci1: mem 0xdf0fc000-0xdf0fc3ff irq 22 at device 29.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci1
pcib3: at device 30.0 on pci0
pci3: on pcib3
vgapci0: mem 0xd9800000-0xd9ffffff,0xde7fc000-0xde7fffff,0xde800000-0xdeffffff irq 19 at device 3.0 on pci3
isab0: at device 31.0 on pci0
isa0: on isab0
ahci0: port 0xecd0-0xecd7,0xecc8-0xeccb,0xecd8-0xecdf,0xeccc-0xeccf,0xece0-0xecff mem 0xdf0fe000-0xdf0fe7ff irq 20 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahcich2: at channel 2 on ahci0
ahcich3: at channel 3 on ahci0
ahcich4: at channel 4 on ahci0
ahcich5: at channel 5 on ahci0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
orm0: at iomem 0xc0000-0xc7fff,0xce000-0xcefff,0xec000-0xeffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
ctl: CAM Target Layer loaded
est0: on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
device_attach: est0 attach returned 6
p4tcc0: on cpu0
est1: on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
device_attach: est1 attach returned 6
p4tcc1: on cpu1
est2: on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
device_attach: est2 attach returned 6
p4tcc2: on cpu2
est3: on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 13
device_attach: est3 attach returned 6
p4tcc3: on cpu3
Timecounters tick every 1.000 msec
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: at usbus0
uhub0: on usbus0
ugen1.1: at usbus1
uhub1: on usbus1
mpt0:vol0(mpt0:0:0): Settings ( Member-WCE Hot-Plug-Spares High-Priority-ReSync )
mpt0:vol0(mpt0:0:0): Using Spare Pool: 0
mpt0:vol0(mpt0:0:0): 2 Members:
(mpt0:1:8:0): Primary Online
(mpt0:1:1:0): Secondary Online
mpt0:vol0(mpt0:0:0): RAID-1 - Optimal
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:0): Physical (mpt0:0:8:0), Pass-thru (mpt0:1:1:0)
(mpt0:vol0:0): Online
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
ugen0.2: at usbus0
uhub2: on usbus0
ugen1.2: at usbus1
uhub3: on usbus1
uhub2: 6 ports with 6 removable, self powered
uhub3: 8 ports with 8 removable, self powered
ugen1.3: at usbus1
ukbd0: on usbus1
kbd0 at ukbd0
ums0: on usbus1
ums0: 5 buttons and [XYZ] coordinates ID=1
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe0:mpt0:0:0:0): REPORT LUNS. CDB: a0 0 0 0 0 0 0 0 0 10 0 0
(probe0:mpt0:0:0:0): CAM status: SCSI Status Error
(probe0:mpt0:0:0:0): SCSI status: Check Condition
(probe0:mpt0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:ffffffff,ffffffff (Reserved ASC/ASCQ pair)
(probe0:mpt0:0:0:0): Error 22, Unretryable error
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Error 5, Retries exhausted
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Retrying command
(probe64:mpt0:1:1:0): INQUIRY. CDB: 12 0 0 0 24 0
(probe64:mpt0:1:1:0): CAM status: Unrecoverable Host Bus Adapter Error
(probe64:mpt0:1:1:0): Error 5, Retries exhausted
da0 at mpt0 bus 0 scbus0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 237824MB (487063552 512 byte sectors: 255H 63S/T 30318C)
pass1 at mpt0 bus 1 scbus1 target 0 lun 0
pass1: Fixed Uninstalled SCSI-5 device
pass1: 300.000MB/s transfers
pass1: Command Queueing enabled
cd0 at ahcich3 bus 0 scbus5 target 0 lun 0
cd0: Removable CD-ROM SCSI-0 device
SMP: AP CPU #2 Launched!
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 9351702 Hz quality 1000 ugen1.4: at usbus1 uhub4: on usbus1 uhub4: MTT enabled Root mount waiting for: usbus1 uhub4: 4 ports with 4 removable, self powered Trying to mount root from ufs:/dev/da0p2 [rw]... -- Best regards, Loïc BLOT, Engineering UNIX Systems, Security and Networks http://www.unix-experience.fr Le jeudi 07 mars 2013 à 08:38 -0800, Jeremy Chadwick a écrit : > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Lo?c Blot wrote: > > Hi Marcelo, thanks. Here is a better trace: > > > > --------------------------------- > > > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11 > > GNU gdb 6.1.1 [FreeBSD] > > Copyright 2004 Free Software Foundation, Inc. > > GDB is free software, covered by the GNU General Public License, and you > > are > > welcome to change it and/or distribute copies of it under certain > > conditions. > > Type "show copying" to see the conditions. > > There is absolutely no warranty for GDB. Type "show warranty" for > > details. > > This GDB was configured as "amd64-marcel-freebsd"... 
> > > > Unread portion of the kernel message buffer: > > > > > > Fatal trap 12: page fault while in kernel mode > > cpuid = 0; apic id = 00 > > fault virtual address = 0x0 > > fault code = supervisor read data, page not present > > instruction pointer = 0x20:0xffffffff80a84414 > > stack pointer = 0x28:0xffffff822fc267a0 > > frame pointer = 0x28:0xffffff822fc26830 > > code segment = base 0x0, limit 0xfffff, type 0x1b > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags = interrupt enabled, resume, IOPL = 0 > > current process = 12 (irq265: bce0) > > trap number = 12 > > panic: page fault > > cpuid = 0 > > KDB: stack backtrace: > > #0 0xffffffff809208a6 at kdb_backtrace+0x66 > > #1 0xffffffff808ea8be at panic+0x1ce > > #2 0xffffffff80bd8240 at trap_fatal+0x290 > > #3 0xffffffff80bd857d at trap_pfault+0x1ed > > #4 0xffffffff80bd8b9e at trap+0x3ce > > #5 0xffffffff80bc315f at calltrap+0x8 > > #6 0xffffffff80a861d5 at udp_input+0x475 > > #7 0xffffffff80a043dc at ip_input+0xac > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #9 0xffffffff809a35cd at ether_demux+0x14d > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #12 0xffffffff80438fd7 at bce_intr+0x487 > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > > #14 0xffffffff808c0076 at ithread_loop+0xa6 > > #15 0xffffffff808bb9ef at fork_exit+0x11f > > #16 0xffffffff80bc368e at fork_trampoline+0xe > > Uptime: 27m20s > > Dumping 1265 out of 8162 > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92% > > > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > 224 pcpu.h: No such file or directory. > > in pcpu.h > > (kgdb) bt f > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > No locals. > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > at /usr/src/sys/kern/kern_shutdown.c:448 > > _ep = Variable "_ep" is not available. 
> > (kgdb) bt > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > at /usr/src/sys/kern/kern_shutdown.c:448 > > #2 0xffffffff808ea897 in panic (fmt=0x1
) > > at /usr/src/sys/kern/kern_shutdown.c:636 > > #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is > > not available. > > ) at /usr/src/sys/amd64/amd64/trap.c:857 > > #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0, > > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773 > > #5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0) > > at /usr/src/sys/amd64/amd64/trap.c:456 > > #6 0xffffffff80bc315f in calltrap () > > at /usr/src/sys/amd64/amd64/exception.S:228 > > #7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000, > > ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20, > > udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252 > > #8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable > > "off" is not available. > > ) at /usr/src/sys/netinet/udp_usrreq.c:618 > > #9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00) > > at /usr/src/sys/netinet/ip_input.c:760 > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable > > "source" is not available. > > ) at /usr/src/sys/net/netisr.c:1013 > > #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000, > > m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940 > > #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not > > available. > > ) at /usr/src/sys/net/if_ethersubr.c:759 > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable > > "source" is not available. > > ) at /usr/src/sys/net/netisr.c:1013 > > #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available. > > ) at /usr/src/sys/dev/bce/if_bce.c:6903 > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is > > not available. 
> > ) at /usr/src/sys/kern/kern_intr.c:1262
> > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
> > at /usr/src/sys/kern/kern_intr.c:1275
> > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0
> > , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
> > at /usr/src/sys/kern/kern_fork.c:992
> > #18 0xffffffff80bc368e in fork_trampoline ()
> > at /usr/src/sys/amd64/amd64/exception.S:602
> > #19 0x0000000000000000 in ?? ()
> > #20 0x0000000000000000 in ?? ()
> > #21 0x0000000000000001 in ?? ()
> > #22 0x0000000000000000 in ?? ()
> > #23 0x0000000000000000 in ?? ()
> > #24 0x0000000000000000 in ?? ()
> > #25 0x0000000000000000 in ?? ()
> > #26 0x0000000000000000 in ?? ()
> > #27 0x0000000000000000 in ?? ()
> > #28 0x0000000000000000 in ?? ()
> > #29 0x0000000000000000 in ?? ()
> > #30 0x0000000000000000 in ?? ()
> > #31 0x0000000000000000 in ?? ()
> > #32 0x0000000000000000 in ?? ()
> > #33 0x0000000000000000 in ?? ()
> > #34 0x0000000000000000 in ?? ()
> > #35 0x0000000000000000 in ?? ()
> > #36 0x0000000000000000 in ?? ()
> > #37 0x0000000000000000 in ?? ()
> > #38 0x0000000000000000 in ?? ()
> > #39 0x0000000000000000 in ?? ()
> > #40 0x0000000000000000 in ?? ()
> > #41 0x0000000000000000 in ?? ()
> > #42 0x0000000000000000 in ?? ()
> > #43 0x0000000000000002 in ?? ()
> > #44 0xffffffff81241c00 in tdq_cpu ()
> > #45 0xfffffe0005501000 in ?? ()
> > #46 0x0000000000000000 in ?? ()
> > #47 0xffffff822fc266d0 in ?? ()
> > #48 0xffffff822fc26678 in ?? ()
> > #49 0xfffffe019ed11470 in ?? ()
> > #50 0xffffffff8091352e in sched_switch (td=0x0,
> > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available.
> > ) at /usr/src/sys/kern/sched_ule.c:1921
> > Previous frame inner to this frame (corrupt stack?)
> >
> > ---------------------------------
> >
> > --
> > Best regards,
> >
> > Loïc BLOT, Engineering
> > UNIX Systems, Security and Networks
> > http://www.unix-experience.fr
> >
> >
> > On Thursday 07 March 2013 at 11:06 -0300, Marcelo Gondim wrote:
> > > On 07/03/13 at 10:12, Loïc Blot wrote:
> > > > Hi Andriy,
> > > > thanks for your help.
> > > >
> > > > Here is the stack backtrace (I have 11 core.txt files, and each has this
> > > > crash). (cat /var/crash/core.txt.11)
> > > >
> > > > panic: page fault
> > > > cpuid = 0
> > > > KDB: stack backtrace:
> > > > #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > > > #1 0xffffffff808ea8be at panic+0x1ce
> > > > #2 0xffffffff80bd8240 at trap_fatal+0x290
> > > > #3 0xffffffff80bd857d at trap_pfault+0x1ed
> > > > #4 0xffffffff80bd8b9e at trap+0x3ce
> > > > #5 0xffffffff80bc315f at calltrap+0x8
> > > > #6 0xffffffff80a861d5 at udp_input+0x475
> > > > #7 0xffffffff80a043dc at ip_input+0xac
> > > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > > #9 0xffffffff809a35cd at ether_demux+0x14d
> > > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4
> > > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b
> > > > #12 0xffffffff80438fd7 at bce_intr+0x487
> > > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104
> > > > #14 0xffffffff808c0076 at ithread_loop+0xa6
> > > > #15 0xffffffff808bb9ef at fork_exit+0x11f
> > > > #16 0xffffffff80bc368e at fork_trampoline+0xe
> > > > Uptime: 2h6m59s
> > > > Dumping 1177 out of 8162
> > > > MB:..2%..11%..21%..32%..41%..51%..62%..71%..81%..92%
> > > >
> > > > I can't read vmcore.11 only with this option:
> > > >
> > > > kgdb -d /var/crash/vmcore.11
> > > >
> > > > I read the man page and thought I must use kgdb -c /var/crash/vmcore.11,
> > > > but it's not a suitable image. (kgdb: couldn't find a suitable kernel image)
> > > >
> > > > This server uses UDP packets for SNMP requests (> 10000/h), NTP (a
> > > > little), and Syslog (that's all I remember).
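
A hedged aside on the "couldn't find a suitable kernel image" error quoted above: kgdb(1) accepts the kernel and the core as positional arguments, so naming the kernel explicitly is one way to avoid the lookup that fails. The sketch below only prints the command line rather than running it. The kernel path /boot/kernel/kernel and the crash directory /var/crash are assumptions (the thread does not say which kernel file matches the dumps), and kgdb_cmd is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: build the kgdb command line for crash dump number N.
# Assumptions (not stated in the thread): the kernel that panicked is still
# installed as /boot/kernel/kernel and the dumps live in /var/crash.
KERNEL="${KERNEL:-/boot/kernel/kernel}"
CRASHDIR="${CRASHDIR:-/var/crash}"

kgdb_cmd() {
    # $1 = dump number, e.g. 11 for vmcore.11
    printf 'kgdb %s %s/vmcore.%s\n' "$KERNEL" "$CRASHDIR" "$1"
}

# Print the command; run it by hand on the affected machine.
kgdb_cmd 11
```

If the running kernel has since been rebuilt, the dump would need the older kernel file (for example one kept under /boot/kernel.old, if it exists) instead of the default assumed here.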
> > > Hi,
> > >
> > > Look at this:
> > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
> > >
> > > Regards,
> > > Gondim
>
> CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it
> looks to me like the issue is there. He may have some advice.
>
> In the meantime, can you please provide full output from the following
> commands:
>
> - dmesg
> - pciconf -lvbc
>
> Thanks.
>

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 18:10:46 2013
Date: Thu, 7 Mar 2013 19:10:38 +0100
From: Jeremie Le Hen
To: Konstantin Belousov
Cc: Jeremie Le Hen, freebsd-stable@FreeBSD.org
Subject: Re: gdb broken on 9.1/amd64?

(Please Cc: me on reply.)

On Wed, Mar 06, 2013 at 10:50:59PM +0200, Konstantin Belousov wrote:
> On Wed, Mar 06, 2013 at 07:02:22PM +0100, Jeremie Le Hen wrote:
> > root@ingwe:~ # gdb -p 521
>
> Try to specify the executable binary on the command line.

It works better indeed! Now I can get a backtrace for sleep(1), but I
am having difficulty debugging OpenSMTPD:

# gdb /usr/local/sbin/smtpd 25442
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Attaching to program: /usr/local/sbin/smtpd, process 25442
Reading symbols from /usr/local/lib/libsqlite3.so.8...done.
Loaded symbols for /usr/local/lib/libsqlite3.so.8
Reading symbols from /usr/local/lib/libevent-1.4.so.4...done.
Loaded symbols for /usr/local/lib/libevent-1.4.so.4
Reading symbols from /lib/libcrypto.so.6...done.
Loaded symbols for /lib/libcrypto.so.6
Reading symbols from /usr/lib/libssl.so.6...done.
Loaded symbols for /usr/lib/libssl.so.6
Reading symbols from /lib/libz.so.6...done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /lib/libutil.so.9...done.
Loaded symbols for /lib/libutil.so.9
Reading symbols from /lib/libcrypt.so.5...done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /usr/lib/libpam.so.5...done.
Loaded symbols for /usr/lib/libpam.so.5
Reading symbols from /lib/libc.so.7...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /lib/libthr.so.3...done.
Error while reading shared library symbols: Cannot get thread info: invalid key
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
0x000000080ce4281c in kevent () from /lib/libc.so.7
(gdb) bt
#0 0x000000080ce4281c in kevent () from /lib/libc.so.7
#1 0x0000000803104070 in kq_dispatch () from /usr/local/lib/libevent-1.4.so.4
#2 0x00000008030f802a in event_base_loop () from /usr/local/lib/libevent-1.4.so.4
#3 0x000000000042fd7f in smtp () at smtp.c:295
#4 0x0000000000436c0f in fork_peers () at smtpd.c:983
#5 0x000000000043686d in main (argc=0, argv=0x7fffffff77e8) at smtpd.c:904
(gdb) c
Continuing.
no thread to satisfy query
0x000000080ce4281c in kevent () from /lib/libc.so.7

The problem is that the process seems hung here, despite the continue.
When I connect to it and say HELO, I don't get any reply while gdb(1)
is attached.

Also, the following might be relevant:

(gdb) thread apply all c
Cannot get thread info: invalid key

Any idea? Thanks.

Cheers,
--
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 18:57:45 2013
From: "Steven Hartland"
To: "Karl Denninger"
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Date: Thu, 7 Mar 2013 18:57:41 -0000

----- Original Message -----
From: "Karl Denninger"

> Where I am right now is this:
>
> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
> stopped in any way. Even with multiple ZFS send/recv copies going on
> and the load average north of 20 (due to all the geli threads), the
> system doesn't stall or produce any notable pauses in throughput. Nor
> does the system RAM allocation get driven hard enough to force paging.
>
> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
> both stable and solid.
>
> 2. WITH Postgres running as a connected hot spare (identical to the
> production machine), allocating ~1.5G of shared, wired memory, running
> the same synthetic workload in (1) above I am getting SMALL versions of
> the misbehavior. However, while system RAM allocation gets driven
> pretty hard and reaches down toward 100MB in some instances it doesn't
> get driven hard enough to allocate swap. The "burstiness" is very
> evident in the iostat figures with spates getting into the single digit
> MB/sec range from time to time but it's not enough to drive the system
> to a full-on stall.
>
> There's pretty-clearly a bad interaction here between Postgres wiring
> memory and the ARC, when the latter is left alone and allowed to do what
> it wants. I'm continuing to work on replicating this on the test
> machine... just not completely there yet.

Another possibility to consider is how Postgres uses the FS. For example,
does it request sync IO in ways not present in the system without it,
causing the FS and possibly the underlying disk system to behave
differently?

Another option to test, just to rule it out: what happens if you
use the BSD scheduler instead of ULE?

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and
the person or entity to whom it is addressed. In the event of misdirection,
the recipient is prohibited from using, copying, printing or otherwise
disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please
telephone +44 845 868 1337 or return the E.mail to
postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 19:07:18 2013
Date: Thu, 07 Mar 2013 13:07:11 -0600
From: Karl Denninger
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way. Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput. Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, running
>> the same synthetic workload in (1) above I am getting SMALL versions of
>> the misbehavior. However, while system RAM allocation gets driven
>> pretty hard and reaches down toward 100MB in some instances it doesn't
>> get driven hard enough to allocate swap. The "burstiness" is very
>> evident in the iostat figures with spates getting into the single digit
>> MB/sec range from time to time but it's not enough to drive the system
>> to a full-on stall.
>>
>> There's pretty-clearly a bad interaction here between Postgres wiring
>> memory and the ARC, when the latter is left alone and allowed to do what
>> it wants. I'm continuing to work on replicating this on the test
>> machine... just not completely there yet.
>
> Another possibility to consider is how postgres uses the FS. For example
> does it request sync IO in ways not present in the system without it
> which is causing the FS and possibly underlying disk system to behave
> differently.
>

That's possible but not terribly likely in this particular instance.
The reason is that I ran into this with the Postgres data store on a UFS
volume BEFORE I converted it. Now it's on the ZFS pool (with
recordsize=8k as recommended for that filesystem) but when I first ran
into this it was on a separate UFS filesystem (which is where it had
resided for 2+ years without incident), so unless the Postgres
filesystem use on a UFS volume would give ZFS fits it's unlikely to be
involved.
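
For anyone trying to watch the ARC-versus-wired-memory interaction described above while the synthetic workload runs, the following is a minimal sketch, not something taken from the thread. It assumes the standard FreeBSD sysctl names kstat.zfs.misc.arcstats.size, vm.stats.vm.v_free_count, vm.stats.vm.v_wire_count and hw.pagesize; the function names pages_to_mb and sample are hypothetical.

```sh
#!/bin/sh
# Sketch: print one sample of ARC size next to free and wired memory.
# Assumes FreeBSD sysctl names; on a host without the ZFS kstat it prints nothing.

pages_to_mb() {
    # $1 = page count, $2 = page size in bytes
    echo $(( $1 * $2 / 1048576 ))
}

sample() {
    pagesize=$(sysctl -n hw.pagesize)
    arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
    free=$(sysctl -n vm.stats.vm.v_free_count)
    wired=$(sysctl -n vm.stats.vm.v_wire_count)
    printf '%s arc=%sMB free=%sMB wired=%sMB\n' "$(date +%T)" \
        "$(( arc / 1048576 ))" \
        "$(pages_to_mb "$free" "$pagesize")" \
        "$(pages_to_mb "$wired" "$pagesize")"
}

# Only sample where the ZFS kstat exists (FreeBSD with ZFS loaded).
if sysctl -n kstat.zfs.misc.arcstats.size >/dev/null 2>&1; then
    sample
fi
```

Calling sample in a loop with a short sleep alongside iostat would show whether the drops in throughput line up with free memory approaching the ~100MB level mentioned above; that correlation is a guess to be tested, not a finding.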

> Another option to test, just to rule it out: what happens if you
> use the BSD scheduler instead of ULE?
>
> Regards
> Steve
>

I will test that but first I have to get the test machine to reliably
stall so I know I'm not chasing my tail.

--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 19:10:09 2013
Subject: Re: Sanity Check on Mac Mini
From: Doug Hardie
Date: Thu, 7 Mar 2013 11:10:07 -0800
To: rjk@wintek.com
Cc: freebsd-stable@freebsd.org

On 7 March 2013, at 06:42, Richard Kuhns wrote:

> On 03/07/13 01:59, Doug Hardie wrote:
>> I have a new Mac Mini and have encountered the same problem reported
>> last year by Richard Kuhns. YongHyeon PYUN provided some patches to
>> the kernel that resolved the problem. However, without an internet
>> connection it's a bit tricky to get them into the system. Here is the
>> approach I believe will work, but wanted to check first before I
>> really mess things up.
>>
>> 1. Downloaded from current today via svnweb.freebsd.org:
>> sys/dev/bge/if_bgereg.h
>> sys/dev/bge/if_bge.c
>> sys/dev/mii/brgphy.c
>>
>> I believe the patches are incorporated in today's versions. The
>> comments indicate such. Thus I don't need to apply the original
>> supplied patch.
>>
>> 2. Put those on a flash drive.
>>
>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to
>> include the system source.
>>
>> 4. Copy the files from 1 above from flash over the files on the
>> disk.
>>
>> 5. Rebuild the kernel and install it.
>>
>> Thanks,
>>
>> -- Doug
>
> That's worked for me 3 times now.

Thanks. Well, I got 9.1 Release installed, but it won't boot from the
internal disk. It doesn't see the disk as bootable. I installed using
the entire disk for FreeBSD. I used the i386 release. Perhaps I need
to switch to the amd64 release?

-- Doug

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 19:27:11 2013
From: "Steven Hartland"
To: "Karl Denninger"
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Date: Thu, 7 Mar 2013 19:27:15 -0000

----- Original Message -----
From: "Karl Denninger"
Sent: Thursday, March 07, 2013 7:07 PM
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 12:57 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
>> Where I am right now is this:
>>
>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>> stopped in any way. Even with multiple ZFS send/recv copies going on
>> and the load average north of 20 (due to all the geli threads), the
>> system doesn't stall or produce any notable pauses in throughput. Nor
>> does the system RAM allocation get driven hard enough to force paging.
>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>> both stable and solid.
>>
>> 2. WITH Postgres running as a connected hot spare (identical to the
>> production machine), allocating ~1.5G of shared, wired memory, running
>> the same synthetic workload in (1) above I am getting SMALL versions of
>> the misbehavior. However, while system RAM allocation gets driven
>> pretty hard and reaches down toward 100MB in some instances it doesn't
>> get driven hard enough to allocate swap. The "burstiness" is very
>> evident in the iostat figures with spates getting into the single digit
>> MB/sec range from time to time but it's not enough to drive the system
>> to a full-on stall.
>>
>>> There's pretty-clearly a bad interaction here between Postgres wiring
>>> memory and the ARC, when the latter is left alone and allowed to do what
>>> it wants. I'm continuing to work on replicating this on the test
>>> machine... just not completely there yet.
>>
>> Another possibility to consider is how postgres uses the FS. For example
>> does it request sync IO in ways not present in the system without it
>> which is causing the FS and possibly underlying disk system to behave
>> differently.
>
> That's possible but not terribly-likely in this particular instance.
> The reason is that I ran into this with the Postgres data store on a UFS
> volume BEFORE I converted it. Now it's on the ZFS pool (with
> recordsize=8k as recommended for that filesystem) but when I first ran
> into this it was on a separate UFS filesystem (which is where it had
> resided for 2+ years without incident), so unless the Postgres
> filesystem use on a UFS volume would give ZFS fits it's unlikely to be
> involved.

I hate to say it, but that sounds very similar to something we experienced
with a machine here which was running high numbers of rrd updates. Again
we had the issue on UFS and saw the same thing when we moved to ZFS.

I'll leave that there so as not to derail the investigation with what could
be totally irrelevant info, but it may prove an interesting data point
later.

There are obvious common low-level points between UFS and ZFS which
may be the cause. One area which springs to mind is device bio ordering
and barriers, which could well be impacted by sync IO requests independent
of the FS in use.

>> One other option to test, just to rule it out is what happens if you
>> use BSD scheduler instead of ULE?
>
> I will test that but first I have to get the test machine to reliably
> stall so I know I'm not chasing my tail.

Very sensible.

Assuming you can reproduce it, one thing that might be interesting to
try is to eliminate all sync IO. I'm not sure if there are options in
Postgres to do this via configuration or if it would require editing
the code, but this could reduce the problem space.

If disabling sync IO eliminated the problem it would go a long way
to proving it isn't the IO volume or pattern per se but is instead
related to the sync nature of said IO.

Regards
Steve

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 19:30:57 2013
Date: Thu, 07 Mar 2013 13:30:51 -0600
From: Karl Denninger
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?

On 3/7/2013 1:27 PM, Steven Hartland wrote:
>
> ----- Original Message ----- From: "Karl Denninger"
> Sent: Thursday, March 07, 2013 7:07 PM
> Subject: Re: ZFS "stalls" -- and maybe we should be talking about
> defaults?
>
>
>
> On 3/7/2013 12:57 PM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Karl Denninger"
>>> Where I am right now is this:
>>>
>>> 1. I *CANNOT* reproduce the spins on the test machine with Postgres
>>> stopped in any way. Even with multiple ZFS send/recv copies going on
>>> and the load average north of 20 (due to all the geli threads), the
>>> system doesn't stall or produce any notable pauses in throughput. Nor
>>> does the system RAM allocation get driven hard enough to force paging.
>>> This is with NO tuning hacks in /boot/loader.conf. I/O performance is
>>> both stable and solid.
>>>
>>> 2. WITH Postgres running as a connected hot spare (identical to the
>>> production machine), allocating ~1.5G of shared, wired memory, running
>>> the same synthetic workload in (1) above I am getting SMALL versions of
>>> the misbehavior. However, while system RAM allocation gets driven
>>> pretty hard and reaches down toward 100MB in some instances it doesn't
>>> get driven hard enough to allocate swap. The "burstiness" is very
>>> evident in the iostat figures with spates getting into the single digit
>>> MB/sec range from time to time but it's not enough to drive the system
>>> to a full-on stall.
>>>
>>>> There's pretty-clearly a bad interaction here between Postgres wiring
>>>> memory and the ARC, when the latter is left alone and allowed to do
>>>> what it wants. I'm continuing to work on replicating this on the test
>>>> machine... just not completely there yet.
>>>
>>> Another possibility to consider is how postgres uses the FS. For
>>> example does it request sync IO in ways not present in the system
>>> without it which is causing the FS and possibly underlying disk
>>> system to behave differently.
>>
>> That's possible but not terribly-likely in this particular instance.
>> The reason is that I ran into this with the Postgres data store on a UFS
>> volume BEFORE I converted it. Now it's on the ZFS pool (with
>> recordsize=8k as recommended for that filesystem) but when I first ran
>> into this it was on a separate UFS filesystem (which is where it had
>> resided for 2+ years without incident), so unless the Postgres
>> filesystem use on a UFS volume would give ZFS fits it's unlikely to be
>> involved.
>
> I hate to say it, but that sounds very similar to something we
> experienced with a machine here which was running high numbers of rrd
> updates. Again we had the issue on UFS and saw the same thing when we
> moved to ZFS.
>
> I'll leave that there so as not to derail the investigation with what
> could be totally irrelevant info, but it may prove an interesting data
> point later.
>
> There are obvious common low level points between UFS and ZFS which
> may be the cause. One area which springs to mind is device bio ordering
> and barriers which could well be impacted by sync IO requests independent
> of the FS in use.
>
>>> One other option to test, just to rule it out is what happens if you
>>> use BSD scheduler instead of ULE?
>>
>> I will test that but first I have to get the test machine to reliably
>> stall so I know I'm not chasing my tail.
>
> Very sensible.
>
> Assuming you can reproduce it, one thing that might be interesting to
> try is to eliminate all sync IO. I'm not sure if there are options in
> Postgres to do this via configuration or if it would require editing
> the code but this could reduce the problem space.
>
> If disabling sync IO eliminated the problem it would go a long way
> to proving it isn't the IO volume or pattern per se but instead
> related to the sync nature of said IO.
>

That can be turned off in the Postgres configuration. For obvious
reasons it's a very bad idea, but it can be disabled without
actually changing the code itself. I don't know if it shuts off ALL
sync requests, but the documentation says it does.

It's interesting that you ran into this with RRD going; the machine in
question does pull RRD data for Cacti, but it's such a small piece of
the total load profile that I considered it immaterial.

It might not be.
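
As a rough illustration of the configuration route mentioned above, and strictly for a test machine: the sketch below prints a postgresql.conf fragment using the fsync and synchronous_commit settings, which are real PostgreSQL parameters. Whether these two cover every sync request Postgres issues is exactly the open question raised above, so treat the fragment as an assumption to verify; nosync_conf is a hypothetical helper name, and the script edits nothing on its own.

```sh
#!/bin/sh
# Sketch: emit a TEST-ONLY postgresql.conf fragment that disables forced syncs.
# It only prints the fragment; appending it to a real config is left to the
# operator.

nosync_conf() {
    cat <<'EOF'
# TEST MACHINE ONLY: a crash with these settings can corrupt the cluster.
fsync = off                 # do not force WAL/data writes to disk
synchronous_commit = off    # do not wait for the WAL flush at commit
EOF
}

nosync_conf
```

The output would be appended to the test instance's postgresql.conf (its location depends on how the port was installed) and the server reloaded or restarted before repeating the synthetic workload.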

--
-- Karl Denninger
/The Market Ticker ®/
Cuda Systems LLC

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 19:57:23 2013
Date: Thu, 7 Mar 2013 11:57:21 -0800
Subject: Re: Sanity Check on Mac Mini
From: Kevin Oberman
To: Doug Hardie
Cc: rjk@wintek.com, freebsd-stable@freebsd.org

On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie wrote:

>
> On 7 March 2013, at 06:42, Richard Kuhns wrote:
>
> > On 03/07/13 01:59, Doug Hardie wrote:
> >> I have a new Mac Mini and have encountered the same problem reported
> >> last year by Richard Kuhns. YongHyeon PYUN provided some patches to
> >> the kernel that resolved the problem. However, without an internet
> >> connection it's a bit tricky to get them into the system. Here is the
> >> approach I believe will work, but wanted to check first before I
> >> really mess things up.
> >>
> >> 1. Downloaded from current today via svnweb.freebsd.org:
> >> sys/dev/bge/if_bgereg.h
> >> sys/dev/bge/if_bge.c
> >> sys/dev/mii/brgphy.c
> >>
> >> I believe the patches are incorporated in today's versions. The
> >> comments indicate such. Thus I don't need to apply the original
> >> supplied patch.
> >>
> >> 2. Put those on a flash drive.
> >>
> >> 3. Install 9.1 release from flash drive onto the Mini disk. Have to
> >> include the system source.
> >>
> >> 4. Copy the files from 1 above from flash over the files on the disk.
> >>
> >> 5. Rebuild the kernel and install it.
> >>
> >> Thanks,
> >>
> >> -- Doug
> >
> > That's worked for me 3 times now.
>
> Thanks. Well, I got 9.1 Release installed, but it won't boot from the
> internal disk. It doesn't see the disk as bootable. I installed using the
> entire disk for FreeBSD. I used the i386 release. Perhaps I need to
> switch to the amd64 release?

I would generally recommend using the amd64 release, but it may not get
your system to boot.

How is your disk partitioned? GPT? Some BIOSes are broken and assume that
a GPT formatted disk is UEFI and will not recognize it if it lacks the
UEFI boot partition. UEFI boot is a current project that seems likely to
reach head in the fairly near future, but it's not possible now.

You may be able to tweak your BIOS to get it to work or you may have to
install using the traditional partitioning system. The installer defaults
to GPT, but can create either.

I have such a system (ThinkPad T520) and I have two disks... one that came
with the system and contains Windows, and my GPT formatted FreeBSD disk.
I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and it
CAN boot the GPT disk just fine. Not ideal for most, but it works well
for me.
--
R. Kevin Oberman, Network Engineer
E-mail: rkoberman@gmail.com

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 20:45:47 2013
From: "Steven Hartland"
To: "Karl Denninger"
Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults? Date: Thu, 7 Mar 2013 20:45:50 -0000 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 20:45:47 -0000

----- Original Message -----
From: "Karl Denninger"

>>> I will test that but first I have to get the test machine to reliably
>>> stall so I know I'm not chasing my tail.
>>
>> Very sensible.
>>
>> Assuming you can reproduce it, one thing that might be interesting to
>> try is to eliminate all sync IO. I'm not sure if there are options in
>> Postgres to do this via configuration or if it would require editing
>> the code but this could reduce the problem space.
>>
>> If disabling sync IO eliminated the problem it would go a long way
>> to proving it isn't the IO volume or pattern per se but instead
>> related to the sync nature of said IO.
>>
> That can be turned off in the Postgres configuration. For obvious
> reasons it's a very bad idea but it is able to be disabled without
> actually changing the code itself.
>
> I don't know if it shuts off ALL sync requests, but the documentation
> says it does.
>
> It's interesting that you ran into this with RRD going; the machine in
> question does pull RRD data for Cacti, but it's such a small piece of
> the total load profile that I considered it immaterial.
>
> It might not be.

We never did get to the bottom of it but did come up with a fix.
Instead of using straight RRD interaction we switched all our code to use rrdcached and put the files on an SSD-based pool; we have never had an issue since.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.

From owner-freebsd-stable@FreeBSD.ORG Thu Mar 7 22:18:26 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5A68C313 for ; Thu, 7 Mar 2013 22:18:26 +0000 (UTC) (envelope-from bc979@lafn.org) Received: from zoom.lafn.org (zoom.lafn.org [108.92.93.123]) by mx1.freebsd.org (Postfix) with ESMTP id 37BD77C6 for ; Thu, 7 Mar 2013 22:18:25 +0000 (UTC) Received: from [10.0.1.2] (static-71-177-216-148.lsanca.fios.verizon.net [71.177.216.148]) (authenticated bits=0) by zoom.lafn.org (8.14.3/8.14.2) with ESMTP id r27MIOxL018946 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Thu, 7 Mar 2013 14:18:24 -0800 (PST) (envelope-from bc979@lafn.org) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Sanity Check on Mac Mini From: Doug Hardie In-Reply-To: Date: Thu, 7 Mar 2013 14:18:23 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org> To: Kevin Oberman X-Mailer: Apple Mail (2.1499) X-Virus-Scanned: clamav-milter 0.97 at zoom.lafn.org X-Virus-Status: Clean Cc: rjk@wintek.com,
freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2013 22:18:26 -0000

On 7 March 2013, at 11:57, Kevin Oberman wrote:

> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie wrote:
>
> On 7 March 2013, at 06:42, Richard Kuhns wrote:
>
> > On 03/07/13 01:59, Doug Hardie wrote:
> >> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection its a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
> >>
> >> 1. Downloaded from current today via svnweb.freebsd.org:
> >> sys/dev/bge/if_bgereg.h
> >> sys/dev/bge/if_bge.c
> >> sys/dev/mii/brgphy.c
> >>
> >> I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
> >>
> >> 2. Put those on a flash drive.
> >>
> >> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
> >>
> >> 4. Copy the files from 1 above from flash over the files on the disk.
> >>
> >> 5. Rebuild the kernel and install it.
> >>
> >> Thanks,
> >>
> >> -- Doug
> >
> > That's worked for me 3 times now.
>
> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?
>
> I would generally recommend using the amd64 release, but it may not get your system to boot.
>
> How is your disk partitioned? GPT?
Some BIOSes are broken and assume that a GPT formatted disk is UEFI and will not recognize them if they lack the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.

No idea what the default partitioning is for bsdinstall. However, the Mini is only EFI or UEFI with some fallbacks, although the comments I find on the web indicate that different models have different fallbacks.

One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.

>
> You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.
>
> I have such a system (ThinkPad T520) and I have two disks... one that came with the system and containing Windows, and my GPT formatted FreeBSD disk. I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me

Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the Mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure.

I just found some "instructions" for using MBR with bsdinstall, but given there is an effort to create a UEFI boot which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach?
-- Doug From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 01:00:06 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C2DBC173 for ; Fri, 8 Mar 2013 01:00:05 +0000 (UTC) (envelope-from jcm@visi.com) Received: from g2host.com (mailback3.g2host.com [208.42.184.243]) by mx1.freebsd.org (Postfix) with ESMTP id 1628DE42 for ; Fri, 8 Mar 2013 01:00:04 +0000 (UTC) Received: from [208.42.90.57] (account jcm@visi.com) by mailback3.g2host.com (CommuniGate Pro WEBUSER 5.3.11) with HTTP id 11073366 for freebsd-stable@freebsd.org; Thu, 07 Mar 2013 19:00:03 -0600 From: "John Mehr" Subject: Re: Sanity Check on Mac Mini To: X-Mailer: CommuniGate Pro WebUser v5.3.11 Date: Thu, 07 Mar 2013 19:00:03 -0600 Message-ID: In-Reply-To: <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org> <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1; format="flowed" Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 01:00:06 -0000 On Thu, 7 Mar 2013 14:18:23 -0800  Doug Hardie wrote: > > On 7 March 2013, at 11:57, Kevin Oberman > wrote: > >> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie >> wrote: >> >> On 7 March 2013, at 06:42, Richard Kuhns >> wrote: >> >> > On 03/07/13 01:59, Doug Hardie wrote: >> >> I have a new Mac Mini and have encountered the same >>problem reported last year by Richard Kuhns.  YongHyeon >>PYUN provided some patches to the kernel that resolved >>the problem.  
However, without an internet connection its >>a bit tricky to get them into the system.  Here is the >>approach I believe will work, but wanted to check first >>before I really mess things up. >> >> >> >> 1.  Downloaded from current today via >>svnweb.freebsd.org: >> >>      sys/dev/bge/if_bgereg.h >> >>      sys/dev/bge/if_bge.c >> >>      sys/dev/mii/brgphy.c >> >> >> >>    I believe the patches are incorporated in today's >>versions.  The comments indicate such.  Thus I don't need >>to apply the original supplied patch. >> >> >> >> 2.  Put those on a flash drive. >> >> >> >> 3.  Install 9.1 release from flash drive onto the >>Mini disk.  Have to include the system source. >> >> >> >> 4.  Copy the files from 1 above from flash over the >>files on the disk. >> >> >> >> 5.  Rebuild the kernel and install it. >> >> >> >> Thanks, >> >> >> >> -- Doug >> > >> > That's worked for me 3 times now. >> >> Thanks.  Well, I got 9.1 Release installed, but it won't >>boot from the internal disk.  It doesn't see the disk as >>bootable.  I installed using the entire disk for FreeBSD. >> I used the i386 release.  Perhaps I need to switch to >>the amd64 release? >> >> I would generally recommend using the amd64 release, but >>it may not get your system to boot. >> >> How is your disk partitioned? GPT? Some BIOSes are >>broken and assume that a GPT formatted disk is UEFI and >>will not recognize them if they lack the UEFI boot >>partition. UEFI boot is a current project that seems >>likely to reach head in the fairly near future, but it's >>not possible now. > > No idea what the default partitioning is for BSDInstall. > However the Mini is only EFI or UFEI with some fallbacks >although the comments I find in the web indicate that >different models have different fallbacks. > > One comment indicates that an older unit will boot if >its MBR partitioning.  I don't know if the new installer >supports that or not. 
> >> >> You may be able to tweak your BIOS to get it to work or >>you may have to install using the traditional >>partitioning system. The installer defaults to GPT, but >>can create either. >> >> I have such a system (ThinkPad T520) and I have two >>disks... one that came with the system and containing >>Windows, and my GPT formatted FreeBSD disk. I wrote a >>FreeBSD BootEasy boot into the MBR of the Windows disk >>and it CAN boot the GPT disk just fine. Not ideal for >>most, but it works well for me > > Based on a comment I say, waiting till the empty folder >icon appears and then plugging in the install memstick >causes the mini to boot from disk.  That just downright >weird, but it works.  I could live with that, but this is >an unattended server and would experience some down time >if I am not there when there is a power failure. > > I just found some "instructions" for using MBR with >bsdinstall, but given there is an effort to create a UEFI >boot which I suspect would expect to find the GPT boot >partition, perhaps I should just go with the memstick >approach? Hello, If you still have a drive with OS X on it, you may have some luck with OS X's bless command: https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man8/bless.8.html I got a late 2012 mac mini to boot FreeBSD 9.1 (AMD64) from a hard drive using 'bless' (unfortunately I don't remember the exact command line parameters I used).  If you're looking to dual boot, the only luck I had (without resorting to using third party software like rEFIt) was to put the OS's on different drives and install FreeBSD using MBR on the second drive. 
From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 01:51:52 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DE1113C3 for ; Fri, 8 Mar 2013 01:51:52 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-oa0-f51.google.com (mail-oa0-f51.google.com [209.85.219.51]) by mx1.freebsd.org (Postfix) with ESMTP id 9FFEC9A for ; Fri, 8 Mar 2013 01:51:52 +0000 (UTC) Received: by mail-oa0-f51.google.com with SMTP id h2so1451129oag.10 for ; Thu, 07 Mar 2013 17:51:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:x-received:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=/BKFJmpm9H6cbaYp/9GAoD5BGaYkPzUOJz6LAqS+msw=; b=dAQuPa5Q/7RI5aD8AlruYn5ntBybs2uH6AMXNrJEfRUxOkWl1A08wUEb2tC1KCAscB mbtAEqs2Q81oZ+SgCyH7pVfg8yrpHSbnPfqhX1HfrExCKg2v4BeMVI0aQxR04J4iRtzs A2ux54w4jNvR/e2nj30wF2/DyTx89g+BHPXkrmbc9/UJkRqsrMpZWLKEX36etQl6NrJz HnnAWDa+g8J/um4tRrETWfIUiWnJto3LwjB0Mct10x5Ek+rgnJdVxWQ2a4DLFt8S23T0 ZiH5jn77R1U5ix+UXUKfKeYucANKufGNcuJvW2qgXAd3mMGmYxBe6LY+UG3F3+boXEPV pZxA== MIME-Version: 1.0 X-Received: by 10.182.136.72 with SMTP id py8mr357547obb.0.1362707506215; Thu, 07 Mar 2013 17:51:46 -0800 (PST) Sender: kob6558@gmail.com Received: by 10.76.11.165 with HTTP; Thu, 7 Mar 2013 17:51:46 -0800 (PST) In-Reply-To: <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org> <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> Date: Thu, 7 Mar 2013 17:51:46 -0800 X-Google-Sender-Auth: SfEXpxfh1cJ2_vwAQ2jKDctI2n4 Message-ID: Subject: Re: Sanity Check on Mac Mini From: Kevin Oberman To: Doug Hardie Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: rjk@wintek.com, 
freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 01:51:52 -0000 On Thu, Mar 7, 2013 at 2:18 PM, Doug Hardie wrote: > > On 7 March 2013, at 11:57, Kevin Oberman wrote: > > > On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie wrote: > > > > On 7 March 2013, at 06:42, Richard Kuhns wrote: > > > > > On 03/07/13 01:59, Doug Hardie wrote: > > >> I have a new Mac Mini and have encountered the same problem reported > last year by Richard Kuhns. YongHyeon PYUN provided some patches to the > kernel that resolved the problem. However, without an internet connection > its a bit tricky to get them into the system. Here is the approach I > believe will work, but wanted to check first before I really mess things up. > > >> > > >> 1. Downloaded from current today via svnweb.freebsd.org: > > >> sys/dev/bge/if_bgereg.h > > >> sys/dev/bge/if_bge.c > > >> sys/dev/mii/brgphy.c > > >> > > >> I believe the patches are incorporated in today's versions. The > comments indicate such. Thus I don't need to apply the original supplied > patch. > > >> > > >> 2. Put those on a flash drive. > > >> > > >> 3. Install 9.1 release from flash drive onto the Mini disk. Have to > include the system source. > > >> > > >> 4. Copy the files from 1 above from flash over the files on the disk. > > >> > > >> 5. Rebuild the kernel and install it. > > >> > > >> Thanks, > > >> > > >> -- Doug > > > > > > That's worked for me 3 times now. > > > > Thanks. Well, I got 9.1 Release installed, but it won't boot from the > internal disk. It doesn't see the disk as bootable. I installed using the > entire disk for FreeBSD. I used the i386 release. Perhaps I need to > switch to the amd64 release? > > > > I would generally recommend using the amd64 release, but it may not get > your system to boot. 
> > > > How is your disk partitioned? GPT? Some BIOSes are broken and assume > that a GPT formatted disk is UEFI and will not recognize them if they lack > the UEFI boot partition. UEFI boot is a current project that seems likely > to reach head in the fairly near future, but it's not possible now. > > No idea what the default partitioning is for BSDInstall. However the Mini > is only EFI or UFEI with some fallbacks although the comments I find in the > web indicate that different models have different fallbacks. > > One comment indicates that an older unit will boot if its MBR > partitioning. I don't know if the new installer supports that or not. > > > > > You may be able to tweak your BIOS to get it to work or you may have to > install using the traditional partitioning system. The installer defaults > to GPT, but can create either. > > > > I have such a system (ThinkPad T520) and I have two disks... one that > came with the system and containing Windows, and my GPT formatted FreeBSD > disk. I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and > it CAN boot the GPT disk just fine. Not ideal for most, but it works well > for me > > Based on a comment I say, waiting till the empty folder icon appears and > then plugging in the install memstick causes the mini to boot from disk. > That just downright weird, but it works. I could live with that, but this > is an unattended server and would experience some down time if I am not > there when there is a power failure. > > I just found some "instructions" for using MBR with bsdinstall, but given > there is an effort to create a UEFI boot which I suspect would expect to > find the GPT boot partition, perhaps I should just go with the memstick > approach To be cleat, you just insert the thumb drive and the hard drive boots? That IS weird! Or do you get the BootEasy prompt for the partition/disk you want to boot? 
If the latter, the system is processing the MBR from the thumb drive and using that to boot the GPT disk. I am not an expert on EFI or UEFI. I know EFI is older and UEFI replaced it about five years ago. I am not entirely clear on the differences, but I assume a newer Mac Mini would be UEFI. My experience with boot loaders is, to put it politely, ancient. I mean pre-BIOS. I have, at best, a limited understanding of BIOS booting and not much on UEFI, but I know that UEFI can boot devices using the old PC partitioning system as well as GUID (GPT) partitioned ones. The Wikipedia article on UEFI is enlightening. -- R. Kevin Oberman, Network Engineer E-mail: rkoberman@gmail.com From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 02:33:04 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id EDEAAD2C; Fri, 8 Mar 2013 02:33:04 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by mx1.freebsd.org (Postfix) with ESMTP id 99B8A1C1; Fri, 8 Mar 2013 02:33:04 +0000 (UTC) Received: by mail-pa0-f43.google.com with SMTP id bh2so937618pad.30 for ; Thu, 07 Mar 2013 18:33:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:from:date:to:cc:subject:message-id:reply-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=MWyUhmqsis2f5dcRDQY61dxvFVPrs6PH3fH0poi5KUU=; b=wLwAmIdQtNXc2WDaxb/N6lAdUsVjn/VOWgigE6k3q6FJulQhlM0TYj4AY8q6fJGX5r KKBtpr6WgwwvuQtKSoU32yTVkdJA8pKdn9kqdRLVc84pLt1SrUWgZlR6QwDbrvUY6yfy /XiW2or2/DYDz2k8fiRq6zRLcfw1RGVM4pyqPH3/H8LlVGww+G9Pj1Bq6t7Y5q6h+HVV SmlbdYBiOuU7yrr6zz5jPZUMAoNFQi8qvuYKk5csAwzNJdGaFdfKiXq3Yjp4iyQU690h rdCZelJRgL4lzH94FGDEP8eIXEBL9tiVmm5qQz2r9SvPD7PqDCfPi7OekFcu3JtRu2qj ozrw== X-Received: by 10.66.145.5 with SMTP id sq5mr1622570pab.105.1362709983867; Thu, 07 Mar 2013 
18:33:03 -0800 (PST) Received: from pyunyh@gmail.com (lpe4.p59-icn.cdngp.net. [114.111.62.249]) by mx.google.com with ESMTPS id mz8sm3719433pbc.9.2013.03.07.18.33.00 (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 07 Mar 2013 18:33:02 -0800 (PST) Received: by pyunyh@gmail.com (sSMTP sendmail emulation); Fri, 08 Mar 2013 11:32:54 +0900 From: YongHyeon PYUN Date: Fri, 8 Mar 2013 11:32:54 +0900 To: Jeremy Chadwick Subject: Re: Strange reboot since 9.1 Message-ID: <20130308023254.GC3246@michelle.cdnetworks.com> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> <20130307163827.GA96983@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130307163827.GA96983@icarus.home.lan> User-Agent: Mutt/1.4.2.3i Cc: yongari@freebsd.org, freebsd-stable@freebsd.org, Lo?c Blot X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: pyunyh@gmail.com List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 02:33:05 -0000 On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote: > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Lo?c Blot wrote: > > Hi Marcelo, thanks. Here is a better trace: > > > > --------------------------------- > > > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11 > > GNU gdb 6.1.1 [FreeBSD] > > Copyright 2004 Free Software Foundation, Inc. > > GDB is free software, covered by the GNU General Public License, and you > > are > > welcome to change it and/or distribute copies of it under certain > > conditions. > > Type "show copying" to see the conditions. > > There is absolutely no warranty for GDB. 
Type "show warranty" for > > details. > > This GDB was configured as "amd64-marcel-freebsd"... > > > > Unread portion of the kernel message buffer: > > > > > > Fatal trap 12: page fault while in kernel mode > > cpuid = 0; apic id = 00 > > fault virtual address = 0x0 > > fault code = supervisor read data, page not present > > instruction pointer = 0x20:0xffffffff80a84414 > > stack pointer = 0x28:0xffffff822fc267a0 > > frame pointer = 0x28:0xffffff822fc26830 > > code segment = base 0x0, limit 0xfffff, type 0x1b > > = DPL 0, pres 1, long 1, def32 0, gran 1 > > processor eflags = interrupt enabled, resume, IOPL = 0 > > current process = 12 (irq265: bce0) > > trap number = 12 > > panic: page fault > > cpuid = 0 > > KDB: stack backtrace: > > #0 0xffffffff809208a6 at kdb_backtrace+0x66 > > #1 0xffffffff808ea8be at panic+0x1ce > > #2 0xffffffff80bd8240 at trap_fatal+0x290 > > #3 0xffffffff80bd857d at trap_pfault+0x1ed > > #4 0xffffffff80bd8b9e at trap+0x3ce > > #5 0xffffffff80bc315f at calltrap+0x8 > > #6 0xffffffff80a861d5 at udp_input+0x475 > > #7 0xffffffff80a043dc at ip_input+0xac > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #9 0xffffffff809a35cd at ether_demux+0x14d > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > > #12 0xffffffff80438fd7 at bce_intr+0x487 > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > > #14 0xffffffff808c0076 at ithread_loop+0xa6 > > #15 0xffffffff808bb9ef at fork_exit+0x11f > > #16 0xffffffff80bc368e at fork_trampoline+0xe > > Uptime: 27m20s > > Dumping 1265 out of 8162 > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92% > > > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > 224 pcpu.h: No such file or directory. > > in pcpu.h > > (kgdb) bt f > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > No locals. 
> > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > at /usr/src/sys/kern/kern_shutdown.c:448 > > _ep = Variable "_ep" is not available. > > (kgdb) bt > > #0 doadump (textdump=Variable "textdump" is not available. > > ) at pcpu.h:224 > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > at /usr/src/sys/kern/kern_shutdown.c:448 > > #2 0xffffffff808ea897 in panic (fmt=0x1
) > > at /usr/src/sys/kern/kern_shutdown.c:636 > > #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is > > not available. > > ) at /usr/src/sys/amd64/amd64/trap.c:857 > > #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0, > > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773 > > #5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0) > > at /usr/src/sys/amd64/amd64/trap.c:456 > > #6 0xffffffff80bc315f in calltrap () > > at /usr/src/sys/amd64/amd64/exception.S:228 > > #7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000, > > ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20, > > udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252 > > #8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable > > "off" is not available. > > ) at /usr/src/sys/netinet/udp_usrreq.c:618 > > #9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00) > > at /usr/src/sys/netinet/ip_input.c:760 > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable > > "source" is not available. > > ) at /usr/src/sys/net/netisr.c:1013 > > #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000, > > m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940 > > #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not > > available. > > ) at /usr/src/sys/net/if_ethersubr.c:759 > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable > > "source" is not available. > > ) at /usr/src/sys/net/netisr.c:1013 > > #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available. > > ) at /usr/src/sys/dev/bce/if_bce.c:6903 > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is > > not available. 
> > ) at /usr/src/sys/kern/kern_intr.c:1262 > > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0) > > at /usr/src/sys/kern/kern_intr.c:1275 > > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0 > > , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40) > > at /usr/src/sys/kern/kern_fork.c:992 > > #18 0xffffffff80bc368e in fork_trampoline () > > at /usr/src/sys/amd64/amd64/exception.S:602 > > #19 0x0000000000000000 in ?? () > > #20 0x0000000000000000 in ?? () > > #21 0x0000000000000001 in ?? () > > #22 0x0000000000000000 in ?? () > > #23 0x0000000000000000 in ?? () > > #24 0x0000000000000000 in ?? () > > #25 0x0000000000000000 in ?? () > > #26 0x0000000000000000 in ?? () > > #27 0x0000000000000000 in ?? () > > #28 0x0000000000000000 in ?? () > > #29 0x0000000000000000 in ?? () > > #30 0x0000000000000000 in ?? () > > #31 0x0000000000000000 in ?? () > > #32 0x0000000000000000 in ?? () > > #33 0x0000000000000000 in ?? () > > #34 0x0000000000000000 in ?? () > > #35 0x0000000000000000 in ?? () > > #36 0x0000000000000000 in ?? () > > #37 0x0000000000000000 in ?? () > > #38 0x0000000000000000 in ?? () > > #39 0x0000000000000000 in ?? () > > #40 0x0000000000000000 in ?? () > > #41 0x0000000000000000 in ?? () > > #42 0x0000000000000000 in ?? () > > #43 0x0000000000000002 in ?? () > > #44 0xffffffff81241c00 in tdq_cpu () > > #45 0xfffffe0005501000 in ?? () > > #46 0x0000000000000000 in ?? () > > #47 0xffffff822fc266d0 in ?? () > > #48 0xffffff822fc26678 in ?? () > > #49 0xfffffe019ed11470 in ?? () > > #50 0xffffffff8091352e in sched_switch (td=0x0, > > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available. > > ) at /usr/src/sys/kern/sched_ule.c:1921 > > Previous frame inner to this frame (corrupt stack?) > > [...] > CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it > looks to me the issue is there. He may have some advice. I recall there had been a couple of bce(4) related crash reports( e.g. 
kern/171739) but the root cause of the issue has not been identified yet. Given that most of the crash reports point to bce(4)'s RX path, I suspect the driver modifies mbufs passed to the upper stack. I still have to revive one of my boxes that can host quad-port bce(4) controllers but couldn't find the time or a new MB.
From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 09:43:20 2013 Return-Path: Delivered-To: freebsd-stable@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 0FF71AF6; Fri, 8 Mar 2013 09:43:20 +0000 (UTC) (envelope-from borjam@sarenet.es) Received: from proxypop04.sare.net (proxypop04.sare.net [194.30.0.65]) by mx1.freebsd.org (Postfix) with ESMTP id C294A686; Fri, 8 Mar 2013 09:43:19 +0000 (UTC) Received: from [172.16.1.163] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id A0DC79DEB44; Fri, 8 Mar 2013 10:33:18 +0100 (CET) Subject: Re: ZFS "stalls" -- and maybe we should be talking about defaults?
Mime-Version: 1.0 (Apple Message framework v1085) Content-Type: text/plain; charset=us-ascii From: Borja Marcos In-Reply-To: <20130305220936.GA54718@icarus.home.lan> Date: Fri, 8 Mar 2013 10:33:17 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <9884DCE0-9FAF-4A49-B230-3596E10C456D@sarenet.es> References: <513524B2.6020600@denninger.net> <89680320E0FA4C0A99D522EA2037CE6E@multiplay.co.uk> <20130305050539.GA52821@anubis.morrow.me.uk> <20130305053249.GA38107@icarus.home.lan> <5135D275.3050500@FreeBSD.org> <20130305220936.GA54718@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1085) Cc: freebsd-stable@FreeBSD.org, Andriy Gapon X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 09:43:20 -0000

On Mar 5, 2013, at 11:09 PM, Jeremy Chadwick wrote:

>>> - Disks are GPT and are *partitioned, and ZFS refers to the partitions
>>> not the raw disk -- this matters (honest, it really does; the ZFS
>>> code handles things differently with raw disks)
>>
>> Not on FreeBSD as far as I can see.
>
> My statement comes from here (first line in particular):
>
> http://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248697.html
>
> If this is wrong/false, then this furthers my point about kernel folks
> who are in-the-know needing to chime in and help stop the
> misinformation. The rest of us are just end-users, often misinformed.

As far as I know, this is lore that surfaces periodically in the lists. It was true in Solaris (at least in the past).

But unless I'm terribly wrong, this doesn't happen in FreeBSD. ZFS sees "disks", and they can be a whole raw device or a partition/slice, even a gnop device. No difference.
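To make the two layouts under discussion concrete, here is a minimal dry-run sketch. The disk name ada1, the GPT label zdisk0 and the pool name tank are hypothetical and do not come from this thread; the script only prints the commands unless RUN=1 is set, because running them for real destroys whatever is on the named disk.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Hypothetical names, used for illustration only.
DISK=ada1
LABEL=zdisk0
POOL=tank

# Layout 1: hand ZFS the whole raw device.
pool_on_raw_disk() {
    run zpool create "$POOL" "$DISK"
}

# Layout 2: create a GPT scheme with one labelled freebsd-zfs partition,
# then hand ZFS the partition through its GPT label.
pool_on_partition() {
    run gpart create -s gpt "$DISK"
    run gpart add -t freebsd-zfs -l "$LABEL" "$DISK"
    run zpool create "$POOL" "gpt/$LABEL"
}

# Show the partition-based variant by default.
pool_on_partition
```

Either function ends in the same `zpool create`; only the provider name passed to it differs, which is the point being made above about FreeBSD.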
That's why I mentioned in freebsd-fs that we badly need an official doctrine, carefully curated, and written in holy letters ;)

Borja.
From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 13:18:17 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id CA9BEAD4 for ; Fri, 8 Mar 2013 13:18:17 +0000 (UTC) (envelope-from rjk@wintek.com) Received: from local.wintek.com (local.wintek.com [72.12.201.234]) by mx1.freebsd.org (Postfix) with ESMTP id 7918AB3 for ; Fri, 8 Mar 2013 13:18:17 +0000 (UTC) Received: from rjk.wintek.local (172.28.1.248) by local.wintek.com (172.28.1.234) with Microsoft SMTP Server (TLS) id 8.1.436.0; Fri, 8 Mar 2013 08:18:16 -0500 Message-ID: <5139E513.9000704@wintek.com> Date: Fri, 8 Mar 2013 08:18:11 -0500 From: Richard Kuhns Organization: Wintek Corporation User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130220 Thunderbird/17.0.3 MIME-Version: 1.0 To: Doug Hardie Subject: Re: Sanity Check on Mac Mini References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org> <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> In-Reply-To: <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Cc: Kevin Oberman , "freebsd-stable@freebsd.org" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: rjk@wintek.com List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 13:18:17 -0000

On 03/07/13 17:18, Doug Hardie wrote:
>
> On 7 March 2013, at 11:57, Kevin Oberman wrote:
>
>> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie wrote:
>>
>> On 7 March 2013, at 06:42, Richard Kuhns wrote:
>>
>>> On 03/07/13 01:59, Doug Hardie wrote:
>>>> I have
a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>>>
>>>> 1. Downloaded from current today via svnweb.freebsd.org:
>>>> sys/dev/bge/if_bgereg.h
>>>> sys/dev/bge/if_bge.c
>>>> sys/dev/mii/brgphy.c
>>>>
>>>> I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>>>>
>>>> 2. Put those on a flash drive.
>>>>
>>>> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>>>>
>>>> 4. Copy the files from 1 above from flash over the files on the disk.
>>>>
>>>> 5. Rebuild the kernel and install it.
>>>>
>>>> Thanks,
>>>>
>>>> -- Doug
>>>
>>> That's worked for me 3 times now.
>>
>> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable. I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?
>>
>> I would generally recommend using the amd64 release, but it may not get your system to boot.
>>
>> How is your disk partitioned? GPT? Some BIOSes are broken and assume that a GPT formatted disk is UEFI and will not recognize them if they lack the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.
>
> No idea what the default partitioning is for BSDInstall. However the Mini is only EFI or UEFI with some fallbacks although the comments I find on the web indicate that different models have different fallbacks.
>
> One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.
>
>>
>> You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.
>>
>> I have such a system (ThinkPad T520) and I have two disks... one that came with the system and containing Windows, and my GPT formatted FreeBSD disk. I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me
>
> Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure.
>
> I just found some "instructions" for using MBR with bsdinstall, but given there is an effort to create a UEFI boot which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach?
>
> -- Doug
>

FWIW, here are the brief notes I made for what has been working for me for the last year or so; most recently with a new Mini purchased about 2 weeks ago. I'm using the entire drive for FreeBSD.

Hit Option key while booting, then select 'Windows' USB image.

Now trying GPT; looks fine, but will only boot with USB stick in place. If it's not there, just get a folder with a '?' when starting up.

Using MBR; boots ok without USB stick. It just takes about 30 seconds before it actually boots.

Select YES when asked about GMT.
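For steps 4 and 5 of the procedure quoted earlier in this message (copy the three updated driver files over the installed source tree, then rebuild and install the kernel), a minimal dry-run sketch might look like the following. It assumes the flash drive is already mounted at /mnt with the three files at its top level, that the source tree is in /usr/src, and that the stock GENERIC kernel configuration is wanted; none of those details are stated in the thread. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Assumed locations and kernel config (not given in the thread).
FLASH=/mnt
SRC=/usr/src
KERNCONF=GENERIC

# Step 4: copy the updated bge(4) and brgphy files over the installed sources.
copy_driver_files() {
    run cp "$FLASH/if_bgereg.h" "$SRC/sys/dev/bge/if_bgereg.h"
    run cp "$FLASH/if_bge.c" "$SRC/sys/dev/bge/if_bge.c"
    run cp "$FLASH/brgphy.c" "$SRC/sys/dev/mii/brgphy.c"
}

# Step 5: rebuild the kernel from the patched tree and install it.
rebuild_kernel() {
    run make -C "$SRC" buildkernel "KERNCONF=$KERNCONF"
    run make -C "$SRC" installkernel "KERNCONF=$KERNCONF"
}

copy_driver_files
rebuild_kernel
```

Run as-is it prints the five commands for review; a reboot is still needed afterwards to pick up the new kernel.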
-- 
Richard Kuhns              My Desk: 765-269-8541
Wintek Corporation         Internet Support: 765-269-8503
427 N 6th Street           Consulting: 765-269-8504
Lafayette, IN 47901-2211   Accounting: 765-269-8502
From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 16:16:16 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6EA4AE72; Fri, 8 Mar 2013 16:16:16 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id F3021F6C; Fri, 8 Mar 2013 16:16:15 +0000 (UTC) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.6/8.14.6/ALCHEMY.FRANKEN.DE) with ESMTP id r28GGEcF095280; Fri, 8 Mar 2013 17:16:14 +0100 (CET) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.14.6/8.14.6/Submit) id r28GGEl4095279; Fri, 8 Mar 2013 17:16:14 +0100 (CET) (envelope-from marius) Date: Fri, 8 Mar 2013 17:16:13 +0100 From: Marius Strobl To: YongHyeon PYUN Subject: Re: Strange reboot since 9.1 Message-ID: <20130308161613.GA82746@alchemy.franken.de> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> <1362670734.16808.48.camel@iMac-LBlot.domain.iogs> <20130307163827.GA96983@icarus.home.lan> <20130308023254.GC3246@michelle.cdnetworks.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130308023254.GC3246@michelle.cdnetworks.com> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Jeremy Chadwick , Loïc Blot , freebsd-stable@freebsd.org, yongari@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 16:16:16 -0000

On Fri, Mar 08, 2013 at 11:32:54AM +0900, YongHyeon PYUN wrote:
> On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote:
> > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> > > Hi Marcelo, thanks. Here is a better trace:
> > >
> > > ---------------------------------
> > >
> > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
> > > GNU gdb 6.1.1 [FreeBSD]
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you
> > > are
> > > welcome to change it and/or distribute copies of it under certain
> > > conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB. Type "show warranty" for
> > > details.
> > > This GDB was configured as "amd64-marcel-freebsd"...
> > >
> > > Unread portion of the kernel message buffer:
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 0; apic id = 00
> > > fault virtual address = 0x0
> > > fault code = supervisor read data, page not present
> > > instruction pointer = 0x20:0xffffffff80a84414
> > > stack pointer = 0x28:0xffffff822fc267a0
> > > frame pointer = 0x28:0xffffff822fc26830
> > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags = interrupt enabled, resume, IOPL = 0
> > > current process = 12 (irq265: bce0)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > > #1 0xffffffff808ea8be at panic+0x1ce
> > > #2 0xffffffff80bd8240 at trap_fatal+0x290
> > > #3 0xffffffff80bd857d at trap_pfault+0x1ed
> > > #4 0xffffffff80bd8b9e at trap+0x3ce
> > > #5 0xffffffff80bc315f at calltrap+0x8
> > > #6 0xffffffff80a861d5 at udp_input+0x475
> > > #7 0xffffffff80a043dc at ip_input+0xac
> > > #8
0xffffffff809adafb at netisr_dispatch_src+0x20b > > > #9 0xffffffff809a35cd at ether_demux+0x14d > > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > > > #12 0xffffffff80438fd7 at bce_intr+0x487 > > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > > > #14 0xffffffff808c0076 at ithread_loop+0xa6 > > > #15 0xffffffff808bb9ef at fork_exit+0x11f > > > #16 0xffffffff80bc368e at fork_trampoline+0xe > > > Uptime: 27m20s > > > Dumping 1265 out of 8162 > > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92% > > > > > > #0 doadump (textdump=Variable "textdump" is not available. > > > ) at pcpu.h:224 > > > 224 pcpu.h: No such file or directory. > > > in pcpu.h > > > (kgdb) bt f > > > #0 doadump (textdump=Variable "textdump" is not available. > > > ) at pcpu.h:224 > > > No locals. > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > > at /usr/src/sys/kern/kern_shutdown.c:448 > > > _ep = Variable "_ep" is not available. > > > (kgdb) bt > > > #0 doadump (textdump=Variable "textdump" is not available. > > > ) at pcpu.h:224 > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=260) > > > at /usr/src/sys/kern/kern_shutdown.c:448 > > > #2 0xffffffff808ea897 in panic (fmt=0x1
) > > > at /usr/src/sys/kern/kern_shutdown.c:636 > > > #3 0xffffffff80bd8240 in trap_fatal (frame=0xc, eva=Variable "eva" is > > > not available. > > > ) at /usr/src/sys/amd64/amd64/trap.c:857 > > > #4 0xffffffff80bd857d in trap_pfault (frame=0xffffff822fc266f0, > > > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773 > > > #5 0xffffffff80bd8b9e in trap (frame=0xffffff822fc266f0) > > > at /usr/src/sys/amd64/amd64/trap.c:456 > > > #6 0xffffffff80bc315f in calltrap () > > > at /usr/src/sys/amd64/amd64/exception.S:228 > > > #7 0xffffffff80a84414 in udp_append (inp=0xfffffe019e2a1000, > > > ip=0xfffffe00444b6c80, n=0xfffffe00444b6c00, off=20, > > > udp_in=0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:252 > > > #8 0xffffffff80a861d5 in udp_input (m=0xfffffe00444b6c00, off=Variable > > > "off" is not available. > > > ) at /usr/src/sys/netinet/udp_usrreq.c:618 > > > #9 0xffffffff80a043dc in ip_input (m=0xfffffe00444b6c00) > > > at /usr/src/sys/netinet/ip_input.c:760 > > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=1, source=Variable > > > "source" is not available. > > > ) at /usr/src/sys/net/netisr.c:1013 > > > #11 0xffffffff809a35cd in ether_demux (ifp=0xfffffe00053fa000, > > > m=0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940 > > > #12 0xffffffff809a38a4 in ether_nh_input (m=Variable "m" is not > > > available. > > > ) at /usr/src/sys/net/if_ethersubr.c:759 > > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=9, source=Variable > > > "source" is not available. > > > ) at /usr/src/sys/net/netisr.c:1013 > > > #14 0xffffffff80438fd7 in bce_intr (xsc=Variable "xsc" is not available. > > > ) at /usr/src/sys/dev/bce/if_bce.c:6903 > > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=Variable "p" is > > > not available. 
> > > ) at /usr/src/sys/kern/kern_intr.c:1262 > > > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0) > > > at /usr/src/sys/kern/kern_intr.c:1275 > > > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0 > > > , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40) > > > at /usr/src/sys/kern/kern_fork.c:992 > > > #18 0xffffffff80bc368e in fork_trampoline () > > > at /usr/src/sys/amd64/amd64/exception.S:602 > > > #19 0x0000000000000000 in ?? () > > > #20 0x0000000000000000 in ?? () > > > #21 0x0000000000000001 in ?? () > > > #22 0x0000000000000000 in ?? () > > > #23 0x0000000000000000 in ?? () > > > #24 0x0000000000000000 in ?? () > > > #25 0x0000000000000000 in ?? () > > > #26 0x0000000000000000 in ?? () > > > #27 0x0000000000000000 in ?? () > > > #28 0x0000000000000000 in ?? () > > > #29 0x0000000000000000 in ?? () > > > #30 0x0000000000000000 in ?? () > > > #31 0x0000000000000000 in ?? () > > > #32 0x0000000000000000 in ?? () > > > #33 0x0000000000000000 in ?? () > > > #34 0x0000000000000000 in ?? () > > > #35 0x0000000000000000 in ?? () > > > #36 0x0000000000000000 in ?? () > > > #37 0x0000000000000000 in ?? () > > > #38 0x0000000000000000 in ?? () > > > #39 0x0000000000000000 in ?? () > > > #40 0x0000000000000000 in ?? () > > > #41 0x0000000000000000 in ?? () > > > #42 0x0000000000000000 in ?? () > > > #43 0x0000000000000002 in ?? () > > > #44 0xffffffff81241c00 in tdq_cpu () > > > #45 0xfffffe0005501000 in ?? () > > > #46 0x0000000000000000 in ?? () > > > #47 0xffffff822fc266d0 in ?? () > > > #48 0xffffff822fc26678 in ?? () > > > #49 0xfffffe019ed11470 in ?? () > > > #50 0xffffffff8091352e in sched_switch (td=0x0, > > > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available. > > > ) at /usr/src/sys/kern/sched_ule.c:1921 > > > Previous frame inner to this frame (corrupt stack?) > > > > > [...] > > > CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it > > looks to me the issue is there. He may have some advice. 
>
> I recall there had been a couple of bce(4) related crash reports
> (e.g. kern/171739) but the root cause of the issue has not been
> identified yet. Given that most of the crash reports point to bce(4)'s
> RX path, I suspect the driver modifies mbufs passed to the upper stack.
> I still have to revive one of my boxes that can host quad-port bce(4)
> controllers but couldn't find the time or a new MB.

I see a possible path leading to exactly that but it's a bit of a
shot in the dark as I don't know how a) the hardware and b) the x86
bus_dmamap_load_buffer(9) behave in detail.
Loic, could you please give the following patch a try (it's against
the 9.1-RELEASE version of if_bce.c but probably also works with
stable/9)?
http://people.freebsd.org/~marius/bce_cleanup2.diff9.1

Marius
From owner-freebsd-stable@FreeBSD.ORG Fri Mar 8 23:43:36 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B1D06FA1 for ; Fri, 8 Mar 2013 23:43:36 +0000 (UTC) (envelope-from bc979@lafn.org) Received: from zoom.lafn.org (zoom.lafn.org [108.92.93.123]) by mx1.freebsd.org (Postfix) with ESMTP id 733CAFBD for ; Fri, 8 Mar 2013 23:43:36 +0000 (UTC) Received: from [10.0.1.2] (static-71-177-216-148.lsanca.fios.verizon.net [71.177.216.148]) (authenticated bits=0) by zoom.lafn.org (8.14.3/8.14.2) with ESMTP id r28NhTLF054825 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Fri, 8 Mar 2013 15:43:30 -0800 (PST) (envelope-from bc979@lafn.org) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\)) Subject: Re: Sanity Check on Mac Mini From: Doug Hardie In-Reply-To: Date: Fri, 8 Mar 2013 15:43:28 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <428C87E0-7CF4-4664-9EF2-8CD582927AAB@lafn.org> References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org>
<71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> To: John Mehr X-Mailer: Apple Mail (2.1499) X-Virus-Scanned: clamav-milter 0.97 at zoom.lafn.org X-Virus-Status: Clean Cc: freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Mar 2013 23:43:36 -0000

On 7 March 2013, at 17:00, John Mehr wrote:

>
>
>
> On Thu, 7 Mar 2013 14:18:23 -0800
> Doug Hardie wrote:
>> On 7 March 2013, at 11:57, Kevin Oberman wrote:
>>> On Thu, Mar 7, 2013 at 11:10 AM, Doug Hardie wrote:
>>> On 7 March 2013, at 06:42, Richard Kuhns wrote:
>>> > On 03/07/13 01:59, Doug Hardie wrote:
>>> >> I have a new Mac Mini and have encountered the same problem reported last year by Richard Kuhns. YongHyeon PYUN provided some patches to the kernel that resolved the problem. However, without an internet connection it's a bit tricky to get them into the system. Here is the approach I believe will work, but wanted to check first before I really mess things up.
>>> >>
>>> >> 1. Downloaded from current today via svnweb.freebsd.org:
>>> >> sys/dev/bge/if_bgereg.h
>>> >> sys/dev/bge/if_bge.c
>>> >> sys/dev/mii/brgphy.c
>>> >>
>>> >> I believe the patches are incorporated in today's versions. The comments indicate such. Thus I don't need to apply the original supplied patch.
>>> >>
>>> >> 2. Put those on a flash drive.
>>> >>
>>> >> 3. Install 9.1 release from flash drive onto the Mini disk. Have to include the system source.
>>> >>
>>> >> 4. Copy the files from 1 above from flash over the files on the disk.
>>> >>
>>> >> 5. Rebuild the kernel and install it.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> -- Doug
>>> >
>>> > That's worked for me 3 times now.
>>> Thanks. Well, I got 9.1 Release installed, but it won't boot from the internal disk. It doesn't see the disk as bootable.
I installed using the entire disk for FreeBSD. I used the i386 release. Perhaps I need to switch to the amd64 release?
>>> I would generally recommend using the amd64 release, but it may not get your system to boot. How is your disk partitioned? GPT? Some BIOSes are broken and assume that a GPT formatted disk is UEFI and will not recognize them if they lack the UEFI boot partition. UEFI boot is a current project that seems likely to reach head in the fairly near future, but it's not possible now.
>> No idea what the default partitioning is for BSDInstall. However the Mini is only EFI or UEFI with some fallbacks although the comments I find on the web indicate that different models have different fallbacks.
>> One comment indicates that an older unit will boot if it uses MBR partitioning. I don't know if the new installer supports that or not.
>>> You may be able to tweak your BIOS to get it to work or you may have to install using the traditional partitioning system. The installer defaults to GPT, but can create either.
>>> I have such a system (ThinkPad T520) and I have two disks... one that came with the system and containing Windows, and my GPT formatted FreeBSD disk. I wrote a FreeBSD BootEasy boot into the MBR of the Windows disk and it CAN boot the GPT disk just fine. Not ideal for most, but it works well for me
>> Based on a comment I saw, waiting till the empty folder icon appears and then plugging in the install memstick causes the mini to boot from disk. That's just downright weird, but it works. I could live with that, but this is an unattended server and would experience some down time if I am not there when there is a power failure.
>> I just found some "instructions" for using MBR with bsdinstall, but given there is an effort to create a UEFI boot which I suspect would expect to find the GPT boot partition, perhaps I should just go with the memstick approach?
>
> Hello,
>
> If you still have a drive with OS X on it, you may have some luck with OS X's bless command:
>
> https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man8/bless.8.html
>
> I got a late 2012 mac mini to boot FreeBSD 9.1 (AMD64) from a hard drive using 'bless' (unfortunately I don't remember the exact command line parameters I used). If you're looking to dual boot, the only luck I had (without resorting to using third party software like rEFIt) was to put the OS's on different drives and install FreeBSD using MBR on the second drive.

I have investigated the bless command and nothing I find on google gives me any good idea on what folder/file to bless. I am wondering if just using the volume command and ignoring folder and file would work?
From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 00:54:57 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B769FF9D for ; Sat, 9 Mar 2013 00:54:57 +0000 (UTC) (envelope-from jcm@visi.com) Received: from g2host.com (mailback3.g2host.com [208.42.184.243]) by mx1.freebsd.org (Postfix) with ESMTP id 6A45B244 for ; Sat, 9 Mar 2013 00:54:56 +0000 (UTC) Received: from [208.42.90.57] (account jcm@visi.com) by mailback3.g2host.com (CommuniGate Pro WEBUSER 5.3.11) with HTTP id 11092802 for freebsd-stable@freebsd.org; Fri, 08 Mar 2013 18:54:50 -0600 From: "John Mehr" Subject: Re: Sanity Check on Mac Mini To: X-Mailer: CommuniGate Pro WebUser v5.3.11 Date: Fri, 08 Mar 2013 18:54:50 -0600 Message-ID: In-Reply-To: <428C87E0-7CF4-4664-9EF2-8CD582927AAB@lafn.org> References: <51CB1227-3A5F-4688-B48D-4D0E47A17572@lafn.org> <5138A742.3090200@wintek.com> <97F9BA96-A328-4EF9-8E39-A8160AF9EB7A@lafn.org> <71F173FA-CB9C-43B4-A702-ABA82268EA83@lafn.org> <428C87E0-7CF4-4664-9EF2-8CD582927AAB@lafn.org> MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1;
format="flowed" Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2013 00:54:57 -0000

> I have investigated the bless command and nothing I find
> on google gives me any good idea on what folder/file to
> bless. I am wondering if just using the volume command
> and ignoring folder and file would work?

Hello,

If memory serves, I used it in device mode and used the --setBoot option to select the bootable FreeBSD partition. I was trying to find a dual boot solution at the time and I remember giving up on the bless command when it booted me straight into FreeBSD. I wish I could remember more...
From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 08:50:00 2013 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id B5562B7A for ; Sat, 9 Mar 2013 08:50:00 +0000 (UTC) (envelope-from loic.blot@unix-experience.fr) Received: from smtp.smtpout.orange.fr (smtp10.smtpout.orange.fr [80.12.242.132]) by mx1.freebsd.org (Postfix) with ESMTP id BE8B8601 for ; Sat, 9 Mar 2013 08:49:58 +0000 (UTC) Received: from [10.42.69.152] ([82.120.74.222]) by mwinf5d33 with ME id 9Lpq1l0064nlqgJ03LpqtF; Sat, 09 Mar 2013 09:49:51 +0100 Message-ID: <1362819234.30912.2.camel@Nerz-PC.home> Subject: Re: Strange reboot since 9.1 From: Loïc BLOT To: freebsd-stable@freebsd.org Date: Sat, 09 Mar 2013 09:53:54 +0100 In-Reply-To: <20130308161613.GA82746@alchemy.franken.de> References: <1362560123.16808.4.camel@iMac-LBlot.domain.iogs> <1362652057.16808.23.camel@iMac-LBlot.domain.iogs> <51388E42.5040500@FreeBSD.org> <1362661965.16808.36.camel@iMac-LBlot.domain.iogs> <51389ED5.6030207@bsdinfo.com.br> <1362670734.16808.48.camel@iMac-LBlot.domain.iogs>
<20130307163827.GA96983@icarus.home.lan> <20130308023254.GC3246@michelle.cdnetworks.com> <20130308161613.GA82746@alchemy.franken.de> Organization: UNIX Experience Fr Content-Type: multipart/signed; micalg="pgp-sha256"; protocol="application/pgp-signature"; boundary="=-bdBt2vLu8caQi9AXx/Zb" X-Mailer: Evolution 3.6.3 Mime-Version: 1.0 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: loic.blot@unix-experience.fr List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2013 08:50:00 -0000
--=-bdBt2vLu8caQi9AXx/Zb Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Hi Marius,

Thanks for your patch, but it has no effect on stability. The server rebooted last night after 8h of uptime, and the same backtrace appears.

-- 
Best regards,
Loïc BLOT,
UNIX systems, security and network expert
http://www.unix-experience.fr

On Friday, 8 March 2013 at 17:16 +0100, Marius Strobl wrote:
> On Fri, Mar 08, 2013 at 11:32:54AM +0900, YongHyeon PYUN wrote:
> > On Thu, Mar 07, 2013 at 08:38:27AM -0800, Jeremy Chadwick wrote:
> > > On Thu, Mar 07, 2013 at 04:38:54PM +0100, Loïc Blot wrote:
> > > > Hi Marcelo, thanks. Here is a better trace:
> > > >
> > > > ---------------------------------
> > > >
> > > > kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.11
> > > > GNU gdb 6.1.1 [FreeBSD]
> > > > Copyright 2004 Free Software Foundation, Inc.
> > > > GDB is free software, covered by the GNU General Public License, and you
> > > > are
> > > > welcome to change it and/or distribute copies of it under certain
> > > > conditions.
> > > > Type "show copying" to see the conditions.
> > > > There is absolutely no warranty for GDB. Type "show warranty" for
> > > > details.
> > > > This GDB was configured as "amd64-marcel-freebsd"...
> > > >=20 > > > > Unread portion of the kernel message buffer: > > > >=20 > > > >=20 > > > > Fatal trap 12: page fault while in kernel mode > > > > cpuid =3D 0; apic id =3D 00 > > > > fault virtual address =3D 0x0 > > > > fault code =3D supervisor read data, page not present > > > > instruction pointer =3D 0x20:0xffffffff80a84414 > > > > stack pointer =3D 0x28:0xffffff822fc267a0 > > > > frame pointer =3D 0x28:0xffffff822fc26830 > > > > code segment =3D base 0x0, limit 0xfffff, type 0x1b > > > > =3D DPL 0, pres 1, long 1, def32 0, gran 1 > > > > processor eflags =3D interrupt enabled, resume, IOPL =3D 0 > > > > current process =3D 12 (irq265: bce0) > > > > trap number =3D 12 > > > > panic: page fault > > > > cpuid =3D 0 > > > > KDB: stack backtrace: > > > > #0 0xffffffff809208a6 at kdb_backtrace+0x66 > > > > #1 0xffffffff808ea8be at panic+0x1ce > > > > #2 0xffffffff80bd8240 at trap_fatal+0x290 > > > > #3 0xffffffff80bd857d at trap_pfault+0x1ed > > > > #4 0xffffffff80bd8b9e at trap+0x3ce > > > > #5 0xffffffff80bc315f at calltrap+0x8 > > > > #6 0xffffffff80a861d5 at udp_input+0x475 > > > > #7 0xffffffff80a043dc at ip_input+0xac > > > > #8 0xffffffff809adafb at netisr_dispatch_src+0x20b > > > > #9 0xffffffff809a35cd at ether_demux+0x14d > > > > #10 0xffffffff809a38a4 at ether_nh_input+0x1f4 > > > > #11 0xffffffff809adafb at netisr_dispatch_src+0x20b > > > > #12 0xffffffff80438fd7 at bce_intr+0x487 > > > > #13 0xffffffff808be8d4 at intr_event_execute_handlers+0x104 > > > > #14 0xffffffff808c0076 at ithread_loop+0xa6 > > > > #15 0xffffffff808bb9ef at fork_exit+0x11f > > > > #16 0xffffffff80bc368e at fork_trampoline+0xe > > > > Uptime: 27m20s > > > > Dumping 1265 out of 8162 > > > > MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..92% > > > >=20 > > > > #0 doadump (textdump=3DVariable "textdump" is not available. > > > > ) at pcpu.h:224 > > > > 224 pcpu.h: No such file or directory. 
> > > > in pcpu.h > > > > (kgdb) bt f > > > > #0 doadump (textdump=3DVariable "textdump" is not available. > > > > ) at pcpu.h:224 > > > > No locals. > > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=3D260) > > > > at /usr/src/sys/kern/kern_shutdown.c:448 > > > > _ep =3D Variable "_ep" is not available. > > > > (kgdb) bt > > > > #0 doadump (textdump=3DVariable "textdump" is not available. > > > > ) at pcpu.h:224 > > > > #1 0xffffffff808ea3a1 in kern_reboot (howto=3D260) > > > > at /usr/src/sys/kern/kern_shutdown.c:448 > > > > #2 0xffffffff808ea897 in panic (fmt=3D0x1
) > > > > at /usr/src/sys/kern/kern_shutdown.c:636 > > > > #3 0xffffffff80bd8240 in trap_fatal (frame=3D0xc, eva=3DVariable "= eva" is > > > > not available. > > > > ) at /usr/src/sys/amd64/amd64/trap.c:857 > > > > #4 0xffffffff80bd857d in trap_pfault (frame=3D0xffffff822fc266f0, > > > > usermode=3D0) at /usr/src/sys/amd64/amd64/trap.c:773 > > > > #5 0xffffffff80bd8b9e in trap (frame=3D0xffffff822fc266f0) > > > > at /usr/src/sys/amd64/amd64/trap.c:456 > > > > #6 0xffffffff80bc315f in calltrap () > > > > at /usr/src/sys/amd64/amd64/exception.S:228 > > > > #7 0xffffffff80a84414 in udp_append (inp=3D0xfffffe019e2a1000, > > > > ip=3D0xfffffe00444b6c80, n=3D0xfffffe00444b6c00, off=3D20, > > > > udp_in=3D0xffffff822fc268a0) at /usr/src/sys/netinet/udp_usrreq.c:2= 52 > > > > #8 0xffffffff80a861d5 in udp_input (m=3D0xfffffe00444b6c00, off=3D= Variable > > > > "off" is not available. > > > > ) at /usr/src/sys/netinet/udp_usrreq.c:618 > > > > #9 0xffffffff80a043dc in ip_input (m=3D0xfffffe00444b6c00) > > > > at /usr/src/sys/netinet/ip_input.c:760 > > > > #10 0xffffffff809adafb in netisr_dispatch_src (proto=3D1, source=3D= Variable > > > > "source" is not available. > > > > ) at /usr/src/sys/net/netisr.c:1013 > > > > #11 0xffffffff809a35cd in ether_demux (ifp=3D0xfffffe00053fa000, > > > > m=3D0xfffffe00444b6c00) at /usr/src/sys/net/if_ethersubr.c:940 > > > > #12 0xffffffff809a38a4 in ether_nh_input (m=3DVariable "m" is not > > > > available. > > > > ) at /usr/src/sys/net/if_ethersubr.c:759 > > > > #13 0xffffffff809adafb in netisr_dispatch_src (proto=3D9, source=3D= Variable > > > > "source" is not available. > > > > ) at /usr/src/sys/net/netisr.c:1013 > > > > #14 0xffffffff80438fd7 in bce_intr (xsc=3DVariable "xsc" is not ava= ilable. > > > > ) at /usr/src/sys/dev/bce/if_bce.c:6903 > > > > #15 0xffffffff808be8d4 in intr_event_execute_handlers (p=3DVariable= "p" is > > > > not available. 
> > > > ) at /usr/src/sys/kern/kern_intr.c:1262
> > > > #16 0xffffffff808c0076 in ithread_loop (arg=0xfffffe00057424e0)
> > > > at /usr/src/sys/kern/kern_intr.c:1275
> > > > #17 0xffffffff808bb9ef in fork_exit (callout=0xffffffff808bffd0
> > > > , arg=0xfffffe00057424e0, frame=0xffffff822fc26c40)
> > > > at /usr/src/sys/kern/kern_fork.c:992
> > > > #18 0xffffffff80bc368e in fork_trampoline ()
> > > > at /usr/src/sys/amd64/amd64/exception.S:602
> > > > #19 0x0000000000000000 in ?? ()
> > > > #20 0x0000000000000000 in ?? ()
> > > > #21 0x0000000000000001 in ?? ()
> > > > #22 0x0000000000000000 in ?? ()
> > > > #23 0x0000000000000000 in ?? ()
> > > > #24 0x0000000000000000 in ?? ()
> > > > #25 0x0000000000000000 in ?? ()
> > > > #26 0x0000000000000000 in ?? ()
> > > > #27 0x0000000000000000 in ?? ()
> > > > #28 0x0000000000000000 in ?? ()
> > > > #29 0x0000000000000000 in ?? ()
> > > > #30 0x0000000000000000 in ?? ()
> > > > #31 0x0000000000000000 in ?? ()
> > > > #32 0x0000000000000000 in ?? ()
> > > > #33 0x0000000000000000 in ?? ()
> > > > #34 0x0000000000000000 in ?? ()
> > > > #35 0x0000000000000000 in ?? ()
> > > > #36 0x0000000000000000 in ?? ()
> > > > #37 0x0000000000000000 in ?? ()
> > > > #38 0x0000000000000000 in ?? ()
> > > > #39 0x0000000000000000 in ?? ()
> > > > #40 0x0000000000000000 in ?? ()
> > > > #41 0x0000000000000000 in ?? ()
> > > > #42 0x0000000000000000 in ?? ()
> > > > #43 0x0000000000000002 in ?? ()
> > > > #44 0xffffffff81241c00 in tdq_cpu ()
> > > > #45 0xfffffe0005501000 in ?? ()
> > > > #46 0x0000000000000000 in ?? ()
> > > > #47 0xffffff822fc266d0 in ?? ()
> > > > #48 0xffffff822fc26678 in ?? ()
> > > > #49 0xfffffe019ed11470 in ?? ()
> > > > #50 0xffffffff8091352e in sched_switch (td=0x0,
> > > > newtd=0xfffffe00057424e0, flags=Variable "flags" is not available.
> > > > ) at /usr/src/sys/kern/sched_ule.c:1921
> > > > Previous frame inner to this frame (corrupt stack?)
> > > >
> >
> > [...]
> >
> > > CC'ing Yong-Hyeon (yongari@) who helps maintain the bce(4) driver; it
> > > looks to me the issue is there. He may have some advice.
> >
> > I recall there had been a couple of bce(4) related crash reports
> > (e.g. kern/171739) but the root cause of the issue was not
> > identified yet. Given that most of the crash reports indicate bce(4)'s
> > RX path, I suspect the driver modifies mbufs passed to the upper stack.
> > I still have to revive one of my boxes that can host quad-port bce(4)
> > controllers but couldn't find time and a new MB.
>
> I see a possible path leading to exactly that but it's a bit of a
> shot in the dark as I don't know how a) the hardware and b) the x86
> bus_dmamap_load_buffer(9) behave in detail.
> Loic, could you please give the following patch a try (it's against
> the 9.1-RELEASE version of if_bce.c but probably also works with
> stable/9)?
> http://people.freebsd.org/~marius/bce_cleanup2.diff9.1
>
> Marius
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--=-bdBt2vLu8caQi9AXx/Zb
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iF4EABEIAAYFAlE6+LAACgkQh290DZyz8uYzVgD/SsSRpnT6oLI5MyuriKFcl0eh
YnAl3Xsym2V8bxqv7NMA+gO1OgacND1UQtHaLuQzu3j3fMlzE2TjiiNBkch9n9mi
=9+S7
-----END PGP SIGNATURE-----

--=-bdBt2vLu8caQi9AXx/Zb--

From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 09:36:06 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A2D2D90A for ; Sat, 9 Mar 2013 09:36:06 +0000 (UTC) (envelope-from ohartman@zedat.fu-berlin.de)
Received: from
 outpost1.zedat.fu-berlin.de (outpost1.zedat.fu-berlin.de [130.133.4.66]) by mx1.freebsd.org (Postfix) with ESMTP id 4745C77F for ; Sat, 9 Mar 2013 09:36:05 +0000 (UTC)
Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.80.1) for freebsd-stable@freebsd.org with esmtp (envelope-from ) id <1UEGCD-000luI-2f>; Sat, 09 Mar 2013 10:36:05 +0100
Received: from e178025158.adsl.alicedsl.de ([85.178.25.158] helo=munin.geoinf.fu-berlin.de) by inpost2.zedat.fu-berlin.de (Exim 4.80.1) for freebsd-stable@freebsd.org with esmtpsa (envelope-from ) id <1UEGCD-0035Oa-0G>; Sat, 09 Mar 2013 10:36:05 +0100
Message-ID: <513B02C5.9090406@zedat.fu-berlin.de>
Date: Sat, 09 Mar 2013 10:37:09 +0100
From: "Hartmann, O."
Organization: FU Berlin
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130309 Thunderbird/17.0.4
MIME-Version: 1.0
To: FreeBSD Stable
Subject: lang/ruby19: ruby-1.9.3.392,1 is vulnerable: ** [check-vulnerable] Error code 1
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
X-Originating-IP: 85.178.25.158
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Mar 2013 09:36:06 -0000

I try to compile port lang/ruby19 and I always get on a FreeBSD
9.1-STABLE box the following error message, which is obviously triggered
by some port auditing - but I do not find the "knob" to switch it off.

Can someone give a hint, please?

Regards,
Oliver

===> Cleaning for ruby-1.9.3.392,1
===> ruby-1.9.3.392,1 has known vulnerabilities:
ruby-1.9.3.392,1 is vulnerable:
Ruby -- XSS exploit of RDoc documentation generated by rdoc
WWW: http://portaudit.FreeBSD.org/d3e96508-056b-4259-88ad-50dc8d1978a6.html

ruby-1.9.3.392,1 is vulnerable:
Ruby -- Denial of Service and Unsafe Object Creation Vulnerability in JSON
WWW: http://portaudit.FreeBSD.org/c79eb109-a754-45d7-b552-a42099eb2265.html

=> Please update your ports tree and try again.
*** [check-vulnerable] Error code 1

Stop in /usr/ports/lang/ruby19.
*** [build] Error code 1

Stop in /usr/ports/lang/ruby19.

From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 09:38:57 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 329F1AB8; Sat, 9 Mar 2013 09:38:57 +0000 (UTC) (envelope-from utisoft@gmail.com)
Received: from mail-ee0-f53.google.com (mail-ee0-f53.google.com [74.125.83.53]) by mx1.freebsd.org (Postfix) with ESMTP id 778947AC; Sat, 9 Mar 2013 09:38:56 +0000 (UTC)
Received: by mail-ee0-f53.google.com with SMTP id e53so1461570eek.40 for ; Sat, 09 Mar 2013 01:38:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=x-received:mime-version:sender:in-reply-to:references:from:date
 :x-google-sender-auth:message-id:subject:to:cc:content-type;
 bh=thdvJ+0cRASO+gtDUhcqxKAwjcLYNH5UeRteiCOxRj0=;
 b=jYSHVA/eGJXy5Yk2G7U9hwH5nvS8affwD/yzbbs8E8L4tj2CBq3JVfG/jdyqBnvfNz
 bxD1h/lGZEFkkHw7XEayISXE93FL95AkxiZKxtwdbCXQ3eaF4A2000LTb8lCNOFGiJxB
 CuyNMDsfJBTNC/ZMDuvysToFEoss5lU6NPN+PYAm6fMhsaBKs18sGb/SDOeKZNV1yBri
 dW9pKwbrIF4k5nv+WI3mZsSOtWmOBWDRrs/s0dlQbdKucIdBxvoApGbL6Er1phuPo7B4
 fct/bHe2a+VT4KZ2J0E7urY2dEMlqdlypuCVtuYZ8XAYvoAFmlNf2SSdFyGjTm33hNZY
 ePmg==
X-Received: by 10.14.0.135 with SMTP id 7mr14482488eeb.5.1362821929275; Sat, 09 Mar 2013 01:38:49 -0800 (PST)
MIME-Version: 1.0
Sender: utisoft@gmail.com
Received:
 by 10.14.124.7 with HTTP; Sat, 9 Mar 2013 01:38:19 -0800 (PST)
In-Reply-To: <513B02C5.9090406@zedat.fu-berlin.de>
References: <513B02C5.9090406@zedat.fu-berlin.de>
From: Chris Rees
Date: Sat, 9 Mar 2013 09:38:19 +0000
X-Google-Sender-Auth: qQMTIt34kg8Mhn1hUNpxCgBI9pk
Message-ID:
Subject: Re: lang/ruby19: ruby-1.9.3.392,1 is vulnerable: ** [check-vulnerable] Error code 1
To: "Hartmann, O."
Content-Type: text/plain; charset=ISO-8859-1
Cc: "ports@freebsd.org" , FreeBSD Stable
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Mar 2013 09:38:57 -0000

On 9 March 2013 09:37, Hartmann, O. wrote:
> I try to compile port lang/ruby19 and I always get on a FreeBSD
> 9.1-STABLE box the following error message, which is obviously triggered
> by some port auditing - but I do not find the "knob" to switch it off.
>
> Can someone give a hint, please?

I guess you sent it to -stable by mistake-- the knob you need is
DISABLE_VULNERABILITIES=yes.
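
A minimal sketch of one way to apply that knob to just this port, assuming the ports tree lives under /usr/ports and that local port settings are kept in /etc/make.conf (the file placement and the path match are illustrative assumptions, not something stated in this thread):

```make
# /etc/make.conf
# Assumption: restrict the override to lang/ruby19 so the
# vulnerability check stays active for every other port.
.if ${.CURDIR:M*/lang/ruby19}
DISABLE_VULNERABILITIES=yes
.endif
```

For a one-off build the same variable can instead be given on the command line, for example "make DISABLE_VULNERABILITIES=yes build" run from the port directory.
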
I'm sure I don't need to lecture you on "Be careful with this" :)

Chris

From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 14:32:26 2013
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 9ADB6CF9 for ; Sat, 9 Mar 2013 14:32:26 +0000 (UTC) (envelope-from marius@alchemy.franken.de)
Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id 275FD60C for ; Sat, 9 Mar 2013 14:32:25 +0000 (UTC)
Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.6/8.14.6/ALCHEMY.FRANKEN.DE) with ESMTP id r29EWIJg018100; Sat, 9 Mar 2013 15:32:19 +0100 (CET) (envelope-from marius@alchemy.franken.de)
Received: (from marius@localhost) by alchemy.franken.de (8.14.6/8.14.6/Submit) id r29EWIER018099; Sat, 9 Mar 2013 15:32:18 +0100 (CET) (envelope-from marius)
Date: Sat, 9 Mar 2013 15:32:18 +0100
From: Marius Strobl
To: Loïc BLOT
Subject: Re: Strange reboot since 9.1
Message-ID: <20130309143218.GA18055@alchemy.franken.de>
References: <1362652057.16808.23.camel@iMac-LBlot.domain.iogs>
 <51388E42.5040500@FreeBSD.org>
 <1362661965.16808.36.camel@iMac-LBlot.domain.iogs>
 <51389ED5.6030207@bsdinfo.com.br>
 <1362670734.16808.48.camel@iMac-LBlot.domain.iogs>
 <20130307163827.GA96983@icarus.home.lan>
 <20130308023254.GC3246@michelle.cdnetworks.com>
 <20130308161613.GA82746@alchemy.franken.de>
 <1362819234.30912.2.camel@Nerz-PC.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1362819234.30912.2.camel@Nerz-PC.home>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Mar 2013 14:32:26 -0000

On Sat, Mar 09, 2013 at 09:53:54AM +0100, Loïc BLOT wrote:
> Hi Marius
> Thanks for your patch, but it has no effect for stability. The server
> has rebooted this night after 8h uptime, same backtrace appears.

Okay, could you please give the following patch a try instead in order
to test another theory?
http://people.freebsd.org/~marius/bce_rx_corruption.diff

Marius

From owner-freebsd-stable@FreeBSD.ORG Sat Mar 9 14:59:28 2013
Return-Path:
Delivered-To: stable@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id BEEE5711 for ; Sat, 9 Mar 2013 14:59:28 +0000 (UTC) (envelope-from filippomore@yahoo.com)
Received: from nm15-vm2.bullet.mail.ne1.yahoo.com (nm15-vm2.bullet.mail.ne1.yahoo.com [98.138.91.91]) by mx1.freebsd.org (Postfix) with ESMTP id 88192747 for ; Sat, 9 Mar 2013 14:59:28 +0000 (UTC)
Received: from [98.138.226.179] by nm15.bullet.mail.ne1.yahoo.com with NNFMP; 09 Mar 2013 14:59:22 -0000
Received: from [98.138.88.239] by tm14.bullet.mail.ne1.yahoo.com with NNFMP; 09 Mar 2013 14:59:22 -0000
Received: from [127.0.0.1] by omp1039.mail.ne1.yahoo.com with NNFMP; 09 Mar 2013 14:59:22 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 842951.19008.bm@omp1039.mail.ne1.yahoo.com
Received: (qmail 95659 invoked by uid 60001); 9 Mar 2013 14:59:22 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1362841162;
 bh=ikvhE5o5OL+9vsuEh6JOE5Wgl4DTsvKJOJyA8e8UeUY=;
 h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type;
 b=XWY40oFnCfU9opSkWxntgkaMHABhH1WEe9h7tSSoqujEk7njfyzGQ4Y9UZ2WaRNePG/FyHn3oXHzOtJ4m0y6R70rxM5UH9m6JH4AUnoCfatIeDbvGD8eL3WbM4tcglA15yxp9sVUWu9vUhvAzGgc/Q8oR5iMawfhAsJCKb4EWoo=
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
 h=X-YMail-OSG:Received:X-Rocket-MIMEInfo:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type;
 b=C+I4nMpDiJUGVO9r/qeid+SaYpWNLtUxm71rvu+piLYRVjyAGw/m/OF15Czcp0vunl0Msw6FxzHgPj7HbjpD3UjTbAq1FjdUvhSkSG6IGT1gvv/YZSdT4IY+ruaOG3efB/BihyZVml+dxUWkWvHtTckwTCKif+b3hjU0NlW7QoM=;
X-YMail-OSG: ko_9Au8VM1nYKeMrjIJiKCzQKmns39C.crwDkJNImdyo5u7
 xRqNNFxQvX426a77wgPvkEsJJot5Ft1UTmc8qt3bECYKM7hxvcvgKyMvONkC
 gv10LmPsxT.I.eSgNLl4lMSY9AaltUItH9Q3HZs7XFa.qiW9oSdAq3sAuvBQ
 PQHfCpqDFNrrLulFKBoOjNCQT8ZfrCMSS31u48y6Sad1wthtLlSQ2y1pbjhT
 77XIdU1YNJrs.CKel264_rpvdPcbPmHgEwqrswq6WhBN9rNpojGDVFebRyqx
 8WN7itw88li8ukLLp6oX3XUNzYIr63mHBf5ASqcPDJcRPsVbaCKsSiGbGqng
 6f268NlUCG4CVRxWlnE.nq9TNOU0fLqhlMOu.o2SPrdVkA398ZfpirZKcP2v
 fkwlZ_FluBF1TqMwSqzoww0DOAI3Cz.wBObnqlzDZNSttzoJ.KB3RyaLEZj7
 jNCi5T7qpxfjHA2rbG7IpdUHFVEdUcWILOLV5NO_rWfDYmXFLvve8SNSXiRB
 3kUSJ1c2S8OvmTfaJtU.R43fUMO78hxv0tH1rAJK_plsjALLAtwBQrRw-
Received: from [2.32.14.202] by web121903.mail.ne1.yahoo.com via HTTP; Sat, 09 Mar 2013 06:59:22 PST
X-Rocket-MIMEInfo: 001.001, SSBnZXQgdGhlIGZvbGxvd2luZyBjb21waWxlIGVycm9yIHdoZW4gYXR0ZW1wdGluZyB0byBidWlsZCBvcGVuamRrNiBvbgpGcmVlQlNEIFNUSU5HLnRlbGV0dS5pdCA5LjEtU1RBQkxFIEZyZWVCU0QgOS4xLVNUQUJMRSAjMDogU3VuIE1hciDCoDMgMDA6MDk6MDYgVVRDIDIwMTMgwqAgwqAgcm9vdEBzbmFwLmZyZWVic2Qub3JnOi91c3Ivb2JqL3Vzci9zcmMvc3lzL0dFTkVSSUMgwqBpMzg2CgoKCgp2YS9vcGVuamRrNi93b3JrL2hvdHNwb3Qvc3JjL3NoYXJlL3ZtL2FkbGMvYWRscGFyc2UuY3BwwqAKL3Vzci8BMAEBAQE-
X-Mailer: YahooMailWebService/0.8.135.514
Message-ID: <1362841162.85703.YahooMailNeo@web121903.mail.ne1.yahoo.com>
Date: Sat, 9 Mar 2013 06:59:22 -0800 (PST)
From: Filippo Moretti
Subject: problem compiling openjdk6
To: "stable@freebsd.org"
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
Reply-To: Filippo Moretti
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Mar 2013 14:59:28 -0000

I get the following compile error when attempting to build openjdk6 on
FreeBSD STING.teletu.it 9.1-STABLE FreeBSD 9.1-STABLE #0: Sun Mar  3 00:09:06 UTC 2013     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  i386

va/openjdk6/work/hotspot/src/share/vm/adlc/adlparse.cpp
/usr/ports/java/openjdk6/work/hotspot/src/share/vm/adlc/adlparse.cpp:1: sorry, unimplemented: 64-bit mode not compiled in
gmake[6]: *** [../generated/adfiles/adlparse.o] Error 1
gmake[6]: Leaving directory `/usr/ports/java/openjdk6/work/build/bsd-i386/hotspot/outputdir/bsd_amd64_compiler2/product'
gmake[5]: *** [ad_stuff] Error 2
gmake[5]: Leaving directory `/usr/ports/java/openjdk6/work/build/bsd-i386/hotspot/outputdir/bsd_amd64_compiler2/product'
gmake[4]: *** [product] Error 2
gmake[4]: Leaving directory `/usr/ports/java/openjdk6/work/build/bsd-i386/hotspot/outputdir'
gmake[3]: *** [generic_build2] Error 2
gmake[3]: Leaving directory `/usr/ports/java/openjdk6/work/hotspot/make'
gmake[2]: *** [product] Error 2
gmake[2]: Leaving directory `/usr/ports/java/openjdk6/work/hotspot/make'
gmake[1]: *** [hotspot-build] Error 2
gmake[1]: Leaving directory `/usr/ports/java/openjdk6/work'
gmake: *** [build_product_image] Error 2
*** [do-build] Error code 1

Stop in /usr/ports/java/openjdk6.
*** [install] Error code 1

Stop in /usr/ports/java/openjdk6.
*** [build-depends] Error code 1

Stop in /usr/ports/java/icedtea-web.
*** [install] Error code 1

Stop in /usr/ports/java/icedtea-web.

sincerely
Filippo