From: Bob Bishop
To: Marin Atanasov Nikolov
Cc: ml-freebsd-stable, John
Date: Fri, 25 Jan 2013 10:12:21 +0000
Subject: Re: Spontaneous reboots on Intel i5 and FreeBSD 9.0

Hi,

On 25 Jan 2013, at 09:29, Marin Atanasov Nikolov wrote:

> Hello again :)
>
> Here's my update on these spontaneous reboots, less than a week after
> I updated to stable/9.
>
> For the first two days the system was running fine with no reboots, so I
> thought that this update had actually fixed it, but I was wrong.
>
> The reboots are still happening and there is still no clear evidence of
> the root cause.
>
> What I did so far:
>
> * Ran disk tests -- looking good
> * Ran memtest -- looking good
> * Replaced power cables
> * Ran UPS tests -- looking good
> * Checked for any bad capacitors -- none found
> * Removed all ZFS snapshots
>
> There is also one more machine connected to the same UPS, so if it were a
> UPS issue I'd expect the other one to reboot too, but that's not the
> case.
>
> Now that I've excluded the hardware part of this problem

Have you done anything to rule out the machine's power supply?

> I started looking again into the software side, and this time in
> particular -- ZFS.
>
> I'm running FreeBSD 9.1-STABLE #1 r245686 on an Intel i5 with 8 GB of memory.
>
> A quick look at top(1) showed lots of memory usage by ARC and my available
> free memory dropping fast. I've made a screenshot, which you can see at the
> link below:
>
> * http://users.unix-heaven.org/~dnaeon/top-zfs-arc.jpg
>
> So I went to the FreeBSD Wiki and started reading the ZFS Tuning Guide [1],
> but honestly at the end I was not sure which parameters I need to
> increase/decrease and to what values.
>
> Here's some info about my current parameters.
>
> % sysctl vm.kmem_size_max
> vm.kmem_size_max: 329853485875
>
> % sysctl vm.kmem_size
> vm.kmem_size: 8279539712
>
> % sysctl vfs.zfs.arc_max
> vfs.zfs.arc_max: 7205797888
>
> % sysctl kern.maxvnodes
> kern.maxvnodes: 206227
>
> There's one script in the ZFSTuningGuide which calculates kernel memory
> utilization, and for me these values are listed below:
>
> TEXT=22402749, 21.3649 MB
> DATA=4896264192, 4669.44 MB
> TOTAL=4918666941, 4690.81 MB
>
> While looking for ZFS tuning I've also stumbled upon this thread in the
> FreeBSD Forums [2], where the OP describes behaviour similar to what I am
> experiencing, so I'm quite worried now that the reason for these
> crashes is ZFS.
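To make the numbers above easier to compare, here is a small Python sketch that parses the quoted sysctl output and converts the byte counts into GiB and MB. Only the values come from the message; the helper names (parse_sysctl, kmem_report) are hypothetical and are not part of the ZFSTuningGuide script. It assumes the script's "MB" means 1048576 bytes, which is consistent with the TEXT/DATA/TOTAL figures quoted.

```python
# Hypothetical helper: turn the quoted `sysctl` output and the TEXT/DATA
# byte counts from the ZFSTuningGuide script into readable units.
# Only the numbers come from the message; everything else is illustrative.

SYSCTL_OUTPUT = """\
vm.kmem_size_max: 329853485875
vm.kmem_size: 8279539712
vfs.zfs.arc_max: 7205797888
kern.maxvnodes: 206227
"""

MIB = 1024 ** 2  # the script's "MB" appears to be 1048576 bytes
GIB = 1024 ** 3


def parse_sysctl(text: str) -> dict[str, int]:
    """Parse 'name: value' lines (as printed by `sysctl name`) into ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            values[name.strip()] = int(value)
    return values


def kmem_report(text_bytes: int, data_bytes: int) -> dict[str, float]:
    """Return TEXT, DATA and their sum (TOTAL) in MB of 1048576 bytes."""
    return {
        "TEXT": text_bytes / MIB,
        "DATA": data_bytes / MIB,
        "TOTAL": (text_bytes + data_bytes) / MIB,
    }


if __name__ == "__main__":
    s = parse_sysctl(SYSCTL_OUTPUT)
    kmem = s["vm.kmem_size"]
    arc_max = s["vfs.zfs.arc_max"]
    print(f"vm.kmem_size    = {kmem / GIB:.2f} GiB")
    print(f"vfs.zfs.arc_max = {arc_max / GIB:.2f} GiB")
    # Kernel memory left over for everything else if the ARC reaches its cap.
    print(f"kmem - arc_max  = {(kmem - arc_max) / GIB:.2f} GiB")
    for name, mb in kmem_report(22402749, 4896264192).items():
        print(f"{name} = {mb:.2f} MB")
```

With these values vm.kmem_size minus vfs.zfs.arc_max is exactly 1 GiB, so the ARC may grow until only 1 GiB of kernel memory remains for everything else. As far as I know that matches the stock default rather than a tuned setting, but treat that as an assumption to check against the ZFS Tuning Guide.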
>
> Before jumping into any change to the kernel parameters (vm.kmem_size,
> vm.kmem_size_max, kern.maxvnodes, vfs.zfs.arc_max) I'd like to hear any
> feedback from people who have already done such optimizations on their ZFS
> systems.
>
> Could you please share what the optimal values for these parameters are on
> a system with 8 GB of memory? Is there a way to calculate these values, or
> is it just a "test-and-see-which-fits-better" way of doing this?
>
> Thanks and regards,
> Marin
>
> [1]: https://wiki.freebsd.org/ZFSTuningGuide
> [2]: http://forums.freebsd.org/showthread.php?t=9143
>
>
> On Sun, Jan 20, 2013 at 3:44 PM, Marin Atanasov Nikolov wrote:
>
>>
>>
>>
>> On Sat, Jan 19, 2013 at 10:19 PM, John wrote:
>>
>>>> At 03:00am I can see that periodic(8) runs, but I don't see what could
>>>> have taken so much of the free memory. I'm also running this system on
>>>> ZFS and have daily rotating ZFS snapshots created; currently there are
>>>> more than 1000 ZFS snapshots, and I'm not sure if that could be causing
>>>> this. Here's a list of the periodic(8) daily scripts that run at 03:00am.
>>>>
>>>> % ls -1 /etc/periodic/daily
>>>> 800.scrub-zfs
>>>>
>>>> % ls -1 /usr/local/etc/periodic/daily
>>>> 402.zfSnap
>>>> 403.zfSnap_delete
>>>
>>> On a couple of my zfs machines, I've found running a scrub along with
>>> other heavy file system users to be a problem. I therefore run scrub from
>>> cron and schedule it so it doesn't overlap with periodic.
>>>
>>> I also found on a machine with an i3 and 4G of RAM that overlapping scrubs
>>> and snapshot destroys would cause the machine to grind to the point of
>>> being non-responsive. This was not a problem when the machine was new,
>>> but became one as the pool got larger (dedup is off and the pool is at
>>> 45% capacity).
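The overlap John describes (a snapshot destroy starting while a scrub is running, or while an earlier destroy is still going) can be guarded against in a wrapper run from cron. This is a minimal Python sketch under stated assumptions, not John's own management script, which isn't shown in the thread: the lock path is hypothetical, and a running scrub is assumed to show up as the text "scrub in progress" in `zpool status` output.

```python
# Sketch of a wrapper that refuses to destroy a snapshot while a scrub is
# running, and uses a lockfile so a second destroy cannot start while an
# earlier one is still in progress. NOT John's script (not shown in the
# thread). Assumptions: the lock path is hypothetical, and a running scrub
# is detected by the text "scrub in progress" in `zpool status` output.

import fcntl
import os
import subprocess
from typing import Optional

LOCK_PATH = "/var/run/zfs-snapshot-destroy.lock"  # hypothetical path


def scrub_in_progress(status_text: str) -> bool:
    """True if `zpool status` output reports a scrub still running."""
    return "scrub in progress" in status_text


def try_lock(path: str) -> Optional[int]:
    """Take an exclusive, non-blocking flock; return the fd, or None if held."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)  # another destroy still holds the lock
        return None
    return fd


def release_lock(fd: int) -> None:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def guarded_destroy(pool: str, snapshot: str, lock_path: str = LOCK_PATH) -> str:
    """Destroy `snapshot` unless a scrub on `pool` or another destroy is running."""
    status = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    ).stdout
    if scrub_in_progress(status):
        return "skipped: scrub in progress"
    fd = try_lock(lock_path)
    if fd is None:
        return "skipped: previous destroy still running"
    try:
        subprocess.run(["zfs", "destroy", snapshot], check=True)
        return "destroyed"
    finally:
        release_lock(fd)
```

A call would look like `guarded_destroy("tank", "tank/home@daily-2013-01-20")`, with made-up pool and snapshot names. For snapshots managed by zfSnap, its -S switch already covers the scrub check, so a wrapper like this would only matter for destroys run outside zfSnap.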
>>>
>>> I use my own zfs management script; it prevents snapshot destroys from
>>> overlapping scrubs, and with a lockfile it prevents a new destroy from
>>> being initiated while an old one is still running.
>>>
>>> zfSnap has its -S switch to prevent actions during a scrub, which you
>>> should use if you haven't already.
>>>
>>>
>> Hi John,
>>
>> Thanks for the hints. It has been a long time since I set up zfSnap, but
>> I've just checked the configuration and I am using the "-s -S" flags, so
>> there should be no overlapping.
>>
>> Meanwhile I've updated to 9.1-RELEASE, but then I hit an issue when trying
>> to reboot the system (which appears to be discussed a lot in a separate
>> thread).
>>
>> Then I updated to stable/9, so at least the reboot issue is now solved.
>> Since updating to stable/9 I've been monitoring the system's memory usage
>> and so far it's been pretty stable, so I'll keep an eye on whether the
>> update to stable/9 has actually fixed this strange issue.
>>
>> Thanks again,
>> Marin
>>
>>
>>> Since making these changes, a machine that would have to be rebooted
>>> several times a week has now been up 61 days.
>>>
>>> John Theus
>>> TheUs Group
>>>
>>
>>
>>
>> -- 
>> Marin Atanasov Nikolov
>>
>> dnaeon AT gmail DOT com
>> http://www.unix-heaven.org/
>>
>

-- 
Bob Bishop          +44 (0)118 940 1243
rb@gid.co.uk    fax +44 (0)118 940 1295
             mobile +44 (0)783 626 4518