Date: Wed, 29 Jun 2011 15:15:30 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: Glen Barber <gjb@FreeBSD.org>, fs@FreeBSD.org
Subject: Re: [RFC] [patch] periodic status-zfs: list pools in daily emails
Message-ID: <20110629151530.13154p1oc899fhwy@webmail.leidinger.net>
In-Reply-To: <20110629111915.GA75648@icarus.home.lan>
References: <20110628203228.GA4957@onyx.glenbarber.us> <20110629104633.26824evikzh8tgtl@webmail.leidinger.net> <4E0B006C.8050000@FreeBSD.org> <20110629111915.GA75648@icarus.home.lan>

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Wed, 29 Jun 2011 04:19:15 -0700):

> At my workplace we use a heavily modified version of Netsaint, with bits
> and pieces Nagios-like created. I happened to write the perl code used
> to monitor our production Solaris systems (~2000+ servers) for ZFS pool
> status. It parses "zpool status -x" output, monitoring read, write, and
> checksum errors per pool, vdev, and device, in addition to general pool
> status. I tested too many conditions, not to mention had to deal with
> parsing pains as a result of ZFS code changes, plus supporting
> completely different revisions of Solaris 10 in production. And before
> someone asks: no, I cannot provide the source (employee agreements, LCA,
> etc...). I did have to dig through ZFS source code to figure out a
> bunch of necessary bits too, so don't be surprised if you have to too.
>
> My recommendation: just look for pools which are in any state other than
> ONLINE (don't try to be smart with an OR regex looking for all the
> combos; it doesn't scale when ZFS changes), and you should also handle
> situations where a device is currently undergoing manual or automatic
> device replacement (specifically regex '^[\t\s]+replacing\s+DEGRADED'),
> which will be important to people who keep spares in pools. This might
> be difficult with just standard BSD sh, but BSD awk should be able to
> handle this.

Thanks for your suggestions, but the script is intentionally dumb: it
runs "zpool status" and looks for "all pools are healthy". If this line
is not there, the output is marked as important (this matters if you
have configured periodic.conf to skip unimportant output). Everything
else is up to the person who reads the daily run output.

The patch under discussion only adds the output of "zpool list" to the
output of "zpool status" (if activated).

Bye,
Alexander.

-- 
"I'll reason with him."
		-- Vito Corleone, "Chapter 14", page 200

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
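
For readers who want to see the two approaches from this message side by
side, here is a minimal sh sketch. It is not the code of the periodic
script or of the patch. The function names classify_status and
unhealthy_pools are hypothetical, and the return codes assume the
periodic(8) convention (0 = nothing notable, 1 = notable information).
The second helper assumes that "zpool list -H -o name,health" prints one
tab-separated "name health" line per pool.

```sh
#!/bin/sh
# Sketch only: both helpers are hypothetical illustrations, not the
# code from the periodic status-zfs script discussed in this thread.

# "Intentionally dumb" check: read "zpool status" output on stdin and
# print 0 if the "all pools are healthy" line is present, 1 otherwise.
# (Assumed meaning: 0 = unimportant output, 1 = important output.)
classify_status() {
    if grep -q 'all pools are healthy'; then
        echo 0
    else
        echo 1
    fi
}

# Variant of the "anything other than ONLINE" recommendation: read
# tab-separated "name<TAB>health" lines on stdin and print every pool
# whose health is not ONLINE, without enumerating the other states.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Only query ZFS when a zpool binary is actually available.
if command -v zpool >/dev/null 2>&1; then
    status=$(zpool status 2>&1)
    printf '%s\n' "$status"
    rc=$(printf '%s\n' "$status" | classify_status)
    echo "rc=$rc"
    zpool list -H -o name,health 2>/dev/null | unhealthy_pools
fi
```

The first helper never parses pool, vdev, or device states, so changes
in the detailed "zpool status" layout should not affect it as long as
the "all pools are healthy" line stays the same. The second helper
compares against ONLINE only, which avoids the OR regex over all state
combinations that the quoted text warns against.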