From owner-freebsd-stable@FreeBSD.ORG Sat Mar 3 05:32:22 2007
Date: Sat, 03 Mar 2007 01:32:20 -0400
From: "Marc G. Fournier"
To: Antony Mawer
Message-ID: <3AF45A659F5D4E8DD7260AA1@ganymede.hub.org>
In-Reply-To: <45E60761.8050101@mawer.org>
References: <5F9C60E2708CB953C06B21EA@ganymede.hub.org> <45E60761.8050101@mawer.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Some days, it doesn't pay to upgrade ...
List-Id: Production branch of FreeBSD source code

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't know how critical this is, but I just thought about it ... this is my
only system running gmirror ... everything seems fine according to gmirror
status, but maybe something is wrong there that I'm not seeing:

Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 removed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 removed from md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 destroyed.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1 created (id=2282154470).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da2 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider da1 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md1: provider mirror/md1 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2 created (id=3089402334).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 detected.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da4 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider da3 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device md2: provider mirror/md2 launched.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm created (id=2175292049).
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 detected.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 created (id=1094782536).
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md1 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Disk mirror/md2 attached to md0.
Mar 3 01:25:52 mars kernel: GEOM_STRIPE: Device md0 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Force device vm start due to timeout.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider da5 activated.
Mar 3 01:25:52 mars kernel: GEOM_MIRROR: Device vm: provider mirror/vm launched.

mirror/md1  COMPLETE  da1
                      da2
mirror/md2  COMPLETE  da3
                      da4
mirror/vm   DEGRADED  da5

I'm not using da5 right now, it's just in there ... went with a RAID1+0 vs
RAID5 configuration ...

- --On Thursday, March 01, 2007 09:51:13 +1100 Antony Mawer wrote:

> On 27/02/2007 11:59 PM, Marc G. Fournier wrote:
>> After 155 days of problem-free uptime, I upgraded my 6-STABLE system the
>> other day to the latest cvsup ... 3 days later, the whole thing hung solid
>> with:
>>
>>
>> Feb 27 04:32:49 mars uptimec: The server requested that we do a new login
>> Feb 27 04:33:00 mars kernel: maxproc limit exceeded by uid 0, please see
>> tuning(7) and login.conf(5).
>> Feb 27 04:33:10 mars kernel: maxproc limit exceeded by uid 60, please see
>> tuning(7) and login.conf(5).
>>
>> Stupid question: why isn't there some mechanism that prevents new processes
>> from starting up, instead of locking up the whole server? I'm not asking
>> for the evilness of Linux, where it arbitrarily kills off existing
>> processes, but if maxproc is hit, why continue to try and start up new ones?
>
> What do you define as 'hung solid'? You are unable to get in via SSH? Or at a
> console via iLO/etc?
>
> I've seen this on some of our 6.0-RELEASE machines (along with maxpipekva
> exhausted errors), and you can't SSH in from that point... because sshd forks
> to handle the connection, and all available process slots are used up.
>
> I've thought about writing a background daemon to monitor the logs for signs
> of this (or even to just try and create a short-lived child process by
> fork()ing every 5 minutes or so), and dump information to disk then reboot
> the system when this occurs... it's a work-around for something that
> "shouldn't happen", but it does anyway... once I'm able to identify _what_ is
> causing the build-up of processes, then I might be able to do something about
> killing them...!!!
>
>
> It's quite deceptive from an end-user point of view, because things like
> Apache that are already running keep running, so all they see are strange
> bits and pieces that don't work... and as always, it's one of those things
> that only happens on some clients' machines, but never on any of our test
> machines...
>
> --Antony
>
>
> PS. I haven't disappeared off the face of the earth.. though close.. my
> fiance and I have been busy planning the wedding, and wound up buying a house
> at the same time..!! Will catch up shortly once I get a chance to come up for
> air!!

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFF6Qhk4QvfyHIvDvMRAhJ0AKDVibziN1W1TagIapB5GWN3+mbCGACdHd4w
dgT0Xi40Ie/pBeUMB8Pj1go=
=bSuI
-----END PGP SIGNATURE-----
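
For what it's worth, the fork()-probe idea from the quoted message can be
sketched roughly as below. This is only a minimal illustration in Python, not
anything posted in the thread: the names fork_probe and watch, the
on_exhausted hook, and the three-consecutive-failures threshold are all
assumptions made for the example. The 300-second default corresponds to the
"every 5 minutes or so" in the message.

```python
import os
import time


def fork_probe():
    """Return True if a short-lived child could be forked and reaped cleanly."""
    try:
        pid = os.fork()
    except OSError:
        # fork() failed, e.g. because a process limit such as maxproc was hit.
        return False
    if pid == 0:
        # Child: exit at once, skipping the parent's cleanup handlers.
        os._exit(0)
    # Parent: reap the child so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def watch(probe=fork_probe, on_exhausted=None, interval=300, max_failures=3,
          sleep=time.sleep, max_rounds=None):
    """Run probe every `interval` seconds.

    After `max_failures` consecutive failed probes, call the hypothetical
    on_exhausted hook (if given) and stop. Returns the number of probes run.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if probe():
            failures = 0  # a success resets the consecutive-failure count
        else:
            failures += 1
            if failures >= max_failures:
                if on_exhausted is not None:
                    on_exhausted()
                break
        sleep(interval)
    return rounds
```

A caller would start it with something like watch(on_exhausted=my_handler),
where my_handler is whatever should record state before a reboot. One caveat
with this approach: a handler that itself needs to start a new process may
fail for the same reason the probe did, so it would probably have to write
its information to disk from within the already-running watcher.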