From: Adam Vande More
To: FreeBSD Questions
Date: Wed, 26 Jun 2013 22:09:33 -0500
Subject: Re: Troubleshooting a gmirror disk marked broken
In-Reply-To: <20130627023837.GA7685@sputnjik.localdomain>

On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović wrote:

> Hi,
>
> Last night during a massive (~1 year worth :| )
> portsnap fetch
>
> the server went unresponsive and ssh eventually disconnected. I decided
> to leave it during the night, and, sure enough, the situation was the
> same in the morning, so I had to do a hard reset. It came back up, but
> one of the two gmirror components was marked as broken and deactivated.
>
> The hang happened during the 'fetching new files or ports' (~24000 of
> them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> of portsnap fetch.
>
> /var/log/messages was completely silent during the period between the
> hang and the reset.
>
> Googling around I found a mention that it's possible to sometimes get a
> 'blip'[*] during busy periods, so I decided to just bite the bullet and
> reinsert the component with
> # gmirror forget gm0
> # gmirror clean ad4
> # gmirror insert gm0 ad4
>
> Currently it's syncing and things *seem* OK. My question is how much
> should I be worried and what could be the cause of this? Is it possible
> that ports snapshot fetching caused this, or that perhaps it was the other
> way around (a failing disk causing the machine to choke during the huge
> portsnap fetch)? How to proceed? :)
>

The messages log definitely shows problems with your I/O. The SMART logs
of the disks are also at least mildly concerning and indicate that the
drives are in the early stages of dying. Some HD deaths take years to
complete. Expect random glitches and intermittently reduced performance as
the degradation continues. You might be able to alleviate some of this by
switching to the AHCI driver and bumping up timeouts, but at the end of the
day two flaky disks in a mirror don't inspire confidence.

-- 
Adam Vande More
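
For reference, the AHCI and timeout suggestion above might look roughly like the fragment below. This is only a sketch, not something taken from the thread. It assumes a FreeBSD 8.x/9.x system still using the legacy ata(4) driver, a controller that supports AHCI, and an illustrative timeout value of 60 seconds.

```conf
# /boot/loader.conf (sketch; the values are assumptions, not from the thread)

# Load the ahci(4) driver at boot. Disks then attach through CAM as adaN
# instead of adN, so anything that names the raw disks needs checking.
ahci_load="YES"

# ada(4) command timeout in seconds. 60 is an arbitrary example value;
# a longer timeout only gives a slow disk more time before a command fails.
kern.cam.ada.default_timeout="60"
```

After a reboot, `gmirror status` should show whether both components are still attached to the mirror. A longer timeout only hides the symptoms. It does not make a failing disk any healthier.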