Date:      Wed, 26 Jun 2013 22:09:33 -0500
From:      Adam Vande More <amvandemore@gmail.com>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Troubleshooting a gmirror disk marked broken
Message-ID:  <CA%2BtpaK292v09O9_9Mdi=W9hc9tcb1HmBt1RYKmaaNu7NxcqeMw@mail.gmail.com>
In-Reply-To: <20130627023837.GA7685@sputnjik.localdomain>
References:  <20130627023837.GA7685@sputnjik.localdomain>

On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović <nzp@riseup.net> wrote:

> Hi,
>
> Last night during a massive (~1 year worth :| )
> portsnap fetch
>
> the server went unresponsive and ssh eventually disconnected.  I decided
> to leave it during the night, and, sure enough, the situation was the
> same in the morning, so I had to do a hard reset.  It came back up, but
> one of the two gmirror components was marked as broken and deactivated.
>
> The hang happened during the 'fetching new files or ports' (~24000 of
> them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> of portsnap fetch.
>
> /var/log/messages was completely silent during the period between the
> hang and the reset.
>
> Googling around I found a mention that it's possible to sometimes get a
> 'blip'[*] during busy periods, so I decided to just bite the bullet and
> reinsert the component with
> # gmirror forget gm0
> # gmirror clean ad4
> # gmirror insert gm0 ad4
>
> Currently it's syncing and things *seem* OK.  My question is how much
> should I be worried and what could be the cause of this?  Is it possible
> that ports snapshot fetching caused this, or that perhaps it was the other
> way around (a failing disk causing the machine to choke during the huge
> portsnap fetch)?  How to proceed? :)
>

The messages log definitely shows problems with your I/O.  The SMART logs of
the disks are also at least mildly concerning and indicate that the drives
are in the early stages of dying.  Some hard drive deaths take years to
complete.  Expect random glitches and intermittently reduced performance as
the degradation continues.  You might be able to alleviate some of this by
switching to the AHCI driver and bumping up timeouts, but at the end of the
day two flaky disks in a mirror don't inspire confidence.
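
As a rough sketch of the AHCI and timeout suggestion, assuming a release
where ahci(4) is a loadable module rather than already compiled into the
kernel, the /boot/loader.conf lines might look like the fragment below.  The
value 60 is only an illustration, not a tested recommendation (the ada(4)
default is 30 seconds).

```
# /boot/loader.conf (illustrative fragment, values are assumptions)
ahci_load="YES"                      # load the ahci(4) driver at boot
kern.cam.ada.default_timeout="60"    # seconds ada(4) waits before timing out a command
```

Under ahci(4) the disks attach as adaN instead of adN, so check anything
that refers to ad4 by name before rebooting.  The mirror itself should still
assemble from its on-disk metadata unless it was labeled with hardcoded
provider names (gmirror label -h).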

-- 
Adam Vande More


