Date:      Sat, 29 Jun 2013 00:28:05 +0200
From:      Nikola Pavlović <nzp@riseup.net>
To:        Adam Vande More <amvandemore@gmail.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Troubleshooting a gmirror disk marked broken
Message-ID:  <20130628222805.GA15414@sputnjik.localdomain>
In-Reply-To: <CA+tpaK292v09O9_9Mdi=W9hc9tcb1HmBt1RYKmaaNu7NxcqeMw@mail.gmail.com>
References:  <20130627023837.GA7685@sputnjik.localdomain> <CA+tpaK292v09O9_9Mdi=W9hc9tcb1HmBt1RYKmaaNu7NxcqeMw@mail.gmail.com>

On Wed, Jun 26, 2013 at 10:09:33PM -0500, Adam Vande More wrote:
> On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović <nzp@riseup.net> wrote:
> 
> > Hi,
> >
> > Last night during a massive (~1 year worth :| )
> > portsnap fetch
> >
> > the server went unresponsive and ssh eventually disconnected.  I decided
> > to leave it during the night, and, sure enough, the situation was the
> > same in the morning, so I had to do a hard reset.  It came back up, but
> > one of the two gmirror components was marked as broken and deactivated.
> >
> > The hang happened during the 'fetching new files or ports' (~24000 of
> > them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> > of portsnap fetch.
> >
> > /var/log/messages was completely silent during the period between the
> > hang and the reset.
> >
> > Googling around I found a mention that it's possible to sometimes get a
> > 'blip'[*] during busy periods, so I decided to just bite the bullet and
> > reinsert the component with
> > # gmirror forget gm0
> > # gmirror clear ad4
> > # gmirror insert gm0 ad4
> >
> > Currently it's syncing and things *seem* OK.  My question is how much
> > should I be worried and what could be the cause of this?  Is it possible
> > that ports snapshot fetching caused this, or that perhaps it was the other
> > way around (a failing disk causing the machine to choke during the huge
> > portsnap fetch)?  How to proceed? :)
> >
> 
> The messages log definitely shows problems with your I/O.  The SMART logs of
> the disks are also at least mildly concerning and indicate the drives are
> in a preliminary stage of death.  Some HD deaths take years to complete.
> Expect random glitches and intermittent reduced performance as a continuous
> degradation.   You might be able to alleviate some of this by switching to
> the AHCI driver and bumping up timeouts but at the end of the day 2 flaky
> disks in a mirror don't inspire confidence.
> 

About AHCI: it didn't attach after I set ahci_load="YES" in
loader.conf, so I assumed it wasn't enabled in the BIOS.  As I don't have
physical access to the machine, I asked support to enable it, and
presumably they did (that's what they said, and the machine was rebooted
when they said they did).  But still no luck.  It's a VIA 6420
controller and maybe it doesn't support AHCI (I couldn't find anything
definitive on the net about that).  If that's the case, would the BIOS
even offer an option to enable it?  I'm confused because they didn't say
the controller doesn't support it, but explicitly that they enabled it.
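
For reference, a rough sketch of how one might check from the shell
whether any controller is actually presenting itself in AHCI mode.  It
assumes the standard PCI class code for an AHCI SATA controller
(0x010601) and the usual "name@pciN:...: class=0x..." line layout of
`pciconf -l`.  The function names are made up for illustration, and
none of this has been run on the VIA 6420 box.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if a pciconf-style line carries the AHCI
# class code (0x01 mass storage, 0x06 SATA, 0x01 AHCI programming interface).
is_ahci_line() {
    case "$1" in
        *class=0x010601*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: reads `pciconf -l` style output on stdin and prints
# the device name (the part before '@') of every line in AHCI mode.
# Prints nothing if no controller reports the AHCI class.
list_ahci_devices() {
    while IFS= read -r line; do
        if is_ahci_line "$line"; then
            printf '%s\n' "${line%%@*}"
        fi
    done
}

# Intended use on the FreeBSD machine itself:
#   pciconf -l | list_ahci_devices
#   kldstat -v | grep -i ahci    # is the ahci module present at all?
#   dmesg | grep -i ahci         # did anything attach to it?
```

If `list_ahci_devices` prints nothing, the controller is not exposing an
AHCI interface to the OS, whatever the BIOS setting is said to be, and
loading ahci(4) would have nothing to attach to.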

It's possible to request KVM-over-IP, so I can look for myself, but I
don't want to waste time (and install Java just for this) if it's useless.


-- 
To criticize the incompetent is easy;
it is more difficult to criticize the competent.



