Date: Wed, 26 Jun 2013 22:09:33 -0500
From: Adam Vande More <amvandemore@gmail.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: Troubleshooting a gmirror disk marked broken
Message-ID: <CA%2BtpaK292v09O9_9Mdi=W9hc9tcb1HmBt1RYKmaaNu7NxcqeMw@mail.gmail.com>
In-Reply-To: <20130627023837.GA7685@sputnjik.localdomain>
References: <20130627023837.GA7685@sputnjik.localdomain>

On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović <nzp@riseup.net> wrote:

> Hi,
>
> Last night during a massive (~1 year worth :| )
> portsnap fetch
>
> the server went unresponsive and ssh eventually disconnected. I decided
> to leave it during the night, and, sure enough, the situation was the
> same in the morning, so I had to do a hard reset. It came back up, but
> one of the two gmirror components was marked as broken and deactivated.
>
> The hang happened during the 'fetching new files or ports' (~24000 of
> them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> of portsnap fetch.
>
> /var/log/messages was completely silent during the period between the
> hang and the reset.
>
> Googling around I found a mention that it's possible to sometimes get a
> 'blip'[*] during busy periods, so I decided to just bite the bullet and
> reinsert the component with
> # gmirror forget gm0
> # gmirror clean ad4
> # gmirror insert gm0 ad4
>
> Currently it's syncing and things *seem* OK. My question is how much
> should I be worried and what could be the cause of this? Is it possible
> that ports snapshot fetching caused this, or that perhaps it was the other
> way around (a failing disk causing the machine to choke during the huge
> portsnap fetch)? How to proceed? :)
>

The messages log definitely shows problems with your I/O. The SMART logs
of the disks are also at least mildly concerning and indicate the drives
are in a preliminary stage of death. Some HD deaths take years to
complete. Expect random glitches and intermittently reduced performance
as the degradation continues. You might be able to alleviate some of
this by switching to the AHCI driver and bumping up timeouts, but at the
end of the day two flaky disks in a mirror don't inspire confidence.

-- 
Adam Vande More
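
For reference, the AHCI and timeout suggestion above might look roughly
like the following /boot/loader.conf fragment. This is only a sketch
under assumptions the thread does not confirm: that the controller is
supported by ahci(4), that the running release ships ahci as a loadable
module rather than already in the kernel, and that 60 seconds is a
sensible timeout (the value is illustrative, not a recommendation from
the thread). On releases where this change moves disks from the old ata
driver to CAM, they may show up as adaN instead of adN, so check
/etc/fstab and the mirror state before relying on it.

```
# /boot/loader.conf (sketch; assumes ahci(4) is available as a module)
ahci_load="YES"                       # attach SATA controllers via ahci(4)

# Per-command timeout for ada(4) disks, in seconds.
# 60 is an illustrative value, not one given in the thread.
kern.cam.ada.default_timeout="60"
```

After a reboot, `gmirror status` should show whether both components
were found again, and `sysctl kern.cam.ada.default_timeout` should
report the value that is actually in effect.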