Date:      Sat, 16 Oct 2010 02:18:57 +0400
From:      Sergey Kandaurov <pluknet@gmail.com>
To:        Charles Owens <cowens@greatbaysoftware.com>
Cc:        Scott Long <scottl@freebsd.org>, freebsd-hardware@freebsd.org
Subject:   Re: mfiutil reports "PSTATE 0x0020" new drive state
Message-ID:  <AANLkTimYU_XmZ_DRjA_zJ7dcmgaj47UM6Tf3ea50cZLK@mail.gmail.com>
In-Reply-To: <4CB8BED6.8040204@greatbaysoftware.com>
References:  <4CB8A614.6000707@greatbaysoftware.com> <4CB8BED6.8040204@greatbaysoftware.com>

On 16 October 2010 00:51, Charles Owens <cowens@greatbaysoftware.com> wrote:
>  Hmm... the problem appears to have resolved itself.  After a few hours the
> new drive seems to have gone back into the array, and the original hot spare
> drive put back into hot-spare state.
>
> So I'm interpreting state 0x0020 to therefore mean something like "hang on
> while I use this new drive to automatically put everything back as it was
> before the failure".  Is this correct?
>
> Thanks,
> Charles
>
> [root@Bsvr ~]# mfiutil show drives
> mfi0 Physical Drives:
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
> (  149G) HOT SPARE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
> (  149G) ONLINE<ST9160511NS SN04 serial=9SM238AG>  SATA enclosure 1, slot 4
>
>
>
> On 10/15/10 3:05 PM, Charles Owens wrote:
>>
>>  Hello,
>>
>> We have a mfi-based RAID array with a failed drive.  When replacing the
>> failed drive with a brand new one 'mfiutil' reports it having status of
>> "PSTATE 0x0020".  Attempts to work with the drive to make it a hot spare
>> are unsuccessful (eg. using "good" and/or "add" subcommands of mfiutil).
>> We've tested procedures for replacing failed drives in the past and
>> haven't run into this.
>>
>> Looking at the code for mfiutil it appears that this is happening because
>> the mfi controller is reporting a drive status code that mfiutil doesn't
>> know about.  The system is remote and in production, so booting into the
>> LSI in-BIOS RAID-management-tool is not an attractive option.
>>
>> Any help with understanding the situation and potential next steps would
>> be greatly appreciated.  More background information follows below.
>>
>> Thanks,
>>
>> Charles
>>
>>
>> Storage configuration:  4-drive RAID 10 array plus one hot spare
>>
>> [root@svr ~]# mfiutil show config
>> mfi0 Configuration: 2 arrays, 1 volumes, 0 spares
>>    array 0 of 2 drives:
>>        drive 0 (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
>>        drive 1 (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
>>    array 1 of 2 drives:
>>        drive 4 (  149G) ONLINE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
>>        drive 3 (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
>>    volume mfid0 (296G) RAID-1 256K OPTIMAL spans:
>>        array 0
>>        array 1
>>
>> [root@svr ~]# mfiutil show drives
>> mfi0 Physical Drives:
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236JR>  SATA enclosure 1, slot 0
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237KF>  SATA enclosure 1, slot 1
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM236N8>  SATA enclosure 1, slot 2
>> (  149G) ONLINE<ST9160511NS SN04 serial=9SM237EK>  SATA enclosure 1, slot 3
>> (  149G) PSTATE 0x0020<ST9160511NS SN04 serial=9SM238AG>  SATA enclosure 1, slot 4
>>
>> mfi0:<LSI MegaSAS 1078>  port 0x1000-0x10ff mem
>> ...
>>

Hi, Charles Owens.

0x20 is most likely the copyback physical drive state,
which is missing from enum mfi_pd_state.
What you've experienced is the copyback feature in action :)
Your array was rebuilt with the HSP (hot spare) as an ordinary PD; then
you replaced the failed drive with a good one, and the HSP went into
copyback mode to move all its data back to the good disk. That prevents
reordering of disk numbers in the array and a double rebuild.
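
To illustrate why the tool printed "PSTATE 0x0020", here is a minimal
sketch of a state-to-name lookup with a fallback for unknown codes. It is
not the actual mfiutil source: the names pd_state and pd_state_name are
hypothetical, the numeric values are assumptions to be checked against the
driver header in your tree, and 0x20 = copyback is only the likely guess
made above.

```c
#include <stdio.h>

/*
 * Illustrative subset of physical-drive states.  The numeric values are
 * assumptions for this sketch; verify them against the mfi driver header.
 * PD_STATE_COPYBACK = 0x20 is the guess discussed in this thread.
 */
enum pd_state {
	PD_STATE_HOT_SPARE = 0x02,
	PD_STATE_FAILED    = 0x11,
	PD_STATE_REBUILD   = 0x14,
	PD_STATE_ONLINE    = 0x18,
	PD_STATE_COPYBACK  = 0x20	/* the entry that appears to be missing */
};

/* Hypothetical helper: map a state code to a printable name. */
static const char *
pd_state_name(unsigned int state)
{
	static char buf[16];

	switch (state) {
	case PD_STATE_HOT_SPARE:
		return ("HOT SPARE");
	case PD_STATE_FAILED:
		return ("FAILED");
	case PD_STATE_REBUILD:
		return ("REBUILD");
	case PD_STATE_ONLINE:
		return ("ONLINE");
	case PD_STATE_COPYBACK:
		return ("COPYBACK");
	default:
		/* Codes not in the enum are shown as raw hex. */
		snprintf(buf, sizeof(buf), "PSTATE 0x%04x", state);
		return (buf);
	}
}
```

Without the PD_STATE_COPYBACK case, a code of 0x20 falls through to the
default branch and comes out as "PSTATE 0x0020", which matches what was
reported.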

-- 
wbr,
pluknet


