Date:      Thu, 13 Jul 2006 12:10:30 +0200
From:      Johan Ström <johan@stromnet.org>
To:        freebsd-stable@freebsd.org
Subject:   Re: GEOM problems again...
Message-ID:  <36895211-2796-4213-B336-6279AB3AC3CB@stromnet.org>
In-Reply-To: <884C01BC-3E97-46EC-AA8B-E70C3931F3A4@stromnet.org>
References:  <DAFCD4DC-D2D4-4574-ACBF-367D642D9729@stromnet.org> <8D08DDB6-6AC1-45B6-B2CE-08782F54968A@stromnet.org> <884C01BC-3E97-46EC-AA8B-E70C3931F3A4@stromnet.org>

On 10 Jul 2006, at 13.59, Johan Ström wrote:

>
> On 10 Jul 2006, at 11.09, Johan Ström wrote:
>
>>
>> On 21 May 2006, at 11.16, Johan Ström wrote:
>>
>>> Hi
>>>
>>> I've had problems before with GEOM mirror and my SATA drives, and
>>> I've posted about it here before too. The solution seemed to be a
>>> change of motherboard; this greatly reduced the crashes (and the
>>> achieved speeds also improved a lot, from 10-15 MB/s to
>>> 40-50 MB/s).
>>> After the change I had one or two crashes, but then it ran for
>>> well over 50-60 days without any problems.
>>> Then, 11 days ago, I upgraded to 6.1... and now I'm getting these
>>> "crashes" again (that is, the mirror breaks; the system itself
>>> still runs fine):
>>>
>>> May 21 02:04:58 elfi kernel: ad6: FAILURE - device detached
>>> May 21 02:04:58 elfi kernel: subdisk6: detached
>>> May 21 02:04:58 elfi kernel: ad6: detached
>>> May 21 02:04:58 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 disconnected.
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=11006308352, length=2048)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=164847927296, length=131072)]error = 6
>>> May 21 02:04:58 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=256680296448, length=32768)]error = 6
>>>
>>>
>>> Some info about the controller and disks:
>>>
>>> May  9 22:46:52 elfi kernel: ata1: <ATA channel 1> on atapci0
>>> May  9 22:46:52 elfi kernel: atapci1: <nVidia nForce2 Pro SATA150 controller> port 0xec00-0xec07,0xe880-0xe883,0xe800-0xe807,0xe480-0xe483,0x7f00-0x7f0f,0x7c00-0x7c7f irq 22 at device 11.0 on pci0
>>>
>>> May  9 22:46:52 elfi kernel: ad4: 286188MB <Maxtor 7L300S0 BANC1G10> at ata2-master SATA150
>>> May  9 22:46:52 elfi kernel: ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1 created (id=4118114647).
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 detected.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 activated.
>>> May  9 22:46:52 elfi kernel: GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
>>> May  9 22:46:52 elfi kernel: Trying to mount root from ufs:/dev/mirror/gm0s1a
>>>
>>> Does anyone have any new clues? AFAIK the disks should be working
>>> fine (they are 6 months old, and this same problem has occurred
>>> multiple times...)
>>>
>>> Hope to solve this ;)
>>>
>>> Thanks
>>> Johan
>>>
>>
>> Here we go again
>>
>> Jul  7 16:20:09 elfi kernel: ad4: FAILURE - device detached
>> Jul  7 16:20:09 elfi kernel: subdisk4: detached
>> Jul  7 16:20:09 elfi kernel: ad4: detached
>> Jul  7 16:20:09 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
>> Jul  7 16:20:09 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=88896847872, length=32768)]error = 6
>>
>> However, no read timeouts etc. as before, just this. 18 days of
>> uptime this time (I've rebooted for other reasons since my last
>> mail). It always seems to be ad4 that disconnects... I'm going to
>> run some disk tests on it, but I doubt they will show anything,
>> since I've had similar problems with this gmirror setup (new disks)
>> from day one, and the tests I ran back then found no problems.
>>
>> Johan
>
> Follow-up: I ran Maxtor's own test program over the disk, full-length
> test. Not a single problem.
> After a reboot, the RAID is rebuilding fine:
>
> GEOM_MIRROR: Device gm0s1: rebuilding provider ad4s1.
>
> As usual, it seems I cannot get the controller/driver to redetect
> the disk using atacontrol etc.
>
> Johan

And now again... the RAID went degraded only 2 days after the reboot!

Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached
Jul 12 22:22:50 elfi kernel: subdisk4: detached
Jul 12 22:22:50 elfi kernel: ad4: detached
Jul 12 22:22:50 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad4s1 disconnected.
Jul 12 22:22:50 elfi kernel: g_vfs_done():mirror/gm0s1f[READ(offset=120776474624, length=32768)]error = 6
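Comparing the incidents (ad6 on May 21, ad4 on Jul 7 and Jul 12) is easier with the detach events and failed reads pulled out of the log. Every failed read reports error = 6, which on FreeBSD is ENXIO ("Device not configured"); that fits reads failing because the provider vanished rather than a media error at one spot, and the offsets differ each time. A minimal sketch, assuming the kernel messages land in /var/log/messages in the format shown here; `summarize` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: summarise disk detaches and failed mirror reads
# from a syslog file. The default path /var/log/messages is an assumption
# about where kernel messages are logged on this box.
summarize() {
    awk '
    # e.g. "Jul 12 22:22:50 elfi kernel: ad4: FAILURE - device detached"
    # fields: $1-$3 timestamp, $4 host, $5 "kernel:", $6 "ad4:"
    / kernel: ad[0-9]+: FAILURE - device detached/ {
        disk = $6
        sub(/:$/, "", disk)
        printf "%s %s %s detach %s\n", $1, $2, $3, disk
        count[disk]++
    }
    # e.g. "... g_vfs_done():mirror/gm0s1f[READ(offset=120776474624, length=32768)]error = 6"
    /g_vfs_done\(\)/ && match($0, /offset=[0-9]+/) {
        off = substr($0, RSTART + 7, RLENGTH - 7)   # digits after "offset="
        err = $0
        sub(/.*error = /, "", err)                  # errno value; 6 is ENXIO
        printf "%s %s %s read-error offset=%.1fGiB errno=%s\n", $1, $2, $3, off / 1073741824, err
    }
    END {
        # per-disk totals; the order of a for-in loop is unspecified in awk
        for (d in count) printf "total %s %d\n", d, count[d]
    }' "${1:-/var/log/messages}"
}
```

Fed only the Jul 12 messages above, it prints a detach line for ad4, a read-error line at offset=112.5GiB with errno=6, and "total ad4 1".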

$ uname -a
FreeBSD elfi.stromnet.org 6.1-RELEASE FreeBSD 6.1-RELEASE #3: Tue May  9 20:40:23 CEST 2006 johan@elfi.stromnet.org:/usr/obj/usr/src/sys/GENERIC  i386

Still no luck with atacontrol...
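The exact atacontrol invocations tried are not listed. The usual re-probe attempt can be sketched as follows, assuming the atacontrol(8) `list` and `reinit` subcommands of FreeBSD 6.x and that ad4 still sits on ata2 as in the May boot log; `reprobe` and the `RUN` dry-run switch are hypothetical names. If ad4s1 does come back with its gmirror metadata intact, GEOM_MIRROR should pick it up and resynchronize it, as it did after the reboot.

```shell
#!/bin/sh
# Hypothetical helper: ask ata(4) to re-probe the channel a dropped disk was
# on, then show how gmirror sees the mirror afterwards.
# Dry run by default: with RUN unset the commands are only printed.
# To execute them for real, run as root with RUN set but empty (RUN=).
RUN=${RUN-echo}

reprobe() {
    channel=$1   # e.g. ata2: the boot log shows ad4 at ata2-master
    mirror=$2    # e.g. gm0s1
    $RUN atacontrol list               # which devices does each channel report?
    $RUN atacontrol reinit "$channel"  # reset the channel and re-probe its devices
    $RUN atacontrol list               # did the disk come back?
    $RUN gmirror status "$mirror"      # is the component back or rebuilding?
}
```

After sourcing the file, `reprobe ata2 gm0s1` only prints the four commands; `RUN= reprobe ata2 gm0s1` runs them.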

Is there any way to debug this further? I've tested the disk, and the
SATA cables are new... I've had similar problems with another
motherboard...
I don't think this is related to hardware problems, but rather a
software problem that needs to be solved; this is not something one
can call stable ;)

So, any pointers on how to enable more debugging, or anything else
that could give some clues?
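Two settings that might make the next detach leave more to go on, as a sketch only: it assumes the GEOM_MIRROR debug sysctl (kern.geom.mirror.debug, levels 0 to 3) and the loader's verbose-boot switch are available on 6.1 under these names. The debug level chosen here is arbitrary.

```
# /etc/sysctl.conf: GEOM_MIRROR debug level, 0 (least) to 3 (most output)
kern.geom.mirror.debug=2

# /boot/loader.conf: verbose boot, same as "boot -v" at the loader prompt;
# drivers such as ata(4) then typically log more detail while probing
boot_verbose="YES"
```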

Johan




