Date:      Fri, 23 Jan 2009 20:57:37 -0800
From:      David Ehrmann <ehrmann@gmail.com>
To:        Michael Proto <mike@jellydonut.org>
Cc:        freebsd-stable@freebsd.org, Pete French <petefrench@ticketswitch.com>
Subject:   Re: zfs drive keeps failing between export and import
Message-ID:  <6e0e5340901232057p4f696455lbbe0bb4e38248837@mail.gmail.com>
In-Reply-To: <6e0e5340901222112x159409c5xd2fd93e32b020c0f@mail.gmail.com>
References:  <6e0e5340901151158n5108ba8ct6af8fb270b10b75b@mail.gmail.com> <E1LNmxP-0003vM-Lx@dilbert.ticketswitch.com> <6e0e5340901161521t30845197s9529fb5a55dbba13@mail.gmail.com> <6e0e5340901221324o33f1e2b1l53c842ebf9dad9a8@mail.gmail.com> <1de79840901221745r4149dc30yfcfcb8c8a24ad8ce@mail.gmail.com> <6e0e5340901222108i6a5e300fte7fdd1a517fbe049@mail.gmail.com> <6e0e5340901222112x159409c5xd2fd93e32b020c0f@mail.gmail.com>

On Thu, Jan 22, 2009 at 9:12 PM, David Ehrmann <ehrmann@gmail.com> wrote:
> On Thu, Jan 22, 2009 at 9:08 PM, David Ehrmann <ehrmann@gmail.com> wrote:
>> On Thu, Jan 22, 2009 at 5:45 PM, Michael Proto <mike@jellydonut.org> wrote:
>>> On Thu, Jan 22, 2009 at 4:24 PM, David Ehrmann <ehrmann@gmail.com> wrote:
>>>> On Fri, Jan 16, 2009 at 3:21 PM, David Ehrmann <ehrmann@gmail.com> wrote:
>>>>> On Fri, Jan 16, 2009 at 3:33 AM, Pete French
>>>>> <petefrench@ticketswitch.com> wrote:
>>>>>>> a software problem before hardware.  Both drives are encrypted geli
>>>>>>> devices.  I tried to reproduce the error with 1GB disk images (vs
>>>>>>
>>>>>> This is probably a silly question, but are you sure that the drives
>>>>>> are not auto detaching ? I had big problems with a zfs mirror on top
>>>>>> of geli which turned out to be that drives mounted using "geli_devices"
>>>>>> in rc.conf will auto detach unless you set "geli_autodetach" to NO.
>>>>>
>>>>> Not silly at all.  I didn't know that could be an issue, but they
>>>>> weren't mounted with "geli_devices," they were mounted by hand with
>>>>> "geli attach /dev/ad<disk>."  I did not set the -d flag on attach, and
>>>>> I don't think I used the -l flag on detach, either.  Listing the
>>>>> device says this:
>>>>>
>>>>> Geom name: ad10.eli
>>>>> EncryptionAlgorithm: AES-CBC
>>>>> KeyLength: 128
>>>>> Crypto: hardware
>>>>> UsedKey: 0
>>>>> Flags: NONE
>>>>>
>>>>> (and more stuff)
>>>>>
>>>>> One more interesting thing: I accidentally rebooted the system without
>>>>> any detaching/exporting (it involved a different, bad drive).  When it
>>>>> came up, I was able to re-import tank without any problems.
>>>>>
>>>>
>>>> Ok, here's where it gets interesting:
>>>>
>>>> The next time I saw the import error, I ran zdb -l on the actual dev.
>>>> It couldn't find the labels.  So I used dd to grab the first 4k of the
>>>> .eli device and the actual device. Once the import worked again, I
>>>> repeated the dump.
>>>> The data in the first 4k of /dev/ad8 were all 0x00 both times.  I'm
>>>> guessing this is reserved, or something.  The data in the first 4k of
>>>> /dev/ad8.eli differed between runs (so zdb -l is probably right about
>>>> not finding the label).
>>>>
>>>> In the /dev/ad8.eli that zfs doesn't recognize, I found a 16 byte
>>>> string that was repeated a lot, but it was also repeated in another
>>>> place: the good /dev/ad10.eli (though the offsets were different).
>>>> The other weird thing: the good and bad /dev/ad8.eli look a lot alike:
>>>> one 16 byte string, then another that gets repeated, then another 16
>>>> byte string randomly shows up at 0x200.
>>>>
>>>> Why the same data appear in the bad ad8.eli as the good ad10.eli, I'm
>>>> not sure (I do have the same password and no keyfile with geli), but
>>>> the patterns of data looking the same make me think something's wrong
>>>> with the encryption.  It's using 128 bit AES-CBC, and these patterns
>>>> would not be hidden by it (128 bits == 16 bytes).
>>>>
>>>> I'm using a Via C7 CPU's padlock cryptographic accelerator, and geli
>>>> reports this.  I'm guessing this is either a padlock or a geli bug.
>>>>
>>>> I can't reliably reproduce this problem, but doing it with padlock off
>>>> might be a good test.
>>>>
>>>
>>> I saw something similar (minus zfs) when I was playing with padlock
>>> and geli on my C7-Esther fileserver. When trying to mount a geli
>>> partition I'd intermittently get a bad decryption key error. Run the
>>> same command again to mount the partition and it'd work fine. This was
>>> using both password and key-file operations. IIRC when I disabled
>>> padlock acceleration it worked fine in my limited testing. That was
>>> 6.4, now that I'm on 7.1 it might be worth looking at again.
>>
>> I just got around to trying it without padlock.  I tried to replicate
>> the problem 5 or 6 times, but no luck.
>>
>> This is 7.1.
>>
>> It *sounds* like a padlock problem, but I'd like to see it make the
>> same mistake with a file or memory backed md device.  Anyway, at
>> this point, I can pretty much rule out zfs as the culprit.
>>
>
> Or geli... Any success (not intermittent) reports with a hifn or
> broadcom accelerator and geli?
>

I wasn't able to reproduce the problem with two ~100MB disk-backed md
devices and just geli.  Next up is a real disk.
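
For anyone who wants to repeat the comparison of the 4k dumps described
earlier in the thread, here is a minimal sketch of how the repeated 16-byte
strings could be counted instead of eyeballed. It is not the procedure that
was actually used; the function names and the dump file names are
hypothetical, and it assumes the dumps were taken with something like
"dd if=/dev/ad8.eli of=ad8-eli.bin bs=4096 count=1".

```python
import sys
from collections import Counter

BLOCK = 16  # AES block size in bytes (128 bits)


def split_blocks(data, size=BLOCK):
    """Split data into consecutive fixed-size blocks, dropping a short tail."""
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def repeated_blocks(data, size=BLOCK):
    """Return {block: count} for every block that occurs more than once."""
    counts = Counter(split_blocks(data, size))
    return {blk: n for blk, n in counts.items() if n > 1}


def shared_blocks(a, b, size=BLOCK):
    """Return the set of blocks present in both dumps, at any block offset."""
    return set(split_blocks(a, size)) & set(split_blocks(b, size))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python blocks.py ad8-eli-bad.bin ad10-eli-good.bin
    with open(sys.argv[1], "rb") as f:
        first = f.read()
    with open(sys.argv[2], "rb") as f:
        second = f.read()

    for blk, n in repeated_blocks(first).items():
        print(f"{blk.hex()} repeated {n} times in {sys.argv[1]}")
    for blk in shared_blocks(first, second):
        print(f"{blk.hex()} appears in both dumps")
```

It only looks at blocks aligned to 16-byte boundaries from the start of each
dump, which matches how the repeats were described above. A repeat that is
shifted by a few bytes would not be reported.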


