Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 3 Mar 2014 13:56:17 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Andriy Gapon" <avg@FreeBSD.org>, <freebsd-current@FreeBSD.org>
Cc:        "Justin T. Gibbs" <gibbs@FreeBSD.org>
Subject:   Re: ZFS secondarycache on SSD problem on r255173
Message-ID:  <94E20B469CD448548983C42DD0109A30@multiplay.co.uk>
References:  <20131016080100.GA27758@hell.ukr.net> <3A44A8F6-8B62-4A23-819D-B91A3E6E5EF9@freebsd.org> <E5E6AB7C-C067-4B92-8A38-9DD811011D6F@FreeBSD.org> <7059AA6DCC0D46B8B1D33FC883C31643@multiplay.co.uk> <20131017061248.GA15980@hell.ukr.net> <326B470C65A04BC4BC83E118185B935F@multiplay.co.uk> <20131017073925.GA34958@hell.ukr.net> <2AFE1CBD9B124E3AB9E05A4E483CCE03@multiplay.co.uk> <20131018080148.GA75226@hell.ukr.net> <256B2E5A0BA44DCBB45BB3F3E820E190@multiplay.co.uk> <20131018144524.GA30018@hell.ukr.net> <4459A6FAB7B8445C97CCB9EFF34FD4F0@multiplay.co.uk> <53144891.9050001@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
----- Original Message ----- 
From: "Andriy Gapon" <avg@FreeBSD.org>


> on 18/10/2013 17:57 Steven Hartland said the following:
>> I think we we may well need the following patch to set the minblock
>> size based on the vdev ashift and not SPA_MINBLOCKSIZE.
>> 
>> svn diff -x -p sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
>> ===================================================================
>> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (revision 256554)
>> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c        (working copy)
>> @@ -5147,7 +5147,7 @@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)
>>        len = l2hdr->b_asize;
>>        cdata = zio_data_buf_alloc(len);
>>        csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
>> -           cdata, l2hdr->b_asize, (size_t)SPA_MINBLOCKSIZE);
>> +           cdata, l2hdr->b_asize, (size_t)(1ULL <<
>> l2hdr->b_dev->l2ad_vdev->vdev_ashift));
>> 
>>        if (csize == 0) {
>>                /* zero block, indicate that there's nothing to write */
> 
> 
> This is a rather old thread and change, but I think that I have identified
> another problem with 4KB cache devices.
> 
> I noticed that on some of our systems we were getting a clearly abnormal number
> of l2arc checksum errors accounted in l2_cksum_bad.  The hardware appeared to be
> in good health.  Using DTrace I noticed that the data seemed to be overwritten
> with other data.  After more DTrace analysis I observed that sometimes
> l2arc_write_buffers() would advance l2ad_hand by more than target_sz.
> This meant that l2arc_write_buffers() would write beyond a region cleared by
> l2arc_evict() and thus overwrite data belonging to non-evicted buffers.  Havoc
> ensues.
> 
> The cache devices in question are all SSDs with logical sector size of 4KB.
> I am not sure about other ZFS platforms, but on FreeBSD this fact is detected
> and ashift of 12 is used for the cache vdevs.
> 
> Looking at l2arc_write_buffers() code you can see that it properly accounts for
> ashift when actually writing buffers and advancing l2ad_hand:
>                        /*
>                         * Keep the clock hand suitably device-aligned.
>                         */
>                        buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);
>                        write_psize += buf_p_sz;
>                        dev->l2ad_hand += buf_p_sz;
> 
> But the same is not done when selecting buffers to be written and checking that
> target_sz is not exceeded.
> So, if ARC contains a lot of buffers smaller than 4K that means that an aligned
> on-disk size of the L2ARC buffers could be quite larger than their non-aligned size.
> 
> I propose the following patch which has been tested and seems to fix the problem
> without introducing any side effects:
> https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size.diff
> https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size

Looks good to me.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?94E20B469CD448548983C42DD0109A30>