Date:      Tue, 8 Dec 2015 14:17:11 -0700
From:      Jim Harris <jim.harris@gmail.com>
To:        Steven Hartland <steven@multiplay.co.uk>
Cc:        svn-src-head@freebsd.org
Subject:   Re: svn commit: r290199 - in head/sys/dev: nvd nvme
Message-ID:  <CAJP=Hc9e0eLGZLyu30bnTqw=uZgE5tc3jzrvt4oU580Dr1xu+A@mail.gmail.com>
In-Reply-To: <5667422C.1050806@freebsd.org>
References:  <201510301635.t9UGZI0F085365@repo.freebsd.org> <5667422C.1050806@freebsd.org>

On Tue, Dec 8, 2015 at 1:48 PM, Steven Hartland <steven@multiplay.co.uk>
wrote:

> Hi Jim, could you let me know the use case for exposing the controller
> stripe size as the disk stripe size, as done by this commit?
>
> I ask because it actually causes problems for ZFS, which has checks to
> ensure zpools perform optimally by configuring ashift to match the
> stripesize when one is reported.
>
> This is usually fine, since the stripe size typically reports the device's
> physical block size while the sectorsize is its logical block size.
> Unfortunately ashift is currently limited to 13 (8KB), so when nvme
> reports 128KB it is clamped to 8KB, and every subsequent zpool status
> then warns about non-optimal performance.
>
> Before I look to fix one or the other, I wanted to fully understand the
> reasoning behind how nvme behaves here.
>

Some Intel NVMe controllers have a slow path for I/Os that span a 128KB
stripe boundary.  The FreeBSD NVMe driver checks for this condition and
splits such I/Os internally, to ensure we do not hit this slow path.
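
As a rough illustration only (this is not the driver's actual code, and the
function name is mine), the boundary check amounts to counting how many
stripe-sized chunks an I/O touches; anything above one means the driver
would have to split it before submission:

#include <stdint.h>

/*
 * Sketch: count how many stripe-sized chunks an I/O touches.  A result
 * greater than 1 means the I/O spans a stripe boundary and would need
 * to be split.  Assumes stripesize is a power of two (128KB on the
 * affected controllers).
 */
static int
stripe_span_count(uint64_t offset, uint64_t length, uint32_t stripesize)
{
	uint64_t first, last;

	if (stripesize == 0 || length == 0)
		return (1);

	first = offset / stripesize;
	last = (offset + length - 1) / stripesize;

	return ((int)(last - first) + 1);
}

For example, a 192KB read starting at offset 64KB touches stripes 0 and 1
and would be issued as two child I/Os.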

The idea behind reporting the stripe size up through GEOM was to provide a
hint to upper layers, especially for file system layout - in hopes of
reducing the number of I/Os that need to be split.

Based on your findings, limiting the stripe size reported up through GEOM
to 4KB would be OK.  This may cause a small number of additional I/Os to
require splitting, but the NVMe I/O path is very efficient, so these extra
I/Os would make very little (if any) difference in performance or CPU
utilization.
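
To make the ashift interaction concrete, here is an illustrative sketch
(constants and names simplified; not the actual vdev_geom.c / spa.h code)
of how ashift is derived from the reported stripe size and clamped.  A
128KB stripesize gives log2 = 17, which clamps to the 13 (8KB) cap and
triggers the non-optimal warning; capping the reported value at 4KB gives
12 and stays in range:

#include <stdint.h>

#define	MIN_ASHIFT	9	/* 512-byte logical sectors */
#define	MAX_ASHIFT	13	/* the 8KB cap mentioned in this thread */

/* log2 of a power-of-two value (illustrative helper, not ZFS code). */
static int
log2_u32(uint32_t v)
{
	int b = -1;

	while (v != 0) {
		b++;
		v >>= 1;
	}
	return (b);
}

/*
 * Roughly what happens when a vdev is opened: prefer the reported
 * stripesize over the sectorsize, take its log2, and clamp.  A 128KB
 * stripesize yields 17, which clamps to 13 (8KB) and flags the pool as
 * configured with a non-optimal ashift; 4KB yields 12 and stays in range.
 */
static int
derive_ashift(uint32_t sectorsize, uint32_t stripesize)
{
	uint32_t size = (stripesize > sectorsize) ? stripesize : sectorsize;
	int ashift = log2_u32(size);

	if (ashift < MIN_ASHIFT)
		ashift = MIN_ASHIFT;
	if (ashift > MAX_ASHIFT)
		ashift = MAX_ASHIFT;
	return (ashift);
}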

-Jim




> On 30/10/2015 16:35, Jim Harris wrote:
>
>> Author: jimharris
>> Date: Fri Oct 30 16:35:18 2015
>> New Revision: 290199
>> URL: https://svnweb.freebsd.org/changeset/base/290199
>>
>> Log:
>>    nvd, nvme: report stripesize through GEOM disk layer
>>       MFC after:        3 days
>>    Sponsored by:        Intel
>>
>> Modified:
>>    head/sys/dev/nvd/nvd.c
>>    head/sys/dev/nvme/nvme.h
>>    head/sys/dev/nvme/nvme_ns.c
>>
>> Modified: head/sys/dev/nvd/nvd.c
>>
>> ==============================================================================
>> --- head/sys/dev/nvd/nvd.c      Fri Oct 30 16:06:34 2015        (r290198)
>> +++ head/sys/dev/nvd/nvd.c      Fri Oct 30 16:35:18 2015        (r290199)
>> @@ -279,6 +279,7 @@ nvd_new_disk(struct nvme_namespace *ns,
>>         disk->d_sectorsize = nvme_ns_get_sector_size(ns);
>>         disk->d_mediasize = (off_t)nvme_ns_get_size(ns);
>>         disk->d_delmaxsize = (off_t)nvme_ns_get_size(ns);
>> +       disk->d_stripesize = nvme_ns_get_stripesize(ns);
>>         if (TAILQ_EMPTY(&disk_head))
>>                 disk->d_unit = 0;
>>
>> Modified: head/sys/dev/nvme/nvme.h
>>
>> ==============================================================================
>> --- head/sys/dev/nvme/nvme.h    Fri Oct 30 16:06:34 2015        (r290198)
>> +++ head/sys/dev/nvme/nvme.h    Fri Oct 30 16:35:18 2015        (r290199)
>> @@ -870,6 +870,7 @@ const char *        nvme_ns_get_serial_number(s
>>   const char *  nvme_ns_get_model_number(struct nvme_namespace *ns);
>>   const struct nvme_namespace_data *
>>                 nvme_ns_get_data(struct nvme_namespace *ns);
>> +uint32_t       nvme_ns_get_stripesize(struct nvme_namespace *ns);
>>
>>   int nvme_ns_bio_process(struct nvme_namespace *ns, struct bio *bp,
>>                             nvme_cb_fn_t cb_fn);
>>
>> Modified: head/sys/dev/nvme/nvme_ns.c
>>
>> ==============================================================================
>> --- head/sys/dev/nvme/nvme_ns.c Fri Oct 30 16:06:34 2015        (r290198)
>> +++ head/sys/dev/nvme/nvme_ns.c Fri Oct 30 16:35:18 2015        (r290199)
>> @@ -210,6 +210,13 @@ nvme_ns_get_data(struct nvme_namespace *
>>         return (&ns->data);
>>   }
>>
>> +uint32_t
>> +nvme_ns_get_stripesize(struct nvme_namespace *ns)
>> +{
>> +
>> +       return (ns->stripesize);
>> +}
>> +
>>   static void
>>   nvme_ns_bio_done(void *arg, const struct nvme_completion *status)
>>   {
>>
>>
>


