Date:      Sun, 1 Jun 2014 14:27:42 -0700
From:      Matthew Ahrens <mahrens@delphix.com>
To:        Nathan Whitehorn <nwhitehorn@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: fdisk(8) vs gpart(8), and gnop
Message-ID:  <CAJjvXiFAX7N-30g0OZ6idqLnyJww5dsyhGfLj6nYwKs9Xp--1g@mail.gmail.com>
In-Reply-To: <538B4FD7.4090000@freebsd.org>
References:  <20140601004242.GA97224@bewilderbeast.blackhelicopters.org> <CAOjFWZ5N9FGwgSz0_YFNQjavzdJDitRn52VKn4ipW1ddj6-weQ@mail.gmail.com> <BCA9F5D6-3925-4E7E-9082-128652508305@FreeBSD.org> <3D6974D83AE9495E890D9F3CA654FA94@multiplay.co.uk> <538B4CEF.2030801@freebsd.org> <1DB2D63312CE439A96B23EAADFA9436E@multiplay.co.uk> <538B4FD7.4090000@freebsd.org>

On Sun, Jun 1, 2014 at 9:07 AM, Nathan Whitehorn <nwhitehorn@freebsd.org>
wrote:

> On 06/01/14 09:00, Steven Hartland wrote:
>
>>
>> ----- Original Message ----- From: "Nathan Whitehorn" <nwhitehorn@freebsd.org>
>> To: <freebsd-hackers@freebsd.org>; <freebsd-fs@freebsd.org>
>> Sent: Sunday, June 01, 2014 4:55 PM
>> Subject: Re: fdisk(8) vs gpart(8), and gnop
>>
>>
>>> On 06/01/14 08:52, Steven Hartland wrote:
>>>
>>>> ----- Original Message ----- From: "Mark Felder" <feld@freebsd.org>
>>>>
>>>>> On May 31, 2014, at 20:57, Freddie Cash <fjwcash@gmail.com> wrote:
>>>>>
>>>>>> There's a sysctl where you can set the minimum ashift for zfs. Then
>>>>>> you never need to use gnop.
>>>>>>
>>>>>> I believe it's part of 10.0?
>>>>>>
>>>>>
>>>>> I've not seen this yet. What we need is to port the ability to set
>>>>> ashift at pool creation time:
>>>>>
>>>>> $ zpool create -o ashift=12 tank mirror disk1 disk2 mirror disk3 disk4
>>>>>
>>>>> I believe the Linux zfs port has this functionality now, but we still
>>>>> do not.
>>>>>
>>>>
>>>> We don't have that direct option yet but you can achieve the
>>>> same thing by setting: vfs.zfs.min_auto_ashift=12
>>>>
>>> Does anyone have any objections to me changing this default, right
>>> now, today?
>>> -Nathan
>>>
>>
>> I think you will get some objections to that, as it can have quite an
>> impact on the performance of disks with 512-byte sectors, due to the
>> increased overhead of transferring 4k when only 512 is really required.
>> The impact is even more dramatic on RAIDZx.
>>
>> Personally, we run a custom kernel on our machines which has just this
>> change in it to ensure compatibility with future disks, so I can confirm
>> it does indeed have the desired effect :)
>>
>
> So the discussion here is related to what to do about the installer. The
> current ZFS component unconditionally creates gnops all over the place to
> set ashift to 4k. That's across the board worse: it has exactly the
> performance impact of changing the default of this sysctl (whatever that
> is), it can't easily be overridden (which the sysctl can), and it's a
> horrible hack to boot. There are a few options:
>
> 1. Change the default of vfs.zfs.min_auto_ashift
>

This is probably a bad idea -- as others have mentioned, it can drastically
impact space usage and performance on 512B disks, especially when using
small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z.  That said, it
could be a reasonable default for specialized distros that are not used for
these workloads (maybe FreeNAS or PCBSD?).

> 2. Have the same effect but in a vastly worse way by adjusting the
> installer to create gnops
> 3. Have ZFS choose by itself and decide to do that permanently.
>
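
For comparison, here is roughly what options 1 and 2 look like when done by hand on a FreeBSD system that has the sysctl. The disk names ada0/ada1 and the pool name tank are placeholders. These commands create a new pool and will destroy existing data on those disks.

```shell
# Option 1 style: raise the minimum auto ashift, then create the pool.
# (Add the same setting to /etc/sysctl.conf to keep it across reboots.)
sysctl vfs.zfs.min_auto_ashift=12
zpool create tank mirror ada0 ada1

# Option 2 style (the gnop workaround): present 4K-sector providers to ZFS
# at creation time, then remove the gnop layer and re-import the pool.
gnop create -S 4096 ada0
gnop create -S 4096 ada1
zpool create tank mirror ada0.nop ada1.nop
zpool export tank
gnop destroy ada0.nop
gnop destroy ada1.nop
zpool import tank

# Either way, the chosen ashift should be visible in the pool config:
zdb -C tank | grep ashift
```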

If the device reports a 512B sector size, it would be great for ZFS to
assume the device could be lying, and automatically determine the minimum
ashift which gives good performance.  I think this could be done reasonably
well for the common case by doing the following when each 512B-sector
device is added:

1. do random 4KB writes to the disk to determine wIOPS@4K
2. do random 3.5KB writes to the disk to determine wIOPS@3.5K

If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors, otherwise assume 512B
sectors.  (Note: I haven't tried this in practice; we will need to test it
out and perhaps make some tweaks.)
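
The decision step can be sketched as a small function. The sketch assumes the two IOPS figures come from some separate random-write benchmark against the new device, which is not shown. I also assume the 4KB writes are issued at 4KB-aligned offsets. The rationale is that a disk with 4KB physical sectors must read-modify-write to service a 3.5KB write, while a true 512B disk just writes seven whole sectors.

```shell
# Sector-size guess from two measured write-IOPS figures.
# The measurement itself is assumed to happen elsewhere.
#
# usage: guess_ashift WIOPS_4K WIOPS_3_5K   (integer IOPS)
# prints 12 (assume 4KB sectors) or 9 (assume 512B sectors)
guess_ashift() {
    wiops_4k=$1
    wiops_3_5k=$2
    # A 4KB-sector disk needs read-modify-write for 3.5KB writes, so it
    # should complete noticeably more 4KB writes per second than 3.5KB ones.
    if [ "$wiops_4k" -gt "$wiops_3_5k" ]; then
        echo 12
    else
        echo 9
    fi
}

guess_ashift 180 60    # looks like a 4KB-sector disk
guess_ashift 150 160   # looks like a real 512B-sector disk
```

The strict comparison follows the rule as stated. On real hardware, benchmark noise could make the two numbers close. Requiring a minimum margin before choosing 12 is one of the tweaks testing might call for.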

I don't have the time or hardware to implement and test this, but I'd be
happy to mentor or code review.

--matt


>
> Our ATA code is good about reporting block sizes now, so (3) isn't a big
> issue except for the mixed-pool case, which is a huge PITA.
>
> We need to choose one of these. I favor (1).
> -Nathan
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


