Date:      Mon, 2 Jun 2014 18:07:14 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Nathan Whitehorn" <nwhitehorn@freebsd.org>, "Matthew Ahrens" <mahrens@delphix.com>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: fdisk(8) vs gpart(8), and gnop
Message-ID:  <A0BA121A5D6941E2B0A3FA41948A2F10@multiplay.co.uk>
References:  <20140601004242.GA97224@bewilderbeast.blackhelicopters.org> <CAOjFWZ5N9FGwgSz0_YFNQjavzdJDitRn52VKn4ipW1ddj6-weQ@mail.gmail.com> <BCA9F5D6-3925-4E7E-9082-128652508305@FreeBSD.org> <3D6974D83AE9495E890D9F3CA654FA94@multiplay.co.uk> <538B4CEF.2030801@freebsd.org> <1DB2D63312CE439A96B23EAADFA9436E@multiplay.co.uk> <538B4FD7.4090000@freebsd.org> <CAJjvXiFAX7N-30g0OZ6idqLnyJww5dsyhGfLj6nYwKs9Xp--1g@mail.gmail.com> <538C9207.9040806@freebsd.org>


----- Original Message ----- 
From: "Nathan Whitehorn" <nwhitehorn@freebsd.org>
To: "Matthew Ahrens" <mahrens@delphix.com>
Cc: "freebsd-fs" <freebsd-fs@freebsd.org>; "FreeBSD Hackers" <freebsd-hackers@freebsd.org>; "Steven Hartland" <killing@multiplay.co.uk>
Sent: Monday, June 02, 2014 4:02 PM
Subject: Re: fdisk(8) vs gpart(8), and gnop


> On 06/01/14 14:27, Matthew Ahrens wrote:
>>
>>>> I think you will get some objections to that, as it can have quite an
>>>> impact on the performance for disks which are 512, due to the increased
>>>> overhead of transferring 4k when only 512 is really required. This has
>>>> a more dramatic impact on RAIDZx too.
>>>>
>>>> Personally we run a custom kernel on our machines which has just this
>>>> change in it to ensure compatibility with future disks, so I can confirm
>>>> it does indeed have the desired effect :)
>>>>
>>> So the discussion here is related to what to do about the installer. The
>>> current ZFS component unconditionally creates gnops all over the place to
>>> set ashift to 4k. That's across the board worse: it has exactly the
>>> performance impact of changing the default of this sysctl (whatever that
>>> is), it can't easily be overridden (which the sysctl can), and it's a
>>> horrible hack to boot. There are a few options:
>>>
>>> 1. Change the default of vfs.zfs.min_auto_ashift
>>>
>> This is probably a bad idea -- as others have mentioned, it can drastically
>> impact space usage and performance on 512B disks, especially when using
>> small ZFS blocks (e.g. for databases or VDI) and/or RAID-Z.  That said, it
>> could be a reasonable default for specialized distros that are not used for
>> these workloads (maybe FreeNAS or PCBSD?).
>>
>>> 2. Have the same effect but in a vastly worse way by adjusting the
>>> installer to create gnops
>>> 3. Have ZFS choose by itself and decide to do that permanently.
>>>
>> If the device reports a 512B sector size, it would be great for ZFS to
>> assume the device could be lying, and automatically determine the minimum
>> ashift which gives good performance.  I think this could be done reasonably
>> well for the common case by doing the following when each 512B-sector
>> device is added:
>>
>> 1. do random 4KB writes to the disk to determine wIOPS@4K
>> 2. do random 3.5KB writes to the disk to determine wIOPS@3.5K
>>
>> If wIOPS@4K > wIOPS@3.5K, assume 4KB sectors, otherwise assume 512B
>> sectors.  (Note: I haven't tried this in practice; we will need to test it
>> out and perhaps make some tweaks.)
>>
>> I don't have the time or hardware to implement and test this, but I'd be
>> happy to mentor or code review.
>>
>> --matt
>
> I think we basically don't have any lying disks anymore. The ATA code does
> a very good job of this -- most tell the truth, but in an odd way that gets
> reported up the stack. ada(4) has a quirks table for the ones that do not.
> If this is the only concern, then we should just stop telling people to
> worry about this.
>
> My bigger concern is this pool upgrade one -- what if someone puts in a 4K disk in the future?

That's very much not the case, I'm afraid. I try to add quirks for disks as
they are reported, but there are always going to be quite a few which are
wrong until manufacturers stop making their FW lie :(
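
For disks whose firmware does lie, the probe Matthew describes in the quoted
text (compare random 4KB write IOPS against random 3.5KB write IOPS) could be
sketched roughly as below. This is only an illustration in Python, not ZFS
code: the function names, the sample count, and the use of O_SYNC are all
assumptions, and Matthew notes the heuristic itself has not been tried in
practice.

```python
import os
import random
import time

SECTOR_512 = 512
SECTOR_4K = 4096


def measure_write_iops(path, io_size, span_bytes, count=200, seed=0):
    """Time `count` synchronous writes of `io_size` bytes at random offsets.

    Hypothetical helper. Offsets are 4 KB aligned, so a 4 KB write covers a
    whole 4 KB sector while a 3.5 KB write leaves one partially written.
    WARNING: overwrites data in `path`; use a scratch device or file.
    `span_bytes` must be at least 4096.
    """
    rng = random.Random(seed)
    buf = b"\0" * io_size
    slots = span_bytes // SECTOR_4K
    fd = os.open(path, os.O_WRONLY | os.O_SYNC)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pwrite(fd, buf, rng.randrange(slots) * SECTOR_4K)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


def guess_sector_size(wiops_4k, wiops_3_5k):
    """Apply the rule from the thread: 4K faster than 3.5K implies 4K sectors."""
    return SECTOR_4K if wiops_4k > wiops_3_5k else SECTOR_512


def ashift_for(sector_size):
    """ashift is log2 of the sector size (512 -> 9, 4096 -> 12)."""
    return sector_size.bit_length() - 1
```

In use, one would call measure_write_iops() twice on the same scratch device
(io_size 4096, then 3584) and feed both results to guess_sector_size(). Real
measurements are noisy, so a margin or repeated runs would probably be needed
before trusting the comparison.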

We really need a system which can be user updated for this sort of thing,
but I've not had any time to even think about that, I'm afraid. IIRC scottl
has ideas in this area too.
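
To make the idea of a user-updatable quirks list concrete, here is a minimal
sketch in Python. Everything in it is hypothetical: the file format (a model
glob pattern followed by the real sector size), the function names, and the
sample pattern are assumptions for illustration only, not how ada(4) stores
its quirks.

```python
from fnmatch import fnmatchcase


def parse_quirks(text):
    """Parse lines of '<model glob>  <physical sector size>'.

    '#' starts a comment; blank lines are skipped. Malformed lines raise
    ValueError. The format itself is an assumption made for this sketch.
    """
    quirks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, size = line.rsplit(None, 1)
        quirks.append((pattern.strip(), int(size)))
    return quirks


def physical_sector_size(model, reported, quirks):
    """Return the quirk override for `model`, else the size the disk reported."""
    for pattern, size in quirks:
        if fnmatchcase(model, pattern):
            return size
    return reported


# Example user-maintained list (illustrative pattern only).
USER_QUIRKS = parse_quirks("""
# model pattern        physical sector size
WDC WD????RS*          4096
""")
```

A disk whose model string matches a pattern gets the listed size, and
anything else keeps whatever it reported, so an out-of-date list fails in
the same way as having no list at all.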

    Regards
    Steve

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?A0BA121A5D6941E2B0A3FA41948A2F10>