Date:      Tue, 20 Jun 2017 17:16:29 +0000
From:      "Caza, Aaron" <Aaron.Caza@ca.weatherford.com>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   RE: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs
Message-ID:  <36ec1fb476e647a78c19463eea493859@DM2PR58MB013.032d.mgd.msft.net>

> -----Original Message-----

> From: Steven Hartland [mailto:killing@multiplay.co.uk]

> Sent: Monday, June 19, 2017 7:32 PM

> To: freebsd-fs@freebsd.org

> Subject: Re: FreeBSD 11.1 Beta 2 ZFS performance degradation on SSDs

>

> On 20/06/2017 01:57, Caza, Aaron wrote:

> >> vfs.zfs.min_auto_ashift is a sysctl only, it's not a tunable, so setting it in /boot/loader.conf won't have any effect.

> >>

> >> There's no need for it to be a tunable as it only affects vdevs when they are created, which can only be done once the system is running.

> >>

> > The bsdinstall install script itself sets vfs.zfs.min_auto_ashift=12 in /boot/loader.conf yet, as you say, this doesn't do anything.  As a user, it's a bit confusing to see it in /boot/loader.conf but then do a 'sysctl -a | grep min_auto_ashift' and see 'vfs.zfs.min_auto_ashift: 9', so I felt it was worth mentioning.

> Absolutely, patch is in review here:

> https://reviews.freebsd.org/D11278



Thanks for taking care of this Steve - appreciated.
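
For anyone else who trips over this: since it's a runtime sysctl and not a loader tunable, a minimal way to check and persist it - with the caveat Steven notes, that it only affects vdevs created after it's set - is:

    sysctl vfs.zfs.min_auto_ashift         # shows the current value (9 by default)
    sysctl vfs.zfs.min_auto_ashift=12      # force 4k (2^12) minimum ashift for newly created vdevs
    echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf    # persist across reboots

That matches what I ended up doing via /etc/sysctl.conf, as described further down.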



> >

> >> You don't explain why you believe there is degrading performance?

> > As I related in my post, with my previous FreeBSD 11-Stable setup on this same hardware I was seeing 950MB/s after bootup.  I've been posting to the freebsd-hackers list, but have moved to the freebsd-fs list as this seemingly has something to do with FreeBSD+ZFS behavior, and user Jov had previously cross-posted to this list for me:

> > https://docs.freebsd.org/cgi/getmsg.cgi?fetch=2905+0+archive/2017/freebsd-fs/20170618.freebsd-fs

> >

> > I've been using FreeBSD+ZFS ever since FreeBSD 9.0, admittedly with a different zpool layout, which is essentially as follows:

> >      adaXp1 - gptboot loader

> >      adaXp2 - 1GB UFS partition

> >      adaXp3 - GPT partition with a UUID label hosting a GEOM ELI layer using NULL encryption to emulate 4k sectors (done before ashift was an option)

> >

> > So, adaXp3 would show up as something like the following:

> >

> >    /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8

> >    /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli
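
To spell that out, the geli layer is built roughly as follows - the exact flags I originally used aren't shown in this thread, so treat the below as illustrative rather than my literal commands:

    # NULL (no-op) encryption, forcing a 4096-byte sector size on each labeled partition
    geli init -e NULL -s 4096 /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8
    geli attach /dev/gpt/b62feb20-554b-11e7-989b-000bab332ee8
    # (repeat for the second disk's partition)
    # the mirror is then created on the resulting .eli providers:
    zpool create wwbase mirror gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli gpt/4c596d40-554c-11e7-beb1-002590766b41.eli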

> >

> > Then, the zpool mirrored pair would be something like the following:

> >

> >    pool: wwbase

> >   state: ONLINE

> >    scan: none requested

> > config:

> >

> >          NAME                                              STATE     READ WRITE CKSUM
> >          wwbase                                            ONLINE       0     0     0
> >            mirror-0                                        ONLINE       0     0     0
> >              gpt/b62feb20-554b-11e7-989b-000bab332ee8.eli  ONLINE       0     0     0
> >              gpt/4c596d40-554c-11e7-beb1-002590766b41.eli  ONLINE       0     0     0

> >

> > Using the above zpool configuration on this same hardware on FreeBSD 11-Stable, I was seeing read speeds of 950MB/s using dd (dd if=/testdb/test of=/dev/null bs=1m).  However, after anywhere from 5 to 24 hours, performance would degrade down to less than 100MB/s for unknown reasons - the server was essentially idle, so it's a mystery to me why this occurs.  I'm seeing this behavior on FreeBSD 10.3R amd64 up through FreeBSD 11.0-Stable.  As I wasn't making any headway in resolving this, I opted today to use the FreeBSD 11.1 Beta 2 memstick image to create a basic FreeBSD 11.1 Beta 2 amd64 Auto(ZFS) installation to see if this would resolve the original issue I was having, as I would be using ZFS-on-root and vfs.zfs.min_auto_ashift=12 instead of my own emulation as described above.  However, instead of seeing the 950MB/s that I expected - which is what I see with my alternative emulation - I'm seeing 450MB/s.  I've yet to determine if this zpool setup as done by the bsdinstall script will suffer from the original performance degradation I observed.

> >

> >> What is the exact dd command you're running, as that can have a huge impact on performance?

> > dd if=/testdb/test of=/dev/null bs=1m

> >

> > Note that the file /testdb/test is 16GB, twice the size of the RAM available in this system.  The /testdb directory is a ZFS file system with recordsize=8k, chosen as ultimately it's intended to host a PostgreSQL database which uses an 8k page size.
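
For reference, the dataset and test file were created along these lines - a sketch only, since the exact commands aren't shown above and the pool name (zroot, from the Auto(ZFS) install) is assumed:

    # 8k recordsize to match the intended PostgreSQL page size
    zfs create -o recordsize=8k -o mountpoint=/testdb zroot/testdb
    # 16GB of incompressible test data (16000 x 1m blocks), twice the system RAM
    dd if=/dev/random of=/testdb/test bs=1m count=16000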

> >

> > My understanding is that a ZFS mirrored pool with two drives can read from both drives at the same time, hence double the speed.  This is what I've actually observed ever since I first started using this in FreeBSD 9.0 with the GEOM ELI 4k sector emulation.  This is actually my first time using FreeBSD's native installer's Auto(ZFS) setup with 4k sectors emulated using vfs.zfs.min_auto_ashift=12.  As it's a ZFS mirrored pool, I still expected it to be able to read at double speed as it does with the GEOM ELI 4k sector emulation; however, it does not.
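
One quick way to see whether reads are actually being spread across both sides of the mirror is to watch the per-vdev numbers while the dd runs - the pool name here is a placeholder:

    zpool iostat -v <pool> 1

together with the per-disk view from gstat -pd shown below.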

> >

>> On 19/06/2017 23:14, Caza, Aaron wrote:

>>> I've been having a problem with FreeBSD ZFS SSD performance inexplicably degrading after < 24 hours of uptime, as described in a separate e-mail thread.  In an effort to get down to basics, I've now performed a ZFS-on-Root install of FreeBSD 11.1 Beta 2 amd64 using the default Auto(ZFS) install and the default 4k sector emulation setting (vfs.zfs.min_auto_ashift=12), with no swap and no encryption.

>>>

>>> Firstly, vfs.zfs.min_auto_ashift=12 is set correctly in the /boot/loader.conf file, but doesn't appear to work, because when I log in and do "sysctl -a | grep min_auto_ashift" it's set to 9 and not 12 as expected.  I tried setting it to vfs.zfs.min_auto_ashift="12" in /boot/loader.conf but that didn't make any difference, so I finally just added it to /etc/sysctl.conf where it seems to work.  So, something needs to be changed to make this functionally work correctly.

>>>

>>> Next, after reboot I was expecting somewhere in the neighborhood of 950MB/s from the ZFS mirrored zpool of 2 Samsung 850 Pro 256GB SSDs that I'm using, as I was seeing this before with my previous FreeBSD 11-Stable setup which, admittedly, is set up differently from the way the bsdinstall script does it.  However, I'm seeing half that on bootup.

>>>

>>> Performance result:

>>> Starting 'dd' test of large file...please wait

>>> 16000+0 records in

>>> 16000+0 records out

>>> 16777216000 bytes transferred in 37.407043 secs (448504207 bytes/sec)

> Can you show the output from gstat -pd during this DD please.



dT: 1.001s  w: 1.000s

L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0   4318   4318  34865    0.0      0      0    0.0      0      0    0.0   14.2| ada0
    0   4402   4402  35213    0.0      0      0    0.0      0      0    0.0   14.4| ada1



dT: 1.002s  w: 1.000s

L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    1   4249   4249  34136    0.0      0      0    0.0      0      0    0.0   14.1| ada0
    0   4393   4393  35287    0.0      0      0    0.0      0      0    0.0   14.5| ada1



Every now and again, I was seeing d/s hits, which I understand to be TRIM operations - it would briefly show 16 then go back to 0.



test@f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m

16000+0 records in

16000+0 records out

16777216000 bytes transferred in 43.447343 secs (386150561 bytes/sec)
test@f111beta2:~ # uptime
 9:54AM  up 19:38, 2 users, load averages: 2.92, 1.01, 0.44
root@f111beta2:~ # dd if=/testdb/test of=/dev/null bs=1m

16000+0 records in

16000+0 records out

16777216000 bytes transferred in 236.097011 secs (71060688 bytes/sec)
test@f111beta2:~ # uptime
10:36AM  up 20:20, 2 users, load averages: 0.90, 0.62, 0.36



As can be seen in the above 'dd' test results, I'm back to seeing the original issue I reported on freebsd-hackers - performance inexplicably degrading after < 24 hours of uptime, going from ~386MB/sec to ~71MB/sec - and this server isn't doing anything other than running this test hourly.



Please note that the gstat -pd output above was captured after the performance degradation hit.  Prior to this, I was seeing %busy of ~60%.  In this particular instance the performance degradation hit ~20hrs into the test, but I've seen it hit as soon as ~5hrs in.



Previously, Allan Jude had advised setting vfs.zfs.trim.enabled=0 to see if this changed the behavior.  I did this; however, it had no impact - but that was when I was using the GEOM ELI 4k sector emulation and not the ashift 4k sector emulation.  The GEOM ELI 4k sector emulation does not appear to work with TRIM operations, as gstat -d in that case always stayed at 0 ops/s.  I can try disabling TRIM, but did not want to reboot the server to restart the test in case there was some additional info worth capturing.
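
If I do retest with TRIM off, my understanding is that vfs.zfs.trim.enabled is a boot-time tunable rather than a runtime sysctl, so the change would go roughly like this:

    # add to /boot/loader.conf, then reboot
    vfs.zfs.trim.enabled=0
    # verify after boot
    sysctl vfs.zfs.trim.enabled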



I have captured an hourly log that can be provided, containing the initial dmesg, zpool status, zfs list, and zfs get all, along with an hourly capture of the results of running the above 'dd' test with the associated zfs-stats -a and sysctl -a output, though it's currently 2.8MB and hence too large to post to this list.
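
The hourly capture is driven by a simple script along these lines (a sketch - the log path and exact ordering here are illustrative, not my literal script):

    #!/bin/sh
    # append a timestamped dd run plus ZFS/system state to a log
    LOG=/var/log/zfs-perf-hourly.log
    {
      date
      uptime
      dd if=/testdb/test of=/dev/null bs=1m 2>&1
      zpool status
      zfs list
      zfs get all
      zfs-stats -a
      sysctl -a
    } >> "$LOG" 2>&1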



Also, there seems to be a problem with my freebsd-fs subscription, as I'm not getting e-mail notifications despite having submitted a subscription request, so apologies for my slow responses.



--

Aaron


Aaron Caza
Senior Server Developer
Weatherford SLS Canada R&D Group
Weatherford | 1620 27 Ave NE | #124B | Calgary | AB | T2E 8W4
Direct +1 (403) 693-7773
Aaron.Caza@ca.weatherford.com | www.weatherford.com




This message may contain confidential and privileged information. If it has been sent to you in error, please reply to advise the sender of the error and then immediately delete it. If you are not the intended recipient, do not read, copy, disclose or otherwise use this message. The sender disclaims any liability for such unauthorized use. PLEASE NOTE that all incoming e-mails sent to Weatherford e-mail accounts will be archived and may be scanned by us and/or by external service providers to detect and prevent threats to our systems, investigate illegal or inappropriate behavior, and/or eliminate unsolicited promotional e-mails (spam). This process could result in deletion of a legitimate e-mail before it is read by its intended recipient at our organization. Moreover, based on the scanning results, the full text of e-mails and attachments may be made available to Weatherford security and other personnel for review and appropriate action. If you have any concerns about this process, please contact us at dataprivacy@weatherford.com.


