Date:      Wed, 8 May 2013 14:45:49 -0700
From:      Freddie Cash <fjwcash@gmail.com>
To:        Brendan Gregg <brendan.gregg@joyent.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Strange slowdown when cache devices enabled in ZFS
Message-ID:  <CAOjFWZ6CzbYSSnso-rqDWaA=VxcDBx+KG=6KX3oT2ijbECm=sQ@mail.gmail.com>
In-Reply-To: <CA+XzFFgG+Js2w+HJFXXd=opsdnR7Z0n1ThPPtMM1qFsPg-dsaQ@mail.gmail.com>
References:  <CA+XzFFgG+Js2w+HJFXXd=opsdnR7Z0n1ThPPtMM1qFsPg-dsaQ@mail.gmail.com>

On Wed, May 8, 2013 at 2:35 PM, Brendan Gregg <brendan.gregg@joyent.com> wrote:

> Freddie Cash wrote (Mon Apr 29 16:01:55 UTC 2013):
> |
> | The following settings in /etc/sysctl.conf prevent the "stalls"
> completely,
> | even when the L2ARC devices are 100% full and all RAM is wired into the
> | ARC.  Been running without issues for 5 days now:
> |
> | vfs.zfs.l2arc_norw=0                  # Default is 1
> | vfs.zfs.l2arc_feed_again=0            # Default is 1
> | vfs.zfs.l2arc_noprefetch=0            # Default is 0
> | vfs.zfs.l2arc_feed_min_ms=1000        # Default is 200
> | vfs.zfs.l2arc_write_boost=320000000   # Default is 8 MBps
> | vfs.zfs.l2arc_write_max=160000000     # Default is 8 MBps
> |
> | With these settings, I'm also able to expand the ARC to use the full 128
> GB
> | of RAM in the biggest box, and to use both L2ARC devices (60 GB in
> total).
> | And, can set primarycache and secondarycache to all (the default) instead
> | of just metadata.
> |[...]
>
> The thread earlier described a 100% CPU-bound l2arc_feed_thread, which
> could be caused by these settings:
>
> vfs.zfs.l2arc_write_boost=320000000   # Default is 8 MBps
> vfs.zfs.l2arc_write_max=160000000     # Default is 8 MBps
>
> If I'm reading that correctly, it's increasing the write max and boost to
> 160 Mbytes/sec and 320 Mbytes/sec. To satisfy these, the L2ARC must scan
> memory from the tail of the ARC lists, lists which may be composed of tiny
> buffers (e.g., 8 Kbytes). Increasing that scan 20-fold could saturate a CPU.
> And if it doesn't find many bytes to write out, it will rescan the same
> buffers on the next interval, wasting CPU cycles.
>
> I understand the intent was probably to warm up the L2ARC faster. There is
> no easy way to do this: you are bounded by the throughput of random reads
> from the pool disks.
>
> Random read workloads usually have a 4 - 16 Kbyte record size. The l2arc
> feed thread can't eat uncached data faster than the random reads can be
> read from disk. Therefore, at 8 Kbytes, you need at least 1,000 random read
> disk IOPS to achieve a rate of 8 Mbytes/sec from the ARC list tails, which, for
> rotational disks performing roughly 100 random IOPS (use a different rate
> if you like), means about a dozen disks - depending on the ZFS RAID config.
> All to feed at 8 Mbytes/sec. This is why 8 Mbytes/sec (plus the boost) is
> the default.
>
> To feed at 160 Mbytes/sec, with an 8 Kbyte recsize, you'll need at least
> 20,000 random read disk IOPS. How many spindles does that take? A lot. Do
> you have a lot?
>
>
45x 2 TB SATA hard drives, configured in raidz2 vdevs of 6 disks each for a
total of 7 vdevs (with a few spare disks).  With 2x SSD for log+OS and 2x
SSD for cache.

With plans to expand that out with another 45-disk JBOD next summer-ish
(2014).
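
(As a rough sanity check on the arithmetic quoted above, assuming an 8 Kbyte
record size and roughly 100 random read IOPS per spindle, both just ballpark
figures, and taking the 8 MBps default as 8388608 bytes, bc(1) gives:

$ echo "8388608 / 8192" | bc        # IOPS needed to feed the 8 MB/s default
1024
$ echo "160000000 / 8192" | bc      # IOPS needed to feed 160 MB/s
19531
$ echo "19531 / 100" | bc           # rough spindle count at ~100 IOPS each
195

so the 160 MB/s write_max asks for far more random read IOPS than seven
raidz2 vdevs of spinning disks are likely to deliver.)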

With the settings above, I get 120 MBps of writes to the L2ARC until each
SSD is over 90% full (after boot); then it settles around 5-10 MBps while
receiving snapshots from the other 3 servers.
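
(For anyone who wants to watch the feed rate themselves, the cache vdevs
show up in zpool iostat; with "tank" standing in for whatever the pool is
actually called:

$ zpool iostat -v tank 1

The write bandwidth reported for the cache devices is the per-second L2ARC
feed rate.)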

I guess I could change the settings to make the _boost 100-odd MBps and
leave the _max at the default.  I'll play with the l2arc_write_* settings
to see if that makes a difference with l2arc_norw enabled.
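
(A minimal sketch of that experiment, with purely illustrative values,
8388608 being the stock 8 MB default quoted above, and assuming these can
be flipped at runtime with sysctl(8) rather than only via /etc/sysctl.conf:

# sysctl vfs.zfs.l2arc_write_boost=100000000    (about 100 MB/s, warm-up only)
# sysctl vfs.zfs.l2arc_write_max=8388608        (back to the 8 MB default)
# sysctl vfs.zfs.l2arc_norw=1                   (back to the default norw behaviour)

then watch whether the stalls come back once the cache devices fill.)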

-- 
Freddie Cash
fjwcash@gmail.com


