Date:      Tue, 22 Sep 2015 07:13:35 +1000
From:      Peter Jeremy <peter@rulingia.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: ZFS cpu requirements, with/out compression and/or dedup
Message-ID:  <20150921211335.GB41102@server.rulingia.com>
In-Reply-To: <20150921170216.GA98888@blazingdot.com>
References:  <CAEW%2BogbPswfOWQzbwNZR5qyMrCEfrcSP4Q7%2By4zuKVVD=KNuUA@mail.gmail.com> <alpine.GSO.2.01.1509190843040.1673@freddy.simplesystems.org> <20150921170216.GA98888@blazingdot.com>

On 2015-Sep-21 13:50:38 +0100, krad <kraduk@gmail.com> wrote:
>"It's also 'permanent' in the sense that you have to turn it on with the
>> creation of a dataset and can't disable it without nuking said dataset. "
>
>This is completely untrue, the performance issues with dedup are limited
>to writes only, as it needs to check the DDT table for every write to the
>file system with dedup enabled.

Well, it's partially true.  Once you enable dedup on a dataset, it creates
a DDT and it's not possible to remove the DDT without nuking the dataset.
There are basically 3 operations on a block:
Read a block: DDT is never referenced.
Write a new block: DDT is referenced if dedup is enabled.
Free a block: DDT is always referenced if it exists.
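
As a rough illustration of the three rules above, here is a minimal Python
sketch. It is only a model of the rules as stated in this mail, not ZFS code;
the names Op and touches_ddt are made up for the example.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"
    FREE = "free"


def touches_ddt(op, dedup_enabled, ddt_exists):
    """Return True if the operation has to consult the DDT.

    Hypothetical helper that encodes the three rules listed above.
    """
    if op is Op.READ:
        return False            # reads never reference the DDT
    if op is Op.WRITE:
        return dedup_enabled    # only writes to a dedup-enabled dataset
    if op is Op.FREE:
        return ddt_exists       # frees always check the DDT if one exists
    raise ValueError(f"unknown operation: {op!r}")


# Dedup was enabled in the past and has since been turned off:
for op in Op:
    print(op.value, touches_ddt(op, dedup_enabled=False, ddt_exists=True))
```

The last case is the interesting one: with dedup switched off, writes stop
consulting the DDT but frees still do.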

The usual "fall off a cliff" scenario is when you go to delete a large file
or snapshot on a dataset where dedup has been enabled at some point in the
past, even if it's not enabled now.  Every block in that file or snapshot
is checked against the DDT.  Since the DDT is basically a very large hash
table this entails lots of random I/O.

On 2015-Sep-21 10:10:46 -0400, Quartz <quartz@sneakertech.com> wrote:
>Also, just for reference: according to the specs each entry in the dedup
>table costs about 320 bytes of memory per block of disk. This means that
>AT BEST (assuming ZFS decides to use full 128K blocks in your case)
>you'll need 2.5GB of ram per 1 TB of used space just for the DDT stuff

And at worst, assuming advanced format disks, you'll have 4K blocks and
need 80GB RAM per 1 TB used space.
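
The arithmetic behind both figures can be checked with a few lines of Python.
This is only a back-of-the-envelope sketch: it takes the 320 bytes per entry
quoted above at face value and assumes every block is unique (one DDT entry
per block). The function name ddt_ram_bytes is made up for the example.

```python
DDT_ENTRY_BYTES = 320   # per-entry cost quoted above, taken at face value

TIB = 2**40
GIB = 2**30


def ddt_ram_bytes(used_bytes, block_size):
    """Estimate DDT size, assuming one entry per block and no duplicates."""
    blocks = -(-used_bytes // block_size)   # ceiling division
    return blocks * DDT_ENTRY_BYTES


for block_size in (128 * 1024, 4 * 1024):
    gib = ddt_ram_bytes(TIB, block_size) / GIB
    print(f"{block_size // 1024}K blocks: {gib:.1f} GiB of DDT per TiB used")
```

With 128K blocks this gives 2.5 GiB per TiB and with 4K blocks 80 GiB per TiB,
matching the numbers above.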

In general, the downsides of dedup outweigh the benefits.  If you already
have the data in ZFS, you can use 'zdb -S' to see what effect rebuilding
the pool with dedup enabled would have - how much disk space you will save
and how big the DDT is (and hence how much RAM you will need).  If you can
afford it, make sure you keep good backups, enable dedup and be ready to nuke
the pool and restore from backups if dedup doesn't work out.

On 2015-Sep-21 10:02:16 -0700, Marcus Reid <marcus@blazingdot.com> wrote:
>This is misleading.  lz4 compression is so fast that in the common case
>it _increases_ performance.

This is true of most of the compression algorithms.

>In addition, lz4 has early-abort where it will detect that the data is
>uncompressible, and just write it out when it is instead of compressing
>it.

I'm not sure how lz4 decides data is uncompressible without trying to
compress it.  The way ZFS compression works is that it tries to compress
a block.  Unless the compressed data is small enough to fit into a smaller
block (ie 2:1 compression or better), the uncompressed data is stored.
(And blocks of NULs are "stored" as holes in the file without attempting
compression).  In general, unless you know a dataset will always be
filled with pre-compressed data - videos, non-RAW images, distfiles -
you are better off enabling compression.
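
The "try it and keep whichever is smaller on disk" logic described above can
be sketched roughly as follows. This is a toy model in Python, not the ZFS
implementation: zlib stands in for lz4, the 4K sector size is an assumption,
"fits into a smaller block" is modelled as "needs fewer sectors", and the
names sectors and store_block are made up for the example.

```python
import zlib

SECTOR = 4096   # assumed allocation unit; depends on the pool in practice


def sectors(nbytes):
    """Number of whole sectors needed to hold nbytes."""
    return -(-nbytes // SECTOR)


def store_block(data):
    """Decide how a block would be stored, following the description above.

    Returns a (kind, payload) tuple where kind is "hole", "compressed"
    or "raw".
    """
    if data.count(0) == len(data):
        return ("hole", b"")              # all NULs: no compression attempted
    compressed = zlib.compress(data)      # stand-in for lz4
    if sectors(len(compressed)) < sectors(len(data)):
        return ("compressed", compressed) # saves at least one sector
    return ("raw", data)                  # not worth it: store uncompressed


text = b"freebsd zfs " * 10000
kind, payload = store_block(text)
print(kind, sectors(len(text)), "->", sectors(len(payload)), "sectors")
```

Note that the compressor still runs on incompressible data in this model; the
saving comes from discarding the result, not from skipping the attempt.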

-- 
Peter Jeremy




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150921211335.GB41102>