Date:      Mon, 1 Jun 2015 10:33:07 +0200
From:      Andreas Nilsson <andrnils@gmail.com>
To:        Karli Sjöberg <karli.sjoberg@slu.se>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: [Fwd: Strange networking behaviour in storage server]
Message-ID:  <CAPS9%2BSturmr32jN3d1sfCsQUnyFneSMofT%2BajwqCP=LPg_nseA@mail.gmail.com>
In-Reply-To: <1433146506.14998.177.camel@data-b104.adm.slu.se>
References:  <1433146506.14998.177.camel@data-b104.adm.slu.se>

On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg@slu.se> wrote:

> -------- Forwarded message --------
> > From: Karli Sjöberg <karli.sjoberg@slu.se>
> > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
> > Subject: Strange networking behaviour in storage server
> > Date: Mon, 1 Jun 2015 07:49:56 +0000
> >
> > Hey!
> >
> > So we have this ZFS storage server that was upgraded from 9.3-RELEASE to
> > 10.1-STABLE to overcome two problems: 1) not being able to use SSD
> > drives as L2ARC[1] and 2) not being able to hotswap SATA drives[2].
> >
> > After the upgrade we've noticed very odd networking behaviour: it
> > sends/receives at full speed for a while, then there are a couple of
> > minutes of complete silence where even terminal commands like "ls" just
> > wait before they are executed, and then it starts sending at full speed
> > again. I've linked to a screenshot showing this send-and-pause
> > behaviour. The blue line is the total, green is SMB and turquoise is NFS
> > over jumbo frames. It behaves this way regardless of the protocol.
> >
> > http://oi62.tinypic.com/33xvjb6.jpg
> >
> > The problem is that these pauses can sometimes be so long that
> > connections drop. For example, someone is copying files over SMB or
> > iSCSI and suddenly they get an error message saying that the transfer
> > failed and they have to start over with the file(s). That's horrible!
> >
> > So far NFS has proven to be the most resilient; its stupid-simple
> > nature just waits and resumes the transfer when the pause is over. Kudos
> > for that.
> >
> > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2 and 64GB
> > ECC RAM. The hardware has been ruled out: we happened to have an
> > identical MB and CPU lying around and swapping them in didn't improve
> > things. We have also installed an Intel PRO 100/1000 Quad-port ethernet
> > adapter to test if that would change things, but it hasn't; it still
> > behaves this way.
> >
> > The two built-in NICs are Intel 82574L and the Quad-port NICs are
> > Intel 82571EB, so both are em(4) driven. I happen to know that the em
> > driver has been updated between 9.3 and 10.1. Perhaps that is to blame,
> > but I have no idea.
> >
> > Is there anyone that can make sense of this?
> >
> > [1]:
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> >
> > [2]:
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> >
> > /K
> >
> >
>
> Another observation I've made is that during these pauses, the entire
> system is put on hold; even a ZFS scrub stops and then resumes after a
> while. Looking in top, the system is completely idle.
>
> Normally during scrub, the kernel eats 20-30% CPU, but during a pause,
> even the [kernel] goes down to 0.00%. Makes me think the networking has
> nothing to do with it.
>
> What's to blame then? ZFS?
>
> /K
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

Hello,

Does this happen when clients are only reading from the server? Otherwise I
would suspect that it could be caused by ZFS writing out a large chunk of
data sitting in its caches, with I/O stalled until that is complete.

Have you tried what is suggested in https://wiki.freebsd.org/ZFSTuningGuide?
In particular, setting vfs.zfs.write_limit_override to something
appropriate for your site. The timeout seems to be defaulting to 5 now.
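
Before changing anything it may help to see what the box currently reports.
The following is only a minimal sketch, assuming Python 3 is installed on
the server. The helpers parse_sysctl_line and read_sysctl are hypothetical
names made up for illustration, and I am assuming the timeout in question is
vfs.zfs.txg.timeout. A sysctl that does not exist on a given release is
reported as unavailable rather than treated as an error.

```python
import subprocess


def parse_sysctl_line(line):
    """Split one 'name: value' line, as printed by sysctl(8), into (name, value)."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("not a 'name: value' line: %r" % line)
    return name.strip(), value.strip()


def read_sysctl(name):
    """Return the value of a sysctl as a string, or None if it cannot be read."""
    try:
        out = subprocess.run(
            ["sysctl", name],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # sysctl(8) is missing, or the named sysctl does not exist here.
        return None
    lines = out.splitlines()
    if not lines:
        return None
    try:
        return parse_sysctl_line(lines[0])[1]
    except ValueError:
        return None


if __name__ == "__main__":
    # The second name is the one mentioned above; the first is an assumption
    # about which timeout is meant.
    for name in ("vfs.zfs.txg.timeout", "vfs.zfs.write_limit_override"):
        value = read_sysctl(name)
        print(name, "=", value if value is not None else "(not available)")
```

Running it once while the server is behaving and once right after a pause
would at least show whether the settings are what you expect.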

Best regards
Andreas


