Date:      Mon, 1 Jun 2015 02:56:05 -0700
From:      Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To:        Karli Sjöberg <karli.sjoberg@slu.se>
Cc:        Andreas Nilsson <andrnils@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: [Fwd: Strange networking behaviour in storage server]
Message-ID:  <CAOgwaMs=RjxKvvzRHX966K=-sQO_WMHv3o7mg19VYywkLymM7g@mail.gmail.com>
In-Reply-To: <1433149349.14998.181.camel@data-b104.adm.slu.se>
References:  <1433146506.14998.177.camel@data-b104.adm.slu.se> <CAPS9%2BSturmr32jN3d1sfCsQUnyFneSMofT%2BajwqCP=LPg_nseA@mail.gmail.com> <1433149349.14998.181.camel@data-b104.adm.slu.se>

On Mon, Jun 1, 2015 at 2:02 AM, Karli Sjöberg <karli.sjoberg@slu.se> wrote:

> On Mon, 2015-06-01 at 10:33 +0200, Andreas Nilsson wrote:
> >
> >
> > On Mon, Jun 1, 2015 at 10:14 AM, Karli Sjöberg <karli.sjoberg@slu.se>
> > wrote:
> >         -------- Forwarded message --------
> >         > From: Karli Sjöberg <karli.sjoberg@slu.se>
> >         > To: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
> >         > Subject: Strange networking behaviour in storage server
> >         > Date: Mon, 1 Jun 2015 07:49:56 +0000
> >         >
> >         > Hey!
> >         >
> >         > So we have this ZFS storage server, upgraded from 9.3-RELEASE
> >         > to 10.1-STABLE to overcome not being able to 1) use SSD
> >         > drives as L2ARC[1] and 2) hotswap SATA drives[2].
> >         >
> >         > After the upgrade we've noticed very odd networking
> >         > behaviour: it sends/receives at full speed for a while, then
> >         > there are a couple of minutes of complete silence, where even
> >         > terminal commands like "ls" just wait before they are
> >         > executed, and then it starts sending at full speed again.
> >         > I've linked to a screenshot showing this send-and-pause
> >         > behaviour. The blue line is the total, green is SMB and
> >         > turquoise is NFS over jumbo frames. It behaves this way
> >         > regardless of the protocol.
> >         >
> >         > http://oi62.tinypic.com/33xvjb6.jpg
> >         >
> >         > The problem is that these pauses can sometimes be so long
> >         > that connections drop. For example, someone is copying files
> >         > over SMB or iSCSI and suddenly gets an error message saying
> >         > that the transfer failed, and has to start over with the
> >         > file(s). That's horrible!
> >         >
> >         > So far NFS has proven to be the most resilient; its
> >         > stupid-simple nature just waits and resumes the transfer when
> >         > the pause is over. Kudos for that.
> >         >
> >         > The server is driven by a Supermicro X9SRL-F, a Xeon 1620v2
> >         > and 64GB ECC RAM. The hardware has been ruled out: we
> >         > happened to have an identical MB and CPU lying around, and
> >         > that didn't improve things. We have also installed an Intel
> >         > PRO 100/1000 Quad-port ethernet adapter to test if that would
> >         > change things, but it hasn't; it still behaves this way.
> >         >
> >         > The two built-in NICs are Intel 82574L and the Quad-port
> >         > NICs are Intel 82571EB, so both are em(4) driven. I happen to
> >         > know that the em driver was updated between 9.3 and 10.1.
> >         > Perhaps that is to blame, but I have no idea.
> >         >
> >         > Is there anyone that can make sense of this?
> >         >
> >         > [1]:
> >         > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> >         >
> >         > [2]:
> >         > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191348
> >         >
> >         > /K
> >         >
> >         >
> >
> >
> >         Another observation I've made is that during these pauses, the
> >         entire system is put on hold; even a ZFS scrub stops and then
> >         resumes after a while. Looking in top, the system is
> >         completely idle.
> >
> >         Normally during a scrub, the kernel eats 20-30% CPU, but during
> >         a pause, even the [kernel] goes down to 0.00%. Makes me think
> >         the networking has nothing to do with it.
> >
> >         What is to blame, then? ZFS?
> >
> >         /K
> >         _______________________________________________
> >         freebsd-fs@freebsd.org mailing list
> >         http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >         To unsubscribe, send any mail to
> >         "freebsd-fs-unsubscribe@freebsd.org"
> >
> >
> > Hello,
> >
> >
> > Does this happen when clients are only reading from the server?
>
> Yes it happens when clients are only reading from the server.
>
> > Otherwise I would suspect that it could be caused by ZFS writing out a
> > large chunk of data sitting in its caches, and until that is complete
> > I/O is stalled.
>
> That's what is so strange: we have three more systems set up at about the
> same size, and none of the others are acting this way.
>
> The only difference I can think of that we haven't yet tested and ruled
> out is ctld; the other systems are still running istgt as their
> iSCSI daemon.
>
> /K
>
>

If there are three other similar systems and they are installed with
exactly the same structure, the first possibility I would consider is a
slowly progressing hardware failure:

A circuit fails to respond within the expected time, but does respond
after an abnormally long delay. Such behaviour may be caused by a badly
soldered joint or a cracked trace in the circuit: when it is hot, it
disconnects; when it is cold, it connects again.
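
An intermittent fault of this kind is easier to confirm if each pause is
timestamped, so that it can be lined up with temperature or load. The
following is only a minimal sketch in Python using the standard library:
it wakes at a fixed interval and reports any gap much longer than that
interval. The function names, the interval and the tolerance factor are
hypothetical choices for illustration, not something taken from this thread.

```python
import time


def find_stalls(timestamps, interval, tolerance=2.0):
    """Return (start, gap) pairs for consecutive ticks that are more than
    interval * tolerance seconds apart."""
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > interval * tolerance:
            stalls.append((prev, gap))
    return stalls


def sample(count, interval, clock=time.monotonic, sleep=time.sleep):
    """Record count + 1 clock readings, sleeping interval seconds between
    them. clock and sleep are injectable so the loop can be tested."""
    ticks = [clock()]
    for _ in range(count):
        sleep(interval)
        ticks.append(clock())
    return ticks


# Example use (runs for about ten minutes):
#   ticks = sample(count=600, interval=1.0)
#   for start, gap in find_stalls(ticks, interval=1.0):
#       print(f"stall of {gap:.1f}s starting at t={start:.1f}")
```

Because the readings come from a monotonic clock, a pause that freezes the
process itself still shows up as one long gap once the process resumes.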



Thank you very much.


Mehmet Erol Sanliturk





> >
> >
> > Have you tried what is suggested in
> > https://wiki.freebsd.org/ZFSTuningGuide ? In particular setting
> > vfs.zfs.write_limit_override to something appropriate for your site.
> > The timeout seems to be defaulting to 5 now.
> >
> >
> > Best regards
> >
> > Andreas
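
To compare the tunables mentioned above between the affected server and
the three healthy ones, a small sketch that shells out to sysctl(8) may
help. It assumes that the "timeout" above refers to vfs.zfs.txg.timeout,
and it makes no claim that vfs.zfs.write_limit_override exists on
10.1-STABLE: a name that cannot be read is simply reported as None. The
helper names are hypothetical.

```python
import subprocess


def read_sysctl(name, run=subprocess.run):
    """Return the value of one sysctl as a string, or None if the name
    cannot be read (unknown OID, or no sysctl binary on this host)."""
    try:
        result = run(["sysctl", "-n", name], capture_output=True, text=True)
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def snapshot(names, run=subprocess.run):
    """Read several sysctls into a dict for side-by-side comparison."""
    return {name: read_sysctl(name, run) for name in names}


# Example use on each server (names as assumed in the text above):
#   print(snapshot(["vfs.zfs.txg.timeout", "vfs.zfs.write_limit_override"]))
```

Running the same snapshot on every machine and comparing the resulting
dicts shows which settings differ after the upgrade.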
> >
> >
> >
>


