Date:      Fri, 8 Jan 2016 13:52:09 +0530
From:      Raghavendra G <raghavendra@gluster.com>
To:        Xavier Hernandez <xhernandez@datalab.es>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, freebsd-fs <freebsd-fs@freebsd.org>,  Hubbard Jordan <jkh@ixsystems.com>, Gluster Devel <gluster-devel@gluster.org>
Subject:   Re: [Gluster-devel] FreeBSD port of GlusterFS racks up a lot of CPU usage
Message-ID:  <CADRNtgRM17Eg3Z=LWifVNo=ai72dMiEVRKS3RwNfQ-dK7Pspew@mail.gmail.com>
In-Reply-To: <568F6D07.6070500@datalab.es>
References:  <571237035.145690509.1451437960464.JavaMail.zimbra@uoguelph.ca> <20151230103152.GS13942@ndevos-x240.usersys.redhat.com> <2D8C2729-D556-479B-B4E2-66E1BB222F41@ixsystems.com> <1083933309.146084334.1451517977647.JavaMail.zimbra@uoguelph.ca> <CADRNtgStOg8UZfxNt-SzvvPf7d1J7CC_gi49ww3BbixU0Ey-rg@mail.gmail.com> <568F6D07.6070500@datalab.es>

On Fri, Jan 8, 2016 at 1:32 PM, Xavier Hernandez <xhernandez@datalab.es>
wrote:

>
> On 08/01/16 05:42, Raghavendra G wrote:
>
>> Sorry for the delayed reply. I had missed this mail. Please find my
>> comments inline.
>>
>> On Thu, Dec 31, 2015 at 4:56 AM, Rick Macklem <rmacklem@uoguelph.ca
>> <mailto:rmacklem@uoguelph.ca>> wrote:
>>
>>     Jordan Hubbard wrote:
>>     >
>>     > > On Dec 30, 2015, at 2:31 AM, Niels de Vos <ndevos@redhat.com
>> <mailto:ndevos@redhat.com>> wrote:
>>     > >
>>     > >> I'm guessing that Linux uses the event-epoll stuff instead of
>> event-poll,
>>     > >> so it wouldn't exhibit this. Is that correct?
>>     > >
>>     > > Well, both. most (if not all) Linux builds will use event-poll.
>> But,
>>     > > that calls epoll_wait() with a timeout of 1 millisecond as well.
>>     > >
>>     > >> Thanks for any information on this, rick
>>     > >> ps: I am tempted to just crank the timeout of 1msec up to 10 or
>> 20msec.
>>     > >
>>     > > Yes, that is probably what I would do too. And have both poll
>> functions
>>     > > use the same timeout, have it defined in
>> libglusterfs/src/event.h. We
>>     > > could make it a configurable option too, but I do not think it is very
>>     > > useful to have.
>>     >
>>     > I guess this begs the question - what’s the actual purpose of polling for an
>>     > event with a 1 millisecond timeout?  If it was some sort of
>> heartbeat check,
>>     > one might imagine that would be better served by a timer with
>> nothing close
>>     > to 1 millisecond as an interval (that would be one seriously
>> aggressive
>>     > heartbeat) and if filesystem events are incoming that glusterfs
>> needs to
>>     > respond to, why timeout at all?
>>     >
>>     If I understand the code (I probably don't) the timeout allows the
>> loop
>>     to call a function that may add new fd's to be polled. (If I'm right,
>>     the new ones might not get serviced.)
>>
>>
>> Yes, that's correct. Since in poll we pass the fds to be polled in an
>> array as an argument, the only place where we can add/remove fds to be
>> polled is at the time we call the poll syscall. To make adding/removing
>> fds more responsive, poll times out "frequently enough".
>> The trade-off we are considering here is between:
>>
>> 1. Number of calls to poll
>>             vs
>> 2. Responsiveness of adding/removing a new fd from polling.
>>
>> For clients, the list of fds that are polled does not change much.
>> However, for bricks/server this list can vary frequently as
>> clients connect and disconnect.
>>
>> Since epoll provides a way to add new fds for polling while an
>> epoll_wait is in progress (unlike poll), the timeout of epoll_wait is
>> infinite. Also note that on systems where both epoll and poll are
>> available, epoll is preferred over poll.
>>
>
> I don't know anything about gluster's poll implementation so I may be
> totally wrong, but would it be possible to use an eventfd (or a pipe if
> eventfd is not supported) to signal the need to add more file descriptors
> to the poll call ?
>
> The poll call should listen on this new fd. When we need to change the fd
> list, we should simply write to the eventfd or pipe from another thread.
> This will cause the poll call to return and we will be able to change the
> fd list without having a short timeout nor having to decide on any
> trade-off.
>

That's a nice idea. Based on my understanding of why timeouts are being
used, this approach can work.
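
To make the idea concrete, here is a rough sketch of a poll loop that is
woken through a pipe instead of relying on a short timeout. It is written
in Python for brevity, is not taken from gluster's event-poll.c, and the
names (Poller, register, run_once) are made up for illustration. It only
covers adding fds; removal would follow the same pattern.

```python
import os
import select
import threading


class Poller:
    """Hypothetical poll loop woken via a pipe rather than a 1 ms timeout."""

    def __init__(self):
        self._poll = select.poll()
        self._lock = threading.Lock()
        self._pending = []        # fds queued by other threads
        self.registered = set()   # fds currently handed to poll()
        # Wake-up pipe: its read end is always part of the polled set.
        self._wake_r, self._wake_w = os.pipe()
        os.set_blocking(self._wake_r, False)
        self._poll.register(self._wake_r, select.POLLIN)

    def register(self, fd):
        """Callable from any thread: queue fd, then wake the poll loop."""
        with self._lock:
            self._pending.append(fd)
        os.write(self._wake_w, b"x")

    def _drain_wakeups(self):
        # Empty the pipe so that the next poll() call blocks again.
        try:
            while os.read(self._wake_r, 4096):
                pass
        except BlockingIOError:
            pass
        # Apply the queued changes to the polled set.
        with self._lock:
            pending, self._pending = self._pending, []
        for fd in pending:
            self._poll.register(fd, select.POLLIN)
            self.registered.add(fd)

    def run_once(self, timeout_ms=None):
        """One poll() call; timeout_ms=None blocks until something happens.

        Returns the readable fds, excluding the wake-up pipe itself.
        """
        ready = []
        for fd, events in self._poll.poll(timeout_ms):
            if fd == self._wake_r:
                self._drain_wakeups()
            elif events & select.POLLIN:
                ready.append(fd)
        return ready
```

With this shape the loop can block indefinitely in poll, and a thread that
accepts a new connection only has to call register() to get the fd picked
up on the next iteration.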


>
> Just an idea...
>
> Xavi
>
>
>>
>>     I'll post once I've tried a longer timeout and if it seems ok, I will
>>     put it in the Redhat bugs database (as mentioned in the last post).
>>     In its current form, it's fine for testing.
>>
>>     > I also have a broader question to go with the specific one:  We (at
>>     > iXsystems) were attempting to engage with some of the Red Hat folks back
>>     > when the FreeBSD port was first done, in the hope of getting it more
>>     > “officially supported” for FreeBSD and perhaps even donating some more
>>     > serious stress-testing and integration work for it, but when those Red Hat
>>     > folks moved on we lost continuity and the effort stalled.  Who at Red Hat
>>     > would / could we work with in getting this back on track?  We’d like to
>>     > integrate glusterfs with FreeNAS 10, and in fact have already done so but
>>     > it’s still early days and we’re not even really sure what we have yet.
>>     >
>>     Just fyi: so far, working with FreeBSD11/head and the port of 3.7.6
>>     (the port tarball is in FreeBSD PR#194409), the only GlusterFS problem
>>     I've encountered is the above one. I'm not sure why this isn't in
>>     /usr/ports, but that would be nice as it might get more people trying it.
>>     (I'm a src committer, but not a ports one.)
>>
>>     However, I have several patches for the FreeBSD fuse interface and for
>>     a mount_glusterfs mount to work ok you need a couple of them.
>>     1 - When an open decides to do DIRECT_IO after the file has done buffer
>>          cache I/O the buffer cache needs to be invalidated so you don't get
>>          stale cached data.
>>     2 - For a WRONLY write, you need to force DIRECT_IO (or do a
>>          read/write open).
>>          If you don't do this, the buffer cache code will get stuck when trying
>>          to read a block in before writing a partial block. (I think this is
>>          what FreeBSD PR#194293 is caused by.)
>>
>>     Because I won't be able to do svn until April, these patches won't
>>     make it
>>     into head for a while, but they will both be in PR#194293 within
>> hours.
>>
>>     The others add features like extended attributes, advisory byte
>>     range locking
>>     and the changes needed to export the fuse/glusterfs mount via the
>>     FreeBSD
>>     kernel nfsd. If anyone wants/needs these patches, email and I can
>>     send you
>>     them.
>>
>>     A bit off your topic, but until you have the fixes for FreeBSD fuse,
>> you
>>     probably can't do a lot of serious testing.
>>     (I don't know, but I'd guess that FreeNAS has about the same fuse
>> module
>>       code as FreeBSD's head, since it hasn't been changed much in head
>>     recently.)
>>
>>     Thanks everyone for your help with this, rick
>>
>>      > Thanks,
>>      >
>>      > - Jordan
>>      >
>>      >
>>     _______________________________________________
>>     Gluster-devel mailing list
>>     Gluster-devel@gluster.org <mailto:Gluster-devel@gluster.org>
>>     http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>>
>> --
>> Raghavendra G
>>
>>
>



--
Raghavendra G


