FreeBSD Mail Archives

Date:      Fri, 20 Aug 2010 22:19:53 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        pluknet <pluknet@gmail.com>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: LOR on nfs: vfs_vnops.c:301 kern_descrip.c:1580
Message-ID:  <20100820191953.GN2396@deviant.kiev.zoral.com.ua>
In-Reply-To: <AANLkTin43JMZppMGgFDVhDKxbeE89ydkxwnFANfkrLc4@mail.gmail.com>
References:  <AANLkTimJ=d06D2z24QyRQ98zEa1Pemk4=vkNGLNiX90N@mail.gmail.com> <201008181607.52798.jhb@freebsd.org> <AANLkTikpC9REEQKNUUHtLDxSLB1MmiqKdgbJ0Ee7O%2BGJ@mail.gmail.com> <201008190934.28365.jhb@freebsd.org> <AANLkTin43JMZppMGgFDVhDKxbeE89ydkxwnFANfkrLc4@mail.gmail.com>


--6kwAx1lhQCGpXeLP
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 20, 2010 at 08:55:08PM +0400, pluknet wrote:
> On 19 August 2010 17:34, John Baldwin <jhb@freebsd.org> wrote:
> > On Thursday, August 19, 2010 5:29:25 am pluknet wrote:
> >> On 19 August 2010 00:07, John Baldwin <jhb@freebsd.org> wrote:
> >> > On Wednesday, August 18, 2010 3:17:56 pm pluknet wrote:
> >> >> On 18 August 2010 23:11, pluknet <pluknet@gmail.com> wrote:
> >> >> > On 18 August 2010 17:46, Kostik Belousov <kostikbel@gmail.com> wr=
ote:
> >> >> >> On Wed, Aug 18, 2010 at 02:43:19PM +0400, pluknet wrote:
> >> >> >>> On 18 August 2010 12:07, pluknet <pluknet@gmail.com> wrote:
> >> >> >>> > On 17 August 2010 20:04, Kostik Belousov <kostikbel@gmail.com=
> wrote:
> >> >> >>> >
> >> >> >>> >>
> >> >> >>> >> Also please take a note of the John' suggestion to use the t=
askqueue.
> >> >> >>> >
> >> >> >>> > I decided to go this road. Thank you both.
> >> >> >>> > Now I do nfs buildkernel survive and prepare some benchmark r=
esults.
> >> >> >>> >
> >> >> >>>
> >> >> >>> So, I modified the patch to defer proc_create() with taskqueue(=
9).
> >> >> >>> Below is `time make -j5 buildkernel WITHOUT_MODULES=3Dyes` perf.
> >> > evaluation.
> >> >> >>> Done on 4-way CPU on clean /usr/obj with /usr/src & /usr/obj bo=
th
> >> >> >>> nfs-mounted over 1Gbit LAN.
> >> >> >>>
> >> >> >>> clean old
> >> >> >>> 1137.985u 239.411s 7:42.15 298.0% =9A =9A =9A 6538+2133k 87+433=
88io 226pf+0w
> >> >> >>>
> >> >> >>> clean new
> >> >> >>> 1134.755u 240.032s 7:41.25 298.0% =9A =9A =9A 6553+2133k 87+433=
67io 224pf+0w
> >> >> >>>
> >> >> >>> Patch needs polishing, though it generally works.
> >> >> >>> Not sure if shep_chan (or whatever name it will get) needs lock=
ing.
> >> >> >> As I said yesterday, if several requests to create nfsiod coming=
 one
> >> >> >> after another, you would loose all but the last.
> >> >> >>
> >> >> >> You should put the requests into the list, probably protected by
> >> >> >> nfs_iod_mtx.
> >> >> >>
> >> >> >
> >> >> > How about this patch? Still several things to ask.
> >> >> > 1) I used malloc instance w/ M_NOWAIT, since it's called with nfs=
_iod_mtx
> >> > held.
> >> >> > 2) Probably busy/done gymnastics is a wrong mess. Your help is
> >> > appreciated.
> >> >> > 3) if (1) is fine, is it right to use fail: logic (i.e. set
> >> >> > NFSIOD_NOT_AVAILABLE)
> >> >> > on memory shortage? Not tested.
> >> >> >
> >> >> > There are debug printf() left intentionally to see how 3 contexts=
 run
> >> > under load
> >> >> > to each other. I attached these messages as well if that makes se=
nse.
> >> >> >
> >> >>
> >> >> Ah, yes. Sorry, forgot about that.
> >> >
> >> > Your task handler needs to run in a loop until the list of requests =
is empty.
> >> > If multiple threads call taskqueue_enqueue() before your task is sch=
eduled,
> >> > they will be consolidated down to a single call of your task handler.
> >>
> >> Thanks for your suggestions.
> >>
> >> So I converted nfs_nfsiodnew_tq() into loop, and as such
> >> I changed a cleanup SLIST_FOREACH() as well to free a bulk of
> >> (potentially consolidated) completed requests in one pass.
> >> kproc_create() is still out of the SLIST_FOREACH() to avoid calling it
> >> with nfs_iod_mtx held.
> >>
> >> >
> >> > - =9A =9A =9A int error, i;
> >> > - =9A =9A =9A int newiod;
> >> > + =9A =9A =9A int i, newiod, error;
> >> >
> >> > You should sort these alphabetically if you are going to change this=
. =9AI would
> >> > probably just leave it as-is.
> >>
> >> Err.. that's resulted after a set of changes. Thanks for noting that.
> >>
> >> >
> >> > Also, I'm not sure how this works as you don't actually wait for the=
 task to
> >> > complete. =9AIf the taskqueue_enqueue() doesn't preempt, then you ma=
y read
> >> > ni_error as zero before the kproc has actually been created and retu=
rn success
> >> > even though an nfsiod hasn't been created.
> >> >
> >>
> >> I put taskqueue_drain() right after taskqueue_enqueue() to be in sync =
with
> >> task handler. That was part to think about, as I didn't find such a us=
e pattern.
> >> It seems though that tasks are launched now strictly one-by-one, witho=
ut
> >> visible parallelism (as far as debug printf()s don't overlap anymore).
> >
> > Ah, if it is safe to sleep then I have a couple of suggestions:
> >
>=20
> Cool, credits go to John :).
> I just adopted all of your changes (attached).
>=20
> > - Use M_WAITOK to malloc() so you don't have to worry about the failure=
 case
> > =9A(I see Rick already suggested this).
>=20
> After all that reduces to the following replacement in nfs_nfsiodnew().
> So, no regression should be there in theory.
>=20
>  mtx_unlock(&nfs_iod_mtx);
> - kproc_create(...)
> + malloc(...)
> mtx_lock(&nfs_iod_mtx);
>=20
> It survived after this simple test running for an hour, which
> forces creation of many iods with r/w from 300 threads:
>=20
> nfsserv:/home/svn/freebsd/obj_exp on /usr/obj (nfs)
>=20
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> ./randomio /usr/obj/file 100 0.5 0 4096 60 100 &
> while [ true ]; do sysctl vfs.nfs.iodmax=3D2; sysctl vfs.nfs.iodmax=3D60;
> sleep 20; done
>=20
> randomio is from ports/149838.
>=20
>=20
> P.S.
> it's funny to see in top
>     0 root       10  44    0     0K   144K deadlk  2  23:16 28.86% kernel
> Someone might think the kernel is in deadlock.

It seems nobody replied to the mdf@ objection against wait of the
new proc startup being equivalent to the LOR. I think that the wait
is safe, because the task is executed in the context of
the different process then the enqueue request.
This might be worth noting in the comment or commit message.

You should drain the taskq and prevent creation of new nfsiods before
waiting for existing nfsiods to exit on module unload.

I find it somewhat weird that nip is removed from the list in the
task handler, but freed in the enqueue thread. You could provide
pointers to the local variables to exchange done/exit codes.
I do not insist on this part.

--6kwAx1lhQCGpXeLP
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkxu1VgACgkQC3+MBN1Mb4iKEACg5/0bvwUd2PFkn5tSeFR6BaMw
JPoAnR9kQdlIrE8uONgUC7uLOVt3CE0U
=39jH
-----END PGP SIGNATURE-----

--6kwAx1lhQCGpXeLP--

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100820191953.GN2396>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation