Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 31 May 2011 12:21:33 -0700
From:      Artem Belevich <art@freebsd.org>
To:        David P Discher <dpd@bitgravity.com>
Cc:        freebsd-fs@freebsd.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: ZFS: arc_reclaim_thread running 100%, 8.1-RELEASE, LBOLT related
Message-ID:  <BANLkTikO9J6NhCZi9K2YdftmjC75LhdO3A@mail.gmail.com>
In-Reply-To: <C48BAB2B-5667-458D-86A1-8B48C4189560@bitgravity.com>
References:  <0EFD28CD-F2E9-4AE2-B927-1D327EC99DB9@bitgravity.com> <BANLkTikVq0-En7=4Dy_dTf=tM55Cqou_mw@mail.gmail.com> <4DE50811.5060606@FreeBSD.org> <BANLkTinQkS36SQKpZhsavx3C_ad838DG=g@mail.gmail.com> <4DE51DD3.6040602@FreeBSD.org> <C48BAB2B-5667-458D-86A1-8B48C4189560@bitgravity.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, May 31, 2011 at 12:07 PM, David P Discher <dpd@bitgravity.com> wrot=
e:
>
> On May 31, 2011, at 9:56 AM, Andriy Gapon wrote:
>
>>>
>>> FYI, we've already changed clock_t for opensolaris code to int64_t in
>>> r218169 regardless of $MACHINE.
>>
>> I think that's a good solution.
>> I hope that we don't have any place where osol clock_t would somehow mix=
 with fbsd
>> clock_t with detrimental effects.
>
> Well, clock_t to int64_t only fixes l2arc_reclaim_thread.
>
> If you look at lines 1712-1725 in sys/cddl/contrib/opensolaris/uts/common=
/fs/zfs/arc.c, the arc_evict(), (8.1-release), you'll see that clock_t is n=
ot use. =A0So, yes, clock_t will fix the l2arc issue, but not the arc_recla=
im/arc evict issue of the signed 64bit int overflowing for LBOLT.

No disagreement there. The way LBOLT is implemented now would indeed
cause the overflow that leads to CPU hogging you've reported.

> And from the conversation on this thread, there isn't any agreement on ho=
w to really fix it.

Andriy has proposed potential fix that would eliminate multiplication
of gethrtime result by hz and would delay overflow by few hundred
years. Should be good enough.

> Someone please explain to me the units of LBOLT?

HZ ticks per second.

>
> hz is cycles per second right ?
>
> =A0 =A0#define =A0 =A0 LBOLT =A0 ((gethrtime() * hz) / NANOSEC)
>
>
> gethrtime() is returning nanoseconds. =A0 How is LBOLT in ticks-per-secon=
d ?

gethrtime  is indeed nanoseconds. After division by NANOSES it becomes
old good seconds. Multiplication by hz (which is ticks/sec) the whole
thing represents just 'ticks'

>
> In sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c :
>
> =A0 =A0152 /* number of seconds before growing cache again */
> =A0 =A0153 static int =A0 =A0 =A0 =A0 =A0 =A0 =A0arc_grow_retry =3D 60;

That would be in seconds.

> =A0 =A0...
> =A0 =A02255 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* reset the =
growth delay for every reclaim */
> =A0 =A02256 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 growtime =3D =
LBOLT + (arc_grow_retry * hz);

(arc_grow_retry in seconds * hz in ticks/sec) --> ticks
>
>
> LBOLT is then clearly using it math with ticks-per-second. =A0 I'm I just=
 crazy ? =A0It seems to me that in this case, line 2256 ... can't be added =
together.

LBOLT is in 'ticks', not ticks/sec.

--Artem



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTikO9J6NhCZi9K2YdftmjC75LhdO3A>