Date:      Fri, 5 Dec 2014 15:52:41 +0000
From:      Mike Gelfand <Mike.Gelfand@LogicNow.com>
To:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, "hackers@freebsd.org" <hackers@freebsd.org>
Subject:   Re: [BUG] Getting path to program binary sometimes fails
Message-ID:  <27C465FC-E8C7-44CB-A812-65213BB8AC9F@logicnow.com>
In-Reply-To: <2066750.N3TZpYSHCy@ralph.baldwin.cx>
References:  <91809230-5E81-4A6E-BFD6-BE8815A06BB2@logicnow.com> <201411201125.30087.jhb@freebsd.org> <BC392D92-5DD4-4012-90D4-17C4BC1566CE@logicnow.com> <2066750.N3TZpYSHCy@ralph.baldwin.cx>

On Dec 5, 2014, at 6:19 PM, John Baldwin <jhb@freebsd.org> wrote:

> On Friday, December 05, 2014 12:01:15 PM Mike Gelfand wrote:
>> John,
>>
>> Sorry for late reply.
>>
>> On Nov 20, 2014, at 7:25 PM, John Baldwin <jhb@freebsd.org> wrote:
>>>> Since you’re saying that current behavior is not a defect, maybe
>>>> documentation is wrong (incomplete, misleading) then? I will readily
>>>> accept
>>>> the “not a defect” explanation, but only if one wouldn’t have to ask you
>>>> every time this oddity is met. If this is the expected error condition,
>>>> what should I do to get the path reliably? Should I retry (and how many
>>>> times)? You’re saying cache is being purged; does it mean that when I
>>>> ask for path then cache is populated again? Does it guarantee then that
>>>> I’ll be able to get the path on next call? Could you guarantee that I’ll
>>>> be able to get the path at all if I fail two or more times? Should I
>>>> rely on ENOENT specifically when retrying?
>>>
>>> Is this over NFS?  NFS is more aggressive than local filesystems in
>>> purging
>>> name cache entries because there are inherent races in NFS with certain
>>> fileservers (ones that don't use sub-second timestamps), so by default
>>> entries always expire after about a minute.  You can change that via the
>>> 'nametimeo' mount option (takes a count in seconds).
>>
>> No, not NFS but ZFS. Could that be an issue? The FreeBSD 8 machine I
>> mentioned before has UFS.
>>
>> Also, as you can see from the video I recorded (and from the code I
>> provided), path resolution succeeds and fails within fractions of a second
>> after process startup.
>
> Are you seeing vnodes being actively recycled?  In particular, do you see
> vfs.numvnodes close to kern.maxvnodes?  You can try raising kern.maxvnodes.
> If vfs.numvnodes grows up to the limit then as long as you can stomach the RAM
> of having more vnodes around that would increase the chances of your paths
> remaining valid.

When the call works, sysctl returns:
    vfs.numvnodes: 59638
    kern.maxvnodes: 204723
When it doesn't, the output is:
    vfs.numvnodes: 60017
    kern.maxvnodes: 204723
These are the maximum values I observed in each case. I monitored them with
    while sysctl vfs.numvnodes kern.maxvnodes; do sleep 0.1; done

So it seems that's not related, correct? 60K is much less than 200K.
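
To put a number on that comparison, here is a minimal sketch that expresses vnode usage as a percentage of the limit. The helper name vnode_usage_pct is made up for illustration and is not a FreeBSD tool. It takes the two values as plain arguments and uses integer arithmetic, so the result is truncated.

```shell
# Hypothetical helper (not part of FreeBSD): prints vfs.numvnodes as an
# integer percentage of kern.maxvnodes.
#   $1 = value of vfs.numvnodes
#   $2 = value of kern.maxvnodes
vnode_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Peak value seen when the call failed, against the configured limit.
vnode_usage_pct 60017 204723
```

On a live system the arguments could presumably be filled in from `sysctl -n vfs.numvnodes` and `sysctl -n kern.maxvnodes`. With the numbers above, usage stays under a third of the limit in both the working and the failing case.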