From: Mike Gelfand
To: "freebsd-hackers@freebsd.org"
Subject: Re: [BUG] Getting path to program binary sometimes fails
Date: Fri, 5 Dec 2014 15:52:41 +0000
Message-ID: <27C465FC-E8C7-44CB-A812-65213BB8AC9F@logicnow.com>
References: <91809230-5E81-4A6E-BFD6-BE8815A06BB2@logicnow.com> <201411201125.30087.jhb@freebsd.org> <2066750.N3TZpYSHCy@ralph.baldwin.cx>
In-Reply-To: <2066750.N3TZpYSHCy@ralph.baldwin.cx>
Cc: Konstantin Belousov, "hackers@freebsd.org"
List-Id: Technical Discussions relating to FreeBSD

On Dec 5, 2014, at 6:19 PM, John Baldwin wrote:

> On Friday, December 05, 2014 12:01:15 PM Mike Gelfand wrote:
>> John,
>>
>> Sorry for late reply.
>>
>> On Nov 20, 2014, at 7:25 PM, John Baldwin wrote:
>>>> Since you’re saying that current behavior is not a defect, maybe
>>>> documentation is wrong (incomplete, misleading) then? I will readily
>>>> accept the “not a defect” explanation, but only if one wouldn’t have
>>>> to ask you every time this oddity is met. If this is the expected
>>>> error condition, what should I do to get the path reliably? Should I
>>>> retry (and how many times)?
>>>> You’re saying cache is being purged; does it mean that when I ask
>>>> for path then cache is populated again? Does it guarantee then that
>>>> I’ll be able to get the path on next call? Could you guarantee that
>>>> I’ll be able to get the path at all if I fail two or more times?
>>>> Should I rely on ENOENT specifically when retrying?
>>>
>>> Is this over NFS? NFS is more aggressive than local filesystems in
>>> purging name cache entries because there are inherent races in NFS
>>> with certain fileservers (ones that don't use sub-second timestamps),
>>> so by default entries always expire after about a minute. You can
>>> change that via the 'nametimeo' mount option (takes a count in
>>> seconds).
>>
>> No, not NFS but ZFS. Could that be an issue? The FreeBSD 8 machine I
>> mentioned before has UFS.
>>
>> Also, as you can see from the video I recorded (and from the code I
>> provided), path resolution succeeds and fails within fractions of a
>> second after process startup.
>
> Are you seeing vnodes being actively recycled? In particular, do you see
> vfs.numvnodes close to kern.maxvnodes? You can try raising kern.maxvnodes.
> If vfs.numvnodes grows up to the limit then as long as you can stomach the
> RAM of having more vnodes around that would increase the chances of your
> paths remaining valid.

When the call works, sysctl returns:

vfs.numvnodes: 59638
kern.maxvnodes: 204723

The times it doesn't, the output is:

vfs.numvnodes: 60017
kern.maxvnodes: 204723

These are the maximum values I observed in each case. Monitoring was done with

while sysctl vfs.numvnodes kern.maxvnodes; do sleep 0.1; done

So it seems that's not related, correct? 60K is much less than 200K.
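
To put a number on "much less", here is a rough sketch that expresses the vnode count as a percentage of the limit. It assumes a POSIX shell with awk; vnode_usage_pct is a hypothetical helper name made up for this illustration, not a FreeBSD tool.

```shell
# Hypothetical helper: print the share of the vnode limit in use, in percent.
# $1 = current vnode count (vfs.numvnodes), $2 = limit (kern.maxvnodes)
vnode_usage_pct() {
    awk -v n="$1" -v max="$2" 'BEGIN { printf "%.1f\n", 100 * n / max }'
}

# Peak figures from the failing case reported above:
vnode_usage_pct 60017 204723    # prints 29.3

# Assumed usage on a live FreeBSD system (sysctl -n prints the bare value):
# vnode_usage_pct "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)"
```

With the peak at roughly 29% of kern.maxvnodes, the figures do not suggest the system was near the point where vnodes would have to be recycled.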