Date: Mon, 2 Jul 2012 00:43:01 +0300
From: Konstantin Belousov
To: Andreas Tobler
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r237367 - head/sys/fs/nfsclient
Message-ID: <20120701214301.GQ2337@deviant.kiev.zoral.com.ua>
In-Reply-To: <4FF097E5.8030909@FreeBSD.org>
List-Id: SVN commit messages for the src tree for head/-current

On Sun, Jul 01, 2012 at 08:33:09PM +0200, Andreas Tobler wrote:
> On 01.07.12 19:05, Konstantin Belousov wrote:
> >On Sun, Jul 01, 2012 at 03:56:52PM +0200, Andreas Tobler wrote:
> >>On 01.07.12 15:41, Konstantin Belousov wrote:
> >>>On Sun, Jul 01, 2012 at 03:37:18PM +0200, Andreas Tobler wrote:
> >>>>On 01.07.12 14:04, Konstantin Belousov wrote:
> >>>>>On Sun, Jul 01, 2012 at 01:23:02PM +0200, Andreas Tobler wrote:
> >>>>>>On 21.06.12 11:26, Konstantin Belousov wrote:
> >>>>>>>Author: kib
> >>>>>>>Date: Thu Jun 21 09:26:06 2012
> >>>>>>>New Revision: 237367
> >>>>>>>URL: http://svn.freebsd.org/changeset/base/237367
> >>>>>>>
> >>>>>>>Log:
> >>>>>>>  Enable deadlock avoidance code for NFS client.
> >>>>>>
> >>>>>>
> >>>>>>Hm, since this commit I fail with my nfs installworld/kernel.
> >>>>>>
> >>>>>>I have a builder which installs world/kernel to a nfs mounted
> >>>>>>directory.
> >>>>>>Namely used for cross builds.
> >>>>>>
> >>>>>>Now since this commit I get the following when I install kernel to the
> >>>>>>nfs directory:
> >>>>>>
> >>>>>>..
> >>>>>>install -o root -g wheel -m 555   zfs.ko.symbols
> >>>>>>/netboot/sparc64/boot/kernel
> >>>>>>install: /netboot/sparc64/boot/kernel/zfs.ko.symbols: No such file or
> >>>>>>directory
> >>>>>>*** [_kmodinstall] Error code 71
> >>>>>>..
> >>>>>>
> >>>>>>The file is there, a local install of the tree works without problems.
> >>>>>>Reverting to r237366 also makes it work again.
> >>>>>>
> >>>>>>The server is a -CURRENT, r237880, The client, -CURRENT too.
> >>>>>>
> >>>>>>How can I help to track down the real issue?
> >>>>>
> >>>>>Is it always the same file in the install procedure which causes the
> >>>>>failure ? Even more, is the failure pattern always the same ?
> >>>>
> >>>>I'd say so yes. When installing a kernel onto a nfs mounted fs then
> >>>>always (in my cases) the zfs.ko.symbols was the failing pattern.
> >>>>I tried ppc64 and sparc64 as target. With both it was the above file.
> >>>>
> >>>>When doing a installworld, it was, also in both cases, ppc64/sparc64,
> >>>>the cc1 in libexec which failed.
> >>>>
> >>>>>Might be, start with ktrace-ing the whole make invocation, including
> >>>>>the children processes.
> >>>>
> >>>>Some recipes how to start?
> >>>ktrace -o -i make installkernel
> >>>Then kdump and cut the lines around relevant failure.
> >>
> >>ktrace -f, right?
> >Right, but without -i it is useless.
>
> Ah, yes, seems clear now after reading the man page.
>
> >>I placed the whole kdump here:
> >>
> >>http://people.freebsd.org/~andreast/dumped_installkernel.log
> >>
> >>It is not clear to me where the failure starts :)
> >Because logs do not contain tracepoints from the children.
> >See above about -i.
> >
> >I asked about excerpt because I expect the proper log to have an order
> >of magnitude bigger size.
>
> Ok. The dump is around 100MB, I hope I extracted as much as needed:
>
> http://people.freebsd.org/~andreast/dumped_installkernel-7.log
>
> >>>>>I used buildworld on the NFS-mounted obj/ as the test for the changes.
> >>>>
> >>>>Here the obj is local, only the src and the destination is on the
> >>>>nfs/netboot server.
> >>>
> >>>I just finished build on NFS obj/ and did several rounds of installs
> >>>for world and kernel into nfs-mounted destdir. It seems I cannot
> >>>reproduce
> >>>this locally.
> >>
> >>Ok. I try with an nfs obj too.
>
> So, I was not able to reproduce the failure with an nfs mounted obj dir.
>
> But I was able to reproduce the failure with three different machines
> which all have the obj local and the destination mounted via nfs.
>
> Are you able to try with a local obj too?

Below are two patches. Please follow my instructions literally to get
the most out of your bug report.

First, please apply only the usr.bin/xinstall patch and retry installkernel
(no need to use ktrace). It should show the proper error, a short write
with a zero-sized result, instead of the garbage ENOENT taken from a stale
errno.

Next, please apply the sys/fs/nfsclient patch, which should fix the
underlying cause.

diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
index 71286e3..f7af6fb 100644
--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -897,7 +897,7 @@ ncl_write(struct vop_write_args *ap)
 	struct nfsmount *nmp = VFSTONFS(vp->v_mount);
 	daddr_t lbn;
 	int bcount;
-	int bp_cached, n, on, error = 0;
+	int bp_cached, n, on, error = 0, error1;
 	size_t orig_resid, local_resid;
 	off_t orig_size, tmp_off;
 
@@ -1259,9 +1259,12 @@ again:
 		if ((ioflag & IO_SYNC)) {
 			if (ioflag & IO_INVAL)
 				bp->b_flags |= B_NOCACHE;
-			error = bwrite(bp);
-			if (error)
+			error1 = bwrite(bp);
+			if (error1 != 0) {
+				if (error == 0)
+					error = error1;
 				break;
+			}
 		} else if ((n + on) == biosize) {
 			bp->b_flags |= B_ASYNC;
 			(void) ncl_writebp(bp, 0, NULL);
diff --git a/usr.bin/xinstall/xinstall.c b/usr.bin/xinstall/xinstall.c
index a920f85..3eba4f7 100644
--- a/usr.bin/xinstall/xinstall.c
+++ b/usr.bin/xinstall/xinstall.c
@@ -53,6 +53,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -671,11 +672,18 @@ copy(int from_fd, const char *from_name, int to_fd, const char *to_name,
 	if (size <= 8 * 1048576 && trymmap(from_fd) &&
 	    (p = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED,
 	    from_fd, (off_t)0)) != (char *)MAP_FAILED) {
-		if ((nw = write(to_fd, p, size)) != size) {
+		nw = write(to_fd, p, size);
+		if (nw != size) {
 			serrno = errno;
 			(void)unlink(to_name);
-			errno = nw > 0 ? EIO : serrno;
-			err(EX_OSERR, "%s", to_name);
+			if (nw >= 0) {
+				errx(EX_OSERR,
+     "short write to %s: %jd bytes written, %jd bytes asked to write",
+				    to_name, (uintmax_t)nw, (uintmax_t)size);
+			} else {
+				errno = serrno;
+				err(EX_OSERR, "%s", to_name);
+			}
 		}
 		done_copy = 1;
 	}
@@ -684,8 +692,15 @@ copy(int from_fd, const char *from_name, int to_fd, const char *to_name,
 			if ((nw = write(to_fd, buf, nr)) != nr) {
 				serrno = errno;
 				(void)unlink(to_name);
-				errno = nw > 0 ? EIO : serrno;
-				err(EX_OSERR, "%s", to_name);
+				if (nw >= 0) {
+					errx(EX_OSERR,
+     "short write to %s: %jd bytes written, %jd bytes asked to write",
+					    to_name, (uintmax_t)nw,
+					    (uintmax_t)size);
+				} else {
+					errno = serrno;
+					err(EX_OSERR, "%s", to_name);
+				}
 			}
 		if (nr != 0) {
 			serrno = errno;
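
For readers who want the xinstall point in isolation: write(2) sets errno only when it returns -1, so after a short write (a return of zero or more, but less than requested) errno still holds whatever an earlier call left there. The sketch below is only an illustration of that distinction, not code from xinstall. The helper name checked_write and its message buffer interface are hypothetical, and it pairs %jd with an intmax_t cast, which may differ from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of xinstall: perform one write(2) and
 * describe the outcome in msg.
 * Returns 0 for a complete write, 1 for a short write, -1 for an error.
 */
static int
checked_write(int fd, const void *buf, size_t len, char *msg, size_t msglen)
{
	ssize_t nw;
	int serrno;

	assert(msg != NULL && msglen > 0);

	nw = write(fd, buf, len);
	serrno = errno;		/* save before any other call can clobber it */

	if (nw < 0) {
		/* Only here did write(2) set errno, so only here report it. */
		snprintf(msg, msglen, "write failed: %s", strerror(serrno));
		return (-1);
	}
	if ((size_t)nw != len) {
		/* Short write: errno is left over from some earlier call. */
		snprintf(msg, msglen,
		    "short write: %jd bytes written, %ju bytes asked to write",
		    (intmax_t)nw, (uintmax_t)len);
		return (1);
	}
	snprintf(msg, msglen, "ok");
	return (0);
}
```

A caller would print msg and decide whether to unlink the partial target. The point of the split is that strerror() is consulted only on the -1 branch.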