Date: Thu, 15 May 2003 11:02:20 +0200 (CEST)
From: Martin Blapp
To: rwatson@freebsd.org
cc: current@freebsd.org
Message-ID: <20030515101503.A47986@cvs.imp.ch>
Subject: AMD non-blocking RPC problem now reproducable

Hi all,

As already mentioned, we still encounter an AMD problem in pre-5.1.
With the help of Genesys of #bsdcode I am now able to reproduce it here,
but debugging it is quite difficult!

It is not specific to Linux clients. A FreeBSD client suffers too.

Export at least two filesystems from the server. In my example we use
/ and /usr.

On the client, do the following:

- Start amd
- Run this loop:

    while true ; do amq -u /net/yourserver ; sleep 1 ; ls -ld \
        /net/yourserver/usr/local || break ; done

It is important that you list a subdirectory of /usr, because the first
call always seems to succeed. It is the second one that fails.

You will see output like:

    drwxr-xr-x 11 root root 512 May 5 14:11 /net/yourserver/usr/local

It will fail after 2-150 successful tries.
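
To find out how many tries a run survives, the loop can be wrapped in a
small counter. What follows is only a sketch in Python, not something
used for the report above; the names run_command and count_successes and
the max_tries limit are made up for illustration.

```python
import subprocess
import time


def run_command(argv):
    """Run argv, discard its output, and report whether it exited with 0."""
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode == 0


def count_successes(unmount_cmd, list_cmd, delay=1.0, max_tries=200,
                    run=run_command):
    """Repeat unmount, sleep, list until the list step fails.

    Returns the number of successful list steps before the first failure,
    or max_tries if no failure was seen within that many iterations.
    """
    successes = 0
    while successes < max_tries:
        # As in the shell loop, the result of the unmount step is ignored.
        run(unmount_cmd)
        time.sleep(delay)
        if not run(list_cmd):
            break
        successes += 1
    return successes
```

Calling count_successes(["amq", "-u", "/net/yourserver"],
["ls", "-ld", "/net/yourserver/usr/local"]) mirrors the shell loop and
returns the number of tries that succeeded before the first failure.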

If the blocking case (the old behaviour) is used within the mountd
server, this does not happen.

Even stranger: if I attach ktrace to the pid of mountd, the bug always
appears! I'm not sure whether we trigger the same bug then, but it
appears to me that we do. I am beginning to suspect that it is timing
related. The faster the network response, the less often we hit this
bug.

This is a ktrace on the server:

    ...
    86984 mountd RET read 4
    86984 mountd CALL gettimeofday(0x80589c0,0)
    86984 mountd RET gettimeofday 0
    86984 mountd CALL read(0x8,0x807a000,0x74)
    86984 mountd GIO fd 8 read 116 bytes
      "~wG\^W\0\0\0\0\0\0\0\^B\0\^A\M^F\M-%\0\0\0\^C\0\0\0\^A\0\0\0\^A\0\0\0D>\M-COo\0\0\0\rlevais.imp.ch\0\0\0\0\0\0\0\
      \0\0\0\0\0\0\0\b\0\0\0\0\0\0\0\0\0\0\0\^B\0\0\0\^C\0\0\0\^D\0\0\0\^E\0\0\0\^T\0\0\0\^_\0\0\0\0\0\0\0\0\0\0\0\^D/u\
      sr"
    86984 mountd RET read 116/0x74
    86984 mountd CALL gettimeofday(0x80589c0,0)
    86984 mountd RET gettimeofday 0
    86984 mountd CALL read(0x8,0x80545c8,0x4)
    86984 mountd RET read -1 errno 35 Resource temporarily unavailable
    86984 mountd CALL close(0x8)
    86984 mountd RET close 0
    86984 mountd CALL select(0x8,0xbfbffb98,0,0,0)

EAGAIN (errno 35 in the trace) is OK, since we use non-blocking RPC.
But something then goes wrong and the connection gets closed. Of course,
additional requests from the client side will then fail:

    May 15 10:27:31 myclient amd[38168]: mountd rpc failed: RPC: Unable to receive

Martin

Martin Blapp
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------
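
P.S. To illustrate the behaviour one would expect after the EAGAIN in
the trace, namely waiting for more data rather than closing the
connection, here is a minimal sketch. It is written in Python purely for
illustration and is not the mountd or RPC library code; read_exact and
its timeout parameter are made-up names, and a real server would also
keep partially read data across calls instead of discarding it on
timeout.

```python
import select
import socket


def read_exact(conn, nbytes, timeout=5.0):
    """Read exactly nbytes from a non-blocking socket.

    EAGAIN/EWOULDBLOCK (BlockingIOError in Python) only means that no data
    has arrived yet, so we wait in select() and retry; the connection is
    left open. Returns None if the peer closed the connection, and raises
    TimeoutError if nothing arrives within `timeout` seconds.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            data = conn.recv(remaining)
        except BlockingIOError:
            # No data yet: wait until the socket is readable, then retry.
            readable, _, _ = select.select([conn], [], [], timeout)
            if not readable:
                raise TimeoutError("no data within timeout")
            continue
        if not data:
            return None  # orderly shutdown by the peer (EOF)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)
```

With a connected non-blocking socket, read_exact(conn, 4) would
correspond to the 4-byte read that returned errno 35 in the trace: it
keeps the descriptor open until either the data or a real EOF arrives.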