From: Ahmed Kamal
Date: Thu, 2 Jul 2015 00:04:35 +0200
Subject: Linux NFSv4 clients are getting (bad sequence-id error!)
To: freebsd-stable@freebsd.org

Hi all,

*warning*: Sorry, I'm cross-posting this from freebsd-fs; things are too quiet there, unfortunately.

I'm a refugee from Linux land. I just set up my first FreeBSD 10.1 ZFS box, sharing /home over NFS. Since every home directory is its own ZFS dataset, I chose NFSv4 so that any directory under /home can be shared and mounted recursively (I understand NFSv4 is a must in this scenario!).

I'm able to mount from Linux (RHEL 5, latest kernel) successfully, and users are working fine. However, every now and then a user screams that his session is frozen. Usually the processes are stuck in the nfs_wait or rpc_* state. I also tried a much newer Linux kernel (3.2), but it hit the same problem. The errors in the Linux log files are mostly:

Jul 1 17:41:47 mammoth kernel: NFS: v4 server nas returned a *bad sequence-id error*!
Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_locks: unhandled error -11. Zeroing state
Jul 1 17:52:32 mammoth kernel: nfs4_reclaim_open_state: Lock reclaim failed!

My search led me to a detailed analysis of the issue (https://access.redhat.com/solutions/1328073), which you can read here: https://dl.dropboxusercontent.com/u/51939288/nfs4-bad-seq.pdf
NetApp confirmed this was a bug for them (I'm wondering if this is still in FreeBSD?!).

PS: Right before sending this, I saw dmesg on the FreeBSD box advising me to increase vfs.nfsd.tcphighwater, so I raised it to 64000. I also raised the number of NFS server threads (-t) from 10 to 60 (we have roughly 40 Linux machines).

Any advice is most appreciated!

Thanks