Date:      Thu, 8 Mar 2018 14:35:03 +0000
From:      NAGY Andreas <Andreas.Nagy@frequentis.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>, "'freebsd-stable@freebsd.org'" <freebsd-stable@freebsd.org>
Subject:   RE: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Message-ID:  <D890568E1D8DD044AA846C56245166780124AFCFF8@vie196nt>
In-Reply-To: <YQBPR0101MB1042B17763E2605A7CE72EF5DDDF0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>
References:  <c5c624de-42bb-45cf-8cf0-b25be56e5f58@frequentis.com> <YQBPR0101MB1042DEF0825996764CBCA829DDC40@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <D890568E1D8DD044AA846C56245166780124AFB90E@vie196nt> <YQBPR0101MB1042479407CAA253674BBAEBDDDB0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM> <D890568E1D8DD044AA846C56245166780124AFBD21@vie196nt>, <D890568E1D8DD044AA846C56245166780124AFBD91@vie196nt> <YQBPR0101MB104225B6884FEC70A03C61CCDDDA0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <D890568E1D8DD044AA846C56245166780124AFC0E2@vie196nt>, <YQBPR0101MB1042040D2BFB3681E940D271DDDA0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <2feda1e2-16d5-43b5-98eb-dcc71cc67c6f@frequentis.com> <YQBPR0101MB10427C97161C74A5C441D1DCDDD80@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>, <D890568E1D8DD044AA846C56245166780124AFCABC@vie196nt> <YQBPR0101MB1042B17763E2605A7CE72EF5DDDF0@YQBPR0101MB1042.CANPRD01.PROD.OUTLOOK.COM>

Thank you, it is really great how fast you adapt the source and make patches for this. I saw so many posts where people did not get NFS 4.1 working with ESXi and FreeBSD, and now I already have it running with your changes.

I have now compiled the kernel with all 4 patches, and it works now.

Some problems are still left:

- the "Server returned improper reason for no delegation: 2" warnings are still in the vmkernel.log.
		2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: NFS41: NFS41ValidateDelegation:608: Server returned improper reason for no delegation: 2

- can't delete a folder with the VMware host client datastore browser:
		2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
		2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy

- after a reboot of the FreeBSD machine, ESXi does not restore the NFS datastore and logs the following warning (just disconnecting the links is fine):
		2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
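
To keep track of which of the warnings listed above remain after each patch round, the vmkernel.log can be tallied with a short script. This is only a minimal sketch: the regular expression is inferred from the log lines quoted in this message and may not cover every vmkernel.log format, and the names `WARNING_RE`, `summarize_warnings` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the lines quoted in this mail (an assumption, not an
# official format): timestamp, cpuN:worldID, optional opID, then ")WARNING: ".
WARNING_RE = re.compile(
    r"^\s*(?P<ts>\S+Z) cpu(?P<cpu>\d+):(?P<world>\d+)"
    r"(?: opID=(?P<opid>\w+))?\)WARNING: (?P<module>\w+): (?P<message>.*)$"
)


def summarize_warnings(lines):
    """Count WARNING entries per (module, origin); origin is the text up to the first ': '."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.match(line)
        if match:
            origin = match["message"].split(": ", 1)[0]
            counts[(match["module"], origin)] += 1
    return counts


# Sample lines copied from this thread; the third one is not a warning.
SAMPLE = [
    "2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry",
    "2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry",
    "2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120",
    "2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP",
]

counts = summarize_warnings(SAMPLE)
for (module, origin), n in counts.most_common():
    print(f"{n:4d}  {module}  {origin}")
```

On a real host the same function could be fed the open log file instead of `SAMPLE`, since it only iterates over lines.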

So far I have only run some quick benchmarks with ATTO in a Windows VM which has a vmdk on the NFS41 datastore, which is mounted over two 1GB links in different subnets.
Read is nearly double that of a single connection and write is just a bit faster. I don't know if write speed could be improved; currently the share is UFS on a HW RAID controller which has local write speeds of about 500MB/s.

At the following link is the vmkernel.log covering mounting the NFS share, attaching a vmdk from the share to a Win VM, running the ATTO benchmark on it, disconnecting/reconnecting the network, and also the problem with the BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP after reboot.
I have also made a trace on one of the two links, before and after the reboot (nfs41_trace_before_reboot.pcap and nfs41_trace_after_reboot.pcap).

https://files.fm/u/wvybmdmc

andi

-----Original Message-----
From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
Sent: Thursday, 8 March 2018 03:48
To: NAGY Andreas <Andreas.Nagy@frequentis.com>; 'freebsd-stable@freebsd.org' <freebsd-stable@freebsd.org>
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client

NAGY Andreas wrote:
>attached the trace. If I see it correct it uses FORE_OR_BOTH.
>(bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003))
Yes. The scary part is the ExchangeID before the BindConnectiontoSession.
(Normally that is only done at the beginning of a new mount to get a ClientID, followed immediately by a CreateSession. I don't know why it would do this?)
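
The ordering described above (an ExchangeID followed immediately by a CreateSession) can be checked mechanically against a list of operation names taken from a trace. This is a minimal sketch under stated assumptions: the two operation lists are hypothetical examples rather than data from the attached pcap, and `unexpected_exchange_ids` is a made-up helper name.

```python
def unexpected_exchange_ids(ops):
    """Return indexes of EXCHANGE_ID ops not immediately followed by CREATE_SESSION."""
    flagged = []
    for i, op in enumerate(ops):
        if op == "EXCHANGE_ID":
            following = ops[i + 1] if i + 1 < len(ops) else None
            if following != "CREATE_SESSION":
                flagged.append(i)
    return flagged


# Hypothetical sequences for illustration only (not extracted from the trace).
normal_mount = ["EXCHANGE_ID", "CREATE_SESSION", "RECLAIM_COMPLETE"]
after_reconnect = ["EXCHANGE_ID", "BIND_CONN_TO_SESSION"]

print(unexpected_exchange_ids(normal_mount))
print(unexpected_exchange_ids(after_reconnect))
```

The second list mirrors the pattern called scary above: an ExchangeID that is followed by BindConnectiontoSession instead of CreateSession.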

The attached patch might get BindConnectiontoSession to work. I have no way to test it beyond seeing it compile. Hopefully it will apply cleanly.

>The trace is only with the first patch, have not compiled the wantdeleg patches so far.
That's fine. I don't think that matters much.

>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the ESXi cannot connect to the NFS also with this warning:
>2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
If the attached patch works, you'll find out what it fixes.

>Another thing I noticed today is that it is not possible to delete a folder with the ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, but with NFS3 it works.
>
>Here the vmkernel.log with only one connection contains mounting, trying to delete a folder and disconnect:
>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options: <none>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle  Created with lock[StorageApd-0x43046e4c6d70]
>2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7]
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128)
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/")
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6
I have no idea if getting BindConnectiontoSession working will fix this or not?

rick



