Date:      Sat, 1 Sep 2012 19:57:50 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Attila Bogár <attila.bogar@linguamatics.com>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: NFS: rpcsec_gss with Linux clients
Message-ID:  <817398955.1415204.1346543870350.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <5040DABD.20001@linguamatics.com>

------=_Part_1415203_641086622.1346543870347
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Attila Bogar wrote:
> Hi,
> 
> In the Wireshark trace I see that during an NFS mount Linux opens two
> TCP connections.
> Linux creates the GSS context on one TCP connection and sends a DESTROY
> to destroy it, but immediately (without waiting for the DESTROY reply)
> reuses the context on the other TCP connection.
> 
> I don't know which side is at fault, BSD or Linux (or both), as I
> haven't spent time reading the RFCs.
> 
This certainly sounds bogus. I can see an argument for 2 TCP connections
for trunking, but since a security context should only be destroyed
when the client is done with it, doing a DESTROY here doesn't make sense.
(There is something in the RPC header called a "handle". It identifies
the security context, and it would be worth checking the wireshark
trace to see whether it is the same as the one being used on the other
connection.)
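
As a rough illustration of that check, comparing the handles seen on the
two connections amounts to a length-and-bytes match on the opaque value.
The struct and function names below are hypothetical (they are not from
the FreeBSD sources), and the 16-byte cap is an arbitrary choice for the
sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Arbitrary upper bound for this sketch; real handles are opaque. */
#define SKETCH_HANDLE_MAX 16

/* Hypothetical container for a context handle copied out of a trace. */
struct sketch_handle {
	size_t len;
	unsigned char bytes[SKETCH_HANDLE_MAX];
};

/* Two handles name the same context only if length and bytes match. */
bool
sketch_handle_equal(const struct sketch_handle *a,
    const struct sketch_handle *b)
{
	if (a->len != b->len || a->len > SKETCH_HANDLE_MAX)
		return (false);
	return (memcmp(a->bytes, b->bytes, a->len) == 0);
}
```

If the handle on the DESTROY matches the one on the second connection's
requests, the client really is reusing a context it asked to destroy.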

> This is very difficult to reproduce if the server is very fast. You
> have to use an extremely fast client.
> With a Linux virtual machine I couldn't reproduce it. Even printf's in
> the BSD kernel upset the balance, and everything suddenly starts to
> work because of the timing. This is a quantum bug.
> 
> Look at /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c
> 
> In svc_rpc_gss()
> case RPCSEC_GSS_DESTROY:
> 
> svc_rpc_gss_validate returns FALSE during the DESTROY.
> 
> I don't quite know why, but during the DESTROY, gss_verify_mic()
> within svc_rpc_gss_validate() returns maj_stat =
> GSS_S_DEFECTIVE_TOKEN, no matter what heimdal version I use.
> 
That would indicate the encrypted checksum isn't correct. It
might be using an algorithm only supported by the newer RPCSEC_GSS_V3?

> As a consequence, client->cl_state is marked CLIENT_STALE;
> 
For a DESTROY that fails validation, I'm not sure whether marking the
context stale makes sense. (I can see an argument for and against
doing this.)

I've attached a small patch which disables setting client->cl_state
to CLIENT_STALE for this case. You could try it to see if it
helps.
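
To make the intent concrete, here is a standalone model of that
behaviour. It is only a sketch: the enum, struct and function names are
simplified stand-ins rather than the kernel's definitions, and mic_ok
stands in for the outcome of gss_verify_mic():

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's types; not the real definitions. */
enum sketch_state { SKETCH_NEW, SKETCH_ESTABLISHED, SKETCH_STALE };
enum sketch_proc { SKETCH_DATA, SKETCH_INIT, SKETCH_DESTROY };

struct sketch_client {
	enum sketch_state state;
};

/*
 * Model of the validation step.  When the checksum does not verify,
 * the context is marked stale unless the request being validated is a
 * DESTROY; either way the request itself is rejected.
 */
bool
sketch_validate(struct sketch_client *client, bool mic_ok,
    enum sketch_proc proc)
{
	if (!mic_ok) {
		if (proc != SKETCH_DESTROY)
			client->state = SKETCH_STALE;
		return (false);
	}
	return (true);
}
```

With this, a DESTROY whose checksum fails is still rejected, but a
request arriving on the other connection still finds a usable context.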

> I think client locking should have been used at this point.
> 
> In the meantime the next TCP connection's NFS PUTROOTFH request is
> being processed in the kernel.
> 
> And this is the point where the problem may or may not happen.
> At the beginning of svc_rpc_gss(), svc_rpc_gss_timeout_clients() is
> called.
> If it is called before svc_rpc_gss_validate() has marked cl_state as
> CLIENT_STALE, the Linux client survives.
> 
> Here is my patch for review. This is my first ever kernel patch.
> 
> I'm going to open a PR...
> 
I'd suggest contacting the Linux folks first to see whether they are
willing to look at the wireshark trace or know of an issue/fix,
because it really sounds like a Linux client issue.

> Constructive comments are welcome.
> 
> Thanks,
> Attila
> 
> --- /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c.orig 2012-08-30 23:34:00.000000000 +0100
> +++ /usr/src/sys/rpc/rpcsec_gss/svc_rpcsec_gss.c 2012-08-31 15:59:40.000000000 +0100
> @@ -565,7 +565,8 @@
> */
> client->cl_state = CLIENT_NEW;
> client->cl_locked = FALSE;
> - client->cl_expiration = time_uptime + 5*60;
> + /* we are now more cautious */
> + client->cl_expiration = time_uptime + 4*60;
> 
Waiting 4 minutes instead of 5 shouldn't have any real effect,
although it might avoid the problem for your case w.r.t. timing.

> return (client);
> }
> @@ -930,7 +931,11 @@
> if (cred_lifetime == GSS_C_INDEFINITE)
> cred_lifetime = time_uptime + 24*60*60;
> 
> - client->cl_expiration = time_uptime + cred_lifetime;
> + /*
> + * we are now more cautious
> + * 12 sec is just an adhoc hack value
> + */
> + client->cl_expiration = time_uptime + cred_lifetime - 12;
> 
This time is usually the TGT lifetime (12->24hrs), so subtracting
12 sec from it doesn't really make any sense. (I will note that
the calculation for the GSS_C_INDEFINITE case looks incorrect, since
cred_lifetime already includes time_uptime and time_uptime is then
added again for cl_expiration, but I doubt that's relevant to your
problem, since the result is more than 24hrs away.)
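
For what it's worth, the double addition is easy to see in isolation.
The sketch below uses made-up names; SKETCH_INDEFINITE is a stand-in for
GSS_C_INDEFINITE, and the second function is just one possible
correction, not something taken from the tree:

```c
#include <stdint.h>

#define SKETCH_INDEFINITE UINT32_MAX	/* stand-in for GSS_C_INDEFINITE */
#define SKETCH_DAY (24 * 60 * 60)

/*
 * Expiration as the quoted code computes it: for an indefinite
 * lifetime, uptime is folded into cred_lifetime and then added again.
 */
int64_t
sketch_expiration_quoted(int64_t uptime, uint32_t lifetime)
{
	int64_t cred_lifetime = lifetime;

	if (lifetime == SKETCH_INDEFINITE)
		cred_lifetime = uptime + SKETCH_DAY;
	return (uptime + cred_lifetime);
}

/* Assumed correction: treat "indefinite" as a relative 24 hours. */
int64_t
sketch_expiration_relative(int64_t uptime, uint32_t lifetime)
{
	int64_t cred_lifetime = lifetime;

	if (lifetime == SKETCH_INDEFINITE)
		cred_lifetime = SKETCH_DAY;
	return (uptime + cred_lifetime);
}
```

The two only differ in the indefinite case, where the quoted form lands
one extra uptime further into the future.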

> /*
> * Fill in cred details in the rawcred structure.
> @@ -990,7 +995,7 @@
> gss_buffer_desc rpcbuf, checksum;
> OM_uint32 maj_stat, min_stat;
> gss_qop_t qop_state;
> - int32_t rpchdr[128 / sizeof(int32_t)];
> + int32_t rpchdr[2048 / sizeof(int32_t)];
> int32_t *buf;
> 
> rpc_gss_log_debug("in svc_rpc_gss_validate()");
> @@ -1024,7 +1029,12 @@
> if (maj_stat != GSS_S_COMPLETE) {
> rpc_gss_log_status("gss_verify_mic", client->cl_mech,
> maj_stat, min_stat);
> - client->cl_state = CLIENT_STALE;
> + /*
> + * Linux nfs-utils>=1.2.3 is re-using GSS context
> + * on other TCP NFS connection after it DESTROYED it
> + * The garbage collector will remove client at cl_expiration
> + */
> + /* client->cl_state = CLIENT_STALE; */
> return (FALSE);
> }
> 
If this helps, please try the attached patch which does the
same thing, but only for the DESTROY case.

rick

> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

------=_Part_1415203_641086622.1346543870347
Content-Type: text/x-patch; name=rpcsec-destroy.patch
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=rpcsec-destroy.patch

--- rpc/rpcsec_gss/svc_rpcsec_gss.c.sav	2012-09-01 19:20:35.000000000 -0400
+++ rpc/rpcsec_gss/svc_rpcsec_gss.c	2012-09-01 19:24:15.000000000 -0400
@@ -984,7 +984,7 @@ svc_rpc_gss_accept_sec_context(struct sv
 
 static bool_t
 svc_rpc_gss_validate(struct svc_rpc_gss_client *client, struct rpc_msg *msg,
-    gss_qop_t *qop)
+    gss_qop_t *qop, rpc_gss_proc_t gcproc)
 {
 	struct opaque_auth	*oa;
 	gss_buffer_desc		 rpcbuf, checksum;
@@ -1024,7 +1024,8 @@ svc_rpc_gss_validate(struct svc_rpc_gss_
 	if (maj_stat != GSS_S_COMPLETE) {
 		rpc_gss_log_status("gss_verify_mic", client->cl_mech,
 		    maj_stat, min_stat);
-		client->cl_state = CLIENT_STALE;
+		if (gcproc != RPCSEC_GSS_DESTROY)
+			client->cl_state = CLIENT_STALE;
 		return (FALSE);
 	}
 
@@ -1358,7 +1359,7 @@ svc_rpc_gss(struct svc_req *rqst, struct
 			break;
 		}
 
-		if (!svc_rpc_gss_validate(client, msg, &qop)) {
+		if (!svc_rpc_gss_validate(client, msg, &qop, gc.gc_proc)) {
 			result = RPCSEC_GSS_CREDPROBLEM;
 			break;
 		}
------=_Part_1415203_641086622.1346543870347--


