Date:      Wed, 1 Jul 2020 15:50:11 -0700
From:      Benjamin Kaduk <kaduk@mit.edu>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Benjamin Kaduk <bjkfbsd@gmail.com>, Rick Macklem <rmacklem@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-projects@freebsd.org" <svn-src-projects@freebsd.org>
Subject:   Re: svn commit: r362798 - in projects/nfs-over-tls/sys/rpc: . rpcsec_tls
Message-ID:  <20200701225011.GH58278@kduck.mit.edu>
In-Reply-To: <QB1PR01MB336412382A4903F74CD28F69DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>
References:  <202006301449.05UEnq2x072917@repo.freebsd.org> <CAJ5_RoDe=_s2LZociYXTmdVOP+LJDA5HJ7jZkKr7LChffbaH8w@mail.gmail.com> <QB1PR01MB336441A427B14216A4A20384DD6F0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <20200630163340.GN58278@kduck.mit.edu> <QB1PR01MB3364FE7A60B953C2D730E6F3DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <QB1PR01MB33642D5CC58DF44548BB1911DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM> <20200701022040.GE58278@kduck.mit.edu> <QB1PR01MB336412382A4903F74CD28F69DD6C0@QB1PR01MB3364.CANPRD01.PROD.OUTLOOK.COM>

On Wed, Jul 01, 2020 at 10:47:19PM +0000, Rick Macklem wrote:
> Benjamin Kaduk wrote:
> >On Wed, Jul 01, 2020 at 01:23:50AM +0000, Rick Macklem wrote:
> >> Rick Macklem wrote:
> >> >Benjamin Kaduk wrote:
> >> >>On Tue, Jun 30, 2020 at 04:20:45PM +0000, Rick Macklem wrote:
> >> >>> If you happen to know how to set a timeout for SSL_connect() in the openssl
> >> >>> library, I would be interested in hearing that.
> >> >>
> >> >>As it happens, I took a look before I wrote the initial note, and there
> >> >>doesn't seem to be any intrinsic TLS (not DTLS) handshake timeouts in
> >> >>libssl itself; I expect this is actually just the (kernel's!) TCP timeout.
> >> >>So you'd be getting the socket fd (e.g., SSL_get_fd(), if you don't have a
> >> >>reference already) and using setsockopt() to set the timeout(s).
> >> >Interesting. The test case I simulated did not close the TCP socket used by
> >> >SSL_connect(). The server just replied to the STARTTLS Null RPC, but did not
> >> >call SSL_accept(), so the server side just isn't playing "handshake".
> >> >"netstat -a" showed the connection as ESTABLISHED.
> >> >During debugging, I also used the trick of putting:
> >> >    while (1)
> >> >        sleep(1);
> >> >right after the SSL_connect() call and, when watching it via "ps",
> >> >it would switch from "sbwait" to "nanoslp" after 6 minutes and
> >> >a syslog() call showed that SSL_connect() had returned -1.
> >> >
> >> >So, if the TCP connection was "established", what caused the SSL_connect()
> >> >to return with an error (-1) after 6 minutes?
> >> >
> >> >Now, there is a 6 minute idle timeout in the RPC code for TCP where it,
> >> >by default, closes the connection when there is 6 minutes without any
> >> >activity. (I have to look if waiting for a reply for the upcall implies
> >> >"no activity" and if this also happens for AF_LOCAL sockets, which is
> >> >what the upcalls use.)
> >> Ok, I figured out what is happening for this test.
> >> It is the 6 minute idle timeout, but it occurs at the server end, where the NFS server
> >> end shuts down the TCP connection.
> >
> >Ah, that makes sense.
> >
> >> Now, the client cannot assume all servers will do this.
> >
> >Right.
> >
> >> I'm going to try playing around with doing a shutdown of the socket on the
> >> client end after a shorter timeout on the upcall and see if that can get
> >> SSL_connect() to return with a failure in the daemon.
> >>
> >> >Now, if that happens, a SIGPIPE would be posted to the daemon, which
> >> >is SIG_IGN'd by the daemon. But maybe the SIGPIPE somehow causes
> >> >SSL_connect() to return -1 by making the syscall it is doing (read/recv on the
> >> >TCP socket sitting in sbwait) return EINTR, or something like that?
> >> Ignore this "theory". It was bunk.
> >
> >Non-ignored signals would cause SSL_connect() to return, but ignored ones
> >should be wholly ignored, yes.
> >
> >> >I can change this 6minute timeout to see if that affects it.
> >> Can't be changed, since it is at the server end of the TCP connection.
> >
> >Can't you set a client-side (e.g., read) timeout, though?
> Well, in this case it would be the read (or recv or ??) that is done inside the
> SSL_connect().
> 
> The timer I can control is the one that I had set to 10minutes, which times out
> the upcall RPC to the userland daemon. I had set it to 10minutes so the
> SSL_connect() would time out first, but now that I know that won't always happen..
> This timer is now set to 15sec and after it times out, the kernel code does a
> soshutdown(so, SHUT_RD) in the client, which seems to be sufficient to get
> SSL_connect() to return an error.
> 
> This seems sufficient and works ok for the testing I've done.

I don't think what you ended up with is wrong, to be clear.

But, you have an SSL* as input to SSL_connect(), and you can call
SSL_get_fd() on that SSL*, which will give you a socket fd that you can
call setsockopt() on, if you're so inclined.  The SSL_connect() abstraction
barrier is not leak-proof :)
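
A minimal sketch of that approach, assuming a blocking socket and a
hypothetical helper named set_recv_timeout() (neither the helper nor the
demo function comes from the committed code). In the daemon the fd would
be the result of SSL_get_fd(ssl), taken before calling SSL_connect(ssl).
To keep the example runnable without libssl, a socketpair whose far end
never writes stands in for the server that never starts the handshake.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: bound how long a blocking read on fd may wait.
 * In the daemon, fd would be SSL_get_fd(ssl).
 */
static int
set_recv_timeout(int fd, long seconds)
{
	struct timeval tv;

	tv.tv_sec = seconds;
	tv.tv_usec = 0;
	return (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)));
}

/*
 * Stand-in for the stalled handshake: the peer end of a socketpair
 * never writes, so recv() on our end fails once the timeout expires.
 * Returns the errno reported by recv() (0 if recv() did not fail),
 * or -1 if the setup itself failed.
 */
static int
demo_stalled_peer(long seconds)
{
	int sv[2], err;
	char buf[1];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (set_recv_timeout(sv[0], seconds) == -1) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	n = recv(sv[0], buf, sizeof(buf), 0);
	/* Save errno before close() has a chance to overwrite it. */
	err = (n == -1) ? errno : 0;
	close(sv[0]);
	close(sv[1]);
	return (err);
}
```

With a timeout like this on the TLS socket, the read inside SSL_connect()
should fail with EAGAIN/EWOULDBLOCK rather than block indefinitely, and
SSL_connect() should then return -1. My expectation (an assumption, not
something tested in this thread) is that SSL_get_error() would report
SSL_ERROR_WANT_READ in that case, so the caller has to treat it as a
timeout instead of retrying.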

> 15sec is pretty arbitrary, but I figure a timeout on the order of seconds is
> reasonable for RPC upcalls to the local daemon. (I'd guess that taking even
> 1sec to do an upcall would indicate something is broken.)
> If others feel 15sec isn't an appropriate timeout, feel free to comment.
> (Note that this timeout should only happen when something is broken, like
>  the server that does a "STARTTLS" reply but does not do a TLS handshake.)

Understood.

Thanks,

Ben


