Date:      Tue, 26 Jul 2016 15:49:01 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        karl@denninger.net
Cc:        freebsd-net@freebsd.org
Subject:   Re: IPv6 -> IPv4 fallback broken in serf, kernel bug?
Message-ID:  <201607262249.u6QMn1cY082332@gw.catspoiler.org>
In-Reply-To: <4b7e5fc9-7bc6-02e0-f147-3a5cb0e41788@denninger.net>

On 26 Jul, Karl Denninger wrote:
> On 7/26/2016 10:59, Don Lewis wrote:
>> Serf has some code to fall back to IPv4 if an IPv6 connection attempt
>> fails, and more generally to try different addresses on multi-homed
>> servers if connection attempts fail, but it does not work properly on
>> recent versions of FreeBSD. I've tested both recent FreeBSD 10.3-STABLE
>> and HEAD.
>>
>> The way that it is supposed to work is that serf creates a socket, sets
>> it non-blocking, calls connect(), and then passes the fd to poll(). When
>> the connection attempt fails, it expects to see a POLLERR event.  The
>> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
>> SO_ERROR, &error, ...).  If the returned error is ECONNREFUSED or one of
>> a couple of other errors, then serf will move on to the next address.
>>
>> Instead what happens is that serf also(?) sees POLLIN set, which it
>> processes first by calling read(), which returns an ECONNREFUSED error.
>> That is not a documented error return from read().
>>
>> An easy way to test this is to truss svn and attempt to do an http
>> checkout from a host that has both IPv6 and IPv4 addresses, but is not
>> listening on port 80.  The only connection attempt will be to the IPv6
>> address.
>>
>> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6)	 = 4 (0x4)
>> fcntl(4,F_GETFL,)				 = 2 (0x2)
>> fcntl(4,F_SETFL,O_NONBLOCK|0x2)			 = 0 (0x0)
>> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4)	 = 0 (0x0)
>> gettimeofday({ 1469515046.979461 },0x0)		 = 0 (0x0)
>> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
>> gettimeofday({ 1469515046.979614 },0x0)		 = 0 (0x0)
>> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
>> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
>> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
>> read(4,0x80549c064,8000)			 ERR#61 'Connection refused'
>> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
>> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
>> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
>> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
>> close(4)					 = 0 (0x0)
>> close(3)					 = 0 (0x0)
>> svn: E170013: Unable to connect to a repository at URL ...
>>
>>
>> It looks like it should be possible to patch serf to handle this, but:
>>   * Should POLLIN be set for this event?
>>   
>>   * What errno value should read() return in this case?  If it is
>>     ECONNREFUSED, then that should be documented.
>>
>>
> This is kinda serious in that the above manifestation in svn effectively
> disables it for those of us that are on IPv4 connections and have no
> provider capability for IPv6 at the present time.   When I was running
> 10.2 this was not a problem but as soon as I rolled forward to 11.x it
> showed up.

I saw it on 10.3-STABLE, but I don't see any changes in the kernel
source between the stable/10 branch point and the tip of that branch
that look suspicious.  I'll try to find some time to write a simple test
case and run it on some older releases as well as on Linux.
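
A test case along those lines might look like the sketch below.  It is
not serf's code, just a minimal reproduction of the sequence described
above: non-blocking connect(), poll(), then getsockopt(SO_ERROR).  The
function names are made up for illustration, it uses IPv4 loopback rather
than IPv6 to keep it self-contained, and it gets a refused port by binding
a second socket without calling listen() on it.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to an ephemeral loopback port but
 * never listen() on it, so connection attempts to that port are refused.
 * Returns the bound socket (caller closes it) or -1 on failure. */
static int reserve_closed_port(struct sockaddr_in *sin)
{
    socklen_t len = sizeof(*sin);
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin->sin_port = 0;                  /* let the kernel pick a port */
    if (bind(s, (struct sockaddr *)sin, sizeof(*sin)) == -1 ||
        getsockname(s, (struct sockaddr *)sin, &len) == -1) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical probe: returns the connection error (ECONNREFUSED is the
 * expected one), 0 if the connect unexpectedly succeeded, or -1 on a setup
 * failure or poll() timeout.  *revents receives what poll() reported and
 * stays 0 if connect() failed synchronously. */
int probe_refused_connect(short *revents)
{
    struct sockaddr_in sin;
    struct pollfd pfd;
    int error = 0;
    socklen_t elen = sizeof(error);
    int holder, fd, flags, rc = -1;

    *revents = 0;
    holder = reserve_closed_port(&sin);
    if (holder == -1)
        return -1;
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        close(holder);
        return -1;
    }
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto out;

    if (connect(fd, (struct sockaddr *)&sin, sizeof(sin)) == 0) {
        rc = 0;                         /* connected: not what we wanted */
        goto out;
    }
    if (errno != EINPROGRESS) {
        rc = errno;                     /* failed without going async */
        goto out;
    }

    /* Ask for both read and write readiness, as the event loop does. */
    pfd.fd = fd;
    pfd.events = POLLIN | POLLOUT;
    pfd.revents = 0;
    if (poll(&pfd, 1, 5000) <= 0)
        goto out;
    *revents = pfd.revents;

    /* Fetch the pending socket error the way the POLLERR handler would. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &elen) == 0)
        rc = error;
out:
    close(fd);
    close(holder);
    return rc;
}
```

A small main() can call probe_refused_connect() and print which of
POLLIN, POLLOUT, POLLERR and POLLHUP are set in revents.  Comparing that
across releases and against Linux is the point of the exercise.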

It looks to me like soisdisconnected() should not do a read wakeup if
the socket was never in a connected state.  I think it should also set a
new flag to indicate whether or not the socket was previously connected
so that read() and write() can return the proper errno value if the
socket was never connected.

> Fortunately svnlite does work, but if this same breakage manages to
> migrate there as well.......

I'm surprised that svnlite is working for you.  The truss output looks
the same to me as svn and the serf fallback code is the same.

socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6)	 = 4 (0x4)
fcntl(4,F_GETFL,)				 = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2)			 = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffddb4,0x4)	 = 0 (0x0)
gettimeofday({ 1469572654.492874 },0x0)		 = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469572654.493011 },0x0)		 = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x802898300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x802898300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80289d064,8000)			 ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4)					 = 0 (0x0)
close(3)					 = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...

The host I pointed svnlite at has both IPv4 and IPv6 addresses in DNS,
but it is only listening to IPv4 on port 80.


A lack of connectivity that results in the IPv6 connection requests
getting dropped into a black hole might behave differently.  I'm not
sure that serf/apr waits for the ETIMEDOUT error to occur; it may bail
out early.  In that case it won't see the POLLIN event and won't take
the wrong code path that bypasses the fallback.
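
For what it's worth, the wrong code path comes from handling the read
event before looking at the socket's pending error.  Below is a rough
sketch of a userland check that consults SO_ERROR first.  This is not
serf's actual code; the enum and function names are invented for
illustration, and only ECONNREFUSED is treated as a fallback trigger
because that is the only error named in this thread.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical result codes for a socket with a connect() in progress. */
enum conn_action {
    CONN_DO_IO,             /* no pending error: go ahead with read/write */
    CONN_TRY_NEXT_ADDRESS,  /* connect was refused: fall back */
    CONN_FAIL               /* some other error: give up on this socket */
};

/* Hypothetical helper, meant to be called when the event loop reports any
 * event on a connecting socket and before read() is attempted.  Note that
 * reading SO_ERROR also clears the pending error on the socket. */
enum conn_action classify_connect_event(int fd, int *error_out)
{
    int error = 0;
    socklen_t len = sizeof(error);

    *error_out = 0;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1) {
        *error_out = errno;             /* e.g. EBADF for a bad descriptor */
        return CONN_FAIL;
    }
    *error_out = error;
    if (error == 0)
        return CONN_DO_IO;
    /* Assumption: ECONNREFUSED only.  The real list of errors that should
     * trigger a fallback is longer but is not spelled out here. */
    if (error == ECONNREFUSED)
        return CONN_TRY_NEXT_ADDRESS;
    return CONN_FAIL;
}
```

With a check like this in front of the read handler, a POLLIN that
arrives together with the connection failure would no longer bypass the
fallback, whatever the kernel ends up doing.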



