From owner-freebsd-net@freebsd.org Tue Jul 26 15:59:22 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 32AF9BA40CB for ; Tue, 26 Jul 2016 15:59:22 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id EB7F11EDE for ; Tue, 26 Jul 2016 15:59:21 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6QFxF8a081339 for ; Tue, 26 Jul 2016 08:59:19 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607261559.u6QFxF8a081339@gw.catspoiler.org> Date: Tue, 26 Jul 2016 08:59:15 -0700 (PDT) From: Don Lewis Subject: IPv6 -> IPv4 fallback broken in serf, kernel bug? To: freebsd-net@FreeBSD.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 15:59:22 -0000

Serf has some code to fall back to IPv4 if an IPv6 connection attempt
fails, and more generally to try different addresses on multi-homed
servers if connection attempts fail, but it does not work properly on
recent versions of FreeBSD. I've tested both recent FreeBSD 10.3-STABLE
and HEAD.

The way that it is supposed to work is that serf creates a socket, sets
it non-blocking, calls connect(), and then passes the fd to poll(). When
the connection attempt fails, it expects to see a POLLERR event.
The POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
a couple of other errors, then serf will move on to the next address.

Instead what happens is that serf also(?) sees POLLIN set, which it
processes first by calling read(), which returns an ECONNREFUSED error.
That is not a documented error return from read().

An easy way to test this is to truss svn and attempt to do an http
checkout from a host that has both IPv6 and IPv4 addresses, but is not
listening on port 80. The only connection attempt will be to the IPv6
address.

socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80549c064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...

It looks like it should be possible to patch serf to handle this, but:
* Should POLLIN be set for this event?
* What errno value should read() return in this case, if it is ECONNREFUSED, then that should be documented. From owner-freebsd-net@freebsd.org Tue Jul 26 16:06:37 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A3DD4BA4445 for ; Tue, 26 Jul 2016 16:06:37 +0000 (UTC) (envelope-from karl@denninger.net) Received: from mail.denninger.net (denninger.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5F3F318EB for ; Tue, 26 Jul 2016 16:06:36 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (Karl-Desktop.Denninger.net [192.168.1.40]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.denninger.net (Postfix) with ESMTPSA id 141101A0D71 for ; Tue, 26 Jul 2016 11:06:29 -0500 (CDT) Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug? To: freebsd-net@freebsd.org References: <201607261559.u6QFxF8a081339@gw.catspoiler.org> From: Karl Denninger Message-ID: <4b7e5fc9-7bc6-02e0-f147-3a5cb0e41788@denninger.net> Date: Tue, 26 Jul 2016 11:06:23 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <201607261559.u6QFxF8a081339@gw.catspoiler.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha-512; boundary="------------ms040802090807040902010000" X-Content-Filtered-By: Mailman/MimeDel 2.1.22 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 16:06:37 -0000 This is a cryptographically signed message in MIME format. 
On 7/26/2016 10:59, Don Lewis wrote:
> Serf has some code to fall back from IPv4 if an IPv6 and more generally
> try different addresses on multi-homed servers if connection attempts
> fail, but it does not work properly on recent versions of FreeBSD. I've
> tested both recent FreeBSD 10.3-STABLE and HEAD.
>
> The way that it is supposed to work is that serf creates a socket, sets
> it non-blocking, calls connect(), and then passes the fd to poll(). When
> the connection attempt fails, it expects to see a POLLERR event. The
> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
> a couple of other errors, then serf will move on to the next address.
>
> Instead what happens is that serf also(?) sees POLLIN set, which it
> processes first by calling read(), which returns an ECONNREFUSED error.
> That not a documented error return from read().
>
> An easy way to test this is to truss svn and attempt to do an http
> checkout from a host that has both IPv6 and IPv4 addresses, but is not
> listening on port 80. The only connection attempt will be to the IPv6
> address.
>
> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
> fcntl(4,F_GETFL,) = 2 (0x2)
> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)
> read(4,0x80549c064,8000) ERR#61 'Connection refused'
> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
> close(4) = 0 (0x0)
> close(3) = 0 (0x0)
> svn: E170013: Unable to connect to a repository at URL ...
>
>
> It looks like it should be possible to patch serf to handle this, but:
> * Should POLLIN be set for this event?
>
> * What errno value should read() return in this case, if it is
> ECONNREFUSED, then that should be documented.
>
>
This is kinda serious in that the above manifestation in svn effectively
disables it for those of us that are on IPv4 connections and have no
provider capability for IPv6 at the present time. When I was running
10.2 this was not a problem but as soon as I rolled forward to 11.x it
showed up.

Fortunately svnlite does work, but if this same breakage manages to
migrate there as well.......

--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

From owner-freebsd-net@freebsd.org Tue Jul 26 18:19:50 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 64DC7BA5663 for ; Tue, 26 Jul 2016 18:19:50 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 460761C9D for ; Tue, 26 Jul 2016 18:19:50 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6QIJhtU081768 for ; Tue, 26 Jul 2016 11:19:47 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607261819.u6QIJhtU081768@gw.catspoiler.org> Date: Tue, 26 Jul 2016 11:19:43 -0700 (PDT) From: Don Lewis Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug?
To: freebsd-net@FreeBSD.org In-Reply-To: <201607261559.u6QFxF8a081339@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 18:19:50 -0000 On 26 Jul, Don Lewis wrote: > It looks like it should be possible to patch serf to handle this, but: > * Should POLLIN be set for this event? I don't think it should, but the standard doesn't cover this case. On a successful non-blocking connect(), our man page says that select(2) will indicate that the fd is writeable. The Open Group Base Specifications Issue 7 says that pselect(), select(), and poll() shall indicate that the socket is ready for writing. I haven't seen anything that says what should be done if the connect fails. > * What errno value should read() return in this case, if it is > ECONNREFUSED, then that should be documented. Our read(2) man page does not document that ENOTCONN can be returned, though we explicitly return it and it is listed as valid by The Open Group Base Specifications. It does not list the connect failure errno values other than ETIMEDOUT as valid for read(). Though read() should not be called before the connection is up, if it is I *think* these errors should be mapped to ENOTCONN, but handling ETIMEDOUT is trickier. If that error came from the connection attempt, then we would want to return ENOTCONN, but if the connection came up and was later dropped due to a timeout, then ETIMEDOUT should be returned. 
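
The non-blocking connect() sequence discussed above can be sketched as
follows. This is a minimal Python illustration, not serf's or apr's
code: the helper names finish_connect() and try_connect() and the 500 ms
default timeout are assumptions made for the example, and mapping a poll
timeout to ETIMEDOUT is an editorial choice. The sketch waits until
poll() reports anything on the socket and then takes the outcome from
SO_ERROR instead of from read(), so it should not depend on which of
POLLIN, POLLHUP, or POLLERR the kernel sets when the attempt fails.

```python
import errno
import select
import socket


def finish_connect(sock, timeout_ms=500):
    """Wait for a pending non-blocking connect(); return 0 or an errno value."""
    poller = select.poll()
    # A successful connect is reported as writability. POLLERR and POLLHUP
    # are reported by poll() whether or not they are requested.
    poller.register(sock.fileno(), select.POLLOUT)
    if not poller.poll(timeout_ms):
        # Nothing reported in time (assumption: treat this as a timeout).
        return errno.ETIMEDOUT
    # SO_ERROR returns, and clears, the pending error on the socket.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def try_connect(sockaddr, family=socket.AF_INET, timeout_ms=500):
    """Return (sock, 0) on success or (None, errno_value) on failure."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(sockaddr)
    if err == errno.EINPROGRESS:
        err = finish_connect(sock, timeout_ms)
    if err != 0:
        sock.close()
        return None, err
    return sock, 0
```

With this shape, read() is never called on a socket whose connection
attempt has not been confirmed, so an undocumented errno from read()
does not enter into the decision.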
From owner-freebsd-net@freebsd.org Tue Jul 26 22:49:13 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 19886BA65E2 for ; Tue, 26 Jul 2016 22:49:13 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D0D6B1F6C for ; Tue, 26 Jul 2016 22:49:12 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6QMn1cY082332; Tue, 26 Jul 2016 15:49:06 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607262249.u6QMn1cY082332@gw.catspoiler.org> Date: Tue, 26 Jul 2016 15:49:01 -0700 (PDT) From: Don Lewis Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug? To: karl@denninger.net cc: freebsd-net@freebsd.org In-Reply-To: <4b7e5fc9-7bc6-02e0-f147-3a5cb0e41788@denninger.net> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 22:49:13 -0000 On 26 Jul, Karl Denninger wrote: > On 7/26/2016 10:59, Don Lewis wrote: >> Serf has some code to fall back from IPv4 if an IPv6 and more generally >> try different addresses on multi-homed servers if connection attempts >> fail, but it does not work properly on recent versions of FreeBSD. I've >> tested both recent FreeBSD 10.3-STABLE and HEAD. 
>> >> The way that it is supposed to work is that serf creates a socket, sets >> it non-blocking, calls connect(), and then passes the fd to poll(). When >> the connection attempt fails, it expects to see a POLLERR event. The >> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET, >> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of >> a couple of other errors, then serf will move on to the next address. >> >> Instead what happens is that serf also(?) sees POLLIN set, which it >> processes first by calling read(), which returns an ECONNREFUSED error. >> That not a documented error return from read(). >> >> An easy way to test this is to truss svn and attempt to do an http >> checkout from a host that has both IPv6 and IPv4 addresses, but is not >> listening on port 80. The only connection attempt will be to the IPv6 >> address. >> >> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4) >> fcntl(4,F_GETFL,) = 2 (0x2) >> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0) >> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0) >> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0) >> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress' >> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2) >> read(4,0x80549c064,8000) ERR#61 'Connection refused' >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> 
close(4) = 0 (0x0) >> close(3) = 0 (0x0) >> svn: E170013: Unable to connect to a repository at URL ... >> >> >> It looks like it should be possible to patch serf to handle this, but: >> * Should POLLIN be set for this event? >> >> * What errno value should read() return in this case, if it is >> ECONNREFUSED, then that should be documented. >> >> > This is kinda serious in that the above manifestation in svn effectively > disables it for those of us that are on IPv4 connections and have no > provider capability for IPv6 at the present time. When I was running > 10.2 this was not a problem but as soon as I rolled forward to 11.x it > showed up. I saw it on 10.3-STABLE, but I don't see any changes in the kernel source between the stable/10 branch point and the tip of that branch that look suspicious. I'll try to find some time to write a simple test case and run it on some older releases as well as on Linux. It looks to me like soisdisconnected() should not do a read wakeup if the socket was never in a connected state. I think it should also set a new flag to indicate whether or not the socket was previously connected so that read() and write() can return the proper errno value if the socket was never connected. > Fortunately svnlite does work, but if this same breakage manages to > migrate there as well....... I'm surprised that svnlite is working for you. The truss output looks the same to me as svn and the serf fallback code is the same. 

socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
fcntl(4,F_GETFL,) = 2 (0x2)
fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
setsockopt(0x4,0x6,0x1,0x7fffffffddb4,0x4) = 0 (0x0)
gettimeofday({ 1469572654.492874 },0x0) = 0 (0x0)
connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xx]:80 },28) ERR#36 'Operation now in progress'
gettimeofday({ 1469572654.493011 },0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x802898300 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x802898300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x802898300 },32,{ 0.500000000 }) = 2 (0x2)
read(4,0x80289d064,8000) ERR#61 'Connection refused'
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory'
close(4) = 0 (0x0)
close(3) = 0 (0x0)
svn: E170013: Unable to connect to a repository at URL ...

The host I pointed svnlite at has both IPv4 and IPv6 addresses in DNS,
but it is only listening to IPv4 on port 80. A lack of connectivity
that results in the IPv6 connection requests getting dropped into a
black hole might behave differently. I'm not sure that serf/apr wait
for the ETIMEDOUT error to occur; they may bail out early. In that case
they won't see the POLLIN event and won't take the wrong code path that
bypasses the fallback.
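
The address fallback that the trace above shows being bypassed can be
sketched roughly as below. This is an illustration in Python, not the
serf implementation. The function name connect_with_fallback(), the
attempt callback, and the exact contents of RETRYABLE_ERRNOS are
assumptions: the thread only says "ECONNREFUSED or one of a couple of
other errors". The attempt callback is expected to return
(connection, 0) on success or (None, errno_value) on failure.

```python
import errno
import os

# Assumed set of errors that justify moving on to the next address.
RETRYABLE_ERRNOS = {
    errno.ECONNREFUSED,
    errno.ETIMEDOUT,
    errno.EHOSTUNREACH,
    errno.ENETUNREACH,
}


def connect_with_fallback(candidates, attempt):
    """Try each candidate address in order; return (connection, address).

    Raises OSError carrying the last errno if every candidate fails, or
    as soon as an error outside RETRYABLE_ERRNOS is seen.
    """
    last_err = None
    for addr in candidates:
        conn, err = attempt(addr)
        if err == 0:
            return conn, addr
        last_err = err
        if err not in RETRYABLE_ERRNOS:
            break  # another address is unlikely to cure this error
    if last_err is None:
        raise ValueError("no candidate addresses")
    raise OSError(last_err, os.strerror(last_err))
```

The key point is that the decision to move on is driven by the errno of
the connection attempt itself. In the trace, ECONNREFUSED arrived through
read() instead, so the equivalent of this loop never saw it.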
From owner-freebsd-net@freebsd.org Tue Jul 26 22:57:10 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 071F8BA682C for ; Tue, 26 Jul 2016 22:57:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id A6FD414D2; Tue, 26 Jul 2016 22:57:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 091771048F19; Wed, 27 Jul 2016 08:57:01 +1000 (AEST) Date: Wed, 27 Jul 2016 08:57:00 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Don Lewis cc: freebsd-net@freebsd.org Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug? 
In-Reply-To: <201607261559.u6QFxF8a081339@gw.catspoiler.org> Message-ID: <20160727054616.X990@besplex.bde.org> References: <201607261559.u6QFxF8a081339@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=EfU1O6SC c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=wJ6eUo_-nK8opg67eDAA:9 a=9UXwsmE3bJ0M3TOt:21 a=_6LvzLDokv7YOv9B:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 22:57:10 -0000

On Tue, 26 Jul 2016, Don Lewis wrote:

> Serf has some code to fall back from IPv4 if an IPv6 and more generally
> try different addresses on multi-homed servers if connection attempts
> fail, but it does not work properly on recent versions of FreeBSD. I've
> tested both recent FreeBSD 10.3-STABLE and HEAD.
>
> The way that it is supposed to work is that serf creates a socket, sets
> it non-blocking, calls connect(), and then passes the fd to poll(). When
> the connection attempt fails, it expects to see a POLLERR event. The
> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
> a couple of other errors, then serf will move on to the next address.
>
> Instead what happens is that serf also(?) sees POLLIN set, which it
> processes first by calling read(), which returns an ECONNREFUSED error.
> That not a documented error return from read().

FreeBSD still bogusly returns POLLIN (and POLLRDNORM) together with
POLLHUP at EOF when there is no data (both set should mean both), and
still has the bogus POLLINIGNEOF, but it almost never returns POLLERR.
My regression tests in tools/regression/poll check for not having this
bug.

The only setting of POLLERR in kern is in kqueue_poll() for errors in
initialization, and this doesn't set the other flags.

The only uses of POLLERR in kern are:
- in select(), to turn POLLERR into "set" for any backend that sets it
  (and there seems to be only 1 backend that sets it)
- in vop_stdpoll() and poll_no_poll(), there is inconsistent bogus masking
  using POLLSTANDARD to obfuscate that standard flags which must be
  ignored are _not_ masked.

So I don't see how you can get POLLIN with POLLERR.

> An easy way to test this is to truss svn and attempt to do an http
> checkout from a host that has both IPv6 and IPv4 addresses, but is not
> listening on port 80. The only connection attempt will be to the IPv6
> address.
>
> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4)
> fcntl(4,F_GETFL,) = 2 (0x2)
> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0)
> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0)
> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0)
> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress'
> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0)
> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2)

I don't see any POLL* there or completely understand the notation or kqueue,
but this looks like the poll() bug with POLLIN together with POLLHUP, not
POLLIN together with POLLERR.

Everything here seems to be correct. Not very good, but good enough here.

EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set.
SBS_CANTRECVMORE means hangup, not EOF, and I think there can be
readable data from a socket in general but not after a connection
error.
So this translation is incorrect in general but correct after
a connection error. kqueue just can't represent hangup and conflates
it with EOF.

When filt_soread() sets EV_EOF, it doesn't clear other flags, so
NOTE_LOWAT remains set. This happens to be correct. But since NOTE_LOWAT
really means low water, you can't use it to determine if (non-null) data
can be read. (POSIX is unclear about whether the "data" for select() and
poll() is actual data or just EOF.)

poll() has almost the opposite problems. It can represent hangup but
can't represent EOF. It can represent no data, but this doesn't mean
EOF when the file is open. It can't represent low-water.

so_poll_generic() starts carefully by setting POLLIN iff soreadable().
soreadable() is true above the watermark. So POLLIN for a socket
normally means that (non-null) data above the watermark can be read
(without blocking because it is above the watermark). This is correct
semantics. But then so_poll_generic() sets POLLIN if it sets POLLHUP.
This makes POLLIN worse than useless. A naive reader won't look at
POLLHUP, but will trust POLLIN and spin reading at EOF. A non-naive
reader will see POLLHUP but can't trust POLLIN then. It must spin
reading until read returns EOF, and poll() is useless for avoiding this
busy-waiting. Turning off O_NONBLOCK to avoid spinning is unsafe if the
EOF is not sticky.

Just having watermarks further complicates the idea of what "data" is.
Null data is a special case of data that is too small to be worth
reading. It corresponds to a low watermark of 0 or 1. With watermarks,
non-null data below low water should be considered as not being there
for the purposes of select() and poll(), but there if you try to read
it. POSIX is unclear about this too.

kqueue has the opposite problem. It handles watermarks directly, but
seems to be missing support for transient EOF. This causes problems for
tty devices too.
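
The poll() behaviour described above suggests that a caller should not
dispatch on POLLIN alone. A rough sketch of a more defensive dispatch
order, in Python, might look like the following. The function name
next_action(), its connected argument, and the action strings are all
hypothetical, chosen only for this illustration. It is not how serf or
apr is structured.

```python
import select


def next_action(revents, connected):
    """Map one poll() result for a socket to a coarse action name."""
    if revents & select.POLLNVAL:
        return "close"
    if not connected:
        # While connect() is still pending, any reported event means the
        # attempt has finished. The outcome comes from SO_ERROR, never
        # from read(), even if POLLIN is set alongside POLLHUP.
        return "check_so_error" if revents else "wait"
    if revents & select.POLLERR:
        return "check_so_error"
    if revents & select.POLLIN:
        # POLLIN may accompany POLLHUP with no data, so the read that
        # follows must be prepared to see EOF.
        return "read"
    if revents & select.POLLHUP:
        return "close"
    if revents & select.POLLOUT:
        return "write"
    return "wait"
```

For the failing case in this thread, next_action(select.POLLIN |
select.POLLHUP, connected=False) selects the SO_ERROR check instead of
read().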
In Net/2, select() basically uses a hard-coded watermark of 1, and this doesn't even work to give tinygrams because read() blocks after select() returns "set" for certain MIN/TIME combinations where the watermark should be MIN. This was fixed in FreeBSD-1, basically by copying the socket code. This was broken in 4.4BSD. This was broken in FreeBSD-2.early by copying 4.4BSD. This was fixed in FreeBSD-2 by restoring fixes. The fixes were refined in FreeBSD-[2-7]. All of the fixes were lost in FreeBSD-8. Most of the fixes are restored in my version. > read(4,0x80549c064,8000) ERR#61 'Connection refused' > kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) > kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) > kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' > kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' > close(4) = 0 (0x0) > close(3) = 0 (0x0) > svn: E170013: Unable to connect to a repository at URL ... > > It looks like it should be possible to patch serf to handle this, but: > * Should POLLIN be set for this event? I think there never was any data, so no for poll(). kqueue just cannot represent the no-data condition. > * What errno value should read() return in this case, if it is > ECONNREFUSED, then that should be documented. Don't know. 
Bruce From owner-freebsd-net@freebsd.org Tue Jul 26 23:41:02 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9238BA515E for ; Tue, 26 Jul 2016 23:41:02 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 89F8F1BCB for ; Tue, 26 Jul 2016 23:41:02 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6QNes2t082436; Tue, 26 Jul 2016 16:40:58 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607262340.u6QNes2t082436@gw.catspoiler.org> Date: Tue, 26 Jul 2016 16:40:54 -0700 (PDT) From: Don Lewis Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug? To: brde@optusnet.com.au cc: freebsd-net@freebsd.org In-Reply-To: <20160727054616.X990@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2016 23:41:02 -0000 On 27 Jul, Bruce Evans wrote: > On Tue, 26 Jul 2016, Don Lewis wrote: > >> Serf has some code to fall back from IPv4 if an IPv6 and more generally >> try different addresses on multi-homed servers if connection attempts >> fail, but it does not work properly on recent versions of FreeBSD. I've >> tested both recent FreeBSD 10.3-STABLE and HEAD. 
>>
>> The way that it is supposed to work is that serf creates a socket, sets
>> it non-blocking, calls connect(), and then passes the fd to poll(). When
>> the connection attempt fails, it expects to see a POLLERR event. The
>> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET,
>> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of
>> a couple of other errors, then serf will move on to the next address.
>>
>> Instead what happens is that serf also(?) sees POLLIN set, which it
>> processes first by calling read(), which returns an ECONNREFUSED error.
>> That is not a documented error return from read().
>
> FreeBSD still bogusly returns POLLIN (and POLLRDNORM) together with
> POLLHUP at EOF when there is no data (both set should mean both), and
> still has the bogus POLLINIGNEOF, but it almost never returns POLLERR.
> My regression tests in tools/regression/poll check for not having this
> bug.
>
> The only setting of POLLERR in kern is in kqueue_poll() for errors in
> initialization, and this doesn't set the other flags.
>
> The only uses of POLLERR in kern are:
> - in select(), to turn POLLERR into "set" for any backend that sets it
>   (and there seems to be only 1 backend that sets it)
> - in vop_stdpoll() and poll_no_poll(), there is inconsistent bogus masking
>   using POLLSTANDARD to obfuscate that standard flags which must be
>   ignored are _not_ masked.
>
> So I don't see how you can get POLLIN with POLLERR.
>
>> An easy way to test this is to truss svn and attempt to do an http
>> checkout from a host that has both IPv6 and IPv4 addresses, but is not
>> listening on port 80. The only connection attempt will be to the IPv6
>> address.
>> >> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4) >> fcntl(4,F_GETFL,) = 2 (0x2) >> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0) >> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0) >> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0) >> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress' >> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2) > > I don't see any POLL* there or completely understand the notation or kqueue, > but this looks like the poll() bug with POLLIN together with POLLHUP, not > POLLIN together with POLLERR. I didn't try to decipher out the kqueue stuff. I was thinking that our poll() was using kqueue under the hood, but it turns out that the poll emulation is actually being done by apr. Sigh ... A comment in the emulation code says: /* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this * implementation. ... double sigh. > Everything here seems to be correct. Not very good, but good enough here. > > EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set. > SBS_CANTRECVMORE means hangup, not EOF, and I think there can be > readable data from a socket in general but not after a connection > error. So this translation is incorrect in general but correct after > a connection error. kqueue just can't represent hangup and conflates > it with EOF. But should there be a hangup or EOF if we never got connected in the first place? > When filt_soread() sets EV_EOF, it doesn't clear other flags, so > NOTE_LOWAT remains set. This happens to be correct. But since NOTE_LOWAT > really means low water, you can't use it to determine if (non-null) data > can be read. 
(POSIX is unclear about whether the "data" for select() and
> poll() is actual data or just EOF.)
>
> poll() has almost the opposite problems. It can represent hangup but
> can't represent EOF. It can represent no data, but this doesn't mean
> EOF when the file is open. It can't represent low-water.
> so_poll_generic() starts carefully by setting POLLIN iff soreadable().
> soreadable() is true above the watermark. So POLLIN for a socket
> normally means that (non-null) data above the watermark can be read
> (without blocking because it is above the watermark). This is correct
> semantics. But then so_poll_generic() sets POLLIN if it sets POLLHUP.
> This makes POLLIN worse than useless. A naive reader won't look at
> POLLHUP, but will trust POLLIN and spin reading at EOF. A non-naive
> reader will see POLLHUP but can't trust POLLIN then. It must spin
> reading until read returns EOF, and poll() is useless for avoiding
> this busy-waiting. Turning off O_NONBLOCK to avoid spinning is unsafe
> if the EOF is not sticky.
>
> Just having watermarks further complicates the idea of what "data" is.
> Null data is a special case of data that is too small to be worth
> reading. It corresponds to a low watermark of 0 or 1. With watermarks,
> non-null data below low water should be considered as not being there
> for the purposes of select() and poll(), but there if you try to read
> it. POSIX is unclear about this too. kqueue has the opposite problem.
> It handles watermarks directly, but seems to be missing support for
> transient EOF.
>
> This causes problems for tty devices too. In Net/2, select() basically
> uses a hard-coded watermark of 1, and this doesn't even work to give
> tinygrams because read() blocks after select() returns "set" for certain
> MIN/TIME combinations where the watermark should be MIN. This was fixed
> in FreeBSD-1, basically by copying the socket code. This was broken in
> 4.4BSD. This was broken in FreeBSD-2.early by copying 4.4BSD.
This was > fixed in FreeBSD-2 by restoring fixes. The fixes were refined in > FreeBSD-[2-7]. All of the fixes were lost in FreeBSD-8. Most of the > fixes are restored in my version. > >> read(4,0x80549c064,8000) ERR#61 'Connection refused' >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> close(4) = 0 (0x0) >> close(3) = 0 (0x0) >> svn: E170013: Unable to connect to a repository at URL ... >> >> It looks like it should be possible to patch serf to handle this, but: >> * Should POLLIN be set for this event? > > I think there never was any data, so no for poll(). kqueue just cannot > represent the no-data condition. > >> * What errno value should read() return in this case, if it is >> ECONNREFUSED, then that should be documented. > > Don't know. 
>
> Bruce

From owner-freebsd-net@freebsd.org Wed Jul 27 01:24:46 2016
Return-Path:
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E0882BA60DE for ; Wed, 27 Jul 2016 01:24:46 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C27E61329 for ; Wed, 27 Jul 2016 01:24:46 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6R1OcTi082647; Tue, 26 Jul 2016 18:24:42 -0700 (PDT) (envelope-from truckman@FreeBSD.org)
Message-Id: <201607270124.u6R1OcTi082647@gw.catspoiler.org>
Date: Tue, 26 Jul 2016 18:24:38 -0700 (PDT)
From: Don Lewis
Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug?
To: brde@optusnet.com.au
cc: freebsd-net@freebsd.org
In-Reply-To: <201607262340.u6QNes2t082436@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 27 Jul 2016 01:24:47 -0000

After giving this some more thought, I believe that the read and write
wakeups are correct when the connection attempt fails.

I also think that read() should return ENOTCONN if the socket never got
to the connected state.

I'm not sure how write() should behave. The Open Group Base
Specifications Issue 7 says:

    [ECONNRESET]
        A write was attempted on a socket that is not connected.
    [EPIPE]
        A write was attempted on a socket that is shut down for writing,
        or is no longer connected. In the latter case, if the socket is
        of type SOCK_STREAM, a SIGPIPE signal shall also be sent to the
        thread.

whereas our man page only mentions EPIPE.

I think poll() should set POLLERR and not POLLIN or POLLOUT if the
connection attempt fails.

I think kqueue is fine, but the poll() emulation in apr should map the
connection failure into POLLERR.

From owner-freebsd-net@freebsd.org Wed Jul 27 06:16:46 2016
Return-Path:
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68A04BA68DB for ; Wed, 27 Jul 2016 06:16:46 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 343F418AA; Wed, 27 Jul 2016 06:16:46 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from c122-106-149-109.carlnfd1.nsw.optusnet.com.au (c122-106-149-109.carlnfd1.nsw.optusnet.com.au [122.106.149.109]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 2376410499BD; Wed, 27 Jul 2016 16:16:44 +1000 (AEST)
Date: Wed, 27 Jul 2016 16:16:43 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@besplex.bde.org
To: Don Lewis
cc: freebsd-net@freebsd.org
Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug?
In-Reply-To: <201607262340.u6QNes2t082436@gw.catspoiler.org> Message-ID: <20160727154549.P871@besplex.bde.org> References: <201607262340.u6QNes2t082436@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=VIkg5I7X c=1 sm=1 tr=0 a=R/f3m204ZbWUO/0rwPSMPw==:117 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=SwiqhOd84akJeGznk5QA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2016 06:16:46 -0000 On Tue, 26 Jul 2016, Don Lewis wrote: > On 27 Jul, Bruce Evans wrote: >> On Tue, 26 Jul 2016, Don Lewis wrote: >>> ... >>> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >>> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >>> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2) >> >> I don't see any POLL* there or completely understand the notation or kqueue, >> but this looks like the poll() bug with POLLIN together with POLLHUP, not >> POLLIN together with POLLERR. > > I didn't try to decipher out the kqueue stuff. I was thinking that our > poll() was using kqueue under the hood, but it turns out that the poll > emulation is actually being done by apr. Sigh ... > > A comment in the emulation code says: > > /* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this > * implementation. > > ... double sigh. > >> Everything here seems to be correct. Not very good, but good enough here. >> >> EV_EOF is set by filt_soread() when SBS_CANTRECVMORE is set. 
>> SBS_CANTRECVMORE means hangup, not EOF, and I think there can be
>> readable data from a socket in general but not after a connection
>> error. So this translation is incorrect in general but correct after
>> a connection error. kqueue just can't represent hangup and conflates
>> it with EOF.
>
> But should there be a hangup or EOF if we never got connected in the
> first place?

I think hangup is correct. Named pipes have this problem and more. The
connection may be re-opened, so hangup should not be sticky. Except, for
some uses it should be sticky. The initial state when there is no writer
and no data is like a non-sticky hangup, and I think POLLHUP should be
returned for both. I think this is what the old fifofs implementation
did (it set SBS_CANT* initially and sopoll() should turn this into
POLLHUP).

However, this is not quite right since it leaves no good way to wait for
a writer. select() and poll() are useless since they are specified to
return immediately in the hangup state. There is no way to get back to a
blocking open() with an open fd. You have to use a new blocking open().
But a new open might have side effects, and it often has to be in a
separate thread, and with threads you could do almost everything using
blocking threads to do the i/o and waiting in these threads instead of
select() or poll().

Emulation gives another problem. It was difficult to emulate named pipes
on top of sockets in old fifofs even with full access to kernel state
and kernel events. The socket layer might be missing some state, or
events for that state changing. It was missing reporting of POLLHUP as
late as FreeBSD-4. This is difficult to fix in an emulator, and fifofs
in FreeBSD-4 didn't try. POLLHUP was just unsupported for most file
types in FreeBSD-4.
Bruce From owner-freebsd-net@freebsd.org Wed Jul 27 07:42:57 2016 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 069D9BA63D5 for ; Wed, 27 Jul 2016 07:42:57 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id BABC11E35 for ; Wed, 27 Jul 2016 07:42:56 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6R7gkWE083349; Wed, 27 Jul 2016 00:42:50 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <201607270742.u6R7gkWE083349@gw.catspoiler.org> Date: Wed, 27 Jul 2016 00:42:46 -0700 (PDT) From: Don Lewis Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug? To: karl@denninger.net cc: freebsd-net@freebsd.org In-Reply-To: <4b7e5fc9-7bc6-02e0-f147-3a5cb0e41788@denninger.net> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2016 07:42:57 -0000 On 26 Jul, Karl Denninger wrote: > On 7/26/2016 10:59, Don Lewis wrote: >> Serf has some code to fall back from IPv4 if an IPv6 and more generally >> try different addresses on multi-homed servers if connection attempts >> fail, but it does not work properly on recent versions of FreeBSD. I've >> tested both recent FreeBSD 10.3-STABLE and HEAD. 
>> >> The way that it is supposed to work is that serf creates a socket, sets >> it non-blocking, calls connect(), and then passes the fd to poll(). When >> the connection attempt fails, it expects to see a POLLERR event. The >> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET, >> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of >> a couple of other errors, then serf will move on to the next address. >> >> Instead what happens is that serf also(?) sees POLLIN set, which it >> processes first by calling read(), which returns an ECONNREFUSED error. >> That not a documented error return from read(). >> >> An easy way to test this is to truss svn and attempt to do an http >> checkout from a host that has both IPv6 and IPv4 addresses, but is not >> listening on port 80. The only connection attempt will be to the IPv6 >> address. >> >> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4) >> fcntl(4,F_GETFL,) = 2 (0x2) >> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0) >> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0) >> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0) >> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress' >> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2) >> read(4,0x80549c064,8000) ERR#61 'Connection refused' >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> 
close(4) = 0 (0x0)
>> close(3) = 0 (0x0)
>> svn: E170013: Unable to connect to a repository at URL ...
>>
>>
>> It looks like it should be possible to patch serf to handle this, but:
>> * Should POLLIN be set for this event?
>>
>> * What errno value should read() return in this case, if it is
>> ECONNREFUSED, then that should be documented.
>>
>>
> This is kinda serious in that the above manifestation in svn effectively
> disables it for those of us that are on IPv4 connections and have no
> provider capability for IPv6 at the present time. When I was running
> 10.2 this was not a problem but as soon as I rolled forward to 11.x it
> showed up.

Try the following apr patch. It works for me with svn, but I'm getting a
crash in another application that uses apr.

--- apr-1.5.2/poll/unix/kqueue.c.orig	2015-03-20 01:34:07 UTC
+++ apr-1.5.2/poll/unix/kqueue.c
@@ -25,21 +25,40 @@
 
 #ifdef HAVE_KQUEUE
 
-static apr_int16_t get_kqueue_revent(apr_int16_t event, apr_int16_t flags)
+static apr_int16_t get_kqueue_revent(apr_int16_t event, apr_int16_t flags,
+                                     int fflags, intptr_t data)
 {
     apr_int16_t rv = 0;
 
-    if (event == EVFILT_READ)
-        rv |= APR_POLLIN;
-    else if (event == EVFILT_WRITE)
-        rv |= APR_POLLOUT;
-    if (flags & EV_EOF)
-        rv |= APR_POLLHUP;
-    /* APR_POLLPRI, APR_POLLERR, and APR_POLLNVAL are not handled by this
-     * implementation.
+    /* APR_POLLPRI and APR_POLLNVAL are not handled by this implementation.
      * TODO: See if EV_ERROR + certain system errors in the returned data field
      * should map to APR_POLLNVAL.
      */
+    if (event == EVFILT_READ) {
+        if (data > 0 || fflags == 0)
+            rv |= APR_POLLIN;
+        else
+            rv |= APR_POLLERR;
+        /*
+         * Don't return POLLHUP if connect fails.  Apparently Linux
+         * does not, and this is expected by serf in order for IPv6 to
+         * IPv4 or multihomed host fallback to work.
+         *
+         * ETIMEDOUT is ambiguous here since we don't know if a
+         * connection was established.  We don't want to return
+         * POLLHUP here if the connection attempt timed out, but
+         * we do if the connection was successful but later dropped.
+         * For now, favor the latter.
+         */
+        if ((flags & EV_EOF) != 0 && fflags != ECONNREFUSED &&
+            fflags != ENETUNREACH && fflags != EHOSTUNREACH)
+            rv |= APR_POLLHUP;
+    } else if (event == EVFILT_WRITE) {
+        if (data > 0 || fflags == 0)
+            rv |= APR_POLLOUT;
+        else
+            rv |= APR_POLLERR;
+    }
     return rv;
 }
 
@@ -290,7 +309,9 @@ static apr_status_t impl_pollset_poll(ap
             pollset->p->result_set[j] = fd;
             pollset->p->result_set[j].rtnevents =
                 get_kqueue_revent(pollset->p->ke_set[i].filter,
-                                  pollset->p->ke_set[i].flags);
+                                  pollset->p->ke_set[i].flags,
+                                  pollset->p->ke_set[i].fflags,
+                                  pollset->p->ke_set[i].data);
             j++;
         }
     }
@@ -471,7 +492,9 @@ static apr_status_t impl_pollcb_poll(apr
             apr_pollfd_t *pollfd = (apr_pollfd_t *)(pollcb->pollset.ke[i].udata);
 
             pollfd->rtnevents = get_kqueue_revent(pollcb->pollset.ke[i].filter,
-                                                  pollcb->pollset.ke[i].flags);
+                                                  pollcb->pollset.ke[i].flags,
+                                                  pollcb->pollset.ke[i].fflags,
+                                                  pollcb->pollset.ke[i].data);
 
             rv = func(baton, pollfd);
 

From owner-freebsd-net@freebsd.org Wed Jul 27 21:05:25 2016
Return-Path:
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CF472BA6DFB for ; Wed, 27 Jul 2016 21:05:25 +0000 (UTC) (envelope-from eugen@grosbein.net)
Received: from hz.grosbein.net (hz.grosbein.net [78.47.246.247]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "hz.grosbein.net", Issuer "hz.grosbein.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 68D8E133B for ; Wed, 27 Jul 2016 21:05:24 +0000 (UTC) (envelope-from eugen@grosbein.net)
Received: from eg.sd.rdtc.ru (root@eg.sd.rdtc.ru [62.231.161.221]) by hz.grosbein.net (8.14.9/8.14.9) with ESMTP id u6RKoxJW089624 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 27
Jul 2016 22:51:00 +0200 (CEST) (envelope-from eugen@grosbein.net)
X-Envelope-From: eugen@grosbein.net
X-Envelope-To:
Received: from [10.58.0.10] (dadv@dadvw [10.58.0.10]) by eg.sd.rdtc.ru (8.15.2/8.15.2) with ESMTPS id u6RKouxY002879 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT) for ; Thu, 28 Jul 2016 03:50:56 +0700 (KRAT) (envelope-from eugen@grosbein.net)
To: "freebsd-net@freebsd.org"
From: Eugene Grosbein
Subject: route table entry for link level address (regression?)
Message-ID: <57991EB0.6020507@grosbein.net>
Date: Thu, 28 Jul 2016 03:50:56 +0700
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_00, DATE_IN_FUTURE_96_Q, LOCAL_FROM autolearn=no version=3.3.2
X-Spam-Report: * 3.3 DATE_IN_FUTURE_96_Q Date: is 4 days to 4 months after Received: date
 * -2.3 BAYES_00 BODY: Bayes spam probability is 0 to 1%
 * [score: 0.0000]
 * 2.6 LOCAL_FROM From my domains
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hz.grosbein.net
X-Spam-Level: ***
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 27 Jul 2016 21:05:25 -0000

Hi!

I need to create a route table entry for a single IPv4 host address,
pointing to a specified interface and link level address (MAC address),
so that the ARP protocol is not used and the supplied address is always
used instead.

A command similar to the following one used to work in previous versions
of FreeBSD (8.x AFAIR). It is still accepted as a correct command, is
processed, and installs a new entry in the routing table:

# route -n add -host 192.168.0.0 -link lagg0:0.1b.21.bc.10.d0 -interface
add host 192.168.0.0: gateway lagg0:0.1b.21.bc.10.d0 fib 0
# netstat -rn
Routing tables

Internet:
Destination        Gateway                 Flags      Netif Expire
...
192.168.0.0        lagg0:0.1b.21.bc.10.d0  UHS        lagg0

Nice entry: right interface with link level address, nice flags: host
(H), no intermediate gateway required (no G flag). Still, it does not
work, as "ping 192.168.0.0" makes the kernel send ARP requests to the
lagg0 interface. They get no response, as 192.168.0.0 is an unpublished
loopback address of a neighbouring host that has the noted link level
(MAC) address. So no IP packets are sent.

How can I install a static routing entry for such an ethernet neighbour,
usable without the ARP protocol, these days? I use recent
10.3-STABLE/amd64.

Eugene Grosbein

From owner-freebsd-net@freebsd.org Thu Jul 28 22:15:56 2016
Return-Path:
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E6711BA571A for ; Thu, 28 Jul 2016 22:15:56 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (unknown [IPv6:2602:304:b010:ef20::f2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gw.catspoiler.org", Issuer "gw.catspoiler.org" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id B1A311D75 for ; Thu, 28 Jul 2016 22:15:56 +0000 (UTC) (envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.15.2/8.15.2) with ESMTP id u6SMFjVu089451; Thu, 28 Jul 2016 15:15:49 -0700 (PDT) (envelope-from truckman@FreeBSD.org)
Message-Id: <201607282215.u6SMFjVu089451@gw.catspoiler.org>
Date: Thu, 28 Jul 2016 15:15:45 -0700 (PDT)
From: Don Lewis
Subject: Re: IPv6 -> IPv4 fallback broken in serf, kernel bug?
To: karl@denninger.net cc: freebsd-net@freebsd.org In-Reply-To: <4b7e5fc9-7bc6-02e0-f147-3a5cb0e41788@denninger.net> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2016 22:15:57 -0000 On 26 Jul, Karl Denninger wrote: > On 7/26/2016 10:59, Don Lewis wrote: >> Serf has some code to fall back from IPv4 if an IPv6 and more generally >> try different addresses on multi-homed servers if connection attempts >> fail, but it does not work properly on recent versions of FreeBSD. I've >> tested both recent FreeBSD 10.3-STABLE and HEAD. >> >> The way that it is supposed to work is that serf creates a socket, sets >> it non-blocking, calls connect(), and then passes the fd to poll(). When >> the connection attempt fails, it expects to see a POLLERR event. The >> POLLERR event handler will then call getsockopt(fd, SOL_SOCKET, >> SO_ERROR, &error, ...). If the returned error is ECONNREFUSED or one of >> a couple of other errors, then serf will move on to the next address. >> >> Instead what happens is that serf also(?) sees POLLIN set, which it >> processes first by calling read(), which returns an ECONNREFUSED error. >> That not a documented error return from read(). >> >> An easy way to test this is to truss svn and attempt to do an http >> checkout from a host that has both IPv6 and IPv4 addresses, but is not >> listening on port 80. The only connection attempt will be to the IPv6 >> address. 
>> >> socket(PF_INET6,SOCK_STREAM|SOCK_CLOEXEC,6) = 4 (0x4) >> fcntl(4,F_GETFL,) = 2 (0x2) >> fcntl(4,F_SETFL,O_NONBLOCK|0x2) = 0 (0x0) >> setsockopt(0x4,0x6,0x1,0x7fffffffdda4,0x4) = 0 (0x0) >> gettimeofday({ 1469515046.979461 },0x0) = 0 (0x0) >> connect(4,{ AF_INET6 [xxxx:xxxx:xxxx:xxxx::xxxx]:80 },28) ERR#36 'Operation now in progress' >> gettimeofday({ 1469515046.979614 },0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_ADD,0x0,0x0,0x805491300 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,0x0,0,{ 4,EVFILT_READ,EV_EOF,NOTE_LOWAT|0x3c,0x0,0x805491300 4,EVFILT_WRITE,EV_EOF,NOTE_LOWAT|0x3c,0x8000,0x805491300 },32,{ 0.500000000 }) = 2 (0x2) >> read(4,0x80549c064,8000) ERR#61 'Connection refused' >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0) >> kevent(3,{ 4,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> kevent(3,{ 4,EVFILT_WRITE,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) ERR#2 'No such file or directory' >> close(4) = 0 (0x0) >> close(3) = 0 (0x0) >> svn: E170013: Unable to connect to a repository at URL ... >> >> >> It looks like it should be possible to patch serf to handle this, but: >> * Should POLLIN be set for this event? >> >> * What errno value should read() return in this case, if it is >> ECONNREFUSED, then that should be documented. >> >> > This is kinda serious in that the above manifestation in svn effectively > disables it for those of us that are on IPv4 connections and have no > provider capability for IPv6 at the present time. When I was running > 10.2 this was not a problem but as soon as I rolled forward to 11.x it > showed up. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211430
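
For reference, the connect()/poll()/SO_ERROR sequence described in the
quoted report can be sketched in C roughly as follows. This is only an
illustration, not serf's or apr's code: try_connect(),
should_try_next_address() and probe_closed_loopback_port() are
hypothetical names, and the set of errors that trigger fallback is an
assumption (the report only says "ECONNREFUSED or one of a couple of
other errors"). Rather than depending on POLLERR being reported, the
sketch reads SO_ERROR as soon as poll() reports any event on the socket,
so it does not care whether the kernel or an emulation layer flags a
failed connect as POLLIN, POLLOUT, POLLHUP or POLLERR.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Assumed fallback policy: which connect errors mean "try the next address". */
int
should_try_next_address(int err)
{
    return (err == ECONNREFUSED || err == ENETUNREACH ||
        err == EHOSTUNREACH || err == ETIMEDOUT);
}

/*
 * Start a non-blocking connect() and wait up to timeout_ms for the result.
 * Returns 0 if the connection was established, otherwise an errno value.
 * The socket is always closed here; a real client would keep it on success.
 */
int
try_connect(const struct sockaddr *addr, socklen_t addrlen, int timeout_ms)
{
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, n, soerr;

    fd = socket(addr->sa_family, SOCK_STREAM, 0);
    if (fd == -1)
        return (errno);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        soerr = errno;
        close(fd);
        return (soerr);
    }
    if (connect(fd, addr, addrlen) == 0) {
        close(fd);              /* connected immediately */
        return (0);
    }
    if (errno != EINPROGRESS) {
        soerr = errno;          /* failed immediately */
        close(fd);
        return (soerr);
    }
    pfd.fd = fd;
    pfd.events = POLLOUT;       /* writable means the connect has finished */
    pfd.revents = 0;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n == 0) {
        soerr = ETIMEDOUT;
    } else if (n == -1) {
        soerr = errno;
    } else {
        /* Ignore which revents bits are set; ask the socket directly. */
        soerr = 0;
        len = sizeof(soerr);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) == -1)
            soerr = errno;
    }
    close(fd);
    return (soerr);
}

/*
 * Exercise the failure path locally: bind a loopback port, close it without
 * ever listening, then try to connect to it.  Returns -1 on setup failure.
 */
int
probe_closed_loopback_port(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int fd;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;           /* let the kernel pick a free port */
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return (-1);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
        close(fd);
        return (-1);
    }
    close(fd);                  /* nothing listens on this port now */
    return (try_connect((struct sockaddr *)&sin, sizeof(sin), 2000));
}
```

A caller would walk the address list, calling try_connect() for each
address and moving on to the next one while should_try_next_address()
returns nonzero. probe_closed_loopback_port() only exercises the failure
path on the local machine; on a typical system it would be expected to
come back with ECONNREFUSED.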