Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Jul 2018 08:35:09 +0200
From:      Jan Beich <jbeich@FreeBSD.org>
To:        Don Lewis <truckman@FreeBSD.org>
Cc:        Jan =?utf-8?Q?Kokem=C3=BCller?= <jan.kokemueller@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: Firefox 63 vs. sendmsg()
Message-ID:  <7elq-l8iq-wny@FreeBSD.org>
In-Reply-To: <tkrat.34455e7d69705d51@FreeBSD.org> (Don Lewis's message of "Thu, 19 Jul 2018 17:31:44 -0700 (PDT)")
References:  <pnzk-oufs-wny@FreeBSD.org> <844b824e-4463-82f4-a01d-5d398c4351e4@gmail.com> <lga7-1hzi-wny@FreeBSD.org> <tkrat.34455e7d69705d51@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Don Lewis <truckman@FreeBSD.org> writes:

> On 19 Jul, Jan Beich wrote:
>
>> Jan Kokem=C3=BCller <jan.kokemueller@gmail.com> writes:
>>=20
>>>> Can someone help debug https://bugzilla.mozilla.org/show_bug.cgi?id=3D=
1475970#c7 ?
>>>> I'm not familar with socket code.
>>>
>>> Could it be related to this bug[1]?
>>> There is a demo program in another bug report[2] that still fails for me
>>> (running a recent -current, r334337).
>>>
>>> [1]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D181741
>>> [2]: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D215933
>>=20
>> Indeed. Firefox 63 works fine after applying the patch in bug 181741.
>> I wonder, if "packet loss" issue is also responsible for IPC instability
>> in Firefox < 63 and Chromium.
>
> I've been having terrible stability problems with Firefox on my
> 11.2-STABLE amd64 machine since upgrading to version 61.  See:
>   https://bugzilla.mozilla.org/show_bug.cgi?id=3D1476130
> I applied this patch:
>   https://bugs.freebsd.org/bugzilla/attachment.cgi?id=3D168943&action=3Dd=
iff
> after tweaking it a bit so that all chunks would apply.  It's too soon
> to tell for sure, but this fix appears promising.

Can you attach the rebased version to bug 181741? I think, we may want
to ask a few Chromium users to check if it has an impact on bug 212812.

For comparison, attached to this mail is my own (blind) rebase.

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment; filename=sendmsg.diff
Content-Description: bug 181741 patch series rebased against r336479

>From 8eeaf6d372dbc57c3f4d69b0e208d7b67703957f Mon Sep 17 00:00:00 2001
From: yuri <yuri@rawbw.com>
Date: Sat, 2 Apr 2016 14:46:59 -0700
Subject: [PATCH 1/4] fix dropped control messages on uipc sockets

There is the case when sendmsg(2) silently loses packets for
AF_LOCAL domain when large packets with control part in them
are sent.

* Problem Description
  There is the watermark limit on sockbuf determined by
  net.local.stream.sendspace, default is 8192 bytes (field
  sockbuf.sb_hiwat). When sendmsg(2) sends large enough data
  (8K+ that hits this 8192 limit) with control message,
  sosend_generic will be cutting the message data into separate
  mbufs based on 'sbspace' (derived from the above-mentioned
  sb_hiwat limit) with adjustment for control message size as it
  sees it.  This way it tries to make sure this sb_hiwat limit
  is enforced.

  However, down on uipc level control message is being further
  modified in two ways: unp_internalize modifies it into some
  'internal' form, also unp_addsockcred function adds another
  control message when LOCAL_CREDS are requested by client.
  Both functions only increase control message size beyond its
  original size (seen by sosend_generic). So that the first
  final mbuf sent (concatenation of control and data) will
  always be larger than 'sbspace' limit that sosend_generic was
  cutting data for.

  There is also the function sbappendcontrol_locked.  It checks
  the 'sbspace' limit again, and discards the packet when
  sbspace llimit is exceeded.  Its result code is essentially
  ignored in uipc_send.  I believe, sbappendcontrol_locked
  shouldn't be checking space at all, since packets are expected
  to be properly sized to begin with.  But this won't be the
  right fix, since sizes would be exceeding the sbspace limit
  anyway.

  sosend_default is one level up over uipc level, and it doesn't
  know what uipc will do with control message.  Therefore it
  can't know what the real adjustment for control message is
  needed (to properly cut data). It wrongly takes the original
  control size and this makes the first packet too large and
  discarded by sbappendcontrol_locked.

* Patch synopsys
- Added the new function into struct pr_usrreqs:
  int	(*pru_finalizecontrol)(struct socket *so, int flags,
				struct mbuf **pcontrol);
  It is called by sosend_generic to update control message to
  its final form.

- Removed 'sbspace' check from sbappendcontrol_locked. The only
  context it is called from is uipc_send, and all packet sizes are
  already conforming.

- Fixed few wrong error codes relevant to this situation.

[This patch is from:

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181741

but updated by Chris Torek <torek@ixsystems.com> to fit in
a more recent FreeBSD kernel.]
---
 sys/kern/uipc_sockbuf.c | 21 +++---------
 sys/kern/uipc_socket.c  | 20 ++++++++++-
 sys/kern/uipc_usrreq.c  | 76 ++++++++++++++++++++++++++++-------------
 sys/sys/protosw.h       |  2 ++
 sys/sys/sockbuf.h       |  4 +--
 5 files changed, 81 insertions(+), 42 deletions(-)

diff --git a/sys/kern/uipc_sockbuf.c b/sys/kern/uipc_sockbuf.c
index f5da502612ba..0c946138f6ab 100644
--- a/sys/kern/uipc_sockbuf.c
+++ b/sys/kern/uipc_sockbuf.c
@@ -955,23 +955,16 @@ sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa,
 	return (retval);
 }
 
-int
+void
 sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
     struct mbuf *control)
 {
-	struct mbuf *m, *n, *mlast;
-	int space;
+	struct mbuf *m, *mlast;
 
 	SOCKBUF_LOCK_ASSERT(sb);
 
-	if (control == NULL)
-		panic("sbappendcontrol_locked");
-	space = m_length(control, &n) + m_length(m0, NULL);
-
-	if (space > sbspace(sb))
-		return (0);
 	m_clrprotoflags(m0);
-	n->m_next = m0;			/* concatenate data to control */
+	m_last(control)->m_next = m0;	/* concatenate data to control */
 
 	SBLASTRECORDCHK(sb);
 
@@ -985,18 +978,14 @@ sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
 	SBLASTMBUFCHK(sb);
 
 	SBLASTRECORDCHK(sb);
-	return (1);
 }
 
-int
+void
 sbappendcontrol(struct sockbuf *sb, struct mbuf *m0, struct mbuf *control)
 {
-	int retval;
-
 	SOCKBUF_LOCK(sb);
-	retval = sbappendcontrol_locked(sb, m0, control);
+	sbappendcontrol_locked(sb, m0, control);
 	SOCKBUF_UNLOCK(sb);
-	return (retval);
 }
 
 /*
diff --git a/sys/kern/uipc_socket.c b/sys/kern/uipc_socket.c
index 8952b1a4fdd4..4a3cd3e9bfda 100644
--- a/sys/kern/uipc_socket.c
+++ b/sys/kern/uipc_socket.c
@@ -1278,6 +1278,11 @@ sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
 	KASSERT(so->so_proto->pr_flags & PR_ATOMIC,
 	    ("sosend_dgram: !PR_ATOMIC"));
 
+	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
+	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
+	        flags, &control, td)))
+		goto out;
+
 	if (uio != NULL)
 		resid = uio->uio_resid;
 	else
@@ -1342,10 +1347,14 @@ sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
 	 * problem and need fixing.
 	 */
 	space = sbspace(&so->so_snd);
+	SOCKBUF_UNLOCK(&so->so_snd);
 	if (flags & MSG_OOB)
 		space += 1024;
+	if (clen > space) {
+		error = EMSGSIZE;
+		goto out;
+	}
 	space -= clen;
-	SOCKBUF_UNLOCK(&so->so_snd);
 	if (resid > space) {
 		error = EMSGSIZE;
 		goto out;
@@ -1440,6 +1449,11 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 	int clen = 0, error, dontroute;
 	int atomic = sosendallatonce(so) || top;
 
+	if (so->so_proto->pr_usrreqs->pru_finalizecontrol &&
+	    (error = (*so->so_proto->pr_usrreqs->pru_finalizecontrol)(so,
+	        flags, &control, td)))
+		goto out;
+
 	if (uio != NULL)
 		resid = uio->uio_resid;
 	else
@@ -1532,6 +1546,10 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 			goto restart;
 		}
 		SOCKBUF_UNLOCK(&so->so_snd);
+		if (clen > space) {
+			error = EMSGSIZE;
+			goto release;
+		}
 		space -= clen;
 		do {
 			if (uio == NULL) {
diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index ac93ea1b9c4e..3f0bad36fe0c 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -313,7 +313,7 @@ static int	unp_internalize(struct mbuf **, struct thread *);
 static void	unp_internalize_fp(struct file *);
 static int	unp_externalize(struct mbuf *, struct mbuf **, int);
 static int	unp_externalize_fp(struct file *);
-static struct mbuf	*unp_addsockcred(struct thread *, struct mbuf *);
+static int	unp_addsockcred(struct mbuf **, struct thread *);
 static void	unp_process_defers(void * __unused, int);
 
 
@@ -953,6 +953,43 @@ uipc_peeraddr(struct socket *so, struct sockaddr **nam)
 	return (0);
 }
 
+static int
+uipc_finalizecontrol(struct socket *so, int flags, struct mbuf **pcontrol,
+    struct thread *td)
+{
+	struct unpcb *unp, *unp2;
+	int error = 0;
+
+	unp = sotounpcb(so);
+	KASSERT(unp != NULL, ("uipc_finalizecontrol: unp == NULL"));
+
+	unp2 = unp->unp_conn;
+
+	if (*pcontrol != NULL && (error = unp_internalize(pcontrol, td)))
+		return (error);
+
+	/* Lockless read, ignore when not connected. */
+	if (unp2 && unp2->unp_flags & UNP_WANTCRED) {
+		switch (so->so_type) {
+		case SOCK_SEQPACKET:
+		case SOCK_STREAM:
+			/* Credentials are passed only once on streams */
+			UNP_PCB_LOCK(unp2);
+			if (unp2->unp_flags & UNP_WANTCRED) {
+				unp2->unp_flags &= ~UNP_WANTCRED;
+				error = unp_addsockcred(pcontrol, td);
+			}
+			UNP_PCB_UNLOCK(unp2);
+			break;
+		case SOCK_DGRAM:
+			error = unp_addsockcred(pcontrol, td);
+			break;
+		}
+	}
+
+	return (error);
+}
+
 static int
 uipc_rcvd(struct socket *so, int flags)
 {
@@ -1045,8 +1082,6 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 		error = EOPNOTSUPP;
 		goto release;
 	}
-	if (control != NULL && (error = unp_internalize(&control, td)))
-		goto release;
 
 	unp2 = NULL;
 	switch (so->so_type) {
@@ -1106,8 +1141,6 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 			break;
 		}
 	connect_self:
-		if (unp2->unp_flags & UNP_WANTCRED)
-			control = unp_addsockcred(td, control);
 		if (unp->unp_addr != NULL)
 			from = (struct sockaddr *)unp->unp_addr;
 		else
@@ -1167,14 +1200,6 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 			break;
 		}
 		SOCKBUF_LOCK(&so2->so_rcv);
-		if (unp2->unp_flags & UNP_WANTCRED) {
-			/*
-			 * Credentials are passed only once on SOCK_STREAM
-			 * and SOCK_SEQPACKET.
-			 */
-			unp2->unp_flags &= ~UNP_WANTCRED;
-			control = unp_addsockcred(td, control);
-		}
 		/*
 		 * Send to paired receive port, and then reduce send buffer
 		 * hiwater marks to maintain backpressure.  Wake up readers.
@@ -1182,9 +1207,9 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 		switch (so->so_type) {
 		case SOCK_STREAM:
 			if (control != NULL) {
-				if (sbappendcontrol_locked(&so2->so_rcv, m,
-				    control))
-					control = NULL;
+				sbappendcontrol_locked(&so2->so_rcv, m,
+				    control);
+				control = NULL;
 			} else
 				sbappend_locked(&so2->so_rcv, m, flags);
 			break;
@@ -1205,6 +1230,7 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 			break;
 			}
 		}
+		m = NULL;
 
 		mbcnt = so2->so_rcv.sb_mbcnt;
 		sbcc = sbavail(&so2->so_rcv);
@@ -1225,7 +1251,6 @@ uipc_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam,
 			so->so_snd.sb_flags |= SB_STOP;
 		SOCKBUF_UNLOCK(&so->so_snd);
 		UNP_PCB_UNLOCK(unp2);
-		m = NULL;
 		break;
 	}
 
@@ -1360,6 +1385,7 @@ static struct pr_usrreqs uipc_usrreqs_dgram = {
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_sense =		uipc_sense,
@@ -1382,6 +1408,7 @@ static struct pr_usrreqs uipc_usrreqs_seqpacket = {
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_sense =		uipc_sense,
@@ -1404,6 +1431,7 @@ static struct pr_usrreqs uipc_usrreqs_stream = {
 	.pru_disconnect =	uipc_disconnect,
 	.pru_listen =		uipc_listen,
 	.pru_peeraddr =		uipc_peeraddr,
+	.pru_finalizecontrol =	uipc_finalizecontrol,
 	.pru_rcvd =		uipc_rcvd,
 	.pru_send =		uipc_send,
 	.pru_ready =		uipc_ready,
@@ -2025,7 +2053,7 @@ unp_externalize(struct mbuf *control, struct mbuf **controlp, int flags)
 			    SCM_RIGHTS, SOL_SOCKET);
 			if (*controlp == NULL) {
 				FILEDESC_XUNLOCK(fdesc);
-				error = E2BIG;
+				error = ENOBUFS;
 				unp_freerights(fdep, newfds);
 				goto next;
 			}
@@ -2202,7 +2230,7 @@ unp_internalize(struct mbuf **controlp, struct thread *td)
 			    SCM_RIGHTS, SOL_SOCKET);
 			if (*controlp == NULL) {
 				FILEDESC_SUNLOCK(fdesc);
-				error = E2BIG;
+				error = ENOBUFS;
 				goto out;
 			}
 			fdp = data;
@@ -2290,9 +2318,10 @@ unp_internalize(struct mbuf **controlp, struct thread *td)
 	return (error);
 }
 
-static struct mbuf *
-unp_addsockcred(struct thread *td, struct mbuf *control)
+static int
+unp_addsockcred(struct mbuf **pcontrol, struct thread *td)
 {
+	struct mbuf *control = *pcontrol;
 	struct mbuf *m, *n, *n_prev;
 	struct sockcred *sc;
 	const struct cmsghdr *cm;
@@ -2302,7 +2331,7 @@ unp_addsockcred(struct thread *td, struct mbuf *control)
 	ngroups = MIN(td->td_ucred->cr_ngroups, CMGROUP_MAX);
 	m = sbcreatecontrol(NULL, SOCKCREDSIZE(ngroups), SCM_CREDS, SOL_SOCKET);
 	if (m == NULL)
-		return (control);
+		return (ENOBUFS);
 
 	sc = (struct sockcred *) CMSG_DATA(mtod(m, struct cmsghdr *));
 	sc->sc_uid = td->td_ucred->cr_ruid;
@@ -2336,7 +2365,8 @@ unp_addsockcred(struct thread *td, struct mbuf *control)
 
 	/* Prepend it to the head. */
 	m->m_next = control;
-	return (m);
+	*pcontrol = m;
+	return (0);
 }
 
 static struct unpcb *
diff --git a/sys/sys/protosw.h b/sys/sys/protosw.h
index 096b507776e1..b28ae286ff66 100644
--- a/sys/sys/protosw.h
+++ b/sys/sys/protosw.h
@@ -201,6 +201,8 @@ struct pr_usrreqs {
 	int	(*pru_listen)(struct socket *so, int backlog,
 		    struct thread *td);
 	int	(*pru_peeraddr)(struct socket *so, struct sockaddr **nam);
+	int	(*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol,
+		    struct thread *td);
 	int	(*pru_rcvd)(struct socket *so, int flags);
 	int	(*pru_rcvoob)(struct socket *so, struct mbuf *m, int flags);
 	int	(*pru_send)(struct socket *so, int flags, struct mbuf *m,
diff --git a/sys/sys/sockbuf.h b/sys/sys/sockbuf.h
index ff7863da6055..ad452ad4f4ab 100644
--- a/sys/sys/sockbuf.h
+++ b/sys/sys/sockbuf.h
@@ -139,9 +139,9 @@ int	sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa,
 	    struct mbuf *m0, struct mbuf *control);
 int	sbappendaddr_nospacecheck_locked(struct sockbuf *sb,
 	    const struct sockaddr *asa, struct mbuf *m0, struct mbuf *control);
-int	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
+void	sbappendcontrol(struct sockbuf *sb, struct mbuf *m0,
 	    struct mbuf *control);
-int	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
+void	sbappendcontrol_locked(struct sockbuf *sb, struct mbuf *m0,
 	    struct mbuf *control);
 void	sbappendrecord(struct sockbuf *sb, struct mbuf *m0);
 void	sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0);
>From 767e783215ac8efc9925a15bcb2e5b83f90f5dcf Mon Sep 17 00:00:00 2001
From: Chris Torek <chris.torek@gmail.com>
Date: Sat, 2 Apr 2016 16:48:07 -0700
Subject: [PATCH 2/4] socket send: use correct control message length

The new internalize function may (and actually does)
replace a single control message mbuf with a chain, so
we must use m_length() rather than just control->m_len.

Add some comments to that effect.

Also, while here, consolidate the send-space calculation
(which adds an extra magic-number 1024 for out of band
data) into a single inline, so that we have one magic
constant instead of two.
---
 sys/kern/uipc_socket.c | 43 ++++++++++++++++++++++++++++++------------
 sys/kern/uipc_usrreq.c | 10 ++++++++++
 2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/sys/kern/uipc_socket.c b/sys/kern/uipc_socket.c
index 4a3cd3e9bfda..a574c84195d8 100644
--- a/sys/kern/uipc_socket.c
+++ b/sys/kern/uipc_socket.c
@@ -1266,6 +1266,23 @@ sodisconnect(struct socket *so)
 
 #define	SBLOCKWAIT(f)	(((f) & MSG_DONTWAIT) ? 0 : SBL_WAIT)
 
+/*
+ * Like sbspace(&so->so_snd), but allow extra room for  MSG_OOB.
+ *
+ * It's not clear whether the magic 1024 number is sensible, but
+ * at least it's now in only one place.
+ */
+static inline int
+sosend_space(struct socket *so, int flags)
+{
+	long space;
+
+	space = sbspace(&so->so_snd);
+	if (flags & MSG_OOB)
+		space += 1024;
+	return (space);
+}
+
 int
 sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
     struct mbuf *top, struct mbuf *control, int flags, struct thread *td)
@@ -1303,8 +1320,17 @@ sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
 	    (flags & MSG_DONTROUTE) && (so->so_options & SO_DONTROUTE) == 0;
 	if (td != NULL)
 		td->td_ru.ru_msgsnd++;
+	/*
+	 * NB: Original user supplied control message may have
+	 * fit and we may have made it too big when we finalized
+	 * it, which will have us return EMSGSIZE below.  This
+	 * seems rude, but it is at least functional: the user
+	 * can try sending smaller control values (mainly, fewer
+	 * fd's at a time, as those are the ones that expand to
+	 * twice their size on I32LP64 systems).
+	 */
 	if (control != NULL)
-		clen = control->m_len;
+		clen = m_length(control, NULL);
 
 	SOCKBUF_LOCK(&so->so_snd);
 	if (so->so_snd.sb_state & SBS_CANTSENDMORE) {
@@ -1342,14 +1368,8 @@ sosend_dgram(struct socket *so, struct sockaddr *addr, struct uio *uio,
 		}
 	}
 
-	/*
-	 * Do we need MSG_OOB support in SOCK_DGRAM?  Signs here may be a
-	 * problem and need fixing.
-	 */
-	space = sbspace(&so->so_snd);
+	space = sosend_space(so, flags);
 	SOCKBUF_UNLOCK(&so->so_snd);
-	if (flags & MSG_OOB)
-		space += 1024;
 	if (clen > space) {
 		error = EMSGSIZE;
 		goto out;
@@ -1479,7 +1499,7 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 	if (td != NULL)
 		td->td_ru.ru_msgsnd++;
 	if (control != NULL)
-		clen = control->m_len;
+		clen = m_length(control, NULL);
 
 	error = sblock(&so->so_snd, SBLOCKWAIT(flags));
 	if (error)
@@ -1523,9 +1543,8 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
 				goto release;
 			}
 		}
-		space = sbspace(&so->so_snd);
-		if (flags & MSG_OOB)
-			space += 1024;
+		space = sosend_space(so, flags);
+		/* NB: control msg is implicitly atomic */
 		if ((atomic && resid > so->so_snd.sb_hiwat) ||
 		    clen > so->so_snd.sb_hiwat) {
 			SOCKBUF_UNLOCK(&so->so_snd);
diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index 3f0bad36fe0c..e7da95fee7be 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -2140,6 +2140,16 @@ unp_init(void)
 	UNP_DEFERRED_LOCK_INIT();
 }
 
+/*
+ * Convert incoming control message from user-supplied format
+ * to internal form.
+ *
+ * Note that when we're called, *controlp is a single mbuf
+ * whose m_len is the length of the cmsg data structures
+ * that have not yet been internalized.  On return, *controlp
+ * is an mbuf chain whose individual mbufs are internalized;
+ * this chain may have a different length.
+ */
 static int
 unp_internalize(struct mbuf **controlp, struct thread *td)
 {
>From 2e3f20560ed12ffb277659562033ddb7e0c574ba Mon Sep 17 00:00:00 2001
From: Chris Torek <chris.torek@gmail.com>
Date: Sat, 2 Apr 2016 16:55:48 -0700
Subject: [PATCH 3/4] uipc_finalizecontrol: read-lock the unp link

Alan Somers pointed out [1] that the socket linkage has a lock;
we should grab it for form's sake, at least.

While here, rewrite the code for clarity, and note a bug.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181741#c7
---
 sys/kern/uipc_usrreq.c | 52 +++++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 19 deletions(-)

diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index e7da95fee7be..362b23ea6995 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -959,34 +959,48 @@ uipc_finalizecontrol(struct socket *so, int flags, struct mbuf **pcontrol,
 {
 	struct unpcb *unp, *unp2;
 	int error = 0;
+	bool wantcred, oneshot;
 
 	unp = sotounpcb(so);
 	KASSERT(unp != NULL, ("uipc_finalizecontrol: unp == NULL"));
 
-	unp2 = unp->unp_conn;
-
 	if (*pcontrol != NULL && (error = unp_internalize(pcontrol, td)))
 		return (error);
 
-	/* Lockless read, ignore when not connected. */
-	if (unp2 && unp2->unp_flags & UNP_WANTCRED) {
-		switch (so->so_type) {
-		case SOCK_SEQPACKET:
-		case SOCK_STREAM:
-			/* Credentials are passed only once on streams */
-			UNP_PCB_LOCK(unp2);
-			if (unp2->unp_flags & UNP_WANTCRED) {
-				unp2->unp_flags &= ~UNP_WANTCRED;
-				error = unp_addsockcred(pcontrol, td);
-			}
-			UNP_PCB_UNLOCK(unp2);
-			break;
-		case SOCK_DGRAM:
-			error = unp_addsockcred(pcontrol, td);
-			break;
-		}
+	UNP_LINK_RLOCK();
+	unp2 = unp->unp_conn;
+	UNP_LINK_RUNLOCK();
+
+	/*
+	 * If not connected, we're done now (might be auto-connect
+	 * on send, leave everything to caller).  Otherwise, handle
+	 * one-shot credentials on stream and seqpacket sockets here.
+	 *
+	 * XXX If the send fails, we never get another chance.
+	 * We could restore UNP_WANTCRED if the unp_addsockcred()
+	 * call fails here but we can't handle the more likely
+	 * entire-send-fails case.  Deferring clearing the flag
+	 * is not a great solution either.  Perhaps best would be
+	 * to have an additional UNP_CREDS_SENT_SUCCESSFULLY flag
+	 * and check that here.  For now, just leave it this way.
+	 */
+	if (unp2 == NULL)
+		return (0);
+
+	oneshot = so->so_type == SOCK_SEQPACKET ||
+	    so->so_type == SOCK_STREAM;
+	if (oneshot) {
+		UNP_PCB_LOCK(unp2);
+		wantcred = (unp2->unp_flags & UNP_WANTCRED) != 0;
+		unp2->unp_flags &= ~UNP_WANTCRED;
+		UNP_PCB_UNLOCK(unp2);
+	} else {
+		wantcred = true;
 	}
 
+	if (wantcred)
+		error = unp_addsockcred(pcontrol, td);
+
 	return (error);
 }
 
>From f200a14690619f6002c4bbf6b1152b2a50db0852 Mon Sep 17 00:00:00 2001
From: Chris Torek <chris.torek@gmail.com>
Date: Sat, 2 Apr 2016 19:23:32 -0700
Subject: [PATCH 4/4] rewrite unp_internalize()

The existing code was nearly unreadable and probably buggy
if you sent an SCM_RIGHTS control message with an empty fd
list and then any subsequent SCM_* message.

This moves the various internalizers into separate functions,
so that the control path, where we operate on the outer
control-message data, is firewalled off from the transforms,
where we turn the sender's SCM_* messages into the internal
SCM_* forms that go across the unp link to the other socket.
---
 sys/kern/uipc_usrreq.c | 385 +++++++++++++++++++++++++----------------
 1 file changed, 236 insertions(+), 149 deletions(-)

diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index 362b23ea6995..8b81f5f2241f 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -2154,6 +2154,44 @@ unp_init(void)
 	UNP_DEFERRED_LOCK_INIT();
 }
 
+/*
+ * Arguments passed to internalizer/transformer (ix = internal xform).
+ * The transformation function may fill in a new ix_mbuf.
+ */
+struct internalize_transform_data {
+	socklen_t	ix_odatalen;	/* original data length in bytes */
+	socklen_t	ix_ndatalen;	/* new data length, or 0 */
+	void		*ix_odata;	/* original data */
+	void 		*ix_ndata;	/* new data area, or NULL */
+	struct mbuf	*ix_mbuf;	/* mbuf for new data */
+	struct thread	*ix_td;		/* calling thread */
+};
+
+/*
+ * Internalizers.  If you provide a nonzero size, we pre-allocate
+ * the ix_mbuf and you get a nonzero ndatasize and non-NULL ndata.
+ */
+struct unp_scm_internalize_op {
+	socklen_t size;		/* predefined output size, or 0 */
+	int	(*func)(struct internalize_transform_data *);
+};
+
+static int unp_internalize_creds(struct internalize_transform_data *);
+static int unp_internalize_fds(struct internalize_transform_data *);
+static int unp_internalize_timestamp(struct internalize_transform_data *);
+static int unp_internalize_bintime(struct internalize_transform_data *);
+static int unp_internalize_realtime(struct internalize_transform_data *);
+static int unp_internalize_monotonic(struct internalize_transform_data *);
+
+static struct unp_scm_internalize_op unp_internalize_ops[] = {
+	[SCM_CREDS] = { sizeof(struct cmsgcred), unp_internalize_creds },
+	[SCM_RIGHTS] = { 0, unp_internalize_fds },
+	[SCM_TIMESTAMP] = { sizeof(struct timeval), unp_internalize_timestamp },
+	[SCM_BINTIME] = { sizeof(struct bintime), unp_internalize_bintime },
+	[SCM_REALTIME] = { sizeof(struct timespec), unp_internalize_realtime },
+	[SCM_MONOTONIC] = { sizeof(struct timespec), unp_internalize_monotonic },
+};
+
 /*
  * Convert incoming control message from user-supplied format
  * to internal form.
@@ -2163,185 +2201,234 @@ unp_init(void)
  * that have not yet been internalized.  On return, *controlp
  * is an mbuf chain whose individual mbufs are internalized;
  * this chain may have a different length.
+ *
+ * Caller will always m_freem(*controlp), even if we return error.
  */
 static int
 unp_internalize(struct mbuf **controlp, struct thread *td)
 {
-	struct mbuf *control = *controlp;
-	struct proc *p = td->td_proc;
-	struct filedesc *fdesc = p->p_fd;
-	struct bintime *bt;
-	struct cmsghdr *cm = mtod(control, struct cmsghdr *);
-	struct cmsgcred *cmcred;
-	struct filedescent *fde, **fdep, *fdev;
-	struct file *fp;
-	struct timeval *tv;
-	struct timespec *ts;
-	int i, *fdp;
-	void *data;
-	socklen_t clen = control->m_len, datalen;
-	int error, oldfds;
-	u_int newlen;
+	struct unp_scm_internalize_op *op;
+	struct internalize_transform_data ix;
+	struct cmsghdr *cm;
+	struct mbuf *control, *m;
+	void *odata;
+	int cmtype, error;
+	socklen_t clen, size, odatalen;
 
 	UNP_LINK_UNLOCK_ASSERT();
 
+	ix.ix_td = td;		/* never changes, just passed through */
+
 	error = 0;
+	control = *controlp;
 	*controlp = NULL;
-	while (cm != NULL) {
-		if (sizeof(*cm) > clen || cm->cmsg_level != SOL_SOCKET
-		    || cm->cmsg_len > clen || cm->cmsg_len < sizeof(*cm)) {
-			error = EINVAL;
-			goto out;
-		}
-		data = CMSG_DATA(cm);
-		datalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)data;
+	clen = control->m_len;
+	cm = mtod(control, struct cmsghdr *);
 
-		switch (cm->cmsg_type) {
+	while (error == 0) {
 		/*
-		 * Fill in credential information.
+		 * Verify current control message, and set up type
+		 * and old data pointer and size values.
 		 */
-		case SCM_CREDS:
-			*controlp = sbcreatecontrol(NULL, sizeof(*cmcred),
-			    SCM_CREDS, SOL_SOCKET);
-			if (*controlp == NULL) {
-				error = ENOBUFS;
-				goto out;
-			}
-			cmcred = (struct cmsgcred *)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			cmcred->cmcred_pid = p->p_pid;
-			cmcred->cmcred_uid = td->td_ucred->cr_ruid;
-			cmcred->cmcred_gid = td->td_ucred->cr_rgid;
-			cmcred->cmcred_euid = td->td_ucred->cr_uid;
-			cmcred->cmcred_ngroups = MIN(td->td_ucred->cr_ngroups,
-			    CMGROUP_MAX);
-			for (i = 0; i < cmcred->cmcred_ngroups; i++)
-				cmcred->cmcred_groups[i] =
-				    td->td_ucred->cr_groups[i];
+		if (clen < sizeof(*cm) || cm->cmsg_level != SOL_SOCKET ||
+		    cm->cmsg_len > clen || cm->cmsg_len < sizeof(*cm)) {
+			error = EINVAL;
 			break;
+		}
 
-		case SCM_RIGHTS:
-			oldfds = datalen / sizeof (int);
-			if (oldfds == 0)
-				break;
-			/*
-			 * Check that all the FDs passed in refer to legal
-			 * files.  If not, reject the entire operation.
-			 */
-			fdp = data;
-			FILEDESC_SLOCK(fdesc);
-			for (i = 0; i < oldfds; i++, fdp++) {
-				fp = fget_locked(fdesc, *fdp);
-				if (fp == NULL) {
-					FILEDESC_SUNLOCK(fdesc);
-					error = EBADF;
-					goto out;
-				}
-				if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
-					FILEDESC_SUNLOCK(fdesc);
-					error = EOPNOTSUPP;
-					goto out;
-				}
-
-			}
-
-			/*
-			 * Now replace the integer FDs with pointers to the
-			 * file structure and capability rights.
-			 */
-			newlen = oldfds * sizeof(fdep[0]);
-			*controlp = sbcreatecontrol(NULL, newlen,
-			    SCM_RIGHTS, SOL_SOCKET);
-			if (*controlp == NULL) {
-				FILEDESC_SUNLOCK(fdesc);
-				error = ENOBUFS;
-				goto out;
-			}
-			fdp = data;
-			fdep = (struct filedescent **)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			fdev = malloc(sizeof(*fdev) * oldfds, M_FILECAPS,
-			    M_WAITOK);
-			for (i = 0; i < oldfds; i++, fdev++, fdp++) {
-				fde = &fdesc->fd_ofiles[*fdp];
-				fdep[i] = fdev;
-				fdep[i]->fde_file = fde->fde_file;
-				filecaps_copy(&fde->fde_caps,
-				    &fdep[i]->fde_caps, true);
-				unp_internalize_fp(fdep[i]->fde_file);
-			}
-			FILEDESC_SUNLOCK(fdesc);
+		cmtype = cm->cmsg_type;
+		if (cmtype < 0 || cmtype >= nitems(unp_internalize_ops)) {
+			error = EINVAL;
 			break;
-
-		case SCM_TIMESTAMP:
-			*controlp = sbcreatecontrol(NULL, sizeof(*tv),
-			    SCM_TIMESTAMP, SOL_SOCKET);
-			if (*controlp == NULL) {
-				error = ENOBUFS;
-				goto out;
-			}
-			tv = (struct timeval *)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			microtime(tv);
+		}
+		op = &unp_internalize_ops[cmtype];
+		if (op->func == NULL) {
+			error = EINVAL;
 			break;
+		}
 
-		case SCM_BINTIME:
-			*controlp = sbcreatecontrol(NULL, sizeof(*bt),
-			    SCM_BINTIME, SOL_SOCKET);
-			if (*controlp == NULL) {
-				error = ENOBUFS;
-				goto out;
-			}
-			bt = (struct bintime *)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			bintime(bt);
-			break;
+		odata = CMSG_DATA(cm);
+		odatalen = (caddr_t)cm + cm->cmsg_len - (caddr_t)odata;
 
-		case SCM_REALTIME:
-			*controlp = sbcreatecontrol(NULL, sizeof(*ts),
-			    SCM_REALTIME, SOL_SOCKET);
-			if (*controlp == NULL) {
-				error = ENOBUFS;
-				goto out;
-			}
-			ts = (struct timespec *)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			nanotime(ts);
-			break;
+		ix.ix_odata = odata;
+		ix.ix_odatalen = odatalen;
 
-		case SCM_MONOTONIC:
-			*controlp = sbcreatecontrol(NULL, sizeof(*ts),
-			    SCM_MONOTONIC, SOL_SOCKET);
-			if (*controlp == NULL) {
+		/*
+		 * If transform function gives us a fixed data
+		 * size, allocate new mbuf here, else leave it
+		 * to the function.
+		 */
+		if ((size = op->size) != 0) {
+			m = sbcreatecontrol(NULL, size, cmtype, SOL_SOCKET);
+			if (m == NULL) {
 				error = ENOBUFS;
-				goto out;
+				break;
 			}
-			ts = (struct timespec *)
-			    CMSG_DATA(mtod(*controlp, struct cmsghdr *));
-			nanouptime(ts);
-			break;
-
-		default:
-			error = EINVAL;
-			goto out;
+			ix.ix_mbuf = m;
+			ix.ix_ndata = CMSG_DATA(mtod(m, struct cmsghdr *));
+			ix.ix_ndatalen = size;
+		} else {
+			ix.ix_mbuf = NULL;
+			ix.ix_ndata = NULL;
+			ix.ix_ndatalen = 0;
 		}
 
-		controlp = &(*controlp)->m_next;
-		if (CMSG_SPACE(datalen) < clen) {
-			clen -= CMSG_SPACE(datalen);
-			cm = (struct cmsghdr *)
-			    ((caddr_t)cm + CMSG_SPACE(datalen));
-		} else {
-			clen = 0;
-			cm = NULL;
+		/*
+		 * Apply transform and append new mbuf (if any) to
+		 * new control chain, even on error, so that it
+		 * will get freed.
+		 */
+		error = (*op->func)(&ix);
+		if ((m = ix.ix_mbuf) != NULL) {
+			*controlp = m;
+			controlp = &m->m_next;
 		}
+
+		/* Advance to next message. */
+		size = CMSG_SPACE(odatalen);
+		if (size >= clen)
+			break;
+		cm = (struct cmsghdr *)((caddr_t)cm + size);
+		clen -= size;
 	}
 
-out:
 	m_freem(control);
 	return (error);
 }
 
+/*
+ * Internalize file descriptors ("rights").
+ */
+static int
+unp_internalize_fds(struct internalize_transform_data *ix)
+{
+	struct proc *p = ix->ix_td->td_proc;
+	struct filedesc *fdesc = p->p_fd;
+	struct filedescent *fde, **fdep, *fdev;
+	struct file *fp;
+	struct mbuf *m;
+	int i, *fdp;
+	int oldfds;
+	u_int newlen;
+
+	KASSERT(ix->ix_ndatalen == 0, ("%s: datalen", __func__));
+
+	/* Round down: this is forgiving, if slightly wrong. */
+	oldfds = ix->ix_odatalen / sizeof (int);
+	if (oldfds == 0)
+		return (0);
+
+	/*
+	 * Check that all the FDs passed in refer to legal
+	 * files.  If not, reject the entire operation.
+	 */
+	fdp = ix->ix_odata;
+	FILEDESC_SLOCK(fdesc);
+	for (i = 0; i < oldfds; i++, fdp++) {
+		fp = fget_locked(fdesc, *fdp);
+		if (fp == NULL) {
+			FILEDESC_SUNLOCK(fdesc);
+			return (EBADF);
+		}
+		if (!(fp->f_ops->fo_flags & DFLAG_PASSABLE)) {
+			FILEDESC_SUNLOCK(fdesc);
+			return (EOPNOTSUPP);
+		}
+
+	}
+
+	/*
+	 * Now replace the integer FDs with pointers to the
+	 * file structure and capability rights.
+	 */
+	newlen = oldfds * sizeof(fdep[0]);
+	m = sbcreatecontrol(NULL, newlen, SCM_RIGHTS, SOL_SOCKET);
+	if (m == NULL) {
+		FILEDESC_SUNLOCK(fdesc);
+		return (ENOBUFS);
+	}
+
+	fdp = ix->ix_odata;
+	fdep = (struct filedescent **)CMSG_DATA(mtod(m, struct cmsghdr *));
+	fdev = malloc(sizeof(*fdev) * oldfds, M_FILECAPS, M_WAITOK);
+	for (i = 0; i < oldfds; i++, fdev++, fdp++) {
+		fde = &fdesc->fd_ofiles[*fdp];
+		fdep[i] = fdev;
+		fdep[i]->fde_file = fde->fde_file;
+		filecaps_copy(&fde->fde_caps, &fdep[i]->fde_caps, true);
+		unp_internalize_fp(fdep[i]->fde_file);
+	}
+	FILEDESC_SUNLOCK(fdesc);
+
+	ix->ix_mbuf = m;
+	return (0);
+}
+
+static int
+unp_internalize_creds(struct internalize_transform_data *ix)
+{
+	struct cmsgcred *cmcred;
+	int i, ngroups;
+	struct thread *td = ix->ix_td;
+	struct ucred *cr = td->td_ucred;
+
+	KASSERT(ix->ix_ndatalen == sizeof(*cmcred), ("%s: datalen", __func__));
+	cmcred = ix->ix_ndata;
+	cmcred->cmcred_pid = td->td_proc->p_pid;
+	cmcred->cmcred_uid = cr->cr_ruid;
+	cmcred->cmcred_gid = cr->cr_rgid;
+	cmcred->cmcred_euid = cr->cr_uid;
+	ngroups = MIN(cr->cr_ngroups, CMGROUP_MAX);
+	cmcred->cmcred_ngroups = ngroups;
+	for (i = 0; i < ngroups; i++)
+		cmcred->cmcred_groups[i] = cr->cr_groups[i];
+	return (0);
+}
+
+static int
+unp_internalize_timestamp(struct internalize_transform_data *ix)
+{
+	struct timeval *tv;
+
+	KASSERT(ix->ix_ndatalen == sizeof(*tv), ("%s: datalen", __func__));
+	tv = ix->ix_ndata;
+	microtime(tv);
+	return (0);
+}
+
+static int
+unp_internalize_bintime(struct internalize_transform_data *ix)
+{
+	struct bintime *bt;
+
+	KASSERT(ix->ix_ndatalen == sizeof(*bt), ("%s: datalen", __func__));
+	bt = ix->ix_ndata;
+	bintime(bt);
+	return (0);
+}
+
+static int
+unp_internalize_realtime(struct internalize_transform_data *ix)
+{
+	struct timespec *ts;
+
+	KASSERT(ix->ix_ndatalen == sizeof(*ts), ("%s: datalen", __func__));
+	ts = ix->ix_ndata;
+	nanotime(ts);
+	return (0);
+}
+
+static int
+unp_internalize_monotonic(struct internalize_transform_data *ix)
+{
+	struct timespec *ts;
+
+	KASSERT(ix->ix_ndatalen == sizeof(*ts), ("%s: datalen", __func__));
+	ts = ix->ix_ndata;
+	nanouptime(ts);
+	return (0);
+}
+
 static int
 unp_addsockcred(struct mbuf **pcontrol, struct thread *td)
 {

--=-=-=--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?7elq-l8iq-wny>