Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 14 Jul 2018 10:26:01 -0700
From:      John Baldwin <jhb@FreeBSD.org>
To:        Sean Bruno <sbruno@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r336282 - head/sys/netinet
Message-ID:  <8f1bc11e-5653-b737-42d0-140aa9b8ad84@FreeBSD.org>
In-Reply-To: <e3dbc55b-a085-ce92-ae89-4db0a7287113@freebsd.org>
References:  <201807141619.w6EGJk2Y016469@repo.freebsd.org> <7bbbb553-fca5-9801-49aa-caf6df929c2f@FreeBSD.org> <e3dbc55b-a085-ce92-ae89-4db0a7287113@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 7/14/18 9:57 AM, Sean Bruno wrote:
> 
> 
> On 07/14/18 10:37, John Baldwin wrote:
>> On 7/14/18 9:19 AM, Sean Bruno wrote:
>>> Author: sbruno
>>> Date: Sat Jul 14 16:19:46 2018
>>> New Revision: 336282
>>> URL: https://svnweb.freebsd.org/changeset/base/336282
>>>
>>> Log:
>>>   Fixup memory management for fetching options in ip_ctloutput()
>>>   
>>>   Submitted by:	Jason Eggleston <jason@eggnet.com>
>>>   Sponsored by:	Limelight Networks
>>>   Differential Revision:	https://reviews.freebsd.org/D14621
>>>
>>> Modified:
>>>   head/sys/netinet/ip_output.c
>>>
>>> Modified: head/sys/netinet/ip_output.c
>>> ==============================================================================
>>> --- head/sys/netinet/ip_output.c	Sat Jul 14 16:06:53 2018	(r336281)
>>> +++ head/sys/netinet/ip_output.c	Sat Jul 14 16:19:46 2018	(r336282)
>>> @@ -1256,12 +1256,18 @@ ip_ctloutput(struct socket *so, struct sockopt *sopt)
>>>  		switch (sopt->sopt_name) {
>>>  		case IP_OPTIONS:
>>>  		case IP_RETOPTS:
>>> -			if (inp->inp_options)
>>> +			if (inp->inp_options) {
>>> +				unsigned long len = ulmin(inp->inp_options->m_len, sopt->sopt_valsize);
>>> +				struct mbuf *options = malloc(len, M_TEMP, M_WAITOK);
>>> +				INP_RLOCK(inp);
>>> +				bcopy(inp->inp_options, options, len);
>>> +				INP_RUNLOCK(inp);
>>>  				error = sooptcopyout(sopt,
>>> -						     mtod(inp->inp_options,
>>> +						     mtod(options,
>>>  							  char *),
>>> -						     inp->inp_options->m_len);
>>> -			else
>>> +						     len);
>>> +				free(options, M_TEMP);
>>> +			} else
>>>  				sopt->sopt_valsize = 0;
>>>  			break;
>>
>> You can't malloc an mbuf, and you don't really want an mbuf here anyway.
>> Also, style(9) doesn't normally assign values when a variable is created.
>> I would perhaps have done something like this:
>>
>> 	if (inp->inp_options) {
>> 		void *options;
>> 		u_long len;
>>
>> 		INP_RLOCK(inp);
>> 		len = ulmin(inp->inp_options->m_len, sopt->sopt_valsize);
>> 		INP_RUNLOCK(inp);
>> 		options = malloc(len, M_TEMP, M_WAITOK);
>> 		INP_RLOCK(inp);
>> 		len = ulmin(inp->inp_options->m_len, len);
>> 		m_copydata(inp->inp_options, 0, len, options);
>> 		INP_RUNLOCK(inp);
>> 		error = sooptcopyout(sopt, options, len);
>> 		free(options, M_TEMP);
>> 	}
>>
>> The current code isn't doing what you think it is doing since the bcopy()
>> copies the m_data pointer from 'inp_options' into 'options' so that
>> 'options->m_data' points at the m_data buffer in the 'inp_options' mbuf.
>> Thus, the mtod() returns a pointer into 'inp_options' just like the old
>> code, so this commit doesn't change anything in terms of the previous
>> race (it is still just as broken).
>>
>> Also, while INP_RLOCK isn't helpful in my version above for reading the
>> 'm_len' member (it's racy to read and then malloc no matter what), you
>> still need the lock to ensure the inp_options pointer itself is valid
>> when you dereference it.  Without the lock another thread might have
>> freed inp_options out from under you and the unlocked dereference could
>> panic (though it probably just reads garbage on most of our architectures
>> rather than panicking since we don't tend to invalidate KVA if you lose
>> that race).
>>
> 
> 
> Shall I revert and rethink?

Well, we can fix it, but even my fix above is wrong.  You can't
do the ulmin without breaking getsockopt().  The output length is
an in-out parameter to getsockopt() so that you can determine the
size of variable-sized variables (such as this one) with this type
of pattern:

	socklen_t len;
	void *buf;

	len = 0;
	getsockopt(s, IPPROTO_IP, IP_OPTIONS, NULL, &len);
	buf = malloc(len);
	getsockopt(s, IPPROTO_IP, IP_OPTIONS, buf, &len);

(In theory you'd really need to use a loop with realloc until you
obtain a consistent snapshot.)

However, for this to work, the kernel code that calls sooptcopyout
has to always pass the full size.  sooptcopyout ensures it doesn't
overflow the user's supplied buffer, but depends on the length it
is passed to determine the output value of 'len'.  The ulmin breaks
this since it will return a len of 0 for the first call always.

I think the right way to fix this is to use m_dup() with M_NOWAIT
while holding the INP lock and fail with ENOMEM if m_dup() fails:

	INP_RLOCK(inp);
	if (inp->inp_options) {
		struct mbuf *options;

		options = m_dup(inp->inp_options, M_NOWAIT);
		INP_RUNLOCK(inp);
		if (options != NULL) {
			error = sooptcopyout(sopt,
			    mtod(options, char *), options->m_len);
			m_free(options);
		} else
			error = ENOEM;
	} else {
		INP_RUNLOCK(inp);
		sopt->optvalsize = 0;
	}

(Someone else may prefer m_freem() to m_free() in case inp_options
can be an arbitrary chain.)

I chose to always acquire INP_RLOCK because I would have had to
test it again under the lock anyway before calling m_dup() to
determine ENOMEM vs no valid options, and getsockopt(IP_OPTIONS)
isn't a critical path such that it warrants doing an unlocked
check before doing a check under the lock.

I would also probably have expanded a bit on the commit message.
"Fixup memory management" is a bit vague as opposed to something
like "Fix a potential use after free in access to inp_options."

I would maybe ask jtl@ to review the above and/or offer an
alternative.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?8f1bc11e-5653-b737-42d0-140aa9b8ad84>