From: Andrew Gallatin
To: Rick Macklem, "freebsd-current@FreeBSD.org"
Cc: "jhb@FreeBSD.org", "gallatin@freebsd.org", Gleb Smirnoff
Subject: Re: RFC: ktls and krpc using M_EXTPG mbufs
Date: Mon, 27 Jul 2020 13:21:27 -0400
Message-ID: <319c92f4-4157-74a3-2bec-8f40e3979261@cs.duke.edu>

On 2020-07-19 19:34, Rick Macklem wrote:
> I spent a little time chasing a problem in the nfs-over-tls code, where it
> would sometimes end up with corrupted data in the file(s) of a mirrored
> pNFS configuration.
>
> I think the problem was that the code filled the data to be written into
> anonymous page M_EXTPG mbufs, then did a m_copym() { copy by
> reference } and used the copies for the mirrored writes.
> --> In ktls_encrypt(), the encryption was done to the same pages and,
>       sometimes, the encrypted data got encrypted again during the
>       sosend() of the other copy.
>
> Although I haven't reproduced it, a regular kernel write RPC could suffer the
> same consequences if the RPC is retried (it keeps an m_copym() copy
> of the request in the krpc for an RPC retry).
>
> At this time, the code in projects/nfs-over-tls works correctly, since it
> always fills the data to be written into mbuf clusters, m_copym()s those
> and then copies those { real copying using memcpy() } via
> mb_mapped_to_unmapped() just before calling sosend().
> --> This works, but it would be nice to avoid the mb_mapped_to_unmapped()
>       copying for all the data being written via an NFS over TLS connection.
>
> For the TCP_TLS_MODE_SW case:
> --> The NFS code can fill the written data into anonymous pages on M_EXTPG
>       mbufs.
>       Then, the ktls_encrypt() could be modified to
>       allocate a new set of anonymous pages for the destination side of
>       the encryption (it already does this for the sendfile case) and put those
>       in a new mbuf list.
>       --> This would result in new anonymous pages and mbufs being allocated,
>             but would not do memcpy()s.
>       After encryption, it would just do a m_freem() on the unencrypted list.
>       --> For the krpc client case, this call would only decrement the reference
>             count on the unencrypted list and it could be used for a retry by the krpc
>             and then be free'd { m_freem() call } after a reply is received.
>
> If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems
> like unnecessary overhead, the above could be enabled via a setsockopt()
> on the socket.
>
> What do others think of this?

Several comments:

mb_mapped_to_unmapped() is surprisingly inexpensive. It was less than 5%
before I made iflib M_NOMAP-aware.

It seems like NFS should be constructing mbufs like sendfile does, and
pointing mbufs at its pages.
This would cause the crypto code to allocate a new set of pages upon
encryption.

> For the hardware offload case:
> - Can I assume that the anonymous pages in M_EXTPG mbufs will remain
>    unchanged?
>    --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could
>          fill the data to be written into M_EXTPG mbufs safely.
>
> - And, if so, can I safely use the ktls_session mode field to decide if offload
>    is happening?
>    I see the TCP_TXTLS_MODE socket opt which seems to
>    switch the mode to TCP_TLS_MODE_SW.
>    When does this happen? Or, can this happen to a session once in use?

Yes. The intent is to allow something (TCP stack, smart user daemon) to
look at a connection and move it from hardware to software if it has a
lot of TCP retransmits.

Drew
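
The double-encryption problem and the "allocate new destination pages" fix
discussed above can be modeled in userspace. What follows is only a rough
sketch, not kernel code: every toy_* name, PAGE_LEN, and the add-a-key-byte
"cipher" are hypothetical stand-ins for M_EXTPG mbufs, m_copym() and
ktls_encrypt(), chosen so that a second encryption pass is visible in the
output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_LEN 16	/* hypothetical toy page size */

/* Toy stand-in for an anonymous page that several mbufs may share. */
struct toy_page {
	int refcnt;
	unsigned char data[PAGE_LEN];
};

/* Toy stand-in for an M_EXTPG mbuf pointing at a single page. */
struct toy_mbuf {
	struct toy_page *pg;
};

struct toy_page *toy_page_alloc(const unsigned char *src)
{
	struct toy_page *pg = malloc(sizeof(*pg));

	if (pg == NULL)
		abort();
	pg->refcnt = 1;
	memcpy(pg->data, src, PAGE_LEN);
	return (pg);
}

void toy_page_rele(struct toy_page *pg)
{
	if (--pg->refcnt == 0)
		free(pg);
}

/* Copy by reference, in the spirit of m_copym(): both mbufs share the page. */
struct toy_mbuf toy_copy_by_ref(struct toy_mbuf *m)
{
	struct toy_mbuf c = { m->pg };

	m->pg->refcnt++;
	return (c);
}

/* Toy "cipher": add a key byte. Applying it twice gives a different result. */
void toy_cipher(unsigned char *dst, const unsigned char *src, unsigned char key)
{
	for (size_t i = 0; i < PAGE_LEN; i++)
		dst[i] = (unsigned char)(src[i] + key);
}

/*
 * "Send" one mbuf: encrypt it, then copy what would go on the wire.
 * in_place != 0 overwrites the shared page; in_place == 0 encrypts into a
 * freshly allocated page and only drops a reference on the plaintext page.
 */
void toy_send(struct toy_mbuf *m, int in_place, unsigned char key,
    unsigned char *wire)
{
	if (in_place) {
		toy_cipher(m->pg->data, m->pg->data, key);
	} else {
		struct toy_page *src = m->pg;
		struct toy_page *dst = toy_page_alloc(src->data);

		toy_cipher(dst->data, src->data, key);
		m->pg = dst;
		toy_page_rele(src);
	}
	memcpy(wire, m->pg->data, PAGE_LEN);
}

/* Returns 1 if both mirrored copies went out correctly encrypted, else 0. */
int toy_mirror_send(int in_place)
{
	unsigned char plain[PAGE_LEN], want[PAGE_LEN];
	unsigned char wire_a[PAGE_LEN], wire_b[PAGE_LEN];
	int ok;

	memset(plain, 'A', PAGE_LEN);
	toy_cipher(want, plain, 7);

	struct toy_mbuf a = { toy_page_alloc(plain) };
	struct toy_mbuf b = toy_copy_by_ref(&a);

	toy_send(&a, in_place, 7, wire_a);
	toy_send(&b, in_place, 7, wire_b);

	ok = memcmp(wire_a, want, PAGE_LEN) == 0 &&
	    memcmp(wire_b, want, PAGE_LEN) == 0;
	toy_page_rele(a.pg);
	toy_page_rele(b.pg);
	return (ok);
}
```

In this model, toy_mirror_send(1) fails because the second send encrypts
data that the first send already encrypted, while toy_mirror_send(0)
succeeds because the shared plaintext page is never written to. Whether the
real ktls_encrypt() can be changed this cheaply is a separate question that
the sketch does not answer.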