From owner-svn-src-stable-12@freebsd.org Wed Jun 19 16:25:41 2019
Message-Id: <201906191625.x5JGPe6r017921@repo.freebsd.org>
From: "Jonathan T. Looney"
Date: Wed, 19 Jun 2019 16:25:40 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org
Subject: svn commit: r349197 - stable/12/sys/netinet/tcp_stacks
X-SVN-Group: stable-12
X-SVN-Commit-Author: jtl
X-SVN-Commit-Paths: stable/12/sys/netinet/tcp_stacks
X-SVN-Commit-Revision: 349197
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
List-Id: SVN commit messages for only the 12-stable src tree

Author: jtl
Date: Wed Jun 19 16:25:39 2019
New Revision: 349197
URL: https://svnweb.freebsd.org/changeset/base/349197

Log:
  MFC r349192:
    Add the ability to limit how much the code will fragment the RACK send map
    in response to SACKs. The default behavior is unchanged; however, the limit
    can be activated by changing the new net.inet.tcp.rack.split_limit sysctl.

  Approved by:	so (gordon)
  Security:	CVE-2019-5599

Modified:
  stable/12/sys/netinet/tcp_stacks/rack.c
  stable/12/sys/netinet/tcp_stacks/tcp_rack.h
Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/sys/netinet/tcp_stacks/rack.c
==============================================================================
--- stable/12/sys/netinet/tcp_stacks/rack.c	Wed Jun 19 16:09:20 2019	(r349196)
+++ stable/12/sys/netinet/tcp_stacks/rack.c	Wed Jun 19 16:25:39 2019	(r349197)
@@ -1,6 +1,5 @@
 /*-
- * Copyright (c) 2016-2018
- *	Netflix Inc.  All rights reserved.
+ * Copyright (c) 2016-2019 Netflix, Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -203,6 +202,7 @@ static int32_t rack_always_send_oldest = 0;
 static int32_t rack_sack_block_limit = 128;
 static int32_t rack_use_sack_filter = 1;
 static int32_t rack_tlp_threshold_use = TLP_USE_TWO_ONE;
+static uint32_t rack_map_split_limit = 0;	/* unlimited by default */
 
 /* Rack specific counters */
 counter_u64_t rack_badfr;
@@ -228,6 +228,8 @@ counter_u64_t rack_to_arm_tlp;
 counter_u64_t rack_to_alloc;
 counter_u64_t rack_to_alloc_hard;
 counter_u64_t rack_to_alloc_emerg;
+counter_u64_t rack_alloc_limited_conns;
+counter_u64_t rack_split_limited;
 
 counter_u64_t rack_sack_proc_all;
 counter_u64_t rack_sack_proc_short;
@@ -261,6 +263,8 @@ static void
 rack_ack_received(struct tcpcb *tp, struct tcp_rack *rack,
     struct tcphdr *th, uint16_t nsegs, uint16_t type, int32_t recovery);
 static struct rack_sendmap *rack_alloc(struct tcp_rack *rack);
+static struct rack_sendmap *rack_alloc_limit(struct tcp_rack *rack,
+    uint8_t limit_type);
 static struct rack_sendmap *
 rack_check_recovery_mode(struct tcpcb *tp,
     uint32_t tsused);
@@ -445,6 +449,8 @@ sysctl_rack_clear(SYSCTL_HANDLER_ARGS)
 		counter_u64_zero(rack_sack_proc_short);
 		counter_u64_zero(rack_sack_proc_restart);
 		counter_u64_zero(rack_to_alloc);
+		counter_u64_zero(rack_alloc_limited_conns);
+		counter_u64_zero(rack_split_limited);
 		counter_u64_zero(rack_find_high);
 		counter_u64_zero(rack_runt_sacks);
 		counter_u64_zero(rack_used_tlpmethod);
@@ -622,6 +628,11 @@ rack_init_sysctls()
 	    OID_AUTO, "pktdelay", CTLFLAG_RW,
 	    &rack_pkt_delay, 1,
 	    "Extra RACK time (in ms) besides reordering thresh");
+	SYSCTL_ADD_U32(&rack_sysctl_ctx,
+	    SYSCTL_CHILDREN(rack_sysctl_root),
+	    OID_AUTO, "split_limit", CTLFLAG_RW,
+	    &rack_map_split_limit, 0,
+	    "Is there a limit on the number of map split entries (0=unlimited)");
 	SYSCTL_ADD_S32(&rack_sysctl_ctx,
 	    SYSCTL_CHILDREN(rack_sysctl_root),
 	    OID_AUTO, "inc_var", CTLFLAG_RW,
@@ -757,7 +768,19 @@ rack_init_sysctls()
 	    SYSCTL_CHILDREN(rack_sysctl_root),
 	    OID_AUTO, "allocemerg", CTLFLAG_RD,
 	    &rack_to_alloc_emerg,
-	    "Total alocations done from emergency cache");
+	    "Total allocations done from emergency cache");
+	rack_alloc_limited_conns = counter_u64_alloc(M_WAITOK);
+	SYSCTL_ADD_COUNTER_U64(&rack_sysctl_ctx,
+	    SYSCTL_CHILDREN(rack_sysctl_root),
+	    OID_AUTO, "alloc_limited_conns", CTLFLAG_RD,
+	    &rack_alloc_limited_conns,
+	    "Connections with allocations dropped due to limit");
+	rack_split_limited = counter_u64_alloc(M_WAITOK);
+	SYSCTL_ADD_COUNTER_U64(&rack_sysctl_ctx,
+	    SYSCTL_CHILDREN(rack_sysctl_root),
+	    OID_AUTO, "split_limited", CTLFLAG_RD,
+	    &rack_split_limited,
+	    "Split allocations dropped due to limit");
 	rack_sack_proc_all = counter_u64_alloc(M_WAITOK);
 	SYSCTL_ADD_COUNTER_U64(&rack_sysctl_ctx,
 	    SYSCTL_CHILDREN(rack_sysctl_root),
@@ -1121,10 +1144,11 @@ rack_alloc(struct tcp_rack *rack)
 {
 	struct rack_sendmap *rsm;
 
-	counter_u64_add(rack_to_alloc, 1);
-	rack->r_ctl.rc_num_maps_alloced++;
 	rsm = uma_zalloc(rack_zone, M_NOWAIT);
 	if (rsm) {
+alloc_done:
+		counter_u64_add(rack_to_alloc, 1);
+		rack->r_ctl.rc_num_maps_alloced++;
 		return (rsm);
 	}
 	if (rack->rc_free_cnt) {
@@ -1132,14 +1156,46 @@ rack_alloc(struct tcp_rack *rack)
 		rsm = TAILQ_FIRST(&rack->r_ctl.rc_free);
 		TAILQ_REMOVE(&rack->r_ctl.rc_free, rsm, r_next);
 		rack->rc_free_cnt--;
-		return (rsm);
+		goto alloc_done;
 	}
 	return (NULL);
 }
 
+/* wrapper to allocate a sendmap entry, subject to a specific limit */
+static struct rack_sendmap *
+rack_alloc_limit(struct tcp_rack *rack, uint8_t limit_type)
+{
+	struct rack_sendmap *rsm;
+
+	if (limit_type) {
+		/* currently there is only one limit type */
+		if (rack_map_split_limit > 0 &&
+		    rack->r_ctl.rc_num_split_allocs >= rack_map_split_limit) {
+			counter_u64_add(rack_split_limited, 1);
+			if (!rack->alloc_limit_reported) {
+				rack->alloc_limit_reported = 1;
+				counter_u64_add(rack_alloc_limited_conns, 1);
+			}
+			return (NULL);
+		}
+	}
+
+	/* allocate and mark in the limit type, if set */
+	rsm = rack_alloc(rack);
+	if (rsm != NULL && limit_type) {
+		rsm->r_limit_type = limit_type;
+		rack->r_ctl.rc_num_split_allocs++;
+	}
+	return (rsm);
+}
+
 static void
 rack_free(struct tcp_rack *rack, struct rack_sendmap *rsm)
 {
+	if (rsm->r_limit_type) {
+		/* currently there is only one limit type */
+		rack->r_ctl.rc_num_split_allocs--;
+	}
 	rack->r_ctl.rc_num_maps_alloced--;
 	if (rack->r_ctl.rc_tlpsend == rsm)
 		rack->r_ctl.rc_tlpsend = NULL;
@@ -3955,7 +4011,7 @@ do_rest_ofb:
 	/*
 	 * Need to split this in two pieces the before and after.
 	 */
-	nrsm = rack_alloc(rack);
+	nrsm = rack_alloc_limit(rack, RACK_LIMIT_TYPE_SPLIT);
 	if (nrsm == NULL) {
 		/*
 		 * failed XXXrrs what can we do but loose the sack
@@ -4016,7 +4072,7 @@ do_rest_ofb:
 		goto do_rest_ofb;
 	}
 	/* Ok we need to split off this one at the tail */
-	nrsm = rack_alloc(rack);
+	nrsm = rack_alloc_limit(rack, RACK_LIMIT_TYPE_SPLIT);
 	if (nrsm == NULL) {
 		/* failed rrs what can we do but loose the sack info? */
 		goto out;

Modified: stable/12/sys/netinet/tcp_stacks/tcp_rack.h
==============================================================================
--- stable/12/sys/netinet/tcp_stacks/tcp_rack.h	Wed Jun 19 16:09:20 2019	(r349196)
+++ stable/12/sys/netinet/tcp_stacks/tcp_rack.h	Wed Jun 19 16:25:39 2019	(r349197)
@@ -55,8 +55,10 @@ struct rack_sendmap {
 	uint8_t r_sndcnt;	/* Retran count, not limited by
 				 * RACK_NUM_OF_RETRANS */
 	uint8_t r_in_tmap;	/* Flag to see if its in the r_tnext array */
-	uint8_t r_resv[3];
+	uint8_t r_limit_type;	/* is this entry counted against a limit? */
+	uint8_t r_resv[2];
 };
+#define RACK_LIMIT_TYPE_SPLIT	1
 
 TAILQ_HEAD(rack_head, rack_sendmap);
 
@@ -242,7 +244,7 @@ struct rack_control {
 	uint32_t rc_num_maps_alloced;	/* Number of map blocks (sacks) we
 					 * have allocated */
 	uint32_t rc_rcvtime;	/* When we last received data */
-	uint32_t rc_notused;
+	uint32_t rc_num_split_allocs;	/* num split map entries allocated */
 	uint32_t rc_last_output_to;
 	uint32_t rc_went_idle_time;
 
@@ -311,7 +313,8 @@ struct tcp_rack {
 	uint8_t rack_tlp_threshold_use;
 	uint8_t rc_allow_data_af_clo: 1,
 		delayed_ack : 1,
-		rc_avail : 6;
+		alloc_limit_reported : 1,
+		rc_avail : 5;
 	uint8_t r_resv[2];	/* Fill to cache line boundary */
 	/* Cache line 2 0x40 */
 	struct rack_control r_ctl;
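
For readers who want to see the accounting in isolation, here is a minimal user-space sketch of the split-limit bookkeeping this commit adds. It is not the kernel code: struct entry, struct conn, entry_alloc_limit() and entry_free() are hypothetical stand-ins for rack_sendmap, tcp_rack, rack_alloc_limit() and rack_free(), calloc() replaces the UMA zone allocator, and plain integers replace the counter(9) counters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LIMIT_TYPE_SPLIT 1	/* mirrors RACK_LIMIT_TYPE_SPLIT */

/* Hypothetical stand-in for struct rack_sendmap. */
struct entry {
	uint8_t limit_type;	/* nonzero if counted against the split limit */
};

/* Hypothetical stand-in for the per-connection state in struct tcp_rack. */
struct conn {
	uint32_t num_split_allocs;	/* split entries currently allocated */
	uint8_t limit_reported;		/* set the first time the limit is hit */
};

static uint32_t split_limit = 0;	/* 0 = unlimited, as in the commit */
static uint64_t split_limited = 0;	/* refused split allocations */
static uint64_t limited_conns = 0;	/* connections that ever hit the limit */

/* Allocate an entry; refuse if this is a split and the limit is reached. */
static struct entry *
entry_alloc_limit(struct conn *c, uint8_t limit_type)
{
	struct entry *e;

	if (limit_type != 0 && split_limit > 0 &&
	    c->num_split_allocs >= split_limit) {
		split_limited++;
		if (!c->limit_reported) {
			/* Count each connection only once. */
			c->limit_reported = 1;
			limited_conns++;
		}
		return (NULL);
	}
	e = calloc(1, sizeof(*e));
	if (e != NULL && limit_type != 0) {
		/* Tag the entry so the free path can undo the accounting. */
		e->limit_type = limit_type;
		c->num_split_allocs++;
	}
	return (e);
}

/* Free an entry and release its share of the split budget, if any. */
static void
entry_free(struct conn *c, struct entry *e)
{
	if (e->limit_type != 0)
		c->num_split_allocs--;
	free(e);
}
```

The point of tagging each entry with its limit type is that the free path needs no extra context to keep the per-connection count balanced, and allocations requested with a limit type of 0 are never refused by the limit.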