From owner-freebsd-pf@freebsd.org Tue Nov 13 22:17:51 2018 Return-Path: Delivered-To: freebsd-pf@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9993C1134530 for ; Tue, 13 Nov 2018 22:17:51 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "smtp.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0CF277B30A; Tue, 13 Nov 2018 22:17:51 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from venus.codepro.be (venus.codepro.be [IPv6:2a01:4f8:162:1127::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "smtp.codepro.be", Issuer "Let's Encrypt Authority X3" (verified OK)) (Authenticated sender: kp) by smtp.freebsd.org (Postfix) with ESMTPSA id 9CEDF252E2; Tue, 13 Nov 2018 22:17:50 +0000 (UTC) (envelope-from kp@FreeBSD.org) Received: from [192.168.228.1] (94-224-12-202.access.telenet.be [94.224.12.202]) (Authenticated sender: kp) by venus.codepro.be (Postfix) with ESMTPSA id 103A5BDE7; Tue, 13 Nov 2018 23:17:49 +0100 (CET) From: "Kristof Provost" To: "Andreas Longwitz" , "Gleb Smirnoff" , kib@freebsd.org Cc: freebsd-pf@freebsd.org Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections Date: Tue, 13 Nov 2018 23:17:47 +0100 X-Mailer: MailMate (2.0BETAr6127) Message-ID: <9004F62C-D1DC-4CFA-93A1-67E981274831@FreeBSD.org> In-Reply-To: <5BEB3B9A.9080402@incore.de> References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de> MIME-Version: 1.0 X-Rspamd-Queue-Id: 0CF277B30A X-Spamd-Result: default: False [-106.83 / 200.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; ALLOW_DOMAIN_WHITELIST(-100.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; TO_DN_SOME(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; DMARC_NA(0.00)[FreeBSD.org]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; R_SPF_SOFTFAIL(0.00)[~all]; RCVD_COUNT_THREE(0.00)[3]; MX_GOOD(-0.01)[cached: mx1.FreeBSD.org]; NEURAL_HAM_SHORT(-1.00)[-1.000,0]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; IP_SCORE(-3.72)[ip: (-9.75), ipnet: 96.47.64.0/20(-4.83), asn: 11403(-3.91), country: US(-0.09)]; ASN(0.00)[asn:11403, ipnet:96.47.64.0/20, country:US]; MID_RHS_MATCH_FROM(0.00)[]; RCVD_TLS_ALL(0.00)[] X-Rspamd-Server: mx1.freebsd.org Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2018 22:17:51 -0000 On 13 Nov 2018, at 22:01, Andreas Longwitz wrote: >> >> Are there any hints why the counter pf_default_rule->states_cur >> could get a negative value ? >> >> I’m afraid I have no idea right now. >> > > OK, in the meantime I did some more research and I am now quite sure > the > problem with the bogus pf_default_rule->states_cur counter is not a > problem in pf. I am convinced it is a problem in counter(9) on i386 > server. The critical code is the machine instruction cmpxchg8b used in > /sys/i386/include/counter.h. > I’m always happy to hear problems aren’t my fault :) >> From intel instruction set reference manual: > Zhis instruction can be used with a LOCK prefix allow the instruction > to > be executed atomically. > > We have two other sources in kernel using cmpxchg8b: > /sys/i386/include/atomic.h and > /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S > > Both make use of the LOCK feature, in atomic.h a detailed explanation > is > given. Because counter.h lacks the LOCK prefix I propose the following > patch to get around the leak: > > --- counter.h.orig 2015-07-03 16:45:36.000000000 +0200 > +++ counter.h 2018-11-13 16:07:20.329053000 +0100 > @@ -60,6 +60,7 @@ > "movl %%edx,%%ecx\n\t" > "addl (%%edi),%%ebx\n\t" > "adcl 4(%%edi),%%ecx\n\t" > + "lock \n\t" > "cmpxchg8b %%fs:(%%esi)\n\t" > "jnz 1b" > : > @@ -76,6 +77,7 @@ > __asm __volatile( > "movl %%eax,%%ebx\n\t" > "movl %%edx,%%ecx\n\t" > + "lock \n\t" > "cmpxchg8b (%2)" > : "=a" (res_lo), "=d"(res_high) > : "SD" (p) > @@ -121,6 +123,7 @@ > "xorl %%ebx,%%ebx\n\t" > "xorl %%ecx,%%ecx\n\t" > "1:\n\t" > + "lock \n\t" > "cmpxchg8b (%0)\n\t" > "jnz 1b" > : > That looks very plausible. I’m somewhat out of my depth here, so I’d like the authors of the counter code to take a look at it. Best regards, Kristof From owner-freebsd-pf@freebsd.org Tue Nov 13 22:25:45 2018 Return-Path: Delivered-To: freebsd-pf@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 144F9113485A for ; Tue, 13 Nov 2018 22:25:45 +0000 (UTC) (envelope-from glebius@freebsd.org) Received: from cell.glebi.us (glebi.us [198.45.61.253]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "cell.glebi.us", Issuer "cell.glebi.us" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6B09A7B755 for ; Tue, 13 Nov 2018 22:25:44 +0000 (UTC) (envelope-from glebius@freebsd.org) Received: from cell.glebi.us (localhost [127.0.0.1]) by cell.glebi.us (8.15.2/8.15.2) with ESMTPS id wADMPanK046049 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Tue, 13 Nov 2018 14:25:36 -0800 (PST) (envelope-from glebius@freebsd.org) Received: (from glebius@localhost) by cell.glebi.us (8.15.2/8.15.2/Submit) id wADMPXmb046047; Tue, 13 Nov 2018 14:25:33 -0800 (PST) (envelope-from glebius@freebsd.org) X-Authentication-Warning: cell.glebi.us: glebius set sender to glebius@freebsd.org using -f Date: Tue, 13 Nov 2018 14:25:33 -0800 From: Gleb Smirnoff To: Andreas Longwitz Cc: Kristof Provost , freebsd-pf@freebsd.org Subject: Re: rdr pass for proto tcp sometimes creates states with expire time zero and so breaking connections Message-ID: <20181113222533.GJ9744@FreeBSD.org> References: <5BC51424.5000309@incore.de> <5BD45882.1000207@incore.de> <5BEB3B9A.9080402@incore.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5BEB3B9A.9080402@incore.de> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: 6B09A7B755 X-Spamd-Result: default: False [-103.10 / 200.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; ALLOW_DOMAIN_WHITELIST(-100.00)[freebsd.org]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_DN_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; HAS_XAW(0.00)[]; DMARC_NA(0.00)[freebsd.org]; R_SPF_SOFTFAIL(0.00)[~all]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: mx66.freebsd.org]; NEURAL_HAM_SHORT(-0.97)[-0.972,0]; IP_SCORE(-0.02)[country: US(-0.09)]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:2906, ipnet:198.45.48.0/20, country:US]; MID_RHS_MATCH_FROM(0.00)[] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Nov 2018 22:25:45 -0000 On Tue, Nov 13, 2018 at 10:01:14PM +0100, Andreas Longwitz wrote: A> OK, in the meantime I did some more research and I am now quite sure the A> problem with the bogus pf_default_rule->states_cur counter is not a A> problem in pf. I am convinced it is a problem in counter(9) on i386 A> server. The critical code is the machine instruction cmpxchg8b used in A> /sys/i386/include/counter.h. A> A> From intel instruction set reference manual: A> Zhis instruction can be used with a LOCK prefix allow the instruction to A> be executed atomically. A> A> We have two other sources in kernel using cmpxchg8b: A> /sys/i386/include/atomic.h and A> /sys/cddl/contrib/opensolaris/common/atomic/i386/opensolaris_atomic.S A single CPU instruction is atomic by definition, with regards to the CPU. A preemption can not happen in a middle of instruction. What the "lock" prefix does is memory locking to avoid unlocked parallel access to the same address by different CPUs. What is special about counter(9) is that %fs:%esi always points to a per-CPU address, because %fs is unique for every CPU and is constant, so no other CPU may write to this address, so lock prefix isn't needed. Of course a true SMP i386 isn't a well tested arch, so I won't assert that counter(9) doesn't have bugs on this arch. However, I don't see lock prefix necessary here. -- Gleb Smirnoff