From owner-freebsd-net@freebsd.org Sun Sep 20 17:10:28 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 99078A05B8C for ; Sun, 20 Sep 2015 17:10:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 86E231180 for ; Sun, 20 Sep 2015 17:10:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8KHASEw073287 for ; Sun, 20 Sep 2015 17:10:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Sun, 20 Sep 2015 17:10:28 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Sep 2015 17:10:28 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-net@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Sun Sep 20 21:00:23 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9280AA055A0 for ; Sun, 20 Sep 2015 21:00:23 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6E11012D3 for ; Sun, 20 Sep 2015 21:00:23 +0000 (UTC) (envelope-from bugzilla-noreply@FreeBSD.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8KL0NFX049018 for ; Sun, 20 Sep 2015 21:00:23 GMT (envelope-from bugzilla-noreply@FreeBSD.org) Message-Id: <201509202100.t8KL0NFX049018@kenobi.freebsd.org> From: bugzilla-noreply@FreeBSD.org To: freebsd-net@FreeBSD.org Subject: Problem reports for freebsd-net@FreeBSD.org that need special attention X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 Date: Sun, 20 Sep 2015 21:00:23 +0000 Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Sep 2015 21:00:23 -0000 To view an individual PR, use: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id). The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases. Status | Bug Id | Description ------------+-----------+--------------------------------------------------- Open | 194515 | Fatal Trap 12 Kernel with vimage Open | 199136 | [if_tap] Added down_on_close sysctl variable to t 2 problems total for which you should take action. From owner-freebsd-net@freebsd.org Mon Sep 21 08:21:30 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CA999CFD1C for ; Mon, 21 Sep 2015 08:21:30 +0000 (UTC) (envelope-from julien@jch.io) Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EAFC513B9 for ; Mon, 21 Sep 2015 08:21:29 +0000 (UTC) (envelope-from julien@jch.io) Received: by wicge5 with SMTP id ge5so104510705wic.0 for ; Mon, 21 Sep 2015 01:21:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-type; bh=yzKdGm2BB0aFxStvx27M+wji/t+GxfSrzOPybM16PW0=; b=KQsBsAmvoKqEeTU/IUQ7LpKiq4DmGztx7gAPg1WxThNgxrN1sJkgb3G1PNeY7au+jp gWu7BtUU58XxDJJraf+14tU+hpwETwXcvkcp4NHTAvJVSDqGiLw0w2CFFICSBgc442i+ 0C95Cq2XQvKdtHvZvTRD+xDjvkQv4BNp8fDNUlW8xeoPWF5VQ3fZxTclWquCa1nY6M5Y YKen5KzzwSW3XQrI6q7JNkzLIkgB6IOz+50T9kUwaQtnhVGLseMxHFcxxsdN7HeD+zMT VyQfaLZBhqk+2uhwyLnLX/5UQ5a+yACrywh8unPWml9doVtGuxJgaKRoo341V4nE5n36 A5Qg== X-Gm-Message-State: ALoCoQnJ8zZrszHUzwp2ega0WayGionm6+/ol7fX4jrpFMSyS3dsgWRiuIQZj9phjiuT0Rpf3NvI X-Received: by 10.180.86.232 with SMTP id s8mr12430588wiz.27.1442823687824; Mon, 21 Sep 2015 01:21:27 -0700 (PDT) Received: from res1momishra-l1.vcorp.ad.vrsn.com ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id p4sm12162122wia.15.2015.09.21.01.21.26 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 01:21:27 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Konstantin Belousov References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> Cc: freebsd-net@freebsd.org, Palle Girgensohn From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <55FFBE01.6060706@freebsd.org> Date: Mon, 21 Sep 2015 10:21:21 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20150918160605.GN67105@kib.kiev.ua> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="0ehaFj5lButCRW2rORLCVxB4rePkEoGV8" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 08:21:30 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --0ehaFj5lButCRW2rORLCVxB4rePkEoGV8 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Konstantin, Hi Palle, On 18/09/15 18:06, Konstantin Belousov wrote: > On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote: >> Hi Palle, >> >> On 18/09/15 11:12, Palle Girgensohn wrote: >>> We see daily panics on our production systems (web server, apache >>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using netgraph= >>> interfaces [not epair]). >>> >>> The problem started after the summer. Normal port upgrades seems to >>> be the only difference. The problem occurs with 10.2-p2 kernel as >>> well as 10.1-p4 and 10.1-p15. >>> >>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>> >>> Any ideas? >> >> Thanks for you detailed report. I am not aware of any tcp_twclose() >> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean ther= e >> are none). Few interesting facts (at least for me): >> >> - Your crash happens when unlocking a inp exclusive lock with INP_WUN= LOCK() >> >> - Something is already wrong before calling turnstile_broadcast() as = it >> is called with ts =3D NULL: > In the kernel without witness this is a 99%-sure indication of attempt = to > unlock not owned lock. Thanks, this is useful. So far I did not find any path where tcp_twclose() can call INP_WUNLOCK without having the exclusive lock held, that makes this issue interesting. >> I won't go to far here as I am not expert enough in VIMAGE, but one >> question anyway: >> >> - Can you correlate this kernel panic to a particular event? Like fo= r >> example a VIMAGE/VNET jail destruction. >> >> I will test that on my side on a 10.2 machine. I did not find any issues while testing 10.2 + VIMAGE on my side. Thus Palle what I would suggest: - First, test with stable/10 to see if by chance this issue has already been fixed in stable branch. - Second, if issue is still in stable/10, compile 10.2 kernel with these options: options DDB options DEADLKRES options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN To see where the original fault is coming from. Thanks. -- Julien --0ehaFj5lButCRW2rORLCVxB4rePkEoGV8 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJV/74JAAoJEKVlQ5Je6dhxkr4IAK+4UoTQ8JrDCbfESMDgExGB MLB/2yRhBvSh+5Wl6csKDrVhlt517/2fyJ1Qq9c7VACD88dYK0qiKuV/0lyHrcn+ i9KtnvryFNDvwfOpnyzoCuxneGhoL60mIk9vsTWFzWDACbc1qM+7H5nI7WYBlvcv qTgilD45m6XVbflA23RGTrycUSE3dvG0dkpE+9Eclz29aPwDjfBBcdv5mmzbPYET cBeudX+FHxTEMlfy1HiZo88P3XxHI9el1hM66gwEszWXN+duLaBK8K+WQPJMCCxv nCO5r+YpstK72zXPUAE6WUieoZR1rmVqRfFceFHCTKhdxJRBWDYfgP8gve7DdyM= =e0uJ -----END PGP SIGNATURE----- --0ehaFj5lButCRW2rORLCVxB4rePkEoGV8-- From owner-freebsd-net@freebsd.org Mon Sep 21 08:28:56 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B9F25A052BE for ; Mon, 21 Sep 2015 08:28:56 +0000 (UTC) (envelope-from julien@jch.io) Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 73BF218B0 for ; Mon, 21 Sep 2015 08:28:56 +0000 (UTC) (envelope-from julien@jch.io) Received: by wiclk2 with SMTP id lk2so134869459wic.0 for ; Mon, 21 Sep 2015 01:28:49 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:subject:to:references:cc:from:message-id:date :user-agent:mime-version:in-reply-to:content-type; bh=seQgeVL3VMoMKmYpURnkFaJnqjT6Ywc1uMGP8ASJJJU=; b=I9aPnGi7zC0oix52+E4O3Glq7TsybK0Uh7+toQWv0JMn1UquCHarcvsfXTHvw1PNBX kQsFfevO8/d5v2ZFttsj4Gpx10USagzJyRlsSgqPDQplTDeqTzwkmJBEZxDBuCigFoDo VEdSeCQwlauSRhx+JHcuAeSz4yFkoOwGolRAUi5Ja//rMm6+oQCOmt6YXG2jsAKHlpHB TYqddcweh2jndi/ehtiNfUHVokjxZvnU++oMinOxm3TuWPmf4TtuF2veUyfQtHKvcuxI MhqyVr2iWF3gO6KX21uIjLNJ3yNmCzDFi++tEPVl2Oi80Jv56WjugRZ4WFYfIiEkZWKC 0kyQ== X-Gm-Message-State: ALoCoQltiWviK7jPPBSGQfFz34kq48c73yipajiTQdYEs29t0XszVfqjJmzyGjbklg+Q33XLpnY4 X-Received: by 10.194.23.167 with SMTP id n7mr20833361wjf.112.1442824129029; Mon, 21 Sep 2015 01:28:49 -0700 (PDT) Received: from res1momishra-l1.vcorp.ad.vrsn.com ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id kb5sm22864757wjc.17.2015.09.21.01.28.47 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 01:28:48 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn , Konstantin Belousov , Adrian Chadd References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net> Cc: Palle Girgensohn , "freebsd-net@freebsd.org" From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <55FFBFBC.30905@freebsd.org> Date: Mon, 21 Sep 2015 10:28:44 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="bVkmOmAognLcpdvLOM6QEA6rrTAiq6GbA" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 08:28:56 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --bVkmOmAognLcpdvLOM6QEA6rrTAiq6GbA Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, On 18/09/15 22:42, Palle Girgensohn wrote: >> 18 sep 2015 kl. 18:06 skrev Konstantin Belousov >> : >>=20 >>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:=20 >>> Hi Palle, >>>=20 >>>> On 18/09/15 11:12, Palle Girgensohn wrote: We see daily panics >>>> on our production systems (web server, apache running MPM >>>> event, openjdk8. Kernel with VIMAGE. Jails using netgraph=20 >>>> interfaces [not epair]). >>>>=20 >>>> The problem started after the summer. Normal port upgrades >>>> seems to be the only difference. The problem occurs with >>>> 10.2-p2 kernel as well as 10.1-p4 and 10.1-p15. >>>>=20 >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>>>=20 >>>> Any ideas? >>>=20 >>> Thanks for you detailed report. I am not aware of any >>> tcp_twclose() related issues (without VIMAGE) since FreeBSD 10.0 >>> (does not mean there are none). Few interesting facts (at least >>> for me): >>>=20 >>> - Your crash happens when unlocking a inp exclusive lock with >>> INP_WUNLOCK() >>>=20 >>> - Something is already wrong before calling turnstile_broadcast() >>> as it is called with ts =3D NULL: >> In the kernel without witness this is a 99%-sure indication of >> attempt to unlock not owned lock. >>=20 >>>=20 >>> turnstile_broadcast (ts=3D0x0, queue=3D1) at=20 >>> /usr/src/sys/kern/subr_turnstile.c:838 __rw_wunlock_hard () at >>> /usr/src/sys/kern/kern_rwlock.c:988 tcp_twclose () at >>> /usr/src/sys/netinet/tcp_timewait.c:540 tcp_tw_2msl_scan () at >>> /usr/src/sys/netinet/tcp_timewait.c:748 tcp_slowtimo () at >>> /usr/src/sys/netinet/tcp_timer.c:198 >>>=20 >>> I won't go to far here as I am not expert enough in VIMAGE, but >>> one question anyway: >>>=20 >>> - Can you correlate this kernel panic to a particular event? >>> Like for example a VIMAGE/VNET jail destruction. >>>=20 >>> I will test that on my side on a 10.2 machine. >=20 > I just got a response from adrian@ where he seems to remember that it > has all been fixed in head. >=20 > I would really prefer not to run a head kernel in production unless I > have to, so the question is if it is possible to pin down the > specific fixes for this problem? Any suggestions? >=20 > Thanks for all the help so far! On my side, all issues we have found in TCP stack are currently both fixed in 10.2 and HEAD. The remaining differences are only performance improvements that are solely in HEAD. adrian@ might have more details on fixes he has in mind. -- Julien --bVkmOmAognLcpdvLOM6QEA6rrTAiq6GbA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJV/7/CAAoJEKVlQ5Je6dhxNb0H/i+UyIVET1W7Qcv+zmCj3G5j WGkXE6VltZy52Hb8dl/vbkUuWaeu6jxxiQDY9uAp73twxYChrIWLIvRQ0yhOxZHo lNuwNaK//ahKwMVn2Q7ALJaMli7j318DoVAeS0XgPLa9m9xN9/mGURZJIeF/vNM8 s85GPa6rbnTNJMsOlZfCIlh384jtzIL31XlQTnBV+hRUypuSohjJTfZsGa8ISxmJ XDxmxxwe8yPFJ0ch5PpSqITth1SEs61L/UkY/TIxNMF1zpdeT+9xSXn7YMAeObzA WaJXDFbgWibhsg+wo96eUrFc6vkfU2Y68xczHuTUy+22gmcLSMNVLYVfFsFFXV8= =JGZA -----END PGP SIGNATURE----- --bVkmOmAognLcpdvLOM6QEA6rrTAiq6GbA-- From owner-freebsd-net@freebsd.org Mon Sep 21 08:55:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8F4BA0619A for ; Mon, 21 Sep 2015 08:55:46 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 43DF21484; Mon, 21 Sep 2015 08:55:45 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from [10.0.1.25] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 5C226C8BC; Mon, 21 Sep 2015 10:55:38 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <55FFBFBC.30905@freebsd.org> Date: Mon, 21 Sep 2015 10:55:37 +0200 Cc: Konstantin Belousov , Adrian Chadd , "freebsd-net@freebsd.org" Content-Transfer-Encoding: quoted-printable Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <9A234106-62EC-49C9-954A-2DA8315E9B4A@pingpong.net> <55FFBFBC.30905@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 08:55:46 -0000 > 21 sep 2015 kl. 10:28 skrev Julien Charbon : >=20 >=20 > Hi Palle, >=20 > On 18/09/15 22:42, Palle Girgensohn wrote: >>> 18 sep 2015 kl. 18:06 skrev Konstantin Belousov >>> : >>>=20 >>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote:=20 >>>> Hi Palle, >>>>=20 >>>>> On 18/09/15 11:12, Palle Girgensohn wrote: We see daily panics >>>>> on our production systems (web server, apache running MPM >>>>> event, openjdk8. Kernel with VIMAGE. Jails using netgraph=20 >>>>> interfaces [not epair]). >>>>>=20 >>>>> The problem started after the summer. Normal port upgrades >>>>> seems to be the only difference. The problem occurs with >>>>> 10.2-p2 kernel as well as 10.1-p4 and 10.1-p15. >>>>>=20 >>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>>>>=20 >>>>> Any ideas? >>>>=20 >>>> Thanks for you detailed report. I am not aware of any >>>> tcp_twclose() related issues (without VIMAGE) since FreeBSD 10.0 >>>> (does not mean there are none). Few interesting facts (at least >>>> for me): >>>>=20 >>>> - Your crash happens when unlocking a inp exclusive lock with >>>> INP_WUNLOCK() >>>>=20 >>>> - Something is already wrong before calling turnstile_broadcast() >>>> as it is called with ts =3D NULL: >>> In the kernel without witness this is a 99%-sure indication of >>> attempt to unlock not owned lock. >>>=20 >>>>=20 >>>> turnstile_broadcast (ts=3D0x0, queue=3D1) at=20 >>>> /usr/src/sys/kern/subr_turnstile.c:838 __rw_wunlock_hard () at >>>> /usr/src/sys/kern/kern_rwlock.c:988 tcp_twclose () at >>>> /usr/src/sys/netinet/tcp_timewait.c:540 tcp_tw_2msl_scan () at >>>> /usr/src/sys/netinet/tcp_timewait.c:748 tcp_slowtimo () at >>>> /usr/src/sys/netinet/tcp_timer.c:198 >>>>=20 >>>> I won't go to far here as I am not expert enough in VIMAGE, but >>>> one question anyway: >>>>=20 >>>> - Can you correlate this kernel panic to a particular event? >>>> Like for example a VIMAGE/VNET jail destruction. >>>>=20 >>>> I will test that on my side on a 10.2 machine. >>=20 >> I just got a response from adrian@ where he seems to remember that it >> has all been fixed in head. >>=20 >> I would really prefer not to run a head kernel in production unless I >> have to, so the question is if it is possible to pin down the >> specific fixes for this problem? Any suggestions? >>=20 >> Thanks for all the help so far! >=20 > On my side, all issues we have found in TCP stack are currently both > fixed in 10.2 and HEAD. The remaining differences are only = performance > improvements that are solely in HEAD. adrian@ might have more details > on fixes he has in mind. Hi, 10.2 gives us the same sort of crash as 10.1. Vi are now testing releng/10.1 with these two patches merged: https://svnweb.freebsd.org/changeset/base/287261 https://svnweb.freebsd.org/changeset/base/287780 We have yet to see a crash, so it is looking vaguelly promising, but we = have to wait and see. Palle PS. I've failed to mention that except VIMAGE +jails, the jail host is = an NFS client as well. They NFS shares are mounted from the jail host, = not the jails (since that is not possible anyway). DS. From owner-freebsd-net@freebsd.org Mon Sep 21 10:38:00 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 920AAA0571D for ; Mon, 21 Sep 2015 10:38:00 +0000 (UTC) (envelope-from a.shcheglov@sushishop.ru) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 6F3AE1B36 for ; Mon, 21 Sep 2015 10:38:00 +0000 (UTC) (envelope-from a.shcheglov@sushishop.ru) Received: by mailman.ysv.freebsd.org (Postfix) id 6C86AA0571B; Mon, 21 Sep 2015 10:38:00 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 520D3A0571A for ; Mon, 21 Sep 2015 10:38:00 +0000 (UTC) (envelope-from a.shcheglov@sushishop.ru) Received: from mx.sushishop.ru (mx.sushishop.ru [46.228.6.83]) by mx1.freebsd.org (Postfix) with ESMTP id AA8651B33 for ; Mon, 21 Sep 2015 10:37:58 +0000 (UTC) (envelope-from a.shcheglov@sushishop.ru) Received: from localhost (localhost [127.0.0.1]) by mx.sushishop.ru (Postfix) with ESMTP id 2287D41090D2 for ; Mon, 21 Sep 2015 13:27:54 +0300 (MSK) Received: from mx.sushishop.ru ([127.0.0.1]) by localhost (mx.sushishop.ru [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id lxoy6OGKya63 for ; Mon, 21 Sep 2015 13:27:53 +0300 (MSK) Received: from localhost (localhost [127.0.0.1]) by mx.sushishop.ru (Postfix) with ESMTP id BA3DF4107338 for ; Mon, 21 Sep 2015 13:27:53 +0300 (MSK) X-Virus-Scanned: amavisd-new at mx.sushishop.ru Received: from mx.sushishop.ru ([127.0.0.1]) by localhost (mx.sushishop.ru [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id OlIn3tpBcAH5 for ; Mon, 21 Sep 2015 13:27:53 +0300 (MSK) Received: from [192.168.157.157] (unknown [192.168.157.157]) by mx.sushishop.ru (Postfix) with ESMTPSA id A07564107166 for ; Mon, 21 Sep 2015 13:27:53 +0300 (MSK) From: =?utf-8?q?=d0=90=d0=bd=d0=b4=d1=80=d0=b5=d0=b9=20=d0=a9=d0=b5=d0=b3=d0=bb=d0=be=d0=b2?= To: net@FreeBSD.org Subject: Hi, can u help me with Atheros card? Date: Mon, 21 Sep 2015 10:26:58 +0000 Message-Id: Reply-To: =?utf-8?q?=d0=90=d0=bd=d0=b4=d1=80=d0=b5=d0=b9=20=d0=a9=d0=b5=d0=b3=d0=bb=d0=be=d0=b2?= User-Agent: eM_Client/6.0.22344.0 Mime-Version: 1.0 Content-Type: text/plain; format=flowed; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 10:38:00 -0000 Hi. How can i activate network card Atheros AR8151 ? [root@ASTERSUSHI /usr/home/nord]# uname -a FreeBSD ASTERSUSHI 9.1-RELEASE FreeBSD 9.1-RELEASE #0: Mon Aug 17=20 02:11:08 MSK 2015 root@ASTERSUSHI:/usr/obj/usr/src/sys/KERNEL amd64 [root@ASTERSUSHI /usr/home/nord]# pciconf -l -v | grep -B3 network em0@pci0:2:0:0: class=3D0x020000 card=3D0xa01f8086 chip=3D0x10d38086 rev= =3D0x00=20 hdr=3D0x00 vendor =3D 'Intel Corporation' device =3D '82574L Gigabit Network Connection' class =3D network -- em1@pci0:3:0:0: class=3D0x020000 card=3D0xa01f8086 chip=3D0x10d38086 rev= =3D0x00=20 hdr=3D0x00 vendor =3D 'Intel Corporation' device =3D '82574L Gigabit Network Connection' class =3D network -- subclass =3D PCI-PCI none2@pci0:7:0:0: class=3D0x020000 card=3D0xe0001458 chip=3D0x1091196= 9=20 rev=3D0x10 hdr=3D0x00 vendor =3D 'Atheros Communications' class =3D network --- =D0=90=D0=BD=D0=B4=D1=80=D0=B5=D0=B9 =D0=A9=D0=B5=D0=B3=D0=BB=D0=BE=D0=B2 =D0=A1=D1=82=D0=B0=D1=80=D1=88=D0=B8=D0=B9 =D1=81=D0=B8=D1=81=D1=82=D0=B5= =D0=BC=D0=BD=D1=8B=D0=B9 =D0=B0=D0=B4=D0=BC=D0=B8=D0=BD=D0=B8=D1=81=D1=82= =D1=80=D0=B0=D1=82=D0=BE=D1=80 =D1=82=D0=B5=D0=BB: 8 (812) 318-51-44 =D0=B4=D0=BE=D0=B1. 2241 =D0=BC=D0=BE=D0=B1: 8 (921) 956-1337 e-mail: a.shcheglov@sushishop.ru www.sushishop.ru From owner-freebsd-net@freebsd.org Mon Sep 21 10:44:38 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7EFFDA05AA7 for ; Mon, 21 Sep 2015 10:44:38 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 5D1F31F0A for ; Mon, 21 Sep 2015 10:44:38 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: by mailman.ysv.freebsd.org (Postfix) id 5A792A05AA6; Mon, 21 Sep 2015 10:44:38 +0000 (UTC) Delivered-To: net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5A08CA05AA5 for ; Mon, 21 Sep 2015 10:44:38 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: from mail-pa0-x233.google.com (mail-pa0-x233.google.com [IPv6:2607:f8b0:400e:c03::233]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2AEFD1F09 for ; Mon, 21 Sep 2015 10:44:38 +0000 (UTC) (envelope-from pyunyh@gmail.com) Received: by padhy16 with SMTP id hy16so113516066pad.1 for ; Mon, 21 Sep 2015 03:44:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:date:to:cc:subject:message-id:reply-to:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=BtHMq1tlWSHK4zkTY8V98v3TgcAEwjL+2hlsoeESwJY=; b=TsZpA3eaJOvHObp7l5sIFwesKUeA+bxrNmJvX1sM12gcRUCdP7LoSAEVtArGq5AQVP pdsFfopvXi21D/knr927lhsjU+fzBgdbrda7EBLjNi4esYcoO9jNPik1zA1Ogk3mh8kC EmQ3l87tKZbHmpYVdxZ0eTsYSfc8ZnGVvUhpnKSqyDqQlaAnNueIakf1f7OU9H20JLT/ NrIWuQf7p8g2xHVHFlQF7FPgOGy0QUWGOJYcQC5OtEZdcHG+93bUAf4v4NgErAdXyAnG AqBpQIUbUWYhvlkU4OW5W8ZtUC6icd3Ju3uYdgiluVBQlcjDUgC01N53X2djbL3y/Tyt bNlg== X-Received: by 10.68.176.227 with SMTP id cl3mr24327789pbc.8.1442832277560; Mon, 21 Sep 2015 03:44:37 -0700 (PDT) Received: from localhost ([106.247.248.2]) by smtp.gmail.com with ESMTPSA id qa5sm9758643pbc.70.2015.09.21.03.44.34 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Sep 2015 03:44:36 -0700 (PDT) From: YongHyeon PYUN X-Google-Original-From: "YongHyeon PYUN" Received: by localhost (sSMTP sendmail emulation); Mon, 21 Sep 2015 19:44:29 +0900 Date: Mon, 21 Sep 2015 19:44:29 +0900 To: =?EUC-KR?B?0JDQvdC00YDQtdC5INCp0LXQs9C70L7Qsg==?= Cc: net@FreeBSD.org Subject: Re: Hi, can u help me with Atheros card? Message-ID: <20150921104429.GA1184@michelle.fasterthan.co.kr> Reply-To: pyunyh@gmail.com References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 10:44:38 -0000 On Mon, Sep 21, 2015 at 10:26:58AM +0000, ???????????? ???????????? wrote: > Hi. How can i activate network card Atheros AR8151 ? > > > [root@ASTERSUSHI /usr/home/nord]# uname -a > FreeBSD ASTERSUSHI 9.1-RELEASE FreeBSD 9.1-RELEASE #0: Mon Aug 17 > 02:11:08 MSK 2015 root@ASTERSUSHI:/usr/obj/usr/src/sys/KERNEL amd64 > > [root@ASTERSUSHI /usr/home/nord]# pciconf -l -v | grep -B3 network > em0@pci0:2:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 > hdr=0x00 > vendor = 'Intel Corporation' > device = '82574L Gigabit Network Connection' > class = network > -- > em1@pci0:3:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 > hdr=0x00 > vendor = 'Intel Corporation' > device = '82574L Gigabit Network Connection' > class = network > -- > subclass = PCI-PCI > none2@pci0:7:0:0: class=0x020000 card=0xe0001458 chip=0x10911969 > rev=0x10 hdr=0x00 > vendor = 'Atheros Communications' > class = network > It seems it looks like a AR8161 controller. I believe it should be supported by alc(4) on 10.2-RELEASE. Alternatively you would get support by updating to latest stable/9. From owner-freebsd-net@freebsd.org Mon Sep 21 13:53:43 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1FA4A068FB for ; Mon, 21 Sep 2015 13:53:43 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 57A091F6C; Mon, 21 Sep 2015 13:53:42 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.0.143] (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 4ED7DC6C5; Mon, 21 Sep 2015 15:53:42 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <55FFBE01.6060706@freebsd.org> Date: Mon, 21 Sep 2015 15:53:41 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: quoted-printable Message-Id: <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Sep 2015 13:53:43 -0000 > 21 sep 2015 kl. 10:21 skrev Julien Charbon : >=20 >=20 > Hi Konstantin, Hi Palle, >=20 > On 18/09/15 18:06, Konstantin Belousov wrote: >> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote: >>> Hi Palle, >>>=20 >>> On 18/09/15 11:12, Palle Girgensohn wrote: >>>> We see daily panics on our production systems (web server, apache >>>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using = netgraph >>>> interfaces [not epair]). >>>>=20 >>>> The problem started after the summer. Normal port upgrades seems to >>>> be the only difference. The problem occurs with 10.2-p2 kernel as >>>> well as 10.1-p4 and 10.1-p15. >>>>=20 >>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>>>=20 >>>> Any ideas? >>>=20 >>> Thanks for you detailed report. I am not aware of any tcp_twclose() >>> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean = there >>> are none). Few interesting facts (at least for me): >>>=20 >>> - Your crash happens when unlocking a inp exclusive lock with = INP_WUNLOCK() >>>=20 >>> - Something is already wrong before calling turnstile_broadcast() as = it >>> is called with ts =3D NULL: >> In the kernel without witness this is a 99%-sure indication of = attempt to >> unlock not owned lock. >=20 > Thanks, this is useful. So far I did not find any path where > tcp_twclose() can call INP_WUNLOCK without having the exclusive lock > held, that makes this issue interesting. >=20 >>> I won't go to far here as I am not expert enough in VIMAGE, but one >>> question anyway: >>>=20 >>> - Can you correlate this kernel panic to a particular event? Like = for >>> example a VIMAGE/VNET jail destruction. >>>=20 >>> I will test that on my side on a 10.2 machine. >=20 > I did not find any issues while testing 10.2 + VIMAGE on my side. Thus > Palle what I would suggest: >=20 > - First, test with stable/10 to see if by chance this issue has = already > been fixed in stable branch. >=20 > - Second, if issue is still in stable/10, compile 10.2 kernel with > these options: >=20 > options DDB > options DEADLKRES > options INVARIANTS > options INVARIANT_SUPPORT > options WITNESS > options WITNESS_SKIPSPIN >=20 > To see where the original fault is coming from. Hi, We just had two crashes within 15 minutes using 10.2 with these two = added: https://svnweb.freebsd.org/changeset/base/287261 https://svnweb.freebsd.org/changeset/base/287780=20 We don't always get a core dump, but the second time, we did. very similar stack trace, but not identical: (kgdb) #0 doadump (textdump=3D) at pcpu.h:219 #1 0xffffffff80949a82 in kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff80949e65 in vpanic (fmt=3D, ap=3D) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff80949cf3 in panic (fmt=3D0x0) at /usr/src/sys/kern/kern_shutdown.c:687 #4 0xffffffff80d5d0bb in trap_fatal (frame=3D, eva=3D) at /usr/src/sys/amd64/amd64/trap.c:851 #5 0xffffffff80d5d3bd in trap_pfault (frame=3D0xfffffe1760bc1840, usermode=3D) at = /usr/src/sys/amd64/amd64/trap.c:674 #6 0xffffffff80d5ca5a in trap (frame=3D0xfffffe1760bc1840) at /usr/src/sys/amd64/amd64/trap.c:440 #7 0xffffffff80d42dd2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #8 0xffffffff8099861c in turnstile_broadcast (ts=3D0x0, queue=3D1) at /usr/src/sys/kern/subr_turnstile.c:838 #9 0xffffffff80948100 in __rw_wunlock_hard (c=3D0xfffff811c43487a0, = tid=3D1, file=3D0x1
, line=3D1) at /usr/src/sys/kern/kern_rwlock.c:988 #10 0xffffffff80b067c4 in tcp_twclose (tw=3D, reuse=3D) at = /usr/src/sys/netinet/tcp_timewait.c:540 #11 0xffffffff80b06e0b in tcp_tw_2msl_scan (reuse=3D0) at /usr/src/sys/netinet/tcp_timewait.c:748 #12 0xffffffff80b04b0e in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198 #13 0xffffffff809b7a04 in pfslowtimo (arg=3D0x0) at /usr/src/sys/kern/uipc_domain.c:508 #14 0xffffffff8095f91b in softclock_call_cc (c=3D0xffffffff81620bf0, cc=3D0xffffffff8169dc00, direct=3D0) at = /usr/src/sys/kern/kern_timeout.c:685 #15 0xffffffff8095fd44 in softclock (arg=3D0xffffffff8169dc00) at /usr/src/sys/kern/kern_timeout.c:814 #16 0xffffffff8091592b in intr_event_execute_handlers ( p=3D, ie=3D0xfffff801102e0d00) at /usr/src/sys/kern/kern_intr.c:1264 #17 0xffffffff80915d76 in ithread_loop (arg=3D0xfffff801102adee0) at /usr/src/sys/kern/kern_intr.c:1277 #18 0xffffffff8091347a in fork_exit ( callout=3D0xffffffff80915ce0 , arg=3D0xfffff801102adee0,= frame=3D0xfffffe1760bc1c00) at /usr/src/sys/kern/kern_fork.c:1018 #19 0xffffffff80d4330e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #20 0x0000000000000000 in ?? () I'll try stable/10 now. Would you suggest a "clean" stable/10, or could = 287621 and 287780 help? I'll add the debugging suggested options right away. Palle From owner-freebsd-net@freebsd.org Tue Sep 22 01:42:31 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4E0B1A05EE5 for ; Tue, 22 Sep 2015 01:42:31 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1B14F1EC0 for ; Tue, 22 Sep 2015 01:42:31 +0000 (UTC) (envelope-from yongmincho82@gmail.com) Received: by ioiz6 with SMTP id z6so2365517ioi.2 for ; Mon, 21 Sep 2015 18:42:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=83qnlPft7/XkiUmY845JfA8YC9sqOzCP84lVQsuW138=; b=EeQGvRkN9ho1ziMz91xSoN3qCJ2+Rgn7mmdq1pF3SDr0Fx3Z9vvH9bG3m+lndzRxZz OQaUZTpvmqrfyCi8r7E8J9Q8WHKf+A6no6MiUAA/zbzdx/l7SoQqO8f3MnSyhbK6p8dc k++qgnitOaH7Djsx7fmkTlA4EPXJBnXSS6jsURBs6I0NNpYJwcT5CFfdqOtWEJCl8MAP GV6joX3p6N/b89UJQh8Ax4r6zyrai8PtR9mMsUrlXzrO3Rd2lrHDWBIAk8NWBZJ/Se8e Sun9JlsQKoFOKiADB/jBL490wBcy16f5OpBoFjoyjP5ENxlDt4TWQheaPJ6qz5TFcueM 0BkA== MIME-Version: 1.0 X-Received: by 10.107.128.88 with SMTP id b85mr28543999iod.64.1442886150308; Mon, 21 Sep 2015 18:42:30 -0700 (PDT) Received: by 10.64.121.230 with HTTP; Mon, 21 Sep 2015 18:42:30 -0700 (PDT) Date: Tue, 22 Sep 2015 10:42:30 +0900 Message-ID: Subject: in case of resetting the t_dupacks in tcp_input.c From: =?UTF-8?B?7KGw7Jqp66+8?= To: freebsd-net@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 01:42:31 -0000 Hi, I have a question that reset a dupack count in tcp stack. My company's product was tested on freebsd 10. As far as I know The fast retransmission is triggered when the receiver is received 3 dup acks. Why is the t_dupack value reset, if we happen to get data or a window update along with a duplication ack? I checked openbsd and netbsd. The t_dupack is not reset on the netbsd, if it receive ack that get a window update(changed) along with a duplication ack. The t_dupack is reset on the openbsd, if it receive ack that get a window shrink along with a duplication ack. I don't know why the t_dupack is reset, if to get a window update. I think Just it is skipped(is not reset), if we receive the ack that is window update. like netbsd. could you explain about this? Thank you in advance for your answers! Best Regards, Yongmin From owner-freebsd-net@freebsd.org Tue Sep 22 06:43:13 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9B823A038C5 for ; Tue, 22 Sep 2015 06:43:13 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from mail.strugglingcoder.info (strugglingcoder.info [65.19.130.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.strugglingcoder.info", Issuer "mail.strugglingcoder.info" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 894371DBE for ; Tue, 22 Sep 2015 06:43:13 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from localhost (unknown [10.1.1.3]) (Authenticated sender: hiren@strugglingcoder.info) by mail.strugglingcoder.info (Postfix) with ESMTPA id 08A7FD133E; Mon, 21 Sep 2015 23:43:12 -0700 (PDT) Date: Mon, 21 Sep 2015 23:43:11 -0700 From: hiren panchasara To: ??? Cc: freebsd-net@freebsd.org Subject: Re: in case of resetting the t_dupacks in tcp_input.c Message-ID: <20150922064311.GA29193@strugglingcoder.info> References: MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="7JfCtLOvnd9MIVvH" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 06:43:13 -0000 --7JfCtLOvnd9MIVvH Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 09/22/15 at 10:42P, ??? wrote: > Hi, >=20 > I have a question that reset a dupack count in tcp stack. > My company's product was tested on freebsd 10. > As far as I know The fast retransmission is triggered when the receiver is > received 3 dup acks. > Why is the t_dupack value reset, if we happen to get data or a window > update along with a duplication ack? >=20 > I checked openbsd and netbsd. > The t_dupack is not reset on the netbsd, if it receive ack that > get a window update(changed) along with a duplication ack. > The t_dupack is reset on the openbsd, if it receive ack that > get a window shrink along with a duplication ack. > I don't know why the t_dupack is reset, if to get a window update. > I think Just it is skipped(is not reset), > if we receive the ack that is window update. like netbsd. >=20 > could you explain about this? Your assessment is correct. RFC 5681 doesn't seem to suggest that the 3 dupacks have to be consecutive but FreeBSD does. So, whenever we get a duplicate ack that doesn't follow the definition, we reset it to 0.=20 IMO, NetBSD implementation is correct in this regard. I and a bunch of other developers are looking into fixing this the right way. Cheers, Hiren --7JfCtLOvnd9MIVvH Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAABCgBmBQJWAPh8XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNEUyMEZBMUQ4Nzg4RjNGMTdFNjZGMDI4 QjkyNTBFMTU2M0VERkU1AAoJEIuSUOFWPt/lNXIH/35pkslDLjdX1HmrctJs2fEd SfGYgZEFm5vfkCK+OM9bqZDKPCBUj7gEA8aFKeZfBJ8sz80cxCdI+z+2Zn0/Cwdk q7v6+Y5wxPCV3QnXDtNtQLzM8WQxiq/7IBz7TKXdW26LKjCSK252M+gH7lGg9kn0 z0d+MmWhG61tkaWtf0holYJEG8P/WI3kREh4QGPtmhPJUVXV+sKBc9796yvZKV1+ 8x0nGYQZlTHOLyQlNqFseN8L0x4okaPqmgvmpM1+w4ZEhbofIzKcdpXUPxEesJKf J5nT0MWcXdvJWJ7ht307sAAnAQ9uo1Xi0xl1SBi/rZiwyO+8Xz7OW7jGL8R1Cmw= =oUej -----END PGP SIGNATURE----- --7JfCtLOvnd9MIVvH-- From owner-freebsd-net@freebsd.org Tue Sep 22 08:35:03 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8F33CA03D13 for ; Tue, 22 Sep 2015 08:35:03 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id ACF791491; Tue, 22 Sep 2015 08:35:02 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA01002; Tue, 22 Sep 2015 11:35:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZeJ2R-0009s2-Uo; Tue, 22 Sep 2015 11:34:59 +0300 To: freebsd-net , "George V. Neville-Neil" From: Andriy Gapon Subject: page fault in tcp_do_segment (r287759 suspected) X-Enigmail-Draft-Status: N1110 Message-ID: <56011276.4060206@FreeBSD.org> Date: Tue, 22 Sep 2015 11:33:58 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 08:35:03 -0000 I've just got the following panic on amd64 r288066. I never experienced that kind of panic with r286985. Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0x10 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff8073dd68 stack pointer = 0x28:0xfffffe02b4cb9640 frame pointer = 0x28:0xfffffe02b4cb9700 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq260: re0) trap number = 12 panic: page fault cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at 0xffffffff8041c9fb = db_trace_self_wrapper+0x2b/frame 0xfffffe02b4cb9110 kdb_backtrace() at 0xffffffff80668239 = kdb_backtrace+0x39/frame 0xfffffe02b4cb91c0 vpanic() at 0xffffffff806334d2 = vpanic+0x152/frame 0xfffffe02b4cb9200 panic() at 0xffffffff80633213 = panic+0x43/frame 0xfffffe02b4cb9260 trap_fatal() at 0xffffffff8081fdc3 = trap_fatal+0x343/frame 0xfffffe02b4cb92c0 trap_pfault() at 0xffffffff8081fff6 = trap_pfault+0x206/frame 0xfffffe02b4cb9350 trap() at 0xffffffff8081f70a = trap+0x4ca/frame 0xfffffe02b4cb9560 trap_check() at 0xffffffff8082011a = trap_check+0x2a/frame 0xfffffe02b4cb9580 calltrap() at 0xffffffff80807db3 = calltrap+0x8/frame 0xfffffe02b4cb9580 --- trap 0xc, rip = 0xffffffff8073dd68, rsp = 0xfffffe02b4cb9650, rbp = 0xfffffe02b4cb9700 --- tcp_do_segment() at 0xffffffff8073dd68 = tcp_do_segment+0xbc8/frame 0xfffffe02b4cb9700 tcp_input() at 0xffffffff8073c899 = tcp_input+0x999/frame 0xfffffe02b4cb9810 ip_input() at 0xffffffff80733cbe = ip_input+0xbe/frame 0xfffffe02b4cb9860 netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame 0xfffffe02b4cb98d0 netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame 0xfffffe02b4cb98e0 ether_demux() at 0xffffffff8070984b = ether_demux+0x13b/frame 0xfffffe02b4cb9910 ether_input_internal() at 0xffffffff8070a3ec = ether_input_internal+0x32c/frame 0xfffffe02b4cb9950 ether_nh_input() at 0xffffffff8070a093 = ether_nh_input+0x23/frame 0xfffffe02b4cb9960 netisr_dispatch_src() at 0xffffffff807121fe = netisr_dispatch_src+0x17e/frame 0xfffffe02b4cb99d0 netisr_dispatch() at 0xffffffff80712481 = netisr_dispatch+0x11/frame 0xfffffe02b4cb99e0 ether_input() at 0xffffffff80709b3c = ether_input+0x2c/frame 0xfffffe02b4cb9a00 re_rxeof() at 0xffffffff8049b858 = re_rxeof+0x228/frame 0xfffffe02b4cb9a60 re_intr_msi() at 0xffffffff8049d78f = re_intr_msi+0xbf/frame 0xfffffe02b4cb9aa0 intr_event_execute_handlers() at 0xffffffff805fec1f = intr_event_execute_handlers+0x12f/frame 0xfffffe02b4cb9b00 ithread_execute_handlers() at 0xffffffff805ff74c = ithread_execute_handlers+0x2c/frame 0xfffffe02b4cb9b20 ithread_loop() at 0xffffffff805ff5bb = ithread_loop+0x5b/frame 0xfffffe02b4cb9b80 fork_exit() at 0xffffffff805fc23b = fork_exit+0xdb/frame 0xfffffe02b4cb9bf0 fork_trampoline() at 0xffffffff808082ee = fork_trampoline+0xe/frame 0xfffffe02b4cb9bf0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- (kgdb) fr 9 #9 0xffffffff8073dd68 in tcp_do_segment (m=, th=0xfffff80052387022, so=0xfffff80207149000, tp=, drop_hdrlen=60, tlen=, iptos=, ti_locked=Cannot access memory at address 0x1 ) at /usr/src/sys/netinet/tcp_input.c:3103 3103 INP_INFO_RUNLOCK(&V_tcbinfo); (kgdb) list 3098 tcp_dropwithreset(m, th, NULL, tlen, rstreason); 3099 return; 3100 3101 drop: 3102 if (ti_locked == TI_RLOCKED) { 3103 INP_INFO_RUNLOCK(&V_tcbinfo); 3104 ti_locked = TI_UNLOCKED; 3105 } 3106 #ifdef INVARIANTS 3107 But judging from the disassembly the fault happens right after calling rw_runlock(): 0xffffffff8073dd50 : mov $0xffffffff80ff84d0,%rdi 0xffffffff8073dd57 : mov $0xffffffff809ba28f,%rsi 0xffffffff8073dd5e : mov $0xc1f,%edx 0xffffffff8073dd63 : callq 0xffffffff806309c0 <_rw_runlock_cookie> 0xffffffff8073dd68 : mov 0x10(%r12),%rdx 0xffffffff8073dd6d : mov %r15,%rdi 0xffffffff8073dd70 : mov %r14,%rsi That code actually looks like the following DTrace probe a few lines below: TCP_PROBE3(debug__input, tp, th, mtod(m, const char *)); So, it seems like 'm' could be NULL here. I see two places in tcp_do_segment() where m gets assigned with NULL followed by goto drop. If I had to guess then my guess would be that one of those code paths was taken. Since those NULL assignments were there for more than a year, then I would guess that the addition of the probe is to blame: https://svnweb.freebsd.org/base?view=revision&revision=287759 -- Andriy Gapon From owner-freebsd-net@freebsd.org Tue Sep 22 08:36:52 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 40DD2A03DFA for ; Tue, 22 Sep 2015 08:36:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2D7921580 for ; Tue, 22 Sep 2015 08:36:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8M8aqcM088148 for ; Tue, 22 Sep 2015 08:36:52 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Tue, 22 Sep 2015 08:36:52 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: avg@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 08:36:52 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #2 from Andriy Gapon --- It might be useful to go to frame 8 and run disassemble there. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Tue Sep 22 08:41:16 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EB797A05119 for ; Tue, 22 Sep 2015 08:41:16 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) Received: from relay2.tomsk.ru (mail.sibptus.tomsk.ru [212.73.124.5]) by mx1.freebsd.org (Postfix) with ESMTP id 5CEDF1967 for ; Tue, 22 Sep 2015 08:41:15 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) X-Virus-Scanned: by clamd daemon 0.98.5_1 for FreeBSD at relay2.tomsk.ru Received: from admin.sibptus.TOMSK.ru ([212.73.125.240] verified) by relay2.tomsk.ru (CommuniGate Pro SMTP 5.1.16) with ESMTPS id 38871826 for freebsd-net@freebsd.org; Tue, 22 Sep 2015 14:41:13 +0600 Received: from admin.sibptus.TOMSK.ru (sudakov@localhost [127.0.0.1]) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7) with ESMTP id t8M8fCLv090500 for ; Tue, 22 Sep 2015 14:41:13 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) Received: (from sudakov@localhost) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7/Submit) id t8M8fCK9090499 for freebsd-net@freebsd.org; Tue, 22 Sep 2015 14:41:12 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) X-Authentication-Warning: admin.sibptus.TOMSK.ru: sudakov set sender to vas@mpeks.tomsk.su using -f Date: Tue, 22 Sep 2015 14:41:12 +0600 From: Victor Sudakov To: freebsd-net@freebsd.org Subject: transport mode IPSec with Windows 7, static keys Message-ID: <20150922084111.GA89385@admin.sibptus.tomsk.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Organization: OAO "Svyaztransneft", SibPTUS X-PGP-Key: http://www.dreamwidth.org/pubkey?user=victor_sudakov X-PGP-Fingerprint: 10E3 1171 1273 E007 C2E9 3532 0DA4 F259 9B5E C634 User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 08:41:17 -0000 Dear Colleagues, Has anyone tried to set up transport mode IPSec with Windows 7 using static keys? I have trouble finding encryption and authentication algorithms mutually acceptable on FreeBSD and Windows 7. The latter can only do des or 3des for encryption and md5 or sha1 for authentication, and requires both ealgo and aalgo to be configured. If anyone has a success story, could you please show your manually added SAD entries or, better still, the relevant "setkey -c add ..." commands? Thank you very much in advance. -- Victor Sudakov, VAS4-RIPE, VAS47-RIPN sip:sudakov@sibptus.tomsk.ru From owner-freebsd-net@freebsd.org Tue Sep 22 15:10:09 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CBFF0A06307 for ; Tue, 22 Sep 2015 15:10:09 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) Received: from relay2.tomsk.ru (mail.sibptus.tomsk.ru [212.73.124.5]) by mx1.freebsd.org (Postfix) with ESMTP id 3C1891632 for ; Tue, 22 Sep 2015 15:10:08 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) X-Virus-Scanned: by clamd daemon 0.98.5_1 for FreeBSD at relay2.tomsk.ru Received: from admin.sibptus.TOMSK.ru ([212.73.125.240] verified) by relay2.tomsk.ru (CommuniGate Pro SMTP 5.1.16) with ESMTPS id 38872467; Tue, 22 Sep 2015 21:10:06 +0600 Received: from admin.sibptus.TOMSK.ru (sudakov@localhost [127.0.0.1]) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7) with ESMTP id t8MFA3j4098544; Tue, 22 Sep 2015 21:10:05 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) Received: (from sudakov@localhost) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7/Submit) id t8MFA3KX098543; Tue, 22 Sep 2015 21:10:03 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) X-Authentication-Warning: admin.sibptus.TOMSK.ru: sudakov set sender to vas@mpeks.tomsk.su using -f Date: Tue, 22 Sep 2015 21:10:03 +0600 From: Victor Sudakov To: Larry Baird , freebsd-net@freebsd.org Subject: Re: transport mode IPSec with Windows 7, static keys Message-ID: <20150922151003.GA98507@admin.sibptus.tomsk.ru> References: <115822.44131.97331@localhost> <20150922144246.61965.qmail@mailgate.gta.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150922144246.61965.qmail@mailgate.gta.com> Organization: OAO "Svyaztransneft", SibPTUS X-PGP-Key: http://www.dreamwidth.org/pubkey?user=victor_sudakov X-PGP-Fingerprint: 10E3 1171 1273 E007 C2E9 3532 0DA4 F259 9B5E C634 User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 15:10:09 -0000 Larry Baird wrote: > > > > Has anyone tried to set up transport mode IPSec with Windows 7 using > > static keys? > > > > I have trouble finding encryption and authentication algorithms > > mutually acceptable on FreeBSD and Windows 7. The latter can only do > > des or 3des for encryption and md5 or sha1 for authentication, and > > requires both ealgo and aalgo to be configured. > > > > If anyone has a success story, could you please show your manually > > added SAD entries or, better still, the relevant "setkey -c add ..." > > commands? > > > > Thank you very much in advance. > Try using strongswan and IKEv2. I don't have this running at the moment, > but I know I used link below to get it to work in the past. > https://wiki.strongswan.org/projects/strongswan/wiki/Windows7 I use IKE when I have to, but would like to use static keys with Windows specifically, or at least would like to definitely know if it is at all possible or not. -- Victor Sudakov, VAS4-RIPE, VAS47-RIPN sip:sudakov@sibptus.tomsk.ru From owner-freebsd-net@freebsd.org Tue Sep 22 16:47:04 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D838BA07407 for ; Tue, 22 Sep 2015 16:47:04 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 7E50E1AE8; Tue, 22 Sep 2015 16:47:04 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 80C8AD122; Tue, 22 Sep 2015 18:46:56 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id wUruCjEF0MUp; Tue, 22 Sep 2015 18:46:56 +0200 (CEST) Received: from [10.0.0.143] (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 34889D11F; Tue, 22 Sep 2015 18:46:56 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> Date: Tue, 22 Sep 2015 18:46:55 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: quoted-printable Message-Id: <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 16:47:04 -0000 Hi all, > 21 sep 2015 kl. 15:53 skrev Palle Girgensohn : >=20 >>=20 >> 21 sep 2015 kl. 10:21 skrev Julien Charbon : >>=20 >>=20 >> Hi Konstantin, Hi Palle, >>=20 >> On 18/09/15 18:06, Konstantin Belousov wrote: >>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote: >>>> Hi Palle, >>>>=20 >>>> On 18/09/15 11:12, Palle Girgensohn wrote: >>>>> We see daily panics on our production systems (web server, apache >>>>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using = netgraph >>>>> interfaces [not epair]). >>>>>=20 >>>>> The problem started after the summer. Normal port upgrades seems = to >>>>> be the only difference. The problem occurs with 10.2-p2 kernel as >>>>> well as 10.1-p4 and 10.1-p15. >>>>>=20 >>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>>>>=20 >>>>> Any ideas? >>>>=20 >>>> Thanks for you detailed report. I am not aware of any = tcp_twclose() >>>> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean = there >>>> are none). Few interesting facts (at least for me): >>>>=20 >>>> - Your crash happens when unlocking a inp exclusive lock with = INP_WUNLOCK() >>>>=20 >>>> - Something is already wrong before calling turnstile_broadcast() = as it >>>> is called with ts =3D NULL: >>> In the kernel without witness this is a 99%-sure indication of = attempt to >>> unlock not owned lock. >>=20 >> Thanks, this is useful. So far I did not find any path where >> tcp_twclose() can call INP_WUNLOCK without having the exclusive lock >> held, that makes this issue interesting. >>=20 >>>> I won't go to far here as I am not expert enough in VIMAGE, but one >>>> question anyway: >>>>=20 >>>> - Can you correlate this kernel panic to a particular event? Like = for >>>> example a VIMAGE/VNET jail destruction. >>>>=20 >>>> I will test that on my side on a 10.2 machine. >>=20 >> I did not find any issues while testing 10.2 + VIMAGE on my side. = Thus >> Palle what I would suggest: >>=20 >> - First, test with stable/10 to see if by chance this issue has = already >> been fixed in stable branch. >>=20 >> - Second, if issue is still in stable/10, compile 10.2 kernel with >> these options: >>=20 >> options DDB >> options DEADLKRES >> options INVARIANTS >> options INVARIANT_SUPPORT >> options WITNESS >> options WITNESS_SKIPSPIN >>=20 >> To see where the original fault is coming from. >=20 > Hi, >=20 > We just had two crashes within 15 minutes using 10.2 with these two = added: >=20 > https://svnweb.freebsd.org/changeset/base/287261 >=20 > https://svnweb.freebsd.org/changeset/base/287780=20 >=20 > We don't always get a core dump, but the second time, we did. >=20 > very similar stack trace, but not identical: >=20 > (kgdb) #0 doadump (textdump=3D) at pcpu.h:219 > #1 0xffffffff80949a82 in kern_reboot (howto=3D260) > at /usr/src/sys/kern/kern_shutdown.c:451 > #2 0xffffffff80949e65 in vpanic (fmt=3D, > ap=3D) at = /usr/src/sys/kern/kern_shutdown.c:758 > #3 0xffffffff80949cf3 in panic (fmt=3D0x0) > at /usr/src/sys/kern/kern_shutdown.c:687 > #4 0xffffffff80d5d0bb in trap_fatal (frame=3D, > eva=3D) at /usr/src/sys/amd64/amd64/trap.c:851 > #5 0xffffffff80d5d3bd in trap_pfault (frame=3D0xfffffe1760bc1840, > usermode=3D) at = /usr/src/sys/amd64/amd64/trap.c:674 > #6 0xffffffff80d5ca5a in trap (frame=3D0xfffffe1760bc1840) > at /usr/src/sys/amd64/amd64/trap.c:440 > #7 0xffffffff80d42dd2 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:236 > #8 0xffffffff8099861c in turnstile_broadcast (ts=3D0x0, queue=3D1) > at /usr/src/sys/kern/subr_turnstile.c:838 > #9 0xffffffff80948100 in __rw_wunlock_hard (c=3D0xfffff811c43487a0, = tid=3D1, > file=3D0x1
, line=3D1) > at /usr/src/sys/kern/kern_rwlock.c:988 > #10 0xffffffff80b067c4 in tcp_twclose (tw=3D, > reuse=3D) at = /usr/src/sys/netinet/tcp_timewait.c:540 > #11 0xffffffff80b06e0b in tcp_tw_2msl_scan (reuse=3D0) > at /usr/src/sys/netinet/tcp_timewait.c:748 > #12 0xffffffff80b04b0e in tcp_slowtimo () > at /usr/src/sys/netinet/tcp_timer.c:198 > #13 0xffffffff809b7a04 in pfslowtimo (arg=3D0x0) > at /usr/src/sys/kern/uipc_domain.c:508 > #14 0xffffffff8095f91b in softclock_call_cc (c=3D0xffffffff81620bf0, > cc=3D0xffffffff8169dc00, direct=3D0) at = /usr/src/sys/kern/kern_timeout.c:685 > #15 0xffffffff8095fd44 in softclock (arg=3D0xffffffff8169dc00) > at /usr/src/sys/kern/kern_timeout.c:814 > #16 0xffffffff8091592b in intr_event_execute_handlers ( > p=3D, ie=3D0xfffff801102e0d00) > at /usr/src/sys/kern/kern_intr.c:1264 > #17 0xffffffff80915d76 in ithread_loop (arg=3D0xfffff801102adee0) > at /usr/src/sys/kern/kern_intr.c:1277 > #18 0xffffffff8091347a in fork_exit ( > callout=3D0xffffffff80915ce0 , = arg=3D0xfffff801102adee0, > frame=3D0xfffffe1760bc1c00) at /usr/src/sys/kern/kern_fork.c:1018 > #19 0xffffffff80d4330e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:611 > #20 0x0000000000000000 in ?? () >=20 >=20 >=20 > I'll try stable/10 now. Would you suggest a "clean" stable/10, or = could 287621 and 287780 help? >=20 > I'll add the debugging suggested options right away. >=20 > Palle I have a new core dump from ^/stable/10 with: options DDB options DEADLKRES options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN What can I do with the core dump? "corrupt stack"... (kgdb) #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 in vpanic (fmt=3D, ap=3D) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in kassert_panic (fmt=3D) at /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in tcp_usr_detach (so=3D) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in soclose (so=3D) at /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at = file.h:343 #8 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, fd=3D, fp=3D0xfffff802a593db40, = td=3D0xfffff80eebc894a0, holdleaders=3D) at /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in amd64_syscall (td=3D0xfffff80eebc894a0, = traced=3D0) at subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) Thanks, Palle From owner-freebsd-net@freebsd.org Tue Sep 22 16:49:47 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9275A074D7 for ; Tue, 22 Sep 2015 16:49:46 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 4A42E1C02; Tue, 22 Sep 2015 16:49:45 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 4ED0AD13C; Tue, 22 Sep 2015 18:49:45 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id CSKo1GG-g3nl; Tue, 22 Sep 2015 18:49:44 +0200 (CEST) Received: from [10.0.0.143] (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id CEE1BD139; Tue, 22 Sep 2015 18:49:41 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> Date: Tue, 22 Sep 2015 18:49:41 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: quoted-printable Message-Id: <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 16:49:47 -0000 > 22 sep 2015 kl. 18:46 skrev Palle Girgensohn : >=20 > Hi all, >=20 >=20 >> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn : >>=20 >>>=20 >>> 21 sep 2015 kl. 10:21 skrev Julien Charbon : >>>=20 >>>=20 >>> Hi Konstantin, Hi Palle, >>>=20 >>> On 18/09/15 18:06, Konstantin Belousov wrote: >>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon wrote: >>>>> Hi Palle, >>>>>=20 >>>>> On 18/09/15 11:12, Palle Girgensohn wrote: >>>>>> We see daily panics on our production systems (web server, apache >>>>>> running MPM event, openjdk8. Kernel with VIMAGE. Jails using = netgraph >>>>>> interfaces [not epair]). >>>>>>=20 >>>>>> The problem started after the summer. Normal port upgrades seems = to >>>>>> be the only difference. The problem occurs with 10.2-p2 kernel as >>>>>> well as 10.1-p4 and 10.1-p15. >>>>>>=20 >>>>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D203175 >>>>>>=20 >>>>>> Any ideas? >>>>>=20 >>>>> Thanks for you detailed report. I am not aware of any = tcp_twclose() >>>>> related issues (without VIMAGE) since FreeBSD 10.0 (does not mean = there >>>>> are none). Few interesting facts (at least for me): >>>>>=20 >>>>> - Your crash happens when unlocking a inp exclusive lock with = INP_WUNLOCK() >>>>>=20 >>>>> - Something is already wrong before calling turnstile_broadcast() = as it >>>>> is called with ts =3D NULL: >>>> In the kernel without witness this is a 99%-sure indication of = attempt to >>>> unlock not owned lock. >>>=20 >>> Thanks, this is useful. So far I did not find any path where >>> tcp_twclose() can call INP_WUNLOCK without having the exclusive lock >>> held, that makes this issue interesting. >>>=20 >>>>> I won't go to far here as I am not expert enough in VIMAGE, but = one >>>>> question anyway: >>>>>=20 >>>>> - Can you correlate this kernel panic to a particular event? Like = for >>>>> example a VIMAGE/VNET jail destruction. >>>>>=20 >>>>> I will test that on my side on a 10.2 machine. >>>=20 >>> I did not find any issues while testing 10.2 + VIMAGE on my side. = Thus >>> Palle what I would suggest: >>>=20 >>> - First, test with stable/10 to see if by chance this issue has = already >>> been fixed in stable branch. >>>=20 >>> - Second, if issue is still in stable/10, compile 10.2 kernel with >>> these options: >>>=20 >>> options DDB >>> options DEADLKRES >>> options INVARIANTS >>> options INVARIANT_SUPPORT >>> options WITNESS >>> options WITNESS_SKIPSPIN >>>=20 >>> To see where the original fault is coming from. >>=20 >> Hi, >>=20 >> We just had two crashes within 15 minutes using 10.2 with these two = added: >>=20 >> https://svnweb.freebsd.org/changeset/base/287261 >>=20 >> https://svnweb.freebsd.org/changeset/base/287780=20 >>=20 >> We don't always get a core dump, but the second time, we did. >>=20 >> very similar stack trace, but not identical: >>=20 >> (kgdb) #0 doadump (textdump=3D) at pcpu.h:219 >> #1 0xffffffff80949a82 in kern_reboot (howto=3D260) >> at /usr/src/sys/kern/kern_shutdown.c:451 >> #2 0xffffffff80949e65 in vpanic (fmt=3D, >> ap=3D) at = /usr/src/sys/kern/kern_shutdown.c:758 >> #3 0xffffffff80949cf3 in panic (fmt=3D0x0) >> at /usr/src/sys/kern/kern_shutdown.c:687 >> #4 0xffffffff80d5d0bb in trap_fatal (frame=3D, >> eva=3D) at /usr/src/sys/amd64/amd64/trap.c:851 >> #5 0xffffffff80d5d3bd in trap_pfault (frame=3D0xfffffe1760bc1840, >> usermode=3D) at = /usr/src/sys/amd64/amd64/trap.c:674 >> #6 0xffffffff80d5ca5a in trap (frame=3D0xfffffe1760bc1840) >> at /usr/src/sys/amd64/amd64/trap.c:440 >> #7 0xffffffff80d42dd2 in calltrap () >> at /usr/src/sys/amd64/amd64/exception.S:236 >> #8 0xffffffff8099861c in turnstile_broadcast (ts=3D0x0, queue=3D1) >> at /usr/src/sys/kern/subr_turnstile.c:838 >> #9 0xffffffff80948100 in __rw_wunlock_hard (c=3D0xfffff811c43487a0, = tid=3D1, >> file=3D0x1
, line=3D1) >> at /usr/src/sys/kern/kern_rwlock.c:988 >> #10 0xffffffff80b067c4 in tcp_twclose (tw=3D, >> reuse=3D) at = /usr/src/sys/netinet/tcp_timewait.c:540 >> #11 0xffffffff80b06e0b in tcp_tw_2msl_scan (reuse=3D0) >> at /usr/src/sys/netinet/tcp_timewait.c:748 >> #12 0xffffffff80b04b0e in tcp_slowtimo () >> at /usr/src/sys/netinet/tcp_timer.c:198 >> #13 0xffffffff809b7a04 in pfslowtimo (arg=3D0x0) >> at /usr/src/sys/kern/uipc_domain.c:508 >> #14 0xffffffff8095f91b in softclock_call_cc (c=3D0xffffffff81620bf0, >> cc=3D0xffffffff8169dc00, direct=3D0) at = /usr/src/sys/kern/kern_timeout.c:685 >> #15 0xffffffff8095fd44 in softclock (arg=3D0xffffffff8169dc00) >> at /usr/src/sys/kern/kern_timeout.c:814 >> #16 0xffffffff8091592b in intr_event_execute_handlers ( >> p=3D, ie=3D0xfffff801102e0d00) >> at /usr/src/sys/kern/kern_intr.c:1264 >> #17 0xffffffff80915d76 in ithread_loop (arg=3D0xfffff801102adee0) >> at /usr/src/sys/kern/kern_intr.c:1277 >> #18 0xffffffff8091347a in fork_exit ( >> callout=3D0xffffffff80915ce0 , = arg=3D0xfffff801102adee0, >> frame=3D0xfffffe1760bc1c00) at /usr/src/sys/kern/kern_fork.c:1018 >> #19 0xffffffff80d4330e in fork_trampoline () >> at /usr/src/sys/amd64/amd64/exception.S:611 >> #20 0x0000000000000000 in ?? () >>=20 >>=20 >>=20 >> I'll try stable/10 now. Would you suggest a "clean" stable/10, or = could 287621 and 287780 help? >>=20 >> I'll add the debugging suggested options right away. >>=20 >> Palle >=20 >=20 > I have a new core dump from ^/stable/10 with: >=20 >=20 > options DDB > options DEADLKRES > options INVARIANTS > options INVARIANT_SUPPORT > options WITNESS > options WITNESS_SKIPSPIN >=20 >=20 > What can I do with the core dump? "corrupt stack"... >=20 > (kgdb) #0 doadump (textdump=3D1) at pcpu.h:219 > #1 0xffffffff8094b337 in kern_reboot (howto=3D260) > at /usr/src/sys/kern/kern_shutdown.c:451 > #2 0xffffffff8094b845 in vpanic (fmt=3D, > ap=3D) at = /usr/src/sys/kern/kern_shutdown.c:758 > #3 0xffffffff8094b6d9 in kassert_panic (fmt=3D) > at /usr/src/sys/kern/kern_shutdown.c:646 > #4 0xffffffff80b1ee59 in tcp_usr_detach (so=3D) > at /usr/src/sys/netinet/tcp_usrreq.c:202 > #5 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) > at /usr/src/sys/kern/uipc_socket.c:747 > #6 0xffffffff809cdb00 in soclose (so=3D) > at /usr/src/sys/kern/uipc_socket.c:849 > #7 0xffffffff808fe659 in _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) = at file.h:343 > #8 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, > td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 > #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, > fd=3D, fp=3D0xfffff802a593db40, = td=3D0xfffff80eebc894a0, > holdleaders=3D) > at /usr/src/sys/kern/kern_descrip.c:1194 > #10 0xffffffff80d7bc3a in amd64_syscall (td=3D0xfffff80eebc894a0, = traced=3D0) > at subr_syscall.c:134 > #11 0xffffffff80d5f1db in Xfast_syscall () > at /usr/src/sys/amd64/amd64/exception.S:396 > #12 0x0000000801c8d94a in ?? () > Previous frame inner to this frame (corrupt stack?) > Current language: auto; currently minimal > (kgdb) >=20 >=20 > Thanks, > Palle >=20 # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you = are welcome to change it and/or distribute copies of it under certain = conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for = details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame = 0xfffffe183d9e97e0 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D = 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic Uptime: 18h57m59s Dumping 23085 out of 98263 = MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for /boot/kernel/ng_bridge.ko.symbols Reading symbols from /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for /boot/kernel/netgraph.ko.symbols Reading symbols from /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for /boot/kernel/ng_eiface.ko.symbols Reading symbols from /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for /boot/kernel/ng_ether.ko.symbols Reading symbols from /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for /boot/kernel/accf_data.ko.symbols Reading symbols from /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for /boot/kernel/accf_http.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for /boot/kernel/ng_socket.ko.symbols Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in kern_reboot (howto=3D260) at = /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 in vpanic (fmt=3D, ap=3D) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in kassert_panic (fmt=3D) at = /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in tcp_usr_detach (so=3D) at = /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at = /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in soclose (so=3D) at = /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at = file.h:343 #8 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, = td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, fd=3D, fp=3D0xfffff802a593db40,=20 td=3D0xfffff80eebc894a0, holdleaders=3D) at = /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in amd64_syscall (td=3D0xfffff80eebc894a0, = traced=3D0) at subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at = /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) f 8 #8 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, = td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 2338 return (fdrop(fp, td)); (kgdb) help=20 List of classes of commands: aliases -- Aliases of other commands breakpoints -- Making program stop at certain points data -- Examining data files -- Specifying and examining files internals -- Maintenance commands obscure -- Obscure features running -- Running the program stack -- Examining the stack status -- Status inquiries support -- Support facilities tracepoints -- Tracing of program execution without stopping the program user-defined -- User-defined commands Type "help" followed by a class name for a list of commands in that = class. Type "help" followed by command name for full documentation. Command name abbreviations are allowed if unambiguous. (kgdb) disassemble Dump of assembler code for function closef: 0xffffffff80900eb0 : push %rbp 0xffffffff80900eb1 : mov %rsp,%rbp 0xffffffff80900eb4 : push %r15 0xffffffff80900eb6 : push %r14 0xffffffff80900eb8 : push %r13 0xffffffff80900eba : push %r12 0xffffffff80900ebc : push %rbx 0xffffffff80900ebd : sub $0x58,%rsp 0xffffffff80900ec1 : mov %rsi,%r12 0xffffffff80900ec4 : mov %rdi,%r14 0xffffffff80900ec7 : cmpw $0x1,0x20(%r14) 0xffffffff80900ecd : jne 0xffffffff80901077 0xffffffff80900ed3 : test %r12,%r12 0xffffffff80900ed6 : je 0xffffffff80901077 0xffffffff80900edc : mov 0x8(%r12),%rax 0xffffffff80900ee1 : mov 0x428(%rax),%rcx 0xffffffff80900ee8 : testb $0x1,0xb0(%rcx) 0xffffffff80900eef : je 0xffffffff80900f50 0xffffffff80900ef1 : mov 0x18(%r14),%rcx 0xffffffff80900ef5 : movw $0x0,-0x62(%rbp) 0xffffffff80900efb : movq $0x0,-0x78(%rbp) 0xffffffff80900f03 : movq $0x0,-0x70(%rbp) 0xffffffff80900f0b : movw $0x2,-0x64(%rbp) 0xffffffff80900f11 : mov 0x428(%rax),%rax 0xffffffff80900f18 : movq = $0xffffffff81557f68,-0x58(%rbp) 0xffffffff80900f20 : mov %rcx,-0x50(%rbp) 0xffffffff80900f24 : mov %rax,-0x48(%rbp) 0xffffffff80900f28 : movl $0x2,-0x40(%rbp) 0xffffffff80900f2f : lea -0x78(%rbp),%rax 0xffffffff80900f33 : mov %rax,-0x38(%rbp) 0xffffffff80900f37 : movl $0x40,-0x30(%rbp) 0xffffffff80900f3e : mov 0x8(%rcx),%rdi 0xffffffff80900f42 : lea -0x58(%rbp),%rsi 0xffffffff80900f46 : callq 0xffffffff80ea8870 = 0xffffffff80900f4b : mov 0x8(%r12),%rax 0xffffffff80900f50 : mov 0x50(%rax),%rbx 0xffffffff80900f54 : test %rbx,%rbx 0xffffffff80900f57 : je 0xffffffff80901077 = 0xffffffff80900f5d : mov 0x48(%rax),%r15 0xffffffff80900f61 : add $0x40,%r15 0xffffffff80900f65 : xor %esi,%esi 0xffffffff80900f67 : mov $0xffffffff810042e9,%rdx 0xffffffff80900f6e : mov $0x906,%ecx 0xffffffff80900f73 : mov %r15,%rdi 0xffffffff80900f76 : callq 0xffffffff80952ba0 = <_sx_xlock> 0xffffffff80900f7b : mov 0x20(%rbx),%rbx 0xffffffff80900f7f : mov 0x8(%r12),%rax 0xffffffff80900f84 : cmp 0x50(%rax),%rbx ---Type to continue, or q to quit--- 0xffffffff80900f88 : je 0xffffffff80901063 = 0xffffffff80900f8e : lea -0x58(%rbp),%r13 0xffffffff80900f92 : nopw %cs:0x0(%rax,%rax,1) 0xffffffff80900fa0 : mov 0x10(%rbx),%rax 0xffffffff80900fa4 : testb $0x1,0xb0(%rax) 0xffffffff80900fab : je 0xffffffff80901050 = 0xffffffff80900fb1 : incl 0x4(%rbx) 0xffffffff80900fb4 : mov $0xffffffff810042e9,%rsi 0xffffffff80900fbb : mov $0x90e,%edx 0xffffffff80900fc0 : mov %r15,%rdi 0xffffffff80900fc3 : callq 0xffffffff80952f90 = <_sx_xunlock> 0xffffffff80900fc8 : movw $0x0,-0x62(%rbp) 0xffffffff80900fce : movq $0x0,-0x78(%rbp) 0xffffffff80900fd6 : movq $0x0,-0x70(%rbp) 0xffffffff80900fde : movw $0x2,-0x64(%rbp) 0xffffffff80900fe4 : mov 0x18(%r14),%rax 0xffffffff80900fe8 : mov 0x10(%rbx),%rcx 0xffffffff80900fec : movq = $0xffffffff81557f68,-0x58(%rbp) 0xffffffff80900ff4 : mov %rax,-0x50(%rbp) 0xffffffff80900ff8 : mov %rcx,-0x48(%rbp) 0xffffffff80900ffc : movl $0x2,-0x40(%rbp) 0xffffffff80901003 : lea -0x78(%rbp),%rcx 0xffffffff80901007 : mov %rcx,-0x38(%rbp) 0xffffffff8090100b : movl $0x40,-0x30(%rbp) 0xffffffff80901012 : mov 0x8(%rax),%rdi 0xffffffff80901016 : mov %r13,%rsi 0xffffffff80901019 : callq 0xffffffff80ea8870 = 0xffffffff8090101e : xor %esi,%esi 0xffffffff80901020 : mov $0xffffffff810042e9,%rdx 0xffffffff80901027 : mov $0x917,%ecx 0xffffffff8090102c : mov %r15,%rdi 0xffffffff8090102f : callq 0xffffffff80952ba0 = <_sx_xlock> 0xffffffff80901034 : decl 0x4(%rbx) 0xffffffff80901037 : jne 0xffffffff80901050 = 0xffffffff80901039 : cmpl $0x0,0x8(%rbx) 0xffffffff8090103d : je 0xffffffff80901050 = 0xffffffff8090103f : movl $0x0,0x8(%rbx) 0xffffffff80901046 : mov %rbx,%rdi 0xffffffff80901049 : callq 0xffffffff80954a40 = 0xffffffff8090104e : xchg %ax,%ax 0xffffffff80901050 : mov 0x20(%rbx),%rbx 0xffffffff80901054 : mov 0x8(%r12),%rax 0xffffffff80901059 : cmp 0x50(%rax),%rbx 0xffffffff8090105d : jne 0xffffffff80900fa0 = 0xffffffff80901063 : mov $0xffffffff810042e9,%rsi 0xffffffff8090106a : mov $0x91f,%edx 0xffffffff8090106f : mov %r15,%rdi 0xffffffff80901072 : callq 0xffffffff80952f90 = <_sx_xunlock> 0xffffffff80901077 : mov $0xffffffff,%eax ---Type to continue, or q to quit--- 0xffffffff8090107c : lock xadd %eax,0x28(%r14) 0xffffffff80901082 : cmp $0x1,%eax 0xffffffff80901085 : jne 0xffffffff809010a5 = 0xffffffff80901087 : mov %r14,%rdi 0xffffffff8090108a : mov %r12,%rsi 0xffffffff8090108d : callq 0xffffffff808fe630 = <_fdrop> 0xffffffff80901092 : mov %eax,%ebx 0xffffffff80901094 : mov %ebx,%eax 0xffffffff80901096 : add $0x58,%rsp 0xffffffff8090109a : pop %rbx 0xffffffff8090109b : pop %r12 0xffffffff8090109d : pop %r13 0xffffffff8090109f : pop %r14 0xffffffff809010a1 : pop %r15 0xffffffff809010a3 : pop %rbp 0xffffffff809010a4 : retq =20 0xffffffff809010a5 : xor %ebx,%ebx 0xffffffff809010a7 : test %eax,%eax 0xffffffff809010a9 : jne 0xffffffff80901094 = 0xffffffff809010ab : add $0x28,%r14 0xffffffff809010af : xor %ebx,%ebx 0xffffffff809010b1 : mov $0xffffffff80ebcddb,%rdi 0xffffffff809010b8 : xor %eax,%eax 0xffffffff809010ba : mov %r14,%rsi 0xffffffff809010bd : callq 0xffffffff8094b5a0 = 0xffffffff809010c2 : jmp 0xffffffff80901094 = End of assembler dump. From owner-freebsd-net@freebsd.org Tue Sep 22 16:51:25 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4C03AA0761E for ; Tue, 22 Sep 2015 16:51:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3133010B7 for ; Tue, 22 Sep 2015 16:51:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8MGpPCn043576 for ; Tue, 22 Sep 2015 16:51:25 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Tue, 22 Sep 2015 16:51:25 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 16:51:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #3 from Palle Girgensohn --- Hi! This is a fresh core dump. This is beyond the scope of my experience, so please advice what to do next. Thanks! :-) # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = 16 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic Uptime: 18h57m59s Dumping 23085 out of 98263 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for /boot/kernel/ng_bridge.ko.symbols Reading symbols from /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for /boot/kernel/netgraph.ko.symbols Reading symbols from /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for /boot/kernel/ng_eiface.ko.symbols Reading symbols from /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for /boot/kernel/ng_ether.ko.symbols Reading symbols from /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for /boot/kernel/accf_data.ko.symbols Reading symbols from /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for /boot/kernel/accf_http.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for /boot/kernel/ng_socket.ko.symbols Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in kassert_panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in tcp_usr_detach (so=) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in sofree (so=0xfffff801dd302000) at /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in soclose (so=) at /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 0xffffffff80901092 in closef (fp=0xfffff802a593db40, td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d in closefp (fdp=0xfffff80b20cce000, fd=, fp=0xfffff802a593db40, td=0xfffff80eebc894a0, holdleaders=) at /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in amd64_syscall (td=0xfffff80eebc894a0, traced=0) at subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) f 8 #8 0xffffffff80901092 in closef (fp=0xfffff802a593db40, td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 2338 return (fdrop(fp, td)); (kgdb) help List of classes of commands: aliases -- Aliases of other commands breakpoints -- Making program stop at certain points data -- Examining data files -- Specifying and examining files internals -- Maintenance commands obscure -- Obscure features running -- Running the program stack -- Examining the stack status -- Status inquiries support -- Support facilities tracepoints -- Tracing of program execution without stopping the program user-defined -- User-defined commands Type "help" followed by a class name for a list of commands in that class. Type "help" followed by command name for full documentation. Command name abbreviations are allowed if unambiguous. (kgdb) disassemble Dump of assembler code for function closef: 0xffffffff80900eb0 : push %rbp 0xffffffff80900eb1 : mov %rsp,%rbp 0xffffffff80900eb4 : push %r15 0xffffffff80900eb6 : push %r14 0xffffffff80900eb8 : push %r13 0xffffffff80900eba : push %r12 0xffffffff80900ebc : push %rbx 0xffffffff80900ebd : sub $0x58,%rsp 0xffffffff80900ec1 : mov %rsi,%r12 0xffffffff80900ec4 : mov %rdi,%r14 0xffffffff80900ec7 : cmpw $0x1,0x20(%r14) 0xffffffff80900ecd : jne 0xffffffff80901077 0xffffffff80900ed3 : test %r12,%r12 0xffffffff80900ed6 : je 0xffffffff80901077 0xffffffff80900edc : mov 0x8(%r12),%rax 0xffffffff80900ee1 : mov 0x428(%rax),%rcx 0xffffffff80900ee8 : testb $0x1,0xb0(%rcx) 0xffffffff80900eef : je 0xffffffff80900f50 0xffffffff80900ef1 : mov 0x18(%r14),%rcx 0xffffffff80900ef5 : movw $0x0,-0x62(%rbp) 0xffffffff80900efb : movq $0x0,-0x78(%rbp) 0xffffffff80900f03 : movq $0x0,-0x70(%rbp) 0xffffffff80900f0b : movw $0x2,-0x64(%rbp) 0xffffffff80900f11 : mov 0x428(%rax),%rax 0xffffffff80900f18 : movq $0xffffffff81557f68,-0x58(%rbp) 0xffffffff80900f20 : mov %rcx,-0x50(%rbp) 0xffffffff80900f24 : mov %rax,-0x48(%rbp) 0xffffffff80900f28 : movl $0x2,-0x40(%rbp) 0xffffffff80900f2f : lea -0x78(%rbp),%rax 0xffffffff80900f33 : mov %rax,-0x38(%rbp) 0xffffffff80900f37 : movl $0x40,-0x30(%rbp) 0xffffffff80900f3e : mov 0x8(%rcx),%rdi 0xffffffff80900f42 : lea -0x58(%rbp),%rsi 0xffffffff80900f46 : callq 0xffffffff80ea8870 0xffffffff80900f4b : mov 0x8(%r12),%rax 0xffffffff80900f50 : mov 0x50(%rax),%rbx 0xffffffff80900f54 : test %rbx,%rbx 0xffffffff80900f57 : je 0xffffffff80901077 0xffffffff80900f5d : mov 0x48(%rax),%r15 0xffffffff80900f61 : add $0x40,%r15 0xffffffff80900f65 : xor %esi,%esi 0xffffffff80900f67 : mov $0xffffffff810042e9,%rdx 0xffffffff80900f6e : mov $0x906,%ecx 0xffffffff80900f73 : mov %r15,%rdi 0xffffffff80900f76 : callq 0xffffffff80952ba0 <_sx_xlock> 0xffffffff80900f7b : mov 0x20(%rbx),%rbx 0xffffffff80900f7f : mov 0x8(%r12),%rax 0xffffffff80900f84 : cmp 0x50(%rax),%rbx ---Type to continue, or q to quit--- 0xffffffff80900f88 : je 0xffffffff80901063 0xffffffff80900f8e : lea -0x58(%rbp),%r13 0xffffffff80900f92 : nopw %cs:0x0(%rax,%rax,1) 0xffffffff80900fa0 : mov 0x10(%rbx),%rax 0xffffffff80900fa4 : testb $0x1,0xb0(%rax) 0xffffffff80900fab : je 0xffffffff80901050 0xffffffff80900fb1 : incl 0x4(%rbx) 0xffffffff80900fb4 : mov $0xffffffff810042e9,%rsi 0xffffffff80900fbb : mov $0x90e,%edx 0xffffffff80900fc0 : mov %r15,%rdi 0xffffffff80900fc3 : callq 0xffffffff80952f90 <_sx_xunlock> 0xffffffff80900fc8 : movw $0x0,-0x62(%rbp) 0xffffffff80900fce : movq $0x0,-0x78(%rbp) 0xffffffff80900fd6 : movq $0x0,-0x70(%rbp) 0xffffffff80900fde : movw $0x2,-0x64(%rbp) 0xffffffff80900fe4 : mov 0x18(%r14),%rax 0xffffffff80900fe8 : mov 0x10(%rbx),%rcx 0xffffffff80900fec : movq $0xffffffff81557f68,-0x58(%rbp) 0xffffffff80900ff4 : mov %rax,-0x50(%rbp) 0xffffffff80900ff8 : mov %rcx,-0x48(%rbp) 0xffffffff80900ffc : movl $0x2,-0x40(%rbp) 0xffffffff80901003 : lea -0x78(%rbp),%rcx 0xffffffff80901007 : mov %rcx,-0x38(%rbp) 0xffffffff8090100b : movl $0x40,-0x30(%rbp) 0xffffffff80901012 : mov 0x8(%rax),%rdi 0xffffffff80901016 : mov %r13,%rsi 0xffffffff80901019 : callq 0xffffffff80ea8870 0xffffffff8090101e : xor %esi,%esi 0xffffffff80901020 : mov $0xffffffff810042e9,%rdx 0xffffffff80901027 : mov $0x917,%ecx 0xffffffff8090102c : mov %r15,%rdi 0xffffffff8090102f : callq 0xffffffff80952ba0 <_sx_xlock> 0xffffffff80901034 : decl 0x4(%rbx) 0xffffffff80901037 : jne 0xffffffff80901050 0xffffffff80901039 : cmpl $0x0,0x8(%rbx) 0xffffffff8090103d : je 0xffffffff80901050 0xffffffff8090103f : movl $0x0,0x8(%rbx) 0xffffffff80901046 : mov %rbx,%rdi 0xffffffff80901049 : callq 0xffffffff80954a40 0xffffffff8090104e : xchg %ax,%ax 0xffffffff80901050 : mov 0x20(%rbx),%rbx 0xffffffff80901054 : mov 0x8(%r12),%rax 0xffffffff80901059 : cmp 0x50(%rax),%rbx 0xffffffff8090105d : jne 0xffffffff80900fa0 0xffffffff80901063 : mov $0xffffffff810042e9,%rsi 0xffffffff8090106a : mov $0x91f,%edx 0xffffffff8090106f : mov %r15,%rdi 0xffffffff80901072 : callq 0xffffffff80952f90 <_sx_xunlock> 0xffffffff80901077 : mov $0xffffffff,%eax ---Type to continue, or q to quit--- 0xffffffff8090107c : lock xadd %eax,0x28(%r14) 0xffffffff80901082 : cmp $0x1,%eax 0xffffffff80901085 : jne 0xffffffff809010a5 0xffffffff80901087 : mov %r14,%rdi 0xffffffff8090108a : mov %r12,%rsi 0xffffffff8090108d : callq 0xffffffff808fe630 <_fdrop> 0xffffffff80901092 : mov %eax,%ebx 0xffffffff80901094 : mov %ebx,%eax 0xffffffff80901096 : add $0x58,%rsp 0xffffffff8090109a : pop %rbx 0xffffffff8090109b : pop %r12 0xffffffff8090109d : pop %r13 0xffffffff8090109f : pop %r14 0xffffffff809010a1 : pop %r15 0xffffffff809010a3 : pop %rbp 0xffffffff809010a4 : retq 0xffffffff809010a5 : xor %ebx,%ebx 0xffffffff809010a7 : test %eax,%eax 0xffffffff809010a9 : jne 0xffffffff80901094 0xffffffff809010ab : add $0x28,%r14 0xffffffff809010af : xor %ebx,%ebx 0xffffffff809010b1 : mov $0xffffffff80ebcddb,%rdi 0xffffffff809010b8 : xor %eax,%eax 0xffffffff809010ba : mov %r14,%rsi 0xffffffff809010bd : callq 0xffffffff8094b5a0 0xffffffff809010c2 : jmp 0xffffffff80901094 End of assembler dump. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Tue Sep 22 16:52:14 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2D3FDA0776E for ; Tue, 22 Sep 2015 16:52:14 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 19FD61346 for ; Tue, 22 Sep 2015 16:52:14 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8MGqDtE046568 for ; Tue, 22 Sep 2015 16:52:13 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Tue, 22 Sep 2015 16:52:14 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 16:52:14 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #4 from Palle Girgensohn --- The core dump has options DDB options DEADLKRES options INVARIANTS options INVARIANT_SUPPORT options WITNESS options WITNESS_SKIPSPIN so it should be possible to get more information, right? -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Tue Sep 22 18:16:54 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 53213A07B60 for ; Tue, 22 Sep 2015 18:16:54 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-qg0-f48.google.com (mail-qg0-f48.google.com [209.85.192.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 10D0814F2; Tue, 22 Sep 2015 18:16:53 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by qgev79 with SMTP id v79so560038qge.0; Tue, 22 Sep 2015 11:16:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=bBnHvRUhWwZpnhuKnsm79JiGaBwHE9TWsPMgzHSXjVU=; b=kKvRrC3k5cuM8ngUETFBlyMTkmu+48NgI4OCgtymlTBX93h6GaJXYQuBOjjxcULQyH ccchxKVaYmoz8QByazk6MxterLS32uXCs9g+em0lE8DYMR4YJWi76fA0/BgEG82lob6a eBnDN7Haltqc4+V/4kYhH+kAzCzswSgoddi0fQ1a47lhsq2iZPXTzXqCD8j6PXpUOivb NklKIspe+Ex4aIG4BMXPIGFnDp80LFZlFME7EMhBsWQCjE2Z3sz4JOfuuSUC6Vpm1PWf sjphsH4DvgF5hMEoLhJEhOgKqYrkmFbWc/vLkOMglK4rsvIP/LYItM0mi+268O5K9UEY zeqA== X-Received: by 10.140.42.136 with SMTP id c8mr31075709qga.64.1442945807182; Tue, 22 Sep 2015 11:16:47 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([2a02:aa11:2100:2380:99bb:ad06:48d7:e0e9]) by smtp.googlemail.com with ESMTPSA id b16sm1064034qkj.1.2015.09.22.11.16.45 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 11:16:46 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky From: Julien Charbon X-Enigmail-Draft-Status: N1210 Message-ID: <56019AF8.8000705@freebsd.org> Date: Tue, 22 Sep 2015 20:16:24 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="EgETcgrd61vGi0aPEO2cRVwqs1AH5WosX" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 18:16:54 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --EgETcgrd61vGi0aPEO2cRVwqs1AH5WosX Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, On 22/09/15 18:49, Palle Girgensohn wrote: >> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >> : >>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>> : >>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>> : On 18/09/15 18:06, Konstantin Belousov >>>> wrote: >>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>> wrote: >>>>>> [...] >>>> - Second, if issue is still in stable/10, compile 10.2 kernel >>>> with these options: >>>>=20 >>>> options DDB options DEADLKRES options >>>> INVARIANTS options INVARIANT_SUPPORT options >>>> WITNESS options WITNESS_SKIPSPIN >>>>=20 >>>> To see where the original fault is coming from. >>> [...] >>>=20 >>> I'll try stable/10 now. Would you suggest a "clean" stable/10, >>> or could 287621 and 287780 help? >>>=20 >>> I'll add the debugging suggested options right away. >>>=20 >>> Palle >>=20 >> I have a new core dump from ^/stable/10 with: >>=20 >> options DDB options DEADLKRES options >> INVARIANTS options INVARIANT_SUPPORT options >> WITNESS options WITNESS_SKIPSPIN >=20 > # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] Copyright > 2004 Free Software Foundation, Inc. GDB is free software, covered > by the GNU General Public License, and you are welcome to change it > and/or distribute copies of it under certain conditions. Type "show > copying" to see the conditions. There is absolutely no warranty for > GDB. Type "show warranty" for details. This GDB was configured as > "amd64-marcel-freebsd"... >=20 > Unread portion of the kernel message buffer: panic: tcp_detach: > INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack > backtrace: db_trace_self_wrapper() at > db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 kdb_backtrace() > at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 vpanic() at > vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() at > kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() at > tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at > sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at > soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at > _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame > 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame > 0xfffffe183d9e9ae0 amd64_syscall() at amd64_syscall+0x25a/frame > 0xfffffe183d9e9bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame > 0xfffffe183d9e9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip =3D > 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: > enter: panic Uptime: 18h57m59s Dumping 23085 out of 98263 > MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >=20 > Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded > symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from > /boot/kernel/zfs.ko.symbols...done. Loaded symbols for > /boot/kernel/zfs.ko.symbols Reading symbols from > /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for > /boot/kernel/opensolaris.ko.symbols Reading symbols from > /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for > /boot/kernel/ng_bridge.ko.symbols Reading symbols from > /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for > /boot/kernel/netgraph.ko.symbols Reading symbols from > /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for > /boot/kernel/ng_eiface.ko.symbols Reading symbols from > /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for > /boot/kernel/ng_ether.ko.symbols Reading symbols from > /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for > /boot/kernel/accf_data.ko.symbols Reading symbols from > /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for > /boot/kernel/accf_http.ko.symbols Reading symbols from > /boot/kernel/ums.ko.symbols...done. Loaded symbols for > /boot/kernel/ums.ko.symbols Reading symbols from > /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for > /boot/kernel/ng_socket.ko.symbols Reading symbols from > /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for > /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at > pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0 > doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in > kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 > 0xffffffff8094b845 in vpanic (fmt=3D, ap=3D optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 > 0xffffffff8094b6d9 in kassert_panic (fmt=3D) at > /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in > tcp_usr_detach (so=3D) at > /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in > sofree (so=3D0xfffff801dd302000) at > /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in > soclose (so=3D) at > /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in > _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8 > 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, > td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 > 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, fd=3D optimized out>, fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, > holdleaders=3D) at > /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in > amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at > subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in > ?? () Previous frame inner to this frame (corrupt stack?) Current > language: auto; currently minimal Thanks for the information. As I suspected the initial error was elsewhere than tcp_twclose(), never got this assertion before: tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL from here: static void tcp_detach(struct socket *so, struct inpcb *inp) { struct tcpcb *tp; INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp")); KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D = so")); tp =3D intotcpcb(inp); if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAI= T && " "INP_DROPPED && tp !=3D NULL")); Let me check if I could find a path that could lead to this unexpected case. Unexpected because: INP_DROPPED and inp->inp_ppcb is set to NULL are set at same time here: void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so; struct inpcb *inp; inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb !=3D tw"))= ; INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ INP_WLOCK_ASSERT(inp); tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL; in_pcbdrop(inp); ... Interesting and by the way could you try: # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads To see if other thread are also in TCP stack at the same time, and if one of this thread is referencing the same inp. Thanks. -- Julien --EgETcgrd61vGi0aPEO2cRVwqs1AH5WosX Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAZsLAAoJEKVlQ5Je6dhx5YgIAIH04fEuy9VVTbG2hAKCBqAQ w131yh4EONifUJ2a8MvtcQH57o5LixGb7o0Y5LAtmRTzubFerzcl9Fd/UNdv3bpn izYVG3+mHee36OftnGTtUBWmHhz2P1Ht9GS3i0a7Vxo+sKL5kuuijnPYo4lp4GsS IkiCrk+oFAd3NJz4ZRkzTDoj9+wcjYQzDeTtF5RMmhrZZZbIdJnOaMBdWse2D8Vv rh3OiK2BKgjNS/H6x0evEV5Y76VVoZkBWBes2WjKM/pYyxBU/9RNUpaQKyTh4uVg 9+tS8lpiE2M89AwPNUSN7PO3UbZB/6pADJeE5KQPGAHacZC+rtOADM6+2L3T2Uk= =RNay -----END PGP SIGNATURE----- --EgETcgrd61vGi0aPEO2cRVwqs1AH5WosX-- From owner-freebsd-net@freebsd.org Tue Sep 22 20:32:38 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D1150A026F9 for ; Tue, 22 Sep 2015 20:32:38 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 18E351372; Tue, 22 Sep 2015 20:32:37 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id BE127DBAE; Tue, 22 Sep 2015 22:32:35 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id sqBiQ8W7aEwp; Tue, 22 Sep 2015 22:32:35 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 3D760DBA8; Tue, 22 Sep 2015 22:32:35 +0200 (CEST) Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <56019AF8.8000705@freebsd.org> Date: Tue, 22 Sep 2015 22:32:34 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 20:32:38 -0000 Hi, > 22 sep 2015 kl. 20:16 skrev Julien Charbon : >=20 >=20 > Hi Palle, >=20 > On 22/09/15 18:49, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>> : >>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>> : >>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>> : On 18/09/15 18:06, Konstantin Belousov >>>>> wrote: >>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>>> wrote: >>>>>>> [...] >>>>> - Second, if issue is still in stable/10, compile 10.2 kernel >>>>> with these options: >>>>>=20 >>>>> options DDB options DEADLKRES options >>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>> WITNESS options WITNESS_SKIPSPIN >>>>>=20 >>>>> To see where the original fault is coming from. >>>> [...] >>>>=20 >>>> I'll try stable/10 now. Would you suggest a "clean" stable/10, >>>> or could 287621 and 287780 help? >>>>=20 >>>> I'll add the debugging suggested options right away. >>>>=20 >>>> Palle >>>=20 >>> I have a new core dump from ^/stable/10 with: >>>=20 >>> options DDB options DEADLKRES options >>> INVARIANTS options INVARIANT_SUPPORT options >>> WITNESS options WITNESS_SKIPSPIN >>=20 >> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] Copyright >> 2004 Free Software Foundation, Inc. GDB is free software, covered >> by the GNU General Public License, and you are welcome to change it >> and/or distribute copies of it under certain conditions. Type "show >> copying" to see the conditions. There is absolutely no warranty for >> GDB. Type "show warranty" for details. This GDB was configured as >> "amd64-marcel-freebsd"... >>=20 >> Unread portion of the kernel message buffer: panic: tcp_detach: >> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack >> backtrace: db_trace_self_wrapper() at >> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 kdb_backtrace() >> at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 vpanic() at >> vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() at >> kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() at >> tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at >> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at >> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at >> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >> 0xfffffe183d9e9ae0 amd64_syscall() at amd64_syscall+0x25a/frame >> 0xfffffe183d9e9bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame >> 0xfffffe183d9e9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip =3D >> 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: >> enter: panic Uptime: 18h57m59s Dumping 23085 out of 98263 >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>=20 >> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded >> symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from >> /boot/kernel/zfs.ko.symbols...done. Loaded symbols for >> /boot/kernel/zfs.ko.symbols Reading symbols from >> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for >> /boot/kernel/opensolaris.ko.symbols Reading symbols from >> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >> /boot/kernel/netgraph.ko.symbols Reading symbols from >> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_ether.ko.symbols Reading symbols from >> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >> /boot/kernel/accf_data.ko.symbols Reading symbols from >> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >> /boot/kernel/accf_http.ko.symbols Reading symbols from >> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >> /boot/kernel/ums.ko.symbols Reading symbols from >> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_socket.ko.symbols Reading symbols from >> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at >> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt #0 >> doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in >> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 >> 0xffffffff8094b845 in vpanic (fmt=3D, ap=3D> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >> 0xffffffff8094b6d9 in kassert_panic (fmt=3D) at >> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in >> tcp_usr_detach (so=3D) at >> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in >> sofree (so=3D0xfffff801dd302000) at >> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >> soclose (so=3D) at >> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8 >> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, >> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 >> 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, fd=3D> optimized out>, fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >> holdleaders=3D) at >> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in >> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at >> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at >> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in >> ?? () Previous frame inner to this frame (corrupt stack?) Current >> language: auto; currently minimal >=20 > Thanks for the information. As I suspected the initial error was > elsewhere than tcp_twclose(), never got this assertion before: >=20 > tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >=20 > from here: >=20 > static void > tcp_detach(struct socket *so, struct inpcb *inp) > { > struct tcpcb *tp; >=20 > INP_INFO_WLOCK_ASSERT(&V_tcbinfo); > INP_WLOCK_ASSERT(inp); >=20 > KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D = inp")); > KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D = so")); >=20 > tp =3D intotcpcb(inp); >=20 > if (inp->inp_flags & INP_TIMEWAIT) { > if (inp->inp_flags & INP_DROPPED) { > KASSERT(tp =3D=3D NULL, ("tcp_detach: = INP_TIMEWAIT > && " > "INP_DROPPED && tp !=3D NULL")); >=20 > Let me check if I could find a path that could lead to this > unexpected case. Unexpected because: INP_DROPPED and inp->inp_ppcb > is set to NULL are set at same time here: >=20 > void > tcp_twclose(struct tcptw *tw, int reuse) > { > struct socket *so; > struct inpcb *inp; >=20 > inp =3D tw->tw_inpcb; > KASSERT((inp->inp_flags & INP_TIMEWAIT), ("tcp_twclose: > !timewait")); > KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb !=3D = tw")); > INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ > INP_WLOCK_ASSERT(inp); >=20 > tcp_tw_2msl_stop(tw, reuse); > inp->inp_ppcb =3D NULL; > in_pcbdrop(inp); > ... >=20 > Interesting and by the way could you try: >=20 > # kgdb kernel /var/crash/vmcore.2 > (kgdb) info threads >=20 > To see if other thread are also in TCP stack at the same time, and if > one of this thread is referencing the same inp. >=20 > Thanks. >=20 > -- > Julien >=20 Thanks for pursuing this. Enclosed is output of info threads. +6100 = threads, so I compressed it. Most threads are in sched_switch, but the = few ones below differ. Anything else I can do? I could supply the entire core file, or access = to the machine?=20 Palle $ grep -v -A 1 sched info_threads2.txt=20 (kgdb) info threads 6132 Thread 105613 (PID=3D43806: python2.7) sched_switch = (td=3D0xfffff803c56af940, newtd=3D,=20 -- * 4206 Thread 106143 (PID=3D13296: httpd) doadump (textdump=3D1) at = pcpu.h:219 4205 Thread 106142 (PID=3D13296: httpd) sched_switch = (td=3D0xfffff80eebc89940, newtd=3D,=20 -- 1656 Thread 106919 (PID=3D10658: jsvc) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 1655 Thread 106918 (PID=3D10658: jsvc) sched_switch = (td=3D0xfffff804900f7000, newtd=3D,=20 -- 1435 Thread 106442 (PID=3D10436: jsvc) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 1434 Thread 106441 (PID=3D10436: jsvc) sched_switch = (td=3D0xfffff80fe183e4a0, newtd=3D,=20 -- 592 Thread 100139 (PID=3D12: intr/swi0: uart uart) fork_trampoline () = at /usr/src/sys/amd64/amd64/exception.S:608 591 Thread 100138 (PID=3D12: intr/irq1: atkbd0) sched_switch = (td=3D0xfffff80110893000, newtd=3D,=20 -- 590 Thread 100120 (PID=3D12: intr/irq22: uhci3) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 589 Thread 100114 (PID=3D12: intr/irq23: uhci2 uhci4) fork_trampoline = () at /usr/src/sys/amd64/amd64/exception.S:608 588 Thread 100108 (PID=3D12: intr/irq20: hpet0 uhci1+) sched_switch = (td=3D0xfffff801105fa000,=20 -- 586 Thread 100101 (PID=3D12: intr/irq260: bce3) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 585 Thread 100100 (PID=3D12: intr/irq259: bce2) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 584 Thread 100099 (PID=3D12: intr/irq258: bce1) sched_switch = (td=3D0xfffff8011049b000, newtd=3D,=20 -- 580 Thread 100084 (PID=3D12: intr/swi5: fast taskq) fork_trampoline = () at /usr/src/sys/amd64/amd64/exception.S:608 579 Thread 100082 (PID=3D12: intr/swi6: Giant taskq) sched_switch = (td=3D0xfffff80110495940,=20 -- 578 Thread 100052 (PID=3D12: intr/swi3: vm) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 577 Thread 100051 (PID=3D12: intr/swi1: netisr 0) sched_switch = (td=3D0xfffff801102f4000, newtd=3D,=20 -- 576 Thread 100050 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 575 Thread 100049 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 574 Thread 100048 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 573 Thread 100047 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 572 Thread 100046 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 571 Thread 100045 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 570 Thread 100044 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 569 Thread 100043 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 568 Thread 100042 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 567 Thread 100041 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 566 Thread 100040 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 565 Thread 100039 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 564 Thread 100038 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 563 Thread 100037 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 562 Thread 100036 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 561 Thread 100035 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 560 Thread 100034 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 559 Thread 100033 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 558 Thread 100032 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 557 Thread 100031 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 556 Thread 100030 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 555 Thread 100029 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 554 Thread 100028 (PID=3D12: intr/swi4: clock) fork_trampoline () at = /usr/src/sys/amd64/amd64/exception.S:608 553 Thread 100027 (PID=3D12: intr/swi4: clock) sched_switch = (td=3D0xfffff801102cf000, newtd=3D,=20 -- 552 Thread 100026 (PID=3D11: idle/idle: cpu23) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 551 Thread 100025 (PID=3D11: idle/idle: cpu22) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 550 Thread 100024 (PID=3D11: idle/idle: cpu21) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 549 Thread 100023 (PID=3D11: idle/idle: cpu20) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 548 Thread 100022 (PID=3D11: idle/idle: cpu19) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 547 Thread 100021 (PID=3D11: idle/idle: cpu18) sched_switch = (td=3D0xfffff801102d1000, newtd=3D,=20 -- 546 Thread 100020 (PID=3D11: idle/idle: cpu17) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 545 Thread 100019 (PID=3D11: idle/idle: cpu16) sched_switch = (td=3D0xfffff801102d1940, newtd=3D,=20 -- 544 Thread 100018 (PID=3D11: idle/idle: cpu15) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 543 Thread 100017 (PID=3D11: idle/idle: cpu14) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 542 Thread 100016 (PID=3D11: idle/idle: cpu13) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 541 Thread 100015 (PID=3D11: idle/idle: cpu12) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 540 Thread 100014 (PID=3D11: idle/idle: cpu11) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 539 Thread 100013 (PID=3D11: idle/idle: cpu10) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 538 Thread 100012 (PID=3D11: idle/idle: cpu9) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 537 Thread 100011 (PID=3D11: idle/idle: cpu8) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 536 Thread 100010 (PID=3D11: idle/idle: cpu7) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 535 Thread 100009 (PID=3D11: idle/idle: cpu6) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 534 Thread 100008 (PID=3D11: idle/idle: cpu5) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 533 Thread 100007 (PID=3D11: idle/idle: cpu4) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 532 Thread 100006 (PID=3D11: idle/idle: cpu3) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 531 Thread 100005 (PID=3D11: idle/idle: cpu2) sched_switch = (td=3D0xfffff801102b84a0, newtd=3D,=20 -- 530 Thread 100004 (PID=3D11: idle/idle: cpu1) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 529 Thread 100003 (PID=3D11: idle/idle: cpu0) 0xffffffff80d68e88 in = cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1451 528 Thread 100002 (PID=3D1: init) sched_switch = (td=3D0xfffff801102b94a0, newtd=3D,=20 -- Palle From owner-freebsd-net@freebsd.org Tue Sep 22 20:58:24 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C653CA034C6 for ; Tue, 22 Sep 2015 20:58:24 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 5020F18D9; Tue, 22 Sep 2015 20:58:24 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 205F4DC61; Tue, 22 Sep 2015 22:58:23 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 9F7RDaJoxs4j; Tue, 22 Sep 2015 22:58:22 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id E2BDBDC54; Tue, 22 Sep 2015 22:58:22 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <56019AF8.8000705@freebsd.org> Date: Tue, 22 Sep 2015 22:58:22 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: quoted-printable Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 20:58:25 -0000 > 22 sep 2015 kl. 20:16 skrev Julien Charbon : >=20 >=20 > Hi Palle, >=20 > On 22/09/15 18:49, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>> : >>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>> : >>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>> : On 18/09/15 18:06, Konstantin Belousov >>>>> wrote: >>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>>> wrote: >>>>>>> [...] >>>>> - Second, if issue is still in stable/10, compile 10.2 kernel >>>>> with these options: >>>>>=20 >>>>> options DDB options DEADLKRES options >>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>> WITNESS options WITNESS_SKIPSPIN >>>>>=20 >>>>> To see where the original fault is coming from. >>>> [...] >>>>=20 >>>> I'll try stable/10 now. Would you suggest a "clean" stable/10, >>>> or could 287621 and 287780 help? >>>>=20 >>>> I'll add the debugging suggested options right away. >>>>=20 >>>> Palle >>>=20 >>> I have a new core dump from ^/stable/10 with: >>>=20 >>> options DDB options DEADLKRES options >>> INVARIANTS options INVARIANT_SUPPORT options >>> WITNESS options WITNESS_SKIPSPIN >>=20 >> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] Copyright >> 2004 Free Software Foundation, Inc. GDB is free software, covered >> by the GNU General Public License, and you are welcome to change it >> and/or distribute copies of it under certain conditions. Type "show >> copying" to see the conditions. There is absolutely no warranty for >> GDB. Type "show warranty" for details. This GDB was configured as >> "amd64-marcel-freebsd"... >>=20 >> Unread portion of the kernel message buffer: panic: tcp_detach: >> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack >> backtrace: db_trace_self_wrapper() at >> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 kdb_backtrace() >> at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 vpanic() at >> vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() at >> kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() at >> tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at >> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at >> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at >> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >> 0xfffffe183d9e9ae0 amd64_syscall() at amd64_syscall+0x25a/frame >> 0xfffffe183d9e9bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame >> 0xfffffe183d9e9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip =3D >> 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: >> enter: panic Uptime: 18h57m59s Dumping 23085 out of 98263 >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>=20 >> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded >> symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from >> /boot/kernel/zfs.ko.symbols...done. Loaded symbols for >> /boot/kernel/zfs.ko.symbols Reading symbols from >> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for >> /boot/kernel/opensolaris.ko.symbols Reading symbols from >> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >> /boot/kernel/netgraph.ko.symbols Reading symbols from >> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_ether.ko.symbols Reading symbols from >> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >> /boot/kernel/accf_data.ko.symbols Reading symbols from >> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >> /boot/kernel/accf_http.ko.symbols Reading symbols from >> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >> /boot/kernel/ums.ko.symbols Reading symbols from >> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >> /boot/kernel/ng_socket.ko.symbols Reading symbols from >> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at >> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt #0 >> doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in >> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 >> 0xffffffff8094b845 in vpanic (fmt=3D, ap=3D> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >> 0xffffffff8094b6d9 in kassert_panic (fmt=3D) at >> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in >> tcp_usr_detach (so=3D) at >> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in >> sofree (so=3D0xfffff801dd302000) at >> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >> soclose (so=3D) at >> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8 >> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40, >> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 #9 >> 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, fd=3D> optimized out>, fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >> holdleaders=3D) at >> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in >> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at >> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at >> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a in >> ?? () Previous frame inner to this frame (corrupt stack?) Current >> language: auto; currently minimal >=20 > Thanks for the information. As I suspected the initial error was > elsewhere than tcp_twclose(), never got this assertion before: >=20 > tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >=20 > from here: >=20 > static void > tcp_detach(struct socket *so, struct inpcb *inp) > { > struct tcpcb *tp; >=20 > INP_INFO_WLOCK_ASSERT(&V_tcbinfo); > INP_WLOCK_ASSERT(inp); >=20 > KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D = inp")); > KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D = so")); >=20 > tp =3D intotcpcb(inp); >=20 > if (inp->inp_flags & INP_TIMEWAIT) { > if (inp->inp_flags & INP_DROPPED) { > KASSERT(tp =3D=3D NULL, ("tcp_detach: = INP_TIMEWAIT > && " > "INP_DROPPED && tp !=3D NULL")); >=20 > Let me check if I could find a path that could lead to this > unexpected case. Unexpected because: INP_DROPPED and inp->inp_ppcb > is set to NULL are set at same time here: >=20 > void > tcp_twclose(struct tcptw *tw, int reuse) > { > struct socket *so; > struct inpcb *inp; >=20 > inp =3D tw->tw_inpcb; > KASSERT((inp->inp_flags & INP_TIMEWAIT), ("tcp_twclose: > !timewait")); > KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb !=3D = tw")); > INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ > INP_WLOCK_ASSERT(inp); >=20 > tcp_tw_2msl_stop(tw, reuse); > inp->inp_ppcb =3D NULL; > in_pcbdrop(inp); > ... >=20 > Interesting and by the way could you try: >=20 > # kgdb kernel /var/crash/vmcore.2 > (kgdb) info threads >=20 > To see if other thread are also in TCP stack at the same time, and if > one of this thread is referencing the same inp. >=20 > Thanks. >=20 > -- > Julien >=20 BTW, this backtrace looks quite different from the previous ones? Is it = a different problem, or just a different way to reveal the same problem? Palle From owner-freebsd-net@freebsd.org Tue Sep 22 21:59:23 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 972C6A073C0 for ; Tue, 22 Sep 2015 21:59:23 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f182.google.com (mail-wi0-f182.google.com [209.85.212.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2DD721623; Tue, 22 Sep 2015 21:59:22 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wiclk2 with SMTP id lk2so43942823wic.1; Tue, 22 Sep 2015 14:59:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=ChbptmmxeReXNhV2ucA0xEf9WlCQnZz9Xm5XJywnXTA=; b=eq5vQvLAsp9Mx/rmMmDvWHNDCxvTogy9Iy0vYq5i2Cq74ghoTjkzXyYPjoW5J8+qVI EWdce5n6y3nuTXfj3x+ZowpTF3YRoX13pJFRxXdhZi2r+5lMOawjvl4WOL9xnfyMhfZI KqGjNRdPBJdln3tJsdp5NFEOt6/ji9nwUNDz1fl5XyYI1pOh8OAXO0qEP3H12K70CvM9 8MLp+0BYS7XgrTmWpjiET8pyxNYI7T+KYJEEr014EygyNaSrD198TtiJZDK7vOijNtaH KlvvxUZCGBm4FMbELU/+wyd/WsQjJXk+GuVc2pLhgeBzP8r/nGoKMFUm6pLH5poKdw/b xmXg== X-Received: by 10.180.8.232 with SMTP id u8mr51217wia.10.1442959161104; Tue, 22 Sep 2015 14:59:21 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([2a02:aa11:2100:2380:21d9:6dfc:8b7b:1a2b]) by smtp.googlemail.com with ESMTPSA id gk9sm5248780wib.9.2015.09.22.14.59.19 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 14:59:20 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5601CF2D.9030307@freebsd.org> Date: Tue, 22 Sep 2015 23:59:09 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="busGMoBpmdTluNL5AR0vCxaduPQvGv3At" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 21:59:23 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --busGMoBpmdTluNL5AR0vCxaduPQvGv3At Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, On 22/09/15 22:58, Palle Girgensohn wrote: >> 22 sep 2015 kl. 20:16 skrev Julien Charbon : >> On 22/09/15 18:49, Palle Girgensohn wrote: >>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>> : >>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>> : >>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>> : On 18/09/15 18:06, Konstantin Belousov=20 >>>>>> wrote: >>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon=20 >>>>>>> wrote: >>>>>>>> [...] >>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>> kernel with these options: >>>>>>=20 >>>>>> options DDB options DEADLKRES options=20 >>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>> options WITNESS_SKIPSPIN >>>>>>=20 >>>>>> To see where the original fault is coming from. >>>>> [...] >>>>>=20 >>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>> stable/10, or could 287621 and 287780 help? >>>>>=20 >>>>> I'll add the debugging suggested options right away. >>>>>=20 >>>>> Palle >>>>=20 >>>> I have a new core dump from ^/stable/10 with: >>>>=20 >>>> options DDB options DEADLKRES options INVARIANTS >>>> options INVARIANT_SUPPORT options WITNESS options >>>> WITNESS_SKIPSPIN >>>=20 >>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>> software, covered by the GNU General Public License, and you are >>> welcome to change it and/or distribute copies of it under certain >>> conditions. Type "show copying" to see the conditions. There is >>> absolutely no warranty for GDB. Type "show warranty" for >>> details. This GDB was configured as "amd64-marcel-freebsd"... >>>=20 >>> Unread portion of the kernel message buffer: panic: tcp_detach:=20 >>> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack=20 >>> backtrace: db_trace_self_wrapper() at=20 >>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at=20 >>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at=20 >>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at=20 >>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>> FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D >>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic >>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>=20 >>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/zfs.ko.symbols Reading symbols from=20 >>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/opensolaris.ko.symbols Reading symbols from=20 >>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0=20 >>> doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in=20 >>> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 >>> #2 0xffffffff8094b845 in vpanic (fmt=3D, >>> ap=3D) at >>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>> kassert_panic (fmt=3D) at=20 >>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in=20 >>> tcp_usr_detach (so=3D) at=20 >>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in=20 >>> sofree (so=3D0xfffff801dd302000) at=20 >>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in=20 >>> soclose (so=3D) at=20 >>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in=20 >>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>> #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, >>> fd=3D, fp=3D0xfffff802a593db40, >>> td=3D0xfffff80eebc894a0, holdleaders=3D) at=20 >>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in=20 >>> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at=20 >>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>> in ?? () Previous frame inner to this frame (corrupt stack?) >>> Current language: auto; currently minimal >>=20 >> Thanks for the information. As I suspected the initial error was=20 >> elsewhere than tcp_twclose(), never got this assertion before: >>=20 >> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>=20 >> from here: >>=20 >> static void tcp_detach(struct socket *so, struct inpcb *inp) {=20 >> struct tcpcb *tp; >>=20 >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>=20 >> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D so"))= ; >>=20 >> tp =3D intotcpcb(inp); >>=20 >> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && "= =20 >> "INP_DROPPED && tp !=3D NULL")); >>=20 >> Let me check if I could find a path that could lead to this=20 >> unexpected case. Unexpected because: INP_DROPPED and >> inp->inp_ppcb is set to NULL are set at same time here: >>=20 >> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so;=20 >> struct inpcb *inp; >>=20 >> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw, >> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >> INP_WLOCK_ASSERT(inp); >>=20 >> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >> in_pcbdrop(inp); ... >>=20 >> Interesting and by the way could you try: >>=20 >> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads >>=20 >> To see if other thread are also in TCP stack at the same time, and >> if one of this thread is referencing the same inp. >>=20 >=20 > BTW, this backtrace looks quite different from the previous ones? Is > it a different problem, or just a different way to reveal the same > problem? Having a different backstrace is expected, the first backstrace was quite deep in the stack, and kernel panic-ed quite late after the original issue. With the kernel debug options, the kernel stopped at the first suspicious fact. And I would say that I still does not understand how it is possible to reach that state, it is like kernel exclusive lock stopped to work and allow several thread to work on the same locked inp at same time. (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird). I will ask you the kernel and the core dump but off list to avoid too much spam in -net. -- Julien --busGMoBpmdTluNL5AR0vCxaduPQvGv3At Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAc82AAoJEKVlQ5Je6dhxYfgIANdO8mB3Uz3Bi3L/qQCiAbzb g/hYJDr7bYeIT3kpAYRCAsBP+qt0M58ArE0npx0Emj419rCqfhcg6Ld2FEu19B1+ 0L3N/511c+kR9aJnZ9rJnm4qkHQIvboZpu4bV+8QZ9QNhzYJ8vjiwjnb4Z92xbbq Qo5jvLxnyhmvHdgBlUUZup/eq2nHL8zD0GI1WX5pT9Q7ekgGu7jG3E3qPWEVzX6H j0cLrxnAIZEnbEnrgNZtsjDVvM1S5IzwVGpkwkKtvUbkEr42KRI8xHpU1cDL0if1 NCnxm4kJ+YKUoKTBYqa+7mPJb+GoS5x2oSqh5C5/rfoNYRDKoGj9RAnw9YauVz4= =2++u -----END PGP SIGNATURE----- --busGMoBpmdTluNL5AR0vCxaduPQvGv3At-- From owner-freebsd-net@freebsd.org Tue Sep 22 22:01:50 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62003A07591 for ; Tue, 22 Sep 2015 22:01:50 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 0498B1A19; Tue, 22 Sep 2015 22:01:50 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id D76ACE452; Wed, 23 Sep 2015 00:01:47 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 6JXv_H6c4ybY; Wed, 23 Sep 2015 00:01:47 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 9F095E44F; Wed, 23 Sep 2015 00:01:47 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <5601CF2D.9030307@freebsd.org> Date: Wed, 23 Sep 2015 00:01:47 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: 7bit Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Sep 2015 22:01:50 -0000 > 22 sep 2015 kl. 23:59 skrev Julien Charbon : > > > Hi Palle, > > On 22/09/15 22:58, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 20:16 skrev Julien Charbon : >>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn >>>>> : >>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>>> : >>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>>> : On 18/09/15 18:06, Konstantin Belousov >>>>>>> wrote: >>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon >>>>>>>> wrote: >>>>>>>>> [...] >>>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>>> kernel with these options: >>>>>>> >>>>>>> options DDB options DEADLKRES options >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>> >>>>>>> To see where the original fault is coming from. >>>>>> [...] >>>>>> >>>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>>> stable/10, or could 287621 and 287780 help? >>>>>> >>>>>> I'll add the debugging suggested options right away. >>>>>> >>>>>> Palle >>>>> >>>>> I have a new core dump from ^/stable/10 with: >>>>> >>>>> options DDB options DEADLKRES options INVARIANTS >>>>> options INVARIANT_SUPPORT options WITNESS options >>>>> WITNESS_SKIPSPIN >>>> >>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>>> software, covered by the GNU General Public License, and you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. Type "show copying" to see the conditions. There is >>>> absolutely no warranty for GDB. Type "show warranty" for >>>> details. This GDB was configured as "amd64-marcel-freebsd"... >>>> >>>> Unread portion of the kernel message buffer: panic: tcp_detach: >>>> INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = 16 KDB: stack >>>> backtrace: db_trace_self_wrapper() at >>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at >>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at >>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at >>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>>> FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp = >>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic >>>> Uptime: 18h57m59s Dumping 23085 out of 98263 >>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>> >>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/zfs.ko.symbols Reading symbols from >>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from >>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/netgraph.ko.symbols Reading symbols from >>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from >>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from >>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/accf_data.ko.symbols Reading symbols from >>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/accf_http.ko.symbols Reading symbols from >>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ums.ko.symbols Reading symbols from >>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from >>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for >>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at >>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 >>>> doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 in >>>> kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 >>>> #2 0xffffffff8094b845 in vpanic (fmt=, >>>> ap=) at >>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>>> kassert_panic (fmt=) at >>>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in >>>> tcp_usr_detach (so=) at >>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in >>>> sofree (so=0xfffff801dd302000) at >>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>> soclose (so=) at >>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8 >>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40, >>>> td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>>> #9 0xffffffff808feb5d in closefp (fdp=0xfffff80b20cce000, >>>> fd=, fp=0xfffff802a593db40, >>>> td=0xfffff80eebc894a0, holdleaders=) at >>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in >>>> amd64_syscall (td=0xfffff80eebc894a0, traced=0) at >>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at >>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>>> in ?? () Previous frame inner to this frame (corrupt stack?) >>>> Current language: auto; currently minimal >>> >>> Thanks for the information. As I suspected the initial error was >>> elsewhere than tcp_twclose(), never got this assertion before: >>> >>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL >>> >>> from here: >>> >>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>> struct tcpcb *tp; >>> >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>> >>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp")); >>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket != so")); >>> >>> tp = intotcpcb(inp); >>> >>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && " >>> "INP_DROPPED && tp != NULL")); >>> >>> Let me check if I could find a path that could lead to this >>> unexpected case. Unexpected because: INP_DROPPED and >>> inp->inp_ppcb is set to NULL are set at same time here: >>> >>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so; >>> struct inpcb *inp; >>> >>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw, >>> ("tcp_twclose: inp_ppcb != tw")); >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */ >>> INP_WLOCK_ASSERT(inp); >>> >>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL; >>> in_pcbdrop(inp); ... >>> >>> Interesting and by the way could you try: >>> >>> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads >>> >>> To see if other thread are also in TCP stack at the same time, and >>> if one of this thread is referencing the same inp. >>> >> >> BTW, this backtrace looks quite different from the previous ones? Is >> it a different problem, or just a different way to reveal the same >> problem? > > Having a different backstrace is expected, the first backstrace was > quite deep in the stack, and kernel panic-ed quite late after the > original issue. With the kernel debug options, the kernel stopped at > the first suspicious fact. > > And I would say that I still does not understand how it is possible to > reach that state, it is like kernel exclusive lock stopped to work and > allow several thread to work on the same locked inp at same time. > (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird). > > I will ask you the kernel and the core dump but off list to avoid too > much spam in -net. > > -- > Julien > Great, I'll send it you promptly. Palle From owner-freebsd-net@freebsd.org Wed Sep 23 03:03:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F303A07072 for ; Wed, 23 Sep 2015 03:03:46 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) Received: from relay2.tomsk.ru (mail.sibptus.tomsk.ru [212.73.124.5]) by mx1.freebsd.org (Postfix) with ESMTP id D2E561186 for ; Wed, 23 Sep 2015 03:03:44 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) X-Virus-Scanned: by clamd daemon 0.98.5_1 for FreeBSD at relay2.tomsk.ru Received: from admin.sibptus.TOMSK.ru ([212.73.125.240] verified) by relay2.tomsk.ru (CommuniGate Pro SMTP 5.1.16) with ESMTPS id 38873119; Wed, 23 Sep 2015 09:03:43 +0600 Received: from admin.sibptus.TOMSK.ru (sudakov@localhost [127.0.0.1]) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7) with ESMTP id t8N33eLb005164; Wed, 23 Sep 2015 09:03:41 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) Received: (from sudakov@localhost) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7/Submit) id t8N33eme005163; Wed, 23 Sep 2015 09:03:40 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) X-Authentication-Warning: admin.sibptus.TOMSK.ru: sudakov set sender to vas@mpeks.tomsk.su using -f Date: Wed, 23 Sep 2015 09:03:40 +0600 From: Victor Sudakov To: Larry Baird , freebsd-net@freebsd.org Subject: Re: transport mode IPSec with Windows 7, static keys Message-ID: <20150923030340.GB4556@admin.sibptus.tomsk.ru> References: <115822.44131.97331@localhost> <20150922144246.61965.qmail@mailgate.gta.com> <20150922151003.GA98507@admin.sibptus.tomsk.ru> <20150922163845.GB82457@gta.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150922163845.GB82457@gta.com> Organization: OAO "Svyaztransneft", SibPTUS X-PGP-Key: http://www.dreamwidth.org/pubkey?user=victor_sudakov X-PGP-Fingerprint: 10E3 1171 1273 E007 C2E9 3532 0DA4 F259 9B5E C634 User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 03:03:46 -0000 Larry Baird wrote: > > > I use IKE when I have to, but would like to use static keys with > > Windows specifically, or at least would like to definitely know if it > > is at all possible or not. > Static keys are too weak from a security stand point. I can imagine situations where static keys are sufficient, or may present a lesser risk than installing third party VPN solutions on Windows. > I have never tried > to configure them on Windows. Sorry I can't help. I configured them between FreeBSD and Cisco, as well as two FreeBSD hosts. The main problem with Windows is that it can have only one key both for encryption and authentication, while setkey requires two different keys to be of different lengths, which is kinda difficult to set up with setkey. -- Victor Sudakov, VAS4-RIPE, VAS47-RIPN sip:sudakov@sibptus.tomsk.ru From owner-freebsd-net@freebsd.org Wed Sep 23 07:12:59 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D53C1A0686C for ; Wed, 23 Sep 2015 07:12:59 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EA54A1618; Wed, 23 Sep 2015 07:12:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA18113; Wed, 23 Sep 2015 10:12:50 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ZeeEU-000BJV-8v; Wed, 23 Sep 2015 10:12:50 +0300 Subject: Re: page fault in tcp_do_segment (r287759 suspected) To: freebsd-net , "George V. Neville-Neil" References: <56011276.4060206@FreeBSD.org> From: Andriy Gapon Message-ID: <560250B9.3080002@FreeBSD.org> Date: Wed, 23 Sep 2015 10:11:53 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <56011276.4060206@FreeBSD.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 07:12:59 -0000 On 22/09/2015 11:33, Andriy Gapon wrote: > That code actually looks like the following DTrace probe a few lines below: > > TCP_PROBE3(debug__input, tp, th, mtod(m, const char *)); > > So, it seems like 'm' could be NULL here. > I see two places in tcp_do_segment() where m gets assigned with NULL followed by > goto drop. If I had to guess then my guess would be that one of those code > paths was taken. > Since those NULL assignments were there for more than a year, then I would guess > that the addition of the probe is to blame: > https://svnweb.freebsd.org/base?view=revision&revision=287759 Should I file a bug report about this? Does anyone has suggestion for a simple fix? -- Andriy Gapon From owner-freebsd-net@freebsd.org Wed Sep 23 08:35:53 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0920A075B5 for ; Wed, 23 Sep 2015 08:35:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DCEE71BBA for ; Wed, 23 Sep 2015 08:35:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8N8Zr0r087466 for ; Wed, 23 Sep 2015 08:35:53 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 08:35:53 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: attachments.created Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 08:35:54 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #5 from Palle Girgensohn --- Created attachment 161296 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=161296&action=edit output from kgdb "info threads" -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 08:37:51 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E5197A076D4 for ; Wed, 23 Sep 2015 08:37:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C9F3B1CB2 for ; Wed, 23 Sep 2015 08:37:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8N8bow5089558 for ; Wed, 23 Sep 2015 08:37:50 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 08:37:50 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 08:37:51 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #6 from Palle Girgensohn --- I got a fresh core dump from another machine with the same setup. WITNESS, VARIANTS etc all turned on. Attached the quite long out put from "info threads" as well. We *really* need to find a way to stop our servers from crashing, so any help appreciated! (kgdb) bt #0 doadump (textdump=) at pcpu.h:219 #1 0xffffffff80945d02 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff809460e5 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff80945f73 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687 #4 0xffffffff80d595cb in trap_fatal (frame=, eva=) at /usr/src/sys/amd64/amd64/trap.c:851 #5 0xffffffff80d598cd in trap_pfault (frame=0xfffffe00003cc700, usermode=) at /usr/src/sys/amd64/amd64/trap.c:674 #6 0xffffffff80d58f6a in trap (frame=0xfffffe00003cc700) at /usr/src/sys/amd64/amd64/trap.c:440 #7 0xffffffff80d3ef72 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #8 0xffffffff8099487c in turnstile_broadcast (ts=0x0, queue=1) at /usr/src/sys/kern/subr_turnstile.c:838 #9 0xffffffff80944380 in __rw_wunlock_hard (c=0xfffff803755e9928, tid=1, file=0x1
, line=1) at /usr/src/sys/kern/kern_rwlock.c:988 #10 0xffffffff80b02b24 in tcp_twclose (tw=, reuse=) at /usr/src/sys/netinet/tcp_timewait.c:540 #11 0xffffffff80b0316b in tcp_tw_2msl_scan (reuse=0) at /usr/src/sys/netinet/tcp_timewait.c:748 #12 0xffffffff80b00e7e in tcp_slowtimo () at /usr/src/sys/netinet/tcp_timer.c:198 #13 0xffffffff809b3c74 in pfslowtimo (arg=0x0) at /usr/src/sys/kern/uipc_domain.c:508 #14 0xffffffff8095bb7b in softclock_call_cc (c=0xffffffff8161bbf0, cc=0xffffffff81698d00, direct=0) at /usr/src/sys/kern/kern_timeout.c:685 #15 0xffffffff8095bfa4 in softclock (arg=0xffffffff81698d00) at /usr/src/sys/kern/kern_timeout.c:814 #16 0xffffffff80911c3b in intr_event_execute_handlers (p=, ie=0xfffff8010bbabc00) at /usr/src/sys/kern/kern_intr.c:1264 #17 0xffffffff80912086 in ithread_loop (arg=0xfffff801102abf40) at /usr/src/sys/kern/kern_intr.c:1277 #18 0xffffffff8090f78a in fork_exit (callout=0xffffffff80911ff0 , arg=0xfffff801102abf40, frame=0xfffffe00003ccac0) at /usr/src/sys/kern/kern_fork.c:1018 #19 0xffffffff80d3f4ae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #20 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb) info threads -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 08:45:23 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68E56A07ABF for ; Wed, 23 Sep 2015 08:45:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 55C0D10FA for ; Wed, 23 Sep 2015 08:45:23 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8N8jNBc004458 for ; Wed, 23 Sep 2015 08:45:23 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 08:45:23 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 08:45:23 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #7 from Palle Girgensohn --- Perhaps this is helpful as well? Fatal trap 12: page fault while in kernel mode cpuid = 14; apic id = 0e fault virtual address = 0x30 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff8099487c stack pointer = 0x28:0xfffffe00003cc7b0 frame pointer = 0x28:0xfffffe00003cc7e0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 12 (swi4: clock) trap number = 12 panic: page fault cpuid = 14 -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 09:28:29 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 15A08A067C8 for ; Wed, 23 Sep 2015 09:28:29 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EECD01333 for ; Wed, 23 Sep 2015 09:28:28 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8N9SSBf029422 for ; Wed, 23 Sep 2015 09:28:28 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 201488] dummynet appears broken in 10.0-RELEASE and onwards (can't traffic shape on bridges) Date: Wed, 23 Sep 2015 09:28:29 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-RELEASE X-Bugzilla-Keywords: regression X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: fbsd@peralex.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: luigi@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 09:28:29 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201488 --- Comment #4 from Mark C --- After a lot of recompiling, it looks like the bug crept in with r240099 (committed by 'melifaro'). -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 13:55:41 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CFCD7A07038 for ; Wed, 23 Sep 2015 13:55:41 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BBEED1B7A for ; Wed, 23 Sep 2015 13:55:41 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8NDtf2V099573 for ; Wed, 23 Sep 2015 13:55:41 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 201488] dummynet appears broken in 10.0-RELEASE and onwards (can't traffic shape on bridges) Date: Wed, 23 Sep 2015 13:55:41 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-RELEASE X-Bugzilla-Keywords: regression X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: melifaro@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: melifaro@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 13:55:41 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201488 Alexander V. Chernikov changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|luigi@FreeBSD.org |melifaro@FreeBSD.org CC| |melifaro@FreeBSD.org -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 14:17:07 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26B82A07A11 for ; Wed, 23 Sep 2015 14:17:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 12E84150C for ; Wed, 23 Sep 2015 14:17:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8NEH6k2066960 for ; Wed, 23 Sep 2015 14:17:06 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 14:17:07 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 14:17:07 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #8 from Palle Girgensohn --- Sorry, comment 6 and 7 where core dumps *without* WITNESS et al. my bad. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 14:17:33 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7CCBDA07A4E for ; Wed, 23 Sep 2015 14:17:33 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 68FE515CE for ; Wed, 23 Sep 2015 14:17:33 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8NEHX7Z067348 for ; Wed, 23 Sep 2015 14:17:33 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 14:17:33 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 14:17:33 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #9 from Palle Girgensohn --- Sorry, comment 6 and 7 where core dumps *without* WITNESS et al. my bad. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 14:18:59 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DC38AA07B0E for ; Wed, 23 Sep 2015 14:18:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C86FC179C for ; Wed, 23 Sep 2015 14:18:59 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8NEIxwM068764 for ; Wed, 23 Sep 2015 14:18:59 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Wed, 23 Sep 2015 14:19:00 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: girgen@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 14:19:00 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #10 from Palle Girgensohn --- (In reply to Palle Girgensohn from comment #7) this was without WITNESS/INVARIANTS. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Wed Sep 23 14:36:42 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C91EDA06307 for ; Wed, 23 Sep 2015 14:36:42 +0000 (UTC) (envelope-from girgen@gmail.com) Received: from mail-la0-x234.google.com (mail-la0-x234.google.com [IPv6:2a00:1450:4010:c03::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 57B711F8C; Wed, 23 Sep 2015 14:36:42 +0000 (UTC) (envelope-from girgen@gmail.com) Received: by lahh2 with SMTP id h2so28892218lah.0; Wed, 23 Sep 2015 07:36:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:content-type:mime-version:subject:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=zIew7/K6xwLUATNdzBrfovmeAl9br4mqEoDbJN8TrKA=; b=dwChrmMBJDIG/eJRCUTyFhw+i9DPKXvaBCPfCFLCf4g5MPt73DSw7EBQZWc5+KSada fkmC68C1Ma3cAn1+XBk4PZZN8ip3qfXNFTn95w/Vb3sB19N+9qqBvpXcS4oLkPZgYrvK XcdHjR5wzd4vI2K0PGYqqIAmzqmEEFSb1AeeB9bD+gnmaP0cwHiMAQJpECf7ai0OLQNL uLHYdp7lT/iqhuH8dLuwsVDedKPePjvJ1L+hrK4hgpvgr7W2YKwXkHwrYXxSP7SelOCH Mjoj4aaCAlcgKixuRazYufFXC3Vty4baIejSRLbFxhIg0p0d08r1kwOBWeJ8f00GbW35 Lhvw== X-Received: by 10.112.158.38 with SMTP id wr6mr11775515lbb.25.1443018999221; Wed, 23 Sep 2015 07:36:39 -0700 (PDT) Received: from [10.0.0.143] (citron2.pingpong.net. [195.178.173.68]) by smtp.gmail.com with ESMTPSA id t188sm644883lfe.33.2015.09.23.07.36.37 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 23 Sep 2015 07:36:38 -0700 (PDT) From: Palle Girgensohn X-Google-Original-From: Palle Girgensohn Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose In-Reply-To: <5601CF2D.9030307@freebsd.org> Date: Wed, 23 Sep 2015 16:36:36 +0200 Cc: Konstantin Belousov , freebsd-net@freebsd.org, Hans Petter Selasky Content-Transfer-Encoding: quoted-printable Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 14:36:42 -0000 > 22 sep 2015 kl. 23:59 skrev Julien Charbon : >=20 >=20 > Hi Palle, >=20 > On 22/09/15 22:58, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 20:16 skrev Julien Charbon : >>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>> : >>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>> : >>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>> : On 18/09/15 18:06, Konstantin Belousov=20 >>>>>>> wrote: >>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon=20 >>>>>>>> wrote: >>>>>>>>> [...] >>>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>>> kernel with these options: >>>>>>>=20 >>>>>>> options DDB options DEADLKRES options=20 >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>>=20 >>>>>>> To see where the original fault is coming from. >>>>>> [...] >>>>>>=20 >>>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>>> stable/10, or could 287621 and 287780 help? >>>>>>=20 >>>>>> I'll add the debugging suggested options right away. >>>>>>=20 >>>>>> Palle >>>>>=20 >>>>> I have a new core dump from ^/stable/10 with: >>>>>=20 >>>>> options DDB options DEADLKRES options INVARIANTS >>>>> options INVARIANT_SUPPORT options WITNESS options >>>>> WITNESS_SKIPSPIN >>>>=20 >>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>>> software, covered by the GNU General Public License, and you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. Type "show copying" to see the conditions. There is >>>> absolutely no warranty for GDB. Type "show warranty" for >>>> details. This GDB was configured as "amd64-marcel-freebsd"... >>>>=20 >>>> Unread portion of the kernel message buffer: panic: tcp_detach:=20 >>>> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack=20= >>>> backtrace: db_trace_self_wrapper() at=20 >>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at=20 >>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at=20 >>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at=20 >>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>>> FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D >>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic >>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>=20 >>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/zfs.ko.symbols Reading symbols from=20 >>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt #0=20 >>>> doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in=20 >>>> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 >>>> #2 0xffffffff8094b845 in vpanic (fmt=3D, >>>> ap=3D) at >>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>>> kassert_panic (fmt=3D) at=20 >>>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in=20 >>>> tcp_usr_detach (so=3D) at=20 >>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in=20 >>>> sofree (so=3D0xfffff801dd302000) at=20 >>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in=20 >>>> soclose (so=3D) at=20 >>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in=20 >>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>>> #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, >>>> fd=3D, fp=3D0xfffff802a593db40, >>>> td=3D0xfffff80eebc894a0, holdleaders=3D) at=20 >>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in=20 >>>> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at=20 >>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>>> in ?? () Previous frame inner to this frame (corrupt stack?) >>>> Current language: auto; currently minimal >>>=20 >>> Thanks for the information. As I suspected the initial error was=20 >>> elsewhere than tcp_twclose(), never got this assertion before: >>>=20 >>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>=20 >>> from here: >>>=20 >>> static void tcp_detach(struct socket *so, struct inpcb *inp) {=20 >>> struct tcpcb *tp; >>>=20 >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>=20 >>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D = so")); >>>=20 >>> tp =3D intotcpcb(inp); >>>=20 >>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && = "=20 >>> "INP_DROPPED && tp !=3D NULL")); >>>=20 >>> Let me check if I could find a path that could lead to this=20 >>> unexpected case. Unexpected because: INP_DROPPED and >>> inp->inp_ppcb is set to NULL are set at same time here: >>>=20 >>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so;=20= >>> struct inpcb *inp; >>>=20 >>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw, >>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>> INP_WLOCK_ASSERT(inp); >>>=20 >>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>> in_pcbdrop(inp); ... >>>=20 >>> Interesting and by the way could you try: >>>=20 >>> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads >>>=20 >>> To see if other thread are also in TCP stack at the same time, and >>> if one of this thread is referencing the same inp. >>>=20 >>=20 >> BTW, this backtrace looks quite different from the previous ones? Is >> it a different problem, or just a different way to reveal the same >> problem? >=20 > Having a different backstrace is expected, the first backstrace was > quite deep in the stack, and kernel panic-ed quite late after the > original issue. With the kernel debug options, the kernel stopped at > the first suspicious fact. >=20 > And I would say that I still does not understand how it is possible to > reach that state, it is like kernel exclusive lock stopped to work and > allow several thread to work on the same locked inp at same time. > (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird). >=20 > I will ask you the kernel and the core dump but off list to avoid too > much spam in -net. >=20 Just a quick update. Julien is pursuing this off list with a core dump = and we are now waiting for a new core dump with the first KASSERT = removed. this is on a stable/10 kernel. This is a really big problem for us now, 100k+ users use these = systems... Would anyone suggest any paralell routes here to speed up the = process of finding the culprit? Perhaps try a head kernel on one of the = machines? Palle From owner-freebsd-net@freebsd.org Wed Sep 23 14:47:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 290A8A06C8B for ; Wed, 23 Sep 2015 14:47:46 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f177.google.com (mail-wi0-f177.google.com [209.85.212.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B45D51928; Wed, 23 Sep 2015 14:47:45 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wiclk2 with SMTP id lk2so72968010wic.1; Wed, 23 Sep 2015 07:47:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=diXdyRwqoawyzOFUN0UW5DlS8+ZLiHD9ewLjaJk92cM=; b=ShVfza2uybC9hKxkIMiTNSyFgOp7o3UhIPPcjHrq/yvygBJF6ZlPZSo+raxbL/YVk7 x94kFB66yiTUxmzZqaKelXfxn4WrQarJw2crGCIIK2yDXwI6rf1NaDHrbwNiA0yK5z7e ZSfZ3EKRmbSbwtifyR2Tl+8NS1IE2pB3xj1w7TrcBjYWRplBGOjb7gb+t6tnz4HOWbM2 45t8/MkNnySy7sxx+sxU9JZzzqahR/S0kfJEQ9TqDhNn5StZ706XVXjtaE8jaB7F6eDK MhWagkZjGGdzd5Zq/HEvQ/uWQ5YPl3/LH7AhsWEoHpvHOrkygpG+HlkMEKaIPOZ/6o9Q rJKA== X-Received: by 10.194.23.167 with SMTP id n7mr35129629wjf.112.1443019658576; Wed, 23 Sep 2015 07:47:38 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id pl7sm757207wic.4.2015.09.23.07.47.36 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Sep 2015 07:47:38 -0700 (PDT) Subject: Can tcp_close() be called in INP_TIMEWAIT case [was: Re: Kernel panics in tcp_twclose] To: Palle Girgensohn , John Baldwin References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> Cc: Konstantin Belousov , freebsd-net@freebsd.org, Robert Watson , Warner Losh From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5602BB7A.9010504@freebsd.org> Date: Wed, 23 Sep 2015 16:47:22 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <5601CF2D.9030307@freebsd.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="s9lbmCv2GbI7TneJF7f7xkPr5S7L6gbRV" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 14:47:46 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --s9lbmCv2GbI7TneJF7f7xkPr5S7L6gbRV Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi -net, On 22/09/15 23:59, Julien Charbon wrote: > On 22/09/15 22:58, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 20:16 skrev Julien Charbon : >>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>> : >>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>> : >>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>> : On 18/09/15 18:06, Konstantin Belousov=20 >>>>>>> wrote: >>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon=20 >>>>>>>> wrote: >>>>>>>>> [...] >>>>>>> - Second, if issue is still in stable/10, compile 10.2 >>>>>>> kernel with these options: >>>>>>> >>>>>>> options DDB options DEADLKRES options=20 >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>> >>>>>>> To see where the original fault is coming from. >>>>>> [...] >>>>>> >>>>>> I'll try stable/10 now. Would you suggest a "clean" >>>>>> stable/10, or could 287621 and 287780 help? >>>>>> >>>>>> I'll add the debugging suggested options right away. >>>>>> >>>>>> Palle >>>>> >>>>> I have a new core dump from ^/stable/10 with: >>>>> >>>>> options DDB options DEADLKRES options INVARIANTS >>>>> options INVARIANT_SUPPORT options WITNESS options >>>>> WITNESS_SKIPSPIN >>>> >>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD] >>>> Copyright 2004 Free Software Foundation, Inc. GDB is free >>>> software, covered by the GNU General Public License, and you are >>>> welcome to change it and/or distribute copies of it under certain >>>> conditions. Type "show copying" to see the conditions. There is >>>> absolutely no warranty for GDB. Type "show warranty" for >>>> details. This GDB was configured as "amd64-marcel-freebsd"... >>>> >>>> Unread portion of the kernel message buffer: panic: tcp_detach:=20 >>>> INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D 16 KDB: stack = >>>> backtrace: db_trace_self_wrapper() at=20 >>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0 >>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic() >>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach() >>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at=20 >>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at=20 >>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at=20 >>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at >>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at >>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at >>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at >>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6, >>>> FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D >>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic >>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>> >>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols >>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/zfs.ko.symbols Reading symbols from=20 >>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0=20 >>>> doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 in=20 >>>> kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:451 >>>> #2 0xffffffff8094b845 in vpanic (fmt=3D, >>>> ap=3D) at >>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in >>>> kassert_panic (fmt=3D) at=20 >>>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in=20 >>>> tcp_usr_detach (so=3D) at=20 >>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in=20 >>>> sofree (so=3D0xfffff801dd302000) at=20 >>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in=20 >>>> soclose (so=3D) at=20 >>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in=20 >>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>> td=3D0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338 >>>> #9 0xffffffff808feb5d in closefp (fdp=3D0xfffff80b20cce000, >>>> fd=3D, fp=3D0xfffff802a593db40, >>>> td=3D0xfffff80eebc894a0, holdleaders=3D) at=20 >>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in=20 >>>> amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at=20 >>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a >>>> in ?? () Previous frame inner to this frame (corrupt stack?) >>>> Current language: auto; currently minimal >>> >>> Thanks for the information. As I suspected the initial error was=20 >>> elsewhere than tcp_twclose(), never got this assertion before: >>> >>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>> >>> from here: >>> >>> static void tcp_detach(struct socket *so, struct inpcb *inp) {=20 >>> struct tcpcb *tp; >>> >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>> >>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D so")= ); >>> >>> tp =3D intotcpcb(inp); >>> >>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & >>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && = "=20 >>> "INP_DROPPED && tp !=3D NULL")); >>> >>> Let me check if I could find a path that could lead to this=20 >>> unexpected case. Unexpected because: INP_DROPPED and >>> inp->inp_ppcb is set to NULL are set at same time here: >>> >>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so;=20 >>> struct inpcb *inp; >>> >>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT), >>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw, >>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>> INP_WLOCK_ASSERT(inp); >>> >>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>> in_pcbdrop(inp); ... Thanks to Palle, I got access to the kernel dump. And the results is more interesting than expected: Thus somehow the kernel reaches a state in tcp_detach() where: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL In details: - inp is in TIMEWAIT state - inp has been dropped by in_pcbdrop() - inp->inp_ppcb (a struct tcptw) is not NULL All the related structures looks good from the coredump: socket, inp, and tcptw, thus no sign of any memory corruption (so far). And for the kernel, this state it is _not_ ok. Hopefully, there are only two functions that set the INP_DROPPED flags: - tcp_twclose() and, - tcp_close() If tcp_twclose() is called inp->inp_ppcb is set to NULL and the struct tcptw is freed (all good, not assertion) If tcp_close() is called inp->inp_ppcb is left untouched (less ok, potential assertion) Almost all tcp_close() calls (or tcp_close() parents calls) use a pattern like: if (inp->inp_flags & INP_TIMEWAIT) { /* Don't call tcp_close() just return */ return; } /* Call tcp_close() */ tcp_close(); But not _all_ tcp_close() calls. Thus the most important point here is: Either this assertion is wrong, either tcp_close() in INP_TIMEWAIT state should not happen. This assert and tcp_close() current behavior is here since a long time, thus I would like old beards^W^W^W more experimented TCP stack developers to give an opinion/refresh theirs memories on this very specific case. Thanks. -- Julien --s9lbmCv2GbI7TneJF7f7xkPr5S7L6gbRV Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAruHAAoJEKVlQ5Je6dhxG18H/jDEaEnWvEPSUvo7xPkHcTeW eiyWDP99Lrfa2xe0CNdyLMhrv3/Ck5nga9bBfS1GFGyijy4+Ta6GEBC9xy+XP9uY Y7hh2PEZiGHaHsfZo+DwnOWhZzAWACZoXZwi4kgDU5cvzlona/6Oij9880EYISeJ oXUIX74e1RyUP0JcWLyAxjFx/V/Hf9c4UNpZOONDcGnbwjR9s5x4KggyVDGXaAaX rae+kbXyFrIG7Uaia5J2ipyck7wmrHV505qKrPQrZq0eig1rQtIn9jpjf9LRzHgR /EJ5IGfXBD8ajVazqLf3uHR+M+8DX49NS/iTUo0qzmLesNgFxf0oUlvwBR+RhM8= =m8Mc -----END PGP SIGNATURE----- --s9lbmCv2GbI7TneJF7f7xkPr5S7L6gbRV-- From owner-freebsd-net@freebsd.org Wed Sep 23 15:40:33 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E6BC4A06853 for ; Wed, 23 Sep 2015 15:40:33 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-ig0-x22a.google.com (mail-ig0-x22a.google.com [IPv6:2607:f8b0:4001:c05::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B5B5C1A92; Wed, 23 Sep 2015 15:40:33 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by igxx6 with SMTP id x6so30404945igx.1; Wed, 23 Sep 2015 08:40:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=PZGl/Z3HlUDzrv3MdHljPZAu2CZeaNs6sbSXXhODLgY=; b=YOQLwwZyGD8ck5dSYD1vAhB0FNs8yKdMZgkFBizDg9vtniCKuEXRHubieqleTd2AGI qzOo2jI4wvcxXu4iqLwFqR+y+9jA5NmW8i/xJ9vMWn26vb7EaJ26EjBlbFhfM4fhvRxA eTHA8T4bF2I8uSD3+oDwn7PxH43Uoa8xB2j6ZDYxnIeLPzLm8D2nlxteiYuIsUXAWqw4 WN5CU8jX74NGfk8DXxIa2NpjPSE2GvThY6CF+zz9sDkbuZWld9dt2HdMukSjB44g7k1p 8yFmIoqdGJK59sGUhAaLaHD2H6xg4wn1oqi34mRr87aEiV7KdIjNXHkTqKC1ZOXHZECG eolg== MIME-Version: 1.0 X-Received: by 10.50.45.33 with SMTP id j1mr22857859igm.61.1443022833229; Wed, 23 Sep 2015 08:40:33 -0700 (PDT) Received: by 10.36.28.208 with HTTP; Wed, 23 Sep 2015 08:40:33 -0700 (PDT) In-Reply-To: <560250B9.3080002@FreeBSD.org> References: <56011276.4060206@FreeBSD.org> <560250B9.3080002@FreeBSD.org> Date: Wed, 23 Sep 2015 08:40:33 -0700 Message-ID: Subject: Re: page fault in tcp_do_segment (r287759 suspected) From: Adrian Chadd To: Andriy Gapon Cc: freebsd-net , "George V. Neville-Neil" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 15:40:34 -0000 Yes, file a bug. :) -a On 23 September 2015 at 00:11, Andriy Gapon wrote: > On 22/09/2015 11:33, Andriy Gapon wrote: >> That code actually looks like the following DTrace probe a few lines below: >> >> TCP_PROBE3(debug__input, tp, th, mtod(m, const char *)); >> >> So, it seems like 'm' could be NULL here. >> I see two places in tcp_do_segment() where m gets assigned with NULL followed by >> goto drop. If I had to guess then my guess would be that one of those code >> paths was taken. >> Since those NULL assignments were there for more than a year, then I would guess >> that the addition of the probe is to blame: >> https://svnweb.freebsd.org/base?view=revision&revision=287759 > > Should I file a bug report about this? > Does anyone has suggestion for a simple fix? > > -- > Andriy Gapon > _______________________________________________ > freebsd-net@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" From owner-freebsd-net@freebsd.org Wed Sep 23 18:01:57 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFE2AA073A2 for ; Wed, 23 Sep 2015 18:01:57 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com [209.85.212.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 61D0D12D9; Wed, 23 Sep 2015 18:01:57 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wicge5 with SMTP id ge5so219085573wic.0; Wed, 23 Sep 2015 11:01:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=IcMn0Aa3k5jSNmA0xMTsXqtShidola2jY8g0hG8iDDs=; b=A5djKGwMamfw8zpKEYmhDT7MgVoLeIiVaSpC8GVkVBqWfxB8Hb2hODFi/VfxPdBj+A R0b97ntrEDwcnVwcCE2ALRT3hvEFPaxyfv0Gj1+xxigeFj+4UOh0LMSOWTjOIxFv69DY KWLPACYpatTFwGGBfAtrvcT63cXud+H9O367IhqkUvBTjZhKVEJAMeTGpkc2PPlvA25F z5fy/JHtIgN3P5zw42G/A6hXhPwf8OP0Tk8KX748cp6gk0zbkh8hhg1ydeqDLpNEjsxn Lqy7TmX0RLe+W181X5kPeeWrvAvw4r2JXeP0jQAQSGOWZj8/UUwNm7AuKdCKAubCB7+z SH5g== X-Received: by 10.194.71.39 with SMTP id r7mr41441059wju.120.1443031315590; Wed, 23 Sep 2015 11:01:55 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id c2sm1527660wiy.11.2015.09.23.11.01.54 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Sep 2015 11:01:54 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn , George Neville-Neil References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> Cc: freebsd-net@freebsd.org From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5602E90A.9050504@freebsd.org> Date: Wed, 23 Sep 2015 20:01:46 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 18:01:57 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, Hi George, On 23/09/15 16:36, Palle Girgensohn wrote: >> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >> 22/09/15 22:58, Palle Girgensohn wrote: >>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>> : >>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>> : >>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>> Belousov wrote: >>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>> Charbon wrote: >>>>>>>>>> [...] >>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>> kernel with these options: >>>>>>>>=20 >>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>=20 >>>>>>>> To see where the original fault is coming from. >>>>>>> [...] >>>>>>>=20 >>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>=20 >>>>>>> I'll add the debugging suggested options right away. >>>>>>>=20 >>>>>>> Palle >>>>>>=20 >>>>>> I have a new core dump from ^/stable/10 with: >>>>>>=20 >>>>>> options DDB options DEADLKRES options >>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>> options WITNESS_SKIPSPIN >>>>>=20 >>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>> software, covered by the GNU General Public License, and you >>>>> are welcome to change it and/or distribute copies of it under >>>>> certain conditions. Type "show copying" to see the >>>>> conditions. There is absolutely no warranty for GDB. Type >>>>> "show warranty" for details. This GDB was configured as >>>>> "amd64-marcel-freebsd"... >>>>>=20 >>>>> Unread portion of the kernel message buffer: panic: >>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D >>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>=20 >>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt >>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>> in kern_reboot (howto=3D260) at >>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>> in vpanic (fmt=3D, ap=3D>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>> soclose (so=3D) at=20 >>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>> td=3D0xfffff80eebc894a0) at >>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>> holdleaders=3D) at=20 >>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>> frame (corrupt stack?) Current language: auto; currently >>>>> minimal >>>>=20 >>>> Thanks for the information. As I suspected the initial error >>>> was elsewhere than tcp_twclose(), never got this assertion >>>> before: >>>>=20 >>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>=20 >>>> from here: >>>>=20 >>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>> struct tcpcb *tp; >>>>=20 >>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>=20 >>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>> so")); >>>>=20 >>>> tp =3D intotcpcb(inp); >>>>=20 >>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT >>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>=20 >>>> Let me check if I could find a path that could lead to this=20 >>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>=20 >>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>> *so; struct inpcb *inp; >>>>=20 >>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20 >>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>> INP_WLOCK_ASSERT(inp); >>>>=20 >>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>> in_pcbdrop(inp); ... >>>>=20 >>>> Interesting [...] >=20 > Just a quick update. Julien is pursuing this off list with a core > dump and we are now waiting for a new core dump with the first > KASSERT removed. this is on a stable/10 kernel. By the way Palle could you also run below Dtrace script to see where this tcp_close() in INP_TIMEWAIT comes from: $ cat tcp-close-tw.d fbt::tcp_close:entry /args[0]->t_inpcb->inp_flags & 0x01000000/ { @s1[stack()] =3D count() } tick-1sec { printa(@s1); } $ sudo dtrace -s tcp-close-tw.d And share any backtraces reported in this dtrace script output. George, could you check if this dtrace script makes sense for you, and if you have any improvements to add, I am quite a rookie in Dtrace script= s. > This is a really big problem for us now, 100k+ users use these > systems... Would anyone suggest any paralell routes here to speed up > the process of finding the culprit? I agree, this issue is a complex one, more eyeballs/propositions would be great. Thanks. -- Julien --QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAukRAAoJEKVlQ5Je6dhxcscIAMd9BXemZn8hXsSoZfzxfEgE yKSqLz5BVjFMsVhZqrGGbHgJa7AMtLtE8sB5MFsO4RRKYY/kotepO8epvRR8sxDK +EyLsZnl5k/mqmlCkQ533ZTW8YUvPs57XauGLVp543jWxe1IxdjnTJnH6AQeN5Tg r7iljW8RoIvGqc/8EMsuCrHsZYPEoMJhsqx8m86oBaTJv7n2PkDyIE3RaBN8hliP iI6Z5KAud2UUF/bH99RJdsIHtHHphrsNcZ5c0RBHxSTubf1BP97PnVHNL6EUlpRD J12PI9P8cKFdBnrHoKb34EN/fOBEZSsy991mQbr4tkuAVpIUf/Yylzy4Dqy30is= =vBpj -----END PGP SIGNATURE----- --QuKFFsTUFTTrLT9EK0ha1xMF6OlutTF7c-- From owner-freebsd-net@freebsd.org Wed Sep 23 18:26:07 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A6E3A07E94 for ; Wed, 23 Sep 2015 18:26:07 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 9933412B9; Wed, 23 Sep 2015 18:26:05 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from [10.21.86.209] (c-5eeaaabe-74736162.cust.telenor.se [94.234.170.190]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id E7636CB64; Wed, 23 Sep 2015 20:25:57 +0200 (CEST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn X-Mailer: iPhone Mail (13A344) In-Reply-To: <5602E90A.9050504@freebsd.org> Date: Wed, 23 Sep 2015 20:26:02 +0200 Cc: Palle Girgensohn , George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> To: Julien Charbon X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 18:26:07 -0000 > 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >=20 >=20 > Hi Palle, Hi George, >=20 > On 23/09/15 16:36, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>> : >>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>> : >>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>> Belousov wrote: >>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>> Charbon wrote: >>>>>>>>>>> [...] >>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>> kernel with these options: >>>>>>>>>=20 >>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>=20 >>>>>>>>> To see where the original fault is coming from. >>>>>>>> [...] >>>>>>>>=20 >>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>=20 >>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>=20 >>>>>>>> Palle >>>>>>>=20 >>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>=20 >>>>>>> options DDB options DEADLKRES options >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>=20 >>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>> software, covered by the GNU General Public License, and you >>>>>> are welcome to change it and/or distribute copies of it under >>>>>> certain conditions. Type "show copying" to see the >>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>> "show warranty" for details. This GDB was configured as >>>>>> "amd64-marcel-freebsd"... >>>>>>=20 >>>>>> Unread portion of the kernel message buffer: panic: >>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D >>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>=20 >>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt= >>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>> in kern_reboot (howto=3D260) at >>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>> in vpanic (fmt=3D, ap=3D>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>> soclose (so=3D) at=20 >>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>> td=3D0xfffff80eebc894a0) at >>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>> holdleaders=3D) at=20 >>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>> minimal >>>>>=20 >>>>> Thanks for the information. As I suspected the initial error >>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>> before: >>>>>=20 >>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>=20 >>>>> from here: >>>>>=20 >>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>> struct tcpcb *tp; >>>>>=20 >>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>=20 >>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>> so")); >>>>>=20 >>>>> tp =3D intotcpcb(inp); >>>>>=20 >>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT >>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>=20 >>>>> Let me check if I could find a path that could lead to this=20 >>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>=20 >>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>> *so; struct inpcb *inp; >>>>>=20 >>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20 >>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>> INP_WLOCK_ASSERT(inp); >>>>>=20 >>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>> in_pcbdrop(inp); ... >>>>>=20 >>>>> Interesting [...] >>=20 >> Just a quick update. Julien is pursuing this off list with a core >> dump and we are now waiting for a new core dump with the first >> KASSERT removed. this is on a stable/10 kernel. >=20 > By the way Palle could you also run below Dtrace script to see where > this tcp_close() in INP_TIMEWAIT comes from: >=20 > $ cat tcp-close-tw.d > fbt::tcp_close:entry > /args[0]->t_inpcb->inp_flags & 0x01000000/ > { > @s1[stack()] =3D count() > } >=20 > tick-1sec { > printa(@s1); > } > $ sudo dtrace -s tcp-close-tw.d >=20 > And share any backtraces reported in this dtrace script output. >=20 > George, could you check if this dtrace script makes sense for you, and > if you have any improvements to add, I am quite a rookie in Dtrace scripts= . >=20 >> This is a really big problem for us now, 100k+ users use these >> systems... Would anyone suggest any paralell routes here to speed up >> the process of finding the culprit? >=20 > I agree, this issue is a complex one, more eyeballs/propositions would > be great. >=20 > Thanks. >=20 > -- > Julien >=20 Shall I let the dtrace script run continuously until the machine crashes? Or= just run it once? AFK for a few hours. Will try it later tonight. :-)= From owner-freebsd-net@freebsd.org Wed Sep 23 18:32:54 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 235A8A06256 for ; Wed, 23 Sep 2015 18:32:54 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A71C418B2; Wed, 23 Sep 2015 18:32:53 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wicfx3 with SMTP id fx3so82113649wic.0; Wed, 23 Sep 2015 11:32:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=VV3IkMx1rlwBHuhdbztLl723fj0QJ9XOKpq0LTP+LsE=; b=TbogrIDQmgexW7YVJeCfq22aauoho2rNgV/wvkw7839bwSXRDy8nt9TbxbOezYwKcb nmiROw9d0BYuZBRYnOjaQHemxl8ekZlIE5n1kYFnT5QJdJDW49tpM0JlDVz/A4846zJx BZFG0zyupLYJSHqz7c5zttXy0IvGTcldntu1T2YBf4NAI9OEcP0L2uXGHQDqP6j/vjRM svP0WnWHv4v6yeARCye6OtaDRmcmM+lsTDaTUUKKV/69f7nNfvwrPUP+mEtiA6kFGv/R w766JeHjviKGW1MJv2Ud4HVHFyOsRm1UD9lAg6cQNcZwXl9IOxLroAHBh9MC1bBbSvhR AL6w== X-Received: by 10.180.107.130 with SMTP id hc2mr5127224wib.92.1443033166339; Wed, 23 Sep 2015 11:32:46 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id hs5sm1655455wib.6.2015.09.23.11.32.44 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Sep 2015 11:32:45 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> Cc: Palle Girgensohn , George Neville-Neil , freebsd-net@freebsd.org From: Julien Charbon Message-ID: <5602F044.5010606@freebsd.org> Date: Wed, 23 Sep 2015 20:32:36 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="epRRqLROkIsrJcaDOpMFvODp97niGFVa6" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 18:32:54 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --epRRqLROkIsrJcaDOpMFvODp97niGFVa6 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi, On 23/09/15 20:26, Palle Girgensohn wrote: >> 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >> On 23/09/15 16:36, Palle Girgensohn wrote: >>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>> : >>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>>> : >>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>>> Belousov wrote: >>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>> Charbon wrote: >>>>>>>>>>>> [...] >>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>>> kernel with these options: >>>>>>>>>> >>>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>> >>>>>>>>>> To see where the original fault is coming from. >>>>>>>>> [...] >>>>>>>>> >>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>> >>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>> >>>>>>>>> Palle >>>>>>>> >>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>> >>>>>>>> options DDB options DEADLKRES options >>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>> options WITNESS_SKIPSPIN >>>>>>> >>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>>> software, covered by the GNU General Public License, and you >>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>> certain conditions. Type "show copying" to see the >>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>> "show warranty" for details. This GDB was configured as >>>>>>> "amd64-marcel-freebsd"... >>>>>>> >>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D= >>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>> >>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb= ) bt >>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>>> in kern_reboot (howto=3D260) at >>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>> in vpanic (fmt=3D, ap=3D>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>> soclose (so=3D) at=20 >>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>>> td=3D0xfffff80eebc894a0) at >>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D,= >>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>>> holdleaders=3D) at=20 >>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>> minimal >>>>>> >>>>>> Thanks for the information. As I suspected the initial error >>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>> before: >>>>>> >>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>> >>>>>> from here: >>>>>> >>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>> struct tcpcb *tp; >>>>>> >>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>> >>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20 >>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>>> so")); >>>>>> >>>>>> tp =3D intotcpcb(inp); >>>>>> >>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT >>>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>> >>>>>> Let me check if I could find a path that could lead to this=20 >>>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>> >>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>> *so; struct inpcb *inp; >>>>>> >>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20 >>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>>> INP_WLOCK_ASSERT(inp); >>>>>> >>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>>> in_pcbdrop(inp); ... >>>>>> >>>>>> Interesting [...] >>> >>> Just a quick update. Julien is pursuing this off list with a core >>> dump and we are now waiting for a new core dump with the first >>> KASSERT removed. this is on a stable/10 kernel. >> >> By the way Palle could you also run below Dtrace script to see where >> this tcp_close() in INP_TIMEWAIT comes from: >> >> $ cat tcp-close-tw.d >> fbt::tcp_close:entry >> /args[0]->t_inpcb->inp_flags & 0x01000000/ >> { >> @s1[stack()] =3D count() >> } >> >> tick-1sec { >> printa(@s1); >> } >> $ sudo dtrace -s tcp-close-tw.d >> >> And share any backtraces reported in this dtrace script output. >> >> George, could you check if this dtrace script makes sense for you, and= >> if you have any improvements to add, I am quite a rookie in Dtrace scr= ipts. >=20 > Shall I let the dtrace script run continuously until the machine crashe= s? Or just run it once? Continuously until the machine crashes. You can report any backstrace outputs like: kernel`tcp_usr_close+0x86 kernel`soclose+0xe4 kernel`_fdrop+0x29 kernel`closef+0x237 kernel`closefp+0x95 kernel`amd64_syscall+0x357 kernel`0xffffffff80c83c4b 1 before the machine crashes. But I expect the problematic case detection with Dtrace to be quickly followed by the crash. Will see. -- Julien --epRRqLROkIsrJcaDOpMFvODp97niGFVa6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWAvBLAAoJEKVlQ5Je6dhxCaMH/3bDejnfXmp8vdhssczFoK7w rcjM/BZOpwLAMRFujSnIvokDC3eNhFFg2xM6ieECA0yAPSqN15jG7RBAiK6J2itS Lfo3LNgnoBfQMgXnkU1kpNZ4n5bQLVFIl2MKvOYyDjphhtDcUj0LUBICGt6t9CLN D0SU7+R6PkUSNpHXnI0K8K/cfgtkiXZhSAS6sucgOqhPiRn6gT0lFDql3LBV7QqJ w1RMjdOzO1jgIGJZl4J+2yer7rzFeciIasjfPtEk7NkvVkKh0q99jDTQGcRkwpnb BxS+ux2wJ4hiMV6F9LVAtgV4T000FnhvVFljL5Z8jY0hqJDPuzl2TIkEGttO5aE= =1EB0 -----END PGP SIGNATURE----- --epRRqLROkIsrJcaDOpMFvODp97niGFVa6-- From owner-freebsd-net@freebsd.org Wed Sep 23 21:15:05 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4E42A08936 for ; Wed, 23 Sep 2015 21:15:05 +0000 (UTC) (envelope-from girgen@gmail.com) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 63D6E1E6E; Wed, 23 Sep 2015 21:15:04 +0000 (UTC) (envelope-from girgen@gmail.com) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 364B3D38C; Wed, 23 Sep 2015 23:15:03 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id L5-uLX4pnD5v; Wed, 23 Sep 2015 23:15:03 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 0BCD2D389; Wed, 23 Sep 2015 23:15:03 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <5602E90A.9050504@freebsd.org> Date: Wed, 23 Sep 2015 23:15:03 +0200 Cc: George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 21:15:05 -0000 > 23 sep 2015 kl. 20:01 skrev Julien Charbon : >=20 >=20 > Hi Palle, Hi George, >=20 > On 23/09/15 16:36, Palle Girgensohn wrote: >>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>> : >>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>> : >>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>> Belousov wrote: >>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>> Charbon wrote: >>>>>>>>>>> [...] >>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>> kernel with these options: >>>>>>>>>=20 >>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>=20 >>>>>>>>> To see where the original fault is coming from. >>>>>>>> [...] >>>>>>>>=20 >>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>=20 >>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>=20 >>>>>>>> Palle >>>>>>>=20 >>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>=20 >>>>>>> options DDB options DEADLKRES options >>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>> options WITNESS_SKIPSPIN >>>>>>=20 >>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>> software, covered by the GNU General Public License, and you >>>>>> are welcome to change it and/or distribute copies of it under >>>>>> certain conditions. Type "show copying" to see the >>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>> "show warranty" for details. This GDB was configured as >>>>>> "amd64-marcel-freebsd"... >>>>>>=20 >>>>>> Unread portion of the kernel message buffer: panic: >>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D >>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>=20 >>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20 >>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt >>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>> in kern_reboot (howto=3D260) at >>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>> in vpanic (fmt=3D, ap=3D>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>> soclose (so=3D) at=20 >>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>> td=3D0xfffff80eebc894a0) at >>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>> holdleaders=3D) at=20 >>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>> minimal >>>>>=20 >>>>> Thanks for the information. As I suspected the initial error >>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>> before: >>>>>=20 >>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>=20 >>>>> from here: >>>>>=20 >>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>> struct tcpcb *tp; >>>>>=20 >>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>=20 >>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20= >>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>> so")); >>>>>=20 >>>>> tp =3D intotcpcb(inp); >>>>>=20 >>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT >>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>=20 >>>>> Let me check if I could find a path that could lead to this=20 >>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>=20 >>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>> *so; struct inpcb *inp; >>>>>=20 >>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20 >>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>> INP_WLOCK_ASSERT(inp); >>>>>=20 >>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>> in_pcbdrop(inp); ... >>>>>=20 >>>>> Interesting [...] >>=20 >> Just a quick update. Julien is pursuing this off list with a core >> dump and we are now waiting for a new core dump with the first >> KASSERT removed. this is on a stable/10 kernel. >=20 > By the way Palle could you also run below Dtrace script to see where > this tcp_close() in INP_TIMEWAIT comes from: >=20 > $ cat tcp-close-tw.d > fbt::tcp_close:entry > /args[0]->t_inpcb->inp_flags & 0x01000000/ > { > @s1[stack()] =3D count() > } >=20 > tick-1sec { > printa(@s1); > } > $ sudo dtrace -s tcp-close-tw.d # dtrace -s tcp-close-tw.d dtrace: failed to compile script tcp-close-tw.d: line 2: t_inpcb is not = a member of struct e1000_hw >=20 on one system... and for the other two: # dtrace -s tcp-close-tw.d dtrace: failed to initialize dtrace: DTrace device not available on = system I'm adding options KDTRACE_HOOKS to the kernels, I guess that will help? Palle From owner-freebsd-net@freebsd.org Wed Sep 23 21:25:46 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07592A08D9E for ; Wed, 23 Sep 2015 21:25:46 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: from mail-pa0-x22e.google.com (mail-pa0-x22e.google.com [IPv6:2607:f8b0:400e:c03::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C91641370; Wed, 23 Sep 2015 21:25:45 +0000 (UTC) (envelope-from nparhar@gmail.com) Received: by pacex6 with SMTP id ex6so51063981pac.0; Wed, 23 Sep 2015 14:25:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=wb4uTsfZLPI2sJowlsXKS9gm1t0zzhr5PXSB9XQvoqM=; b=lYIa1BoRQexSd7gX61TIvtH4Rsf71MjTyettW/KQGsevNFgOd1R9DBrPZNsTR/CavB hiPo0LN8FJ5MEx83LLaIM8FIErJutnlKv+7rpQhttl5qB3H1zCZW4cjlqJ9sHryRhYYZ ulYac5QZe6cIfxT0R7wY2ekXqnltTVbzm2cpwLz7WPeOVRwLAk7f2eqkXIWXGuHTtarm I2kvy/6alewT4sx6bYAkyPgVh0bqW6MyBW97A+iG0VR2tBI0BwEBvZGxHOu8LLgUSoDY bfUyo3LertijHI83iDhLLsKHXkxzasNbs8r1RMBNV089A08YFTNgwysjxk8zKs20yX1L 9xAQ== X-Received: by 10.68.68.233 with SMTP id z9mr39851635pbt.132.1443043545493; Wed, 23 Sep 2015 14:25:45 -0700 (PDT) Received: from nparhar-pc (nat-198-95-226-228.netapp.com. [198.95.226.228]) by smtp.gmail.com with ESMTPSA id gi4sm9724870pbc.4.2015.09.23.14.25.43 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 23 Sep 2015 14:25:44 -0700 (PDT) Date: Wed, 23 Sep 2015 14:25:39 -0700 From: Navdeep Parhar To: Palle Girgensohn Cc: Julien Charbon , George Neville-Neil , freebsd-net@freebsd.org Subject: Re: Kernel panics in tcp_twclose Message-ID: <20150923212539.GA2233@nparhar-pc> Mail-Followup-To: Palle Girgensohn , Julien Charbon , George Neville-Neil , freebsd-net@freebsd.org References: <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 21:25:46 -0000 On Wed, Sep 23, 2015 at 11:15:03PM +0200, Palle Girgensohn wrote: ... > > By the way Palle could you also run below Dtrace script to see where > > this tcp_close() in INP_TIMEWAIT comes from: > > > > $ cat tcp-close-tw.d > > fbt::tcp_close:entry > > /args[0]->t_inpcb->inp_flags & 0x01000000/ > > { > > @s1[stack()] = count() > > } > > > > tick-1sec { > > printa(@s1); > > } > > $ sudo dtrace -s tcp-close-tw.d > > # dtrace -s tcp-close-tw.d > dtrace: failed to compile script tcp-close-tw.d: line 2: t_inpcb is not a member of struct e1000_hw > > > > > on one system... > > and for the other two: > > # dtrace -s tcp-close-tw.d > dtrace: failed to initialize dtrace: DTrace device not available on system > > I'm adding > > options KDTRACE_HOOKS > > to the kernels, I guess that will help? Load the DTrace modules ("kldload dtraceall") before trying to run the DTrace script. Regards, Navdeep From owner-freebsd-net@freebsd.org Wed Sep 23 21:29:11 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9E2D8A08F22 for ; Wed, 23 Sep 2015 21:29:11 +0000 (UTC) (envelope-from girgen@gmail.com) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 444CC1556; Wed, 23 Sep 2015 21:29:11 +0000 (UTC) (envelope-from girgen@gmail.com) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 0DAD7D404; Wed, 23 Sep 2015 23:29:10 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id PLMJ5mOJbvJa; Wed, 23 Sep 2015 23:29:09 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id CF004D401; Wed, 23 Sep 2015 23:29:09 +0200 (CEST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <20150923212539.GA2233@nparhar-pc> Date: Wed, 23 Sep 2015 23:29:09 +0200 Cc: Julien Charbon , George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <20150923212539.GA2233@nparhar-pc> To: Navdeep Parhar X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 21:29:11 -0000 > 23 sep 2015 kl. 23:25 skrev Navdeep Parhar : >=20 > On Wed, Sep 23, 2015 at 11:15:03PM +0200, Palle Girgensohn wrote: > ... >>> By the way Palle could you also run below Dtrace script to see where >>> this tcp_close() in INP_TIMEWAIT comes from: >>>=20 >>> $ cat tcp-close-tw.d >>> fbt::tcp_close:entry >>> /args[0]->t_inpcb->inp_flags & 0x01000000/ >>> { >>> @s1[stack()] =3D count() >>> } >>>=20 >>> tick-1sec { >>> printa(@s1); >>> } >>> $ sudo dtrace -s tcp-close-tw.d >>=20 >> # dtrace -s tcp-close-tw.d >> dtrace: failed to compile script tcp-close-tw.d: line 2: t_inpcb is = not a member of struct e1000_hw >>=20 >>>=20 >>=20 >> on one system... >>=20 >> and for the other two: >>=20 >> # dtrace -s tcp-close-tw.d >> dtrace: failed to initialize dtrace: DTrace device not available on = system >>=20 >> I'm adding >>=20 >> options KDTRACE_HOOKS >>=20 >> to the kernels, I guess that will help? >=20 > Load the DTrace modules ("kldload dtraceall") before trying to run the > DTrace script. >=20 > Regards, > Navdeep Ah, cool, thanks. I've downgraded the kernels in thw machines from = stable to 10.2, hence som extra whining.=20 So I don't need=20 options DDB_CTF makeoptions DEBUG=3D-g makeoptions WITH_CTF=3D1 like [https://www.freebsd.org/doc/handbook/dtrace-enable.html] claims? Palle From owner-freebsd-net@freebsd.org Wed Sep 23 22:05:10 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D43C4A0638D for ; Wed, 23 Sep 2015 22:05:10 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 585CB1E7C; Wed, 23 Sep 2015 22:05:09 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id 34481D605; Thu, 24 Sep 2015 00:05:08 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 5-5IYl4I2SIt; Thu, 24 Sep 2015 00:05:08 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 09577D602; Thu, 24 Sep 2015 00:05:07 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <5602F044.5010606@freebsd.org> Date: Thu, 24 Sep 2015 00:05:07 +0200 Cc: George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 23 Sep 2015 22:05:10 -0000 Hi, > 23 sep 2015 kl. 20:32 skrev Julien Charbon : >=20 >=20 > Hi, >=20 > On 23/09/15 20:26, Palle Girgensohn wrote: >>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >>> On 23/09/15 16:36, Palle Girgensohn wrote: >>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20 >>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>>> : >>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>>>> : >>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>>>> Belousov wrote: >>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>>> Charbon wrote: >>>>>>>>>>>>> [...] >>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>>>> kernel with these options: >>>>>>>>>>>=20 >>>>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>>>=20 >>>>>>>>>>> To see where the original fault is coming from. >>>>>>>>>> [...] >>>>>>>>>>=20 >>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>>>=20 >>>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>>>=20 >>>>>>>>>> Palle >>>>>>>>>=20 >>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>=20 >>>>>>>>> options DDB options DEADLKRES options >>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>>> options WITNESS_SKIPSPIN >>>>>>>>=20 >>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>>>> software, covered by the GNU General Public License, and you >>>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>>> certain conditions. Type "show copying" to see the >>>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>>> "show warranty" for details. This GDB was configured as >>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>=20 >>>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid =3D= >>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20 >>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>=20 >>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20= >>>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt >>>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>>>> in kern_reboot (howto=3D260) at >>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>>> in vpanic (fmt=3D, ap=3D>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>>> soclose (so=3D) at=20 >>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>>>> td=3D0xfffff80eebc894a0) at >>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>>>> holdleaders=3D) at=20 >>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>>> minimal >>>>>>>=20 >>>>>>> Thanks for the information. As I suspected the initial error >>>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>>> before: >>>>>>>=20 >>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>>>=20 >>>>>>> from here: >>>>>>>=20 >>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>>> struct tcpcb *tp; >>>>>>>=20 >>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>>>=20 >>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D inp"));=20= >>>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>>>> so")); >>>>>>>=20 >>>>>>> tp =3D intotcpcb(inp); >>>>>>>=20 >>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: = INP_TIMEWAIT >>>>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>>>=20 >>>>>>> Let me check if I could find a path that could lead to this=20 >>>>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>>>=20 >>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>>> *so; struct inpcb *inp; >>>>>>>=20 >>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20= >>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>=20 >>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>>>> in_pcbdrop(inp); ... >>>>>>>=20 >>>>>>> Interesting [...] >>>>=20 >>>> Just a quick update. Julien is pursuing this off list with a core >>>> dump and we are now waiting for a new core dump with the first >>>> KASSERT removed. this is on a stable/10 kernel. >>>=20 >>> By the way Palle could you also run below Dtrace script to see where >>> this tcp_close() in INP_TIMEWAIT comes from: >>>=20 >>> $ cat tcp-close-tw.d >>> fbt::tcp_close:entry >>> /args[0]->t_inpcb->inp_flags & 0x01000000/ >>> { >>> @s1[stack()] =3D count() >>> } >>>=20 >>> tick-1sec { >>> printa(@s1); >>> } >>> $ sudo dtrace -s tcp-close-tw.d >>>=20 >>> And share any backtraces reported in this dtrace script output. >>>=20 >>> George, could you check if this dtrace script makes sense for you, = and >>> if you have any improvements to add, I am quite a rookie in Dtrace = scripts. >>=20 >> Shall I let the dtrace script run continuously until the machine = crashes? Or just run it once? >=20 > Continuously until the machine crashes. You can report any backstrace > outputs like: >=20 > kernel`tcp_usr_close+0x86 > kernel`soclose+0xe4 > kernel`_fdrop+0x29 > kernel`closef+0x237 > kernel`closefp+0x95 > kernel`amd64_syscall+0x357 > kernel`0xffffffff80c83c4b > 1 >=20 > before the machine crashes. But I expect the problematic case > detection with Dtrace to be quickly followed by the crash. Will see. >=20 > -- > Julien >=20 Kernels and userland are updated to 10.2-p3 with the patch removing the = suspicous KASSERT.=20 dtrace running continously redirecting to a log file. now we're just waiting... :) Palle From owner-freebsd-net@freebsd.org Thu Sep 24 05:51:09 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1904CA07436 for ; Thu, 24 Sep 2015 05:51:09 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id B2CD41F6F; Thu, 24 Sep 2015 05:51:08 +0000 (UTC) (envelope-from girgen@pingpong.net) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id D627DEF76; Thu, 24 Sep 2015 07:51:05 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id XO5HCWo4hW-q; Thu, 24 Sep 2015 07:51:05 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id A8887EF73; Thu, 24 Sep 2015 07:51:05 +0200 (CEST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> Date: Thu, 24 Sep 2015 07:51:02 +0200 Cc: George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 05:51:09 -0000 > 24 sep 2015 kl. 00:05 skrev Palle Girgensohn : >=20 > Hi, >=20 >> 23 sep 2015 kl. 20:32 skrev Julien Charbon : >>=20 >>=20 >> Hi, >>=20 >> On 23/09/15 20:26, Palle Girgensohn wrote: >>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >>>> On 23/09/15 16:36, Palle Girgensohn wrote: >>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>>>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20= >>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>>>> : >>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>>>>> : >>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>>>>> Belousov wrote: >>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>>>> Charbon wrote: >>>>>>>>>>>>>> [...] >>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>>>>> kernel with these options: >>>>>>>>>>>>=20 >>>>>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>>>>=20 >>>>>>>>>>>> To see where the original fault is coming from. >>>>>>>>>>> [...] >>>>>>>>>>>=20 >>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>>>>=20 >>>>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>>>>=20 >>>>>>>>>>> Palle >>>>>>>>>>=20 >>>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>>=20 >>>>>>>>>> options DDB options DEADLKRES options >>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>>>> options WITNESS_SKIPSPIN >>>>>>>>>=20 >>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>>>>> software, covered by the GNU General Public License, and you >>>>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>>>> certain conditions. Type "show copying" to see the >>>>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>>>> "show warranty" for details. This GDB was configured as >>>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>>=20 >>>>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid = =3D >>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20 >>>>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20= >>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>>=20 >>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20 >>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20 >>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20 >>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20= >>>>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt >>>>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 0xffffffff8094b337 >>>>>>>>> in kern_reboot (howto=3D260) at >>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>>>> in vpanic (fmt=3D, ap=3D>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>>>> soclose (so=3D) at=20 >>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20 >>>>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>>>>> td=3D0xfffff80eebc894a0) at >>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>>>>> holdleaders=3D) at=20 >>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>>>> minimal >>>>>>>>=20 >>>>>>>> Thanks for the information. As I suspected the initial error >>>>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>>>> before: >>>>>>>>=20 >>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>>>>=20 >>>>>>>> from here: >>>>>>>>=20 >>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>>>> struct tcpcb *tp; >>>>>>>>=20 >>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>>>>=20 >>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D = inp"));=20 >>>>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D >>>>>>>> so")); >>>>>>>>=20 >>>>>>>> tp =3D intotcpcb(inp); >>>>>>>>=20 >>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: = INP_TIMEWAIT >>>>>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>>>>=20 >>>>>>>> Let me check if I could find a path that could lead to this=20 >>>>>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>>>>=20 >>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>>>> *so; struct inpcb *inp; >>>>>>>>=20 >>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20= >>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>>=20 >>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>>>>> in_pcbdrop(inp); ... >>>>>>>>=20 >>>>>>>> Interesting [...] >>>>>=20 >>>>> Just a quick update. Julien is pursuing this off list with a core >>>>> dump and we are now waiting for a new core dump with the first >>>>> KASSERT removed. this is on a stable/10 kernel. >>>>=20 >>>> By the way Palle could you also run below Dtrace script to see = where >>>> this tcp_close() in INP_TIMEWAIT comes from: >>>>=20 >>>> $ cat tcp-close-tw.d >>>> fbt::tcp_close:entry >>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ >>>> { >>>> @s1[stack()] =3D count() >>>> } >>>>=20 >>>> tick-1sec { >>>> printa(@s1); >>>> } >>>> $ sudo dtrace -s tcp-close-tw.d >>>>=20 >>>> And share any backtraces reported in this dtrace script output. >>>>=20 >>>> George, could you check if this dtrace script makes sense for you, = and >>>> if you have any improvements to add, I am quite a rookie in Dtrace = scripts. >>>=20 >>> Shall I let the dtrace script run continuously until the machine = crashes? Or just run it once? >>=20 >> Continuously until the machine crashes. You can report any = backstrace >> outputs like: >>=20 >> kernel`tcp_usr_close+0x86 >> kernel`soclose+0xe4 >> kernel`_fdrop+0x29 >> kernel`closef+0x237 >> kernel`closefp+0x95 >> kernel`amd64_syscall+0x357 >> kernel`0xffffffff80c83c4b >> 1 >>=20 >> before the machine crashes. But I expect the problematic case >> detection with Dtrace to be quickly followed by the crash. Will see. >>=20 >> -- >> Julien >>=20 >=20 > Kernels and userland are updated to 10.2-p3 with the patch removing = the suspicous KASSERT.=20 >=20 > dtrace running continously redirecting to a log file. >=20 > now we're just waiting... :) >=20 > Palle Is the dtrace correct? $ sort -u dtrace.out=20 0 59779 :tick-1sec=20 CPU ID FUNCTION:NAME $ wc -l dtrace.out=20 56233 dtrace.out All it does is write 0 59779 :tick-1sec=20 once a second. Just checking... :) Palle From owner-freebsd-net@freebsd.org Thu Sep 24 06:55:32 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C5587A07145 for ; Thu, 24 Sep 2015 06:55:32 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 218191A5E; Thu, 24 Sep 2015 06:55:31 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (localhost [127.0.0.1]) by mail.pingpong.net (Postfix) with ESMTP id B8B65C77A; Thu, 24 Sep 2015 08:55:28 +0200 (CEST) X-Virus-Scanned: by amavisd-new at pingpong.net Received: from mail.pingpong.net ([127.0.0.1]) by mail.pingpong.net (mail.pingpong.net [127.0.0.1]) (amavisd-new, port 10024) with LMTP id QpjzHylDOd0B; Thu, 24 Sep 2015 08:55:28 +0200 (CEST) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 88A02C775; Thu, 24 Sep 2015 08:55:28 +0200 (CEST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> Date: Thu, 24 Sep 2015 08:55:27 +0200 Cc: George Neville-Neil , freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 06:55:33 -0000 > 24 sep 2015 kl. 07:51 skrev Palle Girgensohn : >=20 >=20 >> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn : >>=20 >> Hi, >>=20 >>> 23 sep 2015 kl. 20:32 skrev Julien Charbon : >>>=20 >>>=20 >>> Hi, >>>=20 >>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon : >>>>> On 23/09/15 16:36, Palle Girgensohn wrote: >>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon : On >>>>>>> 22/09/15 22:58, Palle Girgensohn wrote: >>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon :=20= >>>>>>>>> On 22/09/15 18:49, Palle Girgensohn wrote: >>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>>>>> : >>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn=20 >>>>>>>>>>>> : >>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon=20 >>>>>>>>>>>>> : On 18/09/15 18:06, Konstantin >>>>>>>>>>>>> Belousov wrote: >>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien >>>>>>>>>>>>>> Charbon wrote: >>>>>>>>>>>>>>> [...] >>>>>>>>>>>>> - Second, if issue is still in stable/10, compile 10.2=20 >>>>>>>>>>>>> kernel with these options: >>>>>>>>>>>>>=20 >>>>>>>>>>>>> options DDB options DEADLKRES options=20 >>>>>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options >>>>>>>>>>>>> WITNESS options WITNESS_SKIPSPIN >>>>>>>>>>>>>=20 >>>>>>>>>>>>> To see where the original fault is coming from. >>>>>>>>>>>> [...] >>>>>>>>>>>>=20 >>>>>>>>>>>> I'll try stable/10 now. Would you suggest a "clean"=20 >>>>>>>>>>>> stable/10, or could 287621 and 287780 help? >>>>>>>>>>>>=20 >>>>>>>>>>>> I'll add the debugging suggested options right away. >>>>>>>>>>>>=20 >>>>>>>>>>>> Palle >>>>>>>>>>>=20 >>>>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>>>=20 >>>>>>>>>>> options DDB options DEADLKRES options >>>>>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS >>>>>>>>>>> options WITNESS_SKIPSPIN >>>>>>>>>>=20 >>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]=20 >>>>>>>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free=20 >>>>>>>>>> software, covered by the GNU General Public License, and you >>>>>>>>>> are welcome to change it and/or distribute copies of it under >>>>>>>>>> certain conditions. Type "show copying" to see the >>>>>>>>>> conditions. There is absolutely no warranty for GDB. Type >>>>>>>>>> "show warranty" for details. This GDB was configured as >>>>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>>>=20 >>>>>>>>>> Unread portion of the kernel message buffer: panic: >>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL cpuid = =3D >>>>>>>>>> 16 KDB: stack backtrace: db_trace_self_wrapper() at=20 >>>>>>>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0=20 >>>>>>>>>> kdb_backtrace() at kdb_backtrace+0x39/frame >>>>>>>>>> 0xfffffe183d9e9890 vpanic() at vpanic+0x126/frame >>>>>>>>>> 0xfffffe183d9e98d0 kassert_panic() at >>>>>>>>>> kassert_panic+0x139/frame 0xfffffe183d9e9940 >>>>>>>>>> tcp_usr_detach() at tcp_usr_detach+0xf9/frame >>>>>>>>>> 0xfffffe183d9e9970 sofree() at sofree+0x1f1/frame >>>>>>>>>> 0xfffffe183d9e99a0 soclose() at soclose+0x3a0/frame >>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame >>>>>>>>>> 0xfffffe183d9e9a10 closef() at closef+0x1e2/frame >>>>>>>>>> 0xfffffe183d9e9aa0 closefp() at closefp+0x9d/frame >>>>>>>>>> 0xfffffe183d9e9ae0 amd64_syscall() at=20 >>>>>>>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() >>>>>>>>>> at Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall >>>>>>>>>> (6, FreeBSD ELF64, sys_close), rip =3D 0x801c8d94a, rsp =3D=20= >>>>>>>>>> 0x7ffff91c8668, rbp =3D 0x7ffff91c8680 --- KDB: enter: panic=20= >>>>>>>>>> Uptime: 18h57m59s Dumping 23085 out of 98263=20 >>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>>>=20 >>>>>>>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.=20= >>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading >>>>>>>>>> symbols from /boot/kernel/zfs.ko.symbols...done. Loaded >>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading symbols from >>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols >>>>>>>>>> for /boot/kernel/opensolaris.ko.symbols Reading symbols from >>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for >>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for=20= >>>>>>>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for >>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for=20= >>>>>>>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for >>>>>>>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for >>>>>>>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for=20 >>>>>>>>>> /boot/kernel/ums.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for >>>>>>>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from=20 >>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for=20 >>>>>>>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=3D1) at=20= >>>>>>>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=3Dr" (td) = (kgdb) bt >>>>>>>>>> #0 doadump (textdump=3D1) at pcpu.h:219 #1 = 0xffffffff8094b337 >>>>>>>>>> in kern_reboot (howto=3D260) at >>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff8094b845 >>>>>>>>>> in vpanic (fmt=3D, ap=3D>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 >>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>>>>> out>) at /usr/src/sys/kern/kern_shutdown.c:646 #4 >>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>>>>> out>) at /usr/src/sys/netinet/tcp_usrreq.c:202 #5 >>>>>>>>>> 0xffffffff809cd291 in sofree (so=3D0xfffff801dd302000) at=20 >>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in >>>>>>>>>> soclose (so=3D) at=20 >>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in >>>>>>>>>> _fdrop (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8=20= >>>>>>>>>> 0xffffffff80901092 in closef (fp=3D0xfffff802a593db40,=20 >>>>>>>>>> td=3D0xfffff80eebc894a0) at >>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:2338 #9 0xffffffff808feb5d >>>>>>>>>> in closefp (fdp=3D0xfffff80b20cce000, fd=3D, >>>>>>>>>> fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0, >>>>>>>>>> holdleaders=3D) at=20 >>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a >>>>>>>>>> in amd64_syscall (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () >>>>>>>>>> at /usr/src/sys/amd64/amd64/exception.S:396 #12 >>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner to this >>>>>>>>>> frame (corrupt stack?) Current language: auto; currently >>>>>>>>>> minimal >>>>>>>>>=20 >>>>>>>>> Thanks for the information. As I suspected the initial error >>>>>>>>> was elsewhere than tcp_twclose(), never got this assertion >>>>>>>>> before: >>>>>>>>>=20 >>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >>>>>>>>>=20 >>>>>>>>> from here: >>>>>>>>>=20 >>>>>>>>> static void tcp_detach(struct socket *so, struct inpcb *inp) { >>>>>>>>> struct tcpcb *tp; >>>>>>>>>=20 >>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp); >>>>>>>>>=20 >>>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D = inp"));=20 >>>>>>>>> KASSERT(inp->inp_socket =3D=3D so, ("tcp_detach: inp_socket !=3D= >>>>>>>>> so")); >>>>>>>>>=20 >>>>>>>>> tp =3D intotcpcb(inp); >>>>>>>>>=20 >>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &=20 >>>>>>>>> INP_DROPPED) { KASSERT(tp =3D=3D NULL, ("tcp_detach: = INP_TIMEWAIT >>>>>>>>> && " "INP_DROPPED && tp !=3D NULL")); >>>>>>>>>=20 >>>>>>>>> Let me check if I could find a path that could lead to this=20 >>>>>>>>> unexpected case. Unexpected because: INP_DROPPED and=20 >>>>>>>>> inp->inp_ppcb is set to NULL are set at same time here: >>>>>>>>>=20 >>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket >>>>>>>>> *so; struct inpcb *inp; >>>>>>>>>=20 >>>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),=20= >>>>>>>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) =3D=3D tw,=20 >>>>>>>>> ("tcp_twclose: inp_ppcb !=3D tw"));=20 >>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */=20 >>>>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>>>=20 >>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL;=20 >>>>>>>>> in_pcbdrop(inp); ... >>>>>>>>>=20 >>>>>>>>> Interesting [...] >>>>>>=20 >>>>>> Just a quick update. Julien is pursuing this off list with a core >>>>>> dump and we are now waiting for a new core dump with the first >>>>>> KASSERT removed. this is on a stable/10 kernel. >>>>>=20 >>>>> By the way Palle could you also run below Dtrace script to see = where >>>>> this tcp_close() in INP_TIMEWAIT comes from: >>>>>=20 >>>>> $ cat tcp-close-tw.d >>>>> fbt::tcp_close:entry >>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ >>>>> { >>>>> @s1[stack()] =3D count() >>>>> } >>>>>=20 >>>>> tick-1sec { >>>>> printa(@s1); >>>>> } >>>>> $ sudo dtrace -s tcp-close-tw.d >>>>>=20 >>>>> And share any backtraces reported in this dtrace script output. >>>>>=20 >>>>> George, could you check if this dtrace script makes sense for you, = and >>>>> if you have any improvements to add, I am quite a rookie in Dtrace = scripts. >>>>=20 >>>> Shall I let the dtrace script run continuously until the machine = crashes? Or just run it once? >>>=20 >>> Continuously until the machine crashes. You can report any = backstrace >>> outputs like: >>>=20 >>> kernel`tcp_usr_close+0x86 >>> kernel`soclose+0xe4 >>> kernel`_fdrop+0x29 >>> kernel`closef+0x237 >>> kernel`closefp+0x95 >>> kernel`amd64_syscall+0x357 >>> kernel`0xffffffff80c83c4b >>> 1 >>>=20 >>> before the machine crashes. But I expect the problematic case >>> detection with Dtrace to be quickly followed by the crash. Will = see. >>>=20 >>> -- >>> Julien >>>=20 >>=20 >> Kernels and userland are updated to 10.2-p3 with the patch removing = the suspicous KASSERT.=20 >>=20 >> dtrace running continously redirecting to a log file. >>=20 >> now we're just waiting... :) >>=20 >> Palle >=20 > Is the dtrace correct? >=20 > $ sort -u dtrace.out=20 >=20 > 0 59779 :tick-1sec=20 > CPU ID FUNCTION:NAME > $ wc -l dtrace.out=20 > 56233 dtrace.out >=20 >=20 > All it does is write >=20 > 0 59779 :tick-1sec=20 >=20 > once a second. >=20 > Just checking... :) >=20 > Palle Just had a crash. Unfortunately, the kernel was stuck at the db> prompt, = and the remote keyboard was unresponsive (HP ILO, not impressed). So I = had to reset the power and never got a core dump... :( panic: tcp_tw_2msl_stop: inp should not be released here = =20 cpuid =3D 0 = =20 KDB: stack backtrace: = =20 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame = 0xfffffe175acd16a0 =20 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe175acd1750 = =20 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 = =20 kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 = =20 tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 = =20 tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame 0xfffffe175acd1890 = =20 tcp_slowtimo() at tcp_slowtimo+0x68/frame 0xfffffe175acd18c0 = =20 pfslowtimo() at pfslowtimo+0x54/frame 0xfffffe175acd18f0 = =20 softclock_call_cc() at softclock_call_cc+0x193/frame 0xfffffe175acd19d0 = =20 softclock() at softclock+0x47/frame 0xfffffe175acd19f0 = =20 intr_event_execute_handlers() at intr_event_execute_handlers+0x93/frame = 0xfffffe 175acd1a30 = =20 ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 = =20 fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 = =20 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 = =20 --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- = =20 KDB: enter: panic = =20 [ thread pid 12 tid 100043 ] = =20 Stopped at kdb_enter+0x3e: movq $0,kdb_why = =20 db> = =20 Is there a way to configure the kernel to get all the gory debug stuff = without it dropping to the debug prompt on panic? I'd rather see it just = dump core and restart automatically. Palle From owner-freebsd-net@freebsd.org Thu Sep 24 07:03:42 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 23D60A07545 for ; Thu, 24 Sep 2015 07:03:42 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-qg0-f48.google.com (mail-qg0-f48.google.com [209.85.192.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C577D1E25; Thu, 24 Sep 2015 07:03:41 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by qgez77 with SMTP id z77so37964137qge.1; Thu, 24 Sep 2015 00:03:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=qOsCQTcDY7xbFE107RR7IbEdfSdjVTi38Wc72j+htqc=; b=lYNKZ8R6SOkHzQ9O5OL3Uiok1yNV9XS0Ml1bpLBdOpGoeKTEu7GU2eiwJYNG2AWuuR gDKNU5OxADIaYptg6QDe0LmG3yZqMIM0LwMKZdS95yEdgVDL2ZCtQSElk2ydJcDKOb5C 5t7J6SwReYUqcHET05NH6re+KVQAKMJSChUfdIT2IV0nidho2e17KNvxrWzvoeq/dCna PWQK3+DX86MAsWL3Y0tHR+m8nMbwj/0LgSbqyvGagddE5AWn+0KWceO47b0BA4DKJ9YB vO+nkBk0vUvDOMclMwQFYg7Hu8O1Kg1r3PniR2U+cw0SFSdl+5KocKXNdnllma/KH7if xGBQ== X-Received: by 10.140.237.207 with SMTP id i198mr43834452qhc.45.1443078214852; Thu, 24 Sep 2015 00:03:34 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id b39sm1032484qkb.13.2015.09.24.00.03.32 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 24 Sep 2015 00:03:34 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> Cc: freebsd-net@freebsd.org From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5603A03B.4060002@freebsd.org> Date: Thu, 24 Sep 2015 09:03:23 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 07:03:42 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Palle, On 24/09/15 08:55, Palle Girgensohn wrote: >> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >> : >>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>> : >>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20 >>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>>> 23 sep. 2015 kl. 20:01 skrev Julien Charbon >>>>>> : On 23/09/15 16:36, Palle Girgensohn >>>>>> wrote: >>>>>>>> 22 sep 2015 kl. 23:59 skrev Julien Charbon >>>>>>>> : On 22/09/15 22:58, Palle Girgensohn >>>>>>>> wrote: >>>>>>>>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon >>>>>>>>>> : On 22/09/15 18:49, Palle >>>>>>>>>> Girgensohn wrote: >>>>>>>>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn=20 >>>>>>>>>>>> : >>>>>>>>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn >>>>>>>>>>>>> : >>>>>>>>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon >>>>>>>>>>>>>> : On 18/09/15 18:06, >>>>>>>>>>>>>> Konstantin Belousov wrote: >>>>>>>>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, >>>>>>>>>>>>>>> Julien Charbon wrote: >>>>>>>>>>>>>>>> [...] >>>>>>>>>>>>>> - Second, if issue is still in stable/10, >>>>>>>>>>>>>> compile 10.2 kernel with these options: >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> options DDB options DEADLKRES >>>>>>>>>>>>>> options INVARIANTS options >>>>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options >>>>>>>>>>>>>> WITNESS_SKIPSPIN >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> To see where the original fault is coming >>>>>>>>>>>>>> from. >>>>>>>>>>>>> [...] >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I'll try stable/10 now. Would you suggest a >>>>>>>>>>>>> "clean" stable/10, or could 287621 and 287780 >>>>>>>>>>>>> help? >>>>>>>>>>>>>=20 >>>>>>>>>>>>> I'll add the debugging suggested options >>>>>>>>>>>>> right away. >>>>>>>>>>>>>=20 >>>>>>>>>>>>> Palle >>>>>>>>>>>>=20 >>>>>>>>>>>> I have a new core dump from ^/stable/10 with: >>>>>>>>>>>>=20 >>>>>>>>>>>> options DDB options DEADLKRES >>>>>>>>>>>> options INVARIANTS options >>>>>>>>>>>> INVARIANT_SUPPORT options WITNESS options >>>>>>>>>>>> WITNESS_SKIPSPIN >>>>>>>>>>>=20 >>>>>>>>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 >>>>>>>>>>> [FreeBSD] Copyright 2004 Free Software >>>>>>>>>>> Foundation, Inc. GDB is free software, covered by >>>>>>>>>>> the GNU General Public License, and you are >>>>>>>>>>> welcome to change it and/or distribute copies of >>>>>>>>>>> it under certain conditions. Type "show copying" >>>>>>>>>>> to see the conditions. There is absolutely no >>>>>>>>>>> warranty for GDB. Type "show warranty" for >>>>>>>>>>> details. This GDB was configured as=20 >>>>>>>>>>> "amd64-marcel-freebsd"... >>>>>>>>>>>=20 >>>>>>>>>>> Unread portion of the kernel message buffer: >>>>>>>>>>> panic: tcp_detach: INP_TIMEWAIT && INP_DROPPED && >>>>>>>>>>> tp !=3D NULL cpuid =3D 16 KDB: stack backtrace: >>>>>>>>>>> db_trace_self_wrapper() at=20 >>>>>>>>>>> db_trace_self_wrapper+0x2b/frame >>>>>>>>>>> 0xfffffe183d9e97e0 kdb_backtrace() at >>>>>>>>>>> kdb_backtrace+0x39/frame 0xfffffe183d9e9890 >>>>>>>>>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 >>>>>>>>>>> kassert_panic() at kassert_panic+0x139/frame >>>>>>>>>>> 0xfffffe183d9e9940 tcp_usr_detach() at >>>>>>>>>>> tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 >>>>>>>>>>> sofree() at sofree+0x1f1/frame 0xfffffe183d9e99a0 >>>>>>>>>>> soclose() at soclose+0x3a0/frame=20 >>>>>>>>>>> 0xfffffe183d9e99f0 _fdrop() at _fdrop+0x29/frame=20 >>>>>>>>>>> 0xfffffe183d9e9a10 closef() at >>>>>>>>>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() >>>>>>>>>>> at closefp+0x9d/frame 0xfffffe183d9e9ae0 >>>>>>>>>>> amd64_syscall() at amd64_syscall+0x25a/frame >>>>>>>>>>> 0xfffffe183d9e9bf0 Xfast_syscall() at >>>>>>>>>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- >>>>>>>>>>> syscall (6, FreeBSD ELF64, sys_close), rip =3D >>>>>>>>>>> 0x801c8d94a, rsp =3D 0x7ffff91c8668, rbp =3D >>>>>>>>>>> 0x7ffff91c8680 --- KDB: enter: panic Uptime: >>>>>>>>>>> 18h57m59s Dumping 23085 out of 98263=20 >>>>>>>>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>=20 Reading symbols from /boot/kernel/nullfs.ko.symbols...done. >>>>>>>>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols >>>>>>>>>>> Reading symbols from >>>>>>>>>>> /boot/kernel/zfs.ko.symbols...done. Loaded=20 >>>>>>>>>>> symbols for /boot/kernel/zfs.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols...done. >>>>>>>>>>> Loaded symbols for >>>>>>>>>>> /boot/kernel/opensolaris.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_bridge.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/netgraph.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_eiface.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_ether.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/accf_data.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/accf_http.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/ums.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ums.ko.symbols Reading >>>>>>>>>>> symbols from=20 >>>>>>>>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/ng_socket.ko.symbols >>>>>>>>>>> Reading symbols from=20 >>>>>>>>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded >>>>>>>>>>> symbols for /boot/kernel/fdescfs.ko.symbols #0 >>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 219 >>>>>>>>>>> __asm("movq %%gs:%1,%0" : "=3Dr" (td) (kgdb) bt #0 >>>>>>>>>>> doadump (textdump=3D1) at pcpu.h:219 #1 >>>>>>>>>>> 0xffffffff8094b337 in kern_reboot (howto=3D260) at=20 >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:451 #2 >>>>>>>>>>> 0xffffffff8094b845 in vpanic (fmt=3D>>>>>>>>>> optimized out>, ap=3D) at >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:758 #3=20 >>>>>>>>>>> 0xffffffff8094b6d9 in kassert_panic (fmt=3D>>>>>>>>>> optimized out>) at >>>>>>>>>>> /usr/src/sys/kern/kern_shutdown.c:646 #4=20 >>>>>>>>>>> 0xffffffff80b1ee59 in tcp_usr_detach (so=3D>>>>>>>>>> optimized out>) at >>>>>>>>>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5=20 >>>>>>>>>>> 0xffffffff809cd291 in sofree >>>>>>>>>>> (so=3D0xfffff801dd302000) at=20 >>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 >>>>>>>>>>> 0xffffffff809cdb00 in soclose (so=3D>>>>>>>>>> optimized out>) at=20 >>>>>>>>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 >>>>>>>>>>> 0xffffffff808fe659 in _fdrop >>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0x0) at file.h:343 #8 >>>>>>>>>>> 0xffffffff80901092 in closef >>>>>>>>>>> (fp=3D0xfffff802a593db40, td=3D0xfffff80eebc894a0) >>>>>>>>>>> at /usr/src/sys/kern/kern_descrip.c:2338 #9 >>>>>>>>>>> 0xffffffff808feb5d in closefp >>>>>>>>>>> (fdp=3D0xfffff80b20cce000, fd=3D>>>>>>>>>> out>, fp=3D0xfffff802a593db40, >>>>>>>>>>> td=3D0xfffff80eebc894a0, holdleaders=3D>>>>>>>>>> optimized out>) at=20 >>>>>>>>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 >>>>>>>>>>> 0xffffffff80d7bc3a in amd64_syscall >>>>>>>>>>> (td=3D0xfffff80eebc894a0, traced=3D0) at=20 >>>>>>>>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in >>>>>>>>>>> Xfast_syscall () at >>>>>>>>>>> /usr/src/sys/amd64/amd64/exception.S:396 #12=20 >>>>>>>>>>> 0x0000000801c8d94a in ?? () Previous frame inner >>>>>>>>>>> to this frame (corrupt stack?) Current language: >>>>>>>>>>> auto; currently minimal >>>>>>>>>>=20 >>>>>>>>>> Thanks for the information. As I suspected the >>>>>>>>>> initial error was elsewhere than tcp_twclose(), >>>>>>>>>> never got this assertion before: >>>>>>>>>>=20 >>>>>>>>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp !=3D >>>>>>>>>> NULL >>>>>>>>>>=20 >>>>>>>>>> from here: >>>>>>>>>>=20 >>>>>>>>>> static void tcp_detach(struct socket *so, struct >>>>>>>>>> inpcb *inp) { struct tcpcb *tp; >>>>>>>>>>=20 >>>>>>>>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >>>>>>>>>> INP_WLOCK_ASSERT(inp); >>>>>>>>>>=20 >>>>>>>>>> KASSERT(so->so_pcb =3D=3D inp, ("tcp_detach: so_pcb !=3D >>>>>>>>>> inp")); KASSERT(inp->inp_socket =3D=3D so, >>>>>>>>>> ("tcp_detach: inp_socket !=3D so")); >>>>>>>>>>=20 >>>>>>>>>> tp =3D intotcpcb(inp); >>>>>>>>>>=20 >>>>>>>>>> if (inp->inp_flags & INP_TIMEWAIT) { if >>>>>>>>>> (inp->inp_flags & INP_DROPPED) { KASSERT(tp =3D=3D >>>>>>>>>> NULL, ("tcp_detach: INP_TIMEWAIT && " "INP_DROPPED >>>>>>>>>> && tp !=3D NULL")); >>>>>>>>>>=20 >>>>>>>>>> Let me check if I could find a path that could lead >>>>>>>>>> to this unexpected case. Unexpected because: >>>>>>>>>> INP_DROPPED and inp->inp_ppcb is set to NULL are >>>>>>>>>> set at same time here: >>>>>>>>>>=20 >>>>>>>>>> void tcp_twclose(struct tcptw *tw, int reuse) { >>>>>>>>>> struct socket *so; struct inpcb *inp; >>>>>>>>>>=20 >>>>>>>>>> inp =3D tw->tw_inpcb; KASSERT((inp->inp_flags & >>>>>>>>>> INP_TIMEWAIT), ("tcp_twclose: !timewait")); >>>>>>>>>> KASSERT(intotw(inp) =3D=3D tw, ("tcp_twclose: inp_ppcb >>>>>>>>>> !=3D tw")); INP_INFO_WLOCK_ASSERT(&V_tcbinfo); >>>>>>>>>> /* in_pcbfree() */ INP_WLOCK_ASSERT(inp); >>>>>>>>>>=20 >>>>>>>>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL; >>>>>>>>>> in_pcbdrop(inp); ... >>>>>>>>>>=20 >>>>>>>>>> Interesting [...] >>>>>>>=20 >>>>>>> Just a quick update. Julien is pursuing this off list >>>>>>> with a core dump and we are now waiting for a new core >>>>>>> dump with the first KASSERT removed. this is on a >>>>>>> stable/10 kernel. >>>>>>=20 >>>>>> By the way Palle could you also run below Dtrace script to >>>>>> see where this tcp_close() in INP_TIMEWAIT comes from: >>>>>>=20 >>>>>> $ cat tcp-close-tw.d fbt::tcp_close:entry=20 >>>>>> /args[0]->t_inpcb->inp_flags & 0x01000000/ { @s1[stack()] =3D >>>>>> count() } >>>>>>=20 >>>>>> tick-1sec { printa(@s1); } $ sudo dtrace -s tcp-close-tw.d >>>>>>=20 >>>>>> And share any backtraces reported in this dtrace script >>>>>> output. >>>>>>=20 >>>>>> George, could you check if this dtrace script makes sense >>>>>> for you, and if you have any improvements to add, I am >>>>>> quite a rookie in Dtrace scripts. >>>>>=20 >>>>> Shall I let the dtrace script run continuously until the >>>>> machine crashes? Or just run it once? >>>>=20 >>>> Continuously until the machine crashes. You can report any >>>> backstrace outputs like: >>>>=20 >>>> kernel`tcp_usr_close+0x86 kernel`soclose+0xe4=20 >>>> kernel`_fdrop+0x29 kernel`closef+0x237 kernel`closefp+0x95=20 >>>> kernel`amd64_syscall+0x357 kernel`0xffffffff80c83c4b 1 >>>>=20 >>>> before the machine crashes. But I expect the problematic >>>> case detection with Dtrace to be quickly followed by the crash. >>>> Will see. >>>>=20 >>>> -- Julien >>>>=20 >>>=20 >>> Kernels and userland are updated to 10.2-p3 with the patch >>> removing the suspicous KASSERT. >>>=20 >>> dtrace running continously redirecting to a log file. >>>=20 >>> now we're just waiting... :) >>>=20 >>> Palle >>=20 >> Is the dtrace correct? >>=20 >> $ sort -u dtrace.out >>=20 >> 0 59779 :tick-1sec CPU ID >> FUNCTION:NAME $ wc -l dtrace.out 56233 dtrace.out >>=20 >>=20 >> All it does is write >>=20 >> 0 59779 :tick-1sec >>=20 >> once a second. It is right, it tries to display the guilty backstrace every second, but this is a rare event. 0 59779 :tick-1sec is printed when you have no backstrace yet. > Just had a crash. Unfortunately, the kernel was stuck at the db> > prompt, and the remote keyboard was unresponsive (HP ILO, not > impressed). So I had to reset the power and never got a core dump... >=20 > panic: tcp_tw_2msl_stop: inp should not be released here > cpuid =3D 0 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame > 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame > 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 > kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 > tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 > tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame > 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame > 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame > 0xfffffe175acd18f0 softclock_call_cc() at > softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at > softclock+0x47/frame 0xfffffe175acd19f0 intr_event_execute_handlers() > at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 > ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 > fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 > --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- > KDB: enter: panic > [ thread pid 12 tid 100043 ] > Stopped at kdb_enter+0x3e: movq $0,kdb_why > db> Thanks a log for this backstrace. This is what at expected, when tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called one extra time that leads to: tcp_tw_2msl_stop: inp should not be released here Let me try to come with a tentative fix for this case. > Is there a way to configure the kernel to get all the gory debug > stuff without it dropping to the debug prompt on panic? I'd rather > see it just dump core and restart automatically. You can set: options KDB_UNATTENDED or debug.debugger_on_panic=3D0 in /etc/sysctl.conf both are equivalent. More details here: https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-opti= ons.html Thanks. -- Julien --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWA6BBAAoJEKVlQ5Je6dhx5i4IALr7OhuK7NvVESFUK+9t7ygo lluCrPhEA+z/t1t2C4HutWnVEucLW9Kh0lBjg2qodcvVHVl0wZHN+T1IySaIfhbf dFAxUcWAS3xGQgsPwSuT4UU1uFmqrYtmhOmt0GJVmywLzfKN4AL9toK7uQKknHzW HZM0K+vbzT1pz8RrD+DBSrQrWTrDk1adF94oi5Z+01MssGZ0ljcyeStsxba3Wg3N 9pYKfSSXKdld6qpjdj2di6EmevS12LVyUGt755ke9Bd1r09DXhFbZ2hXkDf+NQAN QQswXp+/DGVBXtc8L+AiAJhuu8oPUqWc38jdNe0HGRa7naEsdgLWC4OkSchzwcE= =msZs -----END PGP SIGNATURE----- --B3Hv69f1vfCN4351XBs8QWLSu6hqAaUQ1-- From owner-freebsd-net@freebsd.org Thu Sep 24 07:58:00 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6AB18A08CF3 for ; Thu, 24 Sep 2015 07:58:00 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E6A3E1376; Thu, 24 Sep 2015 07:57:59 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wicge5 with SMTP id ge5so240029756wic.0; Thu, 24 Sep 2015 00:57:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=UkKh9D9MO9F+jbcG4OYPkYN438nuf7qv9paW7Uyueos=; b=EEb9cya+eY9qv05RAYZemOxvu2rlHUjeGx2lrMvQpOtSTJrfEOtuHh0QMUiP9CTRT8 SMFSqFcI4XFP3e0AzkmqjgTThwycKqcmjhORgqNwAmttQJv7z8ug8r8doujGgTQIbVUa fN/bos5fjnrgTBFt/CcEj6tWRroI8hKt1WYP5LTFRfjOml0ULAkoFx9Y1C7AfCzU8upg yaH/9NGu310mzYio3XqFGjf1lXaxQv8+9I+SQ1QaNJqKbM9s40IVY801D7DjQcsZh8dE Shv0bE+7nKf6hhiacwlqTzJEYQvJfXchpk7MNmIc/7wqURfYb/iFYhWOSei9n3tSLTRT 26cg== X-Received: by 10.180.8.164 with SMTP id s4mr8856876wia.5.1443081471762; Thu, 24 Sep 2015 00:57:51 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id mx19sm12368757wic.0.2015.09.24.00.57.50 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 24 Sep 2015 00:57:51 -0700 (PDT) Subject: Re: Kernel panics in tcp_twclose To: Palle Girgensohn References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> Cc: freebsd-net@freebsd.org From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5603ACF7.7040403@freebsd.org> Date: Thu, 24 Sep 2015 09:57:43 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <5603A03B.4060002@freebsd.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="2xKnTLrEPFmUogFPILk9iCAhuv6Q8h3jH" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 07:58:00 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --2xKnTLrEPFmUogFPILk9iCAhuv6Q8h3jH Content-Type: multipart/mixed; boundary="------------010008010008030404040209" This is a multi-part message in MIME format. --------------010008010008030404040209 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi -net, On 24/09/15 09:03, Julien Charbon wrote: > On 24/09/15 08:55, Palle Girgensohn wrote: >>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >>> : >>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>>> : >>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20 >>>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>> Kernels and userland are updated to 10.2-p3 with the patch >>>> removing the suspicous KASSERT. >>>> dtrace running continously redirecting to a log file. >> Just had a crash. Unfortunately, the kernel was stuck at the db> >> prompt, and the remote keyboard was unresponsive (HP ILO, not >> impressed). So I had to reset the power and never got a core dump... >> >> panic: tcp_tw_2msl_stop: inp should not be released here >> cpuid =3D 0 >> KDB: stack backtrace: >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame >> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 >> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 >> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 >> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame >> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame >> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame >> 0xfffffe175acd18f0 softclock_call_cc() at >> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at >> softclock+0x47/frame 0xfffffe175acd19f0 intr_event_execute_handlers() >> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 >> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 >> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 >> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 >> --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- >> KDB: enter: panic >> [ thread pid 12 tid 100043 ] >> Stopped at kdb_enter+0x3e: movq $0,kdb_why >> db> >=20 > Thanks a log for this backstrace. This is what at expected, when > tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called on= e > extra time that leads to: >=20 > tcp_tw_2msl_stop: inp should not be released here >=20 > Let me try to come with a tentative fix for this case. See joined my tentative patch for these case. It is only a first tentative patch as I am still waiting on -net feedbacks on what should be the rule here. By the way: - I see nothing specific to VIMAGE here - Anyone aware of tcp_close() (or tcp_drop()) calls modified/introduced recently in 10.2 that could explained why this issue only appears only no= w? -- Julien --------------010008010008030404040209 Content-Type: text/plain; charset=UTF-8; name="tcp-close-fix-v1.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="tcp-close-fix-v1.patch" diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c index be9e0e7..4379e19 100644 --- a/sys/netinet/tcp_usrreq.c +++ b/sys/netinet/tcp_usrreq.c @@ -199,10 +199,11 @@ tcp_detach(struct socket *so, struct inpcb *inp) * In all three cases the tcptw should not be freed here. */ if (inp->inp_flags & INP_DROPPED) { - KASSERT(tp =3D=3D NULL, ("tcp_detach: INP_TIMEWAIT && " - "INP_DROPPED && tp !=3D NULL")); in_pcbdetach(inp); - in_pcbfree(inp); + if (tp =3D=3D NULL) + in_pcbfree(inp); + else + INP_WUNLOCK(inp); } else { in_pcbdetach(inp); INP_WUNLOCK(inp); --------------010008010008030404040209-- --2xKnTLrEPFmUogFPILk9iCAhuv6Q8h3jH Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWA6z/AAoJEKVlQ5Je6dhx6LUIAK7m6qHVcm75uthiiz44kUq6 qt5OaGmCvvRKuCY1czSXGScQKf6SdQ0njIh8Mu3ZgSUaq6rEI0hB5XWf73vu48Cm 5U41urew/qp3myahlpYn4qrTRr+hO7tFrmQpXkHW31T/a7oAIYE/F+t35P7pxQWI 2HcUcjYkwqShTFAlonSqof5mBRX8YquFnQ0BQ3Jmi80wYoO0eBiZJE2ut3BhSWE0 YkVjui1eoPpxoMuwy2KCuFF72GrhJBJe+NL30lR5W/FhJQu1tf1Yp4eqcANAjn+J ZyYYu3Zt+JFFaQfJabTJIQkQBIvZx0E79f1iJIXbge7a7vq8megbfTRpUPAgQ0I= =/wdk -----END PGP SIGNATURE----- --2xKnTLrEPFmUogFPILk9iCAhuv6Q8h3jH-- From owner-freebsd-net@freebsd.org Thu Sep 24 08:04:54 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BAF0BA071C8 for ; Thu, 24 Sep 2015 08:04:54 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id DBA2B1928; Thu, 24 Sep 2015 08:04:53 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id BEB3ED4D1; Thu, 24 Sep 2015 10:04:52 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <5603ACF7.7040403@freebsd.org> Date: Thu, 24 Sep 2015 10:04:52 +0200 Cc: freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> <5603ACF7.7040403@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 08:04:54 -0000 > 24 sep 2015 kl. 09:57 skrev Julien Charbon : >=20 >=20 > Hi -net, >=20 > On 24/09/15 09:03, Julien Charbon wrote: >> On 24/09/15 08:55, Palle Girgensohn wrote: >>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >>>> : >>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>>>> : >>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20 >>>>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>> Kernels and userland are updated to 10.2-p3 with the patch >>>>> removing the suspicous KASSERT. >>>>> dtrace running continously redirecting to a log file. >>> Just had a crash. Unfortunately, the kernel was stuck at the db> >>> prompt, and the remote keyboard was unresponsive (HP ILO, not >>> impressed). So I had to reset the power and never got a core dump... >>>=20 >>> panic: tcp_tw_2msl_stop: inp should not be released here >>> cpuid =3D 0 >>> KDB: stack backtrace: >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame >>> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 >>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 >>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 >>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame >>> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame >>> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame >>> 0xfffffe175acd18f0 softclock_call_cc() at >>> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at >>> softclock+0x47/frame 0xfffffe175acd19f0 = intr_event_execute_handlers() >>> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 >>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 >>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 >>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 >>> --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- >>> KDB: enter: panic >>> [ thread pid 12 tid 100043 ] >>> Stopped at kdb_enter+0x3e: movq $0,kdb_why >>> db> >>=20 >> Thanks a log for this backstrace. This is what at expected, when >> tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called = one >> extra time that leads to: >>=20 >> tcp_tw_2msl_stop: inp should not be released here >>=20 >> Let me try to come with a tentative fix for this case. >=20 > See joined my tentative patch for these case. It is only a first > tentative patch as I am still waiting on -net feedbacks on what should > be the rule here. >=20 > By the way: >=20 > - I see nothing specific to VIMAGE here >=20 We only see the probem with VIMAGE kernels and we see it on all VIMAGE = kernels that we have a reasonable amount of load. For us, it started i = August. It could be due to more load after the quiet summer (or system = is used somewhat seasonal) or it could be due to some package update in = userland that changed and triggered the bug. We cannot find anything = that would clearly explain why it started right now. > - Anyone aware of tcp_close() (or tcp_drop()) calls = modified/introduced > recently in 10.2 that could explained why this issue only appears only = now? We started by backing kernels as far as releng/10.1 from January, so the = problem (OK, might not be the same reason, but at least the same crash = pattern) was definitely there already in 10.1 i January. Palle From owner-freebsd-net@freebsd.org Thu Sep 24 09:39:14 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6A744A08464 for ; Thu, 24 Sep 2015 09:39:14 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id EA2A21A90; Thu, 24 Sep 2015 09:39:13 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.1.12] (h-155-4-74-242.na.cust.bahnhof.se [155.4.74.242]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 34E0DEFE6; Thu, 24 Sep 2015 11:39:08 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <5603ACF7.7040403@freebsd.org> Date: Thu, 24 Sep 2015 11:39:08 +0200 Cc: freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <97E97774-842B-440A-BBA4-808FF821EC98@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> <5603ACF7.7040403@freebsd.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 09:39:14 -0000 > 24 sep 2015 kl. 09:57 skrev Julien Charbon : >=20 >=20 > Hi -net, >=20 > On 24/09/15 09:03, Julien Charbon wrote: >> On 24/09/15 08:55, Palle Girgensohn wrote: >>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >>>> : >>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>>>> : >>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20 >>>>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>> Kernels and userland are updated to 10.2-p3 with the patch >>>>> removing the suspicous KASSERT. >>>>> dtrace running continously redirecting to a log file. >>> Just had a crash. Unfortunately, the kernel was stuck at the db> >>> prompt, and the remote keyboard was unresponsive (HP ILO, not >>> impressed). So I had to reset the power and never got a core dump... >>>=20 >>> panic: tcp_tw_2msl_stop: inp should not be released here >>> cpuid =3D 0 >>> KDB: stack backtrace: >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame >>> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame 0xfffffe175acd1790 >>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 >>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 >>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame >>> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame >>> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame >>> 0xfffffe175acd18f0 softclock_call_cc() at >>> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at >>> softclock+0x47/frame 0xfffffe175acd19f0 = intr_event_execute_handlers() >>> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 >>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 >>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 >>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 >>> --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- >>> KDB: enter: panic >>> [ thread pid 12 tid 100043 ] >>> Stopped at kdb_enter+0x3e: movq $0,kdb_why >>> db> >>=20 >> Thanks a log for this backstrace. This is what at expected, when >> tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called = one >> extra time that leads to: >>=20 >> tcp_tw_2msl_stop: inp should not be released here >>=20 >> Let me try to come with a tentative fix for this case. >=20 > See joined my tentative patch for these case. It is only a first > tentative patch as I am still waiting on -net feedbacks on what should > be the rule here. >=20 > By the way: >=20 > - I see nothing specific to VIMAGE here >=20 > - Anyone aware of tcp_close() (or tcp_drop()) calls = modified/introduced > recently in 10.2 that could explained why this issue only appears only = now? >=20 > -- > Julien > Running a machine with the patch now (it just crashed and rebooted with = the new kernel). Hoping it will have a "soothing" effect... ;-) dtrace running as previously. No output yet, though. Palle From owner-freebsd-net@freebsd.org Thu Sep 24 09:45:07 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72007A0874F for ; Thu, 24 Sep 2015 09:45:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5DB331E2F for ; Thu, 24 Sep 2015 09:45:07 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8O9j7SL065365 for ; Thu, 24 Sep 2015 09:45:07 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203175] Daily kernel crashes in tcp_twclose
on 10.2-p2 using VIMAGE Date: Thu, 24 Sep 2015 09:45:07 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: jch@freebsd.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: attachments.created Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 09:45:07 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203175 --- Comment #11 from Julien Charbon --- Created attachment 161327 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=161327&action=edit First tentative patch for this issue Waiting to see if it improves things, and -net feedbacks to get a better overview of what is going on here. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Thu Sep 24 12:13:42 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D7FCA0768D for ; Thu, 24 Sep 2015 12:13:42 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D4C491AE5; Thu, 24 Sep 2015 12:13:41 +0000 (UTC) (envelope-from julien.charbon@gmail.com) Received: by wiclk2 with SMTP id lk2so109917343wic.1; Thu, 24 Sep 2015 05:13:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type; bh=XrVXqTYIWB7Im2kgYHlf9E0b/9YKaPrKQ4l4483bHN8=; b=bKic0QcMGvIBJ74Y5ZJFs1+kDs4XUsry1i7C7BtYQNaitK6VtLyYfQNoNZFowK70he Z2qZeCOY0VOItKkNDaxYyc2QYSmJRJP2k/OMqDKIiDBmq4u/+YBjWukbJg21vX3t37gY YrxUOJMwKfetRv1GKo/7ENwYxLx62kAUTYPiI2fkQydJp4n/sEO50x+GE8ORj88v6gFM D7pAhe56IwmKiW50O/Y3uQQ1rJ2WaqbY9t/0Pijn0BZJkBWTc+8U7LlUEG1U2i1T04BH yaaClRTvIPsib6DbzEjvc2h/gxw4z392AjcSghsR4Sb/nC5ioJzAKAqoc9O+bwEo1D5F yEEQ== X-Received: by 10.194.58.177 with SMTP id s17mr48548382wjq.102.1443096814178; Thu, 24 Sep 2015 05:13:34 -0700 (PDT) Received: from FRI2JCHARBON-M1.local ([217.30.88.7]) by smtp.googlemail.com with ESMTPSA id x9sm9625989wjf.44.2015.09.24.05.13.32 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 24 Sep 2015 05:13:33 -0700 (PDT) Subject: Re: Can tcp_close() be called in INP_TIMEWAIT case To: Palle Girgensohn , John Baldwin , George Neville-Neil References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602BB7A.9010504@freebsd.org> Cc: Konstantin Belousov , freebsd-net@freebsd.org From: Julien Charbon X-Enigmail-Draft-Status: N1110 Message-ID: <5603E8E4.5030406@freebsd.org> Date: Thu, 24 Sep 2015 14:13:24 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <5602BB7A.9010504@freebsd.org> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="jf9xFk6R77gtURuCKOu9XtufFhCKG4dl3" X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 12:13:42 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --jf9xFk6R77gtURuCKOu9XtufFhCKG4dl3 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi -net, On 23/09/15 16:47, Julien Charbon wrote: > Thanks to Palle, I got access to the kernel dump. And the results is > more interesting than expected: Thus somehow the kernel reaches a stat= e > in tcp_detach() where: >=20 > INP_TIMEWAIT && INP_DROPPED && tp !=3D NULL >=20 > In details: >=20 > - inp is in TIMEWAIT state > - inp has been dropped by in_pcbdrop() > - inp->inp_ppcb (a struct tcptw) is not NULL >=20 > All the related structures looks good from the coredump: socket, inp,= > and tcptw, thus no sign of any memory corruption (so far). >=20 > And for the kernel, this state it is _not_ ok. Hopefully, there are > only two functions that set the INP_DROPPED flags: >=20 > - tcp_twclose() and, > - tcp_close() >=20 > If tcp_twclose() is called inp->inp_ppcb is set to NULL and the struct= > tcptw is freed (all good, not assertion) >=20 > If tcp_close() is called inp->inp_ppcb is left untouched (less ok, > potential assertion) >=20 > Almost all tcp_close() calls (or tcp_close() parents calls) use a > pattern like: >=20 > if (inp->inp_flags & INP_TIMEWAIT) { > /* Don't call tcp_close() just return */ > return; > } >=20 > /* Call tcp_close() */ > tcp_close(); >=20 > But not _all_ tcp_close() calls. >=20 > Thus the most important point here is: Either this assertion is wrong= , > either tcp_close() in INP_TIMEWAIT state should not happen. >=20 > This assert and tcp_close() current behavior is here since a long time= , > thus I would like old beards^W^W^W more experimented TCP stack > developers to give an opinion/refresh theirs memories on this very > specific case. So the issue is: - tcp_close() is called for some reasons with an inp in INP_TIMEWAIT state and sets the INP_DROPPED flag, - tcp_detach() is called when the last reference on socket is dropped then now in_pcbfree() can be called twice instead of once: 1. First in tcp_detach(): static void tcp_detach(struct socket *so, struct inpcb *inp) { struct tcpcb *tp; tp =3D intotcpcb(inp); if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags & INP_DROPPED) { in_pcbdetach(inp); in_pcbfree(inp); <-- } 2. Second when tcptw expires here: void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so; struct inpcb *inp; inp =3D tw->tw_inpcb; tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb =3D NULL; in_pcbdrop(inp); so =3D inp->inp_socket; if (so !=3D NULL) { ... } else { in_pcbfree(inp); <-- } This behavior is backed by Palle kernel panic backstraces and coredumps.= o Solutions: Long: Forbid to call tcp_close() when inp is in INP_TIMEWAIT state, the TCP stack rule being: - if !INP_TIMEWAIT: Call tcp_close() - if INP_TIMEWAIT: Call tcp_twclose() (or call nothing, the tcptw will expire/be recycled anyway) Short: if INP_TIMEWAIT & INP_DROPPED: Do not call in_pcbfree(inp) in tcp_detach() unless tcptw is already discarded. The long solution seems cleaner, backed by tcp_detach() old comments and most of current tcp_close() calls but I won't take that longer path without -net approval first. Thanks. -- Julien "For every complex problem there is an answer that is clear, simple, and wrong" -- H. L. Mencken --jf9xFk6R77gtURuCKOu9XtufFhCKG4dl3 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJWA+jtAAoJEKVlQ5Je6dhxh8QIAIkArsWyFAWKuLHEiQe+SYso 7YnxipNNTcGnD8T6GY1MJwhFxpc/PVf2wTOgdOcDmdFOL8FsYPzajvZyUWWoIrz4 CPDlUZ8k0ZeUDTafkYcUf/EITVMF3p6znbz30AxEI5Bi2vJ2BPCWv1KPZxlYZIoz IpNy10/ucL5xHNzqmSZDdUGLko2ODjUHTpYMTxH9nyrYkD8Y1fmH/I3C2HsoWi4O a++prZLYmQL0LgRyH4j6EMCV1epkmj8VWRHGWG72EJS1Gm0DP6JYs+aLxFfSrcKn P6GfsAO+fyLIZoOn+9AE+utvBHN30s2NzgYKf4PxtLN4Ahzt0oBD21/sbSYz2TQ= =fxeu -----END PGP SIGNATURE----- --jf9xFk6R77gtURuCKOu9XtufFhCKG4dl3-- From owner-freebsd-net@freebsd.org Thu Sep 24 13:01:11 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 900B9A0746A for ; Thu, 24 Sep 2015 13:01:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7DB801905 for ; Thu, 24 Sep 2015 13:01:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8OD1BdR000242 for ; Thu, 24 Sep 2015 13:01:11 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203031] LACP problem with FreeBSD 10.2-RELEASE Date: Thu, 24 Sep 2015 13:01:11 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: gondim@bsdinfo.com.br X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 13:01:11 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203031 --- Comment #3 from gondim@bsdinfo.com.br --- Hi All, I'm realizing another strange thing in all FreeBSD 10.2-RELEASE: Look my ix0: ix0: flags=8843 metric 0 mtu 1500 options=8407bb ether 00:1b:21:89:25:28 inet 192.168.255.1 netmask 0xffffff00 broadcast 192.168.255.255 inet6 fe80::21b:21ff:fe89:2528%ix0 prefixlen 64 scopeid 0x1 inet6 2804:1054:dead:faca::1 prefixlen 64 nd6 options=21 media: Ethernet autoselect (10Gbase-SR ) status: active ==> rxpause and txpause is correct? All servers that are updated showing this message. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Thu Sep 24 14:14:54 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DDB1CA0879B for ; Thu, 24 Sep 2015 14:14:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CB3FD141A for ; Thu, 24 Sep 2015 14:14:54 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8OEEs0x072432 for ; Thu, 24 Sep 2015 14:14:54 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203031] LACP problem with FreeBSD 10.2-RELEASE Date: Thu, 24 Sep 2015 14:14:54 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.2-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: jeffrey.e.pieper@intel.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Sep 2015 14:14:55 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203031 Jeff Pieper changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |jeffrey.e.pieper@intel.com --- Comment #4 from Jeff Pieper --- (In reply to gondim from comment #3) Yes that is correct. It should correspond to sysctl dev.ix.0.fc=3, which indicates that both tx and rx pause frames are enabled. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Fri Sep 25 06:42:40 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BE496A0796B for ; Fri, 25 Sep 2015 06:42:40 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) Received: from relay2.tomsk.ru (mail.sibptus.tomsk.ru [212.73.124.5]) by mx1.freebsd.org (Postfix) with ESMTP id 2B6011C97 for ; Fri, 25 Sep 2015 06:42:39 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) X-Virus-Scanned: by clamd daemon 0.98.5_1 for FreeBSD at relay2.tomsk.ru Received: from admin.sibptus.TOMSK.ru ([212.73.125.240] verified) by relay2.tomsk.ru (CommuniGate Pro SMTP 5.1.16) with ESMTPS id 38877424 for freebsd-net@freebsd.org; Fri, 25 Sep 2015 12:42:36 +0600 Received: from admin.sibptus.TOMSK.ru (sudakov@localhost [127.0.0.1]) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7) with ESMTP id t8P6gZaF063716 for ; Fri, 25 Sep 2015 12:42:36 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) Received: (from sudakov@localhost) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7/Submit) id t8P6gZ4N063715 for freebsd-net@freebsd.org; Fri, 25 Sep 2015 12:42:35 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) X-Authentication-Warning: admin.sibptus.TOMSK.ru: sudakov set sender to vas@mpeks.tomsk.su using -f Date: Fri, 25 Sep 2015 12:42:34 +0600 From: Victor Sudakov To: freebsd-net@freebsd.org Subject: Re: transport mode IPSec with Windows 7, static keys Message-ID: <20150925064234.GA63016@admin.sibptus.tomsk.ru> References: <20150922084111.GA89385@admin.sibptus.tomsk.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150922084111.GA89385@admin.sibptus.tomsk.ru> Organization: OAO "Svyaztransneft", SibPTUS X-PGP-Key: http://www.dreamwidth.org/pubkey?user=victor_sudakov X-PGP-Fingerprint: 10E3 1171 1273 E007 C2E9 3532 0DA4 F259 9B5E C634 User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 06:42:40 -0000 Victor Sudakov wrote: > > Has anyone tried to set up transport mode IPSec with Windows 7 using > static keys? Hereby I declare that I have failed to setup static keys IPSec between FreeBSD and Windows. However, FreeBSD+racoon and Windows 7 with its builtin IPsec PolicyAgent service work more or less (E: 3des-cbc, A: hmac-sha1) on pre-shared secret. The only problem I have encountered is that after Windows reboot, traffic stops flowing between FreeBSD and Windows until racoon is restarted. I wonder if it has anything to do with the net.key.preferred_oldsa setting. -- Victor Sudakov, VAS4-RIPE, VAS47-RIPN sip:sudakov@sibptus.tomsk.ru From owner-freebsd-net@freebsd.org Fri Sep 25 13:42:58 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C902AA09872 for ; Fri, 25 Sep 2015 13:42:58 +0000 (UTC) (envelope-from rj@obsigna.com) Received: from mo6-p00-ob.smtp.rzone.de (mo6-p00-ob.smtp.rzone.de [IPv6:2a01:238:20a:202:5300::5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.smtp.rzone.de", Issuer "TeleSec ServerPass DE-2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 665CF1EF1 for ; Fri, 25 Sep 2015 13:42:57 +0000 (UTC) (envelope-from rj@obsigna.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1443188574; l=745; s=domk; d=obsigna.com; h=Mime-Version:To:Date:Subject:Content-Transfer-Encoding:Content-Type: From; bh=dXP9SKw0VLNTZVhcWWXyD85XEc83PgnwhqtwBArp16I=; b=OAp7tp6GmRYwLWciMoA0tmZIlgaOZY7HSFGCBL0DjoW682G4LjV0dYmeVGA4O2Dh25i C27ik8eRXMRygIjFGfrzr3MR3TcsKUGP0xV3oWv+yLH6lrMOLOm6fismWPGJnw5rHlp/k VLX6Fb4rZD7HtsXmF2DVtEvUs3O0zaombgI= X-RZG-AUTH: :O2kGeEG7b/pS1EK7WHa0hxqKZr4lnx6UhToX1IWHkW4X7v2ImaU2BqdKiuq0gOHBJBc= X-RZG-CLASS-ID: mo00 Received: from mail.obsigna.com (bb033d4b.virtua.com.br [187.3.61.75]) by smtp.strato.de (RZmta 37.12 DYNA|AUTH) with ESMTPSA id 903223r8PDgrchG (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (curve secp521r1 with 521 ECDH bits, eq. 15360 bits RSA)) (Client did not present a certificate) for ; Fri, 25 Sep 2015 15:42:53 +0200 (CEST) Received: from rolf.projectworld.net (rolf.projectworld.net [192.168.222.5]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.obsigna.com (Postfix) with ESMTPSA id 3DC9E14B90549 for ; Fri, 25 Sep 2015 10:42:50 -0300 (BRT) From: "Dr. Rolf Jansen" Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Clearance of checksum flags when decapsulating ESP packets Message-Id: Date: Fri, 25 Sep 2015 10:42:48 -0300 To: freebsd-net@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 13:42:58 -0000 Please, may I ask about the rationale behind the lines 1557 to 1562 in = function udp4_espdecap() of file src/sys/netinet/udp_usrreq.c on = FreeBSD 10.2-RELEASE-p3. =E2=80=A6 /* * We cannot yet update the cksums so clear any * h/w cksum flags as they are no longer valid. */ if (m->m_pkthdr.csum_flags & CSUM_DATA_VALID) m->m_pkthdr.csum_flags &=3D = ~(CSUM_DATA_VALID|CSUM_PSEUDO_HDR); =E2=80=A6 I am specially interested in learning about possibly adverse effects on = operating an IPsec and IPsec-NAT-T enabled kernel when leaving the = checksum flags in place, i.e. removing the above lines from the file = src/sys/netinet/udp_usrreq.c. Many thanks in advance for any enlightment. Best regards Rolf Jansen From owner-freebsd-net@freebsd.org Fri Sep 25 14:14:19 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E6C4EA08C07 for ; Fri, 25 Sep 2015 14:14:18 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 70F011368; Fri, 25 Sep 2015 14:14:18 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.0.143] (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id CBC74D8F2; Fri, 25 Sep 2015 16:14:09 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <97E97774-842B-440A-BBA4-808FF821EC98@FreeBSD.org> Date: Fri, 25 Sep 2015 16:14:08 +0200 Cc: freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> <5603ACF7.7040403@freebsd.org> <97E97774-842B-440A-BBA4-808FF821EC98@FreeBSD.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 14:14:19 -0000 > 24 sep 2015 kl. 11:39 skrev Palle Girgensohn : >=20 >=20 >> 24 sep 2015 kl. 09:57 skrev Julien Charbon : >>=20 >>=20 >> Hi -net, >>=20 >> On 24/09/15 09:03, Julien Charbon wrote: >>> On 24/09/15 08:55, Palle Girgensohn wrote: >>>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >>>>> : >>>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>>>>> : >>>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20 >>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>>> Kernels and userland are updated to 10.2-p3 with the patch >>>>>> removing the suspicous KASSERT. >>>>>> dtrace running continously redirecting to a log file. >>>> Just had a crash. Unfortunately, the kernel was stuck at the db> >>>> prompt, and the remote keyboard was unresponsive (HP ILO, not >>>> impressed). So I had to reset the power and never got a core = dump... >>>>=20 >>>> panic: tcp_tw_2msl_stop: inp should not be released here >>>> cpuid =3D 0 >>>> KDB: stack backtrace: >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>>> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame >>>> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame = 0xfffffe175acd1790 >>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 >>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 >>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame >>>> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame >>>> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame >>>> 0xfffffe175acd18f0 softclock_call_cc() at >>>> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at >>>> softclock+0x47/frame 0xfffffe175acd19f0 = intr_event_execute_handlers() >>>> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 >>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 >>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 >>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 >>>> --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- >>>> KDB: enter: panic >>>> [ thread pid 12 tid 100043 ] >>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why >>>> db> >>>=20 >>> Thanks a log for this backstrace. This is what at expected, when >>> tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be called = one >>> extra time that leads to: >>>=20 >>> tcp_tw_2msl_stop: inp should not be released here >>>=20 >>> Let me try to come with a tentative fix for this case. >>=20 >> See joined my tentative patch for these case. It is only a first >> tentative patch as I am still waiting on -net feedbacks on what = should >> be the rule here. >>=20 >> By the way: >>=20 >> - I see nothing specific to VIMAGE here >>=20 >> - Anyone aware of tcp_close() (or tcp_drop()) calls = modified/introduced >> recently in 10.2 that could explained why this issue only appears = only now? >>=20 >> -- >> Julien >> >=20 >=20 > Running a machine with the patch now (it just crashed and rebooted = with the new kernel). >=20 > Hoping it will have a "soothing" effect... ;-) >=20 >=20 > dtrace running as previously. No output yet, though. >=20 >=20 Hello -net & Julien! First of, loud cheers and a big *thank you* to Julien for helping us get = our systems to stop crashing. This really means a lot to us! Thank you! We have been running more than 24 hours with no crash, so I'm getting = more and more confident that the change acually makes the system stable. Dtrace still shows nothing. Palle From owner-freebsd-net@freebsd.org Fri Sep 25 14:19:27 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A7991A08E88 for ; Fri, 25 Sep 2015 14:19:27 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from mail.pingpong.net (mail.pingpong.net [79.136.116.202]) by mx1.freebsd.org (Postfix) with ESMTP id 4CD32160D; Fri, 25 Sep 2015 14:19:27 +0000 (UTC) (envelope-from girgen@FreeBSD.org) Received: from [10.0.0.143] (citron2.pingpong.net [195.178.173.68]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.pingpong.net (Postfix) with ESMTPSA id 9838FD955; Fri, 25 Sep 2015 16:19:26 +0200 (CEST) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: Kernel panics in tcp_twclose From: Palle Girgensohn In-Reply-To: <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org> Date: Fri, 25 Sep 2015 16:19:26 +0200 Cc: freebsd-net@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <9529CF41-E4B9-4AC5-9703-945EC35924BC@FreeBSD.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <55FC1809.3070903@freebsd.org> <20150918160605.GN67105@kib.kiev.ua> <55FFBE01.6060706@freebsd.org> <3721F099-F45D-4DCD-8AB3-84D1ABC44145@FreeBSD.org> <73856F2B-3E70-483C-9988-C84E798CEB44@FreeBSD.org> <44EBAC98-4761-4E47-8E47-5032430A1C8A@FreeBSD.org> <56019AF8.8000705@freebsd.org> <5601CF2D.9030307@freebsd.org> <5602E90A.9050504@freebsd.org> <0931591A-23EC-40CB-A109-72E9308B1A2D@pingpong.net> <5602F044.5010606@freebsd.org> <54767991-9D3B-4ECB-A07E-CFA21A54BBDD@pingpong.net> <4E148E2E-F8D2-41C2-B232-9FD1548AA20B@pingpong.net> <30AD333B-EC8B-4EEF-8FE2-8EA8C216601E@FreeBSD.org> <5603A03B.4060002@freebsd.org> <5603ACF7.7040403@freebsd.org> <97E97774-842B-440A-BBA4-808FF821EC98@FreeBSD.org> <6BA42863-E584-4552-8D73-7471616ADC6D@FreeBSD.org> To: Julien Charbon X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 14:19:27 -0000 > 25 sep 2015 kl. 16:14 skrev Palle Girgensohn : >=20 >>=20 >> 24 sep 2015 kl. 11:39 skrev Palle Girgensohn : >>=20 >>=20 >>> 24 sep 2015 kl. 09:57 skrev Julien Charbon : >>>=20 >>>=20 >>> Hi -net, >>>=20 >>> On 24/09/15 09:03, Julien Charbon wrote: >>>> On 24/09/15 08:55, Palle Girgensohn wrote: >>>>>> 24 sep 2015 kl. 07:51 skrev Palle Girgensohn >>>>>> : >>>>>>> 24 sep 2015 kl. 00:05 skrev Palle Girgensohn >>>>>>> : >>>>>>>> 23 sep 2015 kl. 20:32 skrev Julien Charbon :=20= >>>>>>>> On 23/09/15 20:26, Palle Girgensohn wrote: >>>>>>> Kernels and userland are updated to 10.2-p3 with the patch >>>>>>> removing the suspicous KASSERT. >>>>>>> dtrace running continously redirecting to a log file. >>>>> Just had a crash. Unfortunately, the kernel was stuck at the db> >>>>> prompt, and the remote keyboard was unresponsive (HP ILO, not >>>>> impressed). So I had to reset the power and never got a core = dump... >>>>>=20 >>>>> panic: tcp_tw_2msl_stop: inp should not be released here >>>>> cpuid =3D 0 >>>>> KDB: stack backtrace: >>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame >>>>> 0xfffffe175acd16a0 kdb_backtrace() at kdb_backtrace+0x39/frame >>>>> 0xfffffe175acd1750 vpanic() at vpanic+0x126/frame = 0xfffffe175acd1790 >>>>> kassert_panic() at kassert_panic+0x139/frame 0xfffffe175acd1800 >>>>> tcp_twclose() at tcp_twclose+0x2cb/frame 0xfffffe175acd1850 >>>>> tcp_tw_2msl_scan() at tcp_tw_2msl_scan+0x13b/frame >>>>> 0xfffffe175acd1890 tcp_slowtimo() at tcp_slowtimo+0x68/frame >>>>> 0xfffffe175acd18c0 pfslowtimo() at pfslowtimo+0x54/frame >>>>> 0xfffffe175acd18f0 softclock_call_cc() at >>>>> softclock_call_cc+0x193/frame 0xfffffe175acd19d0 softclock() at >>>>> softclock+0x47/frame 0xfffffe175acd19f0 = intr_event_execute_handlers() >>>>> at intr_event_execute_handlers+0x93/frame 0xfffffe 175acd1a30 >>>>> ithread_loop() at ithread_loop+0xa6/frame 0xfffffe175acd1a70 >>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe175acd1ab0 >>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe175acd1ab0 >>>>> --- trap 0, rip =3D 0, rsp =3D 0xfffffe175acd1b70, rbp =3D 0 --- >>>>> KDB: enter: panic >>>>> [ thread pid 12 tid 100043 ] >>>>> Stopped at kdb_enter+0x3e: movq $0,kdb_why >>>>> db> >>>>=20 >>>> Thanks a log for this backstrace. This is what at expected, when >>>> tcp_close() in call in INP_TIMEWAIT case, in_pcbfree() can be = called one >>>> extra time that leads to: >>>>=20 >>>> tcp_tw_2msl_stop: inp should not be released here >>>>=20 >>>> Let me try to come with a tentative fix for this case. >>>=20 >>> See joined my tentative patch for these case. It is only a first >>> tentative patch as I am still waiting on -net feedbacks on what = should >>> be the rule here. >>>=20 >>> By the way: >>>=20 >>> - I see nothing specific to VIMAGE here >>>=20 >>> - Anyone aware of tcp_close() (or tcp_drop()) calls = modified/introduced >>> recently in 10.2 that could explained why this issue only appears = only now? >>>=20 >>> -- >>> Julien >>> >>=20 >>=20 >> Running a machine with the patch now (it just crashed and rebooted = with the new kernel). >>=20 >> Hoping it will have a "soothing" effect... ;-) >>=20 >>=20 >> dtrace running as previously. No output yet, though. >>=20 >>=20 >=20 > Hello -net & Julien! >=20 > First of, loud cheers and a big *thank you* to Julien for helping us = get our systems to stop crashing. This really means a lot to us! Thank = you! >=20 > We have been running more than 24 hours with no crash, so I'm getting = more and more confident that the change acually makes the system stable. >=20 > Dtrace still shows nothing. >=20 > Palle Secondly, is this error related? This is *not* VIMAGE, *not* jail. It is = a binary installed GENERIC from freebsd-update. 10.1-RELEASE-p19. It = just crashed today, and we did not get any core dump, but I found this = core.txt from a crash in August that I was not aware of (I was on = holiday then... :) Since it is installed binary, I have no kernel.debug. ... panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 = clashing GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you = are welcome to change it and/or distribute copies of it under certain = conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for = details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: sbsndptr: sockbuf 0xfffff80312126c68 and mbuf 0xfffff800b4a36800 = clashing cpuid =3D 1 KDB: stack backtrace: #0 0xffffffff80963000 at kdb_backtrace+0x60 #1 0xffffffff80928125 at panic+0x155 #2 0xffffffff8099c180 at sbdroprecord_locked+0 #3 0xffffffff80ac8c9c at tcp_output+0xdbc #4 0xffffffff80ac6a95 at tcp_do_segment+0x3045 #5 0xffffffff80ac2e04 at tcp_input+0xd04 #6 0xffffffff80a54fc7 at ip_input+0x97 #7 0xffffffff809f4f73 at swi_net+0x143 #8 0xffffffff808faf4b at intr_event_execute_handlers+0xab #9 0xffffffff808fb396 at ithread_loop+0x96 #10 0xffffffff808f8b6a at fork_exit+0x9a #11 0xffffffff80d0b67e at fork_trampoline+0xe Uptime: 21d0h54m53s Dumping 2005 out of 32709 = MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for /boot/kernel/accf_data.ko.symbols Reading symbols from /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for /boot/kernel/accf_http.ko.symbols Reading symbols from /boot/kernel/oce.ko.symbols...done. Loaded symbols for /boot/kernel/oce.ko.symbols Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/linprocfs.ko.symbols...done. Loaded symbols for /boot/kernel/linprocfs.ko.symbols Reading symbols from /boot/kernel/linux.ko.symbols...done. Loaded symbols for /boot/kernel/linux.ko.symbols Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols #0 doadump (textdump=3D) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump (textdump=3D) at pcpu.h:219 #1 0xffffffff80927da2 in kern_reboot (howto=3D260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0xffffffff80928164 in panic (fmt=3D) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff8099c180 in sbsndptr (sb=3D,=20 off=3D, len=3D,=20 moff=3D) at = /usr/src/sys/kern/uipc_sockbuf.c:1011 #4 0xffffffff80ac8c9c in tcp_output (tp=3D0xfffff80312ef5800) at /usr/src/sys/netinet/tcp_output.c:870 #5 0xffffffff80ac6a95 in tcp_do_segment (m=3D,=20 th=3D, so=3D,=20 tp=3D, drop_hdrlen=3D, = tlen=3D0,=20 iptos=3D, ti_locked=3DCannot access memory at = address 0x1 ) at /usr/src/sys/netinet/tcp_input.c:3018 #6 0xffffffff80ac2e04 in tcp_input (m=3D,=20 off0=3D) at = /usr/src/sys/netinet/tcp_input.c:1377 #7 0xffffffff80a54fc7 in ip_input (m=3D0xfffff800b4516600) at /usr/src/sys/netinet/ip_input.c:734 #8 0xffffffff809f4f73 in swi_net (arg=3D0xffffffff81988880) at /usr/src/sys/net/netisr.c:765 #9 0xffffffff808faf4b in intr_event_execute_handlers ( p=3D, ie=3D0xfffff800093ac600) at /usr/src/sys/kern/kern_intr.c:1263 #10 0xffffffff808fb396 in ithread_loop (arg=3D0xfffff80009388e40) at /usr/src/sys/kern/kern_intr.c:1276 #11 0xffffffff808f8b6a in fork_exit ( callout=3D0xffffffff808fb300 , arg=3D0xfffff80009388e40,= =20 frame=3D0xfffffe083c3e3ac0) at /usr/src/sys/kern/kern_fork.c:996 #12 0xffffffff80d0b67e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606 #13 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb)=20 From owner-freebsd-net@freebsd.org Fri Sep 25 16:50:49 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A14D6A089FB for ; Fri, 25 Sep 2015 16:50:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7DD8F11F9; Fri, 25 Sep 2015 16:50:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id BC156B915; Fri, 25 Sep 2015 12:50:47 -0400 (EDT) From: John Baldwin To: Julien Charbon Cc: Palle Girgensohn , George Neville-Neil , Konstantin Belousov , freebsd-net@freebsd.org Subject: Re: Can tcp_close() be called in INP_TIMEWAIT case Date: Fri, 25 Sep 2015 09:42:51 -0700 Message-ID: <2216936.QIvWsOndvU@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <5603E8E4.5030406@freebsd.org> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <5602BB7A.9010504@freebsd.org> <5603E8E4.5030406@freebsd.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 25 Sep 2015 12:50:47 -0400 (EDT) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 16:50:49 -0000 On Thursday, September 24, 2015 02:13:24 PM Julien Charbon wrote: > So the issue is: > > - tcp_close() is called for some reasons with an inp in INP_TIMEWAIT > state and sets the INP_DROPPED flag, > - tcp_detach() is called when the last reference on socket is dropped > > then now in_pcbfree() can be called twice instead of once: > > 1. First in tcp_detach(): > > static void > tcp_detach(struct socket *so, struct inpcb *inp) > { > struct tcpcb *tp; > tp = intotcpcb(inp); > > if (inp->inp_flags & INP_TIMEWAIT) { > if (inp->inp_flags & INP_DROPPED) { > in_pcbdetach(inp); > in_pcbfree(inp); <-- > } > > 2. Second when tcptw expires here: > > void > tcp_twclose(struct tcptw *tw, int reuse) > { > struct socket *so; > struct inpcb *inp; > > inp = tw->tw_inpcb; > > tcp_tw_2msl_stop(tw, reuse); > inp->inp_ppcb = NULL; > in_pcbdrop(inp); > > so = inp->inp_socket; > if (so != NULL) { > ... > } else { > in_pcbfree(inp); <-- > } > > This behavior is backed by Palle kernel panic backstraces and coredumps. > > o Solutions: > > Long: Forbid to call tcp_close() when inp is in INP_TIMEWAIT state, > the TCP stack rule being: > > - if !INP_TIMEWAIT: Call tcp_close() > - if INP_TIMEWAIT: Call tcp_twclose() (or call nothing, the tcptw will > expire/be recycled anyway) > > Short: > if INP_TIMEWAIT & INP_DROPPED: > Do not call in_pcbfree(inp) in tcp_detach() unless tcptw is already > discarded. > > The long solution seems cleaner, backed by tcp_detach() old comments > and most of current tcp_close() calls but I won't take that longer path > without -net approval first. I prefer the longer solution if it keeps tcp_detach() simpler by avoiding an extra condition. Please just document it via assertions in tcp_close() (or is this the assertion that fired and triggered the reported panic? If so, then you obviously don't need to add it. :-P) -- John Baldwin From owner-freebsd-net@freebsd.org Fri Sep 25 17:50:53 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8B42FA08FE2 for ; Fri, 25 Sep 2015 17:50:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 780B91452 for ; Fri, 25 Sep 2015 17:50:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8PHor1Z030622 for ; Fri, 25 Sep 2015 17:50:53 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 147912] [boot] FreeBSD 8 Beta won't boot on Thinkpad i1300 1171-5XU Date: Fri, 25 Sep 2015 17:50:53 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: emaste@freebsd.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 17:50:53 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=147912 Ed Maste changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |emaste@freebsd.org --- Comment #2 from Ed Maste --- Is this reproducible on 10.x? -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Fri Sep 25 22:39:45 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 39FE8A08DA0 for ; Fri, 25 Sep 2015 22:39:45 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 26D2E1AFF for ; Fri, 25 Sep 2015 22:39:45 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8PMdjWE007569 for ; Fri, 25 Sep 2015 22:39:45 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 172895] [ixgb] [ixgbe] do not properly determine link-state Date: Fri, 25 Sep 2015 22:39:44 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 9.1-PRERELEASE X-Bugzilla-Keywords: IntelNetworking X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: asomers@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 22:39:45 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=172895 --- Comment #7 from Alan Somers --- (In reply to Alan Somers from comment #5) Actually, that's not 100% true. It turns out that my interface _will_ report carrier status when up but with no IP. It just takes a few seconds. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-net@freebsd.org Fri Sep 25 22:46:43 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E481CA09309 for ; Fri, 25 Sep 2015 22:46:42 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from mail.strugglingcoder.info (strugglingcoder.info [65.19.130.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.strugglingcoder.info", Issuer "mail.strugglingcoder.info" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C744C1142; Fri, 25 Sep 2015 22:46:42 +0000 (UTC) (envelope-from hiren@strugglingcoder.info) Received: from localhost (unknown [10.1.1.3]) (Authenticated sender: hiren@strugglingcoder.info) by mail.strugglingcoder.info (Postfix) with ESMTPA id 03836D145A; Fri, 25 Sep 2015 15:46:36 -0700 (PDT) Date: Fri, 25 Sep 2015 15:46:35 -0700 From: hiren panchasara To: John Baldwin Cc: Julien Charbon , Konstantin Belousov , George Neville-Neil , Palle Girgensohn , freebsd-net@freebsd.org Subject: Re: Can tcp_close() be called in INP_TIMEWAIT case Message-ID: <20150925224635.GR46700@strugglingcoder.info> References: <26B0FF93-8AE3-4514-BDA1-B966230AAB65@FreeBSD.org> <5602BB7A.9010504@freebsd.org> <5603E8E4.5030406@freebsd.org> <2216936.QIvWsOndvU@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="uNvczuo8OWfsyO2w" Content-Disposition: inline In-Reply-To: <2216936.QIvWsOndvU@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 Sep 2015 22:46:43 -0000 --uNvczuo8OWfsyO2w Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 09/25/15 at 09:42P, John Baldwin wrote: > On Thursday, September 24, 2015 02:13:24 PM Julien Charbon wrote: > > So the issue is: > >=20 > > - tcp_close() is called for some reasons with an inp in INP_TIMEWAIT > > state and sets the INP_DROPPED flag, > > - tcp_detach() is called when the last reference on socket is dropped > >=20 > > then now in_pcbfree() can be called twice instead of once: > >=20 > > 1. First in tcp_detach(): > >=20 > > static void > > tcp_detach(struct socket *so, struct inpcb *inp) > > { > > struct tcpcb *tp; > > tp =3D intotcpcb(inp); > >=20 > > if (inp->inp_flags & INP_TIMEWAIT) { > > if (inp->inp_flags & INP_DROPPED) { > > in_pcbdetach(inp); > > in_pcbfree(inp); <-- > > } > >=20 > > 2. Second when tcptw expires here: > >=20 > > void > > tcp_twclose(struct tcptw *tw, int reuse) > > { > > struct socket *so; > > struct inpcb *inp; > >=20 > > inp =3D tw->tw_inpcb; > >=20 > > tcp_tw_2msl_stop(tw, reuse); > > inp->inp_ppcb =3D NULL; > > in_pcbdrop(inp); > >=20 > > so =3D inp->inp_socket; > > if (so !=3D NULL) { > > ... > > } else { > > in_pcbfree(inp); <-- > > } > >=20 > > This behavior is backed by Palle kernel panic backstraces and coredump= s. > >=20 > > o Solutions: > >=20 > > Long: Forbid to call tcp_close() when inp is in INP_TIMEWAIT state, > > the TCP stack rule being: > >=20 > > - if !INP_TIMEWAIT: Call tcp_close() > > - if INP_TIMEWAIT: Call tcp_twclose() (or call nothing, the tcptw will > > expire/be recycled anyway) > >=20 > > Short: > > if INP_TIMEWAIT & INP_DROPPED: > > Do not call in_pcbfree(inp) in tcp_detach() unless tcptw is already > > discarded. > >=20 > > The long solution seems cleaner, backed by tcp_detach() old comments > > and most of current tcp_close() calls but I won't take that longer path > > without -net approval first. >=20 > I prefer the longer solution if it keeps tcp_detach() simpler by avoiding > an extra condition. Please just document it via assertions in tcp_close() > (or is this the assertion that fired and triggered the reported panic? I= f so, > then you obviously don't need to add it. :-P) I also like the longer solution because it seems cleaner and more readable/followable. Julien, nice work on investigation and follow-up. :-) Cheers, Hiren --uNvczuo8OWfsyO2w Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAABCgBmBQJWBc7IXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNEUyMEZBMUQ4Nzg4RjNGMTdFNjZGMDI4 QjkyNTBFMTU2M0VERkU1AAoJEIuSUOFWPt/lGroH/jGmVcCO1HOvtoje3poR7pG4 JXtNvztCPATB/cwKL1ufFN8f7Yev9aCyJ6W+grLdl/NYZJxhPelMr1CTNjmDGknW Iq4aEZhC8NUp64AJJEKBVl5nr4oJDYdhqw4c86Y1OKBY6Ov4sxQ+KQxd2eLE7xsT EjRvocSTwVecfH6B6OTUzCdYUDGNBD5WoYv+PbW4AhoaoNU9wM1mOQSU6GcBD/Or KVQHB6IlCrfnu78d/RGbPbbG8TZQ9ITSAl+pQYIL4HlmjCZJ3qcxwSVpjabWC5yY SU1jBBHpOKmrpv6fdnizQ/7uJOsbjg3yAPY1xZbfZXhSDsQUZBPgKf9iA2fOqsQ= =MXHc -----END PGP SIGNATURE----- --uNvczuo8OWfsyO2w-- From owner-freebsd-net@freebsd.org Sat Sep 26 14:31:03 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4630BA09F42 for ; Sat, 26 Sep 2015 14:31:03 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) Received: from relay2.tomsk.ru (mail.sibptus.tomsk.ru [212.73.124.5]) by mx1.freebsd.org (Postfix) with ESMTP id A4ABB96 for ; Sat, 26 Sep 2015 14:31:01 +0000 (UTC) (envelope-from vas@mpeks.tomsk.su) X-Virus-Scanned: by clamd daemon 0.98.5_1 for FreeBSD at relay2.tomsk.ru Received: from admin.sibptus.TOMSK.ru ([212.73.125.240] verified) by relay2.tomsk.ru (CommuniGate Pro SMTP 5.1.16) with ESMTPS id 38879170 for freebsd-net@freebsd.org; Sat, 26 Sep 2015 20:30:59 +0600 Received: from admin.sibptus.TOMSK.ru (sudakov@localhost [127.0.0.1]) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7) with ESMTP id t8QEUvCK088551 for ; Sat, 26 Sep 2015 20:30:59 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) Received: (from sudakov@localhost) by admin.sibptus.TOMSK.ru (8.14.9/8.14.7/Submit) id t8QEUvos088550 for freebsd-net@freebsd.org; Sat, 26 Sep 2015 20:30:57 +0600 (NOVT) (envelope-from vas@mpeks.tomsk.su) X-Authentication-Warning: admin.sibptus.TOMSK.ru: sudakov set sender to vas@mpeks.tomsk.su using -f Date: Sat, 26 Sep 2015 20:30:57 +0600 From: Victor Sudakov To: freebsd-net@freebsd.org Subject: Re: transport mode IPSec with Windows 7, static keys Message-ID: <20150926143057.GA88375@admin.sibptus.tomsk.ru> References: <20150922084111.GA89385@admin.sibptus.tomsk.ru> <20150925064234.GA63016@admin.sibptus.tomsk.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150925064234.GA63016@admin.sibptus.tomsk.ru> Organization: OAO "Svyaztransneft", SibPTUS X-PGP-Key: http://www.dreamwidth.org/pubkey?user=victor_sudakov X-PGP-Fingerprint: 10E3 1171 1273 E007 C2E9 3532 0DA4 F259 9B5E C634 User-Agent: Mutt/1.5.24 (2015-08-30) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 Sep 2015 14:31:03 -0000 Victor Sudakov wrote: > > However, FreeBSD+racoon and Windows 7 with its builtin IPsec > PolicyAgent service work more or less (E: 3des-cbc, A: hmac-sha1) on > pre-shared secret. > > The only problem I have encountered is that after Windows reboot, > traffic stops flowing between FreeBSD and Windows until racoon is > restarted. > > I wonder if it has anything to do with the net.key.preferred_oldsa > setting. The two sysctls: net.key.preferred_oldsa=0 net.key.blockacq_count=0 seem to fix the reboot problem. Could anyone explain the mechanism? I have never had to tweak them to get IPsec working between FreeBSD hosts. -- Victor Sudakov, VAS4-RIPE, VAS47-RIPN sip:sudakov@sibptus.tomsk.ru From owner-freebsd-net@freebsd.org Sat Sep 26 17:30:22 2015 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2F7CDA085FA for ; Sat, 26 Sep 2015 17:30:22 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1CD9EF28 for ; Sat, 26 Sep 2015 17:30:22 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id t8QHULrd032251 for ; Sat, 26 Sep 2015 17:30:21 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-net@FreeBSD.org Subject: [Bug 203288] axge(4) panics on unplug Date: Sat, 26 Sep 2015 17:30:22 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: linimon@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-net@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: assigned_to Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 26 Sep 2015 17:30:22 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203288 Mark Linimon changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|freebsd-bugs@FreeBSD.org |freebsd-net@FreeBSD.org -- You are receiving this mail because: You are the assignee for the bug.