Subject: mbuf_jumbo_9k & iSCSI failing
From: Ben RUBSON
Date: Sun, 25 Jun 2017 16:54:25 +0200
To: FreeBSD Net, freebsd-scsi@freebsd.org

> On 30 Dec 2016, at 22:55, Ben RUBSON wrote:
>
> Hello,
>
> 2 FreeBSD 11.0-p3 servers, one iSCSI initiator, one target.
> Both with Mellanox ConnectX-3 40G.
>
> Since a few days, sometimes, under undetermined circumstances, as soon as there is some (very low) iSCSI traffic, some of the disks get disconnected:
> kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection
>
> At the same moment, sysctl counters hw.mlxen1.stat.rx_ring*.error grow on initiator side.
>
> I then tried to reproduce these network errors burning the link at 40G full-duplex using iPerf.
> But I did not manage to increase these error counters.
>
> It's strange because it's a sporadic issue, I can have traffic on iSCSI disks without any issue, and sometimes, they get disconnected with errors growing.
>
> On 01 Jan 2017, at 09:16, Meny Yossefi wrote:
>
> Any chance you ran out of mbufs in the system?
>
> On 02 Jan 2017, at 12:09, Ben RUBSON wrote:
>
> I think you are right, this could be a mbufs issue.
> Here are some more numbers:
>
> # vmstat -z | grep -v "0, 0$"
> ITEM            SIZE    LIMIT     USED   FREE         REQ      FAIL SLEEP
> 4 Bucket:         32,       0,    2673, 28327,   88449799,    17317,  0
> 8 Bucket:         64,       0,     449, 15609,   13926386,     4871,  0
> 12 Bucket:        96,       0,     335,  5323,   10293892,   142872,  0
> 16 Bucket:       128,       0,     533,  6070,    7618615,   472647,  0
> 32 Bucket:       256,       0,    8317, 22133,   36020376,   563479,  0
> 64 Bucket:       512,       0,    1238,  3298,   20138111, 11430742,  0
> 128 Bucket:     1024,       0,    1865,  2963,   21162182,   158752,  0
> 256 Bucket:     2048,       0,    1626,   450,   80253784,  4890164,  0
> mbuf_jumbo_9k:  9216,  603712,   16400,  8744, 4128521064,     2661,  0
>
> On 03 Jan 2017, at 07:27, Meny Yossefi wrote:
>
> Have you tried increasing the mbufs limit?
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)
>
> On 04 Jan 2017, at 14:47, Ben RUBSON wrote:
>
> No I did not try this yet.
> However, from the numbers above (and below), I think I should increase kern.ipc.nmbjumbo9 instead?
>
> On 30 Jan 2017, at 15:36, Ben RUBSON wrote:
>
> So, to give some news, increasing kern.ipc.nmbjumbo9 helped a lot.
> Just a very little issue (compared to the others before) over the last 3 weeks.

Hello,

I'm back today with this issue.
Above is my discussion with Meny from Mellanox at the beginning of 2017.
(The topic was "iSCSI failing, MLX rx_ring errors ?", on the freebsd-net list.)

This morning the issue came back: some of my iSCSI disks were disconnected.
Below are some numbers.

# vmstat -z | grep -v "0, 0$"
ITEM            SIZE    LIMIT     USED   FREE         REQ      FAIL SLEEP
8 Bucket:         64,       0,     654,  8522,   28604967,       11,  0
12 Bucket:        96,       0,     976,  5092,   23758734,       78,  0
32 Bucket:       256,       0,     789,  4491,   43446969,      137,  0
64 Bucket:       512,       0,     666,  2750,   47568959,  1272018,  0
128 Bucket:     1024,       0,    1047,  1249,   28774042,   232504,  0
256 Bucket:     2048,       0,    1611,   369,  139988097,  8931139,  0
vmem btag:        56,       0, 2949738, 15506,   18092235,    20908,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  8776, 8610737115,      297,  0

# uname -rs
FreeBSD 11.0-RELEASE-p8

# uptime
3:34p.m. up 88 days, 15:57, 2 users, load averages: 0.95, 0.67, 0.62

# grep kern.ipc.nmb /boot/loader.conf
kern.ipc.nmbjumbo9=2037529
kern.ipc.nmbjumbo16=1

# sysctl kern.ipc | grep mb
kern.ipc.nmbufs: 26080380
kern.ipc.nmbjumbo16: 4
kern.ipc.nmbjumbo9: 6112587
kern.ipc.nmbjumbop: 2037529
kern.ipc.nmbclusters: 4075060
kern.ipc.maxmbufmem: 33382887424

# ifconfig mlxen1
mlxen1: flags=8843 metric 0 mtu 9020
        options=ed07bb
        nd6 options=29
        media: Ethernet autoselect (40Gbase-CR4)
        status: active

I just caught the issue in progress (note the FAIL column growing):

# vmstat -z | grep mbuf_jumbo_9k
ITEM            SIZE    LIMIT     USED   FREE         REQ      FAIL SLEEP
mbuf_jumbo_9k:  9216, 2037529,   16415,  7316, 8735246407,      665,  0
mbuf_jumbo_9k:  9216, 2037529,   16411,  7320, 8735286748,      665,  0
mbuf_jumbo_9k:  9216, 2037529,   16415,  7316, 8735298937,      667,  0
mbuf_jumbo_9k:  9216, 2037529,   16438,  7293, 8735337634,      667,  0
mbuf_jumbo_9k:  9216, 2037529,   16407,  7324, 8735354339,      668,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735382105,      669,  0
mbuf_jumbo_9k:  9216, 2037529,   16402,  7329, 8735392836,      671,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735423910,      671,  0
mbuf_jumbo_9k:  9216, 2037529,   16415,  7316, 8735456393,      671,  0
mbuf_jumbo_9k:  9216, 2037529,   16409,  7322, 8735472284,      672,  0
mbuf_jumbo_9k:  9216, 2037529,   16420,  7311, 8735512237,      673,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735518502,      675,  0
mbuf_jumbo_9k:  9216, 2037529,   16410,  7321, 8735543668,      676,  0
mbuf_jumbo_9k:  9216, 2037529,   16405,  7326, 8735555646,      678,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735568986,      679,  0
mbuf_jumbo_9k:  9216, 2037529,   16414,  7317, 8735579075,      680,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735603983,      681,  0
mbuf_jumbo_9k:  9216, 2037529,   16402,  7329, 8735634273,      681,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735646057,      683,  0
mbuf_jumbo_9k:  9216, 2037529,   16402,  7329, 8735658213,      684,  0
mbuf_jumbo_9k:  9216, 2037529,   16414,  7317, 8735675678,      686,  0
mbuf_jumbo_9k:  9216, 2037529,   16415,  7316, 8735686017,      687,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8735707335,      687,  0
mbuf_jumbo_9k:  9216, 2037529,   16414,  7317, 8736016546,      708,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8736037292,      709,  0
mbuf_jumbo_9k:  9216, 2037529,   16405,  7326, 8736053865,      710,  0
mbuf_jumbo_9k:  9216, 2037529,   16402,  7329, 8736070103,      711,  0
mbuf_jumbo_9k:  9216, 2037529,   16407,  7324, 8736086810,      711,  0
mbuf_jumbo_9k:  9216, 2037529,   16430,  7301, 8736098568,      713,  0
mbuf_jumbo_9k:  9216, 2037529,   16405,  7326, 8736122803,      714,  0
mbuf_jumbo_9k:  9216, 2037529,   16417,  7314, 8736134322,      715,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8736152338,      715,  0
mbuf_jumbo_9k:  9216, 2037529,   16403,  7328, 8736167677,      715,  0
mbuf_jumbo_9k:  9216, 2037529,   16400,  7331, 8736170783,      717,  0
mbuf_jumbo_9k:  9216, 2037529,   16445,  7286, 8736546084,      733,  0

During this, top was reporting the following:
Mem: 4056K Active, 426M Inact, 59G Wired, 2531M Free

And in /var/log/messages:
kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection

Any idea why I'm experiencing this?

Thank you very much for your help & support,

Best regards,

Ben
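
For reference, the growth of the FAIL counter in the mbuf_jumbo_9k samples above can be computed with a small script. This is only a minimal sketch, assuming the comma-separated `vmstat -z` row layout shown in this mail; the names ZoneRow, parse_zone_row and fail_deltas are hypothetical helpers made up for illustration, and the three sample rows are copied from the output above.

```python
from typing import List, NamedTuple, Tuple


class ZoneRow(NamedTuple):
    """One data row of `vmstat -z` output (layout assumed from this mail)."""
    name: str
    size: int
    limit: int
    used: int
    free: int
    req: int
    fail: int
    sleep: int


def parse_zone_row(line: str) -> ZoneRow:
    """Parse a row such as 'mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0'."""
    name, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not a zone row: {line!r}")
    # int() tolerates the irregular spacing around each number.
    fields = [int(f) for f in rest.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 numeric columns, got {len(fields)}")
    return ZoneRow(name.strip(), *fields)


def fail_deltas(rows: List[ZoneRow]) -> List[Tuple[int, int]]:
    """Return (REQ growth, FAIL growth) between consecutive samples."""
    return [(b.req - a.req, b.fail - a.fail) for a, b in zip(rows, rows[1:])]


# First three samples from the output above.
SAMPLES = [
    "mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0",
    "mbuf_jumbo_9k: 9216, 2037529, 16411, 7320,8735286748, 665, 0",
    "mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735298937, 667, 0",
]

if __name__ == "__main__":
    rows = [parse_zone_row(s) for s in SAMPLES]
    for (req_d, fail_d), row in zip(fail_deltas(rows), rows[1:]):
        print(f"{row.name}: +{req_d} requests, +{fail_d} failures, "
              f"{100.0 * row.used / row.limit:.2f}% of limit in use")
```

In these samples USED stays below 1% of LIMIT while FAIL keeps increasing, which suggests the failures are not caused by reaching the configured kern.ipc.nmbjumbo9 limit.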