Date:      Mon, 26 Sep 2016 10:51:07 +0200
From:      Julien Charbon <jch@freebsd.org>
To:        Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@FreeBSD.org, hiren panchasara <hiren@strugglingcoder.info>
Subject:   Re: 11.0 stuck on high network load
Message-ID:  <c33b5f76-046c-10cf-b81b-2eac55e2491c@freebsd.org>
In-Reply-To: <20160925185815.GA6177@zxy.spb.ru>
References:  <20160919204328.GN2840@zxy.spb.ru> <8ba75d6e-4f01-895e-0aed-53c6c6692cb9@freebsd.org> <20160920202633.GQ2840@zxy.spb.ru> <f644cd52-4377-aa90-123a-3a2887972bbc@freebsd.org> <20160921195155.GW2840@zxy.spb.ru> <e4e0188c-b22b-29af-ed15-b650c3ec4553@gmail.com> <20160922095331.GB2840@zxy.spb.ru> <67862b33-63c0-2f23-d254-5ddc55dbb554@freebsd.org> <20160922102045.GC2840@zxy.spb.ru> <20160923191656.GF2840@zxy.spb.ru> <20160925185815.GA6177@zxy.spb.ru>

On 9/25/16 8:58 PM, Slawa Olhovchenkov wrote:
> On Fri, Sep 23, 2016 at 10:16:56PM +0300, Slawa Olhovchenkov wrote:
>
>> On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote:
>>
>>> On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote:
>>>
>>>>>>  These paths can indeed compete for the same INP lock, as both
>>>>>> tcp_tw_2msl_scan() calls always start with the first inp found in the
>>>>>> twq_2msl list.  But in both cases, this first inp should be quickly used
>>>>>> and its lock released anyway, so that could explain your situation: the
>>>>>> TCP stack is doing that all the time, for example:
>>>>>>
>>>>>>  - Let's say that you are constantly and completely running out of tcptw,
>>>>>> and then all connections transitioning to the TIME_WAIT state are
>>>>>> competing with the TIME_WAIT timeout scan that tries to free all the
>>>>>> expired tcptw.  If the stack is doing that all the time, it can appear
>>>>>> to be "live" locked.
>>>>>>
>>>>>>  This is just a hypothesis and, as usual, might be a red herring.
>>>>>> Anyway, could you run:
>>>>>>
>>>>>> $ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock'
>>>>>
>>>>> ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
>>>>>
>>>>> socket:                 864, 4192664,   18604,   25348,49276158,   0,   0
>>>>> tcp_inpcb:              464, 4192664,   34226,   18702,49250593,   0,   0
>>>>> tcpcb:                 1040, 4192665,   18424,   18953,49250593,   0,   0
>>>>> tcptw:                   88,  16425,   15802,     623,14526919,   8,   0
>>>>> tcpreass:                40,  32800,      15,    2285,  632381,   0,   0
>>>>>
>>>>> In the normal case tcptw is about 16425/600/900
>>>>>
>>>>> And after `sysctl -a | grep tcp` the system got stuck on the serial
>>>>> console and I had to reset it.
>>>>>
>>>>>>  Ideally, once when everything is ok, and once when you have the issue,
>>>>>> to see the differences (if any).
>>>>>>
>>>>>>  If it appears you are quite low on tcptw, and if you have enough
>>>>>> memory, could you try increasing the tcptw limit using sysctl
>>>>>
>>>>> I think this will not eliminate the stuck state, just make it less frequent
>>>>
>>>>  You are right, it would just be a big hint that the tcp_tw_2msl_scan()
>>>> contention hypothesis is the right one.  As I see you have plenty of
>>>> memory on your server, could you try with:
>>>>
>>>> net.inet.tcp.maxtcptw=4192665
>>>>
>>>>  And see what happens. Just to validate this hypothesis.
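
 For reference, a minimal sketch of reading this limit programmatically with
sysctlbyname(3), assuming the OID is a plain integer as sysctl(8) reports it
(the tuning itself is just `sysctl net.inet.tcp.maxtcptw=4192665` at runtime):

/* Sketch: read net.inet.tcp.maxtcptw on FreeBSD via sysctlbyname(3). */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int maxtcptw;
	size_t len = sizeof(maxtcptw);

	/* Assumes the OID is an int; pass a new value to set it (needs root). */
	if (sysctlbyname("net.inet.tcp.maxtcptw", &maxtcptw, &len, NULL, 0) == -1) {
		perror("sysctlbyname(net.inet.tcp.maxtcptw)");
		return (EXIT_FAILURE);
	}
	printf("net.inet.tcp.maxtcptw = %d\n", maxtcptw);
	return (EXIT_SUCCESS);
}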
>>>
>>> This is a bad way to validate it: with maxtcptw=16384 the hang happens at
>>> random and can take a month of waiting. With maxtcptw=4192665 I don't know
>>> how long I would need to wait to verify this hypothesis.
>>>
>>> More frequently (maybe 3-5 times per day) there are smaller traffic drops
>>> (not to zero for minutes). Maybe this is also caused by contention in
>>> tcp_tw_2msl_scan, but it resolves quickly (a stochastic process). With its
>>> CPU power eaten, nginx can't service connections, so clients close their
>>> connections, which creates more TIME_WAIT entries and can trigger
>>> tcp_tw_2msl_scan(reuse=1). After this we can get a live lock.
>>>
>>> Maybe after I learn to catch and diagnose this, the validation can be
>>> more accurate.
>>
>> Some more bits:
>>
>> socket:                 864, 4192664,   30806,     790,28524160,   0,   0
>> ipq:                     56,  32802,       0,    1278,    1022,   0,   0
>> udp_inpcb:              464, 4192664,      44,     364,   14066,   0,   0
>> udpcb:                   32, 4192750,      44,    3081,   14066,   0,   0
>> tcp_inpcb:              464, 4192664,   38558,     378,28476709,   0,   0
>> tcpcb:                 1040, 4192665,   30690,     738,28476709,   0,   0
>> tcptw:                   88,  32805,    7868,     772, 8412249,   0,   0
>>
>> last pid: 49575;  load averages:  2.00,  2.05,  3.75    up 1+01:12:08   22:13:42
>> 853 processes: 15 running, 769 sleeping, 35 waiting, 34 lock
>> CPU 0:   0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
>> CPU 1:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 2:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 3:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 4:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 5:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 6:   0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
>> CPU 7:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 8:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> CPU 10:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
>> CPU 11:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Mem: 8659M Active, 8385M Inact, 107G Wired, 1325M Free
>> ARC: 99G Total, 88G MFU, 10G MRU, 32K Anon, 167M Header, 529M Other
>> Swap: 32G Total, 32G Free
>
> I can collect some more info.
> First:
>
>  1046 -         -  R      125:10.79 nginx: worker process (nginx)
>  1047 kqread    -  I      150:11.98 nginx: worker process (nginx)
>  1049 kqread    -  I      145:58.35 nginx: worker process (nginx)
>  1050 kqread    -  I      136:33.36 nginx: worker process (nginx)
>  1051 kqread    -  I      140:59.73 nginx: worker process (nginx)
>  1052 kqread    -  I      137:18.12 nginx: worker process (nginx)
>
> pid 1046 is nginx running on CPU0 (affinity mask set).
>
> # procstat -k -k 1046
>   PID    TID COMM             TDNAME           KSTACK
>  1046 100686 nginx            -                mi_switch+0xd2 critical_exit+0x7e lapic_handle_timer+0xb1 Xtimerint+0x8c __mtx_lock_sleep+0x168 zone_fetch_slab+0x47 zone_import+0x52 zone_alloc_item+0x36 keg_alloc_slab+0x63 keg_fetch_slab+0x16e zone_fetch_slab+0x6e zone_import+0x52 uma_zalloc_arg+0x36e m_getm2+0x14f m_uiotombuf+0x64 sosend_generic+0x356 soo_write+0x42 dofilewrite+0x87
>
> Tracing command nginx pid 1046 tid 100686 td 0xfffff8014485f500
> sched_switch() at 0xffffffff804c956d = sched_switch+0x6ad/frame 0xfffffe20216992a0 /usr/src/sys/kern/sched_ule.c:1973
> mi_switch() at 0xffffffff804a8d92 = mi_switch+0xd2/frame 0xfffffe20216992d0 /usr/src/sys/kern/kern_synch.c:465
> critical_exit() at 0xffffffff804a6bee = critical_exit+0x7e/frame 0xfffffe20216992f0 /usr/src/sys/kern/kern_switch.c:219
> lapic_handle_timer() at 0xffffffff80771701 = lapic_handle_timer+0xb1/frame 0xfffffe2021699330 /usr/src/sys/x86/x86/local_apic.c:1185
> Xtimerint() at 0xffffffff806cbbcc = Xtimerint+0x8c/frame 0xfffffe2021699330 /usr/src/sys/amd64/amd64/apic_vector.S:135
> --- interrupt, rip = 0xffffffff804de424, rsp = 0xfffffe2021699400, rbp = 0xfffffe2021699420 ---
> lock_delay() at 0xffffffff804de424 = lock_delay+0x54/frame 0xfffffe2021699420 /usr/src/sys/kern/subr_lock.c:127
> __mtx_lock_sleep() at 0xffffffff80484dc8 = __mtx_lock_sleep+0x168/frame 0xfffffe20216994a0 /usr/src/sys/kern/kern_mutex.c:512
> zone_fetch_slab() at 0xffffffff806a4257 = zone_fetch_slab+0x47/frame 0xfffffe20216994e0 /usr/src/sys/vm/uma_core.c:2378
> zone_import() at 0xffffffff806a4312 = zone_import+0x52/frame 0xfffffe2021699530 /usr/src/sys/vm/uma_core.c:2501
> zone_alloc_item() at 0xffffffff806a0986 = zone_alloc_item+0x36/frame 0xfffffe2021699570 /usr/src/sys/vm/uma_core.c:2591
> keg_alloc_slab() at 0xffffffff806a2463 = keg_alloc_slab+0x63/frame 0xfffffe20216995d0 /usr/src/sys/vm/uma_core.c:965
> keg_fetch_slab() at 0xffffffff806a48ce = keg_fetch_slab+0x16e/frame 0xfffffe2021699620 /usr/src/sys/vm/uma_core.c:2349
> zone_fetch_slab() at 0xffffffff806a427e = zone_fetch_slab+0x6e/frame 0xfffffe2021699660 /usr/src/sys/vm/uma_core.c:2375
> zone_import() at 0xffffffff806a4312 = zone_import+0x52/frame 0xfffffe20216996b0 /usr/src/sys/vm/uma_core.c:2501
> uma_zalloc_arg() at 0xffffffff806a147e = uma_zalloc_arg+0x36e/frame 0xfffffe2021699720 /usr/src/sys/vm/uma_core.c:2531
> m_getm2() at 0xffffffff8048231f = m_getm2+0x14f/frame 0xfffffe2021699790 /usr/src/sys/kern/kern_mbuf.c:830
> m_uiotombuf() at 0xffffffff80516044 = m_uiotombuf+0x64/frame 0xfffffe20216997e0 /usr/src/sys/kern/uipc_mbuf.c:1535
> sosend_generic() at 0xffffffff8051ce56 = sosend_generic+0x356/frame 0xfffffe20216998a0
> soo_write() at 0xffffffff804fd872 = soo_write+0x42/frame 0xfffffe20216998d0
> dofilewrite() at 0xffffffff804f5c97 = dofilewrite+0x87/frame 0xfffffe2021699920
> kern_writev() at 0xffffffff804f5978 = kern_writev+0x68/frame 0xfffffe2021699970
> sys_writev() at 0xffffffff804f5be6 = sys_writev+0x36/frame 0xfffffe20216999a0
> amd64_syscall() at 0xffffffff806e4051 = amd64_syscall+0x2c1/frame 0xfffffe2021699ab0
> Xfast_syscall() at 0xffffffff806cb2bb = Xfast_syscall+0xfb/frame 0xfffffe2021699ab0
> --- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8019cc6ba, rsp = 0x7fffffffd688, rbp = 0x7fffffffd6c0 ---

  This call stack is quite interesting:

 1: A process is calling writev()
 2: The kernel calls sosend_generic(), which starts allocating memory
 3: This allocation is then interrupted by the timer interrupt handler
[that could actually trigger tcp_tw_2msl_scan(reuse=0)]
 4: The timer interrupt handler seems to wait on sched_switch()

 And fun fact:  When sosend_generic() calls m_uiotombuf() it does not
hold INP_WLOCK yet...
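
 To make the suspected contention pattern easier to picture, here is a small
userland model of it (a sketch only, built on pthreads and a plain TAILQ; the
names tw_entry, twq and the two threads are illustrative, this is not the
actual sys/netinet/tcp_timewait.c code): a scan thread always restarts from
the head of the queue and needs that entry's lock before it can do anything,
while another thread keeps that same lock busy, so the scan spins and burns
CPU without making progress, which is how a live lock looks from top(1):

/*
 * Userland model of the suspected head-of-queue livelock, NOT the FreeBSD
 * kernel code: both tcp_tw_2msl_scan() callers always start with the first
 * entry of the 2MSL queue, so if that entry's lock is constantly busy,
 * the scan keeps retrying without making progress.
 */
#include <sys/queue.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct tw_entry {
	pthread_mutex_t		lock;	/* models the per-inp INP_WLOCK */
	TAILQ_ENTRY(tw_entry)	link;
};

static TAILQ_HEAD(, tw_entry) twq = TAILQ_HEAD_INITIALIZER(twq);
static pthread_mutex_t twq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models the scan: always restarts from the queue head. */
static void *
scan_thread(void *arg)
{
	struct tw_entry *e;
	unsigned long tries = 0, expired = 0;

	(void)arg;
	for (;;) {
		pthread_mutex_lock(&twq_lock);
		e = TAILQ_FIRST(&twq);
		if (e != NULL && pthread_mutex_trylock(&e->lock) == 0) {
			/* Got the entry lock: "expire" it by requeueing. */
			TAILQ_REMOVE(&twq, e, link);
			TAILQ_INSERT_TAIL(&twq, e, link);
			pthread_mutex_unlock(&e->lock);
			expired++;
		}
		pthread_mutex_unlock(&twq_lock);
		if (++tries % 10000000UL == 0)
			printf("scan: %lu tries, %lu expired\n", tries, expired);
	}
	return (NULL);
}

/* Models the connection path keeping the head entry's lock busy. */
static void *
busy_thread(void *arg)
{
	struct tw_entry *e = arg;

	for (;;) {
		pthread_mutex_lock(&e->lock);
		usleep(1000);		/* hold the lock while "working" */
		pthread_mutex_unlock(&e->lock);
	}
	return (NULL);
}

int
main(void)
{
	static struct tw_entry entry;
	pthread_t scan, busy;

	pthread_mutex_init(&entry.lock, NULL);
	TAILQ_INSERT_HEAD(&twq, &entry, link);
	pthread_create(&busy, NULL, busy_thread, &entry);
	pthread_create(&scan, NULL, scan_thread, NULL);
	pthread_join(scan, NULL);	/* runs forever; ^C to stop */
	return (0);
}

 Compile with `cc -o twmodel twmodel.c -lpthread`; the tries counter climbs
far faster than the expired counter, i.e. lots of work, little progress.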

--
Julien


--LOEuSf0BIKjR7HVfK3G906jP9JLEU0oiC--

--TtwNGnolL41w7XTiubjCTpMUQ4E3GcXkQ
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJX6OF/AAoJEKVlQ5Je6dhxPtgIAOkdrwJWREyUwwRewrVTz/GR
eohxzAM6NBuw7shSEvt/8qgaST8ClzxzuGQgrAf2+hevxNwa4Fo2cGqyfRdBp57v
EeDKk2N/zJsiSCRr7HMbj/FWX+Nioa8IvxiABZx9SIhdTPtYD9lIUgSSrNEGWUrQ
qw5VA+/i0ufm0pqLr8onZFSauZPRduHb2hpZFnCaxrNK3CeG2V6xYziP8f4anjyy
F5P+exeiQ+iPZZ50+E49u2xOnnla3LwnoEO9MTZ/lSjLrVw+cc0L8HHfsFb+euA5
TgamvpQ9eSkx3yQsn1sc2ejc8+mvLQPz0eMLv1CrgCcbz0pHc11WAnVyrhJzc28=
=i032
-----END PGP SIGNATURE-----

--TtwNGnolL41w7XTiubjCTpMUQ4E3GcXkQ--


