From owner-freebsd-hackers@freebsd.org  Wed Mar 16 20:29:55 2016
Return-Path: <owner-freebsd-hackers@freebsd.org>
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id CA5B3AD3391
 for <freebsd-hackers@mailman.ysv.freebsd.org>;
 Wed, 16 Mar 2016 20:29:55 +0000 (UTC)
 (envelope-from stas@freebsd.org)
Received: from mx0.deglitch.com (mx0.deglitch.com
 [IPv6:2a00:13c0:63:7194:1::3])
 by mx1.freebsd.org (Postfix) with ESMTP id 567C186D;
 Wed, 16 Mar 2016 20:29:55 +0000 (UTC)
 (envelope-from stas@freebsd.org)
Received: from [IPv6:2620:10d:c082:1803:e9ee:be74:4bb7:1ef6] (unknown
 [IPv6:2620:10d:c090:200::8:4655])
 by mx0.deglitch.com (Postfix) with ESMTPSA id 35C308FC0B;
 Wed, 16 Mar 2016 13:29:50 -0700 (PDT)
From: Stanislav Sedov <stas@FreeBSD.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: Peformance issues with r278325
Date: Wed, 16 Mar 2016 13:29:47 -0700
Message-Id: <FA50A68E-7F3D-4361-8A8A-EB7F97EF3D00@FreeBSD.org>
To: freebsd-hackers@freebsd.org
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
X-Mailer: Apple Mail (2.3112)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Mar 2016 20:29:55 -0000

Hi!

We're seeing weird performance issues on some of our systems on both
FreeBSD stable/10 and HEAD with r278325.  In particular, the part of the
change that switched the lapic_ipi_wait to microsecond-based delay seem
to cause slowdowns.

Some background.  Out systems mostly run a single large application that
is both I/O heavy and memory allocation heavy.  It is doing network =
traffic
processing, but that does not seem to be the culprit.  What we noticed =
is that
when running on FreeBSD 10.2+ or HEAD the memory allocations (and, =
particularly,
munmaps) become exponentially slower the bigger the application becomes =
in terms
of the memory space allocated.  This does not seem to happen on FreeBSD =
10.1
or stable/10 with the lapic_ipi_wait() change reverted.

The top output when this slowdown happens shows the following:
   =
__________________________________________________________________________=
_________
  /
 | 1585 user   -21  r31 20225M 17260M vm map  5 483:20  78.47% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M *proce 11 483:26  77.29% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M vm map 15 484:05  76.86% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M CPU21  21 262:49  76.86% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M CPU26  26 473:31  76.76% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M RUN     5 462:01  76.56% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M vm map  2 282:16  76.37% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M CPU4    4 475:24  76.27% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M CPU38  38 366:16  76.17% =
[beam.smp{beam.smp}]
 | 1585 user   -21  r31 20225M 17260M vm map 16 434:35  76.07% =
[beam.smp{beam.smp}]
  =
`-------------------------------------------------------------------------=
-----------

I had a pmcstat output as well somewhere, but can't find it at the =
moment.  But it was
indicating that a lot of time has been spend in VM mutexes as far as I =
remember.

I'm not entirely sure why this change would cause any effects like this. =
 One theory
I have is that TLB shootdowns become slower due to lapic_ipi_wait taking =
longer than
before to complete (as it was using pause instructions instead of =
explicit delays).

Any ideas?  Anything else I can look at to track the reason down?  Out =
platform is
Supermicro X9DRi-LN4+ with dual Xeon(R) CPU E5-2690 v2 CPUs.  I'm not =
sure if it's
platform specific or not though.  I've seen that happen on servers with =
E5-2690 v3
as well.

Thanks!

  =
__________________________________________________________________________=
___________
 /
|  int
|  lapic_ipi_wait(int delay)
|  {
| -       int x;
| +       int x, incr;
| =20
|         /*
| -        * Wait delay microseconds for IPI to be sent.  If delay is
| +        * Wait delay loops for IPI to be sent.  This is highly bogus
| +        * since this is sensitive to CPU clock speed.  If delay is
|          * -1, we wait forever.
|          */
|         if (delay =3D=3D -1) {
| -               while ((lapic->icr_lo & APIC_DELSTAT_MASK) !=3D =
APIC_DELSTAT_IDLE)
| -                       ia32_pause();
| -               return (1);
| -       }
| -
| -       for (x =3D 0; x < delay; x +=3D 5) {
| +               incr =3D 0;
| +               delay =3D 1;
| +       } else
| +               incr =3D 1;
| +       for (x =3D 0; x < delay; x +=3D incr) {
|                 if ((lapic->icr_lo & APIC_DELSTAT_MASK) =3D=3D =
APIC_DELSTAT_IDLE)
|                         return (1);
| -               DELAY(5);
| +               ia32_pause();
|         }
|         return (0);
|  }
| @@ -1433,9 +1433,9 @@ lapic_ipi_raw(register_t icrlo, u_int dest)
|         intr_restore(saveintr);
|  }
| =20
| -#define        BEFORE_SPIN     50000
| +#define        BEFORE_SPIN     1000000
|  #ifdef DETECT_DEADLOCK
| -#define        AFTER_SPIN      50
| +#define        AFTER_SPIN      1000
|  #endif
 =
`-------------------------------------------------------------------------=
----------

--
Stanislav Sedov
ST4096-RIPE