From: Alan Somers <asomers@gmail.com>
Date: Sat, 20 Apr 2024 11:23:41 -0600
Subject: Re: Stressing malloc(9)
To: Mark Johnston
Cc: FreeBSD Hackers
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Sat, Apr 20, 2024 at 9:07 AM Mark Johnston wrote:
>
> On Fri, Apr 19, 2024 at 04:23:51PM -0600, Alan Somers wrote:
> > TLDR;
> > How can I create a workload that causes malloc(9)'s performance to
> > plummet?
> >
> > Background:
> > I recently witnessed a performance problem on a production server.
> > Overall throughput dropped by over 30x.  dtrace showed that 60% of
> > the CPU time was dominated by lock_delay as called by three
> > functions: printf (via ctl_worker_thread), g_eli_alloc_data, and
> > g_eli_write_done.  One thing those three have in common is that they
> > all use malloc(9).  Fixing the problem was as simple as telling CTL
> > to stop printing so many warnings, by tuning
> > kern.cam.ctl.time_io_secs=100000.
> >
> > But even with CTL quieted, dtrace still reports ~6% of the CPU
> > cycles in lock_delay via g_eli_alloc_data.  So I believe that malloc
> > is limiting geli's performance.  I would like to try replacing it
> > with uma(9).
>
> What is the size of the allocations that g_eli_alloc_data() is doing?
> malloc() is a pretty thin layer over UMA for allocations <= 64KB.
> Larger allocations are handled by a different path (malloc_large())
> which goes directly to the kmem_* allocator functions.  Those
> functions are very expensive: they're serialized by global locks and
> need to update the pmap (and perform TLB shootdowns when memory is
> freed).  They're not meant to be used at a high rate.

In my benchmarks so far, 512B.  In the real application the size is
mostly between 4k and 16k, and it's always a multiple of 4k.  But it's
sometimes large enough to use malloc_large, and it's those
malloc_large calls that account for the majority of the time spent in
g_eli_alloc_data.  lockstat shows that malloc_large, as called by
g_eli_alloc_data, sometimes blocks for multiple ms.

But oddly, if I change the parameters so that g_eli_alloc_data
allocates 128kB, I still don't see malloc_large getting called.  And
both dtrace and vmstat show that malloc is mostly operating on 512B
allocations.  But dtrace does confirm that g_eli_alloc_data is being
called with 128kB arguments.  Maybe something is getting inlined?  I
don't understand how this is happening.  I could probably figure it
out if I recompile with some extra SDT probes, though.

> My first guess would be that your production workload was hitting
> this path, and your benchmarks are not.  If you have stack traces or
> lock names from DTrace, that would help validate this theory, in
> which case using UMA to cache buffers would be a reasonable solution.

Would that require creating an extra UMA zone for every possible geli
allocation size above 64kB?

> > But on a non-production server, none of my benchmark workloads
> > causes g_eli_alloc_data to break a sweat.  I can't get its CPU
> > consumption to rise higher than 0.5%.  And that's using the
> > smallest sector size and block size that I can.
> >
> > So my question is: does anybody have a program that can really
> > stress malloc(9)?
> > I'd like to run it in parallel with my geli benchmarks to
> > see how much it interferes.
> >
> > -Alan