From: Paul
Date: Fri, 01 Sep 2017 18:21:00 +0300
Subject: High CPU usage in kernel on highly contended lock file
To: freebsd-stable@freebsd.org, freebsd-hackers@freebsd.org
Message-Id: <1504278581.38180443.tad3fj7o@frv33.fwdcdn.com>

It seems that a lot of CPU time is spent in the kernel when many processes
try to get an exclusive lock on the same file concurrently. By "many" I mean
hundreds.

There appears to be an initial cost to each fcntl() call. Each process that
tries to lock the file consumes some amount of CPU and then cools down.
However, every repeated fcntl() call consumes the same amount of resources
again. It looks as if entering the "wait queue" is the expensive part.

Environment:

# uname -a
FreeBSD test.com 11.1-STABLE FreeBSD 11.1-STABLE #0 r322650: Thu Aug 31 19:49:49 EEST 2017 root@test.com:/usr/obj/usr/src/sys/SERVER amd64

Test case:

test.c:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Modified from the SIGCHLD handler, so it must be sig_atomic_t. */
    static volatile sig_atomic_t child_count = 0;

    /* Note: simultaneous exits may be coalesced into one SIGCHLD,
     * so this count is only approximate. */
    static void schild_handler(int sig)
    {
        (void)sig;
        --child_count;
    }

    /* Empty handler: its only job is to interrupt fcntl() with EINTR. */
    static void alarm_handler(int sig)
    {
        (void)sig;
    }

    void lock_write(int fd)
    {
        struct flock fl;

        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 1;

        do {
            // Simulate interruption with alarm to re-enter the wait queue.
            alarm(1);
        } while (fcntl(fd, F_SETLKW, &fl) < 0);
    }

    int main(int argc, char ** argv)
    {
        if (argc < 2) {
            return 1;
        }

        signal(SIGCHLD, schild_handler);

        // Installed without SA_RESTART so that SIGALRM interrupts F_SETLKW.
        struct sigaction sig_action;
        memset(&sig_action, 0, sizeof sig_action);
        sig_action.sa_handler = alarm_handler;
        sigemptyset(&sig_action.sa_mask);
        sigaction(SIGALRM, &sig_action, NULL);

        int fd = open(argv[1], O_CREAT|O_RDWR, 0777);

        for (int i = 0; i < 300; ++i) {
            pid_t child_pid = fork();

            if (!child_pid) {
                // Lock the descriptor.
                lock_write(fd);

                // Simulate some work.
                sleep(1);
                return 0;
            }

            ++child_count;
        }

        do {
            printf("\rchild count: %5d\n", (int)child_count);
            sleep(1);
        } while (child_count);

        return 0;
    }

Commands:

# cd /tmp
# ~~~~~ Create test.c
# clang -o test test.c
# ./test 11111

Note that on Linux, even if 1000 children are spawned instead of 300, none of
them ever show up in top.

This is a huge problem for us, because our current software uses lock files
for synchronization. At times when a lot of processes of said software are
spawned (prime time), the system becomes totally unresponsive, with a load
average over 1000.
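
For anyone adapting the test case: the loop in lock_write() retries on every fcntl() failure, which is what the test wants, but it would also spin forever on an error that is not an interruption (a bad descriptor, for example). A minimal sketch of a variant that re-enters the wait only on EINTR is below. The helper name lock_byte0() is hypothetical and not part of the original test; the sketch assumes the same one-byte lock region at offset 0 and is not meant as a fix for the CPU usage itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): apply a lock of the
 * given type (F_WRLCK, F_RDLCK or F_UNLCK) to byte 0 of fd, blocking
 * until it is granted. Returns 0 on success, -1 with errno set on any
 * error other than EINTR.
 */
static int lock_byte0(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;

    for (;;) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;       /* lock granted (or released for F_UNLCK) */
        if (errno != EINTR)
            return -1;      /* real error: do not re-enter the wait queue */
        /* EINTR: a signal interrupted the wait, so try again. */
    }
}
```

Calling lock_byte0(fd, F_WRLCK) in the child in place of lock_write(fd), with the alarm(1) call kept by the caller if the periodic interruption is still wanted, should keep the contention pattern the same while making unexpected errors visible.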