From owner-freebsd-performance@FreeBSD.ORG Mon Nov 22 00:22:22 2010
From: Ivan Voras
Date: Mon, 22 Nov 2010 01:21:58 +0100
To: freebsd-performance@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: PostgreSQL performance scaling
List-Id: Performance/tuning

This is not a request for help but a report, in case it helps
developers or someone in the future.

The setup is:

  AMD64 machine, 24 GB RAM, 2x6-core Xeon CPU + HTT (24 logical CPUs)
  FreeBSD 8.1-stable, AMD64
  PostgreSQL 9.0.1, 10 GB shared buffers, using pgbench with a scale
  factor of 500 (7.5 GB database)

With pgbench -S (SELECT queries only, no disk IO) the performance
curve is:

  -c#   result
    4   33549
    8   64864
   12   79491
   16   79887
   20   66957
   24   52576
   28   50406
   32   49491
   40   45535
   50   39499
   75   29415

After 16 clients (which is still good since there are only 12 "real"
cores in the system), the performance drops sharply. Looking at the
processes' state, most of them seem to spend their time in system
calls (i.e. executing in the kernel) in states "semwait" and "sbwait",
i.e. semaphore wait and socket buffer wait, for example:

 3107 pgsql  1  62  0 10533M  439M CPU1    0  0:02 13.57% postgres
 3105 pgsql  1  63  0 10533M  438M CPU9    9  0:02 13.57% postgres
 3109 pgsql  1  62  0 10533M  440M sbwait 13  0:02 13.48% postgres
 3106 pgsql  1  61  0 10533M  445M sbwait  8  0:02 13.48% postgres
 3118 pgsql  1  62  0 10533M  431M sbwait 21  0:02 13.48% postgres
 3114 pgsql  1  63  0 10533M  434M sbwait 19  0:02 13.38% postgres
 3122 pgsql  1  63  0 10533M  428M sbwait 15  0:02 13.28% postgres
 3108 pgsql  1  63  0 10533M  439M sbwait  5  0:02 13.18% postgres
 3116 pgsql  1  62  0 10533M  432M sbwait 11  0:02 13.18% postgres
 3113 pgsql  1  62  0 10533M  430M semwai 20  0:02 13.18% postgres
 3115 pgsql  1  62  0 10533M  428M RUN    14  0:02 13.18% postgres

The "semwait" part is from PostgreSQL - probably shared buffer locking,
but there's a large number of processes regularly in sbwait - maybe
something can be optimized here? This is IPC over Unix sockets.

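One way to read the curve above is to normalise every result against
the peak. The following is only a minimal Python sketch and not part of
the original report: the function name scaling_summary is made up for
illustration, and the numbers are copied from the table above.

```python
# Figures copied from the "pgbench -S" table: client count -> result.
results = {4: 33549, 8: 64864, 12: 79491, 16: 79887, 20: 66957,
           24: 52576, 28: 50406, 32: 49491, 40: 45535, 50: 39499,
           75: 29415}


def scaling_summary(results):
    """Return the client count at the peak and one row per run.

    Each row is (clients, result, result per client, fraction of peak).
    """
    peak_clients = max(results, key=results.get)
    peak = results[peak_clients]
    rows = [(clients, value, value / clients, value / peak)
            for clients, value in sorted(results.items())]
    return peak_clients, rows


peak_clients, rows = scaling_summary(results)
print(f"peak at {peak_clients} clients")
for clients, value, per_client, of_peak in rows:
    print(f"{clients:>3} {value:>6} {per_client:>9.1f} {of_peak:>7.1%}")
```

On these figures the peak is at 16 clients, and the 75-client run
reaches roughly 37% of it.
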
From owner-freebsd-performance@FreeBSD.ORG Mon Nov 22 06:14:31 2010
From: "Mark Felder"
Date: Sun, 21 Nov 2010 23:49:16 -0600
To: freebsd-performance@freebsd.org
Subject: Re: PostgreSQL performance scaling

I recommend posting this on the Postgres performance list, too.

Regards,

Mark

From owner-freebsd-performance@FreeBSD.ORG Mon Nov 22 08:37:23 2010
From: David Xu
Date: Mon, 22 Nov 2010 16:37:26 +0000
To: Mark Felder
Cc: freebsd-performance@freebsd.org
Message-ID: <4CEA9C46.8010507@freebsd.org>
Subject: Re: PostgreSQL performance scaling

Mark Felder wrote:
> I recommend posting this on the Postgres performance list, too.
>
> Regards,
>
> Mark

I think if PostgreSQL uses semaphores for inter-process locking, it
might be a good idea to use the POSIX semaphores that exist in our head
branch. The new POSIX semaphore implementation now supports
process-shared semaphores and is more lightweight than SYSV semaphores:
if there is no contention, a process need not enter the kernel to
acquire/release a lock. Note that I have just fixed a bug in the head
branch. However, RELENG_8 does not support process-shared semaphores
yet.

Regards,
David Xu

From owner-freebsd-performance@FreeBSD.ORG Tue Nov 23 00:30:07 2010
From: Ivan Voras
Date: Tue, 23 Nov 2010 01:26:27 +0100
To: freebsd-performance@freebsd.org
References: <4CEA9C46.8010507@freebsd.org>
In-Reply-To: <4CEA9C46.8010507@freebsd.org>
Subject: Re: PostgreSQL performance scaling

On 11/22/10 17:37, David Xu wrote:
> Mark Felder wrote:
>> I recommend posting this on the Postgres performance list, too.
>>
>> Regards,
>>
>> Mark
>
> I think if PostgreSQL uses semaphore for inter-process locking,
> it might be a good idea to use POSIX semaphore that exists in our
> head branch, the new POSIX semaphore implementation now supports
> process-shared, and is more light weight than SYSV semaphore,
> if there is no contention, a process need not enter kernel to
> acquire/release a lock. Note that I have just fixed a bug in head
> branch. However RELENG_8 does not support process-shared semaphore
> yet.

Another thing might be that, although they appear to try to avoid it,
they may have a large number of processes waiting on the same
semaphore, leading to a thundering herd problem.

There already is code for POSIX semaphores in PostgreSQL. Enabling it
requires some manual fiddling with the configuration
(USE_UNNAMED_POSIX_SEMAPHORES).

However, I've just tried it on 9-CURRENT and it doesn't work:

Nov 23 01:23:02 biggie postgres[1515]: [1-1] FATAL: sem_init failed: No space left on device

PostgreSQL calls it as "sem_init(sem, 1, 1);"

One more thing: apparently I had to kldload sem.ko, which looks like an
error, since it is in GENERIC in 8-STABLE!

From owner-freebsd-performance@FreeBSD.ORG Tue Nov 23 00:51:36 2010
From: Ivan Voras
Date: Tue, 23 Nov 2010 01:51:21 +0100
To: freebsd-performance@freebsd.org
References: <4CEA9C46.8010507@freebsd.org>
Subject: Re: PostgreSQL performance scaling

On 11/23/10 01:26, Ivan Voras wrote:
> On 11/22/10 17:37, David Xu wrote:
>> Mark Felder wrote:
>>> I recommend posting this on the Postgres performance list, too.
>>>
>>> Regards,
>>>
>>> Mark
>>
>> I think if PostgreSQL uses semaphore for inter-process locking,
>> it might be a good idea to use POSIX semaphore that exists in our
>> head branch, the new POSIX semaphore implementation now supports
>> process-shared, and is more light weight than SYSV semaphore,
>> if there is no contention, a process need not enter kernel to
>> acquire/release a lock. Note that I have just fixed a bug in head
>> branch. However RELENG_8 does not support process-shared semaphore
>> yet.
>
> Another thing might be that, despite that they appear to try to avoid
> it, they possibly have a large number of processes hanging on the same
> semaphore, leading to thundering herd problem.
>
> There already is code for POSIX semaphores in PostgreSQL. It requires
> some manual fiddling with the configuration to enable
> (USE_UNNAMED_POSIX_SEMAPHORES).
>
> However, I've just tried it on 9-CURRENT and it doesn't work:
>
> Nov 23 01:23:02 biggie postgres[1515]: [1-1] FATAL: sem_init failed: No
> space left on device

Ok, I've found the p1003_1b.sem_nsems_max sysctl.

Using POSIX semaphores instead of sysv semaphores seems to help, but
very little:

sysv semaphores:

  -c#   result
    4   33549
    8   64864
   12   79491
   16   79887
   20   66957
   24   52576
   28   50406
   32   49491
   40   45535
   50   39499
   75   29415

posix semaphores:

   16   79125
   20   70061
   24   55620

After 20 clients, sys time goes sharply up like before:

 procs      memory      page                    disks     faults         cpu
 r  b w     avm    fre   flt  re  pi  po    fr  sr mf0 mf1   in     sy     cs us sy id
27 32 0  11887M  3250M 62442   0   0   0     0   0   0   0   10 255078 109047 18 73 10
30 32 0  11887M  3162M 58165   0   0   0    12   0   0   1    7 272540 114416 17 75  9
29 32 0  11887M  3105M 57487   0   0   0     0   0   0   0    8 279475 117891 15 75 10
16 31 0  11887M  3063M 59215   0   0   0     0   0   0   0    6 295342 121090 16 70 13

and the overall behaviour is similar - the processes spend a lot of
time in "sbwait" and "ksem" states.

From owner-freebsd-performance@FreeBSD.ORG Tue Nov 23 01:35:39 2010
From: David Xu
Date: Tue, 23 Nov 2010 09:35:43 +0000
To: Ivan Voras
Cc: freebsd-performance@freebsd.org
Message-ID: <4CEB8AEF.7030202@freebsd.org>
References: <4CEA9C46.8010507@freebsd.org>
Subject: Re: PostgreSQL performance scaling

Ivan Voras wrote:
> On 11/23/10 01:26, Ivan Voras wrote:
>> On 11/22/10 17:37, David Xu wrote:
>>> Mark Felder wrote:
>>>> I recommend posting this on the Postgres performance list, too.
>>>>
>>>> Regards,
>>>>
>>>> Mark
>>>
>>> I think if PostgreSQL uses semaphore for inter-process locking,
>>> it might be a good idea to use POSIX semaphore that exists in our
>>> head branch, the new POSIX semaphore implementation now supports
>>> process-shared, and is more light weight than SYSV semaphore,
>>> if there is no contention, a process need not enter kernel to
>>> acquire/release a lock. Note that I have just fixed a bug in head
>>> branch. However RELENG_8 does not support process-shared semaphore
>>> yet.
>>
>> Another thing might be that, despite that they appear to try to avoid
>> it, they possibly have a large number of processes hanging on the same
>> semaphore, leading to thundering herd problem.
>>
>> There already is code for POSIX semaphores in PostgreSQL. It requires
>> some manual fiddling with the configuration to enable
>> (USE_UNNAMED_POSIX_SEMAPHORES).
>>
>> However, I've just tried it on 9-CURRENT and it doesn't work:
>>
>> Nov 23 01:23:02 biggie postgres[1515]: [1-1] FATAL: sem_init failed: No
>> space left on device
>
> Ok, I've found the p1003_1b.sem_nsems_max sysctl.
>
> It seems to help when used instead of sysv semaphores, but very little:
>
> sysv semaphores:
>
>   -c#   result
>     4   33549
>     8   64864
>    12   79491
>    16   79887
>    20   66957
>    24   52576
>    28   50406
>    32   49491
>    40   45535
>    50   39499
>    75   29415
>
> posix semaphores:
>
>    16   79125
>    20   70061
>    24   55620
>
> After 20 clients, sys time goes sharply up like before
>
>  procs      memory      page                    disks     faults         cpu
>  r  b w     avm    fre   flt  re  pi  po    fr  sr mf0 mf1   in     sy     cs us sy id
> 27 32 0  11887M  3250M 62442   0   0   0     0   0   0   0   10 255078 109047 18 73 10
> 30 32 0  11887M  3162M 58165   0   0   0    12   0   0   1    7 272540 114416 17 75  9
> 29 32 0  11887M  3105M 57487   0   0   0     0   0   0   0    8 279475 117891 15 75 10
> 16 31 0  11887M  3063M 59215   0   0   0     0   0   0   0    6 295342 121090 16 70 13
>
> and the overall behaviour is similar - the processes spend a lot of time
> in "sbwait" and "ksem" states.

Strange: the POSIX semaphore implementation in the head branch does not
use ksem; it is based on umtx. There is no limit on the number of POSIX
semaphores other than the process's address space, which bounds how
many semaphores can be used.

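To put the quoted sysv and posix figures side by side, the relative
change per client count can be computed directly. This is a small
Python sketch rather than anything posted in the thread: the numbers
are copied from the two tables quoted above, and relative_change is an
illustrative helper name.

```python
# Results copied from the quoted tables: client count -> result.
sysv = {16: 79887, 20: 66957, 24: 52576}
posix = {16: 79125, 20: 70061, 24: 55620}


def relative_change(before, after):
    """Fractional change from `before` to `after` for shared keys."""
    shared = sorted(before.keys() & after.keys())
    return {key: (after[key] - before[key]) / before[key]
            for key in shared}


change = relative_change(sysv, posix)
for clients, fraction in change.items():
    print(f"{clients:>3} clients: {fraction:+.1%}")
```

That works out to roughly -1.0% at 16 clients, +4.6% at 20 and +5.8% at
24, which matches the "helps, but very little" reading.
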
From owner-freebsd-performance@FreeBSD.ORG Tue Nov 23 02:15:34 2010
From: Ivan Voras
Date: Tue, 23 Nov 2010 02:45:19 +0100
To: David Xu
Cc: freebsd-performance@freebsd.org
In-Reply-To: <4CEB8AEF.7030202@freebsd.org>
References: <4CEA9C46.8010507@freebsd.org> <4CEB8AEF.7030202@freebsd.org>
Subject: Re: PostgreSQL performance scaling

On 23 November 2010 10:35, David Xu wrote:
> Ivan Voras wrote:

>> and the overall behaviour is similar - the processes spend a lot of time
>> in "sbwait" and "ksem" states.
>>
> Strange, the POSIX semaphore in head branch does not use ksem, it is
> based on umtx, there is no limit on POSIX semaphore, the only limit
> is process's address space which limits how many semaphores can be
> used.

*shrug*; I don't know how it could be wrong - this PostgreSQL was
built from ports after I upgraded & booted 9-current.

If it didn't use POSIX semaphores from HEAD, shared semaphores
wouldn't have worked, right?

From owner-freebsd-performance@FreeBSD.ORG Tue Nov 23 02:34:59 2010
From: David Xu
Date: Tue, 23 Nov 2010 10:35:02 +0000
To: Ivan Voras
Cc: freebsd-performance@freebsd.org
Message-ID: <4CEB98D6.40902@freebsd.org>
References: <4CEA9C46.8010507@freebsd.org> <4CEB8AEF.7030202@freebsd.org>
Subject: Re: PostgreSQL performance scaling

Ivan Voras wrote:
> On 23 November 2010 10:35, David Xu wrote:
>> Ivan Voras wrote:
>
>>> and the overall behaviour is similar - the processes spend a lot of time
>>> in "sbwait" and "ksem" states.
>>>
>> Strange, the POSIX semaphore in head branch does not use ksem, it is
>> based on umtx, there is no limit on POSIX semaphore, the only limit
>> is process's address space which limits how many semaphores can be
>> used.
>
> *shrug*; I don't know how it could be wrong - this PostgreSQL was
> built from ports after I upgraded & booted 9-current.
>
> If it didn't use POSIX semaphores from HEAD, shared semaphores
> wouldn't have worked, right?
>
It may work, but even if it is shared in memory, it still enters the
kernel to do P/V operations.

From owner-freebsd-performance@FreeBSD.ORG Wed Nov 24 01:40:07 2010
From: Luis Neves
Date: Wed, 24 Nov 2010 01:30:01 +0000
To: freebsd-performance@freebsd.org
Cc: freebsd-hackers@freebsd.org
Subject: Re: PostgreSQL performance scaling

On 11/22/2010 12:21 AM, Ivan Voras wrote:

> The "semwait" part is from PostgreSQL - probably shared buffer locking,
> but there's a large number of processes regularly in sbwait - maybe
> something can be optimized here?

I think this paper was mentioned before; did you read it?
"An Analysis of Linux Scalability to Many Cores"

ABSTRACT.
"This paper analyzes the scalability of seven system applications
(Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce)
running on Linux on a 48-core computer."

The paper is about Linux, but it also focuses on some changes that can
be made to PostgreSQL to achieve better concurrency.

--
Luis Neves

From owner-freebsd-performance@FreeBSD.ORG Thu Nov 25 09:50:16 2010
From: Yar Tikhiy
Date: Thu, 25 Nov 2010 20:20:35 +1100
To: freebsd-performance@freebsd.org
Subject: Poor RAID performance demystified

Hi all,

This issue has been raised periodically on various lists and forums,
and I recently ran into it myself, so I feel I should post my findings
here.

Every now and then somebody complains about extremely poor RAID
performance. What those reports have in common is that they usually
mention FreeBSD and HP RAID controllers, and all of them are about load
patterns from PostgreSQL. We are about to see why.

People get surprisingly low disk I/O performance (e.g., 1-2MB/s) in
spite of numerous spindles striped in the array when the benchmark
involves a lot of tiny DB transactions. On the same array, sequential
read and write rates can be more than satisfactory.

That happens because PostgreSQL in its default configuration is
*remarkably* stringent about flushing every transaction out to the disk
before proceeding to the next one. The PG folks know that well. But,
as is known from practice, the application flushing its data would not
by itself be enough for this effect to be so pronounced. What _might_
be happening here is that HP RAIDs as driven by FreeBSD fully comply
with flush requests all the way down the disk stack, whereas other
popular RAID / OS combos can effectively ignore them to a certain
extent due to latent write-back caching, e.g., in the drives.

Why does striping fail to speed things up? Because the transactions
are tiny and every disk write ends up blocked waiting for a single
spindle to handle it. No striping can speed up 8K or 16K synchronous
writes because they are seek limited, not bandwidth limited.
(Likewise, no RAID or cache can speed up highly random reads of just a
few blocks each, as reads are synchronous by nature: you can't know
the data before it has been read in.)

It is easy to check if you are hitting this kind of bottleneck. While
running your benchmark, watch the output from iostat or systat -vm or
gstat.
The average I/O size will closely match the FS block size (the default
is 16K now on FFS) and the tps (transfers per second) value will be
quite close to your disks' rotation rate expressed in revolutions per
second. E.g., with 10K RPM disks you are going to get 10000 / 60 =
~170 tps, and with 15K RPM disks it'll be around 250 tps. You are just
hitting very basic laws of nature and logic here.

The final question is, of course, what to do about this issue. First
of all, decide whether 150 or 200 write transactions per second are
really not enough for your task. Your actual load pattern can be quite
different from that in the benchmark. If you still need greater write
performance on tiny transactions, consider getting a battery backup
unit (BBU) for your RAID adapter. Quite remarkably, HP refer to them
as "Write-back Cache Enablers" because installing one is the only way
to get an HP RAID adapter to do write-back caching. A write-back cache
with BBU will let the adapter delay and coalesce tiny writes without
jeopardizing the DB integrity. However, you'll need to trust your BBU,
as your DB integrity will be staked on it (the PG folks are somewhat
skeptical about BBUs).

On the other hand, just fiddling with the PG settings to disable
transaction flushing is a certain recipe for disaster. Fortunately,
there is a trade-off mode in PG where it does transaction coalescing by
itself -- search for synchronous_commit. The downside is that, should
the system crash, a few of the most recent transactions can be lost
after they were reported as successful to the SQL client. That can be
OK or not OK depending on the task, and synchronous_commit can be
toggled on a per-session or per-transaction basis to finely tune the
trade-off.

That's it, folks.

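The rule of thumb above can be written down as a few lines of
arithmetic. This Python sketch assumes, as the text does, that at most
one synchronous write completes per disk revolution; the helper name
sync_write_ceiling is hypothetical, and the 8K and 16K sizes are the
write sizes mentioned earlier in this message.

```python
def sync_write_ceiling(rpm, io_size_kib):
    """Estimate (tps, MiB/s) for small synchronous writes.

    Assumption: one write completes per revolution of a single
    spindle, so tps equals revolutions per second.
    """
    tps = rpm / 60.0
    mib_per_s = tps * io_size_kib / 1024.0
    return tps, mib_per_s


for rpm in (10000, 15000):
    for io_size_kib in (8, 16):
        tps, mib = sync_write_ceiling(rpm, io_size_kib)
        print(f"{rpm} RPM, {io_size_kib}K writes: "
              f"~{tps:.0f} tps, ~{mib:.1f} MiB/s")
```

With 10K RPM disks and 8K writes this gives about 167 tps and about
1.3 MiB/s, which is in line with the 1-2MB/s figures people report.
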
Thanks,
Yar

From owner-freebsd-performance@FreeBSD.ORG Thu Nov 25 11:10:35 2010
From: Ivan Voras
Date: Thu, 25 Nov 2010 12:10:28 +0100
To: freebsd-performance@freebsd.org
Subject: Re: Poor RAID performance demystified

On 11/25/10 10:20, Yar Tikhiy wrote:

> If you
> still need greater write performance on tiny transactions, consider
> getting a battery backup unit (BBU) for your RAID adapter. Quite
> remarkably, HP refer to them as "Write-back Cache Enablers" because
> installing one is the only way to get an HP RAID adapter to do
> write-back caching. A write-back cache with BBU will let the adapter
> delay and coalesce tiny writes without jeopardizing the DB integrity.
> However, you'll need to trust your BBU as your DB integrity will be
> staked on it (the PG folks are somewhat skeptical about BBUs).

HP also has (and probably so do others by now) capacitor-backed flash
caches; the theory is to have a chunk of flash memory that is fast at
random IO and to use the capacitor to keep the power up for as long as
the flash needs to write its large blocks. I've tried it and the
performance is good, but I don't have it in production yet.