From owner-freebsd-fs@FreeBSD.ORG Tue Mar 31 14:55:05 2015
Message-Id: <1427813698.641733.247585797.28816738@webmail.messagingengine.com>
From: Mark Felder
To: freebsd-fs@freebsd.org
In-Reply-To: <5519F74C.1040308@artem.ru>
References: <55170D9C.1070107@artem.ru> <1427727936.293597.247070269.5CE0D411@webmail.messagingengine.com>
 <55196FC7.8090107@artem.ru> <1427730597.303984.247097389.165D5AAB@webmail.messagingengine.com>
 <5519716F.6060007@artem.ru> <1427731061.306961.247099633.0A421E90@webmail.messagingengine.com>
 <5519740A.1070902@artem.ru> <1427731759.309823.247107417.308CD298@webmail.messagingengine.com>
 <5519F74C.1040308@artem.ru>
Subject: Re: Little research how rm -rf and tar kill server
Date: Tue, 31 Mar 2015 09:54:58 -0500
Cc: mav@FreeBSD.org

On Mon, Mar 30, 2015, at 20:24, Artem Kuchin wrote:
> 30.03.2015 19:09, Mark Felder wrote:
> >
> > On Mon, Mar 30, 2015, at 11:04, Artem Kuchin wrote:
> >> 30.03.2015 18:57, Mark Felder wrote:
> >>> On Mon, Mar 30, 2015, at 10:53, Artem Kuchin wrote:
> >>>> This is normal state, not under rm -rf
> >>>> Do you need it during rm -rf ?
> >>>>
> >>> No, but I wonder if changing the timer from LAPIC to HPET or possibly
> >>> one of the other timers makes the system more responsive under that
> >>> load. Would you mind testing that?
> >>>
> >>> You can switch the timer like this:
> >>>
> >>> sysctl kern.eventtimer.timer=HPET
> >>>
> >>> And then run some of your I/O tests
> >>>
> >> I see. I will test at night, when load goes down.
> >> I cannot say for sure that's the right way to dig, but i will test anything :)
> >>
> >> Just to remind: untar overloads the system, but untar + sync every 120s
> >> does not.
> >> That seems very strange to me. I think the problem might be somewhere
> >> here.
> >>
> > I just heard from mav that there was a bottleneck in gmirror/graid with
> > regards to BIO_DELETE requests
> >
> > https://svnweb.freebsd.org/base?view=revision&revision=280757
> >
>
> I applied this patch manually and rebuilt the kernel.
> Hit this bug
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458
> on reboot, wasted 1 hour fsck-ing 2 times (was dirty after first fsck)
> and after boot tried doing
> rm -rf test1
> I could not test anything, because it completed after 1 minute, instead of
> 15 minutes before.
> I copied the dir 4 times into subdirs and rm -rf full tree (4x larger) -
> fast and smooth,
> mariadb did not notice this, server was working fine.
>
> However, i also noticed another thing:
> cp -Rp test test1
> also works a lot faster now, probably 3-5 times faster
> Maybe it is because fs is free of tons of BIO_DELETE from other processes
>
>
> Then i did the untar test at maximum speed (no pv to limit bandwidth):
> i see that mysql requests became slower, but mysql sql request queue
> built up slower now.
> However, when it reached 70 i stopped untar and mariadb could not
> recover from condition
> until i executed sync. However, this time sync took only a second.
> I see big improvement, but i still don't understand why i need to issue
> sync manually to push
> everything to recover from overload.
>
> # man 2 sync
> a sync() system call is issued frequently by the user process syncer(4)
> (about every 30 seconds).
>
> it does not seem to be true
>
> I checked syncer sysctl
>
> # sysctl kern.filedelay
> kern.filedelay: 30
> # sysctl kern.dirdelay
> kern.dirdelay: 29
> # sysctl kern.metadelay
> kern.metadelay: 28
>
> # ps ax | grep sync
> 23 - DL 0:03.82 [syncer]
>
> no clue why need manual sync
>
> By the way: is there way to make sure that SU+J is really working?
> Maybe it is disabled for some reason
> and i don't know it. tunefs just shows stored setting, but, for example,
> with dirty fs, journaling is not
> working in reality. Any way to get current status of SU journaling?
>
> off topic: suggestion to move to ZFS was not so good, i see a "All
> available memory used when deleting files from ZFS"
> topic.
> I'd rather have a slow server where i can login and fix than one halted
> on panic. Just to point out that ZFS still has plenty
> of unpredictable issues.
>

This information is very good. Perhaps there is some additional tweaking
that could be done. I will cc mav@ on this.
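
For anyone who wants to reproduce the "untar + sync every 120s" workaround quoted above, here is a minimal sketch. It assumes a POSIX sh; the function name run_with_periodic_sync and the archive name in the usage line are made up for illustration and are not from the thread. It only automates the manual sync and does not explain why the syncer fails to keep up.

```shell
# Hypothetical helper (not from the thread): run a command in the
# background and call sync(8) every INTERVAL seconds until it exits.
run_with_periodic_sync() {
    interval=$1
    shift
    "$@" &                      # start the heavy I/O job in the background
    job=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the process exists
    while kill -0 "$job" 2>/dev/null; do
        sleep 1
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$interval" ]; then
            sync                # flush dirty buffers to disk
            elapsed=0
        fi
    done
    wait "$job"                 # return the job's exit status
}
```

Usage would look something like `run_with_periodic_sync 120 tar -xf archive.tar`, where 120 matches the interval mentioned in the quoted test and archive.tar is a placeholder.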