Message-ID: <5484DDD0.2090005@mykitchentable.net>
Date: Sun, 07 Dec 2014 15:08:00 -0800
From: Drew Tomlinson
To: Paul Pathiakis, freebsd-questions@freebsd.org
Subject: Re: Probably Hardware Trouble But What Is It?
References: <5483A639.2050704@mykitchentable.net> <548488CD.50207@yahoo.com>
In-Reply-To: <548488CD.50207@yahoo.com>

On 12/7/2014 9:05 AM, Paul Pathiakis via freebsd-questions wrote:
> Drew,
>
> Just trying to assist....
>
> From the look of it, something is definitely failing and it is either
> the controller or the disk. FreeBSD is trying to stay alive. (I've
> had something similar happen in the past. When I rebooted, a disk
> showed as faulted and inaccessible.)
>
> I'd theorize that the first line about the kernel maxfiles being
> exceeded by root (assuming you haven't changed the setting) is due to
> the system trying to allocate file handles for requests that can't
> be completed because of the failure.
>
> If you have access to the console and another drive, you may want to
> connect a second drive, configure it to mirror the first and hope that
> it can mirror the first. If it works, great. BTW, don't forget to
> install bootblocks if this is your boot drive.
>
> Now, if it doesn't start to mirror the drive after being attached,
> you're going to have to reboot. That's probably going to show you the
> real failure. :-(
>
> If the controller is onboard, there's not much you can do. If it's a
> PCIe card, try to re-seat it. Sometimes things get pulled on or
> hit inadvertently and aren't sitting in the slot correctly any more.
>
> I agree with the other post about replacing the connecting cables
> and/or re-seating them.
>
> If, after all this, it doesn't work, it's probably the disk itself.
>
> Now comes the patient part. If it's the drive, it's probably pretty
> hot from failing and trying to do its job. Don't laugh at this; it's
> worked for me 5 out of 7 times. Remove it from the machine and let it
> cool to room temperature on an anti-static bag. Once cool, put it in the
> bag and put it in your freezer for at least three hours. Re-insert it into
> the machine. (At this point, you should have that other drive for the
> mirror connected.) If the drive isn't a catastrophic loss, it will
> work for a short time. I recommend you allow it to mirror. Ask the
> drive to do NOTHING but sit and mirror while in single-user mode.
>
> However, before going to that last 'iffy' part, check everything
> before that.

Thank you for your suggestions. Funny you mention the freezer trick. I
was just telling a co-worker about that as he's having trouble with a
drive.

My problem was that, because of the failing drive, I couldn't verify
which drive was causing the problem. Every time I tried to issue a
zpool or zfs command, it would just hang. I actually have 4 drives
internally in the box. They are all together in a raidz1 pool, and this
pool contains my full FreeBSD system. Then I have another drive in an
external SATA dock, which I've put in its own pool and mounted just to
use for backups.

I disconnected this drive and rebooted. Now I can access my system and
have been able to verify that this is the failing drive. So I am lucky.
All I have lost are backups, and all I need to do is replace this drive
and then resume my backups.

Thanks for your suggestions!

Cheers,

Drew

-- 
Like card tricks? Visit The Alchemist's Warehouse to learn card magic
secrets for free! http://alchemistswarehouse.com

>
>
> On 12/06/2014 19:58, Drew Tomlinson wrote:
>> I'm running FreeBSD 9.1-RELEASE that I built several years ago. It's
>> mostly a Samba server and has "just worked" so I've never done much
>> more with it.
>> However recently, I find it "locked up" with thousands
>> of these messages on the console:
>>
>> kernel: kern.maxfiles limit exceeded by uid 0, please see tuning(7)
>>
>> I've looked in /var/log/messages and also see lots of messages like
>> these:
>>
>> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 18000000
>> Dec 6 13:55:53 vm kernel: siisch0: Timeout on slot 28
>> Dec 6 13:55:53 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec 6 13:55:53 vm kernel: siisch0: ... waiting for slots 08000000
>> Dec 6 13:55:55 vm kernel: siisch0: Timeout on slot 27
>> Dec 6 13:55:55 vm kernel: siisch0: siis_timeout is 00040000 ss 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 01 fe d8 74 40 39 00 00 00 00 00
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec 6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 0a a5 7f 00 40 4c 00 00 00 00 00
>>
>> This machine uses zfs. I have two pools:
>>
>> # zpool list
>> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> zback  1.81T   848G  1008G    45%  1.00x  ONLINE  -
>> zroot  1.81T  1.16T   666G    64%  1.00x  ONLINE  -
>>
>> Then I tried this and my ssh window is now stuck:
>>
>> # zpool status
>>   pool: zback
>>  state: ONLINE
>> status: One or more devices are faulted in response to IO failures.
>> action: Make sure the affected devices are connected, then run 'zpool clear'.
>>    see: http://illumos.org/msg/ZFS-8000-HC
>>   scan: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zback       ONLINE       3     0     0
>>           ada0      ONLINE       4     0     0
>>
>> I opened another ssh window and tried 'zpool clear zback' as
>> suggested but it appears stuck too.
>>
>> I'm sure I haven't provided all the relevant information so please
>> ask and I will do so. I'd appreciate any guidance on how to take a
>> proper backup of ada0 and what I should do next. I think this zback
>> pool is just the one disk which is a 2TB drive. I'd like to know how
>> to confirm that if possible since it seems the zpool commands aren't
>> able to complete.
>>
>> I appreciate any suggestions or guidance.
>>
>> Thanks,
>>
>> Drew
>>
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"
>
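
For anyone who lands in the same spot, with zpool and zfs commands hanging so the pool can't say which disk is bad: the kernel log quoted above already names the device in the "(ada0:siisch0:0:0:0)" tag. Here is a minimal sketch that tallies the "CAM status: Command timeout" lines per device. It assumes the log lines look like the ones quoted in this thread, and count_cam_timeouts is just a helper name made up for illustration, not a FreeBSD tool.

```shell
#!/bin/sh
# Sketch only: count "CAM status: Command timeout" lines per device.
# count_cam_timeouts is a hypothetical helper name, not part of FreeBSD.
# Assumes syslog lines carry a device tag such as "(ada0:siisch0:0:0:0):".
count_cam_timeouts() {
    # $1: path to a log file, for example /var/log/messages
    awk '
        /CAM status: Command timeout/ {
            # Find the opening "(ada0:" part of the device tag.
            if (match($0, /\([a-z]+[0-9]+:/)) {
                # Drop the leading "(" and the trailing ":".
                dev = substr($0, RSTART + 1, RLENGTH - 2)
                count[dev]++
            }
        }
        END {
            # One "device count" pair per line.
            for (d in count) print d, count[d]
        }
    ' "$1" | sort
}

# Usage (commented out so that sourcing this file has no side effects):
# count_cam_timeouts /var/log/messages
```

A device that shows up here with a large count and also appears in the siisch timeout messages is a reasonable first suspect. It does not distinguish a bad disk from a bad cable, dock, or controller, so the re-seating advice earlier in the thread still applies.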