Date:      Sun, 07 Dec 2014 15:08:00 -0800
From:      Drew Tomlinson <drew@mykitchentable.net>
To:        Paul Pathiakis <pathiaki2@yahoo.com>, freebsd-questions@freebsd.org
Subject:   Re: Probably Hardware Trouble But What Is It?
Message-ID:  <5484DDD0.2090005@mykitchentable.net>
In-Reply-To: <548488CD.50207@yahoo.com>
References:  <5483A639.2050704@mykitchentable.net> <548488CD.50207@yahoo.com>

On 12/7/2014 9:05 AM, Paul Pathiakis via freebsd-questions wrote:
> Drew,
>
> Just trying to assist....
>
> From the look of it, something is definitely failing and it is either 
> the controller or the disk.  FreeBSD is trying to stay alive.  (I've 
> had something similar happen in the past.  When I rebooted, a disk 
> showed to be faulted and inaccessible.)
>
> I'd theorize that the first line about the kernel maxfiles being 
> exceeded by root (assuming you haven't changed the setting) is due to 
> the failure: file handles keep being allocated for requests that 
> can't be completed.
>
> If you have access to the console and another drive, you may want to 
> connect a second drive, configure it to mirror the first and hope that 
> it can mirror the first.  If it works, great.  BTW, don't forget to 
> install bootblocks if this is your boot drive.
>
> Now, if it doesn't start to mirror the drive after being attached, 
> you're going to have to reboot.  That's probably going to show you the 
> real failure. :-(
>
> If the controller card is onboard, not much you can do.  If it's a 
> PCIe bus card, try to re-seat it.  Sometimes things get pulled on, or 
> hit inadvertently and aren't sitting in the slot correctly any more.
>
> I agree with the other post about replacing the connecting cables 
> and/or re-seating them.
>
> If, after all this, it doesn't work, it's probably the disk itself.
>
> Now comes the patient part.  If it's the drive, it's probably pretty 
> hot from failing and trying to do its job.  Don't laugh at this; it's 
> worked for me 5 out of 7 times.  Remove it from the machine and let it 
> cool to room temperature on an anti-static bag.  Once cool, put it in 
> the bag and put it in your freezer for at least three hours.  Re-insert 
> it into the machine.  (At this point, you should have that other drive 
> for the mirror connected.)  If the drive isn't a catastrophic loss, it 
> will work for a short time.  I recommend you allow it to mirror.  Ask 
> the drive to do NOTHING but sit and mirror while in single-user mode.
>
> However, before going to that last 'iffy' part, check everything 
> before that.

Thank you for your suggestions.  Funny you mention the freezer trick.  I 
was just telling a co-worker about that as he's having trouble with a drive.

My problem was that because of the failing drive, I couldn't verify 
which drive was causing the problem.  Every time I'd try to issue a 
zpool or zfs command, it would just hang.  I actually have 4 drives 
internally in the box and they are all together in a raidz1 pool and 
this pool contains my full FreeBSD system.  Then I have another drive in 
an external SATA dock which I've put in its own pool and mounted just to 
use for backups.  I disconnected this drive and rebooted.  Now I can 
access my system and have been able to verify that the external drive 
is the failing one.
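
When zpool and zfs commands hang like this, the kernel log can still 
point at the culprit, since each CAM timeout message names the device 
(ada0 on siisch0 in the messages quoted below).  A minimal sketch, 
assuming the log has one message per line as in /var/log/messages; 
count_cam_timeouts is a hypothetical helper name, not a FreeBSD tool:

```shell
# Hypothetical helper: tally "CAM status: Command timeout" reports per
# device from kernel log text on stdin.  Assumes lines of the form
#   ... kernel: (ada0:siisch0:0:0:0): CAM status: Command timeout
count_cam_timeouts() {
    awk '/CAM status: Command timeout/ {
        # Pull the device name out of the leading "(adaN:" token.
        if (match($0, /\([a-z]+[0-9]+:/)) {
            dev = substr($0, RSTART + 1, RLENGTH - 2)
            n[dev]++
        }
    }
    END { for (d in n) print d, n[d] }' | sort
}
```

It could be run as `count_cam_timeouts < /var/log/messages`.  The 
device with the most timeouts is the first one worth disconnecting.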

So I am lucky.  All I have lost are backups.  And thus all I need to do 
is replace this drive and then resume my backups.

Thanks for your suggestions!

Cheers,

Drew

-- 
Like card tricks?

Visit The Alchemist's Warehouse to
learn card magic secrets for free!

http://alchemistswarehouse.com


>
>
> On 12/06/2014 19:58, Drew Tomlinson wrote:
>> I'm running FreeBSD 9.1-RELEASE that I built several years ago.  It's 
>> mostly a Samba server and has "just worked" so I've never done much 
>> more with it.  However recently, I find it "locked up" with thousands 
>> of these messages on the console:
>>
>> kernel: kern.maxfiles limit exceeded by uid 0, please see tuning(7)
>>
>> I've looked in /var/log/messages and also see lots of messages like 
>> these:
>>
>> Dec  6 13:55:53 vm kernel: siisch0:  ... waiting for slots 18000000
>> Dec  6 13:55:53 vm kernel: siisch0: Timeout on slot 28
>> Dec  6 13:55:53 vm kernel: siisch0: siis_timeout is 00040000 ss 
>> 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec  6 13:55:53 vm kernel: siisch0:  ... waiting for slots 08000000
>> Dec  6 13:55:55 vm kernel: siisch0: Timeout on slot 27
>> Dec  6 13:55:55 vm kernel: siisch0: siis_timeout is 00040000 ss 
>> 78000000 rs 78000000 es 00000000 sts 801b0000 serr 00000000
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): FLUSHCACHE48. ACB: 
>> ea 00 00 00 00 40 00 00 00 00 00 00
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command 
>> timeout
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. 
>> ACB: 60 01 fe d8 74 40 39 00 00 00 00 00
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): CAM status: Command 
>> timeout
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): Retrying command
>> Dec  6 13:55:55 vm kernel: (ada0:siisch0:0:0:0): READ_FPDMA_QUEUED. 
>> ACB: 60 0a a5 7f 00 40 4c 00 00 00 00 00
>>
>> This machine uses zfs.  I have two pools:
>>
>> # zpool list
>> NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> zback  1.81T   848G  1008G    45%  1.00x  ONLINE  -
>> zroot  1.81T  1.16T   666G    64%  1.00x  ONLINE  -
>>
>> Then I tried this and my ssh window is now stuck:
>>
>> # zpool status
>>   pool: zback
>>  state: ONLINE
>> status: One or more devices are faulted in response to IO failures.
>> action: Make sure the affected devices are connected, then run 'zpool 
>> clear'.
>>    see: http://illumos.org/msg/ZFS-8000-HC
>>   scan: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zback       ONLINE       3     0     0
>>           ada0      ONLINE       4     0     0
>>
>> I opened another ssh window and tried 'zpool clear zback' as 
>> suggested but it appears stuck too.
>>
>> I'm sure I haven't provided all the relevant information so please 
>> ask and I will do so.  I'd appreciate any guidance on how to take a 
>> proper backup of ada0 and what I should do next.  I think this zback 
>> pool is just the one disk which is a 2TB drive.  I'd like to know how 
>> to confirm that if possible since it seems the zpool commands aren't 
>> able to complete.
>>
>> I appreciate any suggestions or guidance.
>>
>> Thanks,
>>
>> Drew
>>
>




