From owner-freebsd-current@FreeBSD.ORG  Fri Jun  3 11:35:30 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 5F46E16A41C;
	Fri,  3 Jun 2005 11:35:30 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [207.200.51.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id EF01B43D48;
	Fri,  3 Jun 2005 11:35:29 +0000 (GMT)
	(envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
	by mh1.centtech.com (8.13.1/8.13.1) with ESMTP id j53BZKlb065715;
	Fri, 3 Jun 2005 06:35:21 -0500 (CDT)
	(envelope-from anderson@centtech.com)
Message-ID: <42A04064.60007@centtech.com>
Date: Fri, 03 Jun 2005 06:35:00 -0500
From: Eric Anderson <anderson@centtech.com>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.7) Gecko/20050504
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Julian Elischer <julian@elischer.org>
References: <200506021824.j52IOkcQ004052@gw.catspoiler.org>
	<429F5DF2.9000300@elischer.org>
In-Reply-To: <429F5DF2.9000300@elischer.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.82/908/Thu Jun  2 15:39:40 2005 on mh1.centtech.com
X-Virus-Status: Clean
Cc: Don Lewis <truckman@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: cannot alloc 19968 bytes for inoinfo
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jun 2005 11:35:30 -0000

Julian Elischer wrote:
> Bcc'd to some recipients. (you know who you are..)
> 
> Don Lewis wrote:
> 
>> On  2 Jun, Eric Anderson wrote:
>>
>>> Don Lewis wrote:
>>>
>>>> On  1 Jun, Eric Anderson wrote:
>>>>
>>>>> Andre Guibert de Bruet wrote:
>>>>>
>>>>>> On Wed, 1 Jun 2005, Eric Anderson wrote:
>>>>>>
>>>>>>> Don Lewis wrote:
>>>>>>>
>>>>>>>> On 31 May, Eric Anderson wrote:
>>>>>>>>
>>>>>>>>> One of my filesystems won't fsck.  I'm not sure how to fix it, 
>>>>>>>>> or what it's really trying to tell me.
>>>>>>>>>
>>>>>>>>> # fsck -y /vol1
>>>>>>>>> ** /dev/da0s1d
>>>>>>>>> ** Last Mounted on /vol1
>>>>>>>>> ** Phase 1 - Check Blocks and Sizes
>>>>>>>>> fsck_ufs: cannot alloc 19968 bytes for inoinfo
>>>>>>>>>
>>>>>>>>> df -i /vol1 output:
>>>>>>>>> Filesystem   1K-blocks       Used    Avail Capacity    iused     ifree %iused  Mounted on
>>>>>>>>> /dev/da0s1d 1891668564 1684163832 56171248      97% 55109756 189360002    23%  /vol1
>>>>>>>>>
>>>>>>>>> Any help would be very appreciated!
>>>>>>>>>               
>>>>>>>>
>>>>>>>> You're probably running into the default 512MB data size limit.
>>>>>>>> Try setting kern.maxdsiz to a larger value in /boot/loader.conf
>>>>>>>> and rebooting.  I've got mine set to 1GB.
>>>>>>>>   kern.maxdsiz="1073741824"
>>>>>>>>             
>>>>>>>
>>>>>>> Hmm - I don't seem to have that sysctl..  What would create it?
>>>>>>>           
>>>>>>
>>>>>> It's a loader tunable, not a sysctl variable. man 5 loader.conf
>>>>>>         
>>>>>
>>>>> Oh.. oops. :)   Ok, then I have it set correctly but it isn't 
>>>>> helping me.  My fsck still dies the same way.  Looks like it's 
>>>>> taking up about 362MB memory (I have 1GB).  Any more ideas?
>>>>>       
>>>>
>>>> What does the shell limit command say about your datasize limit?  Your
>>>> limit might have been cranked down in login.conf.
>>>>     
>>>
>>> I looked too early at the fsck. It appears to actually be going up to 
>>> the 1GB limit now, and then bombing. It's now bombing at a different 
>>> point:
>>>
>>> # fsck -y /vol1
>>> ** /dev/da0s1d
>>> ** Last Mounted on /vol1
>>> ** Phase 1 - Check Blocks and Sizes
>>> fsck_ufs: cannot increase directory list
>>>
>>>
>>> # limits
>>> Resource limits (current):
>>>   cputime          infinity secs
>>>   filesize         infinity kb
>>>   datasize          1048576 kb
>>>   stacksize           65536 kb
>>>   coredumpsize     infinity kb
>>>   memoryuse        infinity kb
>>>   memorylocked     infinity kb
>>>   maxprocesses         7390
>>>   openfiles           14781
>>>   sbsize           infinity bytes
>>>   vmemoryuse       infinity kb
>>>
>>> So I think I just need more RAM.  This is really a major ceiling for
>>> anyone who wants a somewhat large filesystem, or who needs a
>>> lot of inodes.  Is there maybe a different way to do the fsck that
>>> might take longer, but run in a 'small' memory footprint like 1GB or
>>> less?  I know little to nothing about coding fsck tools or memory
>>> management, but I do know that there are always other ways to do
>>> something.  Just curious if there could be a 'lowmem' option for fsck
>>> that would use memory differently in order to fsck large filesystems.
>>>   
>>
>>
>> You can crank up datasize to be larger than RAM assuming that you have
>> sufficient swap configured.  It'll just be slow as fsck starts paging.
>> At some point on 32-bit architectures like the i386 you'll run into the
>> address space limit, probably at 2-3GB.
>>
>> I think julian@ has mentioned having a version of fsck that uses
>> external storage.  I would expect a substantial speed impact, and you
>> would need at least one other file system mounted rw.
>>  
>>
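
Don's suggestion to crank up datasize can also be applied from inside a program. The sketch below is illustrative only and is not how fsck_ufs behaves: it uses the standard POSIX getrlimit()/setrlimit() calls on RLIMIT_DATA (the resource that `limits` reports as "datasize") to raise the soft limit to the hard limit, which an unprivileged process is allowed to do. The helper name raise_data_limit is hypothetical. Going past the hard limit is a separate matter; on FreeBSD that is where the kern.maxdsiz tunable mentioned earlier in the thread comes in.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_DATA limit up to the hard
 * limit. Raising the soft limit as far as the hard limit needs no extra
 * privilege. On success, *out holds the limits now in effect.
 * Returns 0 on success, -1 on error (errno set by getrlimit/setrlimit). */
int raise_data_limit(struct rlimit *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft limit := hard limit */
    if (setrlimit(RLIMIT_DATA, &rl) == -1)
        return -1;
    *out = rl;
    return 0;
}
```

Calling this early, before the large allocations start, only helps when the hard limit is above the soft one; with both at 1GB, as in the `limits` output above, it changes nothing.
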
> 
> Exactly.
> We haven't done this yet; however, it's on our development
> roadmap for the next generation of RAIDs.
> In our "back of the envelope" designs and calculations it
> is hard to say conclusively that it will be a lot slower. You do
> the work in several passes over the disk in which you never go 'backwards'.
> Instead, you write out the "addresses of interest" that are behind
> you to a "read this on the next pass" list, which you write
> out to the other storage. The "next pass read" lists are sorted
> in RAM as much as possible before being written out,
> and then you do an on-disk "merge sort" on them as needed
> to produce an "in-order" read list.
> 
> You keep doing this, producing sorted output lists that detail such
> things as block ranges in use, etc., and in the end you reconcile all
> the output files to produce the real bitmaps, find collisions, find
> unreferenced inodes, etc. Once again you put the output files out in
> chunks that are pre-sorted in RAM, and then do merge sorts as needed on
> them to produce sorted output lists. The output files would be sorted
> by different fields for different tests. For example, a list of
> referenced block ranges, sorted by start block, quickly finds multiple
> inodes referencing the same blocks and quickly gives you the correct
> bitmaps. A list of referenced inodes, sorted by inode number, gives you
> link counts and unreferenced inodes, etc.
> 
> In the current fsck the majority of time is spent waiting for the head to
> rattle backwards and forwards as it follows links, so it is hard to say
> without trying it whether we will be slower than that if we only do
> forward passes.
> 
> (My guess is "yes, we'll be slower, but not by orders of magnitude, and
> definitely a lot faster than a system that can't fsck it at all due to
> lack of RAM." :-)
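
Julian's multi-pass design is essentially an external merge sort followed by forward-only scans of the sorted lists. Below is a minimal in-memory sketch of those pieces in C, under loud assumptions: the extent record, the function names and MAX_RUNS are hypothetical and not taken from any existing fsck; the "runs" are plain arrays standing in for files written to the other storage; and the k-way merge picks the smallest head with a linear scan rather than a heap. It shows how sorting block ranges by start block turns duplicate-claim detection into a single forward pass, and how a sorted list of referenced inode numbers yields link counts by counting runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_RUNS 16 /* hypothetical cap on runs merged at once */

/* Hypothetical record: inode `ino` claims blocks [start, start + len). */
struct extent {
    uint64_t start;
    uint64_t len;
    uint32_t ino;
};

static int cmp_start(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;

    if (x->start != y->start)
        return x->start < y->start ? -1 : 1;
    return 0;
}

/* Pass 1: sort one RAM-sized chunk before it is written out as a run. */
void sort_run(struct extent *run, size_t n)
{
    qsort(run, n, sizeof(*run), cmp_start);
}

/* Pass 2: merge k sorted runs into out[] in start-block order. The runs
 * stand in for files on other storage; each is only read forward. */
size_t merge_runs(struct extent *runs[], const size_t lens[], size_t k,
                  struct extent *out)
{
    size_t pos[MAX_RUNS] = {0};
    size_t n = 0;

    assert(k <= MAX_RUNS);
    for (;;) {
        size_t best = k; /* k means "no run has records left" */

        for (size_t i = 0; i < k; i++)
            if (pos[i] < lens[i] &&
                (best == k ||
                 runs[i][pos[i]].start < runs[best][pos[best]].start))
                best = i;
        if (best == k)
            return n;
        out[n++] = runs[best][pos[best]++];
    }
}

/* Reconcile: in a list sorted by start block, an extent that begins before
 * the furthest end seen so far overlaps an earlier claim. Returns how many
 * extents overlap some earlier extent. */
size_t count_dup_claims(const struct extent *e, size_t n)
{
    size_t dups = 0;
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = e[i].start + e[i].len;

        if (i > 0 && e[i].start < max_end)
            dups++;
        if (end > max_end)
            max_end = end;
    }
    return dups;
}

/* Reconcile: count runs in a list of referenced inode numbers sorted
 * ascending. Each run's length is that inode's link count; an allocated
 * inode absent from ino[] is unreferenced. Returns distinct inodes found. */
size_t link_counts(const uint32_t *refs, size_t n, uint32_t *ino,
                   uint32_t *cnt)
{
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        if (m > 0 && ino[m - 1] == refs[i]) {
            cnt[m - 1]++;
        } else {
            ino[m] = refs[i];
            cnt[m] = 1;
            m++;
        }
    }
    return m;
}
```

For example, merging a run with start blocks {10, 30} and another with {12, 50} yields 10, 12, 30, 50; if the extent at block 10 is five blocks long, the one at block 12 overlaps it and count_dup_claims() reports one duplicate claim. A real implementation would stream records from files rather than arrays, but the access pattern stays forward-only, which is the property Julian is after.
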

This sounds really awesome!  This would mean I could actually put all my
filesystems together into one large 18TB partition! :)

Let me know when there is something to test - I'd be more than happy to 
give detailed feedback..


Eric


-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------