Date: Thu, 07 Jun 2001 23:52:50 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Ian Dowse <iedowse@maths.tcd.ie>
Cc: Matt Dillon <dillon@earth.backplane.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: UFS large directory performance
Message-ID: <3B207642.A7C7E7C@mindspring.com>
References: <200106021207.aa78407@salmon.maths.tcd.ie>
Ian Dowse wrote:
> > The only potential problem I see here is that you could
> > end up seriously fragmenting the malloc pool you are
> > using to allocate the slot arrays.  And, of course, the
> > two issues you brought up in regards to regulating memory
> > use.
>
> Thanks for the comments :-)  Yes, malloc pool fragmentation is a
> problem.  I think that it can be addressed to some extent by using
> a 2-level system (an array of pointers to fixed-size arrays) instead
> of a single large array, but I'm open to any better suggestions.

Use a chain allocator.  I would suggest using the zone allocator,
but it has some fundamental problems that I don't think are really
resolvable without a rewrite.

Unlike the zone allocator, which rounds to 64 byte units, a chain
allocator allocates a free chain that is fit to a 32 bit boundary
(4 bytes instead of 64, for an average wastage of 2 bytes instead
of 32, if there is any at all).

In addition, the allocator will allocate based on the least common
multiple (after 32 bit rounding), and will use a backing object
size to get the multiple.  For a 60 byte object, this would mean
allocating 15 pages and dividing them into 1024 objects (2048 for
the Alpha, with its 8k page size).  But the memory can be backed
by kmem_malloc() or alloc(), which allocate page units, instead of
by the power-of-2 allocator, and there are zero bytes wasted.

The memory also need not be permanently type-stable, like memory
in a zone, though you would need to build a reclaimer that ran in
the background.  You could make it deterministic by sorting the
freelist in chunks of the allocation unit at a time (see Knuth; a
quicksort is really not appropriate).  You could use a second-level
reclaim to return memory to the system; this would effectively make
it a hybrid between the Dynix zone allocator (on which it is
modelled) and a slab allocator.
Finally, you could maintain per-CPU chains to allow the allocations
and frees to occur on a per-CPU basis; that would mean that there
was no IPI or bus contention for allocations or deallocations
(whether you want to defer migration of objects to the chains of
their home CPU, or let the free-to-the-system at a low watermark
take care of that for you, giving preference to frees of non-local
origin elements, is really a policy decision).

I've actually written one of these, without the Dynix-style reclaimer
(i.e. the memory becomes type-stable only after allocation, which
makes it an improvement on the zone allocator, at the cost of not
being usable for allocations at interrupt time), as part of some
performance improvement work in a modified FreeBSD kernel.

> If the second-level array size was fixed at around 4k, that would
> keep the variable-length first-level array small enough not to
> cause too many fragmentation issues.  The per-DIRBLKSIZ free space
> summary array is probably relatively okay as it is now.

If you are going to fix it, and your object happens to be an even
power of 2 in size, then at the very least make the backing object
be the page size, so it will still work well on the Alpha without
page fragmentation.  I rather expect that the IA64 will end up
forcing our Intel page size to be increased.

> The other main issue, that of discarding old hashes when the memory
> limit is reached, may be quite tricky to resolve.  Any approach
> based on doing this from ufsdirhash_build() is likely to become a
> locking nightmare.  My original idea was to have ufsdirhash_build()
> walk a list of other inodes with hashes attached and free them if
> possible, but that would involve locking the other inode, so things
> could get messy.

Add an LRU field, and lock the hashes at a "32 objects at a time"
or similarly high granularity.  This is more natural with a chain
block, of course, but works well in any case.
I have been considering moving the per-process open file table into
a direct/indirect object list, similar to the SVR4 method, rather
than the size doubling, for similar performance reasons: locking a
block of files, and maintaining per-block freelists (instead of a
hints pointer -- what the heck else are you keeping in those idle
structs anyway, such that you _shouldn't_ be chaining them into a
list?!?).

If you poke around in that area long enough, you'll see that there
are some "magic" numbers like "NFILES", which started out as an
optimization, but which all the recent additions for things like
kqueue event chains have turned into hopeless pessimizations.  8-(.

-- Terry