Date:      Fri, 7 Oct 2011 02:46:51 +0200 (CEST)
From:      Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        hackers@freebsd.org, Grzegorz Kulewski <grzegorz@kulewski.pl>
Subject:   Re: mmap performance and memory use
Message-ID:  <alpine.BSF.2.00.1110070225050.25209@wojtek.tensor.gdynia.pl>
In-Reply-To: <20111006160159.GQ1511@deviant.kiev.zoral.com.ua>
References:  <alpine.BSF.2.00.1110061637270.15552@wojtek.tensor.gdynia.pl> <20111006160159.GQ1511@deviant.kiev.zoral.com.ua>

>> page. how much memory is used to manage this?
> I am not sure how deep the enumeration you want to know, but the first
> approximation will be:
> one struct vm_map_entry
> one struct vm_object
> one pv_entry

Actually I don't need a precise answer, just the algorithms.

>
> Page table structures need four pages for directories and page table proper.
>>
>> 2) suppose we have 1TB file on disk without holes and 100000 processes
>> mmap this file into their address space. are just pages shared or can
>> pagetables be shared too? how much memory is used to manage such
>> situation?
> Only pages are shared. Pagetables are not.

This is what I really asked, thank you for the answer. My example was
rather extreme, but datasets of tens of gigabytes would be used.
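To put numbers on "only pages are shared", here is a rough worst-case estimate of per-process page-table memory. It is a sketch assuming amd64 4-level paging with 4 KiB pages (each paging-structure page is 4 KiB and holds 512 eight-byte entries), a mapping that starts on a 512 GiB boundary, and a process that has touched every page. It ignores pv entries and any superpage promotion; the function name is hypothetical.

```c
#include <stdint.h>

/* amd64, 4 KiB pages: one page-table (PT) page maps 2 MiB, one
 * page-directory (PD) page maps 1 GiB, one PDP page maps 512 GiB. */
#define PAGE_SZ   4096ULL
#define PT_SPAN   (512ULL * PAGE_SZ)   /* 2 MiB   */
#define PD_SPAN   (512ULL * PT_SPAN)   /* 1 GiB   */
#define PDP_SPAN  (512ULL * PD_SPAN)   /* 512 GiB */

static uint64_t div_up(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

/* Worst-case bytes of page-table pages a single process needs to map `len`
 * bytes completely. Assumes the mapping starts on a 512 GiB boundary and every
 * page was touched. The per-process top-level PML4 page is not counted. */
static uint64_t pt_bytes_per_process(uint64_t len)
{
    uint64_t pages = div_up(len, PT_SPAN)    /* leaf page tables  */
                   + div_up(len, PD_SPAN)    /* page directories  */
                   + div_up(len, PDP_SPAN);  /* PDP pages         */
    return pages * PAGE_SZ;
}
```

For a single 4 KiB page this gives 3 pages (PT, PD, PDP); adding the PML4 page makes four, which appears consistent with the quoted reply. For 1 TiB it gives 524288 + 1024 + 2 = 525314 pages, about 2 GiB per process, so 100000 processes that each touch the whole file would need on the order of 215 TB of page tables, before counting one pv_entry per mapped page per process.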

> superpages are due to more efficient use of TLB.
Actually this was not really working when I tested it a while ago (already
on FreeBSD 8). Even with a 1GB squid process and no swapping at all, it
rarely allocated superpages.

Even when it does work, it probably will not help much here unless
absolutely all the data is in RAM, and the following explains why:

> accurate tracking of the accesses and writes, which can result in better
> pageout performance.
>
> For the situation 1TB/100000 processes, you will probably need to tune
> the amount of pv entries, see sysctl vm.pmap.pv*.

So there is a workaround, but it causes lots of soft page faults, as there
would be no more than a few hundred instructions or so between touches of
different pages.

What I want to build is a database library (but no SQL!). It will be
something like CA-Clipper/Harbour (but definitely not the same and NOT
compatible), with higher performance and usable for heavy workloads as
well.

With this system one user is one process, one thread. If used as a WWW (or
similar) backend, it will be this plus some other component providing the
web interface, but still one logged-in user = exactly one process.


As properly designed database tables should not be huge, I assume most of
them (possibly excluding parts that are mostly unused) will be kept in
memory by the VM subsystem. So hard faults and disk I/O will not be the
deciding factor.

To avoid system calls I just want to mmap tables and indexes. All
semaphores can be done from userspace too, and I already know how to avoid
lock contention well.
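A minimal sketch of the mmap-plus-userspace-lock idea, assuming a hypothetical table file whose header holds a lock word and a record count (not the author's actual format). Two processes map the same file MAP_SHARED and update the header under a spinlock built from C11 atomics, so the uncontended lock/unlock path makes no system call. It assumes `atomic_int` is lock-free, which holds on common platforms such as amd64.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX declarations under -std=c11; FreeBSD shows them by default */
#include <fcntl.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical header at the start of a table file. */
struct table_hdr {
    atomic_int lock;      /* 0 = free, 1 = held */
    long       nrecords;  /* protected by lock  */
};

static void hdr_lock(struct table_hdr *h)
{
    /* Swap in 1; if the old value was 1, another process holds the lock. */
    while (atomic_exchange_explicit(&h->lock, 1, memory_order_acquire) != 0)
        sched_yield();    /* only reached under contention */
}

static void hdr_unlock(struct table_hdr *h)
{
    atomic_store_explicit(&h->lock, 0, memory_order_release);
}

/* Map a fresh temporary "table" file MAP_SHARED, then let the parent and a
 * forked child each add `iters` records under the userspace lock.
 * Returns the final record count seen by the parent, or -1 on error. */
static long shared_table_demo(int iters)
{
    char path[] = "/tmp/tableXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file persists while mapped */
    if (ftruncate(fd, sizeof(struct table_hdr)) != 0) {
        close(fd);
        return -1;
    }
    struct table_hdr *h = mmap(NULL, sizeof *h, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping keeps the file referenced */
    if (h == MAP_FAILED)
        return -1;
    atomic_init(&h->lock, 0);
    h->nrecords = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(h, sizeof *h);
        return -1;
    }
    for (int i = 0; i < iters; i++) {   /* parent and child both run this */
        hdr_lock(h);
        h->nrecords++;
        hdr_unlock(h);
    }
    if (pid == 0)
        _exit(0);                       /* child is done */

    int status;
    waitpid(pid, &status, 0);
    hdr_lock(h);
    long n = h->nrecords;
    hdr_unlock(h);
    munmap(h, sizeof *h);
    return n;
}
```

With `iters` = 100000 the parent should observe 200000 records. A production library would likely spin briefly before yielding and might fall back to a kernel wait primitive under heavy contention.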

Using indexes means doing lots of memory reads from different pages, but
each process will usually touch only a small subset of the pages, not all
of them.
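To illustrate why a single index lookup touches only a few pages, here is a sketch of a binary search over a sorted array of fixed-size index entries, as it might be laid out in an mmap()ed index file, that also counts how many distinct 4 KiB pages the lookup reads. The 16-byte entry layout and the function name are hypothetical, and page numbers are computed relative to the start of the array, which assumes the array is page-aligned (as an mmap() result is).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u

/* Hypothetical index entry: 16 bytes, so 256 entries per 4 KiB page. */
struct idx_entry { uint64_t key; uint64_t rowid; };

/* Binary search for `key`; returns its rowid or -1. Stores in *pages_touched
 * the number of distinct pages read. In a process that has not mapped those
 * pages yet, each first touch may cost a soft fault. */
static long lookup_count_pages(const struct idx_entry *idx, size_t n,
                               uint64_t key, size_t *pages_touched)
{
    size_t seen[64];            /* a binary search makes at most 64 probes */
    size_t nseen = 0;
    size_t lo = 0, hi = n;      /* current search range is [lo, hi) */
    long found = -1;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        size_t page = (mid * sizeof(struct idx_entry)) / PAGE_SZ;
        int dup = 0;
        for (size_t i = 0; i < nseen; i++)
            if (seen[i] == page) { dup = 1; break; }
        if (!dup && nseen < 64)
            seen[nseen++] = page;

        if (idx[mid].key == key) { found = (long)idx[mid].rowid; break; }
        if (idx[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *pages_touched = nseen;
    return found;
}
```

For 2^20 entries (16 MiB, 4096 pages) a lookup makes at most 21 probes, so it reads at most 21 of those pages, and the last several probes tend to land in the same page or two because one page holds 256 entries.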

So it MAY work well this way, or it may end up with 95% system CPU time,
mostly spent handling soft faults.
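One way to check which outcome you get is to count soft faults directly. The sketch below uses getrusage(2) and its `ru_minflt` field to count the minor faults caused by writing one byte into each page of a fresh anonymous mapping. It is only a measurement aid with a hypothetical function name; counts can differ from one per page if the kernel maps larger units (for example superpages) for a region.

```c
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANON under -std=c11; FreeBSD shows it by default */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Map `npages` fresh anonymous pages, write one byte in each, and return the
 * number of minor (soft) faults reported for that loop, or -1 on error. */
static long soft_faults_for_touching(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0)
        return -1;
    size_t len = npages * (size_t)psz;
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)psz] = 1;         /* first write to each page faults */
    long after = minor_faults();

    munmap((void *)p, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Touching 256 pages typically reports roughly 256 minor faults. In the scenario above, every process pays such a fault for each page it touches that is not yet in its own page tables, even when the page itself is already resident and shared.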

But a question about the future: is anything planned in FreeBSD for this
case? I think I am not the only one interested in it. Not everyone uses
computers for a few processes or for personal use, and IMHO there are many
cases where programs need to share a huge dataset using mmap while doing
heavy timesharing.

I understand that mmap works this way because a file may be mapped at
different addresses, and different parts of a single file may even be
mapped at different places, as this is what mmap allows.

But would it be possible to add a different kind of mmap to the kernel, like

mmap_fullfile(fd,maxsize)

which (taking amd64 as the example) would map the file at a 2MB boundary if
maxsize<=2MB, at a 1GB boundary if maxsize<=1GB, and at a 512GB boundary
otherwise, with subsequent 512GB address blocks if needed, sharing
everything?
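The boundaries in the proposal line up with amd64 paging structures: one page-table page maps 2MB, one page-directory page maps 1GB, and one PDP page maps 512GB. The sketch below shows only the placement rule the proposal describes, as a hypothetical helper; `mmap_fullfile` is not an existing FreeBSD call, and nothing here reflects how the kernel would actually share the paging structures between processes.

```c
#include <stdint.h>

#define SZ_2M    (1ULL << 21)   /* span of one page-table (PT) page     */
#define SZ_1G    (1ULL << 30)   /* span of one page-directory (PD) page */
#define SZ_512G  (1ULL << 39)   /* span of one PDP page                 */

/* Hypothetical placement for the proposed mmap_fullfile(fd, maxsize). */
struct fullfile_layout {
    uint64_t align;    /* required virtual-address alignment         */
    uint64_t nblocks;  /* consecutive `align`-sized blocks to reserve */
};

static struct fullfile_layout fullfile_layout(uint64_t maxsize)
{
    struct fullfile_layout l;

    /* Smallest paging-structure span that covers the whole file. */
    if (maxsize <= SZ_2M)
        l.align = SZ_2M;
    else if (maxsize <= SZ_1G)
        l.align = SZ_1G;
    else
        l.align = SZ_512G;

    /* Round up without overflowing near UINT64_MAX; reserve at least one block. */
    l.nblocks = maxsize / l.align + (maxsize % l.align != 0);
    if (l.nblocks == 0)
        l.nblocks = 1;
    return l;
}
```

Under this rule a 100MB table would get one 1GB-aligned block and a 1TB file two 512GB-aligned blocks. Because such a mapping never shares a paging-structure page with another mapping, the kernel could in principle point every process's higher-level entry at the same lower-level pages, which is the sharing the proposal asks for.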

It is no problem at all that things like madvise from one process would
clear the madvise setting of another process, or similar issues, as only
one type of program, aware of this, would use it.

This way there would be practically no page-table mapping overhead, and
the OS's work would actually be simpler and faster.

I don't really know exactly how the VM subsystem works in FreeBSD, but if
it is not hard I may do this with some help from you.

And no, I don't want to use any popular database system, for good
reasons.
