Date:      Thu, 06 Jun 2002 07:57:43 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Miguel Mendez <flynn@energyhq.homeip.net>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: allocating memory
Message-ID:  <3CFF7867.4F7193E2@mindspring.com>
References:  <3CFEEB99.AEDC5DB9@math.missouri.edu> <3CFF2780.FAD81226@mindspring.com> <20020606122702.A81113@energyhq.homeip.net> <3CFF4FE8.86C44C31@mindspring.com> <20020606152458.A81446@energyhq.homeip.net>

Miguel Mendez wrote:
> On Thu, Jun 06, 2002 at 05:04:56AM -0700, Terry Lambert wrote:
> How come? A Sun Blade 100 is about $1,000. That's not what I call
> expensive. It's not an E4500, but not a bad box once you load it with a
> bit more RAM and a SCSI controller. You can get Ultra 10 boxen pretty
> cheap these days too.

A Sun Blade 100 is limited to 2G of RAM.

A Sun Blade 2000 (limited to 8G of RAM) is ~US$10K

The lowest cost Ultra workstation (the 60) is also limited to 2G,
and costs ~US$7K.

The V120 rack mount is ~US$2.5K; it's the lowest end system that
can do 4G.  To get to 8G, you need to go to the 280R; also ~US$10K.


> > guess if I didn't limit your remarks to FreeBSD, and included Solaris,
> > AIX, or NetBSD as options, the field would be larger.
> 
> Yes, I'd include those OS, as FreeBSD is not, and won't be, production
> ready for a while for those platforms.

I guess you could post about that to the "solaris-hackers@sun.com"
mailing list, if such a thing existed... ;^).


> > port (which *is* complete) unfortunately can't handle more than 2G
> > of RAM (apparently, this has to do with the I/O architecture issues
> > that have yet to be resolved completely, by way of code changes
> > that *should* happen, but haven't, because the i386 doesn't need
> > them to happen).
> 
> It seems to me most developers have lost interest in it and moved
> already towards more exciting targets, like the sparc port.

Uh, there are some things that are transportable like that, but most
things aren't.  "I used to hack Alpha assembly code, but now I think
I will go hack SPARC 64 assembly code" doesn't really happen in the
real world (unless you are this crazy guy I know).


> > It's not there; there would have been a big announcement, I think,
> > like "Hyperthreading" (really, SMT).  Peter Wemm was reported by
> > Alfred Perlstein to have been working on it.  If Peter is on it, it
> 
> Well, now *that* would be interesting to see, as a hacker exercise.

Peter is a commercial, professional programmer.  Not just a hacker.
You don't get that kind of depth in efforts out of volunteers who
are not nuts.  8-).



> Assume a (software based) 64bit address space, by means of using long
> long for pointers. Of course you can only access a 4GB chunk at a time,
> but programs need not know about that. Do they want to malloc or mmap
> 8GB? You let them. If the program is doing random access all the time,
> it will spend a lot of time in the kernel, as not only pages but
> segments have to be taken into account when accessing a memory
> location. It would work pretty well for programs doing consecutive
> accesses to their dataset (or within the 4GB boundary). Doing some MMU
> magic you can have a transparent system to allow programs to use more
> than 4GB.

Virtual addressing is handled in hardware, which is limited to
32 bits.  To make this idea work, you would have to take a fault
on every memory access, and then do a fixup that (maybe) included
a bank selection process as well (similar to how write faults are
emulated for i386 in supervisor mode, since they do not result in
faults, and you want to avoid people using copy-on-write on a read
to a bogus address to spam kernel memory as a means of hacking a
higher privilege level by reading, say, a uid of 0 into the
current process's cred).

Handling this would be so incredibly expensive that you might as
well give up and just add swap to the system in question.

Really, the only way to deal with it adequately is by abusing
hardware, at a task granularity, where you have work to do at
task management time, anyway, and it can be amortized over a
lot of CPU time.
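
Just to make the per-access cost concrete, here is a rough userland
sketch of the "long long pointer, 4G window" idea.  Everything in it
is made up: the "bank select" is a pointer swap instead of an MMU
reload, and the banks are shrunk to 1M so it actually runs, but the
shape is the same -- every dereference pays for a fixup first.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy bank size; the scheme described above would use 4G windows. */
#define BANK_SHIFT	20
#define BANK_SIZE	(1UL << BANK_SHIFT)
#define NBANKS		4

static unsigned char	*banks[NBANKS];	/* backing store, one per "bank" */
static unsigned char	*window;	/* the currently selected bank */
static int		 curbank = -1;

/* The per-access fixup: select the bank the far pointer lives in. */
static void
bank_select(uint64_t fp)
{
	int bank = (int)(fp >> BANK_SHIFT);

	if (bank != curbank) {
		window = banks[bank];	/* real code reloads the MMU here */
		curbank = bank;
	}
}

static unsigned char
far_read(uint64_t fp)
{
	bank_select(fp);
	return (window[fp & (BANK_SIZE - 1)]);
}

int
main(void)
{
	int i;

	for (i = 0; i < NBANKS; i++) {
		banks[i] = calloc(1, BANK_SIZE);	/* toy: no error check */
		banks[i][0] = (unsigned char)(i + 1);
	}
	/* Hopping across banks forces a fixup on nearly every read. */
	for (i = NBANKS - 1; i >= 0; i--)
		printf("bank %d: %u\n", i, far_read((uint64_t)i << BANK_SHIFT));
	return (0);
}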

> Maybe if/when that hardware becomes affordable I'll try myself such a hack :)

You should (in theory, from the documentation -- I don't have
a PAE board with 3G of RAM lying around to check) be able to
bank select even without the extra hardware, as long as the PAE
is supported in the processor.  You just need enough RAM to
hold two windows' worth of contents at the low granularity.
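
For reference, the reason the processor alone is enough: with PAE
the page table entries grow to 64 bits, and the frame field extends
past the 32 bit physical limit (36 bits on the original parts).  A
toy decode -- constants from the architecture documentation, not
from pmap.c, and the function name is invented:

#include <stdint.h>
#include <stdio.h>

#define PAE_PG_V	0x001ULL		/* valid (present) bit */
#define PAE_PG_FRAME	0x0000000ffffff000ULL	/* 36-bit frame mask */

/* Pull the physical address out of a 64-bit PAE page table entry. */
static uint64_t
pae_pte_to_pa(uint64_t pte)
{
	if ((pte & PAE_PG_V) == 0)
		return (~0ULL);			/* not present */
	return (pte & PAE_PG_FRAME);
}

int
main(void)
{
	/* A PTE whose frame sits above the 4G line. */
	uint64_t pte = 0x0000000123456000ULL | PAE_PG_V;

	printf("physical address: 0x%llx\n",
	    (unsigned long long)pae_pte_to_pa(pte));
	return (0);
}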


> > In other words, if you need X G of RAM, then 4 times that much
> > is not going to save you, and you need to reconsider your code.
> 
> Databases for one love to have huge amounts of memory. It's not uncommon
> to have e.g. informix processes using 16GB of ram on Sun big iron.

Good reason to buy 64 bit iron, IMO, instead of trying to pretend
by emulating your 1GHz pentium with 32M of ram on your PC-XT and
swapping to the old ST506 to simulate RAM.


> > What this boils down to is that the physical RAM ends up getting
> > used up for housekeeping of the physical RAM.  You can push this
> > up closer to 3G; but to do that, you have to make modifications
> > to pmap.c and machdep.c.
> 
> Such an enhancement needs a lot of modifications to the VM subsystem.

Not as many as you might think, actually.  The PPC and Alpha
memory management somewhat resemble the work necessary to be
done in software.  And the task switching has to happen anyway.
Most of the problem is in the bank selection and limiting device
drivers to not using banked memory.

Even so, I don't think it's worthwhile.  The modifications that
*are* needed are fugly, and unlikely to be committed by anyone
polite, IMO.
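
If you want a picture of where the hooks would go, here is a
completely hypothetical sketch -- none of these names exist in
pmap.c -- of the idea above: hang the bank off the per-process pmap
and consult it once per context switch, so the cost is amortized
over the whole time slice instead of over every access.

#include <stdio.h>

struct toy_pmap {
	int	pm_bank;	/* which bank this process's pages live in */
};

static int	current_bank = -1;

static void
toy_pmap_activate(struct toy_pmap *pm)
{
	if (pm->pm_bank != current_bank) {
		/* Real code would reload the window/PDPTE state here. */
		current_bank = pm->pm_bank;
		printf("switched to bank %d\n", current_bank);
	}
}

int
main(void)
{
	struct toy_pmap a = { 0 }, b = { 1 };

	toy_pmap_activate(&a);
	toy_pmap_activate(&b);	/* cross-bank switch pays the reload */
	toy_pmap_activate(&b);	/* same bank: no reload, no cost */
	return (0);
}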


> > housekeeping for.  So the only real approach is to go to a
> > MIPS-like software lookaside (hierarchical) so that you can
> > take each bank, and take the 1/4 out of the bank itself.  This
> > works, but it's incredibly expensive, overall.
> 
> Hmm, yes. So what does Windows 2000 Datacenter do wrt that problem?
> Waste memory like there's no tomorrow?

It's always fun to try to poke at Windows with a sharp stick,
but I'll take your comment literally, instead of as a sideways
jab at Windows:

Actually, I have no idea.

I know how I would do it in Windows 98 and in Windows NT 3.5 and
4.0 SP2, if it were my job to do, but as to what they actually
do, and in a more "modern" version of Windows, I don't know,
since I haven't had the pleasure of grovelling through the code
of a more modern Windows.

The closest I could come would be some educated guesses.  There
are at least three places you would have to hack in VMM32.VXD, and
about six other places in the IFSmgr and networking code, and I'm
probably forgetting some esoteric code path I never had to crawl
through with WinICE.

Probably, the MS people got some input into the design, so it's
close enough to what they were already doing that their overhead
would be lower.

The Linux overhead is pretty low, too, since they do a lot of
stuff in software that FreeBSD does in hardware, in their VM,
in order to make it more naturally easy to port.  The design is
less Intel-centric, making it a bit slower on Intel than it could
be, if they were running closer to the glass.


> > Basically ...the memory is only good for programs themselves,
> > not for DMA, mmap'ed files, more mbufs... anything really useful.
> 
> Of course, it's the applications demanding memory we are talking about.
> For the OS itself, it's just a half-assed solution, not practical at all.

I'd really have to go out of my way to design a real pig of an
application in order to make it need this.  Almost everything I
do these days ends up I/O bound, where the ability to move data
in and out of memory ends up being the bottleneck.  With rare
exceptions, even going Gigabit, I have a hard time pushing an
800MHz CPU over 60%.

The PAE increases the amount of copying, and so it doesn't
save me bus cycles off my memory bus.  Even if all I did was
use the memory as a soft "L3 cache", I've got a lot more copy
overhead, which means that if my problem is my memory bus
bandwidth, all I'm doing is shooting myself in my knee so as to
avoid hitting my foot.

I have applications that would really like the extra memory, but
they are all network applications, and I can't use the extra memory
as mbufs because I can't DMA into or out of it without adding an
extra copy in both directions; I'd have to add a copy, both in and
out, in most cases where I don't have one today.
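
The extra copy looks roughly like this (all invented names; the NIC
can only DMA into low, unbanked memory, so anything bound for the
high bank bounces through a low buffer -- one extra pass over the
memory bus in each direction):

#include <string.h>

#define BOUNCE_SIZE	2048

static unsigned char	bounce[BOUNCE_SIZE];	/* low memory, DMA-able */

/* Receive path: the DMA engine fills 'bounce', then we copy up. */
static void
rx_to_banked(unsigned char *banked_dst, size_t len)
{
	memcpy(banked_dst, bounce, len);	/* the extra copy */
}

/* Transmit path: copy down before the DMA engine can see the data. */
static void
tx_from_banked(const unsigned char *banked_src, size_t len)
{
	memcpy(bounce, banked_src, len);	/* the extra copy */
}

int
main(void)
{
	unsigned char high[BOUNCE_SIZE];	/* stands in for banked memory */

	rx_to_banked(high, sizeof(high));
	tx_from_banked(high, sizeof(high));
	return (0);
}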

So unless I did something dumb, like run a whole bunch of virtual
servers (I'd be inclined to bank select between servers, and then
time slice on the same boundary), I'd be hard pressed to find a
situation where it was a win (dumb because I might as well build
more 1U boxes: they're less expensive, faster, and one crashing
doesn't kill everyone else).


> > Now add in that most new, fast memory is actually limited to
> > 2G in a system because of the memory modules being used (e.g.
> > the new 450MHz memory busses can't handle more than 2G).
> 
> Add more memory buses :)

There is one motherboard that I know of with 2 (limit 4G).

The problem is that there is good evidence that most people who
build chipsets couldn't build one that could walk and chew gum
at the same time without causing problems.  I would have a very
hard time trusting something like that.

The AMD Hammer stuff with the Hyperchannel, I think will be OK,
if they ever start selling boxes this century.  They are already
8 months behind their original "tape out" date of last November.
Right now, it's just so much vapor.


> > priority processes, you prefer the ones that are in the same
> > bank as the current process.
> 
> I'd keep all .text pages in the low 4GB of the machine. The probability
> that a program's code is bigger than that is, imho, null.

At this point, you are redesigning it into an application-specific
OS, rather than a general purpose OS.  If you do that, you really
can't expect anyone who doesn't share that need to be willing to
accept the penalty or maintain the code.

This might work well for your specific need (the ability to have
up to 4 times the current memory limit without buying a 64 bit
processor, at the expense of really expensive 32 bit hardware),
but the marginal returns mean that the IRR on the investment is
going to satisfy maybe 5% of people who need that much RAM, which
I would argue are, at most, 5% of the users.  That's just under
3 tenths of a percent, aggregate, for the user base.  No wonder it
is not already supported.  8-).


> > The bottom line is that, in order to have a *usable* amount of
> > physical RAM over 4G, you pretty much have to go to a 64 bit
> > platform, and if you are a user, now that Alpha is dead and no
> > one looks to be making quick progress on the Alpha 2G barrier,
> > that pretty much means "Itanium".
> 
> Except Itanium is nowhere production ready, so you probably need
> something else, e.g. sparc or ppc. Mips is also a nice arch to work
> with, btw, unfortunately SGI hardware is extremely expensive.

Production :== I can buy one at Fry's and load FreeBSD on it,
and it will work.  So it counts as "production", I think.

If you want a MIPS box that supports a lot of RAM, buy a SiByte
card.  Chris Demetriou is one of the guys who worked on it, so
it runs NetBSD, and plugs into a PCI slot.  It's supposed to be
a "network processor".  Be warned that the CPU speed on the MIPS
cores is pretty freakishly slow, compared to the original product
announcement, but if you are willing to entertain the idea of PAE,
then "freakishly slow" obviously doesn't bother you.  ;^).

Personally, I think that's a lot of effort, just to make political
noises about Itanium.

-- Terry
