Date:      Sun, 9 Nov 1997 10:21:54 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        mike@smith.net.au (Mike Smith)
Cc:        mini@d198-232.uoregon.edu, hackers@FreeBSD.ORG
Subject:   Re: x86 gods; advice? Suggestions?
Message-ID:  <199711091021.DAA24289@usr06.primenet.com>
In-Reply-To: <199711080954.UAA00629@word.smith.net.au> from "Mike Smith" at Nov 8, 97 08:24:16 pm

> > > This was suggested by another respondent.  I'd be very interested in 
> > > knowing how I could arrange such a thing, either overloading the 
> > > existing syscall callgate or making another for temporary use (I have 
> > > another free descriptor that I can hijack for the purpose).
> > 
> >   I don't know, I've never done it myself, personally. :)
> 
> Ah.  A handwave response.  8)   I'll look at it tomorrow; it was 
> suggested that I buy a book - obviously a suggestion from someone that 
> doesn't live in this book-desert.

I was the one who suggested a book:

	Protected Mode Software Architecture
	- PC System Architecture Series
	Tom Shanley
	MindShare, Inc.
	ISBN: 0-201-55447-X

Lest I be accused of handwaving-by-association yet again, here's the
process (people who are too lazy to wrap their brains around complicated
things should delete this message now instead of bitching at me like
they do when I talk about other complicated things they are too lazy to
wrap their brains around, like file systems, etc.):


o	Load all or part of the task into memory (minimally, startup
	code).

o	Create a TSS for the task.  A TSS is an execution context for
	the processor at the point it first begins or resumes the
	execution of the task.  A special TSS segment descriptor is
	placed in the GDT defining the base address, length, and
	Descriptor Privilege Level for the TSS it points to.  A
	TSS starts at the TR specified TSS base address and extends
	to the TSS limit, also from the TR.  It looks like:

		0						31
	00	Link (old TSS selector)		0000000000000000
	04	ESP0
	08	SS0				0000000000000000
	0C	ESP1
	10	SS1				0000000000000000
	14	ESP2
	18	SS2				0000000000000000
	1C	CR3
	20	EIP
	24	EFLAGS
	28	EAX
	2C	ECX
	30	EDX
	34	EBX	<-- this ordering explains a bit, eh?
	38	ESP
	3C	EBP
	40	ESI
	44	EDI
	48	ES				0000000000000000
	4C	CS				0000000000000000
	50	SS				0000000000000000
	54	DS				0000000000000000
	58	FS				0000000000000000
	5C	GS				0000000000000000
	60	Task's LDT selector		0000000000000000
	64	X000000000000000		Base address of I/O Map (*)
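
	As a rough sketch, the layout above maps onto a C structure like
	the one below.  The structure and field names are made up for
	illustration (they are not FreeBSD's); only the offsets come
	from the table, and the compile-time checks pin them down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; offsets follow the table above (hex). */
struct tss32 {
	uint16_t link, link_pad;	/* 00: old TSS selector */
	uint32_t esp0;			/* 04 */
	uint16_t ss0, ss0_pad;		/* 08 */
	uint32_t esp1;			/* 0C */
	uint16_t ss1, ss1_pad;		/* 10 */
	uint32_t esp2;			/* 14 */
	uint16_t ss2, ss2_pad;		/* 18 */
	uint32_t cr3;			/* 1C */
	uint32_t eip;			/* 20 */
	uint32_t eflags;		/* 24 */
	uint32_t eax, ecx, edx, ebx;	/* 28..34 */
	uint32_t esp, ebp, esi, edi;	/* 38..44 */
	uint16_t es, es_pad;		/* 48 */
	uint16_t cs, cs_pad;		/* 4C */
	uint16_t ss, ss_pad;		/* 50 */
	uint16_t ds, ds_pad;		/* 54 */
	uint16_t fs, fs_pad;		/* 58 */
	uint16_t gs, gs_pad;		/* 5C */
	uint16_t ldt, ldt_pad;		/* 60: task's LDT selector */
	uint16_t trap;			/* 64: bit 0 = debug trap on switch */
	uint16_t iomap_base;		/* 66: 16 bit offset of I/O Map */
};

/* Fail the build if the compiler lays it out differently. */
static_assert(sizeof(struct tss32) == 0x68, "TSS must be 0x68 bytes");
static_assert(offsetof(struct tss32, cr3) == 0x1C, "CR3 at 1C");
static_assert(offsetof(struct tss32, ebx) == 0x34, "EBX at 34");
static_assert(offsetof(struct tss32, es) == 0x48, "ES at 48");
static_assert(offsetof(struct tss32, ldt) == 0x60, "LDT at 60");
static_assert(offsetof(struct tss32, iomap_base) == 0x66, "I/O Map at 66");
```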

	Optional additional information:

	You can follow this starting at 68 with OS specific data
	of arbitrary length (I wouldn't make it too large... the
	whole thing shouldn't exceed 64k, and should be longword
	aligned -- the I/O Map Base address is a 16 bit offset).

	If you are using the Appendix H disclosures to set the virtual
	8086 mode interrupt handling by setting bit 0 (VME) of CR4 (Later
	486 and Pentium processors or better only), then a CLI, STI, INT, PUSHF,
	POPF, or IRET is not handled the same way as on the 386 (and 486's
	before a rev or two after the Pentium was introduced).  Instead,
	the IF flag is shadowed by a flag called VIF (V=virtual),
	and modifying EFLAGS IF modifies this instead.  A bitmap that must
	be exactly 8 longwords long (256 bits, one per INT vector) then
	lets an INT be handled in "real mode" by the VM86 task's own
	handler if its bit is clear, instead of trapping to your Virtual
	Machine Monitor, which is what happens if the bit is set.

	The "Base address of I/O Map" field points to the first longword
	immediately following the above two areas (ie: the processor uses
	this value and a negative offset of 32 bytes to access the virtual
	interrupt processing bitmap).
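
	A minimal sketch of finding and poking that bitmap, assuming the
	TSS is reachable as a plain byte array and that the bitmap sits
	directly after the 0x68 byte fixed part.  The function names are
	invented for illustration; the only facts used are the 32 byte
	negative offset and one bit per interrupt vector.

```c
#include <assert.h>
#include <stdint.h>

#define TSS_FIXED_BYTES	0x68	/* fixed part of the 32 bit TSS */
#define VME_REDIR_BYTES	32	/* 256 vectors, one bit each */

/* Start of the interrupt bitmap: 32 bytes below the I/O Map base. */
static uint8_t *
vme_redir_map(uint8_t *tss, uint16_t iomap_base)
{
	/* The bitmap must not overlap the fixed part of the TSS. */
	assert(iomap_base >= TSS_FIXED_BYTES + VME_REDIR_BYTES);
	return tss + iomap_base - VME_REDIR_BYTES;
}

/* Read the bit for one interrupt vector. */
static int
vme_redir_bit(const uint8_t *map, uint8_t vector)
{
	return (map[vector / 8] >> (vector % 8)) & 1;
}

/* Set or clear the bit for one interrupt vector. */
static void
vme_redir_set(uint8_t *map, uint8_t vector, int on)
{
	uint8_t mask = (uint8_t)(1u << (vector % 8));

	if (on)
		map[vector / 8] |= mask;
	else
		map[vector / 8] &= (uint8_t)~mask;
}
```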

	The I/O Map must be present if your VM86 task is going to access
	I/O ports.  If you virtualize all accesses, you do not need a
	bitmap.  In theory, the bitmap needs to be 8k.  In practice, you
	can do it in longword chunks up to the TSS limit from the TR if
	you don't want to permit accesses to ports above some arbitrary
	limit.
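
	Here is a small sketch of managing a truncated I/O Map along those
	lines.  The helper names are hypothetical.  It relies on the usual
	x86 convention that a clear bit permits the access and a set bit
	traps, and it treats any port beyond the end of the map as denied.
	On real hardware the map is also expected to be followed by a byte
	of all ones inside the TSS limit; that detail is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IOMAP_FULL_BYTES (65536 / 8)	/* 8k covers all 64k ports */

/* Deny everything: a set bit means "trap to the monitor". */
static void
iomap_deny_all(uint8_t *map, size_t nbytes)
{
	memset(map, 0xFF, nbytes);
}

/* Permit direct access to one port. */
static void
iomap_allow(uint8_t *map, size_t nbytes, uint16_t port)
{
	assert((size_t)port / 8 < nbytes);
	map[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* Ports past the end of the map (the TSS limit) count as denied. */
static int
iomap_allowed(const uint8_t *map, size_t nbytes, uint16_t port)
{
	if ((size_t)port / 8 >= nbytes)
		return 0;
	return (map[port / 8] & (1u << (port % 8))) == 0;
}

/* Bytes needed to cover ports 0..max_port, in longword chunks. */
static size_t
iomap_bytes_for(uint16_t max_port)
{
	size_t bytes = (size_t)max_port / 8 + 1;

	return (bytes + 3) & ~(size_t)3;
}
```

	For example, covering ports up to 0x3FF takes 128 bytes instead
	of the full 8k.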

	(*) If X = 1, a debug exception occurs when switching to the task

	The TSS descriptor looks like:

		0			7
	0	LSB of Segment Size
	1	2nd byte of Segment size
	2	LSB of Base Address
	3	2nd byte of Base Address
	4	3rd byte of Base Address
	5	1  B  0  X  S  D1 D2 P
	6	[ nibble ]  U  0  0  G
	7	4th byte of Base Address

	B	1 = task is busy
	X	0 = 16 bit TSS, 1 = 32 bit TSS
	S	0 = system segment (must be zero in TSS descriptor)
	D1&D2	Descriptor Privilege Level (0 for OS, 3 for user/VM86)
	P	1 = Segment present
	nibble	upper nibble of Segment Size (20 bits total)
	U	user bit (can be used by FreeBSD, etc.)
	G	Granularity of Segment Size (0 = bytes, 1 = pages)
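
	Packing those eight bytes can be sketched as below.  The function
	name and argument list are assumptions for illustration; the bit
	positions follow the table (bit 0 is on the left), with X fixed
	at 1 for a 32 bit TSS, S at 0, P at 1 and the user bit left clear.

```c
#include <assert.h>
#include <stdint.h>

/* Fill in an 8 byte TSS descriptor; limit is the 20 bit segment size. */
static void
tss_desc_encode(uint8_t d[8], uint32_t base, uint32_t limit,
    unsigned dpl, int busy, int page_gran)
{
	assert(limit <= 0xFFFFF && dpl <= 3);

	d[0] = (uint8_t)(limit & 0xFF);		/* LSB of size */
	d[1] = (uint8_t)((limit >> 8) & 0xFF);	/* 2nd byte of size */
	d[2] = (uint8_t)(base & 0xFF);
	d[3] = (uint8_t)((base >> 8) & 0xFF);
	d[4] = (uint8_t)((base >> 16) & 0xFF);
	/* Byte 5, from bit 0 up: 1, B, 0, X=1, S=0, DPL (2 bits), P=1. */
	d[5] = (uint8_t)(0x01 | (busy ? 0x02 : 0) | 0x08 |
	    (dpl << 5) | 0x80);
	/* Byte 6: upper nibble of size in bits 0-3, G in bit 7. */
	d[6] = (uint8_t)(((limit >> 16) & 0x0F) | (page_gran ? 0x80 : 0));
	d[7] = (uint8_t)((base >> 24) & 0xFF);
}
```

	A non-busy, DPL 0 descriptor comes out with 0x89 in byte 5; the
	processor sets the B bit itself when it switches to the task.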

o	Set up a hardware timer to force you back.  PCAUDIO is a definite
	drag at this point because of interrupts.

o	Switch to the task using a far call or jump that selects the TSS
	descriptor in the GDT.  This implies you have a TSS for the OS,
	since the processor is going to copy the registers out to the
	OS's TSS.

o	The processor loads the new task's TSS and uses the CS:EIP from
	the new TSS to start fetching (and executing) code from the
	new task.
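
The operand of that far call or jump is just a selector for the TSS
descriptor.  As a sketch, with hypothetical helper names: a GDT selector
is the descriptor index shifted left by 3, with the table indicator bit
clear and the requested privilege level in the low two bits.  The offset
half of the far pointer is ignored when the selector names a TSS, so it
is left at zero here.  The jump itself is privileged and only appears in
a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Selector for GDT entry 'index' (TI = 0 means GDT). */
static uint16_t
gdt_selector(unsigned index, unsigned rpl)
{
	assert(index < 8192 && rpl <= 3);
	return (uint16_t)((index << 3) | rpl);
}

/*
 * Build the 6 byte offset:selector operand for an indirect far jump.
 * Kernel code would then do something like (AT&T syntax):
 *	ljmp *ptr
 */
static void
far_ptr_build(uint8_t ptr[6], uint16_t selector)
{
	memset(ptr, 0, 4);			/* offset: ignored for a TSS */
	ptr[4] = (uint8_t)(selector & 0xFF);
	ptr[5] = (uint8_t)(selector >> 8);
}
```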


This sort of answers Tony Overfield's "3.  Something else..." question:
you can get a Task Gate Descriptor in the LDT or GDT or IDT.  This means
"Something else..." can be triggered by:

o	A far call
o	A jump
o	A hardware interrupt
o	A software exception
o	An INT instruction (hello, thunking...)

You use a VMM to field the event (in the INT case) in a memory extender;
basically you need a VMM to do VM86 mode anyway.  It works by being the
general purpose exception handler code.  A VM86 task pushes the EFLAGS
register on the stack, but then clears the VM bit disabling VM86 mode.
The exception handler looks at the EFLAGS on the stack to see if the VM
bit is set; if it isn't, it does the normal protected mode stuff.  If
it is, however, the VMM must determine the action requested by the DOS
task, and figure out how to do it itself.
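
The test the exception handler makes can be sketched like this.  The
frame structure is a trimmed, hypothetical stand-in for whatever the
real trap frame looks like; the one hard fact is that VM is bit 17 of
EFLAGS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_VM	0x00020000u	/* bit 17: virtual 8086 mode */

/* Hypothetical, trimmed view of what the processor pushed. */
struct trap_frame {
	uint32_t eip;
	uint32_t cs;
	uint32_t eflags;
};

enum trap_route {
	ROUTE_PROTECTED,	/* normal protected mode handling */
	ROUTE_VMM		/* came from a VM86 task: ask the VMM */
};

/* Decide who handles the event from the saved EFLAGS image. */
static enum trap_route
trap_classify(const struct trap_frame *tf)
{
	assert(tf != NULL);
	return (tf->eflags & EFLAGS_VM) ? ROUTE_VMM : ROUTE_PROTECTED;
}
```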

For example, an INT 21 against 0x80 could be handled as a request to
a virtual "C:" actually implemented as a subhierarchy of an FFS mounted
volume (Yet Another Reason To Allow Terry's Layering Fixes: now there
are *4* VFS consumers:  system calls, the NFS server, the kld code, and
the newly defined VMM fielding INT 21 calls on behalf of a DOS process.
Now we *really* can't keep our eyes tightly closed and pretend it's only
system calls...).

> > 	- a 32-bit 'we are the kernel now' context,
> > 	- a 16-bit protected-mode 'let's play with the BIOS' context, and
> > 	- a 16-bit vm86 'let's pretend we are a Microsoft OS' context.
> 
> Too complicated, and inadequate.  There is a separate 32-bit context 
> needed for the APM BIOS as well, and we are out of descriptors already. 
> The vm86 support handles this differently by creating a kernel process 
> at a later stage.

Because we want to *not* virtualize INT 21 (or more likely INT 13) in
the case where we are running a fallback driver, and *not* virtualize
INT 10 in the case where we are calling it (against my better judgement:
many VGA cards disable interrupts in BIOS to get rid of "sparklies" any
time you call INT 10 -- can you say "sucks"?  I knew you could...) to
set video modes via BIOS, etc....  Then we want to have at least three
TSS's: one for the OS, one for the OS to make BIOS calls with, and one
for the OS to run a VM86 for DOS under UNIX (and depending on the
complexity of the VMM, maybe even emulate the functions of the 386 to
the point of running Windows 95, like some other x86 protected mode OS's
can).

The APM mention above would be handled in the second context, above.

Technically, you could do this using only two GDT entries: one for the OS
(in this case FreeBSD), and one that you switched off between as
many others as you wanted, so long as they never trapped back to
something other than the OS exception handler (acting as the VMM) to
do their thing.  Practically, there are far too many things that rely
on the "DOS not busy" interrupt at INT 27 for this to work well...
for example, if you wanted to unset the bit for an INT for a network
card in the OS TSS and make the VMM switch to a VM86 and run a DOS
network card driver to field the interrupt and use real mode IPX or
ODI or whatever stacks, and then pass the data to FreeBSD via thunk.
This would leave you with two TSS's, which, together, implemented the
actual OS (instead of one).  Any card driver that ran in DOS but not
in FreeBSD could, in principle, be handled the same way.


> > 	- hop into protected mode, create a vm86 task which handle
> > 	loading the kernel. (map the vm86 1M+ range to where you want the
> > 	kernel to go, or do a 1:1 map of all physical memory, and then set
> > 	the vm86's descriptor limit to 4G or so. Do all the loading from
> > 	vm86 mode. much easier code to look at)
> 
> Unfortunately, we have no tools for writing realmode code.  I was 
> perhaps somewhat misleading regarding running the bootstrap in real 
> mode; please look at the code for an understanding of the issues.

I agree.  We want to enter protected mode as early as we can, build
a VM86, and *stay* in protected mode ever after.

> This isn't making a lot of sense to me.  Are you implying that one 
> could be in 32-bit PM and vm86 mode at the same time?

No.  You can swap them, but one has 16 bit segments, the other 32,
so at least two additional TSS's are needed ...but they could be
virtual, as stated above, so you'd have only two total real GDT entries,
if you felt you were running low...

Buy the book; it goes into much more detail than I did, including VM86
mode, task creation, the "magic registers" from Appendix H, including
how to do a virtualized INT 10 for a VM86 task to think it has video
hardware (it doesn't cover screen writes, but you can do that fairly
simply by marking the "screen memory" read-only and taking an exception
when it's written; use two timers, the first for latency so that inactivity
causes the "real" screen, probably an X window or virtual console, to be
updated, and the second, longer one for interval so that even if the
display is highly active, you mirror the changes to that point.  This
lets you do things like graphics fairly quickly, at the cost of keeping
a "diff" copy around to note deltas at any given timer firing.  Yet Another
Reason To Listen To Terry And Put DDX In The Kernel: console graphics for
programs running under VM86...).
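
A rough sketch of the two-timer idea, under assumptions of my own: time
is a plain tick counter, the "screen" is an array of 16 bit text cells,
and the names are invented.  The first function decides whether a flush
is due (idle long enough, or too long since the last flush); the second
walks the "diff" copy and counts the cells that changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mirror {
	int      dirty;		/* a write fault was taken since last flush */
	uint32_t last_write;	/* tick of the most recent write */
	uint32_t last_flush;	/* tick of the most recent flush */
};

/* Flush when the display went quiet, or when the interval ran out. */
static int
mirror_due(const struct mirror *m, uint32_t now,
    uint32_t idle, uint32_t interval)
{
	assert(m != NULL);
	if (!m->dirty)
		return 0;
	return (now - m->last_write >= idle) ||
	    (now - m->last_flush >= interval);
}

/* Bring the shadow copy up to date; returns how many cells changed. */
static size_t
mirror_diff(const uint16_t *screen, uint16_t *shadow, size_t ncells)
{
	size_t i, changed = 0;

	for (i = 0; i < ncells; i++) {
		if (shadow[i] != screen[i]) {
			shadow[i] = screen[i];	/* would also update the real display */
			changed++;
		}
	}
	return changed;
}
```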


This is entirely more than I had intended to type using only 9 fingers
(one of my fingers is currently on strike for the next 6 weeks or so...).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


