From owner-freebsd-hackers  Mon May 20 20:42:00 1996
Message-Id: <199605210345.AA215120311@fakir.india.hp.com>
To: hackers@freebsd.org
Subject: I-/D- cache coherency issues
Date: Tue, 21 May 1996 09:15:10 +0530
From: A JOSEPH KOSHY <koshy@india.hp.com>


I'm looking at generating machine code on the fly and executing it later.
Since many of the newer microprocessors have separate I- and D-caches with
no consistency checking between the two, this requires a way to ensure
that the instructions executed from a virtual address range are the ones
the code generator wrote out to memory.

Most I-cache implementations are simple and don't snoop the bus, so the
responsibility for maintaining coherency rests with the OS.
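
To make the write-then-execute hazard concrete, here is a minimal sketch
of the sequence in C.  It rests on assumptions that are not part of this
post: a POSIX-style mmap()/mprotect(), and the GCC/Clang builtin
__builtin___clear_cache() standing in for whatever the platform really
provides.  run_generated() and RETURN_42 are made-up names for
illustration, and the byte sequences are only valid on the architectures
named in the #if.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Sample payload: a function that returns 42 (illustrative only). */
#if defined(__x86_64__) || defined(__i386__)
static const unsigned char RETURN_42[] = {
    0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xC3                            /* ret         */
};
#elif defined(__aarch64__)
static const unsigned char RETURN_42[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xC0, 0x03, 0x5F, 0xD6          /* ret         */
};
#endif

/*
 * Hypothetical helper: copy `len` bytes of machine code into fresh
 * memory, make the instruction stream agree with what was written,
 * call it as `int f(void)`, and return its result (-1 on failure).
 */
int run_generated(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Start with a writable, non-executable mapping. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* These stores go through the D-cache. */
    memcpy(buf, code, len);

    /* Switch the mapping from writable to executable. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Ask the toolchain to do whatever this CPU needs so that
     * instruction fetch sees the bytes just written. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    int (*fn)(void) = (int (*)(void))buf;
    int result = fn();

    munmap(buf, len);
    return result;
}
```

On an x86 or AArch64 host, run_generated(RETURN_42, sizeof RETURN_42)
is expected to return 42.  The point of the sketch is the ordering:
write, change protection, synchronise, and only then jump.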

Before the list jumps on me about the horrors of self-modifying code,
I'd like to point out that rolling your own machine code is useful in:

	o Direct threaded interpreters: FORTH comes to mind.

	o Reiser raster ops: it turns out that this is one of the ways to
	  get decent performance out of torturous hardware like the IBM
	  (vanilla) VGA.  You generate machine code for your graphics
	  operation, special-cased for the operation desired, and then
	  let it rip.

	o I would hazard a guess that a native mode compiler for the Java 
	  virtual machine would need similar facilities too.

	o Then of course there is the amusement value :-).

The magic incantation for correctly executing freshly generated code varies
from a simple "jmp $+2" on a '386 to more arcane calls to PALcode or equivalent
on the newer RISCs.  I.e., it's pretty much dependent on the processor and
memory architecture.
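
As a rough illustration of that architecture dependence, the sketch
below hides the incantation behind one function.  sync_icache() is a
hypothetical name, not an existing BSD interface.  The x86 branch uses
GCC-style inline assembly for the "jmp $+2" trick, and every other
architecture falls back on the GCC/Clang builtin
__builtin___clear_cache(), which is an assumption about the toolchain
rather than something the OS provides.

```c
#include <stddef.h>

/*
 * Hypothetical wrapper: make instruction fetch see the bytes most
 * recently stored in [start, start + len).
 * Returns 0 on success, -1 on a bad argument.
 */
int sync_icache(void *start, size_t len)
{
    if (start == NULL || len == 0)
        return -1;

#if defined(__i386__) || defined(__x86_64__)
    /* The "jmp $+2" incantation: a jump to the very next instruction,
     * which discards anything already prefetched.  The "memory"
     * clobber keeps the compiler from moving the preceding stores
     * past this point. */
    __asm__ __volatile__("jmp 1f\n1:" ::: "memory");
#else
    /* Elsewhere, let the compiler emit whatever flush sequence or
     * system call the target needs. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif
    return 0;
}
```

A code generator would call sync_icache(buf, nbytes) after its last
store into buf and before the first jump into it.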

So my questions are:

(a) Do we have a means in userland of ensuring that a particular range of
    virtual addresses is flushed from the I- or D-cache?  Ideally something
    that would work across {Free,Net,*}BSD?

(b) Is there a non-machine-specific way this can be done from within the
    kernel?  I.e., are there suitable kernel VM primitives that one could
    invoke?

Thanks,
Koshy