From owner-freebsd-arch@FreeBSD.ORG Tue Jun 6 07:22:59 2006
Message-ID: <3bbf2fe10606060018k7d9052eck672277079144ca10@mail.gmail.com>
Date: Tue, 6 Jun 2006 09:18:34 +0200
From: "Attilio Rao"
Sender: asmrookie@gmail.com
To: "Matthew Dillon", "Alexander Leidinger", "Bruce Evans", "Suleiman Souhlal", freebsd-arch@freebsd.org, freebsd-hackers@freebsd.org
In-Reply-To: <200606060648.k566m0df046035@apollo.backplane.com>
References: <3bbf2fe10605311156p7e629283r34d22b368877582d@mail.gmail.com> <447DFA0C.20207@FreeBSD.org> <3bbf2fe10605311329h7adc1722j9088253515e0265b@mail.gmail.com> <20060601084052.D32549@delplex.bde.org>
 <3bbf2fe10605311632w58c2949buc072e58ac103d7d@mail.gmail.com> <20060601093016.ygeptkv80840gkww@netchild.homeip.net> <200606060648.k566m0df046035@apollo.backplane.com>
Subject: Re: [patch] Adding optimized kernel copying support - Part III
List-Id: Discussion related to FreeBSD architecture

2006/6/6, Matthew Dillon:
> :AFAIR the DFly FPU rework allows to use FPU/XMM instructions in their
> :kernel without the need to do some manual state preserving (it's done
> :...
> :
> :Bye,
> :Alexander.
>
>     That actually isn't quite how it works.  If the userland had active
>     FP state then the kernel still has to save it before it can use the
>     FP registers.  The kernel does not have to restore it, however (that is,
>     it can just let userland take a fault to restore its FP state).
>     However, the kernel still has to mess around with CR0_TS when pushing
>     and popping an FP context / save area.
>
>     The FP state reworking in DragonFly had the following effects:
>
>     * We now have a save area pointer instead of a fixed, static save area.
>       This allows FP state to be 'stacked' without having to play weird
>       games with a static save area.
>
>     * The standard FP restoration fault is no longer limited to userland.
>       The kernel can push its own state, switch away to another thread,
>       switch back, and take a fault to restore it, independent of the
>       user FP state.
>
>     --
>
>     It would be possible to simplify matters and actually implement what
>     you say... the ability to use FP registers without any manual state
>     preserving.  That is, to be able to treat the FP registers just like
>     normal registers.  It would require saving and restoring a great deal
>     more state in the interrupt/exception frame push code and the
>     thread switch code, though.
>     It could be conditionalized based on CR0_TS
>     or it could just be done unconditionally.  I'm not sure it would yield
>     any improvement in performance, though.

I tend to agree with you, because the extra state saving (both the work
and the storage) would cancel out the improvement gained from the xmm
registers. The only point of using xmm registers is the performance
improvement. I think that having an interlock in the kernel (and so just
one kernel save state) is the better choice for performance, even if it
doesn't allow truly unconditional usage.

Attilio

PS: Please also consider that xmm registers seem to improve performance
only when used with aligned data (movaps, movdqa), so not in the general
case. MMX, on the other hand, seems to give very little improvement,
particularly on newer architectures (>= P3).

-- 
Peace can only be achieved by understanding - A. Einstein
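
To make the PS about aligned data concrete, here is a minimal user-space
sketch in C. It is not the patch under discussion and not kernel code: the
names is_aligned16() and copy_xmm_sketch() are hypothetical, and a kernel
version would also need the FP state save/interlock handling discussed
above. It assumes a compiler that provides the SSE2 intrinsics in
<emmintrin.h>; the aligned load/store intrinsics used here normally
compile to movdqa. When the pointers or the length are not 16-byte
aligned, the sketch simply falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_load_si128, _mm_store_si128 */
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/*
 * Hypothetical copy routine (illustration only).
 * Returns 1 if the aligned xmm path was taken, 0 if it fell back
 * to memcpy().
 */
static int copy_xmm_sketch(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
#ifdef __SSE2__
    /* The aligned path is only valid when both buffers and the
     * length are multiples of 16 bytes. */
    if (is_aligned16(dst) && is_aligned16(src) && len % 16 == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t i, n = len / 16;

        for (i = 0; i < n; i++)
            _mm_store_si128(d + i, _mm_load_si128(s + i));
        return 1;
    }
#endif
    /* General (unaligned) case: no xmm fast path in this sketch. */
    memcpy(dst, src, len);
    return 0;
}
```

The return value is there only so a caller can see which path was taken.
The alignment test is what restricts the fast path to the "aligned data"
case mentioned in the PS; everything else goes through the ordinary copy.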