From: David Wood
To: freebsd-acpi@freebsd.org
Date: Sat, 31 May 2008 20:00:47 +0100
Subject: Dell PowerEdge 2950 III - CPU power management problems

Dear all,

I'm having problems with CPU power management on a Dell PowerEdge 2950 III. I've posted this to freebsd-acpi in the first instance, though it may finish up belonging on freebsd-stable. As I am almost certain there are bugs in the DSDT, I thought I'd start on freebsd-acpi.

THE HARDWARE AND OS

The machine has BIOS version 2.2.6; it came from the factory that way and there's no later version available on the Dell Support web site. It has two Xeon E5430 processors (2.66GHz quad core Penryn), for a total of 8 cores. The "Demand-based power management" option is selected in the F2 BIOS setup.
The machine is running FreeBSD 7.0-STABLE, csupped, built and installed earlier today (ACPI_DEBUG=1 was passed to all the make commands). The intention is to deploy this machine in production with 7.0-RELEASE, and that's the OS I started off with. When I found I had problems, I went to -STABLE in case any relevant fixes had already been checked in. The behaviour of est(4) in -STABLE differs from that in 7.0-RELEASE.

WHERE I THINK I AM

I can't control the processor core clock frequencies, even after fixing what I'm almost certain are bugs in the DSDT.

The values in the _PSS object for Control and Status could be wrong, upsetting est(4), but the correct values are documented only in the BIOS Writers' Guide, which is Intel Confidential and only available via an Intel FAE. There are errors from est on a verbose boot. Maybe the _PSS values aren't the problem, even though they appear to be wrong.

I believe there are restrictions on controlling the clock frequency of the individual cores of multi-core processors. However, it would be useful to be able to use powerd on this box, as its load in production will be bursty and slowing down the processor cores should save power (and reduce heat output).

dev.cpu.0.freq_levels looks reasonable, but no other cpu has freq_levels or freq.

sysctl dev.cpu.0.freq=2000 is accepted without error.

sysctl dev.cpu.0.freq=2667 more often than not results in:

    dev.cpu.0.freq: 2667
    sysctl: dev.cpu.0.freq: Invalid argument

dev.est.n.freq_settings (for all values of n) is not as expected at all. For even values of n it has only 2667/103000 (going by the _PSS object in the DSDT, there should be three levels), and for odd values of n it is just 0.

All the est related errors in dmesg indicate that something is not working with Enhanced SpeedStep.

Download links for "sysctl dev" and dmesg output after a verbose boot can be found at the end of this message. I'd be grateful if someone could look this over.
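The pattern described above (only cpu0 exposing freq_levels, and est units alternating between one setting and 0) can be checked mechanically against a saved "sysctl dev" log. The following is a minimal Python sketch, not part of the original report: it assumes the usual "name: value" layout of sysctl output, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches only the frequency-related leaves under dev.cpu.N and dev.est.N.
_FREQ_OID = re.compile(
    r"^dev\.(cpu|est)\.(\d+)\.(freq|freq_levels|freq_settings):\s*(.*)$"
)


def summarize_freq_oids(sysctl_text):
    """Group frequency-related OIDs from `sysctl dev` output by (driver, unit)."""
    summary = defaultdict(dict)
    for line in sysctl_text.splitlines():
        match = _FREQ_OID.match(line.strip())
        if match:
            driver, unit, leaf, value = match.groups()
            # Values such as "2667/103000" are space-separated level entries.
            summary[(driver, int(unit))][leaf] = value.split()
    return dict(summary)


def cpus_without_freq_levels(summary, ncpu):
    """Return the cpu unit numbers (0..ncpu-1) that expose no freq_levels OID."""
    return [
        unit
        for unit in range(ncpu)
        if "freq_levels" not in summary.get(("cpu", unit), {})
    ]
```

Reading the saved log into a string, passing it to summarize_freq_oids() and then calling cpus_without_freq_levels(summary, 8) would, if the behaviour is as described, list cpu units 1 to 7.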
If a developer needs access to the machine, please email me and I'll see if I can sort something out - it does have a remote management card, which simplifies this sort of debugging considerably.

The notes below have been prepared with Nate's notes in the handbook in mind - I hope that they're in a helpful format.

THE DSDT - AND WHAT I THINK ARE BUGS

The output of acpidump -dt is at
http://www.wood2.org.uk/freebsd/djwood-Dell_2950_2.2.6.asl

I've posted a patch at
http://www.wood2.org.uk/freebsd/djwood-Dell_2950_2.2.6.asl.diff
that fixes what I'm almost certain are two separate bugs.

Firstly, the _CST method for CPU1 says it's returning 3 states when there are only two. If it isn't fixed, this leads to

    cpu0: invalid _CST state count (3 != 2)

at boot.

Secondly, the reference in CPUs 2-8 for _CST is incorrect.

    Return (\_PR.CPU0.CST)

should be

    Return (\_PR.CPU1._CST)

in each case. If it isn't fixed, you get

    ACPI Error (psargs-0459): [\\_PR_.CPU0.CST_] Namespace lookup failure, AE_NOT_FOUND

when initialising each cpu from cpu1 to cpu7.

There's a redundant External (\_PR_.CPU0.CST_, IntObj) left over after fixing the second bug - I've commented it out.

I built AML from my patched source with iasl -2f and set /boot/loader.conf to soft-load it. Neither of the errors mentioned occurs any longer, though I still can't control the clock frequency of my processor cores.

The -f is needed because otherwise iasl complains about reserved names in the TPM part of the DSDT and won't emit any output (I built the latest iasl from the source on the ACPI CA site and got the same results). The -2 is because I believe the output of acpidump -dt is ACPI 2.x compatible - if this is incorrect on my part, let me know.

These problems only appear when the "Demand-based power management" option is selected in the F2 BIOS setup. Without that option turned on, most of the power management related stuff is omitted from the DSDT.
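The second fix described above is a plain textual substitution in the decompiled source, so it can be sketched in a few lines of Python. This is an illustration only, not the posted patch: it assumes the acpidump -dt output contains the incorrect reference exactly as quoted in this message, and the function name is hypothetical. It does not attempt the first fix (the _CST state count), nor does it comment out the leftover External declaration.

```python
# Strings are taken verbatim from the description of the second bug.
WRONG_REF = r"Return (\_PR.CPU0.CST)"
RIGHT_REF = r"Return (\_PR.CPU1._CST)"


def fix_cst_references(asl_text):
    """Replace each incorrect _CST reference and report how many were changed."""
    replaced = asl_text.count(WRONG_REF)
    return asl_text.replace(WRONG_REF, RIGHT_REF), replaced
```

If the count returned is zero, the literal text in the decompiled file differs from the assumption here (spacing, for instance) and the file needs editing by hand.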
I tried to report these problems to Dell, as they also show under CentOS 5.1 (which is really just a debranded RedHat Enterprise Linux 5.1 - a Dell supported OS for this system), asking that my notes be passed to R&D. I got a worthless response saying that there was little that PowerEdge Linux support could do unless I had a definite hardware fault. Not only was I left wondering why I bothered, I'm somewhat saddened that the buggy BIOS for one of Dell's leading rack mount servers passed software QA in this state.

I'd be grateful if someone can confirm these bugs and my fixes. If anyone has a suitable contact at Dell to get these bugs fixed, please pass this message on.

LOGS AND SO ON

/boot/loader.conf reads:

    # For MegaCLi
    mfi_linux_load="YES"
    # For SMART monitoring
    mfip_load="YES"
    # ACPI fixing
    acpi_dsdt_load="YES"
    acpi_dsdt_name="/boot/acpi.aml"
    debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"
    debug.acpi.level="ACPI_LV_VERBOSE"
    debug.cpufreq.verbose="1"

dmesg output after a verbose boot with those settings (apart from the acpi_dsdt_load line, which is commented out) can be found at:
http://www.wood2.org.uk/freebsd/djwood-Dell_2950_2.2.6-dmesg.log

When the acpi_dsdt_load line is uncommented to soft-load my fixed DSDT, the dmesg output can be found at:
http://www.wood2.org.uk/freebsd/djwood-Dell_2950_2.2.6_fixed-dmesg.log

sysctl hw.acpi isn't very useful here. The output of sysctl dev with my soft-loaded DSDT can be found at:
http://www.wood2.org.uk/freebsd/djwood-Dell_2950_2.2.6_fixed.sysctl.log

I think grep -E '(freq|^dev.cpu)' will pull out the salient points, but seeing as it's a file, it may as well contain everything.

Best wishes,

David
-- 
David Wood
david@wood2.org.uk