Date:      Sun, 17 Sep 2006 10:26:16 -0400
From:      "Alexandre \"Sunny\" Kovalenko" <Alex.Kovalenko@verizon.net>
To:        David Wolfskill <david@catwhisker.org>
Cc:        acpi@freebsd.org
Subject:   Re: Avoiding "WARNING: system temperature too high, shutting down soon!"?
Message-ID:  <1158503176.754.26.camel@RabbitsDen>
In-Reply-To: <20060916234642.GC698@bunrab.catwhisker.org>
References:  <20060916234642.GC698@bunrab.catwhisker.org>


--Boundary_(ID_O58A4aArqvfZgtRbBK518g)
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: 8BIT

On Sat, 2006-09-16 at 16:46 -0700, David Wolfskill wrote:
> I could use some help:  I seem to overheat my laptop; I'd like to get
> some idea of how to avoid the overheating, preferably while still
> getting the work done.
> 
> The laptop is a Dell Inspiron 8200.  I recently bought this one to
> replace a 1.6 GHz one that had developed an occasional problem with
> the LCD display that made the display unusable (though I could SSH in to
> the machine usually).  This machine is a 2.4 GHz P4M with 768 MB RAM (at
> the moment).
> 
> During Nate's BAFUG talk earlier this month, I decided to try running
> powerd; I set the mode at "adaptive" for AC, battery, and unknown, and
> dev.cpu.0.freq reports that it normally sits at 150, but appears to ramp
> up quite responsively during, say, a "make buildworld."  (The earlier
> laptop sits at dev.cpu.0.freq=1600 during that process; the current one
> sits at 2400 -- as expected).
> 
> However, the temperature (as reported by hw.acpi.thermal.tz0.temperature),
> which meanders between 52 - 62C while the machine isn't doing much,
> tends to spend long stretches of time in the 80 - 90C range during a
> "make buildworld" (as reported by a "while (1)" loop during said
> process).  As you can see from the salient sysctl values, that's not a
> lot of headroom:
> 
> g1-18(6.2-P)[4] sysctl hw.acpi.thermal dev.cpu.0
> hw.acpi.thermal.min_runtime: 0
> hw.acpi.thermal.polling_rate: 10
> hw.acpi.thermal.user_override: 0
> hw.acpi.thermal.tz0.temperature: 58.5C
> hw.acpi.thermal.tz0.active: -1
> hw.acpi.thermal.tz0.passive_cooling: 0
> hw.acpi.thermal.tz0.thermal_flags: 0
> hw.acpi.thermal.tz0._PSV: -1
> hw.acpi.thermal.tz0._HOT: -1
> hw.acpi.thermal.tz0._CRT: 94.0C
> hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
> dev.cpu.0.%desc: ACPI CPU
> dev.cpu.0.%driver: cpu
> dev.cpu.0.%location: handle=\_PR_.CPU0
> dev.cpu.0.%pnpinfo: _HID=none _UID=0
> dev.cpu.0.%parent: acpi0
> dev.cpu.0.freq: 150
> dev.cpu.0.freq_levels: 2400/0 2100/0 1800/0 1500/0 1200/0 1050/0 900/0 750/0 600/0 450/0 300/0 150/0
> g1-18(6.2-P)[5] 
> 
> leading to:
> 
> Sep 16 10:11:43 localhost root: WARNING: system temperature too high, shutting down soon!
> Sep 16 10:11:43 localhost syslogd: /dev/:0: No such file or directory
> Sep 16 10:11:49 localhost kernel: acpi_tz0: WARNING - current temperature (94.5C) exceeds safe limits
> Sep 16 10:11:55 localhost syslogd: exiting on signal 15
> 
> this morning while I was running yesterday's -CURRENT, building today's.
> (I had already built today's -STABLE, aka -6.2-PRERELEASE successfully.)
> 
> And that's the work that I'd like to be able to do:  track RELENG_6
> & HEAD on a daily basis.  With a few interruptions, mostly from
> events not of my choosing, I've been doing this with various machines,
> including laptops, for some years.
> 
> I suppose it's possible that the cooling just isn't adequate for the
> machine, though each of the 2 fans appears to operate.  (Each has a
> "high", "low", and "off" setting; one fan is for the CPU; the other is
> for the motherboard -- per Dell's diagnostics.  The motherboard fan
> does make an odd sound sometimes, though the diagnostics claim that it
> was running fine.)
> 
> Just prior to the forced shutdown (above), the reported temperature
> had been >90C for several minutes, and the fans were going full
> bore.  I had elevated the laptop above a smooth flat surface, then
> put a bag of ice under it -- apparently to no avail.
> 
> So:  in the face of prolonged near-critical temperatures, is there a way
> to tell the machine to throttle back & work a bit less hard?  Of course,
> if there's a way to make the cooling more effective, I'd certainly be
> interested in that, as well -- but having the machine shut down like
> that is awfully disruptive.  :-/
> 
> Please include me in responses, as ACPI isn't one of the things I follow
> closely enough to subscribe to the list.
> 
> I will, of course, summarize responses sent off-list that appear to be
> useful.
> 
> Thanks!
> 
> Peace,
> david
I have attached a patch that I put together in the days of 6-CURRENT (I
think); it adds a -t <temperature (C)> switch to powerd. With the patch
applied, powerd drops the CPU frequency once the reported temperature
exceeds that value. Unfortunately, I no longer have a 6.x system to try
it on, so if the patch does not apply, you can either add the necessary
code by hand or send me your version of
/usr/src/usr.sbin/powerd/powerd.c and I will modify it appropriately.
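
As a rough illustration of what the patch does in adaptive mode, here is
a minimal sketch. The helper names mark_from_celsius() and
effective_idle() are made up for this example and do not exist in
powerd; the arithmetic mirrors the 't' case and the "idle = total"
override in the attached diff.

```c
/*
 * Hypothetical helper: convert the -t argument (whole degrees C) into
 * the units of hw.acpi.thermal.tz0.temperature (tenths of a kelvin),
 * using the same arithmetic as the attached patch.
 */
static int
mark_from_celsius(int celsius)
{
	return (celsius * 10 + 2733);
}

/*
 * Hypothetical helper: the idle time the adaptive algorithm should see.
 * Above the passive cooling mark we claim the CPU was 100% idle, so the
 * existing algorithm lowers the frequency regardless of the real load.
 */
static long
effective_idle(long idle, long total, int temperature,
    int passive_cooling_mark)
{
	if (temperature > passive_cooling_mark)
		return (total);
	return (idle);
}
```

With -t 85 the mark works out to 3583, so a raw reading of 3600 makes
effective_idle() return total, while a reading of 3500 leaves idle
untouched.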

Since it does not look like you have ACx levels configured in your ASL,
it is possible that your BIOS has a "Fan learning" option. In this mode
the CPU is run at different frequencies and under different loads, and
the fan speed is adjusted to keep the temperature at a certain level.
Obstructing the air flow (by partially blocking the air holes) during
learning mode will usually result in a cooler, but noisier, system.

When I had a similar problem with my laptop, applying a moderate amount
of thermal grease and resetting the CPU fan fixed it for good. I would
also recommend investigating the source of the noise you mentioned:
a mechanical obstacle in the path of the fan might cause the fan itself
to heat up at the most inopportune moment (read: at the highest speeds).

Additionally, if you are reasonably sure that your hardware can
withstand higher temperatures, you can always override the _CRT value in
your ASL. See the appropriate Handbook section on how to dump your ASL,
and then search for something like

                        Method (_CRT, 0, NotSerialized)
                        {
                            Return (KELV (0x5d))
                        }

The return value is in tenths of a degree on the Kelvin scale.
Personally, I would not do that.
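
For reference, converting between degrees Celsius and tenths of a
kelvin can be sketched as follows. The function names are made up for
this example, and the offset of 2732 (273.15 K rounded to tenths) is an
assumption; check your kernel sources for the exact constant. The
attached patch adds 2733 instead, a difference of 0.1 degree.

```c
/*
 * Assumed offset: 0 degrees C expressed in tenths of a kelvin
 * (273.15 K, rounded to 2732).
 */
#define ZERO_C_DECIKELVIN	2732

/* Convert tenths of a degree Celsius to tenths of a kelvin. */
static int
decicelsius_to_decikelvin(int decicelsius)
{
	return (decicelsius + ZERO_C_DECIKELVIN);
}

/* Convert tenths of a kelvin back to tenths of a degree Celsius. */
static int
decikelvin_to_decicelsius(int decikelvin)
{
	return (decikelvin - ZERO_C_DECIKELVIN);
}
```

Under that assumption, the 94.0C shown for _CRT in your sysctl output
is 940 tenths of a degree, which corresponds to 3672 tenths of a kelvin.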

And last but not least: an Antec coolpad (active, USB powered) is
buildworld's best friend. Even if your laptop handles temperature
properly, replacing the coolpad is much cheaper and easier than
replacing a fan that has died because it was running full bore for
far too long.

HTH,

-- 
Alexandre Kovalenko (Олександр Коваленко)

--Boundary_(ID_O58A4aArqvfZgtRbBK518g)
Content-type: text/x-patch; name=powerd.c.patch; charset=utf-8
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=powerd.c.patch

--- ./usr.sbin/powerd/powerd.c	Sun Apr 17 11:25:41 2005
+++ /home/sunny/powerd.c	Sun Apr 24 20:46:40 2005
@@ -46,7 +46,8 @@
 
 #define DEFAULT_ACTIVE_PERCENT	65
 #define DEFAULT_IDLE_PERCENT	90
-#define DEFAULT_POLL_INTERVAL	500	/* Poll interval in milliseconds */
+#define DEFAULT_POLL_INTERVAL	500     /* Poll interval in milliseconds */
+#define VERY_HIGH_TEMPERATURE   200
 
 enum modes_t {
 	MODE_MIN,
@@ -83,11 +84,13 @@
 static int	freq_mib[4];
 static int	levels_mib[4];
 static int	acline_mib[3];
+static int      temp_mib[5];
 
 /* Configuration */
 static int	cpu_running_mark;
 static int	cpu_idle_mark;
 static int	poll_ival;
+static int      passive_cooling_mark;
 
 static int	apm_fd;
 static int	exit_requested;
@@ -244,7 +247,7 @@
 {
 
 	fprintf(stderr,
-"usage: powerd [-v] [-a mode] [-b mode] [-i %%] [-n mode] [-p ival] [-r %%]\n");
+"usage: powerd [-v] [-a mode] [-b mode] [-i %%] [-n mode] [-p ival] [-r %%] [-t temperature]\n");
 	exit(1);
 }
 
@@ -252,7 +255,7 @@
 main(int argc, char * argv[])
 {
 	long idle, total;
-	int curfreq, *freqs, i, *mwatts, numfreqs;
+	int curfreq, *freqs, i, *mwatts, numfreqs, temperature;
 	int ch, mode_ac, mode_battery, mode_none, acline, mode, vflag;
 	uint64_t mjoules_used;
 	size_t len;
@@ -263,10 +266,11 @@
 	cpu_idle_mark = DEFAULT_IDLE_PERCENT;
 	poll_ival = DEFAULT_POLL_INTERVAL;
 	mjoules_used = 0;
+        passive_cooling_mark = VERY_HIGH_TEMPERATURE;
 	vflag = 0;
 	apm_fd = -1;
 
-	while ((ch = getopt(argc, argv, "a:b:i:n:p:r:v")) != EOF)
+	while ((ch = getopt(argc, argv, "a:b:i:n:p:r:t:v")) != EOF)
 		switch (ch) {
 		case 'a':
 			parse_mode(optarg, &mode_ac, ch);
@@ -300,6 +304,16 @@
 				usage();
 			}
 			break;
+                case 't':
+                        passive_cooling_mark = atoi(optarg);
+                        if(passive_cooling_mark < 0 || passive_cooling_mark > 100) {
+                                 warnx("%d is not valid temperature for passive cooling",
+                                       passive_cooling_mark);
+                                 usage();
+                        }
+                        passive_cooling_mark *= 10;
+                        passive_cooling_mark += 2733;
+                        break;
 		case 'v':
 			vflag = 1;
 			break;
@@ -320,6 +334,9 @@
 	len = 4;
 	if (sysctlnametomib("dev.cpu.0.freq_levels", levels_mib, &len))
 		err(1, "lookup freq_levels");
+	len = 5;
+	if (sysctlnametomib("hw.acpi.thermal.tz0.temperature", temp_mib, &len))
+		err(1, "lookup temperature");
 
 	/* Check if we can read the idle time and supported freqs. */
 	if (read_usage_times(NULL, NULL))
@@ -370,6 +387,10 @@
 		len = sizeof(curfreq);
 		if (sysctl(freq_mib, 4, &curfreq, &len, NULL, 0))
 			err(1, "error reading current CPU frequency");
+                /* Read current temperature. */
+                len = sizeof(temperature);
+                if(sysctl(temp_mib, 5, &temperature, &len, NULL, 0))
+                        err(1, "error reading current temperature");
 
 		if (vflag) {
 			for (i = 0; i < numfreqs; i++) {
@@ -410,12 +431,31 @@
 					err(1, "error setting CPU freq %d",
 					    freqs[0]);
 			}
+                        /* Check for passive cooling override */
+                        if(temperature > passive_cooling_mark) {
+				if (vflag) {
+					printf("passive cooling override; "
+					    "changing frequency to %d MHz\n",
+					    freqs[numfreqs - 1]);
+				}
+				if (set_freq(freqs[numfreqs - 1]))
+					err(1, "error setting CPU freq %d",
+					    freqs[numfreqs - 1]);
+                        }
 			continue;
 		}
 
 		/* Adaptive mode; get the current CPU usage times. */
 		if (read_usage_times(&idle, &total))
 			err(1, "read_usage_times");
+                /*
+                 * If temperature has risen over passive cooling mark, we
+                 * would want to decrease frequency regardless of the load.
+                 * Simplest way to go about this would be to report 100%
+                 * idle CPU and let adaptive algorithm do its job.
+                 */
+                if(temperature > passive_cooling_mark)
+                  idle = total;
 
 		/*
 		 * If we're idle less than the active mark, jump the CPU to

--Boundary_(ID_O58A4aArqvfZgtRbBK518g)--


