Date:      Thu, 6 Mar 2008 23:43:31 +0700 (KRAT)
From:      Eugene Grosbein <eugen@kuzbass.ru>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/121433: [cpufreq] kern_cpu.c's logic error leads to spontaneous disabling of passive cooling
Message-ID:  <200803061643.m26GhVBU005478@delikates-nk.ru>
Resent-Message-ID: <200803061700.m26H042m004891@freefall.freebsd.org>


>Number:         121433
>Category:       kern
>Synopsis:       [cpufreq] kern_cpu.c's logic error leads to spontaneous disabling of passive cooling
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 06 17:00:03 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Eugene Grosbein
>Release:        FreeBSD 6.3-PRERELEASE i386
>Organization:
Svyaz-Service JSC
>Environment:
System: FreeBSD 6.3-PRERELEASE, Pentium-4 2.0GHz

>Description:
	I have a 1U uniprocessor FreeBSD 6.3-PRERELEASE server with inadequate
	active cooling, which leads to CPU overheating. The server is remote, and
	while better cooling is being arranged, I decided to use the passive cooling
	feature of acpi_thermal(4). Here it uses p4tcc, and it really helps
	to keep the CPU temperature within bounds, but there is an annoying bug:
	very often (many times per hour) acpi_thermal(4)
	disables passive cooling with the message:

failed to set new freq, disabling passive cooling

	So I have to use cron to (re)enable passive cooling once a minute
	to keep it running.

	I've tracked this down to src/sys/kern/kern_cpu.c,
	function cf_get_method():

	1) src/sys/dev/acpica/acpi_thermal.c, function acpi_tz_cooling_thread(),
	calls acpi_tz_cpufreq_update() from the same file;

	2) acpi_tz_cpufreq_update() calls CPUFREQ_GET() that takes us to
	src/sys/kern/kern_cpu.c, cf_get_method();

	3) cf_get_method() has the following code:

        /*
         * Reacquire the lock and search for the given level.
         *
         * XXX Note: this is not quite right since we really need to go
         * through each level and compare both absolute and relative
         * settings for each driver in the system before making a match.
         * The estimation code below catches this case though.
         */
        CF_MTX_LOCK(&sc->lock);
        for (n = 0; n < numdevs && curr_set->freq == CPUFREQ_VAL_UNKNOWN; n++) {
                if (!device_is_attached(devs[n]))
                        continue;
                error = CPUFREQ_DRV_GET(devs[n], &set);
                if (error)
                        continue;
                for (i = 0; i < count; i++) {
                        if (CPUFREQ_CMP(set.freq, levels[i].total_set.freq)) {
                                sc->curr_level = levels[i];
                                break;
                        }
                }
        }

	Note that the error variable is not reset after this loop.
	In my case it still holds ENXIO when the loop ends.
	The code that follows then successfully reports:

CF_DEBUG("get estimated freq %d\n", curr_set->freq);

	(curr_set->freq always happens to be max value of CPU frequency here)

	Then it executes 'return (error);', returning the ENXIO left over
	from the loop shown above.

	4) acpi_tz_cpufreq_update() propagates ENXIO
	to acpi_tz_cooling_thread(), which disables passive cooling.

>How-To-Repeat:

	Use a uniprocessor Pentium-4 system under heavy, constant CPU load
	with acpi_thermal/cpufreq/p4tcc, and tune acpi_thermal so that passive
	cooling gets used. Here is my /etc/sysctl.conf:

debug.cpufreq.lowest=1246
#debug.cpufreq.verbose=1
hw.acpi.thermal.user_override=1
hw.acpi.thermal.tz0.passive_cooling=1
hw.acpi.thermal.tz0._PSV=70C
hw.acpi.thermal.tz0._CRT=75C


>Fix:

	Unknown. Perhaps just reset the error variable after the code cited above?
	As a workaround, I've patched acpi_thermal(4) not to disable
	passive cooling when acpi_tz_cpufreq_update() returns ENXIO;
	that works for me.

Eugene Grosbein
>Release-Note:
>Audit-Trail:
>Unformatted:


