Date:      22 Apr 2008 19:38:32 +0200
From:      "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To:        Mike Tancsa <mike@sentex.net>
Cc:        stable@freebsd.org
Subject:   Re: nfs-server silent data corruption
Message-ID:  <wpzlrlu6w7.fsf@heho.snv.jussieu.fr>
In-Reply-To: <200804221501.m3MF1guW092221@lava.sentex.ca>
References:  <wpmyno2kqe.fsf@heho.snv.jussieu.fr> <20080421094718.GY25623@hub.freebsd.org> <wp63ubp8e0.fsf@heho.snv.jussieu.fr> <200804211537.m3LFbaZA086977@lava.sentex.ca> <wpy77650s0.fsf@heho.snv.jussieu.fr> <200804221501.m3MF1guW092221@lava.sentex.ca>


Hello,

Mike Tancsa <mike@sentex.net> writes:

> At 05:57 PM 4/21/2008, Arno J. Klaassen wrote:
> > > Hi,
> > > How long does it take for the problem to show up ?
> >
> >
> >Less than an hour in general (running the same client script
> >simultaneously on a 100Mbps linux box and 1Gbps bsd6-x86)
> 
> I am running my nic at gig speeds only...   I recompiled the kernel
> this morning to include cpufreq as well as made sure the cool&quiet
> was enabled in the BIOS.
> 
> 
> 
> >for info, I test with args '38 999' (38M, try 999 times) on linux
> >(slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
> >got was 'still 871 iterations to go'
> 
> 
> So far I have done 150 loops with an 80MB file and no issues and 200
> loops with a 160MB file.  My nfe nic does not support MSI and has its
> own interrupt
> 
> # vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           5          0
> irq4: sio0                          3049          1
> irq16: twe0                       327046        164
> irq19: bge0                       385147        194
> irq21: atapci1                    976355        492
> irq23: nfe0                     11876726       5986
> cpu0: timer                      3966420       1999
> cpu1: timer                      3964392       1998
 

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           4          0
irq14: ata0                           69          0
irq20: nfe0                     11650955       5283
irq24: atapci1                        94          0
irq28: atapci2                       178          0
irq29: ahd0                       355704        161
cpu0: timer                      4409020       1999
cpu1: timer                      4391646       1991
cpu2: timer                      4391643       1991
cpu3: timer                      4391641       1991
 
> I have powerd started up with
> powerd_enable="YES"
> powerd_flags="-a adaptive -b adaptive -n adaptive"


slightly different, I mostly use "-b adaptive -i 90 -n adaptive -r 80"
but the problem shows up without flags as well.
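
For anyone who wants to reproduce this without the actual client script, the
kind of loop meant by args like '138 999' (file size in MB, number of tries)
could look roughly like the sketch below. This is NOT the script used in this
thread; run_test(), sha256_of() and the file names are made up for
illustration, and it assumes target_dir is a directory on the NFS mount.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def run_test(size_mb, iterations, target_dir):
    """Copy a random file into target_dir repeatedly and count mismatches.

    Hypothetical helper: target_dir is assumed to be the NFS-mounted path.
    """
    failures = 0
    with tempfile.TemporaryDirectory() as local:
        src = os.path.join(local, "src.bin")
        # Build the reference file from random data, 1 MB at a time.
        with open(src, "wb") as f:
            for _ in range(size_mb):
                f.write(os.urandom(1 << 20))
        want = sha256_of(src)
        dst = os.path.join(target_dir, "copy.bin")
        for i in range(iterations):
            shutil.copyfile(src, dst)
            if sha256_of(dst) != want:
                failures += 1
                left = iterations - i - 1
                print(f"checksum mismatch, still {left} iterations to go")
            os.remove(dst)
    return failures
```

Note that reading the copy back on the same client may be served from the
client's cache, so a real test probably wants to verify the file on the
server side (or from a second client) as well.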

 
> With the "sleep" in my test script, powerd does seem to be fiddling
> with frequencies as well during the inactivity.

I most often provoke slight swapping to "randomize" frequency changes,
and run burnK7 or similar to push the frequency up and down by hand.
 
> # sysctl dev. | grep -i fre
> dev.cpu.0.freq: 1800
> dev.cpu.0.freq_levels: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.powernow.0.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.powernow.1.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%parent: cpu0
> dev.cpufreq.1.%driver: cpufreq
> dev.cpufreq.1.%parent: cpu1

funny, when I do that:

# sysctl dev. | grep -i fre
dev.cpu.0.freq: 995
dev.cpu.0.freq_levels: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%parent: cpu3

especially the dev.powernow.3.freq_settings values look weird ...
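
(the power values of unit 3 match the other units; only the frequencies are
off, each by a factor of roughly 2.6.) A quick way to spot such a mismatch
in sysctl output is sketched below. This is only an illustration: the
function names parse_freq_settings() and odd_units() are made up here, and
SAMPLE is simply the output pasted above.

```python
import re
from collections import Counter

SAMPLE = """\
dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100
"""


def parse_freq_settings(sysctl_output):
    """Map each powernow unit number to a tuple of (MHz, mW) pairs."""
    units = {}
    pattern = re.compile(r"dev\.powernow\.(\d+)\.freq_settings:\s*(.*)")
    for line in sysctl_output.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue  # skip unrelated sysctl lines
        pairs = tuple(
            tuple(int(v) for v in item.split("/")) for item in m.group(2).split()
        )
        units[int(m.group(1))] = pairs
    return units


def odd_units(units):
    """Return the unit numbers whose settings differ from the most common set."""
    common, _ = Counter(units.values()).most_common(1)[0]
    return sorted(u for u, pairs in units.items() if pairs != common)


units = parse_freq_settings(SAMPLE)
print("units with deviating freq_settings:", odd_units(units))
# Ratio of each unit-3 frequency to the corresponding unit-0 frequency.
print([round(a[0] / b[0], 2) for a, b in zip(units[3], units[0])])
```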

that said, I once more dug up the old acpi_ppc.c and slightly
adapted it for fbsd7 (basically some name changes and using
read_cpu_time() instead of cp_time) and the problem disappears ...

the algo of acpi_ppc makes it somewhat harder to push up frequencies,
though I doubt that matters.

I tried as well with hint.acpi_throttle.0.disabled="1" in loader.conf
with no luck (using powerd).

I'm out of office tomorrow but will try to find time tomorrow evening
to test with another NIC.

Best, Arno


