From: "Arno J. Klaassen"
To: Mike Tancsa
Date: 22 Apr 2008 19:38:32 +0200
Cc: stable@freebsd.org
Subject: Re: nfs-server silent data corruption

Hello,

Mike Tancsa writes:

> At 05:57 PM 4/21/2008, Arno J. Klaassen wrote:
>
> >Hi,
> >
> > > How long does it take for the problem to show up ?
> >
> >Less than an hour in general (running the same client script
> >simultaneously on a 100Mbps linux box and a 1Gbps bsd6-x86)
>
> I am running my nic at gig speeds only... I recompiled the kernel
> this morning to include cpufreq as well as made sure the cool&quiet
> was enabled in the BIOS.
>
> >for info, I test with args '38 999' (38M, try 999 times) on linux
> >(slightly adapted script BTW) and '138 999' on bsd. The best 'score' I
> >got was 'still 871 iterations to go'
>
> So far I have done 150 loops with an 80MB file and no issues and 200
> loops with a 160MB file. My nfe nic does not support MSI and has its
> own interrupt
>
> # vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                           5          0
> irq4: sio0                          3049          1
> irq16: twe0                       327046        164
> irq19: bge0                       385147        194
> irq21: atapci1                    976355        492
> irq23: nfe0                     11876726       5986
> cpu0: timer                      3966420       1999
> cpu1: timer                      3964392       1998

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           4          0
irq14: ata0                           69          0
irq20: nfe0                     11650955       5283
irq24: atapci1                        94          0
irq28: atapci2                       178          0
irq29: ahd0                       355704        161
cpu0: timer                      4409020       1999
cpu1: timer                      4391646       1991
cpu2: timer                      4391643       1991
cpu3: timer                      4391641       1991

> I have powerd started up with
> powerd_enable="YES"
> powerd_flags="-a adaptive -b adaptive -n adaptive"

Slightly different: I mostly use "-b adaptive -i 90 -n adaptive -r 80",
but the problem shows up without flags as well.
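The client test script itself is not part of this thread. Purely as an illustration of the kind of round-trip test being discussed ('38 999' meaning a 38 MB file, tried 999 times), a minimal sh sketch might look like the one below. The function name nfs_roundtrip_test, its third argument (a directory on the NFS mount), the optional pause argument and the exact messages are hypothetical, not the script actually used. One caveat: reading a file back right after writing it from the same client may be served from the client's cache rather than from the server, so a real test may need to re-read from another client or after a remount.

```sh
#!/bin/sh
# Hypothetical sketch of an NFS write/read-back corruption test.
# Usage: nfs_roundtrip_test SIZE_MB ITERATIONS TARGET_DIR [PAUSE_SECONDS]
# Returns 0 if all iterations match, 1 on a mismatch, 2 on setup/copy errors.
nfs_roundtrip_test() {
    size_mb=$1
    iters=$2
    dir=$3
    pause=${4:-0}

    # Reference data is generated once, on local disk.
    ref=$(mktemp /tmp/nfsref.XXXXXX) || return 2
    dd if=/dev/urandom of="$ref" bs=1048576 count="$size_mb" 2>/dev/null

    remaining=$iters
    while [ "$remaining" -gt 0 ]; do
        # Write the reference file to the target directory (assumed NFS).
        if ! cp "$ref" "$dir/nfstest.dat"; then
            rm -f "$ref"
            return 2
        fi
        # Read it back and compare byte-for-byte with the local copy.
        if ! cmp -s "$ref" "$dir/nfstest.dat"; then
            echo "corruption detected: still $remaining iterations to go"
            rm -f "$ref"
            return 1
        fi
        remaining=$((remaining - 1))
        # Optional idle gap, e.g. to let powerd change frequencies.
        sleep "$pause"
    done

    rm -f "$ref" "$dir/nfstest.dat"
    echo "ok: $iters iterations"
    return 0
}

# Only run when invoked with arguments, e.g.: ./nfstest.sh 138 999 /mnt/nfs 5
if [ $# -ge 3 ]; then
    nfs_roundtrip_test "$@"
    exit $?
fi
```

The sleep between iterations mirrors the idle periods mentioned in the thread; its length is a guess and can be set to 0 for a tight loop.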
> With the "sleep" in my test script, powerd does seem to be fiddling
> with frequencies as well during the inactivity.

I most often provoke slight swapping to "randomize" the frequency changes,
and run burnK7 or something similar to push the frequency up and down by
hand.

> # sysctl dev. | grep -i fre
> dev.cpu.0.freq: 1800
> dev.cpu.0.freq_levels: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.powernow.0.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.powernow.1.freq_settings: 2200/110000 2000/105600 1800/89100 1000/49000
> dev.cpufreq.0.%driver: cpufreq
> dev.cpufreq.0.%parent: cpu0
> dev.cpufreq.1.%driver: cpufreq
> dev.cpufreq.1.%parent: cpu1

Funny, when I do that:

# sysctl dev. | grep -i fre
dev.cpu.0.freq: 995
dev.cpu.0.freq_levels: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.0.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.1.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.2.freq_settings: 2587/95000 2388/90300 2189/76200 1990/63800 1791/53200 995/36100
dev.powernow.3.freq_settings: 6747/95000 6228/90300 5709/76200 5190/63800 4671/53200 2595/36100
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.1.%driver: cpufreq
dev.cpufreq.1.%parent: cpu1
dev.cpufreq.2.%driver: cpufreq
dev.cpufreq.2.%parent: cpu2
dev.cpufreq.3.%driver: cpufreq
dev.cpufreq.3.%parent: cpu3

Especially the dev.powernow.3.freq_settings values look weird...

That said, I once more dug up the old acpi_ppc.c and slightly adapted it
for FreeBSD 7 (basically some name changes, and using read_cpu_time()
instead of cp_time), and the problem disappears. The algorithm in acpi_ppc
makes it somewhat harder to push frequencies up, though I doubt that
matters.

I also tried hint.acpi_throttle.0.disabled="1" in loader.conf, with no
luck (using powerd).

I'm out of the office tomorrow, but will try to find time tomorrow evening
to test with another NIC.

Best,
Arno
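In the output above, the power column of dev.powernow.3.freq_settings matches the other CPUs, but every frequency is about 2.6 times larger (6747 vs 2587, 2595 vs 995). A hypothetical sh helper like the one below could flag such an entry on other machines by comparing each powernow freq_settings value, as plain text, against the first one seen. The function name check_freq_settings is made up for this example, and it does not interpret the frequency/power pairs.

```sh
#!/bin/sh
# Hypothetical helper: reads sysctl output on stdin and prints the name of
# every dev.powernow.N.freq_settings entry whose value differs from the
# first such entry seen. Other lines are ignored.
check_freq_settings() {
    awk '
    /^dev\.powernow\.[0-9]+\.freq_settings:/ {
        name = $1
        sub(/:$/, "", name)            # strip the trailing colon
        val = $0
        sub(/^[^:]*: */, "", val)      # keep only the value part
        if (ref == "") { ref = val; next }
        if (val != ref) print name
    }'
}
```

Usage would be along the lines of `sysctl dev. | check_freq_settings`; on the machine above it would be expected to print only dev.powernow.3.freq_settings.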