From: perikillo
To: "FreeBSD Mailing List"
Date: Tue, 3 Oct 2006 17:55:02 -0700
Subject: Re: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups
Message-ID: <51d7a5160610031755m643de45dk48b2c4d26d3f511a@mail.gmail.com>
In-Reply-To: <51d7a5160610031351h62e33f42kbbdd7e4001c9342d@mail.gmail.com>

On 10/3/06, perikillo
wrote:
>
> Hi people, I have read some mails about this problem, and it looks like
> everyone reporting it was running some 5.X branch. I have been using
> FreeBSD 6.1 for some months; yesterday I ran the buildworld process, and
> right now my box is at FreeBSD 6.1-p10.
>
> This box runs the bacula server with this NIC:
>
> vr0: port 0xe400-0xe4ff mem 0xee022000-0xee0220ff at device 18.0 on pci0
> vr0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xe400
> miibus0: on vr0
> vr0: bpf attached
> vr0: Ethernet address: 00:01:6c:2c:09:90
> vr0: [MPSAFE]
>
> This NIC is integrated on the motherboard. I used this box with
> FreeBSD 5.4-pX for almost 1 year running bacula 1.38.5 without a problem.
>
> One full backup takes almost 140 GB of data.
>
> Last week I lost one Full Backup job from one of my biggest servers, running
> RH9 with approx. 80 GB of data; bacula backed up just 35 GB and marked the
> job -> Error:
>
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: No Job status returned from FD.
> 26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Error: Bacula 1.38.11(28Jun06): 26-Sep-2006 00:28:48
>
>   FD termination status: Error
>   SD termination status: Error
>   Termination: *** Backup Error ***
>
> I have no problems with the client; it runs our ERP software with no
> complaints.
>
> On my FreeBSD console this appeared:
>
> vr0: watchdog timeout
>
> I reset the server, and all the Differential backups have been working
> well since. I did the buildworld yesterday and left my bacula server ready
> to do a full backup of all my clients, and whoops...
>
> I lost 2 clients jobs:
>
> Client 1:
>
> 02-Oct 18:30 bacula-dir: Start Backup JobId 176, Job=PDC.2006-10-02_18.30.00
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: No Job status returned from FD.
> 02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 20:40:11
>   JobId: 176
>   Job: PDC.2006-10-02_18.30.00
>   Backup Level: Full
>   Client: "PDC" Windows NT 4.0,MVS,NT 4.0.1381
>   FileSet: "PDC-FS" 2006-08-21 18:04:12
>   Pool: "FullTape"
>   Storage: "LTO-1"
>   Scheduled time: 02-Oct-2006 18:30:00
>   Start time: 02-Oct-2006 18:30:06
>   End time: 02-Oct-2006 20:40:11
>   Elapsed time: 2 hours 10 mins 5 secs
>   Priority: 11
>   FD Files Written: 0
>   SD Files Written: 0
>   FD Bytes Written: 0 (0 B)
>   SD Bytes Written: 0 (0 B)
>   Rate: 0.0 KB/s
>   Software Compression: None
>   Volume name(s): FullTape-0004
>   Volume Session Id: 2
>   Volume Session Time: 1159832414
>   Last Volume Bytes: 38,857,830,949 (38.85 GB)
>   Non-fatal FD errors: 0
>   SD Errors: 0
>   FD termination status: Error
>   SD termination status: Error
>   Termination: *** Backup Error ***
>
> Client 2
>
> 02-Oct 21:30 bacula-dir: Start Backup JobId 178, Job=MBXBDCB.2006-10-02_21.30.00
> 02-Oct 21:31 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:37 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:44 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:51 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 21:58 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 22:04 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down
> Retrying ...
> 02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Fatal error: bnet.c:859 Unable to connect to File daemon on 192.168.2.9:9102 . ERR=Host is down
> 02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 22:10:03
>   JobId: 178
>   Job: MBXBDCB.2006-10-02_21.30.00
>   Backup Level: Full
>   Client: "MBXBDCB" i686-pc-linux-gnu,redhat,9
>   FileSet: "MBXBDCB-FS" 2006-08-21 23:00:02
>   Pool: "FullTape"
>   Storage: "LTO-1"
>   Scheduled time: 02-Oct-2006 21:30:00
>   Start time: 02-Oct-2006 21:30:02
>   End time: 02-Oct-2006 22:10:03
>   Elapsed time: 40 mins 1 sec
>   Priority: 13
>   FD Files Written: 0
>   SD Files Written: 0
>   FD Bytes Written: 0 (0 B)
>   SD Bytes Written: 0 (0 B)
>   Rate: 0.0 KB/s
>   Software Compression: None
>   Volume name(s):
>   Volume Session Id: 4
>   Volume Session Time: 1159832414
>   Last Volume Bytes: 38,857,830,949 (38.85 GB)
>   Non-fatal FD errors: 0
>   SD Errors: 0
>   FD termination status:
>   SD termination status: Waiting on FD
>   Termination: *** Backup Error ***
>
> My console again:
>
> vr0: watchdog timeout
>
> But my catalog backup was made with success.
>
> 03-Oct 03:00 bacula-dir: Start Backup JobId 179, Job=BackupCatalog.2006-10-03_03.00.00
> 03-Oct 03:03 bacula-dir: Bacula 1.38.11 (28Jun06): 03-Oct-2006 03:03:00
>   JobId: 179
>   Job: BackupCatalog.2006-10-03_03.00.00
>   Backup Level: Full
>   Client: "BACULA" i386-portbld-freebsd6.1,freebsd,6.1-RELEASE-p3
>   FileSet: "CATALOG-FS" 2006-08-22 05:00:02
>   Pool: "FullTape"
>   Storage: "LTO-1"
>   Scheduled time: 03-Oct-2006 03:00:00
>   Start time: 03-Oct-2006 03:00:50
>   End time: 03-Oct-2006 03:03:00
>   Elapsed time: 2 mins 10 secs
>   Priority: 14
>   FD Files Written: 7,646
>   SD Files Written: 7,646
>   FD Bytes Written: 360,432,688 (360.4 MB)
>   SD Bytes Written: 361,320,457 (361.3 MB)
>   Rate: 2772.6 KB/s
>   Software Compression: None
>   Volume name(s): FullTape-0004
>   Volume Session Id: 5
>   Volume Session Time: 1159832414
>   Last Volume Bytes: 39,219,629,264 (39.21 GB)
>   Non-fatal FD errors: 0
>   SD Errors: 0
>   FD termination status: OK
>   SD termination status: OK
>   Termination: Backup OK
>
> I wasn't at that office. I noticed this in the morning because when I
> tried to access that server from the other building with PuTTY, I couldn't
> connect at first, and then my mind said "it's happened again :-("... I
> called my friend there to unplug and replug the cable, and just with that I
> was able to connect to the server.
>
> It looks like this NIC is having problems under high workload. I have
> 2 things here that I can do:
>
> 1. Change the cable and try again.
> 2. Change the NIC and try again.
>
> What else can I do?
>
> I really hope someone can fix this problem. Thanks to all for your time.
>
> Part of my dmesg output:
>
> Copyright (c) 1992-2006 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD 6.1-RELEASE-p10 #5: Mon Oct 2 13:26:52 PDT 2006
> root@bacula.MBX.local:/usr/obj/usr/src/sys/BACULA
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a14000.
> Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a14188.
> Table 'FACP' at 0x1bff3040
> Table 'APIC' at 0x1bff7dc0
> MADT: Found table at 0x1bff7dc0
> MP Configuration Table version 1.1 found at 0xc00f1400
> APIC: Using the MADT enumerator.
> MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
> ACPI APIC Table:
> Calibrating clock(s) ... i8254 clock: 1193181 Hz
> CLK_USE_I8254_CALIBRATION not specified - using default frequency
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Calibrating TSC clock ... TSC clock: 1600072446 Hz
> CPU: AMD Duron(tm) processor (1600.07-MHz 686-class CPU)
> Origin = "AuthenticAMD" Id = 0x681 Stepping = 1
>
> Features=0x383fbff
> AMD Features=0xc0400800
> Data TLB: 32 entries, fully associative
> Instruction TLB: 16 entries, fully associative
> L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
> L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
> L2 internal cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
> real memory = 469696512 (447 MB)
> Physical memory chunk(s):
> 0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages)
> 0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages)
> 0x0000000000c25000 - 0x000000001b7d7fff, 448475136 bytes (109491 pages)
> avail memory = 450490368 (429 MB)
> bios32: Found BIOS32 Service Directory header at 0xc00fac70
> bios32: Entry = 0xfb0f0 (c00fb0f0) Rev = 0 Len = 1
> pcibios: PCI BIOS entry at 0xf0000+0xb160
> pnpbios: Found PnP BIOS data at 0xc00fbc20
> pnpbios: Entry = f0000:bc50 Rev = 1.0
> Other BIOS signatures found:
> APIC: CPU 0 has ACPI ID 0
> MADT: Found IO APIC ID 2, Interrupt 0 at 0xfec00000
> ioapic0: Routing external 8259A's -> intpin 0
>
> Greetings.
>
>

Wow....
Today I changed the NIC to a 3Com one:

xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xd400-0xd47f mem 0xee021000-0xee02107f irq 17 at device 9.0 on pci0
xl0: Reserved 0x80 bytes for rid 0x14 type 3 at 0xee021000
xl0: using memory mapped I/O
xl0: media options word: a
xl0: found MII/AUTO
miibus0: on xl0
xl0: bpf attached
xl0: Ethernet address: 00:01:02:6d:e8:a4
xl0: [MPSAFE]

Just now my first backup crashed again:

xl0: watchdog timeout

I have now moved the cable from one port to another and will see what happens.

Guys, if anyone has something to tell me, please do; this is critical for me. This is my second NIC.
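
In case it helps to see how often each interface hits this, here is a rough sketch (not something I have run against the real logs) that counts "watchdog timeout" messages per interface from saved console or syslog lines. The function name, the regular expression, and the sample lines are my own assumptions; only the "vr0: watchdog timeout" and "xl0: watchdog timeout" messages come from this thread.

```python
import re
from collections import Counter

# Assumed pattern: an interface name (letters followed by digits), then the
# kernel message text seen in this thread, e.g. "vr0: watchdog timeout".
WATCHDOG_RE = re.compile(r"\b([a-z]+[0-9]+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to its number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample input; the syslog-style prefix on the last line is an
# assumption about how such a message might look in a log file.
sample = [
    "vr0: watchdog timeout",
    "vr0: bpf attached",
    "vr0: watchdog timeout",
    "Oct  3 17:40:01 bacula kernel: xl0: watchdog timeout",
]

if __name__ == "__main__":
    for iface, n in sorted(count_watchdog_timeouts(sample).items()):
        print(f"{iface}: {n} watchdog timeout(s)")
```

If both vr0 and xl0 show up in the counts, as they do on my console, that would suggest the problem is not tied to one particular NIC or driver.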