From: perikillo <perikillo@gmail.com>
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Date: Tue, 3 Oct 2006 13:51:42 -0700
Subject: vr0: watchdog timeout FreeBSD 6.1-p10 Crashing my backups

Hi people,

I have read some mails about this problem, but it looks like all of those reporters were running some 5.X branch. I have been running FreeBSD 6.1 for some months; yesterday I ran the buildworld process,
and now the box is running FreeBSD 6.1-p10. This box runs the Bacula server with this NIC:

vr0: port 0xe400-0xe4ff mem 0xee022000-0xee0220ff at device 18.0 on pci0
vr0: Reserved 0x100 bytes for rid 0x10 type 4 at 0xe400
miibus0: on vr0
vr0: bpf attached
vr0: Ethernet address: 00:01:6c:2c:09:90
vr0: [MPSAFE]

This NIC is integrated into the motherboard. I used this box with FreeBSD 5.4-pX for almost 1 year, running Bacula 1.38.5 without a problem. One full backup is almost 140 GB of data.

Last week I lost a Full backup job from one of my biggest servers, which runs RH9 and has approximately 80 GB of data. Bacula backed up only 35 GB and marked the job as Error:

26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Fatal error: No Job status returned from FD.
26-Sep 00:28 bacula-dir: MBXBDCB.2006-09-25_21.30.00 Error: Bacula 1.38.11(28Jun06): 26-Sep-2006 00:28:48
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

The client itself has no problem; it runs our ERP software and there is nothing to report there. This appeared on the FreeBSD console:

vr0: watchdog timeout

I reset the server, and since then all the Differential backups have worked fine. Yesterday I ran buildworld and left my Bacula server ready to do a Full backup of all my clients and, whoops... I lost the jobs for 2 clients:

Client 1:

02-Oct 18:30 bacula-dir: Start Backup JobId 176, Job=PDC.2006-10-02_18.30.00
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: Network error with FD during Backup: ERR=Operation timed out
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Fatal error: No Job status returned from FD.
02-Oct 20:40 bacula-dir: PDC.2006-10-02_18.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 20:40:11
  JobId:                  176
  Job:                    PDC.2006-10-02_18.30.00
  Backup Level:           Full
  Client:                 "PDC" Windows NT 4.0,MVS,NT 4.0.1381
  FileSet:                "PDC-FS" 2006-08-21 18:04:12
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         02-Oct-2006 18:30:00
  Start time:             02-Oct-2006 18:30:06
  End time:               02-Oct-2006 20:40:11
  Elapsed time:           2 hours 10 mins 5 secs
  Priority:               11
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):         FullTape-0004
  Volume Session Id:      2
  Volume Session Time:    1159832414
  Last Volume Bytes:      38,857,830,949 (38.85 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

Client 2:

02-Oct 21:30 bacula-dir: Start Backup JobId 178, Job=MBXBDCB.2006-10-02_21.30.00
02-Oct 21:31 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 21:37 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 21:44 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 21:51 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 21:58 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 22:04 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Warning: bnet.c:853 Could not connect to File daemon on 192.168.2.9:9102. ERR=Host is down Retrying ...
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Fatal error: bnet.c:859 Unable to connect to File daemon on 192.168.2.9:9102. ERR=Host is down
02-Oct 22:10 bacula-dir: MBXBDCB.2006-10-02_21.30.00 Error: Bacula 1.38.11(28Jun06): 02-Oct-2006 22:10:03
  JobId:                  178
  Job:                    MBXBDCB.2006-10-02_21.30.00
  Backup Level:           Full
  Client:                 "MBXBDCB" i686-pc-linux-gnu,redhat,9
  FileSet:                "MBXBDCB-FS" 2006-08-21 23:00:02
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         02-Oct-2006 21:30:00
  Start time:             02-Oct-2006 21:30:02
  End time:               02-Oct-2006 22:10:03
  Elapsed time:           40 mins 1 sec
  Priority:               13
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):
  Volume Session Id:      4
  Volume Session Time:    1159832414
  Last Volume Bytes:      38,857,830,949 (38.85 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:
  SD termination status:  Waiting on FD
  Termination:            *** Backup Error ***

My console again:

vr0: watchdog timeout

But my catalog backup completed successfully:

03-Oct 03:00 bacula-dir: Start Backup JobId 179, Job=BackupCatalog.2006-10-03_03.00.00
03-Oct 03:03 bacula-dir: Bacula 1.38.11 (28Jun06): 03-Oct-2006 03:03:00
  JobId:                  179
  Job:                    BackupCatalog.2006-10-03_03.00.00
  Backup Level:           Full
  Client:                 "BACULA" i386-portbld-freebsd6.1,freebsd, 6.1-RELEASE-p3
  FileSet:                "CATALOG-FS" 2006-08-22 05:00:02
  Pool:                   "FullTape"
  Storage:                "LTO-1"
  Scheduled time:         03-Oct-2006 03:00:00
  Start time:             03-Oct-2006 03:00:50
  End time:               03-Oct-2006 03:03:00
  Elapsed time:           2 mins 10 secs
  Priority:               14
  FD Files Written:       7,646
  SD Files Written:       7,646
  FD Bytes Written:       360,432,688 (360.4 MB)
  SD Bytes Written:       361,320,457 (361.3 MB)
  Rate:                   2772.6 KB/s
  Software Compression:   None
  Volume name(s):         FullTape-0004
  Volume Session Id:      5
  Volume Session Time:    1159832414
  Last Volume Bytes:      39,219,629,264 (39.21 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

I wasn't at that office. I noticed this in the morning, when I tried to reach that server with PuTTY from the other building and couldn't connect at
first. Then I thought, "it's happened again :-(". I called my friend there to unplug and replug the cable, and just with that I was able to connect to the server. It looks like this NIC has problems under high workload.

I see 2 things I can do here:

1. Change the cable and try again.
2. Change the NIC and try again.

What else can I do? I really hope someone can fix this problem. Thanks all for your time.

Part of my dmesg output:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE-p10 #5: Mon Oct 2 13:26:52 PDT 2006
    root@bacula.MBX.local:/usr/obj/usr/src/sys/BACULA
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a14000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a14188.
Table 'FACP' at 0x1bff3040
Table 'APIC' at 0x1bff7dc0
MADT: Found table at 0x1bff7dc0
MP Configuration Table version 1.1 found at 0xc00f1400
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
ACPI APIC Table:
Calibrating clock(s) ... i8254 clock: 1193181 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254" frequency 1193182 Hz quality 0
Calibrating TSC clock ...
TSC clock: 1600072446 Hz
CPU: AMD Duron(tm) processor (1600.07-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x681  Stepping = 1
  Features=0x383fbff
  AMD Features=0xc0400800
Data TLB: 32 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 internal cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory  = 469696512 (447 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x000000000009efff, 647168 bytes (158 pages)
0x0000000000100000 - 0x00000000003fffff, 3145728 bytes (768 pages)
0x0000000000c25000 - 0x000000001b7d7fff, 448475136 bytes (109491 pages)
avail memory = 450490368 (429 MB)
bios32: Found BIOS32 Service Directory header at 0xc00fac70
bios32: Entry = 0xfb0f0 (c00fb0f0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf0000+0xb160
pnpbios: Found PnP BIOS data at 0xc00fbc20
pnpbios: Entry = f0000:bc50  Rev = 1.0
Other BIOS signatures found:
APIC: CPU 0 has ACPI ID 0
MADT: Found IO APIC ID 2, Interrupt 0 at 0xfec00000
ioapic0: Routing external 8259A's -> intpin 0

Greetings.