Date:      Mon, 22 Nov 2004 19:33:51 +0100
From:      Georg Altmann <galtmann@las-cad.com>
To:        freebsd-questions@freebsd.org
Subject:   semaphore problem with Bakbone's Netvault on FreeBSD 4.10
Message-ID:  <C79517C1DAEEB0C790F8D4E7@[192.168.98.23]>

Hi all,

I have a problem running the backup software Netvault from Bakbone 
(http://www.bakbone.com) under FreeBSD 4.10.
We are using Netvault to make backups of two servers and several 
workstations in our network. Backups are first staged to disk and later 
transferred to an ADIC FastStor 2 (LTO 1) library.

The problem occurs when a backup is transferred from disk to tape and the
backup job spans multiple tapes (not virtual media!): Netvault recognizes
the end of media, loads a new tape for the job and then hangs indefinitely
while trying to write to it.

Bakbone claims that this is a problem with SysV shared memory and
semaphores in FreeBSD (and therefore not with their software). Their support
also sent me a patch for Netvault (unfortunately as source, not as a binary,
so I cannot test it) which is supposedly known to work around the problem. I
have attached it below. Note the added "#if defined(PLATFORM_FREEBSD)" bits
in the code. As I understand it, the patched code polls the semaphore
(semop() with IPC_NOWAIT in a loop) instead of blocking on it.
Please also note that I configured Netvault to use network sockets instead
of shared memory for the transfer, and the problem persists. (For anyone
familiar with Netvault: I did this by selecting only "TCP Data Transfer" in
the configure device tab for both the tape and the virtual library, and, as
proposed by Bakbone support, by adding a "[Data Channels]" section containing
"Force Sockets=TRUE" to configure.cfg.) So I assume the problem is related
only to semaphores and not to shared memory at all(?).
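
For readers less familiar with SysV semaphores: the difference between the
two branches of the patch comes down to the sem_flg field of struct sembuf.
With sem_flg = 0, semop() sleeps until the semaphore can be decremented;
with IPC_NOWAIT it fails at once with errno set to EAGAIN. A minimal sketch
of both modes follows. The helper names sem_create_private and sem_take are
my own (hypothetical, not Netvault's), and union semctl_arg is declared
locally because glibc does not provide union semun.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Local stand-in for "union semun" (hypothetical name), which the caller
 * must define on glibc; same layout as the traditional union. */
union semctl_arg {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Create a private set with one semaphore initialised to `value`.
 * Returns the set id, or -1 on error. */
int sem_create_private(int value)
{
    union semctl_arg arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id == -1)
        return -1;
    arg.val = value;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);       /* do not leak the set */
        return -1;
    }
    return id;
}

/* Decrement semaphore 0 of set `id`.
 * blocking != 0: sem_flg = 0, semop() sleeps until the value is > 0
 *                (like the non-FreeBSD branch of the patch).
 * blocking == 0: sem_flg = IPC_NOWAIT, semop() fails immediately with
 *                EAGAIN if the value is 0 (like each pass of the
 *                FreeBSD loop).
 * Returns 0 on success, otherwise the errno value set by semop(). */
int sem_take(int id, int blocking)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op  = -1;
    op.sem_flg = blocking ? 0 : IPC_NOWAIT;
    return semop(id, &op, 1) == 0 ? 0 : errno;
}
```

On a semaphore whose value is 0, sem_take(id, 0) returns EAGAIN right away,
whereas sem_take(id, 1) would sleep until another process increments it.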

I hope someone can tell from the code whether this really leads to a
deadlock caused by a bug in FreeBSD's SysV semaphore handling.
I have already skimmed the FreeBSD PRs for semaphore bugs, but none of them
seemed related to this specific problem (see 
http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&severity=&priority=&class=&state=&sort=none&text=semaphore&responsible=&multitext=&originator=&closedtoo=on&release=). 
So maybe somebody can clarify whether I should urge Bakbone to fix their
buggy software or look for a patch to the FreeBSD kernel and/or libraries.
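
One way to check the kernel side independently of Netvault would be a small
program in which one process blocks in semop() on a semaphore with value 0
while a second process increments it. If FreeBSD failed to wake semaphore
waiters, the blocked process would never return. The sketch below is only
that simple one-waiter case, not Netvault's actual access pattern; the
function name, the delays and the alarm() watchdog are my own assumptions.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Local stand-in for "union semun" (hypothetical name). */
union semctl_arg {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Empty handler: it exists only so SIGALRM interrupts a stuck semop(). */
static void on_alarm(int sig) { (void)sig; }

/* Parent blocks in semop() (sem_flg = 0) on a semaphore with value 0;
 * a forked child sleeps delay_ms and then increments it. An alarm of
 * timeout_s seconds interrupts the wait if the wakeup never arrives.
 * Returns 0 if the parent was woken, EINTR if the watchdog fired,
 * or another errno value if setup failed. */
int sysv_wakeup_test(long delay_ms, unsigned timeout_s)
{
    union semctl_arg arg;
    struct sembuf op;
    struct sigaction sa, old;
    int id, rc;
    pid_t pid;

    id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return errno;
    arg.val = 0;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        rc = errno;
        semctl(id, 0, IPC_RMID);
        return rc;
    }

    pid = fork();
    if (pid == -1) {
        rc = errno;
        semctl(id, 0, IPC_RMID);
        return rc;
    }
    if (pid == 0) {                         /* child: post after a delay */
        struct timespec ts;
        ts.tv_sec  = delay_ms / 1000;
        ts.tv_nsec = (delay_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
        op.sem_num = 0;
        op.sem_op  = 1;
        op.sem_flg = 0;
        _exit(semop(id, &op, 1) == 0 ? 0 : 1);
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;               /* no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);
    alarm(timeout_s);

    op.sem_num = 0;
    op.sem_op  = -1;
    op.sem_flg = 0;                         /* blocking wait */
    rc = semop(id, &op, 1) == 0 ? 0 : errno;

    alarm(0);                               /* cancel the watchdog */
    sigaction(SIGALRM, &old, NULL);
    waitpid(pid, NULL, 0);                  /* reap the child */
    semctl(id, 0, IPC_RMID);
    return rc;
}
```

Called from main() as sysv_wakeup_test(100, 5), this should return 0 on a
working system; EINTR would point towards the kernel rather than Netvault.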

Some info on the server:
$ uname -a
FreeBSD asterix.las-cad.local 4.10-RELEASE-p3 FreeBSD 4.10-RELEASE-p3 #0: Sat Sep 25 17:05:29 CEST 2004 root@asterix.las-cad.local:/usr/obj/usr/src/sys/ASTERIX  i386
Netvault Version 7.11 Build 10 Release R2004AUG19-CHIEF

Hardware:
AMD Athlon @ 1100 MHz, 768 MB RAM

tape library:
ahc0: <Adaptec 29160 Ultra160 SCSI adapter> port 0xbc00-0xbcff mem 0xdfffb000-0xdfffbfff irq 5 at device 12.0 on pci0
sa0 at ahc0 bus 0 target 5 lun 0
sa0: <HP Ultrium 1-SCSI E33A> Removable Sequential Access SCSI-3 device
sa0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit)
pass0 at ahc0 bus 0 target 0 lun 0
pass0: <ADIC FastStor 2 A12r> Removable Changer SCSI-2 device
pass0: 3.300MB/s transfers

staging disk:
ad4: 114473MB <ST3120022A> [232581/16/63] at ata2-master UDMA100
connected to
atapci1: <HighPoint HPT370 ATA100 controller> port 0xc800-0xc8ff,0xcc00-0xcc03,0xd000-0xd007,0xd400-0xd403,0xd800-0xd807 irq 10 at device 10.0 on pci0
ata2: at 0xd800 on atapci1

Thank you!

Best regards,
Georg Altmann

--

>-------------------<>-----------------------------<
> Georg Altmann     <> Phone +49 (0)89 17809328    <
> LAS-CAD GmbH      <> Fax   +49 (0)89 172594      <
> Brunhildenstr. 9  <> e-mail galtmann@las-cad.com <
> D-80639 Munich    <> backup george@george-net.de <
> Germany           <> http://www.las-cad.com      <
>-------------------<>-----------------------------<

Patched code from Bakbone support:

/*
** Wait on a semaphore, i.e., wait for a resource to become available.
*/
INTERNAL DataResultE DataSemaphoreWait
(
 DataChannelShmemQualifierO  oShmemInfo,
 int                         iSemNum,
 BooleanT                   *pbBlocked,
 long                       *plElapse
)
{
	DataResultE                tResult = DataFailure;
	struct sembuf              sOperation;

	sOperation.sem_num = iSemNum;
	sOperation.sem_op  = -1; /* Decrement semaphore by 1 when it's ready */
	sOperation.sem_flg = IPC_NOWAIT;
	if (0 == semop(oShmemInfo->lSemId, &sOperation, 1))
	{
		TRACE((108, 0, LIBVERBOSE, "Decrement semaphore %d", iSemNum));
		if(0 != (*oShmemInfo->pllEomFlag))
		{
			TRACE((173,0,NORMAL,"Got a EOM flag"));
			tResult = DataEom;
		}
		else
		{
			tResult = DataSuccess;
		}
	}
	else
	{
		long lMyError = errno;
#if defined(PLATFORM_FREEBSD)
		BooleanT bDone = FALSE;

		while(EAGAIN == lMyError && FALSE == bDone)
#else
		if(EAGAIN == lMyError)
#endif /* Not PLATFORM_FREEBSD */
		{
#if defined(PLATFORM_FREEBSD)
			sOperation.sem_flg = IPC_NOWAIT;
#else
			sOperation.sem_flg = 0;
#endif /* Not PLATFORM_FREEBSD */
			if(NULL != plElapse)
			{
				int64 llStart = TimeNowAsInt64();

				*pbBlocked = TRUE;
				if (0 == semop(oShmemInfo->lSemId, &sOperation, 1))
				{
					TRACE((109, 0, LIBVERBOSE, "Decrement semaphore %d, after wait",
					       iSemNum));
					if(0 != (*oShmemInfo->pllEomFlag))
					{
						TRACE((174,0,NORMAL,"Got a EOM flag"));
#if defined(PLATFORM_FREEBSD)
						tResult = DataEom;
						bDone = TRUE; /* bDone only exists on FreeBSD */
#endif /* PLATFORM_FREEBSD */
					}
					else
					{
						tResult = DataSuccess;
#if defined(PLATFORM_FREEBSD)
					  bDone = TRUE;
#endif /* PLATFORM_FREEBSD */
					}
				}
				else
				{
					lMyError = errno;
				}
				*plElapse += TimeNowAsInt64() - llStart;
			}
			else
			{
				if (0 == semop(oShmemInfo->lSemId, &sOperation, 1))
				{
					TRACE((110, 0, LIBVERBOSE, "Decrement semaphore %d, after wait",
					       iSemNum));
					if(0 != (*oShmemInfo->pllEomFlag))
					{
						TRACE((176, 0, NORMAL, "Got a EOM flag"));
						tResult = DataEom;
#if defined(PLATFORM_FREEBSD)
					  bDone = TRUE;
#endif /* PLATFORM_FREEBSD */
					}
					else
					{
						tResult = DataSuccess;
#if defined(PLATFORM_FREEBSD)
					  bDone = TRUE;
#endif /* PLATFORM_FREEBSD */
					}
				}
				else
				{
					lMyError = errno;
				}
			}
		}
		if(DataSuccess != tResult && DataEom != tResult)
		{
			if (EINTR != lMyError)
			{
				TRACE((2, lMyError, LIBNORMAL, "wait on semaphore %d failed: %s",
				       iSemNum,
				       SysErrorString(lMyError)));
			}
			else
			{
				tResult = DataInterrupted;
			}
		}
	}
	return tResult;
}

#endif
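
Note that the FreeBSD branch above retries semop() with IPC_NOWAIT in a
tight loop, so while it waits the process spins at full CPU. If polling
really is needed as a workaround, a gentler variant would sleep between
attempts and give up after a deadline. This is a hypothetical sketch: the
name sem_take_polling, the poll interval and the timeout are my own
assumptions, and it leaves out the EOM flag handling of DataSemaphoreWait.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Milliseconds from a monotonic clock, used for the deadline. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Decrement semaphore `semnum` of set `id` by polling with IPC_NOWAIT,
 * sleeping poll_ms between attempts instead of spinning.
 * Gives up after roughly timeout_ms (timeout_ms < 0 means never).
 * Returns 0 on success, EAGAIN on timeout, or the errno from semop()
 * (e.g. EINTR, EIDRM, EINVAL) on any other failure. */
int sem_take_polling(int id, unsigned short semnum,
                     long poll_ms, long timeout_ms)
{
    struct sembuf op;
    struct timespec pause;
    long long deadline = now_ms() + timeout_ms;

    op.sem_num = semnum;
    op.sem_op  = -1;
    op.sem_flg = IPC_NOWAIT;
    pause.tv_sec  = poll_ms / 1000;
    pause.tv_nsec = (poll_ms % 1000) * 1000000L;

    for (;;) {
        if (semop(id, &op, 1) == 0)
            return 0;                   /* got the resource */
        if (errno != EAGAIN)
            return errno;               /* real error: stop polling */
        if (timeout_ms >= 0 && now_ms() >= deadline)
            return EAGAIN;              /* still busy: give up */
        nanosleep(&pause, NULL);        /* yield the CPU, then retry */
    }
}
```

Of course this only hides a missed wakeup rather than explaining it, which
is why I would still like to know whether the blocking version should work.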




