Date:      Sun, 3 Dec 2006 05:23:29 GMT
From:      Patrick Baggett<baggett.patrick@figglesoftware.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   sparc64/106251: malloc fails > for large allocations
Message-ID:  <200612030523.kB35NT9h062726@www.freebsd.org>
Resent-Message-ID: <200612030530.kB35UIIC027362@freefall.freebsd.org>


>Number:         106251
>Category:       sparc64
>Synopsis:       malloc fails > for large allocations
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-sparc64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Dec 03 05:30:12 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Patrick Baggett
>Release:        6.2-RC1
>Organization:
Figgle Software
>Environment:
FreeBSD SPARCLE.figglesoftware.com 6.2-RC1 FreeBSD 6.2-RC1 #0:Fri Nov 17 02.52.51 UTC 2006 root@s-dallas.cse.buffalo.edu/usr/obj/usr/src/sys/GENERIC sparc64
>Description:
I have a program that allocates four 768MB blocks of contiguous memory (3GB total = 4 x 768MB) and runs some operations on them. The first call successfully allocates the memory, but the next three return NULL as if there were no memory available. Well...

The system I work on is an Enterprise 420R 4xUltraSPARC II 400MHz with 4GB RAM. Right before the program starts, 'top' shows 3667MB of free RAM and 1024MB of swap available. This should be more than enough to allocate 3GB of RAM, or at least one more 768MB block (for a total of 1.5GB).
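
For reference, the sizes quoted above follow from simple arithmetic, sketched below. The helper names (`array_bytes`, `to_mb`) are made up for illustration and are not part of the attached program; the element count used in the note afterwards matches NUM_FMA in the attachment, and a 4-byte float is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for `count` elements of `elem_size`
   bytes each, or 0 if the product would overflow size_t. */
size_t array_bytes(size_t count, size_t elem_size)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return 0;
	return count * elem_size;
}

/* Hypothetical helper: whole megabytes, with 1MB = 1024*1024 bytes. */
size_t to_mb(size_t bytes)
{
	return bytes / (1024u * 1024u);
}
```

With 192*1024*1024 floats per array, to_mb(array_bytes(count, 4)) comes to 768, and four such arrays come to 3072MB, which is below the 3667MB that 'top' reports as free.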

I compiled this program as a 64 bit program using GCC, so I don't think it should be hitting the limit of 32 bit pointers. Even so, an x86 box could theoretically run this program too.
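
A quick way to confirm that the binary really is a 64 bit build is to look at the pointer width at run time. This is only a minimal sketch; `pointer_bits` and `fits_in_size_t` are names invented for illustration and do not appear in the attached program.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: width of a data pointer in bits
   (expected to be 64 when the program is built with -m64). */
unsigned pointer_bits(void)
{
	return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical helper: 1 if a request of `bytes` bytes can be
   expressed as a size_t at all, 0 otherwise. */
int fits_in_size_t(unsigned long long bytes)
{
	return bytes <= (unsigned long long)SIZE_MAX;
}
```

If pointer_bits() returns 64 and fits_in_size_t() accepts the 768MB request, then pointer width is probably not what makes the later malloc calls fail.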

I have successfully compiled and tested the program under Solaris 10, so I don't think there is an error in my source code. Running 'prstat' on Solaris 10 (similar to 'top' if you haven't used Solaris) shows the appropriate 3072MB of allocated RAM.

FreeBSD should be able to allocate more than 3GB of RAM, especially on a 64 bit architecture.
>How-To-Repeat:
Run the test program included. Fails every time at allocation #2.
$ gcc -m64 -O3 dist_fma.c -o dist_fma
$ ./dist_fma
>Fix:
Allow large allocations?
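
One thing that might be worth ruling out first (a guess, not something confirmed in this report) is a per-process resource limit capping the heap. The sketch below reads the data segment limit with the standard getrlimit(2) call; `read_data_limit` and `print_limit` are illustrative names, and whether malloc is actually bounded by this limit on the system in question is an assumption.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard RLIMIT_DATA values.
   Returns 0 on success, -1 if getrlimit() fails. */
int read_data_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_DATA, &rl) != 0)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/* Hypothetical helper: print one limit in MB, spelling out "unlimited". */
void print_limit(const char *label, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s: unlimited\n", label);
	else
		printf("%s: %llu MB\n", label,
		    (unsigned long long)value / (1024ULL * 1024ULL));
}
```

Calling read_data_limit() and passing both values to print_limit() shows the data size limit the process is running under. If the soft limit turns out to be below 3GB, that could be enough on its own to make the later malloc calls return NULL.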

Patch attached with submission follows:

/*
	dist_fma.c -- Distributed floating multiply-add to test sparc64 SMP performance
	Copyright 2006 Figgle Software
	Authors:
		Patrick Baggett (baggett.patrick@figglesoftware.com)
*/

#include <sys/time.h>
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>
#include <sys/resource.h>

#define NUM_FMA	(1024*1024*192)	/* 192M fma (768MB per float array) */

/* Use entire 64 bit register */
typedef unsigned long long u64;

typedef struct FMAJob
{
	float* x;
	float* y;
	float* z;
	float* out;
	unsigned int base;
	unsigned int qty;
} FMAJob;

/* 64 bit timer, millisecond accurate */
u64 Ticks()
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return ((u64)tv.tv_sec*1000) + ((u64)tv.tv_usec/1000);
}

static void* fma_thread(void* job)
{
	FMAJob* pJob = (FMAJob*)job;
	u64 i;
	u64 base = pJob->base;
	for(i=0; i<pJob->qty; i++)
	{
		/* Write to this thread's own slice; out[i] would make all threads overwrite out[0..qty) */
		pJob->out[base+i] = pJob->x[base+i]*pJob->y[base+i]+pJob->z[base+i];
	}
	
	return NULL;
}

void InitData(float* x, float* y, float* z, float* out)
{
	register u64 i;

	/* By initializing the data, we make sure that all pages are committed, not just reserved */

	printf("Initializing data set...\n");
	for(i=0; i<NUM_FMA; i++)
		x[i] = 2.0f;
	
	for(i=0; i<NUM_FMA; i++)
		y[i] = 3.0f;

	for(i=0; i<NUM_FMA; i++)
		z[i] = 4.0f;
	
	for(i=0; i<NUM_FMA; i++)
		out[i] = 0.0f;
	
	printf("Working...\n");
}



int main()
{
	u64 i;
	u64 Start;
	u64 End;
	FMAJob Jobs[4];
	pthread_t ID[4];


	float* x;
	float* y;
	float* z;
	float* out;

	/* This allocates about 3GB of RAM */
	x = malloc(NUM_FMA*sizeof(float));
	y = malloc(NUM_FMA*sizeof(float));
	z = malloc(NUM_FMA*sizeof(float));
	out = malloc(NUM_FMA*sizeof(float));

	/*
	setpriority(PRIO_PROCESS, 0, -20);
	*/

	if(!x || !y || !z || !out)
	{
		if(!x)
			printf("Could not alloc x\n");
		if(!y)
			printf("Could not alloc y\n");
		if(!z)
			printf("Could not alloc z\n");
		if(!out)
			printf("Could not alloc out\n"); 
		printf("Failed to allocate %llu bytes (%llu MB)\n",
		(u64)(4*NUM_FMA*sizeof(float)), (u64)(4*NUM_FMA*sizeof(float)/(1024*1024)));
		return 1;
	}

	for(i=0; i<4; i++)
	{
		Jobs[i].x = x;
		Jobs[i].y = y;
		Jobs[i].z = z;
		Jobs[i].out = out;
		Jobs[i].base = i*(NUM_FMA/4);
		Jobs[i].qty = (NUM_FMA/4);
	}
	
	InitData(x,y,z,out);
	
	Start = Ticks();

	for(i=0; i<NUM_FMA; i++)	
		out[i] = (x[i] * y[i]) + z[i];
	
	End = Ticks();
	
	printf("Done.\nSerial: %llu msecs\n", End-Start);
	
	InitData(x,y,z,out);

	Start = Ticks();
	for(i=0; i<4; i++)
	{
		/* Create each worker with default attributes (NULL), including the default stack size */
		pthread_create(&ID[i], NULL, fma_thread, (void*)&Jobs[i]);
	}

	pthread_join(ID[0], NULL);
	pthread_join(ID[1], NULL);
	pthread_join(ID[2], NULL);
	pthread_join(ID[3], NULL);
	
	End = Ticks();
	
	printf("Done.\nParallel: %llu msecs\n", End-Start);

	free(x);
	free(y);
	free(z);
	free(out);
	return 0;
}



>Release-Note:
>Audit-Trail:
>Unformatted:


