Date:      Mon, 02 Nov 1998 12:26:26 -0800
From:      grady@xcf.berkeley.edu (Steven Grady)
To:        multimedia@FreeBSD.ORG
Subject:   How can we switch to a higher-level audio interface?
Message-ID:  <199811022025.MAA29067@hub.freebsd.org>

[Summary: the current standard of interacting directly with the
audio device sucks.  What approach can we take to improve it,
if any?]

As I've been experimenting more with various audio-related pieces
of software in the last few months, I've become more and more
concerned with the fact that there is an increasing body of software
that uses a _really_ broken approach to sound, namely, to open the
device directly.  This has three serious problems: it doesn't work
over the network, only one application at a time can play a sound,
and it is a low-level API.  All of these problems used to exist
for graphics, which was of course why X was developed.

There have been various solutions proposed -- the Network Audio
Server (NAS) is probably the most advanced, but development/maintenance
seems abandoned, and the documentation is skimpy (I know -- I'm
trying to write an application that uses it).  There's also rplay,
KDE's kaudioserver (which I haven't looked into yet), and some
others.

The problem is that none of these solutions is being adopted on a
widespread basis.  Instead, new applications still use /dev/audio,
/dev/dsp, etc.

It seems to me that there are a few reasons why no server-based
model has been adopted.  One reason is that there is no
standard.  Unlike X, there is no single interface that everyone
uses.  Also, the various solutions are not particularly robust --
who wants to spend their precious development time debugging someone
else's code?  But most likely, I think it is simply a huge blind
spot for most people -- people seem comfortable with direct device
access, despite there being a much better alternative model literally
in front of their faces.

It's worth analyzing the three problems I mentioned above:
    non-network: apparently not a serious problem for most people,
	since most people run the applications on the machine
	in front of them.  An obvious exception is X terminals,
	which is why NCD took over the development of NAS.
	But most BSD/Linux people don't run on X terminals.
    low-level API: only affects developers.  I'm sure the developers
	dealing with sound would like a higher-level API (I know
	I do), but since the user won't see it, it doesn't seem
	worth dealing with.
    exclusive access: This is the thing that amazes me.  I HATE
	not having shared access.  I want to use the audio to play MP3s,
	have interesting system sounds, use speak-freely, run
	timidity, play games, etc.  Right now, I have to choose which
	one I want, and turn off everything else.  That SUCKS.  Why aren't
	users up in arms about this?

Okay, so my question is, is it too late to do anything about this?

Here are some possible approaches:  

    Change the device driver so that multiple access is allowed.
    Advantages: no applications need to be changed.  Addresses the
	most obvious problem with the least amount of effort.
    Disadvantages: terrible design -- that kind of functionality should
	_not_ go in a device driver.  Doesn't address all the
	problems.  Not beneficial outside the FreeBSD world.
	Dangerous -- bugs in tricky algorithms could crash the
	system.

    Implement a dynamically-loadable device driver that allows multiple
    access.
    Advantages: no applications need to be changed.
    Disadvantages: Still not a good design.  Doesn't address all the
	problems.  Not beneficial outside the FreeBSD world.  Requires
	additional non-trivial work to make the sound driver dynamic
	(although I think this should be done by someone anyway).
	Still dangerous.

    Port all the applications to use NAS (or rplay, kaudioserver, etc.).
    Advantages: Moves to more X-like model.
    Disadvantages: SERIOUS porting effort.  Locks into one audio API
	(which may not end up being standard).  Such major changes
	may not be accepted back into code base by original
	developers.  Existing APIs are still fairly low-level.  Chosen
	audio interface must be debugged (both client and server),
	potentially enhanced to support all existing device-level
	functionality.

    Develop toolkit-level API, port all to use that API, implement
    on top of one or more audio server formats.  Ultimately, create
    multiple language bindings.
    Advantages: Moves to more tk-like model.  Programs that use
	audio in a simple way (e.g. playing a sound) may require
	very simple changes, promoting acceptance by original application
	developers.  Eases cross-platform (i.e. beyond UNIX) implementations.
	Easy to switch low-level interfaces to use the best (most
	robust and/or functional).
    Disadvantages: SERIOUS development effort.  Non-trivial porting
	effort.  For audio-intensive applications (timidity, speak freely,
	etc.), not likely to be accepted back into original codebase
	(until toolkit becomes established).  Proliferation of
	libraries either bloats code (static linking) or causes
	installation problems (dynamic libraries).
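Whichever approach is chosen (a multi-open driver, a dynamically loaded driver, or an audio server), the core job that makes shared access possible is mixing several clients' sample streams into one before it reaches the hardware. Here is a minimal sketch in C, assuming every client has already been converted to one common format (signed 16-bit PCM, same sample rate and channel count). The name `mix_s16` is hypothetical and is not taken from NAS, rplay, kaudioserver, or the FreeBSD sound driver:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mixer: sum `nstreams` buffers of signed 16-bit PCM,
 * each `nsamples` samples long, into `out`.
 * Assumes all streams already share one format, rate, and channel layout.
 * The 32-bit accumulator cannot overflow for up to 65536 streams.
 */
void mix_s16(int16_t *out, const int16_t *const *streams,
             size_t nstreams, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        int32_t acc = 0;

        /* Add this sample position from every active client. */
        for (size_t s = 0; s < nstreams; s++)
            acc += streams[s][i];

        /* Saturate to the int16_t range so loud overlaps clip
         * instead of wrapping around to the opposite sign. */
        if (acc > INT16_MAX)
            acc = INT16_MAX;
        else if (acc < INT16_MIN)
            acc = INT16_MIN;

        out[i] = (int16_t)acc;
    }
}
```

Clamping is the simplest choice: overlapping loud sounds distort a little rather than wrapping around, which sounds far worse. A real mixer would also need sample-rate and format conversion, per-client buffering, and probably per-stream volume control, none of which is shown here.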

Another issue to consider is that we FreeBSD folks are in a privileged
position with respect to the Linux folks.  Since our ports model incorporates
the patching of existing code bases, we can make programs work with
FreeBSD by supplying a patch, rather than trying to get all the
individual developers to incorporate whatever changes we come up with.
So if we followed one of these approaches, we could experiment just within
our world, rather than in the entire free software community.

What do people think about these issues?  Are there others?  Is there
any chance that I'll be able to play an mpeg, AND have my system ring
when a chat request comes in, without switching to Windows? 

(Oh, one more thing.  This problem occurred with displays, and it also occurs
with sound.  In fact, it's a general device-access problem.  If we fix this,
we still have the problem of accessing the joystick, the CD-ROM, tape drives,
etc.  People come up with individual ad-hoc solutions; maybe it's time to
think about what could be abstracted into a general solution.)

(And another thing -- although I've been thinking about this for a while,
I was inspired to actually send this e-mail after reading a very interesting
memo on open source from a high-level Microsoft dude, annotated by Eric
Raymond.  It occurred to me as I read it that while X provides a
wonderful example on the plus side of open source, our current audio
problems are a strong example on the minus side.  I'd hate to think the
audio situation was a lost cause.
URL: http://www.tuxedo.org/~esr/halloween.html)

	Steven
	grady@xcf.berkeley.edu

"I think life should be more like TV.  I think all of life's problems
ought to be solved in 30 minutes with simple homilies, don't you?
I think weight and oral hygiene ought to be our biggest concerns.
I think we should all have powerful, high-paying jobs, and everyone
should drive fancy sports cars.  All our desires should be instantly
gratified.  Women should always wear tight clothes, and men should
carry powerful handguns.  Life overall should be more glamorous,
thrill-packed, and filled with applause."
