An Assessment of the Spatial Performance of Virtual Home Theatre Algorithms by Subjective and Objective Methods
Department of Music and Sound Recording
The Institute of Sound Recording papers
University of Surrey, Year 2000
Russell Mason, Francis Rumsey
This paper is posted at Surrey Scholarship Online.
http://epubs.surrey.ac.uk/recording/4
An Assessment of the Spatial Performance of
Virtual Home Theatre Algorithms by Subjective and Objective Methods
Preprint 5137 (L-6)
Russell Mason and Francis Rumsey
Institute of Sound Recording, University of Surrey, Guildford, Surrey, UK
Presented at the 108th Convention, 2000 February 19-22, Paris, France
This preprint has been reproduced from the author’s advance
manuscript, without editing, corrections or consideration by the
Review Board. The AES takes no responsibility for the
contents.
Additional preprints may be obtained by sending request and
remittance to the Audio Engineering Society, 60 East 42nd St.,
New York, New York 10165-2520, USA.
All rights reserved. Reproduction of this preprint, or any portion
thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
AN AUDIO ENGINEERING
SOCIETY PREPRINT
An assessment of the spatial performance of virtual home theatre
algorithms by subjective and objective methods
Russell Mason and Francis Rumsey, Institute of Sound Recording,
School of Performing Arts, University of Surrey,
Guildford, Surrey, GU2 5XH, UK
r.mason@surrey.ac.uk
f.rumsey@surrey.ac.uk
Abstract:
A controlled subjective test was carried out to assess selected spatial qualities of three virtual
home theatre processors. The subjective results were used to evaluate a number of objective
measurements based on the interaural cross-correlation coefficient (IACC). A novel
implementation of the IACC was found which appears to correlate well with the subjective
data.
0 Introduction
This paper documents part of the work carried out under the Eureka 1653 MEDUSA
(Multichannel Enhancement of Domestic User Stereo Applications) project. The MEDUSA
project involves collaborative research between the following partners: the British
Broadcasting Corporation, the Institute of Sound Recording at the University of Surrey,
Nokia Research Centre, Genelec Oy, and Bang & Olufsen A/S.
The purpose of the project is to examine the variables of the domestic multichannel sound
system, with and without picture, and to carry out the optimisation needed to arrive at consumer
end products. These products will combine the requirements of multichannel reproduction
with those of less complex modes of reproduction, such as stereo and mono. This involves
linked studies of programme production and perceptual elements, leading to a single
optimised approach to domestic reproduction.
A great deal of the research carried out within the MEDUSA project involves subjective
listening tests, which are both expensive and time-consuming to carry out. Objective measures
that correlate well with certain subjective parameters would be more repeatable and would
save time and money [1]. It would therefore be useful if subjective assessments could be
replaced or complemented by objective measurement methods. Replacing subjective assessments
completely may currently be impossible. However, there are measures, either established or
under development, which may correlate well with some aspects of spatial perception.
Whilst a great deal of research has been carried out on localisation in
reproduction systems, ‘spatial impression’ has so far been left behind [2]. Perhaps one reason
for this is the comparative simplicity with which localisation can be evaluated in listening
tests. In contrast, spatial impression is a much more complicated, multidimensional subjective
phenomenon. In this paper, spatial impression is defined as the auditory perception of the
location, dimensions, and other physical parameters of a sound source and of the acoustic
environment in which the source is located.
In some areas, spatial impression has been researched in detail. This includes the perception
of concert hall acoustics. Not all of the measurable or perceivable categories used in concert
hall acoustics are relevant to the reproduction of sound in small rooms, but there are definite
parallels. Beranek provides a good overview of this [3].
The research into concert hall acoustics also proposes a number of objective measurements
that help to predict how a listener will perceive the sound of a concert hall. Among these is
the interaural cross-correlation coefficient (IACC), a measure first investigated
in the late 1960s. Schroeder, Gottlob and Siebrasse found that the IACC was one of a
number of physical measures that correlated well with listener preferences [4]. Ando
confirmed this and found that it was independent of reverberation time [5].
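For reference, the conventional room-acoustics form of the IACC takes the maximum absolute value of the normalised cross-correlation between the two ear signals over interaural lags of about ±1 ms. The sketch below illustrates that conventional definition only; it is not the novel implementation evaluated in this paper. The function name `iacc`, the argument names and the 1 ms default are assumptions made for illustration.

```python
import numpy as np

def iacc(left, right, fs, max_lag_ms=1.0):
    """Conventional IACC: peak magnitude of the normalised interaural
    cross-correlation function within +/- max_lag_ms.

    left, right : equal-length 1-D ear signals (hypothetical inputs)
    fs          : sample rate in Hz
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.ndim != 1 or left.shape != right.shape or left.size == 0:
        raise ValueError("left and right must be non-empty 1-D arrays of equal length")
    n = left.size
    # Normalise by the geometric mean of the two signal energies.
    norm = np.sqrt(np.dot(left, left) * np.dot(right, right))
    if norm == 0.0:
        return 0.0
    max_lag = min(int(round(max_lag_ms * 1e-3 * fs)), n - 1)
    peak = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum over t of left[t] * right[t + lag].
        if lag >= 0:
            c = np.dot(left[:n - lag], right[lag:])
        else:
            c = np.dot(left[-lag:], right[:n + lag])
        peak = max(peak, abs(c) / norm)
    return float(peak)
```

Identical ear signals give a value of 1, while statistically independent signals give a value close to 0. Only the lags inside the ±1 ms window are evaluated, so the cost stays modest even for long excerpts.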
The IACC has seldom been tested using reproduction systems [6, 7, 8]. There are also
arguments that it is inadequate because of its poor low-frequency differentiation [9], and
that it does not work in small rooms [10].
Therefore, an experiment was undertaken to test a reproduction system in a small room using
both subjective and objective measurements. The objective measurements were based on the
IACC, and the results were examined to find correlations with subjective spatial attributes.
The reproduction method chosen used various ‘virtual home theatre’ (VHT) systems as
described in [11]. This type of system aims to reproduce the spatial attributes of the original
multichannel material using only two loudspeakers. This is usually attempted by simulating
head-related transfer function (HRTF) cues with cross-talk cancelling. Because some of the
loudspeaker signals in such a system are already artificially spatialised, the challenge for the
objective measurement is possibly made more difficult.
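The cross-talk cancelling step can be illustrated with a generic frequency-domain sketch. This is a textbook-style regularised matrix inversion, not the method used by any of the processors under test. The array layout, the function name `crosstalk_canceller` and the regularisation constant `beta` are all assumptions made for illustration.

```python
import numpy as np

def crosstalk_canceller(H, beta=1e-3):
    """Regularised inverse of 2x2 loudspeaker-to-ear transfer matrices.

    H has shape (n_freqs, 2, 2); H[k, i, j] is the complex response from
    loudspeaker j to ear i at frequency bin k (hypothetical layout).
    Returns C with the same shape such that H[k] @ C[k] is close to the
    identity, i.e. each ear receives (approximately) only its own signal.
    """
    H = np.asarray(H, dtype=complex)
    Hh = np.conj(np.swapaxes(H, -1, -2))  # Hermitian transpose per bin
    # Tikhonov-regularised inverse: C = (H^H H + beta I)^-1 H^H
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)
```

Binaural (HRTF-processed) signals filtered through `C` and then played over the two loudspeakers would, under these idealised assumptions, arrive at the ears with the cross-talk paths largely cancelled. A larger `beta` trades cancellation accuracy for lower filter gain at frequencies where the matrix is poorly conditioned.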
1 Programme material
In order to quantify attributes of sound reproduction subjectively, it is necessary to
conduct controlled listening tests. The experimental design of such tests needs to
limit extraneous variables to an absolute minimum. Because of this, the programme material
needs to contain a wide range of auditory cues, yet be limited enough not to confuse the
listener. Whilst this may in some cases limit the external validity of a test, it is sometimes
necessary in order to obtain sensitive results.
In this case, certain spatial attributes of various virtual home theatre algorithms were judged.
Based on the work of Berg [12], these attributes were limited to Apparent Source
Width (ASW) and Listener Envelopment (LEV), both defined in [13], and Depth (the perceived
distance of the source from the listener). The simplest programme material available that
would sufficiently excite all three spatial attributes was a single source in a reverberant
environment. To produce a range of auditory cues, a number of acoustic environments and
sound sources were needed.
Therefore, programme material was recorded specifically for this experiment, consisting of a
number of sound sources in a number of acoustic environments. In order to separate the
variables of acoustic environment and sound source, each source was sounded and recorded in
each environment. If this had been done in the conventional manner of recording a
performance in each space, there would have been an additional variable: for a given musical
extract, even the very best musicians could not play exactly the
same way twice. This would have added performance as a confounding variable in
judging the reproduction of the acoustic environments.
Replaying anechoically recorded excerpts through a loudspeaker in each of the acoustic
environments eliminated this variation. This method was necessarily a compromise as there
was no longer a real source sounding in each acoustic. The disadvantages of this approach
stem from the artificiality of this ‘virtual source’, in particular the directionality of the
source and its physical coupling to the air. Factors such as timbre, attack, decay,
and musicality should have been conveyed effectively by high-quality reproduction. This
approach has been used successfully in previous experiments [14, 4].
1.1 Anechoic recordings
The most readily available source of anechoic recordings was the Bang and Olufsen CD that
contains anechoic recordings made for the Archimedes project. The making of these recordings is
documented in [15].
In order to present a wide range of auditory cues, the programme material needs to contain a
range of sound sources. This should ideally include examples such as transients, sustains,
both wide-band and narrow-band (tuned) signals, a wide range of frequencies, and a human
voice. There should also be sufficient gaps in the extracts so it is possible to hear the effect of
the acoustics.
The extracts used from the B&O CD were Cello (sustained, tuned, low frequency) and
Trumpet (mixture of transient attacks and sustains, tuned, mid-high frequencies). Two
additional extracts were recorded in the free-field room at BBC Research and Development in
Kingswood Warren, UK. These were snare drum (transient, wide frequency range, separated
hits) and a male speaking voice (a mixture of noise and modulated tonal sounds - a popular
test item).
The recordings were made in mono with a Brüel and Kjær 4006 omnidirectional microphone
connected via a custom pre-amp and phantom power supply to a Tascam DA-30 DAT
recorder using the internal converters. The aim of the recording was to produce a result which
when replayed sounded as natural as possible. In order to do this, the recording was
monitored on a single large loudspeaker and compared with the natural sound from the
source.
It has been found that it is easier and more efficient to judge audio signals that are stationary
and possibly repetitive [2, 16]. Because of this, the snare drum and trumpet excerpts were
made up of a short loop of a bar or so. This loop was repeated for 60 seconds to match the
duration of the other extracts.
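The looping step can be sketched as follows. This is an illustration only: the paper does not state how the loops were assembled, and the function name `repeat_loop`, the array representation and the sample-rate argument are assumptions.

```python
import numpy as np

def repeat_loop(loop, fs, duration_s=60.0):
    """Tile a short mono loop end-to-end and trim it to duration_s seconds.

    loop : 1-D array of samples for one pass of the loop (hypothetical input)
    fs   : sample rate in Hz
    """
    loop = np.asarray(loop)
    if loop.ndim != 1 or loop.size == 0:
        raise ValueError("loop must be a non-empty 1-D array")
    n_target = int(round(duration_s * fs))
    n_repeats = -(-n_target // loop.size)  # ceiling division
    # Repeat whole passes, then cut to the exact target length.
    return np.tile(loop, n_repeats)[:n_target]
```

Note that trimming to an exact duration may cut the final pass of the loop short. In practice the loop points would also need to be chosen so that the join is inaudible.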
The relative reproduction level of each of the sound sources is also important in recreating it
as accurately as possible. Using a Brüel and Kjær SPL meter with a Brüel and Kjær 4145
1-inch capsule, A-weighted SPL measurements with a fast time constant were made of an
example of each sound source represented. From this, the relative level of each source was
calculated and referenced to a calibration signal of pink noise at 85 dBA at 1 metre from the
loudspeaker. A DAT was compiled of the excerpts adjusted to the correct level. As a final
check, the DAT was replayed at its reference level next to the corresponding source
reproducing a similar phrase.
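The level arithmetic can be sketched as follows, assuming each source's measured A-weighted level is expressed in dBA and compared against the 85 dBA pink-noise calibration level. The function names and the example SPL figures are hypothetical placeholders, not measurements from the paper.

```python
REFERENCE_DBA = 85.0  # pink-noise calibration level at 1 m (from the text)

def relative_level_db(source_dba, reference_dba=REFERENCE_DBA):
    """Level of a source relative to the calibration signal, in dB."""
    return source_dba - reference_dba

def db_to_gain(level_db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)

# Hypothetical measured levels in dBA; placeholders for illustration only.
measured = {"source_a": 79.0, "source_b": 91.0}
for name, spl in measured.items():
    offset = relative_level_db(spl)
    print(f"{name}: {offset:+.1f} dB re calibration, "
          f"amplitude ratio {db_to_gain(offset):.3f}")
```

A source measured 6 dB below the calibration level would, under these assumptions, be recorded onto the compilation at roughly half the amplitude of the calibration signal.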
1.2 Choice of reproduction loudspeaker
After informal listening, it was apparent that the choice of loudspeaker for reproducing the
anechoic recordings was important as it had a significant effect on the perceived result. The
ideal situation would be to reproduce each sound source through a loudspeaker that matches
the source most closely in terms of size, shape, directionality and frequency response.