A Case Study of Auditory Navigation in Virtual Acoustic
Environments
Tapio Lokki, Matti Gröhn, Lauri Savioja, and Tapio Takala
Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology
P.O.Box 5400, 02015 HUT, Helsinki, FINLAND
Phone: +358 9 451 4737, Fax: +358 9 451 5014
Tapio.Lokki@hut.fi, Matti.Grohn@hut.fi, Lauri.Savioja@hut.fi, Tapio.Takala@hut.fi
ABSTRACT
We report the results of an auditory navigation experiment in which sound is employed as a navigational aid in a virtual environment. The test task was to find a sound source in a dynamic virtual acoustic environment; in dynamic auralization the movements of the subject are taken into account in the acoustic modeling of the room. We tested the effect of three factors (stimulus, panning method, and acoustic environment) on the number of errors and on the time spent finding the target. The results, which were also statistically validated, show that noise is the best stimulus, that reverberation complicates navigation, and that simple models of spatial hearing give sufficient cues for auditory navigation.
Keywords : Auditory Navigation, Virtual Acoustics, Spatial Hearing, Dynamic Auralization
INTRODUCTION
In this paper we describe the results of an auditory navigation experiment. Auditory navigation tests have been conducted earlier, e.g., by Loomis et al. [6] and Rutherford [8]. Our aim was to perform the experiment in a dynamic system, in which the perceived acoustics changes according to the movements of the subject. A good overview of the different techniques needed in auditory navigation is presented by Begault [2]. In our experiment we applied a version of the DIVA auralization system [9].
EXPERIMENT
In this experiment the task of the subjects was to find a sound source by moving and turning in a virtual space. Our purpose was to analyse the effect of various factors in the test setup: the sound stimulus, the directional cues, and the acoustics of the environment.
We collected the following data from each test: the time spent, the ending position, and the trajectory of the subject's motion. Every subject also filled out a short questionnaire after completing the experiment, in which we asked for comments on the easiest stimulus and on the tactic the subject used to find the sound source. Instructions were given both aurally and in writing. At the beginning of the experiment there were three rehearsal tests, which helped the subjects understand what they should do.
We carried out a complete test set with three variables, each having three different choices. Thus the whole test set contained 27 tests.
When the subject assumed that he had found the sound source, he indicated this by pressing the key "f". The experiment was conducted in the horizontal plane.
The sound source was a point source. The target area was a sphere around the source (one meter in diameter). The starting positions were in random directions, 25 m away from the source.
The experiment was run on an SGI O2 workstation in a quiet office room. The reproduction equipment was headphones (Sennheiser HD-580).
Participants
The experiment (all 27 tests) was completed by 27 subjects, all of them students or staff of Helsinki University of Technology. All subjects easily understood the experiment and were enthusiastic to give comments and to see their results.
Coaching
Moving in the virtual space was controlled with the arrow keys of a keyboard. The subject was able to move forward and backward, and to turn left and right, in constant steps (0.4 m per step when moving forward or backward, and a fixed angle per step when turning).
Stimulus                    Panning method                                 Acoustic environment
pink noise                  ITD alone                                      direct sound
artificial flute            ITD + simple amplitude panning (see Fig. 1)    direct sound + 6 early reflections
recorded anechoic guitar    ITD + minimum-phase HRTF (30-tap FIR)          direct sound + 6 early reflections
                                                                           + reverberation (length about 1 s)

Table 1: The three tested factors.
Variables
In this experiment we tested three different factors: the stimulus, the panning method, and the influence of the acoustic environment. Each factor had three choices, summarized in Table 1.
Stimuli: All stimuli were sampled at 32 kHz and had equal loudness. Each was about 30 seconds long and was played in a loop. The sound source had an omnidirectional radiation pattern. The pink noise and the anechoic guitar were digitally copied from the Music for Archimedes CD.1 The synthesized flute was produced by a physics-based model [11].
Panning methods:
The interaural time difference (ITD) was included as an auditory cue in all tests. The ITD was calculated from a spherical head model and implemented with a short delay line. When the subject pressed a key to turn his head, the ITD changed smoothly; the pick-up positions of the ITD delay line were interpolated with first-order fractional delays.
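A minimal sketch of this stage might look as follows, assuming a Woodworth-type spherical-head formula and a generic head radius (both are assumptions for illustration, not the exact DIVA model):

```python
import math

def itd_seconds(azimuth_rad, head_radius=0.0875, c=343.0):
    """Woodworth-type spherical-head ITD estimate (assumed model).

    azimuth_rad: source azimuth, 0 = straight ahead, positive to the left,
    valid for |azimuth| <= pi/2. Returns the interaural delay in seconds.
    """
    return (head_radius / c) * (math.sin(azimuth_rad) + azimuth_rad)

def fractional_read(buf, delay_samples):
    """First-order (linear) fractional-delay read from a delay-line buffer,
    so the pick-up position can glide smoothly as the head turns."""
    i = int(delay_samples)
    frac = delay_samples - i
    return (1.0 - frac) * buf[i] + frac * buf[i + 1]
```

A frontal source gives zero ITD, and a source at 90 degrees gives roughly 0.65 ms with these assumed constants.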
The second panning method also included a simple model of the frequency-independent interaural level difference (ILD). This method, also called the cardioid method, was introduced by Takala and Hahn [10]. In it, the sound signals for the two ears are weighted with gains w_L and w_R obtained from:

    w_L = (1 + sin(theta)) / 2    (1)
    w_R = (1 - sin(theta)) / 2    (2)

where theta is the azimuth angle of the incoming sound. The cardioid method is illustrated in Fig. 1: on the left, two solid lines illustrate the panning gains for the right and left ears (Eqs. 1 and 2) and a dashed line shows the front-back gain; on the right, the final panning gains for the left and right ear are depicted.
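Assuming the standard opposing-cardioid pair for the two ear gains, the weights could be computed as follows (a sketch only; the exact gain constants and the front-back term of the original method may differ):

```python
import math

def cardioid_gains(azimuth_rad):
    """Left/right panning weights as opposing cardioids of the azimuth.

    Illustrative sketch of the cardioid method's idea: azimuth 0 = front,
    positive to the left, so the left weight peaks at +90 degrees.
    """
    w_left = 0.5 * (1.0 + math.sin(azimuth_rad))
    w_right = 0.5 * (1.0 - math.sin(azimuth_rad))
    return w_left, w_right
```

A frontal source is weighted equally (0.5, 0.5), while a source hard left gives all the energy to the left ear.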
The third panning method used minimum-phase head-related transfer function (HRTF) filters instead of the simple ILD. The original HRTFs were measured from an artificial head [7] and approximated with 30-tap FIR filters designed by Huopaniemi [5]. We had filters at discrete azimuth steps; other directions were interpolated from the two adjacent filters by linear interpolation of the filter coefficients.
Acoustic environment: The simplest acoustic environment was a free field, where only the direct sound was rendered. Our auralization software calculates the distance-dependent delay, the gain (according to the 1/r law), the air absorption, and the direction of the sound source. Air absorption is implemented with a simple lowpass filter. All auralization parameters are updated according to the movements of the user: for example, when moving towards the sound source, the delay gets shorter, the gain increases, and the air absorption attenuates the high frequencies less. To obtain a smooth and continuous output signal, the auralization parameters are interpolated.
1 CD B&O 101. Music for Archimedes, 1992.
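The distance-dependent parameter update described above can be sketched roughly as follows (the near-field gain clamp and the lowpass mapping are illustrative assumptions, not the DIVA implementation):

```python
import math

def auralization_params(distance_m, fs=32000.0, c=343.0, ref_dist=1.0):
    """Distance-dependent auralization parameters for one sound path.

    Returns (delay in samples, 1/r gain, lowpass coefficient). The one-pole
    coefficient standing in for air absorption grows toward 1 with distance,
    i.e. farther sources sound darker; the exact mapping is an assumption.
    """
    delay = distance_m / c * fs                   # propagation delay, samples
    gain = ref_dist / max(distance_m, ref_dist)   # 1/r law, clamped near source
    pole = 1.0 - math.exp(-distance_m / 100.0)    # crude air-absorption proxy
    return delay, gain, pole
```

Moving toward the source shortens the delay, raises the gain, and lowers the lowpass coefficient, matching the qualitative behaviour described in the text.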
Figure 1: The cardioid panning method. In the left picture, solid lines are the right- and left-ear gains and the dashed line is the front-back gain. In the right picture, the final panning gains for the left and right ear are depicted.
Tested variable                          Found          Not found      Total
pink noise                               232 (95.5 %)   11 (4.5 %)     243
artificial flute                         191 (78.6 %)   52 (21.4 %)    243
recorded anechoic guitar                 199 (81.9 %)   44 (18.1 %)    243
ITD only                                 176 (72.4 %)   67 (27.6 %)    243
ITD + cardioid panning                   221 (90.9 %)   22 (9.1 %)     243
ITD + HRTF (dummy head)                  225 (92.6 %)   18 (7.4 %)     243
direct sound                             215 (88.5 %)   28 (11.5 %)    243
direct sound + 6 reflections             210 (86.4 %)   33 (13.6 %)    243
direct sound + 6 reflections + reverb    197 (81.1 %)   46 (18.9 %)    243

Table 2: The number of found and not-found cases. The 27 navigation tasks were completed by 27 subjects.
The second and third acoustic environments used a simple shoe-box room (30 m x 22 m x 10 m). The second rendering case included the direct sound (the source located in a corner region, 2 m from the floor, 2 m from one wall, and 5 m from another wall) and all six first-order reflections, which were calculated using the image source method [1]. Each image source had the same auralization parameters as the direct sound, with material absorption additionally included. The auralization parameters of the image sources (and of the direct sound) were updated dynamically according to the movements of the user. The third rendering case included the direct sound, the six early reflections, and late reverberation with a duration of one second.
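The six first-order image sources of a shoebox room are found by mirroring the source in each wall. The following is a minimal sketch of that step of the image source method (coordinate convention assumed, with walls at 0 and L along each axis; no absorption or audibility checks):

```python
def first_order_image_sources(src, room):
    """Mirror a point source in each of the six walls of a shoebox room.

    src  = (x, y, z) source position,
    room = (Lx, Ly, Lz) room dimensions.
    Returns the six first-order image-source positions.
    """
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]  # reflect across the wall plane
            images.append(tuple(img))
    return images
```

Each image source is then auralized like the direct sound, with its own distance-dependent delay and gain plus material absorption.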
RESULTS
The first result of our experiment is that in most cases the subjects did find the target area. The found and not-found cases (the latter also called errors) are summarized in Table 2 as a function of the tested variables. In a "found" case the ending point of the navigation task was inside the target area.
Three subjects performed perfectly, finding the sound source in all 27 tests. Over half of the subjects made fewer than three errors, which can be considered very good performance. The subject with the poorest performance found only 55 % of the sound sources.
The other collected data was the time spent on each navigation task. The high rate of found cases allows us to analyse the spent times in more detail. In Figs. 2 and 3, boxplots present the effect of the different factors on the time needed to carry out the navigation task and on the failure rate. In these plots, as well as in the following analysis, the spent times of the "not found" cases are excluded, because these cases do not give a reliable time for a completed task.
Typically an analysis of variance (ANOVA) model would be used. In this case, however, the collected data was not normally distributed and hence does not fulfil the assumptions of the ANOVA model. Fortunately, there are nonparametric tests that do not require normally distributed data. In fact, these nonparametric tests are especially appropriate when the measurement of the dependent variable is ordinal, which applies in our case since the spent times can be ordered.
The first nonparametric test applied was the Kruskal-Wallis test. It showed that in each variable group at least one variable has a statistically significant difference in distribution location; in other words, the median spent time of one variable differs from the other medians. The obtained results were: for stimulus χ² = 43.094, p = 0.000; for panning method χ² = 43.932, p = 0.000; and for acoustic environment χ² = 8.227, p = 0.016.
To find out which variables have statistically significant differences in median times, the Wilcoxon Signed Ranks Test was applied (see Table 3). The Wilcoxon test analyzes the differences between the paired measurements for each subject.
Stimulus: Figure 3 and Table 2 show that pink noise was clearly the best stimulus (a statistically significant difference, see Table 3). Pink noise gave the minimum number of errors and the fastest times, and it was also found to be the easiest in the subjective judgements. The guitar sound gave the worst results, which was also the subjective opinion of the subjects.
Panning methods: ITD alone is clearly inferior for auditory navigation, since almost 30 % of these cases were not found. The best panning method was cardioid panning, which gave clearly the fastest results; the difference to the two other methods is statistically significant (see Table 3). Surprisingly, in terms of median times ITD and ITD+HRTF were not statistically very different, although the error rate is much smaller with ITD+HRTF (see Table 2).
Acoustic environment: Reverberation increased both the spent times and the error rate, which is an expected result. Direct sound alone and direct sound + reflections gave almost equal results, both in the time spent and in the error rate.
With nonparametric tests it is advisable to check the validity of the results with another test method. We therefore also conducted the Friedman test, a nonparametric test that compares three or more paired groups. It gave similar results to the Kruskal-Wallis test: for stimulus χ² = 71.003, p = 0.000; for panning method χ² = 46.703, p = 0.000; and for acoustic environment χ² = 16.867, p = 0.000.
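The H statistic underlying the Kruskal-Wallis test can be computed directly from pooled ranks. The sketch below is a bare-bones illustration (average ranks for ties, but no tie correction), not the statistics package used in the study:

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic over two or more samples.

    Pools all observations, assigns average ranks to ties, and computes
    H = 12/(N(N+1)) * sum(R_i^2 / n_i) - 3(N+1). No tie correction.
    """
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2.0           # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    rank_sums = [0.0] * len(groups)
    for (value, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    return 12.0 / (n_total * (n_total + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
```

The statistic is then compared against a χ² distribution with (number of groups − 1) degrees of freedom to obtain the p-values reported above.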
[Figure 2 data] All 27 tests in ascending order of median time (27 subjects):
Median times (s): 29 31 31 34 34 39 42 42 42 42 42 43 44 48 50 50 51 57 58 59 60 60 62 69 70 79 83
Errors (n/27):    3 5 0 1 0 2 1 1 1 0 4 0 1 5 4 0 2 10 8 8 3 11 13 3 11 7 3
Figure 2: The spent times of all the navigation tasks. The boxplot depicts the median and the 25 %/75 % percentiles. At the bottom of the figure the median times (not-found cases excluded) and the number of not-found cases are printed.
[Figure 3 panels: Stimulus (columns: 1. NOISE, 2. FLUTE, 3. GUITAR), Panning (columns: 1. ITD, 2. ITD+CARD, 3. ITD+HRTF), Acoustics (columns: 1. DIR, 2. DIR+REFL, 3. DIR+REFL+REV); vertical axis: time in seconds, 0 to 200.]
Figure 3: Spent times in the navigation tasks as a function of each tested variable (not-found cases excluded). The boxplot depicts the median and the 25 %/75 % percentiles. The "+" signs are outliers, i.e. cases with values more than 1.5 box lengths above the upper edge of the box.
[Figure 4 panels. Top row (tasks with the most errors): Test 9, Test 25, Test 20; stimulus: flute, panning: ITD, environments: no refl./no rev., 6 refl./no rev., and 6 refl./rev. Bottom row (tasks with no errors): Test 19, Test 26, Test 21; stimulus: noise, panning: ITD+Cardioid or ITD+HRTF, environments: no refl./no rev. or 6 refl./no rev. Axes in units of 10^4; SP marks the starting point, TA the target area.]
Figure 4: All paths (27 subjects) of six different navigation tasks. Boxes indicate cases where at least the early reflections were rendered. The abbreviation SP marks the starting point and TA the target area.
Pair                                       Z        Asymp. Sig. (2-tailed)
Stimulus:
  noise vs. flute                          -8.408   0.000
  noise vs. guitar                         -5.494   0.000
  guitar vs. flute                         -2.634   0.008
Panning method:
  ITD+Card vs. ITD                         -6.237   0.000
  ITD+Card vs. ITD+HRTF                    -5.579   0.000
  ITD+HRTF vs. ITD                         -0.389   0.697
Acoustic environment:
  dir+refl vs. dir+refl+reverb             -2.704   0.007
  dir vs. dir+refl+reverb                  -2.497   0.013
  dir vs. dir+refl                         -0.250   0.802

Table 3: The results of the Wilcoxon Signed Ranks Test. All Z values are based on positive ranks.
Figure 4 shows all paths (27 subjects) for six different navigation tasks. The upper row displays the test cases with the most errors (11 to 13 errors). In all of these the stimulus was the flute and the panning method was ITD only. Due to the sine-wave-like nature of the flute sound, ITD alone can be a very confusing panning method, and the subjects had problems finding the correct direction to the target area.
The three lower figures display three navigation tasks with no errors. In these cases the right direction to the target area was found very well. (It is easy to see that there were a few front-back confusions, and that some subjects first headed away from the target area.) These tasks were also completed much faster than the three tasks with the most errors (mean of median times 37 s vs. 64 s).
DISCUSSION
The noise stimulus was continuous, which means that the early reflections and the late reverberation should not affect the sound itself. However, each early reflection imposes a comb-filter effect on the noise, and a comb-filter effect is perceived as a certain pitch (so-called repetition pitch [4]). In a dynamic situation, as in this case, these perceived repetition pitches descend when moving towards the sound source; this is clearly audible and helps a lot in navigation.
The results showed that the dynamic early reflections did not help in these navigation tasks. However, dynamic early reflections are considered helpful cues for externalization [3]; in these navigation tasks the perception of auditory space was not a measured variable.
In our experiment the user interface was quite limited: the subjects could only turn, or move forward and backward.