The quality of the present predictions: how rigorous is the bisegmented nonlinearity in the Schairer et al. [4] model?
Figures 23 and 24 show hypothetical cochlear mechanical nonlinearities whose lower and upper branches do not precisely follow the Schairer et al. [4] model of the nonlinearity. In particular, the slopes of the lower branches of the nonlinearity in Figure 23 can exceed those of the Schairer et al. [4] model, whereas the slopes of the upper branches in Figure 23 can be lower than, higher than, or equal to those of the Schairer et al. [4] model. Similar but less extreme departures are seen in, or can be inferred from, Figure 24. Do such discrepancies invalidate the present nonlinearity computations based upon the results of Experiments 1 and 2? Hardly, as will now be shown.
The origin of the bisegmented nonlinearity: Yates et al. [7]
Schairer et al. [4] cite Yates et al. [7] as the source for the bisegmented nonlinearity used by Plack and Oxenham [6], although the latter do not explicitly mention Yates et al. [7] in their explanation of their model [6], p. 1599]. Plack and Oxenham [6] actually credit an earlier Oxenham paper for their model, which in turn cites a related paper by Yates. Regardless, if Yates et al. [7] is indeed the source of the Plack and Oxenham [6] bisegmented cochlear nonlinearity, then some critical commentary is overdue, and proceeds as follows.
Yates et al. [7] illustrated some smoothed empirical firing-rate-versus-intensity plots (here called “rate-intensity functions”) for four neurons exposed to pure tones of frequencies at or below each respective neuron’s CF. For tones at CF, whose intensity is specified in dB SPL, Yates et al. [7] found rate-intensity functions which were either sigmoidal in shape, or “sloping-saturating”, or straight. “Sloping-saturating” means that the rate-intensity function shows a bend, then climbs for an additional 30 dB or more but with a notably lesser slope (see [20] for sources of some examples). Yates et al. [7] noted that rate-intensity functions for frequencies well-off-CF tend to be sigmoidal, with a central straight section, i.e., one that is linear in dB SPL. They noted also that basilar-membrane displacement is, likewise, empirically linear in dB SPL at tone frequencies well below a given spot’s CF. Yates et al. [7] hence assumed that the well-off-CF rate-intensity function represents the dependence of a neuron’s firing rate upon basilar-membrane motion. They then chose a single neuron whose on-CF firing was sloping-saturating, and whose well-off-CF firing was sigmoidal with a linear central section. Then, for each firing rate represented by a point on the linear central section of the well-off-CF rate-intensity function, Yates et al. [7] found the intensity giving the same firing rate on the respective on-CF sloping-saturating rate-intensity function, and assumed that the latter intensity in dB SPL corresponded to the dB of basilar-membrane motion presumed to drive the well-off-CF rate-intensity function. In this manner, Yates et al. assembled a derived basilar-membrane input/output function for each of their four aforementioned neurons. 
In their own words, “all [of these four derived input/output] curves take on some aspect of a general form: an initial slope of unity, indicating a linear relationship between SPL and BM [basilar-membrane] amplitude, turning over more-or-less abruptly to assume a second, straight, section with a slope of about 0.2-0.25” [7], p. 211]. This, presumably, is the origin of the 0.2 dB/dB slope adopted by Plack and Oxenham [6] and later adopted in turn by Schairer et al. [4]. Yates et al. [7] used several other neurons to provide similarly-derived examples of the inferred basilar-membrane input/output function.
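The rate-matching construction described above can be sketched numerically. The following is a minimal illustration, not the procedure or the data of Yates et al. [7]: the bisegmented input/output function, the sigmoidal rate function, the 30-dB off-CF attenuation, and all parameter values and function names are hypothetical, chosen only so that the matching step can be run end to end.

```python
import math

KNEE_DB = 40.0          # hypothetical bend point of the nonlinearity
UPPER_SLOPE = 0.2       # hypothetical compressive slope (dB/dB) above the bend
OFF_CF_LOSS_DB = 30.0   # hypothetical attenuation of the linear off-CF response


def bm_output_db(level_db):
    """Assumed on-CF basilar-membrane output: slope 1 below the knee, 0.2 above."""
    if level_db <= KNEE_DB:
        return level_db
    return KNEE_DB + UPPER_SLOPE * (level_db - KNEE_DB)


def rate_from_bm(bm_db, midpoint_db=45.0, spread_db=4.0, max_rate=200.0):
    """Assumed sigmoidal firing rate (spikes/s) as a function of BM motion in dB."""
    return max_rate / (1.0 + math.exp(-(bm_db - midpoint_db) / spread_db))


def on_cf_rate(level_db):
    """On-CF rate-intensity function: the nonlinearity feeds the rate function."""
    return rate_from_bm(bm_output_db(level_db))


def off_cf_rate(level_db):
    """Off-CF rate-intensity function: BM motion is taken as linear in dB SPL."""
    return rate_from_bm(level_db - OFF_CF_LOSS_DB)


def invert_increasing(func, target, lo=0.0, hi=120.0, tol=1e-9):
    """Bisection search for the level at which an increasing func reaches target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def derive_io_function(off_cf_levels_db):
    """Pair each off-CF level with the on-CF level giving the same firing rate."""
    pairs = []
    for off_level in off_cf_levels_db:
        matched_on_level = invert_increasing(on_cf_rate, off_cf_rate(off_level))
        pairs.append((matched_on_level, off_level))  # (input dB SPL, inferred BM dB)
    return pairs


derived = derive_io_function([65.0, 70.0, 75.0, 80.0])
for (x0, y0), (x1, y1) in zip(derived, derived[1:]):
    print(f"{x0:5.1f} to {x1:5.1f} dB SPL: slope {(y1 - y0) / (x1 - x0):.2f} dB/dB")
```

In this toy case the derived function reproduces the assumed slopes only because the same rate function was used on and off CF. The sketch also makes visible how a narrow span of off-CF levels is stretched over a much wider span of on-CF levels once the compressive segment is reached.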
Closer inspection of the Yates et al. [7] nonlinearity
There are, however, a few major problems with the Yates et al. [7] construction of the cochlear mechanical nonlinearity. These are important, because they excuse the differences evident in Figures 23 and 24 between the cochlear nonlinearity as presently computed and the model nonlinearity of Schairer et al. [4]. First, the well-off-CF rate-intensity functions shown by Yates et al. [7] tend to have linear sections only 15 dB wide at most. Indeed, in their explanatory example, the linear section is merely 10 dB wide, just enough to span the “hinge” in their inferred cochlear nonlinearity, which in their illustration nonetheless has an output range of 30 dB, something of a liberty.
Also, the Yates et al. [7] method of constructing the cochlear nonlinearity inherently assumes that the bend in the rate-intensity functions of sloping-saturating neurons represents the bend in the nonlinearity itself. Yates et al. [7] do not conceal the source of this notion, namely, an overcited paper by Sachs and Abbas [21], who proposed an approach to quantifying the effect of the nonlinearity on rate-intensity functions, an approach from which Yates et al. [7] clearly borrowed a great deal. For example, both papers inherently assume that basilar-membrane mechanical properties at one point along its length are mimicked at some other point, although in fact the mechanical properties change with location. Sachs and Abbas [21] also assumed a cochlear nonlinearity that had a slope of unity up to 73 dB SPL, above which the slope was 0.37, based upon others’ early measurements of the cochlear nonlinearity in monkeys. That intensity of 73 dB SPL is now known to be far too high for the bend, but Sachs and Abbas (using cats) nonetheless successfully found sloping-saturating rate-intensity functions whose bends were at 73 dB SPL. Sachs and Abbas [21] also showed plots that suggest that the bend point in the rate-intensity functions for sloping-saturating neurons varies over 40 decibels across neurons! Elsewhere, Palmer and Evans [22] also noted a broad range for the bend, one of 20 decibels. Altogether, such numbers suggest that sloping-saturating rate-intensity functions are, as Palmer and Evans [22] noted, not a result of cochlear nonlinearity.
Altogether, the cochlear nonlinearity inferred by Yates et al. [7] must be considered a convenient contrivance. But Schairer et al. [4] and Schairer et al. [5], after Plack and Oxenham [6], adopted the Yates et al. [7] nonlinearity. As such, deviations from it in Figures 23 and 24 should be regarded not as a failure of the present computations, but rather as a consequence of the underspecification of the nonlinearity in [4–7].
The quality of the present psychometric functions: (1) the advantages of the present method of obtaining psychometric functions over that of Dai [23]
Schairer et al. [4] and Schairer et al. [5] used a method of Dai [23] to obtain psychometric functions through adaptive tracking, and to find their slopes. Dai [23], p. 3135] had concluded that adaptive tracking is “a better choice than the constant-stimulus method for measuring psychometric functions”. However, Dai’s approach is actually the less desirable one, as follows.
Does adaptive-tracking really yield psychometric functions which are more precise than those obtained through the method of constant stimuli?
Dai’s [23] conclusion was largely based upon computer simulations which he did to imitate a hypothetical observer performing anywhere from 120 to 900 trials, in 60-trial blocks, with each successive block starting at the threshold that had been identified in the previous block. (The use of small blocks brings its own problems; see below.) Dai’s [23] approach would seem to allow listener-experience-based improvement in the estimated threshold, but it contrasts to actual experiments, in which starting conditions might be the same for each block. Regardless, Dai [23] simulated three experimental methods: (1) adaptive tracking using the 2-down 1-up rule or (2) the 3-down 1-up rule; and (3) the method of constant stimuli. Step size was a parameter of the simulations. The “true psychometric function of the simulated observer, which was used to generate the responses” [23], p. 3136] was a cumulative Gaussian in the variable d′, where d′ is the Signal Detection Theory index of detectability [24]. Dai [23] defined d′ in terms of “signal level” x as d′ = (x/α)^β for parameters α and β, which were to be estimated after-the-fact from the simulations, in which the actual chosen values of α and β were α = 1 and β = 1. Note well that Dai’s x has intensity units, not decibel units. Dai’s psychometric function for data generation was a cumulative Gaussian in the intensity x. In contrast, the present psychometric functions are cumulative Gaussians in x in dB SPL, a significant difference, as will be explained below. Nonetheless, both Dai [23] and the present work fitted functions to percentages-correct by minimizing the same weighted sum-of-squares-of-residuals, i.e., that used in Probit Analysis [1].
Dai [23] found that the generated estimates of α and β converged towards the true values as the number of trials approached 900, and as the step size increased from 1 dB to 12 dB. But the predicted β proved especially sensitive to step size, as had been found elsewhere (citations in [23]); in particular, for a step size of 1 dB, and using just 120 trials, the β obtained under adaptive tracking diverged significantly from its true value. The β computationally obtained by Dai [23] under the method of constant stimuli, however, did not. Dai’s [23] recommended solution to the divergence was to encourage the use of larger step sizes.
However, in earlier work from the Jesteadt laboratory [19, 25], the present author had tried blocks of adaptive tracks using step sizes of 8 dB followed by step sizes of 4 dB, and had found that roughly 1 out of every 4 adaptive tracks had to be discarded because it produced unrealistically low detection or discrimination thresholds. That is, subjects were able to make lucky guesses, which, due to the step sizes, took their thresholds down to absurdly low stimulus levels from which 50-trial blocks did not allow enough trials to recover. Of course, an experimenter might try to overcome this problem by using many more trials in each single block; in that case, one might just as well use the method of constant stimuli! Regardless, the point is that simulated psychophysical performances and actual psychophysical performances can give startlingly different outcomes.
Does adaptive tracking account for learning?
Dai himself [23], p. 3135] raised an important point when he noted that “The ability to trace the underlying psychometric function is particularly desirable when the performance of the observer undergoes a marked change as a result of learning, fatigue, fluctuation of attention, etc.”. Learning is often presumed to reflect reduction of internal noise. Unfortunately, the adaptive-tracking method itself may not take learning into account well, as no learning was mentioned in Schairer et al. [4] or in Schairer et al. [5], and no learning by listeners was evident in other papers from the same laboratory, papers concerning detection or discrimination (e.g., [13, 19, 25, 26]). In contrast, substantial learning was evident in the experiments reported here, manifested as shifts of the psychometric function to lower and lower SPLs over successive days of testing; a single complete psychometric function for a given time gap was produced by each subject on each testing day. In the present Experiments 1 and 2, learning effects were paramount despite subjects’ differences in the degree of previous laboratory listening experience (i.e., none for Subject 1/1A and Subject 2A, some for Subject 3, and very much for Subject 2). In each new stimulus condition of Experiments 1 and 2, improvement of detection threshold of as much as 10 dB occurred from the first to the second day’s trials, with successively lesser improvements over the following one or two days, just as found elsewhere for detection of longer tones [27, 28]. In Experiment 1, occasional within-day retesting with multiple probe-tone intensities showed that daily detection performance had asymptoted, but nonetheless, over successive days it continued to improve (see also [27–29]). 
As time gap in Experiment 1 changed from week to week, there was an imperceptible decline in the range of improvement in percentages-correct over successive days for a given time gap, an improvement seen elsewhere with another constant-intensities two-alternative forced-choice task [30] involving comparable time (months) and practice (thousands of trials). Of course, learning still re-occurred at each new time gap, just as found in experiments where a detected tonal frequency was changed [27]. Altogether, the learning effects noted here reflect those noted over a variety of much earlier studies, not all of which employed two-alternative forced-choice. Evidently, then, if “the underlying psychometric function is particularly desirable when the performance of the observer undergoes a marked change as a result of learning” [23], p. 3135], then such a psychometric function cannot be obtained through adaptive tracking, which cannot therefore be “a better choice than the constant-stimulus method for measuring psychometric functions” [23], p. 3135].
The advantage of accounting for learning is evident in the actual forward-masked detection thresholds in Experiment 1. The latter, when averaged across the three subjects there (graphed in [9]), are as much as 10 dB lower than thresholds for recovery from forward-masking obtained elsewhere with comparable stimuli ([31], two-alternative forced-choice tracking; [32], Bekesy tracking), despite the fact that the average age of the three subjects in Experiment 1 was 32, higher than the twenty-something average age which is typical of student-listener cohorts and hence likely to result in higher thresholds. The 10-dB difference is comparable to the circa-10-dB threshold drop seen in Experiments 1 and 2 due to learning, and cements the notion that tracking methods may not account well for learning.
Does Dai’s method give a better estimate of the slope of the psychometric function?
Last but hardly least, there is the practical issue of the actual fit of the psychometric functions to the percentages-correct. The psychometric functions fitted by Schairer et al. [4] and by Schairer et al. [5] were cumulative Gaussians in d′ = (x/α)^β, after Dai [23], where x has intensity units (rather than decibels). The discrete values of x used in fitting functions to percentages-correct depend upon the technique that is used to establish the probe-tone’s detection threshold. Adaptive tracking, like the method of constant stimuli, focuses on percentages-correct (through correct/incorrect criteria), determining stimulus intensities indirectly. But the experimenter using the method of constant stimuli can specify the stimulus intensities employed and how often they are used; however, in an adaptive track, the situation is not so flexible, because the set of stimuli used and their frequencies of occurrence depend upon (1) the tracking method itself, and (2) the actual performance of the subject. The percentages-correct in Dai [23] and in Schairer et al. [4] and in Schairer et al. [5] can be examined by drawing a horizontal line at the 75%-correct mark in each graph and counting the number of data points which are above or below that line. For the three experiments of Schairer et al. [4] and the three experiments of Schairer et al. [5], all illustrations but those for the second experiment of Schairer et al. [4] (“variable-signal” with maskers of 60 dB SPL or of 90 dB SPL) reveal the data plots to be top-heavy. That is, the number of discrete values of the intensity which produce percentages-correct above 75% exceeds the number of discrete intensity values which produce percentages-correct below 75%. This is especially evident in Schairer et al. [5] thanks, ironically, to a two-track adaptive procedure intended to “obtain a larger range of PCs [percentages-correct] for the PF [psychometric function] fits” [5], p. 2199], which provided a greater number of employed intensities. With more data points above the 75% line than below it, the fit of the psychometric function will, regardless of the employed weighting scheme, focus on the uppermost data points, thus tending to underestimate the psychometric-function slope at any percentage-correct along the curve. Further, the upper data points may be more heavily weighted in the quantity that was actually minimized in the curvefitting. That quantity was a sum [23], Eq. 3] in which each term contains a multiplicative weight which is the number of trials associated with a particular percentage-correct. That number of trials may have been larger for the higher percentages-correct if the associated stimulus intensities had been visited more frequently in the adaptive track(s) than the lower stimulus intensities.
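How trial-count weighting interacts with a top-heavy set of data points can be illustrated with a small sketch. The numbers below are invented for illustration and are not taken from [4], [5], or [23]; the weight is taken to be the trial count alone, which is a simplification of the sum in [23] rather than a reproduction of it, and the function names are hypothetical.

```python
def weighted_sse(observed_pc, predicted_pc, n_trials):
    """Sum of squared residuals, each term weighted by its number of trials."""
    return sum(n * (obs - pred) ** 2
               for obs, pred, n in zip(observed_pc, predicted_pc, n_trials))


def weight_share_above(observed_pc, n_trials, criterion=0.75):
    """Fraction of the total weight carried by points above the criterion."""
    above = sum(n for obs, n in zip(observed_pc, n_trials) if obs > criterion)
    return above / sum(n_trials)


# Invented adaptive-track tallies: proportion correct and trials at each level.
observed = [0.55, 0.70, 0.80, 0.88, 0.94, 0.98]
trials = [10, 20, 60, 80, 60, 30]

# A candidate fit that misses every point by the same 0.05: the penalty for
# that identical misfit scales with how often each level was visited.
flat_fit = [p - 0.05 for p in observed]
terms = [n * (obs - pred) ** 2 for obs, pred, n in zip(observed, flat_fit, trials)]

print(f"share of weight above 75%: {weight_share_above(observed, trials):.2f}")
print(f"largest term / smallest term: {max(terms) / min(terms):.1f}")
```

With tallies of this shape, most of the weight sits above the 75% line, so a minimizer is rewarded chiefly for matching the upper points.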
In contrast, the graphs of percentages-correct for the present Experiments 1 and 2 are not top-heavy. Hence, when the psychometric data and fitted functions for the present Experiments 1 and 2 were plotted versus intensity in dB SPL, slope at 75% was not underestimated, thanks to a close fit of the symmetric function to the data in coordinates of percentage-correct versus dB SPL. See Figure 10, whose fits of function to data are imitated by similarly good fits for Experiment 1, which the diligent reader can find in [8, 9, 14]. In those same plotting coordinates, however, the psychometric functions of Schairer et al. [4] and of Schairer et al. [5] are asymmetrical, with greater acceleration in the upper halves, as evident through close inspection of Figures 3 and 10 of Schairer et al. [4] and Figures 3 and 6 of Schairer et al. [5]. Such asymmetry de-emphasizes the fitting of the psychometric function to the lower data points, contributing to the misestimation of the slope of that function at any percentage-correct, and clearly underestimating the slope at 75% and below. The asymmetry is presumably due to using x (after [23]) as the independent variable in the psychometric function, rather than using the stimulus intensity in dB SPL. However, it is dB SPL, not x, which is the intensity measure of interest in the Schairer et al. [4] model (Figure 3).
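The asymmetry argument can be checked numerically. The sketch below is an illustration under stated assumptions, not the fitting code of [4], [5], or [23]: it assumes a two-alternative forced-choice mapping from d′ to proportion correct, P = Φ(d′/√2), a reference intensity of 1 for converting decibels to x, and a hypothetical dB-based function that rises from 50% to 100% as a cumulative Gaussian in level. All function names and parameter values are invented for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pc_from_intensity(level_db, alpha=1.0, beta=1.0):
    """Proportion correct when d' = (x / alpha) ** beta, with x in intensity units."""
    x = 10.0 ** (level_db / 10.0)               # assumed reference intensity of 1
    d_prime = (x / alpha) ** beta
    return STD_NORMAL.cdf(d_prime / 2 ** 0.5)   # assumed 2AFC mapping


def pc_from_db(level_db, midpoint_db=0.0, sd_db=3.0):
    """Hypothetical function that is a cumulative Gaussian in dB, from 50% to 100%."""
    return 0.5 + 0.5 * NormalDist(midpoint_db, sd_db).cdf(level_db)


def rise_and_fall(pc_func, level_75_db, offset_db=3.0):
    """Change in proportion correct for equal dB offsets above and below 75%."""
    rise = pc_func(level_75_db + offset_db) - pc_func(level_75_db)
    fall = pc_func(level_75_db) - pc_func(level_75_db - offset_db)
    return rise, fall


# Level at which the intensity-based function reaches 75% (alpha = beta = 1).
d_prime_75 = 2 ** 0.5 * STD_NORMAL.inv_cdf(0.75)
level_75_intensity = 10.0 * math.log10(d_prime_75)

for label, func, level_75 in (
    ("Gaussian in intensity", pc_from_intensity, level_75_intensity),
    ("Gaussian in dB", pc_from_db, 0.0),
):
    rise, fall = rise_and_fall(func, level_75)
    print(f"{label}: +3 dB adds {rise:.3f}, -3 dB removes {fall:.3f}")
```

Plotted against decibels, the intensity-based function gains more above its 75% point than it loses below it for the same dB offset, whereas the dB-based function is symmetric about 75% by construction.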
Dai [23] had, besides running simulations, obtained empirical just-noticeable frequency differences using either adaptive tracking (with a 2-down 1-up rule or a 3-down 1-up rule) or the method of constant stimuli. In each case, Dai’s [23] fitted psychometric functions more closely followed the upper portions of the percentages-correct, such that the psychometric-function slope at 75% correct was underestimated, the same problem that is evident in Schairer et al. [4] and in Schairer et al. [5]. His fitted curves, too, are similarly asymmetric.
The quality of the present psychometric functions: (2) why they offer unprecedented precision
The simulations of Garcia-Perez [33]
An important aspect of the present forward-masked detection thresholds is that the conservative and painstaking methods employed to obtain them rendered them of unprecedented precision. Garcia-Perez [33] confirms the unprecedented precision of the present results – by showing the lack of precision of detection thresholds obtained by the favored method used for obtaining detection and discrimination thresholds, namely, adaptive tracking. The latter is popular because of its greater speed than the method of constant stimuli, and it was the method employed by Schairer et al. [4] and by Schairer et al. [5], among others. Adaptive tracking typically employs equal dB steps when intensities are adjusted up or down during the adaptive track. Tracking can follow a number of different rules; the rule of dropping the intensity after two correct identifications of the target stimulus and raising it after one incorrect identification, called 2-down 1-up, has been especially popular and has been used in countless papers, including many from the Jesteadt laboratory (e.g., [4, 5, 25, 26]). Garcia-Perez [33] studied the relative efficacy of different tracking rules in m-alternative up-down tracking, where in the citations just mentioned, m=2, perhaps the most mundane choice. What Garcia-Perez [33] did was to generate simulated percentages-correct for detection or discrimination tasks, using either Weibull or logistic equations as the source psychometric functions. Those particular sigmoidal functions are justified from many psychophysical studies; recall that the cumulative Gaussians used in Probit Analysis are themselves approximations, as no exact sigmoidal solution exists. Garcia-Perez’s findings for detection are the relevant ones here, and as such his findings for discrimination, though similar, will be ignored.
In performing his simulations, Garcia-Perez [33] incorporated a factor crucial to the present paper, namely, “the spread σ [sic] of a psychometric function”. His definition of σ, however, differs from the present one (Eq. 3); indeed, his σ was described as “the width of the range of stimulus levels where ψ [the psychometric function] shows non-asymptotic behavior” [33], p. 2100]. The latter width was defined by Garcia-Perez using a mathematical rule involving a parameter chosen to give a σ that was “the width of the central 98% span of ψ” [33], p. 2100]. Garcia-Perez ran simulations of two-alternative forced-choice staircases, each of which was run until 200 reversals had occurred; from these the average of the last 180 reversals was taken as the detection threshold. Garcia-Perez used those to examine the behavior of the “landing point”, that is, “the percentage-correct point on which the staircase converges” [33], p. 2101] under any particular adjustment rule and final step size and ratio of final step size to σ. That is, for each combination of conditions, he ran 5,000 replications, in order to obtain mean values and standard deviations of the detection thresholds (although he did not discuss their actual distributions). Each mean threshold was substituted back into the generating psychometric function to get the landing point; the standard deviations were likewise used to establish error bars on each landing point.
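The kind of simulation described above can be sketched as follows. This is a simplified illustration and not a reimplementation of [33]: it assumes a logistic generating function for two-alternative forced-choice, a 2-down 1-up rule with equal up and down steps, a fixed starting level, and far fewer replications than the 5,000 used in [33]. Every name and parameter value is hypothetical.

```python
import math
import random
from statistics import mean, stdev


def psi(level_db, midpoint_db=0.0, scale_db=2.0):
    """Assumed logistic generating function for 2AFC: 50% at chance, 100% at ceiling."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level_db - midpoint_db) / scale_db))


def staircase_threshold(rng, step_db, start_db=10.0, n_reversals=200, n_discard=20):
    """Run one 2-down 1-up track; return the mean of the retained reversal levels."""
    level = start_db
    correct_in_a_row = 0
    last_direction = 0          # -1 = last step was down, +1 = up, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        if rng.random() < psi(level):
            correct_in_a_row += 1
            direction = -1 if correct_in_a_row == 2 else 0
        else:
            direction = +1
        if direction != 0:
            correct_in_a_row = 0
            if last_direction != 0 and direction != last_direction:
                reversals.append(level)   # level at which the track turned
            last_direction = direction
            level += direction * step_db
    return mean(reversals[n_discard:])


def landing_point(step_db, n_runs=100, seed=1):
    """Mean threshold over replications, mapped back through psi, plus its SD in dB."""
    rng = random.Random(seed)
    thresholds = [staircase_threshold(rng, step_db) for _ in range(n_runs)]
    return psi(mean(thresholds)), stdev(thresholds)


for step in (0.5, 2.0, 6.0):
    point, sd = landing_point(step)
    print(f"step {step:3.1f} dB: landing point {100 * point:.1f}%, threshold SD {sd:.2f} dB")
```

Varying `step_db` relative to `scale_db` gives a rough feel for how the landing point and its variability depend on the ratio of step size to the spread of the generating function; the specific numbers printed depend on the assumed function and seed and should not be read as estimates from [33].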
Results of the simulations of Garcia-Perez [33]
Garcia-Perez [33] noted that the theoretical landing-points stated in the literature, such as 70.7% for two-alternative forced-choice under the 2-down 1-up rule with equal up and down steps, all assume an infinite number of reversals, as well as infinitesimally small steps. In practice, as Garcia-Perez [33] discovered, the landing points tended to deviate downwards from their theoretical values, this difference tending to increase with increase in the ratio of final step size to σ, that is, as step size becomes a greater proportion of the width of the psychometric function. For example, rather than being 70.7%, the landing point for 2-down 1-up could be lower than 60% if the final step size was greater than, say, 0.35σ. Garcia-Perez [33] also realized that 200 reversals was more than typically used in psychophysical studies; he therefore repeated his simulations for the 2-down 1-up rule when threshold was determined from the last 10 of 12 reversals or from the last 40 of 42 reversals. The error bars associated with the landing points became even larger as the number of reversals decreased, and were of unequal size, being larger toward higher landing points. Garcia-Perez’s [33], p. 2104] overall conclusions bear repetition, and are best expressed in his own words:
The consequences of the differential bias of conventional up-down staircases may range from eliminating an actual difference in threshold to producing it when none was actually there, contingent on which up-down rule was used and how the spread of the psychometric function varies across conditions. The magnitude of this misestimation can only be determined if the spread of the psychometric function has also been estimated with sufficient accuracy, but this is rarely done in experiments designed to obtain quick threshold estimates via up-down staircases.
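For reference, the theoretical 70.7% landing point cited above for the 2-down 1-up rule follows from a short standard argument, sketched here under the idealizations that Garcia-Perez [33] lists (arbitrarily many reversals, vanishingly small and equal steps). Let p denote the probability of a correct response at the level on which the track settles; the symbol p is introduced here only for illustration.

```latex
\begin{align*}
P(\text{step down}) &= p^{2}, &
P(\text{step up}) &= (1 - p) + p(1 - p) = 1 - p^{2}, \\
p^{2} &= 1 - p^{2}
\;\Longrightarrow\;
p = \frac{1}{\sqrt{2}} \approx 0.707 .
\end{align*}
```

By the same expressions, at chance performance in a two-alternative task (p = 0.5) a downward step still occurs with probability 0.25 at each decision of the rule.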
Implications for the precision of the present work relative to that of Schairer et al. [4] and Schairer et al. [5]
Schairer et al. [4] used 2-down 1-up adaptive tracking with final step sizes of 4 dB and blocks of 50 forced-choices, which altogether, according to Garcia-Perez [33], would introduce substantial variability into the landing point. That is, if the true landing point was 60% for a given forward-masking condition, rather than the theoretical 70.7%, then the true threshold would have been higher, perhaps by several decibels. The empirical narrowing of the psychometric function with either larger masker-probe time-gap or with lesser forward-masker intensity would (given a consistent final step size) systematically decrease σ, and therefore increase the ratio of final step size to σ, thus systematically increasing the divergence of the adaptive-tracking-derived threshold from its true measure. In the present Experiments 1 and 2, in contrast, the intensities used were no more than 2 dB apart (see Methods), which meant that the broader the psychometric function was, the greater was the number of different stimulus intensities used to establish the threshold. This resulted in confidence intervals for each threshold which were of roughly equal size across thresholds.
Altogether, the thresholds obtained in Experiments 1 and 2 should be far more precise than those obtained by Schairer et al. [4] and by Schairer et al. [5].
Why adaptive tracking may be even less precise than indicated by Garcia-Perez [33]
The precision advantage of the thresholds obtained from the present Experiments 1 and 2 over those of Schairer et al. [4] and of Schairer et al. [5] may be even greater than the simulations of Garcia-Perez [33] suggest, for the following reason. Garcia-Perez [33] defined the width of a psychometric function for detection according to the kind of schemes which have been popular amongst auditory physiologists for defining the “dynamic range” of a primary afferent neuron. Those schemes defined dynamic range as the width of the sigmoidal rate-level function fitted to the plot of firing-rate-versus-intensity of the neuron [20]. But such schemes do not provide an operational measure, i.e., do not provide the useful stimulus-intensity-encoding range (in dB) of the neuron [34], which may be much narrower. By the same token, the criterion width used by Garcia-Perez [33] for psychometric functions, “the width of the central 98% span of ψ”, is an extremely generous measure [20]; a more conservative measure can produce a much smaller “width” of a sigmoidal function, which would increase the ratio of final step size to σ, thereby increasing the deviation of any landing point from its theoretical value, hence reducing the precision of the thresholds inferred from landing points obtained using adaptive tracks.
“Fine structure” not seen in other studies
Finally, regarding Experiment 1, Figure 4 shows an unexpected rise in the probe-tone detection threshold circa t=7 milliseconds [8]. Studies of the recovery of the threshold of comparable stimuli from forward-masking [31, 32, 35] do not reveal this feature. Nonetheless, it is robust, being statistically significant as well as being associated with a sudden, momentary steepening of the psychometric function (see Figures 5, 6 and 7). It may be that learning, as well as greater precision, is required for this feature to appear.