
Fitting distributions using maximum likelihood: Methods and packages

The most powerful tests of response time (RT) models often involve the whole shape of the RT distribution, thus avoiding mimicking that can occur at the level of RT means and variances. Nonparametric distribution estimation is, in principle, the most appropriate approach, but such estimators are sometimes difficult to obtain. On the other hand, distribution fitting, given an algebraic function, is both easy and compact. We review the general approach to performing distribution fitting with maximum likelihood (ML) and a method based on quantiles (quantile maximum probability, QMP). We show that QMP has both small bias and good efficiency when used with common distribution functions (the ex-Gaussian, Gumbel, lognormal, Wald, and Weibull distributions). In addition, we review some software packages performing ML (PASTIS, QMPE, DISFIT, and MATHEMATICA) and compare their results. In general, the differences between packages have little influence on the optimal solution found, but the form of the distribution function has: Both the lognormal and the Wald distributions have nonlinear dependencies between the parameter estimates that tend to increase the overall bias in parameter recovery and to decrease efficiency. We conclude by laying out a few pointers on how to relate descriptive models of RT to cognitive models of RT. A program that generated the random deviates used in our studies may be downloaded from www.psychonomic.org/archive/.

Since the seminal work of Townsend and Ashby (1983), it has been known that fitting or testing a model with mean response times (RT) alone has very poor diagnostic power. Often, models can mimic each other at the level of predicted means, even when their fundamental assumptions are diametrically opposed (e.g., a parallel race model can mimic the predictions of a serial model; see Van Zandt & Ratcliff, 1995). In this respect, median RT does not fare better than mean RT (Miller, 1988; Ratcliff, 1993). One solution is to consider RT means and variances simultaneously (Cousineau & Larochelle, in press). Although this provides greater constraint, some model mimicking can still occur (Townsend & Colonius, 2001). Higher order moments (e.g., skew and kurtosis) are of little help because their sample estimates are unreliable for the sample sizes typically available in empirical research. As a result, the importance of considering the whole RT distribution for testing formal models is now generally acknowledged.

Nonparametric approaches to the description of RT distributions are possible. For example, estimating the cumulative distribution function (CDF) is easily achieved with the cumulative frequencies of observed RTs. However, estimating the probability density function (PDF) and the hazard function is more difficult (see Silverman, 1986, on the former, and Bloxom, 1984, on the latter). This is a problem because some models are most easily tested with nonparametric approaches (e.g., tests of the hazard function, Burbeck & Luce, 1982, and the crossing points of two PDFs, Ashby, Tein, & Balakrishnan, 1993).

A parametric approach to RT distributions is achieved by introducing an important piece of information: a density or cumulative distribution function of the distribution. As we will discuss in the next section, fitting a distribution is rather easy, and there are many software packages that can automate this procedure.1 In addition, the estimation method used, maximization of the likelihood function, is well understood and is not dependent upon the use of approximate heuristics (as opposed to nonparametric PDF and hazard function estimates; see Silverman, 1986). Also, once the distribution has been fitted, all associated functions (CDF, PDF, hazard, and log-survivor functions) are completely determined. Finally, the fitting process consists simply of finding estimated values for a few parameters (generally three for RT distributions). Thus, the whole RT distribution is summarized with a very compact representation.

All these benefits come at a cost, however. An incorrect distribution function, even one fitting the data reasonably well, may give a wrong indication of what kind of psychological model has produced the data. For this reason, many authors prefer to use distribution functions as an atheoretical tool or a descriptive model (Heathcote, Popiel, & Mewhort, 1991; Ratcliff, 1979). In addition, if the true RT distribution is in fact different from the fitted distribution in some fundamental way, the parameters may not capture the regularities that exist across different distributions (Schwarz, 2001). For that reason, it is desirable for experimenters using the parametric approach to fit more than one distribution function.

In this article, we review software that can perform distribution fitting. All the software packages reviewed can fit many distinct distribution functions. The most commonly used distributions in cognitive psychology are the ex-Gaussian (Hockley, 1984), the Gumbel (Gumbel, 1958; Yellott, 1977), the lognormal (Ulrich & Miller, 1993), the Wald (Burbeck & Luce, 1982), and the Weibull (Cousineau, Goodman, & Shiffrin, 2002) distributions (see Heathcote, Brown, & Cousineau, 2004, and Luce, 1986, Appendix A, for details).2 The software reviewed can all fit these distributions, although some can fit others, as well. They are PASTIS (Cousineau & Larochelle, 1997), QMPE (previously called QMLE; Brown & Heathcote, 2003), DISFIT (Dolan, van der Maas, & Molenaar, 2002), and MATHEMATICA (Wolfram, 1996). This review will be carried out in Section 2. Because the methods and the specific details of a fitting procedure are numerous, we provide in Section 1 some information for readers interested in programming their own fitting procedure. Although many readers will prefer to rely on existing software, these details are useful to know, because they can differ from one package to another.

1. DISTRIBUTION FITTING METHODS

Estimation Methods

On one hand, there is a data set T = {t_i}, i = 1 . . . n, a sample containing n response times (RTs). On the other hand, there is a distribution to fit that depends on a parameter set θ. The distribution is given by its probability density function f. The objective of the fitting procedure is to find the estimated parameter set θ̂ so that the theoretical distribution will be most similar to the distribution of the data set.

Many methods can be used to fit a distribution. Van Zandt (2000) reviewed sum of squared error (SSE) methods and the maximum likelihood (ML) method (presented next). Using simulations, she found the standard ML method to be the best, with SSE based on the CDF almost as good. The criteria used were (1) bias: repeated over multiple samples, the average estimate θ̂ should equal the true parameter set θ of the population from which the samples were taken; and (2) efficiency: repeated over different samples of data, the estimates θ̂ should have smaller variance than when estimated with other methods.

Occasionally, Equation 1 can be solved analytically. For example, we know that the sample mean is a maximum likelihood estimator of the population mean μ if the distribution is normal (Gaussian). Maximum likelihood estimators, whether analytic or obtained from numerical optimization, have a desirable property: They are asymptotically (n → ∞) the most efficient; that is, they make maximum use of the information contained in the sample, resulting in the least variable estimation method (Van Zandt, 2000). Of course, it is not clear whether this asymptotic property also holds for small samples (Heathcote, Brown, & Cousineau, 2004).
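The analytic case mentioned above is easy to verify numerically: for normally distributed data, -ln L is minimized exactly at the sample mean. A minimal sketch in Python (the data values and the fixed σ are arbitrary choices for illustration):

```python
import math

def neg_log_lik_normal(mu, sigma, data):
    """-ln L for a normal distribution with known, fixed sigma."""
    n = len(data)
    return (n * math.log(sigma * math.sqrt(2 * math.pi))
            + sum((t - mu) ** 2 for t in data) / (2 * sigma ** 2))

data = [510.0, 540.0, 480.0, 530.0, 490.0]   # five illustrative RTs (ms)
sample_mean = sum(data) / len(data)          # 510.0

# Scan candidate values of mu: the minimum of -ln L sits at the sample mean.
candidates = [480 + i for i in range(61)]    # integers 480 .. 540
best = min(candidates, key=lambda m: neg_log_lik_normal(m, 25.0, data))
```

Because -ln L is quadratic in μ here, the grid search lands on the sample mean, matching the closed-form estimator.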

One may intuitively expect QML to be inefficient, since transforming the raw RTs into quantiles involves a reduction of the information available. On the other hand, if m is close to n, the loss of information can be quite small, and quantization may even be beneficial for finite samples. This is because there may be outliers in the data, and creating quantiles simply replaces the absolute value of an outlying observation with an additional count in n_1 or n_m. Also, estimates are robust to the addition of a small amount of error to a given RT, because Equation 3 will not change at all as long as the RT does not move across a quantile bound.
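The quantile-based objective can be sketched as a multinomial likelihood over inter-quantile bins. The sketch below is an illustration, not the QMPE implementation: it assumes an exponential model (chosen only because its CDF has a simple closed form), uses the theoretical quantiles of the generating distribution, and shows that a grid search over the objective recovers the generating rate.

```python
import math

def qmp_neg_log_lik(rate, quantiles, bin_counts):
    """Quantile-based -ln L: a multinomial likelihood over inter-quantile
    bins, with bin probabilities taken from the model CDF
    F(t) = 1 - exp(-rate * t)."""
    cdf = lambda t: 1.0 - math.exp(-rate * t)
    bounds = [0.0] + list(quantiles) + [float("inf")]
    nll = 0.0
    for j, count in enumerate(bin_counts):
        hi = bounds[j + 1]
        p_hi = 1.0 if hi == float("inf") else cdf(hi)
        p = p_hi - cdf(bounds[j])     # model probability of landing in bin j
        nll -= count * math.log(p)
    return nll

true_rate = 0.01
# Quartiles of the generating distribution, so 100 observations fall evenly.
quantiles = [-math.log(1.0 - j / 4.0) / true_rate for j in (1, 2, 3)]
counts = [25, 25, 25, 25]

# A grid search over candidate rates recovers the generating rate.
rates = [0.005 + 0.001 * i for i in range(11)]
best_rate = min(rates, key=lambda r: qmp_neg_log_lik(r, quantiles, counts))
```

With equal bin counts, the objective is minimized when the model assigns equal probability to every bin, which happens only at the generating rate.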

Heathcote et al. (2002) showed that QML (Equation 3) is superior to CML (Equation 2) in terms of bias and efficiency when tested on simulated data generated by the ex-Gaussian distribution; Heathcote et al. (2004) showed QML to be equal to or better than CML on other distributions. In order to do so, they extended a software package called QMPE to include the lognormal, Gumbel, Wald, and Weibull distributions with both CML and QMP estimation. Before proceeding to comparisons across software, we will discuss some issues related to the implementation of a maximum likelihood fitting technique, because the software packages reviewed in Section 2 differ on these implementation details.

Implementing a Maximum Likelihood Fitting Procedure

In order to implement an ML procedure, three ingredients are required: (1) a distribution function to be fitted, (2) an optimization routine, and (3) starting values for θ.

To be a reasonable candidate for characterizing RT, a distribution function must be able to accommodate positively skewed data. The most commonly used distributions are briefly described in Heathcote et al. (2004), and their equations are presented in Table 1. The choice of a distribution provides the PDF equation f that is inserted into the function to be minimized (Equation 2 or 3), which in this context is usually called the objective function. For numerical reasons, the logarithm of the objective function is usually employed, because summing logarithms avoids the underflow that multiplying many small densities would cause. For likelihood, for example, maximizing L(θ,T) over the range 0 to 1 is replaced by minimizing -ln L(θ,T), whose value ranges from +∞ [-ln(0), an impossible fit] down to zero [-ln(1), absolute certainty]. The same rationale applies if quantiles Q are used instead of raw data T. Most computer software packages report the minimized value of -ln L(θ,T).
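The underflow point is easy to demonstrate: multiplying a few hundred small density values collapses to exactly zero in double precision, whereas the summed log-likelihood stays well behaved. A self-contained sketch (the standard normal density and the data values are arbitrary illustrations):

```python
import math

# Density of a standard normal at a moderately extreme point.
pdf = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

data = [8.0] * 200   # 200 identical, fairly unlikely observations

# The direct product underflows to exactly 0.0 in double precision ...
likelihood = 1.0
for t in data:
    likelihood *= pdf(t)   # each factor is about 5e-15

# ... whereas the summed log-likelihood is a perfectly ordinary number.
neg_log_lik = -sum(math.log(pdf(t)) for t in data)
```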

The second ingredient is an optimization procedure to minimize the objective function. Various algorithms exist, the oldest of which was introduced by Newton. All these methods are iterative, starting with a tentative θ_0 and updating it through successive iterations until an optimal value, θ_p, is found.4

Appendix A summarizes the most commonly used minimization algorithms. They are generally distinguished (Box, Davies, & Swann, 1969) by whether they use derivatives of the objective function to guide the search (gradient methods) or rely only on evaluations of the objective function itself (direct search methods). In general, gradient methods can find a minimum in a smaller number of iterations. However, each iteration may take more time if the gradients are not available in closed form (as is the case for the ex-Gaussian distribution).
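When the derivatives are available in closed form, a gradient method is straightforward to implement. The sketch below applies Newton's method to the exponential distribution, chosen only because its first and second derivatives are simple and the answer is known analytically (the reciprocal of the sample mean); the data are arbitrary:

```python
def fit_exponential_newton(data, lam=0.0005, iters=50):
    """Newton minimization of -ln L(lam) = -n ln(lam) + lam * sum(t),
    using the closed-form first and second derivatives."""
    n, s = len(data), sum(data)
    for _ in range(iters):
        grad = -n / lam + s           # d(-ln L)/d lam
        hess = n / (lam * lam)        # d2(-ln L)/d lam2, always positive
        lam -= grad / hess            # Newton update
    return lam

data = [800.0, 900.0, 1000.0, 1100.0, 1200.0]   # sample mean = 1000
lam_hat = fit_exponential_newton(data)           # converges to 1/1000
```

Convergence here is quadratic; a direct search method would reach the same minimum using only evaluations of -ln L, typically at the cost of more iterations.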

The last ingredient in obtaining a solution consists of finding reasonable starting values θ_0. If the surface of the objective function is quadratic, it has only one minimum, and thus all starting points will lead the minimization routine to the same optimal solution θ_p. In practice, however, there may be many local minima. The best way to avoid false convergence on a local minimum is to start the routine from various locations or to start as close as possible to the optimal solution. To achieve the latter, heuristic estimates, often based on the first few moments of the data, can be used to automate the selection of the starting point. These heuristics are not always accurate, owing to sampling variance in the moment estimates.
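For example, a moment-based starting point for the ex-Gaussian can exploit the facts that its mean is µ + τ and its variance is σ² + τ². The sketch below is one such heuristic; assigning a fixed fraction (here 0.8) of the standard deviation to τ is an illustrative assumption, not a prescription taken from the packages reviewed:

```python
import math

def exgauss_start(data, tau_frac=0.8):
    """Moment-based starting values for the ex-Gaussian, using
    mean = mu + tau and var = sigma^2 + tau^2. The tau_frac value
    is an arbitrary heuristic choice."""
    n = len(data)
    mean = sum(data) / n
    var = sum((t - mean) ** 2 for t in data) / (n - 1)
    tau0 = tau_frac * math.sqrt(var)
    mu0 = mean - tau0                               # preserves the mean
    sigma0 = math.sqrt(max(var - tau0 ** 2, 1e-6))  # preserves the variance
    return mu0, sigma0, tau0

data = [900.0, 950.0, 1000.0, 1050.0, 1100.0]   # mean 1000, var 6250
mu0, sigma0, tau0 = exgauss_start(data)
```

By construction, the starting point reproduces the first two sample moments, which is usually close enough for the optimizer to converge.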

2. TESTING THE SOFTWARE PACKAGES

We compare different software packages aimed at fitting distributions. These packages, briefly described in Appendix B, differ in the minimization routines used and in the heuristics used for starting values. All of these packages allow the user to alter the starting value parameters. Table 2 reviews some features of the software, and Appendix C shows examples of commands for a typical fitting session with each.

Simulation Methods

The simulations repeatedly sample random deviates from one of the five distributions, with known parameters, and then estimate those parameters with each software package. The parameter estimates are then compared against the known values for both accuracy (bias) and variability (efficiency). For each of the five source distributions (ex-Gaussian, Gumbel, lognormal, Wald, and Weibull), we sampled n = 250 random deviates. The parameter values appear in Table 3, along with the associated (theoretical) mean, standard deviation, and skew. They were chosen so that (1) the means and standard deviations are all approximately 1,000 and 100, respectively; (2) the overall distribution shapes are positively skewed (the Gumbel distribution has a constant skew). We repeated the sample-and-fit process 1,000 times, making sure that the same samples were fitted by each software package.

The random samples were generated for each of the five distributions by variously transforming random uniform deviates.5 The source code of a program that generated these random values is available on the archive site of the Psychonomic Society.
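For instance, Gumbel deviates can be obtained from uniform deviates by inverting the CDF F(x) = exp(-exp(-(x - µ)/σ)). The sketch below uses Python's generator rather than the Fortran code in the archive, and the parameter values (µ = 955, σ = 78, which give a mean and standard deviation near 1,000 and 100) are illustrative, not the Table 3 values:

```python
import math
import random

def gumbel_deviate(mu, sigma, rng):
    """Inverse-CDF transform: F(x) = exp(-exp(-(x - mu)/sigma))
    inverts to x = mu - sigma * ln(-ln u) for uniform u."""
    u = rng.random()
    return mu - sigma * math.log(-math.log(u))

rng = random.Random(42)                 # fixed seed for reproducibility
sample = [gumbel_deviate(955.0, 78.0, rng) for _ in range(5000)]
mean = sum(sample) / len(sample)
# Theoretical mean is mu + sigma * 0.5772 (Euler-Mascheroni), about 1000.
```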

For all QMP calculations, the number of quantiles used was 32. This decision probably put the QMP method at a relative disadvantage because the small number of quantiles was unnecessarily restrictive.

Simulation Results

The programs reviewed were quite robust and never crashed; the only failure was that MATHEMATICA could not find a solution for one simulated set of Wald deviates. Some analyses did not finish before they reached the maximum number of iterations allowed. However, because only QMPE and MATHEMATICA report this information, we did not remove these solutions from further analyses.

Parameter space. Before turning to the computation of bias and efficiency, we take a look at parameter space. In Figures 1-5, we plotted the estimated parameters as points in the appropriate parameter spaces (two-dimensional for the Gumbel distribution and three-dimensional for the others). Each point represents one of 100 different samples. The central cross shows the position of the true parameter set used to generate each sample. The purpose of these graphs is to see to what extent parameter dependencies are present and, most importantly, whether some software packages are less sensitive to them than others.

Figure 1 shows the parameters µ and σ estimated from the Gumbel random deviates. As can be seen, the estimates are spread around the true parameters with no systematic deviation, indicating no important bias. Furthermore, all software packages show the same dispersion.

Figure 2 shows the parameters µ, σ, and τ estimated from ex-Gaussian distributed random deviates. One thing to note is that the cloud is not uniformly spread in all directions but tends to form an ellipse. This is easier to see with the projections on the sides of the plot box. This ellipse more or less goes through the main diagonal of the box, which illustrates that the parameter estimates are not independent. For example, a moderately small estimate for µ can be compensated by a moderately large value of τ and a moderately small value of σ. QMPE and DISFIT return information about this fact in the form of estimated parameter correlations, but the other software packages do not.

The results for the Weibull distribution were similar, as seen in Figure 3, except that the ellipse is oriented along a different diagonal of the cube. This indicates that a moderately large estimate for α can be compensated by a moderately small estimate for β and γ. Except for one outlier obtained by QMPE, the efficiencies are roughly comparable (that outlier generated an error exit code and so could have been either censored or remedied by manually setting the starting points).

As can be seen from comparison across the panels of Figures 2 and 3, all software packages returned an ellipse of about the same shape and orientation. In all cases, the centers of gravity of the clouds are near the central cross, suggesting only a small bias, and the overall volume of the clouds suggests equal efficiency for all the software packages. Further investigation of bias and efficiency will be performed later.

Figure 4 shows the more complicated results obtained with lognormally distributed random deviates: The points form a crescent. Many of the estimates are close to the true parameter values; however, because of the curvature of the crescent, the center of gravity will not lie on the cross, resulting in a mean bias. In addition, all four software packages are subject to this pathology (although to a lesser extent for MATHEMATICA), suggesting that it is due to the distribution function, not to the optimization capabilities of the software. Finally, QMPE has a few outliers near the bottom of panel a. On these occasions, a singular Hessian matrix error was also returned by QMPE.

This pathology is not unique to the lognormal distribution. Wald-distributed random deviates also produced estimates that form a crescent when the parameter space is plotted, as seen in Figure 5. The crescent had the same volume and orientation regardless of the software package used. Such a nonlinear pathology cannot be detected by the estimated parameter correlations; only visual inspection of the parameter space reveals it.

Package capabilities. In the following, we proceed to an examination of bias and efficiency across packages. However, we will not consider single-parameter biases but will rather concentrate on the bias shown by the whole estimated parameter set relative to the true parameter set. To achieve this, bias was computed as the distance between the center of gravity of all the estimated parameter sets θ̂_i, i = 1 . . . 1,000, and the true θ: bias = ||E(θ̂) - θ||, where E(θ̂) denotes the average position of all the estimates and ||·|| denotes the Euclidean distance (the norm). Efficiency was computed as the standard deviation of the distances between each estimate θ̂_i and θ, SD(||θ̂_i - θ||).
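These two measures can be sketched directly. The example below is a minimal illustration with hand-picked estimates: two points placed symmetrically about the true parameter set give a center of gravity on the true value (zero bias) and, being equidistant from it, zero spread in the distances.

```python
import math

def euclidean(a, b):
    """Euclidean distance (the norm) between two parameter sets."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bias_and_efficiency(estimates, true_theta):
    """Bias: distance between the center of gravity of the estimates and
    the true parameter set. Efficiency: SD of the distances to the truth."""
    k, n = len(true_theta), len(estimates)
    centre = [sum(e[j] for e in estimates) / n for j in range(k)]
    bias = euclidean(centre, true_theta)
    dists = [euclidean(e, true_theta) for e in estimates]
    mean_d = sum(dists) / n
    eff = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / (n - 1))
    return bias, eff

true_theta = (900.0, 60.0, 100.0)                    # illustrative values
ests = [(910.0, 60.0, 100.0), (890.0, 60.0, 100.0)]  # symmetric about truth
b, e = bias_and_efficiency(ests, true_theta)
```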

Figure 6 shows the results expressed as percentages relative to ||θ||. Note that the scales of the panels differ. The two most biased distributions are the lognormal and the Wald (bottom row), with average biases reaching about 10% and 25%, respectively. These are exactly the distributions showing nonlinear dependencies between parameter estimates, as seen in Figures 4 and 5. The other three distribution functions (Weibull, Gumbel, and ex-Gaussian) have much smaller biases, less than 1% in all cases. The bias is even smaller than 0.1% for the Gumbel. In this last case, since the parameter space has only two dimensions, there is less potential for bias. DISFIT turned out to be very apt (low bias, high efficiency) at fitting Weibull deviates, whereas MATHEMATICA outperformed the other packages on lognormal deviates.

Overall, the QMP estimates produced by QMPE are as good as the CML estimates obtained from the other software packages (DISFIT being worst for lognormal deviates). This is surprising, considering the major information reduction imposed on the data: They were reduced from 250 raw data points to only 32 quantiles, an almost eightfold compression. QMPE efficiency, indicated by the error bars in Figure 6, is slightly worse for the Weibull deviates, but this results almost entirely from a few outliers (one is visible in Figure 3). Outliers generated by QMPE were often accompanied by an error exit code related to the singularity of the Hessian matrix. Therefore, a strict selection of the successful fits would have increased the efficiency of the QMPE method considerably, at the cost of having a little less than 5% of the data sets either rejected or refitted. When manually fitting a data set, the user should consider changing the starting points or the criteria for ending a search.

Conclusions

Overall, the four software packages lead to very similar bias and efficiency measures, confirming that they all work properly and that the different platforms and algorithms make little difference, at least with simulated data. This was true even though different optimization routines and different starting-value heuristics underlie each package. The single most important factor affecting the quality of the estimates was the presence of nonlinear relationships between the parameter estimates. This has implications for comparing groups of subjects. For example, the Wald estimates are so inefficient that they are likely to differ more within groups than between groups. If the purpose is to detect differences, the ex-Gaussian and Weibull distributions are preferable as atheoretical summaries of shape. Strategies such as reparameterization (Bates & Watts, 1988) can reduce such nonlinearities. However, the required transformations are difficult to find, sometimes relying on a trial-and-error process.

3. GENERAL CONCLUSIONS

From Descriptive Models of RT to RT Models

Parametric estimation of RT distributions provides a compact description of RT data. In addition, once the distribution is fitted, it is easy to calculate the PDF, the CDF, the hazard function, and more. A main point of this article was to show that there are good-quality software packages for performing such fits and that these packages are reliable and easy to use.

A more theoretical question is to decide which distribution function to fit. As seen in this paper, five candidates can readily be explored. Although there is no consensus at this time, two points should guide one's choice.

The first point concerns the informative utility of the parameters across samples. For example, if a single change in the experimental procedure results in changes in all the parameters of the distribution, the representation is not compact across conditions. Thus, in choosing a distribution function as a descriptive model, the researcher should be mostly interested in how concisely the parameters capture the experimental manipulation. This should be sought even if it sacrifices some quality of the fit.

Differences in L(θ,T) across distributions cannot be compared directly, since distribution functions differ in their capability to fit random data. For example, a distribution with more free parameters has more liberty to fit the data and will likely have a smaller -ln L(θ,T). One solution is to penalize for extra parameters, as in the AIC test (Bozdogan, 1987). However, even with an equal number of parameters, some functions may be able to accommodate more data sets, a property often termed geometric complexity. There are methods to adjust the penalty term to compensate for complexity, but these can be computationally difficult (Myung, 2000; Grünwald, 2000).
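The AIC penalty is simple to state: AIC = 2k - 2 ln L, where k is the number of free parameters, and the model with the smaller value is preferred. A minimal sketch (the -ln L values are made up for illustration; the point is that an extra parameter must buy more than one unit of -ln L to pay for itself):

```python
def aic(neg_log_lik, k):
    """Akaike Information Criterion: 2k + 2 * (-ln L); smaller is better."""
    return 2 * k + 2 * neg_log_lik

# A 3-parameter fit improves -ln L by only 0.5 over a 2-parameter fit,
# which is not enough to offset the penalty for the extra parameter.
aic_two = aic(1500.0, 2)     # e.g., a two-parameter Gumbel fit
aic_three = aic(1499.5, 3)   # slightly better likelihood, one extra parameter
```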

The second important point in the choice of a distribution to fit is related to psychological models of cognition. Whereas a researcher might simply be interested in a descriptive model of RT for convenient communication of the results, a more ambitious approach is to have a model based on psychological mechanisms that can predict not only RT but also the shape and scale of the whole RT distribution. Two cases are then possible: First, the model can be analytically solved to yield an algebraic formula for the RT distribution (see Cousineau, in press). It can either be one of the distributions reviewed here or a yet-unknown distribution function. In this case, the researcher can fit this distribution and ensure that the parameters are acting according to a priori predictions (Schwarz, 2001). Second, in the case in which the model cannot be solved analytically, the researcher can simulate the model and choose a descriptive model to fit the simulated RTs. By doing the same to the observed RT distribution, the researcher can check that the descriptions are convergent. This is the approach used in Ratcliff (1979), where the ex-Gaussian was the intermediary between empirical and simulated data.

The best solution is to fill the gap between a model and RT data with more than just the predicted means. However, it is possible that the observed RTs are contaminated by other factors, such as fatigue or fast guesses. We thus have to keep in mind the possibility of fitting mixtures of distributions (Cousineau & Shiffrin, 2004; Dolan, van der Maas, & Molenaar, 2002) or the possibility that the parameters of the distributions change over time. In this context, QMP estimation is likely to be more robust to the effects of outliers and measurement noise than standard CML estimation.
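The likelihood of a mixture is formed pointwise from the component densities. The sketch below uses two exponential components as an illustrative stand-in for a genuine-RT-plus-fast-guess mixture (the rates, mixing proportion, and data are all made up): with 20 fast and 80 slow observations, the correct mixing proportion scores a lower -ln L than its complement.

```python
import math

def mixture_neg_log_lik(p, rate_slow, rate_fast, data):
    """-ln L for a two-component exponential mixture:
    f(t) = p * f_slow(t) + (1 - p) * f_fast(t)."""
    nll = 0.0
    for t in data:
        f_slow = rate_slow * math.exp(-rate_slow * t)
        f_fast = rate_fast * math.exp(-rate_fast * t)
        nll -= math.log(p * f_slow + (1 - p) * f_fast)
    return nll

# 80 "genuine" responses near 1000 ms and 20 fast guesses near 50 ms.
data = [1000.0] * 80 + [50.0] * 20
good = mixture_neg_log_lik(0.8, 0.001, 0.01, data)  # correct proportion
bad = mixture_neg_log_lik(0.2, 0.001, 0.01, data)   # proportions swapped
```

In practice all three quantities (p and the two component parameter sets) would be estimated jointly, as in the framework of Dolan, van der Maas, and Molenaar (2002).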

ARCHIVED MATERIALS

The following materials associated with this article may be accessed through the Psychonomic Society's Norms, Stimuli, and Data archive, http://www.psychonomic.org/archive/.

To access these files or links, search the archive for this article using the journal (Behavior Research Methods, Instruments, & Computers), the first author's name (Cousineau) and the publication year (2004).

FILE: Cousineau-BRMIC-2004.zip

DESCRIPTION: The compressed archive file contains three files:

randmod.f90 and random.f90 are the two parts of a Fortran 90 program that generates sets of random numbers corresponding to samples from the following distributions: ex-Gaussian, Gumbel, lognormal, Wald, and Weibull. The code is adapted from the work of Dagpunar (1988), Marsaglia and Tsang (2000), Ahrens and Dieter (1982), and Kemp (1986).

readme.txt is a text file explaining the purpose of the program and how to compile it on most systems.

(Manuscript received April 15, 2003; revision accepted for publication May 13, 2004.)

[Reference]

REFERENCES

AHRENS, J. H., & DIETER, U. (1982). Computer generation of Poisson deviates from modified normal distributions. ACM Transactions on Mathematical Software, 8, 163-179.

ASHBY, F. G., TEIN, J.-Y., & BALAKRISHNAN, J. D. (1993). Response time distributions in memory scanning. Journal of Mathematical Psychology, 37, 526-555.

BATES, D. M., & WATTS, D. G. (1988). Nonlinear regression analysis and its application. New York: Wiley.

BERINGER, J. (1992). Timing accuracy of mouse response registration on the IBM microcomputer family. Behavior Research Methods, Instruments, & Computers, 24, 486-490.

BLOXOM, B. (1984). Estimating response time hazard functions: An exposition and extension. Journal of Mathematical Psychology, 28, 401-420.

BOX, M. J., DAVIES, D., & SWANN, W. H. (1969). Non-linear optimization techniques. Edinburgh: Oliver & Boyd.

BOZDOGAN, H. (1987). Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52, 345-370.

BROWN, S., & HEATHCOTE, A. (2003). QMLE: Fast, robust, and efficient estimation of distribution functions based on quantiles. Behavior Research Methods, Instruments, & Computers, 35, 485-492.

BURBECK, S. L., & LUCE, R. D. (1982). Evidence from auditory simple reaction times for both change and level detectors. Perception & Psychophysics, 32, 117-133.

CHANDLER, P. J. (1965). Subroutine STEPIT: An algorithm that finds the values of the parameters which minimize a given continuous function [computer program]. Bloomington: Indiana University.

COUSINEAU, D. (in press). Merging race models and adaptive networks: A parallel race network. Psychonomic Bulletin & Review.

COUSINEAU, D., GOODMAN, V. W., & SHIFFRIN, R. M. (2002). Extending statistics of extremes to distributions varying in position and scale and the implications for race models. Journal of Mathematical Psychology, 46, 431-454.

COUSINEAU, D., & LAROCHELLE, S. (1997). PASTIS: A program for curve and distribution analyses. Behavior Research Methods, Instruments, & Computers, 29, 542-548.

COUSINEAU, D., & LAROCHELLE, S. (in press). Visual-memory search: An integrative perspective. Psychological Research.

COUSINEAU, D., & SHIFFRIN, R. M. (2004). Termination of a visual search with large display size effect. Spatial Vision, 17, 327-352.

DAGPUNAR, J. (1988). Principles of random variate generation. Oxford: Oxford University Press, Clarendon Press.

DAWSON, M. R. W. (1988). Fitting the ex-Gaussian equation to reaction time distributions. Behavior Research Methods, Instruments, & Computers, 20, 54-57.

DOLAN, C. V., & MOLENAAR, P. C. M. (1991). A comparison of four methods of calculating standard errors of maximum-likelihood estimates in the analysis of covariance structure. British Journal of Mathematical & Statistical Psychology, 44, 359-368.

DOLAN, C. V., VAN DER MAAS, H. L. J., & MOLENAAR, P. C. M. (2002). A framework for ML estimation of parameters (mixtures of) common reaction time distributions given optional truncation or censoring. Behavior Research Methods, Instruments, & Computers, 34, 304-323.

FLETCHER, R. (1980). Practical methods of optimization. New York: Wiley.

GILL, P. E., MURRAY, W., SAUNDERS, M. A., & WRIGHT, M. H. (1986). User's guide for NPSOL (Version 4.0): A FORTRAN package for nonlinear programming (Tech. Rep. No. SOL 86-2). Stanford, CA: Stanford University, Department of Operations Research.

GILL, P. E., MURRAY, W., & WRIGHT, M. H. (1981). Practical optimization. London: Academic Press.

GRÜNWALD, P. (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133-152.

GUMBEL, E. J. (1958). The statistics of extremes. New York: Columbia University Press.

HAYS, W. L. (1973). Statistics for the social sciences (2nd ed.). New York: Holt, Rinehart & Winston.

HEATHCOTE, A. (2004). Fitting Wald and ex-Wald distributions to response time data: An example using functions for the S-PLUS package. Behavior Research Methods, Instruments, & Computers, 36, 678-694.

HEATHCOTE, A., BROWN, S., & COUSINEAU, D. (2004). QMPE: Estimating Lognormal, Wald, and Weibull RT distributions with a parameter-dependent lower bound. Behavior Research Methods, Instruments, & Computers, 36, 277-290.

HEATHCOTE, A., BROWN, S., & MEWHORT, D. J. K. (2002). Quantile maximum likelihood estimation of response time distributions. Psychonomic Bulletin & Review, 9, 394-401.

HEATHCOTE, A., POPIEL, S. J., & MEWHORT, D. J. K. (1991). Analysis of response-time distributions: An example using the Stroop task. Psychological Bulletin, 109, 340-347.

HOCKLEY, W. E. (1984). Analysis of response time distributions in the study of cognitive processes. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 598-615.

KEMP, C. D. (1986). A modal method for generating binomial variables. Communications on Statistics-Theoretical Methods, 15, 805-813.

LUCE, R. D. (1986). Response times, their role in inferring elementary mental organization. New York: Oxford University Press.

MARSAGLIA, G., & TSANG, W. W. (2000). A simple method for generating gamma variables. ACM Transactions on Mathematical Software, 26, 363-372.

MILLER, J. (1988). A warning about median reaction time. Journal of Experimental Psychology: Human Perception & Performance, 14, 539-543.

MYUNG, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190-204.

NELDER, J. A., & MEAD, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308-313.

PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., & VETTERLING, W. T. (1986). Numerical recipes: The art of scientific computing. New York: Cambridge University Press.

RATCLIFF, R. (1979). Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 86, 446-461.

RATCLIFF, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114, 510-532.

SCHWARZ, W. (2001). The ex-Wald distribution as a descriptive model of response times. Behavior Research Methods, Instruments, & Computers, 33, 457-469.

SILVERMAN, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.

TOWNSEND, J. T., & ASHBY, F. G. (1983). Stochastic modeling of elementary psychological processes. Cambridge: Cambridge University Press.

TOWNSEND, J. T., & COLONIUS, H. (2001, July). Variability of MAX and MIN statistics: A theory of the quantile spread as a function of sample size. 34th Annual Meeting of the Society for Mathematical Psychology, Providence, RI.

ULRICH, R., & MILLER, J. (1993). Information processing models generating lognormally distributed reaction times. Journal of Mathematical Psychology, 37, 513-525.

ULRICH, R., & MILLER, J. (1994). Effects of truncation on reaction time analysis. Journal of Experimental Psychology: General, 123, 34-80.

VAN ZANDT, T. (2000). How to fit a response time distribution. Psychonomic Bulletin & Review, 7, 424-465.

VAN ZANDT, T., & RATCLIFF, R. (1995). Statistical mimicking of reaction time data: Single-process models, parameter variability, and mixtures. Psychonomic Bulletin & Review, 2, 20-54.

WOLFRAM, S. (1996). The Mathematica Book (3rd Ed.). New York: Cambridge University Press.

YELLOTT, J. I., JR. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.

[Author Affiliation]

DENIS COUSINEAU

Université de Montréal, Montréal, Québec, Canada

SCOTT BROWN

University of California, Irvine, California

and

ANDREW HEATHCOTE

University of Newcastle, Callaghan, New South Wales, Australia

[Author Affiliation]

This research was supported in part by the Fonds pour la Formation de Chercheurs et l'Aide à la Recherche and the Conseil de Recherches en Sciences Naturelles et en Génie du Canada. We thank C. Dolan, G. Francis, G. Giguère, and S. Hélie for their comments on an earlier version of this article. Correspondence should be addressed to D. Cousineau, Département de psychologie, Université de Montréal, C. P. 6128, succ. Centre-ville, Montréal, PQ H3C 3J7, Canada (e-mail: denis.cousineau@umontreal.ca).

[Author Affiliation]

AUTHOR'S E-MAIL ADDRESS: denis.cousineau@umontreal.ca.

AUTHOR'S WEB SITE: http://mapageweb.umontreal.ca/cousined.
