Title: | Fitting Single and Mixture of Generalised Lambda Distributions |
---|---|
Description: | The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using the maximum likelihood and quantile matching estimation. Other methods such as moment matching, the starship method and L-moment matching are also provided. Diagnostics on goodness of fit can be done via qqplots, KS-resample tests and comparing mean, variance, skewness and kurtosis of the data with the fitted distribution. References include the following: Karvanen and Nuutinen (2008) "Characterizing the generalized lambda distribution by L-moments" <doi:10.1016/j.csda.2007.06.021>, King and MacGillivray (1999) "A starship method for fitting the generalised lambda distributions" <doi:10.1111/1467-842X.00089>, Su (2005) "A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data" <doi:10.22237/jmasm/1130803560>, Su (2007) "Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions" <doi:10.1016/j.csda.2006.06.008>, Su (2007) "Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R" <doi:10.18637/jss.v021.i09>, Su (2009) "Confidence Intervals for Quantiles Using Generalized Lambda Distributions" <doi:10.1016/j.csda.2009.02.014>, Su (2010) "Chapter 14: Fitting GLDs and Mixture of GLDs to Data using Quantile Matching Method" <doi:10.1201/b10159>, Su (2010) "Chapter 15: Fitting GLD to data using GLDEX 1.0.4 in R" <doi:10.1201/b10159>, Su (2015) "Flexible Parametric Quantile Regression Model" <doi:10.1007/s11222-014-9457-1>, Su (2021) "Flexible parametric accelerated failure time model" <doi:10.1080/10543406.2021.1934854>. |
Authors: | Steve Su [aut, cre, cph] , Martin Maechler [aut], Juha Karvanen [aut], Robert King [aut], Benjamin Dean [ctb], R Core Team [aut] |
Maintainer: | Steve Su <[email protected]> |
License: | GPL (>= 3) |
Version: | 2.0.0.9.3 |
Built: | 2024-10-11 04:44:11 UTC |
Source: | https://github.com/cran/GLDEX |
The fitting algorithms considered in this package have two major objectives. One is to provide a smoothing device to fit distributions to data using the weighted and unweighted discretised approach based on the bin width of the histogram. The other is to provide a definitive fit to the data set using maximum likelihood estimation.
Copyright Information: To ensure the stability of this package, functions from other open-source packages are ported directly, so that changes in those packages will not cause this package to malfunction.
All functions obtained from other sources have been acknowledged by the author in the authorship or the description sections of the help files and they are freely available online for all to use. Please contact the author for any copyright issues.
Specifically the following functions have been modified from R:
hist.su, ks.gof, pretty.su
The following functions are taken from other open source packages in R:
runif.pseudo, rnorm.pseudo, runif.halton, rnorm.halton, runif.sobol, rnorm.sobol by Diethelm Wuertz distributed under GPL.
digitsBase, QUnif and sHalton written by Martin Maechler distributed under GPL.
dgl, pgl, rgl, qgl, starship.adaptivegrid, starship.obj and starship written by Robert King and some functions modified by Steve Su distributed under GPL.
Lmoments and t1lmoments written by Juha Karvanen distributed under GPL.
This package allows distributions to be fitted directly to a data set using fun.RMFMKL.ml, fun.RMFMKL.ml.m, fun.RMFMKL.hs, fun.RMFMKL.hs.nw, fun.RPRS.ml, fun.RPRS.ml.m, fun.RPRS.hs, fun.RPRS.hs.nw, fun.RMFMKL.qs, fun.RPRS.qs, fun.RMFMKL.mm, fun.RPRS.mm, fun.RMFMKL.lm and fun.RPRS.lm and, in the case of bimodal data sets, the fun.auto.bimodal.qs, fun.auto.bimodal.ml and fun.auto.bimodal.pml functions.
The resulting fits can be gauged graphically with fun.plot.fit or fun.plot.fit.bm (for bimodal data), or examined numerically using the Kolmogorov-Smirnov resample tests fun.diag.ks.g and fun.diag.ks.g.bimodal. For unimodal data fits, it is also possible to compare the mean, variance, skewness and kurtosis of the fitted distribution with those of the data set using the fun.comp.moments.ml and fun.comp.moments.ml.2 functions. Similarly, for bimodal data fits, this is done via fun.theo.bi.mv.gld and fun.moments.r.
Additionally, L-moments for a single generalised lambda distribution can be obtained using fun.lm.theo.gld. For a graphical display of goodness of fit, quantile plots can be produced with qqplot.gld and qqplot.gld.bi for univariate and bimodal generalised lambda distribution fits respectively.
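As a minimal, illustrative sketch (a hedged example, assuming default settings and simulated data rather than a recipe from the package authors), a typical unimodal workflow might look as follows:
# Minimal unimodal workflow sketch (assumed data and default settings)
library(GLDEX)
set.seed(100)
x <- rweibull(500, 3, 2)                       # example data
fit <- fun.data.fit.ml(x)                      # RPRS, RMFMKL and starship ML fits
fun.plot.fit(fit, x, nclass = 30, param = c("rs", "fmkl", "fmkl"), xlab = "x")
fun.comp.moments.ml(fit, x)                    # compare the first four moments with the data
fun.diag.ks.g(fit[, 1], x, param = "rs")       # resample KS diagnostic for the RS fit
qqplot.gld(x, fit[, 1], param = "rs", name = "RS ML fit")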
Steve Su <[email protected]>
Asquith, W. (2007), L-moments and TL-moments of the generalized lambda distribution, Computational Statistics and Data Analysis 51(9): 4484-4496.
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized Tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Gilchrist, Warren G. (2000), Statistical Modelling with Quantile Functions, Chapman & Hall
Karian, Z.A., Dudewicz, E.J., and McDonald, P. (1996), The extended generalized lambda distribution system for fitting distributions to data: history, completion of theory, tables, applications, the “Final Word” on Moment fits, Communications in Statistics - Simulation and Computation *25*, 611-642.
Karian, Zaven A. and Dudewicz, Edward J. (2000), Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods, Chapman & Hall
Karvanen, J. and A. Nuutinen (2008), Characterizing the generalized lambda distribution by L-moments, Computational Statistics and Data Analysis 52(4): 1971-1983. doi:10.1016/j.csda.2007.06.021
King, R.A.R. & MacGillivray, H. L. (1999), A starship method for fitting the generalised lambda distributions, Australian and New Zealand Journal of Statistics, 41, 353-374 doi:10.1111/1467-842X.00089
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424. doi:10.22237/jmasm/1130803560
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational Statistics and Data Analysis: *51*, 8, 3983-3998. doi:10.1016/j.csda.2006.06.008
Su, S. (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9. doi:10.18637/jss.v021.i09
Su, S. (2009). Confidence Intervals for Quantiles Using Generalized Lambda Distributions. Computational Statistics and Data Analysis *53*, 9, 3324-3333. doi:10.1016/j.csda.2009.02.014
Su, S. (2010). Chapter 14: Fitting GLDs and Mixture of GLDs to Data using Quantile Matching Method. Handbook of Distribution Fitting Methods with R. (Karian and Dudewicz) 557-583. doi:10.1201/b10159-3
Su, S. (2010). Chapter 15: Fitting GLD to data using GLDEX 1.0.4 in R. Handbook of Distribution Fitting Methods with R. (Karian and Dudewicz) 585-608. doi:10.1201/b10159-3
Su, S. (2015). Flexible Parametric Quantile Regression Model. Statistics and Computing, 25(3), 635-650. doi:10.1007/s11222-014-9457-1
Su, S. (2021). Flexible Parametric Accelerated Failure Time Model. Journal of Biopharmaceutical Statistics, 31(5), 650-667. doi:10.1080/10543406.2021.1934854
GLDreg package in R for GLD regression models.
###Univariate distributions example:
set.seed(1000)
junk<-rweibull(300,3,2)
##A faster ML estimation
junk.fit1<-fun.RMFMKL.ml.m(junk)
junk.fit2<-fun.RPRS.ml.m(junk)
qqplot.gld(junk.fit1,data=junk,param="fmkl")
qqplot.gld(junk.fit2,data=junk,param="rs")
##Using discretised approach, with 50 classes
#Using discretised method with weights
obj.fit1.hs<-fun.data.fit.hs(junk)
#Plot the resulting fit. The fun.plot.fit function also works for singular fits
#such as those by fun.plot.fit(obj.fit1.ml,junk,nclass=50,
#param=c("rs","fmkl","fmkl"),xlab="x")
fun.plot.fit(obj.fit1.hs,junk,nclass=50,param=c("rs","fmkl"),xlab="x")
#Compare the mean, variance, skewness and kurtosis of the fitted distribution
#with actual data
fun.theo.mv.gld(obj.fit1.hs[1,1],obj.fit1.hs[2,1],obj.fit1.hs[3,1],
obj.fit1.hs[4,1],param="rs")
fun.theo.mv.gld(obj.fit1.hs[1,2],obj.fit1.hs[2,2],obj.fit1.hs[3,2],
obj.fit1.hs[4,2],param="fmkl")
fun.moments.r(junk)
#Conduct resample KS tests
fun.diag.ks.g(obj.fit1.hs[,1],junk,param="rs")
fun.diag.ks.g(obj.fit1.hs[,2],junk,param="fmkl")
##Try another fit, say 15 classes
obj.fit2.hs<-fun.data.fit.hs(junk,rs.default="N",fmkl.default="N",no.c.rs = 15,
no.c.fmkl = 15)
fun.plot.fit(obj.fit2.hs,junk,nclass=50,param=c("rs","fmkl"),xlab="x")
fun.theo.mv.gld(obj.fit2.hs[1,1],obj.fit2.hs[2,1],obj.fit2.hs[3,1],
obj.fit2.hs[4,1],param="rs")
fun.theo.mv.gld(obj.fit2.hs[1,2],obj.fit2.hs[2,2],obj.fit2.hs[3,2],
obj.fit2.hs[4,2],param="fmkl")
fun.moments.r(junk)
fun.diag.ks.g(obj.fit2.hs[,1],junk,param="rs")
fun.diag.ks.g(obj.fit2.hs[,2],junk,param="fmkl")
##Uses the maximum likelihood estimation method
#Fit the function using fun.data.fit.ml
obj.fit1.ml<-fun.data.fit.ml(junk)
#Plot the resulting fit
fun.plot.fit(obj.fit1.ml,junk,nclass=50,param=c("rs","fmkl","fmkl"),xlab="x",
name=".ML")
#Compare the mean, variance, skewness and kurtosis of the fitted distribution
#with actual data
fun.comp.moments.ml(obj.fit1.ml,junk)
#Do a quantile plot
qqplot.gld(junk,obj.fit1.ml[,1],param="rs",name="RS ML fit")
#Run a KS resample test on the resulting fit
fun.diag2(obj.fit1.ml,junk,1000)
#It is possible to use say fun.data.fit.ml(junk,rs.leap=409,fmkl.leap=409,
#FUN="QUnif") to find solutions under a different set of low discrepancy number
#generators.
###Bimodal distributions example:
#Fitting mixture of generalised lambda distributions on the data set using both
#the maximum likelihood and partition maximum likelihood and plot the resulting
#fits
opar <- par()
par(mfrow=c(2,1))
junk<-fun.auto.bimodal.ml(faithful[,1],per.of.mix=0.01,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
name="Maximum likelihood using",xlab="faithful1",param.vec=c("rs","fmkl"))
#Do a quantile plot
qqplot.gld.bi(faithful[,1],junk$par,param1="rs",param2="fmkl",
name="RS FMKL ML fit",range=c(0.001,0.999))
par(opar)
junk<-fun.auto.bimodal.pml(faithful[,1],clustering.m=clara,init1.sel="rprs",
init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
name="Partition maximum likelihood using",xlab="faithful1",
param.vec=c("rs","fmkl"))
#Fit the faithful[,1] data from the dataset library using the sobol sequence
#generator for the first distribution fit and the halton sequence for the second
#distribution fit.
fit1<-fun.auto.bimodal.ml(faithful[,1],init1.sel="rmfmkl",init2.sel="rmfmkl",
init1=c(-0.25,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3,fun1="runif.sobol",
fun2="runif.halton")
#Run diagnostic KS tests
fun.diag.ks.g.bimodal(fit1$par[1:4],fit1$par[5:8],prop1=fit1$par[9],
data=faithful[,1],param1="fmkl",param2="fmkl")
#Find the theoretical moments of the fit
fun.theo.bi.mv.gld(fit1$par[1],fit1$par[2],fit1$par[3],fit1$par[4],"fmkl",
fit1$par[5],fit1$par[6],fit1$par[7],fit1$par[8],"fmkl",fit1$par[9])
#Compare this with the empirical moments from the data set.
fun.moments.r(faithful[,1])
#Do a quantile plot
qqplot.gld.bi(faithful[,1],fit1$par,param1="fmkl",param2="fmkl",
name="FMKL FMKL ML fit")
#Quantile matching method
#Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.qs(junk)
#Fitting faithful data from the dataset library, with the clara clustering
#regime. The first distribution is RS and the second distribution is fmkl.
#The percentage of data mix is 1%.
fun.auto.bimodal.qs(faithful[,1],per.of.mix=0.01,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
#L moment matching
#Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.lm(junk)
# Moment matching method
#Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.mm(junk)
# Example on fitting mixture of normal distributions
data1<-c(rnorm(1500,-1,2/3),rnorm(1500,1,2/3))
junk<-fun.auto.bimodal.ml(data1,per.of.mix=0.01,clustering.m=data1>0,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=data1,
name="Maximum likelihood using",xlab="data1",param.vec=c("rs","fmkl"))
qqplot.gld.bi(data1,junk$par,param1="rs",param2="fmkl",
name="RS FMKL ML fit",range=c(0.001,0.999))
# Generate random observations from FMKL generalised lambda distributions with
# parameters (1,2,3,4) and (4,3,2,1) with 50% of data from each distribution.
fun.simu.bimodal(c(1,2,3,4),c(4,3,2,1),prop1=0.5,param1="fmkl",param2="fmkl")
Integer number representations in other Bases.
Formally, for every element N = x[i], compute the (vector of) "digits" A of the base b representation of the number N, that is N = A[1]*b^(M-1) + A[2]*b^(M-2) + ... + A[M], where M = ndigits and A[1] is the most significant digit. Revert such a representation to integers.
digitsBase(x, base = 2, ndigits = 1 + floor(log(max(x), base)))
x |
For digitsBase(): a non-negative integer (vector) whose digits in the given base are wanted. |
base |
integer, at least 2 specifying the base for representation. |
ndigits |
number of bits/digits to use. |
For digitsBase(), an object, say m, of class "basedInt", which is basically an (ndigits x n) matrix where m[,i] corresponds to x[i], n <- length(x), and attr(m, "base") is the input base.
as.intBase() and the as.integer method for basedInt objects return an integer vector.
digits and digits.v are now deprecated and will be removed from the sfsmisc package.
Martin Maechler, Dec 4, 1991 (for S-plus; then called digits.v).
digitsBase(0:12, 8) #-- octal representation
## This may be handy for just one number (and default decimal):
digits <- function(n, base = 10) as.vector(digitsBase(n, base = base))
digits(128, base = 8) # 2 0 0
## one way of pretty printing (base <= 10!)
b2ch <- function(db)
    noquote(gsub("^0+(.{1,})$", " \\1", apply(db, 2, paste, collapse = "")))
b2ch(digitsBase(0:33, 2))  #-> 0 1 10 11 100 101 ... 100001
b2ch(digitsBase(0:33, 4))  #-> 0 1 2 3 10 11 12 13 20 ... 200 201
## Hexadecimal:
i <- c(1:20, 100:106)
M <- digitsBase(i, 16)
hexdig <- c(0:9, LETTERS[1:6])
cM <- hexdig[1 + M]; dim(cM) <- dim(M)
b2ch(cM) #-> 1 2 3 4 5 6 7 8 9 A B C D E F 10 11 ... 6A
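As a small additional check (a sketch only, not part of the original examples), the digit/integer relation described above can be verified directly for a single number:
## Sketch: verify N = A[1]*b^(M-1) + ... + A[M] for one value
m <- digitsBase(1991, base = 8)   # octal digits of 1991, most significant digit first
A <- as.vector(m)
M <- length(A)
sum(A * 8^(M - seq_len(M)))       # recovers 1991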
This function fits a mixture of two generalised lambda distributions to a data set; it is restricted to two generalised lambda distributions. The method of fitting is maximum likelihood via the EM algorithm. It is a two-step optimisation procedure: each unimodal part of the bimodal distribution is first modelled using the maximum likelihood method or the starship method (FMKL GLD only), and these initial values are then used to maximise the likelihood of the entire bimodal distribution via the EM algorithm. It fits a mixture of the form p*f1 + (1-p)*f2, where f1 and f2 are the probability density functions of the two generalised lambda distributions.
fun.auto.bimodal.ml(data, per.of.mix = 0.01, clustering.m = clara, init1.sel = "rprs", init2.sel = "rprs", init1=c(-1.5, 1.5), init2=c(-1.5, 1.5), leap1=3, leap2=3,fun1="runif.sobol",fun2="runif.sobol",no=10000, max.it=5000, optim.further="Y")
data |
A numerical vector representing the dataset. |
per.of.mix |
Level of mix between two parts of the distribution, usually 1-2% of cross mix is sufficient. |
clustering.m |
Clustering method used in classifying the dataset into two parts. Valid arguments include clara, fanny and pam from the cluster library. Default is clara. Or a logical vector specifying how data should be split. |
init1.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the first generalised lambda distribution. |
init2.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the second generalised lambda distribution. |
init1 |
Initial values (lambda3 and lambda4) for the first generalised lambda distribution. |
init2 |
Initial values (lambda3 and lambda4) for the second generalised lambda distribution. |
leap1 |
See the scrambling argument in runif.sobol; applies to the first distribution fit. |
fun1 |
A character string naming the quasi-random number generator used to find initial values for the first distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
leap2 |
See the scrambling argument in runif.sobol; applies to the second distribution fit. |
fun2 |
A character string naming the quasi-random number generator used to find initial values for the second distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
no |
Number of initial random values to find the best initial values for optimisation. |
max.it |
Maximum number of iterations for numerical optimisation. |
optim.further |
Whether to optimise the fit further using the full maximum likelihood method; the recommended setting is "Y". |
The initial values that work well for RPRS are c(-1.5,1.5) and for RMFMKL are c(-0.25,1.5). For scrambling, if 1, 2 or 3 the sequence is scrambled, otherwise it is not. If 1, Owen type scrambling is applied; if 2, Faure-Tezuka type scrambling is applied; and if 3, both Owen and Faure-Tezuka type scrambling is applied. The star method uses the same initial values as rmfmkl since it uses the FMKL generalised lambda distribution. The Nelder-Mead simplex algorithm is used in the numerical optimisation. rprs stands for revised percentile method for the RS generalised lambda distribution and rmfmkl stands for revised method of moment for the FMKL generalised lambda distribution. These acronyms represent the initial optimisation algorithm used to obtain a reasonable set of initial values for the subsequent optimisation procedure.
This function is an improvement on the corresponding function in Su (2007), Journal of Statistical Software.
par |
The best set of parameters found; the first four values correspond to the first distribution fit, the next four to the second distribution fit, and the last value to p, the mixing proportion of the first distribution. |
value |
The value of -ML for the parameters obtained. |
counts |
A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient. |
convergence |
An integer code indicating convergence status, as returned by the optimiser. |
message |
A character string giving any additional information returned by the optimizer, or NULL. |
If the number of observations is small, rprs can sometimes fail as the percentiles may not exist for the data. Also, if the initial values do not span a valid generalised lambda distribution, try another set of initial values.
Steve Su
Bratley P. and Fox B.L. (1988) Algorithm 659: Implementing Sobol's quasi random sequence generator, ACM Transactions on Mathematical Software 14, 88-100.
Joe S. and Kuo F.Y. (1998) Remark on Algorithm 659: Implementing Sobol's quasi random Sequence Generator.
Nelder, J. A. and Mead, R. (1965) A simplex algorithm for function minimization. Computer Journal *7*, 308-313.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.auto.bimodal.pml, fun.plot.fit.bm, fun.diag.ks.g.bimodal
# Fitting faithful data from the dataset library, with the clara clustering
# regime. The first distribution is RS and the second distribution is fmkl.
# The percentage of data mix is 1%.
fun.auto.bimodal.ml(faithful[,1],per.of.mix=0.01,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
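The fitted object can also be used directly with dgl to evaluate the mixture density p*f1 + (1-p)*f2. The sketch below is illustrative only and assumes the fit is stored in an object called fit, with an RS first component and an FMKL second component as in the example above:
# Sketch: overlay the fitted mixture density on a histogram of the data
fit <- fun.auto.bimodal.ml(faithful[,1], per.of.mix = 0.01, clustering.m = clara,
  init1.sel = "rprs", init2.sel = "rmfmkl", init1 = c(-1.5, 1.5),
  init2 = c(-0.25, 1.5), leap1 = 3, leap2 = 3)
p <- fit$par[9]
xg <- seq(min(faithful[,1]), max(faithful[,1]), length = 200)
dens <- p * dgl(xg, fit$par[1], fit$par[2], fit$par[3], fit$par[4], param = "rs") +
  (1 - p) * dgl(xg, fit$par[5], fit$par[6], fit$par[7], fit$par[8], param = "fmkl")
hist(faithful[,1], freq = FALSE, nclass = 20, xlab = "faithful1", main = "Fitted mixture")
lines(xg, dens)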
This function fits a mixture of two generalised lambda distributions to a data set; it is restricted to two generalised lambda distributions. The method of fitting is partition maximum likelihood. It is a two-step optimisation procedure: each unimodal part of the bimodal distribution is first modelled using the maximum likelihood method or the starship method (FMKL GLD only). These initial values are then used to "maximise" the complete log likelihood of the entire bimodal distribution. It fits a mixture of the form p*f1 + (1-p)*f2, where f1 and f2 are the probability density functions of the two generalised lambda distributions.
fun.auto.bimodal.pml(data, clustering.m = clara, init1.sel = "rprs", init2.sel = "rprs", init1=c(-1.5, 1.5), init2=c(-1.5, 1.5), leap1=3, leap2=3, fun1="runif.sobol", fun2="runif.sobol",no=10000,max.it=5000, optim.further="Y")
data |
A numerical vector representing the dataset. |
clustering.m |
Clustering method used in classifying the dataset into two parts. Valid arguments include clara, fanny and pam from the cluster library. Default is clara. Or a logical vector specifying how data should be split. |
init1.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the first generalised lambda distribution. |
init2.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the second generalised lambda distribution. |
init1 |
Initial values (lambda3 and lambda4) for the first generalised lambda distribution. |
init2 |
Initial values (lambda3 and lambda4) for the second generalised lambda distribution. |
leap1 |
See the scrambling argument in runif.sobol; applies to the first distribution fit. |
fun1 |
A character string naming the quasi-random number generator used to find initial values for the first distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
leap2 |
See the scrambling argument in runif.sobol; applies to the second distribution fit. |
fun2 |
A character string naming the quasi-random number generator used to find initial values for the second distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
no |
Number of initial random values to find the best initial values for optimisation. |
max.it |
Maximum number of iterations for numerical optimisation. |
optim.further |
Whether to optimise the fit further using the full maximum likelihood method; the recommended setting is "Y". |
The initial values that work well for RPRS are c(-1.5,1.5) and for RMFMKL are c(-0.25,1.5). For scrambling, if 1, 2 or 3 the sequence is scrambled, otherwise it is not. If 1, Owen type scrambling is applied; if 2, Faure-Tezuka type scrambling is applied; and if 3, both Owen and Faure-Tezuka type scrambling is applied. The star method uses the same initial values as rmfmkl since it uses the FMKL generalised lambda distribution. The Nelder-Mead simplex algorithm is used in the numerical optimisation. rprs stands for revised percentile method for the RS generalised lambda distribution and rmfmkl stands for revised method of moment for the FMKL generalised lambda distribution. These acronyms represent the initial optimisation algorithm used to obtain a reasonable set of initial values for the subsequent optimisation procedure.
This function is an improvement on the corresponding function in Su (2007), Journal of Statistical Software.
par |
The best set of parameters found; the first four values correspond to the first distribution fit, the next four to the second distribution fit, and the last value to p, the mixing proportion of the first distribution. |
value |
The value of -PML for the parameters obtained. |
counts |
A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient. |
convergence |
An integer code indicating convergence status, as returned by the optimiser. |
message |
A character string giving any additional information returned by the optimizer, or NULL. |
If the number of observations is small, rprs can sometimes fail as the percentiles may not exist for the data. Also, if the initial values do not result in a valid generalised lambda distribution, try another set of initial values.
Bratley P. and Fox B.L. (1988) Algorithm 659: Implementing Sobol's quasi random sequence generator, ACM Transactions on Mathematical Software 14, 88-100.
Joe S. and Kuo F.Y. (1998) Remark on Algorithm 659: Implementing Sobol's quasi random Sequence Generator.
Nelder, J. A. and Mead, R. (1965) A simplex algorithm for function minimization. Computer Journal *7*, 308-313.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.auto.bimodal.ml, fun.plot.fit.bm, fun.diag.ks.g.bimodal
# Fitting faithful data from the dataset library, with the clara clustering
# regime. The first distribution is RS and the second distribution is fmkl.
fun.auto.bimodal.pml(faithful[,1],clustering.m=clara,init1.sel="rprs",
init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
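Similarly, pgl can be used to compare the fitted mixture CDF p*F1 + (1-p)*F2 with the empirical CDF. A hedged sketch, assuming the fit above is stored in an object called fit:
# Sketch: fitted mixture CDF versus the empirical CDF
fit <- fun.auto.bimodal.pml(faithful[,1], clustering.m = clara, init1.sel = "rprs",
  init2.sel = "rmfmkl", init1 = c(-1.5, 1.5), init2 = c(-0.25, 1.5),
  leap1 = 3, leap2 = 3)
p <- fit$par[9]
xg <- sort(faithful[,1])
theo <- p * pgl(xg, fit$par[1], fit$par[2], fit$par[3], fit$par[4], param = "rs") +
  (1 - p) * pgl(xg, fit$par[5], fit$par[6], fit$par[7], fit$par[8], param = "fmkl")
plot(ecdf(faithful[,1]), main = "Empirical vs fitted mixture CDF")
lines(xg, theo, col = 2)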
This function fits a mixture of two generalised lambda distributions to a data set; it is restricted to two generalised lambda distributions. The method of fitting is the quantile matching method. It is a two-step optimisation procedure: each unimodal part of the bimodal distribution is first modelled using the quantile matching method, and the initial values obtained are then used to match the theoretical and empirical quantiles of the entire bimodal distribution. It fits a mixture of the form p*f1 + (1-p)*f2, where f1 and f2 are the probability density functions of the two generalised lambda distributions.
fun.auto.bimodal.qs(data, per.of.mix = 0.01, clustering.m = clara, init1.sel = "rprs", init2.sel = "rprs", init1=c(-1.5, 1.5), init2=c(-1.5, 1.5), leap1=3, leap2=3, fun1 = "runif.sobol", fun2 = "runif.sobol", trial.n = 100, len = 1000, type = 7, no = 10000, maxit = 5000)
data |
A numerical vector representing the dataset. |
per.of.mix |
Level of mix between two parts of the distribution, usually 1-2% of cross mix is sufficient. |
clustering.m |
Clustering method used in classifying the dataset into two parts. Valid arguments include clara, fanny and pam from the cluster library. Default is clara. Or a logical vector specifying how data should be split. |
init1.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the first generalised lambda distribution. |
init2.sel |
This can be "rprs", "rmfmkl" or "star", the method used to obtain initial values for the second generalised lambda distribution. |
init1 |
Initial values (lambda3 and lambda4) for the first generalised lambda distribution. |
init2 |
Initial values (lambda3 and lambda4) for the second generalised lambda distribution. |
leap1 |
See the scrambling argument in runif.sobol; applies to the first distribution fit. |
fun1 |
A character string naming the quasi-random number generator used to find initial values for the first distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
leap2 |
See the scrambling argument in runif.sobol; applies to the second distribution fit. |
fun2 |
A character string naming the quasi-random number generator used to find initial values for the second distribution fit, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
trial.n |
Number of evenly spaced quantiles ranging from 0 to 1 to be used in the checking phase to find the best set of initial values for optimisation; this is intended to be lower than len. |
len |
Number of evenly spaced quantile ranging from 0 to 1 to be used, default is 1000 |
type |
Type of quantile to be used, default is 7; see the quantile function in R. |
no |
Number of initial random values to find the best initial values for optimisation. |
maxit |
Maximum number of iterations for numerical optimisation. Default is 5000. |
The initial values that work well for RPRS are c(-1.5,1.5) and for RMFMKL are c(-0.25,1.5). For scrambling, if 1, 2 or 3 the sequence is scrambled, otherwise it is not. If 1, Owen type scrambling is applied; if 2, Faure-Tezuka type scrambling is applied; and if 3, both Owen and Faure-Tezuka type scrambling is applied. The star method uses the same initial values as rmfmkl since it uses the FMKL generalised lambda distribution. The Nelder-Mead simplex algorithm is used in the numerical optimisation. rprs stands for revised percentile method for the RS generalised lambda distribution and rmfmkl stands for revised method of moment for the FMKL generalised lambda distribution. These acronyms represent the initial optimisation algorithm used to obtain a reasonable set of initial values for the subsequent optimisation procedure.
par |
The best set of parameters found; the first four values correspond to the first distribution fit, the next four to the second distribution fit, and the last value to p, the mixing proportion of the first distribution. |
value |
The value of the objective function for the parameters obtained. |
counts |
A two-element integer vector giving the number of calls to fn and gr respectively. This excludes those calls needed to compute the Hessian, if requested, and any calls to fn to compute a finite-difference approximation to the gradient. |
convergence |
An integer code indicating convergence status, as returned by the optimiser. |
message |
A character string giving any additional information returned by the optimizer, or NULL. |
If the number of observations is small, rprs can sometimes fail as the percentiles may not exist for the data. Also, if the initial values do not span a valid generalised lambda distribution, try another set of initial values.
Steve Su
Bratley P. and Fox B.L. (1988) Algorithm 659: Implementing Sobol's quasi random sequence generator, ACM Transactions on Mathematical Software 14, 88-100.
Joe S. and Kuo F.Y. (1998) Remark on Algorithm 659: Implementing Sobol's quasi random Sequence Generator.
Nelder, J. A. and Mead, R. (1965) A simplex algorithm for function minimization. Computer Journal *7*, 308-313.
Su (2008). Fitting GLD to data via quantile matching method. (Book chapter to appear)
fun.auto.bimodal.pml, fun.auto.bimodal.ml, fun.plot.fit.bm, fun.diag.ks.g.bimodal
# Fitting faithful data from the dataset library, with the clara clustering
# regime. The first distribution is RS and the second distribution is fmkl.
# The percentage of data mix is 1%.
fun.auto.bimodal.qs(faithful[,1],per.of.mix=0.01,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
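The same diagnostics used for the maximum likelihood mixture fits apply here. A hedged sketch, assuming the quantile matching fit above is stored in an object called fit:
# Sketch: diagnostics for a quantile matching mixture fit
fit <- fun.auto.bimodal.qs(faithful[,1], per.of.mix = 0.01, clustering.m = clara,
  init1.sel = "rprs", init2.sel = "rmfmkl", init1 = c(-1.5, 1.5),
  init2 = c(-0.25, 1.5), leap1 = 3, leap2 = 3)
qqplot.gld.bi(faithful[,1], fit$par, param1 = "rs", param2 = "fmkl",
  name = "RS FMKL QS fit", range = c(0.001, 0.999))
fun.diag.ks.g.bimodal(fit$par[1:4], fit$par[5:8], prop1 = fit$par[9],
  data = faithful[,1], param1 = "rs", param2 = "fmkl")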
This is the secondary optimisation procedure for evaluating the final bimodal distribution fit using maximum likelihood. It usually relies on initial values found by the fun.bimodal.init function.
fun.bimodal.fit.ml(data, first.fit, second.fit, prop, param1, param2, selc1, selc2)
data |
Dataset to be fitted. |
first.fit |
The distribution parameters or the initial values of the first distribution fit. |
second.fit |
The distribution parameters or the initial values of the second distribution fit. |
prop |
The proportion of the data in the first distribution, usually obtained from fun.bimodal.init. |
param1 |
Can be either "rs" or "fmkl", the type of the first generalised lambda distribution. |
param2 |
Can be either "rs" or "fmkl", the type of the second generalised lambda distribution. |
selc1 |
Selection of initial values for the first distribution; can be either "rs", "fmkl" or "star". |
selc2 |
Selection of initial values for the second distribution; can be either "rs", "fmkl" or "star". |
This function should be used in tandem with fun.bimodal.init.
par |
The first four numbers are the parameters of the first generalised lambda distribution, the second four numbers are the parameters of the second generalised lambda distribution and the last value is the proportion of the first generalised lambda distribution. |
value |
The objective value of negative likelihood obtained using the par above. |
counts |
A two-element integer vector giving the number of calls to functions. Gradient is not used in this case. |
convergence |
An integer code indicating convergence status, as returned by the optimiser. |
message |
A character string giving any additional information returned by the optimizer, or NULL. |
There is currently no guarantee of global convergence.
Steve Su
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.bimodal.fit.pml, fun.bimodal.init
# Extract faithful[,2] into faithful2
faithful2<-faithful[,2]
# Uses clara clustering method
clara.faithful2<-fun.class.regime.bi(faithful2, 0.01, clara)
# Save into two different objects
qqqq1.faithful2.cc<-clara.faithful2$data.a
qqqq2.faithful2.cc<-clara.faithful2$data.b
# Find the initial values
result.faithful2.init<-fun.bimodal.init(data1=qqqq1.faithful2.cc,
data2=qqqq2.faithful2.cc,
rs.leap1=3,fmkl.leap1=3,rs.init1 = c(-1.5, 1.5), fmkl.init1 = c(-0.25, 1.5),
rs.leap2=3,fmkl.leap2=3,rs.init2 = c(-1.5, 1.5), fmkl.init2 = c(-0.25, 1.5))
# Find the final fits
result.faithful2.rsrs<-fun.bimodal.fit.ml(data=faithful2,
result.faithful2.init[[2]],result.faithful2.init[[3]],
result.faithful2.init[[1]],
param1="rs",param2="rs",selc1="rs",selc2="rs")
# Output
result.faithful2.rsrs
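The resulting fit can then be inspected graphically. A sketch only, assuming the result.faithful2.rsrs object created above and that fun.plot.fit.bm accepts this fit object in the same way as the fun.auto.bimodal.* outputs:
# Sketch: plot the final RS-RS bimodal fit obtained above
fun.plot.fit.bm(nclass = 50, fit.obj = result.faithful2.rsrs, data = faithful2,
  name = "Maximum likelihood using", xlab = "faithful2", param.vec = c("rs", "rs"))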
This is the secondary optimisation procedure for evaluating the final bimodal distribution fit using the partition maximum likelihood. It usually relies on initial values found by the fun.bimodal.init function.
fun.bimodal.fit.pml(data1, data2, first.fit, second.fit, prop, param1, param2, selc1, selc2)
data1 |
First data set, usually obtained from fun.class.regime.bi. |
data2 |
Second data set, usually obtained from fun.class.regime.bi. |
first.fit |
The distribution parameters or the initial values of the first distribution fit. |
second.fit |
The distribution parameters or the initial values of the second distribution fit. |
prop |
The proportion of the data in the first distribution, usually obtained from fun.bimodal.init. |
param1 |
Can be either "rs" or "fmkl", the type of the first generalised lambda distribution. |
param2 |
Can be either "rs" or "fmkl", the type of the second generalised lambda distribution. |
selc1 |
Selection of initial values for the first distribution; can be either "rs", "fmkl" or "star". |
selc2 |
Selection of initial values for the second distribution; can be either "rs", "fmkl" or "star". |
This function should be used in tandem with the fun.bimodal.init function.
par |
The first four numbers are the parameters of the first generalised lambda distribution, the second four numbers are the parameters of the second generalised lambda distribution and the last value is the proportion of the first generalised lambda distribution. |
value |
The objective value of negative likelihood obtained. |
counts |
A two-element integer vector giving the number of calls to functions. Gradient is not used in this case. |
convergence |
An integer code indicating convergence status, as returned by the optimiser. |
message |
A character string giving any additional information returned by the optimizer, or NULL. |
There is currently no guarantee of global convergence.
Steve Su
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.bimodal.fit.ml, fun.bimodal.init
# Extract faithful[,2] into faithful2
faithful2<-faithful[,2]
# Uses clara clustering method
clara.faithful2<-clara(faithful2,2)$clustering
# Save into two different objects
qqqq1.faithful2.cc<-faithful2[clara.faithful2==1]
qqqq2.faithful2.cc<-faithful2[clara.faithful2==2]
# Find the initial values
result.faithful2.init<-fun.bimodal.init(data1=qqqq1.faithful2.cc,
data2=qqqq2.faithful2.cc,
rs.leap1=3,fmkl.leap1=3,rs.init1 = c(-1.5, 1.5), fmkl.init1 = c(-0.25, 1.5),
rs.leap2=3,fmkl.leap2=3,rs.init2 = c(-1.5, 1.5), fmkl.init2 = c(-0.25, 1.5))
# Find the final fits
result.faithful2.rsrs<-fun.bimodal.fit.pml(data1=qqqq1.faithful2.cc,
data2=qqqq2.faithful2.cc, result.faithful2.init[[2]],
result.faithful2.init[[3]], result.faithful2.init[[1]],param1="rs",
param2="rs",selc1="rs",selc2="rs")
# Output
result.faithful2.rsrs
After classifying the data using fun.class.regime.bi, this function evaluates temporary or initial solutions by estimating each part of the bimodal distribution using maximum likelihood estimation and the starship method. These initial solutions are then passed on to fun.bimodal.fit.ml or fun.bimodal.fit.pml to obtain the final fits.
fun.bimodal.init(data1, data2, rs.leap1, fmkl.leap1, rs.init1, fmkl.init1, rs.leap2, fmkl.leap2, rs.init2, fmkl.init2,fun1="runif.sobol", fun2="runif.sobol",no=10000)
data1 |
The first data obtained by the clustering algorithm. |
data2 |
The second data obtained by the clustering algorithm. |
rs.leap1 |
See the scrambling argument in runif.sobol; applies to the RS fit of the first data set. |
fmkl.leap1 |
See the scrambling argument in runif.sobol; applies to the FMKL fit of the first data set. |
rs.init1 |
Initial values (lambda3 and lambda4) for the first RS generalised lambda distribution. |
fmkl.init1 |
Initial values (lambda3 and lambda4) for the first FMKL generalised lambda distribution. |
rs.leap2 |
See the scrambling argument in runif.sobol; applies to the RS fit of the second data set. |
fmkl.leap2 |
See the scrambling argument in runif.sobol; applies to the FMKL fit of the second data set. |
rs.init2 |
Initial values (lambda3 and lambda4) for the second RS generalised lambda distribution. |
fmkl.init2 |
Initial values (lambda3 and lambda4) for the second FMKL generalised lambda distribution. |
fun1 |
A character string naming the quasi-random number generator used for the first data set, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
fun2 |
A character string naming the quasi-random number generator used for the second data set, e.g. "runif.sobol" (default), "runif.halton" or "QUnif". |
no |
Number of initial random values to find the best initial values for optimisation. |
All three methods of fitting (RPRS, RMFMKL and STAR) will be given for each part of the bimodal distribution.
prop |
Proportion of the number of observations in the first data in relation to the entire data. |
first.fit |
A matrix comprising the parameters of GLD obtained from RPRS, RMFMKL and STAR for the first dataset. |
second.fit |
A matrix comprising the parameters of GLD obtained from RPRS, RMFMKL and STAR for the second dataset. |
This function is not designed to be called explicitly by the end user; users should note the difficulties associated with the RPRS parameterisation.
Steve Su
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.class.regime.bi, fun.bimodal.fit.pml, fun.bimodal.fit.ml
# Split the first column of the faithful data into two using fun.class.regime.bi
faithful1.mod<-fun.class.regime.bi(faithful[,1], 0.1, clara)
# Save the datasets
qqqq1.faithful1.cc1<-faithful1.mod$data.a
qqqq2.faithful1.cc1<-faithful1.mod$data.b
# Find the initial values for secondary optimisation.
result.faithful1.init1<-fun.bimodal.init(data1=qqqq1.faithful1.cc1,
data2=qqqq2.faithful1.cc1,
rs.leap1=3,fmkl.leap1=3,rs.init1 = c(-1.5, 1.5), fmkl.init1 = c(-0.25, 1.5),
rs.leap2=3,fmkl.leap2=3,rs.init2 = c(-1.5, 1.5), fmkl.init2 = c(-0.25, 1.5))
# These initial values are then passed onto fun.bimodal.fit.ml to obtain the
# final fits.
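A hedged sketch of that final step, mirroring the fun.bimodal.fit.ml example earlier and assuming the objects created above:
# Sketch: pass the initial values above to fun.bimodal.fit.ml for the final RS-RS fit
result.faithful1.rsrs <- fun.bimodal.fit.ml(data = faithful[,1],
  result.faithful1.init1[[2]], result.faithful1.init1[[3]],
  result.faithful1.init1[[1]], param1 = "rs", param2 = "rs",
  selc1 = "rs", selc2 = "rs")
result.faithful1.rsrs$par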
This function will return a single logical value showing whether a combination of L1, L2, L3 and L4 is a valid GLD.
fun.check.gld(lambda1, lambda2, lambda3, lambda4, param)
lambda1 |
A numerical vector for L1 of GLD |
lambda2 |
A numerical vector for L2 of GLD |
lambda3 |
A numerical vector for L3 of GLD |
lambda4 |
A numerical vector for L4 of GLD |
param |
Can be "rs", "fmkl", or "fkml" |
A single logical value indicating whether the specified GLD is a valid probability density function
Steve Su
fun.check.gld(1,4,3,2,"rs") fun.check.gld(1,4,3,2,"fkml") fun.check.gld(1,4,3,-2,"rs")
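One possible use (a sketch only, with hypothetical parameter values not taken from the package) is to screen a candidate parameter set before evaluating its density with dgl:
# Sketch: screen hypothetical RS parameters before using them with dgl()
cand <- c(0, 1, 0.5, -2)   # hypothetical candidate parameters
if (fun.check.gld(cand[1], cand[2], cand[3], cand[4], param = "rs")) {
  dgl(0, cand[1], cand[2], cand[3], cand[4], param = "rs")
} else {
  cat("Not a valid RS GLD - try other parameter values\n")
}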
This function will return a logical vector showing whether vector combinations of L1, L2, L3 and L4 are valid GLDs.
fun.check.gld.multi(l1, l2, l3, l4, param)
l1 |
A numerical vector for L1 of GLD |
l2 |
A numerical vector for L2 of GLD |
l3 |
A numerical vector for L3 of GLD |
l4 |
A numerical vector for L4 of GLD |
param |
Can be "rs", "fmkl", or "fkml" |
This is an extension to fun.check.gld
A logical vector indicating whether the specified parameters are valid GLDs
Steve Su
fun.check.gld.multi(c(1,2,3),c(4,5,6),c(7,8,9),c(10,11,12),param="rs")
fun.check.gld.multi(c(1,2,3),c(4,5,6),c(7,8,9),c(10,11,-12),param="rs")
This function is primarily designed to split a bimodal data vector into two groups to allow the fitting of mixture generalised lambda distributions.
fun.class.regime.bi(data, perc.cross, fun.cross)
data |
Data to be classified into two groups. |
perc.cross |
Percentage of cross over from one data to the other, usually set at 1% |
fun.cross |
Any clustering function such as clara, fanny or pam from the cluster package, or a logical vector specifying how the data should be split. |
This function is part of the routine mixture fitting procedure provided in this package. The perc.cross argument, or percentage of cross over, is designed to allow the use of maximum likelihood estimation via the EM algorithm for fitting bimodal data. When this is invoked, it ensures that both parts of the data contain the minimum and the maximum of the data set, as well as a proportion (specified by the perc.cross argument) of observations from each other. If 1% is required, then data.a will contain 1% of data.b and vice versa, after the full data set has been classified into data.a and data.b by the fun.cross classification regime.
data.a |
First group of data obtained by the classification algorithm. |
data.b |
Second group of data obtained by the classification algorithm. |
Steve Su
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Su (2006) Maximum Log Likelihood Estimation using EM Algorithm and Partition Maximum Log Likelihood Estimation for Mixtures of Generalized Lambda Distributions. Working Paper.
# Classify the faithful[,1] data into two categories with 10% cross over mix.
fun.class.regime.bi(faithful[,1],0.1,clara)
# Classify the faithful[,1] data into two categories with no mixing:
fun.class.regime.bi(faithful[,1],0,clara)
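A small sketch of how the returned split might be inspected, assuming the data.a and data.b components described below:
# Sketch: inspect the two groups returned by the classification
split.obj <- fun.class.regime.bi(faithful[,1], 0.01, clara)
length(split.obj$data.a)
length(split.obj$data.b)
range(split.obj$data.a)   # with cross-over > 0, both groups span the full data range
range(split.obj$data.b)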
After fitting the distribution, it is often desirable to see whether the moments of the data match those of the fitted distribution. This function computes the theoretical and actual moments, especially for the output of the fun.data.fit.ml function.
fun.comp.moments.ml(theo.obj, data, name = "ML")
theo.obj |
Fitted distribution parameters, usually output from fun.data.fit.ml. |
data |
Data set used |
name |
Naming the method used in fitting the distribution, by default this is "ML". |
r.mat |
A matrix showing the mean, variance, skewness and kurtosis of the fitted distribution in comparison to the data set. |
eval.mat |
Absolute difference in each of the four moments from the data under each of the distributional fits. |
Sometimes it is difficult to find RPRS type fits to a data set, so fun.comp.moments.ml.2 is used instead to compare the theoretical moments of the RMFMKL.ML and STAR methods with respect to the fitted data set.
Steve Su
# Generate random normally distributed observations.
junk<-rnorm(1000,3,2)
# Fit the dataset using fun.data.fit.ml
fit<-fun.data.fit.ml(junk)
# Compare the resulting fits. It is usually the case that the maximum likelihood
# method provides better estimation of the moments than the starship method.
fun.comp.moments.ml(fit,junk)
After fitting the distribution, it is often desirable to see whether the moments of the data match those of the fitted distribution. This function computes the theoretical and actual moments for the FMKL GLD maximum likelihood estimation and starship method.
fun.comp.moments.ml.2(theo.obj, data, name = "ML")
theo.obj |
Fitted distribution parameters, there should be two sets, both FMKL GLD. |
data |
Data set used |
name |
Naming the method used in fitting the distribution, by default this is "ML". |
r.mat |
A matrix showing the mean, variance, skewness and kurtosis of the fitted distribution in comparison to the data set. |
eval.mat |
Absolute difference in each of the four moments from the data under each of the distributional fits. |
To compare all three fits under fun.data.fit.ml, see the fun.comp.moments.ml function.
Steve Su
## Generate random normally distributed observations.
junk<-rnorm(1000,3,2)
## Fit the dataset using fun.RMFMKL.ml and the starship method
fit<-cbind(fun.RMFMKL.ml(junk),starship(junk)$lambda)
## Compare the resulting fits. It is usually the case that the maximum likelihood
## method provides better estimation of the moments than the starship method.
fun.comp.moments.ml.2(fit,junk)
This function fits RS and FMKL generalised lambda distributions to data using the discretised approach with weights. It is designed to act as a smoothing device rather than a definitive fit.
fun.data.fit.hs(data, rs.default = "Y", fmkl.default = "Y", rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5), no.c.rs = 50, no.c.fmkl = 50,FUN="runif.sobol", no=10000)
data |
Dataset to be fitted |
rs.default |
If "Y", this function uses the default method of selecting the number of classes (see Details) for the RS GLD fit. |
fmkl.default |
If "Y", this function uses the default method of selecting the number of classes (see Details) for the FMKL GLD fit. |
rs.leap |
See the scrambling argument in runif.sobol; applies to the RS GLD fit. |
fmkl.leap |
See the scrambling argument in runif.sobol; applies to the FMKL GLD fit. |
rs.init |
Initial values (lambda3 and lambda4) for the RS distribution optimisation; the default is c(-1.5, 1.5). |
fmkl.init |
Initial values (lambda3 and lambda4) for the FMKL distribution optimisation; the default is c(-0.25, 1.5). |
no.c.rs |
Number of classes or bins of the histogram to be optimised over for the RS GLD. This argument is ineffective if rs.default = "Y". |
no.c.fmkl |
Number of classes or bins of the histogram to be optimised over for the FMKL GLD. This argument is ineffective if fmkl.default = "Y". |
FUN |
A character string naming the quasi-random number generator used to find initial values, e.g. "runif.sobol" (default) or "QUnif". |
no |
Number of initial random values to find the best initial values for optimisation. |
This function optimises the deviations of the bin frequencies from the theoretical frequencies, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (such as satisfying a goodness of fit test), but it can be useful in suggesting the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data. The weighting is designed to accentuate the peak or the dense part of the distribution and suppress the tails.
A matrix showing the four parameters of the RS and FMKL distribution fit.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence. The RPRS method can sometimes fail if there are no valid percentiles in the data set or if the initial values do not give a valid distribution.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.hs.nw, fun.RMFMKL.hs.nw, fun.RMFMKL.hs, fun.RPRS.hs, fun.data.fit.hs.nw, fun.data.fit.ml
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(1000,3,2)
fun.data.fit.hs(junk)
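A sketch comparing the weighted and unweighted discretised fits side by side, assuming simulated data as in the example above:
# Sketch: weighted versus unweighted discretised fits on the same data
junk <- rnorm(1000, 3, 2)
fit.w  <- fun.data.fit.hs(junk)       # weighted discretised fit
fit.nw <- fun.data.fit.hs.nw(junk)    # unweighted discretised fit
fun.plot.fit(fit.w, junk, nclass = 50, param = c("rs", "fmkl"), xlab = "x", name = ".HS")
fun.plot.fit(fit.nw, junk, nclass = 50, param = c("rs", "fmkl"), xlab = "x", name = ".HS.NW")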
This function fits RS and FMKL generalised lambda distributions to data using the discretised approach without weights. It is designed to act as a smoothing device rather than a definitive fit.
fun.data.fit.hs.nw(data, rs.default = "Y", fmkl.default = "Y", rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5), no.c.rs = 50, no.c.fmkl = 50,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
rs.default |
If yes, this function uses the default method
|
fmkl.default |
If yes, this function uses the default method
|
rs.leap |
See scrambling argument in |
fmkl.leap |
See scrambling argument in |
rs.init |
Initial values for RS distribution optimization,
|
fmkl.init |
Initial values for FMKL distribution optimization,
|
no.c.rs |
Number of classes or bins of histogram to be optimized over
for the RS GLD. This argument is ineffective if |
no.c.fmkl |
Number of classes or bins of histogram to be optimized over
for the FMKL GLD. This argument is ineffective if |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function minimises the deviations between the bin frequencies of the data and those of the theoretical distribution, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (for example, in terms of goodness of fit), but it can be useful to suggest the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data.
A matrix showing the four parameters of the RS and FMKL distribution fit.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence. The RPRS method can sometimes fail if there are no valid percentiles in the data set or if the initial values do not give a valid distribution.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.hs, fun.RMFMKL.hs, fun.RMFMKL.hs.nw, fun.RPRS.hs.nw, fun.data.fit.hs, fun.data.fit.ml
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(1000,3,2)
fun.data.fit.hs.nw(junk)
This function fits generalised lambda distributions to data using L moment matching method
fun.data.fit.lm(data, rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5), FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted. |
rs.leap |
See scrambling argument in |
fmkl.leap |
See scrambling argument in |
rs.init |
Inititial values (lambda3 and lambda4) for the RS generalised lambda distribution. |
fmkl.init |
Inititial values (lambda3 and lambda4) for the FMKL generalised lambda distribution. |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function consolidates fun.RPRS.lm and fun.RMFMKL.lm and gives all the fits in one output.
A matrix showing the parameters of RS and FMKL generalised lambda distributions.
Steve Su
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
fun.RPRS.qs, fun.RMFMKL.qs, fun.data.fit.ml, fun.data.fit.hs, fun.data.fit.hs.nw, fun.data.fit.qs, fun.data.fit.mm
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.lm(junk)
This function fits generalised lambda distributions to data using RPRS, RMFMKL and starship methods.
fun.data.fit.ml(data, rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5),FUN="runif.sobol",no=10000)
data |
Dataset to be fitted. |
rs.leap |
See scrambling argument in |
fmkl.leap |
See scrambling argument in |
rs.init |
Inititial values (lambda3 and lambda4) for the RS generalised lambda distribution. |
fmkl.init |
Inititial values (lambda3 and lambda4) for the FMKL generalised lambda distribution. |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function consolidates fun.RPRS.ml, fun.RMFMKL.ml and starship and gives all the fits in one output.
A matrix showing the parameters of the generalised lambda distribution for the RPRS, RMFMKL and STAR methods.
RPRS can sometimes fail if it is not possible to calculate the percentiles of the data set. This usually happens when the number of data points is small.
Steve Su
King, R.A.R. & MacGillivray, H. L. (1999), A starship method for fitting the generalised lambda distributions, Australian and New Zealand Journal of Statistics, 41, 353-374
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.ml, fun.RMFMKL.ml, starship, fun.data.fit.hs, fun.data.fit.hs.nw, fun.data.fit.qs, fun.data.fit.mm, fun.data.fit.lm
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.ml(junk)
This function fits generalised lambda distributions to data using moment matching method
fun.data.fit.mm(data, rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5), FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted. |
rs.leap |
See scrambling argument in |
fmkl.leap |
See scrambling argument in |
rs.init |
Inititial values (lambda3 and lambda4) for the RS generalised lambda distribution. |
fmkl.init |
Inititial values (lambda3 and lambda4) for the FMKL generalised lambda distribution. |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function consolidates fun.RPRS.mm and fun.RMFMKL.mm and gives all the fits in one output.
A matrix showing the parameters of RS and FMKL generalised lambda distributions.
RPRS can sometimes fail if it is not possible to calculate the percentiles of the data set. This usually happens when the number of data points is small.
Karian, Z. and E. Dudewicz (2000). Fitting Statistical Distributions: The Generalized Lambda Distribution and Generalised Bootstrap Methods. New York, Chapman and Hall.
fun.RPRS.qs, fun.RMFMKL.qs, fun.data.fit.ml, fun.data.fit.hs, fun.data.fit.hs.nw, fun.data.fit.qs, fun.data.fit.lm
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.mm(junk)
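Moment matching targets the first four sample moments, so it can be useful to keep them at hand for comparison; a minimal sketch using fun.moments.r (documented later in this manual):
# Sample mean, variance, skewness and kurtosis that the moment matching fit aims to reproduce.
junk<-rnorm(50,3,2)
fit.mm<-fun.data.fit.mm(junk)
fun.moments.r(junk)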
This function fits generalised lambda distributions to data using quantile matching method
fun.data.fit.qs(data, rs.leap = 3, fmkl.leap = 3, rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5), FUN = "runif.sobol", trial.n = 100, len = 1000, type = 7, no = 10000)
data |
Dataset to be fitted. |
rs.leap |
See scrambling argument in |
fmkl.leap |
See scrambling argument in |
rs.init |
Inititial values (lambda3 and lambda4) for the RS generalised lambda distribution. |
fmkl.init |
Inititial values (lambda3 and lambda4) for the FMKL generalised lambda distribution. |
FUN |
A character string of either |
trial.n |
Number of evenly spaced quantile ranging from 0 to 1 to be
used in the checking phase, to find the best set of initial values for
optimisation, this is intended to be lower than |
len |
Number of evenly spaced quantile ranging from 0 to 1 to be used, default is 1000 |
type |
Type of quantile to be used, default is 7, see |
no |
Number of initial random values to find the best initial values for optimisation. |
This function consolidates fun.RPRS.qs and fun.RMFMKL.qs and gives all the fits in one output.
A matrix showing the parameters of RS and FMKL generalised lambda distributions.
RPRS can sometimes fail if it is not possible to calculate the percentiles of the data set. This usually happens when the number of data points is small.
Steve Su
Su (2008). Fitting GLD to data via quantile matching method. (Book chapter to appear)
fun.RPRS.qs, fun.RMFMKL.qs, fun.data.fit.ml, fun.data.fit.hs, fun.data.fit.hs.nw, fun.data.fit.mm, fun.data.fit.lm
# Fitting normal(3,2) distribution using the default setting
junk<-rnorm(50,3,2)
fun.data.fit.qs(junk)
This function counts the number of times the p-value exceeds 0.05 for the null hypothesis that observations simulated from the fitted distribution and observations sampled from the unimodal data set come from the same distribution.
fun.diag.ks.g(result, data, no.test = 1000, len = floor(0.9 * length(data)), param, alpha = 0.05)
result |
A vector representing the four parameters of the generalised lambda distribution. |
data |
The unimodal dataset. |
no.test |
Total number of tests required. |
len |
Number of data to sample. |
param |
Type of the generalised lambda distribution, |
alpha |
Significance level of KS test. |
A numerical value representing the number of times the p-value exceeds alpha.
If there are ties, jittering is used in ks.gof.
Steve Su
Stephens, M. A. (1986). Tests based on EDF statistics. In Goodness-of-Fit Techniques. D'Agostino, R. B. and Stevens, M. A., eds. New York: Marcel Dekker.
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational Statistics and Data Analysis: *51*, 8, 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
# Generate 1000 random observations from Normal distribution with mean=100,
# standard deviation=10. Save this as junk
junk<-rnorm(1000,100,10)
# Fit junk using the RPRS method via maximum likelihood.
fit1<-fun.RPRS.ml(junk, c(-1.5, 1.5), leap = 3)
# Calculate the simulated KS test result:
fun.diag.ks.g(fit1,junk,param="rs")
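The returned count is out of no.test resampling tests (1000 by default), so it can be read as an approximate pass rate; a minimal sketch reusing fit1 and junk from the example above:
# Hedged sketch: express the number of non-significant KS tests as a
# proportion of the default no.test = 1000 resampling tests.
ks.count<-fun.diag.ks.g(fit1,junk,param="rs")
ks.count/1000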
This function counts the number of times the p-value exceeds 0.05 for the null hypothesis that observations simulated from the fitted distribution and observations sampled from the bimodal data set come from the same distribution.
fun.diag.ks.g.bimodal(result1, result2, prop1, prop2, data, no.test = 1000, len = floor(0.9 * length(data)), param1, param2, alpha = 0.05)
result1 |
A vector representing the four parameters of the first generalised lambda distribution. |
result2 |
A vector representing the four parameters of the second generalised lambda distribution. |
prop1 |
Proportion of the first distribution fitted to the bimodal dataset. |
prop2 |
Proportion of the second distribution fitted to the bimodal dataset. |
data |
The bimodal dataset. |
no.test |
Total number of tests required. |
len |
Number of data to sample. |
param1 |
Type of first generalised lambda distribution, can be
|
param2 |
Type of second generalised lambda distribution, can be
|
alpha |
Significance level of KS test. |
A numerical value representing the number of times the p-value exceeds alpha.
If there are ties, jittering is used in ks.gof.
Steve Su
Stephens, M. A. (1986). Tests based on EDF statistics. In Goodness-of-Fit Techniques. D'Agostino, R. B. and Stevens, M. A., eds. New York: Marcel Dekker.
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational Statistics and Data Analysis: *51*, 8, 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
# Fit the faithful[,1] column of the Old Faithful geyser data
fit1<-fun.auto.bimodal.ml(faithful[,1],init1.sel="rprs",init2.sel="rmfmkl",
init1=c(-1.5,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
# Run diagnostic KS tests
fun.diag.ks.g.bimodal(fit1$par[1:4],fit1$par[5:8],prop1=fit1$par[9],
data=faithful[,1],param1="rs",param2="fmkl")
This function is primarily designed to be used for testing the fitted distribution with reference to a theoretical distribution. It is also tailored for output obtained from the fun.data.fit.ml function.
fun.diag1(result, test, no.test = 1000, alpha = 0.05)
result |
Output from |
test |
Simulated observations from theoretical distribution, the length
should be no.test |
no.test |
Number of times to do the KS tests. |
alpha |
Significance level of KS test. |
A vector showing the number of times the KS p-value is greater than alpha for each of the distribution fit strategy.
If there are ties, jittering is used in ks.gof.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Journal of Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.diag2, fun.diag.ks.g, fun.diag.ks.g.bimodal
# Fits a Weibull(5,2) distribution:
weibull.approx.ml<-fun.data.fit.ml(rweibull(1000,5,2))
# Compute the resample K-S test results.
fun.diag1(weibull.approx.ml, rweibull(100000, 5, 2))
This function is primarily designed to be used for testing the fitted distribution with reference to empirical data. It is also tailored for output obtained from the fun.data.fit.ml function.
fun.diag2(result, data, no.test = 1000, len=100, alpha = 0.05)
result |
Output from |
data |
Observations in which the distribution was fitted upon. |
no.test |
Number of times to do the KS tests. |
len |
Number of observations to sample from the data. This is also the number of observations sampled from the fitted distribution in each KS test. |
alpha |
Significance level of KS test. |
A vector showing the number of times the KS p-value is greater than alpha for each of the distribution fit strategy.
If there are ties, jittering is used in ks.gof.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Journal of Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.diag1, fun.diag.ks.g, fun.diag.ks.g.bimodal
# Fits a Normal(3,2) distribution:
junk<-rnorm(1000,3,2)
fit<-fun.data.fit.ml(junk)
# Compute the resample K-S test results.
fun.diag2(fit,junk)
This function supplements fun.nclass.e and it is not intended to be used by the users directly.
fun.disc.estimation(x, nint)
x |
A vector of observations. |
nint |
Number of intervals to cut the vectors into. |
The function cuts the vector into evenly spaced categories and estimates the mean and variance of the actual data based on this categorisation.
Two numerical values, the first being the mean and the second being the variance.
Steve Su
## Cut up randomly normally distributed observations into 5 evenly spaced
## categories and estimate the mean and variance based on this categorisation.
junk<-rnorm(1000,3,2)
fun.disc.estimation(junk,5)
This function calls runif.sobol, runif.sobol.owen and runif.halton, essentially from the spacefillr package.
fun.gen.qrn(n, dimension, scrambling, FUN = "runif.sobol")
n |
Number to generate. |
dimension |
Number of dimensions. |
scrambling |
seed used, or leap as in the case of
|
FUN |
This can be |
A vector of values if dimension=1, otherwise a matrix of values between 0 and 1.
Steve Su
fun.gen.qrn(1000,5,3,"runif.sobol")
fun.gen.qrn(1000,5,409,"QUnif")
This function computes the first four L moments for both RS and FMKL generalised lambda distributions.
fun.lm.theo.gld(L1, L2, L3, L4, param)
L1 |
Lambda 1. Or c(Lambda 1,Lambda 2,Lambda 3,Lambda 4). |
L2 |
Lambda 2. |
L3 |
Lambda 3. |
L4 |
Lambda 4. |
param |
"rs" or "fmkl" or "fkml" |
A vector listing the first four L moments
Sometimes the theoretical moments may not exist; in those cases, NA is returned.
Steve Su
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
fun.lm.theo.gld(1, 2, 3, 4, "rs")
fun.lm.theo.gld(1, 2, 3, 4, "fmkl")
This is a generic function that can be used to find mean, variance, sum or other operations according to some index imposed on the matrix or vector.
fun.mApply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
X |
Matrix with n rows. |
INDEX |
Vector or list of vectors of length n. |
FUN |
Function to operate on submatrices of |
... |
Arguments to function. |
simplify |
Set as |
If FUN returns more than one number, fun.mApply returns a matrix with rows corresponding to unique values of INDEX.
Tony Plate
# Finding the row medians of a matrix (matrix(1:20,nrow=5))
fun.mApply(matrix(1:20,nrow=5),list(1:5),median)
This function checks the lowest and highest quantiles of the specified GLDs against the specified dataset
fun.minmax.check.gld(data, lambdas, param, lessequalmin = 1, greaterequalmax = 1)
data |
A vector of numerical dataset |
lambdas |
A matrix of four columns representing lambda 1 to lambda 4 of the GLD |
param |
Can be "rs", "fkml" or "fmkl" |
lessequalmin |
Can be 0 or 1 |
greaterequalmax |
Can be 0 or 1 |
lessequalmin==1 means the lowest value of the GLD <= the minimum value of the data
lessequalmin==0 means the lowest value of the GLD < the minimum value of the data
greaterequalmax==1 means the highest value of the GLD >= the maximum value of the data
greaterequalmax==0 means the highest value of the GLD > the maximum value of the data
A vector of logical values indicating whether the specified GLDs cover the minimum and maximum values of the specified dataset.
Steve Su
fun.minmax.check.gld(runif(100,.9,1),matrix(1:12,ncol=4),param="rs",0,0)
fun.minmax.check.gld(runif(100,.98,1),matrix(1:12,ncol=4),param="fkml",1,1)
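A hedged sketch of a more typical use, assuming the vector returned by fun.RPRS.ml is ordered lambda 1 to lambda 4 (as in its Value section later in this manual): check whether a fitted RS GLD covers the range of the data it was fitted to, binding the fitted vector into the required four-column matrix.
# Does the fitted RS GLD span the minimum and maximum of the observed data?
junk<-rnorm(1000,3,2)
fit.rs<-fun.RPRS.ml(junk,rs.init=c(-1.5,1.5),leap=3)
fun.minmax.check.gld(junk,rbind(fit.rs),param="rs")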
This function computes the mean, variance, skewness and kurtosis of the fitted generalised lambda distribution mixtures using Monte Carlo simulation.
fun.moments.bimodal(result1, result2, prop1, prop2, len = 1000, no.test = 1000, param1, param2)
result1 |
A vector comprising four values for the first generalised lambda distribution. |
result2 |
A vector comprising four values for the second generalised lambda distribution. |
prop1 |
Proportion of the first generalised lambda distribution |
prop2 |
1-prop1, this can be left unspecified. |
len |
Length of object for each simulation run. |
no.test |
Number of simulation run. |
param1 |
This can be |
param2 |
This can be |
There is also a theoretical computation of the moments in fun.theo.bi.mv.gld; it should be noted that the theoretical moments may not exist. The len argument specifies how many observations should be generated in each simulation run, with the number of simulation runs governed by no.test.
A matrix with four columns showing the mean, variance, skewness and kurtosis of the fitted generalised lambda distribution mixtures using Monte Carlo simulation. Each row represents a simulation run.
Steve Su
fun.theo.bi.mv.gld, fun.simu.bimodal, fun.rawmoments
# Fitting the first column of the Old Faithful Geyser data
fit1<-fun.auto.bimodal.ml(faithful[,1],init1.sel="rmfmkl",init2.sel="rmfmkl",
init1=c(-0.25,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
# After fitting, compute the Monte Carlo moments using fun.moments.bimodal
fun.moments.bimodal(fit1$par[1:4],fit1$par[5:8],prop1=fit1$par[9],
param1="fmkl",param2="fmkl")
# It is also possible to compare this with the moments of the original dataset:
fun.moments(faithful[,1])
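Each row of the returned matrix is one simulation run, so the Monte Carlo estimates can be summarised by averaging over rows; a minimal sketch reusing fit1 from the example above:
# Average the mean, variance, skewness and kurtosis over all simulation runs.
sim.moments<-fun.moments.bimodal(fit1$par[1:4],fit1$par[5:8],prop1=fit1$par[9],
param1="fmkl",param2="fmkl")
colMeans(sim.moments)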
This function evaluates the mean, variance, skewness and kurtosis of a numerical vector. Missing values are automatically removed.
fun.moments.r(x, normalise = "N")
x |
A numeric vector |
normalise |
"Y" if you want kurtosis to be calculated with reference to kurtosis = 0 under Normal distribution. Default is "N". |
A vector of mean, variance, skewness and kurtosis.
Please contact the author directly if you find a bug!
Steve Su
fun.moments.r(rnorm(1000))
fun.moments.r(rnorm(1000),normalise="Y")
Support function for discretised method of fitting distribution to data.
fun.nclass.e(x)
x |
Vector of data. |
This function calculates the mean and variance of the discretised data for candidate numbers of classes ranging from 1 up to the number of observations, and chooses the number of classes that best represents the mean and variance of the actual data set through the criterion of squared deviations.
A numerical value suggesting the best number of classes that can be used to represent the mean and variance of the original data set.
This is not designed to be called directly by the end user.
Steve Su
fun.nclass.e(rnorm(100,3,2))
This function is designed for univariate generalised lambda distribution fits only.
fun.plot.fit(fit.obj, data, nclass = 50, xlab = "", name = "", param.vec, ylab="Density", main="")
fit.obj |
Fitted object from |
data |
Dataset to be plotted. |
nclass |
Number of class of histogram, the default is 50. |
xlab |
Label on the x axis. |
name |
Naming the type of distribution fits. |
param.vec |
A vector describing the type of generalised lambda
distribution used in the |
ylab |
Label on the y axis. |
main |
Title of the graph. |
A graphical output showing the data and the resulting distributional fits.
If the distribution fit over-fits the peak of the distribution, it can be difficult to see the actual data set.
Steve Su
fun.plot.fit.bm, fun.data.fit.ml, fun.data.fit.hs, fun.data.fit.hs.nw, fun.RPRS.ml, fun.RMFMKL.ml, fun.RPRS.hs, fun.RMFMKL.hs, fun.RPRS.hs.nw, fun.RMFMKL.hs.nw
# Generate normally distributed random numbers as the dataset
junk<-rnorm(1000,3,2)
# Fit the data set using fun.data.fit.ml.
# Also, fun.data.fit.hs or fun.data.fit.hs.nw can be used.
obj.fit<-fun.data.fit.ml(junk)
# Plot the resulting fits
fun.plot.fit(obj.fit,junk,xlab="x",name=".ML",param.vec=c("rs","fmkl","fmkl"))
# This function also works for singular fits such as those by fun.RPRS.ml,
# fun.RMFMKL.ml, fun.RPRS.hs, fun.RMFMKL.hs, fun.RPRS.hs.nw, fun.RMFMKL.hs.nw
junk<-rnorm(1000,3,2)
obj.fit<-fun.RPRS.ml(junk)
fun.plot.fit(obj.fit,junk,xlab="x",name="RPRS.ML",param.vec=c("rs"))
This function is designed for mixture of two generalised lambda distributions only.
fun.plot.fit.bm(fit.obj, data, nclass = 50, xlab = "", name = "", main="", param.vec, ylab="Density")
fit.obj |
Fitted object from |
data |
Dataset to be plotted. |
nclass |
Number of class of histogram, the default is 50. |
xlab |
Label on the x axis. |
name |
Legend, usually used to identify type of GLD used if |
main |
Title of the graph. |
param.vec |
A vector describing the type of generalised lambda
distribution used in the |
ylab |
Label on the y axis. |
A graphical output showing the data and the resulting distributional fits.
If the distribution fit over-fits the peak of the distribution, it can be difficult to see the actual data set.
Steve Su
fun.auto.bimodal.ml, fun.auto.bimodal.pml, fun.plot.fit
opar <- par()
par(mfrow=c(2,1))
# Fitting mixture of generalised lambda distributions on the data set using
# both the maximum likelihood and partition maximum likelihood and plot
# the resulting fits
junk<-fun.auto.bimodal.ml(faithful[,1],per.of.mix=0.1,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
name="Maximum likelihood using",xlab="faithful1",param.vec=c("rs","fmkl"))
junk<-fun.auto.bimodal.pml(faithful[,1],clustering.m=clara,init1.sel="rprs",
init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
name="Partition maximum likelihood using",xlab="faithful1",
param.vec=c("rs","fmkl"))
junk<-fun.auto.bimodal.ml(faithful[,1],per.of.mix=0.1,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
main="Mixture distribution fit",
name="RS and FMKL GLD",xlab="faithful1",param.vec=c("rs","fmkl"))
par(opar)
This is a variant of the fun.plot.fit function.
fun.plot.many.gld(fit.obj, data, xlab="", ylab="Density", main="", legd="", param.vec)
fit.obj |
A matrix of generalised lambda distribution parameters
from |
data |
Dataset to be plotted or two values showing the ranges of value to be compared. |
xlab |
X-axis labels. |
ylab |
Y-axis labels. |
main |
Title for the plot. |
legd |
Legend for the plot. |
param.vec |
A vector showing the types of generalised lambda distributions. This can be "rs" or "fmkl", only needed if you want to put your own parameters for generalised lambda distributions which are not generated from a fitting algorithm in this package. |
A graph showing the different distributions on the same page.
The data part of the function is not plotted; to see the dataset, use the fun.plot.fit function.
Steve Su
# Fit the dataset
junk<-rnorm(1000,3,2)
result.hs<-fun.data.fit.hs(junk,rs.default = "Y", fmkl.default = "Y",
rs.leap=3, fmkl.leap=3,rs.init = c(-1.5, 1.5), fmkl.init = c(-0.25, 1.5),
no.c.rs=50,no.c.fmkl=50)
opar <- par()
par(mfrow=c(2,2))
# Plot the entire data range
fun.plot.many.gld(result.hs,junk,"x","density","",
legd=c("RPRS.hs", "RMFMKL.hs"))
# Plot and compare parts of the distributions
fun.plot.many.gld(result.hs,c(1,2),"x","density","",
legd=c("RPRS.hs", "RMFMKL.hs"))
fun.plot.many.gld(result.hs,c(0.1,0.2),"x","density","",
legd=c("RPRS.hs", "RMFMKL.hs"))
fun.plot.many.gld(result.hs,c(3,4),"x","density","",
legd=c("RPRS.hs", "RMFMKL.hs"))
par(opar)
This function is of theoretical interest only.
fun.rawmoments(L1, L2, L3, L4, param = "fmkl")
L1 |
Location parameter of the generalised lambda distribution. |
L2 |
Scale parameter of the generalised lambda distribution. |
L3 |
First shape parameter of the generalised lambda distribution. |
L4 |
Second shape parameter of the generalised lambda distribution. |
param |
|
This function is the building block for fun.theo.bi.mv.gld.
A vector showing the raw moments of the specified generalised lambda distribution up to the fourth order.
Steve Su
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Karian, Zaven A. and Dudewicz, Edward J. (2000), Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods, Chapman & Hall
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
## Generate some random numbers using FMKL and RS generalised lambda
## distributions and then compute the empirical and theoretical
## E(X), E(X^2), E(X^3), E(X^4)
junk<-rgl(100000,1,2,3,4)
mean(junk)
mean(junk^2)
mean(junk^3)
mean(junk^4)
junk<-rgl(100000,1,2,3,4,"rs")
mean(junk)
mean(junk^2)
mean(junk^3)
mean(junk^4)
fun.rawmoments(1,2,3,4)
fun.rawmoments(1,2,3,4,"rs")
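Since the output lists the raw moments E(X), E(X^2), E(X^3), E(X^4) in that order (as in the example comment above), central moments follow from the usual identities; a hedged sketch:
# Variance from raw moments: Var(X) = E(X^2) - E(X)^2
# (assumes the first two elements of the output are E(X) and E(X^2)).
rm.fmkl<-fun.rawmoments(1,2,3,4)
rm.fmkl[2]-rm.fmkl[1]^2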
This function fits the FMKL generalised lambda distribution to data using a discretised approach with weights. It is designed to act as a smoothing device rather than as a definitive fit.
fun.RMFMKL.hs(data, default = "Y", fmkl.init = c(-0.25, 1.5), no.c.fmkl = 50, leap = 3,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
default |
If yes, this function uses the default method
|
fmkl.init |
Initial values for FMKL distribution optimization,
|
no.c.fmkl |
Number of classes or bins of histogram to be optimized over.
This argument is ineffective if |
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function minimises the deviations between the bin frequencies of the data and those of the theoretical distribution, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (for example, in terms of goodness of fit), but it can be useful to suggest the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data. The weighting is designed to accentuate the peak or the dense part of the distribution and suppress the tails.
A vector representing the four parameters of the FMKL generalised lambda distribution.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RMFMKL.hs.nw, fun.RPRS.hs.nw, fun.RPRS.hs, fun.data.fit.hs, fun.data.fit.hs.nw
# Using the default number of classes
fun.RMFMKL.hs(data=rnorm(1000,3,2),default="Y",fmkl.init=c(-0.25,1.5),leap=3)
# Using 20 classes
fun.RMFMKL.hs(data=rnorm(1000,3,2),default="N",fmkl.init=c(-0.25,1.5),
no.c.fmkl=20,leap=3)
This function fits the FMKL generalised lambda distribution to data using a discretised approach without weights. It is designed to act as a smoothing device rather than as a definitive fit.
fun.RMFMKL.hs.nw(data, default = "Y", fmkl.init = c(-0.25, 1.5), no.c.fmkl = 50, leap = 3,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
default |
If yes, this function uses the default method
|
fmkl.init |
Initial values for FMKL distribution optimization,
|
no.c.fmkl |
Number of classes or bins of histogram to be optimized over.
This argument is ineffective if |
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function minimises the deviations between the bin frequencies of the data and those of the theoretical distribution, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (for example, in terms of goodness of fit), but it can be useful to suggest the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data.
A vector representing four parameters of the FMKL generalised lambda distribution.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.hs.nw, fun.RMFMKL.hs, fun.RPRS.hs, fun.data.fit.hs, fun.data.fit.hs.nw
# Using the default number of classes
fun.RMFMKL.hs.nw(data=rnorm(1000,3,2),default="Y",
fmkl.init=c(-0.25,1.5),leap=3)
# Using 20 classes
fun.RMFMKL.hs.nw(data=rnorm(1000,6,5),default="N",fmkl.init=c(-0.25,1.5),
no.c.fmkl=20,leap=3)
This function fits FMKL generalised lambda distribution to data set using L moment matching
fun.RMFMKL.lm(data, fmkl.init = c(-0.25, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted |
fmkl.init |
Initial values for FMKL distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides an L moment matching scheme for the FMKL GLD. Note this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid FMKL generalised lambda distribution.
This function is based on scheme detailed in the literature below but it has been modified by the author (Steve Su).
A vector representing the four parameters of the FMKL generalised lambda distribution.
Steve Su
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
fun.RMFMKL.ml, fun.RMFMKL.mm, fun.RMFMKL.qs, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RMFMKL.lm(data=rnorm(1000,2,3),fmkl.init=c(-0.25,1.5),leap=3)
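A hedged check of the L moment fit: feed the fitted parameter vector back into fun.lm.theo.gld (which, per its documentation earlier in this manual, accepts the full lambda vector as its first argument) to recover the theoretical L moments implied by the fit.
# Fit by L moment matching, then compute the theoretical L moments of the fit.
junk<-rnorm(1000,2,3)
fit.lm<-fun.RMFMKL.lm(junk,fmkl.init=c(-0.25,1.5),leap=3)
fun.lm.theo.gld(fit.lm,param="fmkl")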
This function fits FMKL generalised lambda distribution to data set using maximum likelihood estimation.
fun.RMFMKL.ml(data, fmkl.init = c(-0.25, 1.5), leap = 3,FUN="runif.sobol", no=10000)
data |
Dataset to be fitted |
fmkl.init |
Initial values for FMKL distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides one of the definitive fits to the data set using generalised lambda distributions.
A vector representing four parameters of the FMKL generalised lambda distribution.
Steve Su
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Journal of Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.ml, fun.data.fit.ml, fun.data.fit.qs
# Fitting the normal distribution
fun.RMFMKL.ml(data=rnorm(1000,2,3),fmkl.init=c(-0.25,1.5),leap=3)
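The maximum likelihood fit can also be inspected visually with fun.plot.fit (documented earlier in this manual), which accepts single fits; a minimal sketch:
# Fit by maximum likelihood and overlay the fitted FMKL GLD on a histogram of the data.
junk<-rnorm(1000,2,3)
fit.ml<-fun.RMFMKL.ml(junk,fmkl.init=c(-0.25,1.5),leap=3)
fun.plot.fit(fit.ml,junk,xlab="x",name="RMFMKL.ML",param.vec=c("fmkl"))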
This function fits the FMKL generalised lambda distribution to a data set using maximum likelihood estimation, with a faster implementation through C programming.
fun.RMFMKL.ml.m(data, fmkl.init = c(-0.25, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted |
fmkl.init |
Initial values for FMKL distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides one of the definitive fits to the data set using generalised lambda distributions.
A vector representing four parameters of the FMKL generalised lambda distribution.
Steve Su
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Journal of Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
# Fitting the normal distribution
fun.RMFMKL.ml.m(data=rnorm(1000,2,3),fmkl.init=c(-0.25,1.5),leap=3)
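Since this routine is described as a faster C implementation of fun.RMFMKL.ml, a hedged sketch of a simple timing comparison on the same simulated data:
# Compare run times of the R-based and C-based maximum likelihood fits.
junk<-rnorm(1000,2,3)
system.time(fun.RMFMKL.ml(junk,fmkl.init=c(-0.25,1.5),leap=3))
system.time(fun.RMFMKL.ml.m(junk,fmkl.init=c(-0.25,1.5),leap=3))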
This function fits FMKL generalised lambda distribution to data set using moment matching
fun.RMFMKL.mm(data, fmkl.init = c(-0.25, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted |
fmkl.init |
Initial values for FMKL distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides a method of moments fitting scheme for the FMKL GLD. Note this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid FMKL generalised lambda distribution.
This function is based on scheme detailed in the literature below but it has been modified by the author (Steve Su).
A vector representing the four parameters of the FMKL generalised lambda distribution.
Steve Su
Karian, Z. and E. Dudewicz (2000). Fitting Statistical Distributions: The Generalized Lambda Distribution and Generalised Bootstrap Methods. New York, Chapman and Hall.
fun.RMFMKL.ml, fun.RMFMKL.lm, fun.RMFMKL.qs, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RMFMKL.mm(data=rnorm(1000,2,3),fmkl.init=c(-0.25,1.5),leap=3)
This function fits FMKL generalised lambda distribution to data set using quantile matching
fun.RMFMKL.qs(data, fmkl.init = c(-0.25, 1.5), leap = 3, FUN = "runif.sobol", trial.n = 100, len = 1000, type = 7, no = 10000)
data |
Dataset to be fitted |
fmkl.init |
Initial values for FMKL distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
trial.n |
Number of evenly spaced quantile ranging from 0 to 1 to be
used in the checking phase, to find the best set of initial values for
optimisation, this is intended to be lower than |
len |
Number of evenly spaced quantile ranging from 0 to 1 to be used, default is 1000 |
type |
Type of quantile to be used, default is 7, see |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides a quantile matching scheme for the FMKL GLD. Note this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid FMKL generalised lambda distribution.
A vector representing the four parameters of the FMKL generalised lambda distribution.
Steve Su
Su (2008). Fitting GLD to data via quantile matching method. (Book chapter to appear)
fun.RMFMKL.ml, fun.RMFMKL.lm, fun.RMFMKL.mm, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RMFMKL.qs(data=rnorm(1000,2,3),fmkl.init=c(-0.25,1.5),leap=3)
This function fits the RS generalised lambda distribution to data using a discretised approach with weights. It is designed to act as a smoothing device rather than as a definitive fit.
fun.RPRS.hs(data, default = "Y", rs.init = c(-1.5, 1.5), no.c.rs = 50, leap = 3,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
default |
If yes, this function uses the default method
|
rs.init |
Initial values for RS distribution optimization,
|
no.c.rs |
Number of classes or bins of histogram to be optimized over.
This argument is ineffective if |
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function minimises the deviations between the bin frequencies of the data and those of the theoretical distribution, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (for example, in terms of goodness of fit), but it can be useful to suggest the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data. The weighting is designed to accentuate the peak or the dense part of the distribution and suppress the tails.
A vector representing four parameters of the RS generalised lambda distribution.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence. The RPRS method can sometimes fail if there are no valid percentiles in the data set or if the initial values do not give a valid distribution.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.hs.nw, fun.RMFMKL.hs.nw, fun.RMFMKL.hs, fun.data.fit.hs, fun.data.fit.hs.nw
# Using the default number of classes
fun.RPRS.hs(data=rnorm(1000,2,3),default="Y",rs.init=c(-1.5,1.5),leap=3)
# Using 20 classes
fun.RPRS.hs(data=rnorm(1000,2,3),default="N",rs.init=c(-1.5,1.5),
no.c.rs=20,leap=3)
This function fits the RS generalised lambda distribution to data using a discretised approach without weights. It is designed to act as a smoothing device rather than as a definitive fit.
fun.RPRS.hs.nw(data, default = "Y", rs.init = c(-1.5, 1.5), no.c.rs = 50, leap = 3,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
default |
If yes, this function uses the default method
|
rs.init |
Initial values for RS distribution optimization,
|
no.c.rs |
Number of classes or bins of histogram to be optimized over.
This argument is ineffective if |
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function minimises the deviations between the bin frequencies of the data and those of the theoretical distribution, so it has the effect of "fitting clothes" onto the data set. The user can decide the frequency of the bins they want the distribution to smooth over. The resulting fit may or may not be adequate from a formal statistical point of view (for example, in terms of goodness of fit), but it can be useful to suggest the range of different distributions exhibited by the data set. The default number of classes calculates the mean and variance after categorising the data into different bins and uses the number of classes that best matches the mean and variance of the original, ungrouped data.
A vector representing four parameters of the RS generalised lambda distribution.
In some cases the resulting fit may not converge; there is currently no checking mechanism in place to ensure global convergence. The RPRS method can sometimes fail if there are no valid percentiles in the data set or if the initial values do not give a valid distribution.
Steve Su
Su, S. (2005). A Discretized Approach to Flexibly Fit Generalized Lambda Distributions to Data. Journal of Modern Applied Statistical Methods (November): 408-424.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RPRS.hs.nw, fun.RPRS.hs, fun.RMFMKL.hs, fun.data.fit.hs, fun.data.fit.hs.nw
# Using the default number of classes
fun.RPRS.hs.nw(data=rnorm(1000,3,2),default="Y",rs.init=c(-1.5,1.5),leap=3)
# Using 20 classes
fun.RPRS.hs.nw(data=rnorm(1000,3,2),default="N",rs.init=c(-1.5,1.5),
no.c.rs=20,leap=3)
This function fits RS generalised lambda distribution to data set using L moment matching
fun.RPRS.lm(data, rs.init = c(-1.5, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data |
Dataset to be fitted |
rs.init |
Initial values for RS distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides an L moment matching scheme for the RS GLD. Note this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid RS generalised lambda distribution.
This function is based on scheme detailed in the literature below but it has been modified by the author (Steve Su).
A vector representing four parameters of the RS generalised lambda distribution.
Steve Su
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
fun.RPRS.ml, fun.RPRS.mm, fun.RPRS.qs, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RPRS.lm(data=rnorm(1000,2,3),rs.init=c(-1.5,1.5),leap=3)
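As with the FMKL version, a hedged check is to feed the fitted RS parameter vector back into fun.lm.theo.gld to obtain the theoretical L moments implied by the fit.
# Fit by L moment matching, then compute the theoretical L moments of the RS fit.
junk<-rnorm(1000,2,3)
fit.lm<-fun.RPRS.lm(junk,rs.init=c(-1.5,1.5),leap=3)
fun.lm.theo.gld(fit.lm,param="rs")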
This function fits RS generalised lambda distribution to data set using maximum likelihood estimation.
fun.RPRS.ml(data, rs.init = c(-1.5, 1.5), leap = 3,FUN="runif.sobol",no=10000)
data |
Dataset to be fitted |
rs.init |
Initial values for RS distribution optimization,
|
leap |
See scrambling argument in |
FUN |
A character string of either |
no |
Number of initial random values to find the best initial values for optimisation. |
This function provides one of the definitive fits to the data set using generalised lambda distributions. Note this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid RS generalised lambda distribution.
A vector representing four parameters of the RS generalised lambda distribution.
Steve Su
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
fun.RMFMKL.ml, fun.data.fit.ml, fun.data.fit.qs
# Fitting the normal distribution
fun.RPRS.ml(data=rnorm(1000,2,3),rs.init=c(-1.5,1.5),leap=3)
This function fits the RS generalised lambda distribution to a data set using maximum likelihood estimation, with a faster implementation through C programming.
fun.RPRS.ml.m(data, rs.init = c(-1.5, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data: Dataset to be fitted.
rs.init: Initial values for the RS distribution optimisation; the default is c(-1.5, 1.5).
leap: See the scrambling argument of runif.sobol and runif.halton.
FUN: A character string, either "runif.sobol" (the default) or "runif.halton".
no: Number of initial random values used to find the best initial values for the optimisation.
This function provides one of the definitive fits to the data set using generalised lambda distributions. Note that this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid RS generalised lambda distribution.
A vector representing four parameters of the RS generalised lambda distribution.
Steve Su
Su, S. (2007). Numerical Maximum Log Likelihood Estimation for Generalized Lambda Distributions. Computational statistics and data analysis 51(8) 3983-3998.
Su (2007). Fitting Single and Mixture of Generalized Lambda Distributions to Data via Discretized and Maximum Likelihood Methods: GLDEX in R. Journal of Statistical Software: *21* 9.
# Fitting the normal distribution
fun.RPRS.ml.m(data=rnorm(1000,2,3),rs.init=c(-1.5,1.5),leap=3)
This function fits the RS generalised lambda distribution to a data set using moment matching.
fun.RPRS.mm(data, rs.init = c(-1.5, 1.5), leap = 3, FUN = "runif.sobol", no = 10000)
data: Dataset to be fitted.
rs.init: Initial values for the RS distribution optimisation; the default is c(-1.5, 1.5).
leap: See the scrambling argument of runif.sobol and runif.halton.
FUN: A character string, either "runif.sobol" (the default) or "runif.halton".
no: Number of initial random values used to find the best initial values for the optimisation.
This function provides a method of moments fitting scheme for the RS GLD. Note that this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid RS generalised lambda distribution.
This function is based on the scheme detailed in the literature below, but it has been modified by the author (Steve Su).
A vector representing four parameters of the RS generalised lambda distribution.
Steve Su
Karian, Z. and E. Dudewicz (2000). Fitting Statistical Distributions: The Generalized Lambda Distribution and Generalised Bootstrap Methods. New York, Chapman and Hall.
fun.RPRS.ml, fun.RPRS.lm, fun.RPRS.qs, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RPRS.mm(data=rnorm(1000,2,3),rs.init=c(-1.5,1.5),leap=3)
This function fits the RS generalised lambda distribution to a data set using quantile matching.
fun.RPRS.qs(data, rs.init = c(-1.5, 1.5), leap = 3, FUN = "runif.sobol", trial.n = 100, len = 1000, type = 7, no = 10000)
data: Dataset to be fitted.
rs.init: Initial values for the RS distribution optimisation; the default is c(-1.5, 1.5).
leap: See the scrambling argument of runif.sobol and runif.halton.
FUN: A character string, either "runif.sobol" (the default) or "runif.halton".
trial.n: Number of evenly spaced quantiles ranging from 0 to 1 used in the checking phase to find the best set of initial values for the optimisation; this is intended to be lower than len. Default is 100.
len: Number of evenly spaced quantiles ranging from 0 to 1 to be used; default is 1000.
type: Type of quantile to be used; default is 7. See the quantile function in R.
no: Number of initial random values used to find the best initial values for the optimisation.
This function provides a quantile matching fitting scheme for the RS GLD. Note that this function can fail if there are no defined percentiles from the data set or if the initial values do not lead to a valid RS generalised lambda distribution.
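To see how well a quantile matching fit tracks the empirical quantiles, the fitted parameters can be fed to qgl and compared with the sample quantiles; a minimal sketch (assuming GLDEX is loaded) using the qgl function documented later in this manual.

# Compare empirical and fitted quantiles after quantile matching
set.seed(789)
x <- rnorm(1000, 2, 3)
fit.qs <- fun.RPRS.qs(data = x, rs.init = c(-1.5, 1.5), leap = 3)
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
cbind(empirical = quantile(x, probs),
      fitted    = qgl(probs, fit.qs[1], fit.qs[2], fit.qs[3], fit.qs[4], param = "rs"))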
A vector representing four parameters of the RS generalised lambda distribution.
Steve Su
Su (2008). Fitting GLD to data via quantile matching method. (Book chapter to appear)
fun.RPRS.ml, fun.RPRS.lm, fun.RPRS.mm, fun.data.fit.ml, fun.data.fit.lm, fun.data.fit.qs, fun.data.fit.mm
# Fitting the normal distribution
fun.RPRS.qs(data=rnorm(1000,2,3),rs.init=c(-1.5,1.5),leap=3)
This function allows the user to simulate observations from a mixture of two generalised lambda distributions. It can be very useful for sensitivity analysis.
fun.simu.bimodal(result1, result2, prop1, prop2, len = 1000, no.test = 1000, param1, param2)
result1: A vector comprising four parameter values for the first generalised lambda distribution.
result2: A vector comprising four parameter values for the second generalised lambda distribution.
prop1: Proportion of the first generalised lambda distribution.
prop2: 1 - prop1; this can be left unspecified.
len: Number of observations generated in each simulation run.
no.test: Number of simulation runs.
param1: Parameterisation of the first distribution, "rs" or "fmkl".
param2: Parameterisation of the second distribution, "rs" or "fmkl".
len specifies how many observations are generated in each simulation run, and no.test governs the number of simulation runs.
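For a sensitivity analysis one typically summarises each simulated sample and compares the resulting sampling distribution with the corresponding theoretical value; a minimal sketch (assuming GLDEX is loaded), using the fun.theo.bi.mv.gld function documented below.

# Sampling distribution of the mean from a mixture of two FMKL GLDs
sim <- fun.simu.bimodal(c(1,2,3,4), c(4,3,2,1), prop1 = 0.5,
                        param1 = "fmkl", param2 = "fmkl",
                        len = 1000, no.test = 200)
# Mean of each simulated run
run.means <- sapply(sim, mean)
summary(run.means)
# Theoretical moments of the mixture (first element is the mean) for comparison
fun.theo.bi.mv.gld(1, 2, 3, 4, "fmkl", 4, 3, 2, 1, "fmkl", p1 = 0.5)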
A list with length equal to the number of simulation runs. Each element of the list contains the number of random observations specified in len.
Steve Su
fun.theo.bi.mv.gld, fun.moments.bimodal, fun.rawmoments
# Generate random observations from FMKL generalised lambda distributions with
# parameters (1,2,3,4) and (4,3,2,1) with 50% of data from each distribution.
junk<-fun.simu.bimodal(c(1,2,3,4),c(4,3,2,1),prop1=0.5,param1="fmkl",param2="fmkl")
# Calculate the maximum number from each simulation run
sapply(junk,max)
# Calculate the median from each simulation run
sapply(junk,median)
This is the bimodal counterpart for fun.comp.moments.ml.2 and fun.comp.moments.ml.
fun.theo.bi.mv.gld(L1, L2, L3, L4, param1, M1, M2, M3, M4, param2, p1, normalise="N")
L1: Location parameter of the first generalised lambda distribution. Alternatively, all the parameters of the mixture distribution in the form c(L1,L2,L3,L4,M1,M2,M3,M4,p); you must still specify param1 and param2.
L2: Scale parameter of the first generalised lambda distribution.
L3: First shape parameter of the first generalised lambda distribution.
L4: Second shape parameter of the first generalised lambda distribution.
param1: Parameterisation of the first distribution, "rs" or "fmkl".
M1: Location parameter of the second generalised lambda distribution.
M2: Scale parameter of the second generalised lambda distribution.
M3: First shape parameter of the second generalised lambda distribution.
M4: Second shape parameter of the second generalised lambda distribution.
param2: Parameterisation of the second distribution, "rs" or "fmkl".
p1: Proportion of the first generalised lambda distribution.
normalise: "Y" if you want kurtosis to be calculated with reference to kurtosis = 0 under the Normal distribution.
A vector showing the theoretical mean, variance, skewness and kurtosis for a mixture of two generalised lambda distributions.
The theoretical moments may not always exist for generalised lambda distributions.
Steve Su
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Karian, Zaven A. and Dudewicz, Edward J. (2000), Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods, Chapman & Hall
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
fun.moments.bimodal, fun.simu.bimodal, fun.rawmoments
# Fits the Old Faithful geyser data (first column) using the maximum likelihood.
fit1<-fun.auto.bimodal.ml(faithful[,1],init1.sel="rmfmkl",init2.sel="rmfmkl",
init1=c(-0.25,1.5),init2=c(-0.25,1.5),leap1=3,leap2=3)
# Find the theoretical moments of the fit
fun.theo.bi.mv.gld(fit1$par[1],fit1$par[2],fit1$par[3],fit1$par[4],"fmkl",
fit1$par[5],fit1$par[6],fit1$par[7],fit1$par[8],"fmkl",fit1$par[9])
# Compare this with the empirical moments from the data set.
fun.moments(faithful[,1])
Computes the "mean","variance","skewness","kurtosis" statistics from a given generalised lambda distribution.
fun.theo.mv.gld(L1, L2, L3, L4, param, normalise="N")
L1: Lambda 1, or c(Lambda 1, Lambda 2, Lambda 3, Lambda 4).
L2: Lambda 2.
L3: Lambda 3.
L4: Lambda 4.
param: "rs", "fmkl" or "fkml".
normalise: "Y" if you want kurtosis to be calculated with reference to kurtosis = 0 under the Normal distribution.
A vector listing the values of mean, variance, skewness and kurtosis. Sometimes the theoretical moments may not exist; in those cases, NA is returned.
Steve Su
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Gilchrist, Warren G. (2000), Statistical Modelling with Quantile Functions, Chapman & Hall
Karian, Z.A., Dudewicz, E.J., and McDonald, P. (1996), The extended generalized lambda distribution system for fitting distributions to data: history, completion of theory, tables, applications, the “Final Word” on Moment fits, Communications in Statistics - Simulation and Computation *25*, 611-642.
Karian, Zaven A. and Dudewicz, Edward J. (2000), Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods, Chapman & Hall
fun.comp.moments.ml, fun.comp.moments.ml.2, fun.lm.theo.gld
fun.theo.mv.gld(1, 2, 3, 4, "rs")
fun.theo.mv.gld(1, 2, 3, 4, "fmkl")
Returns an integer vector showing the position of zero values in the data.
fun.which.zero(data)
data |
A vector of data. |
An integer vector showing the position of zero values in the data.
Any missing values will be returned as missing.
Steve Su
# Finding where the zeros are in this vector: c(0,1,2,3,4,0,2)
fun.which.zero(c(0,1,2,3,4,0,2))
# Finding where the zeros are in this vector: c(0,1,2,3,NA,0,2)
fun.which.zero(c(0,1,2,3,NA,0,2))
This function returns a vector after removing all the zeros.
fun.zero.omit(object)
object |
A vector of data. |
Returns a vector after removing zeros, and also gives information on the number of zeros removed from the data.
Missing values and Inf values are not removed in this zero-removal process.
Steve Su
# Removing zero entries from the vector c(0,1,2,3,4,0,2)
fun.zero.omit(c(0,1,2,3,4,0,2))
An alternative to the gl.check.lambda function in the gld package.
gl.check.lambda.alt(l1, l2, l3, l4, param = "fmkl")
l1: Lambda 1.
l2: Lambda 2.
l3: Lambda 3.
l4: Lambda 4.
param: "rs" or "fmkl".
This version differs from gl.check.lambda in the gld library.
A logical value, TRUE or FALSE. TRUE indicates that the parameters given specify a valid probability distribution.
Steve Su
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Karian, Z.E., Dudewicz, E.J., and McDonald, P. (1996), The extended generalized lambda distribution system for fitting distributions to data: history, completion of theory, tables, applications, the “Final Word” on Moment fits, Communications in Statistics - Simulation and Computation *25*, 611-642.
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
gl.check.lambda.alt(0,1,.23,4.5,param="fmkl")   ## TRUE
gl.check.lambda.alt(0,-1,.23,4.5,param="fmkl")  ## FALSE
gl.check.lambda.alt(0,1,0.5,-0.5,param="rs")    ## FALSE
A replacement for the gl.check.lambda function in the gld package.
gl.check.lambda.alt1(l1, l2 = NULL, l3 = NULL, l4 = NULL, param = "fmkl", vect = FALSE)
l1: Lambda 1, or a vector of all the parameter values if vect = TRUE.
l2: Lambda 2.
l3: Lambda 3.
l4: Lambda 4.
param: "rs" or "fmkl".
vect: A logical; set this to TRUE if the parameter values are given as a single vector in l1.
This is a modified gl.check.lambda function, provided in place of the gld library's gl.check.lambda function, to allow for five-parameter FMKL distributions and vector input of parameter values.
A logical value, TRUE or FALSE. TRUE indicates that the parameters given specify a valid probability distribution.
Steve Su
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Karian, Z.E., Dudewicz, E.J., and McDonald, P. (1996), The extended generalized lambda distribution system for fitting distributions to data: history, completion of theory, tables, applications, the “Final Word” on Moment fits, Communications in Statistics - Simulation and Computation *25*, 611-642.
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
gl.check.lambda.alt1(c(0,1,.23,4.5),param="fmkl",vect=TRUE)  ## TRUE, using vector input of parameter values.
gl.check.lambda.alt1(0,-1,.23,4.5,param="fmkl")  ## FALSE
gl.check.lambda.alt1(0,1,0.5,-0.5,param="rs")    ## FALSE
Density, quantile density, distribution function, quantile function and random generation for the generalised lambda distribution (also known as the asymmetric lambda, or Tukey lambda). Works for both the "fmkl" and "rs" parameterisations. These functions originate from the gld library by Robert King and have been modified in this package to allow greater functionality and adaptability to new fitting methods. They do not give an error message for invalid distributions but return NAs instead. To allow comparability with the gld package, this package uses the same notation and description as those written by Robert King.
dgl(x, lambda1 = 0, lambda2 = NULL, lambda3 = NULL, lambda4 = NULL, param = "fmkl", inverse.eps = 1e-08, max.iterations = 500)
pgl(q, lambda1 = 0, lambda2 = NULL, lambda3 = NULL, lambda4 = NULL, param = "fmkl", inverse.eps = 1e-08, max.iterations = 500)
qgl(p, lambda1, lambda2 = NULL, lambda3 = NULL, lambda4 = NULL, param = "fmkl")
rgl(n, lambda1 = 0, lambda2 = NULL, lambda3 = NULL, lambda4 = NULL, param = "fmkl")
x: Vector of values (for dgl) at which the density is evaluated.
p: Vector of probabilities (for qgl).
q: Vector of quantiles (for pgl).
n: Number of observations to be generated (for rgl).
lambda1: This can be either a single numeric value or a vector. If it is a vector, it must contain all the parameters of the distribution (length 4 for the "fmkl" and "rs" parameterisations), in which case lambda2 to lambda4 are left unspecified.
lambda2: Scale parameter.
lambda3: First shape parameter.
lambda4: Second shape parameter.
param: "rs" or "fmkl".
inverse.eps: Accuracy of calculation for the numerical determination of F(x); defaults to 1e-8.
max.iterations: Maximum number of iterations in the numerical determination of F(x); defaults to 500.
The generalised lambda distribution, also known as the asymmetric lambda, or Tukey lambda distribution, is a distribution with a wide range of shapes. The distribution is defined by its quantile function, the inverse of the distribution function. This package implements the FMKL and RS parameterisations of the distribution. The default parameterisation (the FMKL) is that due to Freimer, Mudholkar, Kollia and Lin (1988) (see references below), with quantile function:

F^(-1)(u) = lambda1 + ( (u^lambda3 - 1)/lambda3 - ((1-u)^lambda4 - 1)/lambda4 ) / lambda2

for lambda2 > 0.

A second parameterisation, the RS, chosen by setting param="rs", is that due to Ramberg and Schmeiser (1974), with quantile function:

F^(-1)(u) = lambda1 + ( u^lambda3 - (1-u)^lambda4 ) / lambda2

This parameterisation has a complex series of rules determining which values of the parameters produce valid statistical distributions. See gl.check.lambda for details.
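As a quick numerical illustration of the FMKL quantile function above, the value returned by qgl can be reproduced directly from the formula; a minimal sketch with arbitrary parameter values.

# Check qgl against the FMKL quantile function written out by hand
u <- 0.5
l1 <- 0; l2 <- 1; l3 <- 0.2; l4 <- 0.2
manual <- l1 + ((u^l3 - 1)/l3 - ((1 - u)^l4 - 1)/l4)/l2
manual
qgl(u, l1, l2, l3, l4, param = "fmkl")  # should agree with 'manual'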
dgl: gives the density (based on the quantile density and a numerical inversion of the quantile function).
qdgl: gives the quantile density.
pgl: gives the distribution function (based on a numerical inversion of the quantile function).
qgl: gives the quantile function.
rgl: generates random observations.
Freimer, M., Kollia, G., Mudholkar, G. S. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods *17*, 3547-3567.
Gilchrist, Warren G. (2000), Statistical Modelling with Quantile Functions, Chapman & Hall
Karian, Z.A., Dudewicz, E.J., and McDonald, P. (1996), The extended generalized lambda distribution system for fitting distributions to data: history, completion of theory, tables, applications, the “Final Word” on Moment fits, Communications in Statistics - Simulation and Computation *25*, 611-642.
Karian, Zaven A. and Dudewicz, Edward J. (2000), Fitting statistical distributions: the Generalized Lambda Distribution and Generalized Bootstrap methods, Chapman & Hall
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM *17*, 78-82.
qgl(seq(0,1,0.02),0,1,0.123,-4.3)
pgl(seq(-2,2,0.2),0,1,-.1,-.2,param="fmkl",inverse.eps=.Machine$double.eps)
The generic function histsu computes a histogram of the given data values.
histsu(x, breaks = "Sturges", freq = NULL, probability = !freq, include.lowest = TRUE, right = TRUE, density = NULL, angle = 45, col = NULL, border = NULL, main = paste("Histogram of", xname), xlim = range(breaks), ylim = NULL, xlab = xname, ylab, axes = TRUE, plot = TRUE, labels = FALSE, nclass = NULL, ...)
histsu(x, breaks = "Sturges", freq = NULL, probability = !freq, include.lowest = TRUE, right = TRUE, density = NULL, angle = 45, col = NULL, border = NULL, main = paste("Histogram of", xname), xlim = range(breaks), ylim = NULL, xlab = xname, ylab, axes = TRUE, plot = TRUE, labels = FALSE, nclass = NULL, ...)
x: A vector of values for which the histogram is desired.
breaks: Either: 1) a vector giving the breakpoints between histogram cells, 2) a single number giving the number of cells for the histogram, 3) a character string naming an algorithm to compute the number of cells (see Details), or 4) a function to compute the number of cells.
freq: Logical; if TRUE, the histogram represents frequencies (counts); if FALSE, probability densities are plotted.
probability: A logical value, the opposite of freq (i.e. !freq).
include.lowest: If TRUE, a value equal to the lowest (or highest, for right = FALSE) breaks value is included in the first (or last) cell.
right: If TRUE, the histogram cells are right-closed (left-open) intervals.
density: The density of shading lines, in lines per inch. The default value of NULL means that no shading lines are drawn.
angle: The slope of shading lines, given as an angle in degrees (counter-clockwise).
col: A colour to be used to fill the bars. The default of NULL yields unfilled bars.
border: The colour of the border around the bars. The default is to use the standard foreground colour.
main: Title of the graph.
xlim: A two valued vector specifying the lower and upper limits of the x axis.
ylim: A two valued vector specifying the lower and upper limits of the y axis.
xlab: X axis label.
ylab: Y axis label.
axes: Logical value; if TRUE, axes are drawn.
plot: Logical value; if TRUE, the histogram is plotted.
labels: Logical or character. Additionally draw labels on top of bars, if not FALSE.
nclass: Number of bins of the histogram.
...: Other graphical parameters, see par for details.
See the hist help file. This function forces the number of classes of the histogram to that specified by the user.
An object of class "histogram"
which is a list with components:
breaks: The n+1 cell boundaries (equal to breaks if that was a vector).
counts: n integers; for each cell, the number of x values falling within the cell.
density: Values of the estimated density. If all(diff(breaks) == 1), these are the relative frequencies counts/n.
intensities: Same as density, deprecated.
mids: The n cell midpoints.
xname: A character string with the actual x argument name.
equidist: Logical, indicating if the distances between breaks are all the same.
Please see the hist help file.
R development team with modifications by Steve Su
Venables, W. N. and Ripley. B. D. (2002) Modern Applied Statistics with S. Springer.
# See hist for extended example:
junk<-rgamma(1000,5)
# Forcing the number of bins to be 10:
histsu(junk,nclass=10)
This function works in a similar fashion to is.na and is.notinf.
is.inf(x)
x |
A numerical value or a vector of data. |
A logical vector: TRUE if the value is Inf or -Inf.
In the presence of missing values, the function will return a missing value.
Steve Su
is.inf(c(Inf,2,2,1,-Inf))
This function works in a similar fashion to is.na and is.inf.
is.notinf(x)
x |
A numerical value or a vector of data. |
A logical vector: TRUE if the value is not Inf or -Inf.
In the presence of missing values, the function will return a missing value.
Steve Su
is.notinf(c(Inf,2,2,1,-Inf))
Performs one or two sample Kolmogorov-Smirnov tests.
ks.gof(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL)
x: A numeric vector of data values.
y: Either a numeric vector of data values, or a character string naming a distribution function.
...: Parameters of the distribution specified (as a character string) by y.
alternative: Indicates the alternative hypothesis; must be one of "two.sided" (default), "less" or "greater". You can specify just the initial letter.
exact: NULL or a logical value indicating whether an exact p-value should be computed; see Details.
If y
is numeric, a two-sample test of the null hypothesis that
x
and y
were drawn from the same continuous distribution is
performed.
Alternatively, y
can be a character string naming a continuous
distribution function. In this case, a one-sample test is carried
out of the null that the distribution function which generated x
is distribution y
with parameters specified by "...".
The possible values "two.sided"
(default),"less"
and "greater"
of alternative
specify the null hypothesis that the
true distribution function of x
is equal to, not less than or not
greater than the hypothesized distribution function (one-sample case) or the
distribution function of y
(two-sample case),
respectively.
Exact p-values are not available for the one-sided two-sample case, or in the case of ties. With exact = NULL (the default), an exact p-value is computed if the sample size is less than 100 in the one-sample case, and if the product of the sample sizes is less than 10000 in the two-sample case. Otherwise, asymptotic distributions are used whose approximations may be inaccurate in small samples. In the one-sample two-sided case, exact p-values are obtained as described in Marsaglia, Tsang & Wang (2003). The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.
If a single-sample test is used, the parameters specified in "..." must be
pre-specified and not estimated from the data. There is some more refined
distribution theory for the KS test with estimated parameters
(see Durbin, 1973), but that is not implemented in ks.gof
.
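In the context of this package, ks.gof is often used as a rough goodness-of-fit check of a fitted GLD by naming pgl as the distribution function; a minimal sketch (assuming GLDEX is loaded), bearing in mind the caveat above that the parameters here are estimated from the data rather than pre-specified.

# Rough KS check of a maximum likelihood GLD fit
set.seed(100)
x <- rnorm(200)
fit <- fun.RPRS.ml(data = x, rs.init = c(-1.5, 1.5), leap = 3)
# One-sample test against the fitted RS GLD via its distribution function pgl
ks.gof(x, "pgl", lambda1 = fit[1], lambda2 = fit[2],
       lambda3 = fit[3], lambda4 = fit[4], param = "rs")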
A list with class "htest"
containing the following components:
statistic |
Value of test statistics. |
p.value |
P-value. |
alternative |
Character string describing the alternative hypothesis. |
method |
Character string indicating what type of test was performed. |
data.name |
Character string giving the name(s) of the data. |
This function handles ties by jittering: a very small uniform random number, generated between the minimum value of the data set divided by 1e+08 and the minimum value divided by 1e+07, is added to the data.
R
Z. W. Birnbaum & Fred H. Tingey (1951), One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, *22*/4, 592-596.
William J. Conover (1971), Practical nonparametric statistics. New York: John Wiley & Sons. Pages 295-301 (one-sample "Kolmogorov" test), 309-314 (two-sample "Smirnov" test).
Durbin, J. (1973) Distribution theory for tests based on the sample distribution function. SIAM.
George Marsaglia, Wai Wan Tsang & Jingbo Wang (2003), Evaluating Kolmogorov's distribution. Journal of Statistical Software, *8*/18. <URL: http://www.jstatsoft.org/v08/i18/>.
x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.gof(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.gof(x+2, "pgamma", 3, 2)  # two-sided, exact
ks.gof(x+2, "pgamma", 3, 2, exact = FALSE)
ks.gof(x+2, "pgamma", 3, 2, alternative = "gr")
Calculates sample L-moments, L-coefficients and covariance matrix of L-moments.
Lmoments(data,rmax=4,na.rm=FALSE,returnobject=FALSE,trim=c(0,0))
Lcoefs(data,rmax=4,na.rm=FALSE,trim=c(0,0))
Lmomcov(data,rmax=4,na.rm=FALSE)
Lmoments_calc(data,rmax=4)
Lmomcov_calc(data,rmax=4)
data |
matrix or data frame. |
rmax |
maximum order of L-moments. |
na.rm |
a logical value indicating whether 'NA' values should be removed before the computation proceeds. |
returnobject |
a logical value indicating whether a list object should be returned instead of an array of L-moments. |
trim |
c(0,0) for ordinary L-moments and c(1,1) for trimmed (t=1) L-moments |
Lmoments |
returns an array of L-moments containing a row for each variable in data, or if returnobject=TRUE, a list containing the following: |
lambdas |
an array of L-moments |
ratios |
an array of mean, L-scale and L-moment ratios |
trim |
the value of the parameter 'trim' |
source |
a string with value "Lmoments" or "t1lmoments" |
Lcoefs |
returns an array of L-coefficients (mean, L-scale, L-skewness, L-kurtosis, ...) |
Lmomcov |
returns the covariance matrix of L-moments or a list of covariance matrices if the input has multiple columns. |
Lmoments_calc |
is internal function. |
Lmomcov_calc |
is internal function. |
Functions Lmoments
and Lcoefs
calculate trimmed L-moments
if you specify trim=c(1,1)
.
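For example, ordinary and trimmed (t = 1) L-coefficients can be compared on the same sample; a minimal sketch using the calls documented above.

# Ordinary versus trimmed (t = 1) sample L-coefficients
x <- rnorm(500)
Lcoefs(x)                  # mean, L-scale, L-skewness, L-kurtosis
Lcoefs(x, trim = c(1, 1))  # trimmed counterparts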
Juha Karvanen <[email protected]>
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Elamir, E. A., Seheult, A. H. 2004. "Exact variance structure of sample L-moments" Journal of Statistical Planning and Inference 124 (2) 337-359.
Hosking, J. 1990. "L-moments: Analysis and estimation distributions using linear combinations of order statistics", Journal of Royal Statistical Society B 52, 105-124.
t1lmoments
for trimmed L-moments
x<-rnorm(500)
Lmoments(x)
Divides a range of values into equally spaced divisions; the end points are given as output.
pretty.su(x, nint = 5)
x |
A vector of values. |
nint |
Number of intervals required. |
This is also used for the plotting of the histogram in the histsu function.
Steve Su
# Generate random numbers from normal distribution:
junk<-rnorm(1000,2,3)
# Cut them into 7 regions, 8 endpoints.
pretty.su(junk,7)
This plots the theoretical and actual data quantiles to allow the user to examine the adequacy of a single gld distribution fit.
qqplot.gld(data, fit, param, len = 10000, name = "", corner = "topleft",type="",range=c(0,1),xlab="",main="")
data: Data fitted.
fit: Parameters of the distribution fit.
param: Can be either "rs" or "fmkl".
len: Precision of the quantile calculations. Default is 10000, meaning 10000 points are taken from 0 to 1.
name: Name of the data set, added to the default title of the plot when main is left blank.
corner: Position of the legend on the plot, e.g. "topleft"; default is "topleft".
type: This can be "" or "str.qqplot"; the first produces the raw quantiles and the second plots them on a straight line. Default is "".
range: The range of quantiles to be plotted. Default is c(0,1), i.e. the full range.
xlab: X axis label; if left blank, the default is "Data".
main: Title of the plot; if left blank, a default title will be added.
A plot is given.
Steve Su
set.seed(1000)
junk<-rweibull(300,3,2)
# Fit the function using fun.data.fit.ml
obj.fit1.ml<-fun.data.fit.ml(junk)
# Do a quantile plot on the raw quantiles
qqplot.gld(junk,obj.fit1.ml[,1],param="rs",name="RS ML fit")
# Or a qq plot to examine deviation from straight line
qqplot.gld(junk,obj.fit1.ml[,1],param="rs",name="RS ML fit",type="str.qqplot")
This plots the theoretical and actual data quantiles to allow the user to examine the adequacy of a mixture fit of two gld distributions.
qqplot.gld.bi(data, fit, param1, param2, len = 10000, name = "", corner = "topleft",type="",range=c(0,1),xlab="",main="")
data: Data fitted.
fit: Parameters of the distribution fit.
param1: Can be either "rs" or "fmkl", for the first distribution.
param2: Can be either "rs" or "fmkl", for the second distribution.
len: Precision of the quantile calculations. Default is 10000, meaning 10000 points are taken from 0 to 1.
name: Name of the data set, added to the default title of the plot when main is left blank.
corner: Position of the legend on the plot, e.g. "topleft"; default is "topleft".
type: This can be "" or "str.qqplot"; the first produces the raw quantiles and the second plots them on a straight line. Default is "".
range: The range of quantiles to be plotted. Default is c(0,1), i.e. the full range.
xlab: X axis label; if left blank, the default is "Data".
main: Title of the plot; if left blank, a default title will be added.
A plot is given.
Steve Su
set.seed(1000)
junk<-rweibull(300,3,2)
## Fitting mixture of generalised lambda distributions on the data set using
## both the maximum likelihood and partition maximum likelihood and plot the
## resulting fits
junk<-fun.auto.bimodal.ml(faithful[,1],per.of.mix=0.1,clustering.m=clara,
init1.sel="rprs",init2.sel="rmfmkl",init1=c(-1.5,1.5),init2=c(-0.25,1.5),
leap1=3,leap2=3)
fun.plot.fit.bm(nclass=50,fit.obj=junk,data=faithful[,1],
name="Maximum likelihood using",xlab="faithful1",param.vec=c("rs","fmkl"))
## Do a quantile plot on the raw quantiles
qqplot.gld.bi(faithful[,1],junk$par,param1="rs",param2="fmkl",
name="RS FMKL ML fit")
## Or a qq plot to examine deviation from straight line
qqplot.gld.bi(faithful[,1],junk$par,param1="rs",param2="fmkl",
name="RS FMKL ML fit",type="str.qqplot")
These functions provide quasi random numbers, or space filling or low discrepancy sequences, in the p-dimensional unit cube.
sHalton(n.max, n.min = 1, base = 2, leap = 1)
QUnif(n, min = 0, max = 1, n.min = 1, p, leap = 1)
n.max: Maximal (sequence) number.
n.min: Minimal sequence number.
n: Number of p-dimensional points to generate.
base: Integer, the base of the Halton sequence.
min, max: Lower and upper limits of the univariate intervals. Must be of length 1 or p.
p: Dimensionality of the space (the unit cube) in which points are generated.
leap: Integer indicating (if greater than 1) that only every leap-th number of the underlying sequence is used.
sHalton(n,m) returns a numeric vector of length n-m+1 of values in [0,1].
QUnif(n, min, max, n.min, p=p) generates n-n.min+1 p-dimensional points in [min,max]^p, returning a numeric matrix with p columns.
For leap, Kocis and Whiten recommend a few particular values, notably leap = 409 (as used in the example below), especially for dimensions up to 400.
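Low-discrepancy uniforms from QUnif can be pushed through a quantile function to obtain quasi-random variates from other distributions, which is essentially how the rnorm.halton and rnorm.sobol style generators in this package operate; a minimal sketch, assuming qnorm as the target quantile function.

# Quasi-random standard normal points in 2 dimensions via the Halton sequence
u <- QUnif(n = 512, p = 2, leap = 409)  # 512 points in [0,1]^2
z <- qnorm(u)                           # map each coordinate through the normal quantile function
colMeans(z)                             # both close to 0
apply(z, 2, sd)                         # both close to 1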
Martin Maechler
James Gentle (1998) Random Number Generation and Monte Carlo Simulation; sec. 6.3. Springer.
Kocis, L. and Whiten, W.J. (1997) Computational Investigations of Low-Discrepancy Sequences. ACM Transactions on Mathematical Software 23, 2, 266-294.
32*sHalton(20, base=2)
QUnif(n=10,p=2,leap=409)
Computes the sample skewness and kurtosis. This uses the S+ version directly.
skewness(x, na.rm = FALSE, method = "fisher")
kurtosis(x, na.rm = FALSE, method = "fisher")
x: Any numerical object. Missing values (NAs) are allowed.
na.rm: Logical flag: if na.rm=TRUE, missing values are removed from x before doing the computations. If na.rm=FALSE (the default) and x contains missing values, the result is NA.
method: Character string specifying the computation method. The two possible values are "fisher" (the default) and "moment".
The moment
forms are based on the definitions of skewness and kurtosis
for distributions; these forms should be used when resampling (bootstrap or
jackknife). The "fisher" forms correspond to the usual "unbiased" definition of
sample variance, though in the case of skewness and kurtosis exact unbiasedness
is not possible.
A single value of skewness or kurtosis.
If y = x - mean(x), then the "moment" method computes the skewness value as mean(y^3)/mean(y^2)^1.5 and the kurtosis value as mean(y^4)/mean(y^2)^2 - 3. To see the "fisher" calculations, print out the functions.
Splus
var
x <- runif(30)
skewness(x)
skewness(x, method="moment")
kurtosis(x)
kurtosis(x, method="moment")
Calculates estimates for the FMKL parameterisation of the generalised lambda
distribution on the basis of data, using the starship method.
The starship method is built on the fact that the
generalised lambda distribution
is a transformation of the uniform distribution. This method finds the
parameters that transform the data closest to the uniform distribution.
This function uses a grid-based search to find a suitable starting point (using
starship.adaptivegrid
) then uses optim
to find
the parameters that do this.
starship(data, optim.method = "Nelder-Mead", initgrid = NULL, param="FMKL", optim.control=NULL)
data: Data to be fitted, as a vector.
optim.method: Optimisation method passed to optim for the optimisation step; default is "Nelder-Mead".
initgrid: Grid of shape parameter values to try before the optimisation step. If it is left as NULL, a default grid is used; the default depends on the parameterisation.
param: Choose the parameterisation: "FMKL" (the default) or "rs".
optim.control: List of options for the optimisation step; see the control argument of optim.
The starship method is described in King and MacGillivray, 1999 (see references). It is built on the fact that the generalised lambda distribution is a transformation of the uniform distribution. Thus the inverse of this transformation is the distribution function for the gld. The starship method applies different values of the parameters of the distribution to the distribution function, calculates the depths q corresponding to the data and chooses the parameters that make the depths closest to a uniform distribution.
The closeness to the uniform is assessed by calculating the Anderson-Darling
goodness-of-fit test on the transformed data against the uniform, for a
sample of size length(data)
.
This is implemented in 2 stages in this function. First a grid search is
carried out, over a small number of possible parameter values
(see starship.adaptivegrid
for details). Then the minimum from
this search is given as a starting point for an optimisation of the
Anderson-Darling value using optim, with method given by optim.method.
See references for details on parameterisations.
Returns a list, with
lambda: A vector of length 4, giving the estimated parameters, in order, lambda1 (location), lambda2 (scale), lambda3 and lambda4 (shape).
grid.results: Output from the grid search; see starship.adaptivegrid.
optim: Output from the optim search; see optim.
Robert King, Darren Wraith
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods 17, 3547–3567.
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM 17, 78–82.
King, R.A.R. & MacGillivray, H. L. (1999), A starship method for
fitting the generalised distributions,
Australian and New Zealand Journal of
Statistics 41, 353–374
Owen, D. B. (1988), The starship, Communications in Statistics - Computation and Simulation 17, 315–323.
starship.adaptivegrid, starship.obj
data <- rgl(100,0,1,.2,.2)
starship(data,optim.method="Nelder-Mead",initgrid=list(lcvect=(0:4)/10,
ldvect=(0:4)/10))
Calculates estimates for the generalised lambda distribution on the basis of data, using the starship method. The starship method is built on the fact that the generalised lambda distribution is a transformation of the uniform distribution. This method finds the parameters that transform the data closest to the uniform distribution. This function uses a grid-based search.
starship.adaptivegrid(data, initgrid=list( lcvect = c(-1.5, -1, -0.5, -0.1, 0, 0.1, 0.2, 0.4, 0.8, 1, 1.5), ldvect = c(-1.5, -1, -0.5, -0.1, 0, 0.1, 0.2, 0.4, 0.8, 1, 1.5), levect = c(-0.5,-0.25,0,0.25,0.5)),param="FMKL")
data: Data to be fitted, as a vector.
initgrid: A list with elements lcvect, ldvect and levect, giving the grid of shape parameter values over which the search is carried out; see the defaults in the usage above.
param: Choose the parameterisation: "FMKL" (the default) or "rs".
The starship method is described in King and MacGillivray, 1999 (see references). It is built on the fact that the generalised lambda distribution is a transformation of the uniform distribution. Thus the inverse of this transformation is the distribution function for the gld. The starship method applies different values of the parameters of the distribution to the distribution function, calculates the depths q corresponding to the data and chooses the parameters that make the depths closest to a uniform distribution.
The closeness to the uniform is assessed by calculating the Anderson-Darling
goodness-of-fit test on the transformed data against the uniform, for a
sample of size length(data)
.
This function carries out a grid-based search. This was the original method
of King and MacGillivray, 1999, but you are advised to instead use
starship
which uses a grid-based search together with an
optimisation based search.
See references for details on parameterisations.
response: The minimum "response value", the result of the internal goodness-of-fit measure. This is the return value of starship.obj; see King and MacGillivray, 1999 for more details.
lambda: A vector of length 4 giving the values of lambda1 to lambda4 that produce this minimum response.
Robert King, Darren Wraith
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods 17, 3547–3567.
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM 17, 78–82.
King, R.A.R. & MacGillivray, H. L. (1999), A starship method for
fitting the generalised distributions,
Australian and New Zealand Journal of
Statistics 41, 353–374
Owen, D. B. (1988), The starship, Communications in Statistics - Computation and Simulation 17, 315–323.
data <- rgl(100,0,1,.2,.2)
starship.adaptivegrid(data,list(lcvect=(0:4)/10,ldvect=(0:4)/10))
The starship is a method for fitting the generalised lambda distribution.
See starship
for more details.
This function is the objective function minimised in the starship methods. It is a goodness-of-fit measure computed on the depths of the data.
starship.obj(par, data, param = "fmkl")
par: Parameters of the generalised lambda distribution, a vector of length 4 giving lambda1 to lambda4.
data: Data, as a vector.
param: Choose the parameterisation: "fmkl" (the default) or "rs".
The starship method is described in King and MacGillivray, 1999 (see references). It is built on the fact that the generalised lambda distribution is a transformation of the uniform distribution. Thus the inverse of this transformation is the distribution function for the gld. The starship method applies different values of the parameters of the distribution to the distribution function, calculates the depths q corresponding to the data and chooses the parameters that make the depths closest to a uniform distribution.
The closeness to the uniform is assessed by calculating the Anderson-Darling
goodness-of-fit test on the transformed data against the uniform, for a
sample of size length(data)
.
This function returns that objective function. It is provided as a separate function to allow users to carry out minimisations using optim or other methods. The recommended method is to use the starship function.
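Because starship.obj takes the parameter vector as its first argument, it can be handed directly to optim for a custom minimisation; a minimal sketch (the starting values are arbitrary).

# Minimise the starship objective directly with optim
data <- rgl(100, 0, 1, .2, .2)
fit <- optim(par = c(0, 1, 0.2, 0.2),  # starting values for lambda1..lambda4
             fn = starship.obj,
             data = data, param = "fmkl")
fit$par    # parameter estimates
fit$value  # Anderson-Darling objective at the minimum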
The Anderson-Darling goodness of fit measure, computed on the transformed data, compared to a uniform distribution. Note that this is NOT the goodness-of-fit measure of the generalised lambda distribution with the given parameter values to the data.
Robert King, Darren Wraith
Freimer, M., Mudholkar, G. S., Kollia, G. & Lin, C. T. (1988), A study of the generalized tukey lambda family, Communications in Statistics - Theory and Methods 17, 3547–3567.
Ramberg, J. S. & Schmeiser, B. W. (1974), An approximate method for generating asymmetric random variables, Communications of the ACM 17, 78–82.
King, R.A.R. & MacGillivray, H. L. (1999), A starship method for
fitting the generalised distributions,
Australian and New Zealand Journal of
Statistics 41, 353–374
Owen, D. B. (1988), The starship, Communications in Statistics - Computation and Simulation 17, 315–323.
starship, starship.adaptivegrid
data <- rgl(100,0,1,.2,.2)
starship.obj(c(0,1,.2,.2),data,"fmkl")
Calculates sample trimmed L-moments with trimming parameter 1.
t1lmoments(data,rmax=4)
data |
matrix or data frame. |
rmax |
maximum order of trimmed L-moments. |
array of trimmed L-moments (trimming parameter = 1) up to order 4 containing a row for each variable in data.
Functions Lmoments and Lcoefs calculate trimmed L-moments if you specify trim=c(1,1).
Juha Karvanen [email protected]
Karvanen, J. and A. Nuutinen (2008). "Characterizing the generalized lambda distribution by L-moments." Computational Statistics and Data Analysis 52(4): 1971-1983.
Asquith, W. (2007). "L-moments and TL-moments of the generalized lambda distribution." Computational Statistics and Data Analysis 51(9): 4484-4496.
Elamir, E. A., Seheult, A. H. 2004. "Exact variance structure of sample L-moments" Journal of Statistical Planning and Inference 124 (2) 337-359.
Hosking, J. 1990. "L-moments: Analysis and estimation distributions using linear combinations of order statistics", Journal of Royal Statistical Society B 52, 105-124.
Lmoments
for L-moments
x<-rnorm(500)
t1lmoments(x)
Returns a vector showing the position of missing values in a vector.
which.na(x)
x: An object, typically a numeric vector; missing values and NaNs are allowed.
This returns the indices of values in x
which are missing or
"Not a Number".
# A non-zero number divided by zero creates infinity,
# zero over zero creates a NaN
weird.values <- c(1/0, -2.9/0, 0/0, NA)
which.na(weird.values)
# Produces:
# [1] 3 4