Title: | Fit GLD Regression/Quantile/AFT Model to Data |
---|---|
Description: | Owing to the rich shapes of Generalised Lambda Distributions (GLDs), GLD standard/quantile/Accelerated Failure Time (AFT) regression is a competitive flexible model compared to standard/quantile/AFT regression. The proposed method has some major advantages: 1) it provides a reference line which is very robust to outliers with the attractive property of zero mean residuals and 2) it gives a unified, elegant quantile regression model from the reference line with smooth regression coefficients across different quantiles. For AFT model, it also eliminates the needs to try several different AFT models, owing to the flexible shapes of GLD. The goodness of fit of the proposed model can be assessed via QQ plots and Kolmogorov-Smirnov tests and data driven smooth test, to ensure the appropriateness of the statistical inference under consideration. Statistical distributions of coefficients of the GLD regression line are obtained using simulation, and interval estimates are obtained directly from simulated data. References include the following: Su (2015) "Flexible Parametric Quantile Regression Model" <doi:10.1007/s11222-014-9457-1>, Su (2021) "Flexible parametric accelerated failure time model"<doi:10.1080/10543406.2021.1934854>. |
Authors: | Steve Su [aut, cre, cph]
|
Maintainer: | Steve Su <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.1 |
Built: | 2025-02-21 04:29:58 UTC |
Source: | https://github.com/cran/GLDreg |
Owing to the rich shapes of GLDs, GLD standard/quantile regression is a competitive flexible model compared to standard/quantile regression. The proposed method has some major advantages: 1) it provides a reference line which is very robust to outliers with the attractive property of zero mean residuals and 2) it gives a unified, elegant quantile regression model from the reference line with smooth regression coefficients across different quantiles. The goodness of fit of the proposed model can be assessed via QQ plots and Kolmogorov-Smirnov tests and Data Driven Smooth Test, to ensure the appropriateness of the statistical inference under consideration. Statistical distributions of coefficients of the GLD regression line are obtained using simulation, and interval estimates are obtained directly from simulated data.
Package: | GLDreg |
Type: | Package |
Version: | 1.1.1 |
Date: | 2024-01-23 |
License: | CC BY-NC-SA 4.0 |
The primary fitting function for GLD regression model is
GLD.lm.full
. The output of GLD.lm.full
can then be
passed to summaryGraphics.gld.lm
to display coefficients of GLD
regression model graphically. Once a GLD reference model is obtained,
quantile regression is obtained using GLD.quantreg
.
The corresponding fitting algorithms for survival data are
GLD.lm.full.surv
which can then be passed to
summaryGraphics.gld.surv.lm
to display coefficients of GLD
regression model graphically.
Steve Su <[email protected]>
Su, S. (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing 25 (3). 635-650. doi:10.1007/s11222-014-9457-1
Su S. (2021) "Flexible parametric accelerated failure time model" J Biopharm Stat. 2021 Sep 31(5):650-667. doi:10.1080/10543406.2021.1934854
GLDEX
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit a FKML GLD regression example<-GLD.lm(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml") ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(engel.fit.all) ## Fit quantile regression from 0.1 to 0.9, with equal spacings between ## quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Plot quantile regression lines fun.plot.q(x=engel$income,y=engel$foodexp,fit=engel.fit.all[[1]],result, xlab="income",ylab="Food Expense") ## Sometimes the maximum likelihood estimation may fail, for example when ## minimum/maximum support of GLD is exactly at the minimum/maximum value of the ## dataset, if this the case, try to use the L-moment matching method. engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.lm) ## Fit Accelerated Failure Time model to actg data: actg.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) summaryGraphics.gld.surv.lm(actg.rs,label=c("(Intercept)", "IDV versus no IDV","Hemophiliac","Baseline CD4", "Months of prior \n ZDV use","Age"),exp="TRUE") ## End(Not run)
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit a FKML GLD regression example<-GLD.lm(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml") ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(engel.fit.all) ## Fit quantile regression from 0.1 to 0.9, with equal spacings between ## quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Plot quantile regression lines fun.plot.q(x=engel$income,y=engel$foodexp,fit=engel.fit.all[[1]],result, xlab="income",ylab="Food Expense") ## Sometimes the maximum likelihood estimation may fail, for example when ## minimum/maximum support of GLD is exactly at the minimum/maximum value of the ## dataset, if this the case, try to use the L-moment matching method. engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.lm) ## Fit Accelerated Failure Time model to actg data: actg.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) summaryGraphics.gld.surv.lm(actg.rs,label=c("(Intercept)", "IDV versus no IDV","Hemophiliac","Baseline CD4", "Months of prior \n ZDV use","Age"),exp="TRUE") ## End(Not run)
actg dataset from Hosmer et al.
data(actg)
data(actg)
Identification Code
Time to AIDS diagnosis or death (days).
Event indicator. 1 = AIDS defining diagnosis, 0 = Otherwise.
Time to death (days)
Event indicator for death (only). 1 = Death, 0 = Otherwise.
Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.
Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.
CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.
0 = Male. 1 = Female.
Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.
IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.
Hemophiliac. 1 = Yes. 0 = No.
Karnofsky Performance Scale. 100 = Normal; no complaint no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.
Baseline CD4 count (Cells/Milliliter).
Months of prior ZDV use (months).
Age at Enrollment (years).
https://hivdb.stanford.edu/pages/clinicalStudyData/ACTG320.html
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
A simple transformation of altering the location of RS/FKML GLD so that the theoretical mean is altered to the level specified. Only the first parameter of RS/FKML GLD is altered.
fun.mean.convert(x, param, val = 0)
fun.mean.convert(x, param, val = 0)
x |
A vector of four values representing Lambda 1, Lambda 2, Lambda 3 and Lambda 4 of RS/FKML GLD. |
param |
Can be "rs" or "fmkl" or "fkml" |
val |
The targeted theoretical mean |
A vector of four values representing Lambda 1, Lambda 2, Lambda 3 and Lambda 4 of the transformed RS/FKML GLD
If finite first moment does not exist, original input values will be returned
Steve Su
# Transform RS GLD with parameters 3,2,1,1 to mean of 0 fun.mean.convert(c(3,2,1,1),param="rs") # Check that the desired outcome is achieved fun.theo.mv.gld(0,2,1,1,param="rs") # Transform RS GLD with parameters 3,2,1,1 to mean of 5 fun.mean.convert(c(3,2,1,1),param="fkml",5) # Check that the desired outcome is achieved fun.theo.mv.gld(5,2,1,1,param="fkml")
# Transform RS GLD with parameters 3,2,1,1 to mean of 0 fun.mean.convert(c(3,2,1,1),param="rs") # Check that the desired outcome is achieved fun.theo.mv.gld(0,2,1,1,param="rs") # Transform RS GLD with parameters 3,2,1,1 to mean of 5 fun.mean.convert(c(3,2,1,1),param="fkml",5) # Check that the desired outcome is achieved fun.theo.mv.gld(5,2,1,1,param="fkml")
This function plots quantile regression lines from GLD.lm
and
one of fun.gld.slope.vary.int.fixed
,
fun.gld.slope.fixed.int.vary
,
fun.gld.slope.fixed.int.vary.emp
,
fun.gld.all.vary.emp
, fun.gld.all.vary
,
fun.gld.slope.vary.int.fixed.emp
, GLD.quantreg
.
fun.plot.q(x, y, fit, quant.info, ...)
fun.plot.q(x, y, fit, quant.info, ...)
x |
A numerical vector of explanatory variable |
y |
A numerical vector of response variable |
fit |
An object from |
quant.info |
An object from one of |
... |
Additional arguments to be passed to plot function, such as axis labels and title of the graph |
This is intended to plot only two variables, for quantile regression involving more than one explanatory variable, consider plotting the actual values versus fitted values by fitting a secondary GLD quantile model between actual and fitted values.
A graph showing quantile regression lines
Steve Su
Su (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing May 2015, Volume 25, Issue 3, pp 635-650
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) fun.plot.q(x=x,y=y,fit=fit[[1]],med.fit, xlab="x",ylab="y") ## Not run: ## Plot result of quantile regression ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Fit quantile regression from 0.1 to 0.9, with equal spacings between ## quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Plot the quantile regression lines fun.plot.q(x=engel$income,y=engel$foodexp,fit=engel.fit.all[[1]],result, xlab="income",ylab="Food Expense") ## End(Not run)
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) fun.plot.q(x=x,y=y,fit=fit[[1]],med.fit, xlab="x",ylab="y") ## Not run: ## Plot result of quantile regression ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Fit quantile regression from 0.1 to 0.9, with equal spacings between ## quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Plot the quantile regression lines fun.plot.q(x=engel$income,y=engel$foodexp,fit=engel.fit.all[[1]],result, xlab="income",ylab="Food Expense") ## End(Not run)
Similar to lm
, this function fits a linear model using RS/FKML
GLDs and assess the goodness of fit of GLD with respect to the data via qq plot
and Kolmogorov-Smirnoff (KS) test. Note that the use of KS test when parameters
of a distribution are estimated from data is generally frowned upon. This is because
one often gets inflated p-value with increased type II error due to the fact that the
KS test requires independence between test sample and parameters of distribution.
Therefore, the the resample KS test over 1000 simulation runs from GLDEX package is
probably a more reasonable measure. It is probably reasonable to consider the resample
KS test may in fact decrease the p-values, as testing is done on resampled data from
fitted distribution so there is a certain degree of inaccuracy there. The provision of
these results is to give some indication of optimistic and pessimistic goodness
of fit measure, as currently, there is an absence of a specialised GLD goodness of fit test.
A generic Data Driven Smooth Test from ddst library in R is also incorporated to assess goodness of fit.
When in doubt, QQ plot should always be considered ahead of these results.
GLD.lm(formula, data, param, maxit = 20000, fun, method = "Nelder-Mead", diagnostics = TRUE, range = c(0.01, 0.99), init = NULL, alpha = 0.05)
GLD.lm(formula, data, param, maxit = 20000, fun, method = "Nelder-Mead", diagnostics = TRUE, range = c(0.01, 0.99), init = NULL, alpha = 0.05)
formula |
A symbolic expression of the model to be fitted, similar to the formula
argument in |
data |
Dataset containing variables of the model |
param |
Can be "rs", "fmkl" or "fkml" |
maxit |
Maximum number of iterations for numerical optimisation |
fun |
If param="fmkl" or "fkml", this can be one of If param="rs", this can be one of |
method |
Defaults to "Nelder-Mead" algorithm, can also be "SANN" but this is a lot slower and may not as good |
diagnostics |
Defaults to TRUE, which computes Kolmogorov-Smirnoff test and do QQ plot |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
init |
Choose a different set of initial values to start the optimisation process. This can either be full set of parameters including GLD parameter estimates, or it can just be the coefficient estimates of the regression model. |
alpha |
Significant level of KS test. |
Message |
Short description of estimation method used and whether the result converged |
Bias Correction |
Bias correction used to ensure the line has zero mean residuals |
Estimated parameters |
A set of estimate coefficients from GLD regression |
Fitted |
Predicted response value from model |
Residual |
Residual of model |
formula |
Formula used in the model |
param |
Specify whether RS/FKML/FMKL GLD was used |
y |
The response variable |
x |
The explanatory variable(s) |
fun |
GLD fitting function used in the computation process, outputted for internal programming use |
Steve Su
Su (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing May 2015, Volume 25, Issue 3, pp 635-650
## Dummy example library(GLDEX) ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit a FKML GLD regression example<-GLD.lm(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml") ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression engel.fit<-GLD.lm(foodexp~income,data=engel,param="fmkl",fun=fun.RMFMKL.ml.m) ## Extract the mammals dataset library(MASS) mammals.fit<-GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm) ## Using quantile regression coefficients as starting values library(quantreg) mammals.fit1<-GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm,init=rq(log(brain)~log(body),data=mammals)$coeff) # As an exercise, use the result from mammals.fit1 as initial values GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm,init=mammals.fit1[[3]]) ## End(Not run)
## Dummy example library(GLDEX) ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit a FKML GLD regression example<-GLD.lm(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml") ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression engel.fit<-GLD.lm(foodexp~income,data=engel,param="fmkl",fun=fun.RMFMKL.ml.m) ## Extract the mammals dataset library(MASS) mammals.fit<-GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm) ## Using quantile regression coefficients as starting values library(quantreg) mammals.fit1<-GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm,init=rq(log(brain)~log(body),data=mammals)$coeff) # As an exercise, use the result from mammals.fit1 as initial values GLD.lm(log(brain)~log(body),data=mammals,param="rs", fun=fun.RPRS.lm,init=mammals.fit1[[3]]) ## End(Not run)
The function is an extension of GLD.lm
and defaults to
1000 simulation runs, coefficients and statistical properties of coefficients
can be plotted as part of the output.
GLD.lm.full(formula, data, param, maxit = 20000, fun, method = "Nelder-Mead", range = c(0.01, 0.99), n.simu = 1000, summary.plot = TRUE, init = NULL)
GLD.lm.full(formula, data, param, maxit = 20000, fun, method = "Nelder-Mead", range = c(0.01, 0.99), n.simu = 1000, summary.plot = TRUE, init = NULL)
formula |
A symbolic expression of the model to be fitted, similar to the formula
argument in |
data |
Dataset containing variables of the model |
param |
Can be "rs", "fmkl" or "fkml" |
maxit |
Maximum number of iterations for numerical optimisation |
fun |
If param="fmkl" or "fkml", this can be one of If param="rs", this can be one of |
method |
Defaults to "Nelder-Mead" algorithm, can also be "SANN" but this is a lot slower and may not as good |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
n.simu |
Number of times to repeat the simulation runs, defaults to 1000. |
summary.plot |
Whether to plot the coefficients graphically, defaults to TRUE. |
init |
Choose a different set of initial values to start the optimisation process. This can either be full set of parameters including GLD parameter estimates, or it can just be the coefficient estimates of the regression model. |
This function usually takes some time to run, as it involves refitting the GLD regression model many times, the progress of the simulation is outputted to the R screen, so users can guage the progress of the computation.
[[1]] |
Output of |
[[2]] |
A matrix showing the bias adjustment, coefficents of the model, parameters of GLD and whether the result converged at each run |
[[3]] |
Adjusted simulation result so that the empirical mean of
coefficients is the same as the estimated parameters obtained in
|
Steve Su
Su (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing May 2015, Volume 25, Issue 3, pp 635-650
GLD.lm
, GLD.quantreg
,
summaryGraphics.gld.lm
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit a full GLD regression engel.fit.full<-GLD.lm.full(foodexp~income,data=engel,param="fmkl", fun=fun.RMFMKL.ml.m) ## Extract the mammals dataset library(MASS) ## Fit a full GLD regression mammals.fit.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m) ## Using quantile regression coefficients as starting values library(quantreg) mammals.fit1.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m, init=rq(log(brain)~log(body),data=mammals)$coeff) ## Using the result of mammals.fit.full as initial values mammals.fit2.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m, init=mammals.fit1.full[[1]][[3]]) ## End(Not run)
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit a full GLD regression engel.fit.full<-GLD.lm.full(foodexp~income,data=engel,param="fmkl", fun=fun.RMFMKL.ml.m) ## Extract the mammals dataset library(MASS) ## Fit a full GLD regression mammals.fit.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m) ## Using quantile regression coefficients as starting values library(quantreg) mammals.fit1.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m, init=rq(log(brain)~log(body),data=mammals)$coeff) ## Using the result of mammals.fit.full as initial values mammals.fit2.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m, init=mammals.fit1.full[[1]][[3]]) ## End(Not run)
The function is an extension of GLD.lm.surv
and defaults to
1000 simulation runs, coefficients and statistical properties of coefficients
can be plotted as part of the output.
GLD.lm.full.surv(formula, censoring, data, param, maxit = 20000, fun, method = "Nelder-Mead", range = c(0.01, 0.99), n.simu = 1000, summary.plot = FALSE, init = NULL, alpha = 0.05, censor.type = "right", adj.int = FALSE, GLD.adj = FALSE, adj.censor = TRUE, keep.uncen = TRUE)
GLD.lm.full.surv(formula, censoring, data, param, maxit = 20000, fun, method = "Nelder-Mead", range = c(0.01, 0.99), n.simu = 1000, summary.plot = FALSE, init = NULL, alpha = 0.05, censor.type = "right", adj.int = FALSE, GLD.adj = FALSE, adj.censor = TRUE, keep.uncen = TRUE)
formula |
A symbolic expression of the model to be fitted, similar to the formula
argument in |
censoring |
1=Event, 0= Censored |
data |
Dataset containing variables of the model |
param |
Can be "rs", "fmkl" or "fkml" |
maxit |
Maximum number of iterations for numerical optimisation |
fun |
If param="fmkl" or "fkml", this can be one of If param="rs", this can be one of |
method |
Defaults to "Nelder-Mead" algorithm, can also be "SANN" but this is a lot slower and may not as good |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
n.simu |
Number of simulations, defaults to 1000. |
summary.plot |
If TRUE present graphical display of model fitted. |
init |
Initial values to start optimization process. |
alpha |
Significant level of goodness of fit test. |
censor.type |
Can be " right" of "left censored. |
adj.int |
Adjust intercept in final output? |
GLD.adj |
Adjust GLD fitted to have theoretical zero mean? |
adj.censor |
Adjust censoring? |
keep.uncen |
Keep uncensored values? |
Message |
Short description of estimation method used and whether the result converged |
Bias Correction |
Bias correction used to ensure the line has zero mean residuals |
Estimated parameters |
A set of estimate coefficients from GLD regression |
Fitted |
Predicted response value from model |
Residual |
Residual of model |
formula |
Formula used in the model |
param |
Specify whether RS/FKML/FMKL GLD was used |
y |
The response variable |
x |
The explanatory variable(s) |
fun |
GLD fitting function used in the computation process, outputted for internal programming use |
censoring |
Censoring data |
AIC.full |
AIC results |
BIC.full |
BIC results |
censor.gld.values |
Result of GLD fit, including censoring |
simu.result |
Result of simulation for all coefficeints in the model |
censor.gld.values |
Result of GLD fit, including censoring |
simu.bias.correct.result |
Bias corrected simulation results |
Steve Su
Su (2021) "Flexible Parametric Accelerated Failure Time Model" Journal of Biopharmaceutical Statistics Volume 31, 2021 - Issue 5
GLD.lm.full
, GLD.quantreg
, GLD.lm
,
GLD.lm.surv
## Not run: actg.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) ## End(Not run)
## Not run: actg.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) ## End(Not run)
Similar to GLD.lm
, this function fits an Accelerated Failure Time
Model using RS/FKML GLDs.
GLD.lm.surv(formula, censoring, data, param, maxit = 20000, fun, method = "Nelder-Mead", diagnostics = TRUE, range = c(0.01, 0.99), init = NULL, alpha = 0.05, censor.type = "right", adj.int = TRUE, GLD.adj = FALSE, adj.censor = TRUE, keep.uncen = TRUE)
GLD.lm.surv(formula, censoring, data, param, maxit = 20000, fun, method = "Nelder-Mead", diagnostics = TRUE, range = c(0.01, 0.99), init = NULL, alpha = 0.05, censor.type = "right", adj.int = TRUE, GLD.adj = FALSE, adj.censor = TRUE, keep.uncen = TRUE)
formula |
A symbolic expression of the model to be fitted, similar to the formula
argument in |
censoring |
1=Event, 0= Censored |
data |
Dataset containing variables of the model |
param |
Can be "rs", "fmkl" or "fkml" |
maxit |
Maximum number of iterations for numerical optimisation |
fun |
If param="fmkl" or "fkml", this can be one of If param="rs", this can be one of |
method |
Defaults to "Nelder-Mead" algorithm, can also be "SANN" but this is a lot slower and may not as good |
diagnostics |
Defaults to TRUE, which computes Kolmogorov-Smirnoff test, Kolmogorov-Smirnoff Resample test, Data drive smooth test and do QQ plot on non censored data. |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
init |
Choose a different set of initial values to start the optimisation process. This can either be full set of parameters including GLD parameter estimates, or it can just be the coefficient estimates of the regression model. |
alpha |
Significant level of goodness of fit test. |
censor.type |
Can be" right" of "left censored. |
adj.int |
Adjust intercept in final output? |
GLD.adj |
Adjust GLD fitted to have theoretical zero mean? |
adj.censor |
Adjust censoring? |
keep.uncen |
Keep uncensored values? |
Message |
Short description of estimation method used and whether the result converged |
Bias Correction |
Bias correction used to ensure the line has zero mean residuals |
Estimated parameters |
A set of estimate coefficients from GLD regression |
Fitted |
Predicted response value from model |
Residual |
Residual of model |
formula |
Formula used in the model |
param |
Specify whether RS/FKML/FMKL GLD was used |
y |
The response variable |
x |
The explanatory variable(s) |
fun |
GLD fitting function used in the computation process, outputted for internal programming use |
censoring |
Censoring data |
AIC.full |
AIC results |
BIC.full |
BIC results |
censor.gld.values |
Result of GLD fit, including censoring |
Steve Su
Su (2021) "Flexible Parametric Accelerated Failure Time Model" Journal of Biopharmaceutical Statistics Volume 31, 2021 - Issue 5
GLD.lm.full
, GLD.quantreg
, GLD.lm
,
GLD.lm.full.surv
## Not run: # Note the actg.rs1 differs from GLD.lm.full.surv because adj.int is set as # TRUE in GLD.lm.surv by default but adj.int is set as FALSE in # GLD.lm.full.surv by default actg.rs1<-GLD.lm.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m) actg.rs2<-GLD.lm.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,adj.int=FALSE) ## End(Not run)
## Not run: # Note the actg.rs1 differs from GLD.lm.full.surv because adj.int is set as # TRUE in GLD.lm.surv by default but adj.int is set as FALSE in # GLD.lm.full.surv by default actg.rs1<-GLD.lm.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m) actg.rs2<-GLD.lm.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg[which(actg$txgrp!=3 & actg$txgrp!=4),]$censor, data=actg[which(actg$txgrp!=3 & actg$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,adj.int=FALSE) ## End(Not run)
The GLD quantile regression can be: 1) Fixed intercept, allowing all other coefficients to vary, 2) Only intercept is allowed to vary and 3) All coefficients can vary. Minimisation is achieved numerically through least squares between the proportion of estimated GLD error distribution below zero versus the specified quantile for parametric approach. For non parametric approach, minimisation is achieved using a least squares approach to find a q-th quantile GLD line such that the percentage of observations below the line corresponds to the q-th quantile.
GLD.quantreg(q, fit.obj, intercept = "", slope = "", emp=FALSE)
GLD.quantreg(q, fit.obj, intercept = "", slope = "", emp=FALSE)
q |
Specify the quantile (range 0 to 1) line |
fit.obj |
An object from |
intercept |
Can either be "fixed" or left blank, blank indicates this parameter is allowed to vary in quantile line estimation |
slope |
Can either be |
emp |
Can either be |
This is a wrapper function for fun.gld.all.vary
,
fun.gld.slope.fixed.int.vary
,
fun.gld.slope.vary.int.fixed
.
A matrix showing the estimated coefficients for the specified quantile
regression model, the objective function value and whether convergence is
reached in the optimisation process. A value of 0 indicates convergence is
reached. The convergence value is the same as the one from the optim
function.
Steve Su
Su (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing May 2015, Volume 25, Issue 3, pp 635-650
GLD.lm.full
,fun.plot.q
,
summaryGraphics.gld.lm
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Fit parametric GLD quantile regression from 0.1 to 0.9, with equal spacings ## between quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Non parametric quantile regression GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed",emp=T) ## End(Not run)
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Find median regression, use empirical method med.fit<-GLD.quantreg(0.5,fit,slope="fixed",emp=TRUE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit GLD Regression along with simulations engel.fit.all<-GLD.lm.full(foodexp~income,data=engel, param="fmkl",fun=fun.RMFMKL.ml.m) ## Fit parametric GLD quantile regression from 0.1 to 0.9, with equal spacings ## between quantiles result<-GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed") ## Non parametric quantile regression GLD.quantreg(seq(0.1,.9,length=9),engel.fit.all,intercept="fixed",emp=T) ## End(Not run)
This is an updated QQ plot function for GLD comparing fitted distribution with empirical data
qqgld.default(y, vals, param, ylim, main = "GLD Q-Q Plot", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE, datax = FALSE, ...)
qqgld.default(y, vals, param, ylim, main = "GLD Q-Q Plot", xlab = "Theoretical Quantiles", ylab = "Sample Quantiles", plot.it = TRUE, datax = FALSE, ...)
y |
A vector of empirical data observations |
vals |
A vector representing four parameters of GLD |
param |
Can be "rs", "fmkl" or "fkml" |
ylim |
A vector of two numerical values, specifying the upper and lower bound of y axis |
main |
Title of the qq plot |
xlab |
Label for X axis |
ylab |
Label for Y axis |
plot.it |
Whether to plot the QQ plot, default is TRUE |
datax |
Whether data values should be on x axis, default is FALSE |
... |
Additional graphical parameters |
This is an adaptation of the default qq plot in R
A list with components:
x |
The x coordinates of the points that were/would be plotted |
y |
The original y vector, i.e., the corresponding y coordinates including NAs. |
R, with modifications from Steve Su
x<-rnorm(100) fit1<-fun.RMFMKL.ml.m(x) qqgld.default(x,fit1,param="fmkl")
x<-rnorm(100) fit1<-fun.RMFMKL.ml.m(x) qqgld.default(x,fit1,param="fmkl")
GLD.lm.full
This function display the coefficients and the distribution of coefficients
obtained from GLD regression model. For a discussion on goodness of fit,
please see the description under GLD.lm
.
summaryGraphics.gld.lm(overall.fit.obj, alpha = 0.05, label = NULL, ColourVersion = TRUE, diagnostics = TRUE, range = c(0.01, 0.99))
summaryGraphics.gld.lm(overall.fit.obj, alpha = 0.05, label = NULL, ColourVersion = TRUE, diagnostics = TRUE, range = c(0.01, 0.99))
overall.fit.obj |
An object from |
alpha |
Specifying the range of interval for the coefficients, default is 0.05, which specifies a 95% interval. This also specifies the significance level of KS resample test. |
label |
A character vector indicating the labelling for the coefficients |
ColourVersion |
Whether to display colour or not, default is TRUE, if set as FALSE, a black and white plot is given. This is only applicable to the coefficient summary graph and has no effect on QQ plots. |
diagnostics |
If TRUE, then QQ plot will be given along with various goodness of fit test results |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
The reason QQ plots are not displayed in black and white even if ColourVersion is set to FALSE is because the colour is necessary in those plots for clarity of display.
Graphics displaying coefficient summary and diagnostic plot (if chosen)
Steve Su
Su (2015) "Flexible Parametric Quantile Regression Model" Statistics & Computing May 2015, Volume 25, Issue 3, pp 635-650
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Note this is for illustration only, need to set number ## of simulations around 1000 usually for the graphics below ## to be meaningful summaryGraphics.gld.lm(fit,ColourVersion=FALSE,diagnostic=FALSE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit a full GLD regression engel.fit.full<-GLD.lm.full(foodexp~income,data=engel,param="fmkl", fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(engel.fit.full,ColourVersion=FALSE,diagnostic=FALSE) summaryGraphics.gld.lm(engel.fit.full) ## Extract the mammals dataset library(MASS) ## Fit a full GLD regression mammals.fit.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(mammals.fit.full,label=c("intercept","log of body weight")) ## End(Not run)
## Dummy example ## Create dataset set.seed(10) x<-rnorm(200,3,2) y<-3*x+rnorm(200) dat<-data.frame(y,x) ## Fit FKML GLD regression with 3 simulations fit<-GLD.lm.full(y~x,data=dat,fun=fun.RMFMKL.ml.m,param="fkml",n.simu=3) ## Note this is for illustration only, need to set number ## of simulations around 1000 usually for the graphics below ## to be meaningful summaryGraphics.gld.lm(fit,ColourVersion=FALSE,diagnostic=FALSE) ## Not run: ## Extract the Engel dataset library(quantreg) data(engel) ## Fit a full GLD regression engel.fit.full<-GLD.lm.full(foodexp~income,data=engel,param="fmkl", fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(engel.fit.full,ColourVersion=FALSE,diagnostic=FALSE) summaryGraphics.gld.lm(engel.fit.full) ## Extract the mammals dataset library(MASS) ## Fit a full GLD regression mammals.fit.full<-GLD.lm.full(log(brain)~log(body),data=mammals,param="fmkl", fun=fun.RMFMKL.ml.m) ## Plot coefficient summary summaryGraphics.gld.lm(mammals.fit.full,label=c("intercept","log of body weight")) ## End(Not run)
GLD.lm.full.surv
This function display the coefficients and the distribution of coefficients obtained from GLD Accelerated Failure Time regression model.
summaryGraphics.gld.surv.lm(overall.fit.obj, alpha = 0.05, label = NULL, ColourVersion = TRUE, diagnostics = TRUE, range = c(0.01, 0.99), exp = FALSE)
summaryGraphics.gld.surv.lm(overall.fit.obj, alpha = 0.05, label = NULL, ColourVersion = TRUE, diagnostics = TRUE, range = c(0.01, 0.99), exp = FALSE)
overall.fit.obj |
An object from |
alpha |
Specifying the range of interval for the coefficients, default is 0.05, which specifies a 95% interval. This also specifies the significance level of KS test. |
label |
A character vector indicating the labelling for the coefficients |
ColourVersion |
Whether to display colour or not, default is TRUE, if set as FALSE, a black and white plot is given. This is only applicable to the coefficient summary graph and has no effect on QQ plots. |
diagnostics |
If TRUE, then QQ plot will be given along with various goodness of fit test results |
range |
The is the quantile range to plot the QQ plot, defaults to 0.01 and 0.99 to avoid potential problems with extreme values of GLD which might be -Inf or Inf. |
exp |
If TRUE, Exponentiate the coefficients |
The reason QQ plots are not displayed in black and white even if ColourVersion is set to FALSE is because the colour is necessary in those plots for clarity of display.
Graphics displaying coefficient summary and diagnostic plot (if chosen)
Steve Su
Su (2021) "Flexible Parametric Accelerated Failure Time Model" Journal of Biopharmaceutical Statistics Volume 31, 2021 - Issue 5
## Not run: library(mlr3proba) actg320.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg320[which(actg320$txgrp!=3 & actg320$txgrp!=4),]$censor, data=actg320[which(actg320$txgrp!=3 & actg320$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) summaryGraphics.gld.surv.lm(actg320.rs,label=c("(Intercept)", "IDV versus no IDV","Hemophiliac","Baseline CD4", "Months of prior \n ZDV use","Age"),exp="TRUE") ## End(Not run)
## Not run: library(mlr3proba) actg320.rs<-GLD.lm.full.surv(log(time)~factor(txgrp)+hemophil+cd4+priorzdv+age, censoring=actg320[which(actg320$txgrp!=3 & actg320$txgrp!=4),]$censor, data=actg320[which(actg320$txgrp!=3 & actg320$txgrp!=4),], param="rs",fun=fun.RPRS.ml.m,summary.plot=F,n.simu=1000) summaryGraphics.gld.surv.lm(actg320.rs,label=c("(Intercept)", "IDV versus no IDV","Hemophiliac","Baseline CD4", "Months of prior \n ZDV use","Age"),exp="TRUE") ## End(Not run)