Statistical Assessment of Two Research Reports
Course Name, Course Number
Two articles are assessed in this paper: Exploratory Factor Analysis: A Five-step Guide for Novices and Birnbaum–Saunders Statistical Modelling: A New Approach. The first article introduces factor analysis, a multivariate statistical procedure used in education, psychology, and the health sector, and distinguishes its two major branches, exploratory and confirmatory factor analysis. It recommends means of determining the number of participants needed per variable, indicated by the N:p ratio, and illustrates tests for evaluating whether data are suitable for factor analysis. In the second article, Leiva et al. propose a new statistical model based on the Birnbaum–Saunders (BS) distribution, originally developed to describe fatigue life under cyclic stress, in which the mean response is related to a linear predictor in the manner of generalized linear models (GLMs). Following this approach, the authors implemented a group of computational routines, including the functions diagnostics.bs, bsreg.fit, bsreg, envelope.bs, and summary.bs.
This paper evaluates the statistical presentation of two research reports: Exploratory Factor Analysis: A Five-step Guide for Novices by Williams, Onsman, and Brown, and Birnbaum–Saunders Statistical Modelling: A New Approach by Leiva, Santos-Neto, Cysneiros, and Barros.
Exploratory Factor Analysis: A Five-step Guide for Novices
This article begins by introducing factor analysis. According to the study, factor analysis is used in education, psychology, and the health sector. The study's primary objective is to demonstrate a factor analysis procedure for paramedic educators, researchers, and postgraduate students. Seven significant topics are covered: the background of factor analysis, the kinds of factor analysis, the appropriateness of data for factor analysis, the process of extracting factors from data, the criteria used in factor extraction, the kinds of rotational procedures, and lastly the interpretation and labelling of constructs.
Factor analysis is a multivariate statistical procedure; multivariate means that two or more variables are involved. Factor analysis can be used in varied ways, including the reduction of a large number of variables into a smaller set and the discovery of fundamental dimensions between latent constructs and measured variables, thus enabling the refinement and formation of theory. Factor analysis also offers evidence of construct validity.
Factor analysis comprises two major groups: exploratory factor analysis and confirmatory factor analysis. In the exploratory case, the researcher has no expectations about the number of factors, and the analysis is explorative by nature; exploratory factor analysis enables the researcher to seek the primary fundamentals needed to produce a model or theory from a large group of latent constructs. In the confirmatory case, the researcher uses the method to assess a proposed model or theory and, in contrast to the exploratory case, holds expectations and assumptions grounded on a priori theory concerning the factors and the models or theories that best fit them.
Exploratory factor analysis is complex, yet the analysis is linear and sequential, and the process involves various options. It consists of five basic steps: determining whether the data are appropriate for factor analysis, deciding how the factors will be extracted, choosing the criteria used to recognize factor extraction, selecting a rotational procedure, and finally interpretation and labelling. Some hold that, for data to be appropriate for factor analysis, at least 300 cases are needed; for other researchers, however, samples can be smaller. The study therefore concludes that recommended sample sizes for factor analysis differ.
In this study, the authors also recommend other means of determining the number of participants needed for every variable, indicated as the N:p ratio, where N is the number of participants and p is the number of variables. From the N:p literature, the authors conclude that there is no agreed minimum for the total number of participants. Aside from the sample-to-variable ratio, the authors also consider the factorability of the correlation matrix. As the term denotes, a correlation matrix displays the relationships between variables. The correlation matrix should be checked for correlation coefficients greater than 0.30; the researchers categorize coefficients such that ±0.30 is minimal, ±0.40 is important, and ±0.50 is practically significant. Hence, if the correlations do not exceed 0.30, researchers must examine whether factor analysis is the appropriate method to use, because a factorability of 0.30 means that the factors account for only about 30% of the relationships among the data; with so little shared variance, it becomes impractical to recognize whether the variables share a common underlying structure at all.
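The correlation-matrix check described above is easy to sketch. The following Python snippet is illustrative only (it is not part of either article's software): it builds a correlation matrix from synthetic data and reports the share of off-diagonal coefficients whose magnitude exceeds the 0.30 threshold.

```python
import numpy as np

def factorability_summary(data, threshold=0.30):
    """Summarise how many off-diagonal correlations exceed a threshold.

    `data` is an (n_cases, n_variables) array; the 0.30 cut-off follows
    the rule of thumb described in the text.
    """
    R = np.corrcoef(data, rowvar=False)          # correlation matrix
    p = R.shape[0]
    mask = ~np.eye(p, dtype=bool)                # ignore the diagonal
    strong = np.abs(R[mask]) > threshold
    return R, strong.mean()                      # share of |r| > threshold

# Synthetic example: six indicators driven by one shared latent factor
rng = np.random.default_rng(0)
common = rng.normal(size=(300, 1))
data = common + 0.8 * rng.normal(size=(300, 6))
R, share = factorability_summary(data)
print(f"{share:.0%} of off-diagonal correlations exceed 0.30")
```

With a strong common factor, as here, virtually all pairwise correlations clear the 0.30 cut-off; with pure noise, almost none would.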
This study also illustrates various tests for evaluating the suitability of data for factor analysis. Two of these are Bartlett's Test of Sphericity and the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy. The KMO measure is recommended when the ratio of cases to variables is less than 1:5, and Bartlett's Test should be significant at p < .05. As for the extraction of factors, the primary objective of rotation is to simplify the factor arrangement of items: in short, greater item loadings on a single factor and smaller item loadings on the remaining factor solutions. Several extraction methods are suggested: image factoring, principal components analysis (PCA), maximum likelihood, principal axis factoring (PAF), canonical factoring, and alpha factoring. Many researchers argue over the application of PAF versus PCA, despite the fact that the differences between the two methods are not so significant. PCA is the default approach in various statistical programs and is therefore widely used in exploratory factor analysis; it is also used when no a priori model or theory exists. The objective of extraction is to reduce a large set of items into factors. To generate scales and simplify factor solutions, criteria have been made available, but given the confusing quality of factor analysis, no single criterion should be assumed to determine factor extraction.
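Bartlett's Test of Sphericity can be computed directly from its standard textbook formula, χ² = −(n − 1 − (2p + 5)/6) · ln det R with p(p − 1)/2 degrees of freedom. The sketch below is an illustration against synthetic data, not the authors' procedure:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's Test of Sphericity from the standard formula.

    Tests H0: the correlation matrix is an identity matrix, i.e. the
    variables are uncorrelated and factor analysis is inappropriate.
    """
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(statistic, df)
    return statistic, p_value

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
data = latent + rng.normal(size=(200, 5))   # five correlated indicators
stat, pval = bartlett_sphericity(data)
print(f"chi2 = {stat:.1f}, p = {pval:.2g}") # p < .05 supports factorability
```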
The study also tackles the cumulative percentage of variance as another approach in factor analysis that stimulates disagreement. No fixed threshold applies, even though some percentages have been recommended: in the natural sciences, extraction should be stopped once 95% of the variance is explained, whereas in the humanities, extraction should be halted once the explained variance reaches 50 to 60%. The Scree Test is considered subjective by nature because it requires the judgment of the researcher; hence, which factors must be retained remains open to argument. Interpreting a scree plot involves drawing a line through the eigenvalues; the point at which the line breaks signifies the number of factors that should be retained. Parallel analysis is not much used as a factor extraction criterion, one reason being its limited availability in statistical programs. Nonetheless, for other experts, parallel analysis has advantages in the extraction of factors. In a parallel analysis, the observed eigenvalues are compared with eigenvalues obtained from random data; factors are retained when the actual eigenvalues exceed the random ones.
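Parallel analysis as summarised above is straightforward to sketch: compare the observed eigenvalues with the mean eigenvalues from random normal data of the same shape, and retain factors where the observed value is larger. A minimal illustration, not code from the article:

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn-style parallel analysis: count eigenvalues of the observed
    correlation matrix that exceed the mean eigenvalues of correlation
    matrices computed from random data of the same dimensions."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_sims):
        fake = rng.normal(size=(n, p))
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(fake, rowvar=False)))[::-1]
    rand /= n_sims
    return int(np.sum(obs > rand))

# Synthetic data: one strong factor behind four variables, plus three
# pure-noise variables that should not yield extra factors
rng = np.random.default_rng(2)
factor = rng.normal(size=(300, 1))
data = np.hstack([factor + rng.normal(size=(300, 4)),
                  rng.normal(size=(300, 3))])
n_keep = parallel_analysis(data)
print("factors to retain:", n_keep)
```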
The study also discusses the selection of a rotational method. According to the authors, when deciding on the number of factors with which to analyze the data, it is important to consider whether the variables relate to one another. Through rotation, high item loadings are maximized while low item loadings are minimized, generating a simplified, interpretable solution. Rotation comprises two common approaches, orthogonal rotation and oblique rotation, and various studies have used diverse methods in selecting rotation options. Orthogonal varimax is the most commonly used method in factor analysis because it generates factor structures that are uncorrelated. By contrast, oblique rotation generates correlated factors, which are seen as giving more precise outcomes in studies involving human behaviour, or when the data do not meet a priori assumptions. Whatever the case may be, the primary aim is to offer a less complex interpretation of outcomes and to generate a solution that is highly parsimonious. The last part of factor analysis is interpretation. Here the authors discuss how researchers examine which variables can be attributed to a factor; for instance, a factor may have five variables grouped under a theme or name. In traditional statistics, at least two or three variables should load on a factor for it to have a meaningful interpretation.
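The varimax criterion mentioned above can be sketched with the classic SVD-based update due to Kaiser. The routine below is a bare-bones illustration, not a procedure from either article:

```python
import numpy as np

def varimax(loadings, tol=1e-8, max_iter=500):
    """Orthogonal varimax rotation of a (variables x factors) loading
    matrix, so that each variable loads highly on few factors."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                      # accumulated rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        B = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(B)
        R = u @ vt                     # nearest orthogonal matrix
        d = s.sum()
        if d < d_old * (1 + tol):      # criterion stopped improving
            break
        d_old = d
    return L @ R

# Two factors loading on disjoint variable blocks, slightly mixed
L = np.array([[0.8, 0.3], [0.7, 0.2], [0.3, 0.8], [0.2, 0.7]])
Lr = varimax(L)
# An orthogonal rotation preserves the total squared loadings
print(np.allclose((Lr**2).sum(), (L**2).sum()))
```

Because the rotation matrix is orthogonal, communalities are unchanged; only the distribution of loadings across factors is simplified.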
This article merely discussed factor analysis but did not present any actual or sample methodology and data to better illustrate the purpose of exploratory factor analysis. Readers would have benefitted more had various samples and illustrations been used.
Birnbaum–Saunders Statistical Modelling: A New Approach
Birnbaum and Saunders originally proposed a statistical model for measuring fatigue life under cyclic stress. The model holds that failure results from the growth of a dominant crack generated by stress, and the cumulative damage that gives rise to the Birnbaum-Saunders (BS) distribution follows a normal distribution. The BS distribution is unimodal and positively skewed, and the BS model is used in engineering studies.
In a BS model, a random variable Y follows the BS distribution with shape parameter α > 0 and scale parameter δ > 0, indicated by Y ~ BS(α, δ), if its probability density function takes the standard form f(y; α, δ) = [1/(2αδ√(2π))] [(δ/y)^(1/2) + (δ/y)^(3/2)] exp{−(1/(2α²))(y/δ + δ/y − 2)}, for y > 0.
In this formula, δ serves as the median of the distribution of Y, with 1/Y ~ BS(α, 1/δ) and bY ~ BS(α, bδ) for b > 0. The mean of Y is E[Y] = δ(1 + α²/2), while its variance is (αδ)²(1 + 5α²/4). The study on BS statistical modelling also surveys the varied extensions and interpretations of experts. For instance, the authors mention that Rieck and Nedelman showed that if Y ~ BS(α, δ), then V = log(Y) follows a logarithm-BS distribution with shape parameter α and location parameter γ = log(δ) ∈ R, indicated by V ~ log-BS(α, γ); these researchers also suggested logarithm-linear regression models and applied them to fatigue data. Other researchers mentioned are Galea, Xie, and Wei, who created various diagnostic tools for BS models. Leiva, for his part, created BS logarithm-linear regression models and diagnostics and applied them to survival data of patients with blood cell cancer.
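The quoted moments can be checked by simulation using the standard normal representation of the BS distribution: if Z ~ N(0, 1), then δ[αZ/2 + √((αZ/2)² + 1)]² follows BS(α, δ). The sketch below is an illustration of the distribution's properties, not code from the article:

```python
import numpy as np

def rbs(alpha, delta, size, seed=0):
    """Sample from BS(alpha, delta) via the normal representation:
    Y = delta * (alpha*Z/2 + sqrt((alpha*Z/2)**2 + 1))**2, Z ~ N(0,1)."""
    z = np.random.default_rng(seed).normal(size=size)
    w = alpha * z / 2
    return delta * (w + np.sqrt(w**2 + 1))**2

alpha, delta = 0.5, 2.0
y = rbs(alpha, delta, size=500_000)
mean_theory = delta * (1 + alpha**2 / 2)                  # E[Y]
var_theory = (alpha * delta)**2 * (1 + 5 * alpha**2 / 4)  # Var[Y]
print(mean_theory, y.mean())   # Monte Carlo mean should agree closely
print(var_theory, y.var())     # likewise for the variance
```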
The primary purpose of Leiva et al. is to formulate a new model based on BS regression that follows the lines of a GLM: the mean response in this framework is related to a linear predictor made up of regressors and unknown parameters. Compared with existing BS regression models, the new strategy is based on the mean of the population. Santos-Neto proposed the new parameterization of the BS distribution used here, indicated by the parameters µ and δ, in which µ > 0 is the scale parameter and the distribution mean, while δ > 0 is a precision and shape parameter.
Under this parameterization, the authors use the notation Y ~ BS(µ, δ). The mean and variance of Y are E[Y] = µ and Var[Y] = µ²/f, respectively, where f = (δ + 1)²/(2δ + 5), so δ serves as a precision parameter: for a fixed value of µ, as δ → ∞ the variance of Y tends to zero, and as δ → 0 the variance tends to 5µ².
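The mean-precision parameterization can be reconciled with the classical moments quoted earlier under the mapping α = √(2/δ) and scale β = δµ/(δ + 1). This mapping is inferred here from the stated moments for illustration, not quoted from the article; the quick check below confirms it is consistent:

```python
import numpy as np

# Assumed mapping between parameterisations: alpha = sqrt(2/delta),
# classical scale beta = delta*mu/(delta + 1). The classical moments
# E[Y] = beta*(1 + alpha^2/2) and Var[Y] = (alpha*beta)^2*(1 + 5*alpha^2/4)
# should then reduce to E[Y] = mu and Var[Y] = mu^2/f,
# with f = (delta + 1)^2 / (2*delta + 5).
mu, delta = 2.0, 3.0
alpha = np.sqrt(2 / delta)
beta = delta * mu / (delta + 1)

mean_classical = beta * (1 + alpha**2 / 2)
var_classical = (alpha * beta)**2 * (1 + 5 * alpha**2 / 4)
f = (delta + 1)**2 / (2 * delta + 5)

print(np.isclose(mean_classical, mu))        # E[Y] = mu
print(np.isclose(var_classical, mu**2 / f))  # Var[Y] = mu^2 / f
# The limits in the text follow: Var -> 0 as delta -> inf,
# and Var -> 5*mu^2 as delta -> 0 (since f -> 1/5).
```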
The authors illustrate some shapes of the probability density function of Y ~ BS(µ, δ) in a three-panel figure. In panel (a), the parameter δ controls the kurtosis and skewness of the distribution: as δ increases, the density becomes more concentrated around the mean and variability is reduced. In panel (b), the parameter µ changes the scale of the distribution, and as it increases, variability increases as well. Lastly, panel (c) shows that the variance approaches 20 when µ = 2 and δ → 0, whereas it tends to zero as δ → ∞.
The authors presented several other, more involved derivations and, following the approach shown in the article, formed a group of computational routines in the R statistical computing language. These routines comprise the functions diagnostics.bs, bsreg.fit, bsreg, envelope.bs, summary.bs, and more. The authors also conducted a simulation study to observe the distributions of the residuals. They used a BS regression model with logarithm link function log(µi) = b1 + b2 xi, i = 1, ..., 20, in which the true values of the parameters are taken as b1 = 0.2 and b2 = 0.5, with δ = 2 and δ = 25 as scenarios. The values of the regressors were assumed to be drawn from a uniform distribution on the interval (0, 1), and the values of µi were obtained through µi = exp(b1 + b2 xi). For each of 5000 replications, the group obtained observations y = (y1, ..., y20) from the BS distribution with parameters µi and δ, for i = 1, ..., 20, then fitted the model using the bsreg() command and computed the standard deviation, kurtosis, and coefficient of skewness of the residuals.
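One replication of that simulation design can be sketched as follows. The sampler maps the (µ, δ) parameterization back to the classical one via the assumed relations α = √(2/δ) and scale = δµ/(δ + 1); this is an illustration in Python, not the authors' R routines:

```python
import numpy as np

def rbs_mu(mu, delta, rng):
    """Draw one BS(mu, delta) variate (mean-precision parameterisation)
    by mapping to the classical form and using the normal
    representation. The mapping is an assumption made for this sketch."""
    alpha = np.sqrt(2 / delta)
    scale = delta * mu / (delta + 1)
    w = alpha * rng.normal() / 2
    return scale * (w + np.sqrt(w**2 + 1))**2

# Mimic the article's design: log(mu_i) = b1 + b2*x_i, i = 1..20,
# b1 = 0.2, b2 = 0.5, x_i ~ Uniform(0, 1), one delta scenario
rng = np.random.default_rng(3)
b1, b2, delta = 0.2, 0.5, 25.0
x = rng.uniform(size=20)
mu = np.exp(b1 + b2 * x)
y = np.array([rbs_mu(m, delta, rng) for m in mu])
print(y.shape, y.min() > 0)   # (20,) True: 20 positive responses
```

Repeating this draw 5000 times and refitting the model each time yields the residual distributions the authors examine.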
The authors also examine an actual data set, taken from the R package faraway, which corresponds to the projected and actual sales of 20 consumer products. The projected sales are represented by the regressor, xi (in M$), while the actual sales are represented by the response variable, Yi (in M$). The data can be obtained through the faraway package or the data() command. The researchers give a descriptive account of the actual sales, encompassing the mean, standard deviation, median, coefficient of variation, and the minimum and maximum values, and also present a scatter plot of actual against projected sales, a boxplot, a histogram, and other displays. Adjusted boxplots for asymmetric data can be formed through the command adjbox() of the robustbase R package. An exploratory data analysis was also performed.
The new Birnbaum-Saunders regression models have qualities not available in existing models of this kind. In particular, the new models enable the description of the mean of the data on its actual scale, whereas present models rely on logarithmic transformations of the data, prompting a probable loss of statistical power and complexities in the interpretation of results. Furthermore, the new models readily allow the description of data with non-constant variance, while none of the current Birnbaum-Saunders models can describe all of these facets simultaneously. The new models are also flexible, since they allow varied non-negative link functions to associate the mean with the regressors. From this study, it can be said that the deviance component residual generally used in generalized linear models is a fitting choice because of its coherence and because the new models were proposed using the same ideas as those models. The authors also developed new influence methods to evaluate the possible impact of individual observations on the model through the use of perturbation schemes. The statistical modelling of actual data with the new approach demonstrated the significance of the study's main objective, and the methodology was implemented in regression software accessible to all interested users.
References

Bryant, F., & Yarnold, P. (1995). Principal-components analysis and exploratory and confirmatory factor analysis. American Psychological Association.

Cudeck, R. (2000). Exploratory factor analysis. Handbook of Applied Multivariate Statistics and Mathematical Modeling, 265–296.

Engelhardt, M., Bain, L., & Wright, F. (1981). Inferences on the parameters of the Birnbaum-Saunders fatigue life distribution based on maximum likelihood estimation. Technometrics, 23(3), 251–256.

Ford, J., MacCallum, R., & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39(2), 291–314.

Kundu, D., Kannan, N., & Balakrishnan, N. (2008). On the hazard function of Birnbaum–Saunders distribution and associated inference. Computational Statistics & Data Analysis, 52(5), 2692–2702.

Leiva, V., Santos-Neto, M., Cysneiros, F., & Barros, M. (n.d.). Birnbaum-Saunders statistical modelling: A new approach.

Rieck, J., & Nedelman, J. (1991). A log-linear model for the Birnbaum–Saunders distribution. Technometrics, 33(1), 51–60.

Williams, B., Onsman, A., & Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. Journal of Emergency Primary Health Care, 8(3).