Factor analysis: how does it work?

Think of factor analysis as shrink wrap. When applied to a large amount of data, it compresses the set into a smaller set that is far more manageable and easier to understand. Determining when to use particular statistical methods to get the most insight out of your data can be tricky. There are three main forms of factor analysis. If your goal aligns with any of these forms, then factor analysis is the statistical method of choice:

Exploratory Factor Analysis should be used when you need to develop a hypothesis about a relationship between variables. Confirmatory Factor Analysis should be used to test a hypothesis about the relationship between variables. Construct validity should be assessed to test the degree to which your survey actually measures what it is intended to measure. Large datasets are the lifeblood of factor analysis. If the scale has been adapted in some way, or if it is being empirically examined for the first time, all of the factor loadings and factor correlations should also be reported so future researchers can compare their values with these original estimates.

These could be reported as a standalone instrument validation paper or in the methods section of a study using that instrument. If the researcher wants to continue to use the existing items, it is prudent to investigate any misfit to better understand the relationships between the items. This calls for the use of an EFA, in which the relationships between variables and factors are not predetermined (i.e., the data determine which items group together). As mentioned before, EFA could also be the first choice for a researcher if the instrument is in an early stage of development.

We outline the steps for conducting an EFA in the following sections. See Box 4 for an example of how to describe the analytical considerations for an EFA in the methods section. Because the results from the initial CFA indicated that the data did not support a two-factor solution, we proceeded with an EFA to explore the factor structure of the data. Considering the ordinal and nonnormal nature of the data, a principal axis factor estimator was used to extract the variances from the data.

Only cases with complete items were used in the EFA. Because theory and the preceding CFA indicated that the different subscales are correlated, quartimin rotation (an oblique rotation) was chosen for the EFA.

Visual inspection of the scree plot and parallel analysis (PA) based on eigenvalues from the principal components and factor analysis, in combination with theoretical considerations, were used to decide on the appropriate number of factors to retain.

PA was implemented with the psych package (Revelle). Just as with CFA, the first step in an EFA is selecting a statistical method to extract the variances from the data.

The considerations for the selection of this estimator are similar to those for CFA (see Selecting an Estimator). One of the most commonly used methods for extracting variance when conducting an EFA on ordinal data with slight nonnormality is principal axis factoring (Leandre et al.).
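To make the extraction step concrete, here is a minimal sketch in Python (not part of the original study). It assumes the third-party factor_analyzer package and a hypothetical pandas DataFrame of item responses; the file name and the two-factor starting point are placeholders.

    # A minimal sketch, assuming the third-party factor_analyzer package and a
    # hypothetical CSV of item responses (one column per survey item).
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    items = pd.read_csv("survey_responses.csv")  # placeholder file name

    # Principal axis factoring ("principal"), unrotated for now; rotation is
    # discussed next.
    efa = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
    efa.fit(items)

    print(efa.loadings_)            # factor loadings (items x factors)
    print(efa.get_communalities())  # communality estimate for each item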

Factor rotation is a technical step to make the final output from the model easier to interpret (see Bandalos). The main decision for the researcher to make here is whether the rotation should be orthogonal or oblique (Raykov and Marcoulides; Leandre et al.).

Orthogonal means that the factors are uncorrelated with one another in the model. Oblique allows the factors to correlate with one another. In educational studies, factors are likely to correlate with one another; thus, oblique rotation should be chosen unless a strong hypothesis for uncorrelated factors exists (Leandre et al.). Orthogonal and oblique are actually families of rotations, so once the larger choice of family is made, a specific rotation method must be chosen. The specific rotation method chosen within the oblique category does not generally have a strong effect on the results (Bandalos and Finney). However, the researcher should always provide information about which rotation method was used (Bandalos and Finney). After selecting the methods for estimation and rotation, researchers must determine how many factors to extract for the EFA.
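Before turning to the number of factors, a hedged sketch of the rotation choice: in the hypothetical factor_analyzer call above, the rotation family is just an argument, and an oblique rotation (promax is used here purely as an example) additionally yields a factor correlation matrix.

    # Sketch: orthogonal vs. oblique rotation on the hypothetical `items` data.
    from factor_analyzer import FactorAnalyzer

    orthogonal = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
    orthogonal.fit(items)   # factors forced to be uncorrelated

    oblique = FactorAnalyzer(n_factors=2, method="principal", rotation="promax")
    oblique.fit(items)      # factors allowed to correlate

    print(oblique.phi_)     # estimated factor correlation matrix (oblique only)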

Determining the number of factors to retain is recognized as the greatest challenge of an EFA, and the issue has generated a large amount of debate. Eigenvalues are roughly a measure of the amount of information contained in a factor, so factors with higher eigenvalues are the most useful for understanding the data. A scree plot is a plot of eigenvalues versus the number of factors. Scree plots allow researchers to visually estimate the number of factors that are informative by considering the shape of the plot (see the annotated output in the Supplemental Material, Section 2, for an example of a scree plot).

These two methods are considered heuristic, and many researchers recommend also using parallel analysis (PA) or the minimum average partial correlation test to determine the appropriate number of factors (Ledesma and Valero-Mora; Leandre et al.).

In addition, several statistics that mathematically analyze the shape of the scree plot have been developed in an effort to provide a nonvisual method of determining the number of factors (Ruscio and Roche; Raiche et al.).

We recommend using a number of these indices, as well as theoretical considerations, to determine the number of factors to retain. The results of all of the various methods discussed provide plausible solutions that can all be explored to evaluate the best solution. When these indices are in agreement, this provides more evidence of a clear factor structure in the data.
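As an illustration of one such index, the following is a minimal sketch of parallel analysis using only NumPy: observed eigenvalues are compared with eigenvalues from random data of the same shape. The `items` DataFrame is the hypothetical one from the earlier sketch, and the 95th-percentile threshold is a common but illustrative choice.

    # Sketch of parallel analysis: retain factors whose observed eigenvalues
    # exceed those obtained from random data of the same dimensions.
    import numpy as np

    def parallel_analysis(data, n_simulations=1000, percentile=95, seed=0):
        rng = np.random.default_rng(seed)
        n_obs, n_vars = data.shape
        observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

        random_eigs = np.empty((n_simulations, n_vars))
        for i in range(n_simulations):
            random_data = rng.standard_normal((n_obs, n_vars))
            random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[::-1]

        threshold = np.percentile(random_eigs, percentile, axis=0)
        n_factors = int(np.sum(observed > threshold))  # retained factor count
        return n_factors, observed, threshold

    n_keep, observed, threshold = parallel_analysis(items.to_numpy())
    print(f"Parallel analysis suggests retaining {n_keep} factors")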

To make each factor interpretable, it is of utmost importance that the number and nature of the factors retained make theoretical sense (see Box 5 for a discussion of how many factors to retain).

Further, the intended use for the survey should also be considered. For example, say a researcher is interested in studying two distinct populations of students. Parallel analysis based on eigenvalues from the principal components and factor analysis indicated three components and five factors. The scree plot indicated an initial leveling out at four factors and a second leveling out at six factors. We started by running a three-factor model and then increased the number of factors by one until we had run all models ranging from three to six factors.

The pattern matrices were then examined in detail, with a special focus on whether the factors made theoretical sense (see Table 2 for pattern matrices for the three-, four-, and five-factor models). The items originally representing agentic goals were split into two factors. In the four-factor solution, the autonomy and competency items were split into two different factors. In the five-factor solution, three items from the original communal goals scale (working with people, connection to others, and intimacy) contributed most to the additional factor.

For a six-factor solution, the sixth factor included only one item with pattern loadings greater than 0. In conclusion, the communal scale might represent one underlying construct, as suggested by previous research, or it might be split into two subscales represented by items related to (1) serving others and (2) connection. Our data did not support a single agentic factor. Instead, these items seemed to fit on two or three subscales: prestige, autonomy, and possibly competency.

Because all the suggested solutions (three-, four-, and five-factor) included a number of poorly fitting items, we decided to remove items and run a second set of EFAs before proceeding to the CFA. The items now showing low pattern coefficients were items belonging to their own factors in the five-factor EFA. To further explore a five-factor solution, we decided, on the basis of the empirical results and the theoretical meaning of the items, to stepwise remove items 4 (mastery), 14 (competition), and 22 (intimacy).

If some items continue to show pattern coefficients below 0. The new five-factor solution resulted in theoretically the same factors as the first five-factor EFA, but now all pattern coefficients but one were above 0. TABLE 3. Standardized pattern coefficients for the Diekman et al. goal-endorsement items. In conclusion, the initial CFA, as well as the EFA, indicated that the two-dimensional scale previously suggested was not supported in our sample.

The EFA mainly indicated a three- or a five-factor solution. We continued with the five-factor solution, as it allowed us to retain more of the original items and made theoretical sense, because the five factors were just a further parsing of the original agentic and communal scales. Based on the results from the EFAs, a second CFA was specified using the five-factor model with 20 items (excluding item 4, mastery; item 14, competition; and item 22, intimacy). Factor loadings were close to or above 0.

This means that the factors explained most of the items well. Factor correlations were highest between the service and connection factors 0. The lowest factor correlation found was between the prestige and service factors 0.

Coefficient alpha values for the subscales were 0. FIGURE. Results from the final five-factor CFA model. Survey items (for item descriptions, see Table 3) are represented by squares, and factors are represented by ovals. The numbers below the double-headed arrows represent correlations between the factors; the numbers by the one-directional arrows between the factors and the items represent standardized factor loadings.

Small arrows indicate error terms. The results from the factor analysis did not confirm the proposed two-factor goal-endorsement scale for use with college STEM majors. Instead, our results indicated five subscales: prestige, autonomy, competency, service, and connection (Table 4). The five-factor solution aligned with Diekman et al. Our sample did, however, allow us to further refine the solution for the original two scales.

Finer parsing of the agentic and communal scales may help identify important differences between students and allow researchers to better understand factors contributing to retention in STEM majors.

In addition, with items related to autonomy and competency moved to their own scales, the refined prestige scale, focusing on factors like power, recognition, and status, may be a more direct contrast to the service scale. Further, retention may be significantly correlated with prestige but not with autonomy. Alternatively, differences between genders may exist for the service scale but not the connection scale. TABLE 4. Proposed five-factor solution.

Items within each factor are ordered from highest to lowest factor loadings. On the basis of the results of this factor analysis, we recommend using the five-factor solution for interpreting the results of the current data set, but interpreting the connection and competency scales with some caution, for reasons summarized in the next section.

The proposed five-factor solution needs additional work. In particular, both the competency and connection scales need further development. Only two items represented connection, and this is not adequate to represent the full aspect of this construct, especially to make it clearly distinct from the construct of service.

The competency scale included only three items, and coefficient alpha was 0. Further studies should confirm whether the suggested dimensionality holds in a more representative sample. Future studies should also test whether the instrument has the same structure with STEM students from different backgrounds. The work presented here only establishes the dimensionality of the survey. The aim of EFA is to gain a better understanding of underlying patterns in the data, investigate dimensionality, and identify potentially problematic items.

In addition to the results from parallel analysis or other methods used to estimate the number of factors, other informative measures include pattern coefficients and communalities. These outputs from an EFA will be discussed in this section. See Box 5 for an example of how to write up the output from an EFA.

Pattern coefficients and communalities are parameters describing the relationship between the items and the factors. They help researchers understand the meaning of the factors and identify items that do not empirically appear to belong to their theorized factor.

Pattern coefficients represent the impact each factor has on an item after controlling for the impact of all the other factors on that item. A high pattern coefficient suggests that the item is well explained by a particular factor. However, as with CFA, there is no clear rule as to when an item has a pattern coefficient too low to be considered part of a particular factor.

Guidelines for minimum pattern coefficient values range from 0. It is also important to consider the magnitude of any cross-loadings. Cross-loading describes the situation in which an item seems to be influenced by more than one factor in the model.

Cross-loading is indicated when an item has high pattern coefficients for multiple factors. Cross-loadings higher than 0. Communality represents the percentage of the variance in responses on an item accounted for by all factors in the proposed model.

However, in CFA, the variance in an item is explained by only one factor, while in EFA, the variance in one item can be explained by several factors. Low communality for an item means that the variance in the item is not well explained by any part of the model, and thus that item could be a candidate for elimination.
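As a hedged illustration of this screening step, the sketch below flags items with weak primary loadings, notable cross-loadings, or low communalities. The cutoff values are illustrative assumptions rather than values taken from this article, and the fitted `efa` object and `items` DataFrame are the hypothetical ones from the earlier sketches.

    # Sketch: screening an EFA solution for weak items and cross-loadings.
    # The 0.40, 0.30, and 0.25 cutoffs are illustrative assumptions only.
    import pandas as pd

    def screen_items(pattern, communalities, primary_cutoff=0.40,
                     cross_cutoff=0.30, communality_cutoff=0.25):
        """pattern: DataFrame (items x factors); communalities: Series by item."""
        report = {}
        for item in pattern.index:
            loadings = pattern.loc[item].abs().sort_values(ascending=False)
            flags = []
            if loadings.iloc[0] < primary_cutoff:
                flags.append("low primary loading")
            if len(loadings) > 1 and loadings.iloc[1] > cross_cutoff:
                flags.append("cross-loading")
            if communalities[item] < communality_cutoff:
                flags.append("low communality")
            if flags:
                report[item] = flags
        return report

    pattern = pd.DataFrame(efa.loadings_, index=items.columns)
    communalities = pd.Series(efa.get_communalities(), index=items.columns)
    print(screen_items(pattern, communalities))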

We emphasize that, even if pattern coefficients or communalities indicate that an item might be a candidate for elimination, it is important to consider the alignment between the item and the hypothesized construct before actually eliminating the item. The items in a scale are presumably chosen for some theoretical reason, and eliminating any items can cause a decrease in content validity (Bandalos and Finney). If any item is removed, the EFA should be rerun to ensure that the original factor structure persists.

This can be done on the same data set, as EFA is exploratory in nature. Once the factors and the items make empirical and theoretical sense, the factor solution can be interpreted, and suitable names for the factors should be chosen (see Box 5 for a discussion of the output from an EFA). Important sources of information for this include the amount of variance explained by the whole solution and by each factor, the factor correlations, pattern coefficients, communality values, and the underlying theory.

Because the names of the factors will be used to communicate the results, it is crucial that the names reflect the meaning of the underlying items. Because the item responses are manifestations of the constructs, different sets of items representing a construct will, accordingly, lead to slightly different nuanced interpretations of that construct. Once a plausible solution has been identified by an EFA, it is important to note that stronger support for the solution can be obtained by testing the hypothesized model using a CFA on a new sample.

In this article, we have discussed the need for understanding the validity evidence available for an existing survey before its use in discipline-based educational research. Thus, each time a researcher decides to use an instrument, they have to consider to what degree evidence and theory support the intended interpretations and use of the instrument. A researcher should always review the different kinds of validity evidence (AERA, APA, and NCME; Table 1) before using an instrument and should identify the evidence they need to feel confident when employing the instrument for an intended use.

When using several related items to measure an underlying construct, one important validity aspect to consider is whether a set of items can confidently be combined to represent that construct.

In this paper, we have shown how factor analysis (both exploratory and confirmatory) can be used to investigate that. We recognize that the information presented herein may seem daunting and a potential barrier to carrying out important, substantive, educational research. We appreciate this sentiment and have experienced those fears ourselves, but we feel that properly understanding procedures for vetting instruments before their use is essential for robust and replicable research.

Again, we can use an analogy for the measurement of unobservable phenomena: one would not expect an uncalibrated and a calibrated scale to produce the same values for the weight of a rock.

Research conducted using uncalibrated or biased instruments, regardless of discipline, is at risk of inferring conclusions that are incorrect. The researcher may make the appropriate inferences given the values provided by the instrument, but if the instrument itself is invalid for the proposed use, then the inferences drawn are also invalid.

Our aim in presenting these methods is to strengthen the research conducted in biology education and continue to improve the quality of biology education in higher education. We refer interested readers to Bandalos, Sijtsma, and Crocker and Algina. Setting the scale of the latent factors is commonly done either by (1) choosing one of the factor loadings and fixing it to 1 (this is done for each factor in the model) or by (2) fixing the variance of the latent factors to 1.

We have chosen the former approach for this example. In a similar way as for CFA, these model fit indices can be used to evaluate the fit of the data to the model. Because these values are unstandardized, it is sometimes hard to interpret these relationships.

For this reason, it is common to standardize factor loadings and other model relationships. Fully discussing the nuances of how to create a single score from a set of items is beyond the scope of this paper, but we would be remiss if we did not at least mention it and encourage the reader to seek more information, such as DiStefano et al.

Across all sciences, the quality of measurements is important. Validity refers to the degree to which evidence and theory support the interpretations of the test score for the proposed use.

TABLE 1 (excerpt). Sources of validity evidence and example questions. Did the respondents understand the items as intended by the researcher? Evidence based on relations to other variables: analyses of the relationships of instrument scores to variables external to the instrument and to other instruments that measure the same construct or related constructs (e.g., can the instrument detect differences in the strength of communal goal endorsement between women and men that have been found by other instruments?). Evidence based on the consequences of testing: the extent to which the consequences of the use of the score are congruent with the proposed uses of the instrument (e.g., will the use of the instrument cause any unintended consequences for the respondent? Is the instrument identifying students who need extra resources as intended?).


For a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess for each item's initial communality.

To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2-8 are the independent variables. Go to Analyze > Regression > Linear and enter q01 under Dependent and q02 to q08 under Independent(s).
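A minimal sketch of the same idea in Python, assuming a hypothetical CSV containing the eight items as columns q01 through q08: the R-squared from regressing each item on the remaining items (its squared multiple correlation) serves as its initial communality estimate.

    # Sketch: initial communalities as squared multiple correlations (SMCs).
    # Assumes a hypothetical file with the eight item columns q01 ... q08.
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    saq = pd.read_csv("saq8.csv")  # placeholder file name

    initial_communalities = {}
    for item in saq.columns:
        X = saq.drop(columns=item)   # the other seven items as predictors
        y = saq[item]
        initial_communalities[item] = LinearRegression().fit(X, y).score(X, y)

    print(pd.Series(initial_communalities).round(3))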

Note that the R-squared from this regression matches the initial communality estimate for Item 1. We could do eight more linear regressions to get all eight initial communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column, we get 3. This represents the total common variance shared among all items for a two-factor solution. The next table we will look at is Total Variance Explained.

In fact, SPSS simply borrows the information from the PCA for use in the factor analysis, and the factors are actually components in the Initial Eigenvalues column. The main difference now is in the Extraction Sums of Squared Loadings. We notice that each corresponding row in the Extraction column is lower than in the Initial column.

This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains the largest share of that common variance. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor. A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze > Dimension Reduction > Factor > Extraction), it bases these on the Initial solution and not the Extraction solution.

This is important because the criterion here assumes no unique variance, as in PCA, which means that this is the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column the second factor has an eigenvalue that is less than 1 but is still retained because the Initial value is 1. If you want to use this criterion for the common variance explained, you would need to apply it yourself.

Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is true in theory but not in practice), 2.

F, it uses the initial PCA solution and the eigenvalues assume no unique variance. First note the annotation that 79 iterations were required. If we had simply used the default 25 iterations in SPSS, we would not have obtained an optimal solution. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor.

Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that these are no longer called eigenvalues as in PCA. This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item.

For example, for Item 1, these results match the value in the Communalities table under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimate for each item.
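The same arithmetic is easy to verify directly from a loadings matrix. The sketch below refits a hypothetical two-factor principal axis solution on the `saq` DataFrame introduced earlier and checks that the two totals agree.

    # Sketch: column sums of squared loadings give the variance explained by each
    # factor; row sums give each item's communality; the grand totals agree.
    import numpy as np
    from factor_analyzer import FactorAnalyzer

    paf = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
    paf.fit(saq)                                   # hypothetical SAQ-8 DataFrame
    loadings = np.asarray(paf.loadings_)           # items x factors

    ss_loadings = (loadings ** 2).sum(axis=0)      # per-factor sums of squared loadings
    communalities = (loadings ** 2).sum(axis=1)    # per-item communalities
    print(ss_loadings.sum(), communalities.sum())  # the two totals agree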

We will use the term factor to represent components in PCA as well. These elements represent the correlation of the item with each factor. Now, square each element to obtain squared loadings or the proportion of variance explained by each factor for each item.

Summing the squared loadings across factors, you get the proportion of variance explained by all factors in the model. This is known as common variance or communality; hence, the result is the Communalities table.

These now become elements of the Total Variance Explained table. Summing down the rows gives the total; in words, this is the total common variance explained by the two-factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total common variance explained. True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items):

F, the sum of the squared elements across both factors, 3. F, sum all eigenvalues from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance, excluding unique variance, 7. F, eigenvalues are only applicable for PCA. To run a factor analysis using maximum likelihood estimation, under Analyze > Dimension Reduction > Factor > Extraction > Method choose Maximum Likelihood.

Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives you an absolute test of model fit.

Non-significant values suggest a good-fitting model. Here the p-value is less than 0. In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below from 1 to 8 factors.

The table shows the number of factors extracted (or attempted to extract) as well as the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, but the iterations needed and the p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed.

In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom are negative, which cannot happen; the number of factors will be reduced by one. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and from the Percent of Variance Explained criterion, by which you would choose a different number of factors.
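As a hedged sketch of how one might loop over candidate factor counts outside of SPSS, again assuming the third-party factor_analyzer package and the hypothetical `saq` DataFrame; note that this package does not reproduce SPSS's chi-square table, so the sketch only collects the candidate solutions.

    # Sketch: refit a maximum-likelihood EFA for several candidate factor counts.
    from factor_analyzer import FactorAnalyzer

    solutions = {}
    for k in range(1, 5):  # candidate numbers of factors
        fa = FactorAnalyzer(n_factors=k, method="ml", rotation=None)
        fa.fit(saq)
        solutions[k] = fa.loadings_
        print(k, "factors; communalities:", fa.get_communalities().round(3))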

We talk to the Principal Investigator and at this point, we still prefer the two-factor solution. We will talk about interpreting the factor loadings when we talk about factor rotation to further guide us in choosing the correct number of factors.

F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, greater than 0. T, we are taking away degrees of freedom but extracting more factors. As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes that common variance takes up all of the total variance.

For both methods, when you assume total variance is 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items, you will obtain 8 communalities; and it represents the common variance explained by the factors or components.

However, in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is a portion of the total variance, so summing up the communalities represents the total common variance and not the total variance. In summary, for PCA, total common variance is equal to total variance explained, which in turn is equal to the total variance, but in common factor analysis, total common variance is equal to total variance explained but does not equal total variance.
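A small numerical check of that distinction, reusing the hypothetical `saq` DataFrame: the eigenvalues of the correlation matrix (the principal components view) sum to the number of items, while the common factor communalities sum to something smaller.

    # Sketch: total variance (PCA view) vs. total common variance (factor view).
    import numpy as np
    from factor_analyzer import FactorAnalyzer

    corr = np.corrcoef(saq, rowvar=False)
    print(np.trace(corr))                  # total variance = number of items (8)
    print(np.linalg.eigvalsh(corr).sum())  # eigenvalues sum to the same total

    fa = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
    fa.fit(saq)
    print(fa.get_communalities().sum())    # total common variance, less than 8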

The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items: F, the total variance for each item, 3.

F, communality is unique to each item (shared across components or factors), 5. After deciding on the number of factors to extract and which analysis model to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations: orthogonal and oblique.

The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure. Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases.

Suppose you wanted to know how well a set of items load on each factor; simple structure helps us to achieve this. For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test. We know that the goal of factor rotation is to rotate the factor matrix so that it can approach simple structure in order to improve interpretability.

Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that loadings are simple correlations of items with factors, and standardized solutions can estimate unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.

First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix).

Finally, although the total variance explained by all factors stays the same, the total variance explained by each factor will be different. The Rotated Factor Matrix table tells us what the factor loadings look like after rotation (in this case, Varimax).

Kaiser normalization is a method to obtain stability of solutions across samples: each item's loadings are rescaled to unit communality before the rotation, and after rotation the loadings are rescaled back to their proper size. This means that equal weight is given to all items when performing the rotation.
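For completeness, a hedged sketch of applying a Varimax rotation outside of SPSS, again assuming factor_analyzer and the hypothetical `saq` DataFrame: the Rotator class rotates an existing loading matrix, and the row sums of squared loadings (the communalities) are unchanged by the rotation.

    # Sketch: Varimax rotation of an unrotated principal axis loading matrix.
    from factor_analyzer import FactorAnalyzer
    from factor_analyzer.rotator import Rotator

    unrotated = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
    unrotated.fit(saq)

    rotated_loadings = Rotator(method="varimax").fit_transform(unrotated.loadings_)

    # Rotation redistributes variance between factors but leaves communalities intact.
    print((unrotated.loadings_ ** 2).sum(axis=1))
    print((rotated_loadings ** 2).sum(axis=1))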


