Volume 39, Issue 5 pp. 1035-1064
RESEARCH ARTICLE
Open Access

Progress in partial least squares structural equation modeling use in marketing research in the last decade

Marko Sarstedt (Corresponding Author)

Munich School of Management, Ludwig-Maximilians-University Munich, Munich, Germany

Faculty of Economics and Business Administration, Babeș-Bolyai University, Cluj-Napoca, Romania

Correspondence: Marko Sarstedt, Ludwig-Maximilians-University Munich, Munich, Germany. Email: [email protected]

Joseph F. Hair

Mitchell College of Business, University of South Alabama, Mobile, Alabama, USA

Mandy Pick

Faculty of Economics and Management, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany

Benjamin D. Liengaard

Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark

Lăcrămioara Radomir

Faculty of Economics and Business Administration, Babeș-Bolyai University, Cluj-Napoca, Romania

Christian M. Ringle

Hamburg University of Technology (TUHH), Hamburg, Germany
First published: 27 January 2022

Abstract

Partial least squares structural equation modeling (PLS-SEM) is an essential element of marketing researchers' methodological toolbox. During the last decade, the PLS-SEM field has undergone massive developments, raising the question of whether the method's users are following the most recent best practice guidelines. Extending prior research in the field, this paper presents the results of a new analysis of PLS-SEM use in marketing research, focusing on articles published between 2011 and 2020 in the top 30 marketing journals. While researchers became more aware of the whens and hows of PLS-SEM use during the period studied, we find that there continues to be some delay in the adoption of best practices in model evaluation. Based on our review results, we provide recommendations for future PLS-SEM use, offer guidelines for the method's application, and identify areas of further research interest.

1 INTRODUCTION

For many years, estimating models with complex interrelationships between observed variables and the latent concepts they measure was equivalent to executing factor-based structural equation modeling (SEM). Recent research, however, demonstrates the rise of partial least squares (PLS) as a composite-based alternative (Jöreskog & Wold, 1982). PLS-SEM applications have grown exponentially in the past decade (Hair et al., 2022), especially in the social sciences (e.g., Ali et al., 2018; Ringle et al., 2020; Willaby et al., 2015), but also in other fields of scientific inquiry, such as agricultural science, engineering, environmental science, and medicine (e.g., Durdyev et al., 2018; Menni et al., 2018; Svensson et al., 2018). The availability of comprehensive software programs with an intuitive graphical user interface (Sarstedt & Cheah, 2019), application guideline articles (e.g., Chin, 1998; Hair et al., 2011; Henseler et al., 2009), and textbooks (e.g., Hair et al., 2022; Ramayah et al., 2018; Wong, 2019), all of which have made the method available for nontechnical use, have shaped the field significantly and contributed to PLS-SEM's dissemination.

An article by Hair et al. (2012) has had a lasting impact on the marketing and consumer behavior disciplines—as evidenced by its massive citation count. In that paper, the authors review more than 200 PLS-SEM studies published in the top 30 marketing journals between 1981 and 2010. Based on their evaluation of PLS-SEM applications according to a wide range of criteria pertaining to, for example, model characteristics and assessment practices, Hair et al. (2012, p. 428) identify “misapplications of the technique, even in top-tier marketing journals,” noting that “researchers do not fully capitalize on the criteria available for model assessment and sometimes even misapply measures.” The authors also derive comprehensive guidelines for algorithmic settings, measurement and structural model evaluation criteria, as well as complementary analyses, which have become an anchor for the method's follow-up extensions and applications.

It has been a decade since the publication of Hair et al.'s (2012) article. During these years, the PLS-SEM field has undergone extensive methodological developments (Hwang et al., 2020; Khan et al., 2019). Research has shaped the understanding of the method (e.g., Rigdon, 2012), introduced new metrics (e.g., Liengaard et al., 2021), and clarified aspects related to model specification and data considerations (e.g., Rigdon, 2012; Sarstedt et al., 2016), all of which are relevant for PLS-SEM users. Some of these extensions emerged from controversies about PLS-SEM's general efficacy (Evermann & Rönkkö, 2021), such as guidelines for identifying and treating endogeneity (Hult et al., 2018), methods for estimating common factor models (Bentler & Huang, 2014; Dijkstra & Henseler, 2015; Kock, 2019), and novel means of assessing discriminant validity (Henseler et al., 2015). In addition, best practices, for example, with regard to measurement and structural model assessment (Hair et al., 2020b), have solidified in recent years as part of the PLS-SEM field's further maturation. But has PLS-SEM use truly been understood and accepted in the marketing research field? Have researchers taken the most recent best practice guidelines on board? Are there still cases of misapplication, as previously pointed out by Hair et al. (2012)?

This paper addresses these questions by presenting the results of a new review of PLS-SEM use in marketing research, focusing on articles published between 2011 and 2020 in the top 30 marketing journals. By applying the same coding scheme as Hair et al. (2012), our analysis facilitates drawing conclusions about developments in PLS-SEM use over time and identifying potential points of concern. In addition, our review covers recent developments in the field, such as improved metrics for assessing internal consistency reliability (Dijkstra & Henseler, 2015), discriminant validity (Henseler et al., 2015), and predictive power (Shmueli et al., 2016) to explore whether researchers follow the latest advances in the field. Based on our review results, we spell out recommendations for future PLS-SEM use, offer guidelines for the method's application, and identify areas of further research interest. Our overarching aim is to improve the rigor of the PLS-SEM method's application.

2 REVIEW OF PLS-SEM RESEARCH PUBLISHED BETWEEN 2011 AND 2020

To ensure comparability with Hair et al.'s (2012) article, we reviewed PLS-SEM applications in the top 30 marketing journals according to Hult et al.'s (2009) journal ranking. Since Hair et al. (2012) examined the period from 1981 to 2010, we focused our review on the following 10 years (i.e., 2011–2020) to identify possible developments and provide an overview of PLS-SEM use in recent marketing studies. We used the keywords “partial least squares” and “PLS” to conduct a full-text search in the Clarivate Analytics Web of Science, EBSCO Information Services, and Elsevier Scopus databases. To ensure that we retrieved the complete set of relevant articles, we additionally searched the journals' websites with the same search terms. The search was completed on January 12, 2021.

We excluded all articles matching our keyword list, but not related to PLS-SEM (e.g., private labels, PLs). We also excluded articles applying PLS regression or only mentioning PLS-SEM, but not applying the method. Following these adjustments, we conducted a detailed review of all the articles published in journals with interdisciplinary content (Journal of Business Research, Journal of International Business Studies, Journal of Product Innovation Management, and Management Science) to identify those in marketing. Finally, we excluded all articles introducing methodological advancements of the method (e.g., Shmueli et al., 2016) or including simulation studies comparing PLS-SEM with other methods (e.g., Hair et al., 2017b).
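To illustrate the screening logic described above, the following sketch filters a set of retrieved records by the search terms and then keeps only the articles that actually apply PLS-SEM. The DataFrame, its column names, and the example records are purely hypothetical and are not the coding sheet used in our review.

```python
import pandas as pd

# Hypothetical export of database search results (Web of Science, EBSCO, Scopus);
# the column names and records are illustrative, not the review's actual coding sheet.
records = pd.DataFrame({
    "title": ["Store loyalty and private labels (PLs)",
              "Drivers of brand engagement: a PLS-SEM study"],
    "fulltext": ["... private labels (PLs) in grocery retailing ...",
                 "... we estimate the model using partial least squares ..."],
    "applies_pls_sem": [False, True],  # coded manually during the review
})

# Step 1: keep records whose full text matches the search terms
keywords = ["partial least squares", "pls"]
hits = records[records["fulltext"].str.lower().apply(
    lambda text: any(keyword in text for keyword in keywords))]

# Step 2: drop false positives (e.g., private labels) and articles that only
# mention PLS-SEM without applying it; this step requires manual coding
relevant = hits[hits["applies_pls_sem"]]
print(len(relevant), "article(s) retained for the detailed review")
```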

The search produced a total of 239 articles applying PLS-SEM in the last 10 years (Table 1), which is a steep increase compared to the previous period's 204 articles (Hair et al., 2012). A breakdown of the number of articles by year (Figure 1) shows that PLS-SEM use surged substantially in 2019 and 2020, with 37 and 36 articles, respectively—only 2010 saw a higher number of PLS-SEM articles. These results clearly indicate an enduring upward trend in PLS-SEM use since the early 2000s.

Table 1. PLS-SEM studies in the top 30 marketing journals between 2011 and 2020
Advances in Consumer Research Wauters, Brengman, and Janssens 2011
European Journal of Marketing Akman, Plewa, and Conduit 2019
Aspara and Tikkanen 2011
Carlson, Gudergan, Gelhard, and Rahman 2019
Chen, Peng, and Hung 2015
Coelho and Henseler 2012
Davari, Iyer, and Guzmán 2017
Dessart, Aldás-Manzano, and Veloutsou 2019
Françoise and Andrews 2015
Gaston-Breton and Duque 2015
Huang and Tsai 2013
Krishen, Leenders, Muthaly, Ziółkowska, and LaTour 2019
Mo, Yu, and Ruyter 2020
Nath 2020
Olbrich and Schultz 2014
Ormrod and Henneberg 2011
Piehler, King, Burmann, and Xiong 2016
Šerić 2017
Singh and Söderlund 2020
Vidal 2014
Wijayaratne, Reid, Westberg, Worsley, and Mavondo 2018
Willems, Brengman, and Kerrebroeck 2019
Yu, Ruyter, Patterson, and Chen 2018
Industrial Marketing Management Ali, Ali, Salam, Bhatti, Arain, and Burhan 2020
Berghman, Matthyssens, and Vandenbempt 2012
Camisón and Villar-López 2011
Faroughian, Kalafatis, Ledden, Samouel, and Tsogas 2012
Ferreras-Méndez, Newell, Fernández-Mesa, and Alegre 2015
Genc, Dayan, and Genc 2019
Gupta, Drave, Dwivedi, Baabdullah, and Ismagilova 2020
Harmancioglu, Sääksjärvi, and Hultink 2020
Hazen, Overstreet, Hall, Huscroft, and Hanna 2015
Heirati, O'Cass, Schoefer, and Siahtiri 2016
Hossain, Akter, Kattiyapornpong, and Dwivedi 2020
Inigo, Ritala, and Albareda 2020
Jain, Khalil, Johnston, and Cheng 2014
Joachim, Spieth, and Heidenreich 2018
Lopes de Sousa Jabbour, Vazquez-Brust, Chiappetta Jabbour, and Latan 2017
Mahlamäki, Rintamäki, and Rajah 2019
Mahlamäki, Storbacka, Pylkkönen, and Ojala 2020
Nagati and Rebolledo 2013
Nenonen, Storbacka, and Frethey-Bentham 2019
Ng, Ding, and Yip 2013
Niu, Deng, and Hao 2020
Poucke, Matthyssens, Weele, and Bockhaven 2019
Pulles, Schiele, Veldman, and Hüttinger 2016
Ritter and Geersbro 2011
Rollins, Bellenger, and Johnston 2012
Shahzad, Ali, Takala, Helo, and Zaefarian 2018
Sluyts, Matthyssens, Martens, and Streukens 2011
Stekelorum, Laguir, and Elbaz 2020
Teller, Alexander, and Floh 2016
Vries, Schepers, Weele, and Valk 2014
Yeniaras, Kaya, and Dayan 2020
International Journal of Research in Marketing Miao and Evans 2012
International Marketing Review Andéhn and L'Espoir Decosta 2016
Freeman and Styles 2014
Griffith, Lee, Yeo, and Calantone 2014
Jean, Wang, Zhao, and Sinkovics 2016
Kumar, Singh, Pereira, and Leonidou 2020
Moon and Oh 2017
Oliveira Duarte and Silva 2020
Pinho and Thompson 2017
Rahman, Uddin, and Lodorfos 2017
Rippé, Weisfeld-Spolter, Yurova, and Sussan 2015
Singh and Duque 2020
Sinkovics, Sinkovics, and Jean 2013
Journal of Advertising Coleman, Royne, and Pounders 2020
José-Cabezudo and Camarero-Izquierdo 2012
Journal of Advertising Research Archer-Brown, Kampani, Marder, Bal, and Kietzmann 2017
Dennis and Gray 2013
Miltgen, Cases, and Russell 2019
Robinson and Kalafatis 2020
Singh, Crisafulli, and La Quamina 2020
Journal of Business Research Ahrholdt, Gudergan, and Ringle 2019
Albert, Merunka, and Valette-Florence 2013
Ali, Ali, Grigore, Molesworth, and Jin 2020
Ballestar, Grau-Carles, and Sainz 2016
Banik, Gao, and Rabbanee 2019
Barhorst, Wilson, and Brooks 2020
Blocker 2011
Borges-Tiago, Tiago, and Cosme 2019
Caputo, Mazzoleni, Pellicelli, and Muller 2020
Cenamor, Parida, and Wincent 2019
Cervera-Taulet, Pérez-Cabañero, and Schlesinger 2019
Chang, Shen, and Liu 2016
Del Sánchez de Pablo González Campo, Peña García Pardo, and Hernández-Perlines 2014
Ferrell, Harrison, Ferrell, and Hair 2019
Flecha-Ortíz, Santos-Corrada, Dones-González, López-González, and Vega 2019
Galindo-Martín, Castaño-Martínez, and Méndez-Picazo 2019
Gelhard and Delft 2016
Gudergan, Devinney, and Ellis 2016
Hernández-Perlines 2016
Hsieh 2020
Iglesias, Markovic, and Rialp 2019
Japutra and Molinillo 2019
Japutra, Ekinci, and Simkin 2019
Kapferer and Valette-Florence 2019
Kühn, Lichters, and Krey 2020
Leischnig, Henneberg, and Thornton 2016
Leong, Hew, Ooi, and Chong 2020
Martins, Costa, Oliveira, Gonçalves, and Branco 2019
McColl-Kennedy, Hogan, Witell, and Snyder 2017
Méndez-Suárez and Monfort 2020
Merz, Zarantonello, and Grappi 2018
Mourad and Valette-Florence 2016
Navarro-García, Arenas-Gaitán, Javier Rondán-Cataluña, and Rey-Moreno 2016
Navarro-García, Sánchez-Franco, and Rey-Moreno 2016
Ohiomah, Andreev, Benyoucef, and Hood 2019
Oliveira Duarte and Pinho 2019
Padgett, Hopkins, and Williams 2020
Palos-Sanchez, Saura, and Martin-Velicia 2019
Peterson 2020
Picón, Castro, and Roldán 2014
Reguera-Alvarado, Blanco-Oliver, and Martín-Ruiz 2016
Rippé, Smith, and Dubinsky 2018
Roy, Balaji, Soutar, Lassar, and Roy 2018
Saleh Al-Omoush, Orero-Blat, and Ribeiro-Soriano 2020
Schubring, Lorscheid, Meyer, and Ringle 2016
Segarra-Moliner and Moliner-Tena 2016
Sener, Barut, Oztekin, Avcilar, and Yildirim 2019
Sharma and Jha 2017
Skarmeas, Saridakis, and Leonidou 2018
Suhartanto, Dean, Nansuri, and Triyuni 2018
Tajvidi, Richard, Wang, and Hajli 2020
Takata 2016
Thakur and Hale 2013
Tran, Lin, Baalbaki, and Guzmán 2020
Valette-Florence, Guizani, and Merunka 2011
Wu, Raab, Chang, and Krishen 2016
Zhang, He, Zhou, and Gorp 2019
Zollo, Filieri, Rialti, and Yoon 2020
Journal of Interactive Marketing Buzeta, Pelsmacker, and Dens 2020
Divakaran, Palmer, Søndergaard, and Matkovskyy 2017
Journal of International Business Studies Lam, Ahearne, and Schillewaert 2012
Lew, Sinkovics, Yamin, and Khan 2016
Journal of International Marketing Johnston, Khalil, Jain, and Cheng 2012
Journal of Marketing Köhler, Rohm, Ruyter, and Wetzels 2011
Journal of Marketing Management Ashill and Jobber 2014
Balaji and Roy 2017
Barnes and Mattsson 2011
Bennett 2011
Bennett 2018
Bennett and Kottasz 2011
Brettel, Engelen, and Müller 2011
Brill, Munoz, and Miller 2019
Carlson, Rahman, Rosenberger, and Holzmüller 2016
Carlson, Rosenberger, and Rahman 2015
Chiang, Wei, Parker, and Davey 2017
Dall'Olmo Riley, Pina, and Bravo 2015
Falkenreck and Wagner 2011
Fernandes and Castro 2020
Finch, Hillenbrand, O'Reilly, and Varella 2015
Hankinson 2012
Helme-Guizon and Magnoni 2019
Iriana, Buttle, and Ang 2013
Jack and Powers 2013
King, Grace, and Weaven 2013
Ledden, Kalafatis, and Mathioudakis 2011
Mouri, Bindroo, and Ganesh 2015
Ngo and O'Cass 2012
Papagiannidis, Pantano, See-To, and Bourlakis 2013
Richard and Zhang 2012
Ross and Grace 2012
Roy, Balaji, and Nguyen 2020
Roy, Singh, Hope, Nguyen, and Harrigan 2019
Stocchi, Michaelidou, Pourazad, and Micevski 2018
Tabeau, Gemser, Hultink, and Wijnberg 2017
Tafesse and Wien 2018
Taheri, Gori, O'Gorman, Hogg, and Farrington 2016
Teller, Gittenberger, and Schnedlitz 2013
Wu, Jayawardhena, and Hamilton 2014
Wyllie, Carlson, and Rosenberger 2014
Journal of Product Innovation Management Beuk, Malter, Spanjol, and Cocco 2014
Borgh and Schepers 2014
Brettel, Heinemann, Engelen, and Neubauer 2011
Calantone and Rubera 2012
Carbonell and Rodríguez-Escudero 2016
Carbonell and Rodríguez-Escudero 2019
Dubiel, Durmuşoğlu, and Gloeckner 2016
Ernst, Kahle, Dubiel, Prabhu, and Subramaniam 2015
Feurer, Schuhmacher, and Kuester 2019
Hammedi, Riel, and Sasovova 2011
Heidenreich and Handrich 2015
Heidenreich, Spieth, and Petschnig 2017
Jean, Sinkovics, and Hiebaum 2014
Kock, Gemünden, Salomo, and Schultz 2011
Kuester, Homburg, and Hess 2012
Langley, Bijmolt, Ortt, and Pals 2012
Lee and Tang 2018
Mahr, Lievens, and Blazevic 2014
Matsuno, Zhu, and Rice 2014
Mauerhoefer, Strese, and Brettel 2017
McNally, Akdeniz, and Calantone 2011
McNally, Durmuşoğlu, and Calantone 2013
Ngo and O'Cass 2012
Nijssen, Hillebrand, Jong, and Kemp 2012
Pitkänen, Parvinen, and Töytäri 2014
Schuster and Holtbrügge 2014
Siahtiri 2018
Spanjol, Mühlmeier, and Tomczak 2012
Spanjol, Qualls, and Rosa 2011
Zobel 2017
Journal of Public Policy and Marketing Hasan, Lowe, and Petrovici 2019
Journal of Retailing Pelser, Ruyter, Wetzels, Grewal, Cox, and Beuningen 2015
Journal of Service Research Boisvert 2012
Mullins, Agnihotri, and Hall 2020
Journal of the Academy of Marketing Science DeLeon and Chatterjee 2017
Ernst, Hoyer, Krafft, and Krieger 2011
Fombelle, Bone, and Lemon 2016
Hansen, McDonald, and Mitchell 2013
Heidenreich, Wittkowski, Handrich, and Falk 2015
Heijden, Schepers, Nijssen, and Ordanini 2013
Hillebrand, Nijholt, and Nijssen 2011
Houston, Kupfer, Hennig-Thurau, and Spann 2018
Hult, Morgeson, Morgan, Mithas, and Fornell 2017
Leroi-Werelds, Streukens, Brady, and Swinnen 2014
Martin, Johnson, and French 2011
Miao and Evans 2013
Nakata, Zhu, and Izberk-Bilgin 2011
Ranjan and Read 2016
Santos-Vijande, López-Sánchez, and Rudd 2016
Steinhoff and Palmatier 2016
Weerawardena, Mort, Salunke, Knight, and Liesch 2015
Wilden and Gudergan 2015
Wolter and Cronin 2016
Marketing Letters Dugan, Rouziou, and Hochstein 2019
Psychology and Marketing Barnes and Pressey 2012
Borges-Tiago, Tiago, Silva, Guaita Martínez, and Botella-Carrubi 2020
Devece, Llopis-Albert, and Palacios-Marqués 2017
Evers, Gruner, Sneddon, and Lee 2018
Fatima, Mascio, and Johns 2018
Gong and Yi 2018
Hernández-Perlines, Moreno-García, and Yáñez-Araque 2017
Jain, Malhotra, and Guan 2012
Revilla-Camacho, Vega-Vázquez, and Cossío-Silva 2017
Sheng, Simpson, and Siguaw 2019
Verhagen, Dolen, and Merikivi 2019
Zhang and Zhang 2014
  • Note: California Management Review, Harvard Business Review, Journal of Consumer Psychology, Journal of Consumer Research, Journal of Marketing Research, Management Science, Marketing Science, Quantitative Marketing and Economics, and Sloan Management Review did not produce any relevant articles.
  • Journal of Business ceased publication at the end of 2006.
  • Abbreviation: PLS-SEM, partial least squares structural equation modeling.
  • * We excluded five studies by Caemmerer and Mogos Descotes (2011), Chen et al. (2011), Kidwell et al. (2012), Luo et al. (2015), and Rippé et al. (2019) published in Advances in Consumer Research as these were only published as extended abstracts.
Figure 1. Articles per year

The 239 articles were published in 20 journals, with Journal of Business Research (58 articles, 24.27%), Journal of Marketing Management (35 articles, 14.64%), and Industrial Marketing Management (31 articles, 12.97%) being the three most frequently used publication outlets. Compared to the previous period, PLS-SEM was applied far more frequently in Journal of Product Innovation Management, Journal of the Academy of Marketing Science, and International Marketing Review. In contrast, European Journal of Marketing and Journal of Marketing published fewer articles using PLS-SEM. Table 2 documents these results for all journals that published more than one PLS-SEM article between 2011 and 2020, alongside the previous period's corresponding frequencies.

Table 2. Journals with more than one publication between 2011 and 2020 compared to 1981–2010
Publications
Journals 1981–2010 (n = 204) Proportion (%) 2011–2020 (n = 239) Proportion (%)
Journal of Business Research 15 7.35 58 24.27
Journal of Marketing Management 6 2.94 35 14.64
Industrial Marketing Management 23 11.27 31 12.97
Journal of Product Innovation Management 11 5.39 30 12.55
European Journal of Marketing 30 14.71 22 9.21
Journal of the Academy of Marketing Science 13 6.37 19 7.95
International Marketing Review 3 1.47 12 5.02
Psychology and Marketing 9 4.41 12 5.02
Journal of Advertising Research 4 1.96 5 2.09
Journal of Advertising 3 1.47 2 0.84
Journal of Interactive Marketing 2 0.98 2 0.84
Journal of International Business Studies 2 0.98 2 0.84
Journal of Service Research 7 3.43 2 0.84

Of the 239 articles, 93 (38.91%) report two or more alternative models or different datasets (e.g., collected in different years, countries, target groups, or resulting from a segmentation), yielding a total number of 486 analyzed PLS path models. In the following, we use the term “studies” to refer to the 239 journal articles and “models” to refer to the 486 PLS path models analyzed in these articles. Compared to the previous period, the average number of PLS path models analyzed per article increased from approximately 1.5 to 2 in more recent years, showing a shift to multistudy designs that have become the norm in marketing and consumer research (McShane & Böckenholt, 2017).

3 REVIEW OF PLS-SEM RESEARCH: 2011–2020

We evaluated the 239 articles and the models included therein in terms of the following criteria, which Hair et al. (2012), follow-up reviews, and conceptual articles (e.g., Hair et al., 2019b, 2019c, 2020b) identified as relevant for PLS-SEM use: (1) reasons for using PLS-SEM, (2) data characteristics, (3) model characteristics, (4) measurement model evaluation, (5) structural model evaluation, (6) advanced modeling and analysis techniques, and (7) reporting. Addressing these seven critical issues, we provide an update of PLS-SEM applications in marketing and compare our findings with those of Hair et al. (2012). Where applicable, we discuss major shifts in PLS-SEM use between the two periods under consideration (i.e., 1981–2010 and 2011–2020).

3.1 Reasons for using PLS-SEM

Our review shows that 196 (82.01%) of the 239 studies provide a rationale for using PLS-SEM instead of factor-based SEM or sum scores regression (Table 3). The six most frequently mentioned reasons are: small sample size (114 studies, 47.70%), nonnormal data (76 studies, 31.80%), theory development and exploratory research (73 studies, 30.54%), high model complexity (70 studies, 29.29%), predictive study focus (61 studies, 25.52%), and formative measures (56 studies, 23.43%).

Table 3. Reasons for using PLS-SEM
1981–2010 2011–2020
Number of studies (n = 204) Proportion (%) Number of studies (n = 239) Proportion (%)
Nonnormal data 102 50.00 76 31.80
Small sample size 94 46.08 114 47.70
Formative measures 67 32.84 56 23.43
Explain variance in the endogenous constructs 57 27.94 – –
Theory development and exploratory research 35 17.16 73 30.54
High model complexity 27 13.24 70 29.29
Categorical variables 26 12.75 5 2.09
Theory testing 25 12.25 11 4.60
Predictive study focus – – 61 25.52
PLS-SEM's popularity and standard use in the field – – 14 5.86
Moderation effects – – 12 5.02
Higher-order constructs – – 8 3.35
Simultaneous use of multi- and single-item constructs – – 6 2.51
Latent variable scores availability – – 5 2.09
Mediation effects – – 3 1.26
Other reasons (e.g., small number of indicators per construct, model comparison assessment, higher statistical power than factor-based SEM) – – 19 7.95
  • Abbreviation: PLS-SEM, partial least squares structural equation modeling.

The two main reasons mentioned in our study are similar to those reported by Hair et al. (2012) a decade ago. Given the increasing calls to move beyond data characteristics to motivate the use of PLS-SEM and, instead, emphasize the research objective (e.g., Hair et al., 2019b, 2019c; Sarstedt et al., 2021), this finding raises concerns. While PLS-SEM performs well in terms of statistical power (Sarstedt et al., 2016) and convergence compared to other methods when the sample size is small (Henseler et al., 2014; Reinartz et al., 2009), small samples adversely affect all statistical techniques, including PLS-SEM (Benitez et al., 2020). In fact, more than two decades ago, Chin (1998, p. 305) warned researchers that “the stability of the estimates can be affected contingent on the sample size.” Cassel et al. (1999) also noted that PLS-SEM analyses should draw on sufficiently large sample sizes to warrant small standard errors. Numerous others, such as Marcoulides and Saunders (2006), Hair et al. (2013), and Rigdon (2016), have clearly advised against using PLS-SEM as a license to derive results from small samples—unless the nature of the population justifies such a step (e.g., a small population size as commonly encountered in B2B research; Benitez et al., 2020; Hair et al., 2019c). In fact, critics such as Evermann and Rönkkö (2021), Rönkkö and Evermann (2013), and Rönkkö et al. (2016) have repeatedly used PLS-SEM's misapplication as a “strawman” argument to criticize the technique itself, rather than the weak research designs (Petter, 2018; Petter & Hadavi, 2021). Similarly, simulation studies have shown that PLS-SEM does not offer substantial advantages over other SEM methods in terms of parameter accuracy when the data depart from normality (Goodhue et al., 2012; Reinartz et al., 2009). Reflecting this evidence, guideline articles have emphasized that it is not sufficient to motivate the choice of PLS-SEM over alternative methods on the basis of nonnormality alone (e.g., Hair et al., 2019b, 2019c, 2020b).

Despite their strong emphasis on data characteristics, our results also attest to marketing researchers' greater maturity in motivating their method choice. For example, compared to the previous period, an increasing number of researchers emphasize model complexity and their study's predictive focus. Both are valid arguments, as PLS-SEM can handle highly complex models with many indicators, constructs, and model relations without identification concerns (Akter et al., 2017). When estimating these models, PLS-SEM follows a causal-predictive paradigm aimed at testing the predictive power of a model developed carefully on the basis of theory and logic (Liengaard et al., 2021). The underlying theoretical rationale, also referred to as explanation and prediction-oriented (EP) theory, “corresponds to commonly held views of theory in both the natural and social sciences” (Gregor, 2006, p. 626). Numerous standard models, such as the various customer satisfaction index models (e.g., Eklöf & Westlund, 2002; Fornell et al., 1996) or technology acceptance models (e.g., Davis, 1989; Venkatesh et al., 2003), follow an EP-theoretic approach by aiming to explain the cause-effect mechanisms the model postulates, while also generating predictions that underline its practical usefulness (Sarstedt & Danks, 2021). PLS-SEM's ability to strike a balance between machine learning methods, which are fully predictive but atheoretical by nature (Hair & Sarstedt, 2021), and factor-based SEM, which is purely concerned with theory confirmation (Hair et al., 2021b), makes it a particularly valuable method for applied research disciplines like marketing.

Researchers' increased focus on predictive purposes is no coincidence. Seminal articles by Evermann and Tate (2016) as well as Shmueli et al. (2016) on the method's predictive performance have opened a new chapter in PLS-SEM-based methodological research. For example, Sharma et al. (2021c) introduced prediction-oriented model comparison to the field by identifying metrics that excel at selecting the model with the highest predictive power and adequate fit from a set of competing models. Liengaard et al.'s (2021) cross-validated predictive ability test and its recent extension (Sharma et al., 2021a) enable researchers to test a model's predictive power relative to that of competing models and on a standalone basis. Other studies have extended prior research by comparing PLS-SEM's predictive power with that of Hwang and Takane's (2004) generalized structured component analysis, identifying situations that favor each method's use from a prediction point of view (Cho et al., 2021). Guideline articles that make these techniques accessible to a broader audience have accompanied these methodological extensions (Chin et al., 2020; Hair, 2021; Shmueli et al., 2019). Our review of the reasons given for the choice of PLS-SEM reflects some of these developments, but we expect a further shift toward emphasizing the method's causal-predictive focus in the future.

Finally, and unlike in the previous period, researchers also cite additional reasons, such as PLS-SEM's popularity and standard use in the field (14 studies, 5.86%), the estimation of moderation effects (12 studies, 5.02%), mediation effects (3 studies, 1.26%), and higher-order constructs (8 studies, 3.35%). Indeed, recent research emphasizes the method's efficacy in estimating conditional process models that combine mediating and moderating effects in a single analysis (Cheah et al., 2021; Sarstedt et al., 2020a), and clarifies the specification, estimation, and evaluation of higher-order constructs by means of PLS-SEM (Sarstedt et al., 2019). At the same time, fewer studies motivate their choice of PLS-SEM on the grounds of formative measurement—a finding that the studies' model characteristics, which we discuss later, also mirror.

3.2 Data characteristics

Of the 486 models, 476 (97.94%) report the sample size. The average sample size (5% trimmed mean = 279.19) and the median (199) of the models in our review are higher than those that Hair et al. (2012) reported for the 1981–2010 period (5% trimmed mean = 211.29; median = 159). Four studies in our review were conspicuous due to their very large sample sizes of between n = 8876 and n = 26,576. At the same time, 85 of the 476 models (17.86%) rely on sample sizes of less than 100, the smallest being n = 29. Only 17 of the 476 models (3.57%) fail to meet the ten times rule (Barclay et al., 1995), which is a very rough guideline for determining the minimum sample size required to achieve an adequate level of statistical power (for details and alternatives, see Hair et al., 2022, Chap. 1). Those models that do not meet the ten times rule draw on sample sizes that are, on average, only 10.98% below the recommended level (Table 4).

Table 4. Data characteristics
1981–2010 2011–2020
Models (n = 311) Proportion (%) Models (n = 486) Proportion (%)
Sample size: No. of models reporting 311 100.00 476 97.94
5% trimmed mean 211.29 279.19
Median sample size 159 199
Sample size below 100 76 of 311 24.44 85 of 476 17.86
Ten times rule of thumb not met 28 of 311 9.00 17 of 476 3.57
Average deviation from the recommended sample size 45.18% 10.98%
Number of studies (n = 204) Proportion (%) Number of studies (n = 239) Proportion (%)
Missing values: No. of studies reporting – – 41 17.15
Casewise deletion – – 39 of 41 95.12
Mean replacement – – 2 of 41 4.88
Outliers: No. of studies reporting – – 7 2.93
Nonnormal data: No. of studies reporting 19 9.31 24 10.04
Degree of nonnormality reported – – 5 of 24 20.83
Use of discrete variables: No. of studies reporting 57 27.94 24 10.04
Binary variables 43 of 57 75.44 15 of 24 62.50
Categorical variables 14 of 57 12.28 9 of 24 37.50

These results are encouraging compared to those in Hair et al. (2012), who reported a greater share of models estimated with less than 100 observations (24.44%), more models that failed the ten times rule (9.00%), and a higher relative deviation from the recommended sample size (45.18%). Despite frequently citing PLS-SEM's ability to estimate models with small sample sizes as a reason for their method choice, marketing researchers seem to be more concerned about sample size issues than previously. Critical accounts of the method (Goodhue et al., 2012; Rönkkö & Evermann, 2013) and conceptual discussions (Hair et al., 2019c; Rigdon, 2016) have certainly contributed to this increased awareness. Nevertheless, empirical research almost never discusses the statistical power arising from a concrete analysis, despite the available tutorials on power analyses in a PLS-SEM context (Aguirre-Urreta & Rönkkö, 2015). For sample size determination, researchers can also draw on Kock and Hadaya's (2018) inverse square root method, which is relatively easy to apply and offers a more accurate picture of minimum sample size requirements than the ten times rule does.
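To make the two heuristics mentioned above concrete, the following sketch contrasts the ten times rule with Kock and Hadaya's (2018) inverse square root method. The constant 2.486 corresponds to a 5% significance level and 80% statistical power, and the input values in the example are purely illustrative.

```python
import math

def ten_times_rule(max_arrows_at_construct: int) -> int:
    """Rough heuristic: 10 times the largest number of arrows (structural paths
    or formative indicators) pointing at any construct in the model."""
    return 10 * max_arrows_at_construct

def inverse_square_root_method(p_min: float, z: float = 2.486) -> int:
    """Kock and Hadaya's (2018) inverse square root method; z = 2.486 assumes a
    5% significance level and 80% power, and p_min is the smallest path
    coefficient expected to be relevant."""
    return math.ceil((z / abs(p_min)) ** 2)

# Illustrative inputs: at most four arrows point at a construct, and the
# smallest relevant path coefficient is expected to be 0.20
print(ten_times_rule(4))                 # 40 observations
print(inverse_square_root_method(0.20))  # 155 observations
```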

Missing values are a primary concern in the data collection and processing phases. Nevertheless, only 41 studies (17.15%) acknowledge missing values in their data, and almost all of them (39 of 41 studies, 95.12%) used casewise deletion to treat them (Table 4). Grimm and Wagner (2020) have recently shown that PLS-SEM estimates are very stable when using casewise deletion on datasets with up to 9% missing values. However, when doing so, researchers need to ensure the missing value patterns are not systematic and that the final model estimation yields sufficient levels of statistical power. None of the relevant studies in our review offer such information. While casewise deletion seems to be the default setting for researchers using PLS-SEM, extant guidelines recommend using mean replacement when fewer than 5% of values are missing per indicator, or applying more complex procedures, such as maximum likelihood estimation and multiple imputation (Hair et al., 2022). However, while the efficacy of more complex imputation procedures has been tested in a factor-based SEM context (Graham & Coffman, 2012; Lee & Shi, 2021), their impact on PLS-SEM's parameter accuracy and predictive performance remains a blind spot in methodological research.
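The following sketch illustrates the two missing value treatments discussed above (casewise deletion and mean replacement) on hypothetical indicator data; it is a minimal illustration rather than a recommended preprocessing pipeline.

```python
import numpy as np
import pandas as pd

# Illustrative indicator data with a few missing values (not from the review)
data = pd.DataFrame({
    "x1": [5, 4, np.nan, 6, 7],
    "x2": [4, np.nan, 5, 6, 6],
    "x3": [5, 4, 4, np.nan, 7],
})

# Casewise (listwise) deletion: drop every respondent with any missing value;
# afterwards, check that enough observations remain for adequate statistical power
casewise = data.dropna()

# Mean replacement: substitute the indicator mean, typically only advisable when
# fewer than 5% of the values per indicator are missing
share_missing = data.isna().mean()
mean_replaced = data.fillna(data.mean())

print(len(data), "->", len(casewise), "observations after casewise deletion")
print(share_missing.round(2))
```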

Even though outliers influence the ordinary least squares regressions in PLS-SEM, only seven studies (2.93%) explicitly considered them. In one of these studies, the authors did not try to identify outliers but used an interval smoothing approach to mitigate the effect of corresponding observations. Of the remaining six studies that considered outliers, three did not mention the specific method used to detect outliers, whereas the remaining three studies included brief descriptions of the approaches: univariate and multivariate outlier detection, inconsistent or overly consistent response patterns, and univariate outlier detection using boxplots followed by detecting incongruence in response patterns. Most of these studies (5 out of 6) removed outliers from their dataset; the other study did not identify any outliers. While the presence of outliers might influence results substantially (Sarstedt & Mooi, 2019), our findings indicate that applied PLS-SEM research generally disregards outlier detection and treatment.
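As one simple way to operationalize the boxplot-based univariate screening mentioned above, the following sketch flags observations outside the Tukey fences; the data and the 1.5 x IQR multiplier are illustrative assumptions, and multivariate outlier checks would still be needed.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag univariate outliers with the boxplot (Tukey) rule: values more than
    k interquartile ranges below the first or above the third quartile."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Illustrative responses with one extreme value
x = pd.Series([4, 5, 5, 6, 5, 4, 6, 5, 27])
print(x[iqr_outliers(x)])  # flags the value 27 for closer inspection
```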

Nonnormal data is the second most frequently mentioned reason for using PLS-SEM. Nevertheless, only 24 studies (10.04%) acknowledge the nonnormal distribution of their data with few studies quantifying the degree of nonnormality by means of, for example, kurtosis and skewness (5 studies; Table 4). This is generally unproblematic, since PLS-SEM is a nonparametric method and, as such, robust in terms of processing nonnormal data (Hair et al., 2017b; Reinartz et al., 2009). While Hair et al. (2022, Chap. 2) note that highly nonnormal data may inflate standard errors derived from bootstrapping, Hair et al.'s (2017b) simulation study results suggest that such data do not impact Type I or II errors negatively.
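For researchers who wish to quantify the degree of nonnormality, as only five of the studies did, a minimal sketch using skewness and excess kurtosis could look as follows; the simulated data are illustrative, and the interpretation benchmarks vary across the literature.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=0.6, size=300)  # illustrative, right-skewed data

# Values close to 0 indicate approximate normality; benchmarks for what counts
# as "substantial" nonnormality differ across the literature.
print("skewness:", round(stats.skew(x), 2))
print("excess kurtosis:", round(stats.kurtosis(x), 2))  # 0 for a normal distribution
```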

Using discrete variables (i.e., binary and categorical variables) in measurement models is a final area of concern in PLS-SEM applications. Of the 239 studies, only 24 (10.04%) use binary or categorical variables as elements of measurement models (i.e., not as a grouping variable to split the dataset as part of a multigroup analysis), which is considerably less than what Hair et al. (2012) reported (27.94%). Hair et al. (2012) raised concerns regarding such variables' general applicability, especially when used as binary single-item indicators of endogenous constructs (e.g., to represent a choice situation); in line with these recommendations, recent studies primarily use binary and categorical variables to perform multigroup analyses. However, research on the handling of discrete variables in PLS-SEM has progressed since 2012. For example, Hair et al. (2019a) document how to process data from discrete choice experiments where binary indicators representing a choice situation (e.g., buy/not buy) measure the constructs. Similarly, Cantaluppi and Boari (2016) extended the original PLS-SEM algorithm to accommodate ordinal variables when researchers cannot assume equidistance between the scale categories (see also Schuberth et al., 2018b). Given these developments, we expect the use of discrete variables in PLS path models, such as when estimating data from choice experiments, to gain traction in the future.

To summarize, while our results point to improvements in researchers' awareness of sample size issues in PLS-SEM use, they devote too little effort toward quantifying the degree of statistical power associated with the model estimation. Missing data are not routinely discussed, and their treatment in PLS-SEM is still an area of potential concern, which methodological research should address. Researchers should also be aware of recent extensions that facilitate the inclusion of discrete variables in PLS path models.

3.3 Model characteristics

The number of constructs and indicator variables that define model complexity and the use of formative measurement models are the key reasons for PLS-SEM's attractiveness (Hair et al., 2022, Chap. 1). Table 5 offers an overview of these and other model characteristics, contrasting our results with those of Hair et al.'s (2012) review of the 1981–2010 period.

Table 5. Model characteristics
1981–2010 2011–2020
Criterion Results (n = 311) Proportion (%) Results (n = 486) Proportion (%)
Number of latent variables
Mean 7.94 7.39
Median 7.00 7.00
Range (2; 29) (2; 24)
Number of structural model relations
Mean 10.56 11.90
Median 8.00 10.00
Range (1; 38) (1; 70)
Model type
Focused 109 35.05 161 33.13
Unfocused 85 27.33 149 30.66
Balanced 117 37.62 176 36.21
Measurement model
Only reflective 131 42.12 374 76.95
Only formative 20 6.43 7 1.44
Reflective and formative 123 39.55 105 21.60
Not specified 37 11.90 233 47.94
Number of indicators per reflective construct
Mean 3.99 3.85
Median 3.50 3.00
Range (1; 27) (2; 30)
Number of indicators per formative construct
Mean 4.62 4.28
Median 4.00 3.00
Range (1; 20) (2; 38)
Total number of indicators in models
Mean 29.55 29.39
Median 24.00 24.00
Range (4; 131) (3; 222)
Number of models with single-item constructs 144 46.30 177 36.42
  • a Focused model: number of exogenous constructs at least twice as high as the number of endogenous constructs in the model; unfocused model: number of endogenous constructs at least twice as high as the number of exogenous constructs in the model; balanced model: all remaining models.
  • b This paper's authors classified models with missing information about the measurement mode ex post.

The average number of latent variables in the path models is 7.39, which is only slightly lower than in the previous period (7.94). Similarly, the average number of indicators per model is almost identical in both periods (1981–2010: 29.55; 2011–2020: 29.39), suggesting there has been no change in model complexity over the years. The average number of structural model relationships, which increased only slightly from 10.56 to 11.90 from the initial period to the most recent one, also supports this finding. In addition, we do not observe any substantial changes in the model types considered in the studies; that is, whether the models are focused or unfocused, comprising a higher or lower share of exogenous constructs relative to endogenous constructs. Specifically, the shares of focused, unfocused, and balanced models are almost the same and very similar to those reported in Hair et al. (2012). We therefore repeat Hair et al.'s (2012) call to consider focused and balanced models, also because these model types should better meet PLS-SEM's prediction goal.

PLS path models, depending on their measurement model specifications, can be classified as only reflective, only formative, or a combination of both. We find that the majority of the 486 models (374 models, 76.95%) contain only reflectively measured constructs, followed by models with both reflective and formative measures (105 models, 21.60%)—a much higher share of reflective measures compared to Hair et al. (2012). Only 7 of the 486 models (1.44%) rely exclusively on formatively specified constructs. Somewhat surprisingly, a large percentage of the models (233 models, 47.94%) lack a description of the constructs' measurement modes, which is much higher than in Hair et al.'s (2012) review (11.90%). Classifying these models ex post by means of Jarvis et al.'s (2003) guidelines suggests the overwhelming majority (227 models, 97.42%) should be considered as only comprising reflectively specified constructs. While this result suggests that researchers consider reflective measures the default option, the consequences of measurement model misspecification should not be taken lightly. The impact of misspecifications on downstream estimates is generally not pronounced (Aguirre-Urreta et al., 2016), but the primary problem of such misspecifications arises from content validity concerns, since both measurement types rely on fundamentally different sets of indicators (Diamantopoulos & Siguaw, 2006). In line with Hair et al. (2012), researchers should make the measurement model specification explicit to preempt criticism arising from potential misspecification claims. The confirmatory tetrad analysis for PLS-SEM (CTA-PLS; Gudergan et al., 2008; Hair et al., 2018, Chap. 3) provides a statistical test to confirm the choice of measurement model. However, only six studies (2.51%) used this approach to assess the adequacy of their measurement model specification.
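For readers unfamiliar with the mechanics of CTA-PLS, the following sketch computes the three possible tetrads of a four-indicator block from its covariance matrix; in the full procedure, the nonredundant tetrads are tested against zero with bootstrapped confidence intervals, which this illustration omits. The simulated data are hypothetical.

```python
import numpy as np

def tetrads(cov: np.ndarray) -> dict:
    """Tetrads of a block of four indicators (4x4 covariance matrix). For a
    reflectively specified (common factor) block, all tetrads should be close
    to zero; CTA-PLS tests the nonredundant tetrads via bootstrapping."""
    s = cov
    return {
        "t_1234": s[0, 1] * s[2, 3] - s[0, 2] * s[1, 3],
        "t_1342": s[0, 2] * s[1, 3] - s[0, 3] * s[1, 2],
        "t_1423": s[0, 3] * s[1, 2] - s[0, 1] * s[2, 3],
    }

# Illustrative data: four indicators driven by one common factor, so the
# tetrads should all be close to zero
rng = np.random.default_rng(1)
factor = rng.normal(size=1000)
X = np.column_stack([0.8 * factor + rng.normal(scale=0.5, size=1000) for _ in range(4)])
print({k: round(v, 3) for k, v in tetrads(np.cov(X, rowvar=False)).items()})
```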

The average number of indicators for reflectively measured constructs is 3.85 and 4.28 for formatively measured ones, which is similar to those in the previous period (Table 5). Contrary to their reflective counterparts, formative measurement models need to cover a much broader content bandwidth, as they are not conceived as interchangeable (i.e., highly correlated) measures of the same theoretical concept (Diamantopoulos & Winklhofer, 2001). Consequently, formative measurement models should have more indicators than reflective models do. Our review supports this notion, but the comparably small, albeit significant (t = 2.037, p = 0.044) difference in the average number of indicators is surprising, suggesting that future studies should devote more attention to content validity concerns in formative measurement.

We also find the share of models containing single-item constructs decreased from 46.30% to 36.42% from one period to the next. On the one hand, this finding is encouraging, as it suggests researchers have become aware of single-item measures' limitations in terms of explanatory power. Specifically, Diamantopoulos et al. (2012) show that, due to the absence of multiple indicators of the same concept, the lack of measurement error correction deflates the model estimates, potentially triggering Type II errors (Sarstedt et al., 2016). On the other hand, single items are still too prevalent in PLS path models, which is particularly problematic when measuring endogenous target constructs. Researchers should generally refrain from using single items unless they measure observable characteristics, as is commonly done in the form of control variables (e.g., income, sales, and number of employees).

To summarize, we observe a clear shift toward reflective measurement in PLS-SEM studies. The various controversies about formative measures' use (Howell et al., 2007; Rhemtulla et al., 2015; Wilcox et al., 2008) probably triggered this development, prompting researchers to “take the safe route” and rely on standard reflective measures. By following this mantra, researchers forgo the opportunity to offer differentiated recommendations to marketing practice. Contrary to their reflective counterparts, formative measures offer practitioners concrete guidance on how to “improve” certain target constructs—an important consideration in light of the growing concerns about marketing research's relevance for business practice (Homburg et al., 2015; Jaworski, 2011; Kohli & Haenlein, 2021; Kumar, 2017). At the same time, we see some improvement in model specification practices, with researchers relying less on single-item measures. Nevertheless, the percentage of studies using single items is still high, providing an opportunity for improvement.

3.4 Model evaluation

Since Hair et al.'s (2012) review, research has proposed various improvements in the model evaluation metrics for both measurement and structural models. In the following, we assess whether researchers consider best practices, while acknowledging that some of these improvements have only recently been introduced. We distinguish between reflective and formative measurement models, since their validation—as extensively documented in the literature—relies on entirely different sets of criteria (Hair et al., 2021a, 2022; Ramayah et al., 2018; Wong, 2019).

3.4.1 Reflective measurement models

A total of 479 of the 486 models (98.56%) included at least one reflectively measured construct (Table 5). Of these 479 models, 390 models (81.42%) reported the indicator loadings and 386 models (80.59%) reported at least one measure of the internal consistency reliability (Table 6). The majority of the models reported composite reliability ρC in conjunction with Cronbach's alpha (190 models, 39.67%). Composite reliability ρC was the only reliability metric reported in 132 models (27.56%), and Cronbach's alpha was the only reliability metric in 49 of the 479 models (10.23%). While these reporting practices reflect previous recommendations in the literature (Hair et al., 2017a), more recent guidelines call for using ρA (Dijkstra & Henseler, 2015) as an additional appropriate measure of internal consistency reliability. However, only 15 models rely on this criterion, either exclusively (3 models; 0.63%) or in conjunction with one or more other metrics (12 models, 2.51%). Compared to the previous period, a larger percentage of researchers consider both the indicator reliability and internal consistency reliability (Table 6). The internal consistency reliability assessment more frequently relies on Cronbach's alpha, whose use in the PLS-SEM context is criticized due to the metric assuming a common factor model (Rönkkö et al., 2021). When used in composite models, however, Cronbach's alpha underestimates internal consistency reliability, making it a conservative reliability measure. Consequently, when Cronbach's alpha does not raise concerns in a PLS-SEM analysis, the construct measurement can a fortiori be expected to exhibit sufficient levels of internal consistency reliability.
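As a compact reference for the reliability metrics discussed above, the following sketch computes Cronbach's alpha from raw indicator data and the composite reliability ρC from standardized loadings. The data and loading values are illustrative; ρA additionally requires the indicator weights from the PLS-SEM estimation and is therefore not reproduced here.

```python
import numpy as np

def cronbach_alpha(X: np.ndarray) -> float:
    """Cronbach's alpha from an (observations x indicators) data matrix."""
    k = X.shape[1]
    item_variances = X.var(axis=0, ddof=1)
    total_variance = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

def composite_reliability(loadings) -> float:
    """Composite reliability rho_C from standardized indicator loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

# Illustrative values (not taken from the reviewed studies)
rng = np.random.default_rng(0)
score = rng.normal(size=500)
X = np.column_stack([0.7 * score + rng.normal(scale=0.7, size=500) for _ in range(4)])
print(round(cronbach_alpha(X), 3))
print(round(composite_reliability([0.72, 0.75, 0.68, 0.80]), 3))
```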

Table 6. Measurement model evaluation
Panel A: Reflective measurement models
1981–2010 2011–2020
Empirical test criterion in PLS-SEM Number of models reporting (n = 254) Proportion reporting (%) Number of models reporting (n = 479) Proportion reporting (%)
Indicator reliability Indicator loadings 157 61.81 390 81.42
Internal consistency reliability Only Cronbach's alpha 35 13.78 49 10.23
Only ρc 73 28.74 132 27.56
Only ρA – – 3 0.63
Cronbach's alpha & ρc 69 27.17 190 39.67
Cronbach's alpha & ρA – – 1 0.21
ρc & ρA – – 2 0.42
All three – – 9 1.88
Convergent validity AVE 146 57.48 371 77.45
Other 7 2.76 16 3.34
Discriminant validity Only Fornell-Larcker (FL) criterion 111 43.70 176 36.74
Only cross-loadings 12 4.72 8 1.67
Only HTMT – – 25 5.22
FL criterion & cross-loadings 31 12.20 81 16.91
FL criterion & HTMT – – 38 7.93
Cross-loadings & HTMT – – 3 0.63
All three – – 12 2.51
Panel B: Formative measurement models
1981–2010 2011–2020
Empirical test criterion in PLS-SEM Number of models reporting (n = 143) Proportion reporting (%) Number of models reporting (n = 112) Proportion reporting (%)
Reflective criteria used to evaluate formative constructs 33 23.08 10 8.93
Collinearity Only VIF/tolerance 17 11.89 43 38.39
Only condition index 1 0.70 1 0.89
Both 4 2.80 9 8.04
Convergent validity Redundancy analysis – – 6 5.36
Indicator's relative contribution to the construct Indicator weights 33 23.08 74 66.07
Significance of weights Standard errors, significance levels, t values/p values 25 17.48 36 32.14
Only confidence intervals – – 0 0.00
Both – – 2 1.79
  • Abbreviation: PLS-SEM, partial least squares structural equation modeling.
  • a Single item constructs were excluded in 1981–2010 and 2011–2020.
  • b Proportion reporting (%) for 2011–2020 uses all 479 models as the base even though the HTMT criterion was only proposed in 2015.

Of the 479 models, 371 (77.45%) assessed the convergent validity correctly by using the average variance extracted. The remaining models either established convergent validity incorrectly by using criteria such as cross-loadings and composite reliability (16 models, 3.34%) or did not comment on this aspect of model evaluation (92 models, 19.21%).

Discriminant validity is arguably the most important aspect of validity assessment in reflective measurement models (Farrell, 2010), because it ensures each construct is empirically unique and captures a phenomenon that other constructs in the PLS path model do not represent (Franke & Sarstedt, 2019). Against this background, it is surprising that only 343 of the 479 models (71.61%) assessed discriminant validity. The most frequently used metric is the well-known Fornell-Larcker criterion (Fornell & Larcker, 1981), either as a standalone metric (176 models, 36.74%) or in conjunction with other criteria (131 models, 27.35%). The second most frequently used criterion is an analysis of the indicator cross-loadings, which 104 of the 479 models (21.71%) report. Recent research has shown, however, that both criteria are largely unsuitable, on conceptual and empirical grounds, for assessing discriminant validity. Henseler et al. (2015) propose the heterotrait-monotrait (HTMT) ratio of correlations as an alternative metric to assess discriminant validity, and a series of follow-up studies have confirmed its robustness (Franke & Sarstedt, 2019; Radomir & Moisescu, 2020; Voorhees et al., 2016). We find that 78 of the 326 models (23.93%) published since Henseler et al. (2015) draw on this criterion, either exclusively (25 models, 7.67%) or jointly with at least one other criterion (53 models, 16.26%). Of the 78 models applying the HTMT, 47 (60.26%) compare its values with a fixed cutoff value, two models (2.56%) rely on inference testing based on bootstrapping confidence intervals, and 23 models (29.49%) apply both approaches.

While our findings demonstrate some improvements in the reflective measurement model assessment compared to the previous period, concerns about the discriminant validity assessment remain. Given methodological innovations' diffusion latency in applied research, the strong reliance on the Fornell-Larcker criterion and cross-loadings seems understandable. However, future studies should only draw on the HTMT criterion and use bootstrapping to assess whether its values deviate significantly from a predetermined threshold. This threshold depends, however, on the conceptual similarity of the constructs under consideration. For example, a higher HTMT threshold, such as 0.9, can be assumed for conceptually similar constructs, whereas the analysis of conceptually distinct constructs should rely on a lower threshold, such as 0.85 (Franke & Sarstedt, 2019; Hair et al., 2022). Roemer et al. (2021) recently proposed the HTMT2, which relaxes the original criterion's assumption of equal population indicator loadings. Their simulation study shows that the HTMT2 produces a marginally smaller bias in the construct correlation estimates when the indicator loading patterns are very heterogeneous; that is, when some loadings are 0.55 or lower while others are close to 1. The small differences between the values of the HTMT and the HTMT2 are, however, extremely unlikely to translate into different conclusions when using the criterion for inference-based discriminant validity testing, as recommended in the extant literature (Franke & Sarstedt, 2019). For common loading patterns, where loadings vary between 0.6 and 0.8, the regular HTMT actually performs better than the HTMT2, unless one assumes extremely high construct correlations that would violate discriminant validity regardless of which criterion is used. Hence, while the notion of an “improved criterion for assessing discriminant validity” appears appealing, the HTMT2 does not offer any improvement over the original metric.
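To make the HTMT computation concrete, the following sketch derives the criterion for two constructs from the indicator correlations, following Henseler et al.'s (2015) definition: the mean between-block (heterotrait) correlation divided by the geometric mean of the two blocks' average within-block (monotrait) correlations. The simulated data and block assignments are illustrative, and the bootstrap-based inference test recommended above is omitted.

```python
import numpy as np
import pandas as pd

def htmt(data: pd.DataFrame, block1: list, block2: list) -> float:
    """Heterotrait-monotrait ratio of correlations for two indicator blocks."""
    corr = data.corr().abs()
    heterotrait = corr.loc[block1, block2].to_numpy().mean()

    def mean_monotrait(block):
        c = corr.loc[block, block].to_numpy()
        return c[np.triu_indices_from(c, k=1)].mean()

    return heterotrait / np.sqrt(mean_monotrait(block1) * mean_monotrait(block2))

# Illustrative data: two correlated but empirically distinct constructs
rng = np.random.default_rng(7)
f1 = rng.normal(size=400)
f2 = 0.5 * f1 + np.sqrt(1 - 0.25) * rng.normal(size=400)
df = pd.DataFrame({
    **{f"x{i}": 0.8 * f1 + rng.normal(scale=0.6, size=400) for i in range(1, 4)},
    **{f"y{i}": 0.8 * f2 + rng.normal(scale=0.6, size=400) for i in range(1, 4)},
})
print(round(htmt(df, ["x1", "x2", "x3"], ["y1", "y2", "y3"]), 3))  # compare with 0.85/0.90
```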

3.4.2 Formative measurement models

The assessment of formative measurement models draws on different criteria than those used in the context of reflective measurement due to the conceptual differences between the two approaches. Specifically, while reflective measurement treats the indicators as error-prone manifestations of the underlying construct, formative measurement assumes that indicators represent different aspects of a construct that jointly define its meaning (Diamantopoulos et al., 2008).

Overall, 112 of the 486 models (23.05%) include at least one formatively measured construct (Table 5). In 10 of these 112 models (8.93%), the researchers use reflective measurement model criteria to evaluate formative measures (Table 6). This practice is very problematic, however, because formatively measured constructs' indicators do not necessarily correlate highly, which renders standard convergent and internal consistency reliability metrics meaningless (Diamantopoulos & Winklhofer, 2001)—even though corresponding metrics may occasionally reach satisfactory values, since strong indicator correlations may also occur in formative measurement models (Nitzl & Chin, 2017). Instead, researchers need to (1) run a redundancy analysis to assess the formative construct's convergent validity, (2) ensure the formative measurement model is not subject to collinearity, and (3) evaluate the formative indicator weights. Table 6 shows the results of this aspect of our review.

In only 6 of the 112 models (5.36%) with formative measures, the authors reported the results of a redundancy analysis. This analysis establishes a formative measure's convergent validity by showing that it correlates highly with a reflective multi-item or single-item measure of the same concept (Cheah et al., 2018). Chin (1998) introduced the redundancy analysis in a PLS-SEM context more than 20 years ago, but extant guidelines only adopted this method more recently (Hair et al., 2022; Ramayah et al., 2018), which could explain its limited use.

Conversely, collinearity assessment is much more prevalent in formative measurement model evaluation, with about half of the models (53 models, 47.32%) assessed in this regard. The variance inflation factor is the primary criterion for collinearity assessment and is applied in most of these studies (Table 6).
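The following sketch shows one way to compute variance inflation factors for a block of formative indicators by regressing each indicator on the remaining ones. The indicator data are simulated, and the thresholds noted in the comments reflect common guidance rather than this review's coding rules.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF per column of X: regress each indicator on the remaining indicators
    and compute 1 / (1 - R^2); values above 5 (more conservatively, 3) are
    commonly treated as signaling critical collinearity."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r_squared = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Illustrative formative indicator data (x3 partly overlaps with x1)
rng = np.random.default_rng(3)
x1, x2 = rng.normal(size=(2, 300))
x3 = 0.6 * x1 + 0.8 * rng.normal(size=300)
print(variance_inflation_factors(np.column_stack([x1, x2, x3])).round(2))
```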

Formative measurement model assessment's main focus is interpreting the indicator weights, which represent each indicator's relative contribution to forming the construct (Cenfetelli & Bassellier, 2009). Slightly less than two-thirds of the models in our review (74 models, 66.07%) report the formative indicator weights, and researchers also assess the indicators' statistical significance in 44 of the 74 models (59.46%). In six of these 44 models (13.64%), researchers only comment on the weights' significance, whereas they report inference statistics in 38 models (86.36%). A detailed analysis shows that 21 of the 44 models (47.73%) include nonsignificant weights. Rather than deleting the corresponding indicators, extant guidelines suggest analyzing the formative indicators' loadings, which represent their absolute contribution to the construct (Hair et al., 2019b, 2022). Only five of the 21 models (23.81%) follow this recommendation.

Overall, our results regarding formative measurement model evaluation are encouraging, as they suggest researchers have become more concerned about applying criteria dedicated to this measurement mode. In comparison, in the previous period, less than one-third of all models analyzed the formative indicator weights and far fewer tested for potential collinearity issues. Nevertheless, in absolute terms our results offer room for improvement. There is virtually no redundancy analysis in PLS-SEM research, which is problematic, since it offers an empirical validation that the formative measure is similar to established measures of the same concept. In addition, many of the models still do not report collinearity statistics. This finding is surprising, since collinearity assessment has long been acknowledged as an integral element of formative measurement model assessment (Cenfetelli & Bassellier, 2009; Diamantopoulos & Winklhofer, 2001; Petter et al., 2007). Researchers should consider these aspects carefully before focusing on indicator weight assessment.

3.4.3 Structural model

Once the measures' reliability and validity have been established, the next step is to assess the structural model's explanatory and predictive power, as well as the path coefficients' significance and relevance (Hair et al., 2020b). Table 7 documents our review's corresponding results.

Table 7. Structural model evaluation
Criterion | Empirical test criterion in PLS-SEM | Number of models reporting, 1981–2010 (n = 311) | Proportion reporting, 1981–2010 (%) | Number of models reporting, 2011–2020 (n = 486) | Proportion reporting, 2011–2020 (%)
Path coefficients | Values | 298 | 95.82 | 480 | 98.77
Significance of path coefficients | Standard errors, significance levels, t values/p values | 287 | 92.28 | 419 | 86.21
Significance of path coefficients | Confidence intervals | 0 | 0.00 | 11 | 2.26
Significance of path coefficients | Both | 0 | 0.00 | 43 | 8.85
Effect size | f2 | 16 | 5.14 | 86 | 17.70
Explanatory power | R2 | 275 | 88.42 | 430 | 88.48
Predictive power | PLSpredict | n/a | n/a | 23 | 4.73a
Predictive relevance (blindfolding) | Q2 | 51 | 16.40 | 162 | 33.33
Predictive relevance (blindfolding) | q2 | 0 | 0.00 | 15 | 3.09
Model fit | GoF | 16 | 5.14 | 60 | 12.35
Model fit | SRMR | n/a | n/a | 40 | 8.23
Model fit | dULS | n/a | n/a | 1 | 0.21
Model fit | dG | n/a | n/a | 1 | 0.21
Model fit | RMStheta | n/a | n/a | 1 | 0.21
Criterion | Empirical test criterion in PLS-SEM | Number of studies reporting, 1981–2010 (n = 204) | Proportion reporting, 1981–2010 (%) | Number of studies reporting, 2011–2020 (n = 239) | Proportion reporting, 2011–2020 (%)
Observed heterogeneity | Categorical moderator | 47 | 23.04 | 57 | 23.85
Observed heterogeneity | Continuous moderator | 15 | 7.35 | 58 | 24.27
Unobserved heterogeneity | Latent class techniques (e.g., FIMIX-PLS) | 0 | 0.00 | 10 | 4.18
  • a Proportion reporting (%) for 2011–2020 uses all 486 models as the base, even though PLSpredict was only proposed in 2016.

As a first step, researchers need to ensure that potential collinearity between sets of predictor constructs in the model does not negatively impact the structural model estimates. Our review results show that marketing researchers generally omit this step, as only 100 of the 486 models (20.58%) mention inspecting collinearity in the structural model.

Not surprisingly, almost all the models (480 models, 98.77%) report path coefficients and their significance (473 models, 97.33%). Inference testing relies primarily on t tests with standard errors derived from bootstrapping (419 models, 86.21%). In keeping with recommendations in the more recent literature (Streukens & Leroi-Werelds, 2016), 54 models (11.11%) report bootstrapping confidence intervals either alone (11 models, 2.26%) or together with t values (43 models, 8.85%). Aguirre-Urreta and Rönkkö's (2018) simulation study suggests that researchers should primarily draw on percentile confidence intervals or, alternatively, on bias-corrected and accelerated confidence intervals. Both approaches perform very similarly, but the bias-corrected confidence intervals achieve slightly better results when the bootstrap distributions are highly skewed. Researchers seem to have internalized these recommendations, since 39 of 54 models (72.22%) rely on the percentile method, while 15 models (27.78%) apply the bias-corrected and accelerated confidence intervals. Supplementing the path coefficient reporting, 86 of 486 models (17.70%) also express the structural model effects in terms of the f2 effect size, which provides evidence of an exogenous construct's relative impact on an endogenous construct in terms of R2 changes. While earlier guidelines called for the routine reporting of the f2 (Chin, 1998), more recent research emphasizes that this statistic is redundant given the path coefficients and concludes that reporting it should be optional (Hair et al., 2022).
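As an illustration of the recommended inference approach, the following Python sketch uses SciPy's bootstrap routine to construct percentile and BCa confidence intervals for a standardized coefficient, here approximated by a simple bivariate correlation on simulated data rather than an actual PLS path coefficient, and then computes the f2 effect size from two hypothetical R2 values.

# Percentile vs. BCa bootstrap confidence intervals (simulated data), plus the f2 effect size
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
x = rng.normal(size=150)
y = 0.4 * x + rng.normal(size=150)

def path_coefficient(x_s, y_s):
    # standardized bivariate coefficient as a stand-in for a PLS path coefficient
    return np.corrcoef(x_s, y_s)[0, 1]

for method in ("percentile", "BCa"):
    res = stats.bootstrap((x, y), path_coefficient, paired=True,
                          vectorized=False, n_resamples=10_000, method=method)
    print(method, res.confidence_interval)

# f2: change in R2 when the exogenous construct is omitted, relative to the unexplained variance
r2_included, r2_excluded = 0.55, 0.48   # hypothetical values
f2 = (r2_included - r2_excluded) / (1 - r2_included)
print(f"f2 = {f2:.3f}")                 # 0.02, 0.15, and 0.35 are commonly read as small, medium, and large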

The majority of the models report the R2 (430 models, 88.48%) to support the structural model's quality, which is almost identical to the previous period (88.42%). During our review, we noticed that some authors view the R2 as a measure of their models' predictive power. However, the R2's computation draws on the entire dataset and, as such, indicates a model's explanatory power (Shmueli, 2010). Assessing the PLS path model's predictive power requires estimating the parameters by means of a subset of observations, and using these estimates to predict the omitted observations' case values (Sarstedt & Danks, 2021; Shmueli & Koppius, 2011). Our review shows that only 11 of the 239 studies (4.60%) use holdout samples to validate their results, possibly because research has only recently offered clear guidelines on how to run a holdout sample validation in a PLS-SEM context (Cepeda Carrión et al., 2016). Alternatively, researchers could draw on Shmueli et al.'s (2016) PLSpredict procedure, which implements k-fold cross-validation to generate case-level predictions on an indicator level. The procedure partitions the data into k subsets and uses k−1 subsets to predict the indicator values of a specific target construct in the remaining data subset (i.e., the holdout sample). This process is repeated k times, so that each subset serves as the holdout sample once.
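The k-fold logic that underlies such out-of-sample assessment can be sketched in Python as follows, using an ordinary linear regression on simulated data as a crude stand-in for the PLS-SEM estimates; actual PLSpredict implementations (e.g., in SmartPLS or the SEMinR package) generate the indicator-level predictions from the estimated PLS path model itself.

# k-fold out-of-sample prediction with a simple linear stand-in model (hypothetical data)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(seed=5)
X = rng.normal(size=(300, 4))                                               # exogenous indicators
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(scale=0.8, size=300)   # one indicator of the target construct

rmse_folds = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # estimate on k-1 folds
    preds = model.predict(X[test_idx])                           # predict the held-out fold
    rmse_folds.append(np.sqrt(mean_squared_error(y[test_idx], preds)))

print(f"Mean out-of-sample RMSE across 10 folds: {np.mean(rmse_folds):.3f}")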

PLSpredict is a relatively new procedure and research has only recently offered guidelines for its use (Shmueli et al., 2019). It is therefore not surprising that only 23 of the 295 models (7.80%) published after Shmueli et al. (2016) introduced PLSpredict apply the procedure. Instead, many researchers (162 of the 486 models, 33.33%) rely on the blindfolding-based Q2 statistic, which scholars had long considered a suitable means of assessing predictive power (Sarstedt et al., 2014)—a far greater percentage than in the previous period, when only 16.40% of all models included this assessment. However, recent research casts doubt on this interpretation, noting that this statistic conflates explanatory and predictive power assessment (Shmueli et al., 2016). We, therefore, advise against its use and that of the q2, which is a Q2-based equivalent of the f2 effect size that 15 of the 486 models (3.09%) report.

A controversial and repeatedly raised issue in research is whether model fit—as understood in a factor-based SEM context—is a meaningful evaluation dimension for PLS path models. While some researchers strongly advocate the use of model fit metrics (e.g., Schuberth & Henseler, 2021; Schuberth et al., 2018a), others point to the available metrics' conceptual deficiencies and question their performance in the context of PLS-SEM (e.g., Hair et al., 2019c; Hair et al., 2022; Lohmöller, 1989). Our review shows that model fit assessment does not feature prominently in recent PLS-SEM applications, as researchers report model fit metrics for only 103 of the 486 models (21.19%). Most researchers rely on Tenenhaus et al.'s (2004) GoF metric (60 models, 12.35%), which Henseler and Sarstedt (2013) debunked as ineffective for separating well-fitting models from misspecified ones. Some researchers (additionally) report the SRMR (40 models, 8.23%), relying on thresholds proposed by Henseler et al. (2014). The reporting of the distance-based measures dULS and dG as well as the RMStheta is practically nonexistent in PLS-SEM research (1 model each, 0.21%). Scholars have conceptually discussed (Schuberth & Henseler, 2021) and empirically tested (Schuberth et al., 2018a) the SRMR, which lends this metric some authority. However, given the SRMR's performance in settings commonly encountered in applied research, Hair et al. (2022, Chap. 6) comment critically on its practical utility. Specifically, in PLS path models with three constructs, the SRMR requires sample sizes of approximately 500 to detect misspecifications reliably. Our review shows that the vast majority of PLS path models are far more complex (Table 5) and rely on much smaller sample sizes, which supports these concerns. The same holds for the bootstrap-based test of exact model fit, whose use recent research advocates (Benitez et al., 2020; Henseler et al., 2016a; Schuberth & Henseler, 2021), but which none of the studies in our review applies.
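For readers unfamiliar with the metric, the SRMR summarizes the discrepancy between the empirical and the model-implied indicator correlation matrices as a root mean square of the residual correlations. The following Python sketch computes it for two small, entirely hypothetical matrices; in practice, the model-implied matrix comes from the estimated model.

# SRMR as the root mean square of residual correlations (hypothetical matrices)
import numpy as np

S = np.array([[1.00, 0.42, 0.31],
              [0.42, 1.00, 0.38],
              [0.31, 0.38, 1.00]])      # empirical indicator correlations
Sigma = np.array([[1.00, 0.45, 0.28],
                  [0.45, 1.00, 0.40],
                  [0.28, 0.40, 1.00]])  # model-implied indicator correlations

idx = np.tril_indices_from(S)           # lower-triangular elements (incl. diagonal)
srmr = np.sqrt(np.mean((S[idx] - Sigma[idx]) ** 2))
print(f"SRMR = {srmr:.3f}")             # values around 0.08 or below are commonly cited as acceptable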

To summarize, we observe some improvements in PLS-SEM-based structural model assessment, with more researchers seeming to be concerned about predictive power evaluations. In this respect, researchers have made increasing use of the Q2 statistic, which is, however, not a reliable indicator of a model's predictive power. The latter practice is not surprising, since concerns regarding Q2's suitability have only been raised in rather recent research (e.g., Shmueli et al., 2016). Researchers should therefore follow the most recent recommendations and apply k-fold cross-validation by using PLSpredict or drawing on holdout samples. Researchers should also be aware of the controversies surrounding the efficacy of extant model fit metrics in PLS-SEM, and carefully check whether their model and data constellations favor their use. Judging by Schuberth et al.'s (2018a) findings, the SRMR and exact fit test will exhibit very weak performance on the average model identified in our review with seven constructs and 280 observations.

3.5 Advanced modeling and analysis techniques

With the PLS-SEM field's increasing maturation (Hwang et al., 2020; Khan et al., 2019), researchers can draw on a greater repertoire of advanced modeling and analysis techniques to support their conclusions' validity and to identify more complex relationship patterns (Hair et al., 2020a).

For example, many researchers use PLS-SEM because of its ability to accommodate higher-order constructs without identification concerns. Analyzing the prevalence and application of higher-order constructs, we find that 71 of the 239 studies (29.71%) include at least one such construct. The majority of these studies consider second-order constructs (65 studies), while the remaining studies include third-order constructs (5 studies) or both (1 study). Analyzing the measurement model specification of the second-order constructs (Wetzels et al., 2009), we find that most of the studies employ Type I (reflective-reflective; 30 studies), Type II (reflective-formative; 26 studies), or both (4 studies). Only five studies employ Type IV (formative-formative), while no study draws on a Type III (formative-reflective) measurement specification. Analyzing the specification, estimation, and evaluation of the higher-order constructs in greater detail gives rise to concern in two respects. First, 47 of the studies that employ higher-order constructs do not make the specification and estimation transparent. Those that do either draw on the repeated indicators approach (15 studies) or the two-stage approach (9 studies). This practice is problematic, as endogenous Type II and IV higher-order constructs cannot be reliably estimated using the standard repeated indicators approach (Becker et al., 2012). Second and more importantly, only eight studies correctly evaluate the higher-order constructs using criteria documented in the extant literature (e.g., Sarstedt et al., 2019). The majority of the 71 studies do not evaluate the higher-order constructs at all (30 studies) or apply the relevant criteria incompletely (21 studies), typically disregarding discriminant validity assessment in reflective (i.e., Type I) and the redundancy analysis in formative higher-order constructs (i.e., Types II and IV). Finally, 12 studies misapply the evaluation criteria by erroneously interpreting the relationships between lower- and higher-order components as structural model relationships. Overall, our findings suggest that researchers need to exercise much more care in their handling of higher-order constructs by considering the most recent guidelines (Sarstedt et al., 2019).
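A stylized Python sketch of the two-stage logic follows, with hypothetical item data and simple item averages standing in for the stage-one PLS-SEM construct scores (see endnote 2 on why such averages are only a placeholder): the lower-order components' scores from stage one become the indicators of the higher-order construct in stage two.

# Stylized two-stage approach for a higher-order construct (hypothetical data)
import numpy as np

rng = np.random.default_rng(seed=7)
items_loc1 = rng.normal(size=(200, 3))          # items of lower-order component 1
items_loc2 = rng.normal(size=(200, 3))          # items of lower-order component 2

# Stage 1 (placeholder): scores of the lower-order components; a real analysis uses the PLS-SEM construct scores
loc1_scores = items_loc1.mean(axis=1)
loc2_scores = items_loc2.mean(axis=1)

# Stage 2: the lower-order scores serve as indicators of the higher-order construct in a second PLS path model
hoc_indicators = np.column_stack([loc1_scores, loc2_scores])
print(hoc_indicators.shape)                     # (200, 2)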

The identification and treatment of observed and unobserved heterogeneity is another aspect that has attracted considerable attention in methodological research (Memon et al., 2019; Rigdon et al., 2010; Sarstedt et al., 2022). Our review shows that researchers' use of PLS-SEM also mirrors this development. A total of 115 studies (48.12%) in our review took observed heterogeneity into account by considering either continuous (58 studies, 24.27%) or categorical (57 studies, 23.85%) moderating variables. The strong increase in the percentage of studies using continuous moderating variables compared to the previous period, when only 7.35% undertook corresponding moderation analyses, demonstrates the growing interest in more complex model constellations. Although a total of 57 studies investigate the categorical moderators' impact by means of multigroup analysis (Sarstedt et al., 2011), only 12 address the issue of measurement invariance by using the MICOM procedure (Henseler et al., 2016b). This practice is problematic, as establishing measurement invariance is a requirement for any multigroup analysis. By establishing (partial) measurement invariance, researchers ensure that group differences in the model estimates are not the result of, for example, group-specific response styles (Hult et al., 2008).

While the consideration of moderators facilitates the identification and treatment of observed heterogeneity, group-specific effects can also result from unobserved heterogeneity. Since failure to consider such unobserved heterogeneity can be a severe threat to PLS-SEM results' validity (Becker et al., 2013), researchers call for the routine use of latent class procedures. Recent guidelines recommend the tandem use of finite mixture PLS (FIMIX-PLS; Hahn et al., 2002) and more powerful latent class techniques, such as PLS prediction-oriented segmentation (PLS-POS; Becker et al., 2013), PLS genetic algorithm segmentation (PLS-GAS; Ringle et al., 2014), and PLS iterative reweighted regressions segmentation (PLS-IRRS; Schlittgen et al., 2016). While FIMIX-PLS allows researchers to reliably determine the number of segments to extract from the data (Sarstedt et al., 2011), techniques such as PLS-POS reproduce the actual segment structure more effectively. Recent research calls for the routine application of latent class techniques to ensure that heterogeneity does not adversely impact the aggregate-level results (Sarstedt et al., 2020b). Despite these calls, only 10 of the 239 studies in our review (4.18%) conduct a latent class analysis. All these studies apply FIMIX-PLS, which, while detecting heterogeneity issues well, clearly lags behind in terms of treating the heterogeneity.

Researchers have long noted the need to explore alternative explanations for the phenomena under investigation by considering different model configurations. Doing so allows researchers to compare the resulting models by using well-known metrics from the regression literature (Burnham & Anderson, 2002) to select the best-fitting model in the set. Only 10 of our review studies (4.18%) compare alternative models using the same data set. By disregarding alternative theoretically plausible model configurations, researchers may fall prey to confirmation bias, since they only look for configurations that support "their" model (Nuzzo, 2015), and miss out on opportunities to fully understand the mechanisms at work (Sharma et al., 2019). Further, the few studies that do compare multiple models differ widely in the criteria they draw on, applying between one and nine different criteria. However, none of the studies applies the Bayesian information criterion (Schwarz, 1978) or the Geweke and Meese (1981) criterion, which prior research identified as superior in PLS-SEM-based model comparisons (Danks et al., 2020; Sharma et al., 2019; Sharma et al., 2021c). Given further recent developments in this direction (e.g., the cross-validated predictive ability test, CVPAT), we expect the popularity and relevance of predictive model assessment (Sharma et al., 2021a) and model comparison (Liengaard et al., 2021) to increase.
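To illustrate the comparison logic, the following Python sketch computes the BIC for two competing specifications, using OLS regressions on simulated data as stand-ins for the target construct's partial regression in a PLS path model; the specification with the lowest BIC would be preferred, and BIC-based Akaike weights can express each model's relative support (see Table 8).

# BIC-based comparison of two competing (stand-in) model specifications
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(seed=8)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = 0.5 * x1 + 0.2 * x2 + rng.normal(size=200)

model_a = sm.OLS(y, sm.add_constant(x1)).fit()                             # competing model A (one predictor)
model_b = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()      # competing model B (two predictors)

print(f"BIC model A: {model_a.bic:.1f}, BIC model B: {model_b.bic:.1f}")   # lower BIC indicates the preferred model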

While our review results point to some progress in terms of applying advanced analysis techniques, there is still ample room for improvement. First, researchers need to pay greater attention to the specification, estimation, and particularly the validation of higher-order constructs. Second, measurement invariance assessment should precede any multigroup analysis (Henseler et al., 2016b), since failure to establish at least partial measurement invariance renders any multigroup analysis meaningless. Third, researchers should routinely apply latent class analyses to check their aggregate-level results' robustness. Failure to do so may result in misleading interpretations when unobserved heterogeneity affects the model estimates. Finally, researchers should more often compare theoretically plausible alternatives to the model under consideration. Considering alternative explanations is a crucial step before making any attempt to falsify a theory (Popper, 1959).

3.6 Reporting

Hair et al.'s (2012) review emphasized that the algorithm settings used in the analysis needed to be transparent to facilitate its replicability. Our review shows some improvements compared to the previous period, most notably with regard to documenting the software used for the analysis. Whereas less than half of all the studies in the previous period report the software (49.02%), the percentage is much higher in recent PLS-SEM research (193 studies, 80.75%). Most PLS-SEM studies (166 studies, 69.46%) apply SmartPLS, with PLS-Graph following far behind (18 studies, 7.53%). Other rarely used software programs include XLSTAT (4 studies, 1.67%), WarpPLS (3 studies, 1.26%) and ADANCO (2 studies, 0.84%). In addition, two studies (0.84%) use the plspm package of the R software.

Only five studies (2.09%) mention the parameter or algorithm settings, seven studies (2.93%) the computational options (e.g., weighting schemes), and 12 studies (5.02%) the maximum number of iterations. Although making these algorithm settings transparent is certainly useful, different algorithm settings do not normally induce significant deviations in the estimation results. This is different for bootstrapping settings, however; Rönkkö et al. (2015), for example, show that the use of sign change options can trigger considerable Type I errors, whereas Streukens and Leroi-Werelds (2016) emphasize the need to draw many bootstrap samples (i.e., at least 10,000) from the original data. In this regard, it is encouraging that 175 studies (73.22%) mention the use of bootstrapping, 141 of which provide additional details of the algorithmic settings, such as the number of bootstrap samples used. None of the studies uses jackknifing as an alternative means of deriving standard errors, which is likely due to this method only being implemented in the early PLS-Graph software that only a few researchers still use.

Research has produced various variants of the original PLS-SEM algorithm (Becker & Ismail, 2016). One prominent research stream deals with extensions of the original PLS-SEM algorithm that estimate common factor models. While researchers have introduced various such extensions (Bentler & Huang, 2014; Kock, 2019), consistent PLS-SEM (PLSc-SEM) has become the primary technique in this stream (Dijkstra & Henseler, 2015). The use of PLSc-SEM has triggered discussions, with some researchers calling for its routine application (Benitez et al., 2020; Evermann & Rönkkö, 2021; Henseler et al., 2016a), while others conclude the method "adds very little to existing knowledge of SEM" (Hair et al., 2019c, p. 570). PLS-SEM users seem to share this skepticism, as only five studies (2.09%) apply PLSc-SEM.

Finally, we see no increase in the share of studies reporting indicator covariance or correlation matrices. This omission is problematic, because it hinders the results' replicability and prevents readers from verifying model evaluation metrics, such as the HTMT. In contrast, studies routinely report construct-level correlation matrices (180 studies, 75.31%), largely due to their relevance for Fornell-Larcker-based discriminant validity assessment.

To summarize, reporting practices regarding the software and algorithm settings used have improved since Hair et al.'s (2012) review. Nevertheless, more studies need to make the software and, particularly, the bootstrap settings transparent. Furthermore, SmartPLS (Ringle et al., 2015) is currently the most widely applied software, likely due to its user friendliness (Sarstedt & Cheah, 2019), features, and functionalities (Memon et al., 2021). However, with new software packages, such as cSEM (Rademaker & Schuberth, 2021) and SEMinR (Ray et al., 2021), as well as the detailed documentation on them (Hair et al., 2021a; Henseler, 2021), we expect the use of R packages to gain traction. Finally, in light of the growing concerns about social science research's replicability (Rigdon et al., 2020), reporting indicator and construct-level correlation matrices should become mandatory.

4 CONCLUSION

The substantial increase in the number of articles applying PLS-SEM during the last decade shows that PLS-SEM has become an essential element in marketing researchers' methodological toolbox. Our review covers 239 articles published in the top 30 marketing journals between 2011 and 2020—a significant increase compared to those in Hair et al.'s (2012) study, which covered 204 articles published in the same journals between 1981 and 2010.

We find that researchers are currently more aware of the stumbling blocks of PLS-SEM use. Recent research uses PLS-SEM to estimate models with similar average complexity as in the previous period, but drawing on larger datasets, which favors the analyses' statistical power. Our results also suggest that researchers have become more aware of single-item measures' limitations (Diamantopoulos et al., 2012) and of more complex modeling options, like higher-order constructs (Sarstedt et al., 2019) and moderation (Memon et al., 2019). Similarly, researchers have started accommodating recently introduced model evaluation metrics, like the ρA, to assess internal consistency reliability (Dijkstra & Henseler, 2015), the HTMT to assess discriminant validity (Henseler et al., 2015), and Shmueli et al.'s (2016) PLSpredict procedure to evaluate a model's predictive power.

At the same time, we also observe a certain degree of latency in other areas of PLS-SEM use. For example, researchers still rely strongly on the Fornell-Larcker criterion (Fornell & Larcker, 1981) and on cross-loadings to assess discriminant validity; they also hardly use a redundancy analysis to establish formative measures' convergent validity. The use of more advanced analyses, such as latent class techniques, is still lacking in applied PLS-SEM research despite persistent calls for their routine use to ensure the results' validity (Becker et al., 2013; Sarstedt & Danks, 2021a; Sarstedt et al., 2017, 2022). Reporting practices in terms of, for example, the software used and the bootstrapping algorithm settings, are still inadequate. The latency with which methodological innovations diffuse in applied research might explain some of these findings, but certainly not all of them. Researchers, reviewers, and editors should pay greater attention to current developments and the latest best practices in PLS-SEM use (e.g., Hair et al., 2022). Table 8 summarizes these best practices in keeping with the aspects covered in our review and serves as a checklist for future applications of PLS-SEM. We also offer common rules of thumb, which offer initial guidance for estimating and assessing models, and provide supporting references for further reading. Figure 2 visualizes the specific aspects that researchers need to consider in their application of PLS-SEM, focusing on the illustration of the model evaluation and all relevant metrics.

Table 8. Checklist for the application of PLS-SEM
Aspect Recommendation/rules of thumb Suggested references
General
Motivate the use of PLS-SEM Motivating the use of PLS-SEM on the grounds of "small sample size" should be done with caution, as small samples affect the method adversely. Do not motivate PLS-SEM use on the grounds of nonnormal data alone. Hair et al. (2019b); Rigdon (2016)
Emphasize the causal-predictive nature of the analysis, model complexity, the estimation of conditional process models, and the use of formative measures. Hair et al. (2019b); Sarstedt et al. (2020a); Sarstedt et al. (2016)
Data characteristics
Sample size Run power analyses to determine the required sample size prior to the analysis or, alternatively, apply the inverse square root method. The ten times rule offers only very rough guidance regarding the minimum sample size requirements and should not be used as a justification. Aguirre-Urreta and Rönkkö (2015); Kock and Hadaya (2018)
Missing values Report missing values and the use of imputation procedures. Hair et al. (2022)
Outliers Detect outliers using univariate or multivariate methods. Delete outliers if necessary or treat them as a separate segment. Sarstedt and Mooi (2019)
Use of discrete variables Discrete variables may be used as grouping variables in multigroup analyses or be included as dummy variables. PLS-SEM can process discrete choice modeling data. Hair et al. (2019a); Hair et al. (2019b)
Model characteristics
Model type Specify funnel-like models where the number of predictor constructs in the model's partial regressions is larger than the number of endogenous constructs. Avoid chain-like models. Use CB-SEM for unfocused models. Hair et al. (2012); Hair et al. (2021b)
Description of the measurement models Provide a detailed list of indicators in the appendix; distinguish between reflective and formative measurement. Hair et al. (2022)
Single items Generally, refrain from using single items unless measuring observable characteristics, which is commonly done in the form of control variables (e.g., income, sales, number of employees, etc.). Cheah et al. (2018); Diamantopoulos et al. (2012)
Measurement model evaluation: reflective
Internal consistency reliability Cronbach's alpha, composite reliability ρC, and ρA Consider Cronbach's alpha as the lower and ρC as the upper boundary of internal consistency reliability. ρA should be considered the best point estimate of internal consistency reliability. Dijkstra and Henseler (2015); Hair et al. (2022)
Thresholds of all reliability measures: 0.70-0.90 (0.60 in exploratory research, max. 0.95 to avoid indicator redundancy).
Convergent validity Interpret the average variance extracted (AVE). Sarstedt et al. (2021)
Threshold:
AVE ≥0.50
Discriminant validity Do not use the Fornell-Larcker criterion or cross-loadings, but the HTMT. Franke and Sarstedt (2019); Henseler et al. (2015)
Thresholds:
HTMT <0.90 for conceptually similar constructs
HTMT <0.85 for conceptually different constructs
In addition, test whether the HTMT is significantly lower than the threshold value.
Measurement model evaluation: formative
Convergent validity (redundancy analysis) Use the redundancy analysis to assess the convergent validity; a global single-item or a reflectively measured multi-item scale can be used as an alternative measurement. The construct correlation should be ≥0.70. Cheah et al. (2018); Chin (1998)
Collinearity Use the variance inflation factor (VIF) to assess collinearity in all sets of formative indicators. Hair et al. (2019b)
Thresholds:
VIF≤3: no collinearity issues
VIF 3–5: possible collinearity issues
VIF ≥5: critical collinearity issues
Indicator weights Report the indicator weight estimates and information regarding their significance. Aguirre-Urreta and Rönkkö (2018); Hair et al. (2022)
For inference testing, report the t values, p values with standard errors or, preferably, the bootstrapping confidence intervals. Use the percentile method to construct confidence intervals; use the bias-corrected and accelerated bootstrap method for highly asymmetric bootstrap distributions.
Consider the indicator loadings of nonsignificant weights. Indicators with nonsignificant and low loadings (<0.5) should be removed from the measurement model.
Structural model evaluation
Collinearity Use the variance inflation factor (VIF) for the collinearity assessment of all sets of predictor constructs in the structural model.
Thresholds:
VIF≤3: no collinearity issues
VIF 3–5: possible collinearity issues
VIF ≥5: critical collinearity issues
Path coefficients Report the path coefficient estimates and information on their significance. Aguirre-Urreta and Rönkkö (2018); Hair et al. (2022)
For inference testing, report t values, p values with standard errors or, preferably, the bootstrapping confidence intervals. Use the percentile method to construct confidence intervals; if the bootstrap distributions are highly asymmetric, use the bias-corrected and accelerated (BCa) bootstrap method.
Explanatory power Report the R2, but do not consider the metric as indicative of predictive power. Hair (2021); Hair et al. (2019b); Shmueli and Koppius (2011)
The R2 values depend on the model complexity and the phenomena under research. Note that very high R2 values such as 0.90 are usually indicative of model overfit when measuring theoretical concepts. Hair et al. (2019b)
Predictive power Use PLSpredict to compare the predictions generated by the PLS path model with those of a naïve linear benchmark model. Shmueli et al. (2016); Shmueli et al. (2019)
Model fit Do not use the GoF index. Apply the SRMR or the bootstrap-based test for an exact model fit, but only if the model complexity and sample size support this step. Hair et al. (2022); Schuberth and Henseler (2021); Schuberth et al. (2018b)
Advanced modeling and analysis techniques
Higher-order constructs The specification, estimation, and evaluation of higher-order constructs require particular scrutiny. Follow the most recent guidelines. Sarstedt et al. (2019)
Observed heterogeneity Consider recent guidelines on moderator, multigroup, and conditional process model analyses. Klesel et al. (2019); Memon et al. (2019); Sarstedt et al. (2020a)
Unobserved heterogeneity Use latent class analyses as a robustness check to ascertain that unobserved heterogeneity does not adversely impact the aggregate level results. Becker et al. (2013); Sarstedt et al. (2022); Sarstedt et al. (2020b)
Use finite mixture PLS in combination with more advanced latent class procedures, such as PLS prediction-oriented segmentation.
Measurement invariance Use the MICOM procedure to establish at least partial measurement invariance as a necessary condition for any multigroup analysis. Henseler et al. (2016b)
Model comparisons Consider different model configurations as representations of alternative explanations of the phenomena under consideration. Liengaard et al. (2021); Sharma et al. (2019); Sharma et al. (2021b)
Use the Bayesian information criterion (BIC) to select the best model.
Consider using BIC-based Akaike weights to quantify each model's relative suitability. Alternatively use the cross-validated predictive ability test (CVPAT).
Reporting
Software Mention the software applied and correctly cite its use, especially if required by the license agreement. Hair et al. (2012)
Algorithm settings Report, at least, the following bootstrapping settings: the number of bootstrap samples used (recommendation: 10,000) and the type of bootstrapping confidence interval.
Do not use sign change options.
Correlation matrix Report the indicator and construct correlation matrices. Hair et al. (2012)
  • Abbreviation: PLS-SEM, partial least squares structural equation modeling.
Figure 2. A systematic procedure for reporting PLS-SEM results. PLS-SEM, partial least squares structural equation modeling

As evidenced by PLS-SEM's numerous extensions (e.g., Liengaard et al., 2021; Rasoolimanesh et al., 2021; Richter et al., 2020) and by controversial debates (Hair et al., 2021b; Rönkkö et al., 2021), research in the field is highly dynamic (Hwang et al., 2020; Khan et al., 2019). This rapid progress makes it difficult for researchers to keep up with the latest developments and to decide which research stream to follow. For example, our review points to some confusion in terms of model fit assessment. Some researchers seem tempted to report a model fit metric, which has become the sine qua non in factor-based SEM research, and prematurely adopt the GoF index, which, however, cannot separate correctly specified models from misspecified ones (Henseler & Sarstedt, 2013). Very few researchers apply metrics, like the SRMR, which have been tested in simulation studies (Schuberth et al., 2018a), but whose users seem to disregard their quite weak performance in typical model and data constellations. Future research should, therefore, develop better model fit metrics that detect model misspecifications reliably in standard data and model settings, which reflect those of the various customer satisfaction indices (e.g., Fornell et al., 2020) and technology acceptance models (e.g., Venkatesh et al., 2012).

Recent research also presents various developments that could soon play very important roles in the PLS-SEM field. Hwang and Cho (2020) recently addressed a criticism of the PLS-SEM algorithm by introducing a variant that draws on a single optimization criterion. This extension has the potential to solve various criticisms of the original PLS-SEM method in terms of its modeling options (e.g., introducing model constraints) and model fit assessment. Future studies should further explore this global PLS-SEM method's efficacy for applied research.

Researchers should also advance recently proposed methodological extensions, which we document in Table 9. For example, Richter et al. (2020) subject the PLS-SEM-produced construct scores to a necessary condition analysis (NCA). Following a necessity logic, NCA implies that an outcome—or a certain level of an outcome—can only be achieved if the necessary cause is in place, or is at a certain level (see also Rasoolimanesh et al., 2021). PLS-SEM-based construct scores have also been used in other contexts, such as in the identification and treatment of endogeneity (Hult et al., 2018). Other advances in PLS-SEM address the estimation of nonlinear effects (Basco et al., 2021) and the validation of results by uncovering unobserved heterogeneity (Becker et al., 2013; Sarstedt & Danks, 2021). Future research should build on these initial efforts to develop more comprehensive methods that consider the necessity logic and endogeneity when initially computing the construct scores. Such efforts would further enhance PLS-SEM's capabilities and lead to its even more accelerated use in the decade to come.

Table 9. Advanced analysis techniques
Procedure Description Suggested references
Agent-based simulation The combination with agent-based simulation (ABS) makes PLS-SEM results dynamic and extends their predictive range. The PLS-SEM agent uses a static path model and PLS-SEM results to determine the ABS settings at the agent level. Then, the dynamic ABS modeling method extends PLS-SEM's predictive capabilities from the individual level to the population level by modeling the diffusion process in a network (e.g., consumers). Schubring et al. (2016)
Endogeneity Methods for identifying and treating endogeneity, which occurs when a predictor construct is correlated with the error term of the dependent construct to which it is related. Hult et al. (2018)
Fuzzy-set qualitative comparative analysis (fsQCA) and necessary condition analysis (NCA) In fsQCA, both the independent and dependent variables are calibrated into set membership scores in order to identify (calibrated) independent variables that are sufficient but not necessary for an outcome of the dependent variable. This approach has been transferred to PLS-SEM to analyze necessary conditions in the structural model. Different from fsQCA, the NCA does not rely on binary necessity statements and therefore considers that an outcome or a certain level of an outcome can only be achieved if the necessary cause is in place or is at a certain level. Leischnig et al. (2016), Rasoolimanesh et al. (2021), Richter et al. (2020)
Importance-performance map analysis Allows researchers to gain more insights from the PLS-SEM analysis by contrasting constructs' total effects with their (rescaled) average scores. Ringle and Sarstedt (2016)
Latent class analysis techniques to identify segments with distinct construct scores Partial least squares k-means facilitates identifying groups of data that maximize score differences while at the same time accounting for structural and measurement model heterogeneity. Fordellone and Vichi (2020)
Latent class techniques to identify segments with distinct model relations Response-based segmentation techniques that identify distinct segments that differ in terms of structural or measurement model relations (e.g., FIMIX-PLS, PLS-IRRS, PLS-POS). Becker et al. (2013), Sarstedt et al. (2021a), Schlittgen et al. (2016)
Model comparisons Empirically compare a set of theoretically plausible models. Danks et al. (2020), Liengaard et al. (2021), Sharma et al. (2019)
Moderated mediation and conditional process analysis This approach combines the moderator analysis and mediator analysis into a moderated mediation and conditional process analysis in PLS-SEM. Cheah et al. (2021), Sarstedt et al. (2020a, 2020b)
Nonlinear effects estimation PLS-SEM assumes linear relationships. In some instances, however, this assumption does not hold in that relationships may be nonlinear and require nonlinear estimations of coefficients in PLS-SEM. Basco et al. (2021), Sarstedt et al. (2020b)
Weighted PLS-SEM (WPLS) The WPLS algorithm is a modified version of the original PLS-SEM algorithm that incorporates sampling weights. Becker and Ismail (2016), Cheah et al. (2021)
  • Abbreviation: PLS-SEM, partial least squares structural equation modeling.

5 ACKNOWLEDGMENT

Open Access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1 Note that this distinction refers to the measurement-theoretic perspective. To estimate formative measurement models, PLS-SEM uses composite indicators that define the construct in full. The method does not allow for estimating a formative construct's error term using causal indicators (Cho et al., 2021; Sarstedt et al., 2016).
2 In most cases, the descriptions suggest that the authors averaged the items of the lower-order components to generate indicators of the higher-order construct. This practice is, however, problematic as it ignores the biasing effect of measurement error that PLS-SEM's differential indicator weighting accounts for.
