Effective Use of Likert Scales in Visualization Evaluations: A Systematic Review
Abstract
Likert scales are often used in visualization evaluations to produce quantitative estimates of subjective attributes, such as ease of use or aesthetic appeal. However, the methods used to collect, analyze, and visualize Likert-scale data are inconsistent across evaluations in visualization papers. In this paper, we examine the use of Likert scales as a tool for measuring subjective response in a systematic review of 134 visualization evaluations published between 2009 and 2019. We find that papers with both objective and subjective measures do not apply the same reporting and analysis standards to both aspects of their evaluation, producing less rigorous work for the subjective qualities measured by Likert scales. Additionally, we demonstrate that many papers are inconsistent in their interpretations of Likert data as discrete or continuous and may even sacrifice statistical power by applying nonparametric tests unnecessarily. Finally, we identify instances where key details about Likert item construction with the potential to bias participant responses are omitted from evaluation methodology reporting, inhibiting the feasibility and reliability of future replication studies. Based on the results of our survey, we summarize recommendations from other fields for best practices with Likert data in visualization evaluations. A full copy of this paper and all supplementary material are available at https://osf.io/exbz8/.