Likert scales are often used in visualization evaluations to produce quantitative estimates of subjective attributes, such as ease of use or aesthetic appeal. However, the methods used to collect, analyze, and visualize Likert-scale data are inconsistent across evaluations in visualization papers. In this paper, we examine the use of Likert scales as a tool for measuring subjective response in a systematic review of 134 visualization evaluations published between 2009 and 2019. We find that papers with both objective and subjective measures do not hold the two aspects of their evaluation to the same reporting and analysis standards, producing less rigorous work for the subjective qualities measured by Likert scales. Additionally, we demonstrate that many papers are inconsistent in their interpretation of Likert data as discrete or continuous and may even sacrifice statistical power by applying nonparametric tests unnecessarily. Finally, we identify instances where key details of Likert item construction that could bias participant responses are omitted from the reported evaluation methodology, hindering the feasibility and reliability of future replication studies. Based on the results of our survey, we summarize recommendations from other fields for best practices with Likert data in visualization evaluations. A full copy of this paper and all supplementary material are available at https://osf.io/exbz8/.
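
As a minimal illustration of why the discrete-versus-continuous interpretation matters, the sketch below summarizes two sets of 5-point Likert responses both ways. The response values and the `summarize` helper are hypothetical, chosen only for illustration; they are not drawn from the reviewed papers and do not represent the authors' analysis.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 5-point Likert responses (1 = strongly disagree, 5 = strongly agree).
# These values are invented for illustration only.
responses_a = [1, 1, 1, 5, 5, 5, 5]   # polarized opinions
responses_b = [3, 3, 3, 3, 3, 4, 4]   # mostly neutral opinions


def summarize(responses, points=5):
    """Summarize Likert responses under both interpretations.

    'mean' treats the responses as continuous (interval) data;
    'median' and 'counts' treat them as discrete (ordinal) data.
    """
    counts = Counter(responses)
    return {
        "mean": round(mean(responses), 2),
        "median": median(responses),
        # Number of responses at each scale point, from 1 up to `points`.
        "counts": [counts.get(level, 0) for level in range(1, points + 1)],
    }


if __name__ == "__main__":
    for name, data in (("A", responses_a), ("B", responses_b)):
        print(name, summarize(data))
```

Both hypothetical groups have the same mean (3.29), yet their medians (5 and 3) and response counts differ sharply, so the choice of interpretation changes what a reader concludes from the same scale.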

Volume 41, Issue 3

June 2022

Pages 43-55

Effective Use of Likert Scales in Visualization Evaluations: A Systematic Review
