Program evaluation: An educator's portal into academic scholarship
Supervising Editor: Dr. Susan Promes
Abstract
Program evaluation is an “essential responsibility” but is often not seen as a scholarly pursuit. While Boyer expanded what qualifies as educational scholarship, evaluation work must still be rigorous and meet a requisite academic standard to be labeled scholarly. Many medical educators may feel that scholarly program evaluation is a daunting task given the competing demands of curricular change, remediation, and clinical care. This paper explores how educators can take their questions about the outcomes and efficacy of their programs and efficiently engage in education scholarship. The authors outline how educators can examine whether training programs have the desired impact and outcomes, and how they might then leverage this process into education scholarship.
INTRODUCTION
Program evaluation has been referred to as an “essential responsibility” for those tasked with the oversight of medical training programs,1 but it is striking how little of this program evaluation work is labeled as scholarly, and how rarely this work translates into academic scholarship. While what qualifies as educational scholarship has been expanded well beyond traditional peer-reviewed publications to include the scholarship of teaching, discovery, integration, and application,2 there is still a need to engage in processes that are rigorous and of a requisite academic standard to be labeled as scholarly.3 However, being asked to both create educational deliverables and innovate within this context is often already above and beyond the duties of overworked and under-supported medical educators. Many medical educators may feel that scholarly program evaluation is a step too far—with so many competing interests, it can be difficult to find the “bandwidth” to accomplish these scholarly tasks.4 Yet don’t we all wonder about the outcomes and efficacy of our programs? Were our programs received as they were intended? Are our training programs having the desired impact and outcomes? And if so, wouldn’t it be nice to generate a multiple win around the project?5
It is not just a lack of time that can prevent medical educators from engaging in scholarly evaluation efforts. Some educators may also feel inadequately trained in program evaluation and unclear about which approaches and strategies to employ. Yet when evaluation is completed well, there is often an opportunity to translate this work into scholarly outputs.
This paper aims to accomplish three goals: (1) to introduce educators to the concept of program evaluation, (2) to help them understand frameworks that will guide them in correctly and rigorously performing program evaluations, and (3) to discuss ways in which program evaluation can translate into scholarly output.
WHAT IS PROGRAM EVALUATION?
In medical education, a “program” can refer to a broad spectrum of activities and experiences, ranging from a new workplace-based assessment program6, 7 to a boot camp series8 to a longitudinal faculty development course.9, 10 It is an ever-evolving field, with new technologies, shifting paradigms, and often unclear scholarly formats. The delivery of medical education requires the implementation of programs. Whether it is a well-established program (e.g., intern orientation or airway management training) or a novel approach to assessment (e.g., simulation-based critical care competency or entrustable professional activities), these programs need to be evaluated to determine whether they are worthwhile with respect to effectiveness or value. A formal definition of program evaluation has been put forth by Mohanna and Cottrell as “a systematic approach to the collection, analysis, and interpretation of information about any aspect of the conceptualization, design, implementation, and utility of educational programmes”.11 Simply stated, program evaluation is the process of identifying the value of an educational offering, although at times it can also be a way of determining issues or problems in need of systematic improvement.
Methods similar to those employed by experimentalists or epidemiologists may be used for measurement and analysis when conducting program evaluation, but this process is distinct from conventional research studies. Experimental research typically focuses on generating new knowledge that is transferable or generalizable to other contexts, whereas program evaluation seeks to understand the efficacy of a specific, discrete project (e.g., a curricular change in a program or a new course design). Quantitative experiments may involve hypothesis testing with a control group and an experimental group, while qualitative studies may seek to understand or describe an experienced phenomenon. Despite being distinct from research, program evaluation is a rigorous process that might use a variety of quantitative and/or qualitative data to determine the value of the outcomes of a program, even though a research protocol is technically not required.
WHY AND WHEN TO USE PROGRAM EVALUATION
While the specific purposes of program evaluation are extensive, at its core, program evaluation is about values, judgements, decision making, and change.1, 12, 13 Program evaluation is another way, outside of the program itself, that you can create a value proposition to your community via your program.14 Educators use program evaluation to determine the value and worth of the program they designed and then explain that worth to others. There are multiple program evaluation frameworks, and which framework you select is determined by the stakeholders and focus of the evaluation.13, 15
The ultimate why of your program evaluation will be how you define the success of the program in the eyes of its stakeholders and the focus of the evaluation.16 This marker of success should fall into at least one broad category of program evaluation—accountability, knowledge, or development—though these categories are often intertwined.1, 12, 17 More specific purposes for evaluation within these three categories are found in Table 1.
TABLE 1 Specific purposes for program evaluation within three categories: accountability, knowledge, and development
Although it can resemble research (e.g., experimental or qualitative medical education research), program evaluation is differentiated from research by the fundamental underlying impetus for the study—research seeks to understand the world better through its conduct (to create generalizable or transferable “truths” that explain how things work), whereas program evaluation seeks to understand how and whether a specific program works.
If done correctly, program evaluation is a systematic method of answering questions about the program you have designed, providing insights for others to replicate or avoid in their own programs.18 Once the work has been done, “dissemination to the community at large constitutes a critical element of scholarship.”13 Dissemination of this work could be publishing the program evaluation as an original research report, as an innovation report, or in an online curricular repository (e.g., MedEdPORTAL, JETem) to help advance knowledge for others (Table 2).
TABLE 2 Dissemination formats for program evaluation work

| | Curriculum package (e.g., JETem.org or MedEdPORTAL) | Innovation report | Original article (formal program evaluation study) |
|---|---|---|---|
| Prototypical study question | Is our program worth repeating in other contexts by other teachers? | Usually one (or a combination of) the following questions: … | Usually seeks to ask a study question that clarifies, explains, or justifies a program. Study questions come in a wide variety but center upon specific aspects of a program |
| Description of the origins and development of the innovation | Emphasized slightly more, to explain the gap that the curricular package fills | Emphasized heavily, focusing on the actual building of the innovation. Analogous to a technical report (engineering) or early materials development work (chemistry or other sciences). Theory and conceptual frameworks are often highlighted | Deemphasized. May even cite the prior innovation report the way a full study cites a protocol |
| Description of the actual innovation | The featured element within this type of scholarship; details the innovation thoroughly | Highlighted in some depth, but not to the level of a curricular package. Curricular materials may be appended, but they are certainly not the centerpiece of this type of paper | Deemphasized, but usually described with enough rigor in the materials section of the methods that a new reader (who has not yet read prior work on the topic) can understand the nature and high-level specifics of the innovation, at least enough to understand why the outcomes were of interest |
| Outcomes reporting | Increasingly desired; also provides insights to other teachers seeking to implement the curriculum as to why it is important. Usually some level of outcomes reporting (e.g., Kirkpatrick level 1, acceptability) is required | Some level of outcomes reporting | Depending on the framing of the article, the outcomes may differ from a simple report of effectiveness; original works that explore innovations will often delve … |
Overall, once the rationale is determined, program evaluation can be divided into two categories that help direct the when—formative (i.e., used to improve the performance of the program, involves program monitoring, and happens at various times) and summative (i.e., used for overall judgements about the program and its developers, usually at the end of the program).19, 20 No matter the why, all programs should have program evaluations built into them. In fact, Woodward argues that program evaluation should be done within every part of the educational intervention process; a needs assessment, for example, is the program evaluation that determines the need for the program.19 Ideally, the program evaluation should be developed alongside the program itself, ensuring a credible evaluation that answers all required questions.18 Early development of the program evaluation prevents later problems and allows data to be collected, as suggested by Durning et al.,16 during three phases: (1) before the program (establishes a baseline and helps show how much of the outcome is due to the program itself), (2) during the program (process measurements that allow developers to notice and fix problems early), and (3) after the program (outcome measurements). The why and when of program evaluation feed directly into the approach you take in doing the program evaluation (i.e., how you actually do this).
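To make these three phases concrete, the short sketch below (in Python) organizes measures for a hypothetical intern boot camp by phase and runs a simple pre/post comparison. The program, measure names, and numbers are all invented for illustration and are not drawn from any real evaluation.

```python
from statistics import mean

# Hypothetical evaluation measures for a fictional intern boot camp,
# organized by the three data-collection phases described above.
evaluation_data = {
    "before": {"baseline_knowledge_test": [62, 58, 71, 65]},    # establishes a baseline
    "during": {"session_attendance_rate": [0.95, 0.88, 0.92]},  # process measures
    "after": {
        "post_knowledge_test": [81, 76, 88, 84],                # outcome measures
        "learner_satisfaction_1_to_5": [4.2, 4.5, 3.9, 4.4],
    },
}

# A simple pre/post comparison; the baseline collected "before" is what lets
# evaluators argue how much of the change is plausibly due to the program itself.
pre = mean(evaluation_data["before"]["baseline_knowledge_test"])
post = mean(evaluation_data["after"]["post_knowledge_test"])
print(f"Mean knowledge score: {pre:.1f} before vs. {post:.1f} after ({post - pre:+.1f})")

# Process measures collected "during" help developers notice and fix problems early.
attendance = mean(evaluation_data["during"]["session_attendance_rate"])
print(f"Mean session attendance: {attendance:.0%}")
```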
HOW TO USE PROGRAM EVALUATION METHODOLOGIES
As stated above, development of the program evaluation should happen alongside development of the program itself, meaning prior to launching the program (or the most recent class of participants). This involves identifying the specific goals of the evaluation by considering the potential stakeholders and end-users of the resultant evaluation. With this information, educators can better align the breadth and focus of the evaluation with their specific needs (Box 1).
BOX 1. Components of a program evaluation
- Develop an evaluation question based on specific goals of various stakeholders
- Identify your theory of change
- Perform a literature search
- Identify your (validated) collection instrument
- Consider your outcomes with a broad lens
Once you have identified the target audience, next determine the underlying theory of change. The three most common theories are reductionism, system theory, and complexity theory. Reductionism relies upon the assumption that there is a specific order, with a direct cause and effect for each action.21 This approach, reflected in models such as the Logic model,22, 23 suggests that there is a clear linearity and predictable impact from each intervention.1 System theory builds upon this, with roots in general system theory as applied to biology.24 In this model, the whole of a system is proposed to be greater than the sum of its individual parts.24 Therefore, education programs expand beyond merely isolated parts, instead comprising the integration of the specific program components with each other and with the broader educational environment. Complexity theory expands further to adapt to the ever-changing, more complex state of programs in real life.1, 25 There are multiple complex factors that can influence education programs, including the participants, the influence of stakeholders and regulators, professional practice patterns, the surrounding environment, and expanding knowledge within the specific field as well as within the education concepts being taught.1 Understanding these underlying theories can help inform the conceptual frameworks selected for evaluation, which we explore further in the next section.
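As a small illustration of the reductionist, linear logic underlying the Logic model, the sketch below lays out a hypothetical faculty development course as an inputs-to-outcomes chain; every entry is an assumption made for illustration only.

```python
# A Logic model treats a program as a linear chain: inputs -> activities ->
# outputs -> outcomes. The entries below are invented for a hypothetical
# faculty development course.
logic_model = {
    "inputs": ["faculty time", "simulation lab access", "course budget"],
    "activities": ["monthly workshops", "peer observation of teaching"],
    "outputs": ["12 workshops delivered", "30 faculty members trained"],
    "outcomes": ["improved teaching evaluations", "new curricular innovations"],
}

# The reductionist assumption: each stage leads directly and predictably to the next.
for stage, items in logic_model.items():
    print(f"{stage:>10}: {', '.join(items)}")
```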
CONCEPTUAL FRAMEWORKS
There are many frameworks that can guide your program evaluation process. A full description of each of these is beyond the scope of this paper; however, our authorship team has detailed six program evaluation frameworks that have been featured in medical education (and specifically AEM Education and Training) including: CIPP, Kirkpatrick Model, Logic Model, Realist Evaluation, RE-AIM, and SQUIRE-EDU. Table 3 provides a description of some of the more commonly used frameworks and sources of further information on each of them.31-47
TABLE 3 Description of commonly used program evaluation frameworks

CIPP (Context, Input, Process, Products)
- Origins and explanation: CIPP is a comprehensive framework for guiding the evaluation of programs and systems and includes the following components: context, input, process, and products. Context is based on needs assessment, available resources, problems, any background information, and the overall program environment; it includes the planning stage and mainly focuses on the desired goals and objectives for a program. Inputs refer to the required strategies, tools, or resources that must be included in the program to meet the needs identified during the context stage, including elements such as budget, research, plans, stakeholders, or subject matter experts. Process is the stage of program development and execution; this stage is where the inputs all come together, and it is often revisited to ensure that the program was well designed and that its implementation is meeting expectations. Products include the review phase and are the outputs and outcomes related to program performance and objectives; the main question at this stage is whether the intended goals have been met. Further, program sustainability in terms of context, inputs, and processes, as well as any potentially necessary changes to the program, is assessed.
- Example scenario: The program director recently launched a new diversity, equity, and inclusion curriculum. She wants to better understand and evaluate the effectiveness of the curriculum. She selects the CIPP framework to better understand the context, inputs, process, and products, so as to more fully consider all of the inputs and outputs from the new curriculum.
- “How to” guides (first author, year): Stufflebeam (2003)21; Lee (2019)22
- Exemplar papers (first author, year): Steinert (2005)23; Rooholamini (2017)24

Kirkpatrick Model
- Origins and explanation: Kirkpatrick's original four-level model (reaction, learning, behavior, results) is widely employed in the evaluation of health professional education programs. (1) Reaction assesses a person's reactions to course-related elements such as teachers, materials, activities, and design; while high satisfaction does not necessarily guarantee the next level (learning), low satisfaction is likely to reduce the probability of learning. (2) Learning refers to changes in knowledge, skills, and attitudes. (3) Behavior refers to changes in practice. (4) Results are changes at the organizational level. Other, more recent modifications of this model also exist.
- Example scenario: The vice chair of faculty development has created a new asynchronous, just-in-time training module. She wants to understand the perception and effect of this new module. As part of the evaluation, she sought out users' reaction, learning, behaviors, and impact on the system. She selected the Kirkpatrick framework to ensure she captured both perceptions and higher-level outcomes.
- “How to” guides (first author, year): Kirkpatrick (2006)25; Kirkpatrick and Kirkpatrick (2016)26; Barr (2005)27; Hammick (2007)28; The New World Kirkpatrick Model29; Phillips (2003)30; Kaufman and Keller (1994)31
- Exemplar papers (first author, year): Gottlieb (2021)32; Lam and Stickrath (2020)33

Logic Model
- Origins and explanation: This model is commonly used for designing and evaluating projects and consists of a matrix that outlines a project's goals, activities, assumptions, and expected results. It provides a structure to help clarify the components of a project, its activities and resources, as well as its anticipated challenges.
- Example scenario: The medical student clerkship director has added a new airway curriculum for the medical students on rotation. He realizes that it is important to understand both the costs and benefits of the program. Therefore, he uses the Logic model to incorporate both the inputs and outputs into the program evaluation.
- “How to” guides (first author, year): Newcomer (2015)34; Van (2016)17
- Exemplar papers (first author, year): Love (2016)35

Realist Evaluation
- Origins and explanation: Realist evaluation is suited to the evaluation of complex educational interventions such as simulation-based education. It seeks to answer what works, for whom, in what circumstances, in what respects, to what extent, and why. It uses a mixed-methods approach to data collection to test the context-mechanism-outcome configurations of the education intervention. By investigating the context, mechanisms, and outcomes of education programs, realist evaluation can allow educators to better understand why and when a program works and in which contexts.
- Example scenario: The simulation director has created a new in situ simulation program that includes interprofessional learners from multiple professions. He wants to identify what works best for different learners in different circumstances and why. Therefore, he selects a realist evaluation for his framework.
- “How to” guides (first author, year): Graham and McAleer (2018)36; Wong (2012)37
- Exemplar papers (first author, year): Ogrinc (2014)38; Ellaway (2018)39

RE-AIM
- Origins and explanation: The RE-AIM framework is mainly designed to evaluate the impact of community-based public health programs and interventions. These interventions are often complex, as they typically rely on multiple stakeholders and occur in complex settings. To understand the impact of a program, the impact on participants, on the organization providing the program, and on the broader community needs to be captured. This framework consists of five evaluation dimensions: Reach, Effectiveness, Adoption, Implementation, and Maintenance. It has been implemented across various settings and contexts, such as community, policy, and public health initiatives.
- Example scenario: The vice chair of operations created a new program to train their physicians, advanced practice providers, and nurses on using telemedicine for patient care. Given the complexity of the intervention and reliance upon multiple stakeholders, she uses the RE-AIM framework.
- “How to” guides (first author, year): Glasgow (1999)40; Shaw (2019)41
- Exemplar papers (first author, year): Nagji (2020)42; Rose (2021)43; Yilmaz (2021)44

SQUIRE-EDU
- Origins and explanation: The SQUIRE (Standards for Quality Improvement Reporting Excellence) framework offers guidelines for reporting new knowledge about ways to improve healthcare; SQUIRE has an adaptation specific to educational improvement (SQUIRE-EDU). These guidelines are proposed for reports on system-level work to improve the quality, safety, and value of healthcare systems. SQUIRE offers a variety of ways to improve healthcare and encourages researchers to consider all SQUIRE items, although the inclusion of every SQUIRE element may not be necessary. SQUIRE has provided guidance on healthcare improvement and contributed to the understanding of factors that affect the success, and failure, of healthcare improvement efforts.
- Example scenario: The medical director is leading a quality improvement initiative to reduce overprescribing of antibiotics. He has developed a multi-step program that includes a specific training session. He uses the SQUIRE-EDU framework to align with the focus on quality improvement.
- “How to” guides (first author, year): Goodman (2016)45; Ogrinc (2019)46
- Exemplar papers (first author, year): Taylor (2019)47
When creating the program evaluation, you may utilize frameworks to guide the data collection. The selection of your conceptual framework will require consideration of the end-users and which data will be most valuable to them. You should perform a thorough literature search to identify similarities and differences with prior programs. Questions should seek to assess the benefits and consequences of the new intervention or innovation. During the literature search, seek out existing tools used by similar programs to inform your evaluation tool design. Identify how these align with your current program evaluation needs and modify the tool where necessary. It is also important to collect validity evidence for your specific tool.26 Even if a tool is “validated” in another setting, new validity evidence should be sought for the current application within the context of the new program.26 Since evaluation is often centered on a particular program, the evaluation plan may contain outcomes that are idiosyncratic rather than generalizable; however, best practices of questionnaire design should still be followed as much as possible (e.g., building on tools used in prior evaluations and pilot testing a survey tool prior to launch to ensure readability and clarity).
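As one example of collecting validity evidence in the new context, the sketch below computes Cronbach's alpha (an internal-consistency estimate) from pilot responses to a hypothetical four-item evaluation questionnaire. The items and ratings are invented, and internal consistency is only one of several sources of validity evidence an evaluation tool may need.

```python
from statistics import variance

# Pilot responses to a hypothetical four-item, five-point evaluation
# questionnaire (rows = respondents, columns = items). Invented data.
responses = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 4, 5],
]

def cronbach_alpha(rows):
    """Internal-consistency estimate for a multi-item questionnaire."""
    k = len(rows[0])                                  # number of items
    item_columns = list(zip(*rows))                   # one column of scores per item
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

print(f"Cronbach's alpha from the pilot: {cronbach_alpha(responses):.2f}")
```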
Finally, consider the outcomes with a broader lens. While outcomes are often considered in terms of learner-oriented measures (e.g., the Kirkpatrick model), it is also important to consider the costs (e.g., time, expenses, faculty effort) and the broader societal implications, as described further below. Those reading the findings will want to weigh the costs and benefits of the program.
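A minimal sketch of what reporting with this broader lens might look like: hypothetical findings at each Kirkpatrick level are listed next to a rough cost per learner, so readers can weigh benefit against cost. All figures, including the assumed value of faculty time, are illustrative assumptions.

```python
# Hypothetical findings for a fictional airway curriculum, grouped by
# Kirkpatrick level, reported alongside the resources needed to run it.
outcomes = {
    "Level 1 (reaction)": "mean satisfaction 4.3/5",
    "Level 2 (learning)": "knowledge test improved by 14 points",
    "Level 3 (behavior)": "first-pass intubation success 78% -> 86%",
    "Level 4 (results)": "fewer airway-related safety reports",
}
resources = {"faculty_hours": 40, "supply_costs_usd": 1200, "participants": 24}

for level, finding in outcomes.items():
    print(f"{level}: {finding}")

# Readers weighing costs against benefits need both sides of the ledger;
# faculty time is valued here at an assumed $100/hour.
cost_per_learner = (resources["faculty_hours"] * 100 + resources["supply_costs_usd"]) / resources["participants"]
print(f"Approximate cost per learner: ${cost_per_learner:.0f}")
```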
MARKERS OF HIGH-QUALITY PROGRAM EVALUATION
Program evaluation and research studies share many features, and depending on the objectives of a study, the two can look very similar. While research studies aim to produce new knowledge, program evaluation studies focus on a program's quality and value.27 When unsure, ethics board guidelines can help confirm that the study you are about to conduct is indeed a program evaluation. In the United States, many program evaluations will require submission to the institutional review board but are usually granted exempt status, since program evaluations typically fall within normal educational practices. Ethics boards in Canada deem program evaluations exempt from ethical review as per the Tri-Council Policy Statement 2 (2018), Article 2.5.28 Therefore, a program evaluation study should initially be reviewed with the ethics board to confirm an exemption and to ensure that the study purpose, objectives, data collection, and analysis align with program evaluation.
There are three common approaches to program evaluation studies: decision-oriented, outcomes-oriented, and expertise-oriented.29 The program evaluation frameworks and models described in the previous section map onto these overall approaches. These frameworks are of vital value to the overall program evaluation process.1 Without a framework, a program evaluation may lose its focus, and the flow of the study may become redundant and less helpful. Because each framework emphasizes different parts of a study, it is important for researchers to take into account the study's objectives and focus. The face validity of a framework should be agreed upon by the investigators, meaning that the outcomes of the study can plausibly be achieved through the selected framework.13 A study could focus on many objectives, such as trainees' learning, satisfaction, and the intervention's success in reaching various audiences.1
Innovation reports are an integral part of program evaluation studies, as they evaluate novel approaches to teaching and learning. Hall and colleagues reviewed the literature on the quality markers of innovation reports and identified 34 items, grouped into seven themes spanning analysis of the problem through dissemination of results, to ensure that innovation reports adequately provide insight and reproducibility.30 Rigor and reproducibility are therefore important for any type of program evaluation study. Box 2 provides various pearls to help researchers who are tackling program evaluation studies. Box 3 contains an annotated bibliography that summarizes key resources for further reading.
BOX 2. Pearls for those interested in conducting program evaluation work
Based on prior literature on innovation reports and program evaluations, we have identified some common problems encountered when authors claim to have conducted these types of studies, and we offer the following pearls to address them:
Pearl 1: Plan the program evaluation from the onset. Ideally, program evaluation should be established prior to the program launch (or at least prior to the most recent cohort). Performing program evaluation once the program is ongoing will limit the available information and increase the risk of recall bias.
Pearl 2: Consider all of the inputs and outputs. The evaluators will need to think beyond just the learner outcomes and consider the broader outcomes, impacts, and the resources and requirements to run the program.
Pearl 3: Attempt to identify unintended outcomes. Intended outcomes are often tracked but a systematic inquiry into identifying unintended outcomes is often overlooked.
Pearl 4: Involve a statistician or a data scientist early. Some program evaluation approaches require complex statistical analysis, or even further data exploration, to make sense of the complex data collected through the program's implementation. A statistician or data scientist can suggest different approaches for analyzing the data and for understanding the relationship between the program's focus and its outcomes (a minimal illustrative analysis follows this box).
Pearl 5: Chart the overall program evaluation process. Program evaluation can be very complex, from planning through evaluation. Each step of the program evaluation should be represented in a figure in the study. This charting will give readers a clear idea of the program evaluation steps and how the framework was implemented at each step.
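To illustrate the kind of analysis a statistician or data scientist might help set up (Pearl 4), the sketch below runs a paired pre/post comparison with an effect size for a hypothetical cohort of eight learners. The scores are invented, scipy is an external dependency, and real program evaluations often call for considerably more sophisticated models.

```python
from statistics import mean, stdev

from scipy.stats import ttest_rel  # external dependency; any statistics package would do

# Invented pre/post knowledge scores for the same eight learners.
pre = [55, 61, 48, 70, 66, 58, 63, 52]
post = [68, 72, 60, 81, 70, 71, 75, 64]

diffs = [after - before for before, after in zip(pre, post)]
result = ttest_rel(post, pre)            # paired t-test on the matched scores
cohens_dz = mean(diffs) / stdev(diffs)   # effect size for paired data

print(f"Mean change: {mean(diffs):+.1f} points")
print(f"Paired t = {result.statistic:.2f}, p = {result.pvalue:.3f}, dz = {cohens_dz:.2f}")
```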
BOX 3. Key resources for further reading
The following are key papers on the program evaluation methodology recommended for those interested in learning more.
1. Frye AW, Hemmer PA. Program evaluation models and related theories: AMEE guide no. 67. Med Teach. 2012;34(5):e288-e299.
This is a review of several common program evaluation models and the benefits and limitations of each. The paper also provides examples of how to apply these in practice.
2. Cook DA. Twelve tips for evaluating educational programs. Med Teach. 2010;32:296–301.
A concise article that breaks down program evaluation into twelve “tips” to guide the development and implementation. Not meant to be used alone, but again a solid introduction to the process with an included blank table for readers to start brainstorming their own program evaluations.
3. Goldie J. AMEE Education Guide no. 29: Evaluating Educational Programs. Med Teach. 2006; 28(3): 210–224.
An introductory how-to guide for program evaluation of educational programs in general, including the history and the process. A solid starting point for someone who is unfamiliar with the process and a solid introduction that allows better integration of the information provided in AMEE Guide no. 67 (included above), which walks the reader through theories to use as frameworks for their program evaluations.
4. Durning SJ, Hemmer P, Pangaro LN. The Structure of Program Evaluation: An Approach for Evaluating a Course, Clerkship, or Components of a Residency or Fellowship Training Program. Teach Learn Med. 2007;19(3):308–318. doi:10.1080/10401330701366796
While the other articles included here involve program evaluation in general, this article focuses on applying program evaluation to graduate medical education. While it is just one particular framework out of many that are available, it provides insight into how to apply program evaluation to programs that don’t necessarily fit the usual educational program mold. For medical educators beginning their program evaluation journey, having this example will allow them to see how other frameworks might be used for their programs.
CONCLUSION
Program evaluations can be seen as a gateway towards other forms of scholarship for those who are most at home developing programs and curricula. However, it should be acknowledged as its own form of scholarship that is unique and separate from curriculum development or research.
CONFLICTS OF INTEREST
Dr. Shera Hosseini has received funding for her postdoctoral fellowship from the McMaster Institute for Research in Aging (MIRA). Dr. Yilmaz is the recipient of a 2019 TUBITAK Postdoctoral Fellowship grant. Dr. Shah—none. Dr. Gottlieb holds grants for unrelated work with the Centers for Disease Control and Prevention, Council of Residency Directors in Emergency Medicine, Society for Academic Emergency Medicine, and eCampus Ontario. Dr. Stehman—none. Dr. Hall holds grants for unrelated work from the Royal College of Physicians and Surgeons of Canada, Queen’s University Center for Teaching and Learning, and the Physician Services Incorporated Foundation. Dr. Chan holds grants for unrelated work from McMaster University, the PSI foundation, Society for Academic Emergency Medicine, eCampus Ontario, the University of Saskatchewan, and the Royal College of Physicians and Surgeons of Canada.