Public oversight bodies have continued to issue a high proportion of negative inspection findings year after year, despite the actions taken by audit firms to address deficiencies and empirical evidence suggesting that audit quality is improving. We conducted an experiment in which we manipulated the prevalence of audit scepticism to investigate an explanation for these puzzling, persistently poor inspection results. Drawing on psychology research on prevalence induced concept change, we show that when audit quality improves, our participants making inspection judgements are likely to subconsciously redefine what constitutes an acceptable audit, leading to static judgements. These results suggest an alternative explanation for consistently high rates of negative inspection findings, and we provide suggestions for future research.

1 INTRODUCTION

Audits of financial statements constitute the main independent check on the integrity and reliability of financial statements, giving investors, lenders and other users of these statements greater trust and confidence in the information provided. Such trust and confidence are imperative for a robust and healthy economy (PJC, 2020) because they support well-functioning capital markets, which in turn channel investment into productive activities and contribute to economic growth (CAQ, 2019).

The efficacy of audits in instilling trust in financial statements depends greatly on public belief that auditors have the expertise and the independence to provide high quality audits. Accordingly, in the past two decades, many countries have established public audit oversight bodies (POBs) to replace self-regulation by the audit profession. In theory, inspections of audit firms conducted by POB inspectors should increase investors' confidence about auditor independence and audit quality (Coates, 2007). Research in accounting provides empirical evidence that inspections by POB inspectors have improved audit quality (DeFond & Lennox, 2017; Lamoreaux, 2016), and one would therefore expect the deficiencies found in audits to decrease over time. However, this has not happened. Despite the significant time and resources audit firms devote to improving audit quality and addressing inspectors' concerns (Aobdia, 2018; Johnson et al., 2019; Trotman, 2023), inspectors still identify a high proportion of deficiencies every year. For instance, in 2022, the Australian Securities and Investments Commission (ASIC, 2022a) found that in 32% of key audit areas reviewed for the six largest audit firms, auditors did not obtain sufficient appropriate evidence that the financial statements were free of material misstatement. This rate is similar to that of the prior year, and similarly high figures have been reported in the US and UK (Ege et al., 2020; Martinow et al., 2020; Trotman, 2023). This is not a new phenomenon. For example, in 2014, the Wall Street Journal reported that one out of three audits failed to meet the standards of a Public Company Accounting Oversight Board (PCAOB) inspection (Chasan, 2014). This news was met with strong scepticism by leading audit quality academics, who suggested several reasons why the PCAOB's findings might be false or misleading (Peecher & Solomon, 2014).

Whether or not the high rate of inspection failures is false or misleading, news reports focus on these negative findings, which have harmed the reputation of auditors (AUASB, 2019; Dowling et al., 2018; Trotman, 2023) and had long-term negative effects on attracting high quality staff to the audit profession (Dowling et al., 2018; Ege et al., 2020). Submissions to the Parliamentary Inquiry on Regulation of Auditing in Australia (PJC, 2020) and academic research in the USA (Glover et al., 2019) have expressed surprise that the negative findings (percentage of deficient audits) have remained consistent over a number of years, given that these findings damage audit firm reputation and have major adverse consequences for the audit partners involved. In sum, inspectors continue to find consistently high levels of audit deficiencies through their inspection process, yet evidence from interviews with auditors (Glover et al., 2019) and from audit committee chairs who observe the audit process at their companies (Simnett & Trotman, 2022) supports the view that audit quality and auditing processes have improved over time.

Audit firms have previously argued that inspectors err in assessing a high proportion of audit engagements as deserving negative inspection results (Ege et al., 2020). Prior research has attempted to explain this difference of opinion between audit firms and inspectors as a combination of differences in incentives, knowledge, and experience, together with information-processing factors such as hindsight bias, since inspectors complete their inspections after the audit engagement is finished (Glover et al., 2019; Peecher et al., 2013; Trotman, 2023). We propose an additional explanation: prevalence induced concept change (hereafter PICC).

Groundbreaking research by Levari et al. (2018) introduced the phenomenon of PICC to the psychology literature. In short, they find that as the prevalence of an underlying concept changes, individuals subconsciously redefine the concept. They first show this with coloured dots. Participants were shown dots randomly drawn from a colour distribution ranging from clearly blue to clearly purple, with some dots ambiguously between blue and purple. When the researchers decreased the prevalence of “blueness” in the dots, individuals expanded their concept of “blue” to include dots that they would previously have deemed purple, and they did so even when warned about the change in prevalence and given financial incentives to remain consistent. Having established the phenomenon with a concept as discretely definable as colour, the authors then showed that it also occurs with fuzzier concepts such as threatening appearance and ethicality: in response to changes in underlying prevalence, participants rated previously non-threatening faces as threats and previously ethical IRB proposals as unethical.

How problematic PICC is in real-world applications will be heavily context dependent. Levari et al. (2018, p. 1465) acknowledge this, stating:

When yellow bananas become less prevalent, a shopper's concept of “ripe” should expand to include speckled ones, but when violent crimes become less prevalent, a police officer's concept of “assault” should not expand to include jaywalking. What counts as a ripe fruit should depend on the other fruits one can see, but what counts as a felony, a field goal, or a tumor should not, and when these things are absent, police officers, referees, and radiologists should not expand their concepts and find them anyway.

Implicit in this recognition of the importance of context are two assertions: (1) understanding PICC will be more important in some contexts than in others; and (2) professional judgements made by individuals specifically trained to exercise judgement in their domain (e.g., police officers, referees, and radiologists) will also be subject to PICC. Although we believe both assertions are likely true, the second is an extrapolation beyond the original participant pools of Levari et al. (2018). This raises the question of whether PICC applies to audit inspectors, and whether it helps explain the continuing high rate of deficiencies found by inspectors, even though there is evidence that firms are taking recommended actions to overcome these deficiencies and that both audit committee chairs and professional investors see audit quality as generally high (Trotman, 2023). We conduct an experiment with actual experts – professional auditors, whom we use as a proxy for inspectors – exercising judgement within their domain of expertise.

If audit inspectors are susceptible to PICC, they may find roughly the same number of deficiencies each year even if audit firms' initiatives succeed in improving audit quality in absolute terms, because inspectors would unconsciously expand their definition of a deficient audit and so keep the number of deficient engagements approximately constant from year to year. Given the small number of inspectors in each country, it is not feasible to obtain a sufficient sample of inspectors for this study. However, as inspectors are commonly hired from practising auditors, auditors are appropriate surrogates for inspectors. These auditors also participate in their own firms' internal inspection processes, which aim to improve audit quality within the firms.
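To make concrete how a drifting standard could hold deficiency findings steady while quality improves, the following Python sketch compares a fixed threshold with one that is re-anchored each year so that roughly the lowest third of observed audits fall below it. It is a toy illustration under our own assumptions, not a model of how inspectors actually work: the quality scores, the threshold of 10, and the function names (`fixed_standard_rate`, `relative_standard`) are hypothetical.

```python
from fractions import Fraction


def fixed_standard_rate(scores, threshold):
    """Share of audits scoring below an absolute threshold that never moves."""
    return Fraction(sum(s < threshold for s in scores), len(scores))


def relative_standard(scores, share=Fraction(1, 3)):
    """Return (cut_off, rate) when the standard is re-anchored every year.

    The cut-off is placed so that roughly the lowest `share` of the audits
    actually seen that year fall below it: a crude, hypothetical stand-in
    for a concept that drifts with prevalence.
    """
    ordered = sorted(scores)
    cut_off = ordered[int(len(ordered) * share)]
    rate = Fraction(sum(s < cut_off for s in scores), len(scores))
    return cut_off, rate


# Hypothetical data: 30 inspected audits a year, quality scores 5 points higher each year.
for year in range(3):
    scores = [5 * year + i for i in range(30)]
    fixed = fixed_standard_rate(scores, threshold=10)
    cut_off, relative = relative_standard(scores)
    print(f"year {year}: fixed standard {float(fixed):.2f}, "
          f"relative cut-off {cut_off}, relative standard {float(relative):.2f}")
```

With these hypothetical scores, the fixed standard's deficiency rate falls from 0.33 to 0.17 to 0.00 as quality improves, whereas the re-anchored standard flags a third of audits every year while its cut-off rises from 10 to 15 to 20.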

We therefore test our predictions using a mixed experiment with a single between-subjects factor manipulated at two levels (stable scepticism vs. increasing scepticism) and a within-subjects factor represented by 100 repeated trials. Under the between-subjects factor, participants in the stable scepticism condition see a uniform mix of different levels of scepticism throughout the task, while those in the increasing scepticism condition gradually begin to see more highly sceptical actions and correspondingly fewer actions at low levels of scepticism. This between-subjects manipulation unfolds across the 100 repeated trials that make up the within-subjects factor. Participants were presented with 100 short fictional audit vignettes and asked only a single question, “Has the auditor exercised sufficient professional scepticism?”, for each vignette.1 In the stable condition, the mix of scepticism levels is held constant throughout the study, with one-third of the audit procedures being high-scepticism, one-third medium-scepticism, and one-third low-scepticism. The terms high, medium, and low are used only in a relative sense: high-scepticism audit procedures are more sceptical than medium-scepticism audit procedures, which are in turn more sceptical than low-scepticism audit procedures. However, due to the imprecise manner in which professional standards define scepticism (IAASB, 2023a) and the extant disagreement among audit professionals about what is and is not sceptical (Glover et al., 2019; Peecher et al., 2013), we cannot make definitive claims about how sceptical our vignettes are in an absolute sense.

In the increasing scepticism condition, participants are presented with audit vignettes that initially follow the same uniform scepticism distribution as the stable (control) condition, but as they see more vignettes, the proportion of low-scepticism audit procedures declines and the proportion of high-scepticism audit procedures increases correspondingly. This design choice is meant to mirror the real world, where, as audit quality increases over time, audit inspectors are likely to see a higher average level of scepticism in inspected audits.

As predicted, we find that practising auditors – our surrogates for inspectors – are affected by prevalence induced concept change and are therefore more likely to classify a given level of scepticism as insufficiently sceptical when the overall prevalence of low-scepticism audit procedures decreases. Our results are important to audit firms, regulators, and those with oversight of capital markets. Inspections are arguably the best available measure of audit quality (PJC, 2020). If audit firms treat inspection reports as an accurate signal of audit quality when PICC makes them less reliable than assumed, poor decision making may result. For example, audit firms may erroneously discontinue programs directed at improving audit quality because these programs do not improve inspection outcomes. Audit firms may also divert resources away from programs they believe are effective, because those programs do not improve inspection results, and instead adopt audit approaches they believe will please inspectors (Glover et al., 2019). Our study should also interest POBs such as ASIC and the PCAOB. To the extent that inspectors' judgements are affected by PICC, inspectors should be aware of this potential bias and put steps in place to monitor and respond to its effects. Understanding whether these judgements are affected by PICC also matters because parliamentary and senate inquiries around the world have focused on deficiency rates and called for changes to reduce them. More broadly, our findings may apply to any form of regulation where the conditions for PICC are met, not just to inspections within the auditing profession.

2 BACKGROUND THEORY AND HYPOTHESIS

2.1 Background

In response to the various financial scandals and accounting failures of the early 2000s, countries throughout the world began to create public oversight bodies (Harris, 2013). The PCAOB was among the first such bodies. It was created under the Sarbanes–Oxley Act, and the PCAOB explicitly views its existence as an attempt to address the loss of investor confidence that resulted from those scandals and accounting failures (Harris, 2013). One of the PCAOB's key responsibilities is to inspect audit firms to ensure they are conducting audits in a manner worthy of the public trust, so that society can avoid a repeat of the loss of investor confidence observed in the early 2000s. In the PCAOB's own words, “essentially the PCAOB audits the auditors” (Harris, 2013). Although the exact structure and mission of a POB vary from country to country, the notion that a POB ensures investor confidence by auditing the auditors is ubiquitous across jurisdictions. Given that investor confidence is a key reason POBs exist, the outcomes of POB inspections are released each year in the spirit of transparency.

According to the information released in POB inspection reports, POB inspectors determine that auditors have not obtained reasonable assurance that the financial statements are free of material misstatement before providing an unqualified opinion in roughly one third of cases (Ege et al., 2020). Given that POBs exist at least in part to maintain investor confidence, this is a troubling finding. Moreover, the deficiency rate is not only high but generally consistent from year to year (Ege et al., 2020). Even though deficiency rates have stayed consistent or increased over the last decade, some inspection agencies continue to believe that reporting deficiencies will improve inspection results over time, as evidenced by the following quote from a 2022 ASIC media release:

Audit inspections are designed to promote audit quality and high-quality financial reports. ASIC encourages audit firms to continue to focus on improving audit quality, which will in time improve the overall level of findings.

Many stakeholders have expressed concerns about these persistent, and in some cases increasing, deficiency rates, resulting in senate inquiries (e.g., Australian Government Treasury, 2024; PJC, 2020). A large body of research suggests that the high rate of negative findings in POB inspection results should not be taken at face value (Glover et al., 2019; Peecher et al., 2013; Peecher & Solomon, 2014). Specifically, this research suggests that inspectors are more likely to determine that an audit deficiency exists because they evaluate highly complex judgements with the benefit of hindsight, and because of other differences between auditors and inspectors, including experience, training, and incentives (Glover et al., 2019; Peecher et al., 2013). This explains at least in part why the deficiency rate is so high, but does little to explain why it is not improving over time. Research on audit quality finds that audit quality has generally improved over time (Glover et al., 2019; Simnett & Trotman, 2022), suggesting that even if the audit failure rate is artificially inflated by factors such as hindsight bias and individual differences, it should still decline over time as audit firms focus on improving audit quality.

The idea that there is a mismatch between the consistently high rate of deficiencies and overall audit quality is echoed in a 2023 CAANZ report (Trotman, 2023). The report examines four conceptual views or indicators of audit quality: (1) conclusions of the parliamentary inquiry into the regulation of auditing in Australia; (2) views of audit committee chairs; (3) investors' views; and (4) ASIC inspection findings. In summarising the conclusions of the parliamentary inquiry, the report notes that the PJC interim report, which accounts for the perceptions of a range of relevant stakeholders including the Financial Reporting Council, the Australian Institute of Company Directors, and senior partners from all Big 4 and many mid-tier firms, concludes that the audit system is working well and that any needs for potential improvement are being addressed. This conclusion is well encapsulated in the following quote from the inquiry:

Notwithstanding the findings of ASIC's audit inspection program stakeholder perceptions suggest that, while there are opportunities for improvement, overall, the quality of audit in Australia is of a high standard. (PJC Inquiry Interim Report 2020, para 3.8)

Similarly, in discussing the views of audit committees, Trotman (2023) draws on Simnett and Trotman's (2022) work on audit committee chairs' perceptions of audit quality to conclude that audit quality in Australia is high. For example, all surveyed audit committee chairs rated their incumbent auditors as either excellent (60% of cases) or above average (40% of cases). This 100% rating of above average or better improves on the 2020 and 2018 iterations of the survey, in which the corresponding figures were 94% and 92%, respectively. This may be due in part to the sentiment expressed by audit committee chairs that if their auditors were not at least above average they would quickly act, potentially by changing audit partners or putting the audit out to tender (Simnett & Trotman, 2022). Additionally, Trotman (2023) refers to findings that investors perceive auditors as the most trusted group in advancing investor protection, rating them ahead of regulators. For example, a study conducted by the FRC in conjunction with the AUASB found that:

Overall, 93% of professional investors indicated that audit quality is “average” or “above”. Correspondingly, only 7% indicated that audit quality is below average or poor (FRC/AUASB, 2019, p. 6).

Collectively, these three perspectives presented in the report suggest that audit quality in Australia is high and improving. This is in stark contrast to ASIC inspection reports, which include a high number of deficiencies each year with no sign of improvement. Further, as Knechel and Ghandar (2021) suggest, this discrepancy is likely attributable to factors other than inspection reports simply capturing some component of audit quality overlooked by the other perspectives.

Peecher and Solomon (2014) offer one explanation for why the audit failure rate does not decrease: POB inspections are not based on a random sampling of all audits. Thus, even if there are fewer deficient audits overall, inspectors may simply be effective at finding problematic audits to include in their sample. Although this explanation no doubt plays a role in the consistency of inspection results from year to year, it is likely only a partial explanation. We propose augmenting the above explanations with a cognitive explanation: prevalence induced concept change (PICC).

2.2 Theory – prevalence induced concept change

Prevalence induced concept change is the phenomenon whereby individuals subconsciously redefine concepts in response to changes in prevalence (Levari et al., 2018). Levari et al. (2018) first demonstrated this phenomenon using colours. Their research leverages the fact that coloured dots displayed on computers are precisely defined by their relative proportions of red, green and blue. For example, a dot classified as 1-0-254 would have one part red (and no parts green) to 254 parts blue and would appear very clearly blue to all observers with normal vision. Shifting one part from blue to red (2-0-253) would make the dot slightly more purple, but it would still appear extremely blue to every observer with normal vision. As more red is added, the dot gradually appears more purple; almost all observers would classify 100-0-155 as very clearly purple. Somewhere between these extremes, however, are dots that are difficult to classify: dots for which individuals will be torn as to whether they are blue or purple.
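The dot continuum described in the preceding paragraph can be sketched in a few lines of Python. The sketch simply follows the numeric pattern of the examples given (green fixed at 0, red and blue summing to 255); whether Levari et al. (2018) parameterised their stimuli exactly this way is an assumption, and `dot_colour`, `to_hex` and the 100-step range are our own illustrative choices.

```python
def dot_colour(red):
    """Return an (R, G, B) triple on a blue-to-purple continuum.

    Mirrors the pattern of the examples in the text (1-0-254, 2-0-253,
    100-0-155): green is held at 0 and red + blue always sums to 255.
    """
    if not 0 <= red <= 255:
        raise ValueError("red must lie between 0 and 255")
    return (red, 0, 255 - red)


def to_hex(rgb):
    """Format an (R, G, B) triple as a CSS-style hex string for display."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# A hypothetical 100-step continuum from clearly blue (1-0-254) to clearly purple (100-0-155).
continuum = [dot_colour(red) for red in range(1, 101)]
print(continuum[0], to_hex(continuum[0]))    # (1, 0, 254) #0100fe
print(continuum[-1], to_hex(continuum[-1]))  # (100, 0, 155) #64009b
```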

Importantly, the exact dot that serves as an inflection point (i.e., the dot with enough red for an individual to no longer classify it as blue) will vary by individual, but an inflection point exists for everyone. Levari et al. (2018) show participants a series of random dots and, for each dot, simply ask, “Is this dot blue?” This allows the researchers to determine each individual's natural inflection point (i.e., the level of red at which the individual no longer classifies a dot as blue). Once this inflection point is established, the authors vary the prevalence of “blueness” by biasing the randomisation of dots to skew more purple. In doing so, they demonstrate that as blueness becomes less prevalent, individuals expand their concept of what qualifies as blue and classify as blue dots that they would previously have considered purple. In a follow-up experiment, Levari et al. (2018) find that this effect persists even when participants are warned in advance of the distributional shift and paid to remain consistent.
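One simple way to operationalise an individual's inflection point from yes/no answers is sketched below: tally the share of “blue” answers at each red level and return the first level at which that share falls below one half. This 50% crossing rule, the `inflection_point` name and the example responses are hypothetical simplifications for illustration; they are not necessarily the estimation procedure used by Levari et al. (2018).

```python
from collections import defaultdict


def inflection_point(responses, criterion=0.5):
    """Estimate where a participant stops calling dots 'blue'.

    responses: iterable of (red_level, said_blue) pairs, where said_blue is
    the yes/no answer to "Is this dot blue?".
    Returns the lowest red level at which the share of 'yes' answers drops
    below `criterion`, or None if it never does.
    """
    tallies = defaultdict(lambda: [0, 0])  # red level -> [yes answers, total answers]
    for red, said_blue in responses:
        tallies[red][0] += int(said_blue)
        tallies[red][1] += 1
    for red in sorted(tallies):
        yes, total = tallies[red]
        if yes / total < criterion:
            return red
    return None


# Hypothetical participant: before the shift, 'blue' up to red level 49;
# after blue dots become rarer, 'blue' extends up to red level 59.
before = [(red, red < 50) for red in range(1, 101)]
after = [(red, red < 60) for red in range(1, 101)]
print(inflection_point(before), inflection_point(after))  # 50 60
```

A later inflection point after the shift (60 rather than 50 in this hypothetical case) corresponds to an expanded concept of blue.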

In subsequent experiments, Levari et al. (2018) demonstrate that individuals are also susceptible to PICC for less discretely definable concepts, such as the aggressiveness perceived in a human face, or ethicality. Participants were shown a series of computer-generated faces designed to display differing levels of aggression and asked whether each face was a threat. Once a baseline was established, the researchers decreased the prevalence of threatening faces and found that participants responded by rating as threats faces that they had previously judged not to be threats. In another experiment, participants were shown hypothetical IRB proposals of varying ethicality and asked whether the researchers should be allowed to conduct each study (i.e., whether the study was sufficiently ethical). After participants had rated enough proposals to establish a baseline, the prevalence of unethical proposals decreased, and in response participants judged proposals they would previously have deemed sufficiently ethical to be no longer sufficiently ethical to conduct.

PICC has also been examined in contexts beyond Levari et al. (2018). For example, Devine et al. (2022) explore the impact of PICC on perceptions of body type and find that increasing the prevalence of thin bodies leads individuals to expand their concept of “overweight” to include bodies they would otherwise have judged to be normal. Additionally, Levari et al. (2024) examine so-called “fake news”, exploring how increasing the prevalence of blatantly false news stories affects the evaluation of “merely implausible” news stories. The authors find that exposure to a high prevalence of very implausible claims can increase belief in other, more ambiguous claims. To our knowledge, Levari et al.'s (2024) work is the first to explore in depth the psychological mechanisms driving PICC. The authors suggest that PICC is driven at least in part by contrast effects, whereby individuals redefine a concept because they contrast new instances of it with previously encountered instances.
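The contrast-effect account can be caricatured with a deliberately simple rule in which an instance passes only if it is at least as strong as the average of recently seen instances. The Python sketch below applies that rule to scepticism coded on a hypothetical 1 (low) to 3 (high) scale; the rule, the scale, and the `judged_sufficient` name are our own illustrative assumptions, not the model Levari et al. (2024) propose or test.

```python
from statistics import mean

# Scepticism coded on a hypothetical 1-3 scale: 1 = low, 2 = medium, 3 = high.
LOW, MEDIUM, HIGH = 1, 2, 3


def judged_sufficient(level, recent_levels):
    """Toy contrast rule: an instance 'passes' only if it is at least as strong
    as the average of the instances seen just before it."""
    return level >= mean(recent_levels)


mixed_context = [LOW, MEDIUM, HIGH]      # uniform mix, average 2
shifted_context = [MEDIUM, HIGH, HIGH]   # low instances have become rare, average about 2.67
print(judged_sufficient(MEDIUM, mixed_context))    # True
print(judged_sufficient(MEDIUM, shifted_context))  # False
```

Under this toy rule the same medium-scepticism action passes against a uniform mix but fails once low-scepticism actions become rare, which is the qualitative pattern PICC predicts.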

Levari et al. (2018) argue that PICC is extremely important because it could taint the judgements of professional decision makers whose judgements are crucial to a well-functioning society. We agree and believe that trained decision makers' judgements will also be tainted by PICC, but this is an extrapolation beyond the untrained student participants in Levari et al. (2018). More specifically, we expect that the judgements of audit inspectors (proxied by auditors), who undergo extensive training to exercise judgement within their domain of expertise, will also be susceptible to PICC.

As the prevalence of scepticism in audits increases, we expect audit inspectors to judge as insufficiently sceptical audit actions that they would previously have judged sufficiently sceptical. We focus on scepticism because International Standards on Auditing (ISAs) explicitly require auditor scepticism and scepticism is considered necessary for an effective audit (IAASB, 2023b). The auditing standards define professional scepticism as “an attitude that includes a questioning mind, being alert to conditions which may indicate possible misstatement due to error or fraud, and critical assessment of evidence” (IAASB, 2023a, p. 78). In addition, insufficient scepticism is a common deficiency described in inspection reports (Trotman, 2023) because evaluating whether a sufficient level of professional scepticism has been exercised is complex (Nelson, 2009; Nolder & Kadous, 2018; Stepankova et al., 2022; Stevens et al., 2019).

Furthermore, ASIC has long recognised the importance of professional scepticism for audit quality (ASIC, 2021). ASIC reports frequently refer to the importance of professional scepticism in ensuring quality audits (see ASIC, 2022b, 2022c). Inspectors' focus on professional scepticism is also evident in an ASIC (2017) report suggesting that professional bodies provide additional training and workshops on core skills to assist auditors in exercising professional scepticism. As the ASIC (2022b) report shows, inadequate professional scepticism by the engagement team is the second most common cause of negative audit findings. Recently, ASIC noted that professional scepticism should be one of the focus areas for audits of financial reports under COVID-19 conditions (ASIC, 2022c) and urged auditors to focus their professional judgement and scepticism on areas that rely most on estimates and are subject to uncertainty (ASIC, 2023). We summarise ASIC's findings and recommendations on professional scepticism in Appendix 1 and argue that they show professional scepticism is at the forefront of audit inspections. Although we focus on scepticism, we expect PICC to apply to judgements about other components of the audit as well.

2.3 Hypothesis

As discussed previously, each individual has a different baseline inflection point. In Levari et al. (2018), the specific dot that was no longer considered blue differed across participants. Irrespective of this baseline, individuals shifted their inflection point in response to a change in the prevalence of blue in the colour distribution. Similarly, irrespective of baseline judgements, we expect auditor judgements to shift in response to a change in the prevalence of scepticism in the audit procedures they are exposed to. Specifically, as auditors see fewer low-scepticism and more high-scepticism audit procedures, we expect them to rate as insufficiently sceptical audit procedures that they would previously have deemed sufficiently sceptical.2 We therefore formalise our hypothesis as follows:

H1. When the prevalence of low-scepticism audit procedures declines, individuals who experience the change in scepticism will be more likely to rate the actions of auditors as insufficiently sceptical for a given level of scepticism than individuals who do not experience the change in scepticism.
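H1 is a comparison, at a fixed level of scepticism, of how often participants in each condition answer “No”. A descriptive tabulation of that quantity is sketched below in Python. The tuple layout, the `insufficient_rates` name, the decision to count only blocks 5 onward (where the prevalence shift begins in the design described in the Method section), and the example responses are all hypothetical; the sketch does not reproduce the formal statistical test of H1.

```python
from collections import defaultdict


def insufficient_rates(trials, first_block=5):
    """Share of 'insufficiently sceptical' (No) answers per condition and level.

    trials: iterable of (condition, level, block, said_sufficient) tuples, where
    said_sufficient is the yes/no answer to "Has the auditor exercised
    sufficient professional scepticism?".
    Only blocks from `first_block` onward are counted.
    """
    counts = defaultdict(lambda: [0, 0])  # (condition, level) -> [No answers, total]
    for condition, level, block, said_sufficient in trials:
        if block < first_block:
            continue
        counts[(condition, level)][0] += int(not said_sufficient)
        counts[(condition, level)][1] += 1
    return {key: no / total for key, (no, total) in counts.items()}


# Hypothetical responses to medium-scepticism vignettes.
trials = [
    ("stable", "medium", 5, True),
    ("stable", "medium", 8, False),
    ("increasing", "medium", 6, False),
    ("increasing", "medium", 9, False),
    ("increasing", "medium", 2, True),  # block 2 precedes the shift, so it is ignored
]
rates = insufficient_rates(trials)
print(rates[("stable", "medium")], rates[("increasing", "medium")])  # 0.5 1.0
```

H1 predicts that, for the same level of scepticism, the rate for the increasing condition exceeds the rate for the stable condition.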

3 METHOD

We wrote 48 vignettes with the help of a retired partner from a Big Four auditing firm. Each vignette had three versions with varying levels of scepticism (high, medium, and low), yielding 144 vignettes in total.3 The vignettes contained between 25 and 80 words, with an average of 51 words. Each vignette describes an issue the auditors faced and the actions they took to address it. Appendix 2 provides five examples of high-, medium-, and low-scepticism vignettes. Note, however, that participants never saw matched versions presented together; that is, a participant could not see a high-scepticism version of a vignette and then immediately see the low- or medium-scepticism version of the same vignette. Different versions of the same vignette were always separated by a minimum of 30 vignettes. To confirm that the perceived level of scepticism of each vignette was in line with its classification, we also piloted all vignettes with student participants to check for inconsistencies, such as a low-scepticism vignette perceived as sufficiently sceptical, or a high-scepticism vignette perceived as insufficiently sceptical, by a high proportion of participants.
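The requirement that different versions of the same vignette be separated by a minimum of 30 vignettes is easy to verify mechanically. The Python sketch below is a hypothetical checker, not the authors' randomisation code; it reads the requirement as at least 30 intervening vignettes, which is our interpretation, and the `versions_well_separated` name and example orders are invented for illustration.

```python
def versions_well_separated(sequence, min_between=30):
    """Check that versions of the same vignette are never presented close together.

    sequence: list of (vignette_id, version) pairs in presentation order.
    Returns True if every two appearances of the same vignette_id have at
    least `min_between` other vignettes between them. Comparing each
    appearance with the most recent earlier one suffices, because that is
    always the closest pair.
    """
    last_position = {}
    for position, (vignette_id, _version) in enumerate(sequence):
        if vignette_id in last_position:
            between = position - last_position[vignette_id] - 1
            if between < min_between:
                return False
        last_position[vignette_id] = position
    return True


# Hypothetical order: vignette 1 (high), 30 fillers, then vignette 1 (low).
fillers = [(vid, "medium") for vid in range(2, 32)]
ok_order = [(1, "high")] + fillers + [(1, "low")]
too_close = [(1, "high")] + fillers[:-1] + [(1, "low")]
print(versions_well_separated(ok_order), versions_well_separated(too_close))  # True False
```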

Practising auditor participants took the role of a reviewer for an audit firm. This reviewer role should be familiar to auditors, as audit firms adopt a hierarchical group process whereby work at each level is reviewed by the level above (Trotman et al., 2015).4 Participants were told that, to help the researchers better understand auditor judgements, they would be presented with a series of vignettes describing the actions of a fictional audit team. They were further instructed that, after reading each vignette and the accompanying audit team actions, they were to answer the following question with either yes or no: “Has the auditor exercised sufficient professional scepticism?” Because this departs from practice, where audit actions must be considered within a larger context to evaluate scepticism, participants were specifically instructed that “In practice auditor scepticism must be considered in the context of the audit as a whole. However, in your role today, we ask that you consider only the information presented to you in deciding whether the auditors exercised sufficient professional scepticism.” Participants were also instructed to consider each vignette independently of every other vignette. Finally, participants were told that, to avoid fatigue, a rest screen would be presented after every 10 questions so that they could take a brief break if necessary, but that they should complete all 100 vignettes in one sitting.5 Over the course of the 100 vignettes, we decreased the prevalence of low-scepticism vignettes (with a corresponding increase in high-scepticism vignettes) for participants in the increasing scepticism condition but not for participants in the stable scepticism condition. We refer to this between-subjects variable as prevalence condition.

In the stable scepticism condition, on average a third of the vignettes participants saw in each block were high-scepticism, a third were medium-scepticism, and a third were low-scepticism. The increasing scepticism condition is identical to the stable scepticism condition for the first four blocks (40 vignettes in total). That is, participants in both conditions see the exact same 40 vignettes in the first four blocks, but participants in the increasing condition then begin to see relatively more high-scepticism and fewer low-scepticism vignettes starting in block 5. Specifically, the proportion of low-scepticism vignettes drops to 15% in blocks 5 and 6 and then drops further to 5% in blocks 7 to 10.6 The proportion of medium-scepticism vignettes remained unchanged across all 10 blocks.
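The prevalence schedule can be written down explicitly. The Python sketch below encodes the stated proportions per block and derives the expected number of high-, medium- and low-scepticism vignettes over the 100 trials. Because a block of 10 cannot hold exactly a third of each type, these are expected values only; the integer allocation actually used in the study is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def block_proportions(condition, block):
    """Return the (high, medium, low) share of vignettes in a given block.

    condition: "stable" or "increasing"; block: 1 to 10 (10 vignettes each).
    """
    if condition not in ("stable", "increasing"):
        raise ValueError("unknown condition")
    if not 1 <= block <= 10:
        raise ValueError("block must be between 1 and 10")
    if condition == "stable" or block <= 4:
        low = THIRD                # uniform mix; both conditions identical in blocks 1-4
    elif block <= 6:
        low = Fraction(15, 100)    # blocks 5 and 6
    else:
        low = Fraction(5, 100)     # blocks 7 to 10
    medium = THIRD                 # medium prevalence never changes
    high = 1 - medium - low        # high-scepticism vignettes absorb the drop in low
    return high, medium, low


def expected_counts(condition, per_block=10):
    """Expected number of (high, medium, low) vignettes over all 10 blocks."""
    totals = [Fraction(0), Fraction(0), Fraction(0)]
    for block in range(1, 11):
        for i, share in enumerate(block_proportions(condition, block)):
            totals[i] += share * per_block
    return tuple(totals)


for condition in ("stable", "increasing"):
    print(condition, [round(float(x), 2) for x in expected_counts(condition)])
# stable [33.33, 33.33, 33.33]
# increasing [48.33, 33.33, 18.33]
```

Under these assumptions, participants in the increasing condition would see about 48 high- and 18 low-scepticism vignettes in expectation, against about 33 of each in the stable condition.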

Because participants were asked to complete a total of 100 vignettes, it was critical to design the task in a manner that would both limit fatigue and prevent any fatigue effects from affecting the inferences drawn from our analysis. Given that our task, though long, is equally long in both conditions, it is extremely unlikely that fatigue would drive any between-subjects differences. However, absent careful experimental design, fatigue could explain within-subject variation. For example, tired participants might adopt a heuristic that causes them to respond to high-, medium-, and low-scepticism vignettes differently because of fatigue rather than vignette content. Adopting such a heuristic would require the vignette types to differ on some easily accessible cue. We believed that vignette length was the most likely source of such a heuristic; for example, a participant could decide to always assess instances where the description of the auditor's actions is long as sufficiently sceptical and short instances as insufficiently sceptical. Therefore, in addition to making the vignettes as similar as possible at any given level of scepticism, we conducted an untabulated analysis of vignette length and found that high-, medium-, and low-scepticism vignettes were 53, 51, and 49 words long on average; none of these differences is statistically significant. Finally, we note that our task is similar in length to one of the tasks in Levari et al. (2018), and the authors do not report any fatigue-related issues in that study. Nevertheless, a limitation of our design is that we cannot rule out or quantify any potential impact of fatigue on our results.

Participants in this study were 67 practising auditors recruited from various large and mid-size audit firms in Australia. Participants' work experience ranged from 2.5 to 16 years, with a mean of just above 6 years and a standard deviation of 3.5 years. We asked our contacts at the audit firms to send our materials only to individuals at senior level or above, as we did not believe participants below this level of experience would be reasonable proxies for our population of interest.^7 Participants received an A$60 gift voucher as remuneration for their participation. Upon signing up to participate in the study, individuals were sent a Qualtrics link that allowed them to access the experimental materials.

4 RESULTS

To determine whether the decreasing prevalence of low-scepticism vignettes (and the correspondingly increasing prevalence of high-scepticism vignettes) caused participants' concept of insufficiently sceptical audit procedures to expand, we conduct an analysis of covariance (ANCOVA) in which the primary dependent variable is the percentage of vignettes of each type for which participants answer yes to the question: “Has the auditor exercised sufficient professional scepticism?” For example, a participant who sees 20 medium-scepticism vignettes and answers “yes” to 10 of them would have a value of 50% for the medium-scepticism level of this dependent variable.
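A minimal sketch of how this dependent variable could be computed for one participant, assuming a hypothetical data layout of (vignette type, answered yes) pairs; `yes_rate_by_type` is an illustrative name, not the authors' code.

```python
from collections import defaultdict

def yes_rate_by_type(responses):
    """Percentage of "yes" answers for each vignette type.

    `responses` is an iterable of (vignette_type, answered_yes) pairs,
    e.g. ("medium", True). This data layout is hypothetical.
    """
    shown = defaultdict(int)
    yes = defaultdict(int)
    for vignette_type, answered_yes in responses:
        shown[vignette_type] += 1
        yes[vignette_type] += int(answered_yes)
    return {t: 100.0 * yes[t] / shown[t] for t in shown}

# The worked example from the text: 20 medium-scepticism vignettes, 10 "yes".
example = [("medium", i < 10) for i in range(20)]
print(yes_rate_by_type(example))  # {'medium': 50.0}
```

Computing the rate separately per vignette type, rather than over all vignettes at once, is what allows the analysis to proceed at three levels.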

We leverage the fact that participants in both conditions see the same initial 40 vignettes to let each participant serve as their own control: we use the percentage of “yes” answers in the initial 40 vignettes as a proxy for their baseline level of scepticism and label this control variable Judgement Propensity Control.^8 Correspondingly, our primary dependent variable is based on participant responses to the final 60 vignettes. We have a single dependent variable but analyse it at three separate levels, one for each level of vignette scepticism (low, medium, and high). The dependent variable measures, in percentage terms, how frequently participants answered “yes” to the question “Has the auditor exercised sufficient professional scepticism?”; a value of 50% therefore indicates that participants answered “yes” exactly half the time. Analysing the dependent variable at three levels is necessary because participants in the two conditions see different numbers of high- and low-scepticism vignettes. Any comparison of the final 60 vignettes taken as a whole, rather than by category, would therefore not be meaningful: the increasing scepticism condition should mechanically have a higher proportion of “yes” answers simply because its participants are presented with more high- and fewer low-scepticism vignettes.
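To illustrate why pooling the final 60 vignettes would be misleading, the sketch below holds a participant's yes-rate for each vignette type fixed and varies only the mix of vignette types. The per-type rates are, purely for illustration, the stable-condition means from Panel A of Table 1; the increasing-condition counts of 5, 20 and 35 low-, medium- and high-scepticism vignettes are our own derivation from the design, assuming medium-scepticism vignettes stay at one third. `pooled_yes_rate` is a hypothetical helper.

```python
def pooled_yes_rate(rates: dict, counts: dict) -> float:
    """Overall % of "yes" answers when each vignette type has a fixed yes-rate.

    rates:  % "yes" for each vignette type (held constant across conditions)
    counts: number of vignettes of each type the participant sees
    """
    total = sum(counts.values())
    return sum(rates[t] * counts[t] for t in counts) / total

# Illustrative per-type rates: stable-condition means from Table 1, Panel A.
rates = {"low": 48.6, "medium": 65.9, "high": 83.4}

# Final 60 vignettes; increasing-condition counts assume medium stays at 1/3.
stable_mix = {"low": 20, "medium": 20, "high": 20}
increasing_mix = {"low": 5, "medium": 20, "high": 35}

print(round(pooled_yes_rate(rates, stable_mix), 1))      # 66.0
print(round(pooled_yes_rate(rates, increasing_mix), 1))  # 74.7
```

Even with identical judgements at each level, the pooled rate rises by roughly nine percentage points purely because of the change in mix, which is the mechanical effect the per-category analysis avoids.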

Our test results are presented in Table 1. We expected the effects of PICC to be strongest where participant judgements are most split, because the marginal case is most likely to be affected by a shift in the cutoff for what is considered (in)sufficiently sceptical.^9 As Panel A of Table 1 shows, participant judgements in the stable (control) condition are closest to 50% (i.e., most split) for low-scepticism vignettes.^10 Because participants' judgements are most divided for low-scepticism vignettes, we conclude that this is where we should expect to observe the strongest results in our sample.

TABLE 1. The effect of prevalence on auditor judgement^a.

Panel A: Means (standard deviations) proportion of sceptical judgements (n = 67)

Vignette type	Stable (n = 32)	Increasing (n = 35)	Average (n = 67)
Low-scepticism^b	48.6% (21.7%)	34.9% (28.8%)	41.4% (26.4%)
Medium-scepticism^b	65.9% (20.3%)	62.4% (20.6%)	64.1% (20.4%)
High-scepticism^b	83.4% (14.4%)	83.0% (11.9%)	83.2% (13.1%)

Panel B: Analysis of covariance (low-scepticism vignettes)
Factor	df	Sum of squares	F	p-value^c
Prevalence condition^d	1	1262.51	4.033	0.025
Judgement propensity control^e	1	22,852.94	73.009	<0.001
Error	64

Panel C: Analysis of covariance (medium-scepticism vignettes)
Factor	df	Sum of squares	F	p-value^c
Prevalence condition^d	1	4.52	0.024	0.439
Judgement propensity control^e	1	15,064.47	78.859	<0.001
Error	64

Panel D: Analysis of covariance (high-scepticism vignettes)
Factor	df	Sum of squares	F	p-value^c
Prevalence condition^d	1	16.51	0.113	0.369
Judgement propensity control^e	1	1866.86	12.751	0.001
Error	64

^a Percentages indicate how frequently participants answered “yes” to the question “Has the auditor exercised sufficient professional scepticism?” for the last 60 vignettes (i.e., after the distributions in the increasing condition changed).
^b These terms classify the relative level of scepticism for each type of vignette; i.e., on average, vignettes classified as high are more sceptical than those classified as medium, which are in turn more sceptical than those classified as low. These classifications do not necessarily correspond to scepticism in any absolute sense.
^c Reported p-values are two-tailed, except those for the prevalence condition, which test a one-tailed (directional) prediction.
^d This is a binary variable equal to 1 for participants who see more high- and fewer low-scepticism vignettes over time. All other participants see an equal distribution of the three vignette types.
^e Judgement propensity control is a covariate based on the proportion of the time participants answer “yes” in the first 40 vignettes; i.e., when both conditions are the same.
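The ANCOVA in Panels B–D can be framed as a comparison of two least-squares models. The sketch below is our illustration, not the authors' code: it regresses a per-level yes-rate on the binary prevalence condition and Judgement Propensity Control, and obtains the condition F statistic by comparing against a model without the condition term (equivalent to the adjusted, Type II, sum of squares in this additive model). With 67 participants and three estimated parameters the error term has 67 - 3 = 64 degrees of freedom, as in the tables; dividing each reported sum of squares by its F (e.g., 1262.51 / 4.033 and 22,852.94 / 73.009) gives a common error mean square of about 313. The data below are simulated only to exercise the function and are not the study's data.

```python
import numpy as np

def ancova_condition_f(y, condition, covariate):
    """F statistic, with (1, n - 3) df, for a binary condition term in a one-covariate ANCOVA.

    Compares y ~ 1 + covariate (reduced) with y ~ 1 + condition + covariate (full).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ones = np.ones(n)
    full = np.column_stack([ones, condition, covariate])
    reduced = np.column_stack([ones, covariate])

    def sse(design):
        # Residual sum of squares of an ordinary least-squares fit.
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    sse_full, sse_reduced = sse(full), sse(reduced)
    df_error = n - full.shape[1]
    ss_condition = sse_reduced - sse_full
    f_stat = ss_condition / (sse_full / df_error)
    return f_stat, df_error, ss_condition

# Simulated (not real) data: 32 stable and 35 increasing participants.
rng = np.random.default_rng(0)
condition = np.r_[np.zeros(32), np.ones(35)]
baseline = rng.uniform(40, 90, size=67)  # % "yes" in the first 40 vignettes
y = 0.8 * baseline - 10 * condition + rng.normal(0, 15, size=67)

f_stat, df_error, ss_condition = ancova_condition_f(y, condition, baseline)
print(f"F(1, {df_error}) = {f_stat:.3f}")
```

Because the condition term has a single degree of freedom, this F equals the squared t statistic of the condition coefficient in the full regression.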

Directionally, as shown in Panel A of Table 1, results are consistent with our hypothesis at all levels of vignette scepticism. However, as Panels B–D of Table 1 show, the effect of varying scepticism prevalence is statistically significant only for low-scepticism vignettes (F = 4.03, p = 0.025). Collectively, these results support our hypothesis: for low-scepticism vignettes, participants rated audit procedures as insufficiently sceptical (answering “no”) that they would previously have rated as sufficiently sceptical. We interpret this as evidence suggesting that participants' judgements were affected by PICC.
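For a one-degree-of-freedom effect, F(1, df) equals t² with df degrees of freedom, so a one-tailed p-value for a directional prediction is the upper tail of the t distribution at the square root of F. The pure-Python sketch below, which integrates the Student t density with Simpson's rule, is our illustration rather than the authors' procedure; applied to the F values in Panels B–D with 64 error degrees of freedom it yields values close to the reported prevalence-condition p-values, consistent with those being one-tailed. The function names are hypothetical.

```python
import math

def t_density(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_t(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - area * h / 3

def one_tailed_p_from_f(f_stat: float, df_error: int) -> float:
    """One-tailed p for a 1-df F test, assuming the effect lies in the predicted direction."""
    return upper_tail_t(math.sqrt(f_stat), df_error)

for label, f in [("low", 4.033), ("medium", 0.024), ("high", 0.113)]:
    print(label, round(one_tailed_p_from_f(f, 64), 3))
```

Halving a two-tailed p in this way is only appropriate when the observed effect lies in the predicted direction, which Panel A shows is the case at all three levels.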

Our analysis at each level of scepticism is necessary, as explained above, but is a slight mismatch to our setting of interest. Our analysis shows a greater number of deficiencies at a constant level of audit quality, whereas our applied setting of interest is one in which audit quality is potentially increasing and overall deficiency rates remain constant. For the sample as a whole, audit scepticism increases in the increasing condition because of the change in the mix of high- and low-scepticism vignettes. Thus, a decrease in an individual's willingness to classify a vignette as sufficiently sceptical at a given level of scepticism will push their overall rate of “yes” answers back toward its level before the change in scepticism prevalence. This suggests that our findings can plausibly help explain the static deficiency rates observed among real-world inspectors.

We next consider whether our results are moderated by participants' experience. Because our true population of interest – inspectors – is likely more experienced on average than participants at the low end of our sample's experience distribution, results driven by inexperienced auditors would limit our ability to generalise to that population. We therefore conduct a supplemental (untabulated) analysis using only participants at or above the mean experience level of 6 years. The p-values for the prevalence condition variable decrease when only these more experienced auditors are included, which provides confidence that our results are not driven by the relatively inexperienced auditors in our sample.

5 CONCLUSION AND DISCUSSION

Academics and practising accountants have long been puzzled by the consistently high deficiency rates found in inspection reports. The consistency of these deficiency rates is incongruent with research suggesting that audit quality is improving (e.g., Glover et al., 2019; Simnett & Trotman, 2022), and it has significant negative effects both for the audit firms that receive high deficiency rates in their inspection reports and for the staff employed by those firms (AUASB, 2019; Dowling et al., 2018; Ege et al., 2020; Trotman, 2023). Prior research has attempted to explain these incongruent results through a combination of differences in incentives, knowledge and experience, and factors related to information processing (Glover et al., 2019; Peecher et al., 2013). We augment this work by providing evidence that PICC is an additional explanation for consistently adverse inspection results. In contrast to other explanations such as differing incentives, PICC is a bias of which individuals are not consciously aware. As such, to the extent that PICC influences the consistently high rate of deficiencies, POBs may be unaware of the role it plays. Nevertheless, the negative impact of consistently high deficiency rates on firms and the profession is independent of intentions. To the extent that PICC may be a contributing factor to these inspection results, we encourage further research examining the conditions under which PICC may occur in the inspection process.

Furthermore, we believe our findings have potentially broader applicability outside auditing and the inspection process. We make this claim with the caveat that, just as expert judgements were an extrapolation beyond the original participant pool of Levari et al. (2018), non-audit expert judgements are similarly an extrapolation beyond our data. It is possible that the decision-making training undergone in other domains decreases the likelihood of judgements being affected by PICC. A further limitation is that the time horizon of our experiment differs greatly from the time horizon experienced by inspectors. Specifically, in our setting participants experience a shift in the prevalence of scepticism over the course of approximately an hour, whereas in practice inspectors experience improvements in auditor scepticism or audit quality over a much longer time horizon. This difference between our study and practice leaves several important questions unanswered. For example, in applied settings, will judgements continue to drift without limit so long as prevalence continues to change? The theory in Levari et al. (2018) on which we rely suggests that PICC can cause a constant revision of what represents the new normal, and therefore problems associated with PICC could potentially be exacerbated over longer time horizons. However, this discussion is based on our understanding of the still limited body of extant PICC research, and reliably drawing conclusions about the long-term implications of PICC requires a longitudinal study. Nevertheless, we believe we provide at least preliminary evidence that even experts' judgements can be affected by PICC.

We therefore believe it is important to suggest possible remedies for the problem PICC potentially poses for the inspection process. We do so with the caveat that our research was designed to establish the possibility that PICC can affect the judgements of inspectors, rather than to test possible interventions for reducing PICC. Levari (2022) suggests that the impact of PICC can be lessened by exposing individuals to extreme examples within a population to help them calibrate their judgements. A potential first step to lessening the impact of PICC may therefore be as simple as ensuring that inspectors regularly undergo training that exposes them to both extremely well-performed and extremely poorly performed audits. We believe more research on this issue is important.

Another limitation of our study is that we use practising auditors as surrogates for inspectors. We made this choice because it was not possible to recruit a sufficient number of inspectors to participate in the study. Following Peecher et al. (2013), for theory testing purposes we used knowledgeable auditors to complete the task. However, we recognise potential differences between auditors and inspectors in terms of experiences, knowledge and incentives (Peecher et al., 2013). Another limitation is that due to the nature of our manipulation, a single “yes” or “no” answer has a larger impact on the low-scepticism average in the increasing scepticism condition compared to the stable condition. This is a limitation inherent to the design but the difference grows less pronounced as observations increase. A final limitation of our study is that because scepticism cannot be precisely measured and “appropriate” scepticism is not precisely defined by existing standards, we cannot unequivocally say what the “right” judgement is for every one of our vignettes. That is, we cannot say what percentage of each scepticism level (high, medium, and low) should have been rated as appropriately sceptical. However, in practice inspectors are granted significant discretion in interpreting audit standards and irrespective of what the “true” interpretation should be, a lack of consistency in how inspectors interpret standards over time is problematic because of the consequences such inconsistency creates for audit firms.
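The arithmetic behind the limitation on single answers can be sketched as follows. The counts are our own derivation from the design, assuming medium-scepticism vignettes stay at one third: participants see 5 low-scepticism vignettes in the final 60 under the increasing condition versus 20 under the stable condition. Both helper names are hypothetical, and the binomial standard error is included only as a conventional way to show how precision improves with more observations.

```python
import math

def points_per_answer(n_vignettes: int) -> float:
    """Percentage-point change in a yes-rate caused by flipping one answer."""
    return 100.0 / n_vignettes

def standard_error_pp(n_vignettes: int, p: float = 0.5) -> float:
    """Binomial standard error (percentage points) of a yes-rate based on n answers."""
    return 100.0 * math.sqrt(p * (1 - p) / n_vignettes)

# Low-scepticism vignettes in the final 60 (derived, assuming medium stays at 1/3).
for condition, n_low in {"stable": 20, "increasing": 5}.items():
    print(condition, points_per_answer(n_low), round(standard_error_pp(n_low), 1))
```

A single changed answer moves the low-scepticism average by 20 percentage points in the increasing condition but by only 5 points in the stable condition, and both figures shrink as the number of observations grows.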

ACKNOWLEDGEMENT

Open access publishing facilitated by University of New South Wales, as part of the Wiley - University of New South Wales agreement via the Council of Australian University Librarians.

APPENDIX: 1 SUMMARY OF ASIC REPORTS

Report	Finding
ASIC (2017)	Has suggested that they (professional bodies) provide additional training and workshops on core skills to assist auditors in exercising professional scepticism. Do audit committee members maintain professional scepticism and a questioning attitude towards the information received from management and in considering the quality of the audit? Do directors and the audit committee challenge the auditor, including on professional scepticism applied in judgement areas such as accounting estimates and accounting policies? Did the auditor exhibit sufficient professional scepticism in challenging, rather than rationalising, estimates and accounting policy choices (e.g., complex or subjective asset valuations, including cases where the reported net assets exceed the market capitalisation of the company)?
ASIC (2021)	With respect to audit quality, states that audit quality can be influenced by factors such as “an audit firm's culture and focus on audit quality, professional scepticism and consultation”
ASIC (2022b)	A summary of findings from ASIC's review of root cause analysis of negative audit quality findings performed by the largest six audit firms between 1 July 2020 and 31 December 2021. ASIC notes that audit firms have, in some instances, not identified the root cause of negative audit findings correctly, and states that “the audit quality finding and firm's documentation reviewed by ASIC indicated that the primary root causes could include supervision and review, knowledge and skills of individuals, and the level of professional scepticism exercised by the engagement team” (p. 9)
ASIC (2022c)	Suggests professional scepticism as one of the focus areas for audits of financial reports under COVID-19 conditions. Specifically, the report recommends “applying professional scepticism and appropriately challenging estimates, assumptions, assessments, and the sufficiency and appropriateness of audit evidence”
ASIC (2023)	Highlights focus areas for 31 December 2023 reporting and urges auditors to “focus their professional judgement and scepticism on those areas of the financial report preparation process that are most reliant on estimates and are uncertain” (Kate O'Rourke, ASIC Commissioner)

APPENDIX: 2 AUDIT VIGNETTES

Vignette Example 1
High level of scepticism	The audit team has identified management review controls as a potential risk. In particular, the auditors are concerned with the degree of precision applied in management review controls. The audit team determines management's level of precision by observing corrections management has made, interviewing management directly, and conducting follow up testing to ensure the controls are being applied consistently. Has the auditor exercised sufficient professional scepticism?
Medium level of scepticism	The audit team has identified management review controls as a potential risk. In particular, the auditors are concerned with the degree of precision applied in management review controls. The audit team observes corrections management has made via management review controls and infers a level of precision accordingly. Has the auditor exercised sufficient professional scepticism?
Low level of scepticism	The audit team has identified management review controls as a potential risk. In particular, the auditors are concerned with the degree of precision applied in management review controls. The audit team ensures that management review controls have been performed by checking for relevant signatures. Has the auditor exercised sufficient professional scepticism?
Vignette Example 2
High level of scepticism	The client recently expanded to new geographical markets and has reduced its credit requirements for new customers in these areas. The allowance for doubtful accounts as a percent of sales has slightly increased relative to previous years. The audit team develops an independent expectation of the allowance to corroborate the reasonableness of management's estimate. Has the auditor exercised sufficient professional scepticism?
Medium level of scepticism	The client recently expanded to new geographical markets and has reduced its credit requirements for new customers in these areas. The allowance for doubtful accounts as a percent of sales has slightly increased relative to previous years. The audit team tests the process used by management to develop the estimate. During the testing process, they analyse historical data used by management to assess whether the data is comparable and consistent with data of the current period. Has the auditor exercised sufficient professional scepticism?
Low level of scepticism	The client recently expanded to new geographical markets and has reduced its credit requirements for new customers in these areas. The allowance for doubtful accounts as a percent of sales has slightly increased relative to previous years. The audit team relies solely on the work of a management-hired specialist to assess the valuation of allowance for doubtful accounts. Has the auditor exercised sufficient professional scepticism?
Vignette Example 3
High level of scepticism	Management's assumptions relevant to their goodwill impairment assessment are based on being first to market on a new heart monitoring device. To meet the goal of first to market, an aggressive timeline exists to commence production. The audit team engages project management specialist to assess likelihood of on-time project completion. Has the auditor exercised sufficient professional scepticism?
Medium level of scepticism	Management's assumptions relevant to their goodwill impairment assessment are based on being first to market on a new heart monitoring device. To meet the goal of first to market, an aggressive timeline exists to commence production. The audit team inspects prototype to assess feasibility of production and inquires of production personnel about the production schedule. Has the auditor exercised sufficient professional scepticism?
Low level of scepticism	Management's assumptions relevant to their goodwill impairment assessment are based on being first to market on a new heart monitoring device. To meet the goal of first to market, an aggressive timeline exists to commence production. The audit team reviews internal company communications regarding the likelihood of on-time project completion. Has the auditor exercised sufficient professional scepticism?
Vignette Example 4
High level of scepticism	The audit team is meeting at the beginning of a repeat-engagement for a fraud brainstorming session. An audit manager has reviewed the prior year audit file and prepared an initial list of possible fraud risk factors for discussion. The audit manager also obtains a report from a forensic specialist that includes fraud risk factors specific to the industry. Has the auditor exercised sufficient professional scepticism?
Medium level of scepticism	The audit team is meeting at the beginning of a repeat-engagement for a fraud brainstorming session. The audit manager has prepared an initial list of possible fraud risk factors for discussion based on their knowledge of the industry and a report from a forensic specialist. Has the auditor exercised sufficient professional scepticism?
Low level of scepticism	The audit team is meeting at the beginning of a repeat-engagement for a fraud brainstorming session. The audit senior includes fraud as an agenda item for the brainstorming meeting and the audit senior asks the team members for any fraud risk factors they can think of and consider worth discussing. Has the auditor exercised sufficient professional scepticism?
Vignette Example 5
High level of scepticism	The audit team is reviewing the classification of borrowings in the financial statements. The audit senior reviews the underlying loan agreements from the bank to identify the appropriate terms and conditions for disclosure in the financial statements, including the current and non-current classification. Has the auditor exercised sufficient professional scepticism?
Medium level of scepticism	The audit team is reviewing the classification of borrowings in the financial statements. The audit senior compares the current and non-current classification of borrowings to the prior year and reviews the evidence on the audit file related to any changes in borrowings during the year. Has the auditor exercised sufficient professional scepticism?
Low level of scepticism	The audit team is reviewing the classification of borrowings in the financial statements. The audit senior agrees the current and non-current classification of borrowings to the supporting analysis provided by the client. Has the auditor exercised sufficient professional scepticism?

Open Research

DATA AVAILABILITY STATEMENT

Research data are not shared.

REFERENCES

1 A sample of five vignettes created for this study can be found in Appendix 2.
2 It should be noted that “scepticality” is not precisely measurable the way colour is. As a result, although we can confidently use the terms high-, medium-, and low-scepticism audit procedures in relative terms, we cannot definitively say how sceptical each of our vignettes in each of these categories is in absolute terms.
3 We created more than 100 vignettes even though each participant saw only 100. Because participants in the treatment (increasing scepticism) condition see more high-scepticism vignettes, 48 high-scepticism vignettes were needed. Since each vignette is matched with a vignette at each of the other two scepticism levels, 144 vignettes had to be created in total.
4 We elected not to ask auditors to assume the role of a POB inspector because we believed that this would introduce unnecessary noise as each auditor attempted to conform their judgements to their own individual perceptions of POB inspectors which may or may not be accurate.
5 Ten questions form a block of questions delineated by a rest screen after each block, but participants were presented questions one at a time within blocks. Question order was random within blocks and each question was presented on its own screen.
6 This specific timing and the accompanying percentages were chosen to mirror as closely as possible the design of Levari et al. (2018). Because we have fewer trials than Levari et al. (2018), we elected not to use the same method of randomisation as Levari et al. That is, instead of selecting a low-scepticism audit 15% of the time in blocks 5 and 6 and 5% of the time in blocks 7 to 10, with the possibility of any given individual seeing more or less than that percentage by random chance, we set exactly 15% of the vignettes in blocks 5 and 6 and exactly 5% in blocks 7 to 10 to be low-scepticism.
7 Seven responses were received from participants with 2 or fewer years of experience despite our request. We did not include these participants as they did not meet the minimum experience levels required.
8 An alternative approach is to calculate Judgement Propensity Control at each level of scepticism and use different covariates in analyses at different levels of scepticism. Our results and inferences are robust to this alternative approach of calculating Judgement Propensity Control.
9 However, we cannot definitively determine ex ante the scepticism level at which our participants' judgements are expected to be most divided, as this largely depends on the parameterisation of the experiment and will be influenced by the characteristics of our participant pool.
10 Alternatively, the entire sample of participants can be used by examining only the first four blocks (40 questions), as both conditions see a uniform distribution of vignette scepticism up to that point. This approach yields qualitatively identical inferences; that is, results are most split for low-scepticism vignettes.

Volume 64, Issue 4

December 2024

Pages 4429-4446

The effects of prevalence induced concept change on audit scepticism judgements

