A Multicenter Validity Study of Four Smartphone Hearing Test Apps in Optimized and Home Environments
This work was presented as a poster at the annual British Academic Conference of Otolaryngology (Birmingham, UK, February 15–17, 2023).
The authors have no other funding, financial relationships, or conflicts of interest to disclose.
Abstract
Objective
Pure tone audiometry (PTA) is the gold standard for hearing assessment. However, it requires access to specialized equipment. Smartphone audiometry applications (apps) have been developed to perform automated threshold audiometry and could allow patients to perform self-administered screening or monitoring. This study aimed to assess the validity and feasibility of patients using apps to self-assess hearing thresholds at home, with comparison to PTA.
Methods
A multicenter, prospective randomized study was conducted amongst patients undergoing PTA in clinics. Participants were randomly allocated to one of four publicly available apps designed to measure pure tone thresholds. Participants used their allocated app once in optimal sound-treated conditions and a further three times at home. Ear-specific, frequency-specific thresholds and pure tone averages were compared using Pearson's correlation coefficient. The percentage of app hearing test results within ±10 dB of PTA was calculated. Patient acceptability was assessed via an online survey.
Results
One hundred thirty-nine participants submitted data. Pure tone averages from two of the four apps, performed automatically at home, correlated strongly or very strongly with PTA, and their frequency-specific median differences were within ±10 dB. Results of smartphone audiometry performed in sound-treated and home conditions were very strongly correlated. Of survey respondents, 91% agreed or strongly agreed that the apps were easy to use, and 87% that they would be happy to use an app to monitor their hearing.
Conclusion
Judicious use of self-performed smartphone audiometry was both valid and feasible for two of four apps. It could provide frequency-specific threshold estimates at home, potentially allowing assessments of patients remotely or monitoring of fluctuating hearing loss.
Level of Evidence
2
Laryngoscope, 134:2864–2870, 2024
INTRODUCTION
More than 20% of the world's population have hearing loss, with greater than 5% experiencing loss that impedes communication without hearing support.1 It is a significant disability associated with a negative psychosocial burden,2 deleterious health conditions,3 and high economic costs through losses in productivity.4 Early detection through screening, and subsequent treatment, can mitigate these consequences.5 Furthermore, rapid, remote hearing monitoring would be valuable for clinical diagnosis and research.
The gold standard for hearing assessment is pure tone audiometry (PTA). This requires in-person access to specialized equipment and staff, which may limit access, particularly in rural or resource-poor locations.6 One way of overcoming these challenges is automated threshold audiometry, whereby users self-determine their hearing thresholds. Meta-analyses have found that automated audiometry may be comparable with PTA in adults with mild and moderate hearing loss.7-9 Some methods require calibrated headphones or other equipment (such as a laptop computer, an external sound card, or a portable audiometer10, 11), whereas others require no additional equipment beyond the user's own earbuds and smartphone. The latter are ideal for home testing.
Smartphone applications (apps) are a common method of automated threshold audiometry. They have been applied in school screening,12 in resource-poor locations,13 during the COVID-19 pandemic,14 in older adults,15 and in occupational health.16 A 2016 review found 30 apps providing automated audiometry, only five of which had been evaluated in a validation study, with sample sizes ranging from 25 to 110.17 A subsequent 2021 review identified 44 publicly available apps, of which only seven had been validated.14
The settings in which smartphone and tablet-based audiometry apps have been tested vary. Although many tests have taken place outside sound-treated rooms, these have typically been supervised in quiet offices, clinics, or school rooms, and there is evidence that even in these controlled environments ambient noise may be detrimental to app performance.8 Crucially, it appears that no apps have been tested in the “real-world” home environment, which is essential before apps can be recommended for clinical or research monitoring of hearing at home.
Aim
To compare gold standard audiologist-led PTA versus smartphone hearing test results performed both at home and in hospital sound-treated booths.
MATERIALS AND METHODS
Study Design
This is a prospective, multicenter validation study conducted between August 2021 and March 2023 at four hospitals providing secondary or tertiary ENT and audiology care. It was designed to assess the validity and feasibility of four smartphone audiometry apps for patient-delivered hearing assessment, compared with the gold standard of PTA.
Ethical approval was obtained from the Research Ethics Committee (20/WM/0324), and local approval was obtained at each center. All patients provided written informed consent before enrolment. The STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines were followed.18
Participants were asked to download a named app onto their personal device. Four apps were selected, two from each of the iOS and Android operating systems: Easy Hearing Test (iOS-1), Hearing Test and Ear Age Test (iOS-2), Eartone Hearing Test (Android-1), and Hearing Test (Android-2). Apps were selected on the basis that they were free, available in the United Kingdom (UK), provided a graphical output, and did not require participants to input personal information. The authors did not develop these apps. At the time of this work, no single app available on both iOS and Android met the selection criteria on both platforms.
Participants who were already attending an outpatient clinic for a clinically indicated PTA were approached by the study team. After providing consent, patients were randomized to one app. Randomization was stratified by the operating system of the participant's smartphone. Each site was provided with two series of numbered sealed envelopes, marked either “Apple” or “Android”. Within each series, envelopes specifying each of the two apps for that operating system were balanced using block randomization with a block size of four, with the order within the series generated via an online randomizer. The envelopes contained app-specific instructions, consent forms, and inexpensive earbud-style earphones (Prosignal/PSG08468) for testing. These earphones had previously been tested by our group using a KEMAR manikin and shown to provide an adequate frequency response.19
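The stratified block randomization described above can be sketched in Python. This is a minimal illustration rather than the study's procedure: the study generated envelope order with an online randomizer, whereas this sketch uses Python's `random` module, and the helper name `envelope_series`, the number of blocks, and the seed are hypothetical.

```python
import random

# Apps available in each operating-system series (labels as used in this study).
APPS_BY_OS = {
    "Apple": ["iOS-1", "iOS-2"],
    "Android": ["Android-1", "Android-2"],
}


def envelope_series(os_name, n_blocks, seed=None):
    """Return the app allocation for each numbered envelope in one series.

    Hypothetical helper: each block of four envelopes holds each of the two
    apps for that operating system exactly twice, in shuffled order, so the
    allocation stays balanced after every completed block.
    """
    rng = random.Random(seed)
    series = []
    for _ in range(n_blocks):
        block = APPS_BY_OS[os_name] * 2  # block size of four
        rng.shuffle(block)
        series.extend(block)
    return series


# Example: envelopes 1-12 of a hypothetical site's "Android" series.
for number, app in enumerate(envelope_series("Android", n_blocks=3, seed=7), start=1):
    print(number, app)
```

Stratifying by operating system is necessary here because each app runs on only one platform, so a participant can only be allocated to an app compatible with their own phone.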
Participants performed the app hearing test once in hospital (in a sound-treated room) and three further times at home (days 1, 3, and 7). They were instructed to perform the home hearing test in a quiet environment, avoiding background noise. Participants returned their results via a secure study email address. After completing the study, participants were invited to complete a voluntary online Google Forms survey about their experience of using the app at home. The survey consisted of a series of statements, each rated on a 5-point Likert scale from 1 (“strongly disagree”) to 5 (“strongly agree”).
Participants
Eligible participants were adults over 18 years of age who presented to ENT or audiology clinics for outpatient assessment of their ears or hearing and who owned a smartphone running the iOS or Android operating system. Only patients requiring PTA as part of their standard care, as determined by a clinician independent of this study, were eligible for recruitment. Participants were included in analyses if they submitted at least one smartphone app hearing test.
Exclusion criteria were: (1) active ear infection or discharge; and (2) hearing loss >90 dB at three consecutive frequencies.
The severity of hearing loss was classified according to British Society of Audiology guidelines: normal <20 dB HL, mild 20–40 dB HL, moderate 41–70 dB HL, severe 71–95 dB HL, and profound >95 dB HL.
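A minimal Python sketch of this classification follows. It assumes the mild band begins at 20 dB HL so that every threshold maps to exactly one category, and it classifies a single threshold value; which value is classified per ear (for example, a pure tone average) is left to the caller. The function name `classify_severity` is hypothetical.

```python
def classify_severity(threshold_db_hl):
    """Map a hearing threshold (dB HL) to a severity descriptor.

    Assumed bands: normal below 20, mild 20-40, moderate above 40 to 70,
    severe above 70 to 95, profound above 95 dB HL. Using upper bounds
    (<= 40, <= 70, <= 95) means non-integer values never fall into a gap.
    """
    if threshold_db_hl < 20:
        return "normal"
    if threshold_db_hl <= 40:
        return "mild"
    if threshold_db_hl <= 70:
        return "moderate"
    if threshold_db_hl <= 95:
        return "severe"
    return "profound"


for value in (10, 20, 45, 80, 100):  # hypothetical thresholds
    print(value, classify_severity(value))
```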
Statistical Analysis
An a priori power calculation was performed in G*Power (v3.1; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany) for three analyses: correlation across repeated tests in different environments (repeated measures ANOVA for multiple time points), PTA versus apps (paired t-test), and app-assisted hearing testing in hospital versus at home (t-test). This indicated a sample size of n = 34 per group (minimum n = 136 overall), based on 80% power and an alpha of 0.05. Recruited patients who did not submit any data were not counted toward the overall sample size; participants who submitted at least one smartphone result were included.
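The reported figure of n = 34 per group can be approximately reproduced for the paired t-test if a medium effect size (Cohen's d = 0.5) is assumed; the effect size is an assumption, as it is not reported above. The sketch below uses the normal-approximation formula with a standard small-sample correction term (adding z²/2) rather than G*Power's exact noncentral t calculation, so results for other inputs may differ slightly from G*Power's. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(effect_size_d, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired t-test.

    n = ((z_{1-alpha/2} + z_{power}) / d) ** 2 + z_{1-alpha/2} ** 2 / 2,
    rounded up. The second term corrects the normal approximation for
    the heavier tails of the t distribution at small n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = ((z_alpha + z_power) / effect_size_d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


per_group = paired_t_sample_size(0.5)  # assumed medium effect size
print(per_group, 4 * per_group)
```

Under this assumption, four groups of 34 give the stated minimum of 136 participants.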
The following statistical analyses were performed in MATLAB (v9.10 [R2021a]; MathWorks, Natick, Massachusetts). Data were assessed for normality. Pearson's correlation coefficient (CC) was used when data were normally distributed; Spearman's rank correlation was planned if they were not. The following comparisons were made between smartphone app audiometry and PTA: (a) ear-specific, frequency-specific thresholds at 0.5, 1, 2, 4, and 8 kHz; and (b) pure tone average thresholds (mean of 0.5, 1, 2, 4, and 8 kHz). Although unconventional, 8 kHz was included in the pure tone average because of its value in ototoxicity monitoring. The same comparisons, (a) ear-specific, frequency-specific thresholds and (b) pure tone average thresholds, were made between smartphone app audiometry performed in optimal (sound-treated room) and home conditions.
Test–retest reliability of app pure tone average thresholds (mean of 0.5, 1, 2, 4, and 8 kHz) was examined using the intraclass correlation coefficient (ICC) for two pairs of tests: the optimal sound-treated test versus the final home test, and the first versus the final home test.
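The ICC form is not specified above. As a hedged sketch, the Python function below computes two common single-measure forms (Shrout and Fleiss notation) from a subjects-by-sessions table: ICC(2,1), absolute agreement, which penalizes systematic offsets between sessions, and ICC(3,1), consistency, which does not. The function name and example data are hypothetical.

```python
def icc(data, form="2,1"):
    """Intraclass correlation for an n-subjects x k-sessions table.

    data: list of rows, one per subject (e.g. per ear), each with k scores.
    form "2,1": two-way random effects, absolute agreement, single measure.
    form "3,1": two-way mixed effects, consistency, single measure.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, sessions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if form == "3,1":
        return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    if form == "2,1":
        return (ms_rows - ms_error) / (
            ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
        )
    raise ValueError(f"unsupported ICC form: {form}")


# Hypothetical pure tone averages (dB HL): first vs final home test per ear.
sessions = [[12, 14], [25, 22], [38, 41], [18, 18], [55, 52]]
print(round(icc(sessions, "2,1"), 3), round(icc(sessions, "3,1"), 3))
```

With a constant 5 dB offset between sessions, ICC(3,1) stays at 1 while ICC(2,1) falls below 1, so the chosen form matters when hospital and home tests differ systematically.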
At each frequency and for the pure tone average, we calculated the median and interquartile range of the difference between app-derived thresholds and hospital PTA, and the percentage of app-derived thresholds within ±10 dB of PTA.
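A Python sketch of these per-frequency summaries follows. It assumes differences are computed as app minus PTA (so positive values mean the app reported worse hearing), that the ±10 dB bound is inclusive, and that quartiles use the `statistics.quantiles` default (exclusive) method, which may differ slightly from MATLAB's quartile definition. The function name and data are hypothetical.

```python
import statistics


def agreement_summary(app_db, pta_db, tolerance_db=10):
    """Summarize paired app and PTA thresholds (dB HL) at one frequency.

    Returns the median and interquartile range of the app-minus-PTA
    difference and the percentage of pairs within +/- tolerance_db
    (inclusive). Needs at least two pairs for the quartiles.
    """
    diffs = [a - p for a, p in zip(app_db, pta_db)]
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    within = sum(abs(d) <= tolerance_db for d in diffs)
    return {
        "median": statistics.median(diffs),
        "iqr": q3 - q1,
        "pct_within": 100 * within / len(diffs),
    }


# Hypothetical 2 kHz thresholds for four ears.
print(agreement_summary([25, 30, 40, 15], [20, 20, 20, 20]))
```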
Clinically reasonable agreement was defined as a strong (0.6–0.79) or very strong (0.8–1) correlation together with a median difference within ±10 dB at all frequencies. Correlations of 0–0.39 were classed as weak and 0.4–0.59 as moderate.
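These criteria can be expressed as two small Python functions. This is a sketch under stated assumptions: negative coefficients are treated as weak, values falling between the stated bands (for example, 0.795) are assigned to the lower band by using cut-offs of 0.4, 0.6, and 0.8, and the function names are hypothetical.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands defined above.

    Negative values fall through to "weak" (an assumption), since only
    positive agreement between app and PTA is of interest here.
    """
    if r >= 0.8:
        return "very strong"
    if r >= 0.6:
        return "strong"
    if r >= 0.4:
        return "moderate"
    return "weak"


def clinically_reasonable(r, median_diffs_db, tolerance_db=10):
    """True if r is strong or very strong AND every per-frequency median
    app-minus-PTA difference lies within +/- tolerance_db."""
    strong_enough = correlation_strength(r) in ("strong", "very strong")
    return strong_enough and all(abs(m) <= tolerance_db for m in median_diffs_db)


for r in (0.47, 0.69, 0.77, 0.83):
    print(r, correlation_strength(r))
```

Applied to the home pure tone average CCs in Table III, iOS-2 (0.47) is labeled moderate and Android-2 (0.83) very strong.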
RESULTS
Demographics and Baseline Characteristics
One hundred fifty-six participants across four sites in the UK provided written informed consent (Supplementary Material 1). Seventeen recruited participants were excluded from analyses because they did not submit any data to the study team. Data from 139 participants were submitted, comprising 388 smartphone hearing tests and 772 ear-specific app audiograms in total (two patients submitted unilateral readings). Sixty-five participants (47% of the cohort) submitted a complete dataset of four app hearing test results.
The mean age of participants was 48 years (range 19–78, SD 15); 69 participants (50%) were female. The most common documented reason for undergoing PTA was chronic otitis media (COM) (Supplementary Material 2). Of the 278 ears, 161 (58%) had normal hearing, and the remainder had hearing loss (mild 26% [n = 71], moderate 13% [n = 36], severe 4% [n = 10]). The mean time to complete the app test in the sound-treated room was 6 minutes (range 2–15, SD 2.9).
Smartphone Audiometry Versus PTA: Frequency-Specific Results
Pearson's CCs between PTA and home app hearing test thresholds at each frequency are presented in Table I, stratified by left and right ears. In both ears, thresholds were strongly correlated at all analyzed frequencies, with stronger correlation at higher frequencies.
| Ear | 0.5 kHz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|
| Left ear | 0.63 | 0.65 | 0.69 | 0.69 | 0.68 |
| Right ear | 0.63 | 0.67 | 0.71 | 0.72 | 0.79 |
Figure 1 illustrates the frequency-specific distribution of error between PTA and app thresholds, stratified by app (Figure 1A–D) and overall (Figure 1E). All apps tended to overestimate the severity of hearing loss, particularly at low frequencies. This disagreement was least pronounced for Android-2 (Figure 1D). The medians for iOS-1 and iOS-2 were also within ±10 dB HL of the gold standard at all five frequencies.

The percentage of results within ±10 dB of PTA is presented in Table II. Across all apps, ±10 dB accuracy ranged from 54% to 66% by frequency. Accuracy was better for certain apps: iOS-1 and Android-2 achieved ±10 dB accuracy of up to 77% and 83%, respectively, with the highest accuracy at 2 kHz.
| App | Proportion (n/N) | 0.5 kHz | 1 kHz | 2 kHz | 4 kHz | 8 kHz | Pure tone average |
|---|---|---|---|---|---|---|---|
| Easy hearing test (iOS-1) | 75% (53/71) | 74% | 58% | 77% | 68% | 66% | 74% |
| Hearing test and ear age test (iOS-2) | 75% (52/69) | 54% | 60% | 63% | 60% | 69% | 61% |
| Eartone hearing test (Android-1) | 71% (48/68) | 48% | 33% | 42% | 33% | 50% | 40% |
| Hearing test (Android-2) | 59% (40/68) | 70% | 68% | 83% | 68% | 61% | 75% |
| All apps | 69% (193/278) | 61% | 54% | 66% | 57% | 62% | 63% |
- Note: n refers to the number of participants (by ear) who submitted at home app data; N refers to number of participants who had a PTA performed, that is, not all participants with a PTA submitted at home app data.
Smartphone Audiometry Versus PTA: Pure Tone Average Results
Table III compares app and PTA pure tone averages, stratified by whether the app test was performed in optimal conditions in hospital or at home. App audiograms performed in hospital showed a moderate correlation with PTA (Pearson's CC 0.56), which improved to a strong correlation when patients performed the app test at home (0.68).
| | Left ear: Prop (n/N) | Left ear: CC | Right ear: Prop (n/N) | Right ear: CC | Both ears: Prop (n/N) | Both ears: CC |
|---|---|---|---|---|---|---|
| Stratified by location of smartphone app test | | | | | | |
| Performed in hospital | 99% (138/139) | 0.54 | 99% (138/139) | 0.59 | 99% (276/278) | 0.56 |
| Performed at home | 70% (97/139) | 0.68 | 69% (96/139) | 0.69 | 69% (193/278) | 0.68 |
| Average of all smartphone apps | 100% (139/139) | 0.57 | 100% (139/139) | 0.61 | 100% (139/139) | 0.59 |
| Stratified by randomized app performed at home | | | | | | |
| Easy hearing test (iOS-1) | 75% (27/36) | 0.79 | 72% (26/36) | 0.74 | 75% (53/71) | 0.77 |
| Hearing test and ear age test (iOS-2) | 74% (26/35) | 0.49 | 74% (26/35) | 0.47 | 75% (52/69) | 0.47 |
| Eartone hearing test (Android-1) | 71% (24/34) | 0.65 | 71% (24/34) | 0.76 | 71% (48/68) | 0.69 |
| Hearing test (Android-2) | 59% (20/34) | 0.86 | 59% (20/34) | 0.81 | 59% (40/68) | 0.83 |
- Note: Prop refers to the proportion of n relative to overall N; n refers to number of participants who submitted each type of app data; N refers to number of participants who had a PTA performed, that is, not all participants with a PTA submitted at home app data.
Home hearing thresholds were stratified by app (Table III). Pure tone average thresholds of Android-2 showed a very strong correlation with the PTA (CC 0.83). iOS-1 and Android-1 showed strong correlation (0.77, 0.69, respectively); iOS-2 showed moderate correlation (0.47). The overall ±10 dB percentage accuracy was 63%, being highest for iOS-1 and Android-2 (74% and 75%, respectively).
The violin plot in Figure 2 depicts the pure tone average difference (i.e., the error between PTA and app), stratified by severity of hearing loss: normal, mild, and moderate or severe. Apps tended to overestimate thresholds in listeners without hearing loss.

Smartphone Audiometry in Optimal Conditions Versus at Home
Smartphone results obtained in optimal conditions (i.e., in hospital sound-treated rooms) and at home showed strong or very strong correlation at all frequencies (Table IV) and for all apps (Table V).
| Ear | 0.5 kHz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|
| Left ear | 0.79 | 0.78 | 0.84 | 0.87 | 0.89 |
| Right ear | 0.84 | 0.82 | 0.90 | 0.94 | 0.95 |
| | Left ear: Prop (n/N) | Left ear: CC | Right ear: Prop (n/N) | Right ear: CC | Both ears: Prop (n/N) | Both ears: CC |
|---|---|---|---|---|---|---|
| Easy hearing test (iOS-1) | 75% (27/36) | 0.84 | 72% (26/36) | 0.94 | 75% (53/71) | 0.88 |
| Hearing test and ear age test (iOS-2) | 74% (26/35) | 0.90 | 74% (26/35) | 0.89 | 75% (52/69) | 0.90 |
| Eartone hearing test (Android-1) | 71% (24/34) | 0.86 | 71% (24/34) | 0.95 | 71% (48/68) | 0.93 |
| Hearing test (Android-2) | 56% (19/34) | 0.86 | 59% (20/34) | 0.98 | 59% (40/68) | 0.92 |
- Note: Prop refers to the proportion of n relative to overall N; n refers to number of participants who submitted each type of app data; N refers to number of participants who had a PTA performed, that is, not all participants with a PTA submitted at home app data.
Test–Retest Reliability
Test–retest reliability was high both between the optimal sound-treated test and the final home test (intraclass correlation coefficient [ICC] 0.87) and between the first and final home tests (ICC 0.90).
Participant Reception
Forty-six participants (response rate 33%) submitted online feedback surveys; 26 (56.5%) were iOS users and 20 (43.5%) were Android users. The full results are available as Supplementary Material 3. Forty-two participants (91%) agreed/strongly agreed that the smartphone app was easy to use, and 40 (87%) agreed/strongly agreed that they would be happy to use an app to monitor their hearing in the future. Thirty-eight participants (83%) agreed/strongly agreed that they were able to find a suitably quiet place to use the app.
DISCUSSION
To our knowledge, this is the first study to report the validity of self-conducted smartphone audiometry at home. We found a strong correlation between hearing thresholds determined by the “gold standard” of PTA in hospital and thresholds determined automatically by home smartphone hearing tests. This finding relates to a diverse group of adult participants at multiple centers with hearing losses of variable severity and etiology. These findings support the ongoing use and development of home smartphone audiometry.
Test Accuracy and Repeatability
Smartphone audiometry has previously been shown to be accurate, with pooled sensitivity and specificity for detecting mild hearing loss of 0.91 and 0.90, respectively, and a pooled sensitivity of 0.94 but a lower specificity of 0.87 for detecting moderate hearing loss.9 Our pure tone average accuracy (measured as the percentage within ±10 dB of hospital PTA) exceeded 70% for two of the four apps (iOS-1, 74%; Android-2, 75%). Not all apps performed equally well, with Android-2 performing best overall.
All apps tended to overestimate the severity of hearing loss, particularly at lower frequencies. Possible explanations include ambient noise (although the same pattern in the sound-treated room argues against this) and poor calibration or quality of the earbuds. A useful feature for future developers to consider would be built-in feedback to the user on the noise level of their listening environment. Although perfect accuracy would be ideal, for screening purposes overestimation may be preferable to underestimation, because it maximizes sensitivity.
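The sensitivity argument above can be illustrated with a short Python sketch using made-up thresholds: adding a constant upward bias to app thresholds raises sensitivity for detecting hearing loss but lowers specificity. The 20 dB HL cut-off, the 5 dB bias, the data, and the function name are all hypothetical and are not drawn from the study.

```python
def screening_accuracy(app_db, pta_db, cutoff_db=20):
    """Sensitivity and specificity of app thresholds against PTA.

    An ear counts as having hearing loss when its threshold (dB HL) is at
    or above cutoff_db; PTA is treated as the reference standard.
    """
    tp = fn = tn = fp = 0
    for app, pta in zip(app_db, pta_db):
        has_loss = pta >= cutoff_db
        flagged = app >= cutoff_db
        if has_loss and flagged:
            tp += 1
        elif has_loss:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-ear thresholds (dB HL), not study data.
pta = [10, 15, 18, 22, 25, 35, 45, 60]
app = [13, 16, 23, 18, 19, 37, 42, 61]     # small errors in both directions
app_biased = [a + 5 for a in app]          # same app, overestimating by 5 dB

print(screening_accuracy(app, pta))         # (0.6, 0.6666666666666666)
print(screening_accuracy(app_biased, pta))  # (1.0, 0.3333333333333333)
```

The biased app misses no ears with hearing loss but flags more ears with normal hearing, which suits screening but not precise threshold estimation.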
While accuracy was moderate, precision was better, with high test–retest correlation between tests performed at home. This is important for disease monitoring, for example, identifying fluctuations in Ménière's disease that may trigger the need for intervention, monitoring ototoxicity, or research exploring the trajectory of hearing recovery or decline.
We noted a tendency for thresholds in the right ear to be lower (better) than those in the left. The right ear was tested first in three apps (iOS-2, Android-1, and Android-2), so this may reflect fatigue or declining concentration as testing progressed. Unlike in-hospital PTA performed by an experienced audiologist, the smartphone apps cannot detect poor concentration or inconsistent responses.
Effect of Test Environment
Previous work suggested that performance may be poorer in noisier environments, such as a clinic waiting room,13 although this finding has not been universal, and noise-canceling headphones may mitigate some ambient noise effects.20 Many apps designed for hearing screening, as opposed to PTA, use a digits-in-noise test (e.g., the World Health Organization hearWHO app). This has the advantage of using the signal-to-noise ratio as the measure of functional hearing, so ambient noise is less of an issue. However, it does not provide the sensitivity or frequency-specific data of PTA, which are particularly valuable where isolated high- or low-frequency losses may occur (e.g., ototoxicity or Ménière's disease, respectively).
A meta-analysis found performance outside an audiometric booth to be poorer,8 although all published studies not using a sound-treated room have still used a quiet school room, clinic room, or office for supervised testing. Surprisingly, we found that results obtained at home were more accurate than those obtained in the sound-treated room. We attribute this to a likely learning effect: the first time participants used the app, they were alone in an audiometric booth, whereas home testing was repeated up to three times and always with prior experience. In the questionnaire, fewer than 10% of participants disagreed that it was easy to find a quiet place. Our findings therefore suggest that the home environment is suitable, at least in relation to a typical UK home.
Our study issued participants with standard inexpensive earbuds costing approximately £2.00 per pair. A well-fitting earbud may provide some attenuation of ambient noise in noisier environments, and insert earphones have been found to be best for this.21 We chose to standardize earbuds to remove any bias that differences in earphone quality or frequency response between smartphones may introduce. App performance may have been better with higher-quality earbuds, or with the stock earbuds that many apps are developed for.
Clinical Application of Smartphone Tests
All apps were able to determine thresholds up to 8 kHz. Accurate high-frequency data add value for ototoxicity monitoring, where daily or regular testing in at-risk individuals could allow ototoxic drugs to be stopped or their dose reduced at an early stage. The ability to test patients regularly and collect data remotely also opens many possibilities for research, for example, a better understanding of the time course of hearing recovery following idiopathic sudden sensorineural hearing loss, or of hearing decline in genetic conditions.
The apps were simple to use, participants did not struggle, and feedback was positive: 87% of participants agreed or strongly agreed that they would be happy to monitor their hearing at home. This fits with a wider trend toward consumer-led health monitoring and wearable health technologies. Home monitoring can be invaluable to a clinician observing fluctuating hearing, and may also allow patients to take control of their management and seek treatment earlier for fluctuating or progressive conditions, such as Ménière's disease or vestibular schwannoma, or to monitor improvement during treatment for sudden sensorineural hearing loss.
Study Limitations
We selected four apps across the two most common smartphone operating systems. For feasibility and patient data security, our choice was limited to apps that were freely available and did not require registration with patient-identifiable data. We were unable to test the uHear app, the most widely validated app in the literature, because it was not available in the UK. Other apps may vary in performance.
Patient age may affect app performance, with poorer results reported in children and older individuals.8, 22 Although a strength of our work was the inclusion of participants across a broad age range, we did not test children, and performance may be poorer in this group.
Smartphone apps cannot fully replicate PTA: they lack bone-conduction testing to distinguish conductive loss, and they make no provision for masking; the latter is particularly relevant when monitoring patients with unilateral hearing loss.
Our results were more accurate at home. Future studies that aim to compare the accuracy of app results at home versus in sound-treated rooms could ensure that all participants practice using the app several times before data collection begins. Ideally, participants would perform the test three times in each environment, with the order of environments randomized. In this study, participants performed the app test only once in hospital, to reduce their time burden in this voluntary study and because of the limited availability of sound-treated rooms.
The pure tone average does not usually include 8 kHz, because thresholds at this frequency can be less reliable than at lower frequencies. However, 8 kHz appeared just as accurate in our data, so it was included because of its value in ototoxicity monitoring.
CONCLUSIONS
Home, patient-conducted audiometry is increasingly desirable with the move toward remote care, and it may improve accessibility for some patients. Two of four apps (iOS-1 and Android-2) reached accuracy that was considered clinically reasonable (strong correlation plus median within ±10 dB) and therefore may be appropriate for use in a clinical setting. Their high precision suggests they may be most appropriate for remotely assessing fluctuations in hearing, rather than for establishing absolute hearing thresholds. Patients and clinicians must be aware of their limitations: apps cannot currently replace PTA, and app thresholds often overestimate the true level of hearing loss. Clinicians should avoid recommending unvalidated apps.