Volume 29, Issue 3 pp. 846-862
RESEARCH ARTICLE
Open Access

Targeting carbon reduction in UK households: A new segmentation model using financial transaction data

Jasmine Wells

Corresponding Author

Sustainability Research Institute, University of Leeds, Leeds, UK

Behavioural Science, Lloyds Banking Group, London, UK

Correspondence

Jasmine Wells, Sustainability Research Institute, University of Leeds, Leeds, UK.

Email: [email protected]

Anna Trendl

Behavioural Science, Lloyds Banking Group, London, UK

Warwick Business School, University of Warwick, Coventry, UK

Anne Owen

Sustainability Research Institute, University of Leeds, Leeds, UK

John Barrett

Sustainability Research Institute, University of Leeds, Leeds, UK

Norbert Jobst

Portfolio Analytics, Lloyds Banking Group, London, UK

David Leake

Behavioural Science, Lloyds Banking Group, London, UK

First published: 02 April 2025

Editor Managing Review: Klaus Hubacek

Abstract

Designing effective and targeted policies to reduce household emissions needs to consider variability in household consumption patterns, preferences, and financial capacities. This paper introduces a new segmentation model of household carbon footprints that uses financial transaction data from over 700,000 customers of a major high-street bank. Our approach considers socioeconomic, consumer-preference, and spatial factors to identify 10 distinct household typologies. We focus on targeted retrofit as a practical application, identifying three high-impact household types with the capacity to invest—“Suburban Home Improvers,” “Car and Tech Enthusiasts,” and “Affluent Families”—and suggest targeted policy and communication opportunities. Our segmentation supports a new data-driven policy design that considers both the technical potential and diverse behavioral factors affecting decarbonization decisions.

1 INTRODUCTION

Household consumption drives between 58% and 72% of global greenhouse gas (GHG) emissions, largely due to pollution generated throughout product and service supply chains (Ivanova et al., 2016; Wiedmann & Lenzen, 2018). Achieving full decarbonization requires reducing energy demand—both direct (e.g., household energy use) and indirect (energy embedded in goods and services)—and improving the energy efficiency of production across industries. These two aspects can be seen as the “cause” and “effect” of emissions, yet climate policies tend to focus on mitigating effects rather than addressing the causes (Cohen & Ummel, 2019). Consumption-based (CB) accounting offers a way to integrate both perspectives by shifting responsibility from producers to final consumers (Barrett et al., 2013; Ottelin et al., 2019; Wiedmann & Lenzen, 2018). It accounts for emissions generated throughout the entire supply chain—including production, transportation, and disposal—irrespective of where they occur, making it a useful tool for shaping demand-side policies. Despite being widely recognized as an important complement to production-based methods, few countries have formally adopted CB accounting in their emissions reporting.

In the United Kingdom, there is growing interest in identifying consumer-oriented emission mitigation opportunities at local levels (Local Government Association, 2024). However, designing effective policies requires a strong understanding of the variability among households at the local level, including consumption behaviors, preferences, and financial capabilities of the residents (Froemelt & Wiedmann, 2020). Financial transaction data (FTD) can be a useful tool in this effort by providing real-time, high-resolution information on household consumption (Trendl et al., 2023). Integrating FTD with other data sources supports the development of segmentation models that can provide insights for policymakers and other stakeholders to develop more targeted interventions. These models can account for the diverse socioeconomic, geographic, and consumer-preference factors that shape household decisions (Baiocchi et al., 2010; Dubois et al., 2019).

This paper presents a new segmentation of household carbon footprints (CFs) derived from FTD from over 700,000 customers of a major high street bank. Household CFs refer to the GHG emissions associated with consumption and are expressed in carbon dioxide equivalents (CO2e). They are calculated using a CB approach and encompass both direct emissions from activities such as energy use and private transport and indirect emissions embedded in the supply chains of goods and services (Barrett et al., 2013; DEFRA, 2024). We use retrofit as a case study to demonstrate how this method can guide targeted policies, helping local policymakers and other stakeholders identify eligible households and tailor communication strategies for effective engagement.

While retrofit primarily concerns direct household emissions, incorporating indirect emissions into the segmentation model reveals consumption preferences often overlooked in existing data (Wilson et al., 2015). Understanding variations in consumption behaviors, socioeconomic backgrounds, and location can provide clues into how household groups might engage with retrofit programs, while keeping the model adaptable for broader policy applications. For example, the method could be used to inform strategies for reducing indirect emissions from purchases of goods (e.g., clothing), or support targeted transport policies (e.g., promoting electric vehicle [EV] adoption).

The remainder of this paper is structured as follows. The following section offers a brief literature review that examines existing studies in this domain and positions our research. Section 2 describes our data sources and methodology. Section 3 presents our new household typologies with the carbon profiles that emerge from our analysis. We then provide an example use-case demonstrating how this approach can support more effective climate policy and targeting strategies. Section 4 evaluates our approach, discusses limitations, and suggests directions for future research.

1.1 Literature review

1.1.1 The benefits of segmentation in climate policy

Numerous studies have provided valuable insights into the drivers of household CFs, identifying key factors that contribute to their variability. These include income (Büchs et al., 2024; Duarte et al., 2012; Lévay et al., 2021; Mahía & de Arce, 2022; Salo et al., 2021; Steen-Olsen et al., 2016; Weber & Matthews, 2008; Zhang et al., 2019), household size and composition (Ivanova & Büchs, 2022; Minx et al., 2013), location (Seriño & Klasen, 2015; Tomás et al., 2020), travel (Mattioli et al., 2023), and housing infrastructure (Ottelin et al., 2015). Such research has helped shape policies and strategies that aim to reduce household emissions (e.g., see Büchs & Schnepf, 2013; Ivanova et al., 2017). Segmentation analysis builds upon this foundation by categorizing households based on similarities in CFs, consumption, location, and socio-economic profiles (Baiocchi et al., 2010).

Unlike approaches that analyze drivers of CFs in isolation, segmentation allows for a more integrated perspective and reveals how combinations of factors can interact to shape household CFs. This richer understanding enables policymakers to develop interventions that consider the specific needs and characteristics of diverse populations (Baiocchi et al., 2010, 2022; Froemelt & Wiedmann, 2020; Froemelt et al., 2018; Koide et al., 2019). However, existing studies are often constrained by key data limitations, such as small sample sizes, infrequent purchases, self-reporting biases, as well as underrepresentation of the highest-income households—those with the greatest potential for carbon reductions (Kilian et al., 2023). Addressing these limitations requires incorporating more comprehensive and objective data sources, such as FTD (Trendl et al., 2023).

FTD offers a promising resource for segmentation tasks. Already widely used in the industry for consumer profiling (CACI, 2024), FTD provides an objective record of household expenditures across various consumption categories. It offers large, representative samples with high temporal and spatial resolution. FTD has also been used effectively for estimating CFs, showing close correspondence to existing data sources more commonly used for this purpose (Trendl et al., 2023). Incorporating financial transactions into a new segmentation model offers advantages over existing approaches that rely on lower frequency data such as census data and budget surveys (Gale et al., 2016; Morgan et al., 2021; Wyszomierski et al., 2023; Yuan & Choudhary, 2020).

1.1.2 Applications of segmentation analysis for retrofit policies

One application of segmentation analysis is for targeted retrofit of residential properties to reduce carbon emissions. The UK's goal of achieving net-zero emissions by 2050 requires substantial reductions in emissions from residential housing (CCC, 2020). Challenges in decarbonizing this sector include improving the energy efficiency of existing homes—many of which are among the least insulated in Europe (CAT, 2017).

Despite the clear need for improving energy efficiency through retrofit, the uptake of such measures in the UK residential housing sector has been notably limited (CCC, 2019), even in the face of various government initiatives designed to stimulate demand. For example, programs like the Green Deal (DECC, 2010) aimed to remove upfront cost barriers by offering loans repaid through savings on energy bills (DECC, 2013). However, the scheme fell short of expectations, with only a fraction of the anticipated number of households participating before its discontinuation due to low uptake and administrative challenges (NAO, 2016). Similarly, the Energy Company Obligation and the more recent Green Homes Grant faced obstacles such as complex application processes, limited contractor availability, and insufficient public awareness (EAC, 2021; UK Parliament, 2022).

Recent research has identified types of households in the United Kingdom more likely to apply for domestic energy grants for retrofit by combining energy performance certificate (EPC) data with data from the UK Census and Indices of Multiple Deprivation (Owen et al., 2023b). This neighborhood-level typology found that the uptake of energy grants is heavily influenced by non-economic factors, including social influence and community networks. For example, low-income, Asian-origin households concentrated in specific regions and living in inefficient, owner-occupied terraces are found to be much more likely to apply for certain energy grants (Owen et al., 2023b). This study highlights the role of geodemographics and household characteristics in influencing sustainability decisions and opens the potential for more granular segmentation approaches.

Our research introduces a novel segmentation of UK households—specifically England and Wales—using FTD to calculate CFs and profile associated consumption characteristics. This approach offers a higher-resolution view of key consumer segments and delivers insights that can inform policy-making and communications with diverse household groups to drive effective change. The following section provides further details on the methodology and data sources applied.

2 METHODS

To accurately identify dominant household CF typologies across England and Wales, we conducted a multi-step analysis. First, we calculated the CFs of 719,993 households using their FTD. Next, we employed a dimensionality reduction technique followed by a clustering algorithm to group households with similar characteristics. Finally, we mapped these groups at the neighborhood level to assess their geographic distribution. The sections below describe these steps in more detail.

2.1 Data collection

2.1.1 FTD

This study uses FTD from an anonymized sample of 719,993 individuals who banked with Lloyds Banking Group (LBG), one of the United Kingdom's largest retail banks, between January 2021 and December 2021. FTD captures various types of spending, including personal current account and credit card transactions, electronic transfers, online purchases, in-store payments, and ATM withdrawals.

To ensure data quality and full visibility into customer spending patterns, we focused on accounts where LBG was the customer's primary bank. This was important because over 90% of UK households use multiple banks (“multi-banked”) for different services (Business Money, 2024; FCA, 2018). Using data from one financial provider can lead to an underestimation of overall spending activity (and thus CFs) if not accounted for.

To identify these primary-banked customers, we applied specific selection criteria that follow the methodology of Trendl et al. (2023):
  • Each account recorded at least 12 customer-initiated transactions (i.e., non-automatic payments) per month during 2021. This filter confirms active banking behavior and is standard practice within the bank.
  • Each account included at least one transaction categorized as “Energy” in LBG's transaction classification system (TCS), ensuring that energy-related spending, an important component of a household's CF, was captured.
  • As an additional step, we excluded customers who made payments to non-LBG current accounts in their name or payments to non-LBG credit cards exceeding 2.5% of total spending. Although this threshold is arbitrary, the sensitivity analysis by Trendl et al. (2023), provided in Supporting Information S1, found that it strikes an ideal balance between maintaining a robust sample and capturing the vast majority of a customer's spending activity.
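As an illustration only, the three selection rules above could be expressed as a simple filter. This is a minimal sketch under stated assumptions, not the code used in the study: the `AccountSummary` record and its field names are hypothetical, and the third rule is read here as "any payment to an own-name external current account, or external credit card payments above 2.5% of total spending," which is one possible interpretation of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccountSummary:
    """Hypothetical per-customer summary for 2021 (field names are illustrative)."""
    monthly_initiated_txns: List[int]        # 12 monthly counts of customer-initiated transactions
    energy_txn_count: int                    # transactions classified as "Energy"
    pays_own_external_current_account: bool  # payments to a non-LBG current account in own name
    external_card_payments: float            # payments to non-LBG credit cards (GBP)
    total_spending: float                    # total outgoing spend (GBP)


MIN_TXNS_PER_MONTH = 12          # from the text
MAX_EXTERNAL_CARD_SHARE = 0.025  # the 2.5% threshold from the text


def is_primary_banked(acc: AccountSummary) -> bool:
    """Return True if the customer passes all three selection rules."""
    # Rule 1: at least 12 customer-initiated transactions in every month of the year.
    active = len(acc.monthly_initiated_txns) == 12 and all(
        n >= MIN_TXNS_PER_MONTH for n in acc.monthly_initiated_txns
    )
    # Rule 2: at least one energy transaction.
    has_energy = acc.energy_txn_count >= 1
    if acc.total_spending <= 0:
        return False
    # Rule 3 (assumed reading): signs that the customer banks elsewhere.
    external_share = acc.external_card_payments / acc.total_spending
    multi_banked = (
        acc.pays_own_external_current_account
        or external_share > MAX_EXTERNAL_CARD_SHARE
    )
    return active and has_energy and not multi_banked
```

A customer with 15 transactions every month, some energy spend, and external card payments of 1% of spending would pass; raising the external share to 3% would exclude them.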

Through this selection process, we obtained a sample that covered 7261 middle layer super output areas (MSOAs) across England and Wales, nearly the full set of 7264 MSOAs available. MSOAs typically represent between 2000 and 6000 households with a resident population of 5000 to 15,000 individuals. We randomly selected up to 100 anonymous customers aged 18 or older from each MSOA. A complete sample of 100 customers was achieved for 45% of MSOAs (3274), while the remaining had slightly smaller samples ranging from 90 to 99 customers.
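The per-MSOA sampling step can be sketched as follows. This is an illustrative sketch, not the study's implementation: the `(customer_id, msoa_code, age)` tuple layout, the function name, and the fixed random seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_msoa(customers, max_per_msoa=100, seed=0):
    """Randomly select up to `max_per_msoa` adult customers from each MSOA.

    `customers` is an iterable of (customer_id, msoa_code, age) tuples
    (a hypothetical layout chosen for this example).
    """
    rng = random.Random(seed)
    by_msoa = defaultdict(list)
    for customer_id, msoa_code, age in customers:
        if age >= 18:  # only customers aged 18 or older are eligible
            by_msoa[msoa_code].append(customer_id)
    sample = {}
    for msoa_code, ids in by_msoa.items():
        # MSOAs with fewer eligible customers than the cap contribute all of them.
        k = min(max_per_msoa, len(ids))
        sample[msoa_code] = rng.sample(ids, k)
    return sample
```

Sampling without replacement within each MSOA keeps every customer at most once, which matches the description of "up to 100" customers per area.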

2.1.2 Socioeconomic and demographic data

In addition to FTD, we integrated other household data from LBG's internal databases, which included demographic and socioeconomic attributes such as:
  • Age, gender, and income: We estimated after-tax income by taking the median credit turnover across all accounts for the last 3 months of 2021 and annualizing.
  • Savings and investments: LBG's internal TCS can classify outgoing payments to savings and investment platforms such as ISAs, trusts, stocks and shares, and building societies.
  • Homeownership status: LBG determines whether customers are homeowners or tenants during account setup and updates this information as needed.
  • Account type: We recorded whether accounts were joint or individual, as joint accounts typically indicate shared household expenses.
  • Child benefit and electric vehicle flags: We included a binary marker (flag) for accounts that have received income from child benefits, and/or have made purchases to charging stations, both of which are visible through LBG's TCS.
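The income estimate in the first item above can be illustrated with a short sketch. The exact procedure is not spelled out in the text, so this example assumes one plausible reading: credits are summed across a customer's accounts for each of the last three months, the median monthly total is taken, and that value is multiplied by 12. The function name and input layout are hypothetical.

```python
from statistics import median


def estimate_annual_income(monthly_credits_by_account):
    """Estimate annual after-tax income from credit turnover.

    `monthly_credits_by_account` maps an account id to a list of credit
    turnover values in GBP for October, November, and December
    (a hypothetical layout chosen for this example).
    """
    # Total credits per month across all of the customer's accounts.
    monthly_totals = [
        sum(month) for month in zip(*monthly_credits_by_account.values())
    ]
    # Median month, annualized.
    return 12 * median(monthly_totals)


income = estimate_annual_income(
    {"current": [2000, 2100, 2600], "joint": [500, 500, 500]}
)
```

Using the median rather than the mean limits the influence of a one-off credit, such as a bonus paid in December.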

In our final sample, 44% of households (316,866 customers) held joint accounts, which likely reflects the shared spending behavior of two or more individuals within the household. This is a critical factor, as joint spending typically results in higher absolute CFs.

These demographic markers, along with CF estimates, were crucial in developing the household typologies produced by our analysis.

2.1.3 Energy performance certification data

The Energy Performance of Buildings Data (EPBD) publishes EPC ratings for approximately 60% of housing stock in England and 55% in Wales (DLUHC, 2024). These ratings assess the energy efficiency of homes on a scale from A to G, with A representing the highest level of efficiency. The EPBD also reports the proportion of dwellings rated C and above at the MSOA level.

We did not include EPBD data among the segmentation variables used to determine household typologies. Instead, we used the dataset to map the distribution of energy-inefficient housing (homes rated below C) across MSOAs in England and Wales. By overlaying this information with the spatial distribution of high-energy household groups, we gained a novel perspective on neighborhoods where targeted retrofit efforts could be prioritized.

2.2 Carbon footprint calculation

We now outline the methodology used to calculate the CFs of households based on their transactions. The process involves assigning carbon multipliers (CMs) to different categories of household spending.

2.2.1 Carbon multipliers from UKMRIO

To calculate the carbon emissions associated with household spending, we used CMs derived from the 2023 UK Multi-regional Input Output (UKMRIO) model. This model is an environmentally extended multi-regional input-output (EEMRIO) model of the UK economy (Owen et al., 2023a). The CMs are constructed to account for both the direct emissions associated with burning fuel (e.g., to heat homes and drive cars) and the indirect emissions embedded in global supply chains of the goods and services consumed by the household. This CB accounting framework is essential for understanding the true carbon impact of household spending, as it captures emissions generated both domestically and internationally (Peters et al., 2011; Wood et al., 2018). We refer to Supporting Information S2 for a more detailed overview of this step and associated limitations.

2.2.2 Assigning carbon multipliers to transaction categories

Since household spending is recorded in LBG's internal TCS, the next step is to map these spending categories to the corresponding UKMRIO multipliers.

The TCS consists of 386 categories, to which CMs are applied as follows:
  • Direct mapping: Some TCS categories, such as “airlines” and “electricity,” correspond directly to specific UKMRIO categories like “passenger transport by air” or “electricity supply.”
  • Spend-weighted averages: In cases where a single TCS category corresponds to multiple UKMRIO categories, we calculate a spend-weighted average of the CMs. For example, the “Gas & Electricity” category in TCS is split between “gas” and “electricity” in UKMRIO. Based on the UK Living Costs and Food Survey (LCFS), we can estimate that 56.4% of this spending is on electricity, and the rest on gas, leading to a combined CM for this category.
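The spend-weighted average can be illustrated with a short worked example. Only the 56.4% electricity share comes from the text; the two multiplier values below are placeholders chosen for readability, not UKMRIO figures, and the function name is hypothetical.

```python
def spend_weighted_cm(shares_and_cms):
    """Combine carbon multipliers using spending shares as weights.

    `shares_and_cms` is a list of (spend_share, carbon_multiplier) pairs.
    """
    total_share = sum(share for share, _ in shares_and_cms)
    return sum(share * cm for share, cm in shares_and_cms) / total_share


ELECTRICITY_SHARE = 0.564             # LCFS-based share quoted in the text
GAS_SHARE = 1.0 - ELECTRICITY_SHARE   # the remainder is spent on gas

# Placeholder multipliers (kgCO2e per GBP); NOT the UKMRIO values.
CM_ELECTRICITY = 1.0
CM_GAS = 2.0

combined_cm = spend_weighted_cm(
    [(ELECTRICITY_SHARE, CM_ELECTRICITY), (GAS_SHARE, CM_GAS)]
)
```

With these placeholder values the combined multiplier is 0.564 × 1.0 + 0.436 × 2.0 = 1.436 kgCO2e per GBP, which sits between the two inputs and closer to electricity because electricity carries the larger spending share.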

Of the 386 TCS categories, 153 were excluded from the analysis because they do not represent consumption expenditure (e.g., loans, mortgages, and council tax). For ATM withdrawals, the associated spending was distributed across relevant TCS categories based on survey data on money-spending habits (UK Finance, 2020). For further discussion of these steps, see Trendl et al. (2023), and Supporting Information S3 for this article. The mapping table can be accessed in Supporting Information S4.

2.2.3 Aggregating carbon footprints

Once the CMs are assigned to the TCS categories, household-level CFs are calculated by multiplying the amount spent in each category by the corresponding CM. This process is repeated across 233 relevant spending categories.

To simplify the comparison across households, these detailed CFs are then aggregated into 24 broader consumption groups, such as energy, food, commuter travel, and recreation (see Table 1).

TABLE 1. Variables used for segmentation.
| Data class | Variable | Units | Consumption group | Abbreviation |
| --- | --- | --- | --- | --- |
| Carbon footprints | Housing bills - energy | tCO2e | Housing | energy |
| | Housing bills - other | tCO2e | Housing | housing_other |
| | Airlines | tCO2e | Travel | air |
| | Car rentals and taxis | tCO2e | Travel | rental_taxi |
| | Commuter (incl. rail, tram, & buses) | tCO2e | Travel | publ_transport |
| | Other transport services | tCO2e | Travel | oth_transport |
| | Petrol | tCO2e | Travel | petrol |
| | Vehicle purchase and maintenance | tCO2e | Travel | garage |
| | Vehicle finance | tCO2e | Travel | car_finance |
| | Food | tCO2e | Food | food |
| | Furnishings - hardware, furniture, and garden | tCO2e | Furnishing | hardw_furn_gardn |
| | Furnishings - construction, storage, and security | tCO2e | Furnishing | constr_strg |
| | Hotels and travel agencies | tCO2e | Restaurants and hotels | hotels |
| | Restaurants | tCO2e | Restaurants and hotels | resto |
| | Restaurants - fast food | tCO2e | Restaurants and hotels | resto_ff |
| | Recreation | tCO2e | Recreation | rec |
| | Recreation - digital | tCO2e | Recreation | rec_digi |
| | Clothing | tCO2e | Clothing | clothing |
| | Communications | tCO2e | Communications | comms |
| | Health | tCO2e | Health | health |
| | Education | tCO2e | Education and childcare | edu |
| | Childcare | tCO2e | Education and childcare | childcare |
| | Alcohol and tobacco | tCO2e | Other | alco_tob |
| | Miscellaneous | tCO2e | Other | misc |
| | Total carbon footprint | tCO2e | | |
| Socio-economic indicators | Age | Years | | |
| | Gender - male | Binary (1,0) | | |
| | Income | GBP (£) | | |
| | Savings amount | GBP (£) | | |
| | Investments amount | GBP (£) | | |
| | Tenure - tenant | Binary (1,0) | | |
| | Joint account status - joint | Binary (1,0) | | |
| Additional spend data | Child benefit flag | Binary (1,0) | | |
| | Electric vehicle flag | Binary (1,0) | | |

In addition, each household's final dataset includes seven socioeconomic variables (e.g., age, gender, and income) and two variables related to spending behaviors (whether they receive child benefit payments or own an EV, see Table 1). This aggregation allows us to categorize households into meaningful groups for further analysis and segmentation.
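The footprint calculation described in this subsection (spend multiplied by a category multiplier, then summed into consumption groups) can be sketched as follows. The category names, multiplier values, and group assignments in `CATEGORY_INFO` are hypothetical placeholders; the actual mapping is the one referenced in Supporting Information S4.

```python
from collections import defaultdict

# Hypothetical mapping: TCS category -> (multiplier in kgCO2e per GBP, consumption group).
# Values are placeholders for illustration, not UKMRIO multipliers.
CATEGORY_INFO = {
    "electricity": (1.2, "energy"),
    "gas": (2.0, "energy"),
    "supermarkets": (0.5, "food"),
    "airlines": (1.5, "air"),
}


def household_footprint(spend_by_category):
    """Return the footprint in tCO2e per consumption group for one household."""
    groups = defaultdict(float)
    for category, spend_gbp in spend_by_category.items():
        if category not in CATEGORY_INFO:
            # Non-consumption categories (e.g., mortgages) carry no multiplier.
            continue
        multiplier, group = CATEGORY_INFO[category]
        groups[group] += spend_gbp * multiplier / 1000.0  # kg -> tonnes
    return dict(groups)


footprint = household_footprint(
    {"electricity": 500, "gas": 700, "supermarkets": 4000, "mortgage": 9000}
)
total_footprint = sum(footprint.values())
```

In this toy example the energy group receives the electricity and gas contributions, the mortgage payment is ignored, and the total footprint is the sum over groups.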

2.3 Segmentation

We used a two-stage approach to segment households into distinct typologies based on the CFs and sociodemographic characteristics listed in Table 1. First, dimensionality reduction was applied to simplify the data, followed by k-means clustering to group similar households. The following section describes this process in more detail.

2.3.1 Dimensionality reduction

Dimensionality reduction is a technique for managing high-dimensional data, often used as a preparatory step for clustering and classification tasks (Bouveyron et al., 2019). Clustering relies on distance measures to evaluate the separation between observations, but these measures become less reliable in high-dimensional spaces (Aggarwal et al., 2001). By reducing data dimensionality, distance metrics become more meaningful.

In this study, we employed Uniform Manifold Approximation and Projection (UMAP), which uses principles of Riemannian geometry and algebraic topology to project data onto a lower-dimensional space, referred to as an “embedding” (McInnes et al., 2020). UMAP preserves both local and global structure in the data, so that relationships between observations are largely retained in the embedding (Allaoui et al., 2020).

We treated continuous and categorical variables in our dataset separately (see Table 1), applying Euclidean distance for the former and Jaccard for the latter. Their respective embeddings were then combined using a union operation (McInnes, 2018).
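To make the two distance measures concrete, the sketch below computes Euclidean distance for continuous variables and Jaccard distance for binary variables. It illustrates only the metrics, not UMAP itself or the union of embeddings; the function names and toy vectors are assumptions for the example.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two continuous feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """Jaccard distance between two binary (0/1) feature vectors.

    Defined as 1 - |intersection| / |union| of the positions set to 1.
    Two all-zero vectors are treated as identical (distance 0).
    """
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Toy example: two households with binary flags
# (tenant, joint account, child benefit, EV).
d_flags = jaccard_distance([1, 0, 1, 0], [1, 1, 0, 0])
d_cont = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Jaccard distance ignores positions where both households have a 0, which suits sparse flags such as EV spending: two households are not considered similar merely because neither has the flag.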

2.3.2 Clustering approach

After dimensionality reduction, we applied k-means clustering to group households based on their CF and associated characteristics. K-means clustering divides the data into k non-overlapping clusters (MacQueen, 1967), where each household is assigned to the cluster with the nearest centroid. The algorithm works iteratively to minimize within-cluster variance, ensuring that households in the same cluster have similar attributes while those in different clusters are more distinct (Gale et al., 2016).
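The iterative procedure described above corresponds to the standard Lloyd's algorithm for k-means. The following is a minimal pure-Python sketch for small inputs, offered as an illustration rather than the implementation used in the study; the initialization (random choice of k points with a fixed seed) and the function names are assumptions.

```python
import random


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for a list of points."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

On the toy data the two well-separated groups of three points end up in different clusters. Each update step cannot increase the within-cluster sum of squares, which is why the loop terminates once assignments stop changing.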

To determine the optimal number of clusters (k), we employed a combination of methods:
  • Elbow plot: This method helps identify the point where the rate of decrease in within-cluster variance slows, suggesting a suitable number of clusters (Yuan & Yang, 2019).
  • Silhouette score: This metric ranges from −1 to 1 and measures how well data points fit within their assigned clusters. A higher score indicates better-defined clusters, with values between 0.71 and 1 considered an excellent split (Mamat et al., 2018).
  • Visual inspection: We also visually examined the UMAP embeddings to ensure the cluster formations made sense.
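For reference, the silhouette score mentioned above can be computed directly from its definition. The sketch below is a quadratic-time illustration suitable only for small samples and is not the study's code; assigning a score of 0 to singleton clusters is a common convention adopted here as an assumption.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette requires at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_split = silhouette_score(toy_points, [0, 0, 0, 1, 1, 1])
poor_split = silhouette_score(toy_points, [0, 1, 0, 1, 0, 1])
```

The labeling that follows the two visible groups scores close to 1, while the interleaved labeling scores much lower, which is the behavior the metric is used for when comparing candidate values of k.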

2.3.3 Model evaluation

To evaluate cluster stability, we resampled 5% of the dataset, bootstrapping the clustering process 100 times. We evaluated the quality of the results using the following metrics:
  • Adjusted Rand index (ARI): This metric measures the similarity between two clustering results. An ARI score of 1 indicates perfect agreement, while a score of 0 indicates random clustering (Santos & Embrechts, 2009).
  • Silhouette score: This evaluates how well each data point fits within its assigned cluster and how distinct the clusters are from each other.
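The ARI can be computed from the contingency table of two label assignments. The sketch below follows the standard pair-counting formula and is provided as an illustration; it assumes at least two observations, and returning 1.0 when both partitions are degenerate is a convention chosen for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same observations."""
    n = len(labels_a)
    # Number of pairs placed together in both clusterings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Number of pairs placed together within each clustering.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g., both clusterings have one cluster
    return (sum_ij - expected) / (max_index - expected)


same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which observations are grouped together, relabeling clusters does not change it: the first example yields 1.0 even though the label values are swapped. The second example yields a negative value, meaning agreement below the level expected by chance.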

Given the computational demands of processing over 700,000 observations, this resampling approach was practical. Silhouette scores were consistent across the random samples, confirming that the reduced dataset maintained the overall structure of the clusters.

3 RESULTS

Our clustering exercise produced 10 distinct and stable clusters, based on the following metrics:
  • Mean ARI: 0.92, indicating strong consistency of cluster assignments across resampled runs.
  • Mean silhouette score: 0.72, reflecting a clear distinction between clusters and strong internal consistency.

These high scores confirmed our clustering approach was robust, with minimal overlap between clusters that resulted in well-defined groupings.

In this section, we present the results of this clustering exercise. We first report key sociodemographic characteristics, then the resulting carbon profiles and the “pen portraits” that describe each household group, and finally the associated levels of income, savings, and investment.

We also offer example applications, focusing on retrofit to demonstrate how these insights can inform retrofit opportunities by mapping areas with high concentrations of poorly insulated homes and energy-intensive households.

3.1 Sociodemographic characteristics

Table 2 summarizes the key attributes identified for each of the 10 clusters. This includes household income, average spending on investments and savings platforms, joint account status, tenure, gender, child benefit status, and whether the household owns an EV. Shaded cells indicate the prevalence of each characteristic within the cluster, with darker shading indicating a higher percentage of households exhibiting the trait.

TABLE 2. Sociodemographic characteristics of clusters.
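| Cluster | N | Average age | Average income (£) | Average investments + savings (£) | Joint account (% of observations) | Renters (% of observations) | Male (% of observations) | Child benefits flag (% of observations) | Electric vehicle flag (% of observations) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 49,801 | 51 | 37,837 | 2794 | 83.64 | 99.99 | 58.63 | 15.26 | 1.31 |
| 2 | 101,388 | 58 | 44,114 | 6056 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 36,410 | 41 | 36,427 | 1595 | 46.14 | 46.15 | 18.25 | 100.00 | 0.32 |
| 4 | 106,472 | 58 | 48,710 | 6508 | 99.96 | 0.00 | 99.26 | 0.00 | 1.72 |
| 5 | 85,951 | 60 | 29,266 | 2855 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 6 | 50,608 | 37 | 29,610 | 506 | 0.00 | 100.00 | 0.08 | 100.00 | 0.29 |
| 7 | 75,588 | 49 | 25,384 | 1011 | 0.00 | 100.00 | 0.00 | 0.00 | 0.03 |
| 8 | 82,162 | 52 | 41,592 | 4396 | 0.00 | 0.00 | 100.00 | 0.00 | 0.00 |
| 9 | 76,559 | 44 | 33,072 | 1571 | 0.00 | 100.00 | 100.00 | 0.00 | 0.00 |
| 10 | 55,054 | 42 | 46,431 | 3694 | 91.90 | 0.25 | 45.12 | 97.97 | 3.34 |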
  • Note: Percentage observation of categorical characteristics is presented as a heat map.

The youngest and oldest average age groups (clusters 6 and 5, respectively) are mostly female, have sole accounts, and rank among the three lowest in average income (Table 2). However, their saving behaviors differ significantly. Cluster 6 has the lowest average savings and investments across all groups, while cluster 5 are strong savers, with higher average savings than some higher-earning clusters such as 1, 3, and 9. Additionally, cluster 6 consists of renters receiving child-benefit payments, whereas cluster 5 are homeowners.

Clusters 2, 4, and 10 have similar incomes and are among the highest-earning groups (Table 2). Clusters 2 and 4 are particularly alike in age and savings, with a key distinction being that cluster 2 are females, while cluster 4 are males, some of whom spend on EVs. Cluster 10, despite having similar income levels to clusters 2 and 4, has lower average savings and investments than these groups. Most households within this group receive child-benefits and they also have the highest prevalence of EV spending.

3.2 Carbon footprint profiles of clusters

To compare CFs across the clusters, we standardized them. The results, shown in Figure 1, are therefore expressed in terms of units of standard deviations from the population average. A value of zero represents the population's average CF, while positive or negative values indicate whether a cluster's CF is above or below the average, respectively. This comparison allows us to identify clusters with significantly higher or lower CFs in specific consumption categories. By doing so, we gain insight into the relative environmental impact of different household groups. A detailed table of descriptive statistics for CFs by cluster is reported in Supporting Information S5.
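The standardization described above can be sketched as follows: each cluster's mean CF is expressed as the number of population standard deviations above or below the population mean. The use of the population (rather than sample) standard deviation and the function name are assumptions for this illustration.

```python
from statistics import mean, pstdev


def standardized_cluster_means(values, labels):
    """Return {cluster: z-score of the cluster mean} for one consumption category.

    `values` holds one CF per household and `labels` the matching cluster ids.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    by_cluster = {}
    for value, label in zip(values, labels):
        by_cluster.setdefault(label, []).append(value)
    return {
        label: (mean(cluster_values) - mu) / sigma
        for label, cluster_values in by_cluster.items()
    }


z = standardized_cluster_means([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

Averaging household-level z-scores within a cluster gives the same result as standardizing the cluster mean, so either order of operations leads to the profile values; in the toy example the two clusters sit symmetrically below and above zero.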

FIGURE 1. Standardized carbon footprint profiles of clusters 1–10, with 0 indicating sample average. Underlying data for this figure are available in Table 1 of Supporting Information S9.

In terms of energy, clusters 2, 4, and 10 have higher average CFs, while clusters 6, 7, and 9 show lower average CFs (Figure 1). Clusters 2, 4, and 10 also have higher CFs from home improvement categories (e.g., hardware, furniture, and gardening).

In terms of travel, clusters 3, 4, 8, and 10 have higher petrol CFs, whereas clusters 5, 6, and 7 have much lower average CFs (Figure 1). In addition, clusters 4, 8, and 10 have higher CFs associated with other vehicle-related expenses, such as garage spend and vehicle finance. Clusters 6 and 7 show elevated CFs for public transport and taxis, while cluster 5 demonstrates low CFs across all travel categories. Airline CFs are relatively close to average across clusters, although they are slightly higher in clusters 1, 4, 9, and 10.

3.3 Cluster descriptions (Pen portraits)

The sociodemographic attributes (Table 2) and CF profiles (Figure 1) were combined to create pen portraits for each household typology. As outlined in Table 3, these summarize lifestyle and financial characteristics, helping us understand the drivers of CFs among household groups.

TABLE 3. Pen portraits of household typologies.
| Cluster | Pen portrait | Mean household CF (tCO2e) | Distinguishing features | Consumption patterns: high CFs | Consumption patterns: low CFs | Examples of high membership locations |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Rural and suburban renters | 18.4 | Mostly renters with joint accounts, some have children. | Housing bills unrelated to energy, food, petrol, air travel, public transport, and communications. | Clothing and energy bills. | Coastal areas and near national parks, for example, the Isles of Scilly, Northumberland, north Norfolk, and Cumbria, and outside major cities like London. |
| 2 | Suburban home improvers | 19.0 | Female homeowners with joint accounts. | Energy bills, home improvement purchases (e.g., hardware, gardening), and health. | Fast-food, communications, car-rentals and taxis, and public transport. | Suburban England and Wales, including Bromley, Swindon, Somerset, and Monmouthshire. |
| 3 | Trendy families | 19.2 | Predominantly female joint- and sole-account holders with families. | Clothing, fast-food, communications, petrol, and childcare. | Garage, gardening, and health. | Urban or suburban areas including Crofton Park (Lewisham), Oldham Town South (Greater Manchester), and neighborhoods in Bradford. |
| 4 | Car and tech enthusiasts | 20.6 | Mostly male homeowners with joint accounts; high prevalence of EV ownership. | Energy bills, petrol, vehicle expenses, digital goods, hotels, restaurants, and home furnishings. | Fast-food, taxis, and housing bills unrelated to energy. | Neighborhoods in Buckinghamshire, Sheffield, and West Berkshire. |
| 5 | Suburban seniors | 13.4 | Female homeowners with sole accounts, likely of older age group. | Health. | Food, fast-food, communication, restaurants, recreation, and petrol. | Rural or suburban areas, a few of them are near the coast or near ports, such as Hoylake, Emsworth, Lewes, and neighborhoods of Wirral and Cornwall. |
| 6 | Taxis and takeouts | 16.1 | Young female renters receiving child benefits. A small number have EVs. | Fast-food, taxis, clothing, and communications. | Energy bills, vehicle expenses, and home improvement purchases. | Urban areas such as Barking Central (London), Harehills South (Leeds), and Townhill (Swansea). |
| 7 | Inner-city commuters | 12.6 | Renters. A small number have EVs. | Commuter, taxi travel, and housing bills unrelated to energy. | Below average for most categories, particularly for energy bills, petrol, and vehicle-related expenses. | Inner-city neighborhoods like Wormhold Road (London), Leeds City Centre, Victoria Park (Manchester), and Cathays South and Bute Park (Cardiff, Wales). |
| 8 | Suburban convenience | 18.7 | Male homeowners with sole accounts. | Vehicle-related expenses including petrol, restaurants, and home furnishings. | Clothing, communication, food, and housing bills unrelated to energy. | Suburban areas like Hodge Hill (Birmingham), Braunstone Town (Leicester), and Langstone and Llan-wern (Newport, Wales). |
| 9 | Young male urbanites | 17.5 | Male renters with sole accounts. | Taxis, public transport, fast-food, recreation, and housing bills unrelated to energy. | Energy bills, food, and clothing. | Neighborhoods of London, Sheffield, Manchester, Nottingham, and Birmingham. |
| 10 | Affluent families | 22.9 | Mostly wealthy homeowners with families and higher-than-average EV ownership. | Most consumption categories. | Housing bills not related to energy, commuter travel, and taxis. | Westfield (Bath), Morpeth South and West (Northumberland), Dorset, York, and Pen-y-fford and Higher Kinnerton (Wales). |
  • Abbreviations: CF, carbon footprint; EV, electric vehicle.

3.4 Income, savings, and investments

Next, we focus on the level of income, savings, and investments across clusters, recognizing these as key attributes relevant to financial capabilities for decarbonization such as retrofit uptake (Caulfield et al., 2022; Schleich et al., 2021).

Consistent with previous research, we find that clusters with higher average incomes tend to have higher CFs (see Figure 2; Büchs & Schnepf, 2013; Druckman & Jackson, 2009). Higher average incomes are also typically linked to higher savings and investment rates (Advani et al., 2020; Huggett & Ventura, 2000), a pattern broadly observed in our data (Figure 2). However, two typologies—“Suburban Seniors” (Cluster 5) and “Car and Tech Enthusiasts” (Cluster 4)—are unique exceptions to this. “Suburban Seniors” have similar average incomes to “Taxis and Takeouts” (Cluster 6) but have higher average savings and investments (Figure 2). Similarly, “Car and Tech Enthusiasts” have higher average incomes than “Affluent Families” (Cluster 10), yet their average CFs are lower, and their average savings are higher (Figure 2).

FIGURE 2. Income, savings, and investment balances across clusters. Underlying data for this figure are available in Table 2 of Supporting Information S9.

“Suburban Home Improvers” (Cluster 2), “Car and Tech Enthusiasts,” “Suburban Seniors,” “Suburban Convenience” (Cluster 8), and “Affluent Families” (Cluster 10) have markedly higher average combined savings and investments compared to other clusters (Kruskal–Wallis = 24248.0, p < 0.01).
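For readers unfamiliar with the test, the following is a minimal sketch of how a Kruskal–Wallis statistic of this kind can be computed from group-level balances. It is not the study's own code: the function name `kruskal_wallis_h` is hypothetical, the balances are invented toy values rather than study data, and the sketch assumes the standard rank-based formula with the usual tie correction.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with average ranks for ties and the usual tie correction.

    Assumes the pooled data contain at least two distinct values.
    """
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # H = 12 / (N (N + 1)) * sum(R_k^2 / n_k) - 3 (N + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n_total ** 3 - n_total))

# Toy combined savings and investment balances (GBP); illustrative only.
toy_balances = {
    "cluster_a": [1200.0, 1500.0, 1800.0],
    "cluster_b": [4000.0, 5200.0, 6100.0],
    "cluster_c": [9000.0, 12500.0, 20000.0],
}
h_stat = kruskal_wallis_h(list(toy_balances.values()))
print(round(h_stat, 3))
```

With three fully separated toy groups of three values each, the statistic is about 7.2. The test is rank based, so it makes no normality assumption about balances, which tend to be strongly skewed.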

3.5 Practical applications of segmentation

In this section, we demonstrate how insights from our segmentation can be used for targeted retrofit by focusing on three typologies with higher-than-average energy CFs: “Suburban Home Improvers” (Cluster 2), “Car and Tech Enthusiasts” (Cluster 4), and “Affluent Families” (Cluster 10). Figure 3 identifies geographic areas with high retrofit potential, defined as MSOAs that feature both a relatively energy-inefficient housing stock and a high concentration of households from these typologies. Specifically, the map highlights neighborhoods in England and Wales that rank in the upper 20th percentile for both low EPC ratings (dwellings rated below C) and clusters with higher-than-average energy CFs.

FIGURE 3. Map of England and Wales showing neighborhoods with high retrofit potential. Underlying data for this figure are available in Table 3 of Supporting Information S9.

These parameters can be adjusted for a more granular analysis if needed. Further details on threshold selection are provided in Supporting Information S6, while insights into the spatial distribution of clusters can be found in Supporting Information S7.
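The dual-threshold rule described above can be sketched as follows. This is an illustrative reading rather than the procedure in Supporting Information S6: it treats "upper 20th percentile" as the top 20% of MSOAs on each indicator, uses a simple nearest-rank percentile, and runs on invented records. The field names `low_epc_share` and `target_cluster_share`, the function names, and the MSOA labels are all hypothetical.

```python
import math

def percentile_cutoff(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def flag_high_retrofit_potential(msoas, pct=80):
    """Return names of MSOAs strictly above the pct-th percentile on BOTH indicators."""
    epc_cut = percentile_cutoff([m["low_epc_share"] for m in msoas], pct)
    cluster_cut = percentile_cutoff([m["target_cluster_share"] for m in msoas], pct)
    return [
        m["name"]
        for m in msoas
        if m["low_epc_share"] > epc_cut and m["target_cluster_share"] > cluster_cut
    ]

# Invented example: share of dwellings rated below EPC C, and share of
# sampled households belonging to the three target typologies.
labels = "ABCDEFGHIJ"
low_epc = [0.30, 0.45, 0.50, 0.55, 0.60, 0.62, 0.65, 0.70, 0.80, 0.85]
cluster = [0.10, 0.50, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]
toy_msoas = [
    {"name": f"MSOA {n}", "low_epc_share": e, "target_cluster_share": c}
    for n, e, c in zip(labels, low_epc, cluster)
]

strict = flag_high_retrofit_potential(toy_msoas, pct=80)
relaxed = flag_high_retrofit_potential(toy_msoas, pct=60)
print(strict)
print(relaxed)
```

On the toy data, only one area passes at the 80th percentile, while lowering the cutoff to the 60th percentile flags three. This shows how adjusting the parameters trades precision for coverage.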

4 DISCUSSION

This study introduces a novel segmentation of households in England and Wales using FTD to enhance understanding of household CFs. Analyzing spending patterns and socioeconomic characteristics of over 700,000 households, we identified 10 distinct household typologies, each with unique consumption behaviors and carbon profiles.

Our findings reveal significant CF variations influenced by factors such as income, tenure, age, and spending habits. For example, higher-income clusters like “Affluent Families” and “Car and Tech Enthusiasts” exhibit above-average CFs across most consumption categories (Figure 1), consistent with prior research linking income to emissions (Büchs & Schnepf, 2013; Druckman & Jackson, 2009). In contrast, “Taxis and Takeouts” have lower direct emissions from energy and fuel but higher indirect emissions from recreation, dining, and clothing (Figure 1), reflecting urban lifestyles.

The spatial distribution of these typologies highlights policy opportunities. For example, “Suburban Home Improvers” cluster in suburban areas, while “Inner-city Commuters” are concentrated in major cities (Supporting Information S7). This detail enables targeted interventions, such as retrofit programs in areas with high-energy CFs and poorly insulated homes (Figure 3).

This research supports existing work demonstrating the advantages of FTD over self-reported data in profiling carbon emissions. FTD offers high-resolution, real-time spending insights while mitigating inaccuracies associated with self-reported surveys (Trendl et al., 2023). Using FTD for segmentation also addresses important limitations of previous studies, such as small sample sizes and lack of spatial detail necessary for local policy interventions (Baiocchi et al., 2022; Froemelt & Wiedmann, 2020; Singleton & Longley, 2019).

Household CFs in our sample range from 12.6 to 22.9 tCO2e across clusters, with a mean of 17.8 tCO2e for the overall sample (Supporting Information S5). These values are comparable to existing studies (e.g., see Owen & Büchs, 2024; Trendl et al., 2023), though our higher upper range likely reflects FTD's ability to capture expenditure extremes often underrepresented in self-reported surveys (Eckerstorfer et al., 2016; Kilian et al., 2023; OECD, 2018; Trendl et al., 2023).

4.1 Policy implications

This segmentation enables tailored policies and communications that align with the specific needs, motivations, and financial capacities of household types to drive greater engagement (Schanes et al., 2016). Using the example of retrofit, “Suburban Home Improvers” may respond to financial incentives related to home improvement projects, given their interest in hardware and gardening (Figure 1). Tailored messaging for this group could emphasize the potential for improving property value (Fuerst et al., 2015) and the health benefits of energy-efficient homes (Aravena et al., 2016; Curl et al., 2015), which align with their higher spending on health (Figure 1). “Car and Tech Enthusiasts” may respond well to technology-focused solutions like smart home systems and solar panels, while “Affluent Families” may prefer propositions that cater to family needs.

Both “Car and Tech Enthusiasts” and “Affluent Families” are familiar with financial products (Figure 1), which suggests they may be open to retrofit financing options—though loans are not always popular (Bolton et al., 2023). Research shows homeowners often prefer using personal savings or inheritance over loans for renovations (Bolton et al., 2023), reflecting a common behavioral bias in financial decisions: The source of money influences how people feel, which in turn influences how they spend it (Levav & Mcgraw, 2009). This is particularly relevant as “Car and Tech Enthusiasts” tend to have higher savings and investment levels than “Affluent Families” (Figure 2). Savings or inheritance can make renovations more attractive and serve as “trigger points” for retrofits (Bolton et al., 2023). Targeting these households with retrofit information at such moments could enhance impact and uptake.

Our segmentation approach identifies households with the highest emissions reduction potential, both in terms of actual emissions and financial capacity. While government funding should prioritize low-income and fuel-poor households (Middlemiss et al., 2023), significant funding gaps in the United Kingdom mean many owner-occupiers must self-fund retrofits to meet national net zero targets (UK Parliament, 2024). Local authorities, while primarily focused on retrofits for social housing, could also use this segmentation to enhance voluntary or incentive-based approaches for private-sector households. For example, councils providing energy advice services (e.g., see City of York Council; Jones & Dean, 2024) or coordinating with private-sector retrofit providers could tailor outreach strategies to high-energy-consuming homeowner segments.

Beyond retrofits, segmentation can support other policy areas, such as EV adoption or sustainable consumption. For example, targeted policies can consider potential challenges faced by groups with high vehicle CFs, such as limited public transport access and financial constraints among “Rural and Suburban Renters” (Büchs & Schnepf, 2013; Muñoz et al., 2020). Moreover, campaigns encouraging secondhand shopping or swapping markets (Gwozdz et al., 2017; Nielsen et al., 2022) could be targeted toward groups like “Trendy Families,” “Taxis and Takeouts,” and “Affluent Families,” who show high CFs from clothing (Figure 1).

Household segmentation data, from both public and commercial sources, have been useful elsewhere in influencing local policy. For example, the Greater London Authority uses the open-access 2011 Output Area Classification (OAC) based on the UK census data (Gale et al., 2016) to analyze CFs of London boroughs, aiming to address rising CB emissions and engage local businesses and communities in reduction efforts (London Council, 2024; Owen, 2024). Similarly, Newport City Council has benefited from using commercial household segmentation data to tailor initiatives that align with specific household groups’ needs, preferences, and behaviors (Leventhal, 2016).

While commercial geodemographic classifications raise concerns about transparency and reproducibility, open classifications often lack detail and timeliness (Singleton & Longley, 2019). Given the increasing privatization of consumer data (Singleton & Longley, 2019), new data-sharing frameworks are needed to balance accessibility and privacy (Trendl et al., 2023). While our method is not fully reproducible, it remains transparent and capable of generating aggregated insights useful for policy.

4.2 Limitations

While this study advances the understanding of household CFs and offers practical insights, it has several limitations. First, reliance on FTD from a single bank, despite being one of the largest in the United Kingdom, may introduce sample bias. Although primary-banked customers offer a near-complete spending picture, this analysis does not capture households that primarily use other banks or are unbanked.

Second, while the use of spend-based carbon accounting is effective for capturing a broad range of consumption activities, it does not capture emissions from non-transactional activities (e.g., cash purchases). Additionally, the assignment of CMs based on transaction categories assumes homogeneity within categories, which may overlook variations in the carbon intensity of specific products or services.
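The homogeneity assumption can be made concrete with a minimal sketch of spend-based accounting, in which each category's footprint is spend multiplied by a single category-level CM. The multipliers, spend figures, and names below are invented placeholders for illustration and are not the CMs used in this study.

```python
# Hypothetical carbon multipliers in kgCO2e per GBP spent; NOT the study's values.
CARBON_MULTIPLIERS = {"energy_bills": 2.0, "petrol": 1.5, "clothing": 0.5}

def spend_based_footprint(annual_spend, multipliers):
    """Sum spend x multiplier over categories and return the total in tCO2e."""
    kg = sum(spend * multipliers[cat] for cat, spend in annual_spend.items())
    return kg / 1000

# Toy annual card and direct-debit spend (GBP). Cash purchases never appear
# in the transaction record, so they are absent here by construction.
toy_household = {"energy_bills": 1500.0, "petrol": 1000.0, "clothing": 800.0}
footprint = spend_based_footprint(toy_household, CARBON_MULTIPLIERS)
print(footprint)

# Two households spending the same amount on clothing receive the same
# clothing footprint, whatever they actually bought.
clothing_only = spend_based_footprint({"clothing": 800.0}, CARBON_MULTIPLIERS)
print(clothing_only)
```

Because the multiplier is attached to the category rather than the product, any within-category difference in carbon intensity (for example, secondhand versus new clothing) is invisible to this kind of calculation.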

Third, we cannot estimate per-capita CFs due to a lack of data on household composition. While joint accounts suggest multi-person households, exact sizes remain unclear, and sole accounts may reflect shared spending. Segments with high energy CFs often involve joint accounts, indicating larger households, but their per-capita CFs may be lower when adjusted for household size (Ivanova & Büchs, 2022). Nonetheless, by overlaying our data with energy-inefficient housing distribution, we can still identify areas with significant retrofit opportunities, regardless of whether high CFs originate from larger or smaller households.
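A short worked example shows why household size matters for this comparison. The household CFs are the cluster means from Table 3; the household sizes are purely hypothetical, since household composition is not observed in the data, and the function name is illustrative.

```python
def per_capita_cf(household_cf_tco2e, household_size):
    """Divide a household footprint equally among its members."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return household_cf_tco2e / household_size

# Mean household CFs from Table 3; the sizes (4 and 1) are assumptions.
affluent_family = per_capita_cf(22.9, household_size=4)
inner_city_commuter = per_capita_cf(12.6, household_size=1)
print(affluent_family, inner_city_commuter)
```

Under these assumed sizes, the cluster with the highest household CF would have a per-capita CF of about 5.7 tCO2e, less than half that of a single-person household in the lowest-CF cluster. The ranking of clusters could therefore reverse once household size is known.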

Additionally, because we did not define households based on address information, individuals from the same household may appear in different clusters. This risk is lower for sole account holders, as energy bills are typically paid from one account and we excluded individuals who did not have at least one energy-related expenditure per year. Even where members of the same household fall into different clusters, our segmentation still highlights their diverse preferences, which can inform tailored communication strategies.

4.3 Future research

Future studies could address these limitations by incorporating data from multiple financial institutions to enhance representativeness. Expanding the scope to include Scotland and Northern Ireland would also provide a more comprehensive view of the United Kingdom. Furthermore, longitudinal data could facilitate the evaluation of policy interventions over time and of the evolving dynamics of household consumption patterns.

While internal validation of household typologies was conducted via resampling (see section 2.3.3), external validation, or “ground-truthing,” is also important (Vickers & Rees, 2011). This involves assessing how well cluster descriptions align with real-world observations, such as through area-based visits, consultations, or local interviews (Owen et al., 2023b; Vickers & Rees, 2011). Future research could apply this approach to enhance confidence in our typologies for policymakers and other stakeholders. Additionally, testing communication strategies through online behavioral choice experiments could gauge the effectiveness of retrofit interventions among typologies (e.g., see Achtnicht & Madlener, 2014).

Further disaggregating the 10 groupings into meaningful sub-clusters, such as the OACs (Gale et al., 2016; Vickers & Rees, 2011; Wyszomierski et al., 2023), could also help uncover variations in household CFs, consumption patterns, and financial capacities within each group. This would allow for more tailored interventions suited to different policy questions aimed at reducing CFs.

Finally, further research on the application of segmentation for targeted retrofit could explore linking additional data sources to FTD at the individual account level, such as EPC or smart meter data (e.g., Few et al., 2023). Linking consumption behaviors with details like energy use and building characteristics offers useful insights for targeting retrofits. However, advancing this research direction would require overcoming key challenges, such as ensuring strict compliance with data privacy regulations and addressing the technical challenge of matching noisy address data across different time periods.

5 CONCLUSION

This study demonstrates the value of using FTD to produce a more nuanced segmentation of UK households based on CFs and socioeconomic characteristics. By identifying 10 distinct household typologies, we offer a new framework for policymakers to consider in their implementation of targeted policies and the associated communications. There is a clear need for novel approaches to drive faster uptake of sustainability measures. Through the example of retrofit, we show how this data-driven approach could accelerate this process and highlight the importance of considering diverse socioeconomic, geodemographic, and behavioral characteristics to make interventions more effective.

ACKNOWLEDGMENTS

This work would not be possible without the invaluable support and contributions of many individuals to whom we extend our heartfelt thanks: Alex Anwyl-Irvine for his guidance on segmentation methodologies; Lara Vomfell for her assistance with data compilation and coding; Alistair Gilfillan and Alec Phillpotts for their encouragement and support; Daniel Marron, Paul Jefferson, Andrew McCulloch, and Neil Oliver for their valuable input; and finally, Jaideep Mann, Trystan Davies, and Ranil Boteju for making this work possible. Jasmine Wells reports financial support was provided by Lloyds Banking Group plc.

    CONFLICT OF INTEREST STATEMENT

    The authors declare no conflict of interest. The views and opinions expressed are those of the authors and do not necessarily reflect the views of Lloyds Banking Group plc (LBG), its affiliates, or employees.

    ETHICS STATEMENT

    The LBG Privacy Risk and Impact Assessment process recognizes the ethical basis for this study on aggregated, anonymous data as part of a strategy to help customers transition to a more sustainable future. Upon opening an account, LBG customers consent to their data being used for research: https://www.lloydsbank.com/help-guidance/customer-support/privacy-explained/data-privacy-notice.html.

    ENDNOTES

    • 1 In the United Kingdom, EPCs assess energy efficiency of buildings on a scale from A to G, with A being the most efficient.
    • 2 It is standard practice for banks to maintain TCSs, which underpin crucial analytical processes.
    • 3 A recent survey finds that over 90% of UK adults have accounts with multiple banks (Business Money, 2024). Restricting our sample to only individuals subscribed to a single bank provider (LBG) would reduce representativeness of our sample and risk introducing demographic bias.
    • 4 Except for one MSOA, South Tottenham, which has a sample of 75 customers.
    • 5 Child benefit payments are available to all UK residents responsible for children under 20 if they are in education. These benefits are tax-free unless a household member earns a net income exceeding £60,000.
    • 6 The Living Costs and Food Survey dataset annually collects detailed information on household expenditure, income, and living conditions in the United Kingdom.

    DATA AVAILABILITY STATEMENT

    The data that support the findings of this study are available from LBG but are restricted due to licensing agreements. Data are available from the authors upon reasonable request and with permission from LBG.
