A Demand Forecasting Model Leveraging Machine Learning to Decode Customer Preferences for New Fashion Products
Abstract
Demand forecasting for new products in the fashion industry has always been challenging because of changing trends, long lead times, seasonal shifts, and product proliferation. Accurate demand forecasting requires a thorough understanding of consumer preferences. This research proposes a machine learning model that analyses customer preferences and forecasts demand for new products. To understand customer preferences, fitting room data are analysed and customer profiles are created. K-means clustering, an unsupervised machine learning algorithm, is applied to group similar profiles into clusters. Each cluster is then assigned a weight based on the percentage of products in that cluster. Following the clustering process, a decision tree classification model assigns a new product to one of the predefined clusters to predict its demand. This demand forecasting approach enables retailers to stock products that align with customer preferences, thereby minimising excess inventory.
1. Introduction
At the end of every season, fashion retailers often face a significant amount of unsold stock. To avoid dead stock, retailers offer discounts, which dent their profit margins, and if stock remains unsold even after discounting, the business suffers a substantial revenue loss. Poor demand forecasting is the root cause of this situation [1]. Effective inventory planning methods reduce stockouts and overstock while satisfying customer requirements [2, 3].
Demand forecasting poses a significant challenge in the fashion industry for several reasons, including unpredictable consumer preferences, long lead times, seasonality and trend changes, short product life cycles, and the absence of historical sales data for new products. Retail demand forecasting typically depends on historical data, whereas the fashion industry introduces new collections every season, so new products have no sales records. This makes it challenging to predict their demand. Retailers generally make forecast decisions based on experience and sometimes overestimate demand [4]. In such situations, they end up with more products than needed, leading to unsold surplus inventory. Researchers have used various techniques, such as regression, time series, ensembles, and neural networks, to forecast demand more accurately [5, 6]. However, every model has its merits and demerits, so retailers should choose an appropriate forecasting approach to avoid overstocking or stocking out of new products [7–9].
The success of any business relies on knowing its customers and understanding their purchase decisions. In the fashion industry, business decisions throughout the design, production, promotion, and sales chain are significantly influenced by customers’ needs, desires, motivations, and requirements [10]. Prior studies [11, 12] have shown that advanced machine learning models can identify even small shifts in consumer behaviour and preferences. This insight enables proactive demand forecasting, allowing businesses to better anticipate and meet customer demand.
This study uses a data-driven methodology based on machine learning (ML) techniques to optimise demand forecasting for fashion retailers. The objective is to improve forecasting accuracy for newly launched products by analysing fitting room data as a reflection of customer preferences. The paper is organised as follows. Section 2 reviews the literature. Section 3 outlines the data-driven methodology. Section 4 presents the results and findings. Section 5 discusses the findings. Finally, Section 6 presents the conclusions, limitations, and directions for future research.
2. Related Papers
2.1. Forecasting of New Products Using Machine Learning Models
Previous research in sales forecasting, especially for new products, has primarily used historical data to predict future sales, drawing on traditional statistical methods, machine learning techniques, and, more recently, deep learning approaches. Conventional methods include time-series analysis and regression models, which base their forecasts on patterns observed in past data. Machine learning techniques such as clustering and classification (e.g., K-means clustering and decision trees) have been used to group similar products and predict new product sales from those clusters [13]. However, these methods are limited by their dependence on past data.
Machine learning models such as extreme learning machines (ELM) and support vector regression (SVR) capture complex demand patterns and identify similar ones, producing more accurate estimates from less data. Linear regression, polynomial regression, and SVR have been used to predict seasonal product sales by analysing input variables and product features. Models such as regression trees, random forests, k-nearest neighbours (k-NN), and linear regression integrate product features and sales trends for better forecasting [14, 15]. A cluster-while-regress model fits a separate sales forecasting model to each cluster, enabling personalised forecasting.
Traditional models struggle with the nonlinear complexity of market dynamics and consumer behaviour. Deep learning models, including Siamese neural networks and attention-based multimodal encoder-decoder systems, improve accuracy by analysing a wide range of variables, including unstructured data such as images and social media content [16, 17]. Despite their effectiveness, deep learning models often operate as “black boxes,” making it difficult to explain the reasons for their predictions [18].
Ensemble techniques such as random forest, gradient boosting machines (GBM), and XGBoost are recommended for improving sales forecast accuracy. These models evaluate trends in previous product sales and apply clustering techniques to identify patterns [19]. DemandForest is a novel methodology that combines K-means clustering, random forest, and quantile regression forest to anticipate demand for new items without historical sales data, achieving better accuracy than traditional approaches [20]. Furthermore, a dual-layer strategy has been applied to the inventory problem in physical stores, helping fashion managers determine product competitiveness and the quantity to stock [21].
Building on these studies, this research proposes a new approach to predicting sales using fitting room data. Analysing fitting room data makes it possible to generate customer profiles based on product features, enabling retailers to understand consumer preferences, buying behaviour, and purchasing habits, and to forecast the potential sales of a new product more accurately.
2.2. Fitting Room Data
Historically, sales data have been the primary source of information for demand forecasting, but they do not fully reflect customer preferences [22, 23]. Previous studies indicate that demographics and geography are critical in shaping consumer preferences for product attributes. By carefully selecting a product assortment, businesses can increase customer satisfaction, which in turn increases customer loyalty and repeat purchases. Satisfied customers are more predictable in their purchasing behaviour, which directly affects the retailer’s ability to forecast demand accurately [24]. Market segmentation is therefore necessary, as it helps identify the appropriate products for each target group [25]. Fitting room data allow retailers to understand what customers want and to analyse their preferences more comprehensively. These data are therefore valuable for predicting demand, since the fitting room is often the final point at which the consumer decides whether to purchase [26]. Fitting room data offer three advantages over sales data:
- (1) First, fitting room data provide a more comprehensive understanding of customer preferences, including products that customers were interested in trying on but ultimately decided not to purchase. They capture a broader range of preferences, including those not reflected in final sales [27]. Sales data, by contrast, capture only customers’ final purchase decisions, which are made within the constraints of price, fit, and other purchasing factors.
- (2) Second, customers typically bring multiple items (3 to 4 garments) into the fitting room, which provides valuable data points on consumer preferences that sales data miss [28]. Sales data offer a limited perspective focused only on purchased items; while they accurately reflect successful sales, they do not reveal the full spectrum of customer interests and potential sales opportunities.
- (3) Third, fitting room data capture “potential data”: items customers were interested in enough to try on but did not purchase. These include garments that met the customer’s initial criteria regarding style, fit, or other attributes but were not bought for various reasons (price, second thoughts, etc.). Such data are crucial for understanding missed opportunities and areas for improvement. Sales data are tied directly to revenue but cannot capture initial interest that did not convert into a sale.
3. Methodology
This article aims to predict sales of newly launched products using consumer profiles derived from fitting room data. The methodology consists of five steps: data collection, feature selection and aggregation, clustering with the K-means algorithm, weight assignment, and classification with a decision tree.
3.1. Data Collection
Data were collected from a retail store in Tamil Nadu, India, for the men’s formal shirt category. Collection covered three months of the Spring/Summer season and excluded the end-of-season sale, because discounting can alter purchase behaviour and skew consumer preferences. The researcher collected the data at the entrance to the fitting room: when a customer entered with a garment to try on, the product number was noted and the product features listed in Table 1 were recorded [29, 30].
Feature | Values |
---|---|
Size | 36, 38, 39, 40, 42, 44, 46, 48 |
Sleeve style | Half sleeves, full sleeves |
Color | Beige, black, blue, brown, cream, green, grey, khaki, light blue, lilac |
Collar type | Button-down, regular, band |
Fit | Slim, ultra slim |
Pattern | Print, patterned, graphic, checks, dots, stripes, solid, texture |
Price range | Rs. 934 to Rs. 3999 |
3.1.1. Ethical Considerations
Ethical considerations were carefully examined before data collection. No registration was introduced beyond normal store operations, and no personal information about store customers was recorded. The collected data contain no identifying information about customers or store employees.
Each customer entering the fitting room was assigned a unique customer ID, and the features of each garment were extracted and recorded in detail. Table 2 illustrates this with two customers (customer IDs 1 and 2). Over the three months, data were collected for 1630 customer IDs (customers entering the fitting room) and 6540 garments taken into the fitting room.
Customer ID | Garments taken into fitting room | Size | Sleeve | Color | Collar type | Fit | Pattern | Price (Rs.) |
---|---|---|---|---|---|---|---|---|
1 | 3 | 40 | Full | Black | Regular | Slim |  | 2299 |
 |  | 40 | Full | Grey | Button down | Slim | Solid | 3499 |
 |  | 40 | Half | Black | Button down | Slim | Dots | 2799 |
2 | 3 | 42 | Half | Cream | Regular | Slim | Patterned | 1999 |
 |  | 42 | Half | Green | Regular | Slim | Checks | 1374 |
3.2. Building Profile
3.2.1. Feature Extraction
Machine learning algorithms are widely used to predict sales, and their accuracy depends heavily on the input features, so identifying the essential features is crucial. Previous studies have argued that relying on a comparatively small number of inputs can be counterproductive, and that thousands or even tens of thousands of candidate features may need to be varied during selection and evaluated by their predictive success [31].
Several techniques, such as XGBoost, multiprediction frameworks, and permutation feature importance, have been proposed to assess feature significance in sales data [32]. A feature importance score is an effective tool for identifying the features with the greatest impact on consumer preferences. This study generates these scores using decision trees, Lasso regression coefficients, and random forests. These models analyse a wide range of features and assign each a score reflecting its significance in predicting the target variable.
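As an illustration of one of the approaches named above, permutation feature importance can be sketched in a few lines of plain Python: shuffle one feature column, re-score the model, and record the drop in accuracy. This is a minimal sketch, not the procedure used in this study (which relied on decision tree, random forest, and Lasso importances); the stand-in model `toy_predict`, the feature meanings, and the data are hypothetical.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled.

    predict: callable mapping a list of rows to a list of predicted labels.
    X: list of rows (each a list of feature values); y: list of true labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in model: predicts 1 ("likely to be tried on")
# when the price band (feature 0) is low, and ignores feature 1.
def toy_predict(rows):
    return [1 if row[0] <= 2 else 0 for row in rows]

# Hypothetical rows: [price band, unrelated noise feature].
X = [[1, 5], [2, 1], [3, 3], [4, 2], [1, 4], [4, 6], [2, 2], [3, 1]]
y = toy_predict(X)  # labels the toy model reproduces perfectly
scores = permutation_importance(toy_predict, X, y)
print(scores)  # feature 1 scores 0.0 because the model never reads it
```

scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`, which works with any fitted estimator.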
3.2.2. Feature Engineering
Feature engineering simplifies complex datasets by identifying similarities between different aspects, examining how they interact, and organising them according to their importance to the problem [33]. Feature engineering includes choosing relevant features for machine learning techniques. The key to this process is using domain knowledge in the specific field to analyse similar characteristics and group them accordingly [34]. The research introduces a new approach to categorising similar attributes by involving domain knowledge to create a personalised customer profile.
Certain elements of men’s formal shirts, such as sleeve length, collar style, and size, stay the same year after year, whereas features such as color, pattern, and price change constantly. Because customer preferences also change constantly, a detailed consumer profile is essential for gaining deeper insight into them. Profiles allow similar features to be grouped so that customers’ wants are better understood [35, 36].
In this dataset, beige, blue, and pink are treated as warm colors, and black, white, and grey as neutral colors. Similarly, patterns are classified by design type, such as abstract, textured, and geometric. Grouping prices into bands of INR 500 gives a clearer picture of customer purchase patterns. This grouping shows what customers want, so that appropriate product decisions can be made [37].
Colors are broadly categorised as warm, cool, or neutral; patterns are grouped into abstract, texture, geometric, and solid designs; and sizes are grouped into S, M, L, XL, and XXL (see Table 3). Numerical features such as price are also scaled so that all features contribute equally during clustering. Grouping also acts as a form of feature normalization, which improves the effectiveness of the clustering algorithm.
Size | Sleeve | Fit | Pattern | Collar | Color | Price range |
---|---|---|---|---|---|---|
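Scaling of a numerical feature such as price can be sketched with z-score standardisation, which rescales values to zero mean and unit variance so that price does not dominate the Euclidean distances used by K-means. The paper does not name its scaler, so z-scoring is an assumption here; the scaled prices reported later in Table 7 are at least consistent with a linear rescaling of this kind. The prices below are taken from Table 2 purely as sample input.

```python
from statistics import mean, pstdev

def standardise(values):
    """Z-score scaling: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant feature: nothing to scale
    return [(x - mu) / sigma for x in values]

# Prices (Rs.) of the garments shown in Table 2, used only as sample input.
prices = [2299, 3499, 2799, 1999, 1374]
scaled = standardise(prices)
print([round(v, 3) for v in scaled])
```

Like scikit-learn's `StandardScaler`, the sketch divides by the population (not sample) standard deviation.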
3.3. K-Means Clustering
K-means clustering is a popular unsupervised learning algorithm that partitions data points into K clusters based on similarity, with each cluster represented by its centroid. The algorithm was applied to the preprocessed dataset with K ranging from 2 to 10, and two methods were used to determine the optimal number of clusters: the silhouette score and the elbow method.
The silhouette score measures the compactness and separation of clusters. The average silhouette score of all samples was calculated for each K value. Higher silhouette scores indicate better-defined clusters. The elbow method assesses the within-cluster sum of squares (inertia) for different K values. A plot of K values against inertia values was generated, and the “elbow point,” where the inertia begins to decrease at a slower rate, was identified. Based on the results of the silhouette score and the elbow method, the optimal K value was determined. The K value that maximised the silhouette score or exhibited a significant change in inertia (indicative of the “elbow point”) was selected as the optimal number of clusters for the dataset.
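Both selection criteria follow directly from their definitions. The plain-Python sketch below computes inertia and the mean silhouette coefficient for a given clustering; it assumes numeric points, integer cluster labels, and at least two clusters, and the four example points are hypothetical.

```python
import math

def inertia(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances to assigned centroids."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))

def silhouette_score(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: lowest mean distance from point i to the members of any other cluster.
    Points in singleton clusters score 0, following the usual convention.
    Requires at least two clusters.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = [0, 0, 1, 1]  # labels that match the groups
bad = [0, 1, 0, 1]   # labels that mix the groups
centroids = [(0, 0.5), (10, 10.5)]
print(inertia(points, good, centroids))  # 1.0
print(silhouette_score(points, good), silhouette_score(points, bad))
```

To choose K as described above, one would cluster the data for each K from 2 to 10, compute both quantities, and look for the highest silhouette score and the elbow in the inertia curve. `sklearn.metrics.silhouette_score` computes the same coefficient.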
After determining the optimal K value, the data points are iteratively assigned to the nearest cluster centroid based on the Euclidean distance, and the centroids are recalculated as the mean of the assigned data points. This process continues until convergence, minimising the within-cluster sum of squared distances. Finally, cluster details such as centroid coordinates and cluster sizes are examined to interpret the clustering results and identify patterns within the data.
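The assign-then-update loop described above is Lloyd's algorithm. A minimal pure-Python sketch follows; it uses plain random initialisation rather than the k-means++ scheme that scikit-learn's `KMeans` uses by default, and the six example points are hypothetical stand-ins for encoded customer profiles.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for K-means; returns (labels, centroids).

    Centroids start at k randomly chosen input points (plain random
    initialisation, not k-means++).
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical encoded profiles forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because the result can depend on the initial centroids, library implementations typically run several initialisations and keep the one with the lowest inertia.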
3.4. Assigning Weights to Clusters
Following cluster formation, cluster ratios are calculated to quantify the distribution of customer IDs across clusters. The ratio for each cluster is the percentage of customer IDs assigned to that cluster relative to the total number of customer IDs. Each clustered group represents a profile into which customers fall, and each profile is thus given its percentage share of the total [38].
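Written as a formula (the symbols are introduced here for illustration), with $n_k$ the number of items assigned to cluster $k$ and clusters numbered from 0 as in the results tables, the weight of cluster $k$ is:

```latex
w_k = 100 \times \frac{n_k}{\sum_{j=0}^{K-1} n_j}, \qquad k = 0, 1, \dots, K-1
```

By construction, the weights of the $K$ clusters sum to 100.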
3.5. Decision Tree Classification
A decision tree classifier is trained using features of garments and their corresponding cluster labels. The classifier learns patterns in the data and assigns new garments to appropriate clusters based on precalculated profiles.
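A classifier of this kind can be sketched as a small CART-style tree that chooses binary threshold splits by Gini impurity and stores a majority cluster label at each leaf. This is a minimal illustration, not the authors' implementation; in practice a library class such as scikit-learn's `DecisionTreeClassifier` (Gini by default) would normally be used. The sketch assumes features are already numerically encoded and cluster labels are integers; the training rows are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted child impurity, or None."""
    best, best_score = None, gini(y)  # a split must beat the parent's impurity
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=5):
    """Grow a binary tree; internal nodes are dicts, leaves are majority labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the splits from the root until a leaf (cluster label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical training rows: [color group, pattern group, size group, scaled price],
# each labelled with the K-means cluster of the garment.
X = [[0, 1, 1, -0.5], [0, 1, 2, -0.4], [1, 0, 3, 1.2], [1, 0, 3, 1.0], [2, 2, 1, 0.1], [2, 2, 2, 0.2]]
y = [1, 1, 4, 4, 2, 2]
tree = build_tree(X, y)
print(predict(tree, [0, 1, 1, -0.3]))  # 1
```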
The framework (Figure 1) comprises three steps: (1) building customer profiles, (2) grouping similar customer profiles and assigning weights for the customer profiles, and (3) classification of new products. Machine learning algorithms are employed to achieve these steps.

4. Results and Findings
4.1. Feature Extraction
To determine the essential features that influence customer preference, three machine learning models (decision tree, random forest, and Lasso regression) were compared on performance and feature importance. After the dataset was preprocessed and split into training and testing sets, the models were trained and their feature importance metrics computed. Feature importance varied across models, but price, color, pattern, and size consistently emerged as influential predictors (Figure 2).

4.2. Feature Aggregation
To streamline the analysis, the features color, pattern, size, and price were combined to create a customer profile. Colors were grouped into “warm,” “cool,” and “neutral” categories; patterns were categorised as abstract, geometric, texture, or solid; sizes were grouped into S, M, L, XL, XXL, and 3XL; and prices were grouped into ranges of INR 500 (see Table 3 for details). This gives each customer ID its own profile based on these features:
Customer ID: [color group] + [pattern group] + [size group] + [price range]
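The profile template above can be sketched as a function that maps raw garment attributes to their groups and joins them into a single key. The warm and neutral colors follow Section 3.2.2 and the INR 500 price bands follow the text; the cool colors, the pattern groups, the size cut-offs, and the names `COLOR_GROUP`, `PATTERN_GROUP`, `SIZE_GROUP`, `price_band`, and `build_profile` are hypothetical choices made for illustration.

```python
# Hypothetical grouping tables. The warm and neutral colors follow Section 3.2.2;
# the cool colors, pattern groups and size cut-offs are illustrative assumptions.
COLOR_GROUP = {
    "beige": "warm", "pink": "warm",
    "black": "neutral", "white": "neutral", "grey": "neutral",
    "green": "cool", "lilac": "cool",
}
PATTERN_GROUP = {
    "solid": "solid", "texture": "texture",
    "checks": "geometric", "stripes": "geometric", "dots": "geometric",
    "print": "abstract", "graphic": "abstract", "patterned": "abstract",
}
SIZE_GROUP = {36: "S", 38: "M", 39: "M", 40: "L", 42: "XL", 44: "XXL", 46: "3XL", 48: "3XL"}

def price_band(price, width=500):
    """Bucket a price (INR) into a band of `width`, e.g. 2299 -> '2000-2499'."""
    low = (price // width) * width
    return f"{low}-{low + width - 1}"

def build_profile(color, pattern, size, price):
    """Join the grouped features into one key: color + pattern + size + price band."""
    return "+".join([
        COLOR_GROUP[color.lower()],
        PATTERN_GROUP[pattern.lower()],
        SIZE_GROUP[size],
        price_band(price),
    ])

# Second garment of customer 1 in Table 2.
print(build_profile("Grey", "Solid", 40, 3499))  # neutral+solid+L+3000-3499
```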
4.3. Clustering
To determine the best K value, the silhouette score and inertia were computed for K values from 2 to 10 (Figure 3). Both evaluation methods indicate that K = 6 is optimal for forming clusters.

K-means clustering was then applied with K = 6. The algorithm groups similar customer preferences based on the specified features, segmenting the dataset into six distinct clusters (Figure 4). The clustering provides a systematic way to segment customers by their size, price, color, and pattern preferences (the features identified as important in Figure 2).

In cluster analysis, each cluster is defined by a centroid representing the mean feature values of the data points within it. The six clusters reveal distinct customer preferences for size, color, pattern, and price; their characteristics are summarised in Table 4.
Cluster | Characteristics |
---|---|
0 | Large-sized, higher-priced garments in warm or bright tones with abstract or intricate patterns |
1 | Average-sized, lower-priced garments in cool tones with geometric or simple patterns |
2 | Similar to Cluster 0 in size and price, with warm tones and abstract or textured patterns |
3 | Larger-sized, moderately priced garments in warm tones with intricate patterns |
4 | Larger-sized, higher-priced garments in cool or neutral tones with geometric or solid patterns |
5 | Similar to Cluster 1 in size and price, with cool tones and abstract or textured patterns |
4.4. Assigning Weights to the Consumer Profile
The weight of each cluster is computed as the number of garments taken into the fitting room that were assigned to that cluster, expressed as a percentage of the total number of garments (Table 5).
Cluster | No. of garments taken into the fitting room | Percentage (%) |
---|---|---|
0 | 722 | 11.04 |
1 | 1561 | 23.87 |
2 | 1584 | 24.22 |
3 | 953 | 14.57 |
4 | 721 | 11.03 |
5 | 998 | 15.26 |
Clusters 1 and 2 account for the largest proportions of garments, 23.87% and 24.22% of the total, respectively. Clusters 0 and 4 represent the smallest proportions, 11.04% and 11.03%, respectively, and Clusters 3 and 5 have intermediate proportions of 14.57% and 15.26%.
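The percentages in Table 5 can be reproduced directly from the garment counts. The counts sum to 6539, and dividing each by that total reproduces the reported percentages to two decimal places; the function name `cluster_weights` is hypothetical.

```python
# Garment counts per cluster, copied from Table 5.
counts = {0: 722, 1: 1561, 2: 1584, 3: 953, 4: 721, 5: 998}

def cluster_weights(counts):
    """Percentage share of each cluster in the total count, rounded to two decimals."""
    total = sum(counts.values())
    return {k: round(100 * n / total, 2) for k, n in counts.items()}

weights = cluster_weights(counts)
print(weights)
```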
4.5. New Product Classification Using the Decision Tree
To predict demand for new products based on customer preferences, a sample of four garments (listed in Table 6) was classified using a decision tree model. First, the model was trained on features such as size, color, price, and pattern from the fitting room dataset, together with their corresponding cluster labels, so that it learned the relationships between feature values and cluster labels. Second, the new garment details were preprocessed in the same way as the fitting room dataset, using the feature grouping shown in Table 3.
Garment No. | Size | Color | Price | Pattern |
---|---|---|---|---|
1 | 39 | Light blue | 1799 | Floral |
2 | 40 | Olive | 2600 | Herringbone |
3 | 42 | Red | 1485 | Buffalo |
4 | 40 | Navy | 1889 | Checks |
The model predicts the clusters for the new data points (Garment No. 1, 2, 3, and 4) by passing their feature values (size, color, price, and pattern) through the trained decision tree. It uses these feature values to navigate its nodes and eventually reaches a leaf node corresponding to a predicted cluster label. The predicted clusters for the four new garments are given in Table 7.
Garment No. | Size (encoded) | Color (encoded) | Price (scaled) | Pattern (encoded) | Predicted cluster | Predicted share of sales (%) |
---|---|---|---|---|---|---|
1 | 0 | 0 | −0.226150 | 0 | 0 | 11.04 |
2 | 0 | 1 | 0.805698 | 2 | 5 | 15.26 |
3 | 0 | 3 | −0.630645 | 1 | 2 | 24.22 |
4 | 0 | 0 | −0.110212 | 1 | 0 | 11.04 |
- (i) Garment No. 1 and Garment No. 4, both with a cool color (light blue and navy), are assigned to Cluster 1, indicating similarity in these attributes.
- (ii) Similarly, Garment No. 3, with a warm color (red), is assigned to Cluster 2, which may contain garments with similar characteristics. Price and pattern attributes also play a significant role in the assignment.
- (iii) Garment No. 2, with a high price and a textured pattern (herringbone), is assigned to Cluster 4, possibly indicating a cluster of premium or luxury garments with similar attributes.
The last column of Table 7 shows the predicted percentage of sales for each new garment, calculated from the cluster weights in Table 5. On this basis, Garment No. 3 is expected to be preferred over the other three garments. This information allows retailers to plan inventory around customer preferences and make informed stocking decisions.
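Turning predicted clusters into predicted sales shares is a lookup into the cluster weights followed by a sort. The sketch below uses the cluster labels as reported in Table 7 and the weights from Table 5; the function `rank_garments` is a hypothetical helper.

```python
# Cluster weights (%) from Table 5.
CLUSTER_WEIGHT = {0: 11.04, 1: 23.87, 2: 24.22, 3: 14.57, 4: 11.03, 5: 15.26}
# Predicted cluster for each new garment, as reported in Table 7.
predicted_cluster = {1: 0, 2: 5, 3: 2, 4: 0}

def rank_garments(predicted_cluster, weights):
    """Attach each garment's cluster weight and sort from highest to lowest share."""
    scored = [(garment, cluster, weights[cluster]) for garment, cluster in predicted_cluster.items()]
    # sorted() is stable, so garments with equal shares keep their original order.
    return sorted(scored, key=lambda item: item[2], reverse=True)

for garment, cluster, share in rank_garments(predicted_cluster, CLUSTER_WEIGHT):
    print(f"Garment {garment}: cluster {cluster}, predicted share {share}%")
```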
5. Discussion
This research presents a new approach to demand forecasting for new products in the fashion industry, which is characterised by changing trends and customer preferences. By combining machine learning methods such as K-means clustering and decision tree classification, the proposed model shows potential for demand forecasting.
The important features influencing customers’ buying preferences were identified from the fitting room data analysed in this study. Three machine learning models (decision tree, random forest, and Lasso regression) were used to identify these features, and price, color, pattern, and size were found to be significant. After feature selection, K-means clustering was applied to the dataset and identified six customer segments based on preferences for size, price, color, and pattern. The clusters show that consumer taste is highly diverse: some segments preferred large sizes with abstract or intricate patterns and warm or bright colors, while others preferred average-sized, lower-priced garments with simple geometric designs in cool colors. The research also used weights, calculated as the share of garments entering the fitting room, to assess how strongly each customer segment interacted with each type of garment. According to these weights, garments in Clusters 1 and 2 attracted the most interest, indicating a stronger customer inclination towards these profiles.
Finally, following the framework, the decision tree model was used to classify new products by customer preference and predict their demand. This involved training the model on key garment features, with the cluster labels obtained from the fitting room dataset as targets, and then assigning new garments to clusters based on their attributes.
Our findings suggest that machine learning can support demand forecasting in fashion retail and can inform inventory decisions better than the personal judgement and historical sales data traditionally used. This study indicates that machine learning can improve inventory management by providing a more customer-centric and adaptive approach.
6. Conclusions
Accurate demand forecasting is essential for the fashion industry to optimise profits and create a sustainable environment. Customer preference is the most significant driver of sales in fashion retail. This study used data from fitting rooms to gather information on customer preferences and analysed them with machine learning models. Because the dataset is large, features were grouped to generate customer profiles. K-means clustering was then applied to group similar profiles, with the silhouette score and the elbow method used to determine the optimal number of clusters; both methods indicated that K = 6 was optimal. The six clusters were then ranked by the number of profiles in each cluster to determine the most popular men’s formal shirts in the store. After weights were assigned, a decision tree was used to classify a new garment into one of the predefined clusters based on fitting room information from the previous season. This approach helps retailers predict customer preferences and stock the most sought-after products, which can prevent excess unsold stock, maximise revenue, and improve inventory management.
The proposed model has some limitations that could be addressed in future research. Because the study was limited to a single store, the same method could be applied across multiple store locations to understand customer preferences in different regions. Other channels, such as online stores and mobile apps, could also be included so that demand forecasts draw on a wider range of customer interaction points. Future research could also focus on sustainable and environmentally friendly product choices and forecast their demand.
Deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) could also be applied to better handle unstructured data, such as the text and images widely used in the fashion sector. These models could offer significant insights into consumer preferences for features such as style, pattern, and trend. External factors such as economic conditions and social media trends also influence fashion demand; incorporating real-time data from online fashion platforms and social media could capture these trends continuously and yield more accurate demand predictions.
The efficiency and precision of data collection could be enhanced by using technology such as RFID to automate it. In addition, the predictive performance of the proposed model has not yet been evaluated; this could be done by training and testing the model on data from two consecutive seasons. Addressing these limitations and exploring further research areas could lead to a more accurate demand forecasting model that helps retailers reduce unsold inventory and improve profitability.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Open Research
Data Availability
Research data cannot be shared.