[Retracted] Analysis of Emotional Color Representation in Oil Painting Based on Deep Learning Model Evaluation
Abstract
An oil painting is rich in emotion, and color is the main way in which that emotion is expressed. In art, color is an objective phenomenon whose richness and brilliance people can directly perceive, yet different people respond with different emotions to the same picture. Color itself carries no emotion; rather, people experience different psychological feelings when they look at different colors. This is the power of color, and it makes color a carrier through which painters convey emotion. Studying emotional color expression therefore helps viewers understand the emotion an oil painting conveys. Color holds an important and irreplaceable position in oil painting creation, where color expression is combined with emotional expression: creators convey their experiences and feelings in the form of color, which plays a unique role in portraying the image of a work, expressing emotion, and creating atmosphere. In this paper, we examine the relationship between color and emotion, the emotional expression of color, and the expression of emotion in painting, and we analyze how color is used to express emotion in oil painting.
1. Introduction
Colors have different connotations in different fields, and the definition of color changes with its use [1]. In art, color is an objective phenomenon whose richness and splendor people can directly perceive. Color is also one of the most visually expressive elements of painting, and its role cannot be replaced by other languages and symbols. Color is the vitality of painting: it plays a very important role in conveying meaning and creating appeal, and it reflects the artist’s emotional concept and subject concept [2].
In oil painting creation, color is an indispensable factor. It is a means by which creators express their emotional tendencies and a symbol of their emotion; it directly conveys the creator’s personal feelings and directly affects the creation. Different people will have different emotions when facing the same picture. Color itself is not emotional, but people experience different psychological feelings when looking at different colors. For example, people feel melancholy and sadness when they see blue, warmth and festivity when they see red, and vitality when they see green. These feelings, which people attach to color, are the emotions that color conveys. Color can make paintings more appealing. The development of painting from the earliest sketches to today’s colorful forms illustrates the important role of color in emotional expression [3].
In the eyes of painters, color is not only the color of objective things but also a form of emotional expression shaped by their own life experience and aesthetics [4]. A good oil painting is not only mature in its use of color but must also be the product of the painter’s true feelings, so that the painter’s emotion and the color of the painting are unified and blended to a certain extent. Only when painters accurately express their emotions through color can the viewer fully understand the emotions of the painting and the painter [5]. However an oil painting is presented, its fundamental starting point is emotion. Treated as an objective thing, color has no emotion; it is only a physical state of existence. When oil paintings are appreciated with subjective emotion, however, their colors show a variety of emotions. For example, distorted brushwork and irritating colors give people a strong artistic impact, whereas plain colors make the whole picture appear smooth and stable. In short, faced with different colors, people have not only complex visual experiences and emotional associations but also various emotions, and this is the emotional expression of oil painting colors [6].
In this paper, we examine the relationship between color and emotion, the emotional expression of color, and the expression of emotion in painting [7–9].
Emotions are produced by individuals in actual activities and are subjective reflections of the objective external environment [10]. This subjective reflection takes various forms, such as the emotions that humans often hold. In the process of oil painting creation, creators express their personal feelings by controlling rich colors. Color can be said to be the language through which the creator conveys emotion in oil painting [11]. For example, Van Gogh’s work “The Yellow House” (Figure 1) is an oil painting with strong color contrast, in which the artist conveys his feelings and experiences in the form of colors.

2. Related Work
Color is a natural property of all things in the world. The physical properties of color itself do not carry any thoughts or emotions; people inject subjective thoughts and emotions into colors and extend them with deeper meanings [12]. In the process of oil painting creation, the author expresses his emotions through the reorganization and construction of natural objects and colors, which is an important means of aesthetic expression. In oil painting creation, color is both a means of visual communication and a means of expressing human emotions [13]. The world of human life is colorful, and oil painting creators present real life through color to express the unique charm of art [14]. Purity is a form of emotional expression; it is a painting attitude and a noble spirit.
These light and dark shades change under the influence of the light source. Therefore, when shaping objective things, the author should use different color shades according to the structure of those things. This physical property of color is generally used when shaping the highlights of characters or objects [15, 16]. No color in the picture exists in isolation: together the colors constitute the environmental color of the picture, which is the basis of its unity and coordination [17].
In color reconciliation, the brightness and purity of the colors should be kept in a stable state, which reduces color contrast so that the colors show an orderly emotional color performance [18]. Color harmony requires a certain skill [19]. In oil painting creation, creators should study the application of color, understand the relationships between colors, and make the purity and the light and shade of a color system harmonious and orderly [20]. In this way, the hue, purity, brightness, and so on presented in the picture will differ somewhat, and these differences clearly separate the levels of the picture, making it more spirited and vivid. If the colors are moderately reconciled, the picture presents a harmonious and unified visual effect, which is called color reconciliation. For example, when painting the skirt in a figure painting, the author uses a brighter red for rendering, because red has a warm visual effect and contrasts relatively strongly with the surrounding environment [21].
Emotions are the feelings produced by the interaction between life phenomena and people’s hearts. The term refers not only to people’s emotions but also to all human sensory, physiological, psychological, and spiritual feelings [22]. Whatever the form of painting, there is an emotional element in its creative process. Through rich emotional expression, the creator makes works of art present a variety of colors. Color is one of the most basic elements in the field of painting, and it is also one of the most important means of expression in oil painting creation. Through the emotional expression of color in the painting, the appreciator can achieve spiritual communication and a collision of ideas with the creator, and thus obtain an emotional experience and emotional resonance. Color therefore becomes the tie and bridge of emotional communication among the creator, the appreciator, and the picture, and the medium that embodies the artist’s soul [23].
Matisse said, “The main purpose of color should be to serve expression as much as possible” [24]. The creative process of art is not only a process of emotional catharsis but also a process of emotional enhancement and sublimation; that is, creators express their emotions in various painting languages and convey them to appreciators through those languages. Throughout the history of world art, the works into which artists poured all their emotions have always been remembered by people [25].
3. Methods
At present, deep learning (DL) is widely used in various fields because of its excellent performance and model structures, and it is increasingly common to use deep learning algorithms for the analysis of artistic emotional color. The training process of DL is shown in Figure 2.

In the analysis of emotional color expression in oil painting, a viewer can easily say which of two paintings has the higher quality of emotional color expression, but it is difficult to give a good reason to support that judgment. From the perspective of machine learning, this is the core difficulty: although the characteristics of emotional color expression are hard to generalize, there is a human consensus on emotional color expression. As shown in Figure 3, it is easy to recognize that the art painting in Figure 3(a) has higher emotional color performance quality; the reasons can be summarized as its use of rule-of-thirds composition, higher color contrast, higher definition, and so on. Although these emotional color performance characteristics are subjective, their commonality cannot be denied. Therefore, computing emotional color performance from the visual stimuli of art paintings has a sound basis. In computer vision, evaluating the emotional color performance of art paintings means using the computing power of the computer to imitate the human brain in scoring that performance, so as to predict the human response to the visual stimuli of art paintings. It can even replace the human brain in batch evaluation of the emotional color performance quality of oil paintings.


Similarly, human responses to the visual emotional attitudes of different art paintings are also highly subjective, but their subjectivity is more limited by the content of the art painting. As shown in Figure 4, any artwork containing a “smiley face” (visual content) tends to be more likely to be identified as having a “happy” or “satisfied” emotional attitude. As can be seen, the same or similar visual content will point to the same emotional stimulus. Therefore, it is possible to identify emotional information based on the visual content of different art paintings, which provides an opportunity for computer scientists to study the recognition of emotional information in art paintings. Analogously to the evaluation of the emotional color representation of art paintings in the computer domain, the recognition of emotional information of art paintings in the computer domain is based on the relationship between the pixel values of art paintings with different emotional attitude responses.

Figure 5 shows the overall architecture of the network in this paper; the basic architecture of each channel is ResNet50. This section first introduces the design details and formulation of the feature fusion unit, the core component of the network, and shows that this unit does not affect the forward and backward propagation of the overall network. The multiscale local and global mean pooling operations employed in the network are detailed next. Then, the branch design of the proposed multitask network (AENet) for emotional color representation of oil paintings is introduced, including how the two task-specific data streams (aesthetic stream and emotional stream) extract their own features and how the shared stream extracts knowledge representations common to both tasks. Finally, the adopted loss function and trade-off strategy are introduced.

3.1. Construction of the Dataset
This section introduces the images with tags of aesthetics and emotion (IAE) dataset for multitask analysis of image emotional color representation and emotion prediction. The dataset is distinctive in that it is collected from internet image sources (Flickr and Instagram), which provide images from a wide variety of real-world scenarios; this makes the distribution of the dataset closer to real life and benefits models trained on it. Images containing emotional information were collected by querying an image search engine using eight emotions as keywords. Volunteers on a crowdsourcing website then scored the collected images, initially yielding a large-scale dataset with “weak emotional markers.” This information was then used to screen more reliable “experts” (raters whose scores have higher confidence), and a subset of the “weak sentiment marker” dataset was obtained from the scores of these “experts.” Finally, through prescreening based on expert votes, images with strongly labeled sentiment information were selected, giving a total of about 23,000+ strongly labeled images with eight emotion labels. All emotion categories have about 1100+ images, and the categories are well balanced, which avoids class-imbalance problems in subsequent model training.
At the beginning of scoring, the authors constructed a database subset of 800 images (100 images for each emotion category) for about 20 registered users to score preliminarily, calculated the Kullback-Leibler divergence of the score distributions between every pair of users, and then filtered out the users with larger deviations in their emotional color performance scores. Selecting raters in this way improves the confidence of the labels. Finally, 10 “experts” (half male and half female) were selected to label the quality of emotional color representation of each image. At the back end of the scoring website, the four emotional color performance quality labels are assigned scores of 10, 7, 4, and 1, from high to low. The final emotional color performance quality of each image is a combination of the user scores, that is, an emotional color performance quality evaluation label on a scale of 0 to 100 points.
The total number of images after further screening is 22098. The resulting score distribution is shown in Figure 6(a), where the horizontal axis marks the emotional color performance quality score together with the mean score. As expected, the summed scores are approximately normally distributed around the midpoint of the rating scale, so the distribution of emotional color representation labels has good discrimination and confidence. In this paper, the labeled dataset is divided by setting an intermediate threshold and then grouped by emotional color performance quality within each emotion. The results for all 22098 images are shown in Figure 6(b): there are 12647 images with high-quality emotional color representation and 9451 images with low-quality emotional color representation. The horizontal axis marks the categories of emotional color expression and emotion, and the vertical axis marks the frequency of each category. The histogram shows that the more positive the image emotion, the higher the quality of the image’s emotional color performance. This indicates a certain relationship between image emotion and emotional color performance, which supports the earlier conjecture of this paper.
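The rater-screening step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: each user's ratings are summarized as a smoothed histogram over the four quality scores, a symmetrized Kullback-Leibler divergence is used, and a user is dropped when the median divergence from the other users exceeds a threshold. The smoothing constant, the symmetrization, the use of the median, and the threshold value are all assumptions, since the paper does not specify them.

```python
import math
import statistics
from collections import Counter

LABELS = (10, 7, 4, 1)  # the four quality scores used on the scoring website

def score_distribution(scores, eps=1e-6):
    """Smoothed histogram of one user's scores over LABELS (eps is an assumed constant)."""
    counts = Counter(scores)
    total = len(scores) + eps * len(LABELS)
    return [(counts[label] + eps) / total for label in LABELS]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Average of D(p || q) and D(q || p); the symmetrization is an assumption."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def filter_raters(user_scores, threshold):
    """Keep users whose median symmetric KL to the other users is at most threshold."""
    dists = {user: score_distribution(s) for user, s in user_scores.items()}
    kept = []
    for user, p in dists.items():
        others = [symmetric_kl(p, q) for other, q in dists.items() if other != user]
        if statistics.median(others) <= threshold:
            kept.append(user)
    return kept

# Toy ratings, invented for illustration only.
user_scores = {
    "user_a": [10, 7, 7, 4, 7, 4, 10, 7],
    "user_b": [7, 7, 10, 4, 7, 4, 7, 7],
    "user_c": [1, 1, 1, 1, 1, 1, 1, 1],  # rates everything lowest
    "user_d": [10, 7, 4, 4, 7, 7, 4, 7],
}
kept_users = filter_raters(user_scores, threshold=1.0)
```

With these toy ratings, the three users with similar score distributions are kept and the user who assigns the lowest score to every image is filtered out.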
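The labeling and thresholding steps can be illustrated with a short sketch. It assumes that the "combination of user scores" is a plain sum of the ten expert scores and that the intermediate threshold is 55; neither the combination rule nor the threshold value is stated in the paper, so both are hypothetical. Under this assumption the summed score of ten ratings lies between 10 and 100.

```python
ALLOWED_SCORES = {10, 7, 4, 1}  # scores attached to the four quality labels

def aggregate_score(expert_scores):
    """Sum the expert scores for one image (assumed combination rule)."""
    if any(score not in ALLOWED_SCORES for score in expert_scores):
        raise ValueError("each score must be one of 10, 7, 4, or 1")
    return sum(expert_scores)

def quality_class(total, threshold=55):
    """Binarize a summed score; the threshold of 55 is a hypothetical midpoint."""
    return "high" if total >= threshold else "low"

# Toy expert ratings for two images, invented for illustration only.
image_a = [10, 7, 7, 10, 7, 4, 7, 10, 7, 7]
image_b = [4, 1, 4, 4, 7, 1, 4, 4, 1, 4]
label_a = quality_class(aggregate_score(image_a))
label_b = quality_class(aggregate_score(image_b))
```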


The large-scale emotional color representation dataset (IAE) annotated in this paper has a total of eight emotion categories (entertainment, anger, awe, satisfaction, disgust, excitement, fear, and sadness), all of which carry strong confidence labels. The images are further classified into 2 classes (high quality and low quality) by setting a reasonable median threshold. In all subsequent comparative experiments, unless otherwise specified, the 22098 images of the IAE dataset were randomly divided into a training set (70%, 15466 images), a test set (20%, 4417 images), and a validation set (10%, 2215 images). As training iterates, the 10% validation set is gradually merged into the training set to learn the final multitask model.
Equations (2) and (3) formally describe the forward and backward computations of the fusion layer unit in the neural network framework. It follows that a neural network with embedded fusion layers can still be learned end-to-end and is not disrupted by the fusion layer designed in this paper. In addition, the four introduced parameters αs, αp, βs, βp, which control the degree of sharing, are learned automatically as the network iterates. Because the high-level perception of images cannot be separated from their low-level features, the learning process needs to combine information from different feature levels to improve task performance. Therefore, in the network architecture of this paper, the local pooling outputs of different layers are concatenated with the global mean pooling of the last layer, which maximizes the reuse of features.
As shown in Figure 7, this paper uses the feature maps of the last three stages of ResNet50, whose scales are 28 × 28, 14 × 14, and 7 × 7. This paper avoids adding extra convolutional layers, because they would introduce redundant parameters that cannot be transferred from pretraining and would affect the learning of the network. Therefore, local mean pooling and global mean pooling are applied directly to the different feature layers, and the results are combined. ResNet changes the spatial dimension of its feature map after each conv block: the spatial dimension of each earlier stage is 2 times that of the following stage, and the feature map of the last stage is 7 × 7. This paper therefore uses a sliding window of size 7 × 7 with a stride of 7 for local mean pooling in the middle and low layers. Finally, the pooled feature maps are reshaped into vectors of size 1 × 1 × ∗ and directly stacked with the vector of the last layer, giving the final aggregated feature vector.
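The random partition can be reproduced in outline as follows. The sketch takes the three subset sizes reported above as given; the shuffling procedure and the seed are assumptions, since the paper does not describe how the random division was implemented.

```python
import random

def split_by_sizes(n_items, sizes, seed=0):
    """Shuffle item indices and cut them into consecutive, disjoint subsets."""
    if sum(sizes) != n_items:
        raise ValueError("subset sizes must add up to the dataset size")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed (assumed) for repeatability
    subsets, start = [], 0
    for size in sizes:
        subsets.append(indices[start:start + size])
        start += size
    return subsets

# Training, test, and validation sizes as reported for the IAE dataset.
train_idx, test_idx, val_idx = split_by_sizes(22098, (15466, 4417, 2215))
```

Passing the sizes explicitly avoids rounding differences that would arise from multiplying 22098 by 0.7, 0.2, and 0.1.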
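Because the fusion layer equations are not reproduced here, the following sketch shows one plausible form of such a unit, assuming it is a linear combination of two input feature maps weighted by the four scalars (similar in spirit to a cross-stitch unit). The exact pairing of αs, αp, βs, βp with inputs and outputs is an assumption, as are the function names; feature maps are flattened to plain lists for simplicity.

```python
def fusion_forward(x_s, x_p, params):
    """Linearly mix two same-sized feature vectors (assumed form of the unit)."""
    a_s, a_p, b_s, b_p = params
    y_s = [a_s * s + a_p * p for s, p in zip(x_s, x_p)]
    y_p = [b_s * s + b_p * p for s, p in zip(x_s, x_p)]
    return y_s, y_p

def fusion_backward(x_s, x_p, params, g_s, g_p):
    """Chain rule for the mix: gradients w.r.t. both inputs and the four scalars.

    g_s and g_p are the loss gradients w.r.t. the outputs y_s and y_p.
    """
    a_s, a_p, b_s, b_p = params
    grad_x_s = [a_s * gs + b_s * gp for gs, gp in zip(g_s, g_p)]
    grad_x_p = [a_p * gs + b_p * gp for gs, gp in zip(g_s, g_p)]
    grad_params = (
        sum(gs * s for gs, s in zip(g_s, x_s)),  # dL/d(alpha_s)
        sum(gs * p for gs, p in zip(g_s, x_p)),  # dL/d(alpha_p)
        sum(gp * s for gp, s in zip(g_p, x_s)),  # dL/d(beta_s)
        sum(gp * p for gp, p in zip(g_p, x_p)),  # dL/d(beta_p)
    )
    return grad_x_s, grad_x_p, grad_params

# Toy inputs, invented for illustration only.
x_s, x_p = [1.0, 2.0], [3.0, 4.0]
params = (0.9, 0.1, 0.2, 0.8)
y_s, y_p = fusion_forward(x_s, x_p, params)
grad_x_s, grad_x_p, grad_params = fusion_backward(x_s, x_p, params, [1.0, 1.0], [1.0, 1.0])
```

Since both passes are linear and the outputs keep the size of the inputs, a unit of this form can sit between network stages without breaking end-to-end gradient flow, which is the property the text claims for the fusion layer.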
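The multiscale pooling described above can be sketched for a single channel as follows. This is a simplified illustration: real feature maps have many channels, the pooling is applied per channel, and the helper names are hypothetical. With a 7 × 7 window and a stride of 7, a 28 × 28 map yields 4 × 4 values, a 14 × 14 map yields 2 × 2 values, and a 7 × 7 map yields a single value, which coincides with global mean pooling.

```python
def local_mean_pool(feature_map, window=7, stride=7):
    """Mean-pool a 2D map (list of rows) with a square sliding window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - window + 1, stride):
        row = []
        for left in range(0, width - window + 1, stride):
            total = sum(
                feature_map[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            )
            row.append(total / (window * window))
        pooled.append(row)
    return pooled

def multiscale_vector(feature_maps):
    """Pool each map, flatten the results, and concatenate them into one vector."""
    vector = []
    for feature_map in feature_maps:
        for row in local_mean_pool(feature_map):
            vector.extend(row)
    return vector

def constant_map(size, value):
    """Toy single-channel feature map filled with one value."""
    return [[value] * size for _ in range(size)]

# One channel from each of the three stages (28 x 28, 14 x 14, 7 x 7).
stage_maps = [constant_map(28, 1.0), constant_map(14, 1.0), constant_map(7, 1.0)]
features = multiscale_vector(stage_maps)  # 16 + 4 + 1 values per channel
```

No learnable parameters are involved, which is consistent with the stated goal of avoiding additional convolutional layers.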

As Figure 7 shows, the proposed network architecture for multitask analysis of emotional color representation in oil painting consists of two main parts: three network branches (the emotional color representation branch, the shared branch, and the emotion branch) and two task loss functions. In the last fully connected layer of the network, the model stacks the shared features with the feature vectors of the respective tasks to predict both the emotional color representation quality and the emotional information.
Next, the three network branches are introduced from top to bottom as shown in Figure 7. The emotional color representation perception branch extracts the emotional color representation features directly related to the input image. The basic network here is ResNet50, which is first fully pretrained on the ImageNet dataset; the pretrained weights are then transferred and further fine-tuned on the emotional color representation training set.
The shared perception branch extracts the feature representations shared between emotional color representation and emotion; this branch integrates the perception of both through the learning and iterative computation of the fusion layers. Because the training sets of both related tasks pass through this branch, it is expected, over the learning iterations, to ignore the task-specific features of the two tasks and to extract only the features they share. These shared features can be used by both single tasks, emotional color representation and emotion, at the same time, thereby improving learning performance. This branch is also based on ResNet50, but it is trained only on ImageNet before being transferred to the multitask analysis network. This ensures that the three branches start from different initial weights and therefore extract features that differ to some extent, and this diversity benefits the multitask analysis. For a given input image Xi, this branch produces the final shared feature vector.
The emotion-aware branch extracts the emotion-specific features. Its network settings are analogous to those of the emotional color representation branch, except that fine-tuning is performed on the emotion dataset. For a given input image Xi, this branch produces the final emotion feature vector.
In the loss function, y denotes the ground-truth label of the classification category, which is compared against the predicted label, and the hyperparameter λ controls the relative importance of the two tasks. The setting of λ changes the magnitude of the updates backpropagated through each network branch.
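Since the loss equation itself is not reproduced here, the following sketch shows one common way to combine two classification losses with a trade-off weight. It assumes softmax cross-entropy for each task and a total loss of the form quality loss + λ × emotion loss; which task λ multiplies and the exact form of the loss are assumptions, not details confirmed by the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[true_index])

def multitask_loss(quality_logits, quality_label, emotion_logits, emotion_label, lam):
    """Assumed combined loss: L = L_quality + lam * L_emotion."""
    quality_term = cross_entropy(quality_logits, quality_label)
    emotion_term = cross_entropy(emotion_logits, emotion_label)
    return quality_term + lam * emotion_term

# Toy logits for one image: 2 quality classes and 8 emotion classes.
loss = multitask_loss([0.0, 0.0], 0, [0.0] * 8, 3, lam=4.0)
```

A larger λ increases the gradient contribution of the weighted task, which is one way to read the statement that λ changes the magnitude of the values backpropagated through the branches.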
4. Case Study
Paintings contain the rich emotional factors of the artist, and different colors bring different psychological feelings and aesthetic experiences to the appreciator. In modern and contemporary times, which emphasize the development of individuality, oil painting, as an art form that can express human thoughts and feelings, has had a broad and profound impact on people’s lives. Without color, the art of painting lacks direct emotional expression and cannot let viewers appreciate oil paintings full of artistic charm. Take Van Gogh as an example. His painting career lasted only 10 years, and many of his works used strong colors to express his feelings for the objective world. “I’m looking more and more for a simple technique, which may not be an impressionist technique. I hope to draw at least so that all people with two eyes can understand at a glance.” With this creative concept in mind, the sunflowers, olive trees, and golden wheat fields (Figure 8) in his works all seem to have souls. Personal emotions made Van Gogh’s brushstrokes change constantly, appearing repressed, grief-stricken, or calm. Van Gogh held himself to strict standards all his life and always pursued higher artistic ideals.

Comparison with single-task models demonstrates the rationality and effectiveness of the multitask idea, and comparison with other multitask models demonstrates the innovation and advantages of the AENet model proposed in this paper.
4.1. Single-Task Comparative Experiments
These models serve as single-task learning baselines without multitask assistance. First, this paper uses the basic network, ResNet50, as a comparison model, labeled ResNet50. Second, this paper considers a ResNet50 with increased width, known as a wide residual network (WRN). Since the hyperparameter k introduced in the WRN is set to 2 in this paper, it is labeled WRN (k = 2). To rule out the possibility that performance improves simply because the number of parameters increases, this paper also adds a deeper network, ResNet101, which has about twice as many parameters as ResNet50 (the multitask analysis framework has, on average, 1.5 times as many parameters as ResNet50); it is labeled ResNet101.
4.2. Learning and Scaling
The fusion layer unit is composed of four learnable parameters αs, αp, βs, βp combined through a linear connection. Because these parameters are initialized in the range [0, 1], their scale is clearly mismatched with that of the other parameters in the network. Specifically, the introduced learnable parameters are 2 to 3 orders of magnitude larger than the convolution kernel parameters in the network, whereas the gradient update backpropagated through the fusion layer is on the order of 10^-4 to 10^-3, which is far too small to update them effectively. As a result, the changes in the four introduced parameters are essentially negligible. Therefore, this study scales the learning rate of the network data flow when it passes through the fusion layer unit. In practice, this study found that setting the overall initial learning rate of the network to between 10^-5 and 10^-4 and the scaling ratio of the fusion layer to 10^2 enables the network to converge faster and ultimately achieve the best learning performance. Similarly, the initial value of the hyperparameter λ needs to be set manually, and different initial values shift the focus of learning between the tasks. The initial value set in this paper is 4.0.
This paper next describes how the fusion layers are inserted. The five stages of ResNet50 are split apart in the network architecture, and the output feature maps of each stage are used in turn as the inputs of the fusion layers designed above (each fusion layer takes two inputs). The data flow passes through the linear combination unit, which then outputs the feature maps. The fusion layer does not change the feature map size after being inserted into ResNet50, so the following stage can be connected directly to its output. In addition, this paper applies L2-norm regularization to the fully connected layer at the back end of the ResNet, with the regularization hyperparameter set to 0.01, which helps prevent overfitting. Note that the L1 norm reduces the stability of the model and causes oscillations during training, resulting in poor learning performance, so it should be avoided. The learnable parameters introduced in the fusion layer are also regularized by the L2 norm, because their fluctuation needs to be kept within the range of 0 to 1.
The input size of the model is 224 × 224 × 3, so each input image needs to be resized. The batch size is set to 16 or 32; a smaller batch size degrades learning performance. In this experiment, emotion is an 8-category task and emotional color performance is a 2-category task, so the outputs of the model are an 8-dimensional vector and a 2-dimensional vector, respectively, each of which represents the predicted probability of every category after softmax normalization. The activation function is the same throughout the network, namely ReLU. In this experiment, improved variants such as Leaky ReLU brought no obvious improvement, so plain ReLU is used to keep the model simple, speed up training, and reduce the dependence on computing power.
Table 1 shows the performance of different models and methods on the benchmark dataset constructed in this study, where emotional accuracy and emotional color performance accuracy denote the test set accuracy of the two tasks; the method in this paper obtains the highest accuracy on both. Compared with the single-task ResNet50, increasing the width or depth of ResNet does not significantly improve the performance on either single task. The reason is that increasing the depth or width of the residual network reduces the efficiency of feature reuse and slows down training, so the performance is only slightly better than that of the basic ResNet50 network. Compared with the single-task models alone, the method proposed in this paper achieves performance improvements of about 4% and 5% in the accuracy of emotional color representation analysis (2 categories) and emotional information recognition (8 categories) in oil paintings, respectively. For the multitask comparative experiments, this study found that it is difficult for the traditional Split-CNN to improve the learning performance of the two tasks at the same time. Among the multitask baselines, CS-Net achieves the best results because the cross-stitch unit can learn the degree of sharing across multiple tasks.
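The learning-rate scaling can be illustrated with a plain gradient descent step in which selected parameters receive a multiplied learning rate. This is a minimal sketch, assuming a base learning rate of 10^-4 and a scaling ratio of 100 for the fusion parameters; the parameter names and gradient values are invented for illustration, and the paper does not state which optimizer was used.

```python
def sgd_step(params, grads, base_lr, lr_scale):
    """One gradient descent step with an optional per-parameter learning-rate multiplier."""
    return {
        name: value - base_lr * lr_scale.get(name, 1.0) * grads[name]
        for name, value in params.items()
    }

# Hypothetical parameters: one convolution weight and one fusion-layer scalar.
params = {"conv_weight": 0.01, "alpha_s": 0.9}
grads = {"conv_weight": 1e-3, "alpha_s": 1e-3}

updated = sgd_step(
    params,
    grads,
    base_lr=1e-4,
    lr_scale={"alpha_s": 100.0},  # fusion parameters get 100x the base rate
)
```

With the same gradient, the fusion scalar moves by 10^-5 per step instead of 10^-7, which is the effect the scaling is meant to achieve. In deep learning frameworks the same idea is usually expressed by assigning the fusion parameters to a separate optimizer parameter group with its own learning rate.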
Task type | Method | Emotional accuracy (%) | Aesthetic accuracy (%) |
---|---|---|---|
Single task | ResNet50 | 61.08 | 74.86 |
Single task | WRN (k = 2) | 62.66 | 75.15 |
Single task | ResNet101 | 61.74 | 78.77 |
Multitask | Split-CNN | 60.22 | 77.78 |
Multitask | CS-Net | 63.18 | 77.71 |
Multitask | AENet-FL | 61.45 | 76.69 |
Multitask | AENet (ours) | 66.24 | 81.06 |
However, because CS-Net is limited in multilevel representation and does not provide a separate branch structure for shared features, its performance improvement only reaches around 3% and 4%. In addition, when the fusion layer is removed (AENet-FL in Table 1), the performance of the multitask analysis network on both emotional color performance and emotion drops sharply, which illustrates the effect of the fusion layer proposed in this study.
To further illustrate the model in this paper, Figure 9 shows the accuracy of the proposed method and of CS-Net in each category, visualized as confusion matrices. As the figure shows, the traditional CS-Net not only has lower overall recognition accuracy but also tends to confuse some emotions, such as “scared” and “sad,” which is a major flaw. When the emotional color representation information is introduced, this confusion is significantly reduced. Moreover, in each fine-grained category of emotional color representation and emotion, the multitask analysis network framework for emotional color representation of oil painting proposed in this paper achieves better and more stable recognition accuracy.
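Per-category accuracy of the kind visualized in a confusion matrix can be computed as in the following sketch. The labels are toy values invented for demonstration and are not results from this paper; the two class names are taken from the eight emotion categories of the dataset.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns index the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly (the diagonal share)."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]

classes = ["fear", "sadness"]
# Toy labels, invented for illustration only.
true_labels = ["fear", "fear", "fear", "sadness", "sadness", "sadness", "sadness"]
predicted_labels = ["fear", "sadness", "fear", "sadness", "sadness", "fear", "sadness"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
accuracy = per_class_accuracy(matrix)
```

Off-diagonal entries count confusions between categories, so a reduction in the fear/sadness confusion would appear as smaller off-diagonal counts in those two rows.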




5. Conclusion
In oil painting, color is one of the indispensable basic elements and one of the most important means of expression. To some extent, color is the body and soul of oil painting, giving it life and emotion. In the art of oil painting, the expression of color should be based on emotion, expressing what touches the artist’s heart so as to resonate with the viewer. Only then can the artwork be called successful. When emotional color representation information is introduced, the confusion between emotion categories is significantly reduced. More importantly, in each fine-grained category of emotional color representation and emotion, the multitask analysis network framework for emotional color representation in oil painting proposed in this paper achieves better and more stable recognition accuracy. In the process of inheriting and exploring traditional Chinese color concepts, we need to continuously interact and integrate with Western painting art concepts, continuously enrich and expand the diversified color language, and continuously create excellent artworks of the new era.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.
Open Research
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.