This manuscript is dedicated to Robert L. Taylor on the occasion of his 90th birthday and to his many contributions to the finite element method and his open source software FEAP. Congratulations, Bob: tang,,1!
Abstract
Sparse regression and feature extraction are the cornerstones of knowledge discovery from massive data. Their goal is to discover interpretable and predictive models that provide simple relationships among scientific variables. While the statistical tools for model discovery are well established in the context of linear regression, their generalization to nonlinear regression in material modeling is highly problem-specific and insufficiently understood. Here we explore the potential of neural networks for automatic model discovery and induce sparsity by a hybrid approach that combines two strategies: regularization and physical constraints. We integrate the concept of Lp regularization for subset selection with constitutive neural networks that leverage our domain knowledge in kinematics and thermodynamics. We train our networks with both, synthetic and real data, and perform several thousand discovery runs to infer common guidelines and trends: L2 regularization or ridge regression is unsuitable for model discovery; L1 regularization or lasso promotes sparsity, but induces strong bias that may aggressively change the results; only L0 regularization allows us to transparently fine-tune the trade-off between interpretability and predictability, simplicity and accuracy, and bias and variance. With these insights, we demonstrate that Lp regularized constitutive neural networks can simultaneously discover both, interpretable models and physically meaningful parameters. We anticipate that our findings will generalize to alternative discovery techniques such as sparse and symbolic regression, and to other domains such as biology, chemistry, or medicine. Our ability to automatically discover material models from data could have tremendous applications in generative material design and open new opportunities to manipulate matter, alter properties of existing materials, and discover new materials with user-defined properties.
1 MOTIVATION
The ability to discover meaningful constitutive models from data would forever change how we understand, model, and design new materials and structures. Massive advancements in data science are now bringing us closer than ever towards this goal.1, 2 Throughout the past three years, numerous research groups have begun to harness the potential of neural networks and fit constitutive models to experimental data,3-14 an approach that is now widely known as constitutive neural networks.15 While initial studies have used neural networks exclusively as black box regression operators,16 recent approaches are increasingly recognizing their potential to discover not only the model parameters, but also the model itself.17 The paradigm of automated model discovery was first formalized in the context of nonlinear dynamical systems more than a decade ago to discover Lagrangians and Hamiltonians for oscillators,18 pendula, biological processes,19 or turbulent fluid flows.20 It is now rapidly gaining popularity in the context of constitutive modeling,21 and several promising techniques have emerged to decipher constitutive relations between stresses and strains,11 and even integrate them automatically into a finite element analysis.22-24 These not only include constitutive neural networks,15 but also sparse regression,25 genetic programming in the form of symbolic regression,26 and variational system identification.27
The holy grail in automated model discovery is to identify generalizable and truly interpretable models with physically meaningful parameters.2, 28, 29 Ideally, we want to discover a concise, yet simple and interpretable model with only a few relevant terms that best explains experimental data, while remaining robust to outliers and noise. In terms of statistical learning, this translates model discovery into a subset selection or feature extraction task. Subset selection and shrinkage methods are by no means new; in fact, they have been extensively studied for many decades.30-33 In the context of linear regression, these methods have become standard textbook knowledge.34 In the context of nonlinear regression, when analytical solutions are rare, subset selection is much more nuanced, general recommendations are difficult, and feature extraction becomes highly problem-specific.35 To be clear, this limitation is not exclusively inherent to automated model discovery with constitutive neural networks—it applies to distilling scientific knowledge from data in general.19 This includes alternative model discovery approaches like sparse regression,20, 25 or symbolic regression.19, 26, 36 The key question to the success of discovering new knowledge from data is: how do we robustly discover the best interpretable model with a small subset of relevant terms? And, probably equally importantly: What is the trade-off between interpretability and prediction accuracy? To frame these questions more broadly, let us first revisit the notions of regression and neural networks in the context of constitutive modeling:
Regression. Regression is a statistical method to examine the relationship between a dependent variable, in constitutive modeling in solid mechanics the stress , and one or more independent variables, in this case the strain , using a model that depends on a set of model parameters . Here regression has two main objectives: characterizing the form and strength of the relationship between stress and strain to enable predictions, and providing insights into how stress and strain are correlated.34 Popular types of regression are logistic regression, assuming for example a binary relationship; linear regression,37 assuming a relationship that is linear in the model parameters , or nonlinear regression, assuming a relationship that is nonlinear in the model parameters , as we do throughout this manuscript. Regression is the cornerstone of statistical learning.35 It provides tools to decipher relationships within data, but its application to constitutive modeling requires attention to physical constraints including objectivity, symmetry, incompressibility, polyconvexity, or thermodynamic consistency.38-43 As a natural consequence, we cannot just use any set of functions to build our constitutive model: while polynomial functions between stresses and strains associated with a linear regression would be ideal from an optimization point of view, these models may violate thermodynamic constraints, which favor exponential or power functions associated with a nonlinear regression.
Linear regression. Linear regression37 seeks to model the relationship between a dependent variable, in our case the stress $\sigma$, and a set of one or more independent variables, in our case the strains $\varepsilon_i$ at different load levels $i = 1, \dots, n_{\rm data}$, using a function that depends linearly on the model parameter $E$, in this case the elastic modulus or Young's modulus. The regression estimates this parameter by minimizing the difference between the predicted stress, $E\,\varepsilon_i$, for given strains $\varepsilon_i$ and stiffness $E$, and the experimentally measured stresses $\hat\sigma_i$, divided by the number of data points $n_{\rm data}$. A common measure for this difference is the mean squared error based on the L2-norm,34 for which the minimization problem becomes
$$L(E) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left[\, E\,\varepsilon_i - \hat\sigma_i \,\right]^2 \;\rightarrow\; \min \quad (1)$$
When phrased as least squares problems, linear regression problems have a convex objective function with a unique global minimum. For our example (1), we can find it by evaluating the vanishing derivative, $\partial L / \partial E = 0$. Here, the minimization problem is not only linear in the model parameter $E$, but also in the dependent variable, and we obtain an explicit solution for the elastic modulus, $E = \sum_i \varepsilon_i\,\hat\sigma_i / \sum_i \varepsilon_i^2$. For linear regression with multiple model parameters, for which the minimization problem is a linear function in the dependent variables, we obtain a coupled system of equations with an explicit solution for the parameter vector. For linear regression with one or multiple model parameters, for which the minimization problem is a nonlinear function in the dependent variables, we obtain a similar set of one or more equations, which may require an iterative solution for the parameter vector. Importantly, any regression that is linear in the model parameters, independent of whether it is linear, polynomial, or generally nonlinear in the independent variables, is considered a linear regression problem that, when phrased as a least squares problem with appropriate data, results in a convex quadratic function in the model parameters with a unique global minimum.
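The following minimal sketch illustrates this closed-form solution for the one-parameter case; the strain range, noise level, and true modulus are illustrative assumptions, not values from the manuscript.

```python
import numpy as np

# Minimal sketch of the linear least-squares estimate of an elastic modulus E
# from synthetic strain-stress pairs; the data values are illustrative only.
strain = np.linspace(0.0, 0.1, 11)                                 # independent variable
stress = 100.0 * strain + np.random.normal(0.0, 0.1, strain.size)  # "measured" stress

# The vanishing derivative of the mean squared error gives the explicit solution
# E = sum(eps_i * sig_i) / sum(eps_i^2).
E = np.sum(strain * stress) / np.sum(strain ** 2)
print(f"estimated elastic modulus E = {E:.2f}")
```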
Nonlinear regression. Nonlinear regression44 seeks to model the relationship between a dependent variable, in our case the Piola stress $\boldsymbol{P}$, and a set of one or more independent variables, in our case the set of deformation gradients $\boldsymbol{F}_i$ at different load levels $i = 1, \dots, n_{\rm data}$, using a function that depends nonlinearly on a set of model parameters $\boldsymbol{\theta}$.45 The regression estimates these parameters by minimizing the difference between the predicted stress, $\boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{\theta})$, for given deformation gradients $\boldsymbol{F}_i$ and model parameters $\boldsymbol{\theta}$, and the experimentally measured stresses $\hat{\boldsymbol{P}}_i$, divided by the number of data points $n_{\rm data}$. Similar to linear regression, we can measure this difference as the mean squared error based on the L2-norm,34 for which the minimization problem becomes
$$L(\boldsymbol{\theta}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{\theta}) - \hat{\boldsymbol{P}}_i \,\right\|^2 \;\rightarrow\; \min \quad (2)$$
In general, nonlinear regression problems have a non-convex objective function with multiple local minima. Solving non-convex optimization problems requires iterative algorithms that are at risk of converging to a local minimum instead of the global minimum, and their solution is often highly sensitive to the initial conditions that we select for the parameter vector. Depending on the nature of the problem, the solution we find may involve a large and dense parameter vector , and overfitting may occur when the number of parameters is larger than the number of data points, . Notably, even for many data points, we may face overfitting when the data are noisy or not rich enough to sufficiently activate all the parameters. For example, with tension and compression tests alone, we cannot estimate model parameters for shear.
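A minimal sketch of such an iterative fit is shown below, assuming a hypothetical one-term exponential stress model that is nonlinear in its parameters; the model form, data, and initial guess are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical one-term model P(lambda) = a * (exp(b * (lambda - 1)) - 1),
# nonlinear in the parameters (a, b); the data are synthetic for illustration.
stretch = np.linspace(1.0, 1.1, 11)
P_data = 1.5 * (np.exp(3.0 * (stretch - 1.0)) - 1.0) \
         + np.random.normal(0.0, 0.005, stretch.size)

def residuals(theta):
    a, b = theta
    return a * (np.exp(b * (stretch - 1.0)) - 1.0) - P_data

# Iterative solution; the result can depend on the initial guess theta0.
theta0 = np.array([1.0, 1.0])
fit = least_squares(residuals, theta0)
print("estimated parameters:", fit.x)
```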
Sparse regression. Sparse regression is a special type of regression that seeks to prevent overfitting by inducing sparsity in the parameter vector by setting a large number of parameters to zero.33 Sparse regression is particularly useful in high-dimensional settings, since it generates models with a small subset of non-zero parameters,34 which tends to make the model more interpretable.35 Historically, the need for sparse regression emerged prominently with the advent of high-dimensional datasets for which the number of parameters can easily exceed the number of independent observations.46 A prominent example is SINDy, an algorithm for sparse identification in nonlinear dynamics that promotes sparsity through sequential thresholded least-squares by iterating between a partial least-squares fit and a thresholding step to sequentially drop the least relevant terms of a model.20 Importantly, while these sparsification algorithms converge well in linear regression associated with convex objective functions,47 their convergence is no longer guaranteed in nonlinear regression with non-convex objective functions. The advantages of sparse regression are improved interpretability
by reducing the parameter set to only a few non-zero terms; feature selection by identifying the most relevant terms; and reduced risk of overfitting by promoting simpler models. These advantages come at a price: the disadvantages of sparse regression are selection bias by enforcing sparsity of the parameter estimates; additional hyperparameters that need to be tuned and require additional attention; and risk of misspecification by excluding relevant parameters if sparsity is enforced too aggressively. In conclusion, sparse regression offers a powerful toolset for high-dimensional modeling, but introduces a trade-off between interpretability and prediction accuracy.
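The following sketch shows a sequential thresholded least-squares loop in the spirit of SINDy, alternating a partial least-squares fit with a hard thresholding step; the library, threshold, and data are illustrative assumptions.

```python
import numpy as np

def stls(Theta, y, tau=0.1, n_iter=10):
    """Sequential thresholded least squares: alternate a least-squares fit
    with zeroing out all coefficients whose magnitude falls below tau."""
    w = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(w) < tau
        w[small] = 0.0
        big = ~small
        if big.any():
            w[big] = np.linalg.lstsq(Theta[:, big], y, rcond=None)[0]
    return w

# Illustrative example: only two of five library terms are active.
rng = np.random.default_rng(0)
Theta = rng.normal(size=(100, 5))
w_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0])
y = Theta @ w_true + 0.01 * rng.normal(size=100)
print(stls(Theta, y, tau=0.2))
```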
Neural networks. Neural networks are a class of models and algorithms that can approximate a wide range of functions.48 Their versatility not only makes them a powerful tool for classification, reinforcement learning, and generative tasks, but also for regression problems,49 in our context, for regression in constitutive modeling.16 Neural networks consist of input, hidden, and output layers with several nodes in each layer. Their parameters are the network weights , where is the number of hidden layers and is the number of nodes per layer. During training, neural networks effectively perform a regression as they learn their parameters by minimizing a loss function that penalizes the error between model and data. Similar to the classical nonlinear regression in Equation (2), we can characterize this error as the mean squared error, the -norm of the difference between the stress predicted by the model and the experimentally measured stress , divided by the number of data points to train the model,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 \;\rightarrow\; \min \quad (3)$$
However, in contrast to traditional regression tools that have a fixed functional form, neural networks can easily adapt their shape which allows them to model complex functions,5 either linear50 or nonlinear29 in the model parameters . The advantages of neural networks are their universal approximation that allows them to approximate any
continuous function for a sufficiently large number of weights,48 and their inherent flexibility that allows them to model high-dimensional nonlinear relationships like the constitutive behavior we seek to model here. Their disadvantages are their computational complexity, especially for densely connected architectures with multiple hidden layers; risk of overfitting sparse or noisy data; and lack of interpretability that generally worsens with the number of layers and makes plain neural networks unsuitable for model discovery tasks.
Sparse neural networks. Sparse neural networks use a special type of network architecture for which a large number of weights are zero. This reduces the number of active connections between the nodes of consecutive layers. Sparsity can be induced during training by using special algorithms, or after training by pruning.51 The concepts of sparse neural networks and weight or node pruning—inspired by brain development and synaptic pruning—have gained increasing attention with the rise of deep learning and the need for computational efficiency.52 The advantages of sparse neural networks are their computational efficiency and faster inference times; reduced risk of overfitting by promoting smaller model sizes; and their potential for regularization and improved generalization. The disadvantages are increased training complexity to induce sparsity; and risk of decreased performance through overly aggressive sparsification or pruning.
The objective of this review is to explore neural networks for which sparsity is induced by a hybrid approach of two combined strategies: regularization and physical constraints. Towards this goal, first, we review popular subset selection and shrinkage methods to induce sparsity within the general framework of regularization in Section 2. Then, we discuss how to leverage physical constraints to induce sparsity within the framework of constitutive neural networks in Section 3. Finally, we propose a hybrid approach of regularized constitutive neural networks for automated model discovery and discuss the interpretability and prediction accuracy of their discovered models by means of a library of illustrative examples in Section 4. We conclude by comparing the different approaches and providing guidelines and recommendations in Section 5.
2 Lp REGULARIZATION
The concept of Lp regularization or bridge regression was first introduced three decades ago in the context of chemometrics with the goal to shrink the parameter space in chemical data analysis.30 The method has re-gained attention as a powerful tool to promote sparsity in system identification,20 and, most recently, in discovering constitutive models from data.25 Lp regularization is a generalized regularization technique that uses the Lp norm of the parameter vector $\boldsymbol{w}$, the sum of the $p$th powers of the absolute values of its coefficients, raised to the inverse power, $\|\boldsymbol{w}\|_p = [\,\sum_{k=1}^{n} |w_k|^p\,]^{1/p}$. Bridge regression constrains the regression (2) by constraining this norm to be smaller than a non-negative constant $c$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 \;\rightarrow\; \min \quad \text{subject to} \quad \|\boldsymbol{w}\|_p \le c \quad (4)$$
It proves convenient to reformulate this constrained regression problem as a penalized regression problem, by penalizing the regression (2) with a penalty term that consists of the $p$th power of the Lp norm, $\|\boldsymbol{w}\|_p^p$, multiplied by a non-negative penalty parameter $\alpha$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha\, \|\boldsymbol{w}\|_p^p \;\rightarrow\; \min \quad (5)$$
The constrained and penalized regression problems (4) and (5) are equivalent, which implies that for a given $c$, there exists an $\alpha$ such that the two problems share the same solution.53 The flexible power $p$ not only allows us to recover classical regularization techniques like L1 or L2 regularization as special cases, but also to interpolate smoothly between these different methods.30 The advantages of Lp regularization are its inherent flexibility by allowing for a continuum of popular regularization techniques for varying powers $p$; and its potential to effectively promote sparsity. Its disadvantages are the added complexity associated with the selection of hyperparameters, specifically the penalty parameter $\alpha$ and the power $p$; and the potential computational challenges associated with specific choices for $p$. Figure 1 illustrates the contours of the Lp regularization term for varying powers $p$, evaluated for two parameters, $w_1$ and $w_2$. The top row illustrates the effect of powers smaller than or equal to one, $p \le 1$; the bottom row illustrates the effect of powers larger than one, $p > 1$. In the following, we highlight the most popular special cases of Lp regularization, their history, and their advantages and disadvantages.
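A minimal sketch of the penalized formulation (5) follows, with the constitutive data term replaced by a generic linear least-squares loss for illustration; the values of alpha and p are assumptions, and the derivative-free solver is one pragmatic choice for the non-smooth penalty.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the penalized formulation: data loss plus alpha * sum(|w_k|^p).
# The quadratic data loss and the values of alpha and p are illustrative.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
w_true = np.array([2.0, 0.0, -1.0, 0.0])
y = X @ w_true + 0.05 * rng.normal(size=50)

def lp_objective(w, alpha=0.1, p=0.5):
    data_loss = np.mean((X @ w - y) ** 2)
    penalty = alpha * np.sum(np.abs(w) ** p)
    return data_loss + penalty

# Nelder-Mead avoids gradients of the non-smooth penalty for p <= 1,
# but may still end up in a local minimum of the non-convex objective.
result = minimize(lp_objective, x0=np.ones(4), method="Nelder-Mead")
print("penalized estimate:", np.round(result.x, 3))
```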
Lp regularization. Contours of regularization term, with , for varying powers, , evaluated for two parameters, and . For , top row, with the special case of regularization or lasso represented through the pyramid, regularization promotes sparsity by setting some weights exactly to zero, but is no longer strictly convex and can have multiple local minima. For , bottom row, with the special case of regularization or ridge regression represented through the ellipsoid, regularization promotes stability by reducing outliers, while the regularization term remains convex.
L2 regularization or ridge regression. L2 regularization, commonly known as ridge regression, was introduced more than half a century ago to address multicollinearity in regression analysis,31 and has gained attention for its ability to stabilize parameter estimates, especially when the parameters are closely correlated.34 It uses the L2 norm of a vector, the Euclidean norm, the square root of the sum of the squared vector components, $\|\boldsymbol{w}\|_2 = [\,\sum_{k=1}^{n} w_k^2\,]^{1/2}$. Notably, the L2 norm does not weigh all entries of the vector equally. Instead, it squares the vector entries, which makes it highly sensitive to outliers as it penalizes the squared magnitude of the individual parameters $w_k$. Ridge regression supplements the regression (2) with a penalty term that consists of the squared L2 norm multiplied by a penalty parameter $\alpha$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha\, \|\boldsymbol{w}\|_2^2 \;\rightarrow\; \min \quad (6)$$
Its advantages are stability in multicollinearity by offering stable parameter estimates even in the presence of highly correlated predictors; managing outliers and preventing overfitting by quadratically penalizing extreme coefficients; and computational efficiency even for large datasets. Its disadvantages are introducing bias, which may result in underestimating certain coefficients and effects; and its inability to induce sparsity, which makes it unsuitable for our current focus of subset selection. Figure 1 shows that, for the special case of regularization, the regularization term adopts a convex ellipsoidal shape that promotes stability by reducing outliers.
L1 regularization or lasso. L1 regularization was initially introduced as a method to analyze seismograms in geophysics almost four decades ago,32 and has become widely known under the name of lasso, short for least absolute shrinkage and selection operator.33 Lasso has become popular for producing interpretable models,4 while exhibiting the same stability properties as ridge regression. It uses the L1 norm of a vector, the sum of the absolute values of its components, $\|\boldsymbol{w}\|_1 = \sum_{k=1}^{n} |w_k|$. Because of its similarities with a distance between city blocks, the L1 norm is often referred to as the Manhattan distance or taxicab norm. Notably, the L1 norm weighs all entries of the vector equally and is therefore less sensitive to outliers than the L2 norm. Lasso supplements the regression (2) with a penalty term that consists of the L1 norm multiplied by a penalty parameter $\alpha$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha\, \|\boldsymbol{w}\|_1 \;\rightarrow\; \min \quad (7)$$
Its advantages are enabling feature selection and inducing sparsity by reducing some weights exactly to zero which effectively reduces model complexity and improves interpretability; mitigating overfitting by constraining the magnitude of the weights, which is especially important when data are limited or high-dimensional; and providing predictive insights by identifying the most relevant weights.34 Its disadvantages are introducing bias, which may result in underestimating certain coefficients and effects; and focusing on selective effects while discarding others, especially in nuanced multi-effect situations when the weights are closely correlated. Figure 1 shows that, for the special case of regularization, the regularization term adopts a non-strictly-convex pyramid shape that promotes sparsity by reducing some weights exactly to zero.
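The contrast between the two penalties is easy to see in a small numerical experiment; the sketch below uses scikit-learn's Ridge and Lasso estimators with illustrative regularization strengths and synthetic data.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Compare ridge (L2) and lasso (L1) on the same synthetic problem;
# the regularization strengths are illustrative.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 6))
w_true = np.array([3.0, 0.0, 0.0, -1.5, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=80)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge coefficients:", np.round(ridge.coef_, 3))   # all shrunk, none exactly zero
print("lasso coefficients:", np.round(lasso.coef_, 3))   # irrelevant terms driven to zero
```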
L1L2 regularization or elastic net. L1L2 regularization, also known as elastic net, is a hybrid approach that seeks to combine the benefits of both L1 and L2 regularization.54 The elastic net supplements the regression (2) with two penalty terms in terms of the L1 and L2 norms multiplied by two independent penalty parameters $\alpha_1$ and $\alpha_2$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha_1\, \|\boldsymbol{w}\|_1 + \alpha_2\, \|\boldsymbol{w}\|_2^2 \;\rightarrow\; \min \quad (8)$$
For $\alpha_1 = 0$ and $\alpha_2 = 0$, it recovers the classical ridge regression and lasso as special cases. For $\alpha_1 > 0$ and $\alpha_2 > 0$, L1L2 regularization shares many features with Lp regularization with $1 < p < 2$ and generates contours similar, but not identical, to Figure 1, bottom left. However, in contrast to Lp regularization with $1 < p < 2$,30 L1L2 regularization not only promotes stability, but also induces sparsity while remaining convex.34 Its disadvantage is its added computational complexity. Since L1L2 regularization is not a special case of the Lp regularization family, we will not consider it further throughout this study.
L1/2 regularization. Lp regularization with powers $p < 1$ has become a popular tool in subset selection since it promotes sparsity more aggressively than simple L1 regularization. While Lp norms are traditionally defined for powers larger than or equal to one, $p \ge 1$, the concept of applying powers smaller than one, $p < 1$, was introduced more than three decades ago in sparse regression of large systems.30 Notably, for powers smaller than one, the penalty term becomes non-convex and is no longer a norm in the classical sense. For the special case of $p = 1/2$, the penalty term uses the sum of the square roots of the absolute values of the vector components, $\sum_{k=1}^{n} |w_k|^{1/2}$, and we can easily see that this construct no longer satisfies the triangle inequality. L1/2 regularization supplements the regression (2) with a penalty term that consists of this term, multiplied by a penalty parameter $\alpha$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha \sum_{k=1}^{n} |w_k|^{1/2} \;\rightarrow\; \min \quad (9)$$
The advantages of L1/2 regularization, or more generally of Lp regularization with powers smaller than one, are enhanced sparsity, potentially leading to more parsimonious models; and subset selection, especially in high-dimensional datasets.55 Its disadvantages are its computational complexity induced by its non-convex nature; and its multiple local minima making subset selection complex and non-unique. Figure 1 shows that, for powers smaller than one, the regularization term adopts an increasingly non-convex shape as the power decreases and promotes sparsity by more aggressively setting weights equal to zero.
L0 regularization or subset selection. L0 regularization is a form of subset selection that imposes a penalty on the number of non-zero parameters in a regression model. The origins of selecting a subset of relevant parameters date back to early efforts in regression modeling with the objective to discover parsimonious models with enhanced interpretation and prediction. However, formalizing this idea as an L0 penalty method and connecting it rigorously to Lp regularization has emerged more prominently with the advent of high-dimensional datasets.30 The L0 norm is commonly referred to as sparse norm and is not a norm in a strict mathematical sense. It refers to the pseudo-norm, $\|\boldsymbol{w}\|_0 = \sum_{k=1}^{n} \mathbb{1}(w_k \neq 0)$, where $\mathbb{1}(\circ)$ is the indicator function that is one if the condition inside the parenthesis is true and zero otherwise. As such, the L0 norm counts the number of non-zero entries in a vector, which implies that this approach directly penalizes model complexity in terms of predictor inclusion. Notably, the L0 norm is an explicit, discrete measure of sparsity. It is robust to outliers since it only counts the number of non-zero elements in the parameter vector and does not express preference for smaller or larger entries. L0 regularization supplements the regression (2) with a penalty term that consists of the L0 norm, multiplied by a penalty parameter $\alpha$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha\, \|\boldsymbol{w}\|_0 \;\rightarrow\; \min \quad (10)$$
Its advantages are its conceptual simplicity by providing a direct mechanism for subset selection that directly penalizes non-zero parameters; reduced overfitting by promoting fewer non-zero parameters, in particular when data are limited; and enhanced model interpretability by focusing only on the relevant terms. Its disadvantages are its computational complexity that results from turning continuous model selection into an NP-hard discrete combinatorial problem with $2^n$ possible parameter combinations, making it computationally intractable for problems with large parameter sets; its non-convexity
induced by the penalty term that leads to optimization challenges related to several local minima; and increased instability by discovering models for which slight changes in the data can result in an entirely different parameter set. In the contour plot of Figure 1, L0 regularization would correspond to two discrete planes along the two parameter axes.
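The combinatorial nature of L0 regularization is easiest to see in a brute-force sketch that fits every subset of terms and keeps the one with the lowest penalized loss; the library, penalty parameter, and data below are illustrative assumptions, and the exhaustive search is only tractable for small parameter sets.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, alpha=0.05):
    """Exhaustive L0-penalized subset selection: fit every subset of columns
    by least squares and keep the one that minimizes mse + alpha * |subset|."""
    n_terms = X.shape[1]
    best_loss, best_w = np.inf, np.zeros(n_terms)
    for k in range(n_terms + 1):
        for subset in combinations(range(n_terms), k):
            w = np.zeros(n_terms)
            if subset:
                cols = list(subset)
                w[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            loss = np.mean((X @ w - y) ** 2) + alpha * k
            if loss < best_loss:
                best_loss, best_w = loss, w
    return best_loss, best_w

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.0, 0.0, 2.0, 0.0]) + 0.05 * rng.normal(size=60)
loss, w = best_subset(X, y)
print("selected weights:", np.round(w, 3))
```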
Predictability and interpretability. Lp regularization is an intricate balance between predictability and interpretability: for powers larger than one, $p > 1$, Lp regularization can improve predictability, increase robustness, prevent overfitting, and enhance generalization to new data by penalizing outliers and reducing extreme coefficients. For powers equal to or smaller than one, $p \le 1$, Lp regularization can improve interpretability, promote simpler models, and identify the most influential predictors by encouraging sparsity and forcing some coefficients exactly to zero. Taken together, Lp regularization is a trade-off between interpretability and predictability, between simplicity and accuracy, and between bias and variance. Two hyperparameters, the power $p$ and the regularization strength $\alpha$, allow us to fine-tune this balance. Throughout this manuscript, we will provide a library of systematic examples that illustrate the sensitivity of Lp regularization with respect to these two hyperparameters.
3 NEURAL NETWORKS
In this study, we adopt the concept of neural networks to perform regression in constitutive modeling with the objective to improve both predictability and interpretability. To demonstrate that our strategy generalizes to different types of neural networks, we compare two constitutive neural networks that have recently become popular in the context of automated model discovery.29, 50 Both networks are sparse neural networks by design, where sparsity is inspired by the underlying physics of hyperelasticity. In their input layer, they use characteristic features of the deformation gradient to a priori satisfy the kinematic constraint of material objectivity, and acknowledge a characteristic isotropic and incompressible material behavior by satisfying the constraints of material symmetry and incompressibility.15 In their output layer, they learn a free energy function from which they derive the nominal stress to a priori satisfy the dissipation inequality and, with it, thermodynamic consistency.10 In their hidden layers, both networks use a special set of custom-designed activation functions to a priori satisfy physical constraints3 and a particular network architecture to satisfy polyconvexity.9, 12
Invariant based and principal stretch based neural networks. We explore two types of neural networks that use different types of activation functions to represent two different classes of constitutive models: invariant and principal stretch based.56, 57 Both networks are generalizations of popular constitutive models that include widely used hyperelastic models as special cases.58-61 They are interpretable by design and their weights translate into physically meaningful parameters with physical units and a physical interpretation.17 Yet, there is a major difference between both models: the invariant model uses different functional forms for each activation function and results in a nonlinear regression problem, whereas the principal stretch model uses the same functional form with different but fixed exponents and results in a linear regression problem. This has critical implications on the convexity of the objective function, and with it, on the nature of the solution.
Data. We train our networks on both synthetic and real data from tension, compression, and shear tests. For the synthetic data, we generate stretch stress pairs for fixed parameters through forward simulation. Specifically, we calculate stresses over a wide range of tensile stretches, compressive stretches, and shear strains in ten equidistant increments, resulting in three data sets with eleven stretch-stress pairs each. For the real data, we extract stretch stress pairs from our previously published human brain experiments on 5 mm gray matter tissue cubes.62, 63 Specifically, we use the reported stresses averaged over tensile, compressive, and shear experiments, in sixteen equidistant increments, resulting in three data sets with seventeen stretch-stress pairs each.29 All data are available on our GitHub repository, https://github.com/LivingMatterLab/CANN.
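The following sketch mimics this forward simulation of synthetic data, assuming an incompressible Mooney Rivlin material with illustrative weights as a stand-in for the model and parameters used in the paper; the stretch and shear ranges are placeholders.

```python
import numpy as np

# Sketch of synthetic data generation by forward simulation, assuming an
# incompressible Mooney-Rivlin material with illustrative weights w1, w2;
# the stretch and shear ranges are placeholders, not the ones of the paper.
w1, w2 = 1.0, 1.0
lam_t = np.linspace(1.0, 1.1, 11)     # tension: eleven stretch-stress pairs
lam_c = np.linspace(0.9, 1.0, 11)     # compression
gam = np.linspace(0.0, 0.1, 11)       # simple shear

def P11(lam):    # uniaxial Piola stress of the Mooney-Rivlin model
    return 2.0 * (w1 + w2 / lam) * (lam - 1.0 / lam ** 2)

def P12(gamma):  # shear Piola stress of the Mooney-Rivlin model
    return 2.0 * (w1 + w2) * gamma

data = {"tension": (lam_t, P11(lam_t)),
        "compression": (lam_c, P11(lam_c)),
        "shear": (gam, P12(gam))}
for mode, (x, P) in data.items():
    print(mode, np.round(P, 3)[:3], "...")
```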
3.1 Invariant based neural network
Invariant based constitutive neural networks take the deformation gradient $\boldsymbol{F}$ as input and predict the free energy function $\psi$ as output from which we calculate the stress $\boldsymbol{P}$. From the deformation gradient, they extract a set of invariants, in our example $I_1$ and $I_2$, and feed them into their two hidden layers.29, 64 The first layer generates the powers of the invariants, in our example the first and second powers, $[\,\circ\,]$ and $[\,\circ\,]^2$, and the second layer subjects these powers to specific functions, in our example to the identity, $[\,\circ\,]$, and the exponential, $\exp[\,\circ\,] - 1$. The free energy function is a sum of the resulting eight terms. Figure 2 illustrates the invariant based constitutive neural network with the eight functional building blocks highlighted in color, where the hot red colors relate to the first invariant and the cold blue colors to the second. During training, the network autonomously discovers the best model, out of $2^8 = 256$ possible combinations of terms, and simultaneously learns its model parameters $\boldsymbol{w}$. It minimizes the loss function (3), the difference between the stress predicted by the model and the experimentally measured stress, divided by the number of data points used for training,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 \;\rightarrow\; \min \quad (11)$$
To ensure thermodynamic consistency, the network does not learn the stress directly,16 but rather derives it from the free energy function.15 For our example in Figure 2, the free energy function takes the following explicit representation,
$$\begin{aligned} \psi(I_1, I_2) = \; & w_{2,1}\, w_{1,1}\, [I_1 - 3] + w_{2,2}\, [\exp(w_{1,2}\, [I_1 - 3]) - 1] + w_{2,3}\, w_{1,3}\, [I_1 - 3]^2 + w_{2,4}\, [\exp(w_{1,4}\, [I_1 - 3]^2) - 1] \\ + \; & w_{2,5}\, w_{1,5}\, [I_2 - 3] + w_{2,6}\, [\exp(w_{1,6}\, [I_2 - 3]) - 1] + w_{2,7}\, w_{1,7}\, [I_2 - 3]^2 + w_{2,8}\, [\exp(w_{1,8}\, [I_2 - 3]^2) - 1] \end{aligned} \quad (12)$$
with the following derivatives with respect to the invariants $I_1$ and $I_2$,
$$\begin{aligned} \frac{\partial \psi}{\partial I_1} &= w_{2,1}\, w_{1,1} + w_{2,2}\, w_{1,2} \exp(w_{1,2} [I_1 - 3]) + 2\, w_{2,3}\, w_{1,3}\, [I_1 - 3] + 2\, w_{2,4}\, w_{1,4}\, [I_1 - 3] \exp(w_{1,4} [I_1 - 3]^2) \\ \frac{\partial \psi}{\partial I_2} &= w_{2,5}\, w_{1,5} + w_{2,6}\, w_{1,6} \exp(w_{1,6} [I_2 - 3]) + 2\, w_{2,7}\, w_{1,7}\, [I_2 - 3] + 2\, w_{2,8}\, w_{1,8}\, [I_2 - 3] \exp(w_{1,8} [I_2 - 3]^2) \end{aligned} \quad (13)$$
Using the second law of thermodynamics, we can derive the Piola stress, , as thermodynamically conjugate to the deformation gradient ,56
$$\boldsymbol{P} = \frac{\partial \psi}{\partial \boldsymbol{F}} - p\, \boldsymbol{F}^{-\mathrm{T}} = \frac{\partial \psi}{\partial I_1}\, \frac{\partial I_1}{\partial \boldsymbol{F}} + \frac{\partial \psi}{\partial I_2}\, \frac{\partial I_2}{\partial \boldsymbol{F}} - p\, \boldsymbol{F}^{-\mathrm{T}} \quad (14)$$
where the term $- p\, \boldsymbol{F}^{-\mathrm{T}}$ ensures perfect incompressibility in terms of the pressure $p$ that we determine from the boundary conditions. For the network free energy (12), the Piola stress is
$$\boldsymbol{P} = 2 \left[ \frac{\partial \psi}{\partial I_1} + I_1\, \frac{\partial \psi}{\partial I_2} \right] \boldsymbol{F} - 2\, \frac{\partial \psi}{\partial I_2}\, \boldsymbol{F} \cdot \boldsymbol{F}^{\mathrm{T}} \cdot \boldsymbol{F} - p\, \boldsymbol{F}^{-\mathrm{T}} \quad (15)$$
Notably, the Piola stress of the invariant based network (15) is a nonlinear function in the network weights $\boldsymbol{w}$, which translates the loss function (11) into a nonlinear regression problem, with possibly multiple local minima. For this particular format, one of the first two weights of each row becomes redundant, and we can reduce the set of network parameters to twelve. We train our invariant based network with tension, compression, and shear data and rewrite the loss function (11) in terms of two contributions that minimize the error between the normal and shear stresses predicted by the model, $P_{11}(\lambda_i, \boldsymbol{w})$ and $P_{12}(\gamma_j, \boldsymbol{w})$, and the data, $\hat{P}_{11,i}$ and $\hat{P}_{12,j}$, where $i = 1, \dots, n_{\lambda}$ and $j = 1, \dots, n_{\gamma}$ denote the different stretch and shear levels,
$$L(\boldsymbol{w}) = \frac{1}{n_{\lambda}} \sum_{i=1}^{n_{\lambda}} \left[\, P_{11}(\lambda_i, \boldsymbol{w}) - \hat{P}_{11,i} \,\right]^2 + \frac{1}{n_{\gamma}} \sum_{j=1}^{n_{\gamma}} \left[\, P_{12}(\gamma_j, \boldsymbol{w}) - \hat{P}_{12,j} \,\right]^2 \;\rightarrow\; \min \quad (16)$$
To explore the effect of scaling of the three individual stress terms, alternatively, we weigh all three experiments equally, and also train the network by minimizing the error between the normalized tensile, compressive, and shear stresses predicted by the model , , and , and the data , , and , normalized by the maximum recorded tensile, compressive, and shear stresses, , , and , where , , and denote the different stretch and shear levels and ,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm t}} \sum_{i=1}^{n_{\rm t}} \left[ \frac{P_{11}(\lambda_i, \boldsymbol{w}) - \hat{P}^{\rm t}_{11,i}}{\max_i |\hat{P}^{\rm t}_{11,i}|} \right]^2 + \frac{1}{n_{\rm c}} \sum_{j=1}^{n_{\rm c}} \left[ \frac{P_{11}(\lambda_j, \boldsymbol{w}) - \hat{P}^{\rm c}_{11,j}}{\max_j |\hat{P}^{\rm c}_{11,j}|} \right]^2 + \frac{1}{n_{\rm s}} \sum_{k=1}^{n_{\rm s}} \left[ \frac{P_{12}(\gamma_k, \boldsymbol{w}) - \hat{P}^{\rm s}_{12,k}}{\max_k |\hat{P}^{\rm s}_{12,k}|} \right]^2 \;\rightarrow\; \min \quad (17)$$
Below, we briefly derive the explicit analytical expressions for the Piola stresses in uniaxial tension and compression and in simple shear, such that the tensile stress is $P_{11}$ for stretches $\lambda > 1$, the compressive stress is $P_{11}$ for stretches $\lambda < 1$, and the shear stress is $P_{12}$ for all shear strains $\gamma$.
Invariant based neural network for automated model discovery. The network takes the deformation gradient as input and outputs the free energy function from which we calculate the stress . The network is invariant based, it first calculates the invariants and , and feeds them into its two hidden layers. The first layer generates the first and second powers and of the invariants and the second layer applies the identity and exponential function and to these powers. The free energy function is a function of the eight color-coded terms. During training, the network discovers the best model, of possible combinations of terms, to explain the experimental data .
Uniaxial tension and compression. For the special case of uniaxial tension and compression in terms of the stretch $\lambda$, with $\lambda_1 = \lambda$ and $\lambda_2 = \lambda_3 = \lambda^{-1/2}$, the invariants take the following form,
$$I_1 = \lambda^2 + \frac{2}{\lambda} \qquad \text{and} \qquad I_2 = 2\,\lambda + \frac{1}{\lambda^2} \quad (18)$$
Using Equation (14) and the zero normal stress condition, $P_{22} = P_{33} = 0$, we obtain the following expression for the uniaxial stress stretch relation,
$$P_{11} = 2 \left[ \frac{\partial \psi}{\partial I_1} + \frac{1}{\lambda}\, \frac{\partial \psi}{\partial I_2} \right] \left[ \lambda - \frac{1}{\lambda^2} \right] \quad (19)$$
which translates into the following explicit expression between our network stress and the uniaxial stretch ,
()
Simple shear. For the special case of simple shear in terms of the shear strain $\gamma$, the invariants take the following form,
$$I_1 = 3 + \gamma^2 \qquad \text{and} \qquad I_2 = 3 + \gamma^2 \quad (21)$$
Using Equation (14), we obtain the following shear stress stretch relation,
$$P_{12} = 2 \left[ \frac{\partial \psi}{\partial I_1} + \frac{\partial \psi}{\partial I_2} \right] \gamma \quad (22)$$
which translates into the following explicit expression between our network shear stress and the shear strain ,
()
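The following sketch evaluates these uniaxial and shear stresses for the eight-term invariant network, assuming the term ordering of Equation (12); the weight names, ranges, and example values are illustrative assumptions about the implementation.

```python
import numpy as np

# Sketch of the invariant-based eight-term network stress in uniaxial tension/
# compression and simple shear; term ordering and weight names follow the
# description in the text and are assumptions about the exact implementation.
def dpsi_dI(I, w1, w2):
    """Four terms per invariant: identity, exponential, quadratic, and
    exponential-of-quadratic of (I - 3), cf. Eq. (13)."""
    x = I - 3.0
    return (w2[0] * w1[0]
            + w2[1] * w1[1] * np.exp(w1[1] * x)
            + 2.0 * w2[2] * w1[2] * x
            + 2.0 * w2[3] * w1[3] * x * np.exp(w1[3] * x ** 2))

def P11(lam, w1, w2):
    """Uniaxial Piola stress, Eq. (19), with I1 and I2 from Eq. (18)."""
    I1, I2 = lam ** 2 + 2.0 / lam, 2.0 * lam + 1.0 / lam ** 2
    return 2.0 * (lam - 1.0 / lam ** 2) * (dpsi_dI(I1, w1[:4], w2[:4])
                                           + dpsi_dI(I2, w1[4:], w2[4:]) / lam)

def P12(gamma, w1, w2):
    """Shear Piola stress, Eq. (22), with I1 = I2 = 3 + gamma^2 from Eq. (21)."""
    I = 3.0 + gamma ** 2
    return 2.0 * gamma * (dpsi_dI(I, w1[:4], w2[:4]) + dpsi_dI(I, w1[4:], w2[4:]))

# Example: the Mooney-Rivlin special case with only the two linear terms active.
w1 = np.ones(8); w2 = np.zeros(8); w2[0], w2[4] = 1.0, 1.0
print(P11(np.array([1.05, 1.10]), w1, w2), P12(np.array([0.05, 0.10]), w1, w2))
```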
Figure 3 illustrates the contours of the loss function for all possible two-term models of the invariant based network in Figure 2. By combining any two terms of the model and setting all other weights equal to zero, we can generate 28 possible models. For these 28 combinations of two terms, we evaluate two versions of the loss function, non-normalized from
Equation (16) and normalized from Equation (17), using the invariant based definitions of the normal stress (20) and shear stress (23). First, we generate synthetic data, and , for tensile stretches of , compressive stretches of , and shear strains of , in ten equidistant increments each, assuming an exact solution with fixed weights of the first layer, and , and a pair of non-zero weights of the second layer, and , while fixing the remaining six weights of the second layer equal to zero. This results in 28 training data sets of eleven stretch-stress pairs each, for tension, compression, and shear. Second, we vary the two non-zero network weights in the ranges and . For each pair of weights, we evaluate the normal and shear stresses and using Equations (20) and (23), and extract the tensile, compressive, and shear stresses, , , and . Third, we evaluate the non-normalized loss function (16) as the mean squared error between the model stresses and and the synthetically generated data stresses and , and plot its contours for each pair of weights and in the lower triangle. Next, we evaluate the normalized loss function (17) as the normalized mean squared error between the model stresses , , and , and the synthetically generated data stresses , , and , and plot its contours for each pair of weights and in the upper triangle.
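A compact sketch of this grid evaluation is given below for a single two-term model, the Mooney Rivlin pair; the stretch and shear ranges, the grid resolution, and the exact solution are illustrative stand-ins for the values used in the figures.

```python
import numpy as np

# Sketch of the loss-contour evaluation for one two-term model, here the
# Mooney-Rivlin pair (w1, w2); ranges and increments are illustrative.
lam_t = np.linspace(1.0, 1.1, 11)   # tension
lam_c = np.linspace(0.9, 1.0, 11)   # compression
gam = np.linspace(0.0, 0.1, 11)     # simple shear

def stresses(w1, w2):
    P11_t = 2.0 * (w1 + w2 / lam_t) * (lam_t - 1.0 / lam_t ** 2)
    P11_c = 2.0 * (w1 + w2 / lam_c) * (lam_c - 1.0 / lam_c ** 2)
    P12_s = 2.0 * (w1 + w2) * gam
    return P11_t, P11_c, P12_s

data = stresses(1.0, 1.0)           # synthetic "experimental" data, exact w1 = w2 = 1

def normalized_loss(w1, w2):
    """Mean squared error of each experiment, normalized by its maximum stress."""
    model = stresses(w1, w2)
    return sum(np.mean(((m - d) / np.abs(d).max()) ** 2)
               for m, d in zip(model, data))

# Evaluate the loss on a weight grid; the contours could then be plotted
# with matplotlib's contourf.
grid = np.linspace(0.0, 2.0, 101)
L = np.array([[normalized_loss(a, b) for a in grid] for b in grid])
i, j = np.unravel_index(L.argmin(), L.shape)
print("grid minimum at w1 =", grid[j], ", w2 =", grid[i])
```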
Loss functions of invariant based neural network. Contours of the loss function for all 28 possible two-term models of the invariant based constitutive neural network in Figure 2. The loss function is evaluated across tensile stretches , compressive stretches , and shear strains , with network weights in the ranges and . The minimum of the loss function indicates the exact solution and , represented through the white circle. The lower triangle illustrates the non-normalized loss function (16), the upper triangle illustrates the normalized loss function (17). All loss functions are convex, with contours varying from ellipsoids to valleys with long ridges, highlighting the collinearity of some and pairs.
Each loss function takes a minimum of for the exact solution, and , indicated through the white circles. From this minimum, both versions of the loss function, non-normalized and normalized, increase with both, decreasing and increasing weights and , and remain convex within the entire domain, for all 28 two-term models. Yet, the contours of the loss function vary significantly for different pairs of weights and , indicating its sensitivity with respect to the individual terms: some pairs of weights generate loss functions of ellipsoidal shape, for example, the , , , pairs in the normalized upper triangle, suggesting that in the studied stretch and shear range, these terms are non-collinear, and would represent a rich base for a potential constitutive model. Other pairs of weights generate loss functions with long ridges parallel to the parameter axes, for example, the , , , pairs in the non-normalized lower triangle, suggesting that these terms are almost collinear
and not well suited as an independent base for a constitutive model. On average, the normalized pairs in the upper triangle seem to generate more convex loss functions than the non-normalized pairs in the lower triangle, suggesting that normalization helps to generate more convex loss functions, a richer functional base, and a more robust solution overall. Notably, the model in the first row and fifth column and the model in the fifth row and first column combine the linear terms in the first and second invariants, and , and represent the popular Mooney Rivlin model for rubber-like materials.59, 60 Overall, this simple example only illustrates the 28 two-term models out of a total set of all 256 possible models, only considers a limited stretch and shear range and , and only screens a narrow window of parameter ranges . Even within these limitations, the contours of loss functions are rather difficult to interpret, making it difficult to comprehend the full potential of the entire network, even though it only consists of eight distinct terms.
3.2 Principal stretch based neural network
Principal stretch based constitutive neural networks take the deformation gradient $\boldsymbol{F}$ as input and predict the free energy function $\psi$ as output from which we calculate the stress $\boldsymbol{P}$. From the deformation gradient, they extract the principal stretches, $\lambda_1$, $\lambda_2$, and $\lambda_3$, and feed them into the hidden layer.50, 65 The hidden layer applies eight different exponents to these stretches. The free energy function is a sum of the resulting eight terms. Figure 4 illustrates the principal stretch based constitutive neural network with the eight functional building blocks highlighted in color, where the dark red and green terms are identical to the dark red and green terms of the invariant based network in Figure 2, while the other six terms are different. During training, the network autonomously discovers the best model, out of $2^8 = 256$ possible combinations of terms, and simultaneously learns its model parameters $\boldsymbol{w}$. It minimizes the loss function (3), the difference between the stress predicted by the model and the experimentally measured stress, divided by the number of data points used for training,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 \;\rightarrow\; \min \quad (24)$$
The free energy of the principal stretch based model takes the following explicit representation,57, 66 $\psi = \sum_{k=1}^{8} \mu_k / \alpha_k\, [\, \lambda_1^{\alpha_k} + \lambda_2^{\alpha_k} + \lambda_3^{\alpha_k} - 3\,]$, where the individual weights $w_k = \mu_k / \alpha_k$ correspond to the shear moduli $\mu_k$ divided by the exponents $\alpha_k$. For the eight-term model in Figure 4, we fix these exponents, such that the free energy becomes a sum of the following terms,
$$\psi = \sum_{k=1}^{8} w_k \left[\, \lambda_1^{\alpha_k} + \lambda_2^{\alpha_k} + \lambda_3^{\alpha_k} - 3 \,\right] \quad (25)$$
and its derivatives with respect to the principal stretches take the following form,
$$\frac{\partial \psi}{\partial \lambda_i} = \sum_{k=1}^{8} w_k\, \alpha_k\, \lambda_i^{\alpha_k - 1} \qquad \text{for} \quad i = 1, 2, 3 \quad (26)$$
Using the second law of thermodynamics, we can derive the Piola stress, , as thermodynamically conjugate to the deformation gradient ,57
$$\boldsymbol{P} = \sum_{i=1}^{3} \frac{\partial \psi}{\partial \lambda_i}\, \boldsymbol{n}_i \otimes \boldsymbol{N}_i - p\, \boldsymbol{F}^{-\mathrm{T}} \quad (27)$$
where $\boldsymbol{N}_i$ and $\boldsymbol{n}_i$ are the eigenvectors in the undeformed and deformed configurations, and the term $- p\, \boldsymbol{F}^{-\mathrm{T}}$ ensures perfect incompressibility in terms of the pressure $p$ that we determine from the boundary conditions. For the network free energy (25), the Piola stress is
$$\boldsymbol{P} = \sum_{i=1}^{3} \sum_{k=1}^{8} w_k\, \alpha_k\, \lambda_i^{\alpha_k - 1}\, \boldsymbol{n}_i \otimes \boldsymbol{N}_i - p\, \boldsymbol{F}^{-\mathrm{T}} \quad (28)$$
parameterized in terms of eight network weights, . Notably, the Piola stress of the principal stretch based network (28) with fixed exponents is a linear function in the network weights , which translates the loss function (24) into a linear regression problem, with a single unique global minimum.67 We train our principal stretch based network with tension, compression, and shear data and rewrite the loss function (24) in terms of two contributions that minimize the error between the tensile and compressive stresses predicted by the model and the data , and between the shear stresses predicted by the model and the data , where we include data from different stretch levels and different shear levels ,
$$L(\boldsymbol{w}) = \frac{1}{n_{\lambda}} \sum_{i=1}^{n_{\lambda}} \left[\, P_{11}(\lambda_i, \boldsymbol{w}) - \hat{P}_{11,i} \,\right]^2 + \frac{1}{n_{\gamma}} \sum_{j=1}^{n_{\gamma}} \left[\, P_{12}(\gamma_j, \boldsymbol{w}) - \hat{P}_{12,j} \,\right]^2 \;\rightarrow\; \min \quad (29)$$
For comparison, similar to the invariant based network in the Section 3.1, we also train the network by minimizing the error between the normalized tensile, compressive, and shear stresses predicted by the model , , and , and the data , , and , normalized by the maximum recorded tensile, compressive, and shear stresses, , , and , where , , and denote the different stretch and shear levels and ,
()
Below, we briefly derive the explicit analytical expressions for the Piola stresses in uniaxial tension and compression and in simple shear, such that the tensile stress is $P_{11}$ for stretches $\lambda > 1$, the compressive stress is $P_{11}$ for stretches $\lambda < 1$, and the shear stress is $P_{12}$ for all shear strains $\gamma$.
Principal stretch based neural network for automated model discovery. The network takes the deformation gradient as input and outputs the free energy function from which we calculate the stress . The network is principal stretch based, it first calculates the principal stretches and and , and feeds them into its hidden layer. The hidden layer applies eight different exponents to these principal stretches. The free energy function is a function of the eight color-coded terms. During training, the network discovers the best model, of possible combinations of terms, to explain the experimental data .
Uniaxial tension and compression. For the special case of uniaxial tension and compression in terms of the stretch , the principal stretches are
$$\lambda_1 = \lambda \qquad \text{and} \qquad \lambda_2 = \lambda_3 = \frac{1}{\sqrt{\lambda}} \quad (31)$$
Using Equation (27) and the zero normal stress condition, $P_{22} = P_{33} = 0$, we obtain the following expression for the uniaxial stress stretch relation,
$$P_{11} = \frac{\partial \psi}{\partial \lambda_1} - \frac{\lambda_2}{\lambda_1}\, \frac{\partial \psi}{\partial \lambda_2} = \sum_{k=1}^{8} w_k\, \alpha_k \left[\, \lambda^{\alpha_k - 1} - \lambda^{-\alpha_k/2 - 1} \,\right] \quad (32)$$
which translates into the following explicit expression between our network stress and the uniaxial stretch ,
()
Simple shear. For the special case of simple shear, in terms of the shear , we obtain the principal stretches,
$$\lambda_{1,2} = \sqrt{1 + \frac{\gamma^2}{4}} \pm \frac{\gamma}{2} \qquad \text{and} \qquad \lambda_3 = 1 \quad (34)$$
Using Equation (27), we obtain the following expression for the shear stress stretch relation,
()
which translates into the following explicit expression between our network shear stress and shear strain ,
()
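The sketch below evaluates the uniaxial stress of the principal stretch based network, Equation (32), and fits its weights by linear least squares; the eight fixed exponents are illustrative assumptions, not necessarily the values used in the paper.

```python
import numpy as np

# Sketch of the principal-stretch-based network stress in uniaxial tension/
# compression, Eq. (32); the eight fixed exponents are illustrative assumptions.
alpha = np.array([2.0, 4.0, 6.0, 8.0, -2.0, -4.0, -6.0, -8.0])

def P11(lam, w):
    """P11(lambda) = sum_k w_k alpha_k [lambda^(alpha_k-1) - lambda^(-alpha_k/2-1)],
    which is linear in the weights w."""
    lam = np.atleast_1d(lam)[:, None]
    return np.sum(w * alpha * (lam ** (alpha - 1.0)
                               - lam ** (-alpha / 2.0 - 1.0)), axis=1)

# Because the stress is linear in w, fitting the weights is a linear least-
# squares problem; nearly collinear columns can spread the fit over several
# terms, echoing the collinearity of many term pairs discussed for Figure 5.
lam_data = np.linspace(0.9, 1.1, 21)
w_true = np.zeros(8); w_true[0], w_true[4] = 1.0, 1.0   # Mooney-Rivlin pair
P_data = P11(lam_data, w_true)
A = np.stack([P11(lam_data, np.eye(8)[k]) for k in range(8)], axis=1)
w_fit = np.linalg.lstsq(A, P_data, rcond=None)[0]
print("fitted weights:", np.round(w_fit, 3))
```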
Figure 5 illustrates the contours of the loss function for all possible two-term models of the principal stretch based network in Figure 4. By combining any two terms of the model and setting all other second-layer weights equal to zero, we can generate 28 possible models. For these 28 combinations of two terms, we evaluate two versions of the loss function, non-normalized from Equation (29) and normalized from Equation (30), following the method described in Section 3.1, but now using the principal stretch based definitions of the normal stress (33) and shear stress (36). Similar to Section 3.1, we plot the non-normalized loss function in the lower triangle and the normalized
loss function in the upper triangle. Again, by design, all loss functions take a minimum of for the exact solution, and , indicated through the white circles. From this minimum, both versions of the loss function, non-normalized and normalized, increase with both decreasing and increasing weights and , and remain convex within the entire domain, for all 28 two-term models. Notably, in contrast to the loss function contours of the invariant based model in Figure 3, the contours of the principal stretch based model in Figure 5 display less variation for different pairs of weights and : Only a few pairs of weights generate loss functions of ellipsoidal shape, for example, the , pairs in the non-normalized lower triangle, or the , , pairs in the normalized upper and non-normalized lower triangles, suggesting that, in the studied stretch and shear range, only a few pairs of terms are non-collinear and would represent a solid base for a potential constitutive model. Most pairs of weights generate loss functions with long ridges parallel to the parameter axes, suggesting that many terms are almost collinear and not well suited as a functional base for a constitutive model. In contrast to the invariant based network, normalization does not seem to fix this issue, both the upper and lower triangle display this collinearity. Notably, the model in the first row and fifth column and the model in the fifth row and first column combine the positive and negative second powers of the principal stretches, and , and represent the popular Mooney Rivlin model,59, 60 which is identical to the and models of the invariant based model in Figure 3. Overall, while these contours are difficult to interpret, we can compare them directly to Figure 3 and realize that, within the studied stretch and shear range and , and parameter window , the invariant based network seems to represent a much broader spectrum of functions than the principal stretch based network for which the functional base seems to be generally more narrow and almost collinear. We also note that the loss function is highly sensitive to normalization: For both networks, the normalized loss functions (17) and (30) tend to generate more convex shapes than the non-normalized loss functions (16) and (29), which is why we will focus on the maximum-stress normalized loss functions (17) and (30) in all following examples.
Loss functions of principal stretch based neural network. Contours of the loss function for all 28 possible two-term models of the principal stretch based constitutive neural network in Figure 4. The loss function is evaluated across tensile stretches , compressive stretches , and shear strains , with network weights in the ranges and . The minimum of the loss function indicates the exact solution and , represented through the white circle. The lower triangle illustrates the non-normalized loss function (29), the upper triangle illustrates the normalized loss function (30). All loss functions are convex, with contours varying from a few ellipsoids to many valleys with long ridges, highlighting the collinearity of many and pairs.
4 Lp REGULARIZED NEURAL NETWORKS
We now integrate the concepts of Lp regularization from Section 2 and constitutive neural network modeling from Section 3 and explore the resulting regression in view of predictability and interpretability. Specifically, we supplement the loss function of the constitutive neural network with a penalty term of Lp type,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm data}} \sum_{i=1}^{n_{\rm data}} \left\|\, \boldsymbol{P}(\boldsymbol{F}_i, \boldsymbol{w}) - \hat{\boldsymbol{P}}_i \,\right\|^2 + \alpha\, \|\boldsymbol{w}\|_p^p \;\rightarrow\; \min \quad (37)$$
The loss function minimizes the error between the model stress $\boldsymbol{P}$ that we derive from the free energy $\psi$ of the neural network, and the experimentally measured stress $\hat{\boldsymbol{P}}$, divided by the number of data points $n_{\rm data}$, penalized by the Lp norm, $\|\boldsymbol{w}\|_p^p$, of the parameter vector made up of the network weights $\boldsymbol{w}$, multiplied by the penalty parameter $\alpha$. Specifically, we use tension, compression, and shear data and specify the stress error as the normalized difference between the tensile, compressive, and shear stresses predicted by the neural network, $P^{\rm t}_{11}$, $P^{\rm c}_{11}$, and $P^{\rm s}_{12}$, and the data, $\hat{P}^{\rm t}_{11}$, $\hat{P}^{\rm c}_{11}$, and $\hat{P}^{\rm s}_{12}$, at $n_{\rm t}$, $n_{\rm c}$, and $n_{\rm s}$ stretch and shear levels $\lambda_i$ and $\gamma_k$,
$$L(\boldsymbol{w}) = \frac{1}{n_{\rm t}} \sum_{i=1}^{n_{\rm t}} \left[ \frac{P_{11}(\lambda_i, \boldsymbol{w}) - \hat{P}^{\rm t}_{11,i}}{\max_i |\hat{P}^{\rm t}_{11,i}|} \right]^2 + \frac{1}{n_{\rm c}} \sum_{j=1}^{n_{\rm c}} \left[ \frac{P_{11}(\lambda_j, \boldsymbol{w}) - \hat{P}^{\rm c}_{11,j}}{\max_j |\hat{P}^{\rm c}_{11,j}|} \right]^2 + \frac{1}{n_{\rm s}} \sum_{k=1}^{n_{\rm s}} \left[ \frac{P_{12}(\gamma_k, \boldsymbol{w}) - \hat{P}^{\rm s}_{12,k}}{\max_k |\hat{P}^{\rm s}_{12,k}|} \right]^2 + \alpha\, \|\boldsymbol{w}\|_p^p \;\rightarrow\; \min \quad (38)$$
In the following, we systematically explore the sensitivity of the loss function (38) with respect to the two hyperparameters of the regularization, the power and the penalty parameter . For illustrative purposes, we first focus on a simplified two-term model, the Mooney Rivlin model that is shared between both neural networks, before we explore both regularized complete eight-term networks.
Lp regularized Mooney Rivlin model. The Mooney Rivlin model59, 60 is a two-term constitutive model that is located right at the intersection of the invariant based neural network in Figure 2 and the principal stretch based network in Figure 4. Notably, it is the only model for which both networks coincide. It uses the dark red term, $[I_1 - 3]$, and the green term, $[I_2 - 3]$, of both neural networks, and weighs them by the network weights, $w_1$ and $w_2$, while all other network weights are identical to zero,
$$\psi = w_1\, [\, I_1 - 3 \,] + w_2\, [\, I_2 - 3 \,] \quad (39)$$
This implies that the activation of any other weight will make the invariant and principal stretch based networks drift away from one another. The Mooney Rivlin model in Equation (39) includes the one-term dark red Neo Hooke model61 with $w_2 = 0$ and the one-term green Blatz Ko model58 with $w_1 = 0$ as special cases. For the Mooney Rivlin model, the regularized loss function from Equation (38) specifies to
$$L(w_1, w_2) = \frac{1}{n_{\rm t}} \sum_{i=1}^{n_{\rm t}} \left[ \frac{P_{11}(\lambda_i) - \hat{P}^{\rm t}_{11,i}}{\max_i |\hat{P}^{\rm t}_{11,i}|} \right]^2 + \frac{1}{n_{\rm c}} \sum_{j=1}^{n_{\rm c}} \left[ \frac{P_{11}(\lambda_j) - \hat{P}^{\rm c}_{11,j}}{\max_j |\hat{P}^{\rm c}_{11,j}|} \right]^2 + \frac{1}{n_{\rm s}} \sum_{k=1}^{n_{\rm s}} \left[ \frac{P_{12}(\gamma_k) - \hat{P}^{\rm s}_{12,k}}{\max_k |\hat{P}^{\rm s}_{12,k}|} \right]^2 + \alpha \left[\, |w_1|^p + |w_2|^p \,\right] \;\rightarrow\; \min \quad (40)$$
with the Mooney Rivlin stresses in tension, $P_{11}$ for $\lambda > 1$, and compression, $P_{11}$ for $\lambda < 1$, from Equations (20) and (33), and in shear, $P_{12}$ for all $\gamma$, from Equations (23) and (36),
$$P_{11} = 2 \left[\, w_1 + w_2\, \frac{1}{\lambda} \,\right] \left[\, \lambda - \frac{1}{\lambda^2} \,\right] \qquad \text{and} \qquad P_{12} = 2 \left[\, w_1 + w_2 \,\right] \gamma \quad (41)$$
Notably, the uniaxial stress and shear stress of the Mooney Rivlin model (41) are linear functions in the network weights $w_1$ and $w_2$, which translates the neural network loss of the loss function (40) into a linear regression problem, with a single unique global minimum.
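A minimal sketch of the regularized two-term loss of Equation (40) follows; the stretch and shear ranges, the penalty parameter, the power, and the initial guess are illustrative assumptions, and the derivative-free solver is one pragmatic way to handle the non-smooth penalty for small powers.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the Lp regularized Mooney-Rivlin loss, Eq. (40): normalized data
# term plus alpha * (|w1|^p + |w2|^p); ranges, alpha, and p are illustrative.
lam = np.concatenate([np.linspace(1.0, 1.1, 11), np.linspace(0.9, 1.0, 11)])
gam = np.linspace(0.0, 0.1, 11)

def stresses(w1, w2):
    return (2.0 * (w1 + w2 / lam) * (lam - 1.0 / lam ** 2),   # P11, Eq. (41)
            2.0 * (w1 + w2) * gam)                            # P12, Eq. (41)

P11_dat, P12_dat = stresses(1.0, 1.0)   # synthetic data with exact w1 = w2 = 1

def loss(w, alpha=0.05, p=0.5):
    P11_mod, P12_mod = stresses(*w)
    data = (np.mean(((P11_mod - P11_dat) / np.abs(P11_dat).max()) ** 2)
            + np.mean(((P12_mod - P12_dat) / np.abs(P12_dat).max()) ** 2))
    return data + alpha * np.sum(np.abs(w) ** p)

# For p <= 1 the objective is non-convex and non-smooth, so a derivative-free
# solver is a pragmatic choice; the result depends on the initial guess.
fit = minimize(loss, x0=np.array([0.5, 0.5]), method="Nelder-Mead")
print("discovered weights:", np.round(fit.x, 3))
```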
Figure 6 illustrates the contours of the regularized loss function for the two-term Mooney Rivlin model, with varying powers, , evaluated for the two parameters and , using synthetic data from tension, compression, and shear tests. The loss function consists of the neural network loss, , illustrated in the first row and fifth column of Figures 3 and 5; supplemented by the regularization, , illustrated in Figure 1. For all eight graphs in Figure 6, we evaluate the loss function (40) using the Mooney Rivlin stresses (41). First, we generate synthetic data, , , for tensile stretches of , compressive stretches of , and shear strains of , in ten equidistant increments each, assuming an exact solution with and , while fixing the remaining weights equal to zero. This results in the training data sets of eleven stretch-stress pairs for tension, compression, and shear. Second, we vary the two Mooney Rivlin network weights in the ranges and , and evaluate the tensile, compressive, and shear model stresses, , , using Equation (41). Third, we evaluate the loss function (40) as the normalized mean squared error between the model stresses , , and the synthetically generated data stresses , , , supplemented by the regularization for the eight different powers, . As these powers increase by two orders of magnitude, fixing the second hyperparameter to one and the same value for all eight examples would increasingly emphasize the regularization over minimizing the actual network loss, and generate increasingly biased results. Instead, for each power , we select the penalty parameter such that the maximum value of the loss function within the screened parameter window, in the dark red upper right corner, at and , consists of equal contributions by the network term and the regularization term. This results in eight different penalty parameters, . For each set of hyperparameters , we increase the penalty parameter in four increments, indicated through the four hyperplanes in each graph. Similar to the non-regularized loss functions of the invariant and principal stretch based networks in Figures 3 and 5, we highlight the minimum of the last of these four loss functions through a white sphere. Importantly, in contrast to the non-regularized loss functions in Figures 3 and 5, the regularized loss function in Figure 6 no longer has a minimum of at and . Instead, the minimum of the loss function and its location in the -space are now functions for the two hyperparameters and . For the eight powers and penalty parameters we used in this example, the minima of the loss function at the location of the white sphere become , and their varying locations in the -space are indicated through the white spheres in Figure 6.
Loss functions of Lp regularized Mooney Rivlin model for synthetic data. Contours of the regularized loss function, , for the two-term Mooney Rivlin model with varying powers, , evaluated for the two parameters, and , for synthetic data from tension, compression, and shear tests. For , top row, with the special case of regularization or lasso in the fourth column, regularization promotes sparsity by training exactly to zero, but the loss function is no longer strictly convex and has multiple local minima. For , bottom row, with the special case of regularization or ridge regression in the second column, regularization promotes stability, retains both non-weights, and , and maintains a convex loss function with a single global minimum.
Figure 6 reveals several interesting features of the regularized Mooney Rivlin model: most notably, the regularized loss function is highly sensitive to the power and varies significantly for below and above one, as we conclude from the different shapes in the first and second rows. For , in the top row, with the special case of regularization or lasso in the fourth column, regularization promotes sparsity by training one of the weights exactly to zero, in this case , while the other weight remains positive, . Importantly, for , the loss function is no longer convex and has two local minima, one at and one at . Notably, for a too small power, for example, for , we observe a drastic regularization with sharp-contoured gradients towards the parameter planes, and the model loses robustness. For , in the bottom row, with the special case of regularization or ridge regression in the second column, regularization promotes stability and retains both non-zero weights, and . The loss function remains convex with a single global minimum. Increasing the penalty parameter amplifies these effects and moves the regularized minimum further away from the non-regularized minimum. Taken together, while a regularization across a continuous spectrum of powers provides a lot of flexibility, the discovered weights and are highly sensitive to the selection of the two hyperparameters and : while the power acts as a switch between sparsity and robustness, the penalty parameter induces a trade-off between regularization and bias.
Figure 7 illustrates the contours of the regularized loss function for the two-term Mooney Rivlin model, with varying powers, , and penalty parameters, , evaluated for the two parameters, and , using synthetic data from tension, compression, and shear tests. For all 64 contour plots, we evaluate the normalized loss function (40) following the method of Figure 6, but now by varying both hyperparameters, and . Without regularization, left column, with , all eight contour plots are identical to the non-regularized Mooney Rivlin loss function in the first row and fifth column of Figures 3 and 5. Its minimum is identical to the exact solution, and , represented through the white circles. With infinite regularization, right column, with , all eight contour plots are a two-dimensional projection of the regularization contours in Figure 1. For , in the four top rows, with increasing , from left to right, the loss function gradually loses strict convexity, the minimum first moves towards and , and then towards and . For , in the four bottom rows, with increasing , from left to right, the loss function always remains convex, both weights always remain active, and , and move closer together as the minimum gradually moves towards zero, and .
Loss functions of Lp regularized Mooney Rivlin model for synthetic data. Contours of the regularized loss function, , for the two-term Mooney Rivlin model with varying powers, , and penalty parameters , evaluated for the two parameters, and , for synthetic data from tension, compression, and shear tests. Without regularization, left column, with , the minimum of the loss function is identical to the exact solution and , represented through the white circle. With infinite regularization, right column, with , the loss function is identical to the regularization term and the contours are identical to Figure 1. For , with increasing , the loss function gradually loses convexity, the minimum first moves towards and , and then towards and . For , with increasing , the loss function always remains convex, both weights always remain active, and , as the minimum moves gradually towards and .
Figure 7 confirms our observations from Figure 6 and provides additional insights into the Lp regularized Mooney Rivlin model: the regularized loss function is highly sensitive to both hyperparameters, p and α. Decreasing the power to or below one, p ≤ 1, increases interpretability by promoting sparsity as a subset of weights becomes exactly zero; smaller powers and larger penalty parameters promote sparsity more drastically and generate increasingly less convex loss functions. Increasing the power above one, p > 1, increases predictability by promoting robustness as the loss function becomes increasingly convex; larger powers and larger penalty parameters promote robustness more drastically and generate increasingly more convex loss functions. These observations confirm the general notion that regularization is an intricate balance between predictability and interpretability and between regularization and bias that requires a careful selection of the appropriate values for the hyperparameters p and α.
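To make the construction of these contour plots concrete, the following sketch evaluates an Lp regularized loss on a grid of the two Mooney Rivlin weights. The uniaxial-tension-only stress expression, the assumed ground-truth weights, and all numerical values are illustrative placeholders, not the exact protocol behind Figures 6 and 7.

```python
import numpy as np

# Incompressible Mooney-Rivlin nominal stress in uniaxial tension (illustrative):
# psi = w1*(I1 - 3) + w2*(I2 - 3)  ->  P(lam) = 2*(lam - lam**-2)*(w1 + w2/lam)
def stress_uniaxial(lam, w1, w2):
    return 2.0 * (lam - lam**-2) * (w1 + w2 / lam)

# Synthetic data from assumed ground-truth weights (hypothetical values)
lam = np.linspace(1.0, 1.1, 16)
w1_true, w2_true = 0.5, 0.5
P_data = stress_uniaxial(lam, w1_true, w2_true)

def regularized_loss(w1, w2, p=1.0, alpha=0.1):
    """Normalized mean squared error plus Lp penalty alpha*(|w1|^p + |w2|^p)."""
    mse = np.mean((stress_uniaxial(lam, w1, w2) - P_data) ** 2) / np.mean(P_data ** 2)
    return mse + alpha * (abs(w1) ** p + abs(w2) ** p)

# Evaluate the loss on a grid to visualize its contours for one (p, alpha) pair
w1_vals = np.linspace(0.0, 1.0, 101)
w2_vals = np.linspace(0.0, 1.0, 101)
L = np.array([[regularized_loss(a, b, p=0.5, alpha=0.1) for a in w1_vals] for b in w2_vals])
i, j = np.unravel_index(np.argmin(L), L.shape)
print(f"grid minimum near w1 = {w1_vals[j]:.2f}, w2 = {w2_vals[i]:.2f}")
```

Sweeping p and alpha in two outer loops and contouring each resulting array L reproduces the qualitative picture of the figures: axis-hugging minima for small powers and a single convex bowl for large powers.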
Figure 8 illustrates the contours of the regularized loss function for the two-term Mooney Rivlin model, with varying powers p, evaluated for the two parameters w1 and w2, but now using real data from tension, compression, and shear tests of human brain.62 For all eight graphs, we evaluate the loss function (40) as the normalized mean squared error between the model and the data over the tensile, compressive, and shear ranges of the experiment, in 16 equidistant increments each,29 for varying Mooney Rivlin network weights w1 and w2 within the screened parameter window, and apply Lp regularization for eight different powers p. We select the penalty parameter α such that the maximum value of the loss function within the screened parameter window, in the dark red upper right corner at the largest values of w1 and w2, consists of equal contributions by the network term and the regularization term. For each power p, we increase the penalty parameter α in four increments, indicated through the four hyperplanes in each graph, and highlight the minimum of the last of these four loss functions through a white sphere. Similar to Figure 6 based on synthetic data, the minimum of the loss function and its location in the w1-w2 space are functions of the two hyperparameters p and α. For the plain non-regularized loss function, the minimum of the loss function is 0.0713, located on the w2 axis with w1 = 0 and w2 ≈ 0.84 kPa. For the eight powers, the minima of the regularized loss functions and their varying locations in the w1-w2 space are indicated through the white spheres in Figure 8.
Loss functions of Lp regularized Mooney Rivlin model for real data. Contours of the regularized loss function for the two-term Mooney Rivlin model with varying powers p, evaluated for the two parameters w1 and w2, for real data from tension, compression, and shear tests of human brain. For powers of one and below, p ≤ 1, top row, with the special case of L1 regularization or lasso in the fourth column, regularization promotes sparsity by training the weight w1 exactly to zero, but the loss function is no longer strictly convex and has multiple local minima. For powers above one, p > 1, bottom row, with the special case of L2 regularization or ridge regression in the second column, regularization promotes stability, retains both non-zero weights, w1 and w2, and maintains a convex loss function with a single global minimum.
Figure 8 reveals several interesting differences between the loss functions for synthetic data in Figure 6 and for real data, in this case from human brain experiments, in Figure 8. Most importantly, for the synthetic data, we assumed an exact minimum at the known weights w1 and w2, where the loss function is exactly zero for the non-regularized model and takes the value of the regularization term otherwise. For the real data, we no longer know a priori where the exact minimum is, and it is no longer exactly zero, since the Mooney Rivlin model is not exact for the real data. From screening the parameter plane, we find the minimum loss on the w2 axis, at w1 = 0 and w2 ≈ 0.84 kPa. Strikingly, this suggests that the one-parameter Blatz Ko model58 with w1 = 0 and a single non-zero weight w2 is better suited to describe the experimental data than the two-parameter Mooney Rivlin59, 60 model. However, we can clearly see the negative effect of over-regularization with too large penalty parameters α: for powers of one and below, p ≤ 1, in the top row, the minimum of the loss function remains on the w2 axis, but the Blatz Ko parameter w2 is drastically reduced from its non-regularized value as the penalty parameter increases. For powers above one, p > 1, in the bottom row, the minimum of the loss function even moves away from the w2 axis, and both parameters become activated at a similar magnitude. Taken together, the discovered weights w1 and w2 are highly sensitive to over-regularization for extreme ranges of the hyperparameters p and α: extreme penalty parameters induce increased bias as the loss function increasingly focuses on minimizing the penalty term rather than the regression problem itself.
4.1 Lp regularized invariant based neural network
Similar to the previous example, we explore the effects of Lp regularization with respect to the two hyperparameters p and α, but now for the full eight-term invariant based network,29 instead of the two-term Mooney Rivlin model,59, 60 and for training on real instead of synthetic data. We use tension, compression, and shear data from human brain tests,62 sampled in 16 equidistant increments each and averaged over multiple specimens.29 We train the invariant based neural network from Figure 2 in Section 3.1 and minimize the loss function from Equation (38) with the stress definitions (20) and (23) for three different powers p and four different penalty parameters α. We use the Adam optimizer, a robust adaptive algorithm for stochastic gradient-based first-order optimization.68
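The following sketch illustrates the general structure of such an Lp regularized training loop with the Adam optimizer. It uses a deliberately tiny two-term model, softplus-constrained non-negative weights, and synthetic placeholder data; it is not the authors' constitutive neural network implementation, and all hyperparameter values are assumptions.

```python
import torch

torch.manual_seed(0)

class TinyModel(torch.nn.Module):
    """Illustrative two-term hyperelastic model; the real eight-term invariant based
    network has more terms, but the Lp-penalized training loop looks the same."""
    def __init__(self, n_terms=2):
        super().__init__()
        self.raw_w = torch.nn.Parameter(0.1 * torch.randn(n_terms))

    def weights(self):
        return torch.nn.functional.softplus(self.raw_w)   # keep weights non-negative

    def forward(self, lam):
        w = self.weights()
        # Mooney-Rivlin-type nominal stress in uniaxial tension (illustrative)
        return 2.0 * (lam - lam**-2) * (w[0] + w[1] / lam)

def lp_penalty(w, p):
    return (w.abs() + 1e-12) ** p   # small offset keeps gradients finite for p < 1

lam = torch.linspace(1.0, 1.1, 16)
P_data = 2.0 * (lam - lam**-2) * (0.3 + 0.6 / lam)   # synthetic target (assumed)

model = TinyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
p, alpha = 0.5, 0.01                                  # hyperparameters to sweep

for epoch in range(5000):
    opt.zero_grad()
    mse = torch.mean((model(lam) - P_data) ** 2) / torch.mean(P_data ** 2)
    loss = mse + alpha * lp_penalty(model.weights(), p).sum()
    loss.backward()
    opt.step()

print("discovered weights:", model.weights().detach().numpy().round(3))
```

Sweeping p and alpha in an outer loop, and repeating each setting from several random initializations, yields the kind of statistics reported in Figures 10 and 11.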
Figure 9 summarizes our four discovered models in terms of the nominal stress as a function of stretch or shear strain, with the penalty parameter α increasing from left to right. The circles represent the experimental data.62 The color-coded regions represent the stress contributions of the eight model terms according to Figure 2. The coefficients of determination R² quantify the goodness of fit. Overall, the L1 regularized invariant based network trains solidly and provides a good fit of the data. Without regularization, in the left column, the network discovers four non-zero terms, all in terms of the second invariant and indicated through cold green-to-blue colors: the linear, linear exponential, quadratic, and quadratic exponential second-invariant terms, with four stiffness-like parameters and two exponential weights. As the penalty parameter α increases, from left to right, the number of non-zero terms decreases. For an intermediate penalty parameter, in the third column, the network discovers three non-zero terms, again all in terms of the second invariant and indicated through cold green-to-light-blue colors, with three stiffness-like parameters and one exponential weight. For the largest penalty parameter, in the right column, the network discovers a single non-zero term, the turquoise linear exponential term of the second invariant, with one stiffness-like parameter and one exponential weight. While Figure 9 provides great visual insights into the performance of L1 regularization with varying penalty parameters, it only represents a snapshot of model discovery in the eight-dimensional parameter space of the network. Subset selection and model discovery are not only sensitive to the initialization of the parameter vector w, but also to the stochastic nature of the Adam optimizer. This implies that different runs may produce different results. This raises the question of how reproducible and robust the results in Figure 9 are for varying initial conditions and training runs.
Discovered models of L1 regularized invariant based network. Nominal stress as a function of stretch or shear strain for the invariant based neural network with L1 regularization for varying penalty parameters α, trained with human gray matter tension, compression, and shear data. Circles represent the experimental data. Color-coded regions represent the discovered model terms. Coefficients of determination R² indicate the goodness of fit.
Figure 10 summarizes the discovered weights for the invariant based network with Lp regularization for varying powers p and penalty parameters α. For all twelve combinations of the two hyperparameters, we perform repeated training runs with varying initial conditions for the network weights w, such that each of the four models in Figure 9 is the result of one of the L1 regularized training runs in the middle row. The colored boxes in Figure 10 indicate the relevance of the eight model terms, with their means and standard deviations. Interestingly, the two sparsity-promoting regularizations with powers of one and below in the first and second rows perform qualitatively similarly: they both start with four dominant terms, all in terms of the second invariant. Except for a small number of outliers, they both converge to two dominant one-term models, the green and the turquoise models, while all other weights train to zero. The fact that both networks alternate between these two terms is a result of the non-convex nature of the underlying nonlinear regression problem associated with the invariant based network and indicates the existence of multiple local minima. Instead, the L2 regularization in the bottom row converges to a model that consistently trains the dark red and red terms to zero and maintains six non-zero terms, of which the yellow, turquoise, and dark blue terms are dominant.
Discovered models of Lp regularized invariant based network. Distribution of discovered weights for the invariant based neural network with Lp regularization for varying powers p and penalty parameters α. Colored boxes indicate the relevance of the eight model terms, with means and standard deviations from repeated realizations with varying initializations of the network weights.
Figure 11 summarizes the convergence of the Lp regularized invariant based network in terms of the goodness of fit and the number of non-zero terms, for varying powers p and penalty parameters α. Red dots indicate the coefficient of determination R², blue dots indicate the number of terms, with means and standard deviations from repeated realizations. A known shortcoming of Lp regularization is that it introduces bias and moves the solution away from the minimum of the network loss towards the minimum of the regularization loss. This is particularly critical for our network in which all weights have a different meaning and potentially also a different magnitude. To quantify the effects of this potential limitation, Figure 11 compares the non-normalized regularization, α Σ |wi|^p, in terms of the weights wi that we have used throughout this study against a normalized regularization, α Σ |wi/wi*|^p, in terms of the normalized weights wi/wi*. Here, the reference weights wi* are the weights of the one-term models from the diagonal in Table 1.
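A minimal sketch of this normalization of the penalty term, assuming reference weights such as the one-term weights from the diagonal of Table 1; the numerical values and function name are hypothetical.

```python
import numpy as np

def lp_penalty(w, p, alpha, w_ref=None):
    """Lp penalty alpha * sum_i |w_i|^p, optionally normalized by reference
    weights w_ref, for example the one-term weights from the diagonal of Table 1."""
    w = np.asarray(w, dtype=float)
    if w_ref is not None:
        w = w / np.asarray(w_ref, dtype=float)
    return alpha * np.sum(np.abs(w) ** p)

# Example: weights of very different magnitude contribute on a common scale once normalized
w     = [0.8, 12.0]      # hypothetical discovered weights
w_ref = [0.8, 19.6]      # hypothetical one-term reference weights
print(lp_penalty(w, p=1, alpha=0.1))                  # dominated by the large weight
print(lp_penalty(w, p=1, alpha=0.1, w_ref=w_ref))     # balanced relative contributions
```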
Convergence of Lp regularized invariant based network. Goodness of fit and number of terms for the invariant based neural network with Lp regularization for varying powers p and penalty parameters α. The top row uses the non-normalized regularization in terms of the weights wi, the bottom row uses the normalized regularization in terms of the normalized weights wi/wi*. Red dots indicate the coefficient of determination R², blue dots indicate the number of terms, with means and standard deviations from repeated realizations.
TABLE 1.
L0 regularized invariant based neural network.

| term | 1: lin I1 | 2: exp lin I1 | 3: quad I1 | 4: exp quad I1 | 5: lin I2 | 6: exp lin I2 | 7: quad I2 | 8: exp quad I2 |
|---|---|---|---|---|---|---|---|---|
| 1: lin I1 | 0.796 | 0.237; 0.918, 0.600 | 0.400; 10.048 | 0.403; 3.666, 2.718 | 0.000; 0.840 | 0.000; 0.957, 0.865 | 0.330; 12.545 | 0.330; 3.810, 3.286 |
| loss | 0.092 + α | 0.090 + 2α | 0.060 + 2α | 0.060 + 2α | 0.071 + 2α | 0.069 + 2α | 0.040 + 2α | 0.040 + 2α |
| 2: exp lin I1 | 0.918, 0.600; 0.237 | 1.076, 0.727 | 0.980, 0.410; 9.811 | 1.219, 0.329; 4.167, 2.342 | 0.369, 0.000; 0.841 | 0.558, 0.000; 3.186, 0.250 | 1.095, 0.294; 12.547 | 0.822, 0.407; 4.089, 2.993 |
| loss | 0.090 + 2α | 0.089 + α | 0.060 + 2α | 0.060 + 2α | 0.071 + 2α | 0.063 + 2α | 0.040 + 2α | 0.040 + 2α |
| 3: quad I1 | 10.048; 0.400 | 9.811; 0.980, 0.410 | 18.348 | 9.388; 3.011, 2.977 | 8.151; 0.507 | 8.173; 0.876, 0.569 | 8.216; 10.916 | 8.178; 3.193, 3.422 |
| loss | 0.060 + 2α | 0.060 + 2α | 0.086 + α | 0.086 + 2α | 0.049 + 2α | 0.049 + 2α | 0.069 + 2α | 0.070 + 2α |
| 4: exp quad I1 | 3.666, 2.718; 0.403 | 4.167, 2.342; 1.219, 0.329 | 3.011, 2.977; 9.388 | 4.305, 4.244 | 3.134, 2.672; 0.497 | 3.014, 2.669; 1.266, 0.394 | 3.210, 2.559; 10.896 | 2.973, 2.737; 3.149, 3.477 |
| loss | 0.060 + 2α | 0.060 + 2α | 0.086 + 2α | 0.086 + α | 0.049 + 2α | 0.048 + 2α | 0.070 + 2α | 0.070 + 2α |
| 5: lin I2 | 0.840; 0.000 | 0.841; 0.369, 0.000 | 0.507; 8.151 | 0.497; 3.134, 2.672 | 0.840 | 0.234; 0.747, 0.803 | 0.406; 11.178 | 0.411; 3.644, 3.030 |
| loss | 0.071 + 2α | 0.071 + 2α | 0.049 + 2α | 0.049 + 2α | **0.071 + α** | 0.070 + 2α | **0.033 + 2α** | **0.033 + 2α** |
| 6: exp lin I2 | 0.957, 0.865; 0.000 | 3.186, 0.250; 0.558, 0.000 | 0.876, 0.569; 8.173 | 1.266, 0.394; 3.014, 2.669 | 0.747, 0.803; 0.234 | 0.898, 0.923 | 1.022, 0.396; 11.014 | 0.839, 0.484; 3.427, 3.208 |
| loss | 0.069 + 2α | 0.063 + 2α | 0.049 + 2α | 0.048 + 2α | 0.070 + 2α | **0.069 + α** | **0.033 + 2α** | **0.033 + 2α** |
| 7: quad I2 | 12.545; 0.330 | 12.547; 1.095, 0.294 | 10.916; 8.216 | 10.896; 3.210, 2.559 | 11.178; 0.406 | 11.014; 1.022, 0.396 | 19.599 | 9.520; 3.565, 2.834 |
| loss | 0.040 + 2α | 0.040 + 2α | 0.069 + 2α | 0.070 + 2α | **0.033 + 2α** | **0.033 + 2α** | **0.059 + α** | 0.059 + 2α |
| 8: exp quad I2 | 3.810, 3.286; 0.330 | 4.089, 2.993; 0.822, 0.407 | 3.193, 3.422; 8.178 | 3.149, 3.477; 2.973, 2.737 | 3.644, 3.030; 0.411 | 3.427, 3.208; 0.839, 0.484 | 3.565, 2.834; 9.520 | 4.560, 4.280 |
| loss | 0.040 + 2α | 0.040 + 2α | 0.070 + 2α | 0.070 + 2α | **0.033 + 2α** | **0.033 + 2α** | 0.059 + 2α | **0.060 + α** |

Note: Weights and remaining losses of the one- and two-term models of the L0 regularized invariant based neural network. Rows and columns denote the eight model terms; the linear and quadratic terms 1, 3, 5, 7 carry a single weight, the exponential terms 2, 4, 6, 8 carry a weight pair. The diagonal summarizes the discovered one-term models penalized by α, the off-diagonal the two-term models penalized by 2α. In each off-diagonal cell, the weights of the row term are listed before the semicolon and the weights of the column term after it. Best-in-class models are highlighted in bold.
Figure 11 confirms that Lp regularization is a trade-off between error and complexity, or similarly, between the goodness of fit and the number of terms. While the regularizations with powers of one and below behave qualitatively similarly and promote sparsity by reducing the number of non-zero terms to one, the L2 regularization promotes robustness by maintaining a large subset of six non-zero terms. The L1 regularization is less aggressive than the regularization with powers below one and requires larger penalty parameters to achieve a similar sparseness, which could induce a larger bias, away from the minimum of the network loss towards the minimum of the regularization loss. Normalizing the penalty term by using the normalized weights wi/wi* instead of the non-normalized weights wi accelerates the positive effects of regularization, especially in the small-penalty-parameter regime, and could provide a viable solution to reduce regularization-induced bias. Ultimately, in the large-penalty-parameter regime, the non-normalized and normalized regularizations converge towards a similar goodness of fit and number of terms.
Taken together, our results confirm the general notion that Lp regularization increases interpretability for powers equal to or below one, p ≤ 1, by promoting sparsity as a subset of weights train exactly to zero; and increases predictability for powers larger than one, p > 1, by promoting robustness as a unique subset of weights emerges as dominant. Larger penalty parameters amplify these trends at the price of an increased bias, which we can reduce, at least in part, by normalizing the network weights in the penalty term.
L0 regularized invariant based neural network. For comparison, we explore the effects of L0 regularization using the same tension, compression, and shear data from human brain tests as in the previous example.62 We train the invariant based neural network from Figure 2 in Section 3.1 and minimize the loss function from Equation (38) with the stress definitions (20) and (23), but now use a penalty term that penalizes the total number of non-zero terms in the model, the L0 pseudo-norm of the weight vector. In essence, L0 regularization turns network training into a discrete combinatorial problem with 2^8 - 1 = 255 possible models: 8 with a single term, 28 with two, 56 with three, 70 with four, 56 with five, 28 with six, 8 with seven, and 1 with all eight terms. For illustrative purposes, we focus on the eight one-term and 28 two-term models that follow by explicitly setting the other seven and six terms of the network to zero.
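The following sketch illustrates this combinatorial character by enumerating all one- and two-term subsets of an eight-term library and scoring each candidate with its error plus α times the number of active terms. The polynomial basis, the linear least-squares fit, and the data are placeholders; the actual network terms are nonlinear in some of their weights and require an iterative fit.

```python
import itertools
import numpy as np

# Placeholder library of eight candidate basis functions (simple powers for illustration)
basis = [lambda x, k=k: (x - 1.0) ** (k + 1) for k in range(8)]

x = np.linspace(1.0, 1.1, 16)                    # assumed stretch samples
y = 2.5 * (x - 1.0) + 40.0 * (x - 1.0) ** 2      # assumed data

def fit_subset(subset):
    """Least-squares fit of the selected terms; returns (weights, normalized error)."""
    A = np.column_stack([basis[k](x) for k in subset])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    err = np.mean((A @ w - y) ** 2) / np.mean(y ** 2)
    return w, err

alpha = 0.01                                     # L0 penalty per active term
results = []
for n_terms in (1, 2):                           # 8 one-term and 28 two-term models
    for subset in itertools.combinations(range(8), n_terms):
        w, err = fit_subset(subset)
        results.append((err + alpha * n_terms, subset, w.round(3), err))

for loss, subset, w, err in sorted(results)[:3]:
    print(f"terms {subset}: error {err:.4f} -> L0 loss {loss:.4f}, weights {w}")
```

Sorting the L0 regularized losses in this way mirrors, in spirit, the ranking of one- and two-term models in Table 1.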
Table 1 summarizes the weights and remaining losses of the one- and two-term models of the L0 regularized invariant based neural network. The diagonal summarizes the discovered one-term models, the off-diagonal the two-term models. The L0 regularization penalizes the one-term models by α and the two-term models by 2α. The boldface cells highlight the four best-in-class models of each category. Figures 12 and 13 illustrate the stress-stretch and stress-shear plots of these four one- and two-term models. Interestingly, all best-in-class models are models in terms of the second invariant, indicated through the cold green-to-blue colors. None of the eight best models includes the first invariant, indicated through the warm red-to-yellow colors. This finding contradicts the common practice of using primarily the first invariant, for example, in the popular and widely used neo Hooke model. Strikingly, the classical neo Hooke model,61 represented through the dark red term in both networks, with a stiffness-like parameter of 0.796 kPa and a remaining loss of 0.092, has the largest remaining loss and performs the worst of all one-term models. Similarly, the Demiray model,69 represented through the red term with a stiffness-like parameter, an exponent, and a remaining loss of 0.089, and the Holzapfel type model,70 represented through the yellow term with a stiffness-like parameter, an exponent, and a remaining loss of 0.086, also perform worse than all one-term second-invariant models. Yet, these results agree well with our previous observations that the second invariant is better suited to represent the behavior of brain tissue than the first invariant.29 The best-in-class one-term model with the lowest remaining loss is the model with the light blue quadratic term of the second invariant, with a stiffness-like parameter of 19.599 kPa. The best-in-class two-term model is the model with the turquoise linear exponential and the dark blue quadratic exponential terms of the second invariant, with two stiffness-like parameters and two exponential weights as reported in Table 1. For this simple example, the remaining loss of the best one-term model is 0.059 and the remaining loss of the best two-term model is 0.033. This implies that, for penalty parameters α below 0.026, the L0 regularization would favor the two-term model, while for penalty parameters α above 0.026, the L0 regularization would favor the one-term model. These simple considerations highlight the importance of the penalty parameter α, which explicitly acts as a discrete switch between the number of terms we want to include in our model.
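The switching value of the penalty parameter follows directly from comparing the two L0 regularized losses, with the remaining errors read off Table 1:

```latex
\begin{align*}
  \mathcal{L}_1(\alpha) &= 0.059 + 1\,\alpha  && \text{best one-term model,}\\
  \mathcal{L}_2(\alpha) &= 0.033 + 2\,\alpha  && \text{best two-term model,}\\
  \mathcal{L}_2(\alpha) < \mathcal{L}_1(\alpha)
  \;&\Longleftrightarrow\;
  \alpha < 0.059 - 0.033 = 0.026.
\end{align*}
```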
Discovered best-in-class one-term models of L0 regularized invariant based network. Nominal stress as a function of stretch or shear strain for the invariant based constitutive neural network with L0 regularization, trained with human gray matter tension, compression, and shear data. Circles represent the experimental data. Color-coded regions represent the discovered model terms. Coefficients of determination R² indicate the goodness of fit for each individual test; the remaining loss indicates the quality of the overall fit.
Discovered best-in-class two-term models of L0 regularized invariant based network. Nominal stress as a function of stretch or shear strain for the invariant based constitutive neural network with L0 regularization, trained with human gray matter tension, compression, and shear data. Circles represent the experimental data. Color-coded regions represent the discovered model terms. Coefficients of determination R² indicate the goodness of fit for each individual test; the remaining loss indicates the quality of the overall fit.
Taken together, this example illustrates the discrete nature of L0 regularization as a combinatorial problem that becomes increasingly expensive as the number of model terms increases. Our results emphasize the sensitivity of the L0 regularization with respect to the penalty parameter α and highlight the trade-off between bias and variance: increasing the penalty parameter α increases bias, reduces variance, and decreases model complexity as the total number of non-zero terms decreases towards one.
4.2 Lp regularized principal stretch based neural network
Similar to the previous example, we explore the effects of Lp regularization with respect to the two hyperparameters p and α, but now for the full principal stretch based network with all eight terms,29 for training on tension, compression, and shear data from human brain tests.62 We train the principal stretch based neural network from Figure 4 in Section 3.2 and minimize the loss function from Equation (38) with the stress definitions (33) and (36) for three different powers p and four different penalty parameters α, using the Adam optimizer.68
Figure 14 summarizes our four discovered models in terms of the nominal stress as a function of stretch or shear strain, with the penalty parameter α increasing from left to right. The circles represent the experimental data.62 The color-coded regions represent the stress contributions of the eight model terms according to Figure 4. The coefficients of determination R² quantify the goodness of fit. Similar to the invariant based network in Section 4.1, the L1 regularized principal stretch based network trains solidly and provides a good fit of the data. Without regularization, in the left column, the network discovers all eight non-zero terms. As the penalty parameter α increases, from left to right, the number of non-zero terms decreases. For the largest penalty parameter, in the right column, the network discovers a single dominant term, the dark blue term with the most negative exponent of minus eight, with a single stiffness-like parameter and the corresponding one-term stress response.
Discovered models of L1 regularized principal stretch based network. Nominal stress as a function of stretch or shear strain for the principal stretch based neural network with L1 regularization for varying penalty parameters α, trained with human gray matter tension, compression, and shear data. Circles represent the experimental data. Color-coded regions represent the stress contributions of the eight model terms. Coefficients of determination R² indicate the goodness of fit.
Figure 15 summarizes the discovered weights for the principal stretch based network with Lp regularization for varying powers p and penalty parameters α. For all twelve combinations of the two hyperparameters, we perform repeated training runs with varying initial conditions for the network weights w, such that each of the four models in Figure 14 is the result of one of the L1 regularized training runs in the middle row. The colored boxes in Figure 15 indicate the relevance of the eight model terms, with their means and standard deviations. In contrast to the invariant based network in Section 4.1, the principal stretch based network consistently discovers similar terms across all three regularizations, with a clear preference for the dark blue term. The fact that all networks robustly discover similar terms is a result of the convex nature of the underlying linear regression problem associated with the principal stretch based network and indicates the existence of a single unique global minimum. However, the regularized networks with powers of one and below gradually drop more non-zero terms as the penalty parameter increases, while the L2 regularized network maintains all eight terms. As we had already anticipated from comparing Figures 3 and 5, the functional base of the principal stretch based network is more collinear than the base of the invariant based network, which results in a more gradual shift of the active weights towards the dominant dark blue term as the penalty parameter increases. The fact that all three regularizations converge to the boundary of our domain, the dark blue term with the minimum exponent of minus eight, suggests that the true best fit might lie outside the current parameter range, with even smaller exponents. This agrees well with previous studies50, 62 that have discovered one-term Ogden models with even more negative exponents.
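To illustrate why the principal stretch based network poses a linear regression problem, the following sketch fits fixed-exponent Ogden-type terms by non-negative least squares; the exponent set, the uniaxial stress expression, the non-negativity constraint, and the synthetic data are assumptions, not the paper's exact network or training protocol.

```python
import numpy as np
from scipy.optimize import nnls

# Assumed set of fixed Ogden-type exponents spanning negative and positive powers
exponents = [-8, -6, -4, -2, 2, 4, 6, 8]

def term_stress_uniaxial(lam, a):
    # nominal stress of one incompressible term psi_a = lam1^a + lam2^a + lam3^a - 3
    # with unit weight, lam1 = lam and lam2 = lam3 = lam^(-1/2)
    return a * (lam ** (a - 1.0) - lam ** (-a / 2.0 - 1.0))

lam = np.linspace(1.0, 1.1, 16)
P_data = 1.7 * term_stress_uniaxial(lam, -8) + 0.2 * term_stress_uniaxial(lam, 2)  # assumed

# With fixed exponents, P(lam) = sum_k w_k * term_stress(lam, a_k) is linear in the
# weights, so the fit is a standard (non-negative) least-squares problem.
A = np.column_stack([term_stress_uniaxial(lam, a) for a in exponents])
w, residual = nnls(A, P_data)
# Note: the columns are strongly collinear, so the solver may distribute weight
# across neighboring exponents, echoing the gradual weight shifts discussed above.
print(dict(zip(exponents, w.round(3))))
```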
Discovered models of Lp regularized principal stretch based network. Distribution of discovered weights for the principal stretch based neural network with Lp regularization for varying powers p and penalty parameters α. Colored boxes indicate the relevance of the eight model terms, with means and standard deviations from repeated realizations with varying initializations of the network weights.
Taken together, this example illustrates that model discovery with Lp regularization generalizes well to different network types, irrespective of whether the terms are invariant or principal stretch based. Comparing both network types reveals that the method is sensitive to the nonlinear versus linear nature of the underlying regression problem: the nonlinear invariant based network alternates between different dominant terms associated with multiple local minima, while the linear principal stretch based network consistently discovers similar terms associated with a single unique global minimum.
5 CONCLUSION AND RECOMMENDATIONS
Lp regularization is a powerful technology to fine-tune the training process of a neural network. In automated model discovery, it provides the critical missing piece of the puzzle that enables a controlled down-selection of the discovered terms and a focus on the most important features of the model, while putting less emphasis on minor effects. By promoting sparsity of the parameter vector, regularization inherently improves interpretability and provides valuable insights into the underlying nature of the data. Importantly, Lp regularization introduces two hyperparameters: the power p by which it penalizes the individual model parameters, and the penalty parameter α by which it scales the relative importance of the regularization loss in comparison to the neural network loss. Both parameters enable a precise control of model discovery from data, and it is crucial to understand their mathematical subtleties, computational implications, and physical effects. Here we reviewed the mathematics and computation of the most common representatives of the Lp family, and demonstrated their features in terms of two classes of constitutive neural networks, invariant and principal stretch based, trained with both, synthetic and real data. Training with synthetic data proved to be robust and stable, and generally provides excellent metrics for quality control since we know the exact solution. However, it remains a toy problem that fails to reveal the true usefulness in practical real-world applications. Training with real data was algorithmically robust, but challenging, since we know nothing about the exact solution. Our study uses neural networks as a tool for linear and nonlinear regression. We acknowledge that our results can be interpreted and well understood without resorting to neural networks and generalize naturally to other regression techniques including symbolic regression, genetic programming, or system identification.
For conciseness, we have limited the scope of the present review: first, we only considered small networks with no more than eight terms, but point out that the automated model discovery generalizes well to isotropic networks with 12 terms,29 transversely isotropic networks with 16 terms,64 two-fiber family networks with 16 terms,11 and orthotropic networks with 32 terms. Second, we trained on all available data and did not investigate splitting the data into train and test sets, which we have done in our previous work.29, 50 Third, we did not explicitly study the effects of controlled noise, but point out that Figures 8-15 are all based on real experimental data with real natural noise. Fourth, we did not further explore hybrid top down approaches like SINDy,20 since the nonlinear nature of our optimization problem does not guarantee that we easily find the global minimum,47 from which we could initiate a sequential thresholded least squares down-selection; finally, we have not yet investigated the effects of regularization on uncertainty quantification, something we are currently exploring in a separate Bayesian approach.
We would like to share the most important lessons we have learnt throughout this study:
Normalize first! We cannot overstate the importance of normalizing. Clearly, while normalizing is less of an issue in linear regression, it is critical in nonlinear regression. This holds for both the training data, illustrated in Figures 3 and 5, and the weights, illustrated in Figure 11. The loss function typically contains several terms of different magnitude that compete during minimization. It proves important to normalize by the number of data sets in each category, the magnitude of the tensile, compressive, and shear stresses, and the magnitude of the weights to balance the impact of the individual contributions.
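A minimal sketch of the kind of normalization this lesson advocates, assuming a simple container of tension, compression, and shear experiments; the data and the model are hypothetical placeholders.

```python
import numpy as np

def normalized_loss(experiments, model):
    """Sum of per-experiment mean squared errors, each normalized by the number of
    data points and by the squared magnitude of the measured stresses, so that
    tension, compression, and shear contribute on a comparable scale."""
    total = 0.0
    for strain, stress in experiments:
        pred = model(strain)
        # np.mean divides by the number of points; dividing by mean(stress^2)
        # removes the stress magnitude from the comparison
        total += np.mean((pred - stress) ** 2) / np.mean(stress ** 2)
    return total / len(experiments)

# Hypothetical usage with three experiments of very different stress magnitude
experiments = [
    (np.linspace(1.0, 1.1, 16),  np.linspace(0.0, 0.5, 16)),   # tension [kPa]
    (np.linspace(0.9, 1.0, 16), -np.linspace(0.8, 0.0, 16)),   # compression [kPa]
    (np.linspace(0.0, 0.2, 16),  np.linspace(0.0, 0.2, 16)),   # shear [kPa]
]
print(normalized_loss(experiments, model=lambda x: 5.0 * (x - x.mean())))
```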
L0 regularization is the most honest member of the Lp family: L0 regularization or subset selection is honest, transparent, and unbiased. Its penalty parameter α acts as a direct switch to select the desired number of terms. It is the only member of the Lp family that explicitly controls the balance between the number of terms and the goodness of fit, illustrated in Figure 11. While L0 regularization across the entire network translates into an expensive NP hard discrete combinatorial problem on the order of 2^n candidate models, we recommend to begin any discovery by running an L0 regularization for all possible one- and two-term models to determine the best-in-class models of each category and identify the dominant terms, similar to Figures 12 and 13 and Table 1. Importantly, in nonlinear regression, the best-in-class k-term model may actually not be a subset of the best-in-class (k+1)-term model, and successively removing terms like in iterative pruning51 or sequentially thresholded least squares20 might not be a viable solution. Instead, running L0 regularization for all possible one- and two-term models provides a quick first insight into the nature and hierarchy of the best-in-class models.22 From this initial first glimpse, we can proceed by successively adding terms. In addition, from the best-in-class one-term models, we can use the discovered weights w* to initialize the weights for higher order runs and to normalize the weights in the Lp regularization term.
L1 regularization is powerful for subset selection, but needs large penalty parameters to be effective: L1 regularization or lasso promotes sparsity by reducing a large subset of parameters exactly to zero. Notably, for all the examples in our study, this down-selection required quite large penalty parameters, often on the order of one, to work effectively. This not only affects the magnitude of the discovered parameters, but often also the discovered model itself. For example, for the largest penalty parameters, the independent realizations of the L1 regularization in Figure 10 alternate between the green and turquoise one-term models, while the unbiased plain L0 regularization in Figure 12 and Table 1 ranks these two models clearly behind the blue and dark blue one-term models. To identify regularization-induced bias, we recommend to always compare the results of the L1 regularization against the best-in-class low-term models of the L0 regularization. This comparison is simple and inexpensive, and provides valuable insights into the magnitude of the selection bias and the aggressiveness of the regularization.
Lp regularization with powers below one promotes sparsity for small penalty parameters, but suffers from multiple local minima: Lp regularization with p < 1 addresses the shortcomings of the classical L1 regularization by down-selecting more aggressively, requiring smaller penalty parameters, and introducing less bias. While this regularization works well in practice, it is computationally challenging. Its non-convexity introduces multiple local minima, indicated through the first rows in Figures 6-8, through the green and turquoise one-term models in Figure 10, and through the blue and dark blue one-term models in Figure 15. To avoid getting stuck in a local minimum, we highly recommend exploring different initialization strategies for the network weights. Specifically, we were able to robustly identify multiple local minima by initializing the weights with previously discovered regularized weights. Alternatively, we could gradually ramp up the effect of regularization by starting with a vanishing penalty parameter, α = 0, and smoothly increasing it to the desired strength, essentially by moving from left to right in Figure 7. For quality control, we recommend comparing the remaining loss of each converged run against the remaining L0 regularized baseline loss as reported in Table 1.
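A minimal sketch of such a continuation strategy, assuming a simple two-term linear toy problem and a Nelder-Mead optimizer in place of the actual network training; the ramp schedule and all values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: one linear and one quadratic feature (placeholders for network terms)
x = np.linspace(1.0, 1.1, 16)
A = np.column_stack([x - 1.0, (x - 1.0) ** 2])
y = A @ np.array([0.0, 40.0]) + 0.001 * np.random.default_rng(0).standard_normal(16)

def loss(w, alpha, p):
    mse = np.mean((A @ w - y) ** 2) / np.mean(y ** 2)
    return mse + alpha * np.sum((np.abs(w) + 1e-12) ** p)

# Continuation: ramp alpha from zero to its target and warm-start each stage
# from the weights of the previous stage
w = np.array([1.0, 1.0])
for alpha in np.linspace(0.0, 0.05, 6):
    w = minimize(loss, w, args=(alpha, 0.5), method="Nelder-Mead").x
print("weights after ramping:", w.round(4))
```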
L2 regularization promotes stability, but is not suited for subset selection: L2 regularization, by design, is not suited to reduce a subset of terms exactly to zero. Instead, it maintains all terms, as indicated in the bottom rows of Figures 10 and 15, each for repeated runs. From Figures 6-8 we conclude that, for increasing penalty parameters α, L2 regularization reduces outliers by first bringing the weights closer together and then collectively reducing them toward zero. Clearly, L2 regularization improves convexity, which makes model discovery more robust and more stable. However, it not only fails to down-select the number of terms, but also strongly biases the solution away from the minimum of the pure network loss towards the minimum of the regularization loss. We do not recommend using L2 regularization, or any other member of the Lp regularization family with powers larger than one, p > 1, to increase sparsity and improve interpretability in model discovery. Table 2 provides a side-by-side comparison of the Lp regularizations we explored throughout this study along with their advantages and disadvantages.
TABLE 2.
Lp regularization.

| Algorithm | Regularization | Advantages | Disadvantages |
|---|---|---|---|
| L0 | Subset selection; penalizes the number of non-zero terms | Term count is inherently unbiased; conceptually simple and honest; promotes sparsity; improves interpretability; valuable insight for one or two terms | Solves discrete combinatorial problem; results in NP hard problem; computationally expensive, but manageable for one- and two-term models |
| Lp, 0 < p < 1 | Compromise between L0 and L1 | Improved efficiency compared to L0; improved sparsity compared to L1; reduces some parameters exactly to zero; works even for smaller penalty parameters and with less bias | Non-convex, multiple local minima; increased computational complexity |
| L1 | Lasso, least absolute shrinkage and selection operator | Weighs all components equally; less sensitive to outliers than L2; reduces some parameters exactly to zero; promotes sparsity; improves interpretability | Not strictly convex, local minima; emphasizes selective effects; introduces bias, inaccurate for large penalty parameters |
| L1 + L2 | Elastic net, compromise between L1 and L2 | Improved stability compared to L1; improved sparsity compared to L2 | Increased computational complexity |
| L2 | Ridge regression; uses the components squared | Reduces outliers, improves predictability; increases robustness; promotes stability | Introduces bias; moves parameters towards each other; reduces but maintains all parameters; does not promote sparsity |

Note: Comparison of special cases, advantages, and disadvantages.
Densifying instead of sparsifying: The closure problem is a common challenge in both fluid and solid mechanics. It refers to the difficulty of fully specifying the constitutive equations that relate stresses and strains and characterize the material behavior. In fluid mechanics, the closure problem is closely related to turbulence modeling, where it approximates intricate interactions between different scales, and can be well represented through polynomials.20 In solid mechanics, the closure problem characterizes complex material behaviors at the microscopic scale, and is traditionally often represented through a combination of polynomials,25 exponentials,69, 70 logarithms,71 and powers.57, 66 In the context of model discovery, assuming perfect data, polynomial models translate into a convex linear optimization problem with a single unique global minimum, while exponential, logarithmic, or power models translate into a non-convex nonlinear optimization problem with possibly multiple local minima. For convex discovery problems with a unique global minimum, inducing sparsity has been well established through a top down approach in which we first calculate a dense parameter vector at the global minimum, and then sparsify the parameter vector by sequentially thresholding and removing the least relevant terms.20, 25, 27, 47 For non-convex discovery problems with multiple local minima, this approach is infeasible since different initial conditions may result in different solutions with non-unique parameter vectors.67 Instead of trying to sparsify a dense parameter vector, we recommend to gradually densify the parameter vector from scratch. This bottom up approach iteratively solves the discrete combinatorial problem and densifies the parameter vector by sequentially adding the most relevant terms.72 Importantly, instead of solving the NP hard discrete combinatorial problem associated with a complete L0 regularization that screens all 2^n possible combinations of n terms, we recommend to gradually add terms, starting with the best-in-class one-term model selected from n candidates, adding a second term selected from the n - 1 remaining candidates, and repeating the addition until the incremental improvement of the overall loss function meets a user-defined convergence criterion. At most, this algorithm involves n + (n - 1) + ... + 1 = n (n + 1) / 2 evaluations of the loss function to land on a fully populated dense parameter vector. Importantly, for non-convex model discovery problems, this algorithm, while cost effective and well rationalized, is not guaranteed to converge to the global minimum. Instead of successively adding up to n terms, for practical purposes, it is often sufficient to limit the number of desirable terms to one, two, three, or four, and identify the best-in-class model of each class, which requires a comparison of a discrete number of models, in our case 8, 28, 56, or 70.22 Out of all possible discovery algorithms, this is the most honest, unbiased, and transparent approach.
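A minimal sketch of this bottom up densification, assuming a placeholder term library and a linear least-squares fit for each candidate set; in the non-convex network setting, each call to the error function would instead launch a nonlinear training run, and the greedy path is not guaranteed to reach the global optimum.

```python
import numpy as np

basis = [lambda x, k=k: (x - 1.0) ** (k + 1) for k in range(8)]   # placeholder term library
x = np.linspace(1.0, 1.1, 16)
y = 2.5 * (x - 1.0) + 40.0 * (x - 1.0) ** 2                       # assumed data

def error(subset):
    A = np.column_stack([basis[k](x) for k in subset])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((A @ w - y) ** 2) / np.mean(y ** 2)

active, best_err, tol = [], np.inf, 1e-6
while len(active) < len(basis):
    # evaluate every remaining candidate term added to the current active set
    candidates = [k for k in range(len(basis)) if k not in active]
    errs = {k: error(active + [k]) for k in candidates}
    k_best = min(errs, key=errs.get)
    if best_err - errs[k_best] < tol:      # stop when the improvement becomes negligible
        break
    active.append(k_best)
    best_err = errs[k_best]

print("densified term set:", active, "remaining error:", round(best_err, 6))
```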
Taken together, our study suggests that Lp regularized constitutive neural networks are a powerful technology for automated model discovery that allows us to identify interpretable constitutive models from data. We anticipate that our results generalize to Lp regularization for model discovery with other techniques such as symbolic regression or system identification, and, more broadly, to model discovery in other fields such as biology, chemistry, or medicine. The ability to discover new knowledge from data could have tremendous applications in generative material design, where it could shape the path to manipulate matter, alter properties of existing materials, and discover new materials with targeted properties.
AUTHOR CONTRIBUTIONS
JAMC: Method development; simulation; data analysis; result interpretation; manuscript writing. SRSP: Method development; simulation; data analysis; result interpretation; manuscript writing. KL: Study design; method development; simulation; data analysis; result interpretation; manuscript writing. EK: Study design; method development; simulation; data analysis; result interpretation; manuscript writing.
ACKNOWLEDGMENTS
This work was supported by the National Science Foundation Graduate Research Fellowship to Jeremy McCulloch and Skyler St. Pierre, by the DAAD Fellowship to Kevin Linka, and by the NSF CMMI Award 2320933 Automated Model Discovery for Soft Matter to Ellen Kuhl.
CONFLICT OF INTEREST STATEMENT
The authors declare no potential conflict of interests.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in the Living Matter Lab GitHub repository at https://github.com/LivingMatterLab/CANN, References 29 and 50.
REFERENCES
1Alber M, Buganza Tepole A, Cannon W, et al. Integrating machine learning and multiscale modeling: perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit Med. 2019; 2: 115.
3As'ad F, Avery P, Farhat C. A mechanics-informed artificial neural network approach in data-driven constitutive modeling. Int J Numer Methods Eng. 2022; 123: 2738-2759.
5Ghaderi A, Morovati V, Dargazany R. A physics-informed assembly for feed-forward neural network engines to predict inelasticity in cross-linked polymers. Polymers. 2020; 12: 2628.
11Peirlinck M, Linka K, Hurtado JA, Kuhl E. On automated model discovery and a universal material subroutine for hyperelastic materials. Comput Methods Appl Mech Eng. 2024; 418:116534.
14Wang LM, Linka K, Kuhl E. Automated model discovery for muscle using constitutive recurrent neural networks. J Mech Behav Biomed Mater. 2023; 145:106021.
15Linka K, Hillgartner M, Abdolazizi KP, Aydin RC, Itskov M, Cyron CJ. Constitutive artificial neural networks: a fast and general approach to predictive data-driven constitutive modeling by deep learning. J Comput Phys. 2021; 429:110010.
17Linka K, Kuhl E. A new family of constitutive artificial neural networks towards automated model discovery. Comput Methods Appl Mech Eng. 2023; 403:115731.
21Peng GCY, Alber M, Buganza Tepole A, et al. Multiscale modeling meets machine learning: what can we learn? Arch Comput Methods Eng. 2021; 28: 1017-1037.
22Peirlinck M, Linka K, Hurtado JA, Holzapfel GA, Kuhl E. Democratizing biomedical simulation through automated model discovery and a universal material subroutine. bioRxiv, 2023. doi:10.1101/2023.12.06.570487
26Abdusalamov R, Hillgartner M, Itskov M. Automatic generation of interpretable hyperelastic models by symbolic regression. Int J Numer Methods Eng. 2023; 124: 2093-2104.
27Wang Z, Estrada JB, Arruda EM, Garikipati K. Inference of deformation mechanisms and constitutive response of soft material surrogates of biological tissue by full-field characterization and data-driven variational system identification. J Mech Phys Solids. 2021; 153:104474.
28Flaschel M, Kumar S, De Lorenzis L. Automated discovery of generalized standard material models with EUCLID. Comput Methods Appl Mech Eng. 2023; 405:115867.
29Linka K, St Pierre SR, Kuhl E. Automated model discovery for human brain using constitutive artificial neural networks. Acta Biomater. 2023; 160: 134-151.
40Hartmann S. Parameter estimation of hyperelastic relations of generalized polynomial-type with constraint conditions. Int J Solids Struct. 2001; 38: 7999-8018.
41Linden L, Klein DK, Kalinka KA, Brummund J, Weeger O, Kästner M. Neural networks meet hyperelasticity: a guide to enforcing physics. J Mech Phys Solids. 2023; 179:105363.
46Champion K, Lusch B, Kutz N, Brunton SL. Data-driven discovery of coordinates and governing equations. Proc Natl Acad Sci USA. 2019; 116: 22445-22451.
50St Pierre SR, Linka K, Kuhl E. Principal-stretch-based constitutive neural networks autonomously discover a subclass of Ogden models for human brain tissue. Brain Multiphys. 2023; 4:100066.
51Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. Proceedings of the 28th International Conference on Neural Information Processing Systems. Vol 1. MIT Press; 2015: 1135-1143.
57Ogden RW. Large deformation isotropic elasticity—on the correlation of theory and experiment for incompressible rubberlike solids. Proc R Soc Lond A Math Phys Sci. 1972; 326: 565-584.
60Rivlin RS. Large elastic deformations of isotropic materials. IV. Further developments of the general theory. Philos Trans R Soc Lond A. 1948; 241: 379-397.
63Budday S, Ovaert TC, Holzapfel GA, Steinmann P, Kuhl E. Fifty shades of brain: a review on the material testing and modeling of brain tissue. Arch Comput Methods Eng. 2020; 27: 1187-1230.
64Linka K, Buganza Tepole A, Holzapfel GA, Kuhl E. Automated model discovery for skin: discovering the best model, data, and experiment. Comput Methods Appl Mech Eng. 2023; 410:116007.
65St Pierre SR, Rajasekharan D, Darwin EC, Linka K, Levenston ME, Kuhl E. Discovering the mechanics of artificial and real meat. Comput Methods Appl Mech Eng. 2023; 415:116236.
70Holzapfel GA, Gasser TC, Ogden RW. A new constitutive framework for arterial wall mechanics and comparative study of material models. J Elast. 2000; 61: 1-48.
72Nikolov DP, Srivastava S, Abeid BA, et al. Ogden material calibration via magnetic resonance cartography, parameter sensitivity and variational system identification. Philos Trans A Math Phys Eng Sci. 2022; 380:20210324.