Abstract

In this work, the efficient robust global optimization (ERGO) method is revisited with the aim of enhancing and expanding its existing capabilities. The original objective of ERGO was to address the computational challenges associated with optimization-under-uncertainty through the use of Bayesian optimization (BO). ERGO tackles robust optimization problems, which are characterized by sensitivity in the objective function due to stochasticity in the design space. It does this by concurrently minimizing the mean and variance of the objective in a multi-objective setting. To handle the computational complexity arising from the uncertainty propagation, ERGO exploits the analytical expression of the surrogate model underlying BO. In this study, ERGO is extended to accommodate multiple objectives, incorporate an improved predictive error estimation approach, treat failed function evaluations, and handle stochastic parameters alongside stochastic design variables. To evaluate the effectiveness of these improvements, the enhanced ERGO scheme is compared with the original method using an analytical test problem of varying dimensionality. Additionally, the novel optimization technique is applied to an aerodynamic design problem to validate its performance.


1 Introduction

Driven by the goal of reducing the computational burden associated with conventional approaches to optimization-under-uncertainty (OUU), recent years have witnessed the emergence of various surrogate-assisted (SA) techniques aimed at addressing this challenge [1–3]. This paper focuses on addressing this computational challenge within the context of robust design optimization (RDO), which seeks to find designs that exhibit robustness against small variations in uncertain parameters [4]. More specifically, the performance of the efficient robust global optimization (ERGO) scheme is enhanced; the scheme is positioned within the OUU research field below.

The existing SA-RDO approaches can be categorized in two ways: first, based on the manner in which the surrogate is introduced, and second, on the formulation of the RDO problem. Persson and Ölvander divided the former in two based on whether or not the surrogate is updated during the optimization [5]. In reference to the latter, Kanno distinguished three RDO formulations: worst-case optimization (minimization of the maximum objective evaluation), discrepancy optimization (minimization of the difference between objective value evaluations), and variance minimization (minimization of the moments of the objective probability density functions) [6]. Persson and Ölvander's study illustrated the outperforming ability of the actively updating approach [5]. Ryan et al. put forward a multi-objective (MO) reformulation of variance minimization as the most competitive formulation of the RDO problem [7].

Among the various surrogate-modeling techniques available, Gaussian process interpolation (GPI), also known as Kriging, has received significant attention [8]. This can be attributed to GPI's flexibility as a parameter-free method, as well as its capability to quantify prediction uncertainty. Bayesian optimization (BO) builds on GPI through the introduction of an acquisition function, typically formulated in such a way that the design space is explored and the objective is exploited. The efficient global optimization (EGO) framework, in which the acquisition function corresponds to the expected improvement (EI), is the most well known [9,10].

The MO–BO approach to RDO was pursued by Keane, in which the uncertainty propagation was performed by sampling the surrogate to obtain the quantities of interest and optimizing these by means of NSGA-II [11,12]. This approach was also followed by Cheng et al. [13]. The approach of Ribaud et al. [14] relies on a Taylor expansion in order to propagate mean and variance through the surrogate. However, none of these methods make active use of the underlying model uncertainty to ensure objective space exploration; therefore, the end result might strongly depend on the initial set of function evaluations on which the surrogate is built. Most recently, the ERGO scheme was devised to account for this problem by formulating an acquisition function that actively leverages the surrogate-induced uncertainty [15]. The ERGO scheme builds on the EGO algorithm and further increases the efficiency of solving RDO problems by performing the uncertainty propagation (UP) to assess the robustness measures without additional experiments or sampling. In this paper, the ERGO scheme is improved upon by permitting the treatment of multiple objectives, by improving the assessment of the predictive uncertainty of the stochastic measures, by treating function evaluation failures, and by treating the occurrence of stochastic parameters along with stochastic design variables. The contributions are summarized as follows:

  1. The original ERGO scheme relied on a reformulation of Keane's Euclidean EI [16] as the acquisition function that drives the selection of designs to be evaluated. However, the Euclidean EI has been proven to be sensitive to magnitude variations of the objectives and is furthermore limited to two objectives [17], corresponding to the mean and variance of a single function output. As such, the hypervolume EI is introduced in the ERGO framework to account for both shortcomings [18].

  2. The original ERGO scheme made use of a first-order second moment (FOSM) approach [19] to assess the predictive variance (epistemic uncertainty [20]) of the predictions of the mean and variance of the objective due to input variability (aleatoric uncertainty [20]). This low-order approach drove the acquisition function to underemphasize certain regions in the objective space. To account for this shortcoming, a second-order second moment (SOSM) approach is introduced to assess the predictive variance.

  3. The original ERGO scheme did not treat the possible occurrence of objective function failures. The SAMURAI framework [21], which incorporated the ERGO scheme in the context of asynchronous reliability-based robust design optimization, accounted for this particularity by adding the prediction of failed evaluations to the training set, mimicking Forrester et al.'s missing-at-random (MAR) approach [22]. The introduction of the MAR approach in the original scheme can introduce excessive non-linearities in the objective space when the predictive uncertainty of the surrogate is substantial. This can lead to the prediction of non-existent multimodal objective spaces and regions with high predictive uncertainty. Consequently, the acquisition function may remain elevated throughout the optimization process. To account for this, a classifier is introduced to predict the feasible domain and guide the acquisition function, following the approach of Backov et al. [23].

  4. The original ERGO scheme only permitted stochastic design variables. As such, the impact of parameter variability on the objective could not be directly assessed. To account for this defect, the treatment of stochastic parameters is examined in the form of a two-step acquisition function optimization.

The remainder of this paper is organized as follows. In Sec. 2, a brief overview of the notation used throughout this work is introduced. In Sec. 3, the ERGO framework is briefly summarized, after which the four improvements upon the strategy are presented in Sec. 4, namely the novel acquisition function which relies on the hypervolume expected improvement, the higher-order predictive variance assessment of the stochastic objective measures, the treatment of crash constraints by means of Gaussian process classification, and the treatment of stochastic parameters. In Sec. 5, the aforementioned improvements are brought together, making up ERGO-II, and compared with the original scheme on an analytical test problem. In Sec. 6, the improved approach is examined on an aerodynamic test case inspired by the aerodynamic design optimization discussion group [24]. Concluding remarks and outlooks are formulated in Sec. 7.

2 Notation

2.1 Sub- and Superscripts.

$x_i$ refers to the $i$th sample of $X$ ($i = 1, \dots, n(X)$), with $n(X)$ the number of elements in set $X$, $X$ a proper subset of vector space $\mathcal{X}$, and $\mathcal{X}$ a subset of $\mathbb{R}^{d(\mathcal{X})}$: $x_i \in X \subset \mathcal{X} \subseteq \mathbb{R}^{d(\mathcal{X})}$. $d(\mathcal{X})$ is used to denote the dimensionality of the vector space, such that $x^{(i)}$ refers to the $i$th component ($i = 1, \dots, d(\mathcal{X})$) of design vector $x$. Furthermore, with $x_{-i}$ the vector $x$ is denoted from which the $i$th component has been omitted.

The superscripts $\{\cdot\}^i$ and $\{\cdot\}^c$ are used to differentiate the interpolator from the classifier. Similarly, the subscripts $\{\cdot\}_x$ and $\{\cdot\}_u$ are used to differentiate the design variables, entities that influence the model and that the engineer can change within a specified range during the optimization design study, from the operation parameters, entities that influence the model and might vary during operation, but cannot be changed by the engineer.

2.2 Operators.

The concept of vectorizing a matrix, denoted as $\mathrm{vec}(A) \equiv A^V$, involves the process of stacking its columns. Similarly, the matricization of a tensor, expressed as $\mathrm{mat}(\mathcal{A}) \equiv \mathcal{A}^M$, entails the stacking of its matrices. In the case of a rank-3 tensor, $\mathcal{A} \in \mathbb{R}^{d \times d \times d}$, this operation yields $\mathrm{mat}(\mathcal{A}) \in \mathbb{R}^{d \times d^2}$, while for a rank-4 tensor, $\mathcal{A} \in \mathbb{R}^{d \times d \times d \times d}$, it results in $\mathrm{mat}(\mathcal{A}) \in \mathbb{R}^{d^2 \times d^2}$.

Furthermore, the use of the double dot product, denoted as "$:$", serves to represent the Frobenius inner product: $A{:}B = \mathrm{tr}(A^\top B)$, with $\mathrm{tr}$ signifying the trace operator of a matrix. Additionally, a noteworthy observation is made: $AB{:}BA = \mathrm{tr}((AB)^2)$ holds true when both $A$ and $B$ are symmetric. The Hadamard product is denoted by "$\odot$", and the notation "$\{\cdot\}^{\odot}$" is introduced to represent the Hadamard power. The Kronecker product is represented by "$\otimes$". Moreover, an "outer power" of a vector is introduced as $x^{\otimes} \equiv x \otimes x = x x^\top$, providing a means to simplify notation. Similarly, an "outer power" of a matrix is defined as $A^{\otimes} \equiv A \otimes A = \mathrm{vec}(A)\,\mathrm{vec}(A)^\top$. Based on these definitions, equivalent expressions are formulated, such as $x^{\otimes}{:}A = x^\top A x = \|x\|_A^2 = \mathrm{tr}(x^{\otimes} A)$, which are employed interchangeably for the sake of brevity and clarity.
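As a minimal illustration, these operators can be reproduced in NumPy; the column-major (column-stacking) conventions in the sketch below are assumptions consistent with the definitions above:

```python
import numpy as np

d = 3
A = np.arange(d * d, dtype=float).reshape(d, d)
B = np.eye(d)
T = np.arange(d**3, dtype=float).reshape(d, d, d)  # rank-3 tensor
x = np.array([1.0, 2.0, 3.0])

vec_A = A.flatten(order="F")            # vec(A): stack columns -> R^{d^2}
mat_T = T.reshape(d, d * d, order="F")  # mat(T): stack matrices -> R^{d x d^2}
frob = np.trace(A.T @ B)                # A:B = tr(A^T B), Frobenius inner product
hadamard = A * B                        # Hadamard (elementwise) product
kron = np.kron(A, B)                    # Kronecker product
outer_x = np.outer(x, x)                # "outer power" of a vector: x x^T

# x^{outer}:A = x^T A x, two equivalent evaluations of the same scalar
assert np.isclose(np.sum(outer_x * A), x @ A @ x)
```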

Finally, use is made of $\phi$ and $\Phi$ to respectively denote the standard normal probability density function and the standard normal cumulative distribution function. The notation of both is simplified by introducing $\phi_j(x) \equiv \phi(y_j|\mu_j(x), \Sigma_j(x))$ in the context of Gaussian processes.

2.3 Fonts.

Epistemic uncertainty denotes non-deterministic behavior arising from a degree of ignorance or a deficiency in knowledge concerning the system or the environment, and is depicted in this work using the calligraphic font, symbolized as $\mathcal{y}$. In contrast, aleatory uncertainty pertains to the inherent variability linked to the physical system or environment under investigation, which in this work is represented by the Fraktur typeface, denoted as $\mathfrak{x}$ [20].

3 Efficient Robust Global Optimization

The ERGO framework, upon which this paper builds, is an SA-RDO technique in which an MO–BO routine is combined with a second-order Taylor-based UP approach [15]. In the ERGO framework, the mean and variance of the objective function are minimized simultaneously through the formulation of an innovative acquisition function, referred to as the robust expected improvement (REI). The UP approach leverages the analytical formulation of the surrogate model to obtain closed-form expressions of the mean and variance and their respective predictive uncertainties. As such, no additional sampling is required, which further decreases the computational cost of solving the problem. The RDO problem solved by ERGO can formally be written as
(1)
where $E_{\mathfrak{x}}$ and $V_{\mathfrak{x}}$ respectively denote the expected value and variance operators over the stochastic design variables $\mathfrak{x}$, which are treated as aleatoric uncertainties. The RDO problem translates into finding the Pareto front $\mathcal{P} \equiv \mathrm{Par}_{\{E_{\mathfrak{x}},V_{\mathfrak{x}}\}}[y] \equiv \{E_{\mathfrak{x}}[y^*], V_{\mathfrak{x}}[y^*]\} : \{E_{\mathfrak{x}}[y^*], V_{\mathfrak{x}}[y^*]\} \preceq \{E_{\mathfrak{x}}[Y], V_{\mathfrak{x}}[Y]\}$, with $Y$ the objective space: $y^* \in Y \subseteq \mathbb{R}$. Furthermore, it is assumed that the input variability is normally distributed, $\mathfrak{x} \sim \mathcal{N}(x, \Phi_x)$, such that $\mathfrak{x}$ can be fully described by its mean $x$ and covariance $\Phi_x$.
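Spelled out in this notation, the bi-objective RDO problem of Eq. (1) reads (a sketch consistent with the definitions above):

$$\min_{x \in \mathcal{X}} \; \left\{\, E_{\mathfrak{x}}[y(\mathfrak{x})],\; V_{\mathfrak{x}}[y(\mathfrak{x})] \,\right\}, \qquad \mathfrak{x} \sim \mathcal{N}(x, \Phi_x)$$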
Making a prediction for a noisy input corresponds to integrating the epistemic predictive distribution of the Gaussian process $p(y(x)|S^i)$, created from $S^i = \{X, Y\}$ with $Y$ the objective matrix defined by $Y_{(i,j)} = y_i(x_j)$, against the aleatory input distribution $p(\mathfrak{x}|x, \Phi_x)$, leading to the mixed epistemic and aleatory distribution $p(y(\mathfrak{x})|S^i, x, \Phi_x)$. Evaluating the mean and variance over the aleatoric variability leads to again epistemic stochastic processes,
(2a)
(2b)
as presented in Ref. [25].

A closed-form expression of $y_E$ and $y_V$ is not readily available; therefore, approximations are resorted to. In the original framework, Girard et al.'s approach is followed for the universal Kriging interpolator [15], where the trends are obtained by means of Isserlis' theorem and the processes are estimated under the assumption that they are again Gaussian, appropriately referred to as the Gaussian approximation [25]. Furthermore, the covariance is locally approximated by a second-order polynomial.

The more widely used ordinary GPI is presented here, in which the trend function $f$ is assumed constant. In this case, the approximation becomes mathematically equivalent to locally approximating the predictive mean of the objective with a second-order Taylor expansion. In the case of normally distributed input noise, the uncertainty propagation becomes analytically tractable and corresponds to the second-order first moment (SOFM) and SOSM approach for, respectively, the predictive means of the mean and variance of the objective due to input variability. These values are now fully defined by the predictive mean $\mu_y$, Jacobian $\mu_{\nabla y}$, and Hessian $\mu_{\Delta y}$ of the stochastic process with deterministic input, and the mean $x$ and covariance $\Phi_x$ of the stochastic input,
(3a)
(3b)
(3c)
as derived in Ref. [15].
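The underlying second-order Taylor moment formulas admit a compact sketch; the `taylor_moments` helper below is hypothetical, and the actual ERGO expressions of Eqs. (3a)–(3c) additionally carry the surrogate's predictive terms:

```python
import numpy as np

def taylor_moments(mu, grad, hess, cov):
    """Second-order Taylor propagation of a Gaussian input x ~ N(x_bar, cov)
    through a function described locally by its value mu, Jacobian grad, and
    Hessian hess at x_bar (standard SOFM/SOSM formulas)."""
    HP = hess @ cov
    mean = mu + 0.5 * np.trace(HP)                     # SOFM mean of the objective
    var = grad @ cov @ grad + 0.5 * np.trace(HP @ HP)  # SOSM variance of the objective
    return mean, var

# Toy check on y(x) = x^T x with x ~ N(0, I_2): exact mean 2, exact variance 4,
# both reproduced here since the function is quadratic.
print(taylor_moments(0.0, np.zeros(2), 2.0 * np.eye(2), np.eye(2)))  # (2.0, 4.0)
```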
To ensure exploration during the optimization process, the uncertainty in the prediction of the afore-derived stochastic metrics needs to be known. In a similar fashion to the above, a Taylor-series-based error propagation technique is employed. The original approach made use of a first-order curve and neglected the influence of covariance terms,
(4a)
(4b)
as derived in Ref. [15].
By means of these metrics, a novel acquisition function can be formulated, namely the REI. The REI builds on Keane's Euclidean EI [15,16]. In this context, the Pareto front $\mathcal{P}$ is redefined as the Pareto front of the aleatoric mean and variance as predicted by the surrogate in the evaluated infills: $\mathcal{P} \equiv \mathrm{Par}[\mu_{\{E,V\}}(X|\Phi_x)]$.
(5)
where the first term denotes the probability of finding the infill point outside of the current predicted Pareto front and the second term denotes the distance of the centroid of the infill in the objective space to the nearest point on the predicted Pareto front. $p_s$, $q_{E,s}$, and $q_{V,s}$ correspond to the sum over three vector functions
Algorithm 1

Require: Evaluated sampling plan $S^i = \{X, y\}$

 1: While computational budget is not exhausted do
 2:   $\theta^* \leftarrow \mathrm{argmax}_\theta\, L^i(\theta|S^i)$   ▷ Eq. (A2)
 3:   $\mathcal{P} \leftarrow \mathrm{Par}[\mu_{\{E,V\}}(X|\theta^*, \Phi_x)]$
 4:   $z^* \leftarrow \max_x \mathrm{REI}_{Eu}(x|\theta^*, \mathcal{P})$   ▷ Eq. (5)
 5:   If $z^* > k_{REI}$ then
 6:     $x^* \leftarrow \mathrm{argmax}_x\, \mathrm{REI}_{Eu}(x|\theta^*, \mathcal{P})$
 7:     $y^* \leftarrow y(x^*)$
 8:     $S^i \leftarrow S^i \cup \{x^*, y^*\}$
 9:   else
10:     break
11: $X^* \leftarrow \mathrm{argPar}_x[\mu_{\{E,V\}}(X|\theta^*, \Phi_x)]$

with $i = 1, \dots, p-1$ and $p = n(\mathcal{P})$. Furthermore, $\Phi_E^{(i)}$ and $\Omega_E^{(i)}$ are introduced as
as proposed in Ref. [15].

Plugged into EGO (cf. Appendix), the ERGO framework is obtained (Algorithm 1). Notice in Algorithm 1, line 3, that $\theta^*$ is added as an argument. This substitution is made to indicate that a maximum likelihood estimation (MLE) approach is adopted instead of a fully Bayesian approach to evaluate the interpolator. Furthermore, $k_{REI}$ is introduced as an additional constant that serves as a stopping criterion: if no further significant improvement can be made, stop (break) the optimization.

4 Improved Efficient Robust Global Optimization

4.1 Robust Hypervolume Expected Improvement.

The initial ERGO scheme depends on adapting Keane's Euclidean EI [16] as the acquisition function guiding the selection of designs for evaluation. Nevertheless, the Euclidean EI has been shown to be sensitive to changes in the magnitudes of the objectives and is also constrained to handling only two objectives [17], representing the mean and variance of a single function output. Consequently, the ERGO framework introduces the hypervolume EI to address both of these limitations [18].

The current standard to assess the Pareto set quality is the hypervolume indicator $H(\mathcal{P})$, which corresponds to the Lebesgue measure of the hyperspace dominated by the Pareto front and bounded by a reference point in the objective space, $r \equiv \{y_j^{\max} + \epsilon \,|\, j = 1, \dots, d(Y)\}$. The indicator can be used to define a scalar improvement function $I_{hv}(x|\mathcal{P}, r)$, which measures the contribution (or improvement) of the point $y(x)$ to the Pareto set $\mathcal{P}$. Identical to the single-objective case, the stochastic nature of the interpolator can be used to assess the uncertainty in the prediction and evaluated to define the expected hypervolume improvement $E_y[I_{hv}(x|\Phi_x, \mathcal{P}, r)]$. To realize this, the exclusive hypervolume indicator $H_e(x|\mathcal{P}, r) \equiv H(\mathcal{P} \cup y(x)|r) - H(\mathcal{P}|r)$ is introduced, such that
(6a)
(6b)
Extensive research has been conducted toward the analytical evaluation of this integral. In this work, the approach by Couckuyt et al. is built upon [18], in which $E_y[I_{hv}(x|\mathcal{P}, r)]$ is written in closed form as the sum of improvement contributions of disjoint cells $K = \{\kappa_i \,|\, i = 1, \dots, n(K)\}$, with $\kappa_i = \{[\underline{\kappa}_i^{(j)}, \overline{\kappa}_i^{(j)}] \,|\, j = 1, \dots, d(Y)\}$, making up the non-dominated region. The underline and overline are introduced to respectively denote the lower and upper bounds of the cell, such that $\kappa_i$ contains the dimensions of the cell.
(7)
The improvement contribution of each cell $\kappa_i$ is itself the sum over the disjoint cells $K$ of the product over the dimensionality of the objective space of a function $J$
(8)
depending on the relative position of $\kappa_i$ versus $\kappa_j$: if the cell that contains the prediction dominates the cell in the dimension considered, $J_1$ is invoked. When this domination is only partial, $J_2$ is called. In all other cases, the contribution is zero. This can be mathematically presented as
(9)
The mathematical derivation of $J_1$ and $J_2$ can be found in Couckuyt et al. [18] and is given below. The superscript $(l)$ has been omitted from $J$ and $\kappa$ for clarity.
(10a)
(10b)
To calculate the statistical functions, the integral bounds of the region that improves the Pareto set need to be identified. However, this region is often irregularly shaped, making analytical calculation complex for a higher number of objective functions. Instead, Couckuyt et al. proposed a branch-and-bound approach. The algorithm constructs a pseudo Pareto set by adding two extra points representing the ideal and anti-ideal points. The objective space is divided into hyperrectangles, and a binary partitioning strategy is used to determine the smallest set of hyperrectangles that dominate or augment the Pareto set. The process involves testing a single point within each hyperrectangle and recursively partitioning the space until all hyperrectangles are accepted or rejected based on a condition function. Once the integral bounds are identified, the statistical criteria can be evaluated [18]. This is schematically presented in Fig. 1.
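For intuition, the sketch below computes the dominated and exclusive hypervolume for two objectives (minimization) with a simple sweep over the sorted front; this is a stand-in for Couckuyt et al.'s branch-and-bound cell decomposition, which generalizes the computation to higher-dimensional objective spaces:

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Dominated hypervolume of a 2D front (minimization) w.r.t. reference
    point ref; dominated members of `front` are skipped by the sweep."""
    pts = front[np.argsort(front[:, 0])]      # sort by first objective
    hv, y_prev = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 >= y_prev:                      # dominated point: no contribution
            continue
        hv += (ref[0] - f1) * (y_prev - f2)   # strip closed off by this point
        y_prev = f2
    return hv

def exclusive_hv(point, front, ref):
    """Exclusive contribution H(P ∪ {y}) - H(P) of a candidate point."""
    return hypervolume_2d(np.vstack([front, point]), ref) - hypervolume_2d(front, ref)

front = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
print(exclusive_hv(np.array([0.3, 0.6]), front, ref=np.array([1.1, 1.1])))  # 0.06
```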
Fig. 1
Schematic representation of the partitioning of the area outside of the Pareto front (bottom-left) of two contesting objectives in disjoint cells. The light and dark areas (bottom-left) respectively correspond to the area where the Pareto front can be improved and the exclusive hypervolume for infill y(x) [18].
When the expected value and variance of the objective functions are used to drive the expected hypervolume improvement, the REI is obtained
(11)
As indicated by Wagner et al., the acquisition function that drives the Bayesian optimization routine should decrease to zero at evaluated points [17]. This typically corresponds to the predictive variance decreasing to zero at evaluated points. However, the predictive (epistemic) variances of the (aleatoric) mean and variance do not vanish at the infills, due to the contribution of the predictive variance of the Jacobian and Hessian. To avoid reevaluation of infills and stagnation of the optimizer, an activation function $A(x|X)$ is introduced and multiplied with the acquisition function. The activation function takes the form of a Gaussian radial basis function, reformulated such that it equals zero at evaluated points and continuously increases to one in unevaluated regions.
(12a)
(12b)
with the weighting coefficient $w$ empirically defined to be equal to $(N_{S^i} + 50)^2$.
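One Gaussian-RBF form consistent with this description (zero at evaluated designs, tending continuously to one away from them) is sketched below; the exact expression of Eqs. (12a) and (12b) may differ:

```python
import numpy as np

def activation(x, X_evaluated, w):
    """A(x|X): zero at evaluated designs and smoothly rising toward one
    elsewhere. One possible Gaussian-RBF form consistent with the text;
    the exact expression of Eq. (12) may differ."""
    d2 = np.sum((X_evaluated - x) ** 2, axis=1)   # squared distances to infills
    return 1.0 - np.exp(-w * d2.min())

X = np.array([[0.2, 0.2], [0.8, 0.5]])
w = (len(X) + 50) ** 2                            # weighting coefficient from the text
print(activation(np.array([0.2, 0.2]), X, w))     # 0.0 at an evaluated design
```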

4.2 Improved Epistemic Uncertainty Propagation.

The original ERGO scheme made use of an FOSM approach to assess $\Sigma_E$ and $\Sigma_V$. This low-order approach drove the acquisition function to underemphasize certain regions in the objective space. To account for this shortcoming, an SOSM approach is introduced to assess the predictive variance [26].
(13a)
(13b)
When comparing the first-order uncorrelated (Eq. (4a)) and second-order correlated (Eq. (13a)) predictive variance of the aleatoric mean $\Sigma_E$, the additional covariance term can be observed. Similarly, when comparing the first-order uncorrelated (Eq. (4b)) and second-order correlated (Eq. (13b)) predictive variance of the aleatoric variance $\Sigma_V$, additional second-order terms can be observed on top of the additional covariance terms. A comparison of the two approximations is presented in Fig. 2. While $\Sigma_V$ remains small between 0.2 and 0.7, the second-order correlated approach predicts a non-zero value, as opposed to the first-order uncorrelated approach. These new approximations can now be introduced in Eqs. (10a) and (10b) to improve upon the prediction of the REI.
Fig. 2
Comparison of the first-order uncorrelated and second-order correlated predictive variance of the aleatoric mean and variance of the Gaussian process predictor trained from eight equidistant samples for test function $y(x) = (6x-2)^2 \sin(12x-4)/45 + 0.03$ with $\mathfrak{x} \sim \mathcal{N}(x, 10^{-2})$

4.3 Crash Constraint Treatment.

The original ERGO scheme did not treat the possible occurrence of objective function failures. The SAMURAI framework, which incorporated the ERGO scheme in the context of reliability-based robust design optimization, accounted for this particularity by means of adding the prediction of failed evaluations to the training set. However, this strongly affects the convergence of the scheme [21].

By storing information on whether ($c_i = +1$) or not ($c_i = -1$) the design vector ($x_i$) could be evaluated, a binary classification problem is obtained. In this work, use is made of a discriminative Gaussian process classifier (GPC) to model the feasible space directly, $p(c(x)|S^c)$ with $S^c = \{X, c\} = \{(x_i, c_i) \,|\, i = 1, \dots, n(S^c)\}$. Rasmussen and Williams' textbook is strongly drawn from to describe the theory behind GPC and its implementation [27]. The approach corresponds to turning the output of a regression model into a class probability using a response function $\lambda(z): \mathbb{R} \to [0,1]$, guaranteeing a valid probabilistic interpretation. As such, the regression model becomes a latent or hidden process.

Inference can as such be understood as a two-step process, in which first the distribution over the latent output is determined, $p(y(x)|S^c)$, after which this distribution is used to make a prediction of the probability of a class, $p(c(x)|S^c)$. To simplify the notation, the arguments of the joint distribution due to the evaluation of the training set in the latent stochastic process are dropped: $y(X^c) \equiv y$,
(14a)
(14b)
with $p(y|S^c) = p(c|y)\,p(y)/p(c)$. The evaluation of these integrals is intractable and must be approximated. Conventional methods rely on sampling, using, for example, Markov chain Monte Carlo sampling, or on analytical approximations of the integrals, such as the Laplace approximation, variational inference (VI), or expectation propagation (EP). In this work, use is made of the latter, which is briefly described here. Through Bayes' theorem the following can be observed
(15)
with $p(y)$ the Gaussian prior. Note furthermore that the proportionality becomes an exact equality when a normalization term, corresponding to the marginal likelihood, is taken into account. The likelihood $p(c_i|y_i)$ is taken equal to the probit likelihood for binary classification, $\Phi(c_i y_i)$. The latter is approximated by means of an unnormalized Gaussian function $t_i(y_i)$, such that the posterior $p(y|S^c)$ can be approximated by $q(y|S^c)$
(16a)
(16b)
with $\tilde{\Sigma} = \mathrm{diag}(\tilde{\sigma}_i^2)$, $\mu = \Sigma \tilde{\Sigma}^{-1} \tilde{\mu}$, $\Sigma = (\Psi^{-1} + \tilde{\Sigma}^{-1})^{-1}$, and $k_{EP} = q(S^c)$. The latter corresponds to the normalization term of the posterior and is equal to the EP approximation of the marginal likelihood, all of which are predicted by the EP algorithm. The marginal log likelihood $\log L^c(\theta|S^c) \equiv \log p(S^c|\theta) \approx \log q(S^c|\theta)$ is given by
(17)
The EP algorithm relies on the minimization of the Kullback–Leibler divergence $\mathrm{KL}(p(y|S^c)\,\|\,q(y|S^c))$, as opposed to the VI algorithm, which minimizes $\mathrm{KL}(q(y|S^c)\,\|\,p(y|S^c))$. This translates into a four-step iterative process during which $t_i$ is sequentially updated. First, an approximate posterior is constructed by leaving out the current $t_i$. Second, this distribution is combined with the exact likelihood to approximate the marginal distribution. This marginal is approximated by a Gaussian distribution in the third step, which boils down to a moment-matching step. Finally, $t_i$ is reevaluated in the fourth step. Details on the algorithmic implementation are given in Rasmussen and Williams [27].
The availability of the approximate Gaussian posterior distribution obtained by means of the EP algorithm now permits the evaluation of the approximate predictive distribution.
(18)
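For the probit likelihood, this predictive integral admits the well-known closed form, with $\mu_*(x)$ and $\Sigma_*(x)$ the latent predictive mean and variance obtained from the EP posterior:

$$q(c(x) = 1 \,|\, S^c) = \Phi\!\left(\frac{\mu_*(x)}{\sqrt{1 + \Sigma_*(x)}}\right)$$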
An illustration of the classifier is presented in Fig. 3. The surrogate-assisted SOFM (Eq. (3a)) and SOSM (Eq. (3b)) predictions are cut off where $q(c(x) = 1|S^c) < 0.5$. This ensures that the optimizer is not motivated to explore the non-evaluable region. Notice that the prediction of the infeasible domain only slightly moves into the exact one, which indicates the effectiveness of the GPC.
Fig. 3
Comparison of the surrogate-assisted SOFM (Eq. (3a)) and SOSM (Eq. (3b)) predictions with the exact mean and variance of the problem presented in Fig. 2, for which the evaluations fail at x < 0.2. The predictions in the non-evaluable region, as modeled by means of a GPC, are omitted.
The treatment of objective function evaluation failure in the context of ERGO can subsequently be understood as follows. First, a GPC is trained in parallel with the GPIs of the objective functions evaluations. Subsequently, the acquisition function is conditioned to only probe the design space which is predicted to be evaluable. This is realized by taking the product of the acquisition function and the predictive classifier. The result is a “safe” robust expected improvement (SREI):
(19)
The effect of the activation function $A(x|X)$ (Eq. (12a)) and the classifier $q(c(x)=1|S^c)$ (Eq. (18)) on the REI is illustrated in Fig. 4. It can be observed that the dotted line is high in the infeasible region, even for designs whose evaluation has been attempted. However, by enforcing the activation function (dashed line), regions of (attempted) evaluations are forced to zero. Second, the full line shows that, by applying the classifier, the most valuable next point to be evaluated lies outside the non-evaluable region, proving the effectiveness of the proposed new acquisition function.
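In code, the composition of Eq. (19) boils down to a product of callables; the stand-in functions below are placeholders for the surrogate-based quantities defined above:

```python
def srei(rei, activation, prob_feasible):
    """Compose the 'safe' robust expected improvement of Eq. (19): the REI
    damped by the activation function and by the classifier's predicted
    probability that a design can be evaluated."""
    return lambda x: rei(x) * activation(x) * prob_feasible(x)

# Toy usage with stand-in callables; the real ones come from the GP surrogates
acq = srei(rei=lambda x: 1.0, activation=lambda x: x[0], prob_feasible=lambda x: 0.9)
print(acq([0.5]))  # 0.45
```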
Fig. 4
Illustration of the effect of the activation function A(x|X) (Eq. (12a)) and the classifier q(c(x)=1|S^c) (Eq. (18)) on the REI for the problem presented in Figs. 2 and 3

The construction of the Gaussian process classifier is performed using the publicly available matlab® implementation of Rasmussen and Williams’ work using a conjugate gradient approach to maximize the concentrated log marginal likelihood function [27].

4.4 Stochastic Parameter Treatment.

The original ERGO scheme only permitted stochastic design variables. As such, the impact of parameter variability on the objective could not be directly assessed. To account for this defect, the treatment of stochastic parameters was included in the improved ERGO formulation. The problem solved now takes on the following form:
(20)
The Gaussian processes built from the different objective functions now include the stochastic parameters $\mathfrak{u} \in \mathbb{R}^{d(U)}$, with the assumption that $\mathfrak{u} \sim \mathcal{N}(u, \Phi_u)$. More concretely, a space-filling design of experiments including $u$ is constructed, where $u \in [u - 3\sigma_u, u + 3\sigma_u]$ with $\mathrm{diag}[\Phi_u] = \sigma_u^{\odot 2}$. This ensures that 99.6% of the effect of the stochastic parameters is captured. The robust Pareto front is now updated to include the effect of the stochastic parameters, evaluated at their mean value: $\mathcal{P} \equiv \mathrm{Par}[\mu_{\{E,V\}}(X|u, \Phi)]$.
To select a new infill, the robust expected improvement is optimized around the mean of the stochastic parameters. The corresponding new infill design variable vector $x^*$ is subsequently held fixed while $u^*$ is determined. The latter corresponds to the maximizer of the "safe" predictive variance product (SPVP), so as to ensure that the predictions of the aleatoric metrics are as accurate as possible. This results in the following two-step optimization process:
(21a)
(21b)
(21c)
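A sketch of this two-step process using stand-in objectives and an off-the-shelf evolutionary optimizer is given below; the real SREI and SPVP are the surrogate-based quantities of Eqs. (21a) and (21b):

```python
import numpy as np
from scipy.optimize import differential_evolution

def two_step_infill(srei, spvp, x_bounds, u_bounds, u_mean):
    """Two-step acquisition optimization (sketch of Eqs. (21a)-(21c)):
    first maximize the SREI over the design variables with the stochastic
    parameters fixed at their mean, then maximize the 'safe' predictive
    variance product over the parameters with the design held fixed."""
    x_star = differential_evolution(lambda x: -srei(x, u_mean), x_bounds).x
    u_star = differential_evolution(lambda u: -spvp(u, x_star), u_bounds).x
    return x_star, u_star

# Toy stand-ins for the surrogate-based objectives
x_star, u_star = two_step_infill(
    srei=lambda x, u: -np.sum((x - 0.3) ** 2),
    spvp=lambda u, x: -np.sum((u - 1.0) ** 2),
    x_bounds=[(0, 1), (0, 1)], u_bounds=[(0.5, 1.5)], u_mean=np.array([1.0]),
)
print(x_star, u_star)  # approximately [0.3, 0.3] and [1.0]
```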

4.5 Algorithmic Development.

Algorithm 2

Require: $S = \{S^i, S^c\}$ with $S^i = \{U^i, X^i, Y^i\}$ and $S^c = \{U^c, X^c, c\}$

 1: While computational budget is not exhausted do
 2:   For $k = 1, \dots, n(Y)$ do
 3:     $\Theta^{i*}_{(k)} \leftarrow \mathrm{argmax}_\theta\, L^i(\theta|S^i_k)$   ▷ Eq. (A2)
 4:   $\theta^{c*} \leftarrow \mathrm{argmax}_\theta\, L^c(\theta|S^c)$   ▷ Eq. (17)
 5:   $\mathcal{P} \leftarrow \mathrm{Par}[\mu(X|u, \Theta^*, \Phi)]$
 6:   $z^* \leftarrow \max_x \mathrm{SREI}(x|u, \Theta^*, \mathcal{P}, r)$   ▷ Eq. (21a)
 7:   If $z^* > k_{REI}$ then
 8:     $x^* \leftarrow \mathrm{argmax}_x\, \mathrm{SREI}(x|u, \Theta^*, \mathcal{P}, r)$
 9:     $u^* \leftarrow \mathrm{argmax}_u\, \mathrm{SPVP}(u|x^*, \Theta^*)$   ▷ Eq. (21b)
10:     $c^* \leftarrow c(u^*, x^*)$
11:     If $c^* = 1$ then
12:       $y^* \leftarrow y(u^*, x^*)$
13:       $S^i \leftarrow S^i \cup \{u^*, x^*, y^*\}$
14:     $S^c \leftarrow S^c \cup \{u^*, x^*, c^*\}$
15:   else
16:     break
17: $X^* \leftarrow \mathrm{argPar}_x[\mu(X|u, \Theta^*, \Phi)]$

If these different aspects are brought together, the improved ERGO-II algorithm is obtained (Algorithm 2 and Fig. 5), with $\Theta = \{\Theta^i, \theta^c\}$ and $\Phi = \{\Phi_x, \Phi_u\}$. Upon inspection of the new algorithm, it can be observed that, in comparison with the original ERGO scheme (or EGO, upon which it was built, as presented in Algorithm 3 in the Appendix), multiple GP interpolators are built, one for each of the different function output sets, along with a GP classifier to model the evaluable region (lines 2–4). Upon assessment of whether the expected contribution to the robust Pareto front of the most promising design is significant enough (lines 6 and 7), the design is evaluated (line 10). When the design evaluation is successful, the output is added to the sets (lines 11–14). This process is subsequently repeated until the computational budget is depleted (line 1) or until the improvement decreases below a threshold value $k_{REI}$ (lines 7 and 16).

Fig. 5
Flowchart of the EGO algorithm in gray [10], with the additions that lead to the ERGO algorithm in blue [15] and the additions of this paper in orange resulting in the current ERGO-II scheme (Color version online.)

5 Performance Assessment

5.1 Problem Description.

A comparative study between ERGO (Algorithm 1) and ERGO-II (Algorithm 2) is set up to examine how the improved methodology performs in comparison to the original scheme. This is realized by applying the methodologies to a stochastic reformulation of Zitzler et al.'s first test problem (ZDT1) for varying dimensionality [28]. First, the test problem is restricted to a single stochastic function, to which two robustness objectives correspond, since the original ERGO scheme is restricted to this condition. Second, the original ERGO scheme is enhanced with the same routine to treat stochastic parameters as the ERGO-II method. Third, the MAR approach introduced in the SAMURAI framework [21] is also applied to the original routine to permit the occurrence of evaluation failures in the comparative study, which corresponds to the constraint introduced here.
(22)
with both the design variables and parameters treated as normally distributed: $\mathfrak{x} \sim \mathcal{N}(x, \mathrm{diag}[10^{-2}])$ with $x \in [0,1]^{d(X)}$, and $\mathfrak{u} \sim \mathcal{N}(1, \mathrm{diag}[10^{-2}])$. The feasible domain is chosen such that it contains 80% of the input space.
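For reference, a sketch of the underlying deterministic ZDT1 function is given below, together with a plain Monte Carlo evaluation of the robustness objectives; the latter serves only as a conceptual reference, since ERGO(-II) specifically avoids such sampling:

```python
import numpy as np

def zdt1(x):
    """Deterministic ZDT1 [28] underlying the stochastic reformulation:
    f1 = x1 and f2 = g * (1 - sqrt(f1/g)) with g = 1 + 9 * mean(x[1:])."""
    g = 1.0 + 9.0 * np.mean(x[1:])
    f1 = x[0]
    return np.array([f1, g * (1.0 - np.sqrt(f1 / g))])

# Robustness objectives by propagating x ~ N(x_bar, diag[1e-2]) through zdt1
rng = np.random.default_rng(0)
x_bar = np.full(5, 0.5)
samples = np.array([zdt1(x_bar + 0.1 * rng.standard_normal(5)) for _ in range(10_000)])
print(samples.mean(axis=0), samples.var(axis=0))  # aleatoric mean and variance
```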

5.2 Methodology.

The evaluation of the performance of the multi-objective optimization algorithms is conducted through the hypervolume metric $H(\mathcal{P})$ [29]. The inherent stochastic nature of the optimization process, influenced by the random seed of the optimizer, can result in variable performance. To account for this aspect, the optimization is repeated ten times, after which the average, minimum, and maximum $H(\mathcal{P})$ in each iteration after the design of experiments (DoE) are computed.

The determination of the appropriate size for the DoE adheres to Jones et al.'s recommendation, which suggests a size of $11(d(X)+d(U)) - 1$ [10]. For the DoE, a Latin hypercube sampling approach [30] is employed, and Morris and Mitchell's maximin criterion is utilized to quantify the space-filling property. This criterion aims to maximize the distances between pairs of points in ascending order while simultaneously minimizing the number of corresponding pairs [31]. The stopping criteria employed include a maximum of 200 function evaluations after the DoE, or termination of the process if the normalized expected improvement falls below 0.1% (corresponding to $k_{REI}$). The optimization of the REI is accomplished using a genetic algorithm.
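A minimal sketch of such a maximin-selected Latin hypercube DoE is given below; ranking random candidates by their smallest pairwise distance is a simple stand-in for the full Morris–Mitchell criterion:

```python
import numpy as np
from scipy.stats import qmc

def maximin_lhs(n, d, n_candidates=50, seed=0):
    """Select, among random Latin hypercubes, the one maximizing the minimal
    pairwise distance (a simple stand-in for Morris and Mitchell's full
    maximin ranking criterion [31])."""
    best, best_score = None, -np.inf
    for k in range(n_candidates):
        plan = qmc.LatinHypercube(d=d, seed=seed + k).random(n)
        dists = np.linalg.norm(plan[:, None, :] - plan[None, :, :], axis=-1)
        score = np.min(dists[np.triu_indices(n, k=1)])  # smallest pairwise distance
        if score > best_score:
            best, best_score = plan, score
    return best

doe = maximin_lhs(n=32, d=3)  # DoE size per the scaling rule, with d(X)+d(U)=3
print(doe.shape)
```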

5.3 Results and Discussion.

Figure 6 illustrates the ERGO versus ERGO-II performance assessment as a function of the number of iterations, for increasing dimensionality over the different plots from top to bottom, with on the left the hypervolume indicator and on the right the normalized maximum expected improvement. The full line denotes the mean values, while the shaded areas denote the minimum and maximum values of the respective performance measures. Optimal performance of an optimizer would be characterized by an early climb of $H(\mathcal{P})$ and a decrease to zero of the REI once $H(\mathcal{P})$ stagnates. The latter would ensure that the optimizer terminates if no further improvement can be obtained.

Fig. 6
Performance assessment of ERGO versus ERGO-II for increasing dimensionality with on the left, the hypervolume indicator and on the right, the normalized maximum expected improvement

A notable observation is the enhanced performance of ERGO-II compared to the original scheme as the dimensionality increases ($d_x \in [5,8]$). This can be observed from the strong increase in $H(\mathcal{P})$ early on and the decrease of the REI to zero once $H(\mathcal{P})$ has stagnated. ERGO maintains a high REI during the optimization, without further improvements of $H(\mathcal{P})$. The significant performance fluctuations of the original scheme are attributed to its treatment of crash constraints, since this approach might lead to the constant generation of regions of high predictive variance, as discussed in the introduction.

At lower dimensionality ($d_x \in [2,4]$), ERGO-II also performs better than ERGO, but fails to stop once $H(\mathcal{P})$ stagnates. The reason might be attributed to the size of the DoE, which is taken to be proportional to the dimensionality of the input space.

6 Validation

To illustrate the method in a more practical context, the original ERGO scheme is compared with the improved ERGO-II methodology by applying them to a computational fluid dynamics (CFD) design problem. More precisely, the stochastic reformulation of the second benchmark case introduced by the AIAA Aerodynamic Design Optimization Discussion Group (ADODG) is solved [24].

6.1 Problem Description.

The second benchmark case has the objective of minimizing the drag coefficient $C_D$ of a modified RAE2822 profile in viscous transonic flow by altering its geometry, while minimum cross-sectional area $S$, lift coefficient $C_L$, and pitching moment coefficient $C_M$ constraints are satisfied [32].
(23)
The problem is formulated at a freestream Mach number $\mathrm{Ma} = 0.734$ and Reynolds number $\mathrm{Re} = 6.5 \times 10^6$. This problem is extended by including stochasticity in the design and parameter space in order to account for off-point importance, as was emphasized by LeDoux and colleagues [33], echoing Drela's remarks on airfoil design [34].
(24)
The stochasticity in the parameter space is introduced in the form of variability in the Mach and Reynolds numbers, $u = [\mathrm{Ma}, \mathrm{Re}]$, such that these numbers are normally distributed around the original reference operating conditions with a coefficient of variation $\sigma_u/u$ equal to 0.1.

6.2 Parameterization.

Kulfan and Bussoletti’s class-shape function transformation (CST) is used to parametrize the airfoil [35]. This corresponds to the thickness distribution t(xc) as the product of a class function C(xc), a generalized formulation of a circle, and a shape function S(xc), corresponding to the Bézier curve, plus a trailing edge (te) thickness correction δtte to give the thickness distribution, with xc=x/c where x and c respectively denote the chordwise position and chord length
(25a)
(25b)
(25c)
with $C_i^{d(b)}$ the binomial coefficient and $b$ the Bézier coefficients. Since the airfoil is not symmetric, a separate curve is considered for the pressure and suction sides. A second-order curve with three Bézier coefficients is considered for each side, leading to six Bézier coefficients in total: $[b_p, b_s]$. Additionally, the angle of attack (AoA) $\alpha$ is added as a design variable: $x = [b_p, b_s, \alpha]$. The design space $X$ is determined by estimating the CST coefficients corresponding to the RAE2822 (given in Table 1, obtained by means of an MSE minimization), around which a variation of ±0.1 is permitted. The AoA is permitted to vary in [0 deg, 10 deg].
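A sketch of the CST evaluation under these assumptions is given below; the class exponents $n_1 = 0.5$ and $n_2 = 1$ are the customary values for round-nosed, sharp-trailing-edge airfoils, and the helper name is hypothetical:

```python
import numpy as np
from math import comb

def cst_surface(x_c, b, dt_te=0.0, n1=0.5, n2=1.0):
    """Class-shape transformation (CST) curve [35]: class function
    C = x_c^n1 * (1 - x_c)^n2, Bernstein-basis shape function with
    coefficients b, plus a linear trailing-edge thickness correction.
    A sketch of the parameterization described in Eqs. (25)."""
    n = len(b) - 1
    C = x_c**n1 * (1.0 - x_c) ** n2
    S = sum(b[i] * comb(n, i) * x_c**i * (1.0 - x_c) ** (n - i) for i in range(n + 1))
    return C * S + x_c * dt_te

x_c = np.linspace(0.0, 1.0, 101)
side = cst_surface(x_c, b=[0.122, 0.184, 0.207])  # e.g., the default b_p of Table 1
```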
Table 1

Optimal designs CST description as predicted by ERGO (middle) and ERGO-II (bottom) in comparison with the default (RAE2822) profile at the top

              bp(1)   bp(2)   bp(3)   bs(1)   bs(2)   bs(3)
Default       0.122   0.184   0.207   0.108   0.248   0.035
Design 1      0.075   0.158   0.256   0.120   0.217   0.001
Design 2      0.090   0.183   0.222   0.066   0.207   0.010
Design 1      0.090   0.186   0.256   0.138   0.218   0.007
Design 2      0.097   0.195   0.254   0.147   0.213   0.002

Production tolerances are translated into stochastic design variables, such that the design vector is normally distributed around the parametrization of the airfoil with a coefficient of variation equal to 0.1.

Self-intersecting airfoils that would appear in the DoE are accounted for by the crash constraint treatment presented in Sec. 4.3.

6.3 Methodology.

The treatment of constraints is performed through a conventional probability-of-feasibility approach [36]. To model the transonic flow around the airfoil, CFD simulations are applied. The governing equations correspond to the unsteady Reynolds-averaged Navier–Stokes equations.

6.3.1 Turbulence Modeling.

The introduction of Reynolds' decomposition of the flow variables into a mean and a fluctuating part in the Navier–Stokes equations, followed by ensemble-averaging, leads to the RANS equations, which are much cheaper to solve. However, the decomposition and averaging introduce an additional term in the equations, which has the form of a stress term and is typically addressed as the Reynolds stress. This leads to a set of equations that is no longer closed. Different manners exist by which this stress term is modeled, such as relating the Reynolds stress to the mean strain rate through the eddy viscosity, building on the Boussinesq eddy viscosity assumption. Again, different approaches exist to determine this term; the k–ω SST model, which is used in this work, does so by means of two additional transport equations for the turbulent kinetic energy k and the specific turbulence dissipation rate ω, and by assuming a linear relation in the boundary layer between the turbulent shear stress and the turbulent kinetic energy [37].

6.3.2 Solver-Specific Settings.

To solve the set of equations presented in the previous sections, a pressure-based solver with a fully coupled approach was applied. A second-order scheme was used for the pressure, and second-order upwind discretization was applied for density and momentum, while turbulent kinetic energy, specific dissipation rate, and energy were discretized using a first-order upwind scheme. A least-squares cell-based approach is employed to calculate the gradients; this implies that a least-squares method is used to solve the system of equations relating the cell gradient to the difference in cell values. All residuals of the equations were enforced to drop below $10^{-5}$, including for the continuity equation. The residuals of the former are expressed as the ratio of the sum of the absolute errors in each cell to the sum of the absolute values of the flow variable calculated in each cell. The residual of the latter is expressed as the ratio of the sum of the rate of mass creation in each cell to the largest absolute value of this sum in the first five iterations.

6.3.3 Mesh Settings.

A C-shaped grid is constructed around the airfoil. A grid independence study was conducted to determine the grid and domain sizing in accordance with the approach in Ref. [38]. The results correspond to a computational domain extending 20 chord lengths in front of, above, and below the airfoil and 40 chord lengths behind it, as presented in Fig. 7. Furthermore, a maximum wall-normal expansion ratio of 1.1 is imposed on 120 layers surrounding the airfoil to obtain a $y^+$ of at most 1 near the stagnation point (0.35 on average), and 200 nodes are placed on the chord.

Fig. 7
Computational domain for the high-fidelity solver built in accordance to the sensitivity study performed in Ref. [38]

6.3.4 Boundary Conditions.

Surrounding the computational domain a pressure far field boundary condition is imposed with a Mach number determined by the ERGO-II scheme. The values of the turbulent intensity and the turbulent viscosity ratio are respectively set to 5% and 10%. The wall is modeled as a no-slip boundary.

6.3.5 Computational Solver.

All calculations are performed using the CFD code ansys® fluent® 23.1.0. The computational effort of a single simulation using five cores of an Intel Xeon Gold 6240 2.6 GHz CPU amounted to roughly 1 min.

6.4 Results and Discussion.

The resulting Pareto fronts of the design optimization-under-uncertainty as obtained by ERGO and ERGO-II are visualized in Fig. 8. In light and dark gray, the feasible designs of respectively the DoE and the infills are presented. In black, the Pareto fronts as predicted by the surrogates of the output ($C_D$) after the final iteration are visualized.

Fig. 8
Pareto front of the ADODG2 aerodynamic design problem under uncertainty as predicted by ERGO (a) and ERGO-II (b)

Inspection shows that the Pareto front obtained by ERGO-II extends deeper into the bottom left than the one obtained by ERGO, in line with the conclusions drawn in the previous section. Additionally, the number of feasible infills is larger when using ERGO-II, as can be seen from the larger number of dark gray dots in the bottom plot. This can again be attributed to the effect of the crash constraint treatment, as pinpointed in the introduction and elaborated in Sec. 4.3: using the MAR approach might lead to distorted surrogates if the prediction is excessive, or might mislead the optimizer if the MAR approach predicts the infeasible region to be the optimal one. This behavior is avoided when using the GP classifier approach as presented in this work.

Another observation is that the default prediction differs between the top and bottom plots. This can be attributed to the fact that the uncertainty propagation relies on the GPI. Theoretically, if the output function displays the smoothness characteristics enforced by the GPI and UP approach, the GP-assisted UP will converge to the exact solution. However, this is prohibited by the infills added using the MAR approach.

Two designs on each of the two Pareto fronts are selected for closer inspection. Their corresponding CST-parametrization values are given in Table 1 and visualized, along with their characteristics for $\mathrm{Ma} = 0.734$ and $\mathrm{Re} = 6.5 \times 10^6$ as obtained using the CFD modeling approach, in Fig. 9. A first observation is that ERGO's design 2 (in yellow in the top plots) has a smaller area than the default design (in blue), even though this constraint was enforced (Eq. (24)). This is once more a consequence of the MAR approach, which has added this infeasible design to the design set with a prediction that outperformed the other evaluated designs. A possible approach to avoid this is to drive up the contribution of the model uncertainty in the MAR prediction. However, experience has shown that this destabilizes the surrogate training set, as denoted above.

Fig. 9
Optimal designs visualization and characteristics prediction at $\mathrm{Ma} = 0.734$ and $\mathrm{Re} = 6.5 \times 10^6$ as predicted by ERGO (a and b) and ERGO-II (c and d). (Color version online.)

On the other hand, the designs presented by ERGO-II meet the area constraint. It can be noted that the optimal designs have a more rounded pressure side near the leading edge and a more rounded suction side near the trailing edge. Inspection of the characteristics indicates a higher stall angle for the optimal designs, but a lower $C_L$-gradient in the linear region in comparison with the baseline design. Second, lower drag and moment coefficients are observed.

With regard to the computational cost, ERGO and ERGO-II are built on the EGO algorithm, a form of BO as discussed in the introduction, which is known to be data-efficient and is therefore used specifically for problems where the computational cost of a function evaluation far exceeds the cost of training the GPs and finding the next infill. Both methods required roughly the same computational budget, corresponding to 250 function calls, of which 150 succeeded. This added up to approximately 4–5 h for each optimization. Had an alternative surrogate-free approach been attempted, the computational cost of the CFD calls would have prohibited solving the problem; for example, performing a single UP of a design by means of a sampling approach or a spectral method such as the Taylor-based method or polynomial chaos would amount to the same computational cost as the whole ERGO-II optimization.

The successful outcome of the validation case using ERGO-II proves the effectiveness of the improved scheme.

7 Conclusion and Outlook

In this paper, improvements upon ERGO, a surrogate-assisted optimization-under-uncertainty methodology for robust design optimization, were presented. This was realized in four steps:

  1. The hypervolume expected improvement was introduced in the framework as an alternative to the Euclidean EI. The Euclidean EI exhibited sensitivity to variations in objective magnitudes and was restricted to only two objectives. By incorporating the hypervolume EI, these limitations were mitigated, enabling more robust and versatile design selection based on the acquisition function.

  2. A more advanced approach for assessing predictive variance was introduced. The initial implementation relied on a FOSM technique to estimate the predictive uncertainty of the stochastic objective measures. However, this lower-order approach led to the underemphasis of certain regions in the objective space. To address this drawback, the adoption of a SOSM approach was proposed, enabling a more accurate assessment of the predictive variance within the acquisition function.

  3. A classifier to model the feasible space was introduced. In the original ERGO scheme, the consideration of objective function failures was neglected. To address this issue, the utilization of a Gaussian process classifier was proposed, which, when combined with the acquisition function, minimizes the disruption to the convergence of the optimization process.

  4. The treatment of stochastic parameters was examined. The original ERGO scheme had a limitation in that it only allowed for stochastic design variables. Consequently, the direct assessment of the impact of parameter variability on the objective was not possible. In order to address this limitation, a two-step acquisition function optimization approach was undertaken to specifically handle stochastic parameters.

The scheme was compared with the original ERGO approach using one of the ZDT test functions for varying dimensionality, and validated on the second ADODG problem: the drag minimization of an RAE2822 airfoil in viscous transonic flow while being subjected to stochasticity in both operating conditions and design variables. The outcome illustrated the success of the scheme in obtaining a robust solution within an affordable computational budget.

Future research tracks are currently focused on two aspects of the proposed ERGO-II scheme. First, alternative approaches to the uncertainty propagation routine are explored. For highly non-linear objective spaces, the second-order approximation might fall short. Therefore, exact propagation methods are explored for specific covariance function and input distribution combinations. A second research track is focused on dealing with the limitations inherent to Bayesian optimization, namely the assumption of a smooth output space and a limited number of variables. Some recent advances in the machine learning community, namely manifold GPs, are currently being explored.

Acknowledgment

The author gratefully acknowledges the funding by the Research Foundation-Flanders (FWO), Belgium, through the Junior Post-Doctoral fellowship of Jolan Wauters. The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by Ghent University, Belgium, FWO and the Flemish Government—department EWI, Belgium.

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.

Nomenclature

Greek Symbols

α =

angle of attack

α =

GP predictive mean coefficients

β =

polynomial coefficients

θ =

covariance function hyperparameters

κ =

hypervolume cell

ν =

Matérn smoothness parameter

σ2 =

process variance

Φ =

input space covariance matrix

Ψ =

Gaussian process covariance matrix

c =

chord length

d =

dimensionality

b =

Bernstein coefficients

y =

vector of objective function evaluations

F =

model matrix

P =

pareto front

S =

training set

x =

deterministic/mean design vector

x =

stochastic design vector

y =

Gaussian process of the objective

X =

design set

X =

design space

CD =

drag coefficient

CL =

lift coefficient

CM =

moment coefficient

   Ma =

Mach number

Re =

Reynolds number

Subscripts

p =

pressure side

s =

suction side

   hv =

hypervolume

Eu =

Euclidean

* =

optimal value

Superscripts

M =

matricization

V =

vectorization

c =

classifier

i =

interpolator

   T =

Taylor approximation

⊤ =

transpose

Operators

A() =

activation function

ϕ() =

standard normal probability density function

Φ() =

standard normal cumulative distribution function

ψ(,) =

covariance function

E{}[] =

expected value operator

μ{}() =

predictive mean

f() =

trend vector function

H[] =

hypervolume measure

Kν() =

Bessel function

L() =

likelihood function

N(,) =

normal distribution

Σ{}() =

predictive variance

V{}[] =

variance operator

y() =

objective vector function

y() =

Gaussian process

Abbreviations

ADODG =

aero design optimization discussion group

BO =

Bayesian optimization

DoE =

design of experiments

E(R)GO =

efficient (robust) global optimization

((S)R)EI =

((safe) robust) expected improvement

EP =

expectation propagation

(F/S)OFM =

(first/second)-order first moment

(F/S)OSM =

(first/second)-order second moment

GA =

genetic algorithm

GPI/C =

Gaussian process interpolator/classifier

KL =

Kullback–Leibler

LHS =

Latin hypercube sampling

MAR =

missing-at-random

MC(I/S) =

Monte Carlo (integration/sampling)

MLE =

maximum likelihood estimation

NSGA-II =

non-dominated sorting genetic algorithm II

OUU =

optimization-under-uncertainty

PDF =

probability density function

RANS =

Reynolds-averaged Navier–Stokes

RDO =

robust design optimization

SA =

surrogate-assisted

(S/M)OO =

(single/multi) objective optimization

SQP =

sequential quadratic programming

U(Q/P) =

uncertainty (quantification/propagation)

Appendix: Bayesian Optimization

This appendix presents a concise introduction to GPI and BO, extended with a number of metrics essential to the ERGO framework, making use of the notation introduced in Sec. 2. For more information on GPI and BO, the reader is referred to Refs. [8,27].

BO is motivated by the problem of finding a global minimizer (or the mathematically equivalent maximizer) of an unknown objective function y in a data-efficient manner.
(A1)
In this context, $\mathcal{X}$ denotes the designated space of interest, typically a confined subset of $\mathbb{R}^{d(\mathcal{X})}$, with $d(\mathcal{X})$ typically taken smaller than 20 [16]. It is further assumed that the function $y: \mathcal{X} \to \mathbb{R}$ is a continuous black-box function. Evaluating $y$ is an expensive process, devoid of noise, for which no gradients are available, making it impossible to directly obtain the gradient without resorting to approximations. The core concept of BO is to construct a surrogate model of the objective function. This surrogate model can be updated and queried to guide the optimization decisions [8].

The GPI is the surrogate model commonly used in BO. From a function-space perspective, a Gaussian process $y$ can be defined as a distribution over functions, where the values of $y(x)$ at any set of points $\{x_i \,|\, i = 1, \dots, n\}$ follow a joint Gaussian distribution. A Gaussian process is fully characterized by its second-order statistics: $y(x) = \mathcal{GP}(m(x), \psi(x, x'|\theta))$. Here, $m(x): \mathcal{X} \to \mathbb{R}$ represents the mean function, and $\psi(x, x'|\theta): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ represents the covariance function, fully described by its parameters $\theta$. Different interpolators can be obtained based on the formulation of the mean function. When a constant value is used, the method is known as "simple Kriging." Moreover, if a multivariate polynomial $f(x)$ is employed, the interpolator is referred to as "universal Kriging."

In a fully Bayesian approach, a prior distribution is placed over the hyperparameters. However, customarily, a point estimate of the hyperparameters is used in the shape of the MLE: $\mathrm{argmax}_\theta\, L^i(\theta|S^i) \equiv p(S^i|\theta)$, with $S^i \equiv \{X, y\}$ the evaluated data. The likelihood function can be simplified by taking the logarithm and marginalizing out the hyperparameters
(A2)
with $F$ the model matrix defined by $F_{(i,j)} = f_i(x_j)$, $\Psi$ the correlation matrix defined by $\Psi_{(i,j)} = \psi(x_i, x_j)$, $\beta$ the coefficients of the multivariate polynomial trend, and $\sigma^2$ the process variance. This can be further simplified by enforcing stationarity with respect to $\beta$ and $\sigma^2$, which corresponds to a generalized least squares approach, such that
(A3a)
(A3b)
When leaving out the constant terms the concentrated log marginal likelihood function is obtained which can be directly optimized. Note that the resulting function on the one hand quantifies the fit of the surrogate through σ2, while on the other hand, it penalized complexity through |Ψ|.
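As an illustration of Eqs. (A2)–(A3), the sketch below evaluates the concentrated log marginal likelihood for a given $\boldsymbol{\theta}$ under the assumption of a constant trend (simple Kriging, $F = \mathbf{1}$); the function and variable names are ours, not ooDACE's:

```python
# A sketch of Eqs. (A2)-(A3), assuming a constant trend (simple Kriging).
import numpy as np

def concentrated_log_likelihood(theta, X, y, kernel):
    n = y.size
    Psi = kernel(X, X, theta) + 1e-10 * np.eye(n)      # correlation matrix
    F = np.ones((n, 1))                                # constant trend basis
    # Eq. (A3a): generalized least squares estimate of beta
    beta = np.linalg.solve(F.T @ np.linalg.solve(Psi, F),
                           F.T @ np.linalg.solve(Psi, y))
    r = y - (F @ beta).ravel()
    # Eq. (A3b): closed-form process variance
    sigma2 = r @ np.linalg.solve(Psi, r) / n
    # Concentrated form: fit quantified by sigma2, complexity penalized by |Psi|
    _, logdet = np.linalg.slogdet(Psi)
    return -0.5 * (n * np.log(sigma2) + logdet)
```

Maximizing this function over $\boldsymbol{\theta}$, e.g., with the multi-start SQP routine the paper adopts [46,47], yields $\boldsymbol{\theta}^{*}$.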
Building forth on the definition of a Gaussian process and Bayes' theorem, the predictive distribution $p[y(\mathbf{x})\,|\,\mathcal{S}_i]$ can be evaluated directly and is again a normal distribution, of which the mean $\mathbb{E}[y(\mathbf{x}\,|\,\mathcal{S}_i)] \equiv \mu_y(\mathbf{x})$ and variance $\mathbb{V}[y(\mathbf{x}\,|\,\mathcal{S}_i)] \equiv \Sigma_y(\mathbf{x})$ follow in closed form:
$$\mu_y(\mathbf{x}) = \mathbf{f}(\mathbf{x})^{\top}\boldsymbol{\beta} + \boldsymbol{\psi}(\mathbf{x})^{\top}\boldsymbol{\alpha} \tag{A4a}$$
$$\Sigma_y(\mathbf{x}) = \sigma^{2}\Big[1 - \boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\boldsymbol{\psi}(\mathbf{x}) + \mathbf{g}(\mathbf{x})^{\top}\Gamma\,\mathbf{g}(\mathbf{x})\Big] \tag{A4b}$$
with $\boldsymbol{\psi}(\mathbf{x}) = [\psi(\mathbf{x}_1,\mathbf{x}),\ldots,\psi(\mathbf{x}_n,\mathbf{x})]^{\top}$, $\boldsymbol{\alpha} = \Psi^{-1}(\mathbf{y}-F\boldsymbol{\beta})$, $\Gamma = (F^{\top}\Psi^{-1}F)^{-1}$, and $\mathbf{g}(\mathbf{x}) = F^{\top}\Psi^{-1}\boldsymbol{\psi}(\mathbf{x}) - \mathbf{f}(\mathbf{x})$.
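A possible implementation of the predictor of Eqs. (A4) is sketched below, reusing the quantities $\boldsymbol{\alpha}$, $\Gamma$, and $\mathbf{g}(\mathbf{x})$ defined above; the names and the constant trend basis are our assumptions, not the paper's implementation:

```python
# A sketch of the universal-Kriging predictor of Eqs. (A4a)-(A4b).
import numpy as np

def predict(x, X, y, F, beta, sigma2, theta, kernel):
    """Predictive mean and variance at a single point x (constant trend assumed)."""
    n = y.size
    Psi = kernel(X, X, theta) + 1e-10 * np.eye(n)
    psi_x = kernel(X, x[None, :], theta).ravel()          # psi(x) vector
    f_x = np.ones(F.shape[1])                             # trend basis f(x); assumes F = 1
    alpha = np.linalg.solve(Psi, y - (F @ beta).ravel())  # alpha = Psi^-1 (y - F beta)
    Gamma = np.linalg.inv(F.T @ np.linalg.solve(Psi, F))  # (F' Psi^-1 F)^-1
    g = F.T @ np.linalg.solve(Psi, psi_x) - f_x           # g(x)
    mu = f_x @ beta + psi_x @ alpha                       # Eq. (A4a)
    var = sigma2 * (1.0 - psi_x @ np.linalg.solve(Psi, psi_x)
                    + g @ Gamma @ g)                      # Eq. (A4b)
    return mu, var
```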
The availability of a closed formulation of the predictive mean and variance permits the derivation of the Jacobian of the predictive mean $\mu_{\nabla y}(\mathbf{x})$ and the corresponding variance $\Sigma_{\nabla y}(\mathbf{x})$ [39–42]:
$$\mu_{\nabla y}(\mathbf{x}) = \nabla\mathbf{f}(\mathbf{x})^{\top}\boldsymbol{\beta} + \nabla\boldsymbol{\psi}(\mathbf{x})^{\top}\boldsymbol{\alpha} \tag{A5a}$$
$$\Sigma_{\nabla y}(\mathbf{x}) = \sigma^{2}\Big[\nabla_{\mathbf{x}}\nabla_{\mathbf{x}'}\psi(\mathbf{x},\mathbf{x}')\big|_{\mathbf{x}'=\mathbf{x}} - \nabla\boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\nabla\boldsymbol{\psi}(\mathbf{x}) + \nabla\mathbf{g}(\mathbf{x})^{\top}\Gamma\,\nabla\mathbf{g}(\mathbf{x})\Big] \tag{A5b}$$
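Since hand-derived kernel gradients are error-prone, a simple consistency check (ours, not the paper's) is to compare the analytical Jacobian of the predictive mean against a central finite difference of the predictive mean itself; `predict_mu` and `grad_mu` are hypothetical helpers wrapping Eqs. (A4a) and (A5a):

```python
# Verify the analytical mean gradient against a central finite difference.
import numpy as np

def check_mean_gradient(x, predict_mu, grad_mu, h=1e-6):
    g_fd = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g_fd[k] = (predict_mu(x + e) - predict_mu(x - e)) / (2.0 * h)
    # Analytical and finite-difference gradients should agree closely
    return np.allclose(g_fd, grad_mu(x), rtol=1e-4, atol=1e-6)
```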
The formulation can be extended further to the Hessian of the predictive mean $\mu_{\Delta y}(\mathbf{x})$ and the corresponding predictive variance of the Hessian $\Sigma_{\Delta y}(\mathbf{x})$:
$$\mu_{\Delta y}(\mathbf{x}) = \Delta\mathbf{f}(\mathbf{x})^{\top}\boldsymbol{\beta} + \Delta\boldsymbol{\psi}(\mathbf{x})^{\top}\boldsymbol{\alpha} \tag{A6a}$$
$$\Sigma_{\Delta y}(\mathbf{x}) = \sigma^{2}\Big[\Delta_{\mathbf{x}}\Delta_{\mathbf{x}'}\psi(\mathbf{x},\mathbf{x}')\big|_{\mathbf{x}'=\mathbf{x}} - \Delta\boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\Delta\boldsymbol{\psi}(\mathbf{x}) + \Delta\mathbf{g}(\mathbf{x})^{\top}\Gamma\,\Delta\mathbf{g}(\mathbf{x})\Big] \tag{A6b}$$
Additionally, the covariances between the mean prediction, the gradient, and the Hessian can be derived:
$$\Sigma_{y,\nabla y}(\mathbf{x}) = \sigma^{2}\Big[\nabla_{\mathbf{x}'}\psi(\mathbf{x},\mathbf{x}')\big|_{\mathbf{x}'=\mathbf{x}} - \boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\nabla\boldsymbol{\psi}(\mathbf{x}) + \mathbf{g}(\mathbf{x})^{\top}\Gamma\,\nabla\mathbf{g}(\mathbf{x})\Big] \tag{A7a}$$
$$\Sigma_{y,\Delta y}(\mathbf{x}) = \sigma^{2}\Big[\Delta_{\mathbf{x}'}\psi(\mathbf{x},\mathbf{x}')\big|_{\mathbf{x}'=\mathbf{x}} - \boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\Delta\boldsymbol{\psi}(\mathbf{x}) + \mathbf{g}(\mathbf{x})^{\top}\Gamma\,\Delta\mathbf{g}(\mathbf{x})\Big] \tag{A7b}$$
$$\Sigma_{\nabla y,\Delta y}(\mathbf{x}) = \sigma^{2}\Big[\nabla_{\mathbf{x}}\Delta_{\mathbf{x}'}\psi(\mathbf{x},\mathbf{x}')\big|_{\mathbf{x}'=\mathbf{x}} - \nabla\boldsymbol{\psi}(\mathbf{x})^{\top}\Psi^{-1}\Delta\boldsymbol{\psi}(\mathbf{x}) + \nabla\mathbf{g}(\mathbf{x})^{\top}\Gamma\,\Delta\mathbf{g}(\mathbf{x})\Big] \tag{A7c}$$
In this paper, exclusive use is made of the Matérn covariance function, common in the aerospace community [27,43]. It is often preferred over the conventional squared exponential covariance function, whose infinitely differentiable nature leads to non-physical levels of smoothness [44]:
$$\psi(\mathbf{x},\mathbf{x}') = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{2\nu}\,r\right)^{\nu} K_{\nu}\!\left(\sqrt{2\nu}\,r\right), \qquad r = \sqrt{\sum_{k=1}^{d(\mathcal{X})} \frac{(x_k - x'_k)^{2}}{\theta_k^{2}}} \tag{A8}$$
with $\Gamma(\nu)$ the Gamma function and $K_{\nu}$ the modified Bessel function of the second kind:
$$\Gamma(\nu) = \int_{0}^{\infty} t^{\nu-1} e^{-t}\,\mathrm{d}t \tag{A9a}$$
$$K_{\nu}(z) = \frac{\pi}{2}\,\frac{I_{-\nu}(z) - I_{\nu}(z)}{\sin(\nu\pi)} \tag{A9b}$$
$$I_{\nu}(z) = \sum_{k=0}^{\infty} \frac{1}{k!\,\Gamma(k+\nu+1)}\left(\frac{z}{2}\right)^{2k+\nu} \tag{A9c}$$
In the Matérn function, $\boldsymbol{\theta}$ and $\nu$ respectively correspond to the width parameters and the smoothness parameter. A strongly simplified expression is obtained when $\nu = 1/2 + p$ with $p \in \mathbb{N}$ is used, the result being a $p$-times mean square differentiable function. In this work, exclusive use is made of the case where $p=2$:
$$\psi(\mathbf{x},\mathbf{x}') = \prod_{k=1}^{d(\mathcal{X})}\left(1 + \sqrt{5}\,r_k + \frac{5}{3}\,r_k^{2}\right)e^{-\sqrt{5}\,r_k}, \qquad r_k = \frac{|x_k - x'_k|}{\theta_k} \tag{A10}$$
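A sketch of Eq. (A10) with one width parameter per dimension is given below; the anisotropic product form shown here is an assumption on our part:

```python
# A sketch of the nu = 5/2 Matern correlation of Eq. (A10), product form.
import numpy as np

def matern52(X1, X2, theta):
    """Correlation matrix between the rows of X1 (n1 x d) and X2 (n2 x d)."""
    r = np.abs(X1[:, None, :] - X2[None, :, :]) / theta   # r_k = |x_k - x_k'| / theta_k
    per_dim = (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)
    return per_dim.prod(axis=-1)                          # product over dimensions
```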
Using the chain rule, the Jacobian $\nabla_{\mathbf{x}}\psi(\mathbf{x},\mathbf{x}')$ and Hessian $\nabla^{2}_{\mathbf{x}}\psi(\mathbf{x},\mathbf{x}')$ can be obtained, with the former being a $d(\mathcal{X})\times 1$ vector and the latter a $d(\mathcal{X})\times d(\mathcal{X})$ matrix given by
$$\big[\nabla_{\mathbf{x}}\psi\big]_{k} = -\frac{5\,(x_k - x'_k)}{3\,\theta_k^{2}}\left(1+\sqrt{5}\,r_k\right)e^{-\sqrt{5}\,r_k} \prod_{l\neq k}\psi_l(x_l,x'_l) \tag{A11a}$$
$$\big[\nabla^{2}_{\mathbf{x}}\psi\big]_{kl} = \begin{cases} -\dfrac{5}{3\,\theta_k^{2}}\left(1+\sqrt{5}\,r_k - 5\,r_k^{2}\right)e^{-\sqrt{5}\,r_k}\displaystyle\prod_{m\neq k}\psi_m(x_m,x'_m), & k = l \\[1.5ex] \dfrac{\partial\psi_k}{\partial x_k}\,\dfrac{\partial\psi_l}{\partial x_l}\displaystyle\prod_{m\neq k,l}\psi_m(x_m,x'_m), & k \neq l \end{cases} \tag{A11b}$$
with $\psi_k(x_k,x'_k)$ the one-dimensional factor of Eq. (A10).
The Gaussian process interpolator is built using the object-oriented design and analysis of computer experiments (ooDACE) toolbox, an open-source MATLAB® implementation [45]. The construction process employs a multi-start sequential quadratic programming approach to optimize the concentrated log marginal likelihood function [46,47].
Now the improvement of a predicted point over the current best-evaluated point, denoted as $y_{\min} = \min[\mathbf{y}(X)]$, can be formally expressed as $I(\mathbf{x}\,|\,y_{\min}) = \max[\,y_{\min} - y(\mathbf{x}),\, 0\,]$. By considering the stochastic nature of the interpolator, it becomes possible to assess the uncertainty in the prediction. This assessment is used to define the expected improvement $\mathbb{E}_y[I(\mathbf{x}\,|\,\mathcal{S}_i)]$ [9]:
$$\mathrm{EI}(\mathbf{x}) = \big(y_{\min} - \mu_y(\mathbf{x})\big)\,\Phi\!\left(\frac{y_{\min} - \mu_y(\mathbf{x})}{\sqrt{\Sigma_y(\mathbf{x})}}\right) + \sqrt{\Sigma_y(\mathbf{x})}\;\phi\!\left(\frac{y_{\min} - \mu_y(\mathbf{x})}{\sqrt{\Sigma_y(\mathbf{x})}}\right) \tag{A12}$$
with $\Phi$ and $\phi$ the standard normal cumulative distribution and density functions. It is important to note that the first term aims to minimize the objective (exploitation), while the second term aims to minimize the uncertainty in the prediction (exploration). This infill criterion serves as the foundation of the well-known EGO algorithm (Algorithm 3) [10].
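Eq. (A12) translates directly into a few lines of code; the sketch below assumes the predictive mean and variance of Eqs. (A4) are available:

```python
# A sketch of the expected improvement of Eq. (A12).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, var, y_min):
    s = np.sqrt(np.maximum(var, 1e-12))   # predictive standard deviation
    u = (y_min - mu) / s                  # standardized improvement
    # Exploitation term + exploration term, cf. the discussion above
    return (y_min - mu) * norm.cdf(u) + s * norm.pdf(u)
```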
Notice in Algorithm 3, line 6, that $\boldsymbol{\theta}^{*}$ is added as an argument. This substitution indicates that an MLE approach is adopted instead of a fully Bayesian approach to evaluate the interpolator. Furthermore, $k_{\mathrm{EI}}$ is introduced as an additional constant that serves as a stopping criterion: if no further significant improvement can be made, the optimization is stopped (break). Otherwise, the state of the practice is to ensure the training set does not contain more than 500 designs [16].

Algorithm 3: EGO [10]

Require: Evaluated sampling plan $\mathcal{S}_i = \{X, \mathbf{y}\}$
 1: while computational budget is not exhausted do
 2:  $\boldsymbol{\theta}^{*} \leftarrow \arg\max_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}\,|\,\mathcal{S}_i)$   ▷ Eq. (A2)
 3:  $y_{\min} \leftarrow \min \mathbf{y}$
 4:  $z^{*} \leftarrow \max_{\mathbf{x}} \mathrm{EI}(\mathbf{x}\,|\,\boldsymbol{\theta}^{*}, y_{\min})$
 5:  if $z^{*} > k_{\mathrm{EI}}$ then
 6:   $\mathbf{x}^{*} = \arg\max_{\mathbf{x}} \mathrm{EI}(\mathbf{x}\,|\,\boldsymbol{\theta}^{*}, y_{\min})$   ▷ Eq. (A12)
 7:   $y^{*} \leftarrow y(\mathbf{x}^{*})$
 8:   $\mathcal{S}_i \leftarrow \mathcal{S}_i \cup \{\mathbf{x}^{*}, y^{*}\}$
 9:  else
10:   break
11: $\mathbf{x}^{*} \leftarrow \arg\min_{\mathbf{x}} \mathbf{y}$
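Putting the pieces together, a minimal sketch of Algorithm 3 could look as follows; the helpers `fit_hyperparameters` and `predict_at` are hypothetical stand-ins for Eqs. (A2)–(A4), and a crude random candidate search replaces the inner EI maximization of lines 4–6:

```python
# A minimal sketch of the EGO loop of Algorithm 3, built from the pieces above.
import numpy as np

def ego(y_fun, X, y, bounds, budget, k_ei=1e-6):
    rng = np.random.default_rng(0)
    for _ in range(budget):                              # line 1: budget check
        theta = fit_hyperparameters(X, y)                # line 2, Eq. (A2)
        y_min = y.min()                                  # line 3
        cand = rng.uniform(bounds[0], bounds[1],
                           size=(1000, X.shape[1]))      # candidate infill points
        ei = np.array([expected_improvement(*predict_at(c, X, y, theta), y_min)
                       for c in cand])                   # line 4, Eq. (A12)
        if ei.max() <= k_ei:                             # lines 5, 9-10: stop
            break
        x_star = cand[ei.argmax()]                       # line 6
        y_star = y_fun(x_star)                           # line 7
        X = np.vstack([X, x_star])                       # line 8: augment S_i
        y = np.append(y, y_star)
    return X[y.argmin()]                                 # line 11
```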

References

1. Chatterjee, T., Chakraborty, S., and Chowdhury, R., 2019, "A Critical Review of Surrogate Assisted Robust Design Optimization," Arch. Comput. Methods Eng., 26(1), pp. 245–274.
2. Parnianifard, A., Azfanizam, A., Ariffin, M. K. A. M., and Ismail, M. I. S., 2018, "An Overview on Robust Design Hybrid Metamodeling: Advanced Methodology in Process Optimization Under Uncertainty," Int. J. Ind. Eng. Comput., 9(1), pp. 1–32.
3. Balesdent, M., Brevault, L., Morio, J., and Chocat, R., 2020, Overview of Problem Formulations and Optimization Algorithms in the Presence of Uncertainty, Springer International Publishing, Cham, pp. 147–183.
4. Zang, T. A., Hemsch, M. J., Luckring, J. M., Maghami, P., Padula, S. L., and Stroud, W. J., 2002, "Needs and Opportunities for Uncertainty-Based Multidisciplinary Design Methods for Aerospace Vehicles," NASA, Report TM-2002-211462.
5. Persson, J. A., and Ölvander, J., 2017, "How to Compare Performance of Robust Design Optimization Algorithms, Including a Novel Method," Artif. Intell. Eng. Des. Anal. Manuf., 31(3), pp. 286–297.
6. Kanno, Y., 2020, "On Three Concepts in Robust Design Optimization: Absolute Robustness, Relative Robustness, and Less Variance," Struct. Multidiscipl. Optim., 62(2), pp. 979–1000.
7. Ryan, K. M., Lewis, M. J., and Yu, K. H., 2015, "Comparison of Robust Optimization Methods Applied to Hypersonic Vehicle Design," J. Aircraft, 52(5), pp. 1510–1523.
8. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and Freitas, N. D., 2016, "Taking the Human Out of the Loop: A Review of Bayesian Optimization," Proc. IEEE, 104(1), pp. 148–175.
9. Mockus, J., Tiesis, V., and Zilinskas, A., 1978, "The Application of Bayesian Methods for Seeking the Extremum," Towards Global Optimization 2: Proceedings of a Workshop at the University of Cagliari, Vol. 2, L. C. W. Dixon and G. P. Szego, eds., Italy, pp. 117–129.
10. Jones, D. R., Schonlau, M., and Welch, W. J., 1998, "Efficient Global Optimization of Expensive Black-Box Functions," J. Global Optim., 13(4), pp. 455–492.
11. Keane, A. J., 2012, "Cokriging for Robust Design Optimization," AIAA J., 50(11), pp. 2351–2364.
12. Keane, A. J., and Voutchkov, I. I., 2020, "Robust Design Optimization Using Surrogate Models," J. Comput. Des. Eng., 7(1), pp. 44–55.
13. Cheng, J., Liu, Z., Wu, Z., Li, X., and Tan, J., 2015, "Robust Optimization of Structural Dynamic Characteristics Based on Adaptive Kriging Model and CNSGA," Struct. Multidiscipl. Optim., 51(2), pp. 423–437.
14. Ribaud, M., Blanchet-Scalliet, C., Helbert, C., and Gillot, F., 2020, "Robust Optimization: A Kriging-Based Multi-objective Optimization Approach," Reliab. Eng. Syst. Saf., 200(1), p. 106913.
15. Wauters, J., 2022, "ERGO: A New Robust Design Optimization Technique Combining Multi-objective Bayesian Optimization With Analytical Uncertainty Quantification," ASME J. Mech. Des., 144(3), p. 031702.
16. Keane, A. J., 2006, "Statistical Improvement Criteria for Use in Multiobjective Design Optimization," AIAA J., 44(4), pp. 879–891.
17. Wagner, T., Emmerich, M., Deutz, A., and Ponweiser, W., 2010, "On Expected-Improvement Criteria for Model-Based Multi-objective Optimization," Parallel Problem Solving From Nature, PPSN XI, Springer Berlin Heidelberg, Berlin, Germany, pp. 718–727.
18. Couckuyt, I., Deschrijver, D., and Dhaene, T., 2014, "Fast Calculation of Multiobjective Probability of Improvement and Expected Improvement Criteria for Pareto Optimization," J. Global Optim., 60(3), pp. 575–594.
19. Haldar, A., and Mahadevan, S., 2000, Probability, Reliability, and Statistical Methods in Engineering Design, Wiley, New York.
20. Oberkampf, W. L., Trucano, T. G., and Hirsch, C., 2004, "Verification, Validation, and Predictive Capability in Computational Engineering and Physics," Appl. Mech. Rev., 57(5), pp. 345–384.
21. Wauters, J., Degroote, J., Couckuyt, I., and Crevecoeur, G., 2022, "SAMURAI: A New Technique for Efficient Optimization Under Uncertainty," AIAA J., 60(11), pp. 6133–6156.
22. Forrester, A., Sóbester, A., and Keane, A., 2006, "Optimization With Missing Data," Proc. R. Soc. A: Math. Phys. Eng. Sci., 462(2067), p. 935.
23. Bachoc, F., Helbert, C., and Picheny, V., 2020, "Gaussian Process Optimization With Failures: Classification and Convergence Proof," J. Global Optim., 78(3), pp. 483–506.
24. Nadarajah, S., 2013, "Aerodynamic Design Optimization: Drag Minimization of the RAE 2822 in Transonic Viscous Flow," McGill Computational Aerodynamics Group, Report ADODG Case 2.
25. Girard, A., 2004, "Approximate Methods for Propagation of Uncertainty With Gaussian Process Models," Ph.D. thesis, University of Glasgow, Glasgow, UK.
26. Xue, J., Leung, Y., and Ma, J.-H., 2015, "High-Order Taylor Series Expansion Methods for Error Propagation in Geographic Information Systems," J. Geograph. Syst., 17(2), pp. 187–206.
27. Rasmussen, C. E., and Williams, C. K. I., 2006, Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.
28. Zitzler, E., Deb, K., and Thiele, L., 2000, "Comparison of Multiobjective Evolutionary Algorithms: Empirical Results," Evol. Comput., 8(2), pp. 173–195.
29. Fonseca, C. M., and Fleming, P. J., 1996, "On the Performance Assessment and Comparison of Stochastic Multiobjective Optimizers," Parallel Problem Solving From Nature, PPSN IV, Springer Berlin Heidelberg, Berlin, Germany, pp. 584–593.
30. McKay, M. D., Beckman, R. J., and Conover, W. J., 1979, "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code," Technometrics, 21(2), pp. 239–245.
31. Morris, M. D., and Mitchell, T. J., 1995, "Exploratory Designs for Computational Experiments," J. Statist. Plann. Inference, 43(3), pp. 381–402.
32. Nadarajah, S., 2017, "Aerodynamic Design Optimization: Drag Minimization of the NACA 0012 in Transonic Inviscid Flow," McGill Computational Aerodynamics Group, Report ADODG Case 1.
33. LeDoux, S. T., Vassberg, J. C., Young, D. P., Fugal, S., Kamenetskiy, D., Huffman, W. P., Melvin, R. G., and Smith, M. F., 2015, "Study Based on the AIAA Aerodynamic Design Optimization Discussion Group Test Cases," AIAA J., 53(7), pp. 1910–1935.
34. Drela, M., 1998, "Pros & Cons of Airfoil Optimization," Frontiers of Computational Fluid Dynamics 1998, World Scientific, Singapore, pp. 363–381.
35. Kulfan, B. M., and Bussoletti, J. E., 2006, "'Fundamental' Parametric Geometry Representations for Aircraft Component Shapes," 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Portsmouth, VA, Sept. 6–8.
36. Forrester, A., Sóbester, A., and Keane, A., 2008, Engineering Design Via Surrogate Modelling: A Practical Guide, Wiley, Hoboken, NJ.
37. Menter, F. R., 1994, "Two-Equation Eddy-Viscosity Turbulence Models for Engineering Applications," AIAA J., 32(8), pp. 1598–1605.
38. Wauters, J., Degroote, J., and Vierendeels, J., 2019, "Comparative Study of Transition Models for High-Angle-of-Attack Behavior," AIAA J., 57(6), pp. 2356–2371.
39. Han, Z.-H., Görtz, S., and Zimmermann, R., 2013, "Improving Variable-Fidelity Surrogate Modeling Via Gradient-Enhanced Kriging and a Generalized Hybrid Bridge Function," Aerosp. Sci. Technol., 25(1), pp. 177–189.
40. Han, Z.-H., Zhang, Y., Song, C.-X., and Zhang, K.-S., 2017, "Weighted Gradient-Enhanced Kriging for High-Dimensional Surrogate Modeling and Design Optimization," AIAA J., 55(12), pp. 4330–4346.
41. McHutchon, A., 2014, "Nonlinear Modelling and Control Using Gaussian Processes," Ph.D. thesis, University of Cambridge, https://mlg.eng.cam.ac.uk/pub/pdf/Mch14.pdf
42. Lophaven, S. N., Nielsen, H. B., and Søndergaard, J., 2002, "DACE: A Matlab Kriging Toolbox," Technical University of Denmark, Report IMM-TR-2002-12.
43. Palar, P. S., Zuhal, L. R., Chugh, T., and Rahat, A., 2020, "On the Impact of Covariance Functions in Multi-Objective Bayesian Optimization for Engineering Design," AIAA SciTech Forum, Orlando, FL, Jan. 6–10, American Institute of Aeronautics and Astronautics, pp. 1–13.
44. Stein, M. L., 1991, "A Kernel Approximation to the Kriging Predictor of a Spatial Process," Ann. Inst. Statist. Math., 43(1), pp. 61–75.
45. Couckuyt, I., Dhaene, T., and Demeester, P., 2014, "ooDACE Toolbox: A Flexible Object-Oriented Kriging Implementation," J. Mach. Learn. Res., 15(1), pp. 3183–3186.
46. Ugray, Z., Lasdon, L., Plummer, J., Glover, F., Kelly, J., and Martí, R., 2007, "Scatter Search and Local NLP Solvers: A Multistart Framework for Global Optimization," INFORMS J. Comput., 19(3), pp. 328–340.
47. Nocedal, J., and Wright, S. J., 2006, Numerical Optimization, Springer, New York.