## Abstract

Physics-informed neural networks (PINNs) have gained popularity across different engineering fields due to their effectiveness in solving realistic problems with noisy data and often partially missing physics. In PINNs, automatic differentiation is leveraged to evaluate differential operators without discretization errors, and a multitask learning problem is defined in order to simultaneously fit observed data while respecting the underlying governing laws of physics. Here, we present applications of PINNs to various prototype heat transfer problems, targeting in particular realistic conditions not readily tackled with traditional computational methods. To this end, we first consider forced and mixed convection with unknown thermal boundary conditions on the heated surfaces and aim to obtain the temperature and velocity fields everywhere in the domain, including the boundaries, given some sparse temperature measurements. We also consider the prototype Stefan problem for two-phase flow, aiming to infer the moving interface, the velocity and temperature fields everywhere as well as the different conductivities of a solid and a liquid phase, given a few temperature measurements inside the domain. Finally, we present some realistic industrial applications related to power electronics to highlight the practicality of PINNs as well as the effective use of neural networks in solving general heat transfer problems of industrial complexity. Taken together, the results presented herein demonstrate that PINNs not only can solve ill-posed problems, which are beyond the reach of traditional computational methods, but they can also bridge the gap between computational and experimental heat transfer.

## 1 Introduction

The application of machine learning (ML) techniques to heat transfer problems dates back to the 1990s, when artificial neural networks (ANNs) were used to learn convective heat transfer coefficients [1] from data. In recent years, more advanced learning-based methods have been developed, aided by improvements in hardware such as GPUs, and have been beneficial to various heat transfer problems [2–4]. In particular, the convolutional neural network (CNN), a widely used deep learning technique, has been successfully employed to make predictions from image-like data in complex and high-dimensional problems [5–8]. For example, a CNN proposed in Ref. [8] was used to predict the local heat flux of turbulent channel flows by feeding it with the wall-shear stress and the wall pressure. In other work, a CNN based on the U-net [9] architecture employed a two-dimensional (2D) slice of the time-averaged temperature field as input to predict a ridge pattern, which was used to quantify the fraction and time variability of turbulent heat transfer [5]. The aforementioned methods are mostly based on the supervised learning strategy, where a database with labels (i.e., ground truth) is required to train the model. The advantage of such data-driven methods is that, once trained, the prediction procedure is very fast. However, data generation is a major bottleneck, and the generalization of the model (from one experimental condition to another) is not guaranteed. In addition to supervised learning, other approaches have been explored. For instance, an unsupervised learning strategy was applied to a conjugate thermal optimization problem [10]. More recently, deep reinforcement learning (RL) has also been applied to control thermal systems; for example, Ref. [11] showed that RL-based control is able to stabilize the conductive regime and bring the onset of convection up to a Rayleigh number $Ra=3\times 10^{4}$, outperforming state-of-the-art linear controllers. Also, Ref. [12] applied RL to several natural and forced convection problems, demonstrating that their RL algorithms can alleviate the heat transfer enhancement related to the onset of convection.

The aforementioned efforts do not directly take into account the underlying physics of heat transfer problems. To tackle this problem, a multitask learning approach is required, such as the framework of physics-informed neural networks (PINNs). This approach was first proposed for solving both forward and inverse problems described by a combination of some data and partial differential equations (PDEs), and it was subsequently applied to various fluid mechanics problems [13–16] as well as heat transfer problems [16–20]. For instance, motivated by the limited understanding of the physical mechanisms responsible for the heat transfer enhancement in rough turbulent Rayleigh–Bénard convection, the authors of Ref. [19] applied the PINN framework to predict turbulent transport at $Ra=2\times 10^{7}$ in a subdomain of a Rayleigh–Bénard cavity filled with water and bearing two square-based roughness elements placed on the hot plate. The PINN training benefited from a large direct numerical simulation database. The influence of the choice of data acquisition and residual sampling, in relation to the problem geometry and the initial/boundary conditions, was also reported [19]. Nvidia has also developed the code SimNet based on PINNs and applied it to solve multiphysics problems involving heat transfer of an FPGA heat sink inside a channel [16]; see also Sec. 5.2 below. Moreover, PINNs can be employed to quantify the flow fields of natural convection from data measurements, as demonstrated in Ref. [18]. In that work, the authors inferred the flow fields from a number of temperature and velocity measurements and investigated the influence of the data selection. The flexibility offered by PINNs has also been leveraged to tackle free boundary and Stefan problems, a general class of problems for which the evolution of unknown boundaries and moving interfaces has to be estimated concurrently with solving the underlying PDE. For example, Wang et al. [21] have demonstrated the effectiveness of PINNs in solving both forward and inverse Stefan problems involving multiphase interfaces; see also Sec. 4 below. More recently, the PINN algorithm was applied to quantify the velocity and pressure fields of natural convection over an espresso cup from temperature data obtained by a background-oriented schlieren experiment [20], indicating that the method can be successfully used to deal with real experimental data. In addition to PINNs, an unsupervised learning approach with an auto-encoder and image gradients was proposed in Ref. [22] to solve the heat equation on a chip. The principle of this method is similar to that of PINNs. However, by training the network with a set of data, the proposed framework can also generalize the trained network to predict solutions of heat equations with unseen source terms. These algorithms, which aim to solve the heat equation, introduce the physical models into the neural networks and train the model by minimizing a loss function involving the residuals of the governing equations. Therefore, the solutions can be determined with limited data, or without any data except for the boundary and initial conditions.

In this article, we mainly review the applications of PINNs on inverse heat transfer problems in forced and mixed convection and on the two-phase Stefan problem with a moving interface. We also include two examples from industry on how to use PINNs in the thermal design of power electronics. The article is organized as follows. In Sec. 2, we provide an overview of PINNs, and in Sec. 3, we present two prototype problems of forced and mixed convection. In Sec. 4, we present the formulation of PINNs for the two-phase Stefan problem with some illustrative results, and in Sec. 5, we present two industrial applications of PINNs to power electronics by Ansys and Nvidia. We conclude in Sec. 6 with a brief summary and outlook.

## 2 Overview of Physics-Informed Neural Networks

We consider a general PDE of the form $u_t+\mathcal{N}_x[u]=0$ for $x\in\Omega$, subject to the initial condition $u(x,0)=h(x)$ and the boundary condition $u(x,t)=g(x,t)$ on $\partial\Omega$, where $\mathcal{N}_x$ is a general linear or nonlinear differential operator; $x\in\mathbb{R}^d$ and *t* are the spatial and temporal coordinates, respectively; Ω and $\partial\Omega$ denote the computational domain and its boundary; and $u(x,t)$ is the solution of the PDE with initial condition $h(x)$ and boundary condition $g(x,t)$. We remark that this formulation can be easily generalized to higher-order PDEs since they can be written as systems of first-order PDEs.

In the PINN framework, the solution $u(x,t)$ is approximated by a fully connected neural network, in which $w_{i,j}$ and $b_j$ are trainable weights and biases, respectively, and $\sigma(\cdot)$ is the activation function representing a simple nonlinear transformation. The parameters of the network can be trained by minimizing a composite loss function that combines the residual loss $\mathcal{L}_r$, the boundary-condition loss $\mathcal{L}_b$, and the initial-condition loss $\mathcal{L}_0$, where $N_r$, $N_b$, and $N_0$ are the numbers of data points for the different terms. Notice that all loss terms are functions of the network weight and bias parameters, $w_{i,j}$ and $b_j$, respectively; this dependence has been omitted to simplify the notation. In addition, if some data are available inside the domain, an extra loss term measuring the mismatch between the predictions and the data can be included.

To compute the residuals for $\mathcal{L}_r$, derivatives of the outputs with respect to the inputs (i.e., $u_t$ and $\mathcal{N}_x[u]$) are required. In the PINN framework, this computation is carried out by automatic differentiation in the deep learning code. Automatic differentiation relies on the fact that combining the derivatives of the constituent operations by the chain rule gives the derivative of the overall composition. This technique is a key enabler for the development of PINNs, and it is the key element that differentiates PINNs from similar efforts in the early 1990s [24,25], which relied on manual derivation of back-propagation rules. Nowadays, automatic differentiation capabilities are well implemented in most deep learning frameworks such as TensorFlow [26] and PyTorch [27], and they allow us to avoid tedious derivations or numerical discretization while computing derivatives of all orders in space–time. In Sec. 3, the problem-dependent governing equations, loss functions, and training configurations of PINNs are described case by case.
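To make the chain-rule mechanism concrete, the following is a minimal sketch of forward-mode automatic differentiation with dual numbers, applied to a single tanh neuron. It is an illustration only: deep learning frameworks typically use reverse-mode differentiation, and the class `Dual`, the function `neuron`, and the values of `w` and `b` are hypothetical names and numbers chosen here.

```python
import math


class Dual:
    """A value together with its derivative with respect to one chosen input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is zero.
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def tanh(z):
    # Chain rule: d/dx tanh(z) = (1 - tanh(z)^2) * dz/dx.
    t = math.tanh(z.val)
    return Dual(t, (1.0 - t * t) * z.dot)


def neuron(x, w=0.7, b=-0.2):
    """One neuron, sigma(w * x + b), with hypothetical weight and bias."""
    return tanh(w * x + b)


x0 = 0.5
out = neuron(Dual(x0, 1.0))  # seed dx/dx = 1
exact = 0.7 * (1.0 - math.tanh(0.7 * x0 - 0.2) ** 2)  # hand-derived derivative
print(out.dot, exact)
```

Each elementary operation propagates its own derivative rule, so the derivative of the composition is obtained without symbolic manipulation or finite differences.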

A schematic of the PINN framework is shown in Fig. 1, in which a simple heat equation $u_t=\alpha u_{xx}$ is used as an example to show how to set up a PINN for heat transfer problems. As shown in Fig. 1(a), the fully connected neural network is used to approximate the solution *u*(*x*, *t*), which is then applied to construct the residual loss $\mathcal{L}_r$, the boundary conditions loss $\mathcal{L}_b$, and the initial conditions loss $\mathcal{L}_0$. The parameters of the fully connected network are trained using gradient-descent methods based on the back-propagation of the loss function. The data points used for the different terms of the loss function are shown in Fig. 1(b). Note that the residual points for computing the residual loss $\mathcal{L}_r$ can be randomly selected in the space–time domain, and the number of points can be defined by the user.
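For the heat equation example, the three loss terms can be written out explicitly. The following is a sketch that assumes an unweighted mean-squared form; the network output $\hat{u}$ and the sample points $(x_r^i,t_r^i)$, $(x_b^i,t_b^i)$, and $x_0^i$ are notation introduced here for illustration, while $g$ and $h$ are the boundary and initial data of Sec. 2:

$$
\begin{aligned}
\mathcal{L}_r &= \frac{1}{N_r}\sum_{i=1}^{N_r}\left|\hat{u}_t(x_r^i,t_r^i)-\alpha\,\hat{u}_{xx}(x_r^i,t_r^i)\right|^2,\\
\mathcal{L}_b &= \frac{1}{N_b}\sum_{i=1}^{N_b}\left|\hat{u}(x_b^i,t_b^i)-g(x_b^i,t_b^i)\right|^2,\\
\mathcal{L}_0 &= \frac{1}{N_0}\sum_{i=1}^{N_0}\left|\hat{u}(x_0^i,0)-h(x_0^i)\right|^2 .
\end{aligned}
$$

In practice the terms may also be weighted relative to one another; that choice is problem dependent.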

## 3 Convection Heat Transfer: Unknown Thermal Boundary Conditions

In standard heat transfer handbooks and textbooks, one can find explicit expressions of the Nusselt number as a function of the Reynolds and Prandtl numbers for several prototypical heat transfer problems, e.g., inside and around pipes, boundary layers, and other surfaces, provided that the thermal boundary condition is precisely specified, i.e., constant temperature or constant heat flux; see, e.g., Refs. [28–30]. Moreover, standard computational fluid dynamics (CFD) methods can now simulate arbitrary thermal boundary conditions, e.g., the mixed Robin boundary condition or arbitrary temperature distributions imposed on the heated surface, with very good accuracy and on geometric domains of industrial complexity. However, in real heat transfer applications, e.g., in power electronics or in a nuclear reactor, the thermal boundary conditions, unlike the velocity boundary conditions, are never precisely known, as this would require extensive and complex instrumentation, which is feasible only in a few research experimental setups and not in industrial applications.

The lack of thermal boundary conditions leads to an ill-posed boundary value problem for the energy equation, which cannot be solved no matter how sophisticated the CFD method is. We address this ill-posed problem here by asking a simple question: what if we have some temperature measurements at a few points, e.g., using simple thermocouples, in convenient locations that may or may not include the heated surface with the unknown boundary condition? Can we then solve an inverse problem using both the measurements and the governing equations of flow and heat transfer? In a typical CFD setup, this would require a tedious data assimilation method [31,32] blended in with flow and heat transfer solvers, and it may take an extremely long time to converge, if it converges at all. Here, inspired by the development of PINNs, we exploit the expressivity of deep neural networks to formulate this ill-posed problem, seamlessly blending data and mathematical models while simultaneously inferring both the flow and temperature fields.

### 3.1 Forced Convection.

In the governing equations, Eq. (6), *θ*, $\mathbf{u}=(u,v)^T$, and *p* are the dimensionless temperature, velocity, and pressure fields, respectively. Pe, Re, and Ri denote the Peclet, Reynolds, and Richardson numbers, respectively. Note that for the forced convection discussed in this section, $Ri=0$. Given the boundary and initial conditions, simulating forced convection problems with standard CFD techniques is relatively simple. Here, however, we aim to demonstrate a new proof-of-concept of inferring the entire temperature field from very sparse temperature measurements. In particular, unlike the well-posed simulation problem, which explicitly defines the thermal boundary conditions on all boundaries of the domain, the problem we focus on is ill-posed: the boundary conditions are not fully known and have to be discovered from very few measurements while simultaneously inferring the temperature field everywhere in the domain. Moreover, we can also infer the velocity and pressure fields everywhere in the domain. This can be considered as an inverse problem, which can be addressed by PINNs. To demonstrate the effectiveness of PINNs, we consider forced convection in two different scenarios, heat transfer inside an enclosure and heat transfer over a stationary cylinder, where the flows are both steady.
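For orientation, a standard dimensionless Boussinesq system consistent with the symbols above can be written as follows. This is a sketch under stated assumptions rather than a verbatim restatement of Eq. (6): the sign convention of the buoyancy term and the unit vector $\mathbf{e}$ giving its direction are assumptions introduced here.

$$
\begin{aligned}
\frac{\partial\theta}{\partial t}+(\mathbf{u}\cdot\nabla)\theta &= \frac{1}{Pe}\nabla^2\theta,\\
\frac{\partial\mathbf{u}}{\partial t}+(\mathbf{u}\cdot\nabla)\mathbf{u} &= -\nabla p+\frac{1}{Re}\nabla^2\mathbf{u}+Ri\,\theta\,\mathbf{e},\\
\nabla\cdot\mathbf{u} &= 0 .
\end{aligned}
$$

With $Ri=0$ the buoyancy term vanishes, so the momentum and continuity equations no longer depend on the temperature, while the temperature is still advected by the velocity field.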

#### 3.1.1 Forced Convection in an Enclosure.

We consider 2D steady forced convection heat transfer in an enclosure, as shown in Fig. 2. The two hot objects with red boundaries in the enclosure are at temperature *θ* = 1; a uniform cooling flow with $u=1, v=0$ enters from the left-bottom boundary and exits from the right-top boundary. The temperature on the rest of the walls and on the inflow boundary is *θ* = 0. On the outflow boundary, $\partial\theta/\partial n=0$. The governing equations are given by Eq. (6) with Re = 50, Pe = 36, and Ri = 0. Note that here the height of *AB* is used as the characteristic length for the Reynolds number. The simulation started from the initial conditions *u* = 0, *v* = 0, and *θ* = 0 and stopped when both the flow and the temperature reached steady state.

#### 3.1.2 Flow Past Cylinder.

Here, we consider the classical two-dimensional heat transfer problem of steady forced convection around a circular cylinder; we assume that some data are available, and the reference solution is obtained numerically using the spectral/*hp* element method [33]. The simulation domain is $[-7.5D, 22.5D]\times[-10D, 10D]$ and consists of 2094 quadrilateral elements, where *D* is the diameter of the cylinder. The cylinder center is located at (0,0). For the flow, the cylinder surface is assumed to be a no-slip, no-penetration wall. A uniform velocity ($u=U_\infty, v=0$) is imposed on the inflow boundary at $x/D=-7.5$, periodic boundary conditions are used on the lateral boundaries at $y/D=\pm 10$, and a zero-pressure condition is prescribed on the outflow boundary at $x/D=22.5$. For the heat transfer, a constant temperature $\theta=\theta_\infty=0$ is imposed on the inflow boundary, periodic conditions are assumed on the lateral boundaries, and a zero-gradient condition is used on the outflow boundary. For the cylinder surface, a constant wall temperature is considered as a simple example, but an arbitrary temperature distribution can be readily modeled too. The governing equations of this problem are given in Eq. (6), with the Reynolds number $Re=20$ and Peclet number $Pe=200$. The simulation is carried out until both the flow and the temperature reach steady state.

#### 3.1.3 Active Sensor Placement.

In data-driven problems, the locations of the sensors where data are collected often affect the inferred results. Finding the best sensor locations generally requires trial and error, which is a costly process. Therefore, we also propose a method to adaptively select the sensor locations in order to minimize the number of temperature measurements. This method allows us to start with a small number of fixed sensors (e.g., 4 or 5) and iteratively add more sensors at the essential locations.

The selection criterion is the residual of the heat transfer equation evaluated by the network. The reasons for using this residual include: (a) this residual, which represents the error of the temperature equation, can be directly computed by the neural network and does not require any prior knowledge; (b) as we will show in the results below, there is a correlation between this residual and the posterior temperature error; and (c) it can be considered as a sensitivity metric of the temperature with respect to the location (*x*, *y*). Taking the forced convection of flow past a cylinder as an example, the proposed method for adaptive sensor placement proceeds iteratively: the PINN is trained with the current set of sensors, the residual is evaluated, a new sensor is added at the location of maximal residual, and the procedure is repeated until the accuracy is sufficient.
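The residual-driven loop can be sketched as follows. This is a toy illustration, not the actual PINN workflow: `toy_heat_residual` is a hypothetical stand-in for the residual of a trained network (here it simply grows with the angular distance to the nearest sensor on the cylinder surface), and `place_sensors`, `max_new`, `tol`, and the initial sensor angles are names and values assumed for this example.

```python
import numpy as np


def toy_heat_residual(candidates, sensors):
    """Hypothetical stand-in for |heat-equation residual| of a trained PINN.

    Returns, for each candidate angle on the cylinder surface, the angular
    distance to the nearest existing sensor.
    """
    diff = np.abs(candidates[:, None] - sensors[None, :])
    diff = np.minimum(diff, 2.0 * np.pi - diff)  # wrap around the cylinder
    return diff.min(axis=1)


def place_sensors(candidates, sensors, residual_fn, max_new=3, tol=0.5):
    """Greedily add sensors where the residual is largest."""
    sensors = np.asarray(sensors, dtype=float)
    for _ in range(max_new):
        # In a real workflow the PINN would be retrained here with `sensors`.
        res = residual_fn(candidates, sensors)
        worst = int(np.argmax(res))
        if res[worst] < tol:  # residual small everywhere: stop adding sensors
            break
        sensors = np.append(sensors, candidates[worst])
    return sensors


angles = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)  # candidate sites
initial = np.array([np.pi / 2, 3 * np.pi / 2])  # hypothetical initial sensors
final = place_sensors(angles, initial, toy_heat_residual)
print(len(initial), "->", len(final), "sensors")
```

Replacing `toy_heat_residual` with the residual of the temperature equation evaluated by the trained network recovers the strategy described above; the stopping tolerance would then be chosen from the residual magnitudes observed in practice.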

#### 3.1.4 Results.

In the loss function, Eq. (8), the first term $\mathcal{L}_r$ penalizes the residuals of the governing equations (6), including the heat equation, the momentum equations, and the continuity equation. Here, $N_r$ is the batch size of residual points, which are randomly selected in the spatial domain. The second and third terms enforce the boundary conditions for the velocity and temperature fields, respectively. Note that the boundary points for velocity and for temperature can be different; thus, two independent terms are used in the loss function.

Here, $\theta_b$ denotes the environmental temperature, which is known in practice. The last term of the loss function, $\mathcal{L}_\theta$, is the mismatch between the inferred temperatures and the in situ measured values, e.g., at the thermocouple locations. The value of $N_\theta$ depends on the sensor placement in each scenario.

The parameters of the neural networks are randomly initialized using the Glorot scheme [34] and trained by the Adam optimizer [35] with a decreasing learning rate schedule. The results are obtained after 80,000 iterations with learning rates of $1\times10^{-3}$, $1\times10^{-4}$, $1\times10^{-5}$, and $1\times10^{-6}$ (20k iterations for each).

To assess the accuracy, we use the relative $L_2$ errors as the metric, $\epsilon_V=\|V-V^*\|_2/\|V^*\|_2$, where *V* represents one of the predicted quantities $(\theta,u,v,p)$ and $V^*$ is the corresponding reference.
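As a small worked example of this metric, the sketch below computes the relative $L_2$ error of a predicted field against a reference. The function name `relative_l2_error` and the sample arrays are made up for illustration.

```python
import numpy as np


def relative_l2_error(pred, ref):
    """Relative L2 error ||pred - ref||_2 / ||ref||_2 over all sample points."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return np.linalg.norm(pred - ref) / np.linalg.norm(ref)


ref = np.array([1.0, 2.0, 2.0])   # made-up reference values, norm = 3
pred = np.array([1.0, 2.0, 2.3])  # made-up prediction, off by 0.3 at one point
err = relative_l2_error(pred, ref)
print(f"relative L2 error: {100 * err:.2f}%")
```

The same function applies to any of the fields $(\theta,u,v,p)$ once prediction and reference are sampled at the same points.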

Enclosure. We employ a fully connected neural network with ten hidden layers and 120 neurons per layer. The numbers of points for the three terms in loss function (8) are $N_r$ = 10,736, $N_{ub}$ = 757, and $N_{\theta b}=548$. Note that the points for $\mathcal{L}_{ub}$ cover all the boundaries of the enclosure, since the velocity boundary conditions are always well defined. As shown in Fig. 2, the $N_{\theta b}$ points are uniformly distributed on all boundaries except EF and GH. Moreover, to mimic a realistic scenario in which only limited measurement data are available, we assume that the temperature at the positions of the four green triangles and the velocity at the positions of the five circles are known; see Fig. 2. The comparison between the PINN-inferred result and the reference simulation result is shown in Fig. 3. It can be observed that the PINN predicts the flow and temperature in the whole domain well qualitatively. The $L_2$ error of the temperature on the two unknown walls EF and GH is 10.82%.

Flow past cylinder. A fully connected neural network with ten hidden layers and 200 neurons per layer is employed in PINNs to infer the solutions. The numbers of points for the different terms in loss function (8) are $N_r$ = 20,000, $N_{ub}$ = 900, and $N_{\theta b}=500$. In particular, the points for computing $\mathcal{L}_{ub}$ include the inlet, the upper and lower bounds, and the cylinder boundary. We point out again that the velocity boundary condition on the circular cylinder is known (no-slip), whereas the temperature boundary condition is unknown. The number of measurements ($N_\theta$) varies from case to case and is specified below. We consider first the case of (unknown) prescribed temperature and subsequently the case of prescribed heat flux on the cylinder surface.

Constant temperature surface. Nine different sensor configurations are considered for the case with a constant temperature surface, as illustrated in Fig. 4. The number of temperature measurements $N_\theta$ varies from 5 to 11. As shown in the figure, we only have a couple of sensors on the cylinder and a couple more in the wake. Based on these observations, the thermal boundary condition on the cylinder surface is to be inferred. The relative $L_2$ errors of the temperature, velocity, and pressure fields computed over the whole domain are also provided in Fig. 4, along with the sensor placements. We note that the training of the network parameters is a nonconvex optimization problem (where a global minimum is not guaranteed), so the result of each PINN simulation may be affected by the randomness of the network initialization. Therefore, we perform ten independent training processes for each configuration and report the best results in Fig. 4.

For the first case shown in the figure, there are only five sensors: two on the cylinder surface and three in the wake. Although the neural network can predict accurate solutions for velocity and pressure, it fails to predict the temperature field accurately ($\epsilon_\theta>10\%$). As seen from case 2, adding one more sensor on the surface dramatically improves the results, showing that the information at the front stagnation point of the cylinder is significant in this problem. Case 3 investigates a different sensor placement in the wake behind the cylinder compared with case 2, and we find that it further increases the accuracy. The second group in Fig. 4, comprising cases 4, 5, and 6, shows that we can obtain relatively good results even when there is no temperature measurement on the cylinder surface. Increasing the number of sensors around the cylinder improves the inference performance. Moreover, comparing with the third group (cases 7, 8, and 9), we find that placing one more sensor at the front stagnation point on the cylinder surface helps the method to find a much more accurate solution. For example, the temperature error in case 7 is only 1.25%, while that in case 4 is 15.09%. It is worth noting that in all sensor configurations, the velocity and pressure fields are accurately inferred by the PINN algorithm. The reason is that the velocity boundary conditions are well defined; thus, the Navier–Stokes (NS) equations (which are independent of the heat equation) can be well resolved by PINNs [15].

A typical example of the inferred results is shown in Fig. 5, where the fields are inferred by the PINN with sensor configuration case 3. The pointwise absolute errors between the CFD simulation and the PINN results are also given; their magnitudes are very small compared to the magnitudes of the inferred quantities. Figure 6 shows the corresponding temperature and Nusselt number profiles on the cylinder surface. We observe that the temperature profile is approximately constant (equal to 1) and that the Nusselt number is consistent with the CFD solution. The mean Nusselt numbers along the cylinder surface for the exact and predicted solutions are 5.90 and 5.89, respectively, corresponding to an error smaller than 1%. Note that only six sensors are used in case 3. Prediction from such limited measurements is an inverse problem, which is generally difficult to solve. In addition to the unknown boundary condition, the proposed neural network simultaneously infers the velocity, pressure, and temperature fields everywhere in the domain.

Constant flux surface. To discover the boundary condition with constant heat flux ($d\theta/dn=0$), we apply an experimental setup with seven sensors (three on the boundary and four in the wake domain), which is illustrated at the top-left corner of Fig. 7. The corresponding temperature and Nusselt number profiles are also shown in the figure, where we can observe the consistency between the CFD simulation and the PINN inference. The mean Nusselt numbers along the cylinder surface for the CFD and PINN solutions are 7.11 and 6.96, respectively. In this case, the $L_2$ errors of $(\theta,u,v,p)$ inferred by the PINN are 3.18%, 0.18%, 0.24%, and 0.46%, respectively.

Active sensor placement. We have introduced the method for active sensor placement in a previous section, and here we present an example of inferring the constant temperature surface in the flow past a cylinder to demonstrate the effectiveness of this method. As mentioned, the residual of heat transfer equation is considered as the criterion since there exists a correlation between the residual and the error of Nusselt number (which is a posterior assessment). As an example for explanation, the normalized error of Nusselt number and the normalized residual of the network for case 1 in Fig. 4 are illustrated in Fig. 8. As we can see, although the shapes of the curves are not exactly identical to each other, the peak locations of the residual are close to the locations with maximum Nusselt number error. Therefore, it is reasonable to expect better performance of the PINN if we add new sensors on the position with maximal residual.

Based on this strategy, we carry out an experiment of active sensor placement for forced convection of flow past a cylinder. In this case, the initial sensor setup is case 1 in Fig. 4, which contains five measurements. As shown in Fig. 9(a), the algorithm automatically added a sensor near the front stagnation point, which results in the configuration corresponding to case 2 in Fig. 4. This means that the active sensor placement algorithm with PINNs achieves a very high accuracy with only one iteration in this case. From Figs. 9(b) and 9(c), we observe that the profiles of temperature and Nusselt number on the cylinder surface are consistent with the CFD solutions. The $L_2$-norm error of the temperature in the domain decreases dramatically to about 3% after the first iteration. We note that although the active sensor placement algorithm can be run for more iterations, here we only show one iteration since the accuracy is already high enough to terminate the process. For complex problems, more iterations (i.e., more sensors) are expected. Nevertheless, such an adaptive sensor placement algorithm allows us to avoid a trial-and-error procedure for selecting sensor locations.

### 3.2 Mixed Convection.

To obtain data and the reference solution, we consider a two-dimensional heat transfer problem of mixed convection around a circular cylinder and simulate it numerically using the spectral/*hp* element method, with the computational domain identical to the one described in Sec. 3.1.2. The cylinder surface is assumed to be a no-slip, no-penetration wall, and a constant wall temperature is considered. The governing equations of this problem are the Boussinesq approximation of the incompressible Navier–Stokes equations and the corresponding heat transfer equation, as shown in Eq. (6), with Reynolds number $Re=100$, Péclet number $Pe=71$, and Richardson number $Ri=1.0$. Here, we investigate mixed convection in which the force term acts in the −*y* direction. Compared to the forced convection investigated in the previous section, the mixed convection problem in this section involves unsteady flow. To assess the PINN method, the time span of interest is approximately $t\in[0,15]$, covering more than three shedding periods, and the time-step is $\Delta t=0.1$. Moreover, there is a force term in the NS equations, which couples the solutions of the temperature and velocity fields. The mixed convection problem we consider is still an ill-posed problem, where the temperature boundary condition on the cylinder surface and the entire flow fields need to be inferred from a few measurements only. In addition to the thermocouples, in this section we assume that it is possible to measure a small patch of temperature in the wake behind the cylinder, e.g., using a technique such as digital particle image thermometry (DPIT) [36].

The PINN for this problem takes (*t*, *x*, *y*) as inputs and outputs $(\theta,u,v,p)$. Different from the network used for the forced convection problem, here we employ the formulation of the neuronwise locally adaptive activation function [37], in which the relation between the input *X* and the output *Y* of the *l*th hidden layer scales the pre-activation of each neuron by a parameter $\alpha_j$; here $\alpha_j$ is a neuron-level parameter that can be learned during the training process, and $\sigma(\cdot)=\sin(\cdot)$. The unknown parameters of the neural network model are learned by minimizing a loss function composed of a residual term, boundary terms for velocity and temperature, a data term, and a symmetry term.

Here $N_r$, $N_tN_{\theta b}$, $N_tN_{ub}$, $N_tN_\theta$, and $N_tN_{sym}$ represent the numbers of points corresponding to the different terms, with $N_t$ the number of snapshots. The first term enforces the governing equations by minimizing the residuals of Eq. (6). Here, $N_r$ training points are randomly selected in the spatiotemporal domain; in this case, the batch size is $N_r$ = 10,000 for each iteration. The second and third terms are the boundary conditions for the velocity and temperature fields, respectively. Note again that the temperature on the cylinder surface is unknown, while the temperature on the other boundaries, including the inlet and the upper and lower bounds of the computational domain, is given as $\theta_b$.

We aim to infer the entire flow fields from temperature measurements at a few locations (e.g., thermocouples) as well as in a small area (e.g., a DPIT patch). An example of the configuration is shown in Fig. 10(a), where DPIT measures the temperature in $[1,2]\times[-0.5,0.5]$, which is represented by an 8 × 8 grid. Together with three additional sensors, we have $N_\theta=67$ for the PINN algorithm to infer the entire fields. Compared to the loss function used for the forced convection problems, here we also add an extra term $\mathcal{L}_{sym}$, which is used to enforce the symmetric nature of the temperature profile on the cylinder surface. Although the loss function without $\mathcal{L}_{sym}$ also works, we find that this a priori knowledge helps to improve the accuracy. We also note that any prior knowledge can be flexibly encoded in the PINN framework. We randomly select $N_{sym}$ = 50 points on the cylinder surface to enforce the symmetry condition.
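The count $N_\theta=67$ follows from the 64 points of the 8 × 8 patch plus the three point sensors. The sketch below assembles such a set of measurement locations for one snapshot; whether the grid includes the patch edges is an assumption made here, and the three thermocouple coordinates in `extra` are hypothetical placeholders, not the locations used in Fig. 10(a).

```python
import numpy as np

# 8 x 8 grid covering the DPIT patch [1, 2] x [-0.5, 0.5]
# (assumption: grid points include the patch edges).
xs = np.linspace(1.0, 2.0, 8)
ys = np.linspace(-0.5, 0.5, 8)
X, Y = np.meshgrid(xs, ys)
patch = np.column_stack([X.ravel(), Y.ravel()])  # 64 locations

# Three additional point sensors; coordinates are hypothetical placeholders.
extra = np.array([[3.0, 0.0],
                  [4.0, 1.0],
                  [4.0, -1.0]])

sensors = np.vstack([patch, extra])
print(sensors.shape[0])  # number of temperature measurements per snapshot
```

For the unsteady problem these locations are sampled at each of the $N_t$ snapshots, which gives the $N_tN_\theta$ data points that enter the data term of the loss.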

The neural network applied to the mixed convection problem is composed of ten hidden layers with 150 neurons per layer. The training is performed with the Adam optimizer using decreasing learning rates of $1\times10^{-3}$, $1\times10^{-4}$, $1\times10^{-5}$, and $5\times10^{-6}$, each for 200k iterations. This ensures that the neural network converges to a plateau after training.
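A forward pass through a network of this size with a neuronwise adaptive sine activation can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the layer form $\sin(\alpha_j(XW+b)_j)$ is one common way to write a neuronwise adaptive activation and may differ in detail (e.g., by a fixed scaling factor) from the formulation of Ref. [37]; the Glorot-uniform initialization, the initial value $\alpha_j=1$, and all function names are choices made here for illustration, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)


def glorot(n_in, n_out):
    """Glorot-uniform initialization of a weight matrix (assumed here)."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))


def init_params(sizes):
    """One dict per layer; hidden layers also carry a per-neuron alpha."""
    params = []
    n_layers = len(sizes) - 1
    for k, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        layer = {"W": glorot(n_in, n_out), "b": np.zeros(n_out)}
        if k < n_layers - 1:
            layer["alpha"] = np.ones(n_out)  # trainable slope, one per neuron
        params.append(layer)
    return params


def forward(params, X):
    """Map inputs (t, x, y) to outputs (theta, u, v, p)."""
    H = X
    for layer in params[:-1]:
        H = np.sin(layer["alpha"] * (H @ layer["W"] + layer["b"]))
    last = params[-1]
    return H @ last["W"] + last["b"]  # linear output layer


sizes = [3] + [150] * 10 + [4]    # ten hidden layers with 150 neurons each
params = init_params(sizes)
txy = rng.uniform(size=(5, 3))    # five made-up (t, x, y) sample points
out = forward(params, txy)
print(out.shape)
```

In a full PINN the weights, biases, and `alpha` values would all be updated by the optimizer, and the outputs would be differentiated with respect to the inputs by automatic differentiation to form the residuals of Eq. (6).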

The relative $L_2$-norm errors of the inference results over the spatial domain are shown in Fig. 10(b), which indicates that the errors are almost constant in time. We observe that the temperature error for the entire domain is approximately 2%, while the maximum error on the cylinder surface is roughly 5%, as shown in Fig. 10(c). The 2D flow fields at *t* = 5.0 are also illustrated in Fig. 10(d), along with the pointwise absolute errors. Overall, the magnitudes of the errors are very small. However, it is also worth noting that the errors of the flow (velocity and pressure) are mainly located in the region far from the cylinder, while the mismatch of the temperature is near the cylinder. The reason is that the velocity boundary condition on the cylinder surface is known (no-slip), while that for the temperature is unknown. We also expect that adding velocity sensors in the downstream region will further improve the inference performance of PINNs; this is a straightforward extension for interested readers to pursue.

## 4 Two-Phase Stefan Problems

A large class of problems in heat transfer involve dynamic interactions between different material phases, giving rise to moving boundaries or free interfaces. A classical example is the so-called Stefan problem that traces back to the ice solidification problem describing the joint evolution of a liquid and a solid phase related to heat transfer [38]. This problem was also considered in 1831 by Lamé and Clapeyron in relation to the problems of ice formation in the polar seas [39]. Applications of free boundary and Stefan problems are ubiquitous in science and engineering as they can naturally model continuum systems with phase transitions, moving obstacles, multiphase dynamics, competition for resources, etc. Specific use cases in the context of heat transfer include thermal convection [40], Marangoni convection in a liquid pool [41], chemical vapor deposition [42], crystal growth and solidification [43–46], welding [47,48], semiconductor design [49], and beyond [42].

Since their initial conception nearly two centuries ago, free boundary and Stefan problems have come to define a well-studied area in applied mathematics in terms of theory [50], numerical methods [51], and applications across a wide range of problems in science and engineering [42]. A unique characteristic of free boundary problems that introduces great challenges in terms of computational modeling is that they necessitate the solution of PDEs in domains with unknown boundaries and complex time-dependent interfaces. Different numerical methods have been developed to solve various types of free boundary problems, giving rise to different approaches for resolving the evolution of moving boundaries and free dynamic interfaces [52–57]. All these methods have their own advantages and limitations, and have been proven effective for certain classes of Stefan problems. However, these classical techniques are usually specialized to a specific type of free boundary problem and cannot be easily adapted to build a general framework for seamlessly synthesizing governing physical laws and observational data.

Here, *k*_{1} and *k*_{2} are thermal diffusivity parameters. For simplicity, we assume that $\Omega = \{(x,t) \in (0,L)\times(0,T]\}$ is a rectangular domain that is subdivided into the two phases by a latent moving interface *s*(*t*).

Now, suppose that $\alpha_1, \alpha_2$ are known. Given some measurements of the temperature distribution inside the domain Ω, our goal is to predict the latent functions $\{u_1(x,t), u_2(x,t), s(t)\}$ satisfying the system of Eqs. (14)–(19). Moreover, we would also like to infer the unknown thermal diffusivities *k*_{1} and *k*_{2}.

A training dataset for this benchmark can be generated by randomly sampling *M* = 200 measurement points inside the domain Ω and obtaining the corresponding data $u^i$ using Eq. (25), i.e., $\{(x^i_{data}, t^i_{data}), u^i\}_{i=1}^{M}$. It is worth emphasizing that for any given data point $\{(x^i_{data}, t^i_{data}), u^i\}$, we do not know the corresponding equation that governs $u^i$ during training.

We represent the unknown interface *s*(*t*) by a deep fully connected neural network (five layers, 100 hidden units, and tanh activations) $s_\beta(t)$. Similarly, we represent the temperature distributions $u_1(x,t)$ and $u_2(x,t)$ by another fully connected neural network (five layers, 100 hidden units, and tanh activations) with two outputs, $u_\theta^{(1)}(x,t)$ and $u_\theta^{(2)}(x,t)$.

Here, *N* denotes the batch size, and $\{(x^i, t^i)\}_{i=1}^{N}$ are collocation points that are randomly sampled at each iteration of gradient descent. In addition, $\lambda_1, \lambda_2$ are two extra trainable parameters. Since we know that the thermal diffusivity parameters cannot be negative, we initialize them at 0.1 and constrain them to remain positive during model training.
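The text does not state how positivity is enforced. One common option, shown here purely as an assumption, is to train an unconstrained raw variable and map it through a softplus function, so that the effective parameter stays positive regardless of the gradient updates applied to the raw variable. The helper names are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z)); non-negative for every finite z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def inverse_softplus(value):
    """Return the raw variable z with softplus(z) == value (requires value > 0)."""
    if value <= 0.0:
        raise ValueError("value must be positive")
    return math.log(math.expm1(value))

# Raw (unconstrained) trainable variables, chosen so that both effective
# parameters start at the initial value 0.1 mentioned in the text.
raw_param_1 = inverse_softplus(0.1)
raw_param_2 = inverse_softplus(0.1)

# The optimizer would update raw_param_*; the model would use softplus(raw_param_*).
effective_1 = softplus(raw_param_1)
effective_2 = softplus(raw_param_2)
```

Other choices, such as an exponential map or clipping after each update, would serve the same purpose; the softplus map is simply a smooth option that keeps gradients well defined.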

Figure 12 summarizes the predictions of the proposed PINN model, indicating an accurate reconstruction of both the latent temperature field and the dynamic interface *s*(*t*). Moreover, Fig. 13(b) shows the identified parameters *k*_{1} and *k*_{2}. Here, we observe a discrepancy between the true and identified parameters, indicating that the PINN model described above fails to correctly identify the unknown thermal diffusivity parameters, even after 200,000 training iterations. In fact, the two identified parameters do not change as training goes on. This observation suggests that the model gets stuck in a local minimum and, as a result, fails to recover the target parameter values. The observed inaccuracy is therefore not due to insufficient training iterations, but relates to an issue with the model formulation itself.

The weight *λ* is adaptively updated during training by utilizing the back-propagated gradient statistics [58]. Specifically, estimates of *λ* are computed from these gradient statistics, and the values of *λ* used in the next iteration are then updated using a moving average of the previous value and the new estimate, with $\alpha = 0.1$. As demonstrated in Ref. [58], this adaptive strategy can effectively mitigate pathologies arising in the training of PINNs due to stiffness in their gradient flow dynamics.
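The adaptive weighting step can be sketched as follows. The exact expressions are not reproduced in this section, so both formulas below are assumptions made for illustration: the estimate is taken as the ratio of the largest residual-loss gradient magnitude to the mean gradient magnitude of the weighted loss term, and the moving average blends the old weight and the new estimate with coefficient $\alpha = 0.1$. Gradients are represented as flat lists of floats, and the function names are hypothetical.

```python
def estimate_weight(grad_residual, grad_term):
    """Candidate weight: max |residual gradient| / mean |gradient of the weighted term|.

    This particular ratio is an assumed form, not quoted from the text.
    """
    max_residual = max(abs(g) for g in grad_residual)
    mean_term = sum(abs(g) for g in grad_term) / len(grad_term)
    if mean_term == 0.0:
        raise ValueError("mean gradient magnitude is zero; estimate is undefined")
    return max_residual / mean_term

def update_weight(current, estimate, alpha=0.1):
    """Moving average: keep (1 - alpha) of the old weight and add alpha of the estimate."""
    return (1.0 - alpha) * current + alpha * estimate
```

A loss term whose gradients are much smaller than those of the PDE residual receives a larger weight, which is the balancing effect the adaptive strategy is meant to provide.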

Figure 13 and Table 1 present comparisons of the identified parameters between the original PINN formulation of Raissi et al. [23], and the proposed PINN formulation with adaptive weights [58,59]. Notice that the PINN with adaptive weights not only converges to the exact parameters much faster, but also yields a significantly improved identification accuracy. In addition, we also investigate the accuracy of the reconstructed temperature *u*(*x*, *t*) and inferred interface *s*(*t*) with respect to these two methods. A comparison of relative *L*^{2} error in *u*(*x*, *t*) and *s*(*t*) between these two models is presented in Table 1 from which we can see that the dynamic weights approach improves the relative prediction error by about one order of magnitude. These figures and tables suggest that the weights in the loss function play an important role, and choosing appropriate weight coefficients can enhance the performance of PINNs by accelerating convergence and avoiding poor local minima.

## 5 Applications to Power Electronics

Dealing with the extreme heat fluxes in power electronics requires advanced cooling technologies. The target heat density levels can be >1 kW/cm^{2} and >1 kW/cm^{3}, with a typical chip temperature rise of 30 °C and a maximum temperature difference across the chip footprint of around 10 °C [60]. In the following, we present two recent works by Ansys and Nvidia on modeling heat transfer in power electronics using PINNs.

### 5.1 Heat Transfer in Electronic Chips.

Thermal management in semiconductors is extremely important for AI chip-package systems, 5G networks, and automotive applications. To this end, the Central ML Team at Ansys has investigated solving heat transfer problems on electronic chips. The heat transfer from a chip involves heat conduction in the solid phase, heat transfer from the chip to the surrounding fluid (air), and radiation. To test the effectiveness of using PINNs to solve heat equations, various canonical cases, such as 2D transient heat transfer with Dirichlet boundary conditions and 2D transient heat transfer with a source, as well as some practical engineering problems, such as 2D and three-dimensional (3D) heat transfer on a chip and chip tile temperature prediction, have been investigated. The results are shown in Fig. 14.

In Fig. 14(a), a 2D transient heat transfer problem with Dirichlet boundary conditions (temperatures on top, bottom, left, and right walls are 150, 300, 50, and 100 K, respectively) is solved and the comparison between numerical simulation results and PINN prediction is given. A 2D transient heat transfer case with heat source in the center is depicted in Fig. 14(b). Since the temperature profile along the line passing through the center of the square is of interest, temperature calculations from a numerical solver and PINN are compared along this line. It is worth noting that even though the residual points only cover a certain range in time axis (e.g., 0–20 s), the trained model can make predictions beyond the training range for extrapolation (e.g., 40 s) with reasonable accuracy (PINN prediction in orange curve, numerical results in blue curve).

As for practical engineering problems, PINNs have been used for solving 2D heat transfer on a chip, which is illustrated in Fig. 14(c). Discrete and drastically distinct power is applied on each tile of the chip and the final temperature distribution on the chip needs to be obtained. As shown in Fig. 14(c), PINNs not only display satisfactory prediction on the overall temperature distribution, but they also demonstrate adequate accuracy for the temperature profile on the line passing through the hottest spot on the chip (PINN prediction in orange curve, numerical results in blue curve).

Inspired by PINNs, an Auto Encoder and Image Gradient (AEIG)-based approach has also been proposed by the Central ML Team at Ansys for solving 2D and 3D chip thermal analysis [22]. Figure 14(d-1) presents some examples of the 2D power maps, their corresponding ground truth, the network prediction, and the prediction error. The proposed network shows acceptable agreement with numerical simulation, with a mean absolute percentage error (MAPE) of 0.4%. Figure 14(d-2) shows some prediction examples by AEIG on 3D power maps, for which the MAPE is 0.14%.
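The MAPE metric quoted above can be computed as shown in the sketch below, which uses the standard definition (mean of the absolute relative errors, expressed in percent). The text does not state which temperature scale or which set of points enters the reported values, so the function is only a generic illustration with a hypothetical name.

```python
def mape(predicted, reference):
    """Mean absolute percentage error in percent; reference values must be nonzero."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be nonzero")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)
```

Because the errors are normalized by the reference values, the result depends on the temperature scale used, which is worth keeping in mind when comparing reported MAPE values.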

Moreover, a chip tile-based temperature prediction problem is solved and shown in Fig. 14(e). A chip consists of arrays of tiles, and the quantity of interest is the temperature distribution when power is applied to only a single tile of the chip. Since parameters such as the *x* and *y* coordinates of the powered tile, the size of the tile, and the power magnitude can vary, a parameterized PINN is utilized, where these parameters together with the independent variables of the PDE are taken as inputs to the network. The trained model can then be used to make inferences at any point (including unseen combinations of the parameters) within the input space. Temperature predictions by the parameterized PINN for different power source locations, powered tile sizes, and power magnitudes are shown in Figs. 14(e-1), 14(e-2), and 14(e-3), respectively.

### 5.2 SimNet for Heat Sink Design.

To date, the published literature on PINNs has been able to demonstrate the forward solution of only simple problems when not using training data. The gradients, singularities, and discontinuities introduced by complex geometries or complex physics make the forward solution of real-world problems without training data extraordinarily challenging. Nvidia's SimNet^{3} is a toolkit for researchers and engineers with the dual goals of being an extensible research platform as well as a solver for real-world and industrial problems. Figure 15 shows results of a conjugate heat transfer problem for Nvidia's DGX-A100 NVSwitch heat sink, whose fin geometry is variable. In heat sink design, the objective is to minimize the peak temperature that can be reached at the source chip while satisfying a maximum pressure drop constraint. This is necessary to meet the operating temperature requirements of the chip on which the heat sink is mounted for cooling. In this example, SimNet trains on a parametrized geometry with multiple design variables in a single training run, thereby significantly accelerating design space exploration. In contrast, traditional solvers are limited to single-geometry simulations. A forward solution of a parameterized complex geometry with turbulent fluid flow between thinly spaced fins without training data makes this problem extremely challenging for the neural networks. Once the training is complete, several geometry, material, or physical parameter combinations can be evaluated using inference as a postprocessing step, without solving the forward problem again. Such throughput enables more efficient design space exploration for complex systems in science and engineering. The neural networks in this example are trained with ten variables (nine of them being geometry parameters), with min, max, and median values in the range of these parameters. 
The OpenFOAM and commercial solver runs are on 12 CPU cores (DUAL 3.4 GHz Intel Xeon Gold x6128), and the SimNet runs are on 8 V100 GPUs (DGX1). It can be seen that SimNet can solve the problem faster than traditional solvers by several orders of magnitude.
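The design objective stated above (minimize the peak temperature subject to a maximum pressure drop) can be illustrated as a postprocessing loop over candidate designs. This is a sketch under stated assumptions and does not use the SimNet API: `toy_surrogate` is a made-up algebraic stand-in for inference with a trained parameterized network, and its two design variables and coefficients have no physical meaning.

```python
def best_design(candidates, surrogate, max_pressure_drop):
    """Return (design, peak_temperature) for the coolest feasible design, or None.

    `surrogate(design)` must return (peak_temperature, pressure_drop).
    """
    best = None
    for design in candidates:
        peak_temperature, pressure_drop = surrogate(design)
        if pressure_drop > max_pressure_drop:
            continue  # violates the pressure-drop constraint
        if best is None or peak_temperature < best[1]:
            best = (design, peak_temperature)
    return best

def toy_surrogate(design):
    """Hypothetical stand-in for trained-network inference (not a physical model)."""
    fin_height, fin_spacing = design
    peak_temperature = 80.0 - 20.0 * fin_height + 5.0 * fin_spacing
    pressure_drop = 1.0 + 4.0 * fin_height / fin_spacing
    return peak_temperature, pressure_drop
```

Because each surrogate evaluation is a network inference rather than a new simulation, a loop of this kind can sweep many parameter combinations after a single training run.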

## 6 Summary and Outlook

Employing neural networks in heat-transfer applications could accelerate progress in heat-transfer enhancement, thermal design, and complex multiphase systems. However, the usual data-driven approach of neural networks is not appropriate due to the lack of big data and the various limitations on measuring velocity and temperature fields that may change rapidly in space and time. In some cases, it is possible to measure temperature on part of a surface by employing the temperature-sensitive paint (TSP) method [61], while for higher temperatures phosphors can be used. Infrared cameras can also be used for measurements on a patch of a heated surface. Their resolution varies from 320 × 320 to 1280 × 1024, they have a good dynamic range at 14 bits, and their uncertainties are $\pm 2\,^{\circ}\mathrm{C}$ for $T < 100\,^{\circ}\mathrm{C}$ and ±2% of the reading for $T > 100\,^{\circ}\mathrm{C}$. They can measure up to 600, 2000, and 3000 °C, depending on which of the detectors is used. Within the flow, one can use the DPITV method [36], but further enhancements are required to improve its accuracy. In these methods, the response times are slow, of the order of milliseconds, so for critical dynamic applications it will be necessary to improve the response times down to tens of microseconds. The surface film TSP approach has a resolution that is equal to the size of the imaged pixel. For the bulk flow methods, the resolution would be of the order of the mean particle distance, which can be improved with standard interpolation methods but also with neural networks and especially PINNs that can fill gaps and provide high accuracy [20].
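As a small worked example, the infrared-camera uncertainty quoted above can be expressed as a function of the reading. The sketch assumes readings in degrees Celsius and treats exactly 100 °C with the percentage rule, which gives the same value (2 °C) as the fixed rule; the function name is hypothetical.

```python
def ir_camera_uncertainty(temperature_c):
    """Return the measurement uncertainty in degrees Celsius for a reading in degrees Celsius."""
    if temperature_c < 100.0:
        return 2.0                   # fixed +/- 2 degC below 100 degC
    return 0.02 * temperature_c      # +/- 2% of the reading otherwise
```

For instance, a 500 °C reading carries an uncertainty of about 10 °C, which gives a sense of the noise level that sparse surface measurements fed to a PINN may contain.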

PINNs can play a catalytic role in bridging the gap between experimental and computational heat transfer. By using sparse measurements from the aforementioned multifidelity methods and directly encoding the conservation laws into the neural network architecture, the velocity and temperature fields, as well as unknown thermal boundary conditions or interfaces, can be accurately inferred. PINNs offer a hybrid model, where we can use any data that are available at any locations and asynchronously in time while enforcing the governing equations using automatic differentiation at random points in the space–time domain, without the need for elaborate and expensive mesh generation. Hence, PINNs are based on compact computer codes of hundreds of lines, compared to thousands of lines of code for traditional numerical solvers.

Moreover, PINNs can solve ill-posed inverse problems not possible with standard approaches, such as the omnipresent problem of unknown thermal boundary conditions. For decades, both experimental and computational research in heat transfer relied on idealized thermal boundary conditions of constant temperature or constant heat flux, neither of which is valid in practice. The pioneering experimental work of Robert Moffat [62], see, e.g., the Ph.D. thesis in Ref. [63], first addressed this issue by relaxing these assumptions, and we believe that PINNs are an effective approach to finally resolve this limitation. We demonstrated this possibility in this article by considering forced and mixed heat transfer with unknown thermal boundary conditions, having only a small number of temperature measurements on the heated cylinder or in the wake. Upon training the neural network, we can discover the entire boundary condition while simultaneously inferring the velocity and temperature fields in the domain. The presented results indicate good performance of the method; e.g., for the case of a constant-temperature surface, the method can correctly infer the temperature and Nusselt number profiles on the boundary with less than 5% error using at most six measurements. As the selection of the sensor location is very important for this inverse problem, we also proposed a method for active sensor placement. The residual of the heat transfer equation is considered as the criterion for selecting the sensor location. This metric is computed by the network, thus no prior knowledge is required. We assume that adding sensors at the location with the highest value of the residual can efficiently improve the performance of thermal boundary inference. Although the residual is related to the network structure and training process, we demonstrate the correlation between the residual and the posterior error of the Nusselt number. As our test results show, this strategy can adaptively select the locations to place sensors while iteratively increasing the prediction accuracy. Hence, PINNs can effectively guide the experimental work as well as bridge the gap between simulations and experiments, as stated earlier, by fusing any multifidelity measurements directly with the governing equations, which are encoded in the deep neural networks.
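The residual-based selection rule summarized above can be sketched as a single selection step. The candidate locations, the way the residual is evaluated, and the function name are all illustrative assumptions; in practice the residual would come from the trained network, and the network would be retrained after each new sensor is added.

```python
def next_sensor_location(candidates, residual, existing=()):
    """Return the free candidate location with the largest absolute residual.

    `residual(location)` is any callable returning the heat-equation residual
    at that location; `existing` lists locations that already hold a sensor.
    """
    taken = set(existing)
    free = [c for c in candidates if c not in taken]
    if not free:
        raise ValueError("no free candidate locations remain")
    return max(free, key=lambda c: abs(residual(c)))
```

Repeating this step (select, measure, retrain, recompute the residual) gives an iterative placement loop of the kind described in the paragraph above.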

Here, we presented the basic version of PINNs but there have been many developments already, e.g., in employing domain decomposition in CPINN [64] and XPINN [65] that provide great flexibility with complex geometries but also in assigning a different neural network on each subdomain as well as parallel execution to accelerate inference. Another extension is the use of variational formulation in hp-VPINN [66]. Moreover, to quantify uncertainties one can use spectral expansions as in Ref. [67], generative models [68,69], or a Bayesian formulation as in Ref. [70]. Future work should address the possibility of multimodality/multifidelity measurements as inputs to PINNs, more efficient methods in quantifying uncertainties due to data as well as model uncertainties, tackling multidimensional multiphase problems, and producing proper benchmarks and data sets that can be used to further accelerate development of PINNs, especially in the industrial setting for thermal design, similar to the two applications we highlighted in this article.

## Acknowledgment

The authors thank Professor Dana Dabiri of Washington University for providing the information on current measurement techniques. The authors thank the Nvidia team (Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Max Rietmann, Jose del Aguila Ferrandis, Wonmin Byeon, Zhiwei Fang, Sanjay Choudhry) and Mr. Jay Pathak and Dr. Haiyang He of Ansys for providing their results for Sec. 5 of the article.

## Funding Data

PhILMs (DOE DE-SC0019453).

MURI/OSD (FA9550-20-1-0358).

DOE grant (DE-SC0019116).

AFOSR grant (FA9550-20-1-0060).

DOE-ARPA grant (DE-AR0001201).

## Nomenclature

*b* = neuron bias

*N* = number of training points

*w* = neuron weight

*x*_{i} = neuron input

*y*_{j} = neuron output

*α* = neuron amplitude for adaptive activation function

*β* = all parameters in the first neural network (Sec. 4)

*θ* = all parameters in the second neural network (Sec. 4)

*λ* = weighting coefficient in loss function

*σ* = activation function

$L$ = loss function

## Footnotes

## References

^{TM}: an AI-Accelerated Multi-Physics Simulation Framework