

A deep Convolutional Neural Network for topology optimization with strong generalization ability

Yiquan Zhang^a · Bo Peng^a · Xiaoyi Zhou^a · Cheng Xiang^a · Dalei Wang^a,*

Received: date / Accepted: date

Abstract This paper proposes a deep Convolutional Neural Network (CNN) with strong generalization ability for structural topology optimization. The architecture of the neural network is made up of encoding and decoding parts, which provide down- and up-sampling operations. In addition, a popular technique, namely U-Net, was adopted to improve the performance of the proposed neural network. The input of the neural network is a well-designed tensor where each channel includes different information for the problem, and the output is the layout of the optimal structure. To train the neural network, a large dataset is generated by a conventional topology optimization approach, i.e. SIMP. The performance of the proposed method was evaluated by comparing its efficiency and accuracy with SIMP on a series of typical optimization problems. Results show that a significant reduction in computational cost was achieved with little sacrifice in the performance of the design solutions. Furthermore, the proposed method can intelligently give a less accurate solution to a problem with a boundary condition not included in the training dataset.

Keywords Topology optimization · Deep learning · Machine learning · Convolutional neural network · Generalization ability

Dalei Wang (* Corresponding Author)
E-mail: [email protected]
^a Department of Bridge Engineering, Tongji University, 1239 Siping Road, 200092, Shanghai, China.

1 Introduction

Since the seminal paper of Bendsoe and Kikuchi (1988), studies on structural topology optimization problems, which can be traced back to Michell (1904), have made tremendous progress. A variety of numerical methods have sprung up since, including SIMP (Bendsoe, 1989; Zhou and Rozvany, 1991; Rozvany et al., 1992), evolutionary approaches (Xie and Steven, 1993), moving morphable components (Guo et al., 2014; Zhang et al., 2018), the level-set method (Wang et al., 2003; Allaire et al., 2004; Du et al., 2018), and others. However, the computational cost is still one of the main obstacles preventing the adoption of topology optimization methods in practice, in particular for large structures (Sigmund and Maute, 2013).

With the recent boost of machine learning algorithms and advances in graphics processing units (GPU), machine learning (ML), and especially deep learning, has produced many success stories in various fields, including autonomous driving, image recognition, natural language processing, and even art. It may shed light on accelerating the adoption of topology optimization in more design practices. Recently, a few attempts have been made to apply ML to topology optimization (Lei et al., 2018; Sosnovik and Oseledets, 2017; Banga et al., 2018; Yu et al., 2018), microstructural materials design (Yang et al., 2018) and additive manufacturing (Nagarajan et al., 2018). Theoretically, the optimal layout of the material is a complicated function of the initial conditions based on the optimization objective and constraints. The neural network is good at fitting a complicated function, and this characteristic makes it possible for the neural network to fit a target function which can directly give us a good structure without any iteration and effectively reduce computational time.

Sosnovik and Oseledets (2017) first introduced the deep learning model to topology optimization and improved the efficiency of the optimization process by formulating the problem as an image segmentation task.


Their deep neural network model could map from the intermediate result of the SIMP method to the final structure of the design, which effectively decreased the total time consumption. However, their work did not consider the initial conditions for topology optimization, and the accuracy of the result heavily relies on the first few iterations. Banga et al. (2018) proposed a deep learning approach based on a 3D encoder-decoder Convolutional Neural Network for accelerating 3D topology optimization and determining the optimal computational strategy for its deployment. Similar to Sosnovik and Oseledets (2017), this method also takes the intermediate result of the conventional method as the input of the neural network. It can cut down some of the time needed because of the reduction of iterations, but it cannot completely replace the traditional method.

Lei et al. (2018) developed an ML-driven real-time topology optimization paradigm under the Moving Morphable Component (MMC)-based solution framework. Their approach can reduce the dimension of the parameter space and enhance the efficiency of the ML process substantially. The ML models used in the approach were support vector regression and K-nearest-neighbors. Rawat and Shen (2018) proposed a new topology design procedure to generate good-performance structures using an integrated Generative Adversarial Network (GAN) and CNN architecture. But only the volume fraction, penalty and radius of the filter are changeable in this method; all other initial conditions, including force and boundary conditions, must be fixed. Guo et al. (2018) proposed an indirect low-dimension design representation to enhance topology optimization capabilities, which can simultaneously improve computational efficiency and solution quality. Yu et al. (2018) also used the GAN and CNN to propose a deep learning-based method, which can predict an optimized structure for a given boundary condition and optimization setting without any iteration. However, the neural network model trained by a large dataset in their research could work under only one boundary condition. Therefore, it is impractical in the real world because the time needed for preparing the dataset and training the model is much longer than the traditional SIMP method. To reduce the time spent preparing data for training, Cang et al. (2019) proposed a theory-driven learning mechanism which uses domain-specific theories to guide the learning, thus distinguishing itself from purely data-driven supervised learning. Oh et al. (2019) proposed an artificial intelligence (AI)-based deep generative design framework that is capable of generating numerous design options which are not only aesthetic but also optimized for engineering performance.

In this study, a deep CNN model with strong generalization ability is proposed to solve topology optimization problems. The input of our network is a multi-channel array with each channel representing different initial conditions, and the output is a segmentation mask in which each element represents the probability of reservation. The evaluation results show that the proposed method can predict a near-optimal structure in negligible time. The main novelty of this work is the strong generalization ability of the proposed deep CNN. It can give a less accurate solution to topology optimization problems with different boundary conditions even though the CNN was trained on one boundary condition. The proposed method will be useful in the preliminary design stage, as the well-trained neural network is able to deliver a rough result which provides the designers with a general idea within a very short time.

2 Overview of neural networks

2.1 Artificial neural networks and convolutional neural networks

Artificial neural networks (ANN), inspired by animal nervous systems, are one of the most prevalent and successful algorithms in machine learning. The basic component of the neural network is a neuron, which is a mathematical approximation of real neurons. In neural systems, a neuron can accept signals from other neurons, and if the aggregated signals exceed the threshold, the neuron is fired and then sends a signal to the related ones. In ANNs, a neuron is an abstract computation unit, receiving inputs from the former neurons and returning an output to other neurons, which can be mathematically expressed as:

y = f(z) = f(w^T x + b)    (1)

where x is the n-dimensional input vector, w is the n-dimensional weight factor vector, and b is the bias. The reason why ANNs have powerful fitting capabilities lies in f(z), which is known as an activation function and is usually a non-linear function. There are several commonly used activation functions, including the sigmoid function, the hyperbolic tangent activation function tanh(x) and the ReLU function (Nair and Hinton, 2010). They map a linear input w^T x + b to a non-linear form to strengthen the expression capabilities of a network.
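To make Eq. (1) concrete, a minimal Python sketch of a single neuron with a ReLU activation is given below; the input, weight and bias values are arbitrary illustrations, not parameters of the network proposed in this paper.

    import numpy as np

    def relu(z):
        # ReLU activation: max(0, z), applied element-wise
        return np.maximum(0.0, z)

    x = np.array([0.5, -1.2, 3.0])   # n-dimensional input vector
    w = np.array([0.1, 0.4, -0.2])   # weight factor vector
    b = 0.05                         # bias
    y = relu(w @ x + b)              # Eq. (1): y = f(w^T x + b)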

The prevalence of neural networks in recent years mainly stems from the wide application of convolutional neural networks. Early in the nineties of the last century, Lecun et al. (1998) introduced the convolution operation to ANNs for handwritten numeral recognition, and with the growth of data and computation capabilities, CNNs are widely applied in computer vision, natural language processing and other relevant fields. In CNNs the inner product part w^T x + b is replaced by the convolution operation, and for 2-D images convolution is defined as:

w(x, y) * f(x, y) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) f(x + s, y + t)    (2)

where * denotes the convolution operation; f(x, y) denotes the input image; w(x, y) denotes the convolution kernel.
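The double summation in Eq. (2) translates directly into code. The following naive Python sketch correlates a single-channel image with a kernel; padding and kernel flipping are omitted for brevity, as is common in CNN implementations.

    import numpy as np

    def conv2d(image, kernel):
        # Slide the kernel over the image and sum the element-wise
        # products, i.e. the double summation in Eq. (2), without padding.
        kh, kw = kernel.shape
        oh = image.shape[0] - kh + 1
        ow = image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out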

A modern CNN architecture is a stack of layers including convolution layers for the convolution operation, pooling layers for dimension reduction and fully connected layers (or dense layers) which work like common ANNs. Some advanced CNNs have more complicated network architectures for different tasks. For example, GoogLeNet (Szegedy et al., 2014; Ioffe and Szegedy, 2015; Szegedy et al., 2016) introduced the "inception module", which combines convolutions with various kernels for feature fusion. ResNet (He et al., 2016) proposed the skip-connection for training very deep CNNs.

2.2 Semantic segmentation

Semantic segmentation in computer vision is the process of classifying each pixel of a given image into a specific category. Take the self-driving car as an example: each pixel in the image from the front cameras is classified as road, pedestrian or other vehicles by the neural network so that the self-driving car can understand the scene in front of it. The goal of topology optimization is to determine whether each element of a structure should be retained or discarded by the optimization algorithm, which can be taken as classifying pixels into two categories. In this sense, topology optimization is equivalent to semantic segmentation.

Semantic segmentation has realized great achievements with the help of CNNs in recent years. Traditional semantic segmentation algorithms rely heavily on hand-designed features of the input image, such as color or texture distribution. But in CNN-based methods, these features can be learned automatically through the convolution operation, so that the segmentation process is greatly simplified. In this sense, the convolution operation can be deemed a feature extractor which can discover proper features and reduce man-made recognition bias for segmentation tasks. Shelhamer et al. (2014) first introduced CNNs to the field of semantic segmentation and named their segmentation network fully convolutional networks (FCN), since they discarded all the dense layers in their network. Badrinarayanan et al. (2017) trained a neural network based on the encoder-decoder architecture and proposed an "unpooling" operation for upsampling low-resolution images. To improve the performance of the encoder-decoder architecture, Ronneberger et al. (2015) proposed a U-shape architecture to implement feature map fusion between shallow and deep layers and named the proposed architecture U-Net. The key of U-Net lies in the feature fusion. As the network becomes deeper, the feature map encodes more semantic information but loses spatial information since the resolution is reduced. So it is hard for traditional CNNs to strike a balance between semantic and spatial information and achieve a good performance in the segmentation task. In the U-Net architecture, the shallow layers with spatial information are combined with deeper layers which encode more semantic information, so that the network can utilize the information in all layers and the performance of the network is improved.

3 The typical topology optimization problem

In this study, the proposed method is described alongside the classical compliance minimization problem solved by SIMP, while other optimization problems also fit the framework in principle. The compliance minimization problem is formulated as follows:

min_x : c(x) = U^T K U = Σ_{e=1}^{N} E_e(x_e) u_e^T k_0 u_e    (3)

subject to: V(x)/V_0 = f,    (4)

K U = F,    (5)

0 ≤ x_min ≤ x ≤ 1    (6)

where c is the compliance, K is the global stiffness matrix, U and F are the displacement and force vectors, respectively, u_e is the element displacement vector, k_0 is the element stiffness matrix for an element with fully distributed solid material, x is the vector of design variables (i.e. the element relative densities), x_min is the lower bound, which aims to avoid singularity, N is the number of elements used to discretize the design domain, V(x) and V_0 are the material volume and design domain volume, respectively, and f is the prescribed volume fraction.
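For reference, once the displacement vector U has been solved from Eq. (5), the SIMP-penalized compliance of Eq. (3) can be evaluated in a few lines. The sketch below assumes a vectorized Q4 discretization in the spirit of the 88-line code; the names edofMat (element-to-dof connectivity) and KE (the 8 × 8 solid-element stiffness matrix k_0) are assumptions for illustration, not the paper's code.

    import numpy as np

    def compliance(x, U, edofMat, KE, penal=3.0):
        # Per-element strain energy u_e^T k0 u_e, vectorized over elements
        Ue = U[edofMat]                      # (N, 8) element displacements
        ce = np.sum((Ue @ KE) * Ue, axis=1)  # (N,) values of u_e^T k0 u_e
        # Eq. (3): c(x) = sum_e E_e(x_e) u_e^T k0 u_e, E_e(x_e) = x_e^penal
        return np.sum(x.ravel() ** penal * ce)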

4 The proposed deep CNN

4.1 The input and output of the neural network

For preparing the input of the neural network, a dataset of 80000 samples was generated with the 88 lines of code (Andreassen et al., 2011), which uses the SIMP method. It is worth mentioning that it is actually not necessary to use the SIMP method; other structural topology optimization methods can serve the same purpose. The volume fraction, number of forces and direction of each force are three uniformly distributed variables in the initial conditions. The details of the samples are listed as follows:

1) Resolution: 40 × 80
2) Cantilever beam boundary condition (Fig.1)
3) Volume fraction: 0.2 - 0.8 (Uniform distribution)
4) Penalty factor: 3
5) Filtering radius: 1.5
6) Number of forces: 1 - 10 (Uniform distribution)
7) Direction of each force: X+, X−, Y+, Y− (Uniform distribution)

The input of the neural network is a 41 × 81 × 6 tensor which includes the initial displacement field, strain field and volume fraction. Therefore, nodal displacements and strains must be calculated by the finite element method under the assumption that all elements' relative densities are equal to the volume fraction. A similar idea was proposed in Gao et al. (2017)'s paper, where the authors used the initial stress field to predict a good ground structure. Fig.1 shows an example of the input with the volume fraction equal to 0.3. The first two channels of the input tensor represent the nodal displacements in the X and Y directions, respectively. The next three channels are made up of the nodal normal strains εx, εy and the shearing strain γxy. All the numbers of the last channel are identical to the volume fraction.

Fig. 1 The input and output of the neural network
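The assembly of this 6-channel input can be sketched as follows, assuming nodal result arrays of shape (41, 81) from the initial finite element analysis described above; the function and argument names are illustrative, not taken from the paper's code.

    import numpy as np

    def build_input_tensor(ux, uy, eps_x, eps_y, gamma_xy, volfrac):
        # Channels 1-2: nodal displacements in X and Y
        # Channels 3-5: nodal strains eps_x, eps_y and shear strain gamma_xy
        # Channel 6:   a constant channel filled with the volume fraction
        vf = np.full_like(ux, volfrac)
        return np.stack([ux, uy, eps_x, eps_y, gamma_xy, vf], axis=-1)  # (41, 81, 6)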


The output of the MATLAB code (Andreassen et al., 2011) gives the layout of the optimized structure under the given conditions in the form of relative density, which is a 40 × 80 matrix with elements ranging from 0 to 1. It can be taken as a probability matrix representing the probability that a pixel should be retained. Hence, the optimal design can be directly treated as the goal output of the neural network.

To efficiently train the neural network and accurately evaluate its performance, all the samples are divided into the training set, validation set and test set with a ratio of 8:1:1.

4.2 The architecture of the neural network

Our network can be divided into two parts:

1) Encoding part: down-sampling the given array and returning a dimension-reduced array;
2) Decoding part: up-sampling the given array and returning a dimension-ascended array.

Fig. 2 shows the architecture of the neural network. Considering the fact that the input of our network is the initial conditions and the output is the element-wise solution, the input tensor needs to be modified before being sent to the encoding blocks. The input tensor with 6 channels is first sent to a convolution layer with no padding to reduce the shape. The next 3 blocks are encoding blocks, with each block consisting of 2 convolution layers, 2 batch normalization layers, and 1 pooling layer, so an input tensor is convoluted, normalized and pooled in each encoding block. The final output of the encoding part is a multi-channel tensor reduced 8 times in shape compared to the original.

Before being sent to the decoding blocks, the outputs of the encoding blocks are sent to 2 additional convolution layers to generate feature maps. Next, the feature maps are sent to the decoding blocks, each of which consists of a concatenate layer, a transpose-convolution layer, a batch-norm layer, and a convolution layer. There are two kinds of inputs for each decoding block: one is from the former convolution layer and the other is from the corresponding encoding block. The inputs are first concatenated in the channel dimension. Then, in the transpose-convolution layer, the dimension-reduced array is up-sampled to restore the shape. A convolution layer follows to generate the shape-restored feature map. So after 3 decoding blocks, we get an array with the same shape as the input of the encoding part.

In the last layers of our network, we use a stack of convolution layers with batch normalization layers to obtain the element-wise solution. The outputs of the decoding part are first concatenated with the input of the encoding part in the channel dimension. Then the channels are reduced in the following convolution layers, and a feature map with only one channel is generated in the final layer. To obtain the reservation probability for each element, the activation of this final layer is set as the sigmoid function.

The model was trained on a computer with an Intel(R) Core(TM) i7-8700k CPU and an NVIDIA GeForce GTX 1080 Ti GPU using the Keras framework. The l2 regularizer weight is 1e-5 and the initial learning rate is 0.001. When the validation loss does not decrease for 10 epochs, the learning rate is decayed by 0.1 times. The details about the loss function, the regularization method and the optimizer are given in 4.2.1 and 4.2.2.
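A simplified Keras sketch of the architecture just described is given below: an initial 2 × 2 valid convolution (41 × 81 → 40 × 80), three encoding blocks, two bottleneck convolutions, three decoding blocks with U-Net-style skip connections, and a final sigmoid layer. Filter counts and kernel sizes are illustrative assumptions and may differ from the exact configuration of Fig. 2.

    from tensorflow.keras import layers, models

    def encoding_block(x, filters):
        # 2 convolution layers, 2 batch-norm layers and 1 max-pooling layer
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        return layers.MaxPooling2D(2)(x)

    def decoding_block(x, skip, filters):
        # concatenate -> transpose convolution -> batch norm -> convolution
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    inputs = layers.Input(shape=(41, 81, 6))
    first = layers.Conv2D(32, 2, padding="valid", activation="relu")(inputs)  # -> 40x80

    e1 = encoding_block(first, 64)    # -> 20x40
    e2 = encoding_block(e1, 128)      # -> 10x20
    e3 = encoding_block(e2, 256)      # -> 5x10

    x = layers.Conv2D(256, 3, padding="same", activation="relu")(e3)
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)

    x = decoding_block(x, e3, 128)    # -> 10x20
    x = decoding_block(x, e2, 64)     # -> 20x40
    x = decoding_block(x, e1, 32)     # -> 40x80

    x = layers.Concatenate()([x, first])
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # reservation probability

    model = models.Model(inputs, outputs)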

4.2.1 Loss function and regularization

In the view of optimization, the training process of a neural network is equivalent to an optimization problem where we need to minimize or maximize an objective function of the input tensor and other parameters. In this paper, the objective of topology optimization is deemed as searching for a probability distribution, since each element's relative density is normalized to (0, 1), denoting the probability of existence. So given the topology optimization target, we need to minimize the "distance" between the known distribution (the optimization target) and the output distribution, and the Kullback-Leibler divergence is proposed:

D_KL(p||q) = Σ_x p(x) log(p(x)/q(x))    (7)

where p(x) is the optimization target (or empirical distribution); q(x) is the output distribution of the neural network. The Kullback-Leibler divergence is widely used in machine learning, and it has many equivalent variants such as cross entropy, ordinary least squares, etc.

Besides, regularization is also used in designing the loss function. Regularization is a penalty added on the parameters to prevent overfitting, and L2 regularization (also known as the L2 norm) is widely used:

Ω(θ) = (1/2) Σ_i θ_i²    (8)

where θ_i denotes the network parameters such as convolution kernels.

Therefore, the total loss function is the sum of the Kullback-Leibler divergence and the L2 regularization:

L = D_KL(p||q) + λΩ(θ)    (9)

where λ is the regularization weight, which denotes the importance of regularization compared to the Kullback-Leibler divergence.
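Assuming the model sketched in 4.2, the loss of Eq. (9) could be implemented in Keras as below: a pixel-wise Kullback-Leibler term for Eq. (7), with the L2 term of Eq. (8) supplied through kernel regularizers of weight λ = 1e-5 on the convolution layers, and the learning-rate schedule described above. The clipping constant is a numerical safeguard, not part of the formulation.

    import tensorflow as tf
    from tensorflow.keras import regularizers, optimizers, callbacks

    def kl_loss(y_true, y_pred):
        # Eq. (7): D_KL(p||q) = sum_x p(x) log(p(x)/q(x)), summed over pixels
        eps = 1e-7
        p = tf.clip_by_value(y_true, eps, 1.0)
        q = tf.clip_by_value(y_pred, eps, 1.0)
        return tf.reduce_sum(p * tf.math.log(p / q), axis=[1, 2, 3])

    # Eq. (8) enters through the layers, e.g.:
    # layers.Conv2D(64, 3, kernel_regularizer=regularizers.l2(1e-5), ...)

    model.compile(optimizer=optimizers.Adam(learning_rate=1e-3), loss=kl_loss)

    # decay the learning rate by 0.1 when the validation loss stalls for 10 epochs
    reduce_lr = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=10)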

Fig. 2 The architecture of the neural network

4.2.2 Optimizer

In neural networks, gradient-based methods are widely used to optimize the loss function, and the common form is:

θ_t = θ_{t−1} − α ∂L/∂θ    (10)

where θ_t is the parameter θ at step t; α is the learning rate and L is the loss function.

One of the drawbacks of vanilla gradient descent methods is that the choice of the learning rate α may have a great effect on the training process. In most cases, a smaller learning rate may delay the training process, while a larger learning rate could even lead to the failure of training. Therefore, advanced optimization algorithms have been proposed to accelerate the training process of modern neural networks. In this paper, the Adam algorithm (Kingma and Ba, 2014), which adaptively modifies the actual learning rate with the first and second order moment estimation of the gradient for each parameter, is used, and the updating rule is:

g_t = ∂L/∂θ    (11)

m_t = β₁ · m_{t−1} + (1 − β₁) · g_t    (12)

v_t = β₂ · v_{t−1} + (1 − β₂) · g_t²    (13)

m̂_t = m_t / (1 − β₁^t)    (14)

v̂_t = v_t / (1 − β₂^t)    (15)

θ_t = θ_{t−1} − α · m̂_t / (√v̂_t + ε)    (16)

where m_t is the biased first moment estimate of the gradients; β₁ is the exponential decay rate of m_t; v_t is the biased second moment estimate of the gradients; β₂ is the exponential decay rate of v_t; m̂_t is the bias-corrected first moment estimate; v̂_t is the bias-corrected second moment estimate.
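Eqs. (11)-(16) translate directly into code. The NumPy sketch below performs one Adam step for a parameter array theta, given its gradient g and the running moment estimates; the default decay rates of Kingma and Ba (2014) are assumed.

    import numpy as np

    def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1.0 - beta1) * g         # Eq. (12): biased first moment
        v = beta2 * v + (1.0 - beta2) * g ** 2    # Eq. (13): biased second moment
        m_hat = m / (1.0 - beta1 ** t)            # Eq. (14): bias correction
        v_hat = v / (1.0 - beta2 ** t)            # Eq. (15): bias correction
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Eq. (16)
        return theta, m, v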


Fig.3 shows the history of the loss on the training and validation datasets during the training process. The validation loss converges to about 0.15 after 50 epochs while the training loss still goes down, and overfitting appears. Since the validation loss is the only indicator we focus on, more epochs of training are unnecessary. The gap between the validation loss and the training loss could be reduced by adjusting the hyperparameter λ in equation (9), but this is pointless because it will increase the validation loss.

Fig. 3 The loss on training and validation set

5 Performance evaluation of the method

The performance of the deep neural network model was evaluated by the 8000 samples in the test set. The structure designs of these samples were calculated by both the conventional and the proposed method. All the calculations were performed on a computer with an Intel(R) Core(TM) i7-8700k CPU and an NVIDIA GeForce GTX 1080 Ti GPU. The average computation time with the conventional method is 3.913 s, while the proposed method takes 0.001 s on average. It should be noted that the considerable time for generating the dataset and training the neural network is not included, because once the neural network has been well trained, it can be used to solve innumerable topology optimization problems, and these times become negligible as the number of solved problems increases. In this sense, the computational efficiency is greatly improved.

In most cases, there are slight differences between the final structures calculated by the conventional method and the proposed method, as shown in Fig.4(a). The details of the result comparison are listed in Table 1. Nevertheless, 4.12% of the samples in the test set have huge compliance errors due to the emergence of structural disconnection (Fig.4(b)). A similar observation was also reported in Yu et al. (2018)'s work. It is a shortcoming for this kind of method, since disconnections in the material are completely unacceptable for structures. In the proposed method, high pixel similarity is the only training objective of the deep neural network, which means the appearance of structural disconnection is not penalized as long as the result has high pixel accuracy. Therefore, the structure compliance should be included in the neural network's training objective in future work. Besides the unacceptable results, about 8% of the results are a pleasant surprise. The structure predicted by the neural network has lower compliance with a smaller volume fraction than the structure calculated by the conventional method, as shown in Fig.4(c). They look very similar to each other, but the boundary of the predicted structure is clearer. In other words, the pixel values in the predicted structure are closer to 0 or 1. Although the surprising results are only a small fraction of the test set, their appearance gives hope that the proposed method may perform better in both computational efficiency and optimality than the conventional method after continuous improvement in the future.

Table 1 The result of comparison

                   Compliance   Pixel values   Volume fraction   Calculation time
                   error        error          error             saved
Average value¹     4.16%        4.53%          0.13%             99.97%

¹ Samples with structural disconnection are not included.


Fig. 4 Comparison of the conventional and proposed methods: (a) examples of the normal case; (b) examples of the emergence of structural disconnection; (c) examples of results better than expectation


Fig. 5 Examples of the results on other boundary conditions: (a) simply supported beam boundary condition; (b) continuous beam boundary condition


Table 2 The performance of the proposed neural network on other boundary conditions

Boundary condition      Compliance   Pixel values   Volume fraction   Percentage of
                        error¹       error¹         error¹            structural disconnection
Simply supported beam   7.63%        8.05%          1.22%             13.64%
Continuous beam         7.14%        7.47%          0.93%             11.50%

¹ Samples with structural disconnection are not included.

6 The generalization ability of the neural network

A major disadvantage of previous studies on solving topology optimization problems by deep learning without iteration is that the trained neural network model is limited to working on one specific boundary condition. Considering the plenty of time needed in the training process, it is impractical to train a new neural network once the boundary conditions change. The proposed method may solve this problem to some extent because the trained neural network in the proposed method has a strong generalization ability. It can work on several different boundary conditions even though all samples in the training dataset are generated from the optimization problem for a cantilever beam boundary. The generalization ability of the trained neural network is demonstrated for two typical engineering boundary conditions: (1) a simply supported beam and (2) a two-span continuous beam. The performance is evaluated through 8000 samples for each case. Fig.5 shows some examples of the comparison between the conventional and the proposed method. The details of the results are listed in Table 2. Compared with the performance on the cantilever beam boundary condition, the performance on these two boundary conditions is not ideal. The compliance error and the pixel values error are almost doubled, and the percentage of structural disconnection is approximately tripled. Admittedly, changing the boundary condition has some adverse effects on the performance of the neural network. The result still demonstrates the generalization of the neural network, considering that it has not 'met' these two kinds of boundary conditions before.

The key factor which gives the neural network its strong generalization ability may be the format of the neural network's input. In the proposed method, the information about the boundary condition is not a direct input for the neural network but is hidden in the initial nodal displacements and strains, which are the first five channels of the input tensor. A change of the boundary condition is actually a change of the reaction forces when preparing the input tensor, which is similar to a change of the force condition. Consequently, the neural network can give a less accurate solution to the problem with a different boundary condition, since each sample has a different force condition in the training set. Nevertheless, the change of boundary condition cannot be completely captured by the reaction forces, because rigid displacements would appear when calculating the nodal displacements under the condition that only the external and reaction forces are given. Therefore, the initial nodal displacements must be included in the input tensor. Despite the ingenuity of the input tensor, it is still a challenge for the neural network to predict the optimal structure for a completely new boundary condition. For all the samples in the training set, the displacements of the leftmost nodes are zero, while the displacements of the leftmost nodes in the test set are not all zero. It may be a little 'confusing' for the neural network when making the prediction on the test samples, and this may explain why the performance on the two other boundary conditions is much worse than on the cantilever beam boundary condition. However, the proposed framework is still plausible given the provided generalization ability. In addition, it clearly shows the importance of intelligently designing the input to achieve intelligence for the neural network.

7 Other discussions about the proposed method

7.1 The influence of the sample number on the neural network's performance

It is very expensive to generate the dataset to train the neural network. Too few data may limit the ability and accuracy, while too many data would raise the argument on the benefit. To figure out the relationship between the sample number and the neural network's performance and find a trade-off, neural networks with the same structure were trained with different numbers of samples. The numbers of samples include 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000, 70000, 75000 and 80000, and the samples are divided into training set, validation set and test set with a ratio of 8:1:1 in all experiments. The pixel accuracy on the validation set is shown in Fig.6. The accuracy at each point was the average result of three identical experiments. When the number of training samples is less than 10000, the pixel accuracy increases obviously with it. From 10000 to 35000, the pixel accuracy increases slightly with the number of training samples. After 35000, the pixel accuracy becomes stable (at about 96%). The decline of accuracy with the sample number varying from 7000 to 9000 is a little abnormal. It shows that the neural network's training is unstable if the number of samples is not large enough.

Fig. 6 The convergence of pixel accuracy on samples number

7.2 The influence of input’s channels on the neural network’s performance 7.3 The influence of different architectures on the In addition to the final input tensor demonstrated in neural network’s performance the section 4.1, two other kinds of input tensors were tried during the experiment. One is a 3-channel tensor Before the final architecture was determined, lots of only including the volume fraction and nodal displace- the architectures had been tested, including GoogLeNet ments in X and Y direction (the first, second and last (Szegedy et al., 2014; Ioffe and Szegedy, 2015; Szegedy channels in Fig.1). The other is a 10-channel tensor et al., 2016; Szegedy et al., 2016), ResNet (He et al., that includes the all channels in Fig.1 and the other 4 2016) and UNet (Ronneberger et al., 2015). Table 4 channels proposed in Yu et al. (2018)’s work. The shows the performance of different architectures. Each 80000 samples introduced in section 4.1 and 10000 test architecture has tried different combinations of hyper- samples introduced in section 6 were transformed into parameters, and the best results are listed in the ta- ble. 3-channel and 10-channel form for training and test- It turns out that only UNet can significantly im- prove ing. The same architecture shown in Fig.2 was used to the pixel accuracy comparing to the conventional CNN. compare the influence of different input tensors on the Combining the ResNet with the U-Net or mak- ing the neural network’s performance. Table 3 illustrates the neural network deeper cannot improve the accu- racy. result of average pixel values errors for different input The use of GooLeNet (Fig. 7) would increase the tensors under different boundary conditions. training time dramatically but the improvement of per- The performance of 3-channel input tensor is signif- formance is almost inconspicuous. Adding the Dropout icantly worse than 6-channel input tensor on the can- layer also can not enhance the performance of the neu- tilever boundary condition. Theoretically, the 3-channel ral network, since the the BatchNormalization layers

12 Yiquan Zhanga et al. used in the original architecture can almost achieve all datasets have only one boundary condition. This makes Dropout layer’s functions. It should be noted that all it closer to the further practical applications because these conclusions can just provide a reference for other one well trained deep CNN could be useful in solving researchers. They are based on the limited hyper- all the problems of the same kind, which repays the in- parameters which have been tested in the experiments vestment to prepare the datasets and train the neural and it is possible that other researchers find a set of network. hyper-parameters which can disprove them in the fu- Nevertheless, the proposed method still has several ture. shortcomings to overcome in future work. The emer- gence of disconnection, which makes designers can not

totally trust the method, is unacceptable for structure final design. Therefore, this method is more suitable to be used in the preliminary design stage when design- ers want to have a general idea that the well-trained neural network is able to deliver within a very short time. To reduce the disconnection, a new learning ob- jective related to the compliance for the neural network would be required. This will be undertaken as part of our future work. In addition, the input of the neural net- work must be same size, which means the initial design domain shape cannot change. Besides, the lengthscale problem should be addressed in the future since it is a key issue to obtain a manufacturable structure design. Finally, this paper discusses the multi-load linear elastic compliance minimization problems, which can be solved within a acceptable time range if the scale is small. More work needs to be done in the future to verify the validity of the proposed method in more complicated problems. Although additional efforts are required, the Fig. 7 The architecture of the GoogLeNet proposed method is still a novel and valuable attempt to use the emerging technique to shed light on accelerating

8 Conclusion

In this study, a deep CNN is proposed for solving the topology optimization problem. Compared to the conventional method, the proposed method can solve the problem in negligible time. Besides, the strong generalization ability is the greatest advantage of the proposed method: the well-designed input form enables it to solve topology optimization problems with different boundary conditions even though its training datasets have only one boundary condition. This brings it closer to practical applications, because one well-trained deep CNN could be useful in solving all the problems of the same kind, which repays the investment to prepare the datasets and train the neural network.

Nevertheless, the proposed method still has several shortcomings to overcome in future work. The emergence of disconnection, which means designers cannot totally trust the method, is unacceptable for the final structural design. Therefore, this method is more suitable for the preliminary design stage, when designers want the general idea that the well-trained neural network is able to deliver within a very short time. To reduce the disconnection, a new learning objective related to the compliance would be required for the neural network. This will be undertaken as part of our future work. In addition, the input of the neural network must be of the same size, which means the initial design domain shape cannot change. Besides, the lengthscale problem should be addressed in the future, since it is a key issue to obtain a manufacturable structure design. Finally, this paper discusses the multi-load linear elastic compliance minimization problems, which can be solved within an acceptable time range if the scale is small. More work needs to be done in the future to verify the validity of the proposed method in more complicated problems. Although additional efforts are required, the proposed method is still a novel and valuable attempt to use the emerging technique to shed light on accelerating the applications of topology optimization techniques in design practices.

References

Allaire G, Jouve F, Toader AM (2004) Structural optimization using sensitivity analysis and a level-set method. Journal of Computational Physics 194(1):363–393
Andreassen E, Clausen A, Schevenels M, Lazarov BS, Sigmund O (2011) Efficient topology optimization in Matlab using 88 lines of code. Structural and Multidisciplinary Optimization 43(1):1–16
Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: A deep convolutional encoder-decoder architecture for scene segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99):1–1
Banga S, Gehani H, Bhilare S, Patel S, Kara L (2018) 3D topology optimization using convolutional neural networks. CoRR abs/1808.07440, URL http://arxiv.org/abs/1808.07440, 1808.07440
Bendsoe MP (1989) Optimal shape design as a material distribution problem. Structural Optimization 1(4):193–202
Bendsoe MP, Kikuchi N (1988) Generating optimal topologies in structural design using a homogenization method. Computer Methods in Applied Mechanics and Engineering 71(2):197–224
Cang R, Yao H, Ren Y (2019) One-shot generation of near-optimal topology through theory-driven machine learning. Computer-Aided Design 109:12–21, DOI https://doi.org/10.1016/j.cad.2018.12.008
Du Z, Zhou XY, Picelli R, Kim HA (2018) Connecting microstructures for multiscale topology optimization with connectivity index constraints. Journal of Mechanical Design 140(11):MD–18–1272
Gao G, yu Liu Z, bin Li Y, feng Qiao Y (2017) A new method to generate the ground structure in truss topology optimization. Engineering Optimization 49(2):235–251, DOI 10.1080/0305215X.2016.1169050
Guo T, Lohan DJ, Cang R, Ren MY, Allison JT (2018) An indirect design representation for topology optimization using variational autoencoder and style transfer. DOI 10.2514/6.2018-0804, URL https://arc.aiaa.org/doi/abs/10.2514/6.2018-0804
Guo X, Zhang W, Zhong W (2014) Doing topology optimization explicitly and geometrically—a new moving morphable components based framework. Journal of Applied Mechanics 81(8):081009
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp 448–456
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. ArXiv e-prints arXiv:1412.6980
Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324
Lei X, Liu C, Du Z, Zhang W, Guo X (2018) Machine learning driven real-time topology optimization under moving morphable component (MMC)-based framework. Journal of Applied Mechanics 86(1):011004
Michell A (1904) LVIII. The limits of economy of material in frame-structures. Philosophical Magazine 8(47):589–597
Nagarajan HPN, Mokhtarian H, Jafarian H, Dimassi S, Bakrani-Balani S, Hamedi A, Coatanéa E, Gary Wang G, Haapala KR (2018) Knowledge-based design of artificial neural network topology for additive manufacturing process modeling: A new approach and case study for fused deposition modeling. Journal of Mechanical Design 141(2):MD–18–1545
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning, pp 807–814
Oh S, Jung Y, Kim S, Lee I, Kang N (2019) Deep generative design: Integration of topology optimization and generative models. Journal of Mechanical Design 141(11), DOI 10.1115/1.4044229, URL https://doi.org/10.1115/1.4044229, 111405
Rawat S, Shen MHH (2018) A novel topology design approach using an integrated deep learning network architecture
Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp 234–241
Rozvany GIN, Zhou M, Birker T (1992) Generalized shape optimization without homogenization. Structural Optimization 4(3-4):250–252
Shelhamer E, Long J, Darrell T (2014) Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4):640–651
Sigmund O, Maute K (2013) Topology optimization approaches. Structural and Multidisciplinary Optimization 48(6):1031–1055
Sosnovik I, Oseledets I (2017) Neural networks for topology optimization. ArXiv e-prints 1709.09578
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. ArXiv e-prints 1409.4842
Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. ArXiv e-prints 1602.07261
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Computer Vision and Pattern Recognition, pp 2818–2826
Wang MY, Wang X, Guo D (2003) A level set method for structural topology optimization. Computer Methods in Applied Mechanics and Engineering 192(1–2):227–246


Xie YM, Steven GP (1993) A simple evolutionary procedure for structural optimization. Computers & Structures 49(5):885–896
Yang Z, Li X, Catherine Brinson L, Choudhary AN, Chen W, Agrawal A (2018) Microstructural materials design via deep adversarial learning methodology. Journal of Mechanical Design 140(11):MD–18–1252
Yu Y, Hur T, Jung J, Jang IG (2018) Deep learning for determining a near-optimal topological design without any iteration. Structural and Multidisciplinary Optimization, DOI 10.1007/s00158-018-2101-5, URL https://doi.org/10.1007/s00158-018-2101-5
Zhang W, Liu Y, Du Z, Zhu Y, Guo X (2018) A moving morphable component based topology optimization approach for rib-stiffened structures considering buckling constraints. Journal of Mechanical Design 140(11):MD–18–1182
Zhou M, Rozvany G (1991) The COC algorithm, part II: Topological, geometrical and generalized shape optimization. Computer Methods in Applied Mechanics and Engineering 89(1):309–336, DOI https://doi.org/10.1016/0045-7825(91)90046-9, Second World Congress on Computational Mechanics