
Article A Long/Short-Term Memory Based Automated Testing Model to Quantitatively Evaluate

Lin-Kung Chen 1, Yen-Hung Chen 2,* , Shu-Fang Chang 3 and Shun-Chieh Chang 4

1 Mathematics and Applied Mathematics (Finance and Statistics), School of Information Engineering, SanMing University, SanMing 365004, China; [email protected]
2 Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112, Taiwan
3 Department of Nursing, National Taipei University of Nursing and Health Sciences, Taipei 112, Taiwan; [email protected]
4 Department of Business Administration, Shih Hsin University, Taipei 116, Taiwan; [email protected]
* Correspondence: [email protected] or [email protected]; Tel.: +886-2-2822-7101

 Received: 11 August 2020; Accepted: 23 September 2020; Published: 25 September 2020 

Abstract: The mobile application lifespan is getting shorter, so a company has to shorten its game testing procedure to avoid being squeezed out of the game market. There is no sufficient testing indicator to objectively evaluate the operability of different game designs. Many automated testing methodologies have been proposed, but they adopt rule-based approaches and cannot provide quantitative analysis to statistically evaluate the gameplay experience. This study suggests applying "learning time" as a testing indicator and using the learning curve to identify the operability of different game designs. This study also proposes a Long/Short-Term Memory based automated testing model (called LSTM-Testing) to statistically test game experience through end-to-end functionality (input: game image; output: game action) without any manual intervention. The experiment results demonstrate that LSTM-Testing can provide quantitative testing data by using learning time as the control variable, game design as the independent variable, and time to complete the game as the dependent variable. This study also demonstrates how LSTM-Testing evaluates the effectiveness of different gameplay learning strategies, e.g., reviewing the newest decisions, reviewing the correct decisions, or reviewing the wrong decisions. The contributions of LSTM-Testing are (1) providing an objective and quantitative analytical game-testing framework, (2) reducing the labor cost of inefficient and subjective manual game testing, and (3) allowing a game company to boost development by focusing on game intellectual property and leaving game testing to artificial intelligence (AI).

Keywords: artificial intelligence; artificial neural networks; software measurement; quality management

1. Introduction

The mobile casual game with simple gameplay features (called a casual game for short) contributes 15~25% of the total sales volume of the game industry [1]. A gameplay is a series of actions through which players interact with a game, such as a jump to dodge attacks. The design of the casual game targets casual players by reducing demands on time and learned gameplay skills, in contrast to complex hardcore games, such as Starcraft/Diablo-like games. This is because casual players only play games in short bursts to relax or for social purposes, and 70% of people spend only six months playing a game [2]. As a result, the lifespan of the casual game is getting shorter. A game company

will be squeezed out of the game market if its game development fails to deliver a new game within six months. The company therefore has to shorten the casual game development cycle, and it then encounters a huge challenge in the game quality assurance process [3]. That is, evaluating a game design is not a pure software testing process. An unplayable game design lets a player experience a disproportionate lack of fairness towards the goal of the game, causing confusion, idling, and boredom until, finally, the player leaves the game. An unplayable game design is not a software bug and therefore cannot be detected by conventional automated testing technology, such as unit testing. Bad game design/experience is usually discovered by manual testing. Manual testing is not only a costly and time-consuming process, but it also can hardly evaluate the reasonability of leveling designs in a short testing period. The first reason is that game testers spontaneously acquire higher gameplay skills, and their skill keeps improving during the game testing procedure. As a result, a high-skilled tester can hardly evaluate, from a beginner's viewpoint, whether the leveling from easy to hard is reasonable. The second reason is that the game tester may become numbed to the game leveling after many rounds of game testing, so the tester becomes bored and perceives the levels as indifferent. Many studies propose automated testing methods to replace manual testing. Several works [3–6] apply a heuristic methodology to evaluate the overall leveling difficulty. These methods cannot quantitatively analyze game designs, including the difficulties between game levels and the reasonability of the leveling path. Some artificial intelligence (AI) assisted testing methods related to software testing in the industry have been proposed [7]. However, their AI testing systems are usually designed to find problems/bugs in a game, instead of comparing the reasonability of game designs (e.g., reasonable leveling or reasonable maze design). To the best of our knowledge, there is no sufficient quantitative testing method to validate the game level design. This study, therefore, proposes a Long/Short-Term Memory based automated testing model (called LSTM-Testing) to provide quantitative testing data. These data help the game designer to objectively evaluate the game design. LSTM-Testing emulates real player learning behaviors and iteratively plays the game through end-to-end functionality (input: game image frame; output: game action) without any manual feature extraction or intervention. Four phases of LSTM-Testing are adopted. First, LSTM-Testing keeps taking screenshots from the game video as the samples in one game session. A game session means the period of time between a player initiating a game and this player reaching a save point or the end of the game. Second, LSTM-Testing applies a deep neural network (DNN) to transform the gameplay images into linear equations, and interactively plays the game to solve these equations. Third, LSTM-Testing adopts long/short-term memory to emulate gamer behaviors, including (a) how to play a game based on long-term memory and (b) how to relearn it based on short-term memory when a gamer forgets how to play it.
Finally, LSTM-Testing decides the gameplay strategy by pressing the corresponding keyboard/mouse action, and it then reports the "learning time" to complete the game as the dependent variable and the "game design" as the control variable. Therefore, the game designer can control the game design and observe the dependent variable "learning time" to (1) quantitatively evaluate the difficulty of different game designs, (2) ensure a reasonable leveling path design, and (3) validate that the game difficulty is proportional to its scale. For example, a game designer can argue that a large maze is twice as hard as a median maze if LSTM-Testing requires 2000 iterative trainings to solve a median maze design with a size of 128 × 128 and, on the other hand, requires 4000 iterative trainings to solve a large maze design with a size of 512 × 512. LSTM-Testing intends to help the game project manager to address disputes when staff keep debating which features to include and which to leave for later in the game design process. LSTM-Testing can also predict the outcomes of testing without even performing the actual tests. This helps the game designer focus only on those designs which could be problematic. As stated above, the contributions of LSTM-Testing are (1) providing an objective and quantitative testing indicator to evaluate game experience, including detecting unreasonable traps in a game, calculating the average time to defend against different scales of enemy attacks, and keeping proportional fairness between different game levels, (2) demonstrating the possibility of rapid AI-assisted prototyping, which enables a game project manager with fewer technical domain experts to develop a suitable game design and visual interfaces, and (3) presenting an AI-driven strategic decision-making framework to assess the performance of existing applications and help both business sponsors and engineering teams identify efforts that would maximize impact and minimize risk. This study is organized as follows: Section 2 reviews the game testing procedure. Section 3 provides the details of LSTM-Testing. Sections 4 and 5 demonstrate the experiment results and the implications. Several cases are provided to present the ability of LSTM-Testing, including (1) evaluating different scales of enemy attack based on maze designs in Section 4, (2) evaluating proportional fairness of different game leveling in Sections 5.1–5.3, and (3) showing that there might be some unreasonable traps in a game in Section 5.4. Finally, Section 6 describes the conclusions.

2. Background: Game Development Life Cycle

Several methodologies have been proposed for the game development life cycle, and they can be characterized as a six-step process [8]. Step (1): Pitch. Pitch means making the basic design; only a few people are involved in this phase, mostly senior game designers and artists. The pitch happens while negotiating with the investors and major stockholders on the budget, the schedule, and any specifically required features. Step (2): Pre-Production. Pre-production concretizes the game design and creates a prototype once the contract has been signed. This prototype is sent to the main investors and stockholders so they can see what progress has been made and evaluate the feasibility of the game design. Step (3): Main Production. The realization of the game design is done through the main production. In this phase, software engineering testing, including unit testing, integration testing, and compatibility testing, is conducted for quality assurance. Step (4): Alpha Test. The alpha test is performed to explore all possible issues before releasing the game to the public. It simulates real users and carries out actions that a typical user might perform in a lab environment, and usually the testers are internal employees of the organization. Step (5): Beta Test. Beta testing is performed by "real users" of the game in a real environment to reduce product failure risks and increase the quality of the product through customer validation. Step (6): Master. The game is launched in the master phase. This study focuses on the game design testing in Step (3): Main Production. Many studies propose automated game testing methods to replace manual testing. Several works [3–5] apply a Petri-net based methodology that offers a graphical analysis for stepwise processes including choice, iteration, and concurrent execution. On the other hand, Dynamic Difficulty Adjustment (DDA) based on a genetic algorithm [6] has been proposed to heuristically search possible gameplay models and evaluate the overall leveling difficulty. These methods are still process-based methodologies that iteratively check whether each design matches man-made criteria. They therefore cannot quantitatively analyze game designs, including the difficulties between game levels and the reasonability of the leveling path. They also cannot validate that the game level design is proportional to the game's difficulty levels. It should be noted that game testing is a crucial procedure to ensure game quality through the development cycle. The major challenge of game testing is creating a comprehensive list of testing cases, which consumes enormous labor cost. Some AI assisted testing methods have been proposed [7]. The difference between the previous works and this study is twofold. First, from the perspective of testing approach design, they mainly use reinforcement learning with the Q-algorithm to learn to complete a game rather than focusing on the learning strategy of a gamer. This study extends reinforcement learning by adding long/short-term memory functionality to emulate gamer behaviors, including (a) how to play a game based on long-term memory and (b) how to relearn it based on short-term memory when a gamer forgets how to play it. Second, from the perspective of the testing objective, their AI testing systems are usually designed to find problems/bugs in a game.
This study's AI testing system is designed to compare the reasonability of game designs (e.g., reasonable leveling or reasonable maze design) by playing the game iteratively.

3. Proposed Framework: LSTM-Testing

This study proposes a Long/Short-Term Memory based automated testing model (called LSTM-Testing) to provide quantitative testing data. These data help the game designer to objectively evaluate the game design. LSTM-Testing emulates real player learning behaviors and iteratively plays the game through end-to-end functionality (input: game image frame; output: game action) without any manual feature extraction or intervention. The architecture of LSTM-Testing shown in Figure 1 is developed based on [9] and involves five logic gates (an input gate, convolutional gate, forget gate, training gate, and output gate) and two memory pools. Furthermore, a short-term memory pool and a long-term memory pool are added to emulate gamer behaviors, including (a) how to play a game based on long-term memory and (b) how to relearn it based on short-term memory when a gamer forgets how to play it.

Figure 1. Long/Short-Term Memory (LSTM)-Testing Framework.



Four phases comprising 11 steps of LSTM-Testing, as shown in Figure 1, are adopted: (1) Acquire Sample, (2) Extract Feature by Deep Neural Network, (3) Review by LSTM, and (4) Action and Summarize the Testing. Furthermore, Figure 1 demonstrates three working flows: the action decision flow, the learning flow, and the memorizing flow. The action decision flow (red flow) is an iterative flow consisting of steps 1, 2, 3, 4, 5, 6, 10, 11, and then back to step 1. The action decision flow decides which gameplay has the highest possibility of completing the game. The learning flow (blue flow) is also an iterative flow, consisting of step 9.1 and step 9.2. The learning flow is an independent background thread that periodically decides which gameplay decisions are picked up for reviewing. One training cycle means one game session (e.g., playing until a win or a loss), and a cycle owns several rounds of reviewing. The memorizing flow is a sequential flow consisting of steps 7 and 8 that decides which samples should be stored in and dropped from the short-term memory. The detailed operations of each phase are as follows.
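As a reading aid for Figure 1, the following runnable Python skeleton mimics the three flows with stand-in components; none of the identifiers or the random arithmetic come from the LSTM-Testing implementation, they only illustrate the control structure.

```python
# Skeleton of the three LSTM-Testing flows; all components are stand-ins.
import random
import threading
import time
from collections import deque

short_term = deque(maxlen=1000)                      # memorizing flow target (steps 7-8)
long_term = {"weights": [random.random() for _ in range(4)]}

def learning_flow():                                 # steps 9.1-9.2: background reviewing
    while True:
        if short_term:
            batch = random.sample(list(short_term), min(8, len(short_term)))
            long_term["weights"] = [w + 0.001 * len(batch) for w in long_term["weights"]]
        time.sleep(0.01)

def action_decision_flow(num_steps=50):              # steps 1-6, 10, 11, back to step 1
    for _ in range(num_steps):
        sample = [random.random() for _ in range(4)] # stands in for the processed sample
        short_term.append(sample)                    # memorizing flow (steps 7-8)
        scores = [s * w for s, w in zip(sample, long_term["weights"])]   # step 10
        action = max(range(len(scores)), key=scores.__getitem__)         # step 11
        # The corresponding keyboard/mouse event for `action` would be sent here.

threading.Thread(target=learning_flow, daemon=True).start()
action_decision_flow()                               # one training cycle (game session)
```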

3.1. Phase 1: Acquiring Sample

In the first phase, LSTM-Testing screenshots a video frame as an input image to the input gate, as shown in the top-left side of Figure 1. Figure 2 presents the hardware settings: first, prepare a suitable testing room, projector, and projection screen; then, set up a webcam and a computer with LSTM-Testing installed; finally, run the game, project the game video frame on the projection screen, and take a screenshot of the frame through the webcam. The software platform used by LSTM-Testing is developed with Python 3.8.3 and TensorFlow GPU 2.2.0 on the Windows 10 platform.



Figure 2. LSTM-Testing hardware and environment settings; (a) preparing a projector screen and projector; (b) setting up the webcam and the LSTM-Testing system; (c) projecting the video frame on the projector screen and taking a screenshot by the webcam.
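As an illustration of this capture step only, the sketch below samples frames from a webcam at a fixed rate with OpenCV; the device index, frame rate, and 80 × 80 resizing are assumptions for the example, not values prescribed by the paper.

```python
# Phase-1 sketch: sample the projected game video through a webcam at a fixed
# rate (here 20 frames per second, i.e., one frame every 50 ms).
import time
import cv2   # OpenCV; an assumed choice of capture library

def capture_screenshots(num_frames=8, fps=20, device=0):
    period = 1.0 / fps
    cap = cv2.VideoCapture(device)
    frames = []
    try:
        while len(frames) < num_frames:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # drop color information
            frames.append(cv2.resize(gray, (80, 80)))        # match the 80 x 80 input
            time.sleep(period)
    finally:
        cap.release()
    return frames

if __name__ == "__main__":
    print(f"captured {len(capture_screenshots())} frames")
```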

3.2. Phase 2: Extracting Feature by Deep Neural Network

In the second phase, LSTM-Testing extracts features from the screenshot image by a deep neural network. The second phase consists of steps 2~6, as shown in the top-right corner of Figure 1.

Steps 2 and 3 use the input gate to aggregate continuous pictures into a sample. The input gate owns a screenshot queue that stores k screenshots. After one screenshot is taken in Step 1, the input gate pops the oldest frame and pushes the new one into its queue; the input gate then combines the screenshots in the queue into a sample of a gameplay and normalizes this sample to the standard input (a three-dimensional array) for the convolution gate, i.e., S^std. For example, in order to store the action of a gameplay moving right-ward, the input gate pops the oldest screenshot at time t-4 from the queue and pushes the current screenshot at time t into the queue, and then the input gate combines the screenshots in the queue at times (t-3)~t to form a gameplay sample of moving right-ward (Figure 3a). The input gate then converts these images to two-dimensional (2D) arrays of pixel values (Figure 3b), and finally aggregates these 2D arrays into a 3D array as a sample to demonstrate the right-ward moving action (Figure 3c). It should be noted that LSTM-Testing does not collect all video frames of a game but only collects frames based on the game testing scenario (e.g., eye strain or different hardware requirements). For example, LSTM-Testing may only collect 20 frames per second (one frame per 50 milliseconds) to emulate the game experience of a gamer with high eye strain, given that the game can support 120 frames per second for a better flicker fusion experience.
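A minimal sketch of the input-gate queue just described, assuming k = 4 grayscale 80 × 80 screenshots are stacked into one standard sample S^std; the class and its details are illustrative, not the authors' code.

```python
# Input-gate sketch: keep the k newest screenshots and stack them into S_std.
from collections import deque
import numpy as np

class InputGate:
    def __init__(self, k=4, height=80, width=80):
        self.queue = deque(maxlen=k)        # the oldest screenshot is dropped automatically
        self.shape = (height, width)

    def push(self, screenshot):
        """Push one 2D screenshot (pixel array); return S_std once the queue is full."""
        assert screenshot.shape == self.shape
        self.queue.append(screenshot.astype(np.float32) / 255.0)   # normalize pixel values
        if len(self.queue) < self.queue.maxlen:
            return None
        return np.stack(self.queue, axis=-1)   # 3D sample: height x width x k

gate = InputGate()
sample = None
for t in range(5):
    sample = gate.push(np.random.randint(0, 256, (80, 80)))
print(sample.shape)    # (80, 80, 4)
```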

Figure 3. An example of aggregating continuous screenshot images to show a right-ward action.

In steps 4~6, LSTM-Testing applies the convolutional gate to reduce the sample size, which can lower the computational complexity during the training process. The convolution gate consists of convolution, pooling, and fully-connected operations. The convolution operation in step 4 attempts to extract targeted features from the original sample and filter out unnecessary noise by calculating the input sample S^std with multiple vectors of weights w^cv_{p,q,l} and biases b^cv_{p,q,l}. The vector of weights and the bias are called a filter, and this filter represents a feature extraction of the input sample. As shown in Figure 4, K filters are first selected in this convolutional operation, and each filter k contains a weight w^{cv,k}_{p,q,l} (a P × Q × L matrix) and a bias b^{cv,k}_{i,j} (an I × J matrix). Then, the sample S^std is convolutionally calculated with each filter by Equation (1), extracting a convoluted sample S^cv_{i,j,k} (an I × J matrix):

$$S^{cv}_{i,j,k} = \sum_{p=1}^{P} \sum_{q=0}^{Q} \sum_{l=1}^{L} \left( S^{std}_{d \times i + p,\; d \times j + q,\; l} \times w^{cv,k}_{p,q,l} + b^{cv,k}_{p,q,l} \right) \quad (1)$$

where d is the sliding distance (called the stride) by which a filter decreases the overlap of the extracted cells.

Figure 4. Convolutional operation.
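Equation (1) can be transcribed almost directly in NumPy. In the sketch below the bias is added once per output cell, which is the conventional reading; the filter sizes in the usage example are arbitrary and not taken from the paper.

```python
# Strided convolution sketch for one filter k, following Equation (1).
import numpy as np

def convolve_one_filter(S_std, w, b, d=1):
    """S_std: (H, W, L) input sample; w: (P, Q, L) filter weights; b: scalar bias; d: stride."""
    H, W, L = S_std.shape
    P, Q, _ = w.shape
    I = (H - P) // d + 1                      # output rows
    J = (W - Q) // d + 1                      # output columns
    S_cv = np.zeros((I, J))
    for i in range(I):
        for j in range(J):
            region = S_std[i * d:i * d + P, j * d:j * d + Q, :]
            S_cv[i, j] = np.sum(region * w) + b   # weighted sum plus bias
    return S_cv

S_std = np.random.rand(80, 80, 4)             # standard input sample
w = np.random.rand(8, 8, 4)                   # one 8 x 8 x 4 filter (assumed size)
print(convolve_one_filter(S_std, w, b=0.1, d=4).shape)   # (19, 19)
```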




Figure 5 demonstrates an example of different strides with S^std (a 7 × 7 matrix) and W (a 7 × 7 matrix). The number of overlapping cells in the feature extraction process with a stride of one is two, and the extracted S^cv will be a 5 × 5 matrix. On the other hand, the number of overlapping cells in the feature extraction process with a stride of two is one, and the extracted S^cv will be a 3 × 3 matrix. The S^cv therefore shrinks when the stride is set to two instead of one. The game tester can choose to decrease the size of S^cv and lower the computational complexity, with a trade-off in learning speed.


Figure 5. An example of different strides; (a) a stride of one; (b) a stride of two.

Figure 6 shows an example of a convolution operation that transforms a 7 × 7 array to a 3 × 3 array with a 5 × 5 filter and stride = 2. At first, the 5 × 5 array in the top-left corner of the original array is picked up, multiplied by the filter (element by element), and the result fills the top-left corner of the new array (Figure 6a). Figure 6b follows the same principle to fill the new array.

Figure 6. An example of convolution operation.

The pooling operation in step 5 reduces the scale of S^cv by combining data clusters into a single cluster. Figure 7 shows the pooling operation, which first splits S^cv into U samples, then divides each sample into different clusters, and finally combines each cluster into one value. Two combination methods, max-pooling and average-pooling, are usually adopted. Max-pooling picks up the biggest value in the cluster, and average-pooling calculates the average value of the cluster.






Figure 7. Pooling operation.

Figure 8 demonstrates an example of a pooling operation that transforms a 4 × 4 array to a 2 × 2 array by using the max-pooling principle with a 2 × 2 filter and stride = 2. At first, the biggest value in the top-left corner is picked up, which is 9, and filled into the top-left corner of the new array (Figure 8a). Figure 8b–d follow the same principle to fill the new array.

Figure 8. An example of pooling operation.
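A small NumPy sketch of the max-pooling step in Figure 8; the function is our illustration, with average-pooling noted in a comment.

```python
# Max-pooling sketch: reduce a (H, W) map with a pool x pool window and a matching stride.
import numpy as np

def max_pool(S_cv, pool=2, stride=2):
    H, W = S_cv.shape
    out_h = (H - pool) // stride + 1
    out_w = (W - pool) // stride + 1
    S_pl = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = S_cv[i * stride:i * stride + pool, j * stride:j * stride + pool]
            S_pl[i, j] = window.max()      # average-pooling would use window.mean()
    return S_pl

x = np.array([[9, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 0, 1, 2],
              [3, 4, 5, 6]])
print(max_pool(x))                         # 4 x 4 input -> 2 x 2 output, as in Figure 8
```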

The fully-connected operation in step 6 aggregates the information from S^pl that matters the most and then generates a short-term memory at time t, namely sm_t, as the input for step 7. Figure 9 presents the fully-connected operation, which first flattens S^pl to a one-dimensional array C^pl = [c^pl_1, ..., c^pl_r] and then executes the fully-connected calculation by Equation (2):

$$sm_t = C^{pl} W^{fc} + B^{fc} \quad (2)$$

where sm_t = [c_1, ..., c_v] is the short-term memory converted from this sample, W^fc = [w_{i,j}] (an r × p matrix) is the weight matrix, and B^fc = [b^fc_1, ..., b^fc_r] is the bias vector.
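The flatten-then-project step of Equation (2) in NumPy; the array sizes are arbitrary example values, since the paper does not fix them here.

```python
# Fully-connected sketch (Equation (2)): flatten S_pl, then sm_t = C_pl . W_fc + B_fc.
import numpy as np

rng = np.random.default_rng(0)
S_pl = rng.random((9, 9, 32))              # pooled feature maps (example size)
C_pl = S_pl.reshape(-1)                    # flatten to a one-dimensional array
v = 256                                    # length of the short-term memory vector (assumed)
W_fc = rng.normal(size=(C_pl.size, v))     # weight matrix
B_fc = np.zeros(v)                         # bias vector
sm_t = C_pl @ W_fc + B_fc                  # short-term memory at time t
print(sm_t.shape)                          # (256,)
```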





Figure 9. Fully-connected operation.

3.3. Phase 3: Review by LSTM

In Phase 3, LSTM-Testing adopts long/short-term memory to emulate gamer learning behaviors. As steps 7~10 in the center part of Figure 1 show, LSTM-Testing stores the short-term memory sm_t into the short-term memory pool SM for the following training. The training strategy is the learning strategy that determines which gameplay decisions are kept in the short-term memory queue for periodic reviewing and, on the other hand, which gameplay decisions are unnecessary and dropped from the queue. The learning strategy designed in this study is reviewing the newest decisions and dropping the older ones. That is, the decisions made are kept in the short-term memory queue and picked up periodically for training, and the oldest decision is dropped when the queue is full. As shown in Figure 10, the former is controlled by the forget gate F_ft(), and the latter is controlled by the training gate F_tr().

Figure 10. Phase 3: Review by LSTM.

The forget gate manages samples in the short-term memory pool. That is, the forget gate determines which sample is valuable and should stay in the pool for the training gate, and it drops a sample when it is no longer usable, according to F_ft() as shown in Figure 11.






Figure 11. Algorithm of F_ft().

The training gate applies a stochastic approximation method, which iteratively loads samples from the short-term memory and then updates the long-term memory LM = [w^lm_{i,j}] (a v × z matrix) by optimizing a differentiable objective according to F_tr(). As shown in Figure 12, in this study, F_tr() applies Adaptive Moment Estimation [10], which evaluates averages of both the gradients and the second moments of the gradients to iteratively optimize the objective function.

Figure 12. Algorithm of F_tr().
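Read together, the forget gate and the training gate behave like an experience-replay buffer: the oldest sample is discarded when the pool is full, and random minibatches are periodically replayed to update the long-term memory (here with Adam, per Figure 12). The sketch below only shows this bookkeeping; the stored tuple layout and class name are our assumptions.

```python
# Memory-pool sketch: F_ft() drops the oldest sample when the pool is full,
# and F_tr() draws a random minibatch for one optimization step (e.g., Adam).
import random
from collections import deque

class ShortTermMemory:
    def __init__(self, capacity=50_000):
        # deque(maxlen=...) realizes F_ft(): appending beyond capacity
        # silently discards the oldest entry.
        self.pool = deque(maxlen=capacity)

    def store(self, sample, action, reward, next_sample):
        self.pool.append((sample, action, reward, next_sample))

    def review_batch(self, batch_size=128):
        # F_tr() would pass this minibatch to the optimizer that updates LM.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))

memory = ShortTermMemory()
for t in range(200):
    memory.store(sample=t, action=t % 4, reward=-1, next_sample=t + 1)
print(len(memory.review_batch()))          # 128
```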

Then, step 10 calculates the probability array A = [a_1, ..., a_z] to demonstrate the probability of each of the z possible gameplays completing the game. The calculation is defined by:

$$A = sm_t \cdot LM + B^{lm} \quad (3)$$

where B^lm = [b^lm_1, ..., b^lm_z] is the bias vector.

3.4. Phase 4: Action and Summarize the Testing

Step 11 picks the action with the highest probability from the result A of the output gate. LSTM-Testing then triggers the corresponding keyboard/mouse event and goes back to step 1 iteratively. After acquiring sufficient testing results according to the testing plan, LSTM-Testing then summarizes the "learning time to complete a game" as the dependent variable and the "game design" as the control variable.
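The output-gate calculation of Equation (3) followed by the step-11 action choice, written out in NumPy; the vector sizes are example values, and the key-press call is only indicated in a comment because the automation library is not specified in the paper.

```python
# Output-gate sketch (Equation (3)) and action selection (step 11).
import numpy as np

rng = np.random.default_rng(1)
sm_t = rng.random(256)                 # short-term memory from Phase 2 (example size)
LM = rng.normal(size=(256, 4))         # long-term memory for z = 4 possible gameplays
B_lm = np.zeros(4)                     # bias vector

A = sm_t @ LM + B_lm                   # probability (score) array over the z gameplays
action = int(np.argmax(A))             # step 11: pick the most promising gameplay
print(action)
# A keyboard/mouse event for `action` would be triggered here, e.g., through an
# input-automation library (assumption; the paper does not name one).
```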




𝐹𝐹𝑡𝑡𝑡𝑡 Then,Then, step step 10 10 calculates calculates the the probability probability array arrayA ==[a[1, ..., …, ,az]]to to demonstrate demonstrate the the probability probability of of z possiblez possible gameplays gameplays to completeto complete the the game. game. The The calculation calculation is is defined defined by: by: 𝐴𝐴 𝑎𝑎1 𝑎𝑎𝑧𝑧 A == smt LM ++Blm (3(3)) 𝑡𝑡· 𝑙𝑙𝑙𝑙 where =[ , … , ] is the bias vector. lm h lm lmi 𝐴𝐴 𝑠𝑠𝑠𝑠 ∙ 𝐿𝐿𝐿𝐿 𝐵𝐵 where B 𝑙𝑙𝑙𝑙= b 𝑙𝑙𝑙𝑙, ... , bz𝑙𝑙𝑙𝑙 is the bias vector. 1 1 𝑧𝑧 3.4. Phase𝐵𝐵 4: Action𝑏𝑏 and𝑏𝑏 Summarize the Testing 3.4. Phase 4: Action and Summarize the Testing Step 11 picks up the action with the highest probability from the result A of the output gate. LSTMStep-Testing 11 picks then up triggers the action the with corresponding the highest probabilitykeyboard/mouse from theevent result and A go of theback output to step gate. 1 LSTM-Testingiteratively. After then acquiring triggers thesufficient corresponding testing results keyboard according/mouse to event the testing and go plan, back LSTM to step-Testing 1 iteratively. then After acquiring sufficient testing results according to the testing plan, LSTM-Testing then summarizes “learning time to complete a game” as the dependent variable, “game design” as the control variable.

3.5. The LSTM-Testing Parameter Setting in This Study

The design of these logic gates can be adjusted according to the complexity of the game. The more complex the gates are, the better the performance that is achieved. However, the training time and computation time grow for a deeper network, so the game testing management has to gauge the trade-off between performance and computation cost. The LSTM-Testing settings for the following testing cases in this study are presented in Figure 13. LSTM-Testing first screenshots each gameplay as the input, and it uses 4 continuous screenshots as a sample of combined gameplays. The standard input size is a 3D array (80 × 80 × 4) for the following steps. LSTM-Testing adopts two combinations of convolution and subsampling for feature extraction. Adaptive Moment Estimation then calculates the probabilities of the possible gameplay actions. Then LSTM-Testing chooses the action with the highest probability to complete the game. The size of the short-term memory is 50,000, the training gate F_tr() randomly selects 128 samples from the short-term memory for training, and the forget gate F_ft() drops the oldest sample in the short-term memory when the queue of the short-term memory is full.
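One plausible TensorFlow 2 / Keras realization of these settings (80 × 80 × 4 input, two convolution-plus-subsampling blocks, a dense layer producing the gameplay scores, Adam optimization). Filter counts, kernel sizes, and the loss are not specified in the text, so the concrete numbers below are assumptions for illustration only.

```python
# Illustrative network matching the stated LSTM-Testing settings (unspecified sizes assumed).
import tensorflow as tf

NUM_ACTIONS = 4                     # z possible gameplays (e.g., up/down/left/right)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(80, 80, 4)),                      # standard input S_std
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),   # convolution 1
    tf.keras.layers.MaxPooling2D(2),                               # subsampling 1
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),   # convolution 2
    tf.keras.layers.MaxPooling2D(2),                               # subsampling 2
    tf.keras.layers.Flatten(),                                     # C_pl
    tf.keras.layers.Dense(256, activation="relu"),                 # sm_t
    tf.keras.layers.Dense(NUM_ACTIONS),                            # A = sm_t . LM + B_lm
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
model.summary()
```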



Figure 13. LSTM-Testing settings for the following cases in this study.

4. LSTM-Testing Evaluation: Pacman Game Testing

4.1. Environment Settings

This section uses Pacman [11], a classic and popular casual game developed by Namco Bandai Games Inc. in 1980, as a case study to ensure that LSTM-Testing can replace manual testing to a certain extent. Three different difficulties of mazes, shown in Figure 14, are applied in this evaluation. The testing results of LSTM-Testing and manual testing are compared by using learning time as the control variable, game design as the independent variable, and time to complete the game as the dependent variable.


Figure 14. Three maze types of Pacman; (a) no hole in walls; (b) only one hole in walls; (c) two holes in walls.

The feedback setting is (1) -1 point when Pacman does not eat a bean in a gameplay, (2) -10 points when Pacman eats a bean, and (3) -10 points when Pacman is hunted by a monster. Each maze is tested by LSTM-Testing for 150,000 games, including the games in which Pacman eats all beans and those in which it fails to eat all beans. Regarding the manual testing, each maze is tested by three testers who have never played Pacman, and each tester has to finish 150 games, including the games in which they eat all beans and those in which they fail to eat all beans.

4.2. Experiment Results

Figure 15 demonstrates the results of LSTM-Testing and Manual-Testing regarding the different difficulties of mazes. First, consider the order of difficulty of the three maze types. As shown in Figure 15a, manual-testing enters the steady state of eating all beans at about 80, 100, and 120 games for the "maze with 2 holes", "maze with 1 hole", and "maze with no hole", respectively. The order of the maze difficulty is thus, from easy to difficult: "maze with 2 holes", "maze with 1 hole", and "maze with no hole". Figure 15b demonstrates that LSTM-Testing enters the steady state of eating all beans at about 40,000, 60,000, and 80,000 games for the "maze with 2 holes", "maze with 1 hole", and "maze with no hole", respectively. This means that LSTM-Testing also suggests that the order of the maze difficulty is, from easy to difficult: "maze with 2 holes", "maze with 1 hole", and "maze with no hole". Therefore, the order of the maze difficulty found by LSTM-Testing matches that of manual-testing.





Figure 15. Testing three different difficulties of mazes; (a) Manual-Testing; (b) LSTM-Testing.

On the other hand, considering the learning rate of the casual player for the different mazes, we design a Game Learning Rate Similarity (GLRS) algorithm to evaluate the similarity between the results of manual-testing and those of LSTM-Testing. Figure 16 shows that GLRS consists of three steps. First, GLRS normalizes the testing time index of the Manual-Testing and LSTM-Testing data sets to the range 0 to 1, since their testing times are quite different. Second, GLRS selects k time-series samples at the same interval from Manual-Testing and LSTM-Testing, respectively, and builds the corresponding vectors for calculating similarity. In step 3, GLRS applies the cosine similarity method [12] to compare the testing results of Manual-Testing and LSTM-Testing. Cosine similarity is widely applied to estimate the similarity between two non-zero vectors by measuring the cosine of the angle between them. A cosine of 0° means the two vectors are 100% similar, and the similarity is less than 100% for any angle in the interval (0, π] radians. As calculated by GLRS, the game learning rate similarity of Manual-Testing and LSTM-Testing is 97.60%.

Figure 16. Game Learning Rate Similarity Algorithm.
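A compact sketch of the GLRS computation described above, assuming each testing result is an array of per-game scores indexed by testing time; the exact sampling and normalization details beyond the three stated steps are our assumptions.

```python
# Game Learning Rate Similarity (GLRS) sketch: normalize the time index, sample k
# points at the same relative intervals, then compare with cosine similarity.
import numpy as np

def glrs(manual_curve, lstm_curve, k=20):
    manual = np.asarray(manual_curve, dtype=float)
    lstm = np.asarray(lstm_curve, dtype=float)
    # Step 1: map both testing-time indices onto the range [0, 1].
    positions = np.linspace(0.0, 1.0, k)
    # Step 2: pick k samples at the same (relative) intervals from each curve.
    m = manual[np.round(positions * (len(manual) - 1)).astype(int)]
    s = lstm[np.round(positions * (len(lstm) - 1)).astype(int)]
    # Step 3: cosine similarity between the two sampled vectors.
    return float(np.dot(m, s) / (np.linalg.norm(m) * np.linalg.norm(s)))

manual = np.linspace(1.0, 0.2, 150)        # e.g., learning curve over 150 manual games
lstm = np.linspace(1.0, 0.2, 150_000)      # e.g., learning curve over 150,000 AI games
print(f"{glrs(manual, lstm):.2%}")         # 100.00% for identically shaped curves
```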

From the above evaluation, the experiment results support the argument that LSTM-Testing can replace manual testing to a certain extent.


5. Implications of LSTM-Testing

Game leveling provides new challenges and difficulty to keep the player's interest in a game high. However, manually testing different leveling designs is difficult for two reasons. First, we cannot stop game testers from spontaneously learning higher gameplay skills, and their skill keeps improving during the game testing procedure. As a result, a high-skilled tester can hardly evaluate, from a beginner's viewpoint, whether the leveling from easy to hard is reasonable. The second reason is that the game tester may become numbed to the game leveling after many rounds of game testing. This may cause the tester to think the leveling is indifferent and boring. In order to overcome these problems of manual-testing, LSTM-Testing can evaluate different game leveling designs by using the model trained on different levels as the control variable, the game design as the independent variable, and the time to complete the game as the dependent variable. This means LSTM-Testing can force the AI to stop learning how to play the game in order to acquire objective and quantitative testing data between different levels. LSTM-Testing can also emulate game players with varying game skills to acquire the corresponding testing data on different game leveling paths. Therefore, the game designer can construct a reasonable leveling path that guides the player towards the goal of the level and prevents confusion, idling, and boredom, which may cause the player to leave the game. We also provide a case study to show how LSTM-Testing explores an unnecessary trap in a game design. Four case studies in the following subsections demonstrate how to apply LSTM-Testing to evaluate the reasonability of the game leveling.

5.1. Ensure Reasonable Leveling Path Design

This subsection describes how LSTM-Testing evaluates the reasonability of leveling between mazes with different numbers of holes. Given the order of maze difficulty shown in Section 4, LSTM-Testing explores the reasonable leveling path from one maze to another by using the model trained on a different maze as the control variable, the game design as the independent variable, and the time to complete the game as the dependent variable. Therefore, the test cases are:

→ Leveling Path A: from "maze with 2 holes" to "maze with 1 hole". LSTM-Testing first loads the trained model which can eat all beans in the "maze with 2 holes". Then, LSTM-Testing applies this trained model to play the "maze with 1 hole" and observes the training time to eat all beans.
→ Leveling Path B: from "maze with 2 holes" to "maze with no hole". LSTM-Testing first loads the trained model which can eat all beans in the "maze with 2 holes". Then, LSTM-Testing applies this trained model to play the "maze with no hole" and observes the training time to eat all beans.
→ Leveling Path C: from "maze with 1 hole" to "maze with no hole". LSTM-Testing first loads the trained model which can eat all beans in the "maze with 1 hole". Then, LSTM-Testing applies this trained model to play the "maze with no hole" and observes the training time to eat all beans.
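Procedurally, each leveling path is a transfer test: load the model trained on the source maze, keep training it on the target maze, and count the games needed before it reliably eats all beans. The sketch below only outlines this bookkeeping; `train_one_game`, the model, and the maze objects are placeholders for whatever game interface is used, not part of the paper.

```python
# Leveling-path sketch: count the games a model pre-trained on a source maze needs
# before it reliably eats all beans in the target maze.
def games_to_master(model, target_maze, train_one_game, window=100, threshold=0.8):
    """train_one_game(model, maze) -> True if all beans were eaten (placeholder callback)."""
    results, games = [], 0
    while True:
        games += 1
        results.append(train_one_game(model, target_maze))
        recent = results[-window:]
        if len(recent) == window and sum(recent) / window >= threshold:
            return games    # e.g., ~2000 for path A, ~7000 for path B, ~5000 for path C
```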

The evaluation result is shown in Figure 17. The training times of leveling path A (orange line), B (blue line), and C (green line) are about 2000 games, 7000 games, and 5000 games, respectively. It should be noted that leveling paths A and C achieve high percentages, 60% and 80% respectively, of eating all beans at the beginning of the training, whereas path B cannot eat most beans at the beginning of the training. These phenomena indicate that:


Figure 17. Leveling difficulty of different numbers of holes in the maze.

First, leveling path C might be a good leveling design that provides new challenges and difficulty to keep the player's interest in this maze high and also guides the gamer to develop higher skills to eat all beans. Second, leveling paths A and C might be too easy and unnecessary for the gamer, since LSTM-Testing can achieve a high percentage of eating all beans at the beginning of the training. Third, path C requires fewer games than path B because the player can apply the learned skill, i.e., not using a "hole" to dodge ghosts, to complete the next level's game without a hole. In contrast, the players of paths B and C confront difficulties in completing a game with one or no hole because they need the hole to dodge ghosts, a skill learned in the previous level. Furthermore, path A requires fewer games than path B because the player of path A can still use the hole to dodge the ghost, whereas the player of path B cannot use the hole to dodge the ghost.

5.2. Validate Game Difficulty Is Proportional to Level Scales

This subsection describes how LSTM-Testing validates the argument that the maze difficulty is proportional to the maze size for the first-time gamer. The argument originates from some game testers' suggestion that the bigger the maze is, the more beans the maze has, and therefore the higher the difficulty for a beginner to eat all beans. On the other hand, some game testers argue that the bigger the maze is, the easier it is for a beginner to dodge the chasing monster. However, to the best of our knowledge, there are still no sufficient quantitative testing results to validate this argument.
The testing result is demonstrated in Figure 19, where three different maze sizes, i.e., a small maze (3 rows with 9 columns), a median maze (5 rows with 11 columns), and a large maze (7 rows with 11 columns), shown in Figure 18, are applied in this testing. The training times of the small maze, median maze, and large maze to eat all beans are about 4000 games, 5000 games, and 16,000 games, respectively. This result supports the argument that the bigger the maze is, the more difficult it becomes. It also shows that the maze difficulty might be proportional to the maze size, to a certain extent, for the Pacman game. However, it should be noted that the difficulty of a game depends on the game design itself and may not be proportional to the size of the playground in other games.



Figure 18. Three different sizes of maze; (a) small maze; (b) median maze; (c) large maze.

Figure 19. Leveling difficulty of different sizes of mazes.

5.3. The Leveling of the Snake Game

In this experiment, we evaluate the leveling of the Snake Game [13]. The basic Snake game is that a sole player attempts to eat beans by running into them with the head of the snake, as shown in Figure 20. Each bean eaten makes the snake longer, so controlling the snake becomes progressively more difficult. The increasing difficulty of eating one more bean is evaluated by LSTM-Testing in this subsection. LSTM-Testing uses the AI learning time as the dependent variable to quantitatively evaluate the Snake Game leveling design. That is, we observe how many games LSTM-Testing needs before it has learned to eat one bean, two beans, and three beans, respectively. Figure 21 demonstrates that LSTM-Testing needs about 10,000 games to learn how to eat the first bean, given that a success rate of 80% means LSTM-Testing has learned the skill. Then, LSTM-Testing needs about 20,000 games to learn to eat two beans, and about 50,000 games to eat three beans. Therefore, this evaluation result supports the hypothesis that each level of the Snake Game is roughly two times as difficult as the previous level.
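As a quick sanity check on this reading of Figure 21, the reported learning times can be compared directly. The numbers below are simply the approximate values quoted above; the ratio between consecutive levels is what motivates the "roughly two times per additional bean" hypothesis.

```python
# Approximate learning times read from Figure 21 (games until an 80% success rate).
learning_time = {1: 10_000, 2: 20_000, 3: 50_000}

for beans in (2, 3):
    ratio = learning_time[beans] / learning_time[beans - 1]
    print(f"{beans - 1} -> {beans} beans: {ratio:.1f}x the previous learning time")
# Prints 2.0x and 2.5x, i.e., each additional bean roughly doubles the learning time.
```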

Figure 20. Snake Game.

Figure 21. Evaluation of Snake Game Leveling.

5.4. The Impact of Picture Background on the Game Difficulty: Encounter Unnecessary Trap

We design an experiment to observe the impact of a picture background on game difficulty. The motive is that using a picture as the game background may increase the game difficulty, because game objects can be obscured by the background and cause gameplay mistakes. However, there is no sufficient quantitative analysis to evaluate how much game difficulty a picture background adds. This experiment, therefore, uses the ping-pong game as a case study and applies LSTM-Testing to observe the scale of the difficulty increase caused by a picture background.
The experiment, as shown in Figure 22, uses a simple ping-pong game without a picture background as the control group (Figure 22a) and a simple ping-pong game with a picture background as the experiment group (Figure 22b). Then, we observe how many balls have to come down before LSTM-Testing has learned to return the ball.






Figure 22. Game with different background; (a) game without picture background; (b) game with picture background.

Figure 23 demonstrates that the game without a picture background needs about 35,000 balls to learn how to return the ball, given that a 90% success rate of returning the ball means LSTM-Testing has learned the skill. However, the game with a picture background needs about 43,000 balls to learn how to return the ball, which means the picture used in this test increases the difficulty by roughly 23% (43,000/35,000 ≈ 1.23). This is because LSTM-Testing encounters an unnecessary trap: it cannot identify the white ball when the ball passes a light-colored region of the playground, such as the position near the nose of the dog in the background picture. Therefore, this evaluation result supports the hypothesis that a picture background does increase the game difficulty.

Figure 23. Impact of picture background on game difficulty.

This experiment further evaluates the effect of the learning strategy on returning a ball. The learning strategy determines which gameplay decisions are kept in the short-term memory queue for periodic reviewing and, on the other hand, which gameplay decisions are dropped from the queue. Three review strategies are evaluated in this experiment: (1) reviewing the newest decisions, which is the default setting; (2) reviewing the wrong decisions; and (3) reviewing the correct decisions. Reviewing the newest decisions means that every decision made is kept in the short-term memory queue and the oldest decision is dropped when the queue is full. Reviewing the wrong decisions means that LSTM-Testing keeps the wrong decisions in the short-term memory queue and drops the right decisions, i.e., those that return the ball, when the queue is full. In contrast to reviewing the wrong decisions, reviewing the correct decisions keeps the right decisions in the queue and drops the wrong decisions.
The effect of the learning strategy on returning the ball, as shown in Table 1, indicates that reviewing the wrong decisions is the most effective strategy for the ping-pong game. This strategy requires about 33,000 training iterations, which is the fewest needed to reach a 90% success rate of returning the ball. The reason is that, for the ping-pong game, the bar that returns the ball is much narrower than the bottom of the playground, so in a random fresh start the opportunities to make a correct decision, i.e., moving the bar to catch and return the ball, are far fewer than the opportunities to lose the ball. A strategy that keeps only correct decisions therefore does not provide sufficient learning opportunities, because the short-term memory rarely holds enough samples of correct decisions.
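As a concrete reading of these three strategies, they can be thought of as one bounded queue with a strategy-dependent eviction rule. The sketch below is only an illustration of that reading, not the authors' implementation; the decision record layout, the queue length, and the function name are assumptions.

```python
from collections import deque

# Illustrative sketch of the three review strategies. A decision is assumed to be
# a (state, action, was_correct) record produced during one gameplay step.
def remember(queue, decision, strategy, maxlen=1000):
    """Add a decision to the short-term memory queue, evicting by `strategy`."""
    if len(queue) >= maxlen:
        if strategy == "newest":
            queue.popleft()                      # drop the oldest decision
        else:
            keep_correct = (strategy == "correct")
            # drop the oldest decision of the kind this strategy does NOT review
            for i, (_, _, was_correct) in enumerate(queue):
                if was_correct != keep_correct:
                    del queue[i]
                    break
            else:                                # queue holds only reviewed decisions
                queue.popleft()
    queue.append(decision)

short_term_memory = deque()
remember(short_term_memory, ("frame_0042", "move_left", False), strategy="wrong")
```

Under this reading, the "review wrong" strategy keeps the queue filled with the failure cases that are abundant early in training, which is consistent with the explanation above.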



Table 1. The effect of the learning strategy on returning the ball.

Review Strategy | (Default) Review Newest Decision | Review Wrong Decision | Review Correct Decision
Training iterations to reach 90% of successfully returned balls (unit: k) | 35 | 33 | 42

6. Conclusions

The proposed LSTM-Testing is capable of objectively and quantitatively evaluating the reasonability of casual game designs. The experiment results show that the game learning rate similarity between Manual-Testing and LSTM-Testing is 97.60%, which supports that LSTM-Testing can replace manual testing to a certain extent. As the experiments in Sections 4 and 5 demonstrate, the main advantages of LSTM-Testing over manual testing are: (1) quantitatively evaluating the difficulty of different game designs, (2) ensuring a reasonable leveling path design, and (3) validating whether the game difficulty is proportional to the level scale. Currently, LSTM-Testing is only applied to testing casual games. This study will extend the proposed LSTM-Testing to hard-core game testing and to strategy training for decision making in project management [13]. On the other hand, further study on the effectiveness of the AI software testing procedure for multiplayer games is still required.

Author Contributions: Conceptualization, Y.-H.C.; methodology, Y.-H.C. and S.-F.C.; software, Y.-H.C.; validation, L.-K.C. and S.-C.C.; formal analysis, L.-K.C. and S.-F.C.; investigation, S.-F.C.; resources, Y.-H.C.; data curation, L.-K.C.; writing—original draft preparation, Y.-H.C. and L.-K.C.; writing—review and editing, S.-F.C. and S.-C.C.; visualization, S.-F.C.; supervision, Y.-H.C.; project administration, Y.-H.C. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Clairfield International. Gaming Industry-Facts, Figures and Trends. 2018. Available online: http://www.clairfield.com/wp-content/uploads/2017/02/Gaming-Industry-and-Market-Report-2018.01-2.pdf (accessed on 23 June 2020).
2. Lu, J.-Y.; Dai, W.-S.; Tang, Y.-X.; Su, A.-S. Game app life cycle and user immersion experience. In Proceedings of the Conference on Information Technology and Applications in Outlying Islands, Kaohsiung, Taiwan, 22–23 May 2016. Available online: http://csie.npu.edu.tw/Seminar/15/15/paper/P1-35.pdf (accessed on 23 June 2020).
3. Becares, J.H.; Valero, J.C.; Martin, P.P.G. An approach to automated videogame beta testing. Entertain. Comput. 2017, 18, 79–92. [CrossRef]
4. Cruz, C.A.; Uresti, J.A.R. Player-centered game AI from a flow perspective: Towards a better understanding of past trends and future directions. Entertain. Comput. 2017, 20, 11–24. [CrossRef]
5. Silva, M.P.; Silva, V.d.N.; Chaimowicz, L. Dynamic difficulty adjustment on MOBA games. Entertain. Comput. 2017, 18, 103–123. [CrossRef]
6. Wheat, D.; Masek, M.; Lam, C.P.; Hingston, P. Dynamic difficulty adjustment in 2D platformers through agent-based procedural level generation. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; pp. 2778–2785.
7. Nordin, M.; King, D.; Posthuma, S. But is it fun? Software testing in the , keynote speech. In Proceedings of the 11th IEEE Conference on Software Testing, Validation and Verification (ICST 2018), Västerås, Sweden, 10–12 April 2018. Available online: http://www.es.mdh.se/icst2018/keynotes/#fun (accessed on 23 June 2020).
8. Blitz Games Studios. Project Lifecycle. 2011. Available online: http://www.blitzgamesstudios.com/blitz_academy/game_dev/project_lifecycle (accessed on 23 June 2020).
9. Raghavendra, U.; Fujita, H.; Bhandary, S.V.; Gudigar, A.; Tan, J.H.; Acharya, U.R. Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images. Inf. Sci. 2018, 441, 41–49. [CrossRef]
10. Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 23 July 2015.
11. Arcade History. Pacman. Available online: https://www.arcade-history.com/?page=detail&id=1914 (accessed on 23 June 2020).
12. Tan, P.-N.; Steinbach, M.; Kumar, V. Introduction to Data Mining; Addison-Wesley: Boston, MA, USA, 2005; Chapter 8; ISBN 0-321-32136-7.
13. Chen, H.-M.; Wei, K.-W. Application of model on simulation training for decision makings of project management. Proc. Eng. Technol. Innov. 2017, 5, 25–30. Available online: http://ojs.imeti.org/index.php/PETI/article/view/933 (accessed on 23 June 2020).

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).