Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Mohammad Shoeybi 1,2   Mostofa Patwary 1,2   Raul Puri 1,2   Patrick LeGresley 2   Jared Casper 2   Bryan Catanzaro 2

1 Equal contribution. 2 NVIDIA. Correspondence to: Mohammad Shoeybi <[email protected]>.

arXiv:1909.08053v4 [cs.CL] 13 Mar 2020

Abstract

Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. Our approach does not require a new compiler or library changes, is orthogonal and complementary to pipeline model parallelism, and can be fully implemented with the insertion of a few communication operations in native PyTorch. We illustrate this approach by converging transformer based models up to 8.3 billion parameters using 512 GPUs. We sustain 15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT. We show that careful attention to the placement of layer normalization in BERT-like models is critical to achieving increased performance as the model size grows. Using the GPT-2 model we achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%) datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9% compared to SOTA accuracy of 89.4%).

1. Introduction

Natural Language Processing (NLP) is advancing quickly in part due to an increase in available compute and dataset size. The abundance of compute and data enables training increasingly larger language models via unsupervised pretraining (Devlin et al., 2018; Radford et al., 2019). Empirical evidence indicates that larger language models are dramatically more useful for NLP tasks such as article completion, question answering, and natural language inference (Lan et al., 2019; Raffel et al., 2019). By finetuning these pretrained language models on downstream natural language tasks, one can achieve state of the art results as shown in recent work (Devlin et al., 2018; Peters et al., 2018; Howard & Ruder, 2018; Radford et al., 2018; 2017; Ramachandran et al., 2016; Liu et al., 2019b; Dai et al., 2019; Yang et al., 2019; Liu et al., 2019a; Lan et al., 2019).

As these models become larger, they exceed the memory limit of modern processors and require additional memory management techniques such as activation checkpointing (Chen et al., 2016). Widely used optimization algorithms such as ADAM require additional memory per parameter to store momentum and other optimizer state, which reduces the size of models that can be effectively trained. Several approaches to model parallelism overcome this limit by partitioning the model such that the weights and their associated optimizer state do not need to reside concurrently on the processor. For example, GPipe (Huang et al., 2018) and Mesh-Tensorflow (Shazeer et al., 2018) provide frameworks for model parallelism of different kinds. However, they require rewriting the model, and rely on custom compilers and frameworks that are still under development.
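As a rough illustration of the activation checkpointing mentioned above (not the paper's implementation), the sketch below uses PyTorch's torch.utils.checkpoint to recompute a block's intermediate activations during the backward pass instead of storing them; the module, layer sizes, and names are hypothetical.

```python
# Minimal sketch of activation checkpointing (illustrative only).
# Requires PyTorch >= 1.11 for the use_reentrant keyword.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLPStack(nn.Module):
    """Illustrative stack of MLP blocks trained with activation checkpointing."""
    def __init__(self, hidden: int = 1024, num_blocks: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden, 4 * hidden),
                nn.GELU(),
                nn.Linear(4 * hidden, hidden),
            )
            for _ in range(num_blocks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Intermediate activations inside `block` are discarded in the
            # forward pass and recomputed during the backward pass,
            # trading extra compute for a smaller memory footprint.
            x = checkpoint(block, x, use_reentrant=False)
        return x

if __name__ == "__main__":
    x = torch.randn(4, 1024, requires_grad=True)
    CheckpointedMLPStack()(x).sum().backward()
```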
In this work, we implement a simple and efficient model parallel approach using intra-layer model parallelism. We exploit the inherent structure in transformer based language models to make a simple model-parallel implementation that trains efficiently in PyTorch, with no custom C++ code or compiler required. This approach is orthogonal to pipeline-based model parallelism as advocated by approaches such as GPipe (Huang et al., 2018).
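The "few communication operations" in native PyTorch amount to autograd functions that are no-ops in one direction and collectives in the other. The following is a minimal sketch of one such operation (identity in the forward pass, all-reduce of the gradient in the backward pass); the names are illustrative rather than Megatron-LM's released API, and the sketch assumes torch.distributed has been initialized with one process per model-parallel GPU.

```python
# Sketch of an intra-layer model-parallel communication op: identity on
# the forward pass, all-reduce of the gradient on the backward pass.
# Assumes torch.distributed is initialized (e.g. via init_process_group);
# class and function names are illustrative.
import torch
import torch.distributed as dist

class _CopyToModelParallelRegion(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # No communication on the way in: every rank already holds the
        # full activation that feeds its local weight shard.
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Each rank produced only a partial gradient w.r.t. the shared
        # input; summing across ranks reconstructs the full gradient.
        dist.all_reduce(grad_output, op=dist.ReduceOp.SUM)
        return grad_output

def copy_to_model_parallel_region(x):
    return _CopyToModelParallelRegion.apply(x)
```

A companion operation typically does the reverse at the output of the parallel region: an all-reduce in the forward pass and an identity in the backward pass.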
To demonstrate the scalability of our approach, we establish a baseline by training a model of 1.2 billion parameters on a single NVIDIA V100 32GB GPU that sustains 39 TeraFLOPs. This is 30% of the theoretical peak FLOPS for a single GPU as configured in a DGX-2H server, and is thus a strong baseline. Scaling the model to 8.3 billion parameters on 512 GPUs with 8-way model parallelism, we achieve up to 15.1 PetaFLOPs sustained over the entire application. This is 76% scaling efficiency compared to the single GPU case (512 GPUs at the 39 TeraFLOPs baseline would yield roughly 20 PetaFLOPs under perfect scaling, and 15.1 / 20.0 ≈ 0.76). Figure 1 shows more detailed scaling results.

Figure 1. Model (blue) and model+data (green) parallel FLOPS as a function of number of GPUs. Model parallel (blue): up to 8-way model parallel weak scaling with approximately 1 billion parameters per GPU (e.g. 2 billion for 2 GPUs and 4 billion for 4 GPUs). Model+data parallel (green): similar configuration as model parallel combined with 64-way data parallelism.

To analyze the effect of model size scaling on accuracy, we train both left-to-right GPT-2 (Radford et al., 2019) language models as well as BERT (Devlin et al., 2018) bidirectional transformers and evaluate them on several downstream tasks. We show that the existing BERT architecture results in model degradation as the size increases. We overcome this challenge by rearranging the layer normalization and residual connection in the transformer layers and show that with this change, results for the downstream tasks on development sets improve monotonically as the model size increases. In addition, we show that our models achieve test set state of the art (SOTA) results on the WikiText103, cloze-style prediction accuracy on LAMBADA, and reading comprehension RACE datasets.

In summary, our contributions are as follows:

• We implement a simple and efficient model parallel approach by making only a few targeted modifications to an existing PyTorch transformer implementation.

• We perform an in-depth empirical analysis of our model and data parallel technique and demonstrate up to 76% scaling efficiency using 512 GPUs.

• We show that careful attention to the placement of layer normalization in BERT-like models is critical to achieving increased accuracies as the model grows.

• We demonstrate that scaling the model size results in improved accuracies for both GPT-2 (studied up to 8.3 billion parameters) and BERT (studied up to 3.9 billion parameters) models.

• We showcase that our models achieve state of the art results on test sets: perplexity on WikiText103 (10.8 ppl), accuracy on LAMBADA (66.5%), and accuracy on RACE (90.9%).

• We open source our code along with the training and evaluation pipelines at https://github.com/NVIDIA/Megatron-LM

2. Background and Challenges

2.1. Neural Language Model Pretraining

Pretrained language models have become an indispensable part of NLP researchers' toolkits. Leveraging large corpus pretraining to learn robust neural representations of language is an active area of research that has spanned the past decade. Early examples of pretraining and transferring neural representations of language demonstrated that pretrained word embedding tables improve downstream task results compared to word embedding tables learned from scratch (Mikolov et al., 2013; Pennington et al., 2014; Turian et al., 2010). Later work advanced research in this area by learning and transferring neural models that capture contextual representations of words (Melamud et al., 2016; McCann et al., 2017; Peters et al., 2018; Radford et al., 2017; 2019). Recent parallel work (Ramachandran et al., 2016; Howard & Ruder, 2018; Radford et al., 2018; Devlin et al., 2018; Liu et al., 2019b; Dai et al., 2019; Yang et al., 2019; Liu et al., 2019a; Lan et al., 2019) further builds upon these ideas by not just transferring the language model to extract contextual word representations, but by also finetuning the language model in an end-to-end fashion on downstream tasks. Through these works, the state of the art has advanced from transferring just word embedding tables to transferring entire multi-billion parameter language models. This progression of methods has created the need for hardware, systems techniques, and frameworks that are able to operate efficiently at scale and satisfy increasing computational needs. Our work aims to provide the tools necessary to take another step forward in this trend.

2.2. Transformer Language Models and Multi-Head Attention

Current work in NLP trends towards using transformer models (Vaswani et al., 2017) due to their superior accuracy [...]

Figure 2. Transformer Architecture. Purple blocks correspond to fully connected layers. Each blue block represents a single transformer layer that is replicated N times.

[...] mitigate these effects and drive down the training time of large neural networks. To scale out training even further, parallel work (Chen et al., 2016) has combined data parallelism with activation checkpointing: recomputing activations in the backward pass without storing them in the forward pass to reduce memory requirements. However, these techniques have one fundamental limitation in the problem size they can tackle: the model must fit entirely on one worker. With language models of increasing size and complexity like BERT and GPT-2, neural networks have approached the memory capacity of modern hardware accelerators. One solution to this problem is to employ parameter sharing to reduce the memory footprint of the model (Lan et al., 2019), but this limits the overall capacity of the model. Our approach is to utilize model parallelism to split the model across multiple accelerators. This not [...]
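To make "splitting the model across multiple accelerators" concrete, here is a minimal sketch of a linear layer whose weight matrix is partitioned column-wise across model-parallel ranks, so each GPU stores and updates only its own shard. The class name and sharding choice are illustrative assumptions, not the paper's implementation; a full version would pair this with autograd-aware communication ops like the one sketched earlier.

```python
# Sketch: a linear layer sharded column-wise across model-parallel ranks.
# Each rank holds only out_features // world_size output columns, so the
# weights and their optimizer state never have to fit on a single GPU.
# Assumes torch.distributed is initialized; names are illustrative.
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnShardedLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        world_size = dist.get_world_size()
        assert out_features % world_size == 0, "output dim must split evenly"
        # Local shard: full input dimension, a slice of the output dimension.
        self.local = nn.Linear(in_features, out_features // world_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every rank sees the full input and computes its slice of the
        # output; a gather (or an autograd-aware collective) is needed
        # wherever the next layer requires the full activation.
        return self.local(x)
```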