International Journal of Research Trends in Computer Science & Information Technology (IJRTCSIT) [ISSN: 2455-6513], Volume 6, Issue 2, December 2020

A Review on GPT-3: An AI Revolution from OpenAI

Naman Mishra¹, Priyank Singhal², Shakti Kundu³
Teerthanker Mahaveer University, Moradabad, UP, India
[email protected], [email protected], [email protected]

Abstract—
When we talk about AI, we think of machines that have the ability to think and to make decisions on their own without any human interference. To do so, one thing has been extremely important: how well a machine is able to process and generate language. Until recently we relied on small-scale statistical models or formalized grammar systems. Over the last few years we have seen better models come out, be it ELMo, BERT, or the GPT-n series by OpenAI. In this paper we look at large-scale statistical models, with a focus on GPT-3, currently the largest model in the world, and try to understand their impact on the world of AI.

Keywords— NLP, AI, GPT-3, OpenAI, Turing-NLG

I. INTRODUCTION

Artificial intelligence is what helps machines learn over time and adjust to new inputs without human interference. John McCarthy defined Artificial Intelligence as: "It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable." Simply stated, AI refers to man's pursuit to build machines that can reason, learn, and act intelligently.

Some of artificial intelligence's recent achievements show how far AI has come: some are great showcase achievements, while others have injected themselves very smoothly into our daily regime. In between these extremes, AI programs have become vital tools in the fields of science and commerce. Even though not every AI achievement makes it to the mainstream media, there have been some amazing developments in AI, and they have in some small way become a part of how technology works today. Voice assistants like Amazon Alexa or Google Home are the best examples of AI-backed products around us.

The world of Artificial Intelligence is changing rapidly, and every day we are seeing great developments in this sector. Some major events of the past five years were Google DeepMind's AI learning to walk, or Google's AlphaGo beating a grandmaster in 2017 at the Chinese game of Go, considered one of the most complicated games in the world. Even though great work is being done, it would be fair to say we are still pretty much in the early stages of Artificial Intelligence.

II. Background

There has been curiosity as to whether humans will even be writing code in the future[1]. Having a good language model would be the first step in the direction of no-code. We look into the recent developments in the world of Artificial Intelligence, mainly in the Natural Language Processing domain, and start with the background on some major breakthrough models: ELMo, BERT, OpenAI GPT-2, and Microsoft Research's Turing-NLG.

Embeddings from Language Models, popularly known as ELMo, uses, as the name suggests, language models (LMs) to create deeply contextualized word embeddings. ELMo uses a bidirectional language model (biLM), pre-trained on a large text corpus, to learn both characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., to model polysemy). The biLM captures context-dependent aspects of word meaning.

Fig 1: A comparison of the BERT, GPT, and ELMo models.
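To make the biLM idea concrete, the following minimal PyTorch sketch (our illustration, with toy dimensions and token ids; real ELMo additionally uses character convolutions and a learned weighting over multiple LSTM layers) shows how a forward and a backward LSTM can be combined so that every word's vector depends on its whole sentence:

```python
import torch
import torch.nn as nn

# Toy bidirectional language model in the spirit of ELMo's biLM.
vocab_size, embed_dim, hidden_dim = 1000, 64, 64

embeddings = nn.Embedding(vocab_size, embed_dim)
forward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # reads left-to-right
backward_lm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # reads right-to-left

token_ids = torch.tensor([[12, 7, 431, 56]])  # one 4-word sentence (toy ids)
x = embeddings(token_ids)

fwd_out, _ = forward_lm(x)                          # context from the words before
bwd_out, _ = backward_lm(torch.flip(x, dims=[1]))   # context from the words after
bwd_out = torch.flip(bwd_out, dims=[1])

# Each word's vector now depends on its whole sentence, so the same
# word gets different embeddings in different contexts (polysemy).
contextual_embeddings = torch.cat([fwd_out, bwd_out], dim=-1)
print(contextual_embeddings.shape)  # torch.Size([1, 4, 128])
```

Because the two directions are concatenated, a word like "bank" in "river bank" and in "bank loan" receives different vectors, which is exactly the context-dependent behaviour described above.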
Bidirectional Encoder Representations from Transformers, or BERT, by Google Research is, as is apparent from the name, based on bidirectional representations learned from unlabeled text by jointly conditioning on both left and right context in all layers. Because of this, BERT is often hailed as one of the most exciting NLP developments in years.

BERT makes use of the Transformer, an attention mechanism that learns contextual relations between words (or sub-words) in a text. In its vanilla form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task.

OpenAI GPT-2 is the second version in the GPT-n series of the California-based artificial intelligence research laboratory OpenAI. "GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text", following the approach of generative pre-training of a language model on a diverse corpus of unlabelled text, followed by discriminative fine-tuning on each specific task[8]. A complete version of GPT-2 was released in November 2019.
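The "predict the next word" objective can be demonstrated directly. The sketch below loads the publicly released GPT-2 weights through the Hugging Face transformers library (our choice of toolkit for illustration; the model itself is toolkit-agnostic) and extends a prompt by repeated next-word prediction:

```python
# pip install torch transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Generation is repeated next-word prediction: the model keeps
# predicting a likely continuation of everything written so far.
inputs = tokenizer("Artificial intelligence will", return_tensors="pt")
output_ids = model.generate(
    inputs["input_ids"],
    max_length=30,                        # total output length in tokens
    do_sample=True,                       # sample instead of always taking the top word
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # silence the padding warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```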
Unveiled in February 2020, Turing-NLG is a Transformer-based generative language model with the ability to complete open-ended textual tasks. Not only can it finish an open-ended conversation, it can give direct and exact answers to questions. Its most useful ability is that, given a myriad of sentences or large documents, it can generate their summaries. At the time of launch this was the largest language model, with 17.5 billion learned parameters.

Generative models like Turing-NLG are vital for Natural Language Processing tasks, as the objective is to respond as directly, accurately, and fluently as a human being would in a given scenario. Previously, many models for question answering and summarisation relied on extracting existing content from documents that could serve as a stand-in answer or "summary", but more often than not the results read unnatural and unclear. Turing-NLG, in contrast, can answer questions, reply to emails, or summarize in a natural way, as a human would. Turing-NLG was the largest model ever published, with its 17.5 billion parameters, but it was dethroned by OpenAI's latest GPT-3 model, trained with 175 billion ML parameters[13]. We discuss GPT-3 in detail in the next section.

III. Discussion

Generative Pre-trained Transformer 3 (GPT-3) is part of the GPT-n series developed by OpenAI. In simple words, GPT-3 can be described as an autoregressive language model that uses deep learning to generate human-like text. GPT-3 is a much larger neural network in terms of parameters than its predecessor GPT-2, which had 1.5 billion parameters against GPT-3's 175 billion ML parameters, and it was trained on hundreds of billions of words. It is currently the largest Natural Language Processing transformer, taking the crown from Microsoft's Turing-NLG with its 17 billion parameters[3].

Basically, GPT-3 lets humans communicate with machines in simple English rather than having to learn complicated programming languages. By this we mean that just by describing in English what we want the machine to do, we can get it to do that task, be it writing code, completing our sentences, creating a simple website, writing an article, or generating images. Since the launch, researchers have been coming up with unique ways the model can be used. One Twitter user was able to "interview" Albert Einstein using the model[12].

As of July 2020, GPT-3 is made available to researchers and interested people via a private beta program. It is currently offered as an API which can be accessed through the cloud, and so far those who have got their hands on it have made some very interesting products that use the capabilities of GPT-3, such as search engines, medical question answering systems, and much more. Some examples of what people have built:

4. Arram Sabeti used GPT-3 to generate poems about Elon Musk the way they would have been written by Dr. Seuss.
5. Bemmu Sepponen generated an entire presentation using the GPT-3 model.

These were just a few of the hundreds of ways people have been able to put GPT-3 to use[6]. Newer use-case scenarios keep coming up, like being able to generate an image just by describing it in text. But there has been some scepticism regarding the model too; in the next section we will see some of the issues that have been raised.

IV. How it works?

GPT-3 stands for "generative pre-training," and the three specifies the third version. It is generative because, as opposed to other neural networks that give a numeric score or a yes-or-no answer, GPT-3 can generate long sequences of original text as its output[7]. It is pre-trained in the sense that it has not been built with any domain knowledge, even though it can complete domain-specific tasks such as foreign-language translation. "A language model, in the case of GPT-3, is a program that calculates how likely one word is to appear in a text given the other words in the text."

V. Problems with GPT-3

Ever since GPT-3 was offered for beta testing, it has created a lot of buzz among people and the media, with its capabilities and people finding out all that can be done with the model. As goes with all tech, along with all the good come some issues: concerns such as AI lacking common sense[4] were raised by the media.
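To make the language-model definition quoted in Section IV concrete, here is a toy bigram model (our illustration only; GPT-3 estimates such probabilities with a 175-billion-parameter neural network rather than by counting word pairs) that computes how likely one word is to follow another:

```python
from collections import Counter, defaultdict

# Toy corpus; a real language model is trained on billions of words.
corpus = "the model reads the text and the model writes text".split()

# Count how often each word follows each other word (bigrams).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(prev):
    """Estimate P(next word | previous word) from the counts."""
    counts = follows[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))
# {'model': 0.666..., 'text': 0.333...}
```

GPT-3 performs the same basic calculation, how likely each word is given the surrounding text, but conditions on far longer contexts and learns the probabilities from its training data instead of raw counts.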