Probabilistic Text Generation with HashMap<K, V> and java.util.ArrayList<E>


Probabilistic Text Generation with HashMap and java.util.ArrayList

You are asked to complete class RandomWriterWithHashMap.java, which provides a random writing application. It must use java.util.HashMap to store all possible nGrams (short strings) as keys, each mapped to the list of all possible following characters as the value. Your program file is to be turned in to the D2L dropbox named RandomWrite. While testing your code, use any very small input file you want or an entire book from Project Gutenberg.

You must use the java.util.HashMap class, java.util.ArrayList, and the following algorithm:

1. (This is done) Read all file input into one big string. This is already done in RandomWriterWithHashMap.java, which actually uses a StringBuilder object to save time. StringBuilder has all the methods of String plus a fast append method.
2. Complete method setUpMap() so that it initializes the HashMap instance variable all to hold every possible nGram of the given nGram length as a key, with an ArrayList as the value for each mapping. These ArrayList values must contain all characters that follow each key in the given text input.
3. (This is done) Pick a random nGram from the original text. This is known as nGram.
4. Complete method printRandom(int howMany) so that it prints howMany characters with this algorithm:
   · Get the list of all the characters that follow the nGram (the instance variable nGram is set for you initially) from all (the HashMap instance variable you built in setUpMap).
   · Randomly select one of the characters from that list of followers.
   · Print that random character.
   · Change the nGram so that the first character is gone and the just-printed random character is appended (nGram must be the same length at the end).
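The steps above can be tried out on a tiny in-memory string before touching the starter program. The following is only a rough sketch under stated assumptions, not the required solution: the class name NGramSketch, the method names buildFollowers and generate, and the decision to stop early when an nGram has no recorded follower are all hypothetical and do not come from the handout. It also assumes an nGram length of at least 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;

// Hypothetical demo class; none of these names appear in the starter code.
class NGramSketch {

    // Map every nGram of length n in text to the list of characters that follow it.
    // Duplicates are kept on purpose, so frequent followers are picked more often.
    static HashMap<String, ArrayList<Character>> buildFollowers(CharSequence text, int n) {
        HashMap<String, ArrayList<Character>> followers = new HashMap<>();
        // Stop while there is still one character after the nGram to record.
        for (int i = 0; i + n < text.length(); i++) {
            String key = text.subSequence(i, i + n).toString();
            followers.computeIfAbsent(key, k -> new ArrayList<>()).add(text.charAt(i + n));
        }
        return followers;
    }

    // Produce up to howMany characters, starting from seed (an nGram taken from the text).
    static String generate(HashMap<String, ArrayList<Character>> followers,
                           String seed, int howMany, Random rng) {
        StringBuilder out = new StringBuilder();
        String nGram = seed;
        for (int i = 0; i < howMany; i++) {
            ArrayList<Character> choices = followers.get(nGram);
            if (choices == null) {
                break; // Assumption: stop if this nGram only occurs at the very end of the text.
            }
            char next = choices.get(rng.nextInt(choices.size()));
            out.append(next);
            // Drop the first character and append the new one; the length stays the same.
            nGram = nGram.substring(1) + next;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "the theme of the thesis is the thing ";
        HashMap<String, ArrayList<Character>> followers = buildFollowers(text, 2);
        System.out.println("Followers of \"th\": " + followers.get("th"));
        System.out.println(generate(followers, "th", 40, new Random()));
    }
}
```

Storing repeated followers in the ArrayList, rather than a set of distinct characters, is what makes the random choice reflect how often each character follows the nGram in the input.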

Start with the following program and complete the methods setUpMap and printRandom:

// This program generates 500 characters of probabilistic text.
// When the nGram length is small, as in 2 or 3, the text should
// be very random. When the nGram length is 5, words should appear.
// At 14 or more, you are probably getting exact sentences from the book.
//
// Warning: A big input file and a large nGram length can run your computer out of memory.
//
// @author YOUR NAME
//
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Random;
import java.util.Scanner;

public class RandomWriterWithMap {

  public static void main(String[] args) {
    Scanner keyboard = new Scanner(System.in);
    System.out.print("Enter the file name to random write: ");
    String fileName = keyboard.nextLine();

    System.out
        .print("Enter nGram length, 1 is like random, 12 is like the book: ");
    int nGramLength = keyboard.nextInt();
    keyboard.close();

    RandomWriterWithMap rw = new RandomWriterWithMap(fileName, nGramLength);
    rw.printRandom(500);
  }

  // Maps each nGram to the list of characters that follow it in theText.
  private HashMap<String, ArrayList<Character>> all;
  private int nGramLength;
  private String fileName;
  private StringBuilder theText;
  private static Random generator;
  private String nGram;

  public RandomWriterWithMap(String fileName, int nGramLength) {
    this.fileName = fileName;
    this.nGramLength = nGramLength;
    generator = new Random();
    makeTheText();
    setRandomNGram();
    setUpMap(); // Algorithm considered during section.
  }

  // Read the whole input file into theText, one space between lines.
  private void makeTheText() {
    Scanner inFile = null;
    try {
      inFile = new Scanner(new File(fileName));
    } catch (FileNotFoundException e) {
      e.printStackTrace();
    }
    theText = new StringBuilder();
    while (inFile.hasNextLine()) {
      theText = theText.append(inFile.nextLine().trim());
      theText = theText.append(' ');
    }
  }

  // Pick a random starting nGram from the original text.
  public void setRandomNGram() {
    generator = new Random();
    int start = generator.nextInt(theText.length() - nGramLength - 1);
    nGram = theText.substring(start, start + nGramLength);
  }

  // Read theText char by char to build a HashMap where
  // every possible nGram exists with the list of followers.
  // This method needs these three instance variables:
  //   nGramLength  theText  all
  private void setUpMap() {
    // TODO: Implement this method
  }

  // Print howMany random characters. Please insert line breaks to make your
  // output readable to the poor grader :-)
  void printRandom(int howMany) {
    // TODO: Implement this method
  }
}
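The starter code asks for line breaks in the printed output but does not say how to add them. One possible approach, shown here as a small sketch, is a hard wrap at a fixed column. The class name WrapSketch, the method name wrap, and the width parameter are hypothetical and not part of the handout. This version breaks in the middle of words; breaking at the next space instead is a reasonable variation.

```java
// Hypothetical helper; not part of the starter code.
class WrapSketch {

    // Return text with a newline inserted after every width characters.
    // No newline is added after the final character.
    static String wrap(String text, int width) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            out.append(text.charAt(i));
            boolean endOfLine = (i + 1) % width == 0;
            boolean moreToCome = i + 1 < text.length();
            if (endOfLine && moreToCome) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("the theme of the thesis is the thing", 12));
    }
}
```

The same counting idea works when printing one character at a time: keep a running count of printed characters and emit a newline whenever the count reaches a multiple of the chosen width.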

Grading Criteria (50 pts max). Turn this in to the D2L Drop Box RandomWrite.

___/ +50 Generates text that gets closer to the original as the nGram length increases (subjective). For example, when the nGram length is 2, a few words may appear; when it is 12, some sentences appear close to the original text.
· -50 If no text is generated with a printRandom(500) message
· -40 If you did not use a HashMap and the algorithm presented in section that uses a Map to set up all nGrams and the list of followers for each before printing
· -40 If the text has no apparent differences with different nGram lengths or file input
· -40 If the output does not seem reasonable
