
CS 541 — Fall 2021 Programming Assignment 2 CSX_go Scanner Your next project step is to write a scanner module for the programming language CSX_go (Computer Science eXperimental: Go-like syntax). Use the JFlex scanner- generation tool ( ased on !ex). "uture assignments will invol#e a $%& parser' type checker and code generator. The CSX Scanner Generate the $%&(go scanner' a mem er of class Yylex' using JFlex. Your main task is to create the )le csx_go.jflex' the input to *"lex. +he jflex )le specifies the regular expression patterns for all the $%&(go tokens' as well as any special processing re,uired y tokens. When a #alid $%& token is matched y mem er function yylex()' it returns an o ject that is an instance of class java_cup.runtime.Symbol (the class our parser ex- pects to receive from the scanner). Symbol contains an integer )eld sym that identifies the token class just matched. .ossible #alues of sym are identified in the class sym1. Symbol also contains a )eld value,which contains token information eyond the token’s identity. "or $%&(go' the value )eld references an instance of class CSXToken (or a su class of CSXToken). CSXToken contains the line num er and column num er at which each token was found. +his information is necessary to frame high-,uality error messages. +he line num er on which a token appears is stored in linenum. +he column num er at which a token egins is stored in colnum. +he column num er counts tabs as one character' e#en though they expand into se#eral lanks when #iewed. You must also store auxiliary information for identifiers' integer literals' character literals and string literals. "or identifiers' class CSXIdentifierToken' a su class of CSXToken' contains the identifier/s name in )eld identifierText. "or integer liter- als' class CSXIntLitToken' a su class of CSXToken' contains the literal/s numeric 1 Java class names normally are capitalized. However, certain classes created by the tool Java CUP ignore this convention. #alue in )eld intValue. "or character literals' class CSXCharLitToken' a su class of CSXToken' contains the literal/s character #alue in )eld charValue. "or string literals' class CSXStringLitToken' a su class of CSXToken' contains a )eld stringText, the full text of the string (with enclosing dou le ,uotes and internal escape se,uences included as they appeared in the original string text that was scanned). CSX_go Tokens +he $%&_go language uses the following classes of tokens: 0 +he reserved words of the $%&(go language: bool break char const continue else for func if int package print read return var +he break and continue reser#ed words are optional; compilers that include them receive extra credit. 0 Identifiers. 2n identifier is a se,uence of 2%$33 letters and digits starting with a letter' excluding reser#ed words. 3d 45 4(2 6 B | … | 8 | a 6 b | … z) (2 | B | … 6 Z 6 a | | … 9 | 0 6 1 | … 9)= - Reser#ed 0 Integer Literals. 2n integer literal is a se,uence of digits' optionally preceded y a ?. 2 ? denotes a negati#e #alue. 443nteger!it4 544(? 6 l) (0 6 ; 6 … 6 9)@ 0 String Literals. 2 string literal is any se,uence of printable 2%$33 characters' delimited y dou le ,uotes. 2 dou le ,uote within the text of a string must e escaped (as \”) to avoid eing misinterpreted as the end of the string. +abs and newlines within a string must e escaped (\n is newline and \t is tab). 7ackslashes within a string must also e escaped (as ((). Ao other escaped characters are allowed. %trings may not cross line oundaries. %tring!it 5 B ( Aot(B 6 C 6 Unprintable$har) 6 CB 6 Cn 6 Ct 6 CC )= B 0 Character Literals. 2 character literal is any printable character' enclosed within single ,uotes. 2 single ,uote within a character literal must e escaped (as \') to avoid eing misinterpreted as the end of the literal. 2 tab or newline must e escaped (D\nD is a newline and D\tD is a tab). 2 ackslash must also e escaped (as D(('). Ao other escaped characters are allowed. Char!it 5 ' ( Aot(' 6 C 6 Unprintable$har) 6 C' 6 Cn 6 Ct 6 CC ) D 0 Boolean Literals. +he two 7oolean literals are true and false (case insensitive). 0 Other Tokens. +hese are miscellaneous one- or two-character sym ols representing operators and delimiters. ( ) [ ] = ; + - * / == != && || < > <= >= , ! { } 8 +he colon (:) is only re,uired if you do the extra-credit break and continue. 0 End-of- ile !EO " Token. +he EF" token is automatically returned y yylex() when it reaches the end of )le while scanning the )rst character of a token. Comments and white space' as de)ned elow' are not tokens ecause they are not re- turned y the scanner. Ae#ertheless' they must e matched (and skipped) when they are encountered. 0 # Single Line Co$$ent. 2s in $@@' *ava, and Go' this style of comment egins with a pair of slashes and ends at the end of the current line. 3ts ody can include any character other than an end-of-line. !ineComment4 54 GG Aot(Eol)= 4Eol 0 # %&lti-Line Co$$ent. +his comment egins with the pair 99 and ends with the pair 99. 3ts ody can include any character se,uence other than two consecutive 9/s. 7lockComment4 54 HH ( (H6l) Aot(H) )= 4HH 0 White S(ace. White space separates tokens1 otherwise it is ignored. White%pace 5 ( 7lank 6 +ab 6 Eol) @ 2ny character that cannot e scanned as part of a #alid token, comment or white space is invalid and should generate an error message. Considerations)*eq&irements • 7ecause reser#ed words look like identifiers' you must e careful not to miss-scan them as identi)ers. You should include distinct token de)nitions for each reser#ed word before your de)nition of identifiers. • Upper- and lower-case letters are e,uivalent in reser#ed words ut not in identifiers. When you print a reser#ed word' print its conversion to lower case. • .rint character and string literals as they are input, that is' with the escaped characters shown as Cn, CC' or whate#er' and with the surrounding ,uotes. Iowe#er' you should also store the eJective #alues of character and string literals' in which escaped characters are replaced y their meaning' and surrounding ,uotes are remo#ed. • You can use the Unix command “man ascii” for a list of 2%$33 characters in order to uild a regular expression for printable 2%$33 characters. • You should not assume any limit on the length of identifiers. • You should not assume any limit on the length of input lines that are scanned. • You may use *ava 2.3 classes to convert strings representing integer literals to their corresponding integer #alues. 7e careful though; in *ava a minus sign, -, and not ? represents a negative #alue. You must detect and report o#erflow in a system- independent fashion, perhaps using the constants MIN_VALUE and MAX_VALUE in class Integer. No not halt on o#erflow1 print an error message and return MAX_VALUE or MIN_VALUE as the K#alueL of the literal. 2n online reference manual for *"lex may e found in the “Useful .rogramming +oolsL section of the class homepage. • 2lthough *"lex/s regular expression syntax is designed to e #ery similar to that of !ex or .erl' it is not identical. >ead the *"lex manual carefully. ◦ 2 lank should not e used within a character class (i.e.' ) and *). You may use ( ?@? (which is the character code for a lank). ◦ 2 dou le ,uote is meaningful within a character class (i.e.' ) and *). ◦ You should not use yy egin or any explicit states. • 2s was the case in project ;' javac re,uires an environment #aria le CLASSPATH to de)ne the directories to e searched to )nd .class )les stored in libraries. JFlex and JavaCup (in the next assignment) use CLASSPATH to tell *ava where to )nd the classes that they use. Fnce again, the Makefile we supply places all .class )les in su directory classes. • %keleton )les and a :akefile are in the directory ~raphael/. courses/cs541/public/proj2/startup.1 they are also availa le through the class homepage. No not assume that the *ava coding is up to modern standard1 use a style checker to impro#e the code. What to hand in %u mit your project electronically y uploading proj2.tar.gz to Canvas. .lease run make clean )rst to remo#e all .class )les. Your tar all (or zip )le or other archi#e) should contain: (1) the csx_go.jflex )le you create' (2) any other classes you create' (3) the test data you use to test your scanner' (4) the outputs produced using your test data, (4) a G><H:> )le' (5) a Makefile ' and (6) all source )les necessary to uild an executable #ersion of your program (.java )les and a csx_go.jflex )le). Aame the class that contains your main()routine P2.java. Your scanner test program should act like the test program illustrated elow' reading a stream of characters from the command line )le and printing out the tokens matched to the standard output' one per line in the following format: line:column token "or identi)ers' include the text of the identifier1 for integer literals' include the literal’s numeric #alue1 for character and string literals' include the literal’s full text (with enclosing ,uotes and escape se,uences). Use the following format line:column token (value) "or example' if the contents of the )le is: var T { // hello, This is // a test coNst cnst "hello\n" ^ ?10; You should produce: 1:1 Reserved (var) 1:5 Identifier (T) 1:7 Symbol ({) 4:1 Reserved (const) 5:4 Identifier (cnst) 6:1 String literal ("hello\n") 7:1 Invalid token (^) 8:1 Integer literal (-10) 8:4 Symbol (;) Your program should try to follow this format to ease grading.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages5 Page
-
File Size-