CS 541 — Fall 2021

Programming Assignment 2
CSX_go Scanner

Your next project step is to write a scanner module for the programming language CSX_go (Computer Science eXperimental: Go-like syntax). Use the JFlex scanner-generation tool (based on Lex). Future assignments will involve a CSX parser, type checker and code generator.

The CSX Scanner

Generate the CSX_go scanner, a member of class Yylex, using JFlex. Your main task is to create the file csx_go.jflex, the input to JFlex. The jflex file specifies the regular expression patterns for all the CSX_go tokens, as well as any special processing required by tokens.

When a valid CSX token is matched by member function yylex(), it returns an object that is an instance of class java_cup.runtime.Symbol (the class our parser expects to receive from the scanner). Symbol contains an integer field sym that identifies the token class just matched. Possible values of sym are identified in the class sym [1]. Symbol also contains a field value, which contains token information beyond the token's identity. For CSX_go, the value field references an instance of class CSXToken (or a subclass of CSXToken).

CSXToken contains the line number and column number at which each token was found. This information is necessary to frame high-quality error messages. The line number on which a token appears is stored in linenum. The column number at which a token begins is stored in colnum. The column number counts tabs as one character, even though they expand into several blanks when viewed.

You must also store auxiliary information for identifiers, integer literals, character literals and string literals. For identifiers, class CSXIdentifierToken, a subclass of CSXToken, contains the identifier's name in field identifierText. For integer literals, class CSXIntLitToken, a subclass of CSXToken, contains the literal's numeric value in field intValue. For character literals, class CSXCharLitToken, a subclass of CSXToken, contains the literal's character value in field charValue. For string literals, class CSXStringLitToken, a subclass of CSXToken, contains a field stringText, the full text of the string (with enclosing double quotes and internal escape sequences included as they appeared in the original string text that was scanned).

[1] Java class names normally are capitalized. However, certain classes created by the tool Java CUP ignore this convention.

CSX_go Tokens

The CSX_go language uses the following classes of tokens:

• The reserved words of the CSX_go language:
    bool break char const continue else for func if int package print read return var
  The break and continue reserved words are optional; compilers that include them receive extra credit.

• Identifiers. An identifier is a sequence of ASCII letters and digits starting with a letter, excluding reserved words.
    Id = (A | B | … | Z | a | b | … | z) (A | B | … | Z | a | b | … | z | 0 | 1 | … | 9)* - Reserved

• Integer Literals. An integer literal is a sequence of digits, optionally preceded by a ~. A ~ denotes a negative value.
    IntegerLit = (~ | λ) (0 | 1 | … | 9)+

• String Literals. A string literal is any sequence of printable ASCII characters, delimited by double quotes. A double quote within the text of a string must be escaped (as \") to avoid being misinterpreted as the end of the string. Tabs and newlines within a string must be escaped (\n is newline and \t is tab). Backslashes within a string must also be escaped (as \\). No other escaped characters are allowed. Strings may not cross line boundaries.
    StringLit = " ( Not(" | \ | UnprintableChar) | \" | \n | \t | \\ )* "

• Character Literals. A character literal is any printable character, enclosed within single quotes. A single quote within a character literal must be escaped (as \') to avoid being misinterpreted as the end of the literal. A tab or newline must be escaped ('\n' is a newline and '\t' is a tab). A backslash must also be escaped (as '\\'). No other escaped characters are allowed.
    CharLit = ' ( Not(' | \ | UnprintableChar) | \' | \n | \t | \\ ) '

• Boolean Literals. The two Boolean literals are true and false (case insensitive).

• Other Tokens. These are miscellaneous one- or two-character symbols representing operators and delimiters.
    ( ) [ ] = ; + - * / == != && || < > <= >= , ! { } :
  The colon (:) is only required if you do the extra-credit break and continue.

• End-of-File (EOF) Token. The EOF token is automatically returned by yylex() when it reaches the end of file while scanning the first character of a token.

Comments and white space, as defined below, are not tokens because they are not returned by the scanner. Nevertheless, they must be matched (and skipped) when they are encountered.

• A Single Line Comment. As in C++, Java, and Go, this style of comment begins with a pair of slashes and ends at the end of the current line. Its body can include any character other than an end-of-line.
    LineComment = // Not(Eol)* Eol

• A Multi-Line Comment. This comment begins with the pair ## and ends with the pair ##. Its body can include any character sequence other than two consecutive #'s.
    BlockComment = ## ( (# | λ) Not(#) )* ##

• White Space. White space separates tokens; otherwise it is ignored.
    WhiteSpace = ( Blank | Tab | Eol )+

Any character that cannot be scanned as part of a valid token, comment or white space is invalid and should generate an error message.

Considerations/Requirements

• Because reserved words look like identifiers, you must be careful not to mis-scan them as identifiers. You should include distinct token definitions for each reserved word before your definition of identifiers.
• Upper- and lower-case letters are equivalent in reserved words but not in identifiers. When you print a reserved word, print its conversion to lower case.
• Print character and string literals as they are input, that is, with the escaped characters shown as \n, \\, or whatever, and with the surrounding quotes.
  However, you should also store the effective values of character and string literals, in which escaped characters are replaced by their meaning, and surrounding quotes are removed.
• You can use the Unix command “man ascii” for a list of ASCII characters in order to build a regular expression for printable ASCII characters.
• You should not assume any limit on the length of identifiers.
• You should not assume any limit on the length of input lines that are scanned.
• You may use Java API classes to convert strings representing integer literals to their corresponding integer values. Be careful though; in Java a minus sign, -, and not ~ represents a negative value. You must detect and report overflow in a system-independent fashion, perhaps using the constants MIN_VALUE and MAX_VALUE in class Integer. Do not halt on overflow; print an error message and return MAX_VALUE or MIN_VALUE as the “value” of the literal.
• An online reference manual for JFlex may be found in the “Useful Programming Tools” section of the class homepage.
• Although JFlex's regular expression syntax is designed to be very similar to that of Lex or Perl, it is not identical. Read the JFlex manual carefully.
  ◦ A blank should not be used within a character class (i.e., [ and ]). You may use \040 (which is the character code for a blank).
  ◦ A double quote is meaningful within a character class (i.e., [ and ]).
  ◦ You should not use yybegin or any explicit states.
• As was the case in project 1, javac requires an environment variable CLASSPATH to define the directories to be searched to find .class files stored in libraries. JFlex and JavaCup (in the next assignment) use CLASSPATH to tell Java where to find the classes that they use. Once again, the Makefile we supply places all .class files in subdirectory classes.
• Skeleton files and a Makefile are in the directory ~raphael/courses/cs541/public/proj2/startup; they are also available through the class homepage.
  Do not assume that the Java coding is up to modern standard; use a style checker to improve the code.

What to hand in

Submit your project electronically by uploading proj2.tar.gz to Canvas. Please run make clean first to remove all .class files. Your tarball (or zip file or other archive) should contain: (1) the csx_go.jflex file you create, (2) any other classes you create, (3) the test data you use to test your scanner, (4) the outputs produced using your test data, (5) a README file, (6) a Makefile, and (7) all source files necessary to build an executable version of your program (.java files and a csx_go.jflex file). Name the class that contains your main() routine P2 (in the file P2.java).

Your scanner test program should act like the test program illustrated below, reading a stream of characters from the file named on the command line and printing the tokens matched to the standard output, one per line in the following format:

    line:column token

For identifiers, include the text of the identifier; for integer literals, include the literal's numeric value; for character and string literals, include the literal's full text (with enclosing quotes and escape sequences). Use the following format:

    line:column token (value)

For example, if the contents of the file is:

    var T { // hello, This is
    // a test

    coNst
       cnst
    "hello\n"
    ^
    ~10;

You should produce:

    1:1 Reserved (var)
    1:5 Identifier (T)
    1:7 Symbol ({)
    4:1 Reserved (const)
    5:4 Identifier (cnst)
    6:1 String literal ("hello\n")
    7:1 Invalid token (^)
    8:1 Integer literal (-10)
    8:4 Symbol (;)

Your program should try to follow this format to ease grading.
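
Two of the requirements above involve computing a stored value from the matched text: converting an integer literal without halting on overflow, and deriving the effective value of a character or string literal. The following is only a minimal sketch of one way to do both. The class LiteralValues and its method names are hypothetical and are not part of the supplied skeleton; the sketch assumes that ~ is the negation prefix and that the scanner patterns have already rejected malformed literals.

```java
// Hypothetical helper (not part of the supplied skeleton): computes the
// values a scanner might store for integer, string and character literals.
final class LiteralValues {

    /**
     * Converts integer-literal text such as "~10" or "123" to an int.
     * Assumes '~' is the negation prefix and the rest is all digits.
     * On overflow, prints an error and returns Integer.MAX_VALUE or
     * Integer.MIN_VALUE instead of halting.
     */
    static int intValue(String text, int line, int col) {
        boolean negative = text.startsWith("~");
        String digits = negative ? text.substring(1) : text;
        // Largest magnitude that still fits in an int with this sign.
        long limit = negative ? -(long) Integer.MIN_VALUE : Integer.MAX_VALUE;
        long magnitude = 0;
        for (int i = 0; i < digits.length(); i++) {
            magnitude = magnitude * 10 + (digits.charAt(i) - '0');
            if (magnitude > limit) {
                // Stop early so the long accumulator itself never overflows.
                System.err.println(line + ":" + col
                        + " integer literal out of range: " + text);
                return negative ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            }
        }
        return (int) (negative ? -magnitude : magnitude);
    }

    /**
     * Returns the effective value of a string or character literal:
     * strips the surrounding quotes and replaces each escape sequence
     * with the character it stands for. Assumes only \n, \t, \\ and the
     * escaped quote can occur, because the scanner pattern rejects others.
     */
    static String effectiveValue(String literalText) {
        StringBuilder out = new StringBuilder();
        // Start at 1 and stop before the last index to skip the quotes.
        for (int i = 1; i < literalText.length() - 1; i++) {
            char c = literalText.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char escaped = literalText.charAt(++i);
            switch (escaped) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                default:  out.append(escaped); break; // backslash or quote
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(intValue("~10", 8, 1));                 // prints -10
        System.out.println(intValue("99999999999", 1, 1));         // prints 2147483647
        System.out.println(effectiveValue("\"a\\tb\"").length());  // prints 3
    }
}
```

Accumulating digits in a long and comparing against the int limit after every digit keeps the check independent of how a particular library reports overflow; catching NumberFormatException from Integer.parseInt would be another reasonable choice.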
