Thursday, September 19, 2013

Compilers Project 1: Getting Started

I'm working on a lexer for Python.

My code can be found here: https://bitbucket.org/ashley_dunn/compilers-project-1.

I'm starting out by doing some research, specifically by reading these:
Based on the little reading I have done so far, I'm leaning towards Racket for the lexer (since I'm using it in my Programming Languages class, and it seems like an appropriate tool for the job).


Output:

There are 8 valid token types to emit, plus one error case:
  1. (NEWLINE) -- for a logical newline. 
    •  possible newlines include:
       \n        \r        \r\n
  2. (INDENT) -- for a logical increase in indentation. 
    • should be spaces, not tabs
  3. (DEDENT) -- for a logical decrease in indentation.
  4. (ID name) -- for an identifier.
    • possible identifiers match this regex: [A-Za-z_][A-Za-z_0-9]*
  5. (LIT value) -- for a literal value. 
    • possible literals are too numerous to copy pasta here, so check them out in this link.
  6. (KEYWORD symbol) -- for a keyword.
    • possible keywords include:
      False      class      finally    is         return
      None       continue   for        lambda     try
      True       def        from       nonlocal   while
      and        del        global     not        with
      as         elif       if         or         yield
      assert     else       import     pass
      break      except     in         raise
  7. (PUNCT text) -- for operators and delimiters.
    • possible delimiters include:
      (       )       [       ]       {       }
      ,       :       .       ;       @       =
      +=      -=      *=      /=      //=     %=
      &=      |=      ^=      >>=     <<=     **=
    •  possible operators include:
      +       -       *       **      /       //      %
      <<      >>      &       |       ^       ~
      <       >       <=      >=      ==      !=
  8. (ENDMARKER) -- for the end of the input.
  9. If you encounter a lexical error, print (ERROR "explanation") and quit.
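
To make items 4, 6, 7, and 9 concrete, here is a minimal sketch in Python (not the Racket or lex version this project will actually use). It assumes the identifier regex and the keyword and punctuation lists above, and it picks the longest matching operator so that **= is not split into ** and =. The function name lex_fragment and the exact output formatting, such as quoting the PUNCT text, are my own assumptions and not part of the project spec.

```python
import re

# Keywords copied from the list above.
KEYWORDS = frozenset("""
False class finally is return None continue for lambda try
True def from nonlocal while and del global not with
as elif if or yield assert else import pass break except in raise
""".split())

# Operators and delimiters copied from the list above.
PUNCT = """
( ) [ ] { } , : . ; @ =
+= -= *= /= //= %= &= |= ^= >>= <<= **=
+ - * ** / // % << >> & | ^ ~
< > <= >= == !=
""".split()

ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
# Longest alternatives first, so "**=" wins over "**" and "*".
PUNCT_RE = re.compile(
    "|".join(re.escape(p) for p in sorted(PUNCT, key=len, reverse=True))
)


def lex_fragment(text):
    """Tokenize a fragment made only of names, punctuation, and spaces.

    Hypothetical helper: literals, comments, and indentation are ignored.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        m = ID_RE.match(text, pos)
        if m:
            word = m.group()
            # A keyword looks like an identifier, so check the set afterwards.
            kind = "KEYWORD" if word in KEYWORDS else "ID"
            tokens.append(f"({kind} {word})")
            pos = m.end()
            continue
        m = PUNCT_RE.match(text, pos)
        if m:
            # Quoting the text is an assumption about the output format.
            tokens.append(f'(PUNCT "{m.group()}")')
            pos = m.end()
            continue
        # Lexical error: report it and stop, as item 9 requires.
        tokens.append(f'(ERROR "unexpected character {text[pos]!r}")')
        break
    return tokens


if __name__ == "__main__":
    for token in lex_fragment("x **= y if z"):
        print(token)
```

Matching identifiers first and then checking the keyword set avoids needing a separate regex per keyword, and it keeps names like "iffy" from being split into a keyword plus an identifier.
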

The sample file provided in the project description is written in lex, which I may end up using instead of Racket. It already takes care of NEWLINE, INDENT/DEDENT (mostly), ID, and some operators/delimiters. So I need to add:
  • a little more logic for indent/dedent
  • support for literals
  • support for keywords
  • more operators and delimiters
  • and support for EOF
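
For the indent/dedent and EOF items, the usual approach is a stack of indentation widths. Here is a rough sketch in Python of how that could work. It is not the sample lex file's logic. It assumes spaces-only indentation, skips blank and comment-only lines, and leaves out the ordinary tokens of each line. The function name indentation_tokens and the error messages are made up for illustration.

```python
def indentation_tokens(lines):
    """Return the INDENT/DEDENT/NEWLINE/ENDMARKER tokens for some source lines.

    Hypothetical helper: the ordinary tokens of each line (IDs, literals, ...)
    are omitted; they would be emitted just before that line's NEWLINE.
    """
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.lstrip(" ")
        if stripped == "" or stripped.startswith("#"):
            continue  # blank and comment-only lines produce no tokens
        if stripped[0] == "\t":
            out.append('(ERROR "tabs are not supported for indentation")')
            return out
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the enclosing block: open exactly one new block.
            stack.append(width)
            out.append("(INDENT)")
        else:
            # Shallower: close blocks until we are back at a known width.
            while width < stack[-1]:
                stack.pop()
                out.append("(DEDENT)")
            if width != stack[-1]:
                out.append('(ERROR "inconsistent dedent")')
                return out
        out.append("(NEWLINE)")
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        out.append("(DEDENT)")
    out.append("(ENDMARKER)")
    return out


if __name__ == "__main__":
    for token in indentation_tokens(["if x:", "    y", "z"]):
        print(token)
```

The important detail is the final loop: when the input ends inside a nested block, every open level still needs its own DEDENT before the ENDMARKER.
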
