PROJECT STILL IN PROGRESS!
Description: A simple compiler for a language with 7 statement-types. It reads a .shcl source file (Seohyun Hwang's compiled language), translates its contents into C code, and writes the result to a separate .c file. The compiler is written in Java to take advantage of OOP.
• Currently implemented: tokenization, variable-collision (duplicate declaration) check, undeclared-variable usage check
• All source code is found within the src folder (incl. Main.java, Lexer.java, Parser.java, SemAn.java).
• Sample source code in the new programming language is located in main.shcl.
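A minimal sketch of the two semantic checks listed above. It assumes a single flat scope (loop bodies do not open new scopes) and ignores the int/bool type distinction. The class and method names (SymbolTableSketch, declare, use) are hypothetical and are not taken from SemAn.java.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the collision and undeclared-usage checks; not the actual SemAn.java.
class SymbolTableSketch {
    private final Set<String> declared = new HashSet<>();
    private final List<String> errors = new ArrayList<>();

    // Called for #intnew / #boolnew: declaring the same name twice is a collision.
    void declare(String name) {
        if (!declared.add(name)) {
            errors.add("variable collision: " + name);
        }
    }

    // Called for every other occurrence of a variable (assignment targets, $UV1/$UV2, loop and print variables).
    void use(String name) {
        if (!declared.contains(name)) {
            errors.add("undeclared variable: " + name);
        }
    }

    List<String> errors() {
        return errors;
    }
}
```

Because variable-names are case-sensitive, the set stores names exactly as written, so x and X count as different variables.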
Practice mission: OOP, tokenization, file-management, understanding why contemporary compilers work the way they do, understanding the difference between compilers and interpreters, understanding why contemporary compiled programming languages are designed the way they are, and handling a wide range of logical variations with one cohesive structure
Yet to do: ordering arithmetic operations by PEMDAS precedence, unrolling while-loops
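One possible approach to the precedence item, sketched with the shunting-yard algorithm: it rewrites an infix argument into postfix order so that * and / are applied before + and -, left to right within each level. It assumes the argument has already been split into operand and operator strings. The class name PrecedenceSketch is hypothetical, and parentheses and exponents are left out because the token list defines neither.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: shunting-yard conversion of an infix argument into postfix order.
class PrecedenceSketch {
    // Higher number = binds tighter.
    private static final Map<String, Integer> PRECEDENCE =
            Map.of("+", 1, "-", 1, "*", 2, "/", 2);

    static List<String> toPostfix(List<String> infix) {
        List<String> output = new ArrayList<>();
        Deque<String> ops = new ArrayDeque<>();
        for (String tok : infix) {
            Integer p = PRECEDENCE.get(tok);
            if (p == null) {            // operand: variable-name or numeric-literal
                output.add(tok);
                continue;
            }
            // All four operators are left-associative, so pop while the stacked
            // operator binds at least as tightly as the incoming one.
            while (!ops.isEmpty() && PRECEDENCE.get(ops.peek()) >= p) {
                output.add(ops.pop());
            }
            ops.push(tok);
        }
        while (!ops.isEmpty()) {
            output.add(ops.pop());
        }
        return output;
    }
}
```

For example, a + b * 2 becomes a b 2 * +. Since the target is C, which already gives * and / higher precedence than + and -, copying an infix argument into the generated C unchanged would also preserve this order; an explicit step like this matters mainly if the compiler evaluates or restructures expressions itself.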
7 statement-types
- #intnew,varname,;
- #boolnew,varname,;
- #intdef,varname=arithmeticArgument,;
- #booldef,varname=conditionalArgument,;
- #while0,varname{statements};
- #while1,varname{statements};
- #print,varname,;
# begins a statement.
; ends a statement.
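A minimal sketch of how a lexer might cut raw .shcl text into top-level statements using these markers. It assumes that a ; only ends a statement when it is not inside a loop body, so the inner statements of #while0/#while1 stay attached to their loop. Whitespace is dropped and text outside # ... ; is skipped. The class name is hypothetical and not part of Lexer.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split raw .shcl text into top-level statements.
class StatementSplitterSketch {
    static List<String> split(String source) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = null;   // null means "currently outside any statement"
        int depth = 0;                  // nesting depth of { } loop bodies
        for (char c : source.toCharArray()) {
            if (current == null) {
                if (c == '#') {         // '#' opens a statement; other text out here is ignored
                    current = new StringBuilder("#");
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue;               // spaces, tabs and newlines are dropped
            }
            current.append(c);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ';' && depth == 0) {   // only a top-level ';' ends the statement
                statements.add(current.toString());
                current = null;
            }
        }
        return statements;              // an unterminated trailing statement is dropped
    }
}
```

The text between { and } of a loop statement could be passed back through split to obtain its inner statements. An unterminated statement at the end of the file is silently dropped in this sketch; a real lexer would likely report it.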
Token naming:
- #intnew, --> $FK1
- #boolnew, --> $FK2
- #intdef, --> $FK3
- #booldef, --> $FK4
- #while0, --> $FK5
- #while1, --> $FK6
- #print, --> $FK7
- Variable in argument of FK3 --> $UV1
- Variable in argument of FK4 --> $UV2
- Numeric-literal in FK1 --> $NL1
- Numeric-literal in FK2 --> $NL2
- Numeric-literal in FK3 --> $NL3
- Numeric-literal in FK4 --> $NL4
- Boolean false --> $B0
- Boolean true --> $B1
- + (addition operator) --> $ADD+
- - (subtraction operator) --> $SUBT-
- * (multiplication operator) --> $MULT*
- / (division operator) --> $DIV/
- : (conditional equality) --> $CEQ:
- < (less than) --> $LESSER<
- > (greater than) --> $GREATER>
- = --> $S1 (start of arithmetic/conditional argument)
- { --> $S2 (start of loop)
- ; --> $T1 (end of arithmetic/conditional argument)
- } --> $T2 (end of loop)
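A sketch of lookup tables for the context-free part of this naming scheme, assuming the lexer already isolates each first keyword (including its trailing comma) and each single-character symbol. The $UV and $NL tokens depend on which FK statement they appear in, so they would be assigned by the surrounding statement logic rather than by a flat table; the lexemes for $B0/$B1 are not specified above and are omitted. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup tables for the fixed tokens in the naming list.
class TokenTableSketch {
    // Keys are lower case; lexemes are lower-cased before lookup because
    // everything except variable-names is case-insensitive.
    private static final Map<String, String> FIRST_KEYWORDS = Map.of(
            "#intnew,", "$FK1",
            "#boolnew,", "$FK2",
            "#intdef,", "$FK3",
            "#booldef,", "$FK4",
            "#while0,", "$FK5",
            "#while1,", "$FK6",
            "#print,", "$FK7");

    // Single-character operators and delimiters. ';' is mapped to $T1 as listed,
    // even though it also ends statements; telling those roles apart is left to the parser.
    private static final Map<Character, String> SYMBOLS = Map.ofEntries(
            Map.entry('+', "$ADD+"),
            Map.entry('-', "$SUBT-"),
            Map.entry('*', "$MULT*"),
            Map.entry('/', "$DIV/"),
            Map.entry(':', "$CEQ:"),
            Map.entry('<', "$LESSER<"),
            Map.entry('>', "$GREATER>"),
            Map.entry('=', "$S1"),
            Map.entry('{', "$S2"),
            Map.entry(';', "$T1"),
            Map.entry('}', "$T2"));

    // Returns the $FK token for a first keyword, or null if the lexeme is not one.
    static String keyword(String lexeme) {
        return FIRST_KEYWORDS.get(lexeme.toLowerCase(Locale.ROOT));
    }

    // Returns the token for an operator or delimiter, or null if c is not one.
    static String symbol(char c) {
        return SYMBOLS.get(c);
    }
}
```

Lower-casing with Locale.ROOT keeps keyword matching independent of the machine's default locale.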
Abbreviation meanings:
- FK: first keyword
- S: starting-point
- UV: use-variable
- NL: numeric-literal
- T: terminal
Additional rules:
- Variable-names are case-sensitive; the rest is case-insensitive.
- Variable-names must be purely alphanumeric and cannot be purely numeric. A digit may appear at any position within a variable-name.
- The lexer ignores whitespace, newlines (\n), and any text outside the statement markers # and ; (perhaps also within the loop markers { and }).
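A minimal validator for the variable-name rules above, assuming "alphanumeric" means ASCII letters and digits only. Since variable-names are case-sensitive, names should be compared with String.equals and never lower-cased. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the variable-name rules.
class VariableNameSketch {
    // One or more ASCII letters or digits (assumption: ASCII-only "alphanumeric").
    private static final Pattern ALPHANUMERIC = Pattern.compile("[A-Za-z0-9]+");
    // Names made only of digits are rejected.
    private static final Pattern ALL_DIGITS = Pattern.compile("[0-9]+");

    static boolean isValid(String name) {
        return ALPHANUMERIC.matcher(name).matches()
                && !ALL_DIGITS.matcher(name).matches();
    }
}
```

Under these rules count2 and 2count are both accepted, while 123, my_var and the empty string are rejected.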