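My task is to implement a lexical analyzer for the language c--. We have to translate c-- code into a sequence of tokens, which are represented internally as integers because they are easier to manipulate that way. Some of the language's lexical conventions: there are keywords such as double, else, if, int, return, void, and while. There are also special symbols such as + - * / < <= > >= == != = ; , . ( ) [ ] { } /* */ //. Identifiers can begin with any letter or underscore, followed by any combination of letters, digits, and underscores. Whitespace separates tokens and is ignored. Numbers can be integers or decimals, and both line comments and block comments are allowed.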
import java.io.*;
public class Lex {
public static boolean contains(char[] a, char b){
for (int i = 0; i < a.length; i++) {
if(b == a[i])
return true;
}
return false;
}
public static void main(String args[]) throws FileNotFoundException, IOException{
//Declaring token values as constant integers.
final int T_DOUBLE = 0;
final int T_ELSE = 1;
final int T_IF = 2;
final int T_INT = 3;
final int T_RETURN = 4;
final int T_VOID = 5;
final int T_WHILE = 6;
final int T_PLUS = 7;
final int T_MINUS = 8;
final int T_MULTIPLICATION = 9;
final int T_DIVISION = 10;
final int T_LESS = 11;
final int T_LESSEQUAL = 12;
final int T_GREATER = 13;
final int T_GREATEREQUAL = 14;
final int T_EQUAL = 16;
final int T_NOTEQUAL = 17;
final int T_ASSIGNOP = 18;
final int T_SMEICOLON = 19;
final int T_PERIOD = 20;
final int T_LEFTPAREN = 21;
final int T_RIGHTPAREN = 22;
final int T_LEFTBRACKET = 23;
final int T_RIGHTBRACKET = 24;
final int T_LEFTBRACE = 25;
final int T_RIGHTBRACE = 26;
final int T_ID = 27;
final int T_NUM = 28;
char[] letters_ = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D',
'E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','_'};
char[] numbers = {'0','1','2','3','4','5','6','7','8','9'};
char[] symbols = {'+','-','*','/','<','>','!','=',';',',','.','(',')','[',']','{','}'};
FileInputStream fstream = new FileInputStream("src\\testCode.txt");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
BufferedWriter bw1 = new BufferedWriter(new FileWriter(new File("src\\output.txt"), true));
BufferedWriter bw2 = new BufferedWriter(new FileWriter(new File("src\\output2.txt"), true));
String scanner;String temp = "";
int n = 0;
while((scanner = br.readLine()) != null){
for (int i = 0; i < scanner.length(); i++) {
for (int j = 0; j < scanner.length(); j++) {
if(contains(letters_,scanner.charAt(i)) || contains(numbers,scanner.charAt(i)) || contains(symbols,scanner.charAt(i))){
j++;
n++;
if(scanner.charAt(j) == ' ' || scanner.charAt(j) == '\n' || scanner.charAt(j) == '\t'){
}
}
}
}
}
in.close();
}
}
This is our test code:
int fact(int x) {
// recursive factorial function
if (x>1)
return x * fact(x-1);
else return 1;
}
void main(void) {
/* CS 311 project 2
A lexical analyzer */
int x, y, z;
double _funny;
x = get_integer();
_Funny = get_double();
if (x>0)
print_line(fact(x));
else if (_funny != 3.14)
print_line(x*_funny);
}
This should be our output:
3 27 21 3 27 22 25 2 21 27 13 28 22 4 27 9 27 21 27 8 28 22 18 1 4 28 18 26 5 27 21 5 22 25 3 27 19 27 19 27 18 0 277 18 27 1817 27 17 27 21 22 18 2 21 27 13 28 22 27 21 27 21 27 22 22 18 1 2 21 27 12 28 22 27 21 27 9 27 22 18 26
INT id leftparen INT id rightparen leftbrace IF leftparen id greater num rightparen RETURN id multiplication id leftparen id minus num rightparen semicolon ELSE RETURN num semicolon rightparen VOID id leftparen VOID rightparen leftbrace INT id comma id comma id semicolon DOUBLE id semicolon id assignop id leftparen rightparen semicolon id assignop id leftparen rightparen semicolon IF leftparen id greater num rightparen id leftparen id leftparen id rightparen rightparen semicolon ELSE IF leftparen id notequal num rightparen id leftparen id multiplication id rightparen semicolon rightbrace
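OK, I wrote some code based on the suggestion from user John. I am still confused about how this is supposed to work. When I iterate the second loop to look for whitespace or a symbol, how do I know what type of token came before that whitespace or symbol? I tried putting the characters I skip into a string and using a case statement to determine it, but I think that would write the whole file into the string, so my tokens would never match. Also, how can the method find comments and safely ignore them?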
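To make the two questions concrete, here is a minimal sketch of one possible approach, not necessarily what John had in mind. Instead of searching ahead for whitespace, it looks at the first character of each token to decide what kind of token is starting, then consumes characters for as long as they still belong to that kind. Comments are checked before the `/` operator, so they are skipped without emitting anything. The class name `LexSketch`, the method `tokenize`, and the lower-case token names not shown in my expected output (such as `greaterequal` or `period`) are my own assumptions for illustration. The sketch returns token names rather than the integer codes, and it assumes the whole file has been read into one string so that block comments spanning several lines are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LexSketch {
    // Keyword table: a scanned word is looked up here, otherwise it is an identifier.
    static final Map<String, String> KEYWORDS = Map.of(
        "double", "DOUBLE", "else", "ELSE", "if", "IF", "int", "INT",
        "return", "RETURN", "void", "VOID", "while", "WHILE");

    // Symbol table (token names are illustrative assumptions).
    static final Map<String, String> SYMBOLS = Map.ofEntries(
        Map.entry("+", "plus"), Map.entry("-", "minus"),
        Map.entry("*", "multiplication"), Map.entry("/", "division"),
        Map.entry("<", "less"), Map.entry("<=", "lessequal"),
        Map.entry(">", "greater"), Map.entry(">=", "greaterequal"),
        Map.entry("==", "equal"), Map.entry("!=", "notequal"),
        Map.entry("=", "assignop"), Map.entry(";", "semicolon"),
        Map.entry(",", "comma"), Map.entry(".", "period"),
        Map.entry("(", "leftparen"), Map.entry(")", "rightparen"),
        Map.entry("[", "leftbracket"), Map.entry("]", "rightbracket"),
        Map.entry("{", "leftbrace"), Map.entry("}", "rightbrace"));

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (src.startsWith("//", i)) {            // line comment: skip to end of line
                while (i < n && src.charAt(i) != '\n') i++;
                continue;
            }
            if (src.startsWith("/*", i)) {            // block comment: skip past the closing */
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;          // unterminated comment: skip the rest
                continue;
            }
            if (isLetter(c)) {                        // identifier or keyword
                int start = i;
                while (i < n && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) i++;
                out.add(KEYWORDS.getOrDefault(src.substring(start, i), "id"));
                continue;
            }
            if (isDigit(c)) {                         // integer or decimal number
                while (i < n && isDigit(src.charAt(i))) i++;
                // Take a fraction only when '.' is followed by a digit.
                if (i + 1 < n && src.charAt(i) == '.' && isDigit(src.charAt(i + 1))) {
                    i++;
                    while (i < n && isDigit(src.charAt(i))) i++;
                }
                out.add("num");
                continue;
            }
            // Try two-character symbols before one-character ones (<= before <).
            if (i + 1 < n && SYMBOLS.containsKey(src.substring(i, i + 2))) {
                out.add(SYMBOLS.get(src.substring(i, i + 2)));
                i += 2;
                continue;
            }
            String one = String.valueOf(c);
            if (!SYMBOLS.containsKey(one)) {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at offset " + i);
            }
            out.add(SYMBOLS.get(one));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "int fact(int x) {\n  // recursive factorial function\n"
            + "  if (x>1)\n    return x * fact(x-1);\n  else return 1;\n}";
        System.out.println(String.join(" ", tokenize(sample)));
    }
}
```

With this shape, the kind of the token is already known when its last character is consumed, so nothing has to be accumulated across the whole file. Mapping the names to the integer constants would be one more lookup table.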