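I'm trying to figure out how to correctly identify the tokens in an input file and report the type of each one, using spaces and newlines as delimiters. The lexer should recognize four token types: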
Identifiers = ([a-z] | [A-Z])([a-z] | [A-Z] | [0-9])*
Numbers = [0-9]+
Punctuation = \+ | \- | \* | / | \( | \) | := | ;
Keywords = if | then | else | endif | while | do | endwhile | skip
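In Java regex terms, I believe the four definitions above could be written roughly as follows. This is only a sketch: the class, constant, and method names (TokenPatterns, IDENTIFIER, classify, and so on) are placeholders of mine, not necessarily the names used in my code further down. Keywords are checked before identifiers because every keyword also matches the identifier pattern.

```java
import java.util.regex.Pattern;

// Hypothetical names throughout; only the regexes come from the definitions above.
class TokenPatterns {
    // Identifiers: a letter followed by any number of letters or digits
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
    // Numbers: one or more digits
    static final Pattern NUMBER = Pattern.compile("[0-9]+");
    // Punctuation: the two-character ":=" or one of + - * / ( ) ;
    static final Pattern PUNCTUATION = Pattern.compile(":=|[+\\-*/();]");
    // Keywords: the eight reserved words
    static final Pattern KEYWORD =
        Pattern.compile("if|then|else|endif|while|do|endwhile|skip");

    // Classifies a string that is already known to be a single token.
    // matches() requires the whole string to match, so "ifx" is not a keyword.
    static String classify(String token) {
        if (KEYWORD.matcher(token).matches()) return "keyword";
        if (IDENTIFIER.matcher(token).matches()) return "identifier";
        if (NUMBER.matcher(token).matches()) return "number";
        if (PUNCTUATION.matcher(token).matches()) return "punctuation";
        return "error";
    }

    public static void main(String[] args) {
        for (String t : new String[] {"tcu", "else", "i34", "2983", "(", ":="}) {
            System.out.println(classify(t) + ": " + t);
        }
    }
}
```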
For example, if the file contains the line:
tcu else i34 2983 ( + +eqdQ
it should tokenize it and print:
identifier: tcu
keyword: else
identifier: i34
number: 2983
punctuation: (
punctuation: +
punctuation: +
identifier: eqdQ
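What I can't figure out is how to make the lexer take the longest matching substring when two tokens of different types are directly adjacent with no whitespace between them (as with +eqdQ above, which should come out as + followed by eqdQ).
Here is what I have tried: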
//start
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

// Class name is a placeholder. The regex strings used below (patternAlpha,
// patternNumber, patternPunctuation, nihil, and one through eight for the
// keywords) are fields of this class and are not shown.
public class Lexer {
    public static void main(String[] args) throws IOException {
        // input file
        File file = new File("input.txt");
        // output file
        FileWriter writer = new FileWriter("output.txt");
        // local variables
        String sortedOutput = "";
        String current = "";
        Scanner scan = new Scanner(file);
        String delimiter = "\\s+ | \\s*| \\s |\\n|$ |\\b\\B|\\r|\\B\\b|\\t";
        String[] analyze;
        BufferedReader read = new BufferedReader(new FileReader(file));

        // lines get read here from the .txt file
        while (scan.hasNextLine()) {
            sortedOutput = sortedOutput.concat(scan.nextLine() + System.lineSeparator());
        }

        // lines are tokenized here
        analyze = sortedOutput.split(delimiter);

        // first line is printed here through a separate reader
        current = read.readLine();
        System.out.println("Current Line: " + current + System.lineSeparator());
        writer.write("Current Line: " + current + System.lineSeparator() + "\n");

        // string matching starts here
        for (String a : analyze) {
            if (a.matches(patternAlpha)) {
                // alphanumeric token: a keyword if it matches any of the
                // eight keyword patterns, otherwise an identifier
                if (a.matches(one) || a.matches(two) || a.matches(three) || a.matches(four)
                        || a.matches(five) || a.matches(six) || a.matches(seven) || a.matches(eight)) {
                    System.out.println("Keyword: " + a);
                    writer.write("Keyword: " + a + System.lineSeparator());
                } else {
                    System.out.println("Identifier: " + a);
                    writer.write("Identifier: " + a + System.lineSeparator());
                }
            } else if (a.matches(patternNumber)) {
                // number check
                System.out.println("Number: " + a);
                writer.write("Number: " + a + System.lineSeparator());
            } else if (a.matches(patternPunctuation)) {
                // punctuation check
                System.out.println("Punctuation: " + a);
                writer.write("Punctuation: " + a + System.lineSeparator());
            } else if (a.matches(nihil)) {
                // this special case updates the current line with the next line
                System.out.println();
                current = read.readLine();
                System.out.println("\nCurrent Line: " + current + System.lineSeparator());
                writer.write("\nCurrent Line: " + current + System.lineSeparator() + "\n");
            } else {
                // everything not covered by the regexes is reported as an error
                System.out.println("Error reading: " + a);
                writer.write("Error reading: " + a + System.lineSeparator());
            }
        }

        // everything closes here to avoid errors
        scan.close();
        read.close();
        writer.close();
    }
}
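For comparison, the kind of longest-match scanning I think I need would look something like the sketch below. Instead of splitting on a delimiter, it walks the input with one Matcher, skips whitespace, and at each position takes the longest run that the token definitions allow, so +eqdQ comes out as + followed by eqdQ. The names (MunchSketch, tokenize, TOKEN, KEYWORDS) are made up for this sketch, and deciding keyword versus identifier by a set lookup after matching a word is my own assumption. It needs Java 9 or later for Set.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the code from my attempt above.
class MunchSketch {
    // One named group per token class. The classes start with disjoint
    // characters and the quantifiers are greedy, so the first alternative
    // that matches is also the longest possible token at that position.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<ws>\\s+)"
        + "|(?<word>[a-zA-Z][a-zA-Z0-9]*)"
        + "|(?<number>[0-9]+)"
        + "|(?<punct>:=|[+\\-*/();])");

    private static final Set<String> KEYWORDS = Set.of(
        "if", "then", "else", "endif", "while", "do", "endwhile", "skip");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                // No token starts here: report one character and move on.
                out.add("error: " + input.charAt(pos));
                pos++;
                continue;
            }
            String text = m.group();
            if (m.group("word") != null) {
                // A whole word is a keyword only if it is in the reserved set.
                out.add((KEYWORDS.contains(text) ? "keyword: " : "identifier: ") + text);
            } else if (m.group("number") != null) {
                out.add("number: " + text);
            } else if (m.group("punct") != null) {
                out.add("punctuation: " + text);
            }
            // Whitespace matches are skipped without producing a token.
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : tokenize("tcu else i34 2983 ( + +eqdQ")) {
            System.out.println(line);
        }
    }
}
```

I'm not sure whether this is the idiomatic way to do it, or whether I should keep the split-based approach and fix the delimiter instead.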
I would appreciate any suggestions. Thanks in advance.