我想标记这样的字符串
String line = "a=b c='123 456' d=777 e='uij yyy'";
我不能像这样分裂
String [] words = line.split(" ");
知道如何拆分以便获得类似的令牌
a=b
c='123 456'
d=777
e='uij yyy';
最简单的方法是手动实现一个简单的有限状态机。换句话说,一次处理一个字符的字符串:
line.split(" (?=[a-z+]=)")
正确给出:
a=b
c='123 456'
d=777
e='uij yyy'
确保调整 [a-z+] 部分,以防您的键结构发生变化。
编辑:如果该对的值部分中有“=”字符,此解决方案可能会失败。
StreamTokenizer可以提供帮助,尽管它最容易设置为在 '=' 上中断,因为它总是会在带引号的字符串的开头中断:
String s = "Ta=b c='123 456' d=777 e='uij yyy'";
StreamTokenizer st = new StreamTokenizer(new StringReader(s));
st.ordinaryChars('0', '9');
st.wordChars('0', '9');
while (st.nextToken() != StreamTokenizer.TT_EOF) {
switch (st.ttype) {
case StreamTokenizer.TT_NUMBER:
System.out.println(st.nval);
break;
case StreamTokenizer.TT_WORD:
System.out.println(st.sval);
break;
case '=':
System.out.println("=");
break;
default:
System.out.println(st.sval);
}
}
输出
Ta
=
b
c
=
123 456
d
=
777
e
=
uij yyy
如果您省略将数字字符转换为字母的两行,那么您会得到d=777.0
,这可能对您有用。
假设:
这对我来说很好。
输入:
a=b abc='123 456' &=777 #='uij yyy' ABC='slk slk' 123sdkljhSDFjflsakd@*#&=456sldSLKD)#(
输出:
a=b
abc='123 456'
&=777
#='uij yyy'
ABC='slk slk'
123sdkljhSDFjflsakd@*#&=456sldSLKD)#(
代码:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
// SPACE CHARACTER followed by
// sequence of non-space characters of 1 or more followed by
// first occuring EQUALS CHARACTER
final static String regex = " [^ ]+?=";
// static pattern defined outside so that you don't have to compile it
// for each method call
static final Pattern p = Pattern.compile(regex);
public static List<String> tokenize(String input, Pattern p){
input = input.trim(); // this is important for "last token case"
// see end of method
Matcher m = p.matcher(input);
ArrayList<String> tokens = new ArrayList<String>();
int beginIndex=0;
while(m.find()){
int endIndex = m.start();
tokens.add(input.substring(beginIndex, endIndex));
beginIndex = endIndex+1;
}
// LAST TOKEN CASE
//add last token
tokens.add(input.substring(beginIndex));
return tokens;
}
private static void println(List<String> tokens) {
for(String token:tokens){
System.out.println(token);
}
}
public static void main(String args[]){
String test = "a=b " +
"abc='123 456' " +
"&=777 " +
"#='uij yyy' " +
"ABC='slk slk' " +
"123sdkljhSDFjflsakd@*#&=456sldSLKD)#(";
List<String> tokens = RegexTest.tokenize(test, p);
println(tokens);
}
}
或者,使用用于标记化的正则表达式,以及仅将键/值添加到映射的小型状态机:
String line = "a = b c='123 456' d=777 e = 'uij yyy'";
Map<String,String> keyval = new HashMap<String,String>();
String state = "key";
Matcher m = Pattern.compile("(=|'[^']*?'|[^\\s=]+)").matcher(line);
String key = null;
while (m.find()) {
String found = m.group();
if (state.equals("key")) {
if (found.equals("=") || found.startsWith("'"))
{ System.err.println ("ERROR"); }
else { key = found; state = "equals"; }
} else if (state.equals("equals")) {
if (! found.equals("=")) { System.err.println ("ERROR"); }
else { state = "value"; }
} else if (state.equals("value")) {
if (key == null) { System.err.println ("ERROR"); }
else {
if (found.startsWith("'"))
found = found.substring(1,found.length()-1);
keyval.put (key, found);
key = null;
state = "key";
}
}
}
if (! state.equals("key")) { System.err.println ("ERROR"); }
System.out.println ("map: " + keyval);
打印出来
map: {d=777, e=uij yyy, c=123 456, a=b}
它会进行一些基本的错误检查,并从值中去掉引号。
该解决方案既通用又紧凑(它实际上是 cletus 答案的正则表达式版本):
String line = "a=b c='123 456' d=777 e='uij yyy'";
Matcher m = Pattern.compile("('[^']*?'|\\S)+").matcher(line);
while (m.find()) {
System.out.println(m.group()); // or whatever you want to do
}
换句话说,找到所有由引号字符串或非空格字符组合而成的字符;不支持嵌套引号(没有转义字符)。
public static void main(String[] args) {
String token;
String value="";
HashMap<String, String> attributes = new HashMap<String, String>();
String line = "a=b c='123 456' d=777 e='uij yyy'";
StringTokenizer tokenizer = new StringTokenizer(line," ");
while(tokenizer.hasMoreTokens()){
token = tokenizer.nextToken();
value = token.contains("'") ? value + " " + token : token ;
if(!value.contains("'") || value.endsWith("'")) {
//Split the strings and get variables into hashmap
attributes.put(value.split("=")[0].trim(),value.split("=")[1]);
value ="";
}
}
System.out.println(attributes);
}
输出:{d=777, a=b, e='uij yyy', c='123 456'}
在这种情况下,连续空格将被截断为值中的单个空格。这里属性化的哈希图包含值
import java.io.*;
import java.util.Scanner;
public class ScanXan {
public static void main(String[] args) throws IOException {
Scanner s = null;
try {
s = new Scanner(new BufferedReader(new FileReader("<file name>")));
while (s.hasNext()) {
System.out.println(s.next());
<write for output file>
}
} finally {
if (s != null) {
s.close();
}
}
}
}
java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(line, " ");
while (tokenizer.hasMoreTokens()) {
String token = tokenizer.nextToken();
int index = token.indexOf('=');
String key = token.substring(0, index);
String value = token.substring(index + 1);
}
您是否尝试过按“=”拆分并从每对结果数组中创建一个标记?