java - Parsing text by keywords using Java

Question

Basically, I am given a file containing details about people, one person per line, for example:

name Marioka address 97 Garderners Road birthday 12-11-1982 \n
name Ada Lovelace gender woman\n
name James address 65 Watcher Avenue

and so on.

I want to parse each line into an array of [Keyword : Value] pairs, for example:

{[Name, Marioka], [Address, 97 Gardeners Road], [Birthday, 12-11-1982]},
{[Name, Ada Lovelace], [Gender, Woman]}, and so on....

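The keywords come from a predefined set of words, in the case above: name, address, birthday, gender, and so on.

What is the best way to do this?

This is how I did it. It works, but I would like to know whether there is a better solution.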

    private Map<String, String> readRecord(String record) {
        Map<String, String> attributeValuePairs = new HashMap<String, String>();
        Scanner scanner = new Scanner(record);
        String attribute = "", value = ""; 

        /* 
         * 1. Scan each word. 
         * 2. Find an attribute keyword and store it at "attribute".
         * 3. Following words will be stored as "value" until the next keyword is found.
         * 4. Return value-attribute pairs as HashMap
         */

        while(scanner.hasNext()) {
            String word = scanner.next();
            if (this.isAttribute(word)) {
                // Compare string contents with isEmpty(); != only compares references
                if (!value.trim().isEmpty()) {
                    attributeValuePairs.put(attribute.trim(), value.trim());
                    value = "";
                }
                attribute = word;
            } else {
                value += word + " ";
            }
        }
        // Store the last pair, trimmed the same way as the earlier ones
        if (!value.trim().isEmpty()) attributeValuePairs.put(attribute.trim(), value.trim());

        scanner.close();
        return attributeValuePairs;
    }

    private boolean isAttribute(String word) {
        String[] attributes = {"name", "patientId", 
            "birthday", "phone", "email", "medicalHistory", "address"};
        for (String attribute: attributes) {
            if (word.equalsIgnoreCase(attribute)) return true;
        }
        return false;
    }

score 1 · Accepted Answer

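To extract the values from a string, use a regular expression. I assume you know how to read each line from a file and how to build an array from the results.

This is still not a good solution, because it breaks if a name or address contains one of the keywords. But it is what you asked for.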

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

        Pattern p = Pattern.compile("name (.+) address (.+) birthday (.+)");

        String text = "name Marioka address 97 Garderners Road birthday 12-11-1982";

        Matcher m = p.matcher(text);

        if (m.matches()) {
            System.out.println(m.group(1) + "\n" + m.group(2) + "\n"
                    + m.group(3));
        } else {
            System.out.println("String does not match");
        }
    }
}
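
The pattern above expects exactly name, address and birthday, in that order. As a hedged sketch of how the same regex idea might be generalized, the version below accepts any subset of keywords in any order by using a lookahead that stops each value at the next keyword or at the end of the line. The class name KeywordRegexSketch, the method parse and the keyword list are assumptions made for illustration. The limitation mentioned above still applies: a value that contains a keyword as a separate word will be split there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordRegexSketch {

    // Assumed keyword set (lowercase only); extend the alternation as needed.
    private static final String KEYWORDS = "name|address|birthday|gender";

    // A keyword, then the shortest run of text that is followed by
    // either another keyword or the end of the line.
    private static final Pattern PAIR = Pattern.compile(
            "\\b(" + KEYWORDS + ")\\b\\s+(.*?)(?=\\s+\\b(?:" + KEYWORDS + ")\\b|\\s*$)");

    static Map<String, String> parse(String line) {
        // LinkedHashMap keeps the pairs in the order they appear in the line
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {name=Ada Lovelace, gender=woman}
        System.out.println(parse("name Ada Lovelace gender woman"));
        // prints {name=Marioka, address=97 Garderners Road, birthday=12-11-1982}
        System.out.println(parse("name Marioka address 97 Garderners Road birthday 12-11-1982"));
    }
}
```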

score 1 · Accepted Answer

Try this:

// Requires java.util.ArrayList, java.util.HashMap, java.util.List and java.util.Map
List<String> keywords = new ArrayList<String>();
keywords.add("name");
keywords.add("address");
keywords.add("birthday");
keywords.add("gender");
String[] s = "name James address 65 Watcher Avenue".trim().split(" ");
Map<String, String> m = new HashMap<String, String>();
for (int i = 0; i < s.length; i++) {
    if (keywords.contains(s[i])) {
        String key = s[i];
        StringBuilder b = new StringBuilder();
        // Collect every following word until the next keyword or the end of the input
        while (i + 1 < s.length && !keywords.contains(s[i + 1])) {
            i++;
            if (b.length() > 0) {
                b.append(' ');
            }
            b.append(s[i]);
        }
        m.put(key, b.toString());
    }
}
System.out.println(m);

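Just add the keywords you want to recognize to the list named keywords.

Edit: note that it will not produce the right output if a person's name or address contains one of the keywords.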

score 0 · Accepted Answer

The best way is to put the data into a Map, so that you can set a key and a value ("name" : "Marioka").

Map<String,String> mp=new HashMap<String, String>();
    // adding or set elements in Map by put method key and value pair
    mp.put("name", "nameData");
    mp.put("address", "addressData"); // ...and so on for the other keywords
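
To connect this to the question, which has many people rather than one, a possible sketch is one map per person collected in a list. The class name RecordListSketch and the method sampleRecords are hypothetical, and the values are simply the sample data from the question typed in by hand. LinkedHashMap is used instead of HashMap only so that the keys print in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordListSketch {

    // Builds one map per person; in a real parser these would be filled from the file
    static List<Map<String, String>> sampleRecords() {
        List<Map<String, String>> records = new ArrayList<>();

        Map<String, String> marioka = new LinkedHashMap<>();
        marioka.put("name", "Marioka");
        marioka.put("address", "97 Gardeners Road");
        marioka.put("birthday", "12-11-1982");
        records.add(marioka);

        Map<String, String> ada = new LinkedHashMap<>();
        ada.put("name", "Ada Lovelace");
        ada.put("gender", "woman");
        records.add(ada);

        return records;
    }

    public static void main(String[] args) {
        for (Map<String, String> record : sampleRecords()) {
            System.out.println(record);
        }
    }
}
```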

score 0 · Accepted Answer

This requires you to do the following (pseudocode):

1.  >Read a line
2.  >Split it by a delimiter(' ' in your case)
2.5 >Map<String,String> mp = new HashMap<String,String>();
3.  >for(int i = 0; i < splitArray.length; i += 2){
      try{
        mp.put(splitArray[i],splitArray[i+1]);
      }catch(Exception e){ System.err.println("Syntax Error"); }
    }
4.  >Bob's your uncle, Fanny's your aunt.

You would have to modify the data file, though, so that ';' stands for a space inside a value, e.g.

name Ada;Lovelace
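
The pseudocode above could be turned into runnable Java roughly as follows. This is only a sketch under the stated assumption that the file has been rewritten so that every value is a single token with ';' standing in for spaces. The class name AlternatingTokensSketch and the method parse are made up for the example, and an odd number of tokens is reported up front instead of being caught as an exception inside the loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AlternatingTokensSketch {

    static Map<String, String> parse(String line) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // Step 2: split by whitespace
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            // A keyword without a value (or the other way round) is a syntax error
            throw new IllegalArgumentException("Syntax error, expected keyword/value pairs: " + line);
        }
        // Step 3: even positions are keywords, odd positions are values
        for (int i = 0; i < tokens.length; i += 2) {
            // Turn the ';' placeholder back into spaces
            pairs.put(tokens[i], tokens[i + 1].replace(';', ' '));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {name=Ada Lovelace, gender=woman}
        System.out.println(parse("name Ada;Lovelace gender woman"));
    }
}
```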

score 0 · Accepted Answer

Read the file line by line and call the getKeywordValuePairs() method on each line.

import java.util.ArrayList;

public class S{

    public static void main(String[] args) {
        System.out.println(getKeywordValuePairs("name Marioka address 97 Garderners Road birthday 12-11-1982",
                new String[]{
                    "name", "address", "birthday", "gghghhjgghjhj"
                }));
    }

    public static String getKeywordValuePairs(String text, String keywords[]) {

        ArrayList<String> keyWordsPresent = new ArrayList<>();
        ArrayList<Integer> indicesOfKeywordsPresent = new ArrayList<>();

        // finding the indices of all the keywords and adding them to the array
        // lists only if the keyword is present
        for (int i = 0; i < keywords.length; i++) {
            int index = text.indexOf(keywords[i]);
            if (index >= 0) {
                keyWordsPresent.add(keywords[i]);
                indicesOfKeywordsPresent.add(index);
            }
        }

        // Creating arrays from Array Lists
        String[] keywordsArray = new String[keyWordsPresent.size()];
        int[] indicesArray = new int[indicesOfKeywordsPresent.size()];
        for (int i = 0; i < keywordsArray.length; i++) {
            keywordsArray[i] = keyWordsPresent.get(i);
            indicesArray[i] = indicesOfKeywordsPresent.get(i);
        }


        // Sorting the keywords and indices arrays based on the position where the keyword appears
        // Bubble sort: the comparison and swap must use the inner index j, not i
        for (int i = 0; i < indicesArray.length; i++) {
            for (int j = 0; j < indicesArray.length - 1 - i; j++) {
                if (indicesArray[j] > indicesArray[j + 1]) {
                    int temp = indicesArray[j];
                    indicesArray[j] = indicesArray[j + 1];
                    indicesArray[j + 1] = temp;
                    String tempString = keywordsArray[j];
                    keywordsArray[j] = keywordsArray[j + 1];
                    keywordsArray[j + 1] = tempString;
                }
            }
        }

        // Creating the result String
        String result = "{";
        for (int i = 0; i < keywordsArray.length; i++) {
            result = result + "[" + keywordsArray[i] + ",";
            if (i == keywordsArray.length - 1) {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length()) + "]";
            } else {
                result = result + text.substring(indicesArray[i] + keywordsArray[i].length(), indicesArray[i + 1]) + "],";
            }
        }
        result = result + "}";
        return result;
    }
}
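
The "read the file line by line" step is not shown above. One way it might look is sketched below, assuming Java 8 or later. The class name LineByLineSketch and the method parseFile are hypothetical, and the per-line parser is passed in as a function so that any of the approaches on this page could be plugged in, for example line -> S.getKeywordValuePairs(line, keywords). The main method writes a temporary file and uses a stand-in parser purely for demonstration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class LineByLineSketch {

    // Applies lineParser to every non-blank line of the file and collects the results
    static <R> List<R> parseFile(Path file, Function<String, R> lineParser) throws IOException {
        List<R> results = new ArrayList<>();
        // Files.newBufferedReader(Path) reads the file as UTF-8
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    results.add(lineParser.apply(trimmed));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Demo input written to a temporary file
        Path file = Files.createTempFile("people", ".txt");
        Files.write(file, Arrays.asList(
                "name Marioka address 97 Garderners Road birthday 12-11-1982",
                "",
                "name Ada Lovelace gender woman"));
        // Stand-in parser; replace with a real keyword parser
        List<String> parsed = parseFile(file, line -> "parsed: " + line);
        parsed.forEach(System.out::println);
        Files.delete(file);
    }
}
```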

score 0 · Accepted Answer

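I have a completely different solution. It explores how Java regular expressions and an Enum can be used to read the text and parse it into a POJO, which makes it a future-proof solution.

Step 1: Define your enum (you can extend the enum to add all the required keys)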

public enum PersonEnum {
  name { public void set(Person d,String name) {  d.setName(name) ;} },
  address { public void set(Person d,String address) {  d.setAddress(address); } },
  gender { public void set(Person d,String gender) {  d.setOthers(gender); } };
  public void set(Person d,String others) { d.setOthers(others);  }
}

Step 2: Define your POJO class (if you don't need a POJO, you can change the enum to use a HashMap)

public class Person {

    private String name;
    private String address;
    private String others;

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getAddress() {
        return address;
    }
    public void setAddress(String address) {
        this.address = address;
    }
    public String getOthers() {
        return others;
    }
    public void setOthers(String others) {
        this.others = others;
    }
    @Override
    public String toString() {
        return name+"==>"+address+"==>"+others;
    }
}

Step 3: Here is the parser

public static void main(String[] args) {

    try {
        String inputs ="name Marioka address 97 Garderners Road birthday 12-11-1982\n name Ada Lovelace gender" +
                " woman address London\n name James address 65 Watcher Avenue";
        Scanner scanner = new Scanner(inputs);
        List<Person> personList = new ArrayList<Person>();
        while(scanner.hasNextLine()){
            String line = scanner.nextLine();
            List<String> filtereList=splitLines(line, "name|address|gender");
            Iterator< String> lineIterator  = filtereList.iterator();
            Person p = new Person();
            while(lineIterator.hasNext()){
                PersonEnum pEnum = PersonEnum.valueOf(lineIterator.next());
                pEnum.set(p, lineIterator.next());
            }
            personList.add(p);
            System.out.println(p);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
public static List<String> splitLines(String inputText, String pString) {
    Pattern pattern =Pattern.compile(pString);
    Matcher m = pattern.matcher(inputText);
    List<String> filteredList = new ArrayList<String>();
    int start = 0;
    while (m.find()) {
        add(inputText.substring(start, m.start()),filteredList);
        add(m.group(),filteredList);
        start = m.end();
    }
    add(inputText.substring(start),filteredList);
    return filteredList;
}
public static void add(String text, List<String> list){
    if(text!=null && !text.trim().isEmpty()){
        // Trim so that values do not carry the spaces that surround the keywords
        list.add(text.trim());
    }
}

Note: you need to define every possible enum constant in PersonEnum; otherwise you need to guard against IllegalArgumentException

eg: java.lang.IllegalArgumentException: No enum const class com.sa.PersonEnum.address

Apart from that, this is probably one of the best Java (OOP) solutions I can suggest. Cheers!
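
As for guarding against the IllegalArgumentException from the note above, one possible approach is to wrap Enum.valueOf in a small lookup that returns an empty Optional for unknown tokens. In the sketch below, the nested enum Keyword is only a stand-in for PersonEnum so that the example compiles on its own, and the class and method names are assumptions.

```java
import java.util.Optional;

class SafeEnumLookupSketch {

    // Hypothetical stand-in for PersonEnum
    enum Keyword { name, address, gender }

    // Returns the matching constant, or Optional.empty() when the token is not a known keyword.
    // Assumes token is not null.
    static Optional<Keyword> lookup(String token) {
        try {
            return Optional.of(Keyword.valueOf(token.trim()));
        } catch (IllegalArgumentException e) {
            // valueOf throws this when no constant has the given name
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("address").isPresent());  // prints true
        System.out.println(lookup("birthday").isPresent()); // prints false
    }
}
```

In the parser, the caller could then skip an unknown keyword, or route its value to setOthers, instead of letting the exception end the whole run.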
