java - How to treat all-uppercase words in a text file as abbreviations

Question

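My program is supposed to read a text file containing tweets (one tweet per line). It should output the number of hashtags (any word starting with #) and name tags (any word starting with @). The hard part: it should also check for abbreviations (all-uppercase words that do not start with @ or #), then print the abbreviations along with their count. For example, given this input: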

OMG roommate @bob drank all the beer...#FML #ihatemondays
lost TV remote before superbowl #FML
Think @bieber is soo hawt...#marryme
seeing @linkinpark & @tswift in 2 weeks...OMG

the output should look like this:

Analyzing post:
OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag count: 2
Name tag count: 1
Acronyms: OMG 
For a total of 1 acronym(s).

Here is my code:

import java.io.*; //defines FileNotFoundException
import java.util.Scanner; // import Scanner class

    public class TweetAnalyzer {
    public static void main (String [] args) throws FileNotFoundException{
    //variables
        String tweet;
        Scanner inputFile = new Scanner(new File("A3Q1-input.txt"));

        while (inputFile.hasNextLine())
        {
          tweet = inputFile.nextLine();
          System.out.println("Analyzing post: ");
          System.out.println("\t" + tweet);
          analyzeTweet(tweet);
        }


      }//close main 

      public static void analyzeTweet(String tweet){
        int hashtags = countCharacters(tweet, '#');
        int nametags = countCharacters(tweet, '@');
        System.out.println("Hash tag: " + hashtags);
        System.out.println("Name tag: " + nametags);
        Acronyms(tweet);

      }//close analyzeTweet

      public static int countCharacters(String tweet, char c)//char c represents both @ and # symbols
      {
        int characters = 0;
        char current;
        for(int i=0;i<tweet.length();i++)
        {
          current = tweet.charAt(i);
          if(current == c)
          {
            characters++;
          }
        }
        return characters;
      }

      public static boolean symbol(String tweet, int i) {
        boolean result = true;
        char c;
        if(i-1 >=0)
        {
          c = tweet.charAt(i - 1);
          if (c == '@' || c == '#') {
            result = false;
        }
        }//close if
        else
        {
         result = false;
        }
        return result;
      }

      public static void Acronyms (String tweet){
        char current;
        int capital = 0;
        int j = 0;
        String initials = "";


        for(int i = 0; i < tweet.length(); i++) {
          current = tweet.charAt(i);
          if(symbol(tweet, i) && current >= 'A' && current <= 'Z') {       
            initials += current;
            j = i + 1; 
            current = tweet.charAt(j);
            while(j < tweet.length() && current >= 'A' && current <= 'Z') {
              current = tweet.charAt(j);
              initials += current;
              j++;

            }
            capital++;
            i = j;
            initials += " ";
            }
          else {

            j = i + 1; 
            current = tweet.charAt(j);
            while(j < tweet.length() && current >= 'A' && current <= 'Z') {
              current = tweet.charAt(j);

              j++;

            }

            i = j;

        }
        }
         System.out.println(initials);
         System.out.println("For a total of " + capital + " acronym(s)");
    }//close Acronyms


      }//TweetAnalyzer

Everything works except the abbreviation part. Here is my output:

Analyzing post: 
    OMG roommate @bob drank all the beer...#FML #ihatemondays
Hash tag: 2
Name tag: 1

For a total of 0 acronym(s)
Analyzing post: 
    lost TV remote before superbowl #FML
Hash tag: 1
Name tag: 0

For a total of 0 acronym(s)
Analyzing post: 
    Think @bieber is soo hawt...#marryme
Hash tag: 1
Name tag: 1

For a total of 0 acronym(s)
Analyzing post: 
    seeing @linkinpark & @tswift in 2 weeks...OMG
Hash tag: 0
Name tag: 2
OMG 
For a total of 1 acronym(s)

Please help me fix the abbreviation part. Thanks.

score 1 · Accepted Answer

Reading the tweet word by word, like this, seems more natural:

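int names = 0, hashtags = 0, abbrevs = 0;

for (String word : tweet.split("\\s+")) {
    if (word.isEmpty()) {
        continue; // split() yields an empty first token when the tweet starts with whitespace
    }
    if (word.charAt(0) == '@') {
        names++;      // name tag
    } else if (word.charAt(0) == '#') {
        hashtags++;   // hashtag
    } else if (word.toUpperCase().equals(word)) {
        abbrevs++;    // unchanged by upper-casing, so it contains no lowercase letters
    }
}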
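A self-contained version of the loop above might look like the following sketch. The class name `WordByWordCounter` and the method `count` are made up for illustration. Two things are assumptions that go beyond the answer: the split pattern also breaks on periods, so that `beer...#FML` yields a separate `#FML` token, and a word only counts as an abbreviation if it starts with a letter, so that tokens such as `2` or `&` are not counted.

```java
import java.util.Arrays;

class WordByWordCounter {

    // Returns {hashtags, names, abbrevs} for a single tweet.
    static int[] count(String tweet) {
        int hashtags = 0, names = 0, abbrevs = 0;

        // Assumption: whitespace and periods both separate words.
        for (String word : tweet.split("[\\s.]+")) {
            if (word.isEmpty()) {
                continue; // a leading delimiter produces an empty first token
            }
            char first = word.charAt(0);
            if (first == '@') {
                names++;
            } else if (first == '#') {
                hashtags++;
            } else if (Character.isLetter(first) && word.toUpperCase().equals(word)) {
                // Starts with a letter and has no lowercase letters.
                abbrevs++;
            }
        }
        return new int[] {hashtags, names, abbrevs};
    }

    public static void main(String[] args) {
        int[] c = count("OMG roommate @bob drank all the beer...#FML #ihatemondays");
        System.out.println(Arrays.toString(c)); // prints [2, 1, 1]
    }
}
```

Under these assumptions the first sample tweet gives two hashtags, one name tag, and one abbreviation, which matches the expected output in the question.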

score 0 · Accepted Answer

Splitting on whitespace with StringTokenizer looks like this:

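import java.util.StringTokenizer;

int abbrvCount = 0;
StringTokenizer st = new StringTokenizer(yourString); // default delimiters are whitespace
while (st.hasMoreTokens()) {
    String str = st.nextToken(); // nextElement() returns Object, so it would not compile here
    if (str.toUpperCase().equals(str)) {
        abbrvCount++;
    }
}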

Hope this helps.
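Since the question also asks for the abbreviations themselves to be printed, the tokenizer approach could be extended roughly as follows. This is only a sketch: the class `TokenizerAcronyms` and its method `acronyms` are hypothetical names, and passing `" \t."` as the delimiter set (so that periods separate words too) is an assumption, as is the rule that an abbreviation must start with a letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerAcronyms {

    // Collects the all-uppercase words of a tweet, skipping @names and #hashtags.
    static List<String> acronyms(String tweet) {
        List<String> found = new ArrayList<>();
        // Assumption: spaces, tabs and periods all act as word separators.
        StringTokenizer st = new StringTokenizer(tweet, " \t.");
        while (st.hasMoreTokens()) {
            String token = st.nextToken(); // tokens are never empty
            boolean startsWithLetter = Character.isLetter(token.charAt(0));
            if (startsWithLetter && token.toUpperCase().equals(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = acronyms("lost TV remote before superbowl #FML");
        System.out.println("Acronyms: " + String.join(" ", found));
        System.out.println("For a total of " + found.size() + " acronym(s).");
    }
}
```

Returning a list rather than a bare count keeps both pieces of information the question needs: the words to print and their number.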

score 0 · Accepted Answer

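Here is what I would do: split the tweet on whitespace so that you have a list of words. Then throw away any word that contains symbols; you can use StringUtils.isAlpha (from Apache Commons Lang) for that. Now just check word.toUpperCase().equals(word). If that is true, the word is all uppercase and has no symbols, which is what you are calling an acronym.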
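The steps described above can be sketched without the Apache Commons dependency. In the code below, `isAlpha` is a hand-written stand-in that is assumed to behave like `StringUtils.isAlpha` (false for null or empty input, true only when every character is a letter); the class name `LettersOnlyFilter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LettersOnlyFilter {

    // Stand-in for StringUtils.isAlpha: true only for non-empty, letters-only strings.
    static boolean isAlpha(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Split on whitespace, drop words containing symbols, keep the all-uppercase ones.
    static List<String> acronyms(String tweet) {
        List<String> result = new ArrayList<>();
        for (String word : tweet.split("\\s+")) {
            if (isAlpha(word) && word.toUpperCase().equals(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acronyms("lost TV remote before superbowl #FML")); // prints [TV]
        // "weeks...OMG" is a single token containing periods, so it is discarded.
        System.out.println(acronyms("seeing @linkinpark & @tswift in 2 weeks...OMG")); // prints []
    }
}
```

The second call shows a limitation of splitting on whitespace alone: an acronym glued to punctuation, as in `weeks...OMG`, is thrown away together with the symbols around it.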

score 0 · Accepted Answer

Try this method to get the acronym count:

private static int countAcronyms(String tweet) {
    int acronyms = 0;
    String[] words = tweet.split(" ");

    for (String word : words) {
        if(word.matches("[A-Z]+"))
            acronyms++;
    }

    return acronyms;
}
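A variation on the same regex idea, offered as a sketch rather than as part of the answer: instead of splitting first, search the whole tweet with `Pattern` and `Matcher`. The pattern below is an assumption about what should count as an acronym: a run of uppercase letters that is not preceded by `@`, `#`, or a word character and is not followed by a word character. That lets it find `OMG` in `weeks...OMG`, which a split on spaces would miss. The class name `RegexAcronyms` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexAcronyms {

    // Uppercase run with no '@', '#' or word character before it and no word character after it.
    private static final Pattern ACRONYM = Pattern.compile("(?<![@#\\w])[A-Z]+(?!\\w)");

    // Returns every acronym in the tweet, in order of appearance.
    static List<String> find(String tweet) {
        List<String> found = new ArrayList<>();
        Matcher m = ACRONYM.matcher(tweet);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = find("seeing @linkinpark & @tswift in 2 weeks...OMG");
        System.out.println(found); // prints [OMG]
        System.out.println("For a total of " + found.size() + " acronym(s).");
    }
}
```

Note that a single capital letter standing alone, such as `I`, also matches this pattern; requiring `[A-Z]{2,}` would exclude it if that is not wanted.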
