
I need to split a String and get a String[] of words. I tried this:

String[] plain = plainText.split(" ,;<>/[(!)*=]");

But this doesn't work in my case. After the split, the array plain still holds a single value: the entire plainText string. My string looks like this:

<table class="content" border="0" cellpadding="0" cellspacing="0" style="width:540px;" bgcolor="#ffffff">
            <tr>
                <td align="left" valign="top">
                    <font color="#666666" face="Arial, Verdana" size="1">
                    eBay Inc.<br />
                    2145 Hamilton Avenue<br />
                    San Jose, California 95125<br /><br />

                    Designated trademarks and brands are the property of their respective owners. eBay and the eBay logo are trademarks of eBay Inc.
                    <br /><br />

                    <strong>&copy; 2013 eBay Inc. All Rights Reserved</strong><br /><br />


                    eBay Inc. sent this e-mail to you at maximkr@gmail.com because you opted in to the eBay Deals Daily Alert campaign by signing up at ebay.com/deals.<br /><br />


                    Pricing: We compared the selling price for the featured Deals items on eBay to the List Price for the item. The List price is the price (excluding shipping and handling fees) the seller of the item has provided at which the same item, or one that is nearly identical to it, is being offered for sale or has been offered for sale in the recent past. The price may be the seller's own price elsewhere or another seller's price. The "% off" simply signifies the calculated percentage difference between seller-provided List Price and the seller's price for the eBay Deals item. If you have any questions related to the pricing and/or discount offered in eBay Deals, please contact the seller. All items subject to availability.<br /><br />

                    If you wish to unsubscribe from eBay Deals email alerts, please <a href="http://dailydeal.ebay.com/unsubscribe.jsp?s=4IwA&i=883690252203">click here</a>.
                    Please note that you are only opting out of the eBay Deals email alerts. If you are an eBay customer and wish to change your other eBay Notification Preferences, please log in to My eBay by <a href="http://l.deals.ebay.com/u.d?R4GrxGghJ4SpZccF_r3SS=21801">clicking here</a>. Please note that it may take up to 10 days to process changes to your eBay Notification Preferences. <br /><br />

                    Visit our <a href="http://l.deals.ebay.com/u.d?f4GrxGghJ4SpZccF_r3Sf=21811">Privacy Policy</a> and <a href="http://l.deals.ebay.com/u.d?KYGrxGghJ4SpZccF_r3SY=21821">User Agreement</a> if you have any questions.<br /><br />

                    </font>
                </td>

This is part of a parsed email message. So how can I convert this text into an array of words?


3 Answers


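This regex is wrong. As written, it only matches the literal sequence " ,;<>/" followed by one character from the class [(!)*=], so it never matches your text. To split on any one of those characters, the whole set must be enclosed in [], and the literal [ and ] must be escaped. Characters such as ( ) and * are regex control characters outside a character class; escaping them inside one is harmless but not required: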

String[] plain = plainText.split("[ ,;<>/\\[\\(!\\)\\*=\\]]");

For more, see the documentation on Java regular expressions.
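To make the difference concrete, here is a minimal sketch that runs both patterns against one line of the sample text. The class and method names (SplitDemo, splitOriginal, splitFixed) are made up for illustration, and the trailing + in the fixed pattern is an addition that is not in the answer above: it collapses runs of adjacent delimiters so the result contains no empty strings between them.

```java
import java.util.Arrays;

public class SplitDemo {
    // The pattern from the question: it matches the literal sequence " ,;<>/"
    // followed by ONE character out of ( ! ) * =, which the sample never contains.
    static String[] splitOriginal(String text) {
        return text.split(" ,;<>/[(!)*=]");
    }

    // The same delimiters as a single character class. Only [ and ] need escaping
    // inside the class. The trailing + (an assumption, not part of the answer)
    // treats a run of delimiters as one separator.
    static String[] splitFixed(String text) {
        return text.split("[ ,;<>/\\[(!)*=\\]]+");
    }

    public static void main(String[] args) {
        String sample = "San Jose, California 95125<br /><br />";
        // One element: the whole input string, unchanged.
        System.out.println(splitOriginal(sample).length);
        // Prints [San, Jose, California, 95125, br, br]
        System.out.println(Arrays.toString(splitFixed(sample)));
    }
}
```

Note that tag names such as br still come out as words, because the pattern only knows about delimiters, not about HTML.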

Edit: to follow up on CPerkins's comment, you can also use this regex:

String[] plain = plainText.split("[\\s^\\W]+");

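It splits on runs of whitespace and non-word characters, which I think is roughly what you want. Inside the brackets the ^ is a literal caret rather than a negation, and \W already covers both the caret and whitespace, so the pattern behaves the same as "\\W+".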
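As a rough illustration of what that pattern produces, the sketch below splits one line of the sample on "\\W+". The class name NonWordSplitDemo and the helper words are hypothetical. The filter step is an extra assumption: when the input starts with a delimiter (as HTML usually does, with <), String.split returns an empty first element, which the helper drops.

```java
import java.util.Arrays;

public class NonWordSplitDemo {
    // Split on runs of non-word characters and drop empty strings,
    // such as the empty leading element produced when the text starts with '<'.
    static String[] words(String text) {
        return Arrays.stream(text.split("\\W+"))
                     .filter(s -> !s.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String sample = "<strong>&copy; 2013 eBay Inc.</strong>";
        // Prints [strong, copy, 2013, eBay, Inc, strong]
        System.out.println(Arrays.toString(words(sample)));
        // The pattern from the answer gives the same raw split as "\\W+".
        System.out.println(Arrays.equals(sample.split("\\W+"), sample.split("[\\s^\\W]+")));
    }
}
```

Be aware that \W also breaks on hyphens and apostrophes, so e-mail becomes e and mail, and seller's becomes seller and s.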

Note: the above is only a direct answer to your question; there are better ways to read and parse HTML.

Answered 2013-04-16T14:30:09.090

You can use the Scanner class, with a loop of this form:

while (scanner.hasNext()) {}

Link: Scanner
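A possible way to flesh out that loop is sketched below. The answer does not say which delimiter to use, so "\\W+" is an assumption here (the Scanner default would split on whitespace only and leave punctuation attached to the words). The class name ScannerWordsDemo and the method words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerWordsDemo {
    // Collect tokens from the text, treating runs of non-word characters as separators.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\W+"); // assumed delimiter, not specified in the answer
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [San, Jose, California, 95125, br]
        System.out.println(words("San Jose, California 95125<br />"));
    }
}
```

If a String[] is needed, as in the question, the list can be converted with result.toArray(new String[0]).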

Answered 2013-04-16T14:33:23.313

How about one of the variants of Apache's StringUtils.split?

Answered 2013-04-16T15:59:59.013