
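I'm trying to remove MS Word formatting information from my textarea, but I can't figure out how to do it. I need to copy and paste some content from MS Word into a text box editor. The text itself comes across fine, but all of the formatting is copied along with it, so my 300-character sentence grows into a 20,000-character formatted one. Can anyone suggest how to handle this?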

OK, after some research I've made some progress.

This is the text I copied from the Word document:

Once the user clicks on the Cancel icon for a transaction on the Status of Business, and the transaction is eligible for cancellation, a new screen titled “Cancel Transaction” will appear, with the following fields: 

And this is what I get from $("#textAreaId").val():

"

  Normal
  0

  false
  false
  false

  EN-US
  X-NONE
  X-NONE

Once the user clicks on the Cancel icon for a
transaction on the Status of Business, and the transaction is eligible for
cancellation, a new screen titled “Cancel Transaction” will appear, with the
following fields: 

 /* Style Definitions */
 table.MsoNormalTable
    {mso-style-name:"Table Normal";
    mso-style-parent:"";
    line-height:115%;
    font-size:11.0pt;
    font-family:"Calibri","sans-serif";
    mso-bidi-font-family:"Times New Roman";}

"

I finally found a solution here:

// Removes the markup MS Office adds when Word content is pasted as HTML
function cleanHTML(input) {
  // 1. Replace line breaks and Mso* class attributes with a space
  var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
  var output = input.replace(stringStripper, ' ');

  // 2. Strip Word-generated HTML comments (line breaks are gone, so .*? can span former lines)
  var commentStripper = /<!--(.*?)-->/g;
  output = output.replace(commentStripper, '');

  // 3. Remove these tags but keep their content, if any
  var tagStripper = /<(\/)*(meta|link|span|\?xml:|st1:|o:|font)(.*?)>/gi;
  output = output.replace(tagStripper, '');

  // 4. Remove these elements entirely: opening tag, content and matching closing tag
  var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
  for (var i = 0; i < badTags.length; i++) {
    var elementStripper = new RegExp('<' + badTags[i] + '\\b.*?</' + badTags[i] + '\\s*>', 'gi');
    output = output.replace(elementStripper, '');
  }

  // 5. Remove these attributes, e.g. ' style="..."'
  var badAttributes = ['style', 'start'];
  for (var j = 0; j < badAttributes.length; j++) {
    var attributeStripper = new RegExp(' ' + badAttributes[j] + '="(.*?)"', 'gi');
    output = output.replace(attributeStripper, '');
  }

  return output;
}
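If the goal is to recover just the original sentence rather than cleaned-up HTML, one option is to go a step further and strip every remaining tag as well. The following is a minimal sketch, not part of the answer above: htmlToPlainText is a hypothetical name, it assumes the editor hands over an HTML string like the one shown in the question, and its regular expressions only approximate what a real HTML parser would do.

```javascript
// Hypothetical helper (not from the original answer): reduce Word-pasted HTML
// to plain text. Assumes the input is an HTML string; regexes are approximate.
function htmlToPlainText(html) {
  var text = html
    // Drop HTML comments, including Word's <!--[if gte mso 9]>...<![endif]--> blocks
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop <style>, <script> and <xml> elements together with their contents
    .replace(/<(style|script|xml)\b[\s\S]*?<\/\1\s*>/gi, '')
    // Source line breaks are insignificant in HTML: treat all whitespace runs as one space
    .replace(/\s+/g, ' ')
    // Paragraph ends and <br> become real line breaks
    .replace(/<\/p\s*>|<br\s*\/?>/gi, '\n')
    // Remove every remaining tag but keep the text between tags
    .replace(/<[^>]+>/g, '')
    // Decode a few common entities (not a complete decoder); &amp; goes last
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&');

  // Collapse whitespace inside each line, trim it, and drop empty lines
  return text
    .split('\n')
    .map(function (line) { return line.replace(/\s+/g, ' ').trim(); })
    .filter(function (line) { return line.length > 0; })
    .join('\n');
}
```

In a browser, parsing the string with DOMParser is generally more robust than regular expressions; the regex version above has the advantage of also running outside the DOM.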
answered 2014-03-28T11:11:01.547