Basically, I just need the effect of copying HTML from a browser window and pasting it into a textarea element.
For example, I want this:
<p>Some</p>
<div>text<br />Some</div>
<div>text</div>
to become this:
Some
text
Some
text
If that HTML is visible within your web page, you could do it with the user selection (or just a TextRange in IE). This does preserve line breaks, if not necessarily leading and trailing white space.
UPDATE 10 December 2012
However, the toString() method of Selection objects is not yet standardized and works inconsistently between browsers, so this approach is based on shaky ground and I don't recommend using it now. I would delete this answer if it weren't accepted.
Demo: http://jsfiddle.net/wv49v/
Code:
function getInnerText(el) {
    var sel, range, innerText = "";
    if (typeof document.selection != "undefined" && typeof document.body.createTextRange != "undefined") {
        // Old IE: a TextRange exposes the rendered text directly.
        range = document.body.createTextRange();
        range.moveToElementText(el);
        innerText = range.text;
    } else if (typeof window.getSelection != "undefined" && typeof document.createRange != "undefined") {
        // Other browsers: select the element's contents and stringify the selection.
        // Note that this replaces whatever the user currently has selected.
        sel = window.getSelection();
        sel.selectAllChildren(el);
        innerText = "" + sel;
        sel.removeAllRanges();
    }
    return innerText;
}
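I tried to find some code I wrote for this a while back. It worked really well. Let me outline what it did, and hopefully you can replicate its behavior.
You could even extend it further to format things like ordered and unordered lists. It really depends on how far you want to go.
EDIT
Found the code!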
// Requires: using System.Text.RegularExpressions;
public static string Convert(string template)
{
    template = Regex.Replace(template, "<img .*?alt=[\"']?([^\"']*)[\"']?.*?/?>", "$1"); /* Use image alt text. */
    template = Regex.Replace(template, "<a .*?href=[\"']?([^\"']*)[\"']?.*?>(.*?)</a>", "$2 [$1]"); /* Convert links to something useful; the lazy (.*?) keeps each link separate. */
    template = Regex.Replace(template, "<(/p|/div|/h\\d|br)\\s*/?>", "\n"); /* Let's try to keep vertical whitespace intact; \s* also matches "<br />". */
    template = Regex.Replace(template, "<[A-Za-z/][^<>]*>", ""); /* Remove the rest of the tags. */
    return template;
}
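For anyone who needs this in the browser or in Node, here is a rough JavaScript sketch of the same sequence of replacements. The function name convertHtmlToText is my own, the patterns are adapted rather than copied verbatim from the C# version, and the list-item step is only an assumption about how the list formatting mentioned above might look. Like the original, it does not decode entities such as &amp;amp;.

```javascript
// Hypothetical JavaScript adaptation of the C# Convert method (not the author's code).
function convertHtmlToText(template) {
  return template
    // Use image alt text.
    .replace(/<img\s[^>]*?alt=["']?([^"']*)["']?[^>]*>/gi, "$1")
    // Convert links to "text [url]".
    .replace(/<a\s[^>]*?href=["']?([^"'\s>]*)["']?[^>]*>(.*?)<\/a>/gi, "$2 [$1]")
    // Assumed extension: prefix each list item with a dash.
    .replace(/<li(\s[^>]*)?>/gi, "- ")
    // Keep vertical whitespace: closing block tags and <br> become newlines.
    .replace(/<(\/p|\/div|\/h\d|\/li|br)\s*\/?>/gi, "\n")
    // Remove the rest of the tags.
    .replace(/<[A-Za-z\/][^<>]*>/g, "");
}

// The example from the question.
const questionHtml = "<p>Some</p><div>text<br />Some</div><div>text</div>";
console.log(convertHtmlToText(questionHtml));
```

With the question's markup this yields the four lines Some, text, Some, text, followed by one trailing newline that you may want to trim.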
I made a function based on this answer: https://stackoverflow.com/a/42254787/3626940
function htmlToText(html) {
    // remove source line breaks and tabs
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");
    // keep HTML line breaks and tabs
    html = html.replace(/<\/td>/g, "\t");
    html = html.replace(/<\/table>/g, "\n");
    html = html.replace(/<\/tr>/g, "\n");
    html = html.replace(/<\/p>/g, "\n");
    html = html.replace(/<\/div>/g, "\n");
    html = html.replace(/<\/h\d>/g, "\n"); // </h1> to </h6>
    html = html.replace(/<br\s*\/?>/g, "\n"); // <br>, <br/> and <br />
    // parse HTML into text
    var dom = (new DOMParser()).parseFromString('<!doctype html><body>' + html, 'text/html');
    return dom.body.textContent;
}
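Building on chrmcpn's answer: I had to convert a basic HTML email template into a plain-text version as part of a build script in node.js. I had to use JSDOM to make it work, but here is my code: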
const { JSDOM } = require("jsdom");

const htmlToText = (html) => {
    // Drop source line breaks and tabs; only the markup decides the layout.
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");
    html = html.replace(/<\/p>/g, "\n\n");
    html = html.replace(/<\/h1>/g, "\n\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br( )*\/>/g, "\n");
    const dom = new JSDOM(html);
    let text = dom.window.document.body.textContent;
    // Collapse runs of spaces left over from indentation (a single space between words is kept).
    text = text.replace(/ {2,}/g, " ");
    text = text.replace(/\n /g, "\n");
    text = text.trim();
    return text;
}
Three steps.
First get the html as a string.
Second, replace all <BR /> and <BR> with \r\n.
Third, use the regular expression "<(.|\n)*?>" to replace all markup with "".
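In JavaScript, those three steps might look roughly like the sketch below. The function name stripTagsKeepingBreaks is made up for illustration, and the `<br>` pattern is my own case-insensitive variant; the tag-stripping regex is the one quoted above.

```javascript
// Minimal sketch of the three steps; stripTagsKeepingBreaks is a hypothetical name.
function stripTagsKeepingBreaks(html) {
  // Step 1: the HTML is already a string (the html parameter).
  // Step 2: turn <br>, <BR> and <br /> into CRLF line breaks.
  const withBreaks = html.replace(/<br\s*\/?>/gi, "\r\n");
  // Step 3: replace all remaining markup with the empty string.
  return withBreaks.replace(/<(.|\n)*?>/g, "");
}

const sample = "<b>Some</b><br />text<BR>Some<br>text";
console.log(JSON.stringify(stripTagsKeepingBreaks(sample)));
// prints "Some\r\ntext\r\nSome\r\ntext"
```

Keep in mind that this only honors `<br>` elements: block elements such as `<p>` and `<div>` are removed without adding a line break, and entities like &amp;nbsp; are left undecoded.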