23

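I have created a docx file from a Word template. Now I am opening the copied docx file and want to replace certain text with other data.

I can't find any hint on how to access the text in the document's main part.

Any help would be appreciated.

Below is my code so far.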

private void CreateSampleWordDocument()
    {
        //string sourceFile = Path.Combine("D:\\GeneralLetter.dot");
        //string destinationFile = Path.Combine("D:\\New.doc");
        string sourceFile = Path.Combine("D:\\GeneralWelcomeLetter.docx");
        string destinationFile = Path.Combine("D:\\New.docx");
        try
        {
            // Create a copy of the template file and open the copy
            File.Copy(sourceFile, destinationFile, true);
            using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
            {
                // Change the document type to Document
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                //Get the Main Part of the document
                MainDocumentPart mainPart = document.MainDocumentPart;
                mainPart.Document.Save();
            }
        }
        catch
        {
        }
    }

Now, how do I find certain text and replace it? I couldn't work it out from the links I found, so a code hint would be appreciated.

11 Answers

25

Just to give you an idea of how to do it, try:

using (WordprocessingDocument doc =
           WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var body = doc.MainDocumentPart.Document.Body;
    var paras = body.Elements<Paragraph>();

    // Walk paragraph -> run -> text and replace inside each text node
    foreach (var para in paras)
    {
        foreach (var run in para.Elements<Run>())
        {
            foreach (var text in run.Elements<Text>())
            {
                if (text.Text.Contains("text-to-replace"))
                {
                    text.Text = text.Text.Replace("text-to-replace", "replaced-text");
                }
            }
        }
    }
}

Note that the match is case-sensitive. The text formatting is not changed by the replacement. Hope this helps.
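If a case-insensitive match is needed, one option is to swap string.Replace for a regex. The following is only a sketch: the class and method names are made up for illustration, and it uses a match evaluator so that characters such as `$` in the replacement are inserted literally.

```csharp
using System.Text.RegularExpressions;

public static class CaseInsensitiveReplace
{
    // Hypothetical helper, not part of the Open XML SDK.
    public static string Replace(string input, string find, string replacement)
    {
        // Regex.Escape treats the search text literally; the lambda returns the
        // replacement as-is, so "$1" and similar are not read as group references.
        return Regex.Replace(input, Regex.Escape(find), _ => replacement, RegexOptions.IgnoreCase);
    }
}
```

Inside the loop above this would be used as `text.Text = CaseInsensitiveReplace.Replace(text.Text, "text-to-replace", "replaced-text");`.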

answered 2013-08-20T15:44:20.167
19

In addition to Flowerking's answer:

When your Word file has text boxes in it, his solution will not work. A text box keeps its text inside a TextBoxContent element, so that text never shows up in the foreach loop over the paragraph's Runs.

But if you write

using (WordprocessingDocument doc =
           WordprocessingDocument.Open(@"yourpath\testdocument.docx", true))
{
    var document = doc.MainDocumentPart.Document;

    foreach (var text in document.Descendants<Text>()) // <<< Here
    {
        if (text.Text.Contains("text-to-replace"))
        {
            text.Text = text.Text.Replace("text-to-replace", "replaced-text");
        }
    }
}

it will loop over all the text in the document (whether it is in a text box or not), so it will replace those texts too.

Note that if the text is split between Runs or text boxes, this won't work either. You need a better solution for those cases. One way to deal with split text is to fix the "template": sometimes simply deleting the placeholder and re-creating it works wonders.
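To check whether a placeholder has been split, it can help to print the text nodes of each paragraph. This is a small diagnostic sketch; the class and method names are invented for illustration, and the document is opened read-only.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class RunInspector
{
    // Hypothetical helper: prints each paragraph's text nodes separated by " | ".
    public static void DumpTextNodes(string path)
    {
        using (var doc = WordprocessingDocument.Open(path, false))
        {
            foreach (var para in doc.MainDocumentPart.Document.Descendants<Paragraph>())
            {
                // Text inside a text box is listed under its own paragraph and
                // again under the paragraph that contains the text box.
                var parts = para.Descendants<Text>().Select(t => t.Text).ToList();
                if (parts.Count > 0)
                {
                    Console.WriteLine(string.Join(" | ", parts));
                }
            }
        }
    }
}
```

A line such as `Dear {{na | me | }}` in the output means the placeholder spans three text nodes and a per-node Replace will miss it.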

answered 2015-07-26T23:50:49.320
4

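Maybe this solution is easier:
1. a StreamReader reads all the text,
2. a Regex replaces the old text with the new text, case-insensitively,
3. a StreamWriter writes the modified text back into the document.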

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
    string docText = null;
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        docText = sr.ReadToEnd();

    // findesReplaces is assumed here to be an IDictionary<string, string>
    // mapping each search pattern to its replacement text
    foreach (var t in findesReplaces)
        docText = new Regex(t.Key, RegexOptions.IgnoreCase).Replace(docText, t.Value);

    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        sw.Write(docText);
}
answered 2014-06-17T08:29:02.747
3

Here is a solution that finds and replaces tags in an Open XML (Word) document across text runs (including text boxes):

namespace Demo
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    public class WordDocumentHelper
    {
        class DocumentTag
        {
            public DocumentTag()
            {
                ReplacementText = "";
            }

            public string Tag { get; set; }
            public string Table { get; set; }
            public string Column { get; set; }
            public string ReplacementText { get; set; }

            public override string ToString()
            {
                return ReplacementText ?? (Tag ?? "");
            }
        }

        private const string TAG_PATTERN = @"\[(.*?)[\.|\:](.*?)\]";
        private const string TAG_START = @"[";
        private const string TAG_END = @"]";

        /// <summary>
        /// Clones a document template into the temp folder and returns the newly created clone temp filename and path.
        /// </summary>
        /// <param name="templatePath"></param>
        /// <returns></returns>
        public string CloneTemplateForEditing(string templatePath)
        {
            var tempFile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName()) + Path.GetExtension(templatePath);
            File.Copy(templatePath, tempFile);
            return tempFile;
        }

        /// <summary>
        /// Opens a given filename, replaces tags, and saves. 
        /// </summary>
        /// <param name="filename"></param>
        /// <returns>Number of tags found</returns>
        public int FindAndReplaceTags(string filename)
        {
            var allTags = new List<DocumentTag>();

            using (WordprocessingDocument doc = WordprocessingDocument.Open(path: filename, isEditable: true))
            {
                var document = doc.MainDocumentPart.Document;

                // text may be split across multiple text runs so keep a collection of text objects
                List<Text> tagParts = new List<Text>();

                foreach (var text in document.Descendants<Text>())
                {
                    // search for any fully formed tags in this text run
                    var fullTags = GetTags(text.Text);

                    // replace values for fully formed tags
                    fullTags.ForEach(t => {
                        t = GetTagReplacementValue(t);
                        text.Text = text.Text.Replace(t.Tag, t.ReplacementText);
                        allTags.Add(t);
                    });

                    // continue working on current partial tag
                    if (tagParts.Count > 0)
                    {
                        // working on a tag
                        var joinText = string.Join("", tagParts.Select(x => x.Text)) + text.Text;

                        // see if tag ends with this block
                        if (joinText.Contains(TAG_END))
                        {
                            var joinTag = GetTags(joinText).FirstOrDefault(); // should be just one tag (or none)
                            if (joinTag == null)
                            {
                                throw new Exception($"Misformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                            }

                            joinTag = GetTagReplacementValue(joinTag);
                            allTags.Add(joinTag);

                            // replace first text run in the tagParts set with the replacement value. 
                            // (This means the formatting used on the first character of the tag will be used)
                            var firstRun = tagParts.First();
                            firstRun.Text = firstRun.Text.Substring(0, firstRun.Text.LastIndexOf(TAG_START));
                            firstRun.Text += joinTag.ReplacementText;

                            // replace trailing text runs with empty strings
                            tagParts.Skip(1).ToList().ForEach(x => x.Text = "");

                            // replace all text up to and including the first index of TAG_END
                            text.Text = text.Text.Substring(text.Text.IndexOf(TAG_END) + 1);

                            // empty the tagParts list so we can start on a new tag
                            tagParts.Clear();
                        }
                        else
                        {
                            // no tag end so keep getting text runs
                            tagParts.Add(text);
                        }
                    }

                    // search for new partial tags
                    if (text.Text.Contains("["))
                    {
                        if (tagParts.Any())
                        {
                            throw new Exception($"Misformed document tag in block '{string.Join("", tagParts.Select(x => x.Text)) + text.Text}' ");
                        }
                        tagParts.Add(text);
                        continue;
                    }

                }

                // save the temp doc before closing
                doc.Save();
            }

            return allTags.Count;
        }

        /// <summary>
        /// Gets a unique set of document tags found in the passed fileText using Regex
        /// </summary>
        /// <param name="fileText"></param>
        /// <returns></returns>
        private List<DocumentTag> GetTags(string fileText)
        {
            List<DocumentTag> tags = new List<DocumentTag>();

            if (string.IsNullOrWhiteSpace(fileText))
            {
                return tags;
            }

            // TODO: custom regex for tag matching 
            // this example looks for tags in the formation "[table.column]" or "[table:column]" and captures the full tag, "table", and "column" into match Groups
            MatchCollection matches = Regex.Matches(fileText, TAG_PATTERN);
            foreach (Match match in matches)
            {
                try
                {

                    if (match.Groups.Count < 3
                        || string.IsNullOrWhiteSpace(match.Groups[0].Value)
                        || string.IsNullOrWhiteSpace(match.Groups[1].Value)
                        || string.IsNullOrWhiteSpace(match.Groups[2].Value))
                    {
                        continue;
                    }

                    tags.Add(new DocumentTag
                    {
                        Tag = match.Groups[0].Value,
                        Table = match.Groups[1].Value,
                        Column = match.Groups[2].Value
                    });
                }
                catch
                {

                }
            }

            return tags;
        }

        /// <summary>
        /// Set the replacement value of the passed tag
        /// </summary>
        /// <returns></returns>
        private DocumentTag GetTagReplacementValue(DocumentTag tag)
        {
            // TODO: custom routine to update tag Replacement Value

            tag.ReplacementText = "foobar";

            return tag;
        }
    }
}
answered 2019-07-09T21:19:31.227
3

The simplest and most accurate way I have found so far is to use Open-Xml-PowerTools. Personally, I use dotnet core, so I use this nuget package

using OpenXmlPowerTools;
// ...

protected byte[] SearchAndReplace(byte[] file, IDictionary<string, string> translations)
{
    WmlDocument doc = new WmlDocument(file.Length.ToString(), file);

    foreach (var translation in translations)
        doc = doc.SearchAndReplace(translation.Key, translation.Value, true);

    return doc.DocumentByteArray;
}

Usage example:

var templateDoc = File.ReadAllBytes("templateDoc.docx");
var generatedDoc = SearchAndReplace(templateDoc, new Dictionary<string, string>(){
    {"text-to-replace-1", "replaced-text-1"},
    {"text-to-replace-2", "replaced-text-2"},
});
File.WriteAllBytes("generatedDoc.docx", generatedDoc);

For more information, see Search and Replace Text in an Open XML WordprocessingML Document

answered 2020-08-14T21:05:10.283
1

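I was testing this for document generation, but my placeholders were split across runs and text nodes. I didn't want to load the whole document as a single string for a regex find/replace, so I worked with the OpenXml API. My idea was:

  1. Clean up the placeholder nodes as a one-off operation on the document
  2. On each generation, find/replace by node value, now that the source is clean.

Testing showed that placeholders were split across runs and text nodes, but not across paragraphs. I also found that consecutive placeholders did not share a text node, so I did not handle that case. Placeholders follow the pattern {{placeholder_name}}

First, I needed to get all the text nodes in a paragraph (as per @sertsedat):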

    var nodes = paragraph.Descendants<Text>();

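Testing showed that this function preserves order, which was perfect for my use case, because I could iterate over the collection looking for start/stop indicators and group the nodes that belong to a placeholder.

The grouping function looks for {{ and }} in the text node values to identify the nodes that are part of a placeholder and should be removed, and the other nodes that should be ignored.

Once the start of a placeholder is found, every subsequent node up to and including the terminating one needs to be removed (marked by adding it to the TextNodes list). The values of those nodes that make up the placeholder are collected in a StringBuilder, and any text in the first/last node that is not part of the placeholder also has to be kept (hence the string properties). Any incomplete group should raise an error, either when a new placeholder is found or at the end of the sequence.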
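The grouping function itself is not shown in this answer. The following is one possible sketch of it, not the author's code: the PlaceholderGroup and PlaceholderGrouping names are hypothetical, the property names are taken from the loop that consumes them, and PlaceholderText is assumed to be a Text element because it is passed straight to InsertBefore. It also assumes the opening {{ is never split across two nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical container for one placeholder and the nodes it spans.
public class PlaceholderGroup
{
    public List<Text> TextNodes { get; } = new List<Text>();
    public string PrecedingText { get; set; }   // text before "{{" in the first node
    public Text PlaceholderText { get; set; }   // the joined "{{name}}" as one node
    public string SubsequentText { get; set; }  // text after "}}" in the last node
}

public static class PlaceholderGrouping
{
    public static List<PlaceholderGroup> GroupPlaceholders(IEnumerable<Text> nodes)
    {
        var groups = new List<PlaceholderGroup>();
        PlaceholderGroup current = null;
        var buffer = new StringBuilder();

        foreach (var node in nodes)
        {
            string value = node.Text ?? string.Empty;

            if (current == null)
            {
                int start = value.IndexOf("{{", StringComparison.Ordinal);
                if (start < 0) continue; // ordinary node, leave it alone

                current = new PlaceholderGroup();
                if (start > 0) current.PrecedingText = value.Substring(0, start);
                value = value.Substring(start);
            }
            else if (value.Contains("{{"))
            {
                throw new InvalidOperationException("New placeholder started before the previous one was closed.");
            }

            current.TextNodes.Add(node);
            buffer.Append(value);

            // Search the joined text so a "}}" split over two nodes is still found.
            string soFar = buffer.ToString();
            int end = soFar.IndexOf("}}", StringComparison.Ordinal);
            if (end < 0) continue; // placeholder continues in the next node

            int cut = end + 2;
            current.PlaceholderText = new Text(soFar.Substring(0, cut));
            if (cut < soFar.Length) current.SubsequentText = soFar.Substring(cut);

            groups.Add(current);
            current = null;
            buffer.Clear();
        }

        if (current != null)
        {
            throw new InvalidOperationException("Placeholder was not closed before the end of the paragraph.");
        }
        return groups;
    }
}
```

Because the function returns a fully built list, the caller can insert and remove nodes afterwards without modifying a collection that is still being enumerated. With this sketch the call would be `PlaceholderGrouping.GroupPlaceholders(paragraph.Descendants<Text>())`.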

Finally, I used the groups to update the original document:

foreach (var placeholder in GroupPlaceholders(paragraph.Descendants<Text>()))
{
    var firstTextNode = placeholder.TextNodes[0];
    if (placeholder.PrecedingText != null)
    {
        firstTextNode.Parent.InsertBefore(new Text(placeholder.PrecedingText), firstTextNode);
    }
    firstTextNode.Parent.InsertBefore(placeholder.PlaceholderText, firstTextNode);
    if (placeholder.SubsequentText != null)
    {
        firstTextNode.Parent.InsertBefore(new Text(placeholder.SubsequentText), firstTextNode);
    }
    foreach (var textNode in placeholder.TextNodes) {
        textNode.Remove();                      
    }
}
answered 2021-05-05T19:54:45.037
1

My class for replacing long phrases in a Word document when the phrase is split across different text blocks:

The class itself:

using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace WebBackLibrary.Service
{
    public class WordDocumentService
    {
        private class WordMatchedPhrase
        {
            public int charStartInFirstPar { get; set; }
            public int charEndInLastPar { get; set; }

            public int firstCharParOccurance { get; set; }
            public int lastCharParOccurance { get; set; }
        }

        public WordprocessingDocument ReplaceStringInWordDocumennt(WordprocessingDocument wordprocessingDocument, string replaceWhat, string replaceFor)
        {
            List<WordMatchedPhrase> matchedPhrases = FindWordMatchedPhrases(wordprocessingDocument, replaceWhat);

            Document document = wordprocessingDocument.MainDocumentPart.Document;
            int i = 0;
            bool isInPhrase = false;
            bool isInEndOfPhrase = false;
            foreach (Text text in document.Descendants<Text>()) // <<< Here
            {
                char[] textChars = text.Text.ToCharArray();
                List<WordMatchedPhrase> curParPhrases = matchedPhrases.FindAll(a => (a.firstCharParOccurance.Equals(i) || a.lastCharParOccurance.Equals(i)));
                StringBuilder outStringBuilder = new StringBuilder();
                
                for (int c = 0; c < textChars.Length; c++)
                {
                    if (isInEndOfPhrase)
                    {
                        isInPhrase = false;
                        isInEndOfPhrase = false;
                    }

                    foreach (var parPhrase in curParPhrases)
                    {
                        if (c == parPhrase.charStartInFirstPar && i == parPhrase.firstCharParOccurance)
                        {
                            outStringBuilder.Append(replaceFor);
                            isInPhrase = true;
                        }
                        if (c == parPhrase.charEndInLastPar && i == parPhrase.lastCharParOccurance)
                        {
                            isInEndOfPhrase = true;
                        }

                    }
                    if (isInPhrase == false && isInEndOfPhrase == false)
                    {
                        outStringBuilder.Append(textChars[c]);
                    }
                }
                text.Text = outStringBuilder.ToString();
                i = i + 1;
            }

            return wordprocessingDocument;
        }

        private List<WordMatchedPhrase> FindWordMatchedPhrases(WordprocessingDocument wordprocessingDocument, string replaceWhat)
        {
            char[] replaceWhatChars = replaceWhat.ToCharArray();
            int overlapsRequired = replaceWhatChars.Length;
            int overlapsFound = 0;
            int currentChar = 0;
            int firstCharParOccurance = 0;
            int lastCharParOccurance = 0;
            int startChar = 0;
            int endChar = 0;
            List<WordMatchedPhrase> wordMatchedPhrases = new List<WordMatchedPhrase>();
            //
            Document document = wordprocessingDocument.MainDocumentPart.Document;
            int i = 0;
            foreach (Text text in document.Descendants<Text>()) // <<< Here
            {
                char[] textChars = text.Text.ToCharArray();
                for (int c = 0; c < textChars.Length; c++)
                {
                    char compareToChar = replaceWhatChars[currentChar];
                    if (textChars[c] == compareToChar)
                    {
                        currentChar = currentChar + 1;
                        if (currentChar == 1)
                        {
                            startChar = c;
                            firstCharParOccurance = i;
                        }
                        if (currentChar == overlapsRequired)
                        {
                            endChar = c;
                            lastCharParOccurance = i;
                            WordMatchedPhrase matchedPhrase = new WordMatchedPhrase
                            {
                                firstCharParOccurance = firstCharParOccurance,
                                lastCharParOccurance = lastCharParOccurance,
                                charEndInLastPar = endChar,
                                charStartInFirstPar = startChar
                            };
                            wordMatchedPhrases.Add(matchedPhrase);
                            currentChar = 0;
                        }
                    }
                    else
                    {
                        currentChar = 0;

                    }
                }
                i = i + 1;
            }

            return wordMatchedPhrases;

        }

    }
}

And a simple usage example:

public void EditWordDocument(UserContents userContents)
        {
            string filePath = Path.Combine(userContents.PathOnDisk, userContents.FileName);
            WordDocumentService wordDocumentService = new WordDocumentService();
            if (userContents.ContentType.Contains("word") && File.Exists(filePath))
            {
                string saveAs = "modifiedTechWord.docx";
                //
                using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true)) //open source word file
                {
                    Document document = doc.MainDocumentPart.Document;
                    OpenXmlPackage res = doc.SaveAs(Path.Combine(userContents.PathOnDisk, saveAs)); // copy it
                    res.Close();
                }
                using (WordprocessingDocument doc = WordprocessingDocument.Open(Path.Combine(userContents.PathOnDisk, saveAs), true)) // open copy
                {
                    string replaceWhat = "{transform:CandidateFio}";
                    string replaceFor = "ReplaceToFio";
                    var result = wordDocumentService.ReplaceStringInWordDocumennt(doc, replaceWhat, replaceFor); //replace words in copy
                }
            }
        }
answered 2020-07-14T17:19:14.660
0

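For real-world documents, most of the answers here are wrong.

There are two main solutions. If you control the source document, use mail merge fields for the find/replace instead of trying to work with the text in the document.

If you can't use mail merge fields, the solution is to write your own text buffer that combines multiple text fields. This lets you find/replace text that is split between text fields, which happens often.

It is hard to write correctly because of all the split combinations that can occur! But it has worked for me for several years, processing millions of documents.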
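The buffer idea can be sketched roughly as follows. This is not the code described above, only a minimal illustration under some assumptions: it works on plain strings (one per text node, in document order), keeps the number of segments unchanged, and writes each replacement into the segment where the match starts. The SplitTextReplacer name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SplitTextReplacer
{
    // Replaces every occurrence of `find` in the concatenation of `segments`.
    // Matched characters that fall in later segments are dropped from them.
    public static string[] Replace(IReadOnlyList<string> segments, string find, string replacement)
    {
        if (string.IsNullOrEmpty(find)) throw new ArgumentException("find must not be empty", nameof(find));

        // Join the segments and remember which segment each character came from.
        var joined = new StringBuilder();
        var owner = new List<int>();
        for (int s = 0; s < segments.Count; s++)
        {
            string seg = segments[s] ?? string.Empty;
            joined.Append(seg);
            for (int k = 0; k < seg.Length; k++) owner.Add(s);
        }

        string text = joined.ToString();
        var output = new StringBuilder[segments.Count];
        for (int s = 0; s < output.Length; s++) output[s] = new StringBuilder();

        int pos = 0;
        while (pos < text.Length)
        {
            int hit = text.IndexOf(find, pos, StringComparison.Ordinal);
            int copyEnd = hit < 0 ? text.Length : hit;

            // Copy unmatched characters back into the segment they came from.
            for (int i = pos; i < copyEnd; i++) output[owner[i]].Append(text[i]);
            if (hit < 0) break;

            // Write the replacement where the match starts, then skip the match.
            output[owner[hit]].Append(replacement);
            pos = hit + find.Length;
        }

        var result = new string[segments.Count];
        for (int s = 0; s < result.Length; s++) result[s] = output[s].ToString();
        return result;
    }
}
```

With the Open XML SDK, the segments would be `texts.Select(t => t.Text).ToList()` for the Text nodes of one paragraph, and each returned string would be assigned back to the Text node at the same index. The replacement therefore takes on the formatting of the run in which the match begins.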

answered 2021-12-13T22:37:53.280
0
Dim doc As WordprocessingDocument = WordprocessingDocument.Open("Chemin", True, New OpenSettings With {.AutoSave = True})

Dim d As Document = doc.MainDocumentPart.Document

Dim txt As Text = d.Descendants(Of Text).Where(Function(t) t.Text = "txtNom").FirstOrDefault

If txt IsNot Nothing Then
 txt.Text = txt.Text.Replace("txtNom", "YASSINE OULARBI")
End If

doc.Close()
answered 2019-10-15T08:56:56.027
0

If the text you are looking for is between square brackets and Word splits your text across multiple runs...

SearchIn is the text to search in (IEnumerable(Of Text))

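// Assumes SearchIn is an IEnumerable<Text>; copy it to a list so it can be indexed
// (ToList needs "using System.Linq;")
var nodes = SearchIn.ToList();

for (int i = 0; i + 2 < nodes.Count; i++) {
    Text TXT = nodes[i];
    Text TXT1 = nodes[i + 1];
    Text TXT2 = nodes[i + 2];

    // "[" and "]" sit in their own text nodes around the tag name
    if (TXT.Text.Trim() == "[" && TXT2.Text.Trim() == "]") {
        // Merge the three pieces into the middle node and empty the other two
        TXT1.Text = TXT.Text + TXT1.Text + TXT2.Text;

        TXT.Text = "";
        TXT2.Text = "";
    }
}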
answered 2017-09-15T11:52:32.627
-1

Here is the solution from MSDN.

The example from there:

public static void SearchAndReplace(string document)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;
        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        Regex regexText = new Regex("Hello world!");
        docText = regexText.Replace(docText, "Hi Everyone!");

        using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}
answered 2015-12-26T19:44:50.053