0

我有超过 300 个包含单词表的 word 文档,我一直在尝试为 excel 编写一个 VBA 脚本来提取我需要的信息,而且我对 Visual Basic 完全陌生。我需要将文件名复制到第一个单元格,接下来的单元格包含我要提取的信息,然后是下一个文件名,循环直到搜索和提取所有 word 文档。我尝试了多种不同的方法,但我能找到的最接近的代码如下。它可以提取零件编号,但不能提取描述。它还会提取不需要存在的无关信息,但如果它是必要的危险,我可以解决这些信息。我有一个示例 word 文件(将敏感信息替换为其他信息),但我不知道如何附加word文档或word文档第1页和第2页的jpeg。我知道如果你能看到它会很有帮助,所以请让我知道如何在这里或给你,这样你就可以看到它。

所以重新迭代:

  • 我需要第一个单元格中的文件名(A1)
  • 我需要word文档中表3中的某个单元格来excel
  • 如果可能的话,我需要 B 列(B2:B?)中的描述以及 C 列(C2:C?)中字母和数字的混合,然后在下一行,下一个文件名(A?),和继续重复。如果您有任何想法或建议,请告诉我。如果我不能发布图片或实际的示例文档,我愿意通过电子邮件或任何其他必要的方式来获得这方面的帮助。

这是我一直试图操纵的代码。我找到了它,它是表格的第一行和最后一行,为了我的目的,我试图让它工作,但无济于事:

Sub GetTablesFromWord()

   'this Excel file must be in
   'the same folder with the Word
   'document files that are to be'processed.    
   Dim wApp As Word.Application  
   Dim wDoc As Word.Document  
   Dim wTable As Word.Table  
   Dim wCell As Word.Cell 
   Dim basicPath As String  
   Dim fName As String     

   Dim myWS As Worksheet  
   Dim xlCell As Range  
   Dim lastRow As Long  
   Dim rCount As Long  
   Dim cCount As Long  
   Dim RLC As Long  
   Dim CLC As Long    
   basicPath = ThisWorkbook.Path & Application.PathSeparator  
   'change the sheet name as required  
   Set myWS = ThisWorkbook.Worksheets("Sheet1")  
   'clear any/all previous data on the sheet  myWS.Cells.Clear     

   '"open" Word  Set wApp = CreateObject("Word.Application")  
   'get first .doc file name in the folder  
   'with this Excel file  
   fName = Dir(basicPath & "*.doc*")
    Do While fName <> ""
       'this puts the filename into column A to
       'help separate the table data in Excel
       myWS.Range("A" & Rows.Count).End(xlUp).Offset(1, 0) = _
       "FILE: [" & fName & "]"
       'open the Word file
       wApp.Documents.Open basicPath & fName
       Set wDoc = wApp.Documents(1)
       'if there is a table in the
       'Word Document, work with it
       If wDoc.Tables.Count > 0 Then
         Set wTable = wDoc.Tables(3)
         rCount = wTable.Rows.Count
         cCount = wTable.Columns.Count
           For RLC = 1 To rCount
           lastRow = myWS.Range("A" & Rows.Count).End(xlUp).Row + 1
           For CLC = 1 To cCount
          'if there are merged cells in the
          'Word table, an error will be
          'generated - ignore the error,
          'but also won't process the data
          On Error Resume Next
          Set wCell = wTable.Cell(RLC, CLC)
          If Err <> 0 Then
            Err.Clear
            Else
            If CLC = 1 Then 
             Set xlCell = myWS.Range("A" & lastRow)
              xlCell = wCell
            Else
              Set xlCell = myWS.Range("B" & lastRow)
              xlCell = wCell
            End If
          End If
          On Error GoTo 0
        Next
      Next
      Set wCell = Nothing
      Set wTable = Nothing
    End If ' end of wDoc.Tables.Count test
    wDoc.Close False
    Set wDoc = Nothing
    fName = Dir()
 ' gets next .doc* filename in the folder
  Loop  wApp.Quit
  Set wApp = Nothing
  MsgBox "Task Completed"
End Sub
4

1 回答 1

0

此代码循环遍历文件夹中包含的所有 .docx 文件,将数据提取到电子表格中,关闭 word 文档,然后移至下一个文档。单词文档的名称被提取到 A 列,文档中第三个表中的值被提取到 B 列。这应该是您构建的一个很好的起点。

   Sub wordScrape()

Dim wrdDoc As Object, objFiles As Object, fso As Object, wordApp As Object
Dim sh1 As Worksheet
Dim x As Integer

FolderName = "C:\code" ' Change this to the folder containing your word documents

Set sh1 = ThisWorkbook.Sheets(1)
Set fso = CreateObject("Scripting.FileSystemObject")
Set wordApp = CreateObject("Word.application")
Set objFiles = fso.GetFolder(FolderName).Files

x = 1
For Each wd In objFiles
    If InStr(wd, ".docx") And InStr(wd, "~") = 0 Then
        Set wrdDoc = wordApp.Documents.Open(wd.Path, ReadOnly = True)
        sh1.Cells(x, 1) = wd.Name
        sh1.Cells(x, 2) = Application.WorksheetFunction.Clean(wrdDoc.Tables(3).Cell(Row:=3, Column:=2).Range)
        'sh1.Cells(x, 3) = ....more extracted data....
        x = x + 1
    wrdDoc.Close
    End If

Next wd
wordApp.Quit
End Sub
于 2013-10-05T21:33:17.797 回答