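I am trying to convert a large number of old .DOC files to PDF or RTF. So far I have found a way to do the latter (conversion to RTF), but the formatting from the old Word application is still present in the documents. If you open Microsoft Word (I am using 2010) and click File > Open, there is a drop-down menu that lets you select "Recover Text from Any File (*.*)". Is it possible to use this during the conversion to filter out the formatting data in the .DOC documents? Below are a couple of example scripts I have been trying to modify: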

This one has worked, although it seems to just append .rtf to the end of the file name rather than change the format:

Sub SaveAllAsDOCX()
Dim strFilename As String
Dim strDocName As String
Dim strPath As String
Dim oDoc As Document
Dim fDialog As FileDialog
Dim intPos As Integer
Set fDialog = Application.FileDialog(msoFileDialogFolderPicker)
With fDialog
    .Title = "Select folder and click OK"
    .AllowMultiSelect = False
    .InitialView = msoFileDialogViewList
    If .Show <> -1 Then
        MsgBox "Cancelled By User", , "List Folder Contents"
        Exit Sub
    End If
    strPath = fDialog.SelectedItems.Item(1)
    If Right(strPath, 1) <> "\" Then strPath = strPath + "\"
End With
If Documents.Count > 0 Then
    Documents.Close SaveChanges:=wdPromptToSaveChanges
End If
If Left(strPath, 1) = Chr(34) Then
    strPath = Mid(strPath, 2, Len(strPath) - 2)
End If
strFilename = Dir$(strPath & "*.doc")
While Len(strFilename) <> 0
    Set oDoc = Documents.Open(strPath & strFilename)
    strDocName = ActiveDocument.FullName
    intPos = InStrRev(strDocName, ".")
    strDocName = Left(strDocName, intPos - 1)
    strDocName = strDocName & ".docx"
    oDoc.SaveAs FileName:=strDocName, _
        FileFormat:=wdFormatDocumentDefault
    oDoc.Close SaveChanges:=wdDoNotSaveChanges
    strFilename = Dir$()
Wend
End Sub

This one has so far not succeeded in converting anything:

Option Explicit
Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
    Dim fs As Object
    Dim oFolder As Object
    Dim tFolder As Object
    Dim oFile As Object
    Dim strDocName As String
    Dim intPos As Integer
    Dim locFolder As String
    Dim fileType As String
    On Error Resume Next
    locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\myDocs")
    Select Case Application.Version
        Case Is < 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
        Case Is >= 12
            Do
                fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
            Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
    End Select
    Application.ScreenUpdating = False
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.CreateFolder(locFolder & "Converted")
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    For Each oFile In oFolder.Files
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        ChangeFileOpenDirectory tFolder
        Select Case fileType
        Case Is = "TXT"
            strDocName = strDocName & ".txt"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
        Case Is = "RTF"
            strDocName = strDocName & ".rtf"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
        Case Is = "HTML"
            strDocName = strDocName & ".html"
            ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
        Case Is = "PDF"
            strDocName = strDocName & ".pdf"

            ' *** Word 2007 users - remove the apostrophe at the start of the next line ***
            'ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF

        End Select
        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    Application.ScreenUpdating = True
End Sub
1 Answer

I will present a way to do what you want with a VBA script, without having to use Word's built-in "Recover Text from Any File" mode.

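It converts every .doc/.docx file in a directory to .txt, but it can be used to convert to any other format supported by the parent application (I tested it with Word 2010). Here it is: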

'------------ VBA script start -------------
Sub one1()
Set fs = CreateObject("Scripting.FileSystemObject")
Set list1 = fs.GetFolder(ActiveDocument.Path)
For Each fl In list1.files
  If InStr(fl.Type, "Word") >= 1 And Not fl.Path = ActiveDocument.Path & "\" & ActiveDocument.Name Then
    Set wordapp = CreateObject("word.Application")
    Set Doc1 = wordapp.Documents.Open(fl.Path)
    'wordapp.Visible = True
    Doc1.SaveAs2 FileName:=fl.Name & ".txt", fileformat:=wdFormatText
    wordapp.Quit
  End If
Next
End Sub
'------------ VBA script end -------------

To save as PDF, use this line instead:

Doc1.SaveAs2 FileName:=fl.Name & ".pdf", fileformat:=wdFormatPDF

To save as RTF, use this line instead:

Doc1.SaveAs2 FileName:=fl.Name & ".rtf", fileformat:=wdFormatRTF

Or, say, HTML:

Doc1.SaveAs2 FileName:=fl.Name & ".html", fileformat:=wdFormatHTML

And so on.
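Rather than editing the SaveAs2 line by hand for each format, the choice could be made from the target extension. The following is only a sketch: the function name SaveFormatFor and the ext variable in the usage line are hypothetical and not part of the script above. It assumes the code runs inside Word's VBA editor, where the wdFormat... constants are defined.

```vba
' Hypothetical helper: maps a target extension to a WdSaveFormat constant.
' Assumes it runs inside Word VBA, so the wdFormat... constants resolve.
Function SaveFormatFor(ByVal ext As String) As Long
    Select Case LCase$(ext)
        Case "txt"
            SaveFormatFor = wdFormatText
        Case "rtf"
            SaveFormatFor = wdFormatRTF
        Case "html"
            SaveFormatFor = wdFormatHTML
        Case "pdf"
            SaveFormatFor = wdFormatPDF
        Case Else
            ' Error 5 = "Invalid procedure call or argument"
            Err.Raise 5, , "Unsupported extension: " & ext
    End Select
End Function
```

With that in place, the save line inside the loop might become `Doc1.SaveAs2 FileName:=fl.Name & "." & ext, FileFormat:=SaveFormatFor(ext)`, where ext is set once at the top of the macro.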

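Some drawbacks that I did not bother to look into, because they are harmless:

  • An error message pops up at the end of execution, but it has no consequences.

  • It tries to open itself, because it is a VBA script inside a document and it is a document-opening script. When the message pops up, you have to tell it manually to open the file read-only.

  • It saves all the output to C:\users\username\Documents rather than to the folder it is run from, which would be better in most cases.

  • It is slow: expect 2-3 documents per second on most ordinary personal computers.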
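Most of these drawbacks look avoidable. The sketch below is untested and rests on several assumptions: the macro lives in a saved document inside the folder to convert (hence ThisDocument), the name ConvertFolderSketch is made up, and opening each file in the current Word instance is an acceptable replacement for starting a new Word.Application per file. It skips the macro document itself, writes each output next to its source, and drops the original extension from the output name.

```vba
' Hypothetical variant of the macro above; assumes it is stored in a saved
' document located in the folder whose Word files should be converted.
Sub ConvertFolderSketch()
    Dim fs As Object, fl As Object
    Dim doc As Document
    Dim srcFolder As String, baseName As String

    Set fs = CreateObject("Scripting.FileSystemObject")
    srcFolder = ThisDocument.Path

    For Each fl In fs.GetFolder(srcFolder).Files
        ' Skip non-Word files, the macro document itself, and ~$ lock files.
        If InStr(fl.Type, "Word") >= 1 _
           And StrComp(fl.Path, ThisDocument.FullName, vbTextCompare) <> 0 _
           And Left$(fl.Name, 2) <> "~$" Then
            ' Reuse the running Word instance instead of creating a new one.
            Set doc = Documents.Open(FileName:=fl.Path, ReadOnly:=True, _
                                     AddToRecentFiles:=False)
            ' Build a full path so the output lands beside the source file.
            baseName = fs.GetBaseName(fl.Path)
            doc.SaveAs2 FileName:=fs.BuildPath(srcFolder, baseName & ".txt"), _
                        FileFormat:=wdFormatText
            doc.Close SaveChanges:=wdDoNotSaveChanges
        End If
    Next fl
End Sub
```

As in the original script, the fl.Type check relies on the file type description registered in Windows, so it may need adjusting on systems where that description does not contain "Word".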

Answered 2013-05-29T19:39:51.830