I want to keep a website as an offline object. I am using PowerShell 5.1.19041.546 on Windows 10.

# Online analysis (this works)

$website = Invoke-WebRequest https://www.w3schools.com/html/html_tables.asp
$website | gm

#I get a Microsoft.PowerShell.Commands.HtmlWebResponseObject object
#Next I pass $website to this function (I call it Get-WebRequestTable), which expects a [Microsoft.PowerShell.Commands.HtmlWebResponseObject] $WebRequest input object: https://www.leeholmes.com/blog/2015/01/05/extracting-tables-from-powershells-invoke-webrequest/

# Offline analysis: save the website locally and import it with Get-Content (does not work)

#saving the website locally
Invoke-WebRequest -Uri  https://www.w3schools.com/html/html_tables.asp -OutFile C:\temp\website
#writing the website back to a variable
$offlinedata = Get-Content C:\temp\website
#I get a string object
$offlinedata | gm
#The string output cannot be used in the function: Get-WebRequestTable : Cannot process argument transformation on parameter 'WebRequest'. Cannot convert the "System.Object[]" value of type "System.Object[]" to type "Microsoft.PowerShell.Commands.HtmlWebResponseObject".
Get-WebRequestTable -WebRequest $offlinedata

# Offline analysis: save the website locally as XML (does not work)

Invoke-WebRequest -Uri  https://www.w3schools.com/html/html_tables.asp  | Export-Clixml C:\temp\website.xml

This runs for a very long time, and I get the following XML (shortened):

<Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
  [...]                  <S>System.__ComObject</S>
                         <S>System.__ComObject</S>

At this point it seems to enter an infinite loop:

 <S>System.__ComObject</S>

# Convert it to JSON to store it locally (does not work)

$website = Invoke-WebRequest -Uri  https://www.w3schools.com/html/html_tables.asp 
$website | ConvertTo-Json

I get:

ConvertTo-Json : An item with the same key has already been added.

Does anyone know how to store a website locally and later restore the [Microsoft.PowerShell.Commands.HtmlWebResponseObject] object for further processing?

1 Answer

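This code loads local HTML into an HTML document object. The result is an [HTMLDocumentClass] COM object rather than a true "HtmlWebResponseObject", but it can be queried in the same way as the ParsedHtml property of a web response.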

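function convert-localhtml($localhtmlpath){
    # Create an empty HTML document through COM (Windows only)
    $HTML = New-Object -Com "HTMLFile"
    # -Raw reads the whole file as a single string instead of an array of lines
    $website = Get-Content "$localhtmlpath" -Raw -ErrorAction Stop
    # Write HTML content according to DOM Level2
    $HTML.IHTMLDocument2_write($website)
    # Return the populated document object
    $HTML
}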
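
The -Raw switch is what separates this from the failing attempt in the question, where the error message reported a "System.Object[]" value. As a minimal sketch, the snippet below writes a small stand-in file to the temp folder (the file name and its content are made up for illustration) and compares the two ways of reading it:

```powershell
# Write a small stand-in HTML file (hypothetical name and content) to the temp folder
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-page.html'
Set-Content -Path $path -Value @('<html>', '<body><p>hi</p></body>', '</html>')

# Default behaviour: one string per line, collected into an array
$lines = Get-Content -Path $path

# With -Raw: the whole file as one string, line breaks included
$raw = Get-Content -Path $path -Raw

$lines.GetType().FullName   # System.Object[]
$raw.GetType().FullName     # System.String
```

A single string is what the document's write method needs, so reading with -Raw avoids having to join the lines again.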

Thanks to Prateek Singh: https://ridicurious.com/2017/01/24/powershell-tip-parsing-html-from-a-local-file-or-a-string/

I slightly modified Lee Holmes's code so that it can handle both object types: [Microsoft.PowerShell.Commands.HtmlWebResponseObject] if you use Invoke-WebRequest, or [HTMLDocumentClass] if you use convert-localhtml.

https://www.leeholmes.com/blog/2015/01/05/extracting-tables-from-powershells-invoke-webrequest/

Thanks to him for the excellent table extraction code.

    function Get-WebRequestTable {
        param(
            [Parameter(Mandatory = $true)]
            $WebRequest,
            [Parameter(Mandatory = $true)]
            [int]$TableNumber
        )

        ## Ensure that a supported type was passed
        $typeName = $WebRequest.GetType().Name
        if (($typeName -ne "HTMLDocumentClass") -and ($typeName -ne "HtmlWebResponseObject")) {
            throw "Unsupported argument type. Need [Microsoft.PowerShell.Commands.HtmlWebResponseObject] or [HTMLDocumentClass]"
        }

        ## Extract the tables out of the web request or the HTML document
        if ($WebRequest -is [Microsoft.PowerShell.Commands.HtmlWebResponseObject]) {
            $tables = @($WebRequest.ParsedHtml.getElementsByTagName("TABLE"))
        }
        else {
            ## [HTMLDocumentClass] argument given
            $tables = @($WebRequest.getElementsByTagName("TABLE"))
        }

        $table = $tables[$TableNumber]
        $titles = @()
        $rows = @($table.Rows)

        ## Go through all of the rows in the table
        foreach ($row in $rows)
        {
            $cells = @($row.Cells)

            ## If we've found a table header, remember its titles
            if ($cells[0].tagName -eq "TH")
            {
                $titles = @($cells | ForEach-Object { ("" + $_.InnerText).Trim() })
                continue
            }

            ## If we haven't found any table headers, make up names "P1", "P2", etc.
            if (-not $titles)
            {
                $titles = @(1..($cells.Count + 2) | ForEach-Object { "P$_" })
            }

            ## Now go through the cells in the row. For each, try to find the
            ## title that represents that column and create a hashtable mapping
            ## those titles to content
            $resultObject = [Ordered]@{}
            for ($counter = 0; $counter -lt $cells.Count; $counter++)
            {
                $title = $titles[$counter]
                if (-not $title) { continue }

                $resultObject[$title] = ("" + $cells[$counter].InnerText).Trim()
            }

            ## And finally cast that hashtable to a PSCustomObject
            [pscustomobject]$resultObject
        }
    }
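
Put together, the offline workflow might look like the sketch below. It assumes that convert-localhtml and Get-WebRequestTable from this answer are already loaded in the session, that it runs on Windows PowerShell 5.1, and that C:\temp\website.html is a writable path (the file name is only an example). On some systems the COM document may surface under a different type name than HTMLDocumentClass, in which case the type check in Get-WebRequestTable would reject it.

```powershell
# Assumption: convert-localhtml and Get-WebRequestTable are already defined in this session

# Save the page once while online (example path)
Invoke-WebRequest -Uri https://www.w3schools.com/html/html_tables.asp -OutFile C:\temp\website.html

# Later, offline: rebuild an HTML document object from the saved file
$doc = convert-localhtml C:\temp\website.html

# Extract the first table (index 0) as a list of PSCustomObjects
$rows = Get-WebRequestTable -WebRequest $doc -TableNumber 0
$rows | Format-Table -AutoSize
```

The same Get-WebRequestTable call also accepts the live result of Invoke-WebRequest, so online and offline processing can share the rest of a script.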
Answered 2020-10-21T17:39:01.600