I have a function in Powershell that gets the content of a file and breaks it up into fields to put into a CSV file. I'm wondering if there's a way to get a value from the link and add it to the columns that are sent to the CSV file while keeping the link column intact.
function Convert2CSV {
(Get-Content $input_path) -match "href" | % {
$data = ($_ -replace '(?:.*)href="(.*?)">Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)</a>(?:.*)' , '$1;$2;$3;$4').Split(";")
New-Object psobject -Property @{
"Link" = $data[0]
"Date" = $data[1]
"Time" = $data[2]
"Item" = $data[3]
}
} #| Export-Csv $output_file -NoTypeInformation
}
The value I'm looking for is either
FeedDefault_.*?(&) or _Feed.*?(&)
Am I correct in thinking that I can add some sort of if statement to the "Link" = $data[0] part?
Sample Output as requested.
Value in Link | Link | Date | Time | Item |
--------------------------------------------------------------------------------------------------------------------------------------------|
bluepebbles | http://www.domain.com/page.html?FeedDefault_bluepebbles&something | 2013-05-19 | 13:30 | Blue Pebbles |
--------------------------------------------------------------------------------------------------------------------------------------------|
redpebbles | http://www.domain.com/page.html?Feed_redpebbles&something | 2013-05-19 | 13:31 | Red Pebbles |
--------------------------------------------------------------------------------------------------------------------------------------------|
CSV Formatted
Value in Link,Link,Date,Time,Item
"bluepebbles","http://www.domain.com/page.html?FeedDefault_bluepebbles&something","2013-05-19","13:30","Blue Pebbles"
"redpebbles","http://www.domain.com/page.html?Feed_redpebbles&something","2013-05-19","13:31","Red Pebbles"
So entering in
$input_path = 'f:\mockup\area51\files\link.html'
$output_file = 'f:\mockup\area51\files\db_csv.csv'
$tstampCulture = [Globalization.cultureinfo]::GetCultureInfo("en-GB")
$ie = New-Object -COM "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("file:///$input_path")
$ie.document.getElementsByTagName("a") | % {
$_.innerText -match 'Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)'
$obj = New-Object psobject -Property @{
"Link" = $_.href
"Date" = $matches[1]
"Time" = $matches[2]
"Item" = $matches[3]
}
if ( $obj.Link -match '\?Feed(?:Default)?_(.*?)&' ) {
$obj | Add-Member –Type "NoteProperty" –Name "LinkValue" –Value $matches[1]
}
$obj
} #| Export-Csv $output_file -NoTypeInformation
Returns an error:
You cannot call a method on a null-valued expression.
At line:12 char:38
+ $ie.document.getElementsByTagName <<<< ("a") | % {
+ CategoryInfo : InvalidOperation: (getElementsByTagName:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
So I'm pretty sure that I probably messed something up. :)