
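I have a list of URLs collected from several sources. Some of the sources overlap, so the list contains duplicates, but they are not exact duplicates: some entries start with http://, some with www., some have a trailing slash, and so on.

I currently have a script that removes exact duplicates. Do I need to change it so that it also handles cases like the ones above?

This is what I have so far: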

Sub Delete_duplicates()
Dim iListCount As Long
Dim iCtr As Long

' Turn off screen updating to speed up macro.
Application.ScreenUpdating = False

' Get count of records to search through.
iListCount = Sheets("Sheet1").Range("A1:A100").Rows.Count
Sheets("Sheet1").Range("A1").Select
' Loop until end of records.
Do Until ActiveCell = ""
   ' Loop through records from the bottom up, so that deleting a cell
   ' does not shift rows that have not been checked yet.
   For iCtr = iListCount To 1 Step -1
      ' Don't compare against yourself.
      ' To specify a different column, change 1 to the column number.
      If ActiveCell.Row <> iCtr Then
         ' Compare the current record with the record in row iCtr.
         If ActiveCell.Value = Sheets("Sheet1").Cells(iCtr, 1).Value Then
            ' If match is true then delete the duplicate cell.
            Sheets("Sheet1").Cells(iCtr, 1).Delete xlShiftUp
         End If
      End If
   Next iCtr
   ' Go to next record.
   ActiveCell.Offset(1, 0).Select
Loop
Application.ScreenUpdating = True
MsgBox "Done!"
End Sub

1 Answer


You can use a function to "normalize" your URLs before comparing them, i.e.

...
            If strapURL(ActiveCell) = strapURL(Sheets("Sheet1").Cells(iCtr, 1)) Then
...

Function strapURL(Arg As String) As String
Dim Tmp As String

    Tmp = Replace(Arg, "http://", "")     ' remove http://
    Tmp = Replace(Tmp, "www.", "")        ' remove www.
    If Right(Tmp, 1) = "/" Then
        Tmp = Left(Tmp, Len(Tmp) - 1)     ' remove trailing /
    End If
    strapURL = Tmp

End Function

Applying strapURL to a few sample values in the worksheet yields:

http://www.mydomain.com/    mydomain.com
www.mydomain.com/           mydomain.com
mydomain.com/               mydomain.com
http://www.mydomain.com     mydomain.com
www.mydomain.com            mydomain.com
mydomain.com                mydomain.com
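
The sample rows above are all lowercase and use http:// only. If the list may also contain https://, mixed case, or "www." somewhere other than the start, a stricter variant could look like the sketch below. NormalizeUrl is a hypothetical name, not part of the original answer, and it assumes that lowercasing the whole URL (path included) is acceptable for your data.

```vba
' Hypothetical helper (not from the original answer): a stricter variant of strapURL.
' Assumption: URLs that differ only by letter case count as duplicates.
Function NormalizeUrl(ByVal Arg As String) As String
    Dim Tmp As String

    Tmp = LCase$(Trim$(Arg))              ' ignore case and surrounding spaces

    ' Strip the scheme only when it is at the start of the string.
    If Left$(Tmp, 8) = "https://" Then
        Tmp = Mid$(Tmp, 9)
    ElseIf Left$(Tmp, 7) = "http://" Then
        Tmp = Mid$(Tmp, 8)
    End If

    ' Strip a leading "www." only, so "example.com/www.page" is left alone.
    If Left$(Tmp, 4) = "www." Then Tmp = Mid$(Tmp, 5)

    ' Strip every trailing slash, not just one.
    Do While Right$(Tmp, 1) = "/"
        Tmp = Left$(Tmp, Len(Tmp) - 1)
    Loop

    NormalizeUrl = Tmp
End Function
```

It can be dropped into the comparison in place of strapURL, for example NormalizeUrl(ActiveCell.Value).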

This lets you compare the URLs on equal terms.
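
For longer lists, the nested loop can be replaced by a single pass that remembers the normalized URLs it has already seen. The sketch below is a different technique from the one in the question: it uses a late-bound Scripting.Dictionary (assumed to be available, which in practice means Excel on Windows) and deletes all duplicates at the end. The Sub name DeleteNormalizedDuplicates is hypothetical, the data is assumed to be text in column A of Sheet1, and strapURL is repeated from above only to keep the example self-contained.

```vba
' Sketch: remove near-duplicate URLs from column A of "Sheet1" in one pass.
' Hypothetical Sub name; assumes the cells hold text (no error values).
Sub DeleteNormalizedDuplicates()
    Dim ws As Worksheet
    Dim seen As Object
    Dim lastRow As Long, r As Long
    Dim normKey As String
    Dim toDelete As Range

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set seen = CreateObject("Scripting.Dictionary")   ' late-bound, no reference needed
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    ' Walk top-down so the first occurrence of each URL is the one kept.
    For r = 1 To lastRow
        normKey = strapURL(CStr(ws.Cells(r, 1).Value))
        If Len(normKey) > 0 Then
            If seen.Exists(normKey) Then
                ' Collect duplicates; deleting inside the loop would shift rows.
                If toDelete Is Nothing Then
                    Set toDelete = ws.Cells(r, 1)
                Else
                    Set toDelete = Union(toDelete, ws.Cells(r, 1))
                End If
            Else
                seen.Add normKey, True
            End If
        End If
    Next r

    ' Delete all collected duplicate cells in one operation.
    If Not toDelete Is Nothing Then toDelete.Delete Shift:=xlShiftUp
End Sub

' Copy of the normalization function from the answer above.
' Omit it if strapURL is already defined in the same module.
Function strapURL(Arg As String) As String
    Dim Tmp As String

    Tmp = Replace(Arg, "http://", "")     ' remove http://
    Tmp = Replace(Tmp, "www.", "")        ' remove www.
    If Right(Tmp, 1) = "/" Then
        Tmp = Left(Tmp, Len(Tmp) - 1)     ' remove trailing /
    End If
    strapURL = Tmp
End Function
```

Each row is normalized once instead of once per comparison, and the original spelling of the first occurrence is what stays in the sheet.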

Answered 2012-12-19T08:12:02.757