I am retrieving two CSVs from an API. One of them, students.csv, looks like:

StudentNo,PreferredFirstnames,PreferredSurname,UPN
111, john, smith, john@email.com
222, jane, doe, jane@email.com

The other is called rooms.csv:

roomName, roomNo, students
room1, 1, {@{StudentNo=111; StudentName=john smith; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},....
room2, 2,{@{StudentNo=222; StudentName=jane doe; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},...   

The third column in rooms.csv is an array, as provided by the API.

What is the best way to merge the two into:

StudentNo,PreferredFirstnames,PreferredSurname,UPN, roomName
111, john, smith, john@email.com, room1
222, jane, doe, jane@email.com, room2

I was thinking of something like...

$rooms = Import-Csv rooms.csv
$students  = Import-Csv students.csv
$combined = $students | select-object StudentNo,PreferredSurname,PreferredFirstnames,UPN,
@{Name="roomName";Expression={ ForEach ($r in $rooms) {
    if ($r.Students.StudentNo.Contains($_.StudentNo) -eq "True") 
{return $r.roomName}}}} 

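This works, but is mixing a foreach into the calculated property the right way to do this, or is there a more efficient way?

--- Original post ---

Using all of this information, I need to compare the student data and update AzureAD, then compile a list of data that includes first name, last name, upn, room, and other values retrieved from AzureAD.

My problem is efficiency. My code mostly works, but it takes hours to run. Currently I loop through students.csv and then, for each student, loop through rooms.csv to find the room they are in, and obviously there is a lot of waiting on multiple API calls in between.

What is the most efficient way to find the room for each student? Is importing the CSV as a custom PSObject comparable to using a hashtable?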

1 Answer

I was able to get your proposed code working, but it required a few tweaks to both the code and the data:

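  • There has to be some additional step that deserializes the students column of rooms.csv into a collection of objects. The column appears to be a ScriptBlock that evaluates to an array of HashTables, but a few changes to the CSV input are still needed:
    • The StartDate and EndDate values need to be quoted and cast to [DateTime].
    • At least for rooms containing more than one student, the whole value must be quoted so that Import-Csv does not interpret the , separating array elements as the start of an additional column.
  • A downside of using CSV as an intermediate format is that the original property types are lost; everything becomes a [String] on import. Sometimes it is desirable to convert back to the original type for efficiency, and sometimes it is absolutely necessary for certain operations to work at all. You could convert these properties each time you use them, but I prefer to convert them once, immediately after import.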
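
To illustrate why the type conversion matters for lookups, here is a minimal sketch. The inline CSV text and the $table variable are stand-ins invented for this example, not part of the original data.

```powershell
# Hypothetical inline data standing in for students.csv
$csvText = @'
StudentNo,PreferredFirstnames,PreferredSurname,UPN
111,john,smith,john@email.com
'@

# ConvertFrom-Csv (like Import-Csv) produces [String] properties only
$imported = $csvText | ConvertFrom-Csv
$imported.StudentNo.GetType().Name                  # String

# A table keyed by [Int32], as the parsed room data would be
$table = @{ 111 = 'room1' }

# The string '111' is not the same key as the integer 111
$viaString = $table[$imported.StudentNo]            # $null
$viaInt    = $table[[Int32] $imported.StudentNo]    # room1
```

Comparison operators such as -eq convert their right-hand operand, so the mismatch can go unnoticed there, but a [HashTable] key lookup does no such conversion.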

With these changes, rooms.csv becomes...

roomName, roomNo, students
room1, 1, "{@{StudentNo=111; StudentName='john smith'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
room2, 2, "{@{StudentNo=222; StudentName='jane doe'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"

...and the script becomes...

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        }
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
    | Select-Object `
        -ExcludeProperty 'StudentNo' `
        -Property '*', @{
            Name = 'StudentNo'
            Expression = { [Int32] $_.StudentNo }
        }
$combined = $students `
    | Select-Object -Property `
        'StudentNo', `
        'PreferredSurname', `
        'PreferredFirstnames', `
        'UPN', `
        @{
            Name = "roomName";
            Expression = {
                foreach ($r in $rooms)
                {
                    if ($r.Students.StudentNo -contains $_.StudentNo)
                    {
                        return $r.roomName
                    }
                }

                #TODO: Return text indicating room not found?
            }
        }

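The reason this can be slow is that you are performing a linear search for every student object; two of them, in fact: first through the collection of rooms (foreach), then through the collection of students in each room (-contains). This quickly adds up to a lot of iterations and equality comparisons, because for every room the current student is not assigned to, you iterate that room's entire student collection, and you keep doing so until you reach the student's room.

One easy optimization when performing a linear search is to sort the items being searched (in this case, the Students property would be ordered by each student's StudentNo)...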

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock) `
                    | Sort-Object -Property @{ Expression = { $_.StudentNo } }

                return $studentsArray
            }
        }

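...and then, when you search that same collection, if you come across an item greater than the one you are searching for, you know the rest of the collection cannot contain it and you can abort the search immediately...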

@{
    Name = "roomName";
    Expression = {
        foreach ($r in $rooms)
        {
            # Requires $r.Students to be sorted by StudentNo
            foreach ($roomStudentNo in $r.Students.StudentNo)
            {
                if ($roomStudentNo -eq $_.StudentNo)
                {
                    # Return the matched room name and stop searching this and further rooms
                    return $r.roomName
                }
                elseif ($roomStudentNo -gt $_.StudentNo)
                {
                    # Stop searching this room
                    break
                }

                # $roomStudentNo is less than $_.StudentNo; keep searching this room
            }
        }

        #TODO: Return text indicating room not found?
    }
}

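Better yet, with a sorted collection you can also perform a binary search, which is faster than a linear search*. The Array class already provides a static BinarySearch method, so we can accomplish this with less code, too...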

@{
    Name = "roomName";
    Expression = {
        foreach ($r in $rooms)
        {
            # Requires $r.Students to be sorted by StudentNo
            if ([Array]::BinarySearch($r.Students.StudentNo, $_.StudentNo) -ge 0)
            {
                return $r.roomName
            }
        }

        #TODO: Return text indicating room not found?
    }
}
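
For reference, BinarySearch reports a miss as a negative number rather than $false, which is why the check above compares the result against zero. A small sketch, using a made-up array of student numbers:

```powershell
# Hypothetical sorted student numbers for a single room
[Int32[]] $studentNos = 111, 222, 333, 444

# Found: returns the zero-based index of the match
$found = [Array]::BinarySearch($studentNos, 333)      # 2

# Not found: returns the bitwise complement of the index
# where the value would have to be inserted
$missing = [Array]::BinarySearch($studentNos, 250)    # -3
$insertAt = -bnot $missing                            # 2
```

The array must already be sorted, and its elements must be comparable with the value being searched for; on unsorted input the result is not meaningful.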

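However, the way I would solve this is to use a [HashTable] that maps a StudentNo to a room. Building the [HashTable] requires a little preprocessing, but it provides constant-time lookups when retrieving the room for a student.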

function GetRoomsByStudentNoTable()
{
    $table = @{ }

    foreach ($room in $rooms)
    {
        foreach ($student in $room.Students)
        {
            #NOTE: It is assumed each student belongs to at most one room
            $table[$student.StudentNo] = $room
        }
    }

    return $table
}

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        }
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
    | Select-Object `
        -ExcludeProperty 'StudentNo' `
        -Property '*', @{
            Name = 'StudentNo'
            Expression = { [Int32] $_.StudentNo }
        }
$roomsByStudentNo = GetRoomsByStudentNoTable
$combined = $students `
    | Select-Object -Property `
        'StudentNo', `
        'PreferredSurname', `
        'PreferredFirstnames', `
        'UPN', `
        @{
            Name = "roomName";
            Expression = {
                $room = $roomsByStudentNo[$_.StudentNo]
                if ($room -ne $null)
                {
                    return $room.roomName
                }

                #TODO: Return text indicating room not found?
            }
        }
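
If you want to check the difference on your own machine, a rough timing sketch along these lines could help. The room and student counts are arbitrary assumptions, the data is synthetic, and the timings will vary, so treat it as an experiment rather than a benchmark.

```powershell
# Synthetic data (hypothetical): 50 rooms with 20 students each
$rooms = foreach ($roomNo in 1..50) {
    $first = ($roomNo - 1) * 20 + 1
    [PSCustomObject] @{
        roomName = "room$roomNo"
        Students = @(foreach ($n in $first..($first + 19)) { @{ StudentNo = $n } })
    }
}
$studentNos = 1..1000

# Approach 1: scan the rooms for every student
$linear = Measure-Command {
    foreach ($studentNo in $studentNos) {
        foreach ($r in $rooms) {
            if ($r.Students.StudentNo -contains $studentNo) { break }
        }
    }
}

# Approach 2: build the lookup table once, then index into it
$hashed = Measure-Command {
    $table = @{ }
    foreach ($r in $rooms) {
        foreach ($s in $r.Students) { $table[$s.StudentNo] = $r }
    }
    foreach ($studentNo in $studentNos) { $null = $table[$studentNo] }
}

'{0:N0} ms linear, {1:N0} ms hashed' -f $linear.TotalMilliseconds, $hashed.TotalMilliseconds
```

The second measurement deliberately includes the cost of building the table, since that preprocessing is part of the trade-off described above.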

You can reduce the cost of building $roomsByStudentNo by doing so at the same time as importing rooms.csv...

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        } `
    | ForEach-Object -Begin {
        $roomsByStudentNo = @{ }
    } -Process {
        foreach ($student in $_.Students)
        {
            #NOTE: It is assumed each student belongs to at most one room
            $roomsByStudentNo[$student.StudentNo] = $_
        }

        return $_
    }

* Except for small arrays

Answered 2018-09-13T22:53:38.453