I have several lists of features, where the features are strings I want to analyze. For example:

[["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]

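I know how to convert a string like "0.5" to a float, but is there a way to "normalize" such lists to integer or float values (in my case each list is independent of the others)? I would like to get something like this: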

[[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]

Does anyone know how to achieve this? Unfortunately, I haven't been able to find anything related to this problem so far.

2 Answers

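It's a bit messy, but it should do what you want: it uses a dictionary to keep track of the items already seen in each list. You could replace the for loops with generators to reduce the verbosity.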

def track_items_in_list(test_list):
    outer_list = []
    # iterate through outer list
    for _list in test_list:
        # unique_count is an integer that corresponds to an item in your list
        unique_count = 0
        # used_tracker matches the unique_count with an item in your list
        used_tracker = {}
        inner_list = []
        # iterate through inner list
        for _item in _list:
            # check the used_tracker to see if the item has been seen - if so, reuse its corresponding unique_count
            if _item in used_tracker:
                inner_list.append(used_tracker[_item])
            else:
                # if not, add the count to the tracker
                inner_list.append(unique_count)
                used_tracker[_item] = unique_count
                unique_count += 1
        outer_list.append(inner_list)
    return outer_list

print(track_items_in_list([["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
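
For reference, the same first-seen numbering can probably be written more compactly. This is only a sketch, not the code above rewritten line for line: `encode_first_seen` is a made-up name, and it assumes Python 3.7+, where `dict.fromkeys` keeps insertion order.

```python
def encode_first_seen(values):
    # dict.fromkeys keeps only the first occurrence of each value, in order
    # (assumes Python 3.7+, where dicts preserve insertion order)
    ids = {value: i for i, value in enumerate(dict.fromkeys(values))}
    return [ids[value] for value in values]


data = [["0.5", "0.4", "disabled", "0.7", "disabled"],
        ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print([encode_first_seen(inner) for inner in data])
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
```

Each inner list gets its own mapping, so the IDs are independent per list, as the question asks.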
answered 2020-08-25T13:45:29.450

Use a dictionary and a counter to give new values an ID and remember the IDs already handed out:

import itertools, collections

def norm(lst):
    d = collections.defaultdict(itertools.count().__next__)
    return [d[s] for s in lst]

lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm, lst)))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
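
In case the `defaultdict(itertools.count().__next__)` idiom looks opaque, here is a small step-by-step sketch of what it does. The names `counter`, `ids`, `first`, `second` and `again` are just for illustration.

```python
import collections
import itertools

# counter yields 0, 1, 2, ... each time its __next__ method is called
counter = itertools.count()
# on a missing key, defaultdict calls the factory and stores the result
ids = collections.defaultdict(counter.__next__)

first = ids["a"]   # miss: factory is called, "a" is stored with 0
second = ids["b"]  # miss: factory is called again, "b" is stored with 1
again = ids["a"]   # hit: stored value is returned, counter does not advance

print(first, second, again, dict(ids))
# 0 1 0 {'a': 0, 'b': 1}
```

Because `norm` builds a fresh `defaultdict` on every call, the numbering restarts at 0 for each inner list.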

Or by enumerating the sorted unique values; note, however, that "disabled" sorts after the numeric strings:

def norm_sort(lst):
    d = {x: i for i, x in enumerate(sorted(set(lst)))}
    return [d[s] for s in lst]

print(list(map(norm_sort, lst)))
# [[1, 0, 3, 2, 3], [0, 1, 3, 0, 2]]
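
If the exact output from the question is wanted (with "disabled" mapped to 0), one possible variation is a sort key that puts non-numeric strings before numeric ones. This is a sketch under assumptions: `sort_key` and `norm_labels_first` are made-up names, and anything `float()` accepts, including strings such as "nan" or "inf", is treated as numeric.

```python
def sort_key(value):
    # Non-numeric strings get group 0 and sort alphabetically;
    # numeric strings get group 1 and sort by their float value.
    try:
        return (1, float(value), value)
    except ValueError:
        return (0, 0.0, value)


def norm_labels_first(lst):
    d = {x: i for i, x in enumerate(sorted(set(lst), key=sort_key))}
    return [d[s] for s in lst]


lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm_labels_first, lst)))
# [[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
```

Sorting by float value rather than by string also keeps something like "10" after "9", which a plain string sort would not.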
answered 2020-08-25T13:47:49.800