java - Complex HashMap data structure for dynamic multi-option groupby in Java - ideas for improving an implementation that runs groupbys on arbitrary data

Question

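I am writing a program that will run a groupby on every feature column of any arbitrary file (with no prior knowledge of the data), for all the numeric columns in the file. I want the process to be very fast, but first I want it to work. I have two questions: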

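1) Is the following understanding correct as far as how this complex list-of-HashMaps data structure would be represented visually (described in the comment)?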

    List<HashMap<String, ArrayList<HashMap<String, Number>>>> finalResult = 
            new ArrayList<HashMap<String, ArrayList<HashMap<String, Number>>>>();
        /**
         * Result should contain something like this for population and other metrics:
         * [{population={state={Virginia=20000000, Texas=200000, NY=30000000}, 
         *      {Country={Africa=30000000, India=400000000}}, 
         *  {Temperature={state={Virginia=83, Texas=92, NY=72},
         *      {Country={Africa=90, India=88, England=65, Canada=69}}}},
         *  {LifeExpectancy={state={Virginia=77, Texas=83, NY=67},
         *      {Country={Africa=90, India=88, England=65, Canada=69}}}}]
         */
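One way to check the reading in question 1 is to fill the declared type with a few values from the comment and print it. In this minimal sketch, LinkedHashMap (a subclass of HashMap) is used only so the printed order is predictable; the class name DeclaredShapeDemo and the method build() are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;

public class DeclaredShapeDemo {

    // Fills the declared type List<HashMap<String, ArrayList<HashMap<String, Number>>>>
    static List<HashMap<String, ArrayList<HashMap<String, Number>>>> build() {
        // Innermost level: group value -> aggregated number
        HashMap<String, Number> byState = new LinkedHashMap<>();
        byState.put("Virginia", 20000000);
        byState.put("Texas", 200000);

        HashMap<String, Number> byCountry = new LinkedHashMap<>();
        byCountry.put("Africa", 30000000);
        byCountry.put("India", 400000000);

        // Middle level: a plain list, so nothing records that the first map
        // was grouped by "state" and the second by "Country"
        ArrayList<HashMap<String, Number>> groupings = new ArrayList<>();
        groupings.add(byState);
        groupings.add(byCountry);

        // Outer level: metric name -> list of groupings
        HashMap<String, ArrayList<HashMap<String, Number>>> population = new LinkedHashMap<>();
        population.put("population", groupings);

        List<HashMap<String, ArrayList<HashMap<String, Number>>>> finalResult = new ArrayList<>();
        finalResult.add(population);
        return finalResult;
    }

    public static void main(String[] args) {
        // Prints [{population=[{Virginia=20000000, Texas=200000}, {Africa=30000000, India=400000000}]}]
        System.out.println(build());
    }
}
```

So the `state=` and `Country=` labels in the comment have no slot in the declared type: each metric maps to an unlabeled list of maps, which Java prints with `[...]`.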

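2) Is there a smarter way to store all this information? Any ideas for improving this data-structure design? It will basically store a list of aggregation types and the numeric metrics for each feature column.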
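A possible answer to question 2, sketched under the assumption that every result is looked up by metric, then by group-by column, then by group value: replace the unlabeled list with a labeled map level. The class GroupByResult and its put/get methods are hypothetical names for illustration, not an existing API.

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupByResult {
    // metric -> group-by column -> group value -> aggregate
    private final Map<String, Map<String, Map<String, Double>>> data = new TreeMap<>();

    public void put(String metric, String groupByColumn, String groupValue, double aggregate) {
        // computeIfAbsent creates the intermediate maps on first use
        data.computeIfAbsent(metric, m -> new TreeMap<>())
            .computeIfAbsent(groupByColumn, c -> new TreeMap<>())
            .put(groupValue, aggregate);
    }

    // Returns null if any level is missing
    public Double get(String metric, String groupByColumn, String groupValue) {
        Map<String, Map<String, Double>> byColumn = data.get(metric);
        if (byColumn == null) {
            return null;
        }
        Map<String, Double> byValue = byColumn.get(groupByColumn);
        return byValue == null ? null : byValue.get(groupValue);
    }

    @Override
    public String toString() {
        return data.toString();
    }

    public static void main(String[] args) {
        GroupByResult r = new GroupByResult();
        r.put("population", "state", "Virginia", 20000000);
        r.put("population", "state", "Texas", 200000);
        r.put("population", "Country", "India", 400000000);
        System.out.println(r);
        System.out.println(r.get("population", "state", "Texas"));
    }
}
```

Wrapping the nesting in a small class keeps the three-level generic type in one place, so callers never spell it out.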

Here is a sample file (which, by the way, could be any kind of file):

id;state;city;total_pop;avg_temp
1;Florida;;120000;76
2;Michigan;Detroit;330000;54
3;New Jersey;Newark;;34
4;Florida;Miami;200000;80
5;New Jersey;Jersey City;1200000;55
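The question does not say which aggregation is wanted, so this sketch assumes an average per group. It also assumes a `;` separator, a header on the first line, and the same number of cells in every row; a column counts as numeric if every non-empty cell parses as a double, the remaining columns are used as group-by keys, and empty cells are skipped. The class SampleGroupBy and the method averages() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SampleGroupBy {

    // Returns metric -> group-by column -> group value -> average of the metric
    static Map<String, Map<String, Map<String, Double>>> averages(List<String> lines) {
        String[] header = lines.get(0).split(";", -1);
        List<String[]> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            rows.add(line.split(";", -1)); // limit -1 keeps trailing empty cells
        }

        // A column is numeric if every non-empty cell parses as a double
        boolean[] numeric = new boolean[header.length];
        for (int c = 0; c < header.length; c++) {
            numeric[c] = true;
            for (String[] row : rows) {
                if (!row[c].isEmpty() && !isNumber(row[c])) {
                    numeric[c] = false;
                    break;
                }
            }
        }

        Map<String, Map<String, Map<String, Double>>> result = new TreeMap<>();
        for (int m = 0; m < header.length; m++) {
            if (!numeric[m]) continue;
            for (int g = 0; g < header.length; g++) {
                if (numeric[g]) continue; // group only by non-numeric columns
                Map<String, double[]> acc = new TreeMap<>(); // group value -> {sum, count}
                for (String[] row : rows) {
                    if (row[m].isEmpty() || row[g].isEmpty()) continue; // skip missing cells
                    double[] sumCount = acc.computeIfAbsent(row[g], k -> new double[2]);
                    sumCount[0] += Double.parseDouble(row[m]);
                    sumCount[1] += 1;
                }
                Map<String, Double> avg = new TreeMap<>();
                acc.forEach((k, sc) -> avg.put(k, sc[0] / sc[1]));
                result.computeIfAbsent(header[m], k -> new TreeMap<>()).put(header[g], avg);
            }
        }
        return result;
    }

    static boolean isNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "id;state;city;total_pop;avg_temp",
            "1;Florida;;120000;76",
            "2;Michigan;Detroit;330000;54",
            "3;New Jersey;Newark;;34",
            "4;Florida;Miami;200000;80",
            "5;New Jersey;Jersey City;1200000;55");
        System.out.println(averages(lines));
    }
}
```

Because id parses as a number, it is averaged like any other metric; with no prior knowledge of the data there is no safe way to tell it apart, so excluding it would need an explicit rule.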

Thanks in advance.

score 2 · Accepted Answer

It would be easier to have a Country or State object that contains these attributes. Then you can sort with a custom Comparator. You would then get something like this:

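Map<String, List<Country>> countryStatistics = new HashMap<>();   // Map is an interface; use a concrete class
List<Country> byPopulation = new ArrayList<>(countries);          // copy so each statistic gets its own ordering
Collections.sort(byPopulation, new Comparator<Country>() {        // Collections.sort sorts in place and returns void
    @Override
    public int compare(Country c1, Country c2) {
        // Integer.compare avoids the overflow that c1 - c2 can cause
        return Integer.compare(c1.getPopulation(), c2.getPopulation());
    }
});
countryStatistics.put("population", byPopulation);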

And so on for each category. You then have a map from each statistic to a list of countries sorted by that statistic.
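The "and so on for each category" step can be written as a loop instead of one block per statistic. This sketch assumes Java 8 or later (for Comparator.comparingInt and method references) and uses a hypothetical minimal Country class, since the answer does not define one; the figures are taken from the question's comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class CountryStatistics {

    // Hypothetical minimal Country class for this sketch
    static class Country {
        final String name;
        final int population;
        final int temperature;

        Country(String name, int population, int temperature) {
            this.name = name;
            this.population = population;
            this.temperature = temperature;
        }

        int getPopulation() { return population; }
        int getTemperature() { return temperature; }

        @Override
        public String toString() { return name; }
    }

    // Builds statistic name -> countries sorted ascending by that statistic
    static Map<String, List<Country>> build(List<Country> countries) {
        Map<String, ToIntFunction<Country>> statistics = new LinkedHashMap<>();
        statistics.put("population", Country::getPopulation);
        statistics.put("temperature", Country::getTemperature);

        Map<String, List<Country>> result = new LinkedHashMap<>();
        for (Map.Entry<String, ToIntFunction<Country>> e : statistics.entrySet()) {
            List<Country> sorted = new ArrayList<>(countries); // leave the input untouched
            sorted.sort(Comparator.comparingInt(e.getValue()));
            result.put(e.getKey(), sorted);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Country> countries = Arrays.asList(
            new Country("Africa", 30000000, 90),
            new Country("India", 400000000, 88));
        System.out.println(build(countries));
    }
}
```

Adding a statistic is then one more `statistics.put(...)` line rather than another copy of the sorting code.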

Based on your edit, for arbitrary data you could perhaps do something like this:

import java.util.HashMap;
import java.util.Map;

// there's probably a better name for this, but let's go with this for now
public class Data {
    private final Map<String, Integer> attributes = new HashMap<>();

    public Integer getValue(String attribute) {
        return attributes.get(attribute); // Returns null if the attribute doesn't
                                          // exist. Maybe you want to return 0 for that.
    }

    public void setValue(String attribute, Integer value) {   // nothing to return, so void
        attributes.put(attribute, value);
    }
}
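The answer does not show how Data objects get filled from a file whose header is unknown. One option, sketched here as an assumption, is to pair each header name with the cell in the same position and keep only cells that parse as integers, which is what the Integer-valued attribute map can hold. This drops text columns such as state, which a groupby still needs as keys, so a fuller version would store those separately. The class DataLoader and the method parseRow are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class DataLoader {

    // Maps header names to the integer cells of one ';'-separated line.
    // Empty or non-integer cells are left out, so a lookup returns null for them.
    static Map<String, Integer> parseRow(String headerLine, String line) {
        String[] header = headerLine.split(";", -1);
        String[] cells = line.split(";", -1);
        Map<String, Integer> attributes = new HashMap<>();
        for (int i = 0; i < header.length && i < cells.length; i++) {
            try {
                attributes.put(header[i], Integer.parseInt(cells[i]));
            } catch (NumberFormatException e) {
                // text or empty cell: not a numeric attribute
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String header = "id;state;city;total_pop;avg_temp";
        System.out.println(parseRow(header, "3;New Jersey;Newark;;34"));
    }
}
```

Each returned map can then be copied into a Data object with setValue.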

Then you would do something like:

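Map<String, List<Data>> dataStatistics = new HashMap<>();   // Map is an interface; use a concrete class
List<Data> byPopulation = new ArrayList<>(countries);       // countries is now a List<Data>; copy before sorting
Collections.sort(byPopulation, new Comparator<Data>() {     // Collections.sort sorts in place and returns void
    @Override
    public int compare(Data d1, Data d2) {
        // Integer.compare avoids overflow; both values must be non-null
        return Integer.compare(d1.getValue("population"), d2.getValue("population"));
    }
});
dataStatistics.put("population", byPopulation);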

If you don't want to repeat code, you can create a factory method that returns a Comparator instance that sorts by the specified attribute:

public Comparator<Data> createComparatorForAttribute(final String attribute) {
    return new Comparator<Data>() {
        @Override
        public int compare(Data d1, Data d2) {
            // Integer.compare avoids the overflow that d1 - d2 can cause; both values must be non-null
            return Integer.compare(d1.getValue(attribute), d2.getValue(attribute));
        }
    };
}
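The sample file has an empty total_pop cell (id 3). If that attribute is simply not stored, getValue returns null, and both the subtraction and Integer.compare would throw a NullPointerException while unboxing. A null-safe variant of the factory, assuming Java 8 or later, can build on Comparator.comparing with Comparator.nullsLast so rows with a missing value sort to the end. The Data class below is a minimal copy of the answer's class so the sketch compiles on its own; the row() helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeComparators {

    // Minimal copy of the answer's Data class
    static class Data {
        private final Map<String, Integer> attributes = new HashMap<>();
        Integer getValue(String attribute) { return attributes.get(attribute); }
        void setValue(String attribute, Integer value) { attributes.put(attribute, value); }
    }

    // Sorts ascending by the attribute; rows where it is missing (null) go last
    static Comparator<Data> createComparatorForAttribute(final String attribute) {
        return Comparator.comparing(
                (Data d) -> d.getValue(attribute),
                Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
    }

    // Hypothetical helper: one row of the sample file, only id and total_pop
    static Data row(int id, Integer totalPop) {
        Data d = new Data();
        d.setValue("id", id);
        if (totalPop != null) {
            d.setValue("total_pop", totalPop); // an empty cell is simply not stored
        }
        return d;
    }

    public static void main(String[] args) {
        List<Data> rows = new ArrayList<>(Arrays.asList(
            row(1, 120000), row(2, 330000), row(3, null), row(4, 200000), row(5, 1200000)));
        rows.sort(createComparatorForAttribute("total_pop"));
        for (Data d : rows) {
            System.out.println(d.getValue("id") + " " + d.getValue("total_pop"));
        }
    }
}
```

Using Comparator.nullsFirst instead would put the rows with missing values at the front.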
