java - 在 groupingBy 期间为重复记录组中的字段分配唯一值

Question

根据此处devReddit提供的回复，我对以下测试文件（假数据）的CSV记录（相同的客户端名称）进行了分组：

CSV 测试文件

id,name,mother,birth,center
1,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
2,Carlos Roberto de Souza,Amália Maria de Souza,2004/12/10,1
3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
4,Danilo da Silva Cardoso,Sônia de Paula Cardoso,2002/08/10,3
5,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
7,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
8,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
9,Rosana Pereira de Campos,Ivana Maria de Campos,2002/07/16,3
10,Paula Cristina de Abreu,Cristina Pereira de Abreu,2014/10/25,2
11,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
12,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4

客户实体

package entities;

public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;
    
    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }
        
    @Override
    public String toString() {
        return "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center
                + "]";
    }
        
}

程序

package application;
    
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
    
import entities.Client;
    
public class Program {
    
    public static void main(String[] args) throws IOException {
            
        Pattern pattern = Pattern.compile(",");
            
        List<Client> file = Files.lines(Paths.get("src/Client.csv"))  
            .skip(1)
            .map(line -> { 
                String[] fields = pattern.split(line);
                return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
            })
            .collect(Collectors.toList()); 
                        
        Map<String, List<Client>> grouped = file
            .stream()
            .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
            .collect(Collectors.toList())
            .stream()
            .collect(Collectors.groupingBy(p -> p.getCenter(), LinkedHashMap::new, Collectors.mapping(Function.identity(), Collectors.toList())));

        grouped.entrySet().forEach(System.out::println);    
    }
}

private static Boolean isDuplicate(Client x, Client y) {

    return !x.getId().equals(y.getId())
    && x.getName().equals(y.getName())
    && x.getMother().equals(y.getMother())
    && x.getBirth().equals(y.getBirth());    
}

最终结果（按中心分组）

1=[Client [id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
    Client [id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]
2=[Client [id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
    Client [id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
    Client [id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
    Client [id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
    Client [id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
    Client [id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]

我需要的

我需要为每组重复记录分配一个唯一值，从每次中心值更改开始，甚至将记录保持在一起，因为 map 不能保证这一点，根据下面的示例：

左侧的数字显示按中心（1 和 2）分组。重复名称具有相同的内部组号并从“1”开始。当中心号码发生变化时，内组号码应从“1”重新开始，以此类推。

    1=[Client [group=1, id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
       Client [group=1, id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]

 // CENTER CHANGED (2) - Restart inner group number to "1" again.

    2=[Client [group=1, id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
       Client [group=1, id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
       Client [group=1, id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
 
// NAME CHANGED, BUT SAME CENTER YET - so increases by "1" (group=2)
      
Client [group=2, id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
       Client [group=2, id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
       Client [group=2, id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]

score 0 · Accepted Answer

如果我理解得很好，您需要根据所有三个属性name、mother和对已经分组的条目进行排序birth。您可以在使用收集之前应用这样的排序groupingBy，使用sorted：

 Map<String, List<Client>> grouped = file.stream()
                    .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                    .sorted(Comparator.comparing(Client::getName)
                                      .thenComparing(Client::getMother)
                                      .thenComparing(Client::getBirth))
                    .collect(Collectors.groupingBy(Client::getCenter));

Collectors.groupingBy在内部Collectors.toList()用作其下游，因此它保留了您已经用sorted;定义的顺序。不需要LinkedHashMap那么。

更新： 要生成 groupId，您可以从Client实体生成它。以下是更新的Client：

package com.example.demo;

import java.util.Optional;

public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;
    private String groupId;

    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }

    public Optional<String> getGroupId() {
        return Optional.ofNullable(groupId);
    }

    public void setGroupId(final String groupId) {
        this.groupId = groupId;
    }

    @Override
    public String toString() {
        return getGroupId().isPresent()
                ? "Client [groupId=" + groupId + ", id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth +
                ", center=" + center + "]"
                : "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center + "]";
    }
    
    ///
    /// Other public methods
    ///

    public Client generateAndAssignGroupId() {
        setGroupId(String.format("**group=%s**", center));
        return this;
    }
}

和新的流：

Map<String, List<Client>> grouped = file.stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .sorted(Comparator.comparing(Client::getName).thenComparing(Client::getMother).thenComparing(Client::getBirth))
                .collect(Collectors.groupingBy(Client::getCenter,
                        Collectors.mapping(Client::generateAndAssignGroupId, Collectors.toList())));

score 0 · Accepted Answer

您可以通过使用相关字段形成键来创建地图，而不是file.stream在 each中使用：filter

Client课堂上的新方法

public String getKey() {
    return String.format("%s~%s~%s~%s", id, name, mother, birth);
}

使用它来创建一个以计数为值的映射。

Map<String, Long> countMap = 
    file.stream()
        .map(Client::getKey)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

然后

// For each inner group you need a separate id based on the name.
// The input would be a map with client name as the key and the
// value would be the corresponding list of clients.
// The below function returns a new map with 
// integer as the key part (required unique id for each inner group).
Function<Map<String, List<Client>>, Map<Integer, List<Client>>> mapper
    = map -> {
        AtomicInteger i = new AtomicInteger(1);
        return map.entrySet().stream()
                  .collect(Collectors.toMap(e -> i.getAndIncrement(), Map.Entry::getValue);
    };

// assuming static import of "java.util.stream.Collectors"
Map<String, Map<Integer, List<Client>>> grouped = 
    file.stream()
        .filter(x -> countMap.get(x.getKey()) > 1L) // indicates duplicate
        .collect(groupingBy(Client::getCenter,    
                            collectingAndThen(groupingBy(Client::getName, toList()),
                                              mapper /* the above function*/ )));

java - 在 groupingBy 期间为重复记录组中的字段分配唯一值

2 回答 2

Related

Reference