根据此处devReddit提供的回复,我对以下测试文件(假数据)的CSV记录(相同的客户端名称)进行了分组:
CSV 测试文件
id,name,mother,birth,center
1,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
2,Carlos Roberto de Souza,Amália Maria de Souza,2004/12/10,1
3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
4,Danilo da Silva Cardoso,Sônia de Paula Cardoso,2002/08/10,3
5,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
7,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
8,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
9,Rosana Pereira de Campos,Ivana Maria de Campos,2002/07/16,3
10,Paula Cristina de Abreu,Cristina Pereira de Abreu,2014/10/25,2
11,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
12,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
客户实体
package entities;
public class Client {
private String id;
private String name;
private String mother;
private String birth;
private String center;
public Client() {
}
public Client(String id, String name, String mother, String birth, String center) {
this.id = id;
this.name = name;
this.mother = mother;
this.birth = birth;
this.center = center;
}
public String getId() {
return id;
}
public void setId(String id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getMother() {
return mother;
}
public void setMother(String mother) {
this.mother = mother;
}
public String getBirth() {
return birth;
}
public void setBirth(String birth) {
this.birth = birth;
}
public String getCenter() {
return center;
}
public void setCenter(String center) {
this.center = center;
}
@Override
public String toString() {
return "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center
+ "]";
}
}
程序
package application;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import entities.Client;
public class Program {
public static void main(String[] args) throws IOException {
Pattern pattern = Pattern.compile(",");
List<Client> file = Files.lines(Paths.get("src/Client.csv"))
.skip(1)
.map(line -> {
String[] fields = pattern.split(line);
return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
})
.collect(Collectors.toList());
Map<String, List<Client>> grouped = file
.stream()
.filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
.collect(Collectors.toList())
.stream()
.collect(Collectors.groupingBy(p -> p.getCenter(), LinkedHashMap::new, Collectors.mapping(Function.identity(), Collectors.toList())));
grouped.entrySet().forEach(System.out::println);
}
}
private static Boolean isDuplicate(Client x, Client y) {
return !x.getId().equals(y.getId())
&& x.getName().equals(y.getName())
&& x.getMother().equals(y.getMother())
&& x.getBirth().equals(y.getBirth());
}
最终结果(按中心分组)
1=[Client [id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
Client [id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]
2=[Client [id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]
我需要的
我需要为每组重复记录分配一个唯一值,从每次中心值更改开始,甚至将记录保持在一起,因为 map 不能保证这一点,根据下面的示例:
左侧的数字显示按中心(1 和 2)分组。重复名称具有相同的内部组号并从“1”开始。当中心号码发生变化时,内组号码应从“1”重新开始,以此类推。
1=[Client [group=1, id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
Client [group=1, id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]
// CENTER CHANGED (2) - Restart inner group number to "1" again.
2=[Client [group=1, id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [group=1, id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [group=1, id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
// NAME CHANGED, BUT SAME CENTER YET - so increases by "1" (group=2)
Client [group=2, id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [group=2, id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [group=2, id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]