How can I remove duplicate lines based on a specific string or character?
For example, I have a file with the following contents:
https://example.com/?first=one&second=two&third=three
https://example.com/?first=only&second=cureabout&third=theparam
https://example.com/?fourth=four&fifth=five
https://stack.com/?sixth=six&seventh=seven&eighth=eight
https://stack.com/?sixth=itdoesnt&seventh=matter&eighth=something
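I want to make the lines unique based on their query-string parameter names and print only one URL for each set of identical parameters, while still telling the domains apart. The parameter values do not matter.
Desired result: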
https://example.com/?first=one&second=two&third=three
https://stack.com/?sixth=six&seventh=seven&eighth=eight
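Update
In the code below, I tried to grep the 3 characters before each "=" and, if the line contains those characters, make the lines unique and print the result. In effect, the goal is to deduplicate the file when lines share a certain number of similar parameters.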
for url in $(cat $1); do
    # COUNT NUMBER OF EQUAL CHARACTER "="
    count_eq=$(echo $url | sed "s/=/=\n/g" | grep -a -c '=')
    if [[ $count_eq == "3" ]]; then
        # GREP 3 CHARACTERS BEFORE "="
        same_param=$(printf $url | grep -o -P '.{0,3}=.{0,0}' | sort -u)
        if [[ $url == *"$same_param"* ]];then
            sort -u "$url" | printf "$url\n"
        fi
    fi
done
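For comparison, here is a rough sketch of one way the goal described above might be approached with awk. It is not a definitive solution. It assumes that everything before the "?" (scheme, domain and path) together with the ordered list of parameter names forms the uniqueness key, and that the first URL seen for each key is the one to keep. The function name dedupe_by_params and its optional parameter-count argument are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: dedupe_by_params is a hypothetical helper, not part of the original script.
# Reads URLs on stdin and prints the first URL for each (base URL, parameter names) key.
# Optional $1: keep only URLs with exactly that many parameters (0 or unset = no filter).
dedupe_by_params() {
  awk -v want="${1:-0}" '
  {
    url = $0

    # Split the line into the part before "?" and the query string after it.
    q = index(url, "?")
    if (q == 0) { base = url; query = "" }
    else        { base = substr(url, 1, q - 1); query = substr(url, q + 1) }

    n = split(query, pairs, "&")

    # Assumption: mirror the count check from the loop above (exactly N parameters).
    if (want + 0 > 0 && n != want + 0) next

    # Build the key from the base URL plus parameter names only (values dropped).
    key = base "?"
    for (i = 1; i <= n; i++) {
      name = pairs[i]
      sub(/[=].*/, "", name)   # strip "=value", keep the name
      key = key name "&"
    }

    # Print the URL only the first time its key is seen.
    if (!(key in seen)) { seen[key] = 1; print url }
  }'
}
```

Called as `dedupe_by_params 3 < urls.txt` (urls.txt being a placeholder file name), this keeps only URLs with exactly three parameters. Without the argument, the URL with the fourth and fifth parameters would also be printed, because its parameter names differ from the others. The key is order-sensitive, so the same names in a different order count as a different set.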
Thanks.