I have a set of URLs, for example:

https://www.facebook.com/profile.php?id=456789
https://www.facebook.com/messages/78134
https://www.facebook.com/profile.php?id=123
https://www.facebook.com/messages/781234
https://www.facebook.com/45/settings/781234/ab
https://www.facebook.com/48/settings/989213/ef

The dataset has at least 100 URLs, of roughly 5-6 types. What I expect is:

[
  ['https://www.facebook.com/profile.php?id=456789',
   'https://www.facebook.com/profile.php?id=123'],
  ['https://www.facebook.com/messages/781234',
   'https://www.facebook.com/messages/78134'],
  ['https://www.facebook.com/45/settings/781234/ab',
   'https://www.facebook.com/48/settings/989213/ef']
]

How can I classify them? There is no training input.

2 Answers

Your question isn't well defined, but this seems to work based on the desired output:

require 'uri'
require 'pp' # only needed on Ruby older than 2.5

# Path keywords that identify each kind of URL.
URL_DIVISIONS = %w[profile messages settings]
URL_DIVISION_REGEX = Regexp.union(URL_DIVISIONS)

urls = %w[
  https://www.facebook.com/profile.php?id=456789
  https://www.facebook.com/messages/78134
  https://www.facebook.com/profile.php?id=123
  https://www.facebook.com/messages/781234
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/48/settings/989213/ef
]

# String#[] with a regex returns the matched keyword, or nil if nothing matches.
pp urls.group_by { |url|
  URI.parse(url).path[URL_DIVISION_REGEX]
}

Which outputs:

{"profile"=>
  ["https://www.facebook.com/profile.php?id=456789",
  "https://www.facebook.com/profile.php?id=123"],
"messages"=>
  ["https://www.facebook.com/messages/78134",
  "https://www.facebook.com/messages/781234"],
"settings"=>
  ["https://www.facebook.com/45/settings/781234/ab",
  "https://www.facebook.com/48/settings/989213/ef"]}

If you need the lists without the division names, use:

pp urls.group_by { |url|
  URI.parse(url).path[URL_DIVISION_REGEX]
}.values

Which outputs:

[["https://www.facebook.com/profile.php?id=456789",
  "https://www.facebook.com/profile.php?id=123"],
["https://www.facebook.com/messages/78134",
  "https://www.facebook.com/messages/781234"],
["https://www.facebook.com/45/settings/781234/ab",
  "https://www.facebook.com/48/settings/989213/ef"]]

I'd leave it as a hash, though, then loop over the keys using the URL_DIVISIONS array and extract the values as needed.
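As a minimal sketch of that last suggestion, the grouped hash can be walked in the order given by URL_DIVISIONS. The `/help` URL below is a made-up example, added only to show what happens to a URL that matches none of the divisions: `group_by` files it under the `nil` key.

```ruby
require 'uri'

URL_DIVISIONS = %w[profile messages settings]
URL_DIVISION_REGEX = Regexp.union(URL_DIVISIONS)

# Sample data; the last URL is hypothetical and matches no division.
urls = %w[
  https://www.facebook.com/profile.php?id=123
  https://www.facebook.com/messages/78134
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/help
]

grouped = urls.group_by { |url| URI.parse(url).path[URL_DIVISION_REGEX] }

# Walk the divisions in a fixed order; fetch falls back to [] for an empty division.
URL_DIVISIONS.each do |division|
  puts "#{division}: #{grouped.fetch(division, []).size} URL(s)"
end

# URLs that matched no division are collected under the nil key.
unmatched = grouped.fetch(nil, [])
puts "unmatched: #{unmatched.inspect}"
```

Keeping the hash means the division name stays attached to each list, so a new type of URL only requires adding one entry to URL_DIVISIONS.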

answered 2013-02-04T05:00:35.463

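Here is a self-learning version. You didn't specify the exact criteria for learning, so you may want to adjust the regex, but perhaps you can use it as a starting point: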

require 'uri'
require 'pp' # only needed on Ruby older than 2.5

urls = %w[
  https://www.facebook.com/profile.php?id=456789
  https://www.facebook.com/messages/78134
  https://www.facebook.com/profile.php?id=123
  https://www.facebook.com/messages/781234
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/48/settings/989213/ef
]

# Group by the first run of lowercase letters in the path; "unknown" if there is none.
pp urls.group_by { |url|
  (URI.parse(url).path.match(/[a-z]+/) || ["unknown"])[0]
}

Output:

{"messages"=>
  ["https://www.facebook.com/messages/78134",
   "https://www.facebook.com/messages/781234"],
 "profile"=>
  ["https://www.facebook.com/profile.php?id=456789",
   "https://www.facebook.com/profile.php?id=123"],
 "settings"=>
  ["https://www.facebook.com/45/settings/781234/ab",
   "https://www.facebook.com/48/settings/989213/ef"]}
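One possible adjustment, sketched under assumptions: `/[a-z]+/` stops at the first character that is not a lowercase letter, so a path such as `/Friend-Requests/12` would be keyed as `riend`. The helper below, `group_key` (a hypothetical name, not part of the answer above), keys each URL on the first whole path segment that starts with a letter. The `Friend-Requests` URL is invented for illustration.

```ruby
require 'uri'

# Hypothetical helper: returns the first path segment that starts with a letter,
# lowercased and with any file extension dropped, or "unknown" if there is none.
def group_key(url)
  segments = URI.parse(url).path.split('/')
  word = segments.find { |segment| segment =~ /\A[a-z]/i }
  word ? word.downcase.sub(/\.\w+\z/, '') : 'unknown'
end

urls = %w[
  https://www.facebook.com/profile.php?id=456789
  https://www.facebook.com/messages/78134
  https://www.facebook.com/45/settings/781234/ab
  https://www.facebook.com/Friend-Requests/12
  https://www.facebook.com/12345
]

groups = urls.group_by { |url| group_key(url) }
groups.each { |key, members| puts "#{key}: #{members.size}" }
```

Whether a whole segment or only its leading letters makes the better key depends on the real data, so this is a starting point to tune rather than a fixed rule.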
answered 2013-02-04T05:36:05.003