Suppose I have time series data for City A, City B, City C and City D, as shown below:
+------------+--------+--------+--------+--------+
| Dates | City A | City B | City C | City D |
+------------+--------+--------+--------+--------+
| 2020-01-01 | 10 | 20 | 20 | 30 |
+------------+--------+--------+--------+--------+
| 2020-01-02 | 20 | 30 | 30 | 40 |
+------------+--------+--------+--------+--------+
| 2020-01-03 | 30 | 40 | 20 | 20 |
+------------+--------+--------+--------+--------+
| 2020-01-04 | 40 | 20 | 15 | 40 |
+------------+--------+--------+--------+--------+
| 2020-01-05 | 50 | 40 | 18 | 10 |
+------------+--------+--------+--------+--------+
| 2020-01-06 | 60 | 50 | 20 | 15 |
+------------+--------+--------+--------+--------+
| 2020-01-07 | 70 | 60 | 40 | 72 |
+------------+--------+--------+--------+--------+
| 2020-01-08 | 50 | 80 | 60 | 90 |
+------------+--------+--------+--------+--------+
| 2020-01-09 | 30 | 30 | 90 | 17 |
+------------+--------+--------+--------+--------+
| 2020-01-10 | 60 | 50 | 18 | 15 |
+------------+--------+--------+--------+--------+
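I want to compute the cosine and Euclidean distances between A & B, A & C, and A & D, aligning the series on their time index.
For example, to compute the Euclidean distance between City A and City B, I would compute the Euclidean distance for their 2020-01-01 data, 2020-01-02 data, 2020-01-03 data, and so on, then add all of these together to get the final Euclidean distance between City A and City B.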
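To make the intended calculation concrete, here is a minimal sketch in base R. It assumes that "adding the per-date pieces together" means the standard Euclidean distance between the two date-aligned vectors (square root of the summed squared differences) and that cosine distance means 1 minus cosine similarity. If the per-date absolute differences were summed literally, `sum(abs(x - y))` would be used instead. The names `euclid_dist`, `cosine_dist` and `dist_to_ref` are hypothetical helpers, not functions from any package.

```r
# Wide-format example data copied from the table above
wide <- data.frame(
  Dates = as.Date("2020-01-01") + 0:9,
  A = c(10, 20, 30, 40, 50, 60, 70, 50, 30, 60),
  B = c(20, 30, 40, 20, 40, 50, 60, 80, 30, 50),
  C = c(20, 30, 20, 15, 18, 20, 40, 60, 90, 18),
  D = c(30, 40, 20, 40, 10, 15, 72, 90, 17, 15)
)

# Standard Euclidean distance between two aligned numeric vectors
euclid_dist <- function(x, y) sqrt(sum((x - y)^2))

# Cosine distance, taken here as 1 - cosine similarity (an assumption)
cosine_dist <- function(x, y) {
  1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Compare one reference city column against every other city column
dist_to_ref <- function(df, ref, date_col = "Dates") {
  # Each row holds one date for all cities, so the columns are already aligned;
  # sorting by date only makes the row order explicit
  df <- df[order(df[[date_col]]), ]
  others <- setdiff(names(df), c(date_col, ref))
  data.frame(
    city = others,
    euclidean = sapply(others, function(ct) euclid_dist(df[[ref]], df[[ct]])),
    cosine = sapply(others, function(ct) cosine_dist(df[[ref]], df[[ct]])),
    row.names = NULL
  )
}

res <- dist_to_ref(wide, "A")
print(res)
```

This returns one row per comparison city (B, C, D) with both distances, but I am not sure whether it is the most idiomatic approach.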
What is an elegant way to write an R function that performs this task?
Next, suppose my data starts to include more variables:
+------------+-------+------+------+------+
| Dates | City | Var1 | Var2 | Var3 |
+------------+-------+------+------+------+
| 2020-01-01 | A | 20 | 200 | 5 |
+------------+-------+------+------+------+
| 2020-01-02 | A | 30 | 300 | 3 |
+------------+-------+------+------+------+
| 2020-01-03 | A | 40 | 220 | 4 |
+------------+-------+------+------+------+
| 2020-01-04 | A | 20 | 150 | 2 |
+------------+-------+------+------+------+
| 2020-01-05 | A | 40 | 180 | 5 |
+------------+-------+------+------+------+
| 2020-01-01 | B | 50 | 200 | 6 |
+------------+-------+------+------+------+
| 2020-01-02 | B | 60 | 400 | 7 |
+------------+-------+------+------+------+
| 2020-01-03 | B | 80 | 600 | 8 |
+------------+-------+------+------+------+
| 2020-01-04 | B | 30 | 900 | 4 |
+------------+-------+------+------+------+
| 2020-01-05 | B | 50 | 180 | 2 |
+------------+-------+------+------+------+
| 2020-01-01 | C | 20 | 230 | 3 |
+------------+-------+------+------+------+
| 2020-01-02 | C | 30 | 340 | 5 |
+------------+-------+------+------+------+
| 2020-01-03 | C | 40 | 230 | 3 |
+------------+-------+------+------+------+
| 2020-01-04 | C | 20 | 120 | 5 |
+------------+-------+------+------+------+
| 2020-01-05 | C | 40 | 120 | 4 |
+------------+-------+------+------+------+
| 2020-01-01 | D | 20 | 400 | 5 |
+------------+-------+------+------+------+
| 2020-01-02 | D | 30 | 500 | 6 |
+------------+-------+------+------+------+
| 2020-01-03 | D | 10 | 600 | 7 |
+------------+-------+------+------+------+
| 2020-01-04 | D | 50 | 300 | 7 |
+------------+-------+------+------+------+
| 2020-01-05 | D | 20 | 300 | 4 |
+------------+-------+------+------+------+
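Using the same example as above, to compute the Euclidean distance between City A and City B, I would compute the Euclidean distance for Variable 1 over the 2020-01-01 data, 2020-01-02 data, 2020-01-03 data, and so on, then repeat this for Variable 2 and Variable 3. Finally, I would add all of these together to get the total Euclidean distance between City A and City B.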
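The multi-variable procedure could be sketched roughly as follows in base R. This assumes the long layout shown above, an inner join on `Dates` to align the two cities, and a plain sum of the per-variable distances. `pair_dist` is a hypothetical helper name, and the `vars` argument accepts either one column name or several.

```r
# Long-format example data copied from the table above
# (City D's Var2 on 2020-01-04 is taken to be 300)
long <- data.frame(
  Dates = rep(as.Date("2020-01-01") + 0:4, times = 4),
  City  = rep(c("A", "B", "C", "D"), each = 5),
  Var1  = c(20, 30, 40, 20, 40,  50, 60, 80, 30, 50,
            20, 30, 40, 20, 40,  20, 30, 10, 50, 20),
  Var2  = c(200, 300, 220, 150, 180,  200, 400, 600, 900, 180,
            230, 340, 230, 120, 120,  400, 500, 600, 300, 300),
  Var3  = c(5, 3, 4, 2, 5,  6, 7, 8, 4, 2,
            3, 5, 3, 5, 4,  5, 6, 7, 7, 4)
)

# Distances between two cities for one or more variables
pair_dist <- function(df, city1, city2, vars,
                      date_col = "Dates", city_col = "City") {
  a <- df[df[[city_col]] == city1, c(date_col, vars)]
  b <- df[df[[city_col]] == city2, c(date_col, vars)]
  # Inner join on the date column keeps only dates present for both cities
  m <- merge(a, b, by = date_col, suffixes = c(".x", ".y"))
  per_var <- sapply(vars, function(v) {
    x <- m[[paste0(v, ".x")]]
    y <- m[[paste0(v, ".y")]]
    c(euclidean = sqrt(sum((x - y)^2)),
      cosine    = 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))))
  })
  # per_var is a 2 x length(vars) matrix; row sums give the totals
  list(per_variable = per_var, total = rowSums(per_var))
}

multi  <- pair_dist(long, "A", "B", c("Var1", "Var2", "Var3"))
single <- pair_dist(long, "A", "C", "Var1")
print(multi$total)
```

One thing I noticed: Var2 is on a much larger scale than Var1 and Var3, so the summed Euclidean distance is dominated by Var2 unless the variables are rescaled first.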
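I would like to know whether this kind of distance calculation is technically feasible and, if so, how I can write an R function that performs it for both Euclidean and cosine distance, for a single variable of interest and for multiple variables of interest.
Thank you very much for your help!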