
I am discussing the following table with someone. It is used to link client-specific items:

Table LINK:

Client (int) 
Item1 (int) 
Item2 (int)

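This is the disputed design. All three fields reference other tables, and the two Item fields reference the same other table. These are not the real field names, so there is no need to discuss naming conventions (the "1" and "2", however, really are part of the field names). I argue that this design is bad on the grounds that it violates 1NF, while the other person argues that, even though it looks distasteful, every alternative is worse for our particular use case.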

Notes:

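  • The vast majority of cases only need two items linked to each other;
  • N:1 groups are allowed, however; in that case the same Item1 is repeated on multiple rows with different Item2 values;
  • In rare cases, an Item2 value (in an existing Item1-Item2 link) is itself linked to further items. In those cases that value appears in the Item1 column and the other linked value appears in the Item2 column. All items linked this way form one group and must be retrieved as such.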
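To make the last note concrete, here is a minimal sketch of retrieving a whole group from the table as described. It assumes SQLite (through Python's sqlite3 module) and a recursive common table expression; the lowercase table and column names, the sample rows, and the helper `group_of` are made up for illustration and are not taken from the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (client INTEGER, item1 INTEGER, item2 INTEGER);
    -- 10 is linked to 11 and 12 (an N:1 group); 12 is in turn linked to 13.
    INSERT INTO link VALUES (1, 10, 11), (1, 10, 12), (1, 12, 13), (1, 20, 21);
""")

# Walk the links in both directions, starting from one item.
# UNION (not UNION ALL) drops duplicates, so the recursion terminates.
GROUP_QUERY = """
WITH RECURSIVE grp(item) AS (
    SELECT :item
    UNION
    SELECT CASE WHEN l.item1 = grp.item THEN l.item2 ELSE l.item1 END
    FROM link AS l
    JOIN grp ON grp.item IN (l.item1, l.item2)
    WHERE l.client = :client
)
SELECT item FROM grp ORDER BY item
"""

def group_of(conn, client, item):
    """Return every item reachable from `item` through this client's links."""
    rows = conn.execute(GROUP_QUERY, {"client": client, "item": item}).fetchall()
    return [row[0] for row in rows]

print(group_of(conn, 1, 13))  # [10, 11, 12, 13]
print(group_of(conn, 1, 20))  # [20, 21]
```

Whether this is acceptable depends on the database in use: it only works where recursive queries are supported.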

My claims:

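  • This violates 1NF: Item1 and Item2 are foreign keys to the same table and therefore constitute a repeating group (the other party disagrees with this definition of a repeating group);
  • Searching for an Item requires two indexes, rather than the single index that would suffice in, for example, a table that uses a GroupID field;
  • Queries that look up a specific item in this table become more complex, because the restriction clause has to check both the Item1 and the Item2 field;
  • Retrieval will be more complicated for the cases where chains of item links occur.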
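As a rough illustration of the second and third claims, this is what a lookup might look like against the current design. It is only a sketch using SQLite via Python's sqlite3; the index names, the sample rows, and the helper `rows_mentioning` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (client INTEGER, item1 INTEGER, item2 INTEGER);
    -- One index per item column, because an item may sit on either side.
    CREATE INDEX link_item1 ON link (client, item1);
    CREATE INDEX link_item2 ON link (client, item2);
    INSERT INTO link VALUES (1, 10, 11), (1, 10, 12), (1, 20, 21);
""")

def rows_mentioning(conn, client, item):
    """Return every link row of `client` in which `item` appears on either side."""
    return conn.execute(
        "SELECT item1, item2 FROM link "
        "WHERE client = ? AND (item1 = ? OR item2 = ?) "
        "ORDER BY item1, item2",
        (client, item, item),
    ).fetchall()

print(rows_mentioning(conn, 1, 10))  # [(10, 11), (10, 12)]
print(rows_mentioning(conn, 1, 12))  # [(10, 12)]
```

The `OR` across both columns is the extra complexity being claimed; how much it costs in practice depends on the database and its query planner.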

The other party's claims:

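  • The most viable alternative is a table with a single Item field and an additional GroupID field;
  • With that alternative, the simpler and far more common two-item link case becomes more complex;
  • There may be concurrency issues when obtaining a GroupID slot, and these would have to be managed;
  • Managing the GroupID concurrency issues would probably require a second table that holds the GroupID in a field with a uniqueness constraint;
  • You would now have to perform a join at least some of the time, especially when using an ORM. A join is less efficient than using the single table of the current design.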
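For comparison, the GroupID alternative described in these points could be sketched roughly as follows. This again assumes SQLite via Python's sqlite3. The second table `link_group`, the `UNIQUE (client, item)` constraint, and the helpers `link_items` and `items_in_group_of` are assumptions made for the example, not part of either party's actual proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical second table whose only job is to hand out unique group ids.
    CREATE TABLE link_group (group_id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE link (
        client   INTEGER NOT NULL,
        group_id INTEGER NOT NULL REFERENCES link_group (group_id),
        item     INTEGER NOT NULL,
        UNIQUE (client, item)  -- assumption: an item belongs to one group
    );
    CREATE INDEX link_by_group ON link (group_id);
""")

def link_items(conn, client, items):
    """Allocate a fresh group id and put all `items` into that group."""
    with conn:  # one transaction: commit on success, roll back on error
        group_id = conn.execute("INSERT INTO link_group DEFAULT VALUES").lastrowid
        conn.executemany(
            "INSERT INTO link (client, group_id, item) VALUES (?, ?, ?)",
            [(client, group_id, item) for item in items],
        )
    return group_id

def items_in_group_of(conn, client, item):
    """Self-join: find the item's group, then every item in that group."""
    rows = conn.execute(
        "SELECT other.item FROM link AS mine "
        "JOIN link AS other "
        "  ON other.group_id = mine.group_id AND other.client = mine.client "
        "WHERE mine.client = ? AND mine.item = ? "
        "ORDER BY other.item",
        (client, item),
    ).fetchall()
    return [row[0] for row in rows]

link_items(conn, 1, [10, 11])      # the common two-item case: two rows
link_items(conn, 1, [20, 21, 22])  # a larger group
print(items_in_group_of(conn, 1, 21))  # [20, 21, 22]
```

Here the two-item case costs two rows and an id allocation instead of one row, while a group of any size is retrieved with a single self-join and no recursion.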

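I would like to hear some opinions on this. I have read other posts about database design, and about 1NF in particular, but they did not address my case above as specifically as I had hoped. From a good deal of research online, I have also learned that supposed standards such as 1NF can be defined in many different ways by different people. I have tried to state both sets of arguments as clearly as possible without favoring either one.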

Edit 1:

  • Item1 and Item2 are (financial) transactions
  • The "1" and "2" really are part of the field names

2 Answers


What are Item1 and Item2? Are they different entities? If so, the design seems fine to me.

For example, you might want to fill a database with solutions to the traveling salesman problem. You have a table City(cityId, latitude, longitude), and a table Path(pathId, salesmanId). Now a path where the salesman visits n+1 cities would be represented by n entries in PathSegment(pathId, segmentId, fromCityId, toCityId). Here, although fromCityId and toCityId are foreign keys that reference the same table City, they describe different attributes of the PathSegment entity, hence this does not violate 1NF.

Edit:

So you want to store trees, actually, only your trees are mostly just linked lists, and most of those are linked lists with just two nodes, right? And apparently your coworker wants to do this as an adjacency list, so a tree like

1-2-3
\-4

becomes

(1,2)
(2,3)
(1,4)

There's nothing wrong with that, but it's not the only way to store a tree in a database. For a good summary of alternatives, see here.

In your case, the advantage of using an adjacency list is that most of your trees have only two nodes, so most of them end up being one row in the table, keeping that simple. Also, questions about the immediate neighbours are easy. "What's the invoice for this payment?" becomes

select item1 from link where item2 = :paymentID

which is neat, too. There are drawbacks, though. The order of child nodes often matters, and the list doesn't help you here, so you have to store that either as a separate column or as something like timestamps in the tables your foreign keys are referring to. Also, reconstructing an entire branch becomes a recursive task, and not all database systems can do that. So if your application often has to retrieve a message-board-like overview of the invoice history, it might require some application-side logic that turns the list of adjacent nodes into a tree on the client and works on that. If that becomes too cumbersome, you might want to consider a nested sets representation, see here.
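The application-side logic mentioned here could be as small as the sketch below. It assumes the rows arrive as (item1, item2) pairs read as (parent, child), that they contain no cycles, and that the root is known; `build_tree` is a name made up for this example.

```python
from collections import defaultdict

def build_tree(edges, root):
    """Turn (parent, child) pairs into a nested dict rooted at `root`.

    Assumes the edges form a tree (no cycles); a cycle would recurse forever.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    def subtree(node):
        # Each child maps to its own (possibly empty) subtree.
        return {child: subtree(child) for child in children[node]}

    return {root: subtree(root)}

# The adjacency list from the example tree: 1-2-3 with 4 as a second child of 1.
edges = [(1, 2), (2, 3), (1, 4)]
print(build_tree(edges, 1))  # {1: {2: {3: {}}, 4: {}}}
```

Children appear in the order the rows were read, so if the order matters the query has to sort on whatever column carries it.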

What's best for your problem? Depends on several things: size and shape of your trees (if they are really mostly short linked lists, adjacency list is good), frequency of inserts and updates (if frequent, adjacency list is good, because its inserts are cheap), frequency and complexity of queries (if frequent and complex, nested sets is good, because its selects are simple and fast). So for a message board, I'd go with nested sets (or even Tropashko's nested intervals for speed and extra coolness), but for a simple request-response (and sometimes some more response) table, I'd probably use an adjacency list.
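To show why nested-set selects are simple, here is a minimal sketch of the same four-node tree (1-2-3 with 4 as a second child of 1) stored with left/right bounds. It assumes SQLite via Python's sqlite3; the `node` table, the `lft`/`rgt` column names, and the helper `branch` are conventional but hypothetical here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER);
    -- Each node's (lft, rgt) interval encloses the intervals of its descendants.
    INSERT INTO node VALUES (1, 1, 8), (2, 2, 5), (3, 3, 4), (4, 6, 7);
""")

def branch(conn, node_id):
    """Return the node and all of its descendants with one non-recursive query."""
    rows = conn.execute(
        "SELECT child.id FROM node AS parent "
        "JOIN node AS child ON child.lft BETWEEN parent.lft AND parent.rgt "
        "WHERE parent.id = ? "
        "ORDER BY child.lft",
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(branch(conn, 1))  # [1, 2, 3, 4]
print(branch(conn, 2))  # [2, 3]
```

The price is paid on writes: inserting a node means renumbering the `lft` and `rgt` values of every node to its right, which is the insert cost the comparison above refers to.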

Answered 2009-12-01T15:22:59.920

Just having two foreign keys pointing to the same table isn't by default a "violation." You might have a Person table with FatherID and MotherID fields both pointing back to the Person table. This isn't a repeating group, as they're semantically different attributes. Your first claim—as written and free of any other context—is false.

Answered 2009-12-01T16:01:20.840