mysql - Complicated SQL statement getting me more rows than expected

Question

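I know I must be being very stupid, but I'm trying to query a database with a fairly complicated statement (at least for me), and I'm getting more rows than I expect. Does anyone know how to "solve" this?

The tables I'm querying were created as follows: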

glycoPeptide | CREATE TABLE `glycoPeptide` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `protein` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1 |

run   | CREATE TABLE `run` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `glycoPeptide` int(11) NOT NULL,
  `run` enum('spectrum','chromatogram') NOT NULL,
  `glycoType` enum('N','O') DEFAULT NULL,
  `glycoSite` int(11) DEFAULT NULL,
  `pepMass` varchar(5) DEFAULT NULL,
  `pepSeq` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `glycoPeptide` (`glycoPeptide`),
  CONSTRAINT `run_ibfk_1` FOREIGN KEY (`glycoPeptide`) REFERENCES `glycoPeptide` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 |

spectrum | CREATE TABLE `spectrum` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `run` int(11) NOT NULL,
  `glycoform` varchar(255) DEFAULT NULL,
  `spectrum` enum('m/z','intensity') NOT NULL,
  PRIMARY KEY (`id`),
  KEY `run` (`run`),
  CONSTRAINT `spectrum_ibfk_1` FOREIGN KEY (`run`) REFERENCES `run` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=latin1 |

precursor | CREATE TABLE `precursor` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `run` int(11) NOT NULL,
  `retentionTime` time DEFAULT NULL,
  `mzValue` float DEFAULT NULL,
  `chargeState` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `run` (`run`),
  CONSTRAINT `precursor_ibfk_1` FOREIGN KEY (`run`) REFERENCES `run` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1 |

binaryDataArray | CREATE TABLE `binaryDataArray` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `spectrum` int(11) NOT NULL,
  `arrayLength` int(11) NOT NULL,
  `EncodedLength` int(11) NOT NULL,
  `arrayData` text,
  PRIMARY KEY (`id`),
  KEY `spectrum` (`spectrum`),
  CONSTRAINT `binaryDataArray_ibfk_1` FOREIGN KEY (`spectrum`) REFERENCES `spectrum` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=latin1 |

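I have some test data on 2 proteins (IgG and IgE). IgG contains only 1 run with only 1 glycosite, and therefore only 1 "set" of binaryDataArrays. IgE contains 3 glycosites and therefore 3 runs, and each run can contain multiple spectra (each a set of 2 binaryDataArrays).

I use the following query (I know it would be prettier with JOINs):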

select
  precursor.mzValue,
  glycoPeptide.protein,
  binaryDataArray.arrayLength,
  binaryDataArray.encodedLength,
  precursor.chargeState,
  run.pepMass,
  run.PepSeq
from
  precursor,
  glycoPeptide,
  binaryDataArray,
  spectrum,
  run
where
  run.glycoPeptide = glycoPeptide.id AND
  spectrum.run = run.id AND
  precursor.run = run.id AND
  binaryDataArray.spectrum = spectrum.id AND
  spectrum.spectrum like 'm/z' AND
  precursor.mzValue like '1196.79' AND
  glycoPeptide.protein like 'IgE' AND
  run.glycoSite like '252' AND
  run.glycoType like 'N';

For IgG, this kind of query yields the result I expect:

+---------+---------+-------------+---------------+-------------+---------+-----------+
| mzValue | protein | arrayLength | encodedLength | chargeState | pepMass | PepSeq    |
+---------+---------+-------------+---------------+-------------+---------+-----------+
|   933.4 | IgG     |       10301 |         22912 |           3 | 1189.   | EEQYNSTYR |
+---------+---------+-------------+---------------+-------------+---------+-----------+
1 row in set (0.00 sec)

For IgE (using the statement above), I get the following result:

+---------+---------+-------------+---------------+-------------+---------+-----------+
| mzValue | protein | arrayLength | encodedLength | chargeState | pepMass | PepSeq    |
+---------+---------+-------------+---------------+-------------+---------+-----------+
| 1196.79 | IgE     |       10301 |        109880 |           3 | 1033.   | GTVNLTWSR |
| 1196.79 | IgE     |       10301 |         54940 |           3 | 1033.   | GTVNLTWSR |
| 1196.79 | IgE     |       10301 |         54940 |           3 | 1033.   | GTVNLTWSR |
+---------+---------+-------------+---------------+-------------+---------+-----------+
3 rows in set (0.00 sec)

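I would expect only 1 row here, and I can't seem to get my head around why there are 3.

Any help would be greatly appreciated.

-- Edit 1 --

As far as I know, the way I wrote the where clause should behave exactly like a join, so that shouldn't be the problem...

-- Edit 2 --

Sample data: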

select * from glycoPeptide;
+----+---------+
| id | protein |
+----+---------+
|  1 | IgG     |
|  2 | IgE     |
+----+---------+
2 rows in set (0.00 sec)

mysql> select * from run;
+----+--------------+----------+-----------+-----------+---------+-----------------+
| id | glycoPeptide | run      | glycoType | glycoSite | pepMass | pepSeq          |
+----+--------------+----------+-----------+-----------+---------+-----------------+
|  1 |            1 | spectrum | N         |       297 | 1189.   | EEQYNSTYR       |
|  2 |            2 | spectrum | N         |       275 | 1516.   | NGTLTVTSTLPVGTR |
|  3 |            2 | spectrum | N         |       252 | 1033.   | GTVNLTWSR       |
|  4 |            2 | spectrum | N         |        99 | 1556.   | VAHTPSSTDWVDNK  |
+----+--------------+----------+-----------+-----------+---------+-----------------+
4 rows in set (0.00 sec)

select * from precursor;
+----+-----+---------------+---------+-------------+
| id | run | retentionTime | mzValue | chargeState |
+----+-----+---------------+---------+-------------+
|  1 |   1 | 00:13:32      |   933.4 |           3 |
|  2 |   2 | 00:00:00      |  965.55 |           2 |
|  3 |   2 | 00:00:00      | 912.036 |           2 |
|  4 |   2 | 00:00:00      | 1127.06 |           3 |
|  5 |   3 | 00:00:00      | 1099.97 |           2 |
|  6 |   3 | 00:00:00      |  1153.9 |           3 |
|  7 |   3 | 00:00:00      | 1196.79 |           3 |
|  8 |   4 | 00:00:00      |  1109.5 |           2 |
|  9 |   4 | 00:00:00      | 1157.66 |           2 |
| 10 |   4 | 00:00:00      | 1225.66 |           2 |
| 11 |   4 | 00:00:00      | 1206.47 |           3 |
| 12 |   4 | 00:00:00      | 1328.31 |           3 |
| 13 |   4 | 00:00:00      | 1304.09 |           3 |
| 14 |   4 | 00:00:00      | 1165.04 |           2 |
+----+-----+---------------+---------+-------------+
14 rows in set (0.00 sec)

mysql> select * from spectrum;
+----+-----+-----------+-----------+
| id | run | glycoform | spectrum  |
+----+-----+-----------+-----------+
|  1 |   1 | G1F       | m/z       |
|  2 |   1 | G1F       | intensity |
|  3 |   2 | NULL      | m/z       |
|  4 |   2 | NULL      | intensity |
|  5 |   2 | NULL      | m/z       |
|  6 |   2 | NULL      | intensity |
|  7 |   2 | NULL      | m/z       |
|  8 |   2 | NULL      | intensity |
|  9 |   3 | NULL      | m/z       |
| 10 |   3 | NULL      | intensity |
| 11 |   3 | NULL      | m/z       |
| 12 |   3 | NULL      | intensity |
| 13 |   3 | NULL      | m/z       |
| 14 |   3 | NULL      | intensity |
| 15 |   4 | NULL      | m/z       |
| 16 |   4 | NULL      | intensity |
| 17 |   4 | NULL      | m/z       |
| 18 |   4 | NULL      | intensity |
| 19 |   4 | NULL      | m/z       |
| 20 |   4 | NULL      | intensity |
| 21 |   4 | NULL      | m/z       |
| 22 |   4 | NULL      | intensity |
| 23 |   4 | NULL      | m/z       |
| 24 |   4 | NULL      | intensity |
| 25 |   4 | NULL      | m/z       |
| 26 |   4 | NULL      | intensity |
| 27 |   4 | NULL      | m/z       |
| 28 |   4 | NULL      | intensity |
+----+-----+-----------+-----------+
28 rows in set (0.00 sec)

mysql> select id, spectrum, arrayLength, encodedLength from binaryDataArray;
+----+----------+-------------+---------------+
| id | spectrum | arrayLength | encodedLength |
+----+----------+-------------+---------------+
|  1 |        1 |       10301 |         22912 |
|  2 |        2 |       10301 |          3092 |
|  3 |        3 |       10301 |         54940 |
|  4 |        4 |       10301 |        109880 |
|  5 |        5 |       10301 |         54940 |
|  6 |        6 |       10301 |        109880 |
|  7 |        7 |       10301 |        102408 |
|  8 |        8 |       10301 |        109880 |
|  9 |        9 |       10301 |        109880 |
| 10 |       10 |       10301 |         54940 |
| 11 |       11 |       10301 |         54940 |
| 12 |       12 |       10301 |        109880 |
| 13 |       13 |       10301 |         54940 |
| 14 |       14 |       10301 |        109880 |
| 15 |       15 |       10301 |        109880 |
| 16 |       16 |       10301 |         54940 |
| 17 |       17 |       10301 |         54940 |
| 18 |       18 |       10301 |        109880 |
| 19 |       19 |       10301 |        109880 |
| 20 |       20 |       10301 |         54940 |
| 21 |       21 |       10301 |        109880 |
| 22 |       22 |       10301 |         54940 |
| 23 |       23 |       10301 |         54940 |
| 24 |       24 |       10301 |        109880 |
| 25 |       25 |       10301 |         54940 |
| 26 |       26 |       10301 |        109880 |
| 27 |       27 |       10301 |        109880 |
| 28 |       28 |       10301 |         54940 |
+----+----------+-------------+---------------+
28 rows in set (0.00 sec)

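-- Edit 3 --

The data I want can't currently be retrieved from the database, because one of the relations doesn't exist (it needs to be possible to link a spectrum to a precursor). I have to thank Mr. Radical and Jack for helping to find this flaw, and I've accepted Jack's answer because his join notation in the query is much easier to read than what I did.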
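A minimal sketch of how the missing relation might be added, assuming each spectrum belongs to at most one precursor. The column name `precursor` on `spectrum` and the constraint name `spectrum_ibfk_2` are hypothetical; the thread does not say how the schema was actually changed, and the new column would still have to be filled in for the existing rows.

```sql
-- Hypothetical: add a nullable link from each spectrum to its precursor.
alter table spectrum
  add column precursor int(11) default null,
  add key precursor (precursor),
  add constraint spectrum_ibfk_2
    foreign key (precursor) references precursor (id) on delete cascade;

-- Once spectrum.precursor is populated, joining spectrum through the
-- precursor (rather than through run) pairs each precursor only with
-- its own spectra instead of with every spectrum in the same run.
select
  precursor.mzValue,
  glycoPeptide.protein,
  binaryDataArray.arrayLength,
  binaryDataArray.encodedLength
from
  precursor
  inner join run on precursor.run = run.id
  inner join glycoPeptide on run.glycoPeptide = glycoPeptide.id
  inner join spectrum on spectrum.precursor = precursor.id
  inner join binaryDataArray on binaryDataArray.spectrum = spectrum.id
where
  spectrum.spectrum = 'm/z' and
  glycoPeptide.protein = 'IgE' and
  run.glycoSite = 252 and
  precursor.mzValue like '1196.79';  -- comparison kept as in the question
```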

score 1 · Accepted Answer

First, I would rewrite your query like this; it's easier to see what the join conditions are, and it keeps the where clause clean:

select
  precursor.mzValue,
  glycoPeptide.protein,
  binaryDataArray.arrayLength,
  binaryDataArray.encodedLength,
  precursor.chargeState,
  run.pepMass,
  run.PepSeq
from
  precursor
  inner join run on precursor.run = run.id
  inner join glycoPeptide on run.glycoPeptide = glycoPeptide.id
  inner join spectrum on spectrum.run = run.id
  inner join binaryDataArray on binaryDataArray.spectrum = spectrum.id
where
  spectrum.spectrum like 'm/z' AND
  precursor.mzValue like '1196.79' AND
  glycoPeptide.protein like 'IgE' AND
  run.glycoSite like '252' AND
  run.glycoType like 'N';

The problem with your query lies in the spectrum table. The join from run produces three rows, with spectrum.id being 9, 11, and 13 respectively.

|  9 |   3 | NULL      | m/z       |
| 11 |   3 | NULL      | m/z       |
| 13 |   3 | NULL      | m/z       |
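To see where the extra rows come from, one could count the precursor rows and the m/z spectrum rows for each run. This is only a sketch against the tables in the question; the aliases `r`, `p`, `s` and the output column names are made up for illustration.

```sql
-- For each run, count its precursor rows and its 'm/z' spectrum rows.
-- Both tables reference run, but nothing links a spectrum to a precursor,
-- so joining both through run pairs every precursor with every spectrum.
select
  r.id as run_id,
  (select count(*) from precursor p where p.run = r.id) as precursors,
  (select count(*) from spectrum s
    where s.run = r.id and s.spectrum = 'm/z') as mz_spectra
from
  run r;
```

With the sample data, run 3 has 3 precursors and 3 m/z spectra. The filter on mzValue keeps a single precursor, which is then paired with all three m/z spectra of that run, hence the three rows.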

score -1 · Answer

A MySQL JOIN will help you. Your question is too long, so I'll give you a simple JOIN example.

A table reference can be aliased using tbl_name AS alias_name or tbl_name alias_name:

SELECT t1.name, t2.salary
  FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;

SELECT t1.name, t2.salary
  FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;

For more details, see: http://www.w3schools.com/sql/sql_join.asp
