I'm currently working on a project which is supposed to take some arbitrary input data, do some natural language processing, and dynamically generate a corresponding SQL query. I also have a "reference" set of SQL queries which I can use to compare my SQL against to verify that the SQL generation is accurate.
This is one such SQL query that I've generated:
SELECT DISTINCT t0.airline_code
FROM (
SELECT airline.*
FROM airline, flight
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.flight_days = 'DAILY' )
)
)
AS t0
INNER JOIN (
SELECT airline.*
FROM airline, flight, airport_service, city
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.from_airport = airport_service.airport_code )
AND
( airport_service.city_code = city.city_code )
AND
( city.city_name = 'BOSTON' )
)
)
AS t1
ON t0.airline_code = t1.airline_code
INNER JOIN (
SELECT airline.*
FROM airline, flight, airport_service, city
WHERE (
( airline.airline_code = flight.airline_code )
AND
( flight.to_airport = airport_service.airport_code )
AND
( airport_service.city_code = city.city_code )
AND
( city.city_name = 'DALLAS' )
)
)
AS t2
ON t1.airline_code = t2.airline_code;
Running this returns the following columns:
airline_code
------------
AA
CO
HP
TW
DL
NW
UA
US
The reference SQL, however, returns slightly different results:
SELECT DISTINCT airline.airline_code
FROM airline
WHERE airline.airline_code IN
(SELECT flight.airline_code
FROM flight
WHERE (flight.flight_days = 'DAILY'
AND (flight.from_airport IN
(SELECT airport_service.airport_code
FROM airport_service
WHERE airport_service.city_code IN
(SELECT city.city_code
FROM city
WHERE city.city_name = 'BOSTON'))
AND flight.to_airport IN
(SELECT airport_service.airport_code
FROM airport_service
WHERE airport_service.city_code IN
(SELECT city.city_code
FROM city
WHERE city.city_name = 'DALLAS')))));
Result:
airline_code
------------
AA
DL
TW
UA
US
Obviously, the two are different in that the first one is using joins while the second one uses nested SQL statements. However, this doesn't appear to be causing any problems for the other generated SQL/reference SQL that I'm working with, which are structured similarly (the generated SQL uses joins, the reference SQL is nested).
I'm fairly new to SQL, and know next to nothing about databases, so I could be missing something stupidly obvious, but for the life of me, I can't see why the two SQL statements are returning different results. They seem functionally identical, as best as I can tell. Does anybody know what I'm doing wrong, and how I can fix the generated SQL to match the reference?
If it matters, I'm using Microsoft SQL Server 2012.