Tuesday, March 20, 2012

Joins in SQL Server

Lets get started - For this FAQ I'll be using an example of a fictious health care system. We have 2 simple tables for the initial discussion.

Doctors Patients
ID FullName MedCntrID ID FullName DocID
1 Joe Manners 1 1 Jim Thick 4
2 Sue Tongs 1 2 Tom Small 2
3 Jeff Spine 1 3 Al Downs 4
4 Mary Rasch 2 4 Ann Hills 1
5 Tom Thumb 2 5 Tim Burrow 3
6 Norm Lobe 3 6 Jane Fern 5
7 Sam Broom 2
8 Gary Far 1
9 Bill Out 5
10 Dave Bell 4
11 Fred Overs 5
12 Greg Double 1
13 Bob Marks 9

Now that we have our 2 tables lets see how we can join them

JOINS take on the format of

FROM
Left_Table
JOIN_TYPE
Right_Table
ON
Join_Condition



INNER JOIN - This JOIN takes every record in the Left_Table and looks for 1 or more matches in the Right_Table based on the Join_Condition. If 1 is found the record is added to the result set. If more then one matching record is found in the Right_Table then there are multiple records added to the result set.

Lets look at this in action

SELECT D.FullName AS DoctorName, P.FullName AS PatientName
FROM
Doctors D
INNER JOIN
Patients P
ON
D.ID = P.DocID
The query exectutes and grabs the first record in the Left_Table Doctors D

1 Joe Manners 1

It then matches all records in the Right_Table Patients P based on the Join_Condition D.ID = P.DocID

4 Ann Hills 1
8 Gary Far 1
12 Greg Double 1

And produces this in the result set

DoctorName PatientName
Joe Manners Ann Hills
Joe Manners Gary Far
Joe Manners Greg Double

Then the next record in the Left_Table Doctors D is grabbed

2 Sue Tongs 1

Then all matching records in the the Right_Table Patients P for this record based on the Join_Condition D.ID = P.DocID are found and the records are added to the result set. The final result set for this query would look like this

DoctorName PatientName
Joe Manners Ann Hills
Joe Manners Gary Far
Joe Manners Greg Double
Sue Tongs Tom Small
Sue Tongs Sam Broom
Jeff Spine Tim Burrow
Mary Rasch Jim Thick
Mary Rasch Al Downs
Mary Rasch Dave Bell
Tom Thumb Jane Fern
Tom Thumb Bill Out
Tom Thumb Fred Overs

Note that Norm Lobe does not appear in the result set because no Patients have him listed as a doctor. Also Bob Marks does not appear in the result set because his doctor is not in the Doctors table


OUTER JOIN - This JOIN takes on 3 variations. All 3 have a similar function. These JOINs are designed to bring 2 tables together but include data even if there the Join_Condition is does not find a matching record(s). What it does is fill in the tables columns with NULLs. Lets take the the different types and talk about them.

LEFT OUTER JOIN - This JOIN, is a bit like the INNER JOIN. It takes the Left_Table and tries to match records based on the Join_Condition in the Right_Table. If record(s) are found in the Right_Table then they are match just as they would be in the INNER JOIN. If no match is found in the Right_Tablethen only 1 row is added to the result set for the record in the Left_Table and the columns that come from the Right_Table have the value of NULL

So a query of


SELECT D.FullName AS DoctorName, P.FullName AS PatientName
FROM
Doctors D
LEFT OUTER JOIN
Patients P
ON
D.ID = P.DocID


has a result set of

DoctorName PatientName
Joe Manners Ann Hills
Joe Manners Gary Far
Joe Manners Greg Double
Sue Tongs Tom Small
Sue Tongs Sam Broom
Jeff Spine Tim Burrow
Mary Rasch Jim Thick
Mary Rasch Al Downs
Mary Rasch Dave Bell
Tom Thumb Jane Fern
Tom Thumb Bill Out
Tom Thumb Fred Overs
Norm Lobe NULL

Note this is the same result set as the INNER JOIN but includes a the bolded record because the rule is that ever record in the Left_Table will end up in the result set.

RIGHT OUTER JOIN - This JOIN is just like the LEFT OUTER JOIN except the Right_Table is the table that will have every one of its records in the result set and if no record is found in the
Left_Table then its columns in the result set will be NULL

So a query of


SELECT D.FullName AS DoctorName, P.FullName AS PatientName
FROM
Doctors D
RIGHT OUTER JOIN
Patients P
ON
D.ID = P.DocID


has a result set of
DoctorName PatientName
Joe Manners Ann Hills
Joe Manners Gary Far
Joe Manners Greg Double
Sue Tongs Tom Small
Sue Tongs Sam Broom
Jeff Spine Tim Burrow
Mary Rasch Jim Thick
Mary Rasch Al Downs
Mary Rasch Dave Bell
Tom Thumb Jane Fern
Tom Thumb Bill Out
Tom Thumb Fred Overs
NULL Bob Marks

FULL OUTER JOIN - This JOIN is a combination of both. All records from both Left_Table and Right_Table are in the result set and matched when they can be on the Join_Condition when no record is found in the opposit table NULL values are used for the columns.
So a query of


SELECT D.FullName AS DoctorName, P.FullName AS PatientName
FROM
Doctors D
FULL OUTER JOIN
Patients P
ON
D.ID = P.DocID


has a result set of
Norm Lobe NULL
DoctorName PatientName
Joe Manners Ann Hills
Joe Manners Gary Far
Joe Manners Greg Double
Sue Tongs Tom Small
Sue Tongs Sam Broom
Jeff Spine Tim Burrow
Mary Rasch Jim Thick
Mary Rasch Al Downs
Mary Rasch Dave Bell
Tom Thumb Jane Fern
Tom Thumb Bill Out
Tom Thumb Fred Overs
NULL Bob Marks



Now there is one more Join type in T-SQL. This is not frequently used and I've personally come up with 1 use for it and that is filling up a table with dummy data to get a rough idea of performance. r937 has shown me a few other times you can use these and links to those Threads are at the bottom of this section.

CROSS JOIN This JOIN has a slightly different format in that it does not have a ON clause with a Join_Condition. This is because of the nature of the CROSS JOIN it doesn't need a join condition. What it does is perform a cartesian product of the tables involved in the join. This mean every row in the Left_Table is joined to every row in the Right_Table. For our tables with 6 doctors and 13 patients we would get a result set of 6x13 or 78 records. The query would look like


SELECT D.FullName AS DoctorName, P.FullName AS PatientName
FROM
Doctors D
CROSS JOIN
Patients P


The result set would look roughly like this

Joe Manners Jim Thick
Joe Manners Tom Small
Joe Manners Al Downs
Joe Manners Ann Hills
Joe Manners Tim Burrow
.
.
.
Norm Lobe Fred Overs
Norm Lobe Greg Double
Norm Lobe Bob Marks


following threads are some great examples of using cross joins by r937, thanks for enlightening me.

Thread183-755853 -- triple cross-join of the integers 0 through 9 to create the numbers which are used to generate a range of dates

Thread436-755873 -- cross join a single row of one table to unrelated rows of another

Thread701-728879 -- cross join two tables to get all possible combinations, then outer join to find the ones that are missing


Now there are the JOINs and how they work. What about when you want to join more then 2 tables?
Really you can only join 2 tables together. You just can have multiple sets of 2 tables and 1 single table may be joined to more then 1 other table, heck you can even join a table back to itself.

Some rules to follow to make things easier.
1) Use a LEFT OUTER JOIN over a RIGHT OUTER JOIN when ever possible. It will make things easier as you can always think of the Left_Table as the base table.

2) Start from you base table and work outwards towards the auxillary tables. Say you wanted to get a list of Patients with that have doctors on your register and pull the information about the Medical Center the doctor works at if it is avialable then you would do it like this

SELECT ....
FROM Patients P
INNER JOIN Doctors D *NOTE 1
ON P.DocID = D.ID
LEFT OUTER JOIN MedicalCenters MC *NOTE 2
ON D.MedCntrID = MC.ID


*NOTE 1 INNER JOIN because you only want patients that have a doctor that is on your register
*NOTE 1 Left OUTER JOIN because you want the information about the medical center if you have it but if you don't you still want the record to be included. If you made this a INNER JOIN then if a Doctor's medical center was not on record then the doctor would not be included in the result set meaing that any of their patients would not be on the result set since Patients and Doctors is done with an INNER JOIN too.

3) You can have multiple conditions for the ON clause just like you can in a WHERE clause. Make sure you know the difference between a JOIN condition and a WHERE clause. Though they look similiar in task they can execute slightly different giving unexpected results if you are not careful.

4) Use Alias in tables names they can make reading your queries much easiers

5) Format your queries for easy reading. A query doesn't run any faster if its all on 1 line or spread over 40 lines. As you can see from my formating style it is very easy to see what joins to what and the conditions they use which means less of a chance you made a error and will get unexpected results.

6) JOIN conditions are prime canidates for indexs. On small tables it doesn't matter but large tables will show great performance benifits from haveing the columns indexed.

I hope this little FAQ helps you understand JOINs better.

The content of this article is takne from http://www.tek-tips.com/faqs.cfm?fid=4785