Relational Databases and SQL Joins

A student asks “I don’t really yet understand Joins, just figured it out using the example in the text book.  Joins are fascinating and I look forward to better understanding them as I use them at work in a ‘trial and error’ mode mostly.

The SQL class is exploring the technique of SQL Joins in this first week’s exercises (pages 156-162).   The Advanced Dreamweaver class will do so in a few weeks in Lesson 11.   I will try to explain the concept of SQL Joins in this post, students in either class may add comments or questions below.

Relational Database Theory

The relational database was conceived by two IBM researchers, Doctors Codd & Date, over 40 years ago and many computer science PhDs have explored, refined and standardized the relational database model in the time since.   The concept begins with advanced mathematical set theory and is beyond the scope of either of these classes.

The basic concept is that by storing data within rows and columns of inter-related tables, you can  reduce (or even eliminate) data redundancy and inaccuracy and make it easier to explore data dynamically.  If you are familiar with spreadsheets, you have probably seen where a set of data can become very fragmented if every user or entry can spell out data elements directly, such  city names as an example.   You’ll have address records with data such as Santa Rosa, santa rosa, SR, SANTA ROSA within a few weeks.   If that column was tied to another table of pre-validated city names, data would be more consistent and easier to collate, sort and summarize.

Relational databases are typically designed conceptually before they are built using Entity/Relationship Diagrams.     Entities have a Primary Key (unique identifier) and one or more Attributes.      Attributes that are common between tables are Relationships.   Here is the ER Diagram for the music database:

Entity Relationship Diagram
The ER Diagram shows entities in blue,  attributes below with  keys or diamond icons, and relationships as linking lines with a delta where there can be more than one foreign key  for a given relationship.   This diagram can be loosely translated into English - a recording artist may release one or more albums which can each contain one or more more tracks.      A track may be played many times but only once per timestamp.

SQL Joins

The join function is used in SQL SELECT statements to coordinate the results (often referred to as a recordset in MySQL Workbench and Dreamweaver) into a tangible view of the information.    If we ask the database for information from two tables in the database without joining them, we get the superset of the table info (all possible combinations).   Try executing this statement in MySQL Workbench Scratchpad:

select album_name, artist_name from album,artist;

That will return obviously meaningless information.

We are actually looking for the intersection of the table info, each album with it’s one coinciding recording artist.   We need to express our SQL statement with an inner join – like this:

select album_name, artist_name from album inner Join artist on album.artist_id = artist.artist_id;

This statement returns the ‘correct’ meaningful resultset – it is now much easier to determine the author’s musical tastes.  By joining artist_id’s from both tables we create a virtual new record of only the intersecting records – and that becomes our result set.

Query Resultset

Don’t Be Intimidated

My goal is to teach you the TECHNIQUES you need to use MySQL for basic applications.   Inner joins a bit tricky to grasp conceptually, but at this point just keep in mind that they will be necessary when you need to collect data where the columns are in more than one data table.     These 2 questions from the Learning MySQL text are another good illustration of the concept.

Download this music.sql file.

1. Use one or more SELECT statements to find out how many tracks are on New Order’s
Brotherhood album.
2. Using a join, list the albums that we own by the band New Order.