Sunday, February 8, 2015

SQL - Things To Remember About

Actually if You're not working with SQL on a daily basis there're several things You will forget really quick. Here probably not the most popular notes but as for me those for keep in mind. All of them were taken from the internet and adopted for myself.


1. Which TCP/IP port does SQL Server run on? How can it be changed?
SQL Server runs on port 1433. It can be changed from the Network Utility TCP/IP properties.

2. What are the difference between clustered and a non-clustered index?
1.     A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.
2.     A non clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

3. What are the different index configurations a table can have?
A table can have one of the following index configurations:
1.     No indexes
2.     A clustered index
3.     A clustered index and many nonclustered indexes
4.     A nonclustered index
5.     Many nonclustered indexes

4. What are different types of Collation Sensitivity?
1.     Case sensitivity - A and a, B and b, etc.
2.     Accent sensitivity
3.     Kana Sensitivity - When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.
4.     Width sensitivity - A single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently than it is width sensitive.

5. What's the difference between a primary key and a unique key?
Both primary key and unique key enforces uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn't allow NULLs, but unique key allows one NULL only.

6. What is difference between DELETE and TRUNCATE commands?
Delete command removes the rows from a table based on the condition that we provide with a WHERE clause. Truncate will actually remove all the rows from a table and there will be no data in the table after we run the truncate command.
1.     TRUNCATE:
1.     TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
2.     TRUNCATE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.
3.     TRUNCATE removes all rows from a table, but the table structure, its columns, constraints, indexes and so on, remains. The counter used by an identity for new rows is reset to the seed for the column.
4.     You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint. Because TRUNCATE TABLE is not logged, it cannot activate a trigger.
5.     TRUNCATE cannot be rolled back.
6.     TRUNCATE is DDL Command.
7.     TRUNCATE Resets identity of the table
2.     DELETE:
1.     DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
2.     If you want to retain the identity counter, use DELETE instead. If you want to remove table definition and its data, use the DROP TABLE statement.
3.     DELETE Can be used with or without a WHERE clause
4.     DELETE Activates Triggers.
5.     DELETE can be rolled back.
6.     DELETE is DML Command.
7.     DELETE does not reset identity of the table.
Note: DELETE and TRUNCATE both can be rolled back when surrounded by TRANSACTION if the current session is not closed. If TRUNCATE is written in Query Editor surrounded by TRANSACTION and if session is closed, it can not be rolled back but DELETE can be rolled back.

7. When is the use of UPDATE_STATISTICS command?
This command is basically used when a large processing of data has occurred. If a large amount of deletions any modification or Bulk Copy into the tables has occurred, it has to update the indexes to take these changes into account. UPDATE_STATISTICS updates the indexes on these tables accordingly.

8. What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
They specify a search condition for a group or an aggregate. But the difference is that HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause. Having Clause is basically used only with the GROUP BY function in a query whereas WHERE Clause is applied to each row before they are part of the GROUP BY function in a query.

9. What are the properties and different Types of Sub-Queries?
1.     Properties of Sub-Query
1.     A sub-query must be enclosed in the parenthesis.
2.     A sub-query must be put in the right hand of the comparison operator, and
3.     A sub-query cannot contain an ORDER-BY clause.
4.     A query can contain more than one sub-query.
2.     Types of Sub-Query
1.     Single-row sub-query, where the sub-query returns only one row.
2.     Multiple-row sub-query, where the sub-query returns multiple rows,. and
3.     Multiple column sub-query, where the sub-query returns multiple columns

10. Name 3 ways to get an accurate count of the number of records in a table?

SELECT * FROM table1
SELECT COUNT(*) FROM table1
SELECT rows FROM sysindexes WHERE id = OBJECT_ID(table1) AND indid < 2

11. What is the difference between a Local and a Global temporary table?
1.     A local temporary table exists only for the duration of a connection or, if defined inside a compound statement, for the duration of the compound statement.
2.     A global temporary table remains in the database permanently, but the rows exist only within a given connection. When connection is closed, the data in the global temporary table disappears. However, the table definition remains with the database for access when database is opened next time.

12. What is PRIMARY KEY constraint?
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row and only one primary key constraint can be created for each table. The primary key constraints are used to enforce entity integrity.

13. What is UNIQUE KEY constraint?
A UNIQUE constraint enforces the uniqueness of the values in a set of columns, so no duplicate values are entered. The unique key constraints are used to enforce entity integrity as the primary key constraints.

14. What is FOREIGN KEY constraint?
A FOREIGN KEY constraint prevents any actions that would destroy links between tables with the corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there are no primary keys with that value. The foreign key constraints are used to enforce referential integrity.

15. What is CHECK constraint?
A CHECK constraint is used to limit the values that can be placed in a column. The check constraints are used to enforce domain integrity.

16. What is NOT NULL constraint?
A NOT NULL constraint enforces that the column will not accept null values. The not null constraints are used to enforce domain integrity, as the check constraints.

17. What is a table called, if it has neither Cluster nor Non-cluster Index? What is it used for?
Unindexed table or Heap. Microsoft Press Books and Book on Line (BOL) refers it as Heap. A heap is a table that does not have a clustered index and, therefore, the pages are not linked by pointers. The IAM pages are the only structures that link the pages in a table together. Unindexed tables are good for fast storing of data. Many times it is better to drop all indexes from table and then do bulk of inserts and to restore those indexes after that.

18. What is BCP? When does it used?
BulkCopy is a tool used to copy huge amount of data from tables and views. BCP does not copy the structures same as source to destination. BULK INSERT command helps to import a data file into a database table or view in a user-specified format.

19. How to implement one-to-one, one-to-many and many-to-many relationships while designing tables?
One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships. One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships. Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table.

20. In SQL, what’s the difference between an inner and outer join?

Joins are used to combine the data from two tables, with the result being a new, temporary table. 
The temporary table is created based on column(s) that the two tables share, 
which represent meaningful column(s) of comparison. The goal is to extract meaningful data 
from the resulting temporary table. Joins are performed based on something called a predicate, 
which specifies the condition to use in order to perform a join. A join can be either an inner 
 join or an outer join, depending on how one wants the resulting table to look.
It is best to illustrate the differences between inner and outer joins by use of an example. 
Here we have 2 tables that we will use for our example:
Employee
Location
EmpID
EmpName
13
Jason
8
Alex
3
Ram
17
Babu
25
Johnson
EmpID
EmpLoc
13
San Jose
8
Los Angeles
3
Pune, India
17
Chennai, India
39
Bangalore, India

It’s important to note that the very last row in the Employee table does not exist in the Employee Location table. Also, the very last row in the Employee Location table does not exist in the Employee table. These facts will prove to be significant in the discussion that follows.

21. Outer Joins
Let’s start the explanation with outer joins. Outer joins can be be further divided into left outer joins, right outer joins, and full outer joins. Here is what the SQL for a left outer join would look like, using the tables above:

select * from employee left outer join location
on employee.empID = location.empID;
In this SQL we are joining on the condition that the employee ID’s match in the rows tables. So, we will be essentially combining 2 tables into 1, based on the condition that the employee ID’s match. Note that we can get rid of the "outer" in left outer join, which will give us the SQL below. This is equivalent to what we have above.
select * from employee left join location
on employee.empID = location.empID;


A left outer join retains all of the rows of the left table, regardless of whether there is a row that matches on the right table. The SQL above will give us the result set shown below.
Employee.EmpID
Employee.EmpName
Location.EmpID
Location.EmpLoc
13
Jason
13
San Jose
8
Alex
8
Los Angeles
3
Ram
3
Pune, India
17
Babu
17
Chennai, India
25
Johnson
NULL
NULL

22. The Join Predicate – a geeky term you should know
Earlier we had mentioned something called a join predicate. In the SQL above, the join predicate is "on employee.empID = location.empID". This is the heart of any type of join, because it determines what common column between the 2 tables will be used to "join" the 2 tables. As you can see from the result set, all of the rows from the left table are returned when we do a left outer join. The last row of the Employee table (which contains the "Johson" entry) is displayed in the results even though there is no matching row in the Location table. As you can see, the non-matching columns in the last row are filled with a "NULL". So, we have "NULL" as the entry wherever there is no match. 


23. A right outer join
is pretty much the same thing as a left outer join, except that the rows that are retained are from the right table. This is what the SQL looks like:

select * from employee right outer join location
on employee.empID = location.empID;


// taking out the "outer", this also works:

select * from employee right join location
on employee.empID = location.empID;


Using the tables presented above, we can show what the result set of a right outer join would look like:
Employee.EmpID
Employee.EmpName
Location.EmpID
Location.EmpLoc
13
Jason
13
San Jose
8
Alex
8
Los Angeles
3
Ram
3
Pune, India
17
Babu
17
Chennai, India
NULL
NULL
39
Bangalore, India

We can see that the last row returned in the result set contains the row that was in the Location table, but not in the Employee table (the "Bangalore, India" entry). Because there is no matching row in the Employee table that has an employee ID of "39", we have NULL’s in the result set for the Employee columns.

24. Inner Joins
Now that we’ve gone over outer joins, we can contrast those with the inner join. The difference between an inner join and an outer join is that an inner join will return only the rows that actually match based on the join predicate. Once again, this is best illustrated via an example. Here’s what the SQL for an inner join will look like:

select * from employee inner join location on
employee.empID = location.empID


This can also be written as:

select * from employee, location
where employee.empID = location.empID


Now, here is what the result of running that SQL would look like:

Employee.EmpID
Employee.EmpName
Location.EmpID
Location.EmpLoc
13
Jason
13
San Jose
8
Alex
8
Los Angeles
3
Ram
3
Pune, India
17
Babu
17
Chennai, India


25. Inner vs Outer Joins
We can see that an inner join will only return rows in which there is a match based on the join predicate. In this case, what that means is anytime the Employee and Location table share an Employee ID, a row will be generated in the results to show the match. Looking at the original tables, one can see that those Employee ID’s that are shared by those tables are displayed in the results. But, with a left or right outer join, the result set will retain all of the rows from either the left or right table.

26. Definition of a Key
According to the SQL Standard, a key is a subset of columns in a table that allow a row to be uniquely identified. So, a key can be more than just one column. And, every row in the table will have a unique value for the key – or a unique combination of values if the key consists of more than just one column.

27. Can a key have NULL values in SQL?
According to the SQL standard, a key is not allowed to have values that are NULL-able. Any key that has more columns than necessary to uniquely identify each row in the table is called a super-key (think of it as a super-set). But, if the key has the minimum amount of columns necessary to uniquely identify each row then it is called a minimal super-key. A minimal super-key is also known as a candidate key, and there must be one or more candidate keys in a table.

28. Keys in actual RDBMS implementations
Even though the SQL standard says that a key can not be NULL, in practice actual RDBMS implementations (like SQL Server and Oracle), allow both foreign and unique keys to actually be NULL. And there are plenty of times when that actually makes sense. However, a primary key can never be NULL.

29. In SQL, what are the differences between primary, foreign, and unique keys?
The one thing that primary, unique, and foreign keys all have in common is the fact that each type of key can consist of more than just one column from a given table. In other words, foreign, primary, and unique keys are not restricted to having just one column from a given table – each type of key can cover multiple columns. So, that is one feature that all the different types of keys share – they can each be comprised of more than just one column, which is something that many people in software are not aware of.

30. What is the point of having a foreign key?
Foreign keys are used to reference unique columns in another table. So, for example, a foreign key can be defined on one table A, and it can reference some unique column(s) in another table B. Why would you want a foreign key? Well, whenever it makes sense to have a relationship between columns in two different tables.

31. Can a table have multiple unique, foreign, and/or primary keys?
A table can have multiple unique and foreign keys. However, a table can have only one primary key.

32. Can a unique key have NULL values? Can a primary key have NULL values?
Unique key columns are allowed to hold NULL values. The values in a primary key column, however, can never be NULL.

33. Can a foreign key reference a non-primary key?
Yes, a foreign key can actually reference a key that is not the primary key of a table. But, a foreign key must reference a unique key.

34. Can a foreign key contain null values?
Yes, a foreign key can hold NULL values. Because foreign keys can reference unique, non-primary keys – which can hold NULL values – this means that foreign keys can themselves hold NULL values as well.

35. Some other differences between foreign, primary, and unique keys
While unique and primary keys both enforce uniqueness on the column(s) of one table, foreign keys define a relationship between two tables. A foreign key identifies a column or group of columns in one (referencing) table that refers to a column or group of columns in another (referenced) table – in our example above, the Employee table is the referenced table and the Employee Salary table is the referencing table.

36. Provide an example and definition of a natural key in SQL.
You have probably come across the term natural key within the context of SQL and data warehouses. What exactly is a natural key? A natural key is a key composed of columns that actually have a logical relationship to other columns within a table. What does that mean in plain English? Well, let’s go through an example of a natural key.

37. Natural Key Example
Consider a table called People. If we use the columns First_Name, Last_Name, and Address together to form a key then that would be a natural key because those columns are something that are natural to people, and there is definitely a logical relationship between those columns and any other columns that may exist in the table.

38. Why is it called a natural key?
The reason it’s called a natural key is because the columns that belong to the key are just naturally a part of the table and have a relationship with other columns in the table. So, a natural key already exists within a table – and columns do not need to be added just to create an “artificial” key.

39. Natural keys versus business keys
Natural keys are often also called business keys – so both terms mean exactly the same thing.

40. Natural keys versus domain keys
Domain keys also mean the same thing as natural keys.

41. Natural keys versus surrogate keys
Natural keys are often compared to surrogate keys. What exactly is a surrogate key? Well, first consider the fact that the word surrogate literally means substitute. The reason a surrogate key is like a substitute is because it’s unnatural, in the sense that the column used for the surrogate key has no logical relationship to other columns in the table.

In other words, the surrogate key really has no business meaning – i.e., the data stored in a surrogate key has no intrinsic meaning to it.

42. What is a simple key in a dbms?
In a database table, a simple key is just a single attribute (which is just a column) that can uniquely identify a row. So, any single column in a table that can uniquely identify a row is a simple key. The reason it’s called a simple key is because of the fact that it is simple in the sense that it’s just composed of one column (as opposed to multiple columns) and that’s it.

43. Example of a simple key
Let’s go through an example of a simple key. Consider a table called Employees. If every employee has a unique ID and a column called EmployeeID, then the EmployeeID column would be considered a simple key because it’s a single column that can uniquely identify every row in the table (where each row is a separate employee). Simple isn’t it?

44. What is the definition of a secondary key?
You may have heard the term secondary key in Oracle, MySQL, SQL Server, or whatever other dbms you are dealing with. What exactly is a secondary key?
A given table may have more than just one choice for a primary key. Basically, there may be another column (or combination of columns for a multi-column primary key) that qualify as primary keys. Any combination of column(s) that may qualify to be a primary key are known as candidate keys. This is because they are considered candidates for the primary key. And the options that are not selected to be the primary key are known as secondary keys.

45. Example of a Secondary Key in SQL
Let’s go through an example of a secondary key. Consider a table called Managers that stores all of the managers in a company. Each manager has a unique Manager ID Number, a physical address, and an email address. Let’s say that the Manager ID is chosen to be the primary key of the Managers table. Both the physical address and email address could have been selected as the primary key, because they are both unique fields for every manager row in the Managers table. But, because the email address and physical address were not selected as the primary key, they are considered to be secondary keys.

46. How do database indexes work? And, how do indexes help? Provide a tutorial on database indexes.
Let’s start out our tutorial and explanation of why you would need a database index by going through a very simple example. Suppose that we have a database table called Employee with three columns – Employee_Name, Employee_Age, and Employee_Address. Assume that the Employee table has thousands of rows.

Now, let’s say that we want to run a query to find all the details of any employees who are named ‘Jesus’? So, we decide to run a simple query like this:

SELECT * FROM Employee
WHERE Employee_Name = 'Jesus'


47. What would happen without an index on the table?
Once we run that query, what exactly goes on behind the scenes to find employees who are named Jesus? Well, the database software would literally have to look at every single row in the Employee table to see if the Employee_Name for that row is ‘Jesus’. And, because we want every row with the name ‘Jesus’ inside it, we can not just stop looking once we find just one row with the name ‘Jesus’, because there could be other rows with the name Jesus. So, every row up until the last row must be searched – which means thousands of rows in this scenario will have to be examined by the database to find the rows with the name ‘Jesus’. This is what is called a full table scan.

48. How a database index can help performance
You might be thinking that doing a full table scan sounds inefficient for something so simple – shouldn’t software be smarter? It’s almost like looking through the entire table with the human eye – very slow and not at all sleek. But, as you probably guessed by the title of this article, this is where indexes can help a great deal. The whole point of having an index is to speed up search queries by essentially cutting down the number of records/rows in a table that need to be examined.

49. What is an index?
So, what is an index? Well, an index is a data structure (most commonly a B- tree) that stores the values for a specific column in a table. An index is created on a column of a table. So, the key points to remember are that an index consists of column values from one table, and that those values are stored in a data structure. The index is a data structure – remember that.

50. What kind of data structure is an index?
B- trees are the most commonly used data structures for indexes. The reason B- trees are the most popular data structure for indexes is due to the fact that they are time efficient – because look-ups, deletions, and insertions can all be done in logarithmic time. And, another major reason B- trees are more commonly used is because the data that is stored inside the B- tree can be sorted. The RDBMS typically determines which data structure is actually used for an index. But, in some scenarios with certain RDBMS’s, you can actually specify which data structure you want your database to use when you create the index itself.

51. How does a hash table index work?
Hash tables are another data structure that you may see being used as indexes – these indexes are commonly referred to as hash indexes. The reason hash indexes are used is because hash tables are extremely efficient when it comes to just looking up values. So, queries that compare for equality to a string can retrieve values very fast if they use a hash index. For instance, the query we discussed earlier (SELECT * FROM Employee WHERE Employee_Name = ‘Jesus’) could benefit from a hash index created on the Employee_Name column. The way a hash index would work is that the column value will be the key into the hash table and the actual value mapped to that key would just be a pointer to the row data in the table. Since a hash table is basically an associative array, a typical entry would look something like “Jesus => 0×28939″, where 0×28939 is a reference to the table row where Jesus is stored in memory. Looking up a value like “Jesus” in a hash table index and getting back a reference to the row in memory is obviously a lot faster than scanning the table to find all the rows with a value of “Jesus” in the Employee_Name column.

52. The disadvantages of a hash index
Hash tables are not sorted data structures, and there are many types of queries which hash indexes can not even help with. For instance, suppose you want to find out all of the employees who are less than 40 years old. How could you do that with a hash table index? Well, it’s not possible because a hash table is only good for looking up key value pairs – which means queries that check for equality (like “WHERE name = ‘Jesus’”). What is implied in the key value mapping in a hash table is the concept that the keys of a hash table are not sorted or stored in any particular order. This is why hash indexes are usually not the default type of data structure used by database indexes – because they aren’t as flexible as B- trees when used as the index data structure.

53. How does an index improve performance?
Because an index is basically a data structure that is used to store column values, looking up those values becomes much faster. And, if an index is using the most commonly used data structure type – a B- tree – then the data structure is also sorted. Having the column values be sorted can be a major performance enhancement – read on to find out why.

Let’s say that we create a B- tree index on the Employee_Name column This means that when we search for employees named “Jesus” using the SQL we showed earlier, then the entire Employee table does not have to be searched to find employees named “Jesus”. Instead, the database will use the index to find employees named Jesus, because the index will presumably be sorted alphabetically by the Employee’s name. And, because it is sorted, it means searching for a name is a lot faster because all names starting with a “J” will be right next to each other in the index! It’s also important to note that the index also stores pointers to the table row so that other column values can be retrieved – read on for more details on that.

54. What exactly is inside a database index?
So, now you know that a database index is created on a column in a table, and that the index stores the values in that specific column. But, it is important to understand that a database index does not store the values in the other columns of the same table. For example, if we create an index on the Employee_Name column, this means that the Employee_Age and Employee_Address column values are not also stored in the index. If we did just store all the other columns in the index, then it would be just like creating another copy of the entire table – which would take up way too much space and would be very inefficient.

An index also stores a pointer to the table row

55. How does a database know when to use an index?
When a query like “SELECT * FROM Employee WHERE Employee_Name = ‘Jesus’ ” is run, the database will check to see if there is an index on the column(s) being queried. Assuming the Employee_Name column does have an index created on it, the database will have to decide whether it actually makes sense to use the index to find the values being searched – because there are some scenarios where it is actually less efficient to use the database index, and more efficient just to scan the entire table.

56. Can you force the database to use an index on a query?
Generally, you will not tell the database when to actually use an index – that decision will be made by the database itself. Although it is worth noting that in most databases (like Oracle and MySQL), you can actually specify that you want the index to be used.

57. How to create an index in SQL:
Here’s what the actual SQL would look like to create an index on the Employee_Name column from our example earlier:

CREATE INDEX name_index
ON Employee (Employee_Name)


58. How to create a multi-column index in SQL
We could also create an index on two of the columns in the Employee table , as shown in this SQL:

CREATE INDEX name_index
ON Employee (Employee_Name, Employee_Age)


59. What is a good analogy for a database index?
A very good analogy is to think of a database index as an index in a book. If you have a book about dogs and you are looking for the section on Golden Retrievers, then why would you flip through the entire book – which is the equivalent of a full table scan in database terminology – when you can just go to the index at the back of the book, which will tell you the exact pages where you can find information on Golden Retrievers. Similarly, as a book index contains a page number, a database index contains a pointer to the row containing the value that you are searching for in your SQL.

60. What is the cost of having a database index?
So, what are some of the disadvantages of having a database index? Well, for one thing it takes up space – and the larger your table, the larger your index. Another performance hit with indexes is the fact that whenever you add, delete, or update rows in the corresponding table, the same operations will have to be done to your index. Remember that an index needs to contain the same up to the minute data as whatever is in the table column(s) that the index covers.

61. What is a self join? Explain it with an example and tutorial.
Let’s illustrate the need for a self join with an example. Suppose we have the following table – that is called employee. The employee table has 2 columns – one for the employee name (called employee_name), and one for the employee location (called employee_location):

employee
employee_name
employee_location
Joe
New York
Sunil
India
Alex
Russia
Albert
Canada
Jack
New York


Now, suppose we want to find out which employees are from the same location as the employee named Joe. In this example, that location would be New York. Let’s assume – for the sake of our example – that we can not just directly search the table for people who live in New York with a simple query like this (maybe because we don’t want to hardcode the city name) in the SQL query:

SELECT employee_name
FROM employee
WHERE employee_location = "New York"


So, instead of a query like that what we could do is write a nested SQL query (basically a query within another query – which more commonly called a subquery) like this:

SELECT employee_name
FROM employee
WHERE employee_location in
( SELECT employee_location
FROM employee
WHERE employee_name = "Joe" )


Using a subquery for such a simple question is inefficient. Is there a more efficient and elegant solution to this problem?

It turns out that there is a more efficient solution – we can use something called a self join. A self join is basically when a table is joined to itself. The way you should visualize a self join for a given table is by imagining that a join is performed between two identical copies of that table. And that is exactly why it is called a self join – because of the fact that it’s just the same table being joined to another copy of itself rather than being joined with a different table.

62. How does a self join work
Before we come up with a solution for this problem using a self join, we should go over some concepts so that you can fully understand how a self join works. This will also make the SQL in our self join tutorial a lot easier to understand, which you will see further below.

A self join must have aliases

In a self join we are joining the same table to itself by essentially creating two copies of that table. But, how do we distinguish between the two different copies of the table – because there is only one table name after all? Well, when we do a self join, the table names absolutely must use aliases otherwise the column names would be ambiguous. In other words, we would not know which table’s columns are being referenced without using aliases for the two copies of the table. If you don’t already know what an alias is, it’s simply another name given to a table, and that name is then used in the SQL query to reference the table. So, we will just use the aliases e1 and e2 for the employee table when we do a self join.

63. Self join predicate
As with any join there must be a condition upon which a self join is performed – we can not just arbitrarily say “do a self join”, without specifying some condition. That condition will be our join predicate.

Now, let’s come up with a solution to the original problem using a self join instead of a subquery. This will help illustrate how exactly a self join works. The key question that we must ask ourselves is what should our join predicate be in this example? Well, we want to find all the employees who have the same location as Joe.

Because we want to match between our two tables (both of which are the same table – employee – aliased as e1 and e2) on location our join predicate should clearly be “WHERE e1.employee_location = e2.employee_location”. But is that enough to give us what we want? No, it’s not, because we also want to filter the rows returned since we only want people who are from the same location as Joe.

So, how can we filter the rows returned so that only people from Joe’s location are returned? Well, what we can do is simply add a condition on one of the tables (e2 in our example) so that it only returns the row where the name is Joe. Then, the other table (e1) will match up all the names that have the same location in e2, because of our join predicate – which is “WHERE e1.employee_location = e2.employee_location”. We will then just select the names from e1, and not e2 because e2 will only have Joe’s name. If that’s confusing then keep reading further to understand more about how the query will work.

So, the self join query that we come up with looks like this:
Self Join SQL Example

SELECT e1.employee_name
FROM employee e1, employee e2
WHERE e1.employee_location = e2.employee_location
AND e2.employee_name="Joe";


This query will return the names Joe and Jack – since Jack is the only other person who lives in New York like Joe.

Generally, queries that refer to the same table can be greatly simplified by re-writing the queries as self joins. And, there is definitely a performance benefit for this as well.

64. What does a self join look like?
It will help tremendously to actually visualize the actual results of a self join internally. Remember that a self join is just like any other join, where the two tables are merged into one temporary table. First off, you should visualize that we have two separate copies of the employee table, which are given aliases of e1 and e2. These copies would simply look like this – note that we shortened the column names from employee_name and employee_location to just Name and Location for convenience:

e1
e2
Name
Location
Joe
New York
Sunil
India
Alex
Russia
Albert
Canada
Jack
New York
Name
Location
Joe
New York
Sunil
India
Alex
Russia
Albert
Canada
Jack
New York


And the final results of running the self join query above – the actual joined table – would look like this:


e1.employee_name
e1.employee_location
e2.employee_name
e2.employee_location
Joe
New York
Joe
New York
Jack
New York
Joe
New York

65. Self joins versus inner joins
Are self joins and inner joins the same? You might be wondering if all self joins are also inner joins. After all, in our example above our self join uses an inner join because only the rows that match based on the join predicate are returned – non-matching rows are not returned. Well, it turns out that a self join and inner join are completely different concepts. A self join could just as well be an outer join or an inner join – it just depends on how the query is written. We could easily change the query we used above to do a LEFT OUTER JOIN – while the query still remains a self join – but that wouldn’t give us the results we want in our example. So, we use an implied inner join instead because that gives us the correct results. Remember that a query is a self join as long as the two tables being joined are exactly the same table, but whether it’s an inner join or outer join depends on what is specified in the SQL. And, inner/outer joins are separate concepts entirely from a self join.

Self joins manager employee example

The most commonly used example for self joins is the classic employee manager table. The table is called Employee, but holds all employees – including their managers. Every employee has an ID, and there is also a column for the manager ID. So, for example, let’s say we have a table that looks like this – and we call it Employee:

EmployeeID
Name
ManagerID
1
Sam
10
2
Harry
4
4
Manager
NULL
10
AnotherManager
NULL


Notice that in the table above there are two managers, conveniently named “Manager” and “AnotherManager”. And, those managers don’t have managers of their own – as noted by the NULL value in their Manager column.

Now, given the table above, how can we return results that will show each employee’s name, and his/her manager’s name in nicely arranged results – with the employee in one column and his/her manager’s name in the other column. Well, it turns out we can use a self join to do this. Try to come up with the SQL on your own before reading our answer.

Self join manager employee answer

In order to come up with a correct answer for this problem, our goal should be to perform a self join that will have both the employee information and manager information in one row. First off, since we are doing a self join, it helps to visualize the one table as two tables – let’s give them aliases of e1 and e2. Now, with that in mind, we want the employee’s information on one side of the joined table and the manager’s information on the other side of the joined table. So, let’s just say that we want e1 to hold the employee information and e2 to hold the corresponding manager’s information. What should our join predicate be in that case?

Well, the join predicate should look like “ON e1.ManagerID = e2.EmployeeID” – this basically says that we should join the two tables (a self join) based on the condition that the manager ID in e1 is equal to the employee ID in e2. In other words, an employee’s manager in e1 should have the manager’s information in e2. An illustration will help clarify this. Suppose we use that predicate and just select everything after we join the tables. So, our SQL would look like this:

SELECT *
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

Now, remember that we only want to return the names of the employee and corresponding manager as a pair. So, we can fine-tune the SQL as follows:

SELECT e1.Name, e2.Name
FROM Employee e1
INNER JOIN Employee e2
ON e1.ManagerID = e2.EmployeeID

Running the SQL above would return:

Sam AnotherManager
Harry Manager

66. How would you retrieve the unique values for the employee_location without using the DISTINCT keyword?

employee
employee_name
employee_location
Joe
New York
Sunil
India
Alex
Russia
Albert
Canada
Jack
New York
Alex
Russia

We can actually accomplish this with the GROUP BY keyword. Here’s what the SQL would look like:

SELECT employee_location from employee
GROUP BY employee_location


Running this query will return the following results:

employee_location
New York
India
Russia
Canada

So, you can see that the duplicate values for "Russia" and "Canada" are not returned in the results.

This is a valid alternative to using the DISTINCT keyword.


It became longer than expected and even now this post doesn't contain all important questions related to SQL understanding. In any case keeping this information in mind will give You more confidence and knowledge about databases and SQL itself.