Posted On: 2025-September-09
10 Minutes Read
Author: jack frost
So, you've mastered the basics of SQL – SELECT, FROM, WHERE, JOIN – and you're ready to take the next step. That's fantastic! This article is your guide to **Intermediate SQL**, designed to equip you with the skills you need to impress employers, tackle complex data challenges, and truly unlock the power of databases. We'll cover essential topics, including window functions and more, with a focus on practical application and clear explanations.
Window functions are a game-changer in SQL. They allow you to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions (like SUM, AVG, COUNT), which collapse rows into a single result, window functions retain the original row's granularity while providing additional insights.
Imagine you want to calculate the moving average of sales over the past three months. An aggregate function could only give you the average sales across *all* months. A window function, however, would let you calculate the average for each month, considering the preceding two months.
Let's say you have a table called `sales` with columns `sale_date`, `salesperson`, and `amount`.
To calculate the running total of sales by salesperson:
SELECT
sale_date,
salesperson,
amount,
SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM
sales;
This query partitions the data by `salesperson` and orders it by `sale_date`. The `SUM(amount) OVER (...)` calculates the cumulative sum of the `amount` for each salesperson as the `sale_date` progresses.
To identify the top 3 salespeople each month:
WITH RankedSales AS (
SELECT
EXTRACT(MONTH FROM sale_date) AS sale_month,
salesperson,
SUM(amount) AS total_sales,
RANK() OVER (PARTITION BY EXTRACT(MONTH FROM sale_date) ORDER BY SUM(amount) DESC) AS sales_rank
FROM
sales
GROUP BY
EXTRACT(MONTH FROM sale_date),
salesperson
)
SELECT
sale_month,
salesperson,
total_sales
FROM
RankedSales
WHERE
sales_rank <= 3;
This example uses a CTE (Common Table Expression, explained later) to first rank salespeople by total sales within each month, and then selects only the top 3 from each month.
To calculate the difference in sales between consecutive days for each salesperson:
SELECT
sale_date,
salesperson,
amount,
amount - LAG(amount, 1, 0) OVER (PARTITION BY salesperson ORDER BY sale_date) AS sales_difference
FROM
sales;
This query uses `LAG()` to access the previous day's sales amount. The `1` specifies that we want to look back one row, and the `0` provides a default value for the first day, where there's no previous day to compare to.
Subqueries are queries nested inside another query. They're powerful for filtering and retrieving data based on the results of another query.
Find all customers who placed orders with amounts greater than the average order amount:
SELECT
customer_id,
order_id,
amount
FROM
orders
WHERE
amount > (SELECT AVG(amount) FROM orders);
The subquery calculates the average order amount only once, and the outer query uses this result to filter the orders.
Find all orders placed by customers who have placed at least two orders:
SELECT
customer_id,
order_id,
amount
FROM
orders AS o1
WHERE EXISTS (
SELECT 1
FROM orders AS o2
WHERE o2.customer_id = o1.customer_id
GROUP BY o2.customer_id
HAVING COUNT(*) >= 2
);
The subquery is executed for each row in the outer `orders` table (aliased as `o1`). It checks if the customer associated with the current row has at least two orders in the `orders` table (aliased as `o2`).
Correlated subqueries can be slow, especially with large datasets, because they are executed for each row of the outer query. Optimizing correlated subqueries often involves rewriting them using JOINs or CTEs. Non-correlated subqueries are generally faster because they are executed only once.
CTEs are temporary, named result sets that you can define within a single query. They make complex queries more readable and maintainable by breaking them down into smaller, logical units.
Let's revisit the example of identifying top performers each month. Using a CTE, we make the code more readable:
WITH MonthlySales AS (
SELECT
EXTRACT(MONTH FROM sale_date) AS sale_month,
salesperson,
SUM(amount) AS total_sales
FROM
sales
GROUP BY
EXTRACT(MONTH FROM sale_date),
salesperson
),
RankedSales AS (
SELECT
sale_month,
salesperson,
total_sales,
RANK() OVER (PARTITION BY sale_month ORDER BY total_sales DESC) AS sales_rank
FROM
MonthlySales
)
SELECT
sale_month,
salesperson,
total_sales
FROM
RankedSales
WHERE
sales_rank <= 3;
This example breaks the query into two CTEs: `MonthlySales` calculates the total sales for each salesperson each month, and `RankedSales` ranks the salespeople within each month based on their sales. The final SELECT statement retrieves the top 3 salespeople.
Recursive CTEs are used to traverse hierarchical data. Imagine an `employees` table with `employee_id`, `employee_name`, and `manager_id` columns. A recursive CTE can be used to find all employees reporting to a specific manager, directly or indirectly.
WITH RECURSIVE EmployeeHierarchy AS (
SELECT
employee_id,
employee_name,
manager_id,
1 AS level
FROM
employees
WHERE
manager_id IS NULL -- Start with the top-level manager
UNION ALL
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
eh.level + 1
FROM
employees AS e
JOIN
EmployeeHierarchy AS eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
employee_name,
level
FROM
EmployeeHierarchy
ORDER BY
level;
This CTE starts with employees who have no manager (the top-level managers). Then, it recursively joins the `employees` table to the `EmployeeHierarchy` CTE to find employees reporting to those managers, and so on, until all employees have been added to the hierarchy.
Beyond the basics of INNER, LEFT, and RIGHT JOIN, there are other JOIN types that offer more flexibility in retrieving data from multiple tables.
A self-join joins a table to itself. This is useful when you need to compare rows within the same table, such as finding employees who earn more than their manager.
Suppose you have a `customers` table and an `orders` table. To get all customers and all orders, regardless of whether they have a match, you would use a FULL OUTER JOIN:
SELECT
c.customer_id,
c.customer_name,
o.order_id,
o.order_date
FROM
customers AS c
FULL OUTER JOIN
orders AS o ON c.customer_id = o.customer_id;
To generate all possible combinations of products and colors, you could use a CROSS JOIN. Let's assume you have `products` and `colors` tables.
SELECT
p.product_name,
c.color_name
FROM
products AS p
CROSS JOIN
colors AS c;
To find employees who earn more than their manager (assuming an `employees` table with `employee_id`, `employee_name`, `salary`, and `manager_id` columns):
SELECT
e.employee_name AS employee,
m.employee_name AS manager
FROM
employees AS e
JOIN
employees AS m ON e.manager_id = m.employee_id
WHERE
e.salary > m.salary;
Pivoting transforms rows into columns, while unpivoting does the opposite, transforming columns into rows. These techniques are useful for reshaping data to make it easier to analyze and present.
Unfortunately, standard SQL doesn't have universally supported PIVOT and UNPIVOT commands. The implementation depends on the database system (e.g., SQL Server, Oracle). However, the logic is generally achieved using CASE statements or other conditional aggregation techniques.
Suppose you have a `sales` table with `product_category`, `month`, and `sales_amount` columns. To pivot the data to show sales amounts for each month as columns, you could use a CASE statement:
SELECT
product_category,
SUM(CASE WHEN month = 'January' THEN sales_amount ELSE 0 END) AS January,
SUM(CASE WHEN month = 'February' THEN sales_amount ELSE 0 END) AS February,
SUM(CASE WHEN month = 'March' THEN sales_amount ELSE 0 END) AS March
FROM
sales
GROUP BY
product_category;
To unpivot the pivoted data back into its original format, you can use UNION ALL:
SELECT
product_category,
'January' AS month,
January AS sales_amount
FROM
PivotedSales
UNION ALL
SELECT
product_category,
'February' AS month,
February AS sales_amount
FROM
PivotedSales
UNION ALL
SELECT
product_category,
'March' AS month,
March AS sales_amount
FROM
PivotedSales;
Transactions are a sequence of database operations treated as a single logical unit of work. If any operation fails, the entire transaction is rolled back, ensuring data consistency.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT; -- Or ROLLBACK;
If an error occurs between the UPDATE statements, you would use `ROLLBACK` to undo the changes made by the first UPDATE, maintaining data integrity.
Concurrency control mechanisms, like locking, prevent multiple transactions from interfering with each other. Databases typically use various types of locks (e.g., shared locks, exclusive locks) to manage access to data.
Mastering **Intermediate SQL** is crucial for any aspiring data professional. You've now gained valuable insights into window functions, subqueries, CTEs, advanced JOINs, pivoting/unpivoting, and transactions. These skills will allow you to write more efficient, maintainable, and powerful SQL queries. Keep practicing, keep experimenting, and you'll be well on your way to becoming a SQL expert. Remember that even **intermediate SQL, i.e., window functions and more**, builds upon fundamental knowledge, so regular practice will solidify your understanding.
copyright © 2026. thehyperanalytics.com