Data Engineering Journey - no. 03
Table of contents:
- What have you learned?
- What were 2-3 interesting points?
- What were 2-3 points you didn’t understand?
- Where does this skill or best practice fit?
- Conclusion
1. What have you learned?
Below is a list of SQL queries I used while querying the database:
How manyt tables is in database:
SELECT COUNT(*) AS LiczbaTabel
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
Exercise Simple SQL Queries
Get all columns from the tables Customers, Orders and Suppliers:
SELECT * FROM Customers;
SELECT * FROM Orders;
SELECT * FROM Suppliers;
Get all Customers alphabetically, by Country and name:
SELECT *
FROM Customers
ORDER BY Country, ContactName;
Get all orders by date:
SELECT *
FROM Orders
ORDER BY OrderDate;
Get the count of all Orders made during 1997
SELECT COUNT(*) AS [Number of Orders During 1997]
FROM Orders
WHERE YEAR(OrderDate) = 1997
Get all orders by date:
SELECT *
FROM Orders
ORDER BY OrderDate;
Get the names of all the contact persons where the person is a manager, alphabetically
SELECT ContactName
FROM Customers
WHERE ContactTitle LIKE '%Manager%'
ORDER BY ContactName;
Create a report for all the orders of 1996 and their Custome
SELECT *
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE YEAR(o.OrderDate) = '1996'
Create a report that shows the number of employees and customers from each city that has employees in it
SELECT e.City AS City, COUNT(DISTINCT e.EmployeeID) AS [Number of Employees], COUNT(DISTINCT c.CustomerID) AS [Number of Customers]
FROM Employees e
LEFT JOIN Customers c ON e.City = c.City
GROUP BY e.City
ORDER BY City;
Create a report that shows the number of employees and customers from each city that has customers in it
SELECT c.City AS City, COUNT(DISTINCT c.CustomerID) AS [Number of Customers], COUNT(DISTINCT e.EmployeeID) AS [Number of Employees]
FROM Employees e
RIGHT JOIN Customers c ON e.City = c.City
GROUP BY c.City
ORDER BY City;
Create a report that shows the number of employees and customers from each city
SELECT
e.City,
c.City,
COUNT(DISTINCT e.EmployeeID) AS [Number of Employees],
COUNT(DISTINCT c.CustomerID) AS [Number of Customers]
FROM Employees e
FULL JOIN Customers c ON e.City = c.City
GROUP BY e.City, c.City
ORDER BY e.City;
SELECT
ISNULL (e.City, c.City) AS [City],
COUNT(DISTINCT e.EmployeeID) AS [Number of Employees],
COUNT(DISTINCT c.CustomerID) AS [Number of Customers]
FROM Employees e FULL JOIN Customers c ON
e.City = c.City
GROUP BY e.City, c.City
ORDER BY [City];
Create a report that shows the order ids and the associated employee names for orders that shipped after the required date
SELECT o.OrderID, e.LastName, e.FirstName
FROM Orders o
JOIN Employees e ON o.EmployeeID = e.EmployeeID
AND o.ShippedDate > o.RequiredDate;
Create a report that shows the total quantity of products (from the Order_Details table) rdered. Only show records for products for which the quantity ordered is fewer than 200
SELECT o.ProductID, p.ProductName, SUM(o.Quantity) AS [Total Quantity]
FROM [Order Details] o
JOIN Products p ON p.ProductID = o.ProductID
GROUP BY o.ProductID, p.ProductName
HAVING SUM(o.Quantity) < 200
ORDER BY [Total Quantity] DESC;
Create a report that shows the total number of orders by Customer since December 31, 1996. The report should only return rows for which the total number of orders is greater than 15
SELECT CustomerID, COUNT(OrderID) AS [Total Number of Orders]
FROM Orders
WHERE OrderDate > '1996-12-31'
GROUP BY CustomerID
HAVING COUNT(OrderID) > 15
ORDER BY [Total Number of Orders]
Find the 10 top selling products
SELECT Products.ProductName, SUM([Order Details].UnitPrice * [Order Details].Quantity * (1.0- [Order Details].Discount)) AS [Sales]
FROM Products
INNER JOIN [Order Details]
ON [Order Details].ProductID = Products.ProductID
GROUP BY Products.ProductName
ORDER BY [Sales] DESC;
GO
Create a view with total revenues per customer
DROP VIEW IF EXISTS [Total Revenues Per Customer];
GO
CREATE VIEW [Total Revenues Per Customer] AS
SELECT c.CustomerID, c.ContactName, ISNULL(CAST(CONVERT(money, SUM(od.UnitPrice * od.Quantity * (1.0-od.Discount)*100)/100) AS DECIMAL(11,2)),0) AS [Revenue]
FROM Customers c
FULL JOIN Orders o ON c.CustomerID = o.CustomerID
FULL JOIN [Order Details] od ON od.OrderID = o.OrderID
GROUP BY c.CustomerID, c.ContactName;
GO
SELECT *
FROM [Total Revenues Per Customer]
ORDER BY Revenue DESC;
GO
Which UK Customers have payed us more than 1000 dollars
SELECT Customers.ContactName, CONVERT(money,SUM([Order Details].UnitPrice * [Order Details].Quantity * (1.0- [Order Details].Discount)*100)/100) AS [Payments]
FROM Customers
INNER JOIN Orders ON Orders.CustomerID = Customers.CustomerID
INNER JOIN [Order Details] ON [Order Details].OrderID = Orders.OrderID
WHERE Customers.Country = 'UK'
GROUP BY Customers.ContactName
HAVING SUM([Order Details].UnitPrice * [Order Details].Quantity * (1.0- [Order Details].Discount)) > 1000;
How much has each customer payed in total and how much in 1997
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Country, ISNULL(SUM([Order Subtotals].Subtotal), 0) AS [Customer Total], ISNULL(SUM(CONVERT(money, [1997].Payments/100)*100),0) AS [1997]
FROM Customers
LEFT JOIN Orders ON Orders.CustomerID = Customers.CustomerID
LEFT JOIN [Order Details] ON [Order Details].OrderID = Orders.OrderID
LEFT JOIN [Order Subtotals] ON [Order Subtotals].OrderID = Orders.OrderID
LEFT JOIN (SELECT Customers.CustomerID, Customers.CompanyName, Customers.Country, ([Order Details].UnitPrice * [Order Details].Quantity * (1.0- [Order Details].Discount)) AS [Payments]
FROM Customers
INNER JOIN Orders ON Orders.CustomerID = Customers.CustomerID
INNER JOIN [Order Details] ON [Order Details].OrderID = Orders.OrderID
WHERE YEAR(Orders.OrderDate) = '1997') AS [1997]
ON [1997].CustomerID = Customers.CustomerID
GROUP BY Customers.CustomerID, Customers.CompanyName, Customers.Country
ORDER BY [Customer Total]
2. What were 2-3 interesting points?
There are two useful kinds of JOIN-s that we can use to solve queries of medium difficulty. These are called Self JOIN
and CROSS JOIN
.
CROSS JOINS
tend to generate very large tables, so they are usually costly. However, they virtually allow us to iterate over two tables directly with SQL.
In general, we should try to avoid them, but it’s good to know they exist in case we need them.
- Use an inner join to combine rows from two tables based on a shared column, and you only want to include rows where the data in the join column matches in both tables.
- Use a full outer join to combine all rows from two tables, regardless of whether there is a match in the join column.
- Use a left outer join to combine all rows from the left table and matching rows from the right table.
- Use a right outer join to combine all rows from the right table and matching rows from the left table.
3. What were 2-3 points you didn’t understand?
I understood everything; for me, this is a review of the material and the basics to reinforce and refresh the information I already had before.
4. Where does this skill or best practice fit?
This skill and best practice fit into the domain of database management and data engineering, particularly in the following areas:
- Data Querying and Analysis: These practices are essential for retrieving, analyzing, and reporting data efficiently. Mastering SQL is a core competency for roles such as data analysts, data engineers, and database administrators.
- ETL Processes (Extract, Transform, Load): SQL is a crucial tool for handling transformations in ETL pipelines. Writing efficient queries ensures smooth data extraction and integration into the target systems.
- Database Optimization: Understanding and using different types of joins, filters, and aggregation functions is key to optimizing database performance and ensuring scalability.
- Reporting and Business Intelligence: SQL queries, such as those retrieving total revenues or identifying trends, form the backbone of dashboards and business intelligence reports.
- Data Engineering Projects: These practices are foundational for constructing data pipelines, ensuring data quality, and enabling downstream analytical tasks.
- Best Practices for Collaboration: Writing clear, optimized, and reusable SQL queries is a best practice for teams working together on complex data projects. In summary, these skills align with roles in data-driven decision-making, enabling efficient data workflows, and supporting broader business objectives
Conclusion:
The queries presented above are both the starting point and the fundamental queries that can be used for querying and performing operations on the database.
My site is free of ads and trackers. Was this post helpful to you? Why not
Reference:
Disqus is great for comments/feedback but I had no idea it came with these gaudy ads.