Data Naming Conventions
Naming conventions for database, table, column, star schema, indexes, Azure, and PowerBI.
Following the AdventureWorks
database sample from Microsoft:
Database naming
databasename
- Lower case
- No spaces/dashes/underscores
Tables, columns, schema
TableName
orColumnName
- Use column [Name] if they are reserved keywords
- Or use friendlier columns like
ProductName
- UpperCamelCase
- No spaces
- Singular
Views
- Same as Tables and
- Use the prefix lower
v
followed by ViewName likevGetAllCategories
Stored Procedures
- Same as Tables and
- Use the prefix lower
usp
(user stored procedure) followed by a Name, likeuspLogError
Keys
- Primary/Foregin key:
TableName + ID
such as CustomerID
Indexes
- Clustered:
PK_TableName_ColumnName
- Non-Clustered:
IX_TableName_ColumnName
T-SQL Cheat Sheet (ChatGPT generated)
ChatGPT can be contradictory. The examples generated for tables used plural as in Customers
. However, when I asked if I should use singular or plural when creating tables, it responded to use singular Customer
.
Most common DDL and DML
CREATE TABLE Customer (
CustomerID INT IDENTITY(1, 1) PRIMARY KEY NOT NULL,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Email VARCHAR(100)
);
ALTER TABLE Customer
ADD PhoneNumber VARCHAR(20);
INSERT INTO Customer (CustomerID, FirstName, LastName, Email)
VALUES (1, 'John', 'Doe', 'johndoe@example.com');
UPDATE Customer
SET Email = 'johndoe123@example.com'
WHERE CustomerID = 1;
DELETE FROM Customer
WHERE CustomerID = 1;
Clustered/Non-Clustered indexes
ChatGPT’s naming convention here was “whatever” :)
This will create a clustered index called “idx_CustomerID” for Customers
CREATE CLUSTERED INDEX IX_Customer_CustomerID ON Customer (CustomerID);
This will create a non-clustered index called “idx_LastName” on the “LastName” column of the “Customers” table. Note that a table can have multiple non-clustered indexes.
CREATE NONCLUSTERED INDEX IX_Customer_LastName ON Customer (LastName);
MERGE
Suppose you have two tables, “Sales” and “SalesStaging”, that contain information about sales transactions. You want to update the “Sales” table with any new or updated transactions from the “SalesStaging” table. You can use the MERGE statement to perform this operation.
In this example, the MERGE statement merges the “SalesStaging” table into the “Sales” table based on the matching “SaleID” columns. If a matching row is found in both tables, the “SaleAmount” and “SaleDate” columns in the “Sales” table are updated to match the values in the “SalesStaging” table. If a matching row is not found in the “Sales” table, a new row is inserted with the values from the “SalesStaging” table.
MERGE Sales AS target
USING SalesStaging AS source
ON target.SaleID = source.SaleID
WHEN MATCHED THEN
UPDATE SET
target.SaleAmount = source.SaleAmount,
target.SaleDate = source.SaleDate
WHEN NOT MATCHED BY TARGET THEN
INSERT (SaleID, SaleAmount, SaleDate)
VALUES (source.SaleID, source.SaleAmount, source.SaleDate);
UPSERT
Suppose you have a table named “Customers” with the following columns: “CustomerID”, “FirstName”, “LastName”, and “Email”. You want to insert a new row into the table, or update an existing row if a row with the same “CustomerID” already exists.
Here’s an example UPSERT statement using the MERGE statement:
In this example, the MERGE statement attempts to match the “CustomerID” value of the new row (1, ‘John’, ‘Doe’, ‘johndoe@example.com’) with an existing row in the “Customers” table. If a match is found, the existing row is updated with the new values for “FirstName”, “LastName”, and “Email”. If no match is found, a new row is inserted with the values from the new row.
Note that in this example, the new row is specified using a VALUES clause inside the USING clause. In practice, you may use a subquery, a table-valued parameter, or another method to provide the new row values to the MERGE statement. Also note that the MERGE statement is available starting from SQL Server 2008.
MERGE Customers AS target
USING (VALUES (1, 'John', 'Doe', 'johndoe@example.com')) AS source (CustomerID, FirstName, LastName, Email)
ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
UPDATE SET
target.FirstName = source.FirstName,
target.LastName = source.LastName,
target.Email = source.Email
WHEN NOT MATCHED BY TARGET THEN
INSERT (CustomerID, FirstName, LastName, Email)
VALUES (source.CustomerID, source.FirstName, source.LastName, source.Email);
Identify duplicate rows
Suppose you have a table named “Orders” with columns “OrderID”, “CustomerID”, and “OrderDate”. You want to find all duplicate rows in the table based on the “OrderID” column.
In this query, the GROUP BY clause is used to group the rows in the “Orders” table by the “OrderID” column. The COUNT(*) function is used to count the number of rows in each group, and the HAVING clause is used to filter out any groups with only one row. The result set shows the “OrderID” values that appear more than once in the table, along with the number of duplicates for each value.
SELECT OrderID, COUNT(*) AS NumDuplicates
FROM Orders
GROUP BY OrderID
HAVING COUNT(*) > 1;
If you also want to see the actual rows that are duplicated, you can use a subquery like this:
In this query, the subquery in the WHERE clause identifies the “OrderID” values that appear more than once in the “Orders” table. The outer query then returns all rows from the “Orders” table that have one of those “OrderID” values. The result set shows all rows that are duplicated in the table.
SELECT *
FROM Orders
WHERE OrderID IN (
SELECT OrderID
FROM Orders
GROUP BY OrderID
HAVING COUNT(*) > 1
);
Remove duplicates
Suppose you have a table named “Orders” with columns “OrderID”, “CustomerID”, and “OrderDate”. You want to remove any duplicate rows from the table based on the “OrderID” column.
In this query, a common table expression (CTE) is used to assign a row number to each row based on the “OrderID” column, with duplicates receiving the same row number. The PARTITION BY clause specifies the “OrderID” column to partition the result set into groups, and the ORDER BY clause ensures that the rows within each group are sorted in a consistent manner. The ROW_NUMBER() function is used to generate the row numbers.
The DELETE statement then removes any rows from the CTE where the row number is greater than 1, effectively deleting all but one instance of each duplicated “OrderID”. The result is a table with no duplicate “OrderID” values.
WITH CTE AS (
SELECT OrderID,
ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY OrderID) AS RowNum
FROM Orders
)
DELETE FROM CTE WHERE RowNum > 1;
UNPIVOT
How to use the UNPIVOT operator in T-SQL to transform columns into rows:
Suppose you have a table named “Sales” with columns “Month”, “ProductA”, “ProductB”, and “ProductC”. The “ProductA”, “ProductB”, and “ProductC” columns contain the sales figures for each product in each month. You want to transform this table into a “normalized” form with three columns: “Month”, “Product”, and “Sales”.
In this query, the UNPIVOT operator is used to transform the “ProductA”, “ProductB”, and “ProductC” columns into rows, with the sales figures in a new “Sales” column. The “FOR Product IN” clause specifies the names of the columns to unpivot, and the AS keyword is used to give the unpivoted result set an alias of “unpvt”.
The result set shows the sales figures for each product in each month, with one row per product per month. The “Product” column identifies which product the sales figure belongs to.
SELECT Month, Product, Sales
FROM Sales
UNPIVOT (
Sales FOR Product IN (ProductA, ProductB, ProductC)
) AS unpvt;
Other T-SQL tips
nvarchar
instead ofvarchar
- insert values with
N
prefix likeN'Thor'
- insert values with
datetime
- Specify
null
ornot null
- End all queries with semicolon
- Separate queries in batches using
GO
- Use keyword
INTO
inINSERT INTO
even though Microsoft docs show with and without.