/ SQL Server

4. How Indexes can help you live a better life (SQL Fundamentals)

The series so far:

  1. SQL Server Fundamentals - Part 1: Life of Databases
  2. SQL Server Fundamentals - Part 2: How Introduction to Tables in Databases
    can keep you out of trouble
  3. SQL Server Fundamentals - Part 3: Working with data in the table
  4. SQL Server Fundamentals - Part 4: How Indexes can help you live a better life
  5. SQL Server Fundamentals - Part 5: Non Relational Data Types in 2 Easy Steps

Introduction

Index in book or telephone directory. The index is the content itself here, and it's easy to find things. We might have an index at the end of the book to have a look at the contents of the book. It assists us to go through secondary pages and find data quickly.
Data stored as pages in the disk. We look to reduce the number of pages to be read to get to the desired record.

Clustered Index

Create a Clustered index on Product Id column with name "idx_ProductID", and it will physically organise the page in the disk in the order of ProductID. We could form a composite index on multiple columns.

CREATE CLUSTERED INDEX idx_ProductID
ON Sales.Product (ProductID);

Query data with Clustered Index will have an index page, and we can look into index page and navigate like the tree and reduces the time.
We can have only one clustered index per table because there can be one physical ordering.

Summary

  1. Clustered Index determines the order in which we store the rows.
  2. We can create only one clustered index per table
  3. A Table without the clustered index is a heap

Non-Clustered Index

We can create multiple non-clustered indexes. We are building a non-clustered index on name column. What happens here is we have an index page based on the name. e.g. all A's to the left of the index page and M's on the right of an index page. After reaching the page of the desired name, it points to the index page of the clustered index.

CREATE NONCLUSTERED INDEX idx_Name
ON Sales.Product (Name);

When we have a clustered index, and we create a non-clustered index, it points to the pages in the clustered index and then gets to record we need.
In case of the missing clustered index, it utilises the row id.

Summary

  1. Non-Clustered Index stores pointers to the row id of the heap or clustered key of a clustered index.
  2. We can create multiple clustered index per table
  3. A Table without the clustered index is a heap

Performance Tips

See how many pages are read by enabling statistics

-- Set stats so that we can see how many pages read
SET STATISTICS IO ON

Also, we can see how SQL query engine is performing by using Execution Plan.

Without any index

Execution Plan for a simple select query with where clause without any index (a heap) shows using Table Scan Operation, scanned rows to get the record.
We see the statistics about the number of rows read in the messages window.
Scan count, logical reads, physical reads used to know our query which indicates logical reads equals the number of rows scanned.

With Clustered index

Execution Plan for a simple select query with where clause (on clustered index column) with clustered index shows using Clustered Index Seek Operation, it is using clustered index to get back the data.

With Clustered index on other columns

Execution Plan for a simple select query with where clause (on a column other than clustered index column) with clustered index shows using Clustered Index Scan Operation, it also scans clustered indexes to get the record.

With Non Clustered index

Creating a Non-Clustered index for above case leads to Index Seek (NonClustered)

With Non Clustered index including non-indexed field

Creating a Non-Clustered index for above case leads to Index Seek (NonClustered) for filtering on where clause and for select of non-indexed fields it uses Key Lookup (Clustered) to find the desired row.
It takes advantage of both the indexes.

Column Store Index

The particular type of index

ProductID Name Price
1 A 10
2 B 20
3 C 30

Product Table


Data is stored in pages Page 1 and Page 2 as below:

ProductID Name Price
1 A 10
2 B 20

Page 1



ProductID Name Price
3 C 30

Page 2

Instead of storing the pages of rows. We save the index of segments or columns instead of RowGroups. Advantages like compression and in memory and we can query them.
Similar syntax and provides the ability to create for entire columns.

CREATE CLUSTERED COLUMNSTORE INDEX idx_Product
ON Sales.Product;

The similar syntax for Non-Clustered ColumnStore index

CREATE NONCLUSTERED COLUMNSTORE INDEX idx_Name
ON Sales.Product(ProductID, Name);

Columnstore indexes:

  • Stored in memory
  • Store data by column instead of by row and allows us to have good compression for same data in columns.
  • Use compression to optimise memory usage and performance
  • Columnstore indexes can be clustered or non-clustered
  • Clustered column store indexes can include all columns
  • Only one clustered column store index per table

Without Indexes

Let's consider a scenario with select by joining two tables with where clause without any indexes. It performs two Table Scan since we have no indexes.
Hash Match to join them together.
After creating clustered column store index on all the columns.

With ColumnStore Index on one table

Uses Table Scan and ColumnStore Index Scan
We see Segment Reads in Statistics.

With ColumnStore Index on both tables

Let's consider the scenario of the select query with the join on two tables with where clause without any indexes. It performs two ColumnStore Index Scan. We see Segment Reads in Statistics.

Memory-Optimized Tables and Native Stored Procedures

We need to add the filegroup with MEMORY_OPTIMIZED_DATA.
Let's create a table with primary table, and because it is in the memory table, we are using option non clustered hash where we hash values in in-memory buckets.

CREATE TABLE DiskTableInMemory
(id NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000), 
date_value DATETIME NULL)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Durability defines persisting of schema and data to disk. Sometimes we need to persist the schema only to behave it as sort of cache.

Memory Optimized Tables:

  • Compiled C# structs
  • Stored in memory
  • Can be accessed using Transact-SQL or native stored procedures

Performance Comparison: Memory vs Disk Table

Let consider we have disk table without MEMORY_OPTIMIZED ON. We insert 500000 rows into Disk table vs In-memory table.

Operation Memory Table (s) Disk Table (s)
Insert 7 17
Delete 0 14

Memory vs Disk Table

Native Stored Procedures

The idea of native stored procedures as well as In-Memory tables, these are C# structs compiled to DLL.
We can create stored procedures that are C# code compiled to DLL.
To create a Native stored procedure, we mention WITH NATIVE_COMPILATION

CREATE PROCEDURE Sales.InsertProduct
WITH NATIVE_COMPILATION,
     SCHEMABINDING,
     EXECUTE AS OWNER
AS
INSERT INTO Sales.Product...;

EXECUTE Sales.InsertProduct;

Performance Comparison

Let's create the native stored procedure as below and execute it and compare execution time with normal looped insertion.

CREATE PROCEDURE dbo.InsertData
WITH NATIVE_COMPILATION,
     SCHEMABINDING,
     EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
    DECLARE @MemId INT = 1;
    WHILE @MemId <= 500000
    BEGIN
        INSERT INTO dbo.MemoryTable VALUES(@MemId, GETDATE())
        SET @MemId = 1
    END
END

VS

-- Loop and insert in the transaction
INSERT INTO dbo.MemoryTable...;

EXECUTE dbo.InsertData;
We know from the previous section (Performance Comparison: Memory vs Disk Table) the loop approach in transaction took 17 seconds.
Let's compare insert 500000 rows into Disk table vs In-memory table vs Native Stored Procedure using an In-Memory table.

Operation Memory Table (s) Disk Table (s) Native Stored Procedure
Insert 7 17 0
Delete 0 14 0

Memory vs Disk Table vs Native Stored Procedure

Native stored procedures:

  • Compiled C# code
  • Stored in memory

Summary

Use indexes to optimise SELECT performance

  • Each table can have one clustered index
  • You can create multiple non-clustered indexes per table

Columnstore indexes can improve some workloads

  • Particularly data warehouse queries with highly compressible data

Memory-optimized tables offer high performance

  • Use these for contention-bound workloads
  • Optimize further with native stored procedures

4. How Indexes can help you live a better life (SQL Fundamentals)
Share this

Subscribe to Coding Today

Popular queries