Hello,
I am trying to optimize a process that includes many large SQL queries with many Clustered Index Seek operators on a certain table. I found that the set of columns used by all these operators (across all these queries) is about half of all the columns of the table.
In this case, it seems a good idea to reduce the size of the index being read by creating a nonclustered index that has the same key(s) as the clustered index but contains only the columns this process needs, in its INCLUDE clause.
(Note that in most cases a Clustered Index Seek and a nonclustered Index Seek have the same operator cost in the execution plan, so they appear to take the same time. That is true, but when read-ahead kicks in, the size of the index being read significantly impacts performance. Of course, this is not taken into account in execution plans.)
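The size and read-ahead effect described above can be checked directly. The following is a minimal sketch, assuming a test instance with AdventureWorks2017 restored; it uses sys.dm_db_index_physical_stats to compare the leaf-level size of each index on Person.Person, and SET STATISTICS IO to report read-ahead reads for one of the queries from this post on a cold cache. DBCC DROPCLEANBUFFERS empties the whole buffer pool, so it should only be run on a test server.

```sql
USE AdventureWorks2017;
GO

-- 1) Leaf-level size of every index on Person.Person (in-row data only).
--    'DETAILED' mode is required for avg_record_size_in_bytes to be populated.
SELECT i.index_id,
       i.name AS index_name,
       i.type_desc,
       ps.page_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Person.Person'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON  i.object_id = ps.object_id
  AND i.index_id  = ps.index_id
WHERE ps.index_level = 0                        -- leaf level only
  AND ps.alloc_unit_type_desc = N'IN_ROW_DATA'  -- ignore LOB / row-overflow units
ORDER BY i.index_id;
GO

-- 2) Cold-cache I/O for one query. TEST SERVER ONLY: this flushes clean pages.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON;
GO
SELECT em.FirstName
FROM HumanResources.EmployeeDepartmentHistory AS ed
LEFT JOIN Person.Person AS em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5;
GO
SET STATISTICS IO OFF;
GO
```

The Messages output lists logical, physical and read-ahead reads per table, so running part 2 once with the clustered index in use and once with the nonclustered index in use gives a rough comparison. On a table as small as Person.Person the difference may be modest; the effect described above is expected to matter more on large tables.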
I created the index and assumed that the optimizer would choose the nonclustered index instead of the clustered index for seek operations in all cases, either because the nonclustered index has a smaller record size or because it is the most recently created suitable index. In the worst case, I expected the optimizer to keep using the clustered index everywhere. But I did not expect that the optimizer could use different indexes on the same table (even within one query).
After I created the nonclustered index, I saw that the new index is used in only a few places in the execution plans. In other places the optimizer could have used the nonclustered index, but it did not. This was very bad, because the Clustered Index Seek operators could not use the data cached for the nonclustered index, and vice versa. This led to performance degradation.
I found simple cases that show these index-selection anomalies. To reproduce them, use the AdventureWorks2017 database (https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-2017).
We will look at index selection on the table Person.Person. First, we need the following nonclustered index:
CREATE UNIQUE NONCLUSTERED INDEX [IX_BusinessEntityID_FirstName] ON [Person].[Person]
(
[BusinessEntityID] ASC
)
INCLUDE ([FirstName])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Then run the following queries:

-- Query 1
SELECT em.FirstName FROM [HumanResources].[EmployeeDepartmentHistory] ed
LEFT JOIN Person.Person em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5

-- Query 2 (same as Query 1, but joins a derived table)
SELECT em.FirstName FROM [HumanResources].[EmployeeDepartmentHistory] ed
LEFT JOIN (SELECT * FROM Person.Person) em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5

-- Query 3 (same as Query 1, plus a filter on BusinessEntityID)
SELECT em.FirstName FROM [HumanResources].[EmployeeDepartmentHistory] ed
LEFT JOIN Person.Person em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5 AND ed.BusinessEntityID = 250
These queries produce the following execution plans, in the same order:

Query 1:
  |--Nested Loops(Left Outer Join, OUTER REFERENCES:([ed].[BusinessEntityID]))
       |--Index Seek(OBJECT:([AdventureWorks2017].[HumanResources].[EmployeeDepartmentHistory].[IX_EmployeeDepartmentHistory_DepartmentID] AS [ed]), SEEK:([ed].[DepartmentID]=(5)) ORDERED FORWARD)
       |--Index Seek(OBJECT:([AdventureWorks2017].[Person].[Person].[IX_BusinessEntityID_FirstName] AS [em]), SEEK:([em].[BusinessEntityID]=[AdventureWorks2017].[HumanResources].[EmployeeDepartmentHistory].[BusinessEntityID] as [ed].[BusinessEntityID]) ORD

Query 2:
  |--Nested Loops(Left Outer Join, OUTER REFERENCES:([ed].[BusinessEntityID]))
       |--Index Seek(OBJECT:([AdventureWorks2017].[HumanResources].[EmployeeDepartmentHistory].[IX_EmployeeDepartmentHistory_DepartmentID] AS [ed]), SEEK:([ed].[DepartmentID]=(5)) ORDERED FORWARD)
       |--Compute Scalar(DEFINE:([Expr1004]=[AdventureWorks2017].[Person].[Person].[FirstName]))
            |--Clustered Index Seek(OBJECT:([AdventureWorks2017].[Person].[Person].[PK_Person_BusinessEntityID]), SEEK:([AdventureWorks2017].[Person].[Person].[BusinessEntityID]=[AdventureWorks2017].[HumanResources].[EmployeeDepartmentHistory].[BusinessEnt

Query 3:
  |--Nested Loops(Left Outer Join)
       |--Index Seek(OBJECT:([AdventureWorks2017].[HumanResources].[EmployeeDepartmentHistory].[IX_EmployeeDepartmentHistory_DepartmentID] AS [ed]), SEEK:([ed].[DepartmentID]=(5) AND [ed].[BusinessEntityID]=(250)) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([AdventureWorks2017].[Person].[Person].[PK_Person_BusinessEntityID] AS [em]), SEEK:([em].[BusinessEntityID]=(250)) ORDERED FORWARD)
In the first execution plan you can see our nonclustered Index Seek; in the other plans, a Clustered Index Seek. Note that the first and second queries differ only in syntax. After some experiments, I can say that in the first case the optimizer picks the nonclustered index that was created first, while in the other cases it uses only the clustered index.
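For anyone reproducing this, a minimal sketch of how text plans like the ones above can be captured, assuming AdventureWorks2017 and the IX_BusinessEntityID_FirstName index from this post. SET SHOWPLAN_TEXT must be the only statement in its batch, and while it is ON the queries are compiled but not executed.

```sql
USE AdventureWorks2017;
GO
SET SHOWPLAN_TEXT ON;
GO
-- Query 1: expected to show an Index Seek on IX_BusinessEntityID_FirstName
SELECT em.FirstName FROM HumanResources.EmployeeDepartmentHistory ed
LEFT JOIN Person.Person em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5;

-- Query 2: same logic via a derived table; expected to show a Clustered Index Seek
SELECT em.FirstName FROM HumanResources.EmployeeDepartmentHistory ed
LEFT JOIN (SELECT * FROM Person.Person) em ON em.BusinessEntityID = ed.BusinessEntityID
WHERE ed.DepartmentID = 5;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

The "expected" comments only restate the plans reported in this post; other versions or settings may compile differently.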
My questions are:
Why does the optimizer choose indexes this way? What is the difference between these cases as far as the index-selection mechanism is concerned?
How can I influence this choice? (Using hints is not an acceptable solution.)
How can I fix this behavior to solve my issue?
Thank you in advance.