SQL Server CE: Multi-columns statistics

 In post SQL Server CE: Multiple single-column statistics, I discussed how SQL Server calculates Multiple single-column statistics. I'm going to talk about the 'multiple-columns statistics' today.

 

 

Here are examples , adventure 2019 OLTP  database is used in this example

 

------- Data manipulation-------------------------------------------

alter database [AdventureWorks2019] set compatibility_level=150

go

use [AdventureWorks2019]

go

if exists(select 1 from sys.tables where name='SalesOrderDetail')

drop table SalesOrderDetail

go

select * into SalesOrderDetail from Sales.SalesOrderDetail---- import all the data into new table SalesOrderDetail.

Go

--Create two statistics explicitly. The first statistics has two columns

create statistics I_ProductID_UnitPrice on SalesOrderDetail(ProductID,UnitPrice) with fullscan

create statistics I_UnitPrice  on SalesOrderDetail(UnitPrice) with fullscan

------- Data manipulation-------------------------------------------

 

NEW CE in different database compatibility  behaves differently, I'm going to demo by one by.

 

 

NEW CE-2017/2019-db_compatibility_level>=140

      a)The selectivity of combined multi single-column stats, the formula isMAX(  min('all density',p0,p1,p2,p3),   p0*p1*p2*p3)

      b)'All density' of the combined columns  in  multi-column statistics

      c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

      d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

 

1.New CE with compatibility_level 150

alter database [AdventureWorks2019] set compatibility_level=150

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

thumbnail image 1 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

 

  1)The estimated row is 341.

  2)Here is the estimated row formula:   max(  min('All density of two columns',P1,P2),P1*P2)  *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

thumbnail image 2 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

thumbnail image 3 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

  3)Let me pass the values to the formula:  max(  min('All density of two columns',p0,p1),p0*p1)  *cardinality

max(  min(0.001321004,0.03864256,0.07275979),0.03864256*0.07275979)*121317

=max(0.001321004,0.03864256*0.07275979)*121317

=max(0.001321004,0.0028116245506624)*121317

=0.0028116245506624*121317=341.09785561, rounded down to 341.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Cardinality using multi-column statistics 0.001321 and with independence assumption 0.00281163. Picking cardinality 0.00281163

Selectivity: 0.00281163

Stats collection generated:

  CStCollFilter(ID=3, CARD=341.098)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

 

NEW CE-2016-db_compatibility_level=130

     a)The selectivity of multi single-column stats, the formula is: min('all density',p0,p1,p2,p3)

     b)'All density' is the one for all columns  in  multi-column statistics

     c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

     d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

 

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

 

1.New CE with compatibility_level 130

alter database [AdventureWorks2019] set compatibility_level=130

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

thumbnail image 4 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

 

 

  1)The estimated row is 160.

  2)Here is the estimated row formula:   min('all density',p0,p1,p2,p3)  *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

thumbnail image 5 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

thumbnail image 6 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

  3)Let me pass the values to the formula:   min('All density of two columns',p0,p1)  *cardinality

MIN(0.001321004,0.03864256,0.07275979)*121317=0.001321004*121317=160.260242268, is rounded down to 160.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Selectivity: 0.001321

Stats collection generated:

  CStCollFilter(ID=3, CARD=160.26)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

NEW CE-2014-db_compatibility_level=120

SQL 2014 is not aware of Multi-columns stats, the algorithm is as same as Multiple single column statistics.formula isp0 * p1^(1/2) * p2^(1/4)* p3^(1/8)

 

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

 

1.New CE with compatibility_level 120

alter database [AdventureWorks2019] set compatibility_level=120

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

thumbnail image 7 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

 

 

  1)The estimated row is 12650.

  2)Here is the estimated row formula:   p0 * p1^(1/2) *cardinality.

       a)All density of two columns is :0.001321004

       b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

       c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

       d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

thumbnail image 8 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

 

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

thumbnail image 9 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics

 

  3)Let me pass the values to the formula:    p0 * p1^(1/2)  *cardinality

0.03864256*0.07275979^(1/2)*121317=0.0104235*121317=1264.547750, is rounded up to 1265.

   4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2

Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3

Selectivity: 0.0104235

Stats collection generated:

  CStCollFilter(ID=2, CARD=1264.54)

      CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

          ------------trace flag 2363 output-----------------

 

thumbnail image 10 of blog post titled 
	
	
	 
	
	
	
				
		
			
				
						
							Multi-column statistics


Related post:

SQL Server CE: Multiple single-column statistics (sqlserverbang.blogspot.com)

SQL Server CE: Multiple single-column statistics connected by OR (sqlserverbang.blogspot.com)

SQL Server CE: Multiple single-column statistics (sqlserverbang.blogspot.com)

SQL Server CE: Multi-column statistics (sqlserverbang.blogspot.com)

Comments

popular posts

SQL Server Polybase in Failover cluster instance

Tutorial: Create SQL Cluster(FCI) on RHEL

Configure multiple-subnet Always On Availability Groups and failover cluster instances by modifying CIB

Realcase: Failed to upgrade SQL Server 2016 SP2 CU11. (Installation success or error status: 1648)

How to find SQL Server Replication related jobs and T-SQL statements

SQL Server Extents, PFS, GAM, SGAM and IAM and related corruptions

Tutorial: Add a node to SQL cluster on RHEL

Password is required when adding a database to AG group if the database has a master key

You may fail to backup log or restore log after TDE certification/key rotation

SQL Server GAM corruption samples