SQL Server CE: Multiple single-column statistics connected by OR
In post SQL Server CE: Multiple single-column statistics, I discussed the selectivity when AND is used to join the columns in Where clause. Today, I'm going to talk about the OR operator
Let's say we have four predicates connected by OR with selectivites P0,P1,P2 and P3,
Legacy CE
The combined selectivity is : 1-(1-P0)*(1-P1)*(1-P2)*(1-P3).
If there are only two predicates, the formula is relative simple: (P0+P1)-(P0*P1)
New CE
1.The combined selectivity is : 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8)
2.P0,P1,P1 and P2 are the selectivity of each value of the column in WHERE clause and P0>P1>P2>P3.
3.We can have up to 4 predicates. If there are more than 4 columns in the where clause, it only counts the first 4, the rest of them are ignored.
Combined selectivity of four predicates: 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8), where P0>P1>P2>P3.
In practice, Please use following formula to calcualte the selectivity if the combined selectivity has more than 2 predicate.
1-
As you expected, more columns used in where clause, more estiamted rows you get.
Here are examples , adventure 2019 OLTP database is used in this example
use AdventureWorks2019
go
IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))
drop table SalesOrderDetail
Go
select * into SalesOrderDetail from Sales.SalesOrderDetail
GO
create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan
create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan
Here is the T-SQL query I'm going to test
select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288
The selectivity of ProductId 708 is 3007/121317=0.02478630
dbcc show_statistics(SalesOrderDetail,I_ProductID)
The selectivity of SalesOrderID 44288 is 34/121317=0.00028025
Legacy CE:
Combined selectivity:(P0+P1)-(P0*P1)=(0.02478630+0.00028025)-(0.02478630*0.00028025)=0.0250596036394250
Estimated row: 0.0250596036394250*121317=3040.15, is rounded down to 3040
select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 9481)--trace flag 9481 is used to force legacy CE
New CE:
Combined selectivity:1-(1-P0)*sqrt(1-P1)=1-(1-0.02478630)*(sqrt(1-0.00028025)=0.0249229613952225
Estimated row: 0.0249229613952225*121317=3023.57 is rounded up to 3024
select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312)--2312 is used to force New CE
Related trace flags
Trace flag 9471: Pickup the lowest selectivity(Most rows) from all predicates, and ignore rest of them. This trace flag only works when New CE is used.
Combined selectivity: Max(P0,P1,P2,P3,P4..)
In this case, it's Max(P0,P1)*CARD=MAX(0.02478630,0.00028025)*CARD=0.02478630*CARD=0.02478630*121317=3007
select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9471)---2312 is used to force New CE
Trace flag 9472:Switch to independent formula, the result is as same as the result of Legacy CE and this trace flag only works when New CE is used.
select * from SalesOrderDetail where ProductID=708 or SalesOrderID=44288 option(recompile,querytraceon 2312,querytraceon 9472)--force New CE
Assessment:
T-SQL
use AdventureWorks2019
go
IF exists(select 1 from sys.tables where name='SalesOrderDetail' and schema_id=schema_id('dbo'))
drop table SalesOrderDetail
go
select * into SalesOrderDetail from Sales.SalesOrderDetail
GO
--create five statistics
create statistics I_ProductID on SalesOrderDetail(ProductID) with fullscan
create statistics I_orderqty on SalesOrderDetail(orderqty) with fullscan
create statistics I_SalesOrderID on SalesOrderDetail(SalesOrderID) with fullscan
create statistics I_SpecialOfferid on SalesOrderDetail(SpecialOfferid) with fullscan
create statistics I_modifiedDate on SalesOrderDetail(modifiedDate) with fullscan
1.Please calculate the estimated rows of following queries and run the queries to verify
select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4 or SalesOrderID=44288 option(recompile,querytraceon 9481)
go
select *From SalesOrderDetail where ProductId=708 or orderqty=10 or modifiedDate='2011-05-31 00:00:00.000' or SpecialOfferid=4 or SalesOrderID=44288 option(recompile,querytraceon 2312)
2.If there are multi-column stats, does it make different?
Comments
Post a Comment