Posts

SQL Server CE: Multiple single-column statistics connected by OR

  In post  SQL Server CE: Multiple single-column statistics , I discussed the selectivity when   AND   is used to join the columns in Where clause. Today, I'm going to talk about the OR operator   Let's say we have four predicates connected by OR with selectivites P0,P1,P2 and P3,   Legacy CE  The combined selectivity is : 1-(1-P0)*(1-P1)*(1-P2)*(1-P3). If there are only two predicates, the formula is relative simple: (P0+P1)-(P0*P1)   New CE 1.The combined selectivity is : 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8) 2.P0,P1,P1 and P2 are the selectivity of each value of the column in WHERE clause and  P0>P1>P2>P3. 3.We can have up to 4 predicates. If there are more than 4 columns in the where clause, it only counts the first 4, the rest of them are ignored. Combined selectivity of four predicates: 1-(1-P0)*(1-P1)^(1/2)*(1-P2)^(1/4)*(1-P3)^(1/8), where   P0>P1>P2>P3. In practice, Please use following formula to calcualte the selectivity if the combined sel

SQL Server CE: Multi-columns statistics

Image
  In post  SQL Server CE: Multiple single-column statistics , I discussed how SQL Server calculates Multiple single-column statistics. I'm going to talk about the 'multiple-columns statistics' today.     Here are examples , adventure 2019 OLTP  database is used in this example   ------- Data manipulation------------------------------------------- alter database [AdventureWorks2019] set compatibility_level=150 go use [AdventureWorks2019] go if exists(select 1 from sys.tables where name='SalesOrderDetail') drop table SalesOrderDetail go select * into SalesOrderDetail from Sales.SalesOrderDetail---- import all the data into new table SalesOrderDetail. Go --Create two statistics explicitly. The first statistics has two columns create statistics I_ProductID_UnitPrice on SalesOrderDetail(ProductID,UnitPrice) with fullscan create statistics I_UnitPrice  on SalesOrderDetail(UnitPrice) with fullscan ------- Data manipulation-------------------------------------------   NEW C

SQL Server CE: Multiple single-column statistics

Image
Skip to footer con The SQL Server Query Optimizer is a cost-based Query Optimizer. For one specific query and tables,  SQL Server creates the query plan based on the estimated rows. Hence, understanding how SQL Server calculate the estimated rows helps you troubleshoot query plan related issues. I’m going to start a series of posts talking about SQL Server CE(Legacy/New).   The way that SQL Server calculates the estimated row of one single predicate is straightforward and simple. Well, it becomes a little bit complex when SQL Server handles more than one predicates.  Unlike the one single predicate, SQL Server has different strategies for Legacy CE and New CE. In Legacy CE, SQL Server assumes that data distribution on different columns are  independent ;  In New CE, SQL Server assumes that data distribution on different columns are correlated.  For short, the estimated rows returned by New CE is generally greater than the rows returned by  Legacy CE.   Here are examples , adventure 201