RSS

Category Archives: Data Management

DMVs for performance tuning

DMV Categories

Most important DMV Categories for performance tuning are

dm_db_* – General database and index utilization
dm_exec_* – Statistics for queries that are executing, as well as queries that have completed and still have plans in the query cache
dm_io_* – Disk subsystem statistics
dm_os_* – Statistics related to use of hardware resources

Database Statistics
Most common DMVs used for database statistics are

  1. sys.dm_db_index_usage_stats – number of times (and the last time) each index was used to satisfy a seek, scan or lookup , as well as the number of time and the last time an update was performed to each index.
  2. sys.dm_db_index_operational_stats – this is a function that takes 4 parameters (database_id, object_id, index_id and partition_id). Returns locking, latching and access statistics for each index that can help you determine how heavily an index is being used. Also helps you diagnose contention issues due to locking and latching.
  3. sys.dm_db_index_physical_stats – function that takes in 5 parameters – all 4 of above and mode. Returns size and fragmentation statistics for each index and should be the primary source for determining when an index needs to be defragmented.
  4. sys.dm_db_missing_index_groups
  5. sys.dm_db_missing_index_group_stats
  6. sys.dm_db_missing_index_details

SQL Server logs details of an index miss which can be viewed using sys.dm_db_missing_index_* views (4, 5, 6 above) . Index miss is a situation when an index would be beneficial but doesn’t exist.

Query Statistics
The dm_exec_* views provide information related to connection to an instance as well as query execution.

  • dm_exec_connections – one row for each connection to the instance
  • dm_exec_sessions – one row for each currently authenticated session- also returns accumulated reads, writes, cpu and query execution duration for session.
  • dm_exec_requests – one row for each currently executing request in the instance. Blocking_session_id columns helps to diagnose contention issues.
  • dm_exec_query_stats – contains detailed statistics on performance and resources consumed for every query in the query cache.
  • dm_exec_sql_text function returns the text of the sql statement associated to the handle that was passed in
  • dm_exec_query_plan accepts a plan handle and returns corresponding xml showplan

If you want to see the currently executing queries along with xml plan

SELECT * FROM
sys.dm_exec_requests
CROSS APPLY
sys.dm_exec_query_plan (plan_handle)
CROSS APPLY
sys.dm_exec_sql_text(sql_handle)

IF you want to see the sql statement and xml plan for every query in the query cache

SELECT * FROM
sys.dm_exec_query_stats
CROSS APPLY
sys.dm_exec_query_plan (plan_handle)
CROSS APPLY
sys.dm_exec_sql_text(sql_handle)

DISK SUBSYSTEM STATISTICS
sys.dm_io_virtual_file_stats – statistics about reads and writes for every database file. IOStalls for both reads and writes can be found here.
sys.dm_io_pending_requests – contains one row for each request that is waiting for the disk subsystem to complete an I/O request.

HARDWARE RESOURCES
When a request is sen to the waiting queue, SQL Server sets a value called teh wait type that designates the type of resources that the rqeuest is waiting on. As soon as wait type is set, SQL Server starts a clock. When the resource becomes avaialble, SQL Server stops the clock and records the amount of time that the request had to wait for the resource to become avialable, called teh wait time. SQL Server also sets a clcok when a request enters the running queue, called teh signal wait, which records how long it takes a process to get to the top of the queue and begin executing.

dm_os_wait_stats lists the aggregate amount of signal wait and wait time for each wait type.

Source :MCTS Training Kit – Mike Hotek

 
 

Filtered Index

An index key could have a signifi cant skew in the data values where a large percentage of the
table contains duplicated values only within a narrow range of the overall set of values. If a
query were executed that retrieved data from the portion of the table that was highly selective,
it is likely that subsequent queries executed against the low selectivity range would use the
same index, but doing so is clearly inappropriate.

To handle the cases where signifi cant skew exists in the data, SQL Server 2008 allows you
to create fi ltered indexes. A filtered index is simply an index with a WHERE clause. Only the
index keys matching the WHERE clause are added to the index, allowing you to build indexes
that focus on the highly selective portions of the table while allowing SQL Server to choose
another method for the less selective range.

Filtered indexes have the following restrictions:
􀁑 They must be a nonclustered index.
􀁑 They cannot be created on computed columns.
􀁑 Columns cannot undergo implicit or explicit data type conversion

 

Source: MCTS 2008 Training Kit – Mike Hotek

 
 

Index

Index Levels
The number of levels in an index and the number of pages within each level of an index are
determined by simple mathematics. A data page in SQL Server is 8,192 bytes in size, which
can be used to store up to 8,060 bytes of actual user data. Based on the number of bytes
required to store an index key, determined by the data type, you can calculate the number of
rows per page that are stored by using simple division.
The following example describes not only how an index is built, but also the size calculations
for an index. It gives you an idea of how valuable it can be to use an index to fi nd data within
very large tables, as well as explain why the amount of time to fi nd a piece of data does not
vary much even if the size of a database increases dramatically. Of course, the amount of time
needed to locate data also depends upon writing effi cient queries.
If you build an index on an INT column, each row in the table will require 4 bytes of
storage in the index.

If the table contains only 1,200 rows of data, you need 4,800 bytes of storage. Because all
the entries would fi t on a single page of data, the index would have a single page that would
be the root page as well as the leaf page. In fact, you could store 2,015 rows in the table and
still allocate only a single page to the index.
As soon as you add the 2,016th row, however, all the entries can no longer fi t on a single
page, so two additional pages are allocated to the index in a process called page splitting.
The existing root page is pushed down the structure to become a leaf-level page. SQL Server
takes half of the data on the index page and moves it to one of the newly allocated pages.
The other new page is allocated at the top of the index structure to become the new root
page. The fi nal step in the process is to take the fi rst entry on each of the leaf-level pages
and write the entries to the newly created root page. You are now left with an index with a
root page and two leaf-level pages. This index does not need an intermediate level created
because the root page can contain all the values at the beginning of the leaf-level pages. At
this point, locating any row in the table requires scanning exactly two pages in the index.
NOTE PAGE SPLITS
Keep in mind that rows on an index page are maintained in sorted order, so SQL Server
You can continue to add rows to the table without affecting the number of levels in
the index until you reach 4,060,225 rows. You then have 2,015 leaf-level pages with 2,015
entries apiece. The root page has 2,015 entries corresponding to the fi rst row on each of the
leaf-level pages. Therefore, for SQL Server to fi nd any row within the 4,060,255 rows in the
table, it would require reading exactly two pages. When the 4,060,226th row of data is added
to the table, another page needs to be allocated to the index at the leaf level, but the root
page cannot hold 2,016 entries because that would make it exceed the 8,060 bytes that
are allowed. So SQL Server goes through a page split process. The previous root-level page
now becomes an intermediate-level page, with a second page allocated at the intermediate
level. The former root page undergoes a page split to move half of the entries to the newly
allocated intermediate-level page, and the fi rst entry on each of the two intermediate-level
pages is written to the newly allocated root page.
The next time SQL Server needs to introduce an intermediate level occurs when it must add
the 8,181,353,376th row of data to the table—2,015 rows on the root page corresponding to
2,015 pages on the intermediate level, each of which has 2,015 entries corresponding to 2,015
pages at the leaf level, plus one extra row of data that will not fi t.
As you can see, this type of structure enables SQL Server to locate rows in extremely large
tables very quickly. In this example, fi nding a row in the table with a little over 4 million rows
requires SQL Server to scan only two pages of data, and the table could grow to more than
8 billion rows before it would require SQL Server to read three pages to fi nd any row.

 

Source: Microsoft 2008 SQL Server 2008 – Implementation and Maintenance – Mike Hotek

 
 

Data Partitioning

Use sp_helpFile and sp_helpFileGroup to see the allocations.

Index partitioning
If table and index use the same partitioning function and partitioning columns (in same order), they are said to be aligned.
If table and index use the same partitioning function and partitioning scheme, they are said to be storage aligned. In general, having the tables and indexes on the same file or filegroup can often benefit where multiple CPUs can parallelize an operation across partitions. In the case of storage alignment and multiple CPUs, SQL Server can have each processor work directly on a specific file or filegroup – and know that there are no conflicts in data access because all data required for a particular join or index lookup will be colocated on the same disk.

LEFT and RIGHT parititions
For a LEFT partition function, the extreme LEFT boundary point is fixed. For a RIGHT partition function, the extreme RIGHT boundary point is fixed. The other ends of the partitions will not have boundary points explicitly  defined (ie, the extreme right will be ∞ for a left partition function, and the extreme left will be -∞ for a right partition function).

To easily remember the rules of LEFT and RIGHT partition functions, picture a number line and use just one boundary condition as an example.

In a LEFT partition function, any value that is on the LEFT of a boundary condition, is always <= the boundary condition, and any value that is on the right of the boundary condition is always > than the boundary condition. (Remember, that in the left partition function, the extreme right is open (∞), so towards right is always > and going towards infinity).


For e.g. if the first value (or boundary condition) of a partition function is ‘20001001’ then the values within the bordering partitions will be:

1st partition is all data <= ‘20001001’
2nd partition is all data >  ‘20001001’

In a RIGHT partition function, any value that is on the RIGHT of a boundary condition, is always >= the boundary condition. Any value on the left of the boundary condition is always < than the boundary condition. (Remember, in the RIGHT partition function, the extreme left is open (-∞), so towards left is always < and going towards -∞.

For e.g. if the first value (or boundary condition) of a partition function is ‘20001001’ then the values within the bordering partitions will be:
1st partition is all data < ‘20001001’
2nd partition is all data => ‘20001001’

Process of partitioning a table, index, or indexed view is as follows:

1. Create a partition function.
2. Create a partition scheme mapped to a partition function
3. Create the table, index, or indexed view on the partition scheme.

PARTITION FUNCTION
A partition function defines the set of boundary points for which data will be partitioned.
e.g.

CREATE PARTITION FUNCTION myPartFunction (int)
AS RANGE LEFT
FOR VALUES (10,20,30,40,50)

PARTITION SCHEME
A partition scheme defines the storage structures and collection of filegroups that you want to use with a given partition function.
E,g,

CREATE PARTITION SCHEME myPartScheme
AS PARTITION myPartFunction
TO (fileGroup1, fileGroup2, fileGroup3, fileGroup4, fileGroup5

OR

CREATE PARTITION SCHEME myPartScheme
AS PARITION myPartFunction
ALL TO (fileGroup1)

Important: The filegroup must already be present in the database. SQL Server allows the use of the ALL keyword which allows you to create all paritions defined by the partition function within a single filegroup. If you do not use the “all” keyword, then the partition scheme must be defined in such a way as to contain a filegroup for each partition that is created by the partition function mapped to the partition scheme. ie, Each partition that is created in the partition function, must be associated with atleast one file group. If you use the “all” keyword, a maximum of only one filegroup can be specified.

CREATING PARTITIONED TABLE OR PARTITIONED INDEX
This is similar to creating a non partitioned table, index or indexed view. The ON clause is where the difference lies. Instead of specifying a filegroup on which to create the table, you specify a partition scheme.

CREATE TABLE Employee
(EmployeeID INT, Name varchar(100))
ON mypartscheme (employeeID)

You need to specify the column in the table, the partitioning key, to which the partition function will be applied. The partitioning key must match the data type, length and precision of the partition function. If the partition column is computed, the computed column must be PERSISTED (values must be physically stored)

Partitioned index is similar

CREATE NONCLUSTERED INDEX idx_employeeName ON dbo.Employee(name)
ON myPartScheme(EmployeeID).
When specifying partitioning key for an index, you are not limited to the columns that on which the index is defined.

SPLIT AND MERGE OPERATORS
The SPLIT operator introduces a new boundary point into a partition function. MERGE eliminates a boundary point from a partition function. You must be careful while using this because you are either adding or removing an entire partiton form the partition function. Data is not being removed – only partition. Because a partition can reside only in a single filegroup, these operations could cause significant amount of disk I/O as sql server relocates rows on disk.

ALTER PARTITION FUNCTION partition_function_name
{SPLIT RANGE (boundary_value)
| MERGE RANGE (boundary_value)} [;]

ALTERING A PARTITION SCHEME
You can add filegropus to an existing partition scheme to create more storage space for a partitioned table.

ALTER PARTITION SCHEME partition_Scheme_name
NEXT USED [file_group_name] [;]

The NEXT USED clause has two purposes
1. It adds a new filegroup to the partition scheme, if the specified filegroup is not already part of the partition scheme.
2. It marks the NEXT USED property for a filegroup. The filegroup that is marked with the NEXT USED flag is the filegroup that contains the next partition that is created when a SPLIT operation is executed.

Note: Partitioning can be done after it has been populated with data – although this causes significant disk IO. To partition an existing table, you need to drop the clustered index and re-create the clustered index on the partition scheme. To partition an existing index or indexed view, drop the index and recreate the index on a partition scheme.

SWITCH OPERATOR
The basic idea is that SWITCH allows you to exchange partitions between tables in a perfectly scalable manner with no locking, blocking or deadlocking. SWITCH has severala requirements to ensure that the operation is perfectly scalabale. Th emost important are as follows

  • The data and index for the source and target tables must be aligned.
  • Source and target tables must have the same structure.
  • Data cannot be moved from one file group to another
  • Two partitions with data  cannot be exchanged
  • The target partition must be empty
  • The source or target table cannot participate in replication.
  • The source or target tables cannot have full text indexes or FILESTREAM data type defined..

Points to Remember:

1. You cannot use columns which are of text, ntext, image, sml, varbinary(max), varchar(max) or any CLR data types for partitioning. Any column that are used for partitioning should be deterministic.
2. Maximum number of partitions for a table is 1000.
3. There fore you are allowed to specify a maximum of 999 boundary points.
4. A computed column must be PERSISTED
5. SPLIT is used to introduce a new boundary point to a partition function
6. MERGE is used to remove a boundary point from a partition function
7. SWITCH is used to move partition between tables.
8.In a SWITCH operation, You cannot move data from one filegroup to another , nor exchange two partitions with data – ie, only one partition may contain data.

List out which tables are partitioned, and what partition scheme and partition function they use:

SELECT DISTINCT object_name(o.object_id), pf.name AS function_name, ps.name AS scheme_name
FROM
sys.objects o
INNER JOIN
sys.indexes i ON o.object_id = i.object_id
INNER JOIN
sys.partition_schemes ps ON ps.data_space_id = i.data_space_id
INNER JOIN
sys.partition_functions pf ON     ps.function_id = pf.function_id
WHERE i.index_id < 2

LIST which column the table is being partitioned on

SELECT b.name
FROM
sys.index_columns a
inner join
sys.columns b on
a.column_id= b.column_id
WHERE
OBJECT_NAME(a.OBJECT_ID)='<tableName>’ and
a.partition_ordinal = 1 and OBJECT_NAME(b.OBJECT_ID)='<tableName>’;
*If partition_ordinal = 0, it is not a partitioned column

 

SOURCE: MCTS Training kit (chapter 6)

http://www.sqlskills.com/resources/whitepapers/partitioning%20in%20sql%20server%202005%20beta%20ii.htm#_Toc79339948