SQL Server 2008 Partitioning

Here are some notes on “SQL Server 2008 Partitioning” I took while attending an advanced class on SQL Server taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Partitioning misconceptions

It’s not all about speeding up for parallelism
It’s not all about scaling out databases for performance
It’s not all about very large and slow databases
It’s also (mostly?) about range-based inserts or range-based deletes
It’s also (mostly?) about the sliding window scenario
There are cases when you do get peformance/paralelism benefits

Need partitioning

Problem: Customer query was slow and asked for partitioning
Solution: fixed the problem with indexing
They did not need partitioning in that scenario

Partitioning strategies

SQL Server 7 or later: read only partitioned views
SQL Server 2000 or later: updatable partitioned views
SQL Server 2005 or later: partitioned tables
SQL Server 2008 or later: enhanced partitioned tables

Vertical partitioning

Case: Large Patient table. Lots of columns, including picture.
You could: Have a table for critical columns.
Leave all the less used and BLOB columns in a second table.

Horizontal partitioning

Case: Sales table for last few years.
Problem: Weekly full backup of 1 billion rows.
Problem: INSERTs blocked by queries on historical data.
You could: Create multiple tables with partition views (PV).
Uses constraints to implement the partitioning rules.
You could: Create partitioned tables (PT) with read-only file groups.
Uses a partition function and a partition scheme.
Saves in management, backup. Could have different SLAs per year.

Easier Maintenance

Loading options
Index maintanence
Backup/restore

Files and File Groups

Multiple files in same file group: round robin between files
If you lose one file in the FG, you can’t access the data
Multiple file groups with partitions: spread by partitioning scheme.
You can lose just one FG and still acess some of the data

The range-based delete problem

Large table, large delete of old data takes a lot of time
It’s all about reducing the impact on the multiple indexes
Turn into an instant operation with the right partitioning

The data load problem

Typical issue: need to load a lot of data into a table
Again, it’s all about reducing the impact on the multiple indexes
Typical: Load into heap, build clustered index, then non-clustered
If source data is ordered and ever increasing: Load into clustered index, then non-clustered
Will not work so well if you’re parellelizing the load operation
Do not shrink in the end! It will cause massive fragmentation
In any case, the impact on the log is big. Consider going to bulk logging mode.

Sliding window numbers

Single processor, 5.7 million rows, 2 non-clustered indexes
Data load – from 28m to 1m26s with partitioned table
Data delete – from 15m to <1s with partitioned table
Partitioned scneario benefits greatly from multiple processor
Parallel bulk load and parallel index creation – almost linear gains

PVs

Partitioned Views – Many tables, single view : SELECTS + UNION ALL
Manually create the tables with constraints, place in separate FG
Constraint: CHECK (SalesDate >= ‘20090101’ AND SalesDate < ‘20090201’)
Create the view, which are then updateable
Be careful not to make mistakes when creating the table and constraints
SQL does not know of the PV when you create the tables
Lots of tables, indexes, constraints – do it right

PVs and Constraints

SQL uses the constraints to figure it all out
Constraints need to allways be trusted – on create table, always check, never disable
SQL can’t prune tree if the constraints are not trusted
Check if NOT OBJECTPROPERTY(OBJECT_ID(‘constraint’),’CnstIsNotTrusted’)
Use DBCC CHECKCONSTRAINTS. If not trusted, DROP and re-CREATE.
See http://sqlblog.com/blogs/kalen_delaney/archive/2006/10/10/281.aspx

PTs

Partitioned tables are defined in a more structured fashion.
You define specific partitioning points “on a line”. 4 points = 5 partitions.
Classic case is years or months on a time line. Which changes over time.
It’s OK if you end up with an empty partition in the beginning or end.

Partitioning Function

Partition Function – Logical – Defines the points on the line (right or left)
Classic case is years or months on a time line. Which changes over time.
Left is OK, but be careful with date boundaries – ’01/31/2009 23:59:59.997′
You can get lost in the syntax – Do it in a white board first – do the code later
It has to be over a single column – typically a date, country
Partition function with n items creates n+1 partitions
MOD function on an identity field? Possible, with maintanence issues…

Partitioning Scheme

Partition Scheme – Physical – Maps to the file groups (could be a single one)
You can add one extra file group that is not allocated – Next used
Typically, you want different FG in different drives (or sets of drives)
You cannot remove the first file in the file group

Final Step

Create the table on the scheme – same place where you specify the file group
Add the data to the table, which flow to the right partition
Create non-clustered indexes also aligned with the partitions – many benefits
You can even rebuild a table with a clustered index on the scheme as an online operation
This is done with ALTER TABLE … ON psdate(SalesDate)

Rolling windows

Create new metadata on filegroup: table, constraints, indexes, views (same schema)
Switch the partition live. It’s all physically just updating the IAM pointers.
You can only add one partition in a command.
Could do a switch in and switch out together as a transaction.
The first (in RIGHT) partition will remain empty as data slides/rolls forward
Be careful not to merge when there’s data on it, or else data needs to move (expensive)

Other

Cannot rebuild indexes online for only one partition, only for entire partitioned table.
Separate into multiple tables to be able to do this. Benefits in doing this…
Can use read-only and read-write partitioned tables. Benefits in doing this…
New in SQL Server 2008: partition-aligned indexed views.
Partitioned table parallelism. “Non-selective leading index columns”. Skip scanning.

White Paper: Partitioned Tables and Indexes in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345146.aspx

SQL Server 2008 Partitioning

Published by joseabarreto

One thought on “SQL Server 2008 Partitioning”

Leave a comment Cancel reply

Share this:

Related

Published by joseabarreto

One thought on “SQL Server 2008 Partitioning”

Leave a comment Cancel reply