Here are some notes on “SQL Server 2008 Partitioning” I took while attending an advanced class on SQL Server taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Partitioning misconceptions
- It’s not all about speeding up for parallelism
- It’s not all about scaling out databases for performance
- It’s not all about very large and slow databases
- It’s also (mostly?) about range-based inserts or range-based deletes
- It’s also (mostly?) about the sliding window scenario
- There are cases when you do get peformance/paralelism benefits
Need partitioning
- Problem: Customer query was slow and asked for partitioning
- Solution: fixed the problem with indexing
- They did not need partitioning in that scenario
Partitioning strategies
- SQL Server 7 or later: read only partitioned views
- SQL Server 2000 or later: updatable partitioned views
- SQL Server 2005 or later: partitioned tables
- SQL Server 2008 or later: enhanced partitioned tables
Vertical partitioning
- Case: Large Patient table. Lots of columns, including picture.
- You could: Have a table for critical columns.
- Leave all the less used and BLOB columns in a second table.
Horizontal partitioning
- Case: Sales table for last few years.
- Problem: Weekly full backup of 1 billion rows.
- Problem: INSERTs blocked by queries on historical data.
- You could: Create multiple tables with partition views (PV).
- Uses constraints to implement the partitioning rules.
- You could: Create partitioned tables (PT) with read-only file groups.
- Uses a partition function and a partition scheme.
- Saves in management, backup. Could have different SLAs per year.
Easier Maintenance
- Loading options
- Index maintanence
- Backup/restore
Files and File Groups
- Multiple files in same file group: round robin between files
- If you lose one file in the FG, you can’t access the data
- Multiple file groups with partitions: spread by partitioning scheme.
- You can lose just one FG and still acess some of the data
The range-based delete problem
- Large table, large delete of old data takes a lot of time
- It’s all about reducing the impact on the multiple indexes
- Turn into an instant operation with the right partitioning
The data load problem
- Typical issue: need to load a lot of data into a table
- Again, it’s all about reducing the impact on the multiple indexes
- Typical: Load into heap, build clustered index, then non-clustered
- If source data is ordered and ever increasing: Load into clustered index, then non-clustered
- Will not work so well if you’re parellelizing the load operation
- Do not shrink in the end! It will cause massive fragmentation
- In any case, the impact on the log is big. Consider going to bulk logging mode.
Sliding window numbers
- Single processor, 5.7 million rows, 2 non-clustered indexes
- Data load – from 28m to 1m26s with partitioned table
- Data delete – from 15m to <1s with partitioned table
- Partitioned scneario benefits greatly from multiple processor
- Parallel bulk load and parallel index creation – almost linear gains
PVs
- Partitioned Views – Many tables, single view : SELECTS + UNION ALL
- Manually create the tables with constraints, place in separate FG
- Constraint: CHECK (SalesDate >= ‘20090101’ AND SalesDate < ‘20090201’)
- Create the view, which are then updateable
- Be careful not to make mistakes when creating the table and constraints
- SQL does not know of the PV when you create the tables
- Lots of tables, indexes, constraints – do it right
PVs and Constraints
- SQL uses the constraints to figure it all out
- Constraints need to allways be trusted – on create table, always check, never disable
- SQL can’t prune tree if the constraints are not trusted
- Check if NOT OBJECTPROPERTY(OBJECT_ID(‘constraint’),’CnstIsNotTrusted’)
- Use DBCC CHECKCONSTRAINTS. If not trusted, DROP and re-CREATE.
- See http://sqlblog.com/blogs/kalen_delaney/archive/2006/10/10/281.aspx
PTs
- Partitioned tables are defined in a more structured fashion.
- You define specific partitioning points “on a line”. 4 points = 5 partitions.
- Classic case is years or months on a time line. Which changes over time.
- It’s OK if you end up with an empty partition in the beginning or end.
Partitioning Function
- Partition Function – Logical – Defines the points on the line (right or left)
- Classic case is years or months on a time line. Which changes over time.
- Left is OK, but be careful with date boundaries – ’01/31/2009 23:59:59.997′
- You can get lost in the syntax – Do it in a white board first – do the code later
- It has to be over a single column – typically a date, country
- Partition function with n items creates n+1 partitions
- MOD function on an identity field? Possible, with maintanence issues…
Partitioning Scheme
- Partition Scheme – Physical – Maps to the file groups (could be a single one)
- You can add one extra file group that is not allocated – Next used
- Typically, you want different FG in different drives (or sets of drives)
- You cannot remove the first file in the file group
Final Step
- Create the table on the scheme – same place where you specify the file group
- Add the data to the table, which flow to the right partition
- Create non-clustered indexes also aligned with the partitions – many benefits
- You can even rebuild a table with a clustered index on the scheme as an online operation
- This is done with ALTER TABLE … ON psdate(SalesDate)
Rolling windows
- Create new metadata on filegroup: table, constraints, indexes, views (same schema)
- Switch the partition live. It’s all physically just updating the IAM pointers.
- You can only add one partition in a command.
- Could do a switch in and switch out together as a transaction.
- The first (in RIGHT) partition will remain empty as data slides/rolls forward
- Be careful not to merge when there’s data on it, or else data needs to move (expensive)
Other
- Cannot rebuild indexes online for only one partition, only for entire partitioned table.
- Separate into multiple tables to be able to do this. Benefits in doing this…
- Can use read-only and read-write partitioned tables. Benefits in doing this…
- New in SQL Server 2008: partition-aligned indexed views.
- Partitioned table parallelism. “Non-selective leading index columns”. Skip scanning.
White Paper: Partitioned Tables and Indexes in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345146.aspx
PingBack from http://trendynetnews.8baiduo.com/sql-server/
LikeLike