SQL Server 2008 Storage

Here are some notes on “SQL Server 2008 Storage” I took while attending an advanced class on SQL Server taught by Greg Low (from http://sqlblog.com/blogs/greg_low/ and http://www.sqldownunder.com/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Disk history and basics

Stone, paper, magnetic tape, magnetic disk 🙂
Amount of store increasing rapidly
Magnetic disk – Non volatile, random address, re-writable
Discussion – Reading data later – how far ahead can you guess (10 years?), especially removable media
Functions – Seek to, Random from, Write to, atomic access to single block (512 bytes)
Physical geometry obscured with LBAs – Reality of the geometry could be completely different
Discussion – For how long will we have mechanical drives?

Disk structure

Structure of disks – platters, head, tracks, cylinders, sector, density changes, more data on outer tracks
Discussion – Small drives or large drives
Spindles – Seek to track or stay in the track, rotating spindles, actuators
Discussion – Drive reliability, MTBF
Removable disks – Disk packs. Spin up/load heads/seek/read/write. Alignment packs.
Discussion – Engineering of old style removable disk.
Fixed disk – Winchester. Heads land and take off, like a plane. Radial connections.

Disks details

Media – magnetic (disks/tapes), semiconductor (flash/RAM), optical (CD/DVD/BD), hybrid, holographic?
Discussion – Trust the media? Keeping track of many tapes over time. How many copies?
Discussion – Magnetic physical limitations. RAMSAN, FusionIO, SSD. Video = array of 24 SSD drives.
Semiconductor limitations – SLC vs. MLC, wear leveling, block erases
Magnetics x Semiconductor – Fast seeks and low power vs. magnetic resilience
How/why disks fail – electrical/mechanical domains, wearing out, trying forever, data corruption, ECC
“Not quite broken” – transitory errors, connection bus, vibration, heat, RFI, static
Discussion – desktop drives x enterprise drives, error rates
Disk terminology – tracks per inch, bits per inch, seek latency, access time, response time, IOPs, bandwidth
IOPS – I/Os per second – max IOPS = bandwidth/IO size. IOPs per spindle. Controller capacity.
Latency – How long does it take? – IOPs/Bandwith vs. Latency – RPM vs. Latency – 7.2K vs 10K vs 15K
Discussion – How is “average seek” time calculated?

Disk topics

Real-time – What is real time? SAN QOS, SSD, SQL snapshot isolation, Resource Governor
Power – Lower power, lower heat, 15K disk vs. SSD, power is a significant part of the overall cost
How much data – How many copies of the data do your keep? Online, backup, compliance, archive?
Backup options – D2T, D2D, D2D2T, D2D2D, etc…
Future: Storage Utility? Compute Utility? Services for data access, validity, recovery. Cloud?
Discussion – Pods. Datacenters in containers. 3,500 servers/pod. 3 connections: power, network, cooling

Accessing data

LBA or LBN is disk address, not head/cylinder/sector anymore
It’s all a big bunch of sectors and blocks (512 bytes)
Initators (find and mounts drives/LUNs) and targets
Disk ordering of commands (fifo, seek time, latency, size of stream, direction)

Drive options

Parallel attachment – IDE, ATA, PATA, SCSI
Serial attachment – FC, SATA, SAS, Ethernet
Classes of disks – Desktop (SATA, 7.2K RPM), Enterprise (SAS/FC, 15K RPM)
Size and density of disks – 3.5” (standard) and 2.5” (small form factor, SFF)
Duty cycle
Serial interfaces (SATA/SAS/FC)
Channel Performance (USB/IDE/SATA/FC), Network Performance (10M/100M/1G/10G)
RAID – Software, Direct-Attach RAID, External RAID (SAN)
SSD – Low (no) latency, low (no) seek time, high read IOPs, good bandwidth per drive
SSD – More expensive, lower capacity, higher cost,

What is an IO? What is the path from the App to the physical disk?
SQL Server – SQL Server protocol, query processor, SQL OS, storage engine, buffer manager, buffer pool, OS
“Hardware” I/O – Driver, HBA controller, buffer, FC switch, “SAN OS”, controller, cache, backend switch, disk
I/O request lifecycle – Should all happen in milliseconds. Better if can be done via buffer/cache.
I/O Latency – Optimal – Log: <5ms (ideally <1ms), OLTP: <20ms (ideally <10ms), OLAP: <30ms
I/O Latency – Depends on technology, should be consistent

Storage

Retains information on power removal. Disk, tape, optical.
Assumed to be reliable, fair amount of solution cost, regulatory requirements to keep data
Acronyms – JBOD, RAID, DAS, NAS, SAN, CAS, COS, CIFS, SMB, NFS
Storage: Initiator, Target, LUN

SQL Server storage

3 basic requirements: stable media, write ordering, torn I/O prevention
Applied also to any storage replication scheme – sync or async
The idea is to avoid data corruption and/or loss of transactional consistent

Disk structures

Platters, heads, tracks and sectors – typical sector size is 512 bytes
SQL Server uses 4KB sectors
Disk sector – Preamble, data sync, user data (512 bytes), ECC (40 bits), gap
Data is stored in sectors, accessed by LBN or block address
For SQL, should always have some sort of RAID construct for fault tolerance
NTFS – Default 4KB cluster size
Consider using 64KB cluster – aligns with SQL extents, improves read-ahead, reduces split I/O
Disable NTFS compression
GPT vs. MBR – MBR is the old partitioning. GPT is newer and allows volumes >2TB.

Sector alignment

Beginning of disk used to store partition info, actual data portion start after that.
Query with
– Windows Server 2008 and later will automatically align at next 1MB boundary
– Older versions of Windows will align at 63 sectors
– You should re-align at 1MB if possible. It can have significant impact.
See http://support.microsoft.com/kb/929491
See http://blogs.msdn.com/jimmymay/archive/2008/10/14/disk-partition-alignment-for-sql-server-slide-deck.aspx
See http://sqlblog.com/blogs/kevin_kline/archive/2008/10/08/how-to-improve-application-and-database-performance-up-to-40-in-one-easy-step.aspx

SQL Server I/O

Write-ahead logging (WAL) – Key to make it ACID
All log records must go to stable media BEFORE data pages does.
If these are reordered and stopped in the middle, bad things can happen
Stable media – must be able to survive failure. Cache is OK, if battery backed.
Multiple channels, Load balancing, Mirroring, Remote Mirroring – Also need to guarantee order
Torn I/O – two bits on each of the 512 sectors on the 8K page
Page checksum – checksum for the entire page, replaces Torn I/O if enabled
Force unit access – CreateFile with FILE_FLAG_WRITEHTROUGH flag
Hardware read cache – In controller, reads more data than requested. Good for OLAP, reporting.
Latching – Data synchronization, reader/writer “lock” in SQL Server when writing page

Scatter-Gather

Scatter-Gather I/O: Instead of doing individual I/Os, try to do multiple. For instance: entire extents.
Lazy Writer: On memory pressure, free pages. LRU stays. Up to 256 pages.
Checkpoint: Flush dirty pages in buffer pool to stable media. Up to 16 pages.
Eager Write: Minimally-logged, goes to disk without going to the buffer pool
Read-ahead: Read multiple pages ahead. Up to 128 pages (Enterprise Edition, up to 1024).
Enterprise Edition also can prime the cache on start.
Log manager: Writes log files, sequential. 8 pages (32 in 64-bit) and 480KB 32 outstanding log writes.
Orderly shutdown: Will checkpoint all databases, close out internal structures.
Don’t kill SQL Server process. You might need to increase shutdown time
See http://support.microsoft.com/kb/146092

Errors

Careful – Stale reads, lost writes, filter drivers, online file copy, drive fragmentation
Error 823 – Error, some data returned – Usually, something close to being broken (or broken)
Error 824 – Error, no data returned – Usually, something broken
Error 832 – Stale I/O or checksum issue
Error 833 – Hung I/O – nothing comes back (could be anti-virus, compression, network issue, etc)
Stale Read Protection – Holds some info on written pages, checked on read – Limited to ~320MB of data on 64-bit
Read Retry: Drives will try, OS will also try, SQL Server will retry on top of that – can mask issues
Page and log audit – trace flags to do checks
Idle server – Suspend/resume SQL Server – By default only on Express/Workgroup. Careful!

Scalable Shared Databases

Multiple SQL Servers mounting the same files.
Read-only databases and read-only files. SQL Server 2005 or later.
See http://msdn.microsoft.com/en-us/library/ms345392.aspx

Instant File Initialization

Both OS and FS must support IFI.
SQL Server account needs SE_MANAGE_VOLUME_NAME.
Used in data files, not logs.
Careful – Files will not be initialized.

DMVs

SELECT * FROM sys.dm_io_pending_io_requests
See http://msdn.microsoft.com/en-us/library/ms188762.aspx
SELECT * FROM sys.dm_io_virtual_file_stats(<database>/NULL,<file>/NULL)
Shows io_stall, io_stall_read_ms, io_stall_write_ms, num_of_reads, num_of_writes
See http://msdn.microsoft.com/en-us/library/ms190326.aspx

SAN

Storage Area Network. Fibre Channel or iSCSI. Comprised of fabric and end points.
Terminology: HBA, array, frame, port, frames, PDU, fabrics, zoning, VSAN, masking, WWN, MPIO.
See http://www.snia.org/images/tutorial_docs/Networking/MKJibbe-Wilson_Fibre_Channel_Technologies-rev.pdf
Terminology: Write Atomicity, Restripe, Concatenation, Thin provisioning, LUN growth/shrink
“Magic SAN dust” – Scalability, Availability, Redundancy, Data Management/Mobility, Snaps, Close, Mirroring
Easy and flexible provisioning
Generally used with Failover Clustering
Careful – Shared storage performance – How to guarantee?
Careful – RAID and stripe size – Alignment still matters
Careful – LUN security
Discussion – SAN vs. DAS – Pros and cons of each…

VLDB

Maintenance: Index rebuilds, updating statistics
Fragmentation: Only important if you’re doing scans. Watching for them is important.
Identify the few very large tables that need attention. Consider partitioning
Backup and Restore strategy

Performance

IOPS
Minimize Log write waits
Throughput vs. Latency
Reduce SQL locking
Add CPUs
Reduce I/O path

Multiple spindles

Capacity – Striping (RAID 0)
Availability – Mirroring (RAID1, RAID10), Parity (RAID5, RAID6)
Performance – Striping (RAID 0)
Tempdb in RAID0? SQL Server needs tempdb to be there all the time…
RAID5/RAID6 have a performance penalty

Pre-deployment

Determine capacity of I/O system prior to deployment
Tools: SQLIO, IOMeter, SQLIOSim (not for performance)
See http://sqlblog.com/blogs/kevin_kline/archive/2007/06/28/understanding-sqliosim-output.aspx
SQLIO: IOPS, MB/s, latency. One or many files. Single I/O type per run.
IOMeter: IOPS, MB/s, latency. Can combine many IO types per run.
IO sizes: SQL can use 8/64/128/256/1024KB. Random – look at IOPS. Sequential – look at MB/s.
Workloads: OLTP – heavy reads, writes during checkpoints, 60K sequential read/write for logs
Workloads: OLAP – sequential reads and writes, larger I/O sizes, more scans
Determining Saturation Point – IOPS constant, latency increasing
Careful – HBA can saturate before the SAN disks, consider multiple channels
Careful – HBA drivers/configuration, controller cache configuration, volume alignment

SQL Server 2008 Storage

Published by joseabarreto

One thought on “SQL Server 2008 Storage”

Leave a comment Cancel reply

Share this:

Related

Published by joseabarreto

One thought on “SQL Server 2008 Storage”

Leave a comment Cancel reply