Search

Friday, October 21, 2011

Diagnosing and Resolving Spinlock issue on SQL2008R2

Spinlocks are lightweight synchronization primitives which are used to protect access to data structures. Spinlocks are not unique to SQL Server. When a thread attempting to acquire a spinlock is unable to obtain access it executes in a loop periodically checking to determine if the resource is available instead of immediately yielding. After some period of time a thread waiting on a spinlock will yield before it is able to acquire the resource in order to allow other threads running on the same CPU to execute. This is known as a Backoff.

SQL Server utilizes spinlocks to protect access to some of its internal data structures. These are used within the engine to serialize access to certain data structures in a similar fashion to latches. The main difference between a latch and a spinlock is the fact that spinlocks will spin (execute a loop) for a period of time checking for availability of a data structure while a thread attempting to acquire access to a structure protected by a latch will immediately yield if the resource is not available. Yielding requires context switching of a thread off the CPU so that another thread can execute.

Spinlock statistics are exposed by the sys.dm_os_spinlock_stats Dynamic Management View (DMV) within SQL Server.

select * from sys.dm_os_spinlock_stats

order by spins desc

















The statistics exposed by this query are described as follows:








































Symptoms which may indicate spinlock contention:

1. A high number of spins and backoffs are observed for a particular spinlock type.

AND

2. The system is experiencing heavy CPU utilization or spikes in CPU consumption. In heavy CPU scenarios one may also observe high signal waits on SOS_SCHEDULER_YEILD (reported by the DMV sys.dm_os_wait_stats).

AND

3. The system is experiencing very high concurrency.

AND

4. The CPU usage and spins are increased disproportionate to throughput.



Some of the more common causes for increased CPU consumption include:



1. Queries which become more expensive over time due to growth of the underlying data resulting in the need to perform additional logical reads of memory resident data.

2. Changes in query plans resulting in suboptimal execution.

With that said, if each of the conditions listed above is true then it would be advisable to perform further investigation into possible spinlock contention issues.

Diagnosing SQL Server Spinlock Contention

The primary tools used to diagnose spinlock contention are:

1. Performance Monitor - Look for high CPU conditions or divergence between throughput and CPU consumption.

2. The sys.dm_os_spinlock stats DMV - Look for a high number of spins and backoff events over periods of time.

3. SQL Server Extended Events - Used to track call stacks for spinlocks which are experiencing a high number of spins.

4. Memory Dumps - In some cases, memory dumps of the SQL Server process and the Windows Debugging tools. In general, this level of analysis is done when the Microsoft SQL Server support teams are engaged.

The general technical process for diagnosing SQL Server Spinlock contention is:

1. Step 1 – Determine that there is contention which may be spinlock related (see section above).

2. Step 2 – Capture statistics from sys.dm_ os_spinlock_stats to find the spinlock type experiencing the most contention.

3. Step 3 – Obtain debug symbols for sqlservr.exe (sqlservr.pdb) and place the symbols in the same directory as the SQL Server service .exe file (sqlservr.exe) for the instance of SQL Server.

In order to see the call stacks for the back off events, you must have symbols for the particular version of SQL Server that you are running. Symbols for SQL Server are available on the Microsoft Symbol Server. For more information about how to download symbols from the Microsoft Symbol Server, see Microsoft Knowledge Base article 311503, Use the Microsoft Symbol Server to obtain debug symbol files (http://support.microsoft.com/kb/311503).

4. Step 4 – Use SQL Server Extended Events to trace the back off events for the spinlock types of interest.

Extended Events provide the ability to track the "backoff" event and capture the call stack for those operation(s) most prevalently trying to obtain the spinlock. By analyzing the call stack it is possible to determine what type of operation is contributing to contention for any particular spinlock.



/*

This script will monitor for backoff events over a given period of time and

capture the code paths (callstacks) for those.



--Find the spinlock types

select map_value, map_key, name from sys.dm_xe_map_values

where name = 'spinlock_types'

order by map_value asc



--Example: Get the type value for any given spinlock type

select map_value, map_key, name from sys.dm_xe_map_values

where map_value IN ('SOS_CACHESTORE', 'LOCK_HASH', 'MUTEX')



Examples:

61LOCK_HASH

144 SOS_CACHESTORE

08MUTEX



*/



--create the even session that will capture the callstacks to a bucketizer

--more information is available in this reference: http://msdn.microsoft.com/en-us/library/bb630354.aspx

create event session spin_lock_backoff on server

add event sqlos.spinlock_backoff (action (package0.callstack)

where

type = 61--LOCK_HASH

or type = 144--SOS_CACHESTORE

or type = 8--MUTEX

)

add target package0.asynchronous_bucketizer (

set filtering_event_name='sqlos.spinlock_backoff',

source_type=1, source='package0.callstack')

with (MAX_MEMORY=50MB, MEMORY_PARTITION_MODE = PER_NODE)



--Ensure the session was created

select * from sys.dm_xe_sessions

where name = 'spin_lock_backoff'



--Run this section to measure the contention

alter event session spin_lock_backoff on server state=start



--wait to measure the number of backoffs over a 1 minute period

waitfor delay '00:01:00'



--To view the data

--1. Ensure the sqlservr.pdb is in the same directory as the sqlservr.exe

--2. Enable this trace flag to turn on symbol resolution

DBCC traceon (3656, -1)



--Get the callstacks from the bucketize target

select event_session_address, target_name, execution_count, cast (target_data as XML)

from sys.dm_xe_session_targets xst

inner join sys.dm_xe_sessions xs on (xst.event_session_address = xs.address)

where xs.name = 'spin_lock_backoff'



--clean up the session

alter event session spin_lock_backoff on server state=stop

drop event session spin_lock_backoff on server



Automating the Process of Capturing Memory Dumps to Analyze Spinlock Contention



The following extended events script has proven to be useful to automate the collection of memory dumps when spinlock contention becomes significant.



The following extended events script has proven to be useful to automate the collection of memory dumps when spinlock contention becomes significant.

SQL Query for Capturing Memory Dumps

The following SQL script can be used to automate the process of capturing memory dumps to help analyze spinlock contention:

/*

This script is provided "AS IS" with no warranties, and confers no rights.



Use: This procedure will monitor for spinlocks with a high number of backoff events

over a defined time period which would indicate that there is likely significant

spin lock contention.



Modify the variables noted below before running.





Requires:

xp_cmdshell to be enabled

sp_configure 'xp_cmd', 1

go

reconfigure

go



*********************************************************************************************************/

use tempdb

go

if object_id('sp_xevent_dump_on_backoffs') is not null

drop proc sp_xevent_dump_on_backoffs

go

create proc sp_xevent_dump_on_backoffs

(

@sqldumper_path nvarchar(max) = '"c:\Program Files\Microsoft SQL Server\100\Shared\SqlDumper.exe"'

,@dump_threshold int = 500 --capture mini dump when the slot count for the top bucket exceeds this

,@total_delay_time_seconds int = 60 --poll for 60 seconds

,@PID int = 0

,@output_path nvarchar(max) = 'c:\'

,@dump_captured_flag int = 0 OUTPUT



)

as

/*

--Find the spinlock types

select map_value, map_key, name from sys.dm_xe_map_values

where name = 'spinlock_types'

order by map_value asc



--Example: Get the type value for any given spinlock type

select map_value, map_key, name from sys.dm_xe_map_values

where map_value IN ('SOS_CACHESTORE', 'LOCK_HASH', 'MUTEX')

*/

if exists (select * from sys.dm_xe_session_targets xst

inner join sys.dm_xe_sessions xs on (xst.event_session_address = xs.address)

where xs.name = 'spinlock_backoff_with_dump')

drop event session spinlock_backoff_with_dump on server



create event session spinlock_backoff_with_dump on server

add event sqlos.spinlock_backoff (action (package0.callstack)

where

type = 61 --LOCK_HASH

--or type = 144 --SOS_CACHESTORE

--or type = 8 --MUTEX

--or type = 53 --LOGCACHE_ACCESS

--or type = 41 --LOGFLUSHQ

--or type = 25 --SQL_MGR

--or type = 39 --XDESMGR

)

add target package0.asynchronous_bucketizer (

set filtering_event_name='sqlos.spinlock_backoff',

source_type=1, source='package0.callstack')

with (MAX_MEMORY=50MB, MEMORY_PARTITION_MODE = PER_NODE)



alter event session spinlock_backoff_with_dump on server state=start





declare @instance_name nvarchar(max) = @@SERVICENAME

declare @loop_count int = 1

declare @xml_result xml

declare @slot_count bigint

declare @xp_cmdshell nvarchar(max) = null



--start polling for the backoffs

print 'Polling for: ' + convert(varchar(32), @total_delay_time_seconds) + ' seconds'

while (@loop_count < CAST (@total_delay_time_seconds/1 as int))

begin

waitfor delay '00:00:01'



--get the xml from the bucketizer for the session

select @xml_result= CAST(target_data as xml)

from sys.dm_xe_session_targets xst

inner join sys.dm_xe_sessions xs on (xst.event_session_address = xs.address)

where xs.name = 'spinlock_backoff_with_dump'



--get the highest slot count from the bucketizer

select @slot_count = @xml_result.value(N'(//Slot/@count)[1]', 'int')



--if the slot count is higher than the threshold in the one minute period

--dump the process and clean up session

if (@slot_count > @dump_threshold)

begin

print 'exec xp_cmdshell ''' + @sqldumper_path + ' ' + convert(nvarchar(max), @PID) + ' 0 0x800 0 c:\ '''

select @xp_cmdshell = 'exec xp_cmdshell ''' + @sqldumper_path + ' ' + convert(nvarchar(max), @PID) + ' 0 0x800 0 ' + @output_path + ' '''

exec sp_executesql @xp_cmdshell

print 'loop count: ' + convert (varchar(128), @loop_count)

print 'slot count: ' + convert (varchar(128), @slot_count)

set @dump_captured_flag = 1

break

end



--otherwise loop

set @loop_count = @loop_count + 1



end



--see what was collected then clean up

DBCC traceon (3656, -1)

select event_session_address, target_name, execution_count, cast (target_data as XML)

from sys.dm_xe_session_targets xst

inner join sys.dm_xe_sessions xs on (xst.event_session_address = xs.address)

where xs.name = 'spinlock_backoff_with_dump'



alter event session spinlock_backoff_with_dump on server state=stop

drop event session spinlock_backoff_with_dump on server

go



/* CAPTURE THE DUMPS

******************************************************************/

--Example: This will run continuously until a dump is created.

declare @sqldumper_path nvarchar(max) = '"c:\Program Files\Microsoft SQL Server\100\Shared\SqlDumper.exe"'

declare @dump_threshold int = 300 --capture mini dump when the slot count for the top bucket exceeds this

declare @total_delay_time_seconds int = 60 --poll for 60 seconds

declare @PID int = 0

declare @flag tinyint = 0

declare @dump_count tinyint = 0

declare @max_dumps tinyint = 3 --stop after collecting this many dumps

declare @output_path nvarchar(max) = 'c:\' --no spaces in the path please :)





--Get the process id for sql server

declare @error_log table (LogDate datetime,

ProcessInfo varchar(255),

Text varchar(max)

)

insert into @error_log

exec ('xp_readerrorlog 0, 1, ''Server Process ID''')

select @PID = convert(int, (REPLACE(REPLACE(Text, 'Server Process ID is ', ''), '.', '')))

from @error_log where Text like ('Server Process ID is%')

print 'SQL Server PID: ' + convert (varchar(6), @PID)



--Loop to monitor the spinlocks and capture dumps. while (@dump_count < @max_dumps)

begin



exec sp_xevent_dump_on_backoffs @sqldumper_path = @sqldumper_path,

@dump_threshold = @dump_threshold,

@total_delay_time_seconds = @total_delay_time_seconds,

@PID = @PID,

@output_path = @output_path,

@dump_captured_flag = @flag OUTPUT

if (@flag > 0)

set @dump_count=@dump_count + 1

print 'Dump Count: ' + convert(varchar(2), @dump_count)

waitfor delay '00:00:02'



end



Capturing Spinlocks Statistics Over a Specific Time Period





The following script can be used to look at spinlock statistics over a specific time period. Each time it runs it will return the delta between the current values and previous values collected.

/* Snapshot the current spinlock stats and store so that this can be compared over a time period

Return the statistics between this point in time and the last collection point in time.



**This data is maintained in tempdb so the connection must persist between each execution**

**alternatively this could be modified to use a persisted table in tempdb. if that

is changed code should be included to clean up the table at some point.**

*/



use tempdb

go



declare @current_snap_time datetime

declare @previous_snap_time datetime



set @current_snap_time = GETDATE()



if not exists(select name from tempdb.sys.sysobjects where name like '#_spin_waits%')

create table #_spin_waits

(

lock_name varchar(128)

,collisions bigint

,spins bigint

,sleep_time bigint

,backoffs bigint

,snap_time datetime

)



--capture the current stats

insert into #_spin_waits

(

lock_name

,collisions

,spins

,sleep_time

,backoffs

,snap_time

)

select name

,collisions

,spins

,sleep_time

,backoffs

,@current_snap_time

from sys.dm_os_spinlock_stats



select top 1 @previous_snap_time = snap_time from #_spin_waits

where snap_time < (select max(snap_time) from #_spin_waits)

order by snap_time desc



--get delta in the spin locks stats

select top 10

spins_current.lock_name

, (spins_current.collisions - spins_previous.collisions) as collisions

, (spins_current.spins - spins_previous.spins) as spins

, (spins_current.sleep_time - spins_previous.sleep_time) as sleep_time

, (spins_current.backoffs - spins_previous.backoffs) as backoffs

, spins_previous.snap_time as [start_time]

, spins_current.snap_time as [end_time]

, DATEDIFF(ss, @previous_snap_time, @current_snap_time) as [seconds_in_sample]

from #_spin_waits spins_current

inner join (

select * from #_spin_waits

where snap_time = @previous_snap_time

) spins_previous on (spins_previous.lock_name = spins_current.lock_name)

where

spins_current.snap_time = @current_snap_time

and spins_previous.snap_time = @previous_snap_time

and spins_current.spins > 0

order by (spins_current.spins - spins_previous.spins) desc



--clean up table

delete from #_spin_waits

where snap_time = @previous_snap_time



No comments:

Post a Comment