Category: SQL Server Performance

Async Auto Create Statistics Resolves Timeout

The following is based on an actual customer support case:

A customer-facing web application executes a specific stored procedure that is set to time out after 5 seconds. Most of the time this stored procedure completes in under a second, but when the server load is high it times out roughly 5% of the time.

Top wait = non-buffer latch

We analyzed an extended events capture of query executions, filtering on sessions where the stored procedure duration exceeded 5 seconds, and found that 4.9 seconds of the duration was spent updating qpstats on a table with more than 100 million rows.

Enabling asynchronous auto update statistics resolved the issue:

ALTER DATABASE YourDatabaseName SET AUTO_UPDATE_STATISTICS_ASYNC ON

Statistics updates are synchronous by default which means queries always compile and execute with the latest statistics.  When the threshold has been reached and statistics are stale, the query optimizer will wait for the statistics to be updated before compiling the query plan.  If there are many rows in the table, the wait can be significant. When statistics updates are asynchronous, the query optimizer doesn’t wait but goes ahead and uses the existing query plan (or compiles a new one) using the existing statistics. Now, it is possible that the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. Queries that compile after the asynchronous updates have completed will benefit from using the updated statistics.

You should stick with the default synchronous statistics updates when you perform operations that change the distribution of data, such as truncating a table or performing a bulk update of a large percentage of the rows. If you do not manually update the statistics after completing the operation, synchronous updates ensure statistics are up-to-date before queries execute against the changed data.

You should use asynchronous update statistics to achieve more predictable query response times for the following scenarios:

  • Your application frequently executes the same query, similar queries, or similar cached query plans. Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics. This avoids delaying some queries and not others.
  • Your application has experienced client request time outs caused by one or more queries waiting for updated statistics. In some cases, waiting for synchronous statistics could cause applications with aggressive time outs to fail.
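A quick way to check whether the option is already enabled is to query `sys.databases`; the sketch below assumes a placeholder database name. Note that `AUTO_UPDATE_STATISTICS` must also be ON for the asynchronous option to have any effect.

```sql
-- Check the current settings (0 = synchronous, 1 = asynchronous)
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'YourDatabaseName';

-- Enable asynchronous auto update statistics
ALTER DATABASE YourDatabaseName SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```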


Missing SQL Server Performance Counters

Error: 8317, Severity: 16, State: 1.
Cannot query value 'First Counter' associated with registry key 'HKLM\SYSTEM\CurrentControlSet\Services\MSSQLSERVER\Performance'. SQL Server performance counters are disabled.
Default collation: SQL_Latin1_General_CP1_CI_AS (us_english 1033)
Perfmon counters for resource governor pools and groups failed to initialize and are disabled.

SQL Server uses the “First Counter” value in the registry to identify where the counters for this SQL instance start. The “First Counter” value should map to “MSSQL$Instance:Buffer Manager”, as this is the first Perfmon counter in the .ini file. If it doesn’t, SQL Server writes the error above to the SQL error logs.

Does your registry key look like mine?

[screenshot: the Performance registry key showing its ‘First Counter’, ‘First Help’, ‘Last Counter’, and ‘Last Help’ values]

If your registry key is missing ‘First Counter’, ‘First Help’, ‘Last Counter’, or ‘Last Help’, then perform the following:

Re-add extensible counters
1). Open a Command Prompt window.
2). Run:  cd %Systemroot%\System32
3). Run:  findstr drivername *.ini
4). Note the name of the .ini file for each drivername in the list.
5). At the command prompt, type the following command, and then press ENTER:
lodctr axperf.ini
(axperf.ini is just an example; substitute each .ini file name obtained in step 4)
6). Repeat step 5 for each .ini file in the list.

7). Rebuild all Performance counters including extensible and third-party counters:
cd %Systemroot%\System32
lodctr /R

8). Re-load the SQL performance counters, then restart the SQL service from a command prompt:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn> unlodctr MSSQL$MSSQLSERVER
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn> lodctr perf-MSSQLSERVERsqlctr.ini
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn> net stop MSSQL$MSSQLSERVER
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn> net start MSSQL$MSSQLSERVER

9). Check the SQL errorlog and confirm that there are no more errors

10). Check the registry key and confirm that ‘First Counter’, ‘First Help’, ‘Last Counter’, ‘Last Help’ key values are correct now
HKLM\SYSTEM\CurrentControlSet\Services\MSSQL$MSSQLSERVER\Performance

11). Check perfmon and confirm SQL Performance Counters are available

[screenshot: Perfmon showing the SQL Server performance counters restored]

Why you shouldn’t run sp_updatestats after you rebuild your indexes

Don’t run sp_updatestats after rebuilding your indexes. sp_updatestats uses the default sampling rate, so you could end up with less accurate statistics after running it than you had after your index rebuild, since a rebuild updates index statistics with the equivalent of a full scan. If you need to refresh statistics after an index rebuild, update them with FULLSCAN, or better yet, update only your column statistics, because the index statistics are already fresh.
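A minimal sketch of that post-rebuild approach; the table name is a placeholder:

```sql
-- Rebuild the indexes; this also updates the index statistics
-- with the equivalent of a full scan
ALTER INDEX ALL ON dbo.tbl REBUILD;

-- Then refresh only the column statistics (index statistics are already
-- fresh), using a full scan so sampling cannot make them less accurate
UPDATE STATISTICS dbo.tbl WITH FULLSCAN, COLUMNS;
```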

Current statistics:

Table Statistic ID Statistic Last Updated rows rows_sampled unfiltered_rows Modifications
tbl 1 tbl_Hash 2015-06-26 20:21:36.5930000 1318318 476423 1318318 0
tbl 2 tbl_DBID 2015-06-26 20:21:37.0930000 1318318 486032 1318318 0
tbl 3 tbl_AppName 2015-06-26 20:21:37.5770000 1318318 486032 1318318 0
tbl 4 tbl_Login 2015-06-26 20:21:38.0600000 1318318 486032 1318318 0
tbl 5 tbl_Time 2015-06-26 20:21:38.5330000 1318318 486032 1318318 0
tbl 6 _WA_Sys_00000011_341F99B2 2015-06-26 20:21:38.9370000 1318318 486032 1318318 4

After running sp_updatestats. Note that rows_sampled changed.

Table Statistic ID Statistic Last Updated rows rows_sampled unfiltered_rows Modifications
tbl 1 tbl_Hash 2015-06-26 20:48:21.1600000 1718318 553940 1718318 0
tbl 2 tbl_DBID 2015-06-26 20:48:21.6830000 1718318 536499 1718318 0
tbl 3 tbl_AppName 2015-06-26 20:48:22.2030000 1718318 536499 1718318 0
tbl 4 tbl_Login 2015-06-26 20:48:22.7170000 1718318 536499 1718318 0
tbl 5 tbl_Time 2015-06-26 20:46:32.4200000 1718318 536499 1718318 0
tbl 6 _WA_Sys_00000011_341F99B2 2015-06-26 20:48:23.1500000 1718318 536499 1718318 0

Rebuild all indexes
EXEC sp_MSforeachtable @command1='SET QUOTED_IDENTIFIER ON; ALTER INDEX ALL ON ? REBUILD;'

Note how rows_sampled now equals all the rows for the index statistics; the auto-created column statistic (_WA_Sys…) is untouched by the rebuild, which is why it keeps its earlier timestamp and sample size.

 Table Statistic ID Statistic Last Updated rows rows_sampled unfiltered_rows Modifications
tbl 1 tbl_Hash 2015-06-26 20:51:06.1900000 1718318 1718318 1718318 0
tbl 2 tbl_DBID 2015-06-26 20:51:08.5700000 1718318 1718318 1718318 0
tbl 3 tbl_AppName 2015-06-26 20:51:10.9400000 1718318 1718318 1718318 0
tbl 4 tbl_Login 2015-06-26 20:51:13.3430000 1718318 1718318 1718318 0
tbl 5 tbl_Time 2015-06-26 20:51:15.7100000 1718318 1718318 1718318 0
tbl 6 _WA_Sys_00000011_341F99B2 2015-06-26 20:48:23.1500000 1718318 536499 1718318 0

Added 10,000 rows and ran sp_updatestats. Note how it sampled fewer than all the rows.

Table Statistic ID Statistic Last Updated rows rows_sampled unfiltered_rows Modifications
tbl 1 tbl_Hash 2015-06-26 20:53:39.3270000 1728318 479590 1728318 0
tbl 2 tbl_DBID 2015-06-26 20:53:39.8230000 1728318 520146 1728318 0
tbl 3 tbl_AppName 2015-06-26 20:53:40.3230000 1728318 520146 1728318 0
tbl 4 tbl_Login 2015-06-26 20:53:40.8200000 1728318 520146 1728318 0
tbl 5 tbl_Time 2015-06-26 20:53:41.3100000 1728318 520146 1728318 0
tbl 6 _WA_Sys_00000011_341F99B2 2015-06-26 20:53:41.7470000 1728318 520146 1728318 0

TempDB growth control

It feels really good when I stretch my knowledge, do some research, trust the tried and true, make a recommendation and …IT WORKS!  It really is a beautiful thing.

The most recent case involved ETL jobs. All the ETL work was handled by stored procedures that used temp tables to manipulate and insert data into a staging database. The problem was that tempdb was growing until it consumed all the disk space and the job wouldn’t complete. Acquiring additional storage was going to take some time, so action needed to be taken or thousands of employees would be twiddling their thumbs, waiting for the disks to arrive.

Examining the stored procedures, I noticed that there was an explicit drop of the temp tables (if they existed) at the beginning, but no explicit drop of the temp tables at the end. I suggested we add an explicit drop after each step throughout all the stored procedures in all of the jobs.
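A sketch of the pattern; the procedure, table, and column names below are hypothetical:

```sql
CREATE PROCEDURE dbo.usp_LoadStaging_Step1
AS
BEGIN
    -- Defensive drop at the start, as the original procedures already did
    IF OBJECT_ID('tempdb..#work') IS NOT NULL
        DROP TABLE #work;

    -- Build the intermediate result set in tempdb
    SELECT src.Id, src.Payload
    INTO #work
    FROM dbo.SourceTable AS src;

    INSERT INTO Staging.dbo.TargetTable (Id, Payload)
    SELECT Id, Payload FROM #work;

    -- The fix: explicitly drop the temp table as soon as the step is done,
    -- instead of letting it linger until the procedure or session ends
    DROP TABLE #work;
END;
```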

This resulted in the tempdb going from 160GB growth to 600MB growth.

Cardinality Estimator was Rewritten for SQL Server 2014

The query optimizer’s purpose is to select the best strategy for answering a query. It attempts to achieve this goal by assigning a cost to each potential plan and choosing the plan with the lowest cost. The cost of a plan depends on several different factors, but the most important factor is the number of rows processed by each stage in the plan. The process of getting the number of rows to be processed is referred to as “Cardinality Estimation” or “CE”.

Why was it rewritten? There were some core assumptions that were outdated based on how users write queries and what the data looks like, plus there were some CE issues that needed to be addressed.

How to Enable the new CE

1. Enable at the server level: trace flag 2312. Note that if you enable trace flag 2312, all queries will be forced to use the new CE. Every database will use the new CE.
2. Enable at the database level: set the database compatibility level to 120. This enables the new CE to be used at the database level. No other databases will use the new CE until their compatibility level is changed to 120.
3. Enable at the query level: OPTION (QUERYTRACEON 2312)
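The three options above, sketched in T-SQL; the database and table names are placeholders, and the trace flag commands require appropriate permissions:

```sql
-- 1. Server level: turn on trace flag 2312 globally
DBCC TRACEON (2312, -1);

-- 2. Database level: raise the compatibility level to 120
ALTER DATABASE YourDatabaseName SET COMPATIBILITY_LEVEL = 120;

-- 3. Query level: opt a single query into the new CE
SELECT *
FROM dbo.SomeTable
OPTION (QUERYTRACEON 2312);
```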

How to Disable the new CE and revert to old CE

1. Disable at the server level: trace flag 9481. Note that if you enable trace flag 9481, all queries will be forced to use the old CE. Every database will use the old CE.*
2. Disable at the database level: set the database compatibility level to < 120. This enables the old CE to be used at the database level.
3. Disable at the query level: OPTION (QUERYTRACEON 9481)

*Note: If you enable 9481 at the server level, then all queries will be forced to use the old CE unless you have QUERYTRACEON for a particular query.

What happens when you enable both trace flag 9481 and 2312?

If you enable both trace flags, neither will be used to determine or decide which CE will be used. Instead, database compatibility level will be used to decide.

The new CE has not made any improvement in the following known constructs:

1. Table variable (estimates 1 row)
2. Multi-statement table valued function (TVF) (estimates 1 row)
3. Table valued parameter (TVP) (rows won’t change upon first compile)
4. Local variable (estimates as ‘unknown’)
5. Recursive CTE (generally very low estimate)
6. Parameter sniffing
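For example, the 1-row table variable estimate (first item above) is easy to observe in an estimated execution plan; this is a minimal demonstration, not from the original post:

```sql
DECLARE @t TABLE (id INT);

-- Load a few hundred rows into the table variable
INSERT INTO @t
SELECT object_id FROM sys.objects;

-- In the estimated plan, the scan of @t shows an estimated row count of 1,
-- regardless of how many rows were actually inserted
SELECT * FROM @t;
```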

Database Mirroring Time-Out Mechanism

Error: 1479, Severity: 16, State: 2

The mirroring connection to “TCP://ComputerName:PortNumber” has timed out for database “DatabaseName” after 10 seconds without a response. Check the service and network connections.

Are you synchronous or asynchronous? If you do not need automatic failover, increasing the timeout may prevent false positives that can occur during times of heavy load. If you set the timeout to 60 seconds, then failures that resolve within that window won’t even bother the existing setup.
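Changing the timeout is a one-line command run on the principal; the database name below is a placeholder:

```sql
-- Raise the mirroring partner timeout from the default 10 seconds to 60
ALTER DATABASE YourDatabaseName SET PARTNER TIMEOUT 60;
```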

The Mirroring Time-Out Mechanism
Because soft errors are not detectable directly by a server instance, a soft error could potentially cause a server instance to wait indefinitely. To prevent this, database mirroring implements its own time-out mechanism, based on each server instance in a mirroring session sending out a ping on each open connection at a fixed interval.
To keep a connection open, a server instance must receive a ping on that connection in the time-out period defined, plus the time that is required to send one more ping. Receiving a ping during the time-out period indicates that the connection is still open and that the server instances are communicating over it. On receiving a ping, a server instance resets its time-out counter on that connection.

If no ping is received on a connection during the time-out period, a server instance considers the connection to have timed out. The server instance closes the timed-out connection and handles the time-out event according to the state and operating mode of the session. Even if the other server is actually proceeding correctly, a time-out is considered a failure. If the time-out value for a session is too short for the regular responsiveness of either partner, false failures can occur. A false failure occurs when one server instance successfully contacts another whose response time is so slow that its pings are not received before the time-out period expires.

In high-performance mode sessions, the time-out period is always 10 seconds. This is generally enough to avoid false failures. In high-safety mode sessions, the default time-out period is 10 seconds, but you can change the duration. To avoid false failures, don’t set your timeout to less than 10 seconds.

If you notice that you get timeouts at certain times of the day, or when one of your SQL Agent jobs runs, you could schedule a script to run just before that time of day (or just before that job executes) that raises the timeout, and another script that runs at the end of the window (or when the job completes) that changes it back.

If you experience these types of resource hits that cause mirroring to time out, you should get to the root cause of your performance issue. Increasing the mirroring timeout is just a temporary band-aid.

Maintenance Plans – I don’t think they are doing it right

Are they rebuilding all their indexes and then updating stats at 20%?
Or, perhaps, rebuilding all indexes where fragmentation is greater than 50%, reorganizing all indexes where fragmentation is greater than 20% and not updating any statistics at all?
Or, my personal favorite, autogrow in 1MB increments followed by a weekly shrink. OH YEAH! http://youtu.be/OG_6CopW9GQ

Objectively point out where the maintenance plan does not line up with this link: http://technet.microsoft.com/en-us/magazine/2008.08.database.aspx?pr=blog

Let the data do the talking. If they still circular file your recommendations, then you have to do a risk assessment on whether it is worth having the DBAs hate you when you go over their heads.
