Tuesday, July 14, 2015

Are Your VS Database Projects Dropping and Recreating Constraints?

I work with Visual Studio SSDT database projects extensively in my day-to-day work.  In fact, most of our application databases that are involved in our weekly software release cycle are in source control using Visual Studio SSDT projects.  These databases are all automatically deployed to Dev, QA, Staging, and Production and (most of the time) it works wonderfully!  Including our application databases in our release automation processes has drastically reduced deployment errors and has enabled us to track down most problems before they ever get to production.

There are a few specific situations where the database project cannot be realistically deployed using release automation.  Here are two scenarios that generally force us to manually deploy changes:
  1. A new non-nullable column with a default constraint needs to be added to an existing table
  2. An index needs to be changed
Typically, these situations arise when the table is very large and/or the deployment requires multiple schema changes that have to be deployed in a sequential order.  In any of these cases we simply execute the changes manually on the database using a script, then commit the changes to source control so the project reflects the new database schema.  When the automated deployment executes it doesn't actually make any changes because they're already in place.
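
For the first scenario, the manual script is typically a single ALTER TABLE statement run during a quiet window.  Here's a minimal sketch (the table, column, and constraint names are hypothetical):

-- Add a non-nullable column with a default constraint
-- to an existing (large) table in one statement.
ALTER TABLE dbo.Orders
    ADD IsArchived BIT NOT NULL
        CONSTRAINT DF_Orders_IsArchived DEFAULT (0);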

All-in-all, I'm very happy with the database projects and I would recommend using them.

The Publish Profile

When the database schema is published to the database server, it is recommended to use a publish profile.  A publish profile defines all of the rules that control the behavior of the publish operation.  In our case, one of the rules we enforce is keyword case sensitivity.  We enable this rule in both the project and in the publish profile.  Some would argue that if you're using a case-insensitive collation such as SQL_Latin1_General_CP1_CI_AS then you don't need to enforce keyword case sensitivity in the database project.  Aside from code aesthetics, there are some very specific performance reasons for writing T-SQL statements consistently.

According to Microsoft KB Article 263889, inconsistent stored procedure casing can cause cache misses:
"the request for a COMPILE lock can cause a "blocking chain" situation if there are many SPIDs trying to execute the same procedure with a different case ... the algorithm that is being used to find the procedure in cache is based on hash values (for performance reasons), which can change if the case is different"
This kind of behavior also applies to ad-hoc statements being executed.  In general, enforcing case sensitivity in your database code is a good thing.
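
You can see this for yourself.  Run the same ad-hoc statement with two different casings, then check the plan cache; each casing gets its own entry.  A minimal sketch, with a hypothetical dbo.Person table:

SELECT * FROM dbo.Person WHERE LastName = 'Smith';
select * from dbo.Person where LastName = 'Smith';

-- Two cached plans appear, one per casing:
SELECT t.text, p.usecounts
FROM sys.dm_exec_cached_plans AS p
CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
WHERE t.text LIKE '%dbo.Person%'
  AND t.text NOT LIKE '%dm_exec%';  -- exclude this query itself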

I Am Seeing Objects Being Dropped and Recreated

I've recently noticed in our deployment logs that many constraints and even some indexes were being consistently dropped and recreated.  During some of the database publish tasks, the log showed over 100 such objects being dropped and recreated.

I even found a Microsoft Connect issue opened for this problem that has remained unresolved since 2013: https://connect.microsoft.com/VisualStudio/feedback/details/793433/ssdt-generated-script-drops-and-re-adds-default-and-check-constraints-every-time

This problem baffled me.  I even have several database schema compare projects (.scmp) configured and saved in each solution (to create one, right-click the database project and select Compare Schema to generate a corresponding .scmp file).  I use these compare profiles to verify what will change when my database project gets published.  They're handy, and I recommend doing the same and running a compare prior to checking in your code.

I See No Differences In The Compare Project, So What's the Problem?

I use the same options in the schema compare project that I use in the publish profile... or so I thought.  Configuring the schema compare project is a bit tricky because its options aren't represented the same way as the options in the publish profile.  I finally discovered, thanks to a helpful Stack Overflow post, that my compare profile was missing a setting!  There is an option called Ignore keyword casing that the publish profile had unchecked (because we want a case-sensitive comparison) but the schema compare project had checked.


With the option unchecked, I was now able to see the issue.

So What Was the Issue?

The root cause is that when the compiled project file (the .dacpac file) is compared with the actual SQL Server database, the schema on the SQL Server side may be represented slightly differently, syntactically, than the script defined in the project.  This is detected as a difference, so SSDT drops and recreates the object to make them consistent.

How do I fix it?

The only way to fix this issue is to review the differences with the schema compare project and edit the code in the project so it matches the format that SSDT detects in SQL Server.

Here are some examples that I found repeated throughout the database projects that I work with (a before/after sketch follows the list):
  1. Some system functions are represented as lowercase in SQL Server, like sysutcdatetime(), getutcdate(), and newid()
  2. Scalar numerical values in default constraints need to be enclosed in double parentheses ((0)), but I noticed that string values do not
  3. Filtered index and check constraint conditions must be enclosed in parentheses (), the column names must be in square brackets, and the comparison operator must not have spaces between the left and right side (e.g. WHERE ([myCol]=(1)) )
  4. Some columns must be enclosed in square brackets []
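
For example, with a hypothetical dbo.Orders table, the project-side and server-side scripts might differ like this:

-- As written in the project (triggers a drop/recreate on every publish):
ALTER TABLE [dbo].[Orders]
    ADD CONSTRAINT [DF_Orders_Qty] DEFAULT 0 FOR [Qty];
CREATE INDEX [IX_Orders_Active] ON [dbo].[Orders] ([Qty]) WHERE IsActive = 1;

-- As SQL Server reports it back (what the project should contain):
ALTER TABLE [dbo].[Orders]
    ADD CONSTRAINT [DF_Orders_Qty] DEFAULT ((0)) FOR [Qty];
CREATE INDEX [IX_Orders_Active] ON [dbo].[Orders] ([Qty]) WHERE ([IsActive]=(1));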

Conclusion

It was a bit of a headache for a few days until I figured out the problem.  With the comparison project options set correctly, I was eventually able to review all of the projects and fix these issues, which significantly reduced the overall publish duration in a few cases.  It would be best if Microsoft addressed the Connect issue, but this is one issue that can be managed fairly easily once you know what to look for; it just takes some due diligence.

Thanks for reading, I hope this helped a few of you out there!

Friday, July 10, 2015

SQL Server - New Standard Edition Features in SQL Server 2016

I love going to the local tech meetups, and yesterday (7/9/2015) I attended the monthly NYC SQL Server User Group where @aaronbertrand discussed new standard edition features that were introduced in SQL Server 2014 as well as upcoming features slated for SQL Server 2016.  Microsoft tends to focus on their enterprise edition offering, yet many DBAs use standard edition because of the price point, so I thought I'd share this list of features with you along with some links I found to posts discussing these features for further reading.

What's New for SQL Server 2014 Standard Edition

Improved Cardinality Estimator

Joe Sack from SQLSkills wrote a nice white paper that he linked to on his SQLSkills blog called Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

Parallel SELECT * INTO...

Another Joe Sack post on the SQLPerformance site called Exploring SQL Server 2014 SELECT INTO Parallelism discusses this new feature and the benefits.

TempDB Eager Write Fix

This short blog post by Bob Dorr from the PSSQL team quickly explains how, just by upgrading to SQL 2014, you could see significant performance benefits from the improvements to how tempdb write operations are managed.

Buffer Pool Extension

This feature is great for those of us forced to run SQL on a limited resource server or a highly dense VM infrastructure.  Brent Ozar has a nice write-up called SQL Server 2014 Buffer Pool Extensions for this feature on his blog.

Delayed Durability

Aaron Bertrand discusses this feature on his own blog in a post called Delayed Durability in SQL Server 2014.  While this feature can grant you some performance improvements, you will introduce some level of potential data loss if a problem occurs.

Inline Index Declarations

Here's a quick write-up by Junaith Haja on SQLCast.com called Inline Specification of Indexes explaining how to create indexes at table creation time.

sys.dm_exec_query_profiles

This is a new DMV that shows query plan progress.  SET STATISTICS PROFILE (or STATISTICS XML) must be on in the monitored session in order to use it.  There is a visualization tool for this in 2016 (mentioned below).  This blog post by David Barbarin on dbi-services.com called SQL Server 2014: sys.dm_exec_query_profiles, a new promising feature explains this new feature in detail.
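
Usage looks something like this (a minimal sketch; the session id is hypothetical):

-- In the session being monitored:
SET STATISTICS PROFILE ON;
-- ... run the long query ...

-- From another session, watch per-operator progress:
SELECT session_id, node_id, physical_operator_name,
       row_count, estimate_row_count
FROM sys.dm_exec_query_profiles
WHERE session_id = 53;  -- hypothetical SPID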

Native Backup Encryption

Native encryption for SQL Server backups is huge!  Nicolas Souqet explains this feature in great detail in his MSDN post called SQL Server 2014 Backup Encryption.
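
The syntax is a small addition to the BACKUP statement.  A minimal sketch (the database, path, and certificate names are hypothetical, and the certificate must already exist in master):

BACKUP DATABASE MyDatabase
TO DISK = N'X:\Backups\MyDatabase.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = MyBackupCert);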


What's New for SQL Server 2016 Standard Edition


Wider Indexes

They're going to make similar indexing changes to the ones they made for Azure alongside SQL 2014.  You'll be able to have 32 key columns instead of 16 and, at least for nonclustered indexes, a 1700-byte key instead of 900.  I couldn't find any post on this feature for SQL 2016, but here's a post on sqlmag.com called Indexes in Azure SQL Database v12 explaining how they work in Azure.

Stretch Database

This feature will let you store part of a table's data as read-only in the cloud.  It is similar to a distributed partitioned view.  This article called SQL Server 2016: Stretch Database on infoq has more details, including the limitations.

Configure TempDB On Setup

Aaron Bertrand from SQLSentry has a very detailed write-up on this new feature in his post called SQL Server 2016 : Getting tempdb a little more right.

Smarter Maintenance Plans

Aaron didn't really go into detail on this during his talk and I couldn't find much online regarding the improvements.  I did find one PowerPoint presentation by Kat Long from xtivia.com that bullets a few of the improvements:
  • Perform Checksum
  • Continue on Error
  • Block Size
  • Max Transfer Size
If anyone can find more information please let me know.


SSMS Automatic Check for Updates

In order to keep up with the fast-paced cloud offerings, in SQL 2016 SQL Server Management Studio Now Has Its Own Upgrade Path independent of the normal release cycle.

.Net 3.5 Dependency (Might Be) Going Away

That's about all that we know.  Aaron didn't seem too sure about this one either, but it's one to keep an eye out for.

New Extended Events for Backup / Restore Operations

Aaron Bertrand discusses New Backup and Restore Extended Events in this SQLPerformance.com post.  They give you a much more granular view of the backup / restore operations to help troubleshoot problems.

New Query Hints

Aaron mentioned that some new query hints called min_grant_request and max_grant_request will be available for governing memory resources at the query level.  This would be particularly useful for a developer who is writing a large ETL / reporting query and wants to intentionally throttle its memory usage.  This is not a replacement for resource governor.
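
As an illustration only (and assuming the hints ship under the percent-based names MIN_GRANT_PERCENT / MAX_GRANT_PERCENT rather than the names mentioned in the talk), usage would look something like this with a hypothetical table:

SELECT CustomerID, SUM(Amount)
FROM dbo.BigFactTable
GROUP BY CustomerID
OPTION (MAX_GRANT_PERCENT = 10);  -- cap the memory grant at 10% of workspace memory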

SQL Query Store

It's possible this feature will remain enterprise only.  This is a very helpful feature that keeps track of query plans and shows regressions.  Benjamin Nevarez posted an article called The SQL Server Query Store on sqlperformance.com which discusses this new feature in detail.

JSON Support

If you're like me and many other DBAs, then you know that experience with NoSQL databases like MongoDB is an inevitable part of the future career path of many database administrators, and with that will come JSON.  SQL Server 2016 natively supports JSON in a similar way to how it supports XML.  Aaron Bertrand wrote an entire article on his SQLSentry blog called SQL Server 2016 : JSON Support that goes into detail about the feature.
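
A minimal sketch of the two directions (the dbo.Users table is hypothetical):

-- Rows to JSON:
SELECT FirstName, LastName
FROM dbo.Users
FOR JSON PATH;

-- JSON to a scalar value:
SELECT JSON_VALUE(N'{"user":{"name":"Jane"}}', '$.user.name');  -- returns 'Jane'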

Additional Online Index Support

Aaron glossed over this item quickly during the talk.  I couldn't find anything after doing some Google searches.  We'll just have to keep an eye out for this.

Temporal Table Support

This is a cool feature that could possibly be used in place of Change Data Capture.  It allows any logical record to have a current version and zero or more historical versions that can be queried using some newly available syntax.  Additionally, this feature is ANSI standard.  Jonathan Allen posted a nice article on infoq called SQL Server 2016: Temporal Tables that explains this feature in more detail.
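
A minimal sketch of the syntax (table and column names are hypothetical):

CREATE TABLE dbo.Employee (
    EmployeeID INT PRIMARY KEY,
    Salary MONEY NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- Query the table as it looked at a point in time:
SELECT * FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2015-07-01T00:00:00';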

Live Query Stats

Another cool feature is that SSMS 2016 adds visual support to the sys.dm_exec_query_profiles DMV mentioned above.  Jonathan Allen also posted an article about this feature on infoq called SQL Server 2016: Live Query Statistics that explains this in more detail.

sys.dm_exec_function_stats

Having trouble with developers adding functions to your stored procedures?  Well, now you can show them exactly how detrimental that function is to their procedure with this new DMV.  Aaron Bertrand has a nice write-up of this feature on sqlperformance.com called SQL Server 2016 : sys.dm_exec_function_stats.
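
A minimal sketch of what that conversation might look like:

-- Which scalar functions are burning the most CPU?
SELECT OBJECT_NAME(object_id, database_id) AS function_name,
       execution_count,
       total_worker_time,
       total_elapsed_time
FROM sys.dm_exec_function_stats
ORDER BY total_worker_time DESC;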

New Diagnostic Columns in sys.dm_exec_query_stats

This troubleshootingsql.com article called SQL Server 2016 Public Preview (CTP2) – sys.dm_exec_query_stats lists the 24 new columns available in the sys.dm_exec_query_stats DMV and what they do.

sys.dm_exec_session_wait_stats

This is a new DMV for breaking out wait stats by session.  A Google search yielded nothing for me, so we'll have to wait and see about this one.
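
If it mirrors sys.dm_os_wait_stats with a session_id column added (a reasonable assumption given the name), usage would look something like this:

-- Wait stats for the current session only:
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @@SPID;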

Row Level Security

Previous versions of SQL Server allowed you to secure table data by columns, but new to SQL 2016 is a feature that allows you to secure table data by rows.  This MSDN article explains the Row Level Security feature and how it works.
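
The feature is built from a predicate function plus a security policy.  A minimal sketch (the table, function, and policy names are hypothetical):

-- Users may only see rows where SalesRep matches their user name:
CREATE FUNCTION dbo.fn_SalesFilter (@SalesRep AS SYSNAME)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS allowed WHERE @SalesRep = USER_NAME();
GO
CREATE SECURITY POLICY dbo.SalesPolicy
ADD FILTER PREDICATE dbo.fn_SalesFilter(SalesRep) ON dbo.Sales;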

Dynamic Data Masking

On the surface, this feature looks awesome.  It allows you to mask data from the client, and who wouldn't want the ability to do that?!  Within about 20 minutes of running some custom test queries, Aaron, with the help of the participants at the user group, was able to show that this feature is only minimally useful.  There are still some valid use cases, but not as many as I would have liked.  Here's an article from Pinal Dave on the sqlauthority site called SQL Server – 2016 – New Feature: Dynamic Data Masking explaining the feature in detail.

What the feature does is allow you to show garbage data to certain users.  You provide a masking function which, for the most part, does a character replacement.

The problem lies in the fact that this is basically a filter applied in the SELECT clause.  It doesn't actually change the data stored in the database.  Here are some ways we found to subvert the masking:
  • you can still search for a value within WHERE clauses (so if the value 12345 is masked to 1XXXX, you can still do WHERE myCol = 12345 and the row will be returned)
  • a SELECT ... INTO, regardless of whether it's a temporary or persisted table, will contain the unmasked data
Aside from restricting viewing access via the application or during a presentation, I'm not sure how valuable this feature will be.
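
A minimal sketch of defining a mask and the WHERE-clause workaround (the table and values are hypothetical):

CREATE TABLE dbo.Customer (
    CustomerID INT,
    SSN VARCHAR(11) MASKED WITH (FUNCTION = 'partial(1,"XXXX",0)')
);

-- A user without the UNMASK permission sees '1XXXX'-style values,
-- but this still returns the row, confirming the real value:
SELECT * FROM dbo.Customer WHERE SSN = '123-45-6789';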

Always Encrypted

This is another great feature available in 2016.  For those of you that need to encrypt data at rest, it may save some trouble with having to use third party tools.  The keys are now stored on the client side, off the server, which will help satisfy some of those pesky security requirements where TDE fell short.

Jonathan Allen has a nice post on infoq called SQL Server 2016: Always Encrypted that goes into detail about this feature.
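
A minimal sketch of the column syntax (the table and key names are hypothetical; deterministic encryption on strings also requires a BIN2 collation):

CREATE TABLE dbo.Patients (
    PatientID INT PRIMARY KEY,
    SSN CHAR(11) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
            COLUMN_ENCRYPTION_KEY = CEK_Auto1,  -- hypothetical key name
            ENCRYPTION_TYPE = DETERMINISTIC,
            ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
);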

New Password Hashing Functionality

I couldn't find any information online in regards to this feature, but one interesting item that Aaron pointed out during his talk is that creating a login from a pre-hashed password with CHECK_POLICY = ON silently fails to enforce the policy, yet is_policy_checked still shows a value of 1.  Aaron has a write-up for this issue in his post called Please help me fix a big security loophole. (See Connect Issue 1250342 for more details.)

High Availability

AlwaysOn availability groups may become available in standard edition as "basic availability groups," with some limitations:
  • no listener support
  • no readable secondaries
  • single database per group

Replication to Azure

This was mentioned as a SQL 2016 feature but it seems to be available in 2014 as well: Use the Add Azure Replica Wizard (SQL Server)

Additional Notes

There were a few takeaways from the meetup that were mentioned but weren't necessarily SQL 2014/2016 specific features.

sys.dm_db_persisted_sku_features

This DMV can be used to check which edition-restricted features are in use in the current database, which is handy for finding enterprise-only features before trying to move a database to standard edition.
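
Usage is a simple select (it returns no rows if the database uses no edition-restricted features):

SELECT feature_name, feature_id
FROM sys.dm_db_persisted_sku_features;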

Development Tricks

There are some nice ways for developers to create a test database that will act like a production database without having all of the production data (a small sketch of the last trick follows the list).
  • Create an empty database but add all of the production statistics and histograms, called a stats-only database (don't forget to turn off auto update stats).  Matteo Lorini posted a nice article on mssqltips.com called How to create a SQL Server Clone Statistics Only Database which goes into detail about how to do this.
  • DBCC OPTIMIZER_WHATIF allows you to see what a query plan would look like if the hardware were different.  This article on sqlity.net called Optimizer, what if I had more CPUs? has more information.
  • DBCC SHOWWEIGHTS, SETCPUWEIGHT, and SETIOWEIGHT allow you to see what plans would look like if the CPU and IO costs change.  This article by Paul White called Inside the Optimizer: Plan Costing goes into more detail about how the plan is generated and how these undocumented features work.
  • UPDATE STATISTICS ... WITH ROWCOUNT will allow you to generate statistics based on false row count information.
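
A minimal sketch of the last trick (undocumented, so use it only on a test system; the table name is hypothetical):

-- Make the optimizer cost plans as if the table had 10 million rows
-- spread over 100,000 pages:
UPDATE STATISTICS dbo.Orders WITH ROWCOUNT = 10000000, PAGECOUNT = 100000;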

Summary

There are so many new features being introduced in SQL Server 2016, with a good number of them being released for standard edition.  Many of these features attempt to align the feature set with the cloud-based Azure offering.

Thursday, July 9, 2015

SQL Server - Comparing Partially Disparate Lists

What's the Situation?

I was given a source list of users and three other lists of users.  I needed to find all of the users in all of the lists in order to run some analysis.  This sounds simple, but doing it by hand is a bit tedious when the lists are large, and doing it in SQL Server is a bit tricky to get just right.

The original lists were given as flat files.  Each list consisted of a first name and a last name.  I did some data massaging, then imported the files into SQL Server using the Import/Export wizard.  The result was 4 tables of two columns (FirstName and LastName).  Now the fun begins...

Test Case Setup

In order to set up a test case so you can follow along, create 4 tables:

CREATE TABLE TableA (
   FirstName VARCHAR(255) NULL
,  LastName VARCHAR(255) NULL
)
CREATE TABLE TableB (
   FirstName VARCHAR(255) NULL
,  LastName VARCHAR(255) NULL
)
CREATE TABLE TableC (
   FirstName VARCHAR(255) NULL
,  LastName VARCHAR(255) NULL
)
CREATE TABLE TableD (
   FirstName VARCHAR(255) NULL
,  LastName VARCHAR(255) NULL
)

Then load some dummy data:

INSERT INTO TableA VALUES ('John', 'Smith'), ('Bob', 'Gray'), ('Jane', 'Doe'), ('Gary', 'Thompson')
INSERT INTO TableB VALUES ('Bob', 'Gray'), ('Gary', 'Thompson'), ('Theodore', 'Edwards')
INSERT INTO TableC VALUES ('John', 'Smith'), ('Bob', 'Gray'), ('Gary', 'Thompson'), ('Theodore', 'Edwards')
INSERT INTO TableD VALUES ('John', 'Smith'), ('Gary', 'Thompson'), ('Theodore', 'Edwards'), ('Mike', 'Jones')

Making the Report

In order to see who belongs where, we obviously need a join.  The question becomes: which join do I use and how do I use it to get the results I need (a list of user membership in each table, with one row per user)?

In this case, I only needed to join on LastName because all last names were unique.

Using Left Joins

A LEFT OUTER JOIN would suffice if we only cared about the users in the source list.  If this were the case, the query below would have been sufficient:

SELECT
   a.FirstName AS TableA_FirstName
,  a.LastName AS TableA_LastName
,  b.FirstName AS TableB_FirstName
,  b.LastName AS TableB_LastName
,  c.FirstName AS TableC_FirstName
,  c.LastName AS TableC_LastName
,  d.FirstName AS TableD_FirstName
,  d.LastName AS TableD_LastName
FROM
   TableA a
   LEFT JOIN TableB b
      ON a.LastName = b.LastName
   LEFT JOIN TableC c
      ON a.LastName = c.LastName
   LEFT JOIN TableD d
      ON a.LastName = d.LastName
ORDER BY
   COALESCE(a.LastName, b.LastName, c.LastName, d.LastName)

The problem with the left join is that we only see 4 rows instead of 6 because two of the names (Theodore Edwards and Mike Jones) don't exist in TableA.


Using Full Outer Joins

What about users that belong only in one of the other three lists?  A FULL OUTER JOIN can help solve this:

SELECT
   a.FirstName AS TableA_FirstName
,  a.LastName AS TableA_LastName
,  b.FirstName AS TableB_FirstName
,  b.LastName AS TableB_LastName
,  c.FirstName AS TableC_FirstName
,  c.LastName AS TableC_LastName
,  d.FirstName AS TableD_FirstName
,  d.LastName AS TableD_LastName
FROM
   TableA a
   FULL OUTER JOIN TableB b
      ON a.LastName = b.LastName
   FULL OUTER JOIN TableC c
      ON a.LastName = c.LastName
   FULL OUTER JOIN TableD d
      ON a.LastName = d.LastName
ORDER BY
   COALESCE(a.LastName, b.LastName, c.LastName, d.LastName)

The problem now is that if a user belongs to at least two of the other lists (b, c, or d) but not the first list (a), the result becomes skewed, creating a new row for every additional entry after the first one found.  Using the example data and the query above we now have 8 rows instead of 6 because Theodore Edwards exists in lists b, c, and d but doesn't exist in list a.  There's an additional row for lists c and d.


Using Full Outer Joins With Complex ON Conditions

To fix the row skew issue we can add ON conditions to TableC and TableD so all possible tables are accounted for.

SELECT
   a.FirstName AS TableA_FirstName
,  a.LastName AS TableA_LastName
,  b.FirstName AS TableB_FirstName
,  b.LastName AS TableB_LastName
,  c.FirstName AS TableC_FirstName
,  c.LastName AS TableC_LastName
,  d.FirstName AS TableD_FirstName
,  d.LastName AS TableD_LastName
FROM
   TableA a
   FULL OUTER JOIN TableB b
      ON a.LastName = b.LastName
   FULL OUTER JOIN TableC c
      ON a.LastName = c.LastName
      OR b.LastName = c.LastName
   FULL OUTER JOIN TableD d
      ON a.LastName = d.LastName
      OR b.LastName = d.LastName
      OR c.LastName = d.LastName
ORDER BY
   COALESCE(a.LastName, b.LastName, c.LastName, d.LastName)

Now the result looks the way we want it to.  Only one row per matching name across all 4 tables (lists).
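
For the sample data above, the final query returns six rows, one per person (LastName columns shown for brevity):

TableA_LastName  TableB_LastName  TableC_LastName  TableD_LastName
---------------  ---------------  ---------------  ---------------
Doe              NULL             NULL             NULL
NULL             Edwards          Edwards          Edwards
Gray             Gray             Gray             NULL
NULL             NULL             NULL             Jones
Smith            NULL             Smith            Smith
Thompson         Thompson         Thompson         Thompson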


Summary

Though this trick may not come in handy in day-to-day operations, I find it helpful when performing ad-hoc analysis of data.  This type of analytic querying may be helpful for reviewing lists of SQL logins, lists of database users, or lists of Active Directory security group membership, to name a few.