Simple Talk Newsletter, 5th September 2011

SQL Code Reuse: teaching a dog new tricks

Published Thursday, September 01, 2011 5:42 PM

Developers, by every natural instinct and training, strive to make their code reusable and generic. Dissuading them from doing so, in certain circumstances, is akin to trying to persuade a dog not to fetch a stick. However, when Gail Shaw commented on Twitter last week that "For the record, code reuse in SQL is not always a good thing", it was more than a casual remark; it was borne out of bitter experience. The problem is that, in the absence of their usual armory of O-O techniques such as encapsulation and inheritance, the price of making database code easier to maintain, by such obvious methods, can be high. The "generic" views, stored procedures and functions that result may seem elegant and reusable, but they can destroy performance, because it is tough for the query optimizer to produce an efficient execution plan for them. It hurts to make SQL code generic.

At some point, nearly every SQL programmer gets infected with the feverish idea of passing table names to stored procedures. "Hey, why write scores of procedures to do this process on each table when I can write a generic, reusable procedure that does it on any table!" Bad idea: behind every stored procedure is an execution plan, and a stored procedure designed to work with "any table" will result in a generic execution plan that performs very poorly for the majority of tables. It is far better if procedures are tailored for specific tables and specific needs.

Another typical example is where the logic for some seemingly-complex calculation has been "abstracted" into a monstrous, but reusable, view, which performs tortuous aggregations and multiple joins, executes appallingly slowly, acquires numerous long-held locks and causes severe blocking in the database. Often, such twisted logic can be replaced by simple, easily optimized SQL statements. Granted, it isn't "reusable" and it flouts the 'DRY' (Don't Repeat Yourself) principle, but it is relatively easy to write and will often perform orders of magnitude faster.

User-defined functions (UDFs) are another favorite mechanism for promoting code reuse, and are often even more problematic. In-line logic is always much faster, even if to the sensitive developer it has the look of hippos doing line-dancing. Memories of the overuse of UDFs can make any seasoned DBA flinch. If you ever bump into Grant Fritchey at a community event, buy him a beer and ask him about the case of the application with multi-statement UDFs that called other multi-statement UDFs in an attempt at enforcing inheritance in a database. Also ask him how well it scaled beyond a single user and a single row.

Should SQL Server simply get better at adopting and supporting such basic and universally-accepted programming practices as putting logic in a function? Probably, yes, but in the meantime we must measure any code reuse in the database against the likely performance penalty. Perhaps the most effective form of code reuse is via constraints, though it requires lateral thinking to extend this beyond simple data rules. Functions can be used, but extra care and effort is required to write them as inline functions; in-line code or calculated columns will always outperform UDFs. Stored procedure use is to be actively encouraged; just don't try to make them generic.
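As a minimal sketch of the difference (the procedure and table names here are hypothetical, not taken from any real application), compare a "generic" procedure that accepts a table name and builds dynamic SQL with one written for a specific table:

    -- A "generic" procedure: it works against any table, but every distinct
    -- table name produces a different ad-hoc statement, so the optimizer
    -- cannot give you one plan tailored to one table's statistics.
    CREATE PROCEDURE dbo.CountRows_Generic @TableName sysname
    AS
    BEGIN
        DECLARE @sql NVARCHAR(MAX)
        -- QUOTENAME guards against SQL injection, another cost of being "generic"
        SET @sql = N'SELECT COUNT(*) FROM ' + QUOTENAME(@TableName)
        EXEC sp_executesql @sql
    END
    GO

    -- A table-specific procedure: compiled once against dbo.Customers and its
    -- statistics, with a cached plan that is reused on every call.
    CREATE PROCEDURE dbo.CountCustomers
    AS
    BEGIN
        SELECT COUNT(*) FROM dbo.Customers
    END
    GO

The generic version also has to be defended against SQL injection, which is yet another price of making it reusable.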
On Simple-Talk we've published a lot about execution plans, query optimization and performance. We believe that, once a developer is aware of the process, they are better able to judge that fine balancing point in the compromise between performance and maintainability. Even better, we hope we've also given a glimpse of an alternative path to those goals, by means of intelligent database design. A neat trick, if you can do it.

Cheers, Tony.

by Tony Davis

Temporary Tables in SQL Server

01 September 2011
by Phil Factor

Temporary tables are used by every DB developer, but they're not likely to be too adventurous with their use, or to exploit all their advantages. They can improve your code's performance and maintainability, but they can also be a source of grief to both developer and DBA if things go wrong and a process grinds away inexorably slowly. We asked Phil for advice, thinking that it would be a simple explanation.

Temporary tables are just that. They are used most often to provide workspace for the intermediate results when processing data within a batch or procedure. They are also used to pass a table from a table-valued function, to pass table-based data between stored procedures or, more recently in the form of table-valued parameters, to send whole read-only tables from applications to SQL Server routines, or to pass read-only temporary tables as parameters. Once finished with, they are discarded automatically.

Temporary tables come in different flavours including, amongst others, local temporary tables (names starting with #), global temporary tables (names starting with ##), persistent temporary tables (prefixed by TempDB..), and table variables (names starting with @).

Before we get too deep into the technology, I'd advise that you should use table variables where possible. They're easy, and SQL Server does the work for you. They also tend to cause fewer problems to a hard-working OLTP system. Just occasionally, you may need to fine-tune them to get good performance, but I'll explain that in a moment.

Table Variables

Table variables are used within the scope of the routine or batch within which they are defined, and were originally created to make table-valued functions possible. However, they are good for many of the uses that the traditional temporary table was put to. They behave like other variables in their scoping rules: once out of scope, they are disposed of. They are much easier to work with, and pretty secure, and they trigger fewer recompiles in the routines where they're used than temporary tables do. Table variables require fewer locking resources, as they are 'private' to the process that created them. Transaction rollbacks do not affect them, because table variables have limited scope and are not part of the persistent database, so they are handy for creating or storing data that ought to survive rollbacks, such as log entries. The downside of table variables is that they are often disposed of before you can investigate their contents for debugging, or use them to try out different SQL expressions interactively. If your application is conservative and your data volumes light, you'll never want anything else.
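As a minimal sketch of that rollback behaviour (the table and column names are mine, purely for illustration), compare a table variable with a local temporary table across a rolled-back transaction:

    -- The table variable keeps its row after the ROLLBACK, because it is not
    -- part of the transaction; the temporary table's INSERT is undone.
    DECLARE @Log TABLE (Msg VARCHAR(100))
    CREATE TABLE #Log (Msg VARCHAR(100))

    BEGIN TRANSACTION
    INSERT INTO @Log VALUES ('written inside the transaction')
    INSERT INTO #Log VALUES ('written inside the transaction')
    ROLLBACK TRANSACTION

    SELECT COUNT(*) AS TableVariableRows FROM @Log -- 1: the row survived
    SELECT COUNT(*) AS TempTableRows FROM #Log     -- 0: rolled back

    DROP TABLE #Log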
However, you can hit problems. One difficulty is that table variables can only be referenced in their local scope, so you cannot process them using dynamic SQL as you might with a temporary table or table-valued parameter. This is because you can't refer to an externally-defined table variable within dynamic SQL that you then execute via the EXEC statement or the sp_executesql stored procedure; the dynamic SQL is executed outside the scope of the table variable. You can, of course, create and then use the table variable inside the dynamic SQL, because it would then be in scope; however, once the dynamic SQL has run, the table variable is gone.

There are a few anomalies to be aware of, too. You can't, for example, change the table definition after the initial DECLARE statement; a table variable can't be the destination of a SELECT INTO statement or an INSERT EXEC; and you can't call user-defined functions from CHECK constraints, DEFAULT values, or computed columns in a table variable. The only constraints allowed beyond CHECK constraints are PRIMARY KEY, UNIQUE KEY, and NULL / NOT NULL.

The trickiest problems, though, come as the tables grow, because you can't declare an index explicitly and distribution statistics aren't maintained on them. You also cannot generate parallel query plans for a SQL expression that is modifying the table's contents. To get around the index restriction, you can use constraints to the same effect. Most essential is the PRIMARY KEY constraint, which allows you to impose a clustered index, but UNIQUE constraints are also useful for performance. The query optimiser will happily use them if they are there.

The biggest problem with table variables is that statistics aren't maintained on their columns. This means that the query optimiser has to guess at the size and distribution of the data, and if it guesses wrongly then you're going to see poor performance on joins. If this happens, there is little you can do other than revert to classic local temporary tables. One thing you can try is to add OPTION (RECOMPILE) to the statement in which the table variable joins with other tables. By doing this, SQL Server is able to detect the number of rows at recompile time, because the rows will already have been populated. In this demo, the join time was reduced by three quarters simply by adding OPTION (RECOMPILE):

    SET NOCOUNT ON
    DECLARE @FirstTable TABLE (RandomInteger INT)
    DECLARE @SecondTable TABLE (RandomInteger INT)
    DECLARE @WhenWeStarted DATETIME
    DECLARE @ii INT

    BEGIN TRANSACTION
    SET @ii = 0
    WHILE @ii < 100000
      BEGIN
        INSERT INTO @FirstTable VALUES (RAND() * 10000)
        SET @ii = @ii + 1
      END
    SET @ii = 0
    WHILE @ii < 100000
      BEGIN
        INSERT INTO @SecondTable VALUES (RAND() * 10000)
        SET @ii = @ii + 1
      END
    COMMIT TRANSACTION

    SELECT @WhenWeStarted = GETDATE()
    SET STATISTICS PROFILE ON
    SELECT COUNT(*)
      FROM @FirstTable first
        INNER JOIN @SecondTable second
          ON first.RandomInteger = second.RandomInteger
      OPTION (RECOMPILE) -- 153 ms as opposed to 653 ms without the hint
    SET STATISTICS PROFILE OFF
    SELECT 'That took '
      + CONVERT(VARCHAR(8), DATEDIFF(ms, @WhenWeStarted, GETDATE())) + ' ms'
    GO

Now, if you can make what goes into the tables unique, you can then use a primary key constraint on these tables.
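As a sketch of that final suggestion (it assumes the inserted values really are unique, for instance a sequence of integers rather than the RAND() values used above), the declarations in the demo would become:

    -- With unique values, a PRIMARY KEY in the DECLARE gives each table
    -- variable a clustered index that the optimiser can use for the join.
    DECLARE @FirstTable TABLE (RandomInteger INT PRIMARY KEY)
    DECLARE @SecondTable TABLE (RandomInteger INT PRIMARY KEY)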
