SQL Code Reuse: teaching a dog new tricks

Published Thursday, September 01, 2011 5:42 PM

Developers, by every natural instinct and training, strive to make their code reusable and generic. Dissuading them from doing so, in certain circumstances, is akin to trying to persuade a dog not to fetch a stick. However, when Gail Shaw commented on Twitter last week that "For the record, code reuse in SQL is not always a good thing", it was more than a casual remark; it was borne out of bitter experience. The problem is that, in the absence of their usual armory of O-O techniques such as encapsulation and inheritance, the price of making database code easier to maintain, by such obvious methods, can be high. The "generic" views, stored procedures and functions that result may seem elegant and reusable, but can destroy performance, because it is tough for the query optimizer to produce an efficient execution plan. It hurts to make SQL code generic.

At some point, nearly every SQL programmer gets infected with the feverish idea of passing table names to stored procedures. "Hey, why write scores of procedures to do this process on each table when I can write a generic, reusable procedure that does it on any table!" Bad idea: behind every stored procedure is an execution plan, and a stored procedure designed to work with "any table" will result in a generic execution plan that will perform very poorly for the majority of tables. It is far better if stored procedures are tailored for specific tables and specific needs.

Another typical example is where the logic for some seemingly-complex calculation has been "abstracted" into a monstrous, but reusable, view, which performs tortuous aggregations and multiple joins, executes appallingly slowly, acquires numerous long-held locks and causes severe blocking in the database. Often, such twisted logic can be replaced by simple, easily-optimized SQL statements. Granted, it isn't "reusable" and it flouts the 'DRY' (Don't Repeat Yourself) principle, but it is relatively easy to write and will often perform orders of magnitude faster.

User-defined functions (UDFs) are another favorite mechanism for promoting code reuse, and are often even more problematic. In-line logic is always much faster, even if to the sensitive developer it has the look of hippos doing line-dancing. Memories of the overuse of UDFs can make any seasoned DBA flinch. If you ever bump into Grant Fritchey at a community event, buy him a beer and ask him about the case of the application with multi-statement UDFs that called other multi-statement UDFs in an attempt at enforcing inheritance in a database. Also ask him how well it scaled beyond a single user and a single row.

Should SQL Server simply get better at adopting and supporting such basic and universally-accepted programming practices as putting logic in a function? Probably, yes, but in the meantime we must measure any code reuse in the database against the likely performance penalty. Perhaps the most effective form of code reuse is via constraints, though it requires lateral thinking to extend this beyond simple data rules. Functions can be used, but extra care and effort is required to write them as inline functions; in-line code or calculated columns will always outperform UDFs. Stored procedure use is to be actively encouraged; just don't try to make them generic.

On Simple-Talk we've published a lot about execution plans, query optimization and performance. We believe that, once a developer is aware of the process, they are better able to judge that fine balancing point in the compromise between performance and maintainability.
Even better, we hope we've also given a glimpse of an alternative path to those goals, by means of intelligent database design. A neat trick, if you can do it.

Cheers,
Tony.

by Tony Davis

Temporary Tables in SQL Server
01 September 2011
by Phil Factor

Temporary tables are used by every DB developer, but they're not likely to be too adventurous with their use, or exploit all their advantages. They can improve your code's performance and maintainability, but can be the source of grief to both developer and DBA if things go wrong and a process grinds away inexorably slowly. We asked Phil for advice, thinking that it would be a simple explanation.

Temporary tables are just that. They are used most often to provide workspace for the intermediate results when processing data within a batch or procedure. They are also used to pass a table from a table-valued function, to pass table-based data between stored procedures or, more recently in the form of table-valued parameters, to send whole read-only tables from applications to SQL Server routines, or to pass read-only temporary tables as parameters. Once finished with, they are discarded automatically.

Temporary tables come in different flavours including, amongst others, local temporary tables (starting with #), global temporary tables (starting with ##), persistent temporary tables (prefixed by TempDB..), and table variables (starting with @).

Before we get too deep into the technology, I'd advise that you should use table variables where possible. They're easy, and SQL Server does the work for you. They also tend to cause fewer problems to a hard-working OLTP system. Just occasionally, you may need to fine-tune them to get good performance from them, but I'll explain that in a moment.
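If you haven't met all of these flavours before, here is a minimal sketch, purely for illustration (the object names are arbitrary), of how each one is declared and tidied away:

CREATE TABLE #LocalTemp (TheValue INT)             -- local temporary table
CREATE TABLE ##GlobalTemp (TheValue INT)           -- global temporary table
CREATE TABLE TempDB..PersistentTemp (TheValue INT) -- ordinary table created in TempDB
DECLARE @TableVariable TABLE (TheValue INT)        -- table variable
-- The table variable simply goes out of scope at the end of the batch; the local and
-- global temporary tables are dropped automatically when the sessions using them end;
-- the table created in TempDB survives until the server restarts, unless you drop it.
DROP TABLE #LocalTemp, ##GlobalTemp, TempDB..PersistentTemp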

Table Variables

Table variables are used within the scope of the routine or batch within which they are defined, and were originally created to make table-valued functions possible. However, they are good for many of the uses to which the traditional temporary table was put. They behave like other variables in their scoping rules. Once out of scope, they are disposed of. They are much easier to work with, and pretty secure, and they trigger fewer recompiles in the routines where they're used than if you were to use temporary tables. Table variables require fewer locking resources, as they are 'private' to the process that created them. Transaction rollbacks do not affect them, because table variables have limited scope and are not part of the persistent database, so they are handy for creating or storing data that ought to survive rollbacks, such as log entries. The downside of table variables is that they are often disposed of before you can investigate their contents for debugging, or use them to try out different SQL expressions interactively.

If your application is conservative and your data volumes light, you'll never want anything else. However, you can hit problems. One difficulty is that table variables can only be referenced in their local scope, so you cannot process them using dynamic SQL as you might with a temporary table or table-valued parameter. You can't refer to an externally-defined table variable within dynamic SQL that you then execute via the EXEC statement or the sp_ExecuteSQL stored procedure, because the dynamic SQL is executed outside the scope of the table variable. You can, of course, create, and then use, the table variable inside the dynamic SQL, because the table variable would then be in scope; however, once the dynamic SQL has run, the table variable is gone.

There are a few anomalies to be aware of, too. You can't, for example, change the table definition after the initial DECLARE statement; a table variable can't be the destination of a SELECT INTO statement or an INSERT EXEC; and you can't call user-defined functions from CHECK constraints, DEFAULT values, or computed columns in the table variable. The only constraints that you're allowed beyond CHECK constraints are PRIMARY KEY, UNIQUE KEY, and NULL / NOT NULL.

The trickiest problems, though, come with increasing table size, because you can't declare an index explicitly and distribution statistics aren't maintained on them. You also cannot generate parallel query plans for a SQL expression that is modifying the table's contents. To get around the index restriction, you can use constraints to do the same thing. Most essential is the PRIMARY KEY constraint, which allows you to impose a clustered index, but UNIQUE constraints are also useful for performance. The query optimiser will happily use them if they are around.

The biggest problem with table variables is that statistics aren't maintained on the columns. This means that the query optimiser has to make a guess as to the size and distribution of the data and, if it gets it wrong, then you're going to see poor performance on joins. If this happens, there is little you can do other than to revert to using classic local temporary tables. One thing you can try is to add OPTION (RECOMPILE) to the statement that involves the table variable joining with other tables. By doing this, SQL Server will be able to detect the number of rows at recompile time, because the rows will already have been populated.
In this demo, the join time was reduced by three-quarters simply by adding OPTION (RECOMPILE):

SET nocount ON

DECLARE @FirstTable TABLE (RandomInteger INT)
DECLARE @SecondTable TABLE (RandomInteger INT)
DECLARE @WhenWeStarted DATETIME
DECLARE @ii INT

BEGIN TRANSACTION
SET @ii = 0
WHILE @ii < 100000
  BEGIN
    INSERT INTO @FirstTable VALUES (RAND() * 10000)
    SET @ii = @ii + 1
  END
SET @ii = 0
WHILE @ii < 100000
  BEGIN
    INSERT INTO @SecondTable VALUES (RAND() * 10000)
    SET @ii = @ii + 1
  END
COMMIT TRANSACTION
SELECT @WhenWeStarted = GETDATE()
SET STATISTICS PROFILE ON
SELECT COUNT(*)
FROM @FirstTable first
  INNER JOIN @SecondTable second
    ON first.RandomInteger = second.RandomInteger
OPTION (RECOMPILE) -- 153Ms as opposed to 653Ms without the hint
SET STATISTICS PROFILE OFF
SELECT 'That took '
       + CONVERT(VARCHAR(8), DATEDIFF(ms, @WhenWeStarted, GETDATE())) + ' ms'
go

Now, if you can make what goes into the tables unique, you can then use a primary key constraint on these tables. This allowed the optimiser to use a clustered index seek instead of a table scan, and the execution time was too rapid to measure. Start with table variables, and drop back to using local temporary tables if you hit performance problems.
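As a minimal sketch of that change (smaller tables than the demo above, populated with values that are guaranteed unique so that the PRIMARY KEY constraint holds):

SET NOCOUNT ON
DECLARE @FirstTable TABLE (RandomInteger INT PRIMARY KEY)
DECLARE @SecondTable TABLE (RandomInteger INT PRIMARY KEY)
DECLARE @ii INT

SET @ii = 0
WHILE @ii < 10000
  BEGIN
    INSERT INTO @FirstTable VALUES (@ii)
    INSERT INTO @SecondTable VALUES (@ii * 2)
    SET @ii = @ii + 1
  END

-- The PRIMARY KEY gives each table variable a clustered index, so the join
-- can be done with index seeks rather than table scans
SELECT COUNT(*)
FROM @FirstTable first
  INNER JOIN @SecondTable second
    ON first.RandomInteger = second.RandomInteger
OPTION (RECOMPILE)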

Table-Valued Parameters

The table-valued parameter (TVP) is a special type of table variable that extends its use. When a table variable is passed as a parameter, the table is materialized in the TempDB system database and passed by reference: a pointer to the table in TempDB. Table-valued parameters have been available since SQL Server 2008 to send several rows of data to a Transact-SQL routine, or to a batch via sp_ExecuteSQL. Their particular value to the programmer is that they can be used within TSQL code as well as in the client application, so they are good for sending client tables to the server. From TSQL, you can declare table-valued variables, insert data into them, and pass these variables as table-valued parameters to stored procedures and functions. Their more general usefulness is limited by the fact that they are passed as read-only: you can't use UPDATE, DELETE, or INSERT statements on a table-valued parameter in the body of a routine.

To use them, you need to create a user-defined table type that defines the table structure. Here is a simple example of their use in TSQL:

/* First you need to create a table type. */
CREATE TYPE Names AS TABLE (Name VARCHAR(10)) ;
GO

/* Next, create a procedure to receive data for the table-valued parameter
   (the table of names) and select one item from the table. */
CREATE PROCEDURE ChooseAName
  @CandidateNames Names READONLY
AS
DECLARE @candidates TABLE (NAME VARCHAR(10), theOrder UNIQUEIDENTIFIER)
INSERT INTO @candidates (name, theorder)
  SELECT name, NEWID() FROM @CandidateNames
SELECT TOP 1 NAME FROM @Candidates ORDER BY theOrder
GO

/* Declare a variable that references the type for our list of cows. */
DECLARE @MyFavouriteCowName AS Names ;

/* Add data to the table variable. */
INSERT INTO @MyFavouriteCowName (Name)
  SELECT 'Bossy' UNION SELECT 'Bessy' UNION SELECT 'petal' UNION SELECT 'Daisy'
  UNION SELECT 'Lulu' UNION SELECT 'Buttercup' UNION SELECT 'Bertha' UNION SELECT 'Bubba'
  UNION SELECT 'Beauregard' UNION SELECT 'Brunhilde' UNION SELECT 'Lore' UNION SELECT 'Lotte'
  UNION SELECT 'Rosa' UNION SELECT 'Thilde' UNION SELECT 'Lisa' UNION SELECT 'Peppo'
  UNION SELECT 'Maxi' UNION SELECT 'Moriz' UNION SELECT 'Marla'

/* Pass the table with the list of traditional names of cows to the stored procedure. */
EXEC chooseAName @MyFavouriteCowName
GO

As with table variables, the table-valued parameter ceases to exist once it is out of scope, but the type definition remains until it is explicitly dropped. Like table variables, they do not acquire locks when the data is being populated from a client, and statistics aren't maintained on the columns of table-valued parameters. You cannot use a table-valued parameter as the target of a SELECT INTO or INSERT EXEC statement; as you'd expect, though, a table-valued parameter can be in the FROM clause of a SELECT INTO, or in the INSERT EXEC string or stored procedure.

The TVP solves the common problem of wanting to pass a local table variable to dynamic SQL that is then executed by sp_ExecuteSQL. It is poorly documented, so I'll show you a worked example to get you started.

DECLARE @SeaAreas TABLE (NAME Varchar(20))
INSERT INTO @SeaAreas (name)
  SELECT 'Viking' UNION SELECT 'North Utsire' UNION SELECT 'South Utsire'
  UNION SELECT 'Forties' UNION SELECT 'Cromarty' UNION SELECT 'Forth' UNION SELECT 'Tyne'
  UNION SELECT 'Dogger' UNION SELECT 'Fisher' UNION SELECT 'German Bight' UNION SELECT 'Humber'
  UNION SELECT 'Thames' UNION SELECT 'Dover' UNION SELECT 'Wight' UNION SELECT 'Portland'
  UNION SELECT 'Plymouth' UNION SELECT 'Biscay' UNION SELECT 'Trafalgar' UNION SELECT 'Finisterre'
  UNION SELECT 'Sole' UNION SELECT 'Lundy' UNION SELECT 'Fastnet' UNION SELECT 'Irish Sea'
  UNION SELECT 'Shannon' UNION SELECT 'Rockall' UNION SELECT 'Malin' UNION SELECT 'Hebrides'
  UNION SELECT 'Bailey' UNION SELECT 'Fair Isle' UNION SELECT 'Faeroes' UNION SELECT 'Southeast Iceland'

CREATE TYPE seanames AS TABLE (Name VARCHAR(20)) ;
DECLARE @SeaAreaNames AS SeaNames ;
INSERT INTO @SeaAreaNames (name)
  SELECT * FROM @SeaAreas
EXEC sp_executesql N'SELECT * FROM @MySeaAreas',
     N'@MySeaAreas [dbo].[seanames] READONLY',
     @MySeaAreas = @SeaAreaNames

Before we move on to describe the more traditional temporary tables and their use, we'll need to delve into the place where temporary tables are held: TempDB.

TempDB

Temporary tables and table variables are created in the TempDB database, which is really just another database with simple recovery: with TempDB, only sufficient 'minimal' logging is done to allow rollback, and the other ACID niceties. The special difference of TempDB is that any objects, such as tables, are cleared out on startup. Because TempDB always uses the simple recovery model, completed transactions are cleared from the log on the next TempDB checkpoint, and only the live transactions are retained.

This all means that temporary tables behave like any other sort of base table in that they are logged and stored just like them. In practice, temporary tables are likely to remain cached in memory, but only if they are frequently used: the same as with a base table. TempDB operates a system called temporary object reuse, which will cache a portion of the temporary objects with the plan, if there is sufficient memory. This may account for the legend that temporary objects exist only in memory. The truth, as ever, is 'it depends...'.

A lot of other things go on in TempDB: the database engine can use it for work tables for DBCC checks, for creating or rebuilding indexes, and for cursors, for example. Intermediate tables in queries described as 'hashes', 'sorts' and 'spools' are materialized in TempDB, along with those required for several 'physical' operations in executing SQL statements. It is also used as a version store for snapshot isolation, Multiple Active Result Sets (MARS), triggers and online index builds.

Because temporary tables are stored just like base tables, there are one or two things you need to be wary of. You must, for example, have CREATE TABLE permission in TempDB in order to create a normal table there. To save you the trouble, this is assigned by default to the DBO (db owner) role, but you may need to assign it explicitly for users who aren't in the DBO role. All users have permission to create local or global temporary tables in TempDB, because this is assigned to them via the GUEST user security context.
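If you are curious about how much of TempDB is being taken up by each of these activities, the following rough sketch totals the space reserved by user objects (temporary tables and table variables), by internal objects (sorts, spools, hashes and other work tables) and by the version store:

USE TempDB
GO
SELECT SUM(user_object_reserved_page_count) * 8 / 1024.0     AS [User objects (MB)],
       SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS [Internal objects (MB)],
       SUM(version_store_reserved_page_count) * 8 / 1024.0   AS [Version store (MB)],
       SUM(unallocated_extent_page_count) * 8 / 1024.0       AS [Free space (MB)]
FROM sys.dm_db_file_space_usage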

The classic temporary table comes in two flavors: the global, or shareable, temporary table, prefixed by '##', and the local temporary table, whose name is prefixed with '#'. Local temporary tables are less like normal tables than global temporary tables: you cannot create views on them, or associate triggers with them. It is a bit tricky to work out which process, session or procedure created them; we'll give you a bit of help with that later. Most importantly, they are more secure than a global temporary table, as only the owning process can see them.

Another oddity of the local temporary table is that it has a different name in TempDB from the one you give it in your routine or batch. If the same routine is executed simultaneously by several processes, the Database Engine needs to be able to distinguish between the identically-named local temporary tables created by the different processes. It does this by adding a numeric string to each local temporary table name, left-padded by underscore characters. Although you specify a short name such as #MyTempTable, what is actually stored in TempDB is made up of the table name specified in the CREATE TABLE statement and the suffix. Because of this suffix, local temporary table names must be 116 characters or less.

If you're interested in seeing what is going on, you can view the tables in TempDB just the same way you would any other table. You can even use sp_help on temporary tables, but only if you invoke it from TempDB.

USE TempDB
go
execute sp_Help #mytemp

or you can find them in the system views of TempDB without switching databases.

SELECT name, create_date FROM TempDB.sys.tables WHERE name LIKE '#%'

Or the Information Schema

SELECT * FROM TempDB.information_schema.tables

Even better, you can find out what process, and user, is holding on to enormous temporary tables in TempDB and refusing to give up the space

-- Find out who created the temporary table, and when; the culprit and SPID.
SELECT DISTINCT te.name, t.Name, t.create_date, SPID, SessionLoginName
FROM ::fn_trace_gettable((SELECT LEFT(path, LEN(path) - CHARINDEX('\', REVERSE(path)))
                                 + '\Log.trc'
                          FROM sys.traces -- read all five trace files
                          WHERE is_default = 1), DEFAULT) trace
  INNER JOIN sys.trace_events te ON trace.EventClass = te.trace_event_id
  INNER JOIN TempDB.sys.tables AS t ON trace.ObjectID = t.OBJECT_ID
WHERE trace.DatabaseName = 'TempDB'
  AND t.Name LIKE '#%'
  AND te.name = 'Object:Created'
  AND DATEPART(dy, t.create_date) = DATEPART(Dy, trace.StartTime)
  AND ABS(DATEDIFF(Ms, t.create_date, trace.StartTime)) < 50 --sometimes slightly out
ORDER BY t.create_date

You cannot use user-defined datatypes in temporary tables unless the datatypes exist in TempDB; that is, unless the datatypes have been explicitly created there.
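As a minimal sketch (the alias type name is made up for the illustration):

-- The alias type must exist in TempDB before a temporary table can use it
USE TempDB
GO
CREATE TYPE PhoneNumber FROM varchar(20) NOT NULL
GO
-- A temporary table can now use the type, even when created from another database
CREATE TABLE #Contacts (ContactID INT, Phone PhoneNumber)
DROP TABLE #Contacts
GO
DROP TYPE PhoneNumber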

User Tables in TempDB

In normal use, you will create temporary tables, or table variables, without thinking too deeply about it. It is interesting, though, that TempDB is there for any sort of sandbox activity. You can create ordinary base tables, views, or anything else you want. You can create schemas, stored procedures and so on. You're unlikely to want to do this, but it is certainly possible, since TempDB is just another database. (I've just had to restart my development SQL Server after proving this to myself by installing AdventureWorks onto it.) This means that it is possible to create a base table in TempDB, a sort of ..er... temporary permanent table. Unlike the global temporary table, you'd have to do all your own housekeeping on it: you're on your own. The same is true of routines. The advantage of doing this is that any processing you do uses TempDB's simple recovery so that, if you fail to mop up, SQL Server acts as mother on the next startup, though that could be a very long time away.

The next stage is to have what I call a 'persistent temporary' table. In this table, the data itself is volatile when the server restarts, but the table itself persists. Probably the most common way to create a persistent temporary table is to recreate, on startup, a global temporary table. This can be done automatically when all databases are recovered and the "Recovery is completed" message is logged. Even though this is a 'global temporary', it isn't deleted when all connections using it have disappeared, because the process that creates it never disappears. Arguably, it is better to create this kind of work table in the database that uses it though, if you are using full recovery, the temporary work will remain in the log. You can, of course, just create an ordinary table in TempDB. You can create these 'persistent' tables on startup by defining, in master, a stored procedure that creates the global temporary tables:

USE master
go
CREATE PROCEDURE createMyGlobalTables
AS
  CREATE TABLE ##globalTemporary1
    (-- Blah blah (insert DDL here)
  CREATE TABLE ##globalTemporary2
    (-- Blah blah (insert DDL here)
  --and so on….
  CREATE TABLE ##globalTemporaryn
    (-- Blah blah (insert DDL here)

go
EXEC sp_procoption 'createMyGlobalTables', 'startup', 'true'
-- A stored procedure that is set to autoexecution runs every time an instance of SQL Server is started

Why use this sort of hybrid table? There are, for example, a number of techniques for passing tables between procedures via 'persistent' tables in a multi-process-safe way, so as to perform a series of processes on the data. These are referred to as process-keyed tables (see 'How to Share Data Between Stored Procedures: Process-Keyed Tables' by Erland Sommarskog). They will initially raise the eyebrows of any seasoned DBA, but they are an effective and safe solution to a perennial problem, when they are done properly.

As well as temporary tables, there are also a number of table types that aren't directly derived from base tables, such as 'fake' tables and derived tables: some of these are so fleeting that they are best thought of as ephemeral rather than temporary. The CTE uses ephemeral tables that are 'inline' or 'derived', and aren't materialised. BOL refers to them as 'temporary named result sets'. They exist only within the scope of the expression. In a CTE, they have the advantage over derived tables in that they can be accessed more than once.
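As a trivial sketch of that last point, the same CTE is referenced twice in one statement below, something a derived table could only manage by being written out twice:

WITH Digits (TheDigit) AS
  (SELECT 0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4
   UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9)
-- The ephemeral 'Digits' table is used twice: once for the tens, once for the units
SELECT Tens.TheDigit * 10 + Units.TheDigit AS TheNumber
FROM Digits AS Tens CROSS JOIN Digits AS Units
ORDER BY TheNumber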

Local Temporary Table

With local temporary tables (names that begin with #), what goes on under the hood is surprisingly similar to table variables. As with table variables, local temporary tables are private to the process that created them. They cannot therefore be used in views, and you cannot associate triggers with them. They are handier than table variables if you like using SELECT INTO to create them, though I'm slightly wary about using SELECT INTO in a system that is likely to require modification; I'd much rather create my temporary tables explicitly, along with all the constraints that are needed.

You cannot easily tell which session or procedure has created these tables. This is because, if the same stored procedure is executed simultaneously by several processes, the Database Engine needs to be able to distinguish between the same tables created by the different processes. It does this by internally appending a left-padded numeric suffix to each local temporary table name. The full name of a temporary table, as stored in the sys.objects view in TempDB, is made up of the table name specified in the CREATE TABLE statement and the system-generated numeric suffix. To allow for the suffix, the table name specified for a local temporary table must be less than 116 characters.

You get housekeeping with local temporary tables; they are automatically dropped when they go out of scope, unless explicitly dropped by using DROP TABLE. Their scope is more generous than a table variable's: local temporary tables are dropped automatically at the end of the current session or procedure. This can cause head-scratching: a local temporary table that is created within a stored procedure is dropped when the stored procedure finishes, so it cannot be referenced by the process that called the stored procedure that created the table. It can, however, be referenced by any nested stored procedures executed by the stored procedure that created the table. If the nested procedure references a temporary table, and two temporary tables with the same name exist at that time, which table is the query resolved against?
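Here is a minimal sketch of that question, using made-up procedure names. References inside the nested procedure resolve to the most recently created #Results, its own, while the caller's table is left untouched:

CREATE PROCEDURE InnerProc
AS
  CREATE TABLE #Results (Source VARCHAR(20))
  INSERT INTO #Results VALUES ('inner')
  SELECT Source FROM #Results   -- returns 'inner'
  DROP TABLE #Results
GO
CREATE PROCEDURE OuterProc
AS
  CREATE TABLE #Results (Source VARCHAR(20))
  INSERT INTO #Results VALUES ('outer')
  EXEC InnerProc
  SELECT Source FROM #Results   -- still returns 'outer'
GO
EXEC OuterProc
GO
DROP PROCEDURE OuterProc, InnerProc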

Global Temporary Tables

Like local temporary tables, global temporary tables (they begin with ##) are automatically dropped when the session that created the table ends. However, because global tables aren't private to the process that created them, they must persist thereafter until the last Transact-SQL statement that was actively referencing the table, at the time the creating session ended, has finished executing and its locks are dropped. Anyone who has access to TempDB while these global temporary tables exist can directly query, modify or drop them.

You can associate rules, defaults, and indexes with temporary tables, but you cannot create views on temporary tables or associate triggers with them. You can use a user-defined datatype when creating a temporary table only if the datatype exists in TempDB. Stored procedures can reference temporary tables that are created during the current session. Within a stored procedure, you cannot create a temporary table, drop it, and then create a new temporary table with the same name. Although this works…

CREATE table #Color(
  Color varchar(10) PRIMARY key)
INSERT INTO #color SELECT 'Red' UNION SELECT 'White' UNION SELECT 'green'
  UNION SELECT 'Yellow' UNION SELECT 'blue'
DROP TABLE #color
go
CREATE table #Color(
  Color varchar(10) PRIMARY key)
INSERT INTO #color SELECT 'Red' UNION SELECT 'White' UNION SELECT 'green'
  UNION SELECT 'Yellow' UNION SELECT 'blue'
DROP TABLE #color

…this doesn’t

CREATE PROCEDURE MisbehaviourWithTemporaryTables
AS
  CREATE table #Color(
    Color varchar(10) PRIMARY key)
  INSERT INTO #color SELECT 'Red' UNION SELECT 'White' UNION SELECT 'green'
    UNION SELECT 'Yellow' UNION SELECT 'blue'
  DROP TABLE #color
  CREATE table #Color(
    Color varchar(10) PRIMARY key)
  INSERT INTO #color SELECT 'Red' UNION SELECT 'White' UNION SELECT 'green'
    UNION SELECT 'Yellow' UNION SELECT 'blue'
  DROP TABLE #color
go

You can, by using local temporary tables, unintentionally force recompilation of a stored procedure every time it is used, which isn't good because the stored procedure is then unlikely to perform well. To avoid recompilation, avoid referring to a temporary table created in a calling or called stored procedure. If you can't do so, then put the reference in a string that is then executed using the EXECUTE statement or the sp_ExecuteSQL stored procedure. Also, make sure that the temporary table is created in the stored procedure or trigger before it is referenced, and dropped after all these references. Don't create a temporary table within a control-of-flow statement such as IF... ELSE or WHILE.
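As a rough illustration of that advice (the procedure and table names are invented), here the called procedure refers to a temporary table created by its caller only inside a string run by sp_ExecuteSQL, rather than directly:

CREATE PROCEDURE ProcessWork
AS
  -- The reference to the caller's #Work lives only inside the dynamic SQL
  EXECUTE sp_executesql N'UPDATE #Work SET Processed = 1'
GO
CREATE TABLE #Work (WorkID INT, Processed BIT)
INSERT INTO #Work (WorkID, Processed) VALUES (1, 0)
EXEC ProcessWork
SELECT WorkID, Processed FROM #Work
DROP TABLE #Work
GO
DROP PROCEDURE ProcessWork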

Conclusions

In any shared playground, be very careful how you swing that bat. You'll have realized, whilst reading this, that a lot of activity goes on in TempDB, and that you can wreak havoc on the whole SQL Server by using long-running processes that fill temporary tables, whatever type they are, with unnecessary quantities of data. In fact, I've given you clues in this article about how to really, really upset your DBA by inconsiderate use of that precious shared resource, the TempDB. (In the old days before SQL Server 2005, using SELECT INTO with a huge table was the great V-weapon.)

I'm always wary of providing over-generalized advice, but I always prefer my databases to use table variables and TVPs wherever possible. They require fewer resources, and you're less likely to hold onto them when you've finished with them. I like to use them to the max, with column and table checks and constraints. You may find times when they run out of steam, especially when table sizes get larger. In cases like this, or where it isn't practical to use table variables because of their restricted scope, I'll use local temporary tables. It takes a lot of pursed lips and shaking of heads before I'll agree to a global temporary table or persistent temporary table. They have a few valid and perfectly reasonable uses, but they place reliance on the programmer to do the necessary housekeeping.

Always bear in mind that misuse of temporary tables, such as making them unnecessarily large or keeping them alive too long, can have effects on other processes, even on other databases on the server. You are, after all, using a shared resource, and you wouldn't treat your bathroom that way, would you?

© Simple-Talk.com

The ANTS Memory Profiler Filter help panel

Published Wednesday, August 31, 2011 9:00 AM

In my last blog post, I discussed the features that were renamed in ANTS Memory Profiler 7.0. In this post, I present another aspect that I worked on in the same product: the 'Filter help panel'. Like many of the changes we make in our products, user feedback strongly influenced our decision to include additional embedded user assistance in the profiler.

Throughout the lifetime of ANTS Memory Profiler 5 and 6, we had learnt that the optimal way to use the profiler depended on the specific problem being investigated, but a substantial number of users did not have sufficient understanding of .NET memory to work out how they should use the tool in their circumstances. Sometime after watching another usability test in which someone who was unfamiliar with the profiler failed to locate the source of his memory leak, my colleague Stephen Chambers (the usability engineer for ANTS Memory Profiler) and I decided that we needed to do more to assist.

We already knew from speaking to customers that they often use the memory profiler for the first time when they need to solve an immediate problem. The software isn't generally investigated in case it is needed later on. In turn, this means that users want to get to the cause as quickly as possible, and don't want to read the help to find it. So, we had a dilemma: on the one hand, there are users who don't want to read help, and who are probably under pressure to solve an issue. On the other, whilst we strive to make our software as easy-to-use as possible, .NET memory is complex. To be able to use the profiler, the fundamentals need to be understood and the results need to be interpreted. How could we help?

Our idea was to include in the product a series of hyperlinks to guide users through the profiling process. Together with the developers, we created a decision tree defining the main types of memory problem likely to be encountered and how best to go about fixing them. It was very detailed, to say the least:

Decision tree for guiding users through a memory profiling problem

Stephen and I decided to add this information to the recently-redesigned panel which contains the filtering options for the memory profiler. The rationale was that our testing had shown the filters to be something that novice users look at while trying to work out what the next step should be.

Implementation of the Filter help panel in ANTS Memory Profiler 7.0.0.521

As with all innovations at Red Gate, the new user assistance immediately went for user experience testing. Three times, we sat a novice user in front of the latest build, and three times we hoped that they would solve their problem quickly by being led straight to the solution by the new help. On each occasion, we were disappointed to learn that the user's problem was one of the cases we hadn't accounted for, and so our work hindered, rather than helped, their investigation. Back to the drawing board.

I decided that we'd have to get the user to read the online help, because one small text box clearly wasn't sufficient space to give enough detail. Because the user was likely to be under pressure, however, they would have to go to exactly the right page, and that page would need to contain just the information they needed. Any background information would have to be somewhere else. Together with Andrew Hunter, our resident memory guru and the lead developer on ANTS Memory Profiler, I created a list of the five main types of memory problem, in order of likelihood:

1. Large object heap fragmentation
2. An application using too much memory
3. Managed code memory leaks
4. Unmanaged code memory leaks
5. Wanting to know what class uses the most memory

We decided that each of these would become a single page in the web help, and that the Filter help panel would point to the relevant page. Additionally, we would need a background article on how .NET memory works, to provide the necessary domain knowledge to use this information. The result was a whole new chapter in the help, and a much-simplified Filter help panel in the final release:

Implementation of the Filter help panel in ANTS Memory Profiler 7.0.0.731 (Released version)

The final, and most important, question is what users think. We've had regrettably little feedback about this feature, but what little we have received indicates that the new help has been highly appreciated. A strong indication of success comes from Google Analytics, which shows a phenomenal number of hits on the linked pages ('I do not know which kind I have' was hit over 1100 times in August 2011 alone!) and, anecdotally, our Support team seem to have received fewer calls from confused customers since the launch of version 7.

So, if you've used the Filter help panel, I'd love to hear what you thought of it, and whether we should do something similar in other products that need a certain amount of domain knowledge. Incidentally, if you're interested in hearing more about our approach to embedded user assistance, I'm presenting a three-hour workshop on just that topic with my colleague Roger Hart at the Technical Communication UK conference in Oxford later in September.

by Dom Smith

Filed Under: technical communications, embedded ua

Statistics on Ascending Columns
01 September 2011
by Fabiano Amorim

It comes as rather a shock to find out that one of the commonest circumstances in an OLTP database, an ascending primary key with most querying on the latest records, can throw the judgement of the Query Optimiser to the extent that performance nose-dives. Fabiano once again puts on snorkel and goggles to explore the murky depths of execution plans to find out why.

First of all, a warning: much of what you will see here is 'undocumented'. That means we can discuss it, but you can't trust that it will not change in a service pack or a new product release. Also, you will not be able to call Microsoft support for help or advice on the topic, or if you get an error when using the following techniques. Use with caution and extra attention.

Introduction

A very common problem relating to distribution statistics is associated with what we call "ascending value columns" in a table. This generally happens when a large table has ascending values and the most recent rows are the ones most commonly being accessed. When data values in a column ascend, most new insertions are beyond the range covered by the distribution statistics. This can lead to poorly performing plans, since the statistics inaccurately predict that filters selecting the most recent data would exclude the entire relation.

As we already know, a column which belongs to a statistics object has to reach a threshold number of modifications after the previous update in order to trigger an automatic update of the statistics, and in certain cases this threshold is too high; in other words, a column in the table requires too many modifications before the distribution statistics are rebuilt. This delay in auto-updating a statistic can be a problem for queries that are querying the newest data in the table, which is very likely to happen when we are using date or identity columns. As you know, the use of an outdated statistic can be very bad for performance, because SQL Server will not be able to estimate precisely the number of rows a table will return and is likely to use a poor execution plan.

To simulate this problem, I'll start by creating a table called Pedidos (which means Orders in Portuguese) and I'll make the identity column a primary key.

USE Tempdb
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID('Pedidos') IS NOT NULL
  DROP TABLE [Pedidos]
GO
CREATE TABLE [dbo].[Pedidos]
  (
   [ID_Pedido] [int] IDENTITY(1, 1) NOT NULL,
   [ID_Cliente] [int] NOT NULL,
   [Data_Pedido] Date NOT NULL,
   [Valor] [numeric](18, 2) NOT NULL,
   CONSTRAINT [xpk_Pedidos] PRIMARY KEY CLUSTERED(ID_Pedido)
  )
GO

CREATE INDEX ix_Data_Pedido ON Pedidos(Data_Pedido)
GO

INSERT INTO Pedidos (ID_Cliente, Data_Pedido, Valor)
SELECT ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
       '18000101',
       ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5)))
GO
-- Inserting 50000 rows in the table
INSERT INTO Pedidos WITH(TABLOCK) (ID_Cliente, Data_Pedido, Valor)
SELECT ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
       (SELECT DATEADD(d, 1, MAX(Data_Pedido)) FROM Pedidos),
       ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5)))
GO 50000

SELECT * FROM Pedidos
GO

Here is what the data looks like:

As you can see, the orders are added sequentially, which is what usually happens with this sort of table. The order date (the column Data_Pedido) is the date of the order. Now let's suppose that a query searching for the orders of the day looks like the following:

SET STATISTICS IO ON
SELECT * FROM Pedidos
WHERE Data_Pedido >= Convert(date, GetDate())
OPTION (RECOMPILE)
SET STATISTICS IO OFF

At the beginning of the day, SQL Server can estimate that 0 rows will be returned (even though it shows 1 on the plan, it is estimating 0). But what will happen at the end of the day, when thousands of orders have been inserted? It's likely that an execution plan using the Key Lookup will not be a good option, since the lookup operation will demand a lot of page reads. Let's simulate the problem by inserting 5001 new orders into the table. Because this is below the auto-update threshold, it will not trigger an automatic update of the statistics.

INSERT INTO Pedidos WITH(TABLOCK) (ID_Cliente, Data_Pedido, Valor)
SELECT ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
       GetDate(),
       ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5)))
GO
INSERT INTO Pedidos (ID_Cliente, Data_Pedido, Valor)
VALUES (ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
        (SELECT DateAdd(d, 1, MAX(Data_Pedido)) FROM Pedidos),
        ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5))))
GO 5000

Now the table has lots of new orders, and each order was inserted in ascending order; that means the column Data_Pedido is increasing. Let's run the same query and check how many pages SQL Server has to read to execute this plan.

SET STATISTICS IO ON
SELECT * FROM Pedidos
WHERE Data_Pedido >= Convert(date, GetDate())
OPTION (RECOMPILE)
SET STATISTICS IO OFF
GO

Note: Notice that I’m using the hint RECOMPILE to avoid query plan reuse. Also the auto create and auto update statistics are enabled on the tempdb database.

We can see here that the estimate of how many rows will be returned is very wrong. The 5001 insertions were not enough to trigger an automatic update of the statistics, and the query optimizer still thinks that the table has no orders greater than GetDate(). What if we force a scan on the base table?

SET STATISTICS IO ON
SELECT * FROM Pedidos WITH(FORCESCAN, INDEX(0))
WHERE Data_Pedido >= Convert(date, GetDate())
OPTION (RECOMPILE)
SET STATISTICS IO OFF

Note: The hint FORCESCAN is new in SQL Server 2008 R2 SP1. You can read more about this hint at http://technet.microsoft.com/en-us/library/ms187373.aspx.

As we can see, a scan on the clustered index requires far fewer page reads: 200 pages on the clustered index versus 10017 on the non-clustered one, plus the lookup on the clustered index. The estimate is still wrong, though, because the statistics are outdated and therefore don't represent the reality of what is in the table.
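The most direct remedy, and the one recommended as the best option in the conclusion of this article, is simply to bring the statistics up to date by hand or from a scheduled job. A minimal sketch, reusing the index from the examples above:

-- Rebuild the distribution statistics on the index covering the ascending column
UPDATE STATISTICS Pedidos ix_Data_Pedido WITH FULLSCAN
-- Or, more cheaply, refresh every statistics object on the table from a sample
-- UPDATE STATISTICS Pedidos WITH SAMPLE 25 PERCENT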

Branding

SQL Server can detect when the leading column of a statistics object is ascending, and can mark, or "brand", it as ascending. A statistics object that belongs to an ascending column is branded as "ascending" after three updates of the statistics; it's necessary to update it with ascending column values so that, when the third update occurs, SQL Server brands the statistics object as ascending. It's possible to check a statistics object's brand using trace flag 2388. When turned on, this changes the output of DBCC SHOW_STATISTICS, so that we can see a column called "Leading column type" containing the brand of the column. For instance:

-- Enable trace flag 2388
DBCC TRACEON(2388)
GO
-- Look at the branding
DBCC SHOW_STATISTICS (Pedidos, [ix_Data_Pedido])
GO
-- Disable trace flag 2388
DBCC TRACEOFF(2388)
GO

As we can see, the column is currently branded "Unknown". Let's insert 10 rows with ascending orders and update the statistics to see what happens.

-- Insert 10 rows
INSERT INTO Pedidos (ID_Cliente, Data_Pedido, Valor)
VALUES (ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
        (SELECT DateAdd(d, 1, MAX(Data_Pedido)) FROM Pedidos),
        ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5))))
GO 10
-- Update statistics
UPDATE STATISTICS Pedidos [ix_Data_Pedido] WITH FULLSCAN
GO
DBCC TRACEON(2388)
DBCC SHOW_STATISTICS (Pedidos, [ix_Data_Pedido])
DBCC TRACEOFF(2388)

As I said before, the statistics have to be updated three times to be branded as ascending, so let’s do it.

-- Insert 10 rows
INSERT INTO Pedidos (ID_Cliente, Data_Pedido, Valor)
VALUES (ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
        (SELECT DateAdd(d, 1, MAX(Data_Pedido)) FROM Pedidos),
        ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5))))
GO 10
-- Update statistics
UPDATE STATISTICS Pedidos [ix_Data_Pedido] WITH FULLSCAN
GO
-- Insert 10 rows
INSERT INTO Pedidos (ID_Cliente, Data_Pedido, Valor)
VALUES (ABS(CONVERT(Int, (CheckSUM(NEWID()) / 10000000))),
        (SELECT DateAdd(d, 1, MAX(Data_Pedido)) FROM Pedidos),
        ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5))))
GO 10
-- Update statistics
UPDATE STATISTICS Pedidos [ix_Data_Pedido] WITH FULLSCAN
GO
DBCC TRACEON(2388)
DBCC SHOW_STATISTICS (Pedidos, [ix_Data_Pedido])
DBCC TRACEOFF(2388)

Trace Flags 2389 and 2390

By default, the query optimizer keeps the information about the branding of statistics, but doesn't make use of it; the optimizer won't choose a different plan based on whether the column is ascending or not. To change this, you need to use trace flag 2389 or 2390.

When trace flag 2389 is enabled, the statistics are branded as ascending, and you have a covering index on the ascending leading key of the statistics object, the query optimizer will query the table to compute the highest value of the column. This value is then used in the estimation of how many rows will be returned for the predicate. Trace flag 2390 works similarly to 2389, the main difference being that, with this flag set, the query optimizer doesn't care whether the column was branded as ascending or not. In other words, even if the column is marked as "Unknown", it will query the table to find the highest value.

To see this in practice, we'll use an undocumented query hint called QUERYTRACEON. With this query hint, we can enable a trace flag at the statement scope. Here we have a command that is using trace flags 2389 and 2390:

SET STATISTICS IO ON
SELECT * FROM Pedidos
WHERE Data_Pedido >= Convert(date, GetDate())
OPTION (QUERYTRACEON 2389, QUERYTRACEON 2390, RECOMPILE)
SET STATISTICS IO OFF
GO

As we can see, SQL Server now has a new estimate, and it was enough to avoid the bad plan that was using the key lookup.

Note: Internally, QUERYTRACEON runs the DBCC TRACEON command, and you'll need sysadmin privilege to use this hint. A good alternative is to wrap the query in a stored procedure and run it as an administrative user.

Note: There is another brand for a statistics object called “Stationary”. The query optimizer will not trigger the query on the table to compute the value if the brand is stationary. In other words, the data in the leading column(s) is not ascending.

Conclusion

Remember that this is just an alternative way to fix a problem that is seen with large tables with ascending columns. The best option is to update your statistics periodically. In this case, it's just a matter of creating a job to update the statistics more frequently for the tables with ascending columns, up to several times a day. There is no fixed number of updates per day here; you'll need to figure out the appropriate number of updates for your scenario. You also have query hints, plan guides and other alternatives to fix this problem.

Here are some questions I asked myself when I first read about these trace flags.

Are trace flags always good? Probably not; you have to find the scenario in which to test them, and test, test, test.

Will SQL Server always query the table to find the highest value in the column? Not always. It depends on the predicate and how you are querying the table.

Can I use trace flag 2389 alone? Yes, but it will work only for columns branded as ascending.

Can I use trace flag 2390 alone? You can, but it doesn't make sense to do so, because it will stop working when the column turns out to be ascending.

As with everything, you need to test it before you use it, and it always depends on the scenario. It's also worth mentioning that this method is not supported, so don't call Microsoft if you have any doubts about this, or me either, come to think of it. That's all folks, see you.

© Simple-Talk.com

Storing Windows Event Viewer Output in a SQL Server table with PowerShell

Published Wednesday, August 31, 2011 6:31 PM

My good friend Marcos Freccia (blog | twitter) asked me for a simple and fast way to save the output of running the Get-EventLog cmdlet to a SQL Server table. Well, the quickest and easiest way that I know is to use Chad Miller's Out-DataTable and Write-DataTable functions, because the Write-DataTable function uses SqlBulkCopy. I took the liberty of tweaking the Write-DataTable function to get the output object via the pipeline, and you can download the modified version over on my blog. Keep in mind that when you pass the object by pipeline, it will be using SqlBulkCopy too, but line by line. I will show two variations. First, let's create the table to receive the cmdlet output:

USE [Test]
GO
/****** Object: Table [dbo].[EventViewer]    Script Date: 08/28/2011 08:56:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[EventViewer](
  [Index] [int] NULL,
  [Time] [datetime] NULL,
  [EntryType] [varchar](MAX) NULL,
  [Source] [varchar](MAX) NULL,
  [InstanceID] [varchar](MAX) NULL,
  [Message] [varchar](MAX) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

Then let's populate the table, passing the objects by pipeline (which means that I am inserting the data line by line):

Get-EventLog -ComputerName YourComputerName -LogName Security -After "22-08-2011" |
  select index,TimeGenerated,EntryType,Source,InstanceID,Message |
  Out-DataTable |
  Write-DataTable -ServerInstance YourServer -Database YourDatabase -TableName EventViewer

Just as a benchmark, let's see how long that takes:

Measure-Command {
  Get-EventLog -ComputerName Vader -LogName Security -After "22-08-2011" |
    select index,TimeGenerated,EntryType,Source,InstanceID,Message |
    Out-DataTable |
    Write-DataTable -ServerInstance Vader -Database Test -TableName EventViewer
}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 753
TotalMilliseconds : 1753,707

On the other hand, let's populate the table using Write-DataTable with an appropriate set of values:

$variable = ( Get-EventLog -ComputerName YourComputer -LogName Security -After "22-08-2011" |
              select index,TimeGenerated,EntryType,Source,InstanceID,Message );
$valuedatatable = Out-DataTable -InputObject $variable ;
Write-DataTable -ServerInstance YourServer -Database YourDatabase -TableName EventViewer -Data $valuedatatable

And how many seconds did that take?

Measure-Command {
  $variable = ( Get-EventLog -ComputerName YourComputer -LogName Security -After "22-08-2011" |
                select index,TimeGenerated,EntryType,Source,InstanceID,Message );
  $valuedatatable = Out-DataTable -InputObject $variable ;
  Write-DataTable -ServerInstance YourServer -Database YourDatabase -TableName EventViewer -Data $valuedatatable
}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 1
Milliseconds      : 192
TotalMilliseconds : 1192,0523

We can clearly see the difference: the first script took 1753 milliseconds and the second only took 1192. It looks like it is not only in SQL Server that line-by-line operations are evil.

Scaling Out

First we have to add a column to our SQL Server table to store the name of the computer to which we're applying the Get-EventLog cmdlet. Remember: because the Write-DataTable function uses SqlBulkCopy, you need to pass the columns to it in the same order as they occur in the table (as you'll see below). We'll also need to store the names of the servers we want to survey in a flat text file, called servers.txt in this example. So, let's start by creating the receiving table:

USE [Test]
GO
/****** Object: Table [dbo].[EventViewer]    Script Date: 08/28/2011 09:21:24 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[EventViewer](
  [ComputerName] [varchar](50) NULL,
  [Index] [int] NULL,
  [Time] [datetime] NULL,
  [EntryType] [varchar](MAX) NULL,
  [Source] [varchar](MAX) NULL,
  [InstanceID] [varchar](MAX) NULL,
  [Message] [varchar](MAX) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO

Then we can use the script:

foreach ($server in Get-Content c:\temp\servers.txt)
{
  $variable = ( Get-EventLog -ComputerName $server -LogName Security -After "22-08-2011" |
                select @{Expression={$($server)};Label = "ComputerName"},
                       index,TimeGenerated,EntryType,Source,InstanceID,Message )
  $valuedatatable = Out-DataTable -InputObject $variable
  Write-DataTable -ServerInstance YourServer -Database YourDatabase -TableName EventViewer -data $valuedatatable
}

Simple, Fast and Clean - classic PowerShell. #PowerShellLifeStyle

by laerte

Designing C# Software With Interfaces
24 August 2011
by David Berry

The best way to understand how interfaces improve software design is to see a familiar problem solved using interfaces. First, take a tightly-coupled system design without interfaces, spot its deficiencies, and then walk through a solution to the problem with a design that uses interfaces.

As systems grow in size and complexity, software design must increasingly achieve the separation of concerns, adaptability for future changes, and loose coupling. To do this, it is essential to design applications using interfaces. Interfaces are one of the most powerful concepts in modern object orientated languages such as C#, VB.NET or Java. Through the use of interfaces, developers can clearly define the relationship between different modules within a system. This allows a better definition of the boundaries of responsibility between different modules. The result is that individual modules are loosely coupled from each other, making the entire system more adaptable to change.

While most C# developers are familiar with the syntax and concept of interfaces, fewer have mastered their use sufficiently to design software around interfaces. Introductory texts on C# provide the basic syntax and a few simple examples, but do not have enough detail and depth to help the reader understand how powerful interfaces are, and where they are best used. Books on design patterns catalog a vast array of patterns, many focused around interfaces, but often seem overwhelming to those developers who are making their first foray into software design. Consequently, too many developers reach a plateau in their abilities and their careers, and are unable to take the next steps into working with, and designing, more complex software systems.

I believe that many developers can benefit from learning to design with interfaces, and that the best way to learn how is to see an example of a practical, familiar problem solved using interfaces. First, I am going to show a system design that does not use interfaces and discuss some of the deficiencies that can result from this tightly coupled design. Then I am going to re-solve the problem with a design using interfaces, walking the reader step by step through what decisions I am making and why. In understanding this solution, the reader will see how this is a more flexible, more adaptable design. Finally, it is hoped that the reader will then be able to take these same techniques and apply them to other aspects of system design they may be dealing with.

The High Cost of Tightly Coupled Systems

Every university computer science student knows that hardcoding values in a program is bad design practice. By writing code with modules that are tightly coupled together, you are in just about as bad a shape. What do we mean by tightly coupled? Imagine we have two modules in our system, module A and module B. If module A depends on module B, then module A and B are said to be tightly coupled. If you make a change to module B, whether a change in a method signature, class layout or any other change, you are also going to have to change module A. Module A has significant knowledge of how module B works, so any changes to module B are going to cascade up to module A.

At this point you are saying, "Of course module A has to know about module B. My classes in module A need to use my classes in module B". That is a fair statement, and there has to be some amount of coupling for a system to work; otherwise, none of the modules or layers in a system could talk to any of the other modules or layers. Where we run into trouble, though, is when we are tightly coupled to a module that can change. The following example will demonstrate this point.

Almost every application needs to perform some sort of application logging. Several high-quality logging frameworks are freely available, including log4net from the Apache Foundation and Enterprise Library from Microsoft. In addition, many companies may have developed their own internal logging frameworks. Let us imagine for a moment that we have developed a typical n-tier system for managing widgets. As part of this system, we have separate projects for each of the major layers of the system: user interface, business logic, domain objects and data access. As part of our design, we have decided to use the log4net framework for logging. The diagram below shows the major components of this system and their dependencies (the arrow points in the direction of the dependency).

We feel as if we have done everything right in our design. We have separated out the layers of our system, put our business logic in a separate layer and separated out our data access. Yet all of the modules in this system remain tightly coupled to each other. For the purposes of this example, we are going to focus on the logging module. Consider the following two scenarios:

1. A new CIO has just started at our company, and he wants to standardize on Enterprise Library. All projects need to convert to using the logging framework in Enterprise Library within the next three months. In this case, we need not just to change out the reference to log4net with Enterprise Library, but to track down all of the logging statements in all of our code and replace them with their Enterprise Library equivalents.

2. Our widget application is wildly successful. Another team wants to integrate widget information into their intranet web application. But this team uses their own, home-grown logging framework. So, using our widget libraries, we will either be writing log messages using two different frameworks (and hence multiple files), or we need to either convert our libraries to use their logging framework or adapt their application to use log4net.

In both of these cases, we are faced with unpalatable options. Instead of spending time adding features that our business users want, we are ripping out one set of plumbing and replacing it with another. The fundamental problem here is that our system is tightly coupled to a specific logging implementation. Changing from one implementation (log4net) to another (Enterprise Library) represents a significant change to our code.

The second example is even worse. Do we change our application to use the home-grown logging system of the other group, or do we try to convince them to use our chosen logging solution (log4net)? Another alternative would be to maintain two sets of the widget libraries, one for each logging framework in use. This is clearly unsatisfactory, though, because the two libraries will immediately start to grow apart. Finally, they could choose just to go and write their own widget library rather than use ours. But once again, our company is now maintaining two sets of source code to perform the same tasks.

What we are lacking here is any sort of interchangeability with regard to our logging system. By being tightly integrated with one solution to the problem (in this case, log4net), we have sacrificed any sort of flexibility to change to a different solution. If we want to proceed with this change, it comes at a very high cost: namely, rewriting and recompiling significant portions of our system.

A Better Way – Designing with Interfaces

Our goal in the above system is to design an intermediate layer that will allow us to easily switch out logging subsystems as needed. In doing so, we also want to allow enough flexibility that a new logging framework can easily be plugged in at a later date. All of this should be accomplished so that any changes to the logging subsystem do not require us to make any changes to our existing application code. Interfaces provide a simple yet elegant solution to precisely this problem. First, an enum will be defined in our common logging library. The values of the enum represent the different levels at which log messages can be written out. Log levels are used to filter which messages are actually written out by a logging system. Generally, levels such as FATAL and ERROR will always be configured to be included in the log; levels such as INFO and VERBOSE are only included in development systems or when attempting to debug a problem. Our common interface will define the log levels shown below. The level FATAL is considered the most important and VERBOSE the least important. Each message that is written to the log will be assigned one of these levels.

/// <summary>
/// Enum defining log levels to use in the common logging interface
/// </summary>
public enum LogLevel
{
    FATAL = 0,
    ERROR = 1,
    WARN = 2,
    INFO = 3,
    VERBOSE = 4
}

The next step in our refactored design is to create an interface that defines how any other code we write will talk to the logging subsystem.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DesigningWithInterfaces.LoggingInterface
{
    /// <summary>
    /// Defines the common logging interface specification
    /// </summary>
    public interface ILogger
    {
        /// <summary>
        /// Writes a message to the log
        /// </summary>
        /// <param name="category">A String of the category to write to</param>
        /// <param name="level">A LogLevel value of the level of this message</param>
        /// <param name="message">A String of the message to write to the log</param>
        void WriteMessage(string category, LogLevel level, string message);
    }
}

Of all of the code in this article, this interface is the most important. It defines how the rest of the modules in our application, or any application, are going to talk to the logging subsystem. Any piece of code that wants to put a message in a log will have to do it according to the specification defined above. Furthermore, any backend logging system we want to plug in and use must adhere to this specification. The interface defined above is the bridge between the two subsystems. On one side you have the module that wants to write messages to a log. On the other is the concrete logging module (like log4net or Enterprise Library) that will perform the actual details of writing the message out. This interface serves as an agreement about how those two modules will communicate. The problem we face now is that none of our actual concrete logging systems implement this interface. That is, there is no class in log4net, Enterprise Library or any other logging system out there that directly implements it. In fact, they all do logging a little bit differently. What is required is a class that will adapt our logging interface to an actual implementation of a logging framework. This class will fulfill the interface defined above and effectively translate the calls defined in our interface into method calls that work with the logging implementation that we have chosen. Such a class is shown below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DesigningWithInterfaces.LoggingInterface;
using log4net;

namespace DesigningWithInterfaces.Log4Net
{
    /// <summary>
    /// Driver class that adapts calls from ILogger to work with a log4net backend
    /// </summary>
    internal class Log4NetLogger : ILogger
    {
        public Log4NetLogger()
        {
            // Configures log4net by using log4net's XmlConfigurator class
            log4net.Config.XmlConfigurator.Configure();
        }

        /// <summary>
        /// Writes messages to the log4net backend.
        /// </summary>
        /// <remarks>
        /// This method is responsible for converting the WriteMessage call of
        /// the interface into something log4net can understand. It does this
        /// by doing a switch/case on the log level and then calling the
        /// appropriate log method.
        /// </remarks>
        /// <param name="category">A String of the category to log to</param>
        /// <param name="level">A LogLevel value of the level of the log</param>
        /// <param name="message">A String of the message to write to the log</param>
        public void WriteMessage(string category, LogLevel level, string message)
        {
            // Get the Log we are going to write this message to
            ILog log = LogManager.GetLogger(category);

            switch (level)
            {
                case LogLevel.FATAL:
                    if (log.IsFatalEnabled) log.Fatal(message);
                    break;
                case LogLevel.ERROR:
                    if (log.IsErrorEnabled) log.Error(message);
                    break;
                case LogLevel.WARN:
                    if (log.IsWarnEnabled) log.Warn(message);
                    break;
                case LogLevel.INFO:
                    if (log.IsInfoEnabled) log.Info(message);
                    break;
                case LogLevel.VERBOSE:
                    if (log.IsDebugEnabled) log.Debug(message);
                    break;
            }
        }
    }
}

The comments in the code above speak for themselves. The class translates our common interface into something that log4net can understand. In this way, we can write the rest of our system to work against our common interface, yet still use a high-quality, full-featured logging system like log4net to take care of the actual work of writing our log messages out to a file, a database or whatever we desire. The real beauty, though, is that we can define multiple classes that implement the interface, each class serving as an adapter to a different logging backend. We can write a second class which implements the ILogger interface to support an Enterprise Library backend. All of the application code that uses logging stays the same; it is programmed against our ILogger interface. To change which logging backend we use, we only have to substitute a different adapter class to be used by the system. Furthermore, if a new logging backend comes along at some point in the future, all we have to do to use it is to write a new adapter class that fulfills our interface and plugs in to the new framework. This is a huge win because we can switch to something in the future we don't even know about yet without rewriting our system. In fact, we can switch to a new framework without modifying any of our application or library code at all, just by writing one simple adapter class. Such a class is shown below.

/// <summary>
/// Adapter class to use Enterprise Library logging with the common
/// logging interface
/// </summary>
internal class EnterpriseLibraryLogger : ILogger
{

    public void WriteMessage(string category, LogLevel level, string message)
    {
        // First thing we need to do is translate our generic log level enum value
        // into a priority for Enterprise Library. Along the way, we will also
        // assign a TraceEventType value
        TraceEventType eventSeverity = TraceEventType.Information;
        int priority = -1;
        switch (level)
        {
            case LogLevel.FATAL:
                eventSeverity = TraceEventType.Critical;
                priority = 10;
                break;
            case LogLevel.ERROR:
                eventSeverity = TraceEventType.Error;
                priority = 8;
                break;
            case LogLevel.WARN:
                eventSeverity = TraceEventType.Warning;
                priority = 6;
                break;
            case LogLevel.INFO:
                eventSeverity = TraceEventType.Information;
                priority = 4;
                break;
            case LogLevel.VERBOSE:
                eventSeverity = TraceEventType.Verbose;
                priority = 2;
                break;
        }

        // This creates an object to specify the log entry and assigns
        // values to the appropriate properties
        LogEntry entry = new LogEntry();
        entry.Categories.Add(category);
        entry.Message = message;
        entry.Priority = priority;
        entry.Severity = eventSeverity;

        // This line actually writes the entry to the log(s)
        Logger.Write(entry);
    }
}

The final problem that must be solved is how we select which one of our adapter classes to use. Somewhere in our code we need to locate the correct adapter and create an actual instance of our adapter class to talk to our backend logging framework. At first blush, one might think of writing a code snippet like the following:

public static ILogger GetLogger()
{
    string logger_key = ConfigurationManager.AppSettings["LoggerKey"];
    if (logger_key.Equals("log4net"))
    {
        return new Log4NetLogger();
    }
    else if (logger_key.Equals("EnterpriseLibrary"))
    {
        return new EnterpriseLibraryLogger();
    }
    else
    {
        throw new ApplicationException("Unknown Logger");
    }
}

One major problem this code suffers from is that it needs to know about all of the available logging implementations up front, when we write this code. A second problem is that the ILogger interface, all of the adapter classes that implement ILogger, and the above piece of code tend to end up in a single project in order to avoid circular dependencies. Further, each adapter class will have a reference to its individual backend logging assemblies, which will all come along for the ride when we compile the above code. This heavyweight, tightly coupled project is exactly what we were trying to avoid in the first place. In this case, we have used an interface, but we really have not gained very much. What we really desire is a solution that is truly decoupled and interchangeable, interchangeable to the point that at any time in the future we can substitute in a completely new adapter class and new logging backend. By utilizing the Reflection API in .NET, we can accomplish just that. The Reflection API allows us to dynamically instantiate an object of a class. That is, we can create an object of a class without calling new() on the class, but instead by providing the name of the class (and the assembly that contains it) to the Reflection API. The following code shows how to do this.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Configuration;
using System.Reflection;

namespace DesigningWithInterfaces.LoggingInterface
{
    /// <summary>
    /// Factory class to get the appropriate ILogger based on what is specified in
    /// the App.Config file
    /// </summary>
    public class LoggerFactory
    {
        #region Member Variables

        // reference to the ILogger object. Get a reference the first time then keep it
        private static ILogger logger;

        // This variable is used as a lock for safety
        private static object lockObject = new object();

        #endregion

        public static ILogger GetLogger()
        {
            lock (lockObject)
            {
                if (logger == null)
                {
                    string asm_name = ConfigurationManager.AppSettings["Logger.AssemblyName"];
                    string class_name = ConfigurationManager.AppSettings["Logger.ClassName"];

                    if (String.IsNullOrEmpty(asm_name) || String.IsNullOrEmpty(class_name))
                        throw new ApplicationException("Missing config data for Logger");

                    Assembly assembly = Assembly.LoadFrom(asm_name);
                    logger = assembly.CreateInstance(class_name) as ILogger;

                    if (logger == null)
                        throw new ApplicationException(
                            string.Format("Unable to instantiate ILogger class {0}/{1}",
                                asm_name, class_name));
                }
                return logger;
            }
        }
    }
}

This is a factory class. To get a reference to the appropriate ILogger object, we don't call new() in our application code; we call the static method LoggerFactory.GetLogger(). The first time this method is called, it will look into the app.config file and get the assembly and class name of the adapter class for the logging system we want to use. It then uses the Reflection API to load that assembly into the .NET runtime and get an instance of the class. It then keeps a reference to this ILogger object to use on subsequent calls to this method. Finally, all of this logic is wrapped in a lock statement to ensure thread safety. The last paragraph may seem complex, but when you boil it down, what is really happening is that we are creating the appropriate ILogger to use based on two entries in the app.config file. Now we can change which logging system is being used by our application simply by changing entries in the config file; we do not have to make any code changes or recompile anything. The .NET runtime still has to be able to locate and load the DLL with the adapter class in it (and in turn, locate and load the logging framework DLLs you intend to use). But our system is truly decoupled now. Our application code is programmed against an interface such that any backend logging system can be used. If we want to support some new logging backend that hasn't been invented yet, we simply have to write a new adapter class, compile it, copy the DLL to where our application can find it along with the new logging system DLLs, and change the config file. We have achieved true interchangeability within our system, so now different logging frameworks are merely plugins, and we can select whichever one suits our needs best.

Putting It All Together
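Before looking at the overall picture, here is a minimal sketch of what the calling code ends up looking like. The category, message and config values are invented for illustration; the config keys are the ones read by LoggerFactory above.

// Somewhere in the widget application's business logic. The only types this
// code knows about are ILogger, LogLevel and LoggerFactory.
ILogger logger = LoggerFactory.GetLogger();
logger.WriteMessage("BusinessLogic", LogLevel.INFO, "Widget created");
logger.WriteMessage("BusinessLogic", LogLevel.ERROR, "Widget failed validation");

// app.config (illustrative values) tells the factory which adapter to load:
//   <add key="Logger.AssemblyName" value="DesigningWithInterfaces.Log4Net.dll" />
//   <add key="Logger.ClassName" value="DesigningWithInterfaces.Log4Net.Log4NetLogger" />

Swapping to the Enterprise Library adapter is then purely a configuration change to those two values.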

Here is what the new design looks like in terms of a UML diagram: the classes Log4NetLogger and EnterpriseLibraryLogger both implement a common interface, ILogger. Log4NetLogger and EnterpriseLibraryLogger both serve as adapter classes, adapting the common logging interface we defined in ILogger to specific implementations for their respective backend logging frameworks. The class LoggerFactory is a factory class. It contains just one static method, GetLogger(), which will figure out the appropriate implementation of ILogger we are using, create the appropriate adapter class via reflection (Log4NetLogger or EnterpriseLibraryLogger) and return it to us. From the perspective of any application, the application only knows about three types: the enum LogLevel, which defines the logging levels; LoggerFactory, which is used to get a reference to an ILogger; and ILogger itself. Any code that uses logging has no knowledge of whether log4net, Enterprise Library or another system is being used. That is the purpose of this design. The application is programmed against an interface (in this case ILogger), but has no knowledge of what specific implementation (log4net or Enterprise Library) is being used. The design is now said to be loosely coupled. Every design has tradeoffs, and this design is no exception. By programming to a common specification (the interface), we have gained the ability to easily switch out different logging frameworks without any impact on our application code. What we have given up, though, is the capability to use any advanced or unique features of an individual logging system. By programming to an interface, we can only make use of the features exposed by that interface, not any of the other capabilities that the underlying implementation may possess. For example, Enterprise Library contains a number of additional features not in our logging interface:

The ability to specify a title as well as a message for each log entry
The ability to specify multiple categories for a log message, thereby potentially directing it to multiple destinations
The ability to define more granular levels through the use of the priority field in addition to the event severity
The ability to specify an Event ID value to further categorize event messages

Of course, I have purposely left the logging interface very generic and simplistic for the purposes of this article. But this list of sacrificed features still serves to illustrate an important point. To program to an interface, you will most likely give up access to any unique features and advanced capabilities that a particular implementation provides. In this case, if we merely want to perform general-purpose logging, we can accept this tradeoff because our desire to achieve flexibility and interchangeability outweighs our need for the unique features above. However, if those unique features were determined to be essential requirements, we would be faced with two choices. First, we could forgo the use of an interface and program directly against our target implementation. If those features are that important to us, this may be an acceptable tradeoff. Second, we could refactor our interface to take additional parameters that could be passed on to our adapter class and ultimately our target framework. In this case, though, every adapter class must be modified to fulfill the interface. For example, let's say that having the ability to specify an event id was a required feature for our logging interface. We would have to modify ILogger to accept an event id in its WriteMessage() method, and then update both Log4NetLogger and EnterpriseLibraryLogger to match. Since log4net doesn't have a concept of an event id, we would have to figure out something to do with the event id passed in. We could throw the value away, but more likely we would add the value into the message in some formatted fashion. In this case, the tradeoff is additional complexity: to include a feature that is unique to Enterprise Library, we have to add some logic, and therefore complexity, to Log4NetLogger. Any system design will include such tradeoffs. What is important is to evaluate what is most important in each particular design and make the right tradeoffs for the system being implemented.

Design Patterns

The design above is what is known as the Bridge design pattern. This pattern is also sometimes known as the plugin pattern because each specific implementation can be easily "plugged in" to a program. The concept of the bridge pattern is that as long as an implementation fulfills the defined interface, implementations can be freely substituted for each other. Since the application is programmed against the interface, it is unaffected by switching to a new implementation. The bridge pattern is very commonly used with regard to database drivers. ADO.NET, OLE DB and ODBC are all standard data access technologies for which Microsoft publishes a written specification. Database vendors then take these specifications and develop database drivers for their specific database management systems. As long as those drivers adhere to the published specification, they can be easily used by any application without the application needing to know what specific database backend is used. An example of this is Microsoft Excel. Excel knows how to query data from ODBC. Excel doesn't know if your backend database is SQL Server, Oracle, MySQL or even a text file. Excel just knows that it is talking to ODBC, and that there is a driver that implements the ODBC specification. As long as you have an ODBC driver for your data source, Excel can get data from it. This is a classic example of the bridge pattern and the power of programming against an interface rather than a specific implementation. The second design pattern present is the Adapter pattern. Both Log4NetLogger and EnterpriseLibraryLogger serve as examples of the adapter pattern in action, effectively translating the ILogger interface methods into something that each backend logging framework can understand. Without the adapter classes in the middle, ILogger and the logging frameworks would be incompatible. With the adapter in the middle, the two are able to work together. The third and final design pattern here is the Factory pattern. The factory pattern encapsulates, within the factory, the details of how to obtain an instance of an object. In this example, we do not want code in other modules to use new() to create a new ILogger instance. Doing so would defeat the whole purpose of using an interface in the first place, because then some code somewhere would still be bound to a specific implementation. Therefore, we have created the LoggerFactory class to act as a factory that handles the details of obtaining an instance of the correct object. In this case, these details involve looking in the app.config file for the name of the appropriate class and using the Reflection API to obtain an instance. All of this code is encapsulated in the factory class, so other modules can, with ease, obtain a reference to the correct ILogger object without knowing any of these details. Design patterns have become a hot topic in the .NET community over the last few years, with literally hundreds of books, articles and screencasts explaining the details of all of the various patterns available. What is more important than being able to rattle off the intricacies of each and every pattern at the drop of a name is to understand the design concepts that are involved. The focus in software design should be on good design principles like clean separation of modules, separating the interface from the implementation and decoupling different parts of the system. Design patterns offer established solutions to common problems in software development.
However, you shouldn’t become a slave to patterns or force-fit patterns into your design. Follow the principles above, and the places where a pattern can effectively help solve a problem will naturally emerge.

When Should You Design with Interfaces?

The last step in learning how to successfully design with interfaces is being able to recognize when using an interface will result in a superior design. The following are some classic examples of when designing with and coding to an interface is appropriate and generally results in a better design.

Whenever a third-party component or service is used. Unfortunately, libraries change, and companies merge, acquire new products and stop supporting old ones. If you are able to distill the important operations that a component or service performs into an interface, you are generally in a much better position if you use the design outlined above and call the component or service via an interface. This way, if you want to switch to a competitor's product in the future, you have the flexibility to do so simply by writing a new adapter class. Secondly, if the component or service ever changes or is upgraded, it will now be much easier to test, because you are testing that the new implementation fulfills the interface. As long as the new implementation fulfills the interface, you can have high confidence that the entire system will still work, without having to test each and every feature in the system as a whole.

Another classic example is to hide your data access code behind an interface or series of interfaces. In some cases this is done so you can have vendor-specific implementations of the data access layer: that is, one data access implementation for SQL Server, one for Oracle, one for DB2, and so on. In this way, each data access implementation can utilize vendor-specific features (for example, identity columns in SQL Server, sequences in Oracle) but the rest of the application does not need to know about these details.

Application settings can be loaded from different places, including the app.config file, a database, an XML file on a network share or the Windows registry. It would be very straightforward to abstract the operation of getting an application setting to an interface, and then provide a specific implementation to read the settings from each location (sketched below). This technique could be useful if, for example, application settings are stored in the Windows registry today for legacy reasons, but in the future you are planning to transition these settings to a different location, like an XML file in the user's application settings directory. Programming to an interface allows you to provide one implementation for legacy compatibility (the Windows registry) and a second implementation for your desired state (a custom XML file). The power of using an interface, though, is that when you make the switch, you simply have to update a configuration setting somewhere, rather than rewrite or even recompile code.
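To make that last example concrete, here is a minimal, hypothetical sketch; the ISettingsProvider name, registry key and XML layout are all invented for illustration and are not part of the original design.

// Hypothetical interface for reading application settings from anywhere.
public interface ISettingsProvider
{
    string GetSetting(string key);
}

// Legacy implementation: reads from an assumed registry location.
internal class RegistrySettingsProvider : ISettingsProvider
{
    public string GetSetting(string key)
    {
        using (var regKey = Microsoft.Win32.Registry.CurrentUser.OpenSubKey(@"Software\Widgets"))
        {
            return regKey == null ? null : regKey.GetValue(key) as string;
        }
    }
}

// Desired-state implementation: reads <setting name="..." value="..."/> elements from an XML file.
internal class XmlFileSettingsProvider : ISettingsProvider
{
    private readonly System.Xml.Linq.XDocument doc;

    public XmlFileSettingsProvider(string path)
    {
        doc = System.Xml.Linq.XDocument.Load(path);
    }

    public string GetSetting(string key)
    {
        foreach (var element in doc.Descendants("setting"))
        {
            if ((string)element.Attribute("name") == key)
                return (string)element.Attribute("value");
        }
        return null;
    }
}

The application only ever sees ISettingsProvider; which implementation it gets could be decided by a factory in the same style as LoggerFactory above.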

When designing with interfaces, there are some design guidelines you should follow. One of the most important is to keep your interfaces focused on the problem you are trying to solve. Interfaces that perform multiple unrelated tasks tend to be very difficult to implement in a class. A class may only want to implement part of the interface, because that is all it needs, but it is required to implement the entire interface. An interface should clearly and concisely communicate what purpose it serves and what functionality it provides. This makes it clear to implementers what is expected, and clear to users of the interface what functions the module can perform. When an interface starts trying to perform too many tasks, it is too easy for the original purpose of the interface to become lost, defeating much of the value of having an interface in the first place.

A second guideline is to make sure the interface does not contain too many methods. Too many methods make implementing the interface difficult, as the implementing class has to provide each and every method in the interface. At best, this is tedious. At worst, the implementer may be tempted to "stub out" methods that they don't consider important by providing an empty or underdeveloped implementation. In this case, while the class provides a method for each method in the interface, it does not truly fulfill the interface. This becomes a problem if you have application code expecting certain functionality because a method is provided, but the implementation is only a stub. By keeping the number of methods in an interface reasonable, and keeping those methods focused on the functionality that the interface is supposed to provide, it is much more likely the interface will be used and correctly implemented.

A third guideline to remember is not to let implementation-specific functionality creep into the interface. Too often, when a development team has an existing module from which they are trying to extract an interface, they will include functionality in the interface that is very specific to the current implementation. This becomes a problem when you want to write a different class that implements the interface, and it limits the usefulness of the interface, because the interface itself is really tied to a specific implementation, which is not the point. An interface should define the common functionality that the module or subsystem will perform. Any implementation-specific logic must be contained inside an implementing class, not exposed as a method on the interface itself. The interface is a definition of what functionality the module provides, not a constraint on how an implementing class must provide that functionality.

A fourth and final guideline to keep in mind is that while some level of abstraction is positive, too many levels of abstraction lead to code that is over-complex and difficult to maintain. In this article, I have focused on the use of interfaces to provide a level of abstraction between different subsystems of an overall application. In this case, providing a level of abstraction between the logging system and the application code has a clear purpose and helps to provide a level of separation between the logging component and the rest of the application. Including additional levels of abstraction, though, would probably only serve to make the design more difficult to understand.
Using an interface should help more clearly define what the role of a module or unit of code is, and therefore lead to a design that is clearer to understand, not more complex. When in doubt, ask a colleague to review the design and explain it back to you. If the design is correct, they should be able to provide an explanation of the purpose of each interface in the system. If they struggle to understand the purpose of a particular interface, you may want to rethink what you were trying to accomplish with that interface in the first place and whether the code reflects your intentions.

Conclusion

Designing with interfaces will result in cleaner separation of responsibilities between different subsystems within an application. By programming to an interface, applications become much more flexible because different subsystems can be easily switched out if the need arises. Furthermore, if a subsystem does have to be switched out, the burden of testing is reduced, because now you can focus most of your testing efforts on ensuring that the new implementation properly fulfills the interface rather than testing the entire system as a whole. This article covered an example of how interface design can be applied to the problem of application logging frameworks. I hope that, by understanding the example given in this article, more developers will come to recognize the power and flexibility that interfaces bring, and will start using interfaces to design more flexible software systems.

© Simple-Talk.com

Coordinating schedules

Published Tuesday, August 23, 2011 3:00 AM

I'm moving a SQL Server off old hardware at the moment, and one thing that makes life easier is to have the same schedules on the new server, all ready to pick from the UI when you are creating new jobs. Having to create a new schedule in the middle of this process is a pain and a distraction; check the short video for the comparison between the complex interface to create a schedule and the ease of picking one that already exists. So, the trick is to get the schedules from your existing server to your new one before you start creating the jobs on your new server. Details of SQL Server jobs and schedules are held in the MSDB system database. It's safe to work with data in here so long as you know what you are doing (and attempting to do), and have a restorable backup ready to get you out of trouble. Do this process on a test server until you are confident that nothing will go wrong. I won't be able to rescue you if it does; you will be on your own, looking for the quickest available SQL contractor in your area.

Now, if we review the tables in question, we see that the sysjobs table has a job_id column and the sysschedules table has a schedule_id column. One job can run on many schedules, and one schedule can be used by many jobs, so there is a middle table to manage this many-to-many relationship: the sysjobschedules table, which holds both columns. When you create a SQL job you are effectively inserting a row in the sysjobs table. Likewise, when you create a schedule you insert a row in the sysschedules table, and when you then pick a schedule for a job you insert a row in the sysjobschedules table. All we need to do is shortcut using the UI to create each and every schedule one by one.

INSERT INTO [newserver].[msdb].[dbo].[sysschedules]
        ( [schedule_uid]
        , [originating_server_id]
        , [name]
        , [owner_sid]
        , [enabled]
        , [freq_type]
        , [freq_interval]
        , [freq_subday_type]
        , [freq_subday_interval]
        , [freq_relative_interval]
        , [freq_recurrence_factor]
        , [active_start_date]
        , [active_end_date]
        , [active_start_time]
        , [active_end_time]
        , [date_created]
        , [date_modified]
        , [version_number]
        )
SELECT    [schedule_uid]
        , [originating_server_id]
        , [name]
        , [owner_sid]
        , [enabled]
        , [freq_type]
        , [freq_interval]
        , [freq_subday_type]
        , [freq_subday_interval]
        , [freq_relative_interval]
        , [freq_recurrence_factor]
        , [active_start_date]
        , [active_end_date]
        , [active_start_time]
        , [active_end_time]
        , [date_created]
        , [date_modified]
        , [version_number]
FROM    [oldserver].msdb.[dbo].[sysschedules] AS s
JOIN    [oldserver].msdb.[dbo].[sysjobschedules] AS s2
        ON [s].[schedule_id] = [s2].[schedule_id]
WHERE   [enabled] = 1

Now, in this code I have joined to the old server's sysjobschedules table and filtered on enabled = 1 so that only the active schedules that are actually attached to jobs get transferred from the old server. You may or may not wish to do the same.
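As a quick sanity check after running the insert, a hedged sketch like the following (swap in your own linked server names) compares the number of enabled, job-attached schedules on the old server with the enabled schedules now on the new one; allowing for any schedules that already existed on the new server, the totals should tally.

-- Compare enabled schedule counts between the two servers
SELECT
    (SELECT COUNT(DISTINCT s.[schedule_id])
     FROM [oldserver].msdb.[dbo].[sysschedules] AS s
     JOIN [oldserver].msdb.[dbo].[sysjobschedules] AS s2
         ON [s].[schedule_id] = [s2].[schedule_id]
     WHERE [s].[enabled] = 1) AS OldServerEnabledSchedules ,
    (SELECT COUNT(*)
     FROM [newserver].[msdb].[dbo].[sysschedules]
     WHERE [enabled] = 1) AS NewServerEnabledSchedules ;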

I hope this helps you while you are working with your servers. If you are in the US you may be interested in seeing some world-class SQL Server speakers for free at the LA SQL in the City event being run there. Do try to get along; the London event was brilliant.

by fatherjack
Filed Under: SSMS, Tips and Tricks, TSQL, Admin

Further Down the Rabbit Hole: PowerShell Modules and Encapsulation

24 August 2011 by Michael Sorens

Modules allow you to use standard libraries that extend PowerShell's functionality. They are easier to use than to create, but if you get the hang of creating them, your code will be more easily maintained and reusable. Let Michael Sorens once more be your guide through PowerShell's 'Alice in Wonderland' world.

Contents

Encapsulation
Refactor Inline Code into Functions
Refactor Functions into Files
Refactor Functions into Modules
Best Practices for Module Design
Extracting Information about Modules
Installing Modules
Associating a Manifest to a Module
Unapproved Verbs
Documenting a Module
Enhancing Robustness
Name Collisions – Which One to Run?
Conclusion

In my previous PowerShell exploration (A Study in PowerShell Pipelines, Functions, and Parameters) I concentrated on describing how parameters were passed to functions, explaining the bewildering intricacies on both sides of the function interface (the code doing the calling and the code inside the function doing the receiving). I didn’t mention how to go about actually creating a function because it was so simple to do that it could safely be left as an extracurricular exercise. With modules, by contrast, the complexity reverses; it is more intricate to create a module than to use a module, so that is where you are heading now. The first half of this article guides you along the twisted path from raw code to tidy module; the second half introduces a set of best practices for module design.

Encapsulation

As you likely know, encapsulation makes your code more manageable. Encapsulation is the process of separating an interface from its implementation by bundling data and code together and exposing only a well-defined portion of it. The following sections walk you along the road to encapsulation in PowerShell.

"Would you tell me, please, which way I ought to go from here?" "That depends a good deal on where you want to get to," said the Cat. "I don't much care where – " said Alice. "Then it doesn't matter which way you go," said the Cat. " – so long as I get somewhere," Alice added as an explanation. "Oh, you're sure to do that," said the Cat, "if you only walk long enough."

-- Chapter 6, Alice's Adventures in Wonderland (Lewis Carroll)

Refactor Inline Code into Functions

Encapsulation encourages you to convert a single code sequence with inordinate detail into a more digestible and simpler piece of code (Figure 1).

Figure 1: Refactoring inline code to a function

Refactoring the first example into the second ended up only moving one or two lines of code (depending on how you count it) into the separate Match-Expression function. But look at how much easier it is to comprehend the code! The main program lets a reader of your code observe that Match-Expression uses the given regular expression to find several values from a given string. It does not reveal how—the Match-Expression function hides the details of how the match operator works. And that's great, because your reader does not care. Before you argue the point, consider a different context such as some .NET-supplied function, e.g. String.Join. Except in rare circumstances you simply do not care about the implementation of String.Join; you just need to know what it does. Refactoring to functions is useful and important to do, of course, but there is one cautionary note: if instead of the simple Match-Expression function you have a more complex function that includes several support functions and variables, all of those support objects are polluting your current scope. There is nothing to prevent another part of your script from using one of these support functions that was specifically designed to be used only by Match-Expression (or rather its complex cousin). Or worse, in your zeal to refactor into smaller and smaller functions you might create a function with the same name as a built-in cmdlet; your function would supersede the built-in one. The next section returns to this consideration after a fashion.

Refactor Functions into Files
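Since Figure 1 is not reproduced here, the sketch below suggests what the refactored function might look like; the implementation details are assumed, with only the function name and its use of the -match operator taken from the text.

# Hypothetical sketch of the refactored helper shown in Figure 1.
function Match-Expression($text, $regex)
{
    # -match populates the automatic $Matches hashtable with any captured groups
    if ($text -match $regex) { return $Matches }
}

# The main program is reduced to a single, intention-revealing call:
Match-Expression 'build 7.2.1943' '(?<major>\d+)\.(?<minor>\d+)\.(?<build>\d+)'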

Now you have this Match-Expression function that came in quite handy in your script. You find it so useful, in fact, that you want to use it in other scripts. Good design practice dictates the DRY principle: Don't Repeat Yourself. So rather than copying this function into several other script files, move it into its own file (Expressions.ps1) and reference it from each script. Modify the above example to use dot-sourcing (explained in the Using Dot Source Notation with Scope section of the help topic about_Scopes) to incorporate the contents of Expressions.ps1 (Figure 2).
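In the absence of Figure 2, the dot-sourced version of the calling script boils down to something like this (the relative path is assumed, which is exactly the pitfall discussed next):

# Read Expressions.ps1 into the current scope, then call the function it defines.
. .\Expressions.ps1
Match-Expression 'build 7.2.1943' '(?<major>\d+)\.(?<minor>\d+)\.(?<build>\d+)'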

Figure 2: Refactoring an inline function to a separate file

The code on the right is exactly equivalent to the code on the left. The elegance of this is that if you want to change the function you have only one piece of code to modify, and the changes are automatically propagated everywhere you have referenced the file. Dot-sourcing reads in the specified file just as if it were in the calling file.

Dot-Sourcing Pitfall

There is, however, a potential problem. As you have just seen, dot-sourcing syntax includes just two pieces: a dot (hence the name!) and a file path. In the example above I show the file path as a dot as well, but there it means current directory. The current directory is where you happen to be when you invoke the script; it is not tied to the script's location at all! Thus, the above only works because I specifically executed the script from the script directory. What you need then is a way to tell PowerShell to look for the Expressions.ps1 file in the same directory as your main script— regardless of what your current directory is. A web search on this question leads you to the seemingly ubiquitous script that originated with this post by Jeffrey Snover of the PowerShell team:

function Get-ScriptDirectory
{
    $Invocation = (Get-Variable MyInvocation -Scope 1).Value
    Split-Path $Invocation.MyCommand.Path
}

If you include the above in your script (or in a separate file and dot-source it!) then add this line to your script:

Write-Host (Get-ScriptDirectory)

…it will properly display the directory where your script resides rather than your current directory. Maybe. The results you get from this function depend on where you call it from! It failed immediately when I tried it! I was surprised, because I found this code example proliferated far and wide on the web. I soon discovered that it was because I used it differently to Snover's example: instead of calling it at the top level in my script, I’d called it from inside another function in a way I refer to as “nested twice” in the following table. It took just a simple tweak to make Get-ScriptDirectory more robust: you just need to change from parent scope to script scope; -Scope 1 in the original function definition indicates parent scope and $script in the modified one indicates script scope.

function Get-ScriptDirectory
{
    Split-Path $script:MyInvocation.MyCommand.Path
}

It is a very inconvenient habit of kittens (Alice had once made the remark) that, whatever you say to them, they always purr. “If they would only purr for 'yes,' and mew for 'no,' or any rule of that sort,” she had said, “so that one could keep up a conversation! But how can you talk with a person if they always say the same thing?”

-- Alice, Chapter 12, Through the Looking Glass (Lewis Carroll)

To illustrate the difference between the two implementations, I created a test vehicle that evaluates the target expression in four different ways (bracketed terms are keys in the table that follows):

Inline code [inline]
Inline function, i.e. a function in the main program [inline function]
Dot-sourced function, i.e. the same function moved to a separate .ps1 file [dot source]
Module function, i.e. the same function moved to a separate .psm1 file [module]

The first two columns in the table define the scenario; the last two columns display the results of the two candidate implementations of Get-ScriptDirectory. A result of script means that the invocation correctly reported the location of the script. A result of module means the invocation reported the location of the module (see next section) containing the function rather than the script that called the function; this indicates a drawback of both implementations such that you cannot put this function in a module to find the location of the calling script. Setting this module issue aside, the remarkable observation from the table is that using the parent scope approach fails most of the time (in fact, twice as often as it succeeds)!

Where Called   What Called       Script Scope   Parent Scope
Top Level      inline            script         error
               inline function   script         script
               dot source        script         script
               module            module         module
Nested once    inline            script         script
               inline function   script         error
               dot source        script         error
               module            module         module
Nested twice   inline            script         error
               inline function   script         error
               dot source        script         error
               module            module         module

(You can find my test vehicle code for this in my post on StackOverflow.)

Dot-Sourcing: The Dark Side

Dot-sourcing has a dark side, too, however. Consider again if instead of the simple Match-Expression function you have a more complex function that includes several support functions and variables. Moving those support functions out of the main file and hiding them (i.e. encapsulating them) in the file you will include with dot-sourcing is clearly a good thing to do. But the problem with dot-sourcing, then, is precisely the same as the benefit: dot-sourcing reads in the specified file just as if it were in the calling file. That means dot-sourcing pollutes your main file with all of its support functions and variables; it is not actually hiding anything. In fact, the situation is far worse with dot-sourcing than it was with just refactoring in the same file: here the detritus is hidden from you (because you no longer see it in your main file) yet it is present and polluting your current scope all the same. But do not despair! The next section provides a way out of this quagmire.

Refactor Functions into Modules

A module is nothing more than a PowerShell script with a .psm1 extension instead of a .ps1 extension. But that small change also addresses both of the issues just discussed for dot-sourcing a script. Figure 3 returns to the familiar example again. The contents of Expressions.ps1 and Expressions.psm1 are identical for this simple example. The main program uses the Import-Module cmdlet instead of the dot-sourcing operator.
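Boiled down to code, the change on the calling side is just swapping the dot-source line for an import (a sketch, assuming the module has been installed as described below):

# Import the Expressions module by name - no path and no current-directory worries -
# then call its exported function exactly as before.
Import-Module Expressions
Match-Expression 'build 7.2.1943' '(?<major>\d+)\.(?<minor>\d+)\.(?<build>\d+)'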

Figure 3: Refactoring code from dot-sourcing to module importation

Notice that the Import-Module cmdlet is not referencing a file at all; it references a module named Expressions, which corresponds to the file Expressions.psm1 when it is located under one of these two system-defined locations (see Storing Modules on Disk under Windows PowerShell Modules):

Machine-specific: C:\Windows\System32\WindowsPowerShell\v1.0\Modules
User-specific: C:\Users\username\Documents\WindowsPowerShell\Modules

Thus, the whole issue of current directory and script directory, a problem for dot-sourcing, becomes moot for modules. To use modules you must copy them into one or the other of these system repositories to be recognized by PowerShell. Once a module is deposited there, you use the Import-Module cmdlet to expose its interface to your script. (Caveat: you cannot just put Expressions.psm1 in either repository as an immediate child; you must put it in a subdirectory called Expressions. See the next section for the rules on this interesting topic.) The second issue with dot-sourcing and inline code was pollution due to “faux encapsulation”. A module truly does encapsulate its contents. Thus, you can have as much support code as you want in your module; your main script that imports the module will be able to see only what you want exposed. By default, all functions are exposed. So if you do have some functions that you want to remain private, you have to use explicit exporting instead of the default. Also, if you want to export aliases, variables, or cmdlets, you must use explicit exporting. To explicitly specify what you want to export (and thus what a script using the module can see from an import) use the Export-ModuleMember cmdlet. Thus, to make Expressions.psm1 use explicit exporting, add this line to the file:

Export-ModuleMember Match-Expression

Best Practices for Module Design

Before you launch into creating modules willy-nilly, there are a few more practical things you should know, discussed next.

Extracting Information about Modules

Before you can use modules you have to know what you already have and what you can get. Get-Module is the gatekeeper you need. With no arguments, Get-Module lists the loaded modules. (Once you load a module with Import-Module you then can use its exported members.) Here is an example:

ModuleType   Name                 ExportedCommands
----------   ----                 ----------------
Manifest     Assertions           {Set-AbortOnError, Assert-Expression, Set-MaxExpressionDisplayLe…
Manifest     IniFile              Get-IniFile
Manifest     Pscx                 {}
Script       Test-PSVersion       {}
Script       TestParamFunctions   {}
Manifest     BitsTransfer         {}

The module type may be manifest, script, or binary (more on those later). The exported commands list identifies all the objects that the module writer exported with explicit exports. An empty list indicates default or implicit export mode, i.e. all functions in the module.

Guideline #1: Use explicit exports so Get-Module can let your user know what you are providing

Get-Module has a ListAvailable parameter to show you what is available to load, i.e. what you have correctly installed into one of the two system repository locations provided earlier. The output format is identical to that shown just above. The default output of Get-Module shows just the three properties above, but there are other ones that are important as well. To see what other interesting properties you could extract from Get-Module, pipe it into the handy Get-Member cmdlet:

Get-Module | Get-Member

Notable properties you find in the output include Path (the path to the module file), Description (a brief summary of the module), and Version. To display these properties with Get-Module, switch from its implicit use of Format-Table to explicit use, where you can enumerate the fields you want:

Get-Module -ListAvailable | Format-Table Name, Path, Description, Version

Name                  Path                          Description                   Version
----                  ----                          -----------                   -------
Assertion             C:\Users\ms\Documents\Wi...   Aborting and non-abortin...   1.0
EnhancedChildItem     C:\Users\ms\Documents\Wi...   Enhanced version of Get-...   1.0
inifile               C:\Users\ms\Documents\Wi...   INI file reader               1.0
SvnKeywords           C:\Users\ms\Documents\Wi...                                 0.0
MetaProgramming       C:\Users\ms\Documents\Wi...   MetaProgramming Module        0.0.0.1
TestParamFunctions    C:\Users\ms\Documents\Wi...                                 0.0
AppLocker             C:\Windows\system32\Wind...   PowerShell AppLocker Module   1.0.0.0
BitsTransfer          C:\Windows\system32\Wind...                                 1.0.0.0
PSDiagnostics         C:\Windows\system32\Wind...                                 1.0.0.0
TroubleshootingPack   C:\Windows\system32\Wind...   Microsoft Windows Troubl...   1.0.0.0

If you actually want to see the value of some fields, though, particularly longer fields like Path or Description, it might behoove you to use Format-List rather than Format-Table:

Get-Module -ListAvailable | Format-List Name, Path, Description, Version

Name        : Assertion
Path        : C:\Users\ms\Documents\WindowsPowerShell\Modules\CleanCode\Assertion\Assertion.psm1
Description : Aborting and non-aborting validation functions for testing.
Version     : 1.0

Name        : EnhancedChildItem
Path        : C:\Users\ms\Documents\WindowsPowerShell\Modules\CleanCode\EnhancedChildItem\EnhancedChildItem.psd1
Description : Enhanced version of Get-ChildItem providing -ExcludeTree, -FullName, -Svn, -ContainersOnly, and -NoContainersOnly.
Version     : 1.0

etc. . .

The Get-Member cmdlet quite thoroughly tells you what you can learn about a module but if, like me, you occasionally prefer to bore down into the raw details, you can follow the object trail to its source. First, you can determine that the .NET type of an object returned by Get-Module is called PSModuleInfo via this command:

(Get-Module)[0].GetType().Name

Look up PSModuleInfo on MSDN and there you can see that the list of public properties is just what Get-Member showed you. On MSDN, however, you can dig further. For example, if you follow the links for the ModuleType property, you can drill down to find that the possible values are Binary, Manifest, and Script, as mentioned earlier. Finally, for loaded modules (i.e. not just installed but actually loaded) you can explore further with the Get-Command cmdlet, specifying the module of interest:

Get-Command -Module Assertion

CommandType   Name                             Definition
-----------   ----                             ----------
Function      Assert-Expression                param($expression, $expected)…
Function      Get-AssertCounts                 …
Function      Set-AbortOnError                 param([bool]$state)…
Function      Set-MaxExpressionDisplayLength   param([int]$limit = 50)…

Again, you can use Get-Member to discover what other properties Get-Command could display.

Installing Modules

Now that you know how to see what you have installed, here are the important points you need to know about installation. As mentioned earlier, you install modules into either the system-wide repository or the user-specific repository. Whichever you pick, its leaf node is Modules, so in this discussion I simply use “Modules” to indicate the root of your repository. The table shows what Get-Module and Import-Module can each access for various naming permutations.

#   Location                          Get-Module ?   Import-Module ?
1   name\name.psm1                    name           name
2   name.psm1                         X              X
3   namespace\name\name.psm1          name           namespace\name
4   namespace\folder\name\name.psm1   name           namespace\folder\name
5   name\other-name.psm1              X              name\other-name

Standard module installation (line 1 in the table) requires that you copy your module into this directory: Modules/module-name/module-name.psm1

That is, whatever your module's base file name is, the file must be stored in a subdirectory of the same name under Modules. If instead you put it in the Modules root without the subdirectory:

Modules/module-name.psm1

…PowerShell will not recognize the module (line 2 in the table)! This peculiar behavior is probably what you would try first, so it is a common source of frustration with modules not being recognized. Putting a module in the Modules directory is not good enough; only in an eponymous subfolder will it be recognized by PowerShell.

Alice felt dreadfully puzzled. The Hatter's remark seemed to her to have no sort of meaning in it, and yet it was certainly English. “I don't quite understand you,” she said, as politely as she could.

-- Alice, Chapter 7, Alice's Adventures in Wonderland (Lewis Carroll)

Line 3 illustrates that you can use namespaces rather than clutter up your Modules root with a hodgepodge of modules from different sources. When you use Get-Module, though, the default output shows just the name; you must look at the Path property of Get-Module if you want to see the namespace as well. If you ask Get-Module to find a particular module, you again provide only the name. However, when you use Import-Module you specify the path relative to the Modules root. Note that namespaces are purely a convention you may or may not choose to use; PowerShell has no notion of namespaces per se (at least as of version 2; Dmitry Sotnikov has made a plea via Microsoft Connect to add namespaces in future versions; see We Need Namespaces!). Line 4 extends the case of line 3, showing that you can make your namespace as nested as you like—as long as your modules end up in like-named leaf directories. Given the above discourse, here is the next cardinal rule for modules:

Guideline #2: Install a module in an eponymous subdirectory under your Modules root
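As a concrete sketch of Guideline #2, installing the Expressions module from earlier into the user-specific repository might look like this (the target path follows the user-specific location given above):

# Create an eponymous folder under the user-specific Modules root and copy the module into it.
$target = Join-Path ([Environment]::GetFolderPath('MyDocuments')) 'WindowsPowerShell\Modules\Expressions'
New-Item -ItemType Directory -Path $target -Force | Out-Null
Copy-Item .\Expressions.psm1 -Destination $target

# The module should now be listed by name:
Get-Module -ListAvailable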

Line 5 in the table presents an interesting corner case showing what happens if you violate Guideline #2. The module is invisible to Get-Module -ListAvailable, yet you can still load it by specifying the differing subdirectory name and module name. This is, of course, not advisable.

Associating a Manifest to a Module

The first half of the article showed the progression from inline code to script file to module file. There is a further step – introducing a manifest file associated with the module file. You need to use a manifest to specify details of your module that may be accessed programmatically. Recall that when discussing Get-Module one example showed how to get additional properties beyond the default – including description and version. But in the example's output, some entries showed an empty description and a 0.0 version. Both description and version come from the manifest file; a module lacking a manifest has just those default values. To create a manifest file, simply invoke the New-ModuleManifest command and it will prompt you to enter property values. If you do this in a standard PowerShell command-line window, you receive a series of prompts for each property. If, on the other hand, you use the PowerGUI script editor, it presents a more flexible pop-up dialog, as shown in Figure 4, where I also entered a couple of other common properties, author and copyright.

Figure 4: New-ModuleManifest dialog from PowerGUI Script Editor

The ModuleToProcess property must reference your module script file. Upon selecting OK, the dialog closes and the manifest file is created at the location you specified for the Path property. The path of the manifest file must also follow rule #2, this time with a .psd1 extension. Once the manifest exists, PowerShell looks to the manifest whenever you reference the module, notably in both the Get-Module and Import-Module cmdlets. You can confirm this with Get-Module: recall that Get-Module displays the ModuleType property by default; now you will see it display Manifest instead of Script for the ModuleType.
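From the plain command line, the equivalent looks something like the sketch below; the property values are invented for illustration, and any mandatory properties you omit will still be prompted for, as noted above.

# Create (and then validate) a manifest for the Expressions module.
New-ModuleManifest -Path .\Expressions\Expressions.psd1 `
    -ModuleToProcess 'Expressions.psm1' `
    -Author 'Your Name' `
    -Description 'Regular-expression helper functions' `
    -ModuleVersion '1.0'

Test-ModuleManifest .\Expressions\Expressions.psd1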

Guideline #3: Use a manifest so your users can get a version and description of your module

Once you create your manifest, or at any time later, you can use Test-ModuleManifest to validate it. This cmdlet checks for existence of the manifest and it verifies any file references in the manifest. For more on manifests, see How to Write a Module Manifest on MSDN.

Unapproved Verbs

If you imported the Expressions.psm1 module given earlier, you likely received this warning message:

WARNING: Some imported command names include unapproved verbs which might make them less discoverable. Use the Verbose parameter for more detail or type Get-Verb to see the list of approved verbs.

PowerShell wants to encourage users to use standard naming conventions so it is easier for everybody who uses external modules to know what to expect. Cmdlets and functions should use the convention action-noun (e.g. Get-Module). PowerShell does not make any guesses about your choice of nouns, but it is particular about your choice of actions. You can see the list of approved actions, as the warning above indicates, by executing the Get-Verb cmdlet. Note that I use the term action rather than verb in this paragraph, because PowerShell's definition of verb is rather non-standard(!). Humpty Dumpty really had the right idea – I use this quote frequently…

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean – neither more nor less.”

-- Chapter 6, Through the Looking Glass (Lewis Carroll)

To PowerShell a verb is “a word that implies an action”, so a construct such as New-ModuleManifest qualifies. See Cmdlet Verbs in MSDN for more details on naming.

Guideline #4: Name your functions following PowerShell conventions

Documenting a Module

The help system in PowerShell is a tremendous boon: without leaving the IDE (or PowerShell prompt) you can immediately find out almost anything you care to know about any PowerShell cmdlet (e.g. Get-Help Get-Module) or general topic (e.g. Get-Help about_modules). When you create a module you can easily provide the same level of professional support for your own functions. Implementing the help is the easy part; writing your content is what takes most of your time. To implement the integrated help support, you add documentation comments (“doc-comments”) to your module script file just like you would with your other favorite programming language. Some IDEs provide great support for adding doc-comments. Visual Studio, for example, with the GhostDoc add-on almost writes the doc-comments for you. Alas, PowerShell does not yet have such a ghost writer available. To do it yourself, start with about_Comment_Based_Help (which you can also access from the PowerShell prompt by feeding that to Get-Help!). Scroll down to the Syntax for Comment-Based Help in Functions section. Note that the page also talks about adding help for the script itself; that applies only to main scripts (ps1 files); it does not apply to modules (psm1 files). What you will see here is that you must add a special comment section that looks like this for each function:

<#
.<help keyword>
<help content>
. . .
#>

…and that you can place that comment in any of three positions relative to your function body. You can then pick your relevant help keywords from the subsequent section, Comment-Based Help Keywords. One small annoyance (hard to say if it is a feature or a defect, since the documentation describes it as both in adjoining paragraphs!): for each function parameter, Get-Help displays a small table of its attributes, but the default value is never filled in! Here is an example from Get-Module's ListAvailable parameter:

-ListAvailable [<SwitchParameter>]
    Gets all of the modules that can be imported into the session. Get-Module gets the modules in the paths specified by the $env:PSModulePath environment variable.

    Without this parameter, Get-Module gets only the modules that have been imported into the session.

    Required?                    false
    Position?                    named
    Default value
    Accept pipeline input?       false
    Accept wildcard characters?  false

You can see this feature/issue documented under Autogenerated Content > Parameter Attribute Table. The documentation is certainly thorough on this point, even to the extent of providing a workaround: it suggests you mention the default in your help text, and that is just what all the standard .NET cmdlets do!

PowerShell provides support for help on individual modules, allowing Get-Help to access your help text, as you have just seen. If you produce libraries rather than just individual modules, you will next be looking for a way to create an API documentation tree that you can supply with your library. Wait for it… sigh. No, PowerShell does not provide any such tool like javadoc for Java or Sandcastle for .NET. I found that rather unsatisfactory, so I undertook to create one. My API generator for PowerShell (written in PowerShell, of course!) is in my PowerShell library, scheduled for release in the fourth quarter of 2011. You can find it here on my API bookshelf, alongside my libraries in five other languages. As an enthusiastic library builder, I have created similar API generators for Perl (see Pod2HtmlTree) and for T-SQL (see XmlTransform). (Note that the Perl version is Perl-specific, while the T-SQL one is my generic XML conversion tool configured to handle SQL documentation, described in Add Custom XML Documentation Capability To Your SQL Code.)
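Putting the pieces together, here is a minimal, hedged sketch of a documented function (the function and its parameter are invented purely for illustration; note that the default value is mentioned in the help text, per the workaround above):

function Get-Square {
    <#
    .SYNOPSIS
    Returns the squares of the supplied integers.

    .DESCRIPTION
    A trivial function used only to illustrate comment-based help.

    .PARAMETER Number
    One or more integers to square. Default value: 1.

    .EXAMPLE
    Get-Square -Number 2,3,4
    #>
    param(
        [int[]] $Number = 1
    )
    foreach ($n in $Number) { $n * $n }
}

# Once the module is imported, the help is available immediately:
# Get-Help Get-Square -Full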

Guideline #5: Add polish to your modules by documenting your functions

Enhancing Robustness

I would be remiss if I did not add a mention, however brief, of an important guideline for any PowerShell script, module or otherwise. Let the compiler help you by turning on strict mode with Set-StrictMode:

Set-StrictMode -Version Latest
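As a quick, hedged illustration of what this buys you (the variable names are invented):

Set-StrictMode -Version Latest

$total = 5
# Without strict mode, the misspelled name below silently evaluates to $null;
# with strict mode on, PowerShell raises an error because $totl has not been set.
$total + $totl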

Guideline #6: Tighten up your code by enforcing strict mode

It is regrettable that this setting is not on by default.

Name Collisions – Which One to Run?

If you create a function with the same name as a cmdlet, which one does PowerShell pick? To determine that, you need to know the execution precedence order (from about_Command_Precedence):

1. Alias
2. Function
3. Cmdlet
4. Native Windows commands

If you have two items at the same precedence level, such as two functions or two cmdlets with the same name, the most recently added one has precedence. (Hence the desire by some to have namespaces introduced in PowerShell, as mentioned earlier.) When you add a new item with the same name as another item, it may replace the original or it may hide it. Defining a function with the same name as an existing cmdlet, for example, hides the cmdlet but does not replace it; the cmdlet is still accessible if you provide its fully-qualified name. To determine that name, examine the PSSnapIn and Module properties of the cmdlet:

Get-Command Get-ChildItem | Format-List -Property Name, PSSnapIn, Module

Name     : Get-ChildItem
PSSnapIn : Microsoft.PowerShell.Management
Module   :

The fully qualified name, then, for the Get-ChildItem cmdlet is:

Microsoft.PowerShell.Management\Get-ChildItem
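To see the hiding behavior for yourself, here is a short, hedged sketch you might try in a throwaway session (the function body is just a placeholder):

# A function with the same name hides, but does not replace, the cmdlet.
function Get-ChildItem { "my version" }

Get-ChildItem                                     # runs the function above
Microsoft.PowerShell.Management\Get-ChildItem     # still reaches the original cmdlet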

To avoid naming conflicts in the first place, import a module with the -Prefix parameter of the Import-Module cmdlet. If you have created, for example, a new version of Get-Date in a DateFunctions module and run this:

Import-Module -Name DateFunctions -Prefix Enhanced

…then your Get-Date function is now mapped to Get-EnhancedDate, i.e., the noun in the command name acquires the prefix you specified.
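Assuming such a DateFunctions module exists (it is hypothetical here), usage would look roughly like this:

Import-Module -Name DateFunctions -Prefix Enhanced

Get-EnhancedDate    # calls the Get-Date function exported by DateFunctions
Get-Date            # still calls the built-in Get-Date cmdlet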

Conclusion

Modules let you organize your code well and make it highly reusable. Now that you are aware of them, you will probably start noticing code smells that shout “Module!”. That is, be on the lookout for chunks of code that perform a useful calculation but are generic enough to deserve separating out from your main code. I have found that taking the effort to move generic functionality into a separate module forces me to think about it in isolation and often leads me to find corner cases that I had missed in the logic. Modularizing also lets you focus more fine-grained, more specific unit tests on that code. For further reading, be sure to take a look at the whole section on modules on MSDN at Writing a Windows PowerShell Module. Finally, for a smattering of open-source modules, see Useful PowerShell Modules.

© Simple-Talk.com

Hybrid Cloud

Published Tuesday, August 30, 2011 3:45 AM

Someone from the Productivity team at Microsoft sent me personally a tweet (squee!) asking what I thought about this article on the idea that companies are going to take a hybrid approach to the cloud. My immediate response, which I tweeted, was "Of course." It only makes sense. People are going to identify things that they don't want to do on premises, like manage email or run SharePoint servers or even run database servers, and they're going to find a service that will do it for them.

Don't believe me? Fine. Let me ask you this: When is the last time you built an email management system? What do you mean that you went and found one and purchased it? Why would you do that? Oh, it's saving you time and money? Your company isn't in the business of developing email management software? It'd be buggy? You could spend time & money elsewhere that helps the business more? Yep. My thoughts exactly.

You can't stop cloud adoption. You're not even going to slow it down. I've already seen departmental teams within a large company bypassing the IT department to put some stuff into the cloud in order to get it out there faster. It's going to happen to you. The very best thing you can do is not to stand in front of this bus called the cloud that's headed right for you. No, the best thing you can do is climb on board, grab the wheel and tromp the accelerator to the floor. Start learning this stuff. Figure it out and recognize that your worth is not in some narrow bit of knowledge about some particular piece of software. Your worth is in your ability to provide solutions, in any form, to the business so that you are helping them do their jobs better & faster.

Do I think everything is going into the cloud? Is 1/2 to 3/4 of IT going to be out of work? Am I suggesting you slit your own throat? No. Absolutely not. I think core competencies are going to remain in-house. Information that is really about the business, and not about junk around the business, is going to stay largely on premises for most companies, especially larger organizations. And there's going to be tons and tons of work moving stuff up to the cloud, maintaining it in the cloud, tuning it in the cloud, or pulling it back down out of the cloud when the performance is bad, or the costs are too high, or whatever else might go wrong that I'm not thinking of. That's in addition to all the work you're doing to maintain the systems that aren't, and probably never will be, sent off premises. We have plenty of work ahead of us.

by Grant Fritchey

Filed Under: SQL Azure