No Significant Fragmentation? Look Closer… 28 February 2012 by Luciano Moreira

If you rely on 'best-practice' percentage-based thresholds when creating an index maintenance plan that checks the fragmentation in your pages, you may miss occasional 'edge' conditions on larger tables that cause severe degradation in performance. It is worth being aware of the patterns of data access in particular tables when judging the best threshold figure to use.

In this article I’ll be describing an edge case related to logical and internal fragmentation within a specific index branch that may cause performance issues, and I’d also like to contribute to the debate about the use of “global” thresholds for your maintenance plans. Let’s suppose you have a table with a structure that holds 5 rows per page and leaves almost no space to accommodate changes. After a complete index rebuild with a fillfactor of 100%, the pages would be almost full and you should see minimal logical fragmentation in your index. Here is the script to generate the initial state of our example.

Script 01 – Create database and tables with records

CREATE DATABASE SimpleTalk
GO

USE SimpleTalk
GO

IF EXISTS (SELECT [name] FROM sys.tables WHERE [name] = 'HistoricalTable')
    DROP TABLE HistoricalTable
GO

CREATE TABLE HistoricalTable
(
    ID INT IDENTITY NOT NULL CONSTRAINT PK_ID PRIMARY KEY,
    ColumnA VARCHAR(1590) NULL,
    ColumnB VARCHAR(1000) NULL,
    EventDate DATETIME NULL
)
GO

-- Insert some records to simulate our history
INSERT INTO HistoricalTable (ColumnA)
VALUES (REPLICATE('SimpleTalk', 159))
GO 100000

UPDATE HistoricalTable
SET EventDate = DATEADD(HOUR, ID - 100000, GETDATE())
GO

ALTER TABLE dbo.HistoricalTable REBUILD WITH (FILLFACTOR = 100)
GO

Great! As many other DBAs would do, you take care of your indexes and deploy a maintenance plan that checks the fragmentation in your pages: if a certain threshold is met (30% fragmentation is commonly mentioned), you start a task to rebuild your indexes (let’s put reorganization aside for the sake of simplicity). Your routine runs every night and you have the time window necessary to accommodate the entire maintenance task. After the index rebuild is over, a quick analysis of this index using sys.dm_db_index_physical_stats shows low logical fragmentation (ordering of pages) and high page density (low internal fragmentation): 0.03499% and 99.72% respectively. You can check it by running the following statement:

Script 02 – Checking fragmentation

SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID('SimpleTalk'), OBJECT_ID('HistoricalTable'), NULL, NULL, 'DETAILED')
GO
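Such a nightly routine is often little more than a comparison of this DMV against the chosen threshold. The following is only a minimal sketch of that idea, assuming the commonly quoted 30% figure and the PK_ID clustered index created in Script 01; real maintenance scripts normally loop over all indexes and also decide between rebuild and reorganize.

-- Sketch only: threshold-driven rebuild of the clustered index
DECLARE @frag FLOAT;

SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('SimpleTalk'), OBJECT_ID('HistoricalTable'), 1, NULL, 'LIMITED')
WHERE index_level = 0;

IF @frag >= 30
    ALTER INDEX PK_ID ON dbo.HistoricalTable REBUILD WITH (FILLFACTOR = 100);
GO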

A more detailed analysis would show the b-tree+ to be very well organized at the rightmost branch of the index (red branches in Figure 01). We can accomplish this detailed check by using DBCC PAGE and navigating through the tree structure (Script 03).

(Figure 01 – B-tree+ rightmost branches)

Starting from the root, and verifying the page ordering at the last two non-leaf index pages, it’s possible to check that all the pages are ordered and that no logical fragmentation is seen. At the leaf level, all the pages are fully allocated with 21 bytes free (in the page header, “m_freeCnt = 21”).

Script 03 – Analyzing the rightmost branch

-- Get the root page for our index: 0xAA5000000100 in this sample
-- Doing byte swap: 0x000050AA (page number: 20650) at file 0x0001
SELECT AU.root_page
FROM sys.system_internals_allocation_units AS AU
INNER JOIN sys.partitions AS P ON AU.container_id = P.partition_id
WHERE object_id = OBJECT_ID('HistoricalTable')
GO

-- Checking root page and get references to the two rightmost non-leaf pages (figure 02)
DBCC TRACEON(3604)
DBCC PAGE ('SimpleTalk', 1, 20650, 3)
GO

(Figure 02 – Index root level)

-- Non-leaf index page (figure 03)
-- Note that all the child pages are ordered
DBCC PAGE ('SimpleTalk', 1, 20972, 3)
GO

(Figure 03 – Index non-leaf level)

-- Non-leaf index page (figure 04)
-- Note that all the child pages are ordered
DBCC PAGE ('SimpleTalk', 1, 20973, 3)
GO

(Figure 04 – Index non-leaf level)

-- Looking at a leaf page (choose one from the level above)
-- The last page should show some space left, waiting for the next inserts in the clustered index
DBCC PAGE ('SimpleTalk', 1, 40048, 3)
GO

Now let’s suppose that this table holds many years of historical records and the most recent records can get updated, because that’s the way your business works. After some updates to the most recent records (Script 04), you check your index fragmentation again and it is worse, showing 3.95% logical fragmentation and 97.78% page density, but those values are far from the threshold defined by your rebuild routine.

Script 04 – Updates in action and new fragmentation

-- Business rules and application in action
UPDATE HistoricalTable
SET ColumnB = 'Update bigger then 21 bytes free in each page.'
WHERE ID >= 98000 AND (ID % 5) = 0
GO

SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID('SimpleTalk'), OBJECT_ID('HistoricalTable'), NULL, NULL, 'DETAILED')
GO

A small amount of fragmentation means there is nothing to do, right? Not so fast… If we re-execute the same steps to analyze the rightmost branch of the index (Script 05), you will notice something very different from the first execution. Since the updated rows didn’t fit in the space available in each page, SQL Server had to execute a series of page splits to keep the index in the order of the index key.

Script 05 – Analyzing the rightmost branch after fragmentation

-- Checking root page and get references to the rightmost non-leaf pages (figure 05)
-- Note that new pages (out of order) are shown...
DBCC TRACEON(3604)
DBCC PAGE ('SimpleTalk', 1, 20650, 3)
GO

(Figure 05 – Index root level with fragmentation)

-- Non-leaf index page (figure 06)
-- Note that all child pages are NOT ordered
DBCC PAGE ('SimpleTalk', 1, 45464, 3)
GO

-- Non-leaf index page (figure 07)
-- Note that all child pages are NOT ordered
DBCC PAGE ('SimpleTalk', 1, 20973, 3)
GO

-- Looking at a leaf page (choose one from the level above)
-- The page now shows some space left. In this page, 4866 bytes (m_freeCnt = 4866).
DBCC PAGE ('SimpleTalk', 1, 40048, 3)
GO

(Figure 06 - Index non-leaf level with fragmentation)

(Figure 07 - Index non-leaf level with fragmentation)

Checking the non-leaf level starting from the row with ID 98000, we can clearly see that the logical fragmentation for this b-tree branch should be bigger than 90%, since the physical order of pages in the leaf level is not the same as the logical order. This means that the most accessed pages are out of order (potentially preventing read-aheads, although if they are hot they will be in the data cache anyway), leading to greater logical fragmentation than the DMV suggests. Maybe this doesn’t seem that bad, but another aspect also worries me. If you check the details of a page in the fragmented part of the leaf level, it will show that, on average, only 50% of the page is used (m_freeCnt in the page header). Since the most accessed pages are about half empty, you are wasting space in your data cache; if your table has a significant size and the updates touch gigabytes of data, you are wasting half of the space used by those pages. In this case it doesn’t seem that bad because of the low number of pages, but we’re working on a small set of data; your SQL Server probably has many more gigabytes that can be wasted. You can check for the space used (and available) inside the data cache by using sys.dm_os_buffer_descriptors. In the script below I show the average space free (or wasted) for this database in each page loaded in cache. This is without grouping by object, something worth monitoring in your environment.

Script 06 – Checking the data cache for free space

-- Clean the data cache
CHECKPOINT
DBCC DROPCLEANBUFFERS
GO

-- Bring fragmented pages to memory
SELECT COUNT(*)
FROM dbo.HistoricalTable
WHERE ID >= 98000
GO

-- On average, 3895 bytes are wasted in each page.
SELECT AVG(free_space_in_bytes)
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID('SimpleTalk')
GROUP BY database_id
GO
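The query above averages across the whole database. To get the per-object breakdown mentioned earlier, the buffer descriptors can be mapped back to objects through the allocation units. The following is only a sketch of that idea (run in the SimpleTalk database, and assuming the usual join of in-row and row-overflow allocation units to partitions); it is not part of the original scripts.

-- Sketch: average free (wasted) space per cached page, grouped by object
USE SimpleTalk
GO
SELECT OBJECT_NAME(P.object_id) AS ObjectName,
       COUNT(*) AS PagesInCache,
       AVG(BD.free_space_in_bytes) AS AvgFreeBytesPerPage
FROM sys.dm_os_buffer_descriptors AS BD
INNER JOIN sys.allocation_units AS AU ON AU.allocation_unit_id = BD.allocation_unit_id
INNER JOIN sys.partitions AS P ON P.partition_id = AU.container_id
WHERE BD.database_id = DB_ID()
  AND AU.type IN (1, 3) -- in-row and row-overflow data
GROUP BY P.object_id
ORDER BY AvgFreeBytesPerPage DESC
GO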

To check the real fragmentation in this branch (without using DBCC PAGE), you can re-execute the scripts but, as well as the clustered index, create a filtered non-clustered index (Script 07) that mimics your original data structure for that range, and check its fragmentation after all the updates are made. This non-clustered index showed me a logical fragmentation of 99.75% and a page density of 50.02%.

Script 07 – Filtered non-clustered index creation

CREATE NONCLUSTERED INDEX idxNCL_Filtered
ON HistoricalTable (ID)
INCLUDE (ColumnA, ColumnB, EventDate)
WHERE ID >= 98000
GO

Conclusion

Considerable fragmentation in specific branches of an index, not visible as a representative change in the overall index fragmentation, may be happening on your servers and, frankly, you can’t always prevent it from happening. Partitioning your index and using a different fill factor for each partition would probably give you the best results (see the sketch below), but that isn’t the main concern of this article. I want to alert you to the potential problem of relying on thresholds, like 20% or 30%, to rebuild and reorganize all your indexes. This usually won’t suffice and may lead to degradation of performance, especially for large tables: the ones you normally care about the most. Even when working with partitions, keeping the data from the current year in the “hot partition” can make your year have a great start and be a problem during Christmas. In this simple case a small table shows 3% logical fragmentation and 97.7% page density for the whole index, and that’s correct for the whole table, but for the most-used pages you would see huge logical fragmentation and 50% page density. That’s one of the reasons I worry when people take those thresholds as absolute truth and don’t think about the patterns of data access and manipulation in their own environment, not to mention the collateral effects and the real impact these may have on the hot spots in the database. You, as the DBA responsible for your data, should know the behavior of your SQL Server better than anyone. Of course it is better to have a threshold and a maintenance plan than to have nothing but, as Paul Randal said in his blog, don’t treat those numbers as absolute truth. Take care, and remember, there is always more than meets the eye…
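For completeness, here is a minimal sketch of the partition-level check mentioned above. It assumes a hypothetical partitioned version of HistoricalTable (partitioning is not part of the scripts in this article): sys.dm_db_index_physical_stats returns one row per partition, so a threshold can be evaluated against the hot partition alone instead of the whole index.

-- Sketch: per-partition fragmentation and page density for the clustered index
SELECT partition_number,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('SimpleTalk'), OBJECT_ID('HistoricalTable'), 1, NULL, 'DETAILED')
WHERE index_level = 0
ORDER BY partition_number
GO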

© Simple-Talk.com

Working with Variables in SQL Server Integration Services 01 March 2012 by Robert Sheldon

There are a number of ways that you can incorporate variables into your SSIS packages. Robert Sheldon demonstrates how.

SQL Server Integration Services (SSIS) supports two types of variables: system and user-defined. SSIS automatically generates the system variables when you create a package. That’s not the case for user-defined variables: you create them as needed when setting up your package. However, both variable types store data that tasks and containers can access during a package’s execution. In addition, package components can save data to user-defined variables during execution in order to pass that information on to other objects. Variables are useful in a number of situations. You can bind them to Transact-SQL parameters in an Execute SQL task, or use them to provide the iterative lists necessary to run a Foreach Loop task. SSIS variables can also be mapped to the variables used within a Script task or Script data flow component. And anywhere you can create a property expression, you can include user-defined or system variables. Event handlers, too, can make use of both types of variables.

Viewing Variables in SSIS

To view the variables available in an SSIS package, click the Variables option on the SSIS menu. This will open the Variables pane in the left part of your window. By default, the Variables pane displays only user-defined variables when you first open it, so if none have been defined on your package, the pane will be empty. However, at the top of the Variables pane you’ll find several buttons that let you control what information is displayed:

Add Variable: Adds a user-defined variable.
Delete Variable: Deletes the selected user-defined variable.
Show System Variables: Toggles between a list that includes system variables and one that does not. User-defined variables are blue, and system variables are gray.
Show All Variables: Toggles between a list that includes all variables and one that includes only those variables within the scope of the package or the selected container or task. The list will include system variables only if the Show System Variables option is selected.
Choose Variable Columns: Launches the Choose Variable Columns dialog box, where you can select which information is shown in the Variables pane.

Figure 1 shows the Variables pane after I clicked the Show System Variables button. As you can see, the pane now displays the system variables available in an SSIS package (SQL Server 2008 R2). Each listing includes the variable name, its scope within the package, its data type, and its pre-defined value. In this case, all the system variables have a scope of SsisVariables. That’s the name of the package I created, which means all the system variables listed in Figure 1 have a package-level scope and are available to the entire package.

Figure 1: Viewing the Variables pane in SSIS

I can display additional information about each variable by clicking the Choose Variable Columns button and selecting the columns I want to display from the Choose Variable Columns dialog box, shown in Figure 2. As you can see, the only columns I can display in addition to the default columns are Namespace and Raise Change Event (represented by the option Raise event when variable value changes). The Namespace column displays the User namespace for user-defined variables and the System namespace for system variables. The Raise Change Event column indicates whether to raise an event when a variable value changes. You can sort the variables listed in the Variables pane by clicking the top of a particular column. For example, if you want to sort your variables by scope, you can click the top of the Scope column.

Figure 2: Selecting columns for the Variables pane

Creating User-Defined Variables in an SSIS Package

The Variables pane also lets you easily create user-defined variables. However, if you want to create your variables at a scope other than the package level, you should first add the necessary tasks and containers to your package. For this article, I created a simple package that includes only an OLE DB connection manager and an Execute SQL task. First, I added the connection manager and configured it to connect to the AdventureWorks2008R2 database on a local instance of SQL Server 2008 R2. I named the connection manager after the database.

Then I added the Execute SQL task, but before I configured it, I used the following Transact-SQL code to create the TestTbl table in the AdventureWorks2008R2 database:

USE AdventureWorks2008R2;
GO

IF OBJECT_ID('dbo.TestTbl') IS NOT NULL
    DROP TABLE dbo.TestTbl;
GO

CREATE TABLE dbo.TestTbl (UserID INT, Username VARCHAR(50));

The OLE DB connection manager and Execute SQL task will use a set of user-defined variables to insert data into this table. So let’s create those variables. When you click the Add Variable button in the Variables pane, a row is added to the list of variables. (At this point, I usually find it easier to toggle the Show System Variables button off so that only user-defined variables are displayed.) Before you click the Add Variable button, however, you must make sure that the scope is preselected for your variable. That means you have to ensure that no containers or tasks are selected in your package if you want your variable to have a package-level scope. But if you want your variable to have a scope specific to a container or task, you must select that object in the SSIS designer before you create your variable. For our example package, the first variable I create has a package-level scope, which means no objects are selected. I click the Add Variable button to add a row to the list of displayed variables. I name the variable SqlServer, select the String data type, and provide the value localhost\SqlSrv2008R2, which connects to my local instance of SQL Server.

I then add two more variables. However, these are at the scope of the Execute SQL task, so I first select the task and then add the variables. The first of these two variables I name UserID, assign the Int32 data type, and provide a value of 101. The second one I name UserName, assign the String data type, and assign a value of johndoe.

Figure 3 shows the three user-defined variables I added to the Variables pane. Notice that the SqlServer variable has a scope of SsisVariables (the name of the package), and the other two variables have a scope of Execute SQL Task (the default name of that task). The SqlServer variable will be used to pass the name of the SQL Server instance to the connection manager. The UserID and UserName variables will be used to pass data into an INSERT statement in the Execute SQL task.

Figure 3: Adding user-defined variables to an SSIS package

When you create a variable in the Variables pane, you’re limited in your ability to view and update variable properties. However, you can view additional properties in the Properties pane. To view a variable’s properties, select that variable in the Variables pane while keeping the Properties pane open. For example, when I select the UserID variable in the Variables pane, the Properties pane displays the properties shown in Figure 4.

Figure 4: Viewing a variable’s properties

Notice that there are more properties than are shown in the Variables pane. For example, you can configure a user-defined variable to be read-only by setting its ReadOnly property to True. Or you can define an expression that determines the variable’s value. Properties you can’t modify are grayed out.

Using Variables in an SSIS Package

Once you’ve created your user-defined variables, you can reference them—along with system variables—in your package’s control flow and data flow components. One way to do that is to create property expressions that include the variables. For example, the SqlServer variable contains the target SQL Server instance, so I can use the variable to pass the instance name into the connection manager. To do so, I select the connection manager, ensure that the Properties pane is displayed, and create the following expression on the ServerName property:

@[User::SqlServer]

The expression identifies the User namespace, followed by the SqlServer variable. Now when I run the package, the original value for the ServerName property is replaced by the current value of the SqlServer variable, as shown in Figure 5.

Figure 5: Adding a user-defined variable to a property expression

The original value of the ServerName property had specified the server’s actual name. After I ran the package, the variable value, which specified the localhost instance, replaced the original value, as evidenced by the ServerName and ConnectionString properties in the figure.

Now let’s turn to the Execute SQL task. On the General page of the Execute SQL Task editor, I specify the AdventureWorks2008R2 connection manager, as shown in Figure 6. That means, when the task connects to the target database, it will use the SqlServer variable specified in the connection manager’s property expression to establish that connection.

Figure 6: Configuring the Execute SQL task

Next, I add the following statement to the SQLStatement property:

INSERT INTO TestTbl VALUES(?,?)

The two question marks serve as parameter placeholders that we’ll map to the UserID and UserName variables. The mapping itself we do on the Parameter Mapping page of the Execute SQL Task editor, shown in Figure 7.

Figure 7: Mapping user-defined variables to parameters in the Execute SQL task

As the figure shows, I add two mappings based on the UserID and UserName variables, both as input parameters. Next I assign the appropriate data types, in this case, LONG and VARCHAR, respectively. I then set the parameter names (0 and 1) and leave the default value (-1) for the parameter size. When I run the package, the INSERT statement will add the variable values to the TestTbl table.
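A quick way to confirm that the mappings worked is to query the table once the package has completed; a simple check (the expected values are the ones assigned to the variables earlier) might look like this:

-- Verify the row inserted by the package; expected result: 101, johndoe
SELECT UserID, Username
FROM dbo.TestTbl;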

Now that you’ve seen how to use the user-defined variables, let’s take a look at an example that uses a system variable to insert data into the TestTbl table. To demonstrate this, we must first alter the TestTbl table. I used the following ALTER TABLE statement to add a DATETIME column to the table:

ALTER TABLE TestTbl ADD StartTime DATETIME;

Now let’s add another parameter mapping to our Execute SQL task. In this case, we’ll use the ContainerStartTime system variable for the mapping, as Figure 8 demonstrates.

Figure 8: Mapping a system variable to a parameter in the Execute SQL task

Once again, we’re adding an input parameter, but this one is configured with the DATE data type, and the parameter name in this case is 2. But adding this mapping means we need to update our INSERT statement to include an additional parameter placeholder, as the following code shows:

INSERT INTO TestTbl VALUES(?,?,?)

Now when we run the package, it inserts not only the values from the UserID and UserName variables, but also from the ContainerStartTime variable, which contains a timestamp of when the Execute SQL task starts to execute.

One other item worth noting about the Parameter Mapping page is that you can also create a user-defined variable directly from the page. After you add a new mapping, select the New Variable option from the Variable Name drop-down list, as shown in Figure 9.

Figure 9: Creating a variable from within the Execute SQL task

When you select this option, it launches the Add Variable dialog box, shown in Figure 10. Here you can define a variable that can be used just like any other user-defined variable. Several tasks and containers offer this option, which is handy when you want to create a variable on the go.

Figure 10: Configuring a user-defined variable created in the Execute SQL task

One final example we’ll look at involves defining an expression that generates a value for a user-defined variable. But first, we’ll need to alter our table again. The following Transact-SQL code adds an INT column to the TestTbl table:

ALTER TABLE TestTbl ADD Runtime INT;

After I add the column, I create a variable named Runtime, configured with the Execute SQL Task scope, the Int32 data type, and an initial value of 0. I then create another parameter mapping in the Execute SQL task, as shown in Figure 11.

Figure 11: Mapping a user-defined variable to a parameter in the Execute SQL task

Of course, we can’t forget to update our INSERT statement to include an additional parameter placeholder, as the following code shows:

INSERT INTO TestTbl VALUES(?,?,?,?)

So now we have everything just about set up. However, if we run our package as is, all it does is insert a 0 into the Runtime column because that’s the value we initially assigned to the Runtime variable. What we really want to do is insert the amount of time—in milliseconds—it takes from the time the package starts running to the current date and time. To achieve this, we add the following expression to the Runtime variable properties:

DATEDIFF("ms", @[System::StartTime] , GETDATE())

The DATEDIFF() function calculates the difference between the value in the StartTime system variable and the current date and time, as retrieved by the GETDATE() function. Note, however, that to use an expression to generate a variable value, you must also set the variable’s EvaluateAsExpression property to True, as shown in Figure 12.

Figure 12: Using an expression to define the value of a user-defined variable

Now when a package’s components access the Runtime variable, a current value will be calculated and that value will be used.

Making Use of Variables

As the article has tried to demonstrate, variables extend the capabilities of an SSIS package to a significant degree; however, the examples I’ve shown here only skim the surface of how extensively they can be used. To see more examples of variables in action, you might want to check out other articles that use variables:

Adding the Script Task to Your SSIS Packages
Working with Precedence Constraints in SQL Server Integration Services
Working with Property Expressions in SQL Server Integration Services
Implementing Checkpoints in an SSIS Package
XML Configuration files in SQL Server Integration Services

Each article demonstrates ways in which you can incorporate variables into your SSIS packages. From this information and from what I’ve provided in this article, you should have plenty of examples of variables in action. As you’ll discover, they are, for the most part, easy to implement and can add enormous value to your SSIS packages.

© Simple-Talk.com

Oh no! My padding's invalid!

Published Tuesday, February 28, 2012 12:33 PM

Recently, I've been doing some work involving cryptography, and encountered the standard .NET CryptographicException: 'Padding is invalid and cannot be removed.' Searching on StackOverflow produces 57 questions concerning this exception; it's a very commonly encountered problem. So I decided to have a closer look. To test this, I created a simple project that encrypts and decrypts a byte array:

// create some random data
byte[] data = new byte[100];
new Random().NextBytes(data);

// use the Rijndael symmetric algorithm
RijndaelManaged rij = new RijndaelManaged();
byte[] encrypted;

// encrypt the data using a CryptoStream
using (var encryptor = rij.CreateEncryptor())
using (MemoryStream encryptedStream = new MemoryStream())
using (CryptoStream crypto = new CryptoStream(
    encryptedStream, encryptor, CryptoStreamMode.Write))
{
    crypto.Write(data, 0, data.Length);
    encrypted = encryptedStream.ToArray();
}

byte[] decrypted;

// and decrypt it again
using (var decryptor = rij.CreateDecryptor())
using (CryptoStream crypto = new CryptoStream(
    new MemoryStream(encrypted), decryptor, CryptoStreamMode.Read))
{
    decrypted = new byte[data.Length];
    crypto.Read(decrypted, 0, decrypted.Length);
}

Sure enough, I got exactly the same CryptographicException when trying to decrypt the data even in this simple example. Well, I'm obviously missing something, if I can't even get this single method right! What does the exception message actually mean? What am I missing? Well, after playing around a bit, I discovered the problem was fixed by changing the encryption step to this:

// encrypt the data using a CryptoStream
using (var encryptor = rij.CreateEncryptor())
using (MemoryStream encryptedStream = new MemoryStream())
{
    using (CryptoStream crypto = new CryptoStream(
        encryptedStream, encryptor, CryptoStreamMode.Write))
    {
        crypto.Write(data, 0, data.Length);
    }
    encrypted = encryptedStream.ToArray();
}

Aaaah, so that's what the problem was. The CryptoStream wasn't flushing all its data to the MemoryStream before it was being read, and closing the stream causes it to flush everything to the backing stream. But why does this cause an error in padding?

Cryptographic padding

All symmetric encryption algorithms (of which Rijndael is one) operate on fixed block sizes. For Rijndael, the default block size is 16 bytes. This means the input needs to be a multiple of 16 bytes long. If it isn't, then the input is padded to 16 bytes using one of the padding modes. This is only done to the final block of data to be encrypted.

CryptoStream has a special method to flush this final block of data - FlushFinalBlock. Calling Stream.Flush() does not flush the final block, as you might expect. Only by closing the stream or explicitly calling FlushFinalBlock is the final block, with any padding, encrypted and written to the backing stream. Without this call, the encrypted data is 16 bytes shorter than it should be. If this final block wasn't written, then the decryption gets to the final 16 bytes of the encrypted data and tries to decrypt it as the final block with padding. The end bytes don't match the padding scheme it's been told to use, therefore it throws an exception stating what is wrong - what the decryptor expects to be padding actually isn't, and so can't be removed from the stream. So, as well as closing the stream before reading the result, an alternative fix to my encryption code is the following:

// encrypt the data using a CryptoStream
using (var encryptor = rij.CreateEncryptor())
using (MemoryStream encryptedStream = new MemoryStream())
using (CryptoStream crypto = new CryptoStream(
    encryptedStream, encryptor, CryptoStreamMode.Write))
{
    crypto.Write(data, 0, data.Length);

    // explicitly flush the final block of data
    crypto.FlushFinalBlock();

    encrypted = encryptedStream.ToArray();
}

Conclusion

So, if your padding is invalid, make sure that you close or call FlushFinalBlock on any CryptoStream performing encryption before you access the encrypted data. Flush isn't enough. Only then will the final block be present in the encrypted data, allowing it to be decrypted successfully.

by Simon Cooper

A Testing Perspective of Controllers and Orchestrators 14 February 2012 by Dino Esposito

The neat separation between processing and rendering in ASP.NET MVC guarantees you an application design that is inherently testable. It doesn't guarantee that your application will be well-designed and quick to test. For that, attention to use-cases and the structure of your code is essential.

This article focuses on the testability of ASP.NET MVC controllers and suggests that, if you keep your controllers super-thin and move the logic to services and model, then you have no need to unit-test controllers and probably don’t need to mock the HTTP context that much. The article is a natural follow-up to a previous article that appeared on Simple-Talk: Never Mind the Controller, Here is the Orchestrator.

From RAD to Unit Testing

ASP.NET MVC is inherently testable. Testability, the degree to which software inherently supports testing, has been recognized as a fundamental attribute of software since the first draft of the international standard ISO/IEC 9126 paper about software architecture. The first draft of this paper dates back to 1991. However, design for testability didn’t seem to concern the average .NET developer much until 2004. Whatever the reason for its slow take-up, the success of .NET as a platform brought many companies to build more and more line-of-business applications, thereby dumping an incredible amount of complexity and business rules on development teams. Developers had to hurriedly change their approach to software development: development needed to be rapid, but also reliable and extensible. It was becoming increasingly important to be able to design software in such a way as to make it easy to test, and in particular to test automatically. Automated tests can give you a mechanical way to check edge cases and figure out quickly and reliably whether changes have broken existing features.

Testing Applied to Controllers

When the long-awaited ASP.NET MVC was introduced, the framework made it possible for website developers to practice unit testing. A lot of tutorials have been posted to show you how to write unit tests around controllers. These tutorials assume that the controller is where you serve the request and coordinate access to backend layers such as the data access layer. If the controller is the nerve center of your ASP.NET MVC application then, once you can artificially create an HTTP context to simulate requests and responses, you’re pretty much done. Because ASP.NET MVC provides facilities to mock up the HTTP context, you are already provided with a clear and clean path ahead: you just need to write a bunch of unit tests for any of your controller classes and you’ll be fine. Is this really true?

Should You Test the Controller?

Testing is much easier if you can rely on clean code that is well separated in layers. Is the controller one of these layers? That question may surprise you if, as an MVC developer, you assume that the controller is the centralized machinery that governs the activity of an ASP.NET MVC site. The controller isn’t really one of the architectural layers of an ASP.NET MVC site: more accurately, the controller is an essential element that’s hard-coded in the framework. The controller is part of the infrastructure and therefore not a part of your code. The controller merely serves the purpose of receiving requests and dispatching them to other application-specific services. In a previous article of mine for Simple-Talk, I named these services orchestrators. An orchestrator is a helper component that coordinates any activity related to a specific request. Ideally, an orchestrator is invoked by a controller but operates independently. This means that it receives from the controller any data that it needs to work on and returns to it any calculated data. In this model, the controller is merely a pass-through layer with no significant logic that really needs testing. The real business is taking place in the orchestrator, and so it is the orchestrator component where you will want to focus your testing efforts. In general, you should focus more on the structure of your code and apply unit-testing practices where it is most beneficial. Code coverage, the percentage of code covered by unit tests, is a reliable indicator of neither code quality nor the bug count. If you use orchestrators, the controller is so thin that it needs almost no testing. It makes sense to take as much code as possible out of the controller classes by using orchestrators, because it helps you to focus on use-cases in sufficient detail to give these orchestrators a programming interface. This is a great stimulus to write better code because you are forced to plan the code in terms of use-cases and design issues. By using orchestrators you significantly reduce the need to mock the HTTP context. If, for example, some session state must be consumed by the orchestrator, then the controller will access it, extract any data and pass it to the orchestrator. In most cases, you can focus on testing the orchestrator without being overly concerned with the HTTP context.

Testing Orchestrators

ASP.NET MVC does its best to support testing, but it knows nothing about your application and the design you’ve come up with. ASP.NET MVC doesn’t write tests for you either. You should aim at writing tests that are relevant rather than aiming at getting a high score in code coverage. What’s the value in testing code like this?

[TestMethod]
public void TestIfSomeMethodWorks()
{
    var controller = new MyController();
    var result = controller.DoSomething();
    Assert.AreEqual(someExpectedResult, result);
}

The value of such a test is strictly related to the implementation of the method DoSomething. The method could be of any size; it might perhaps be executing a number of activities internally or might merely delegate the execution of the workflow to other layers. As I consider the controller to be part of the ASP.NET MVC infrastructure, I recommend the second approach. When the method DoSomething is just responsible for massaging some URL parameters and possibly some parts of the HTTP context into input values for orchestrators, the value of testing DoSomething is much smaller than testing the orchestrator itself.

public class MyController
{
    private readonly IMyOrchestrator _orchestrator;

    public MyController(IMyOrchestrator orchestrator)
    {
        _orchestrator = orchestrator;
    }

    public ActionResult DoSomething( /* Parameters via model binding */ ...)
    {
        // Massage data from HTTP context into more abstract data structures
        var someSessionData = HttpContext.Session["Data"];

        // trigger the orchestrator
        var model = _orchestrator.PerformActionAndGetViewModel(someSessionData, ...);
        return View(model);
    }
}

The orchestrator is a class that receives anything it needs to work from the outside, typically via dependency injection (DI), either coded manually or via an IoC container.

public class MyOrchestrator
{
    public MyOrchestrator( /* List of dependencies */ ...)
    {
        ...
    }

    public SomeViewModel DoSomething( /* Data obtained from URL and HTTP context */ ...)
    {
        // Implement the workflow required by the requested user action
        ...

        // use mockable dependencies here to test in isolation
        ...
    }
}

The orchestrator class has no dependencies on the HTTP context. Instead, it usually has a few dependencies on such services as repositories and domain services. These dependencies are easily managed via DI and can be easily mocked when it comes to testing. In other words, testing is simplified and there is complete decoupling of framework and logic. As a developer, you focus on UI triggers and those tasks that need to be orchestrated in response. You simply use the controller as the built-in mechanism that matches UI requests to tasks. To put it another way, the controller is not part of your code.

When the Controller Orchestrates Itself…

Let’s review a scenario in which the controller does it all. You seem to have a simpler project structure and coordinate any actions related to the requested task from within the controller. Your DoSomething method may look like the code below:

public class MyController
{
    public ActionResult DoSomething( /* Parameters via model binding */ ...)
    {
        // Massage data from HTTP context into more abstract data structures
        var someSessionData = HttpContext.Session["Data"];

        // Step 1
        // use some ICustomerRepository dependency

        // Step N
        // use some other dependency

        // Build a view model
        // This may be a long block of code (30+ lines)
        return View(model);
    }
}

The controller is instantiated by the registered controller factory. The default factory can only create controller instances by using their default constructor. To inject dependencies, therefore, you need to replace the factory. This is extra work. In ASP.NET MVC 3 you can, perhaps, use dependency resolvers to save yourself the task of creating a factory. In terms of patterns, this means using a Service Locator based on IoC containers in order to inject dependencies. It works, and it seems elegant. I wouldn’t say this is simple though; it rather seems to me quite convoluted. With dependency resolvers, you’re writing less code yourself but you’re forcing the framework to do a lot more work. Finally, by putting this logic in the controller, you are distracted from use-cases and overall design. Writing logic in the controller is exactly the same as writing the logic in Button1_Click. This is what a controller method actually is: a more elegant way of rendering a button click handler. Clean design, and with it effective and really quick testing, is still far away.

Final Thoughts

Personally, I have grown to love ASP.NET MVC. Mine was certainly not love at first sight. Love grew from appreciating the benefits of the neat separation between processing and rendering and the total control over the markup and responses. The care with which ASP.NET MVC was created was never extended to the tutorials. These, unfortunately, suggested to many developers that it was sufficient to merely use ASP.NET MVC in order to comply with best practices in architecture and design. It is certainly true, as the tutorials suggest, that testing is greatly simplified with ASP.NET MVC: firstly, processing and rendering are neatly separated so you can intercept results being integrated in the view and write assertions on them; secondly, some facilities exist to mock parts of a request that rely on the run time environment. In ASP.NET MVC, smart tooling and framework magic make it possible for you to write code that can be unit-tested. Is this also necessarily clean and well-designed code? I’d argue that ASP.NET MVC doesn’t guarantee you a good design. If you miss this point, you are likely to produce controllers whose design is just an updated version of a Button1_Click handler, and which are, in consequence, difficult to test.

© Simple-Talk.com

Glenn Berry: DBA of the Day 23 February 2012 by Richard Morris

Glenn Berry works as a Database Architect at Avalara in Bainbridge Island, Washington. He is a SQL Server MVP, and has a whole collection of certifications, including MCITP, MCDBA, MCSE, MCSD, MCAD, and MCTS. As well as working as a DBA, he is an Adjunct Faculty member at the University of Denver, where he has been teaching since 2000. He wrote chapters in the SQL Server MVP Deep Dives books as well as 'SQL Server Hardware' for Simple-Talk.

Glenn Berry is used to doing things his own way. A former US Marine, he says that, as an infantryman, he became so tired of walking that he became a Tank Commander. He first learned to program after persuading a clerk at a Radio Shack store to print out the source code from a Commodore game called B1 Nuclear Bomber. This was then retyped and converted to a 20-column display with added sound effects. After leaving the forces he studied economics and international affairs, thinking it might become useful in his chosen career as an intelligence analyst similar to Tom Clancy’s hero Jack Ryan, the former stockbroker and CIA agent. Instead Glenn spent his second career rather more prosaically as a developer before discovering he liked being a DBA much better. The job has its similarities; both require the patience of a saint and being fluent in a second language. A well-known SQL Server MVP since July 2007, Glenn is also a competent author, blogger and regular contributor at PASS events, with a particular emphasis on T-SQL programming, query tuning, best practice, configuration management and version control. A keen amateur astronomer, military historian and modeller, he worked for social media communicator NewsGator for five years before joining Avalara in Bainbridge Island, Washington last summer.

RM: Glenn, when did you learn to program and what was the transition to what you do now?

GB: I was working as a Claims Adjuster at an Automobile Insurance company when I taught myself how to program with Visual Basic 3.0 and Access 2.0. Not too long after that, I got my first development job. I was always more interested in databases, so I eventually gravitated to being a DBA. One joke I always tell when I am speaking is that I used to be a developer, but then I grew up and became a DBA.

RM: You’re largely self-taught. Do you have any advice for self-taught programmers? Does it matter that people learn assembly or Basic if software is going to be all web applications or a piece of distributed code that will move around a dozen servers spawning other copies of itself?

GB: I would say that you should buy a few decent books for the programming language you are interested in learning, and use them to help you get a good grasp of the fundamentals. Looking at good code examples from other developers is also very helpful. I think that learning a popular language like C# is a good idea, regardless of what you are planning on doing.

RM: Lots of programmers come out of mathematics and lots of computer-science theory is very much mathematical. You’re proof that it is not necessary. How much maths-based thinking is necessary to be a good programmer?

GB: Well, I don’t know that I am really a good programmer. I was always much better at optimizing existing code to make it run much faster. Working on most business and database applications, you don’t really need much math beyond algebra and arithmetic. You certainly don’t need things like calculus for most programming tasks in that arena.

RM: Have the kind of people who can be successful at programming changed, do you think?

GB: I think the most important trait for a good developer is perseverance. Most end-users (and many IT people) tend to get frustrated and give up if they can’t figure out a problem pretty quickly. As a good developer, you have to stick with it, troubleshooting, debugging, and walking through your code until you figure out what the problem is. Then, you have to figure out how to fix it. You can’t just give up, and wait for someone else to fix it.

RM: I wonder if the inclination to take things apart and understand how everything works is a little more tempered these days. If you tried to take apart every piece of code you work with, it would never end. I guess now you have to say to yourself, ‘I sort of understand how this works and I’m going to let it go at that until it becomes urgent that I understand it better.’

GB: Many developers are endlessly curious, and they do like to figure everything out. This is fun for them, but can be bad for a project if they spend too much time “gold-plating” their code.

RM: Can you tell me something about the way you work? Do you live a hectic life and work every day, 7 days a week?

GB: Currently, I work from home, doing remote work for Avalara. Not having to commute to work is very nice, but there is a trade-off in that you sometimes feel guilty if you are not online working 24 x 7. I try not to live a hectic life and work seven days a week, but if there is a crisis, I feel a sense of responsibility.

RM: You’re well known as a mentor, did you have any important mentors when you were starting out? Can you say something about the direct influences on the way you work?

GB: I try to spread useful, accurate knowledge as much as possible, and I like to help people out, so I guess that makes me somewhat of a mentor. One of my early role models was Kimberly Tripp, who really inspired me after I saw a few of her presentations. Paul Nielsen helped me get started as a speaker by giving up one of his speaking slots at a regional event, so that I could get a chance to speak there. I think being in the U.S. Marines had a big influence on how I try to work, with the idea of discipline and responsibility.

RM: T-SQL as a programming language isn’t very sophisticated and there are a lot of things you would take for granted in any modern language that just are not there. Do you think Microsoft need to do a lot of work in this area, such as tools for the professional programmer?

GB: Unlike many developers, I think T-SQL works pretty well for what I think people should actually be using it for (which is basic CRUD operations). If you find yourself wanting to do complicated business logic in T-SQL, and you are frustrated because T-SQL won’t let you do it as easily as you would like, I would say that you should be writing that business logic in C# to run in a middle-tier component.

RM: Do you think beginners who have no formal training see SQL as more difficult than it really is? I was talking with a well-known MVP and he was saying that colleagues of his can write SQL queries all day long and are able to retrieve information, but they wouldn’t be able to do that if they had to write in C# or Visual Basic.

GB: The basics of T-SQL are pretty easy to pick up, kind of like the rules of Chess. People can learn to write quite a bit of T-SQL in a relatively short time. T-SQL does work pretty well for its intended purpose.

RM: What are the important parts of your programming toolkit, Transact-SQL? Do you use source control? Do you think people who use SSMS and similar tools are led astray by the fact that it is so easy to create an object in the database that they forget saving it to a file?

GB: I spend most of my time in SSMS, and I have quite a library of useful DMV queries that I have built up over the years. I have used various different types of Source Control at different times. I always teach people that the Script button in SSMS is their friend. I never want to make a change or create an object in the SSMS GUI without looking at the T-SQL commands that are going to be run. Unless you work at an ISV, having a good backup and recovery strategy for your databases is even more important than simply having all of your DDL scripted out and in Source Control.

RM: What would you most like to see added to T-SQL in the next iteration?

GB: I honestly don’t have any pet command or extension that I am anxiously waiting for in T-SQL.

RM: Let’s chat briefly about your life as a teacher and writer. What do you consider the most difficult aspect of teaching? And how does a teacher become a published writer? Was it a natural progression for you?

GB: One of the most difficult aspects of teaching is the reality of “grade inflation”, where some students get very upset if they don’t get 100% on every assignment or test. On the other hand, I really enjoy teaching people how to use SQL Server the “proper” way, so that they can be safe and effective DBAs. There are so many people who are accidental DBAs, who unfortunately have no idea what they are doing! They can be really dangerous, because they can make one mistake that has huge consequences. I’m not sure that teaching really led into writing for me. I think being a speaker was more of an inspiration.

RM: You wrote SQL Server Hardware and contributed to Volume 1 and Volume 2 of SQL Server MVP Deep Dives. Were there attempts at other books before then?

GB: No, not really. I have been blogging pretty regularly for over five years, and between that and my speaking at big conferences, it seemed like a natural progression to eventually write a book. Being able to just write a few chapters for the MVP Deep Dives books was a good way to get started. I always thought that I would try to write a few magazine articles first, but that never happened. I am really grateful that Tony Davis at Red Gate gave me a chance to write SQL Server Hardware.

RM: Were there books that were important to you when you were learning to program?

GB: When I was first learning, I used to read Visual Basic Programmer’s Journal quite a bit. I have bought so many books from Microsoft Press, APress, and Wrox over the years, they tend to blur together. Honestly, Redgate has been putting out some very good SQL Server books over the last several years.

RM: Have you read Knuth’s The Art of Computer Programming?

GB: Sadly, I have not. I do know that it is one of the classics that everyone is supposed to read.

RM: I’d like to ask about speaking at conferences, something which fills some people with dread. Do you feel comfortable as a speaker and what’s the key to delivering a good speech?

GB: I used to get pretty nervous when I was going to speak at the PASS Summit. I was worried that someone like Paul Randal would be sitting in the audience, waiting for me to make the slightest technical mistake, so they could publicly correct me. So far, that has never happened. Even if I made a mistake like that, the better known speakers in the SQL Server world would not publicly humiliate anyone (unless they really, really deserved it).

If you can speak effectively to five or ten people, you can speak to two or three hundred. As long as you know your material, and are passionate about the subject, you will be fine. Of course, I always feel better after I am done speaking!

RM: How would you make money from your skills if you weren't in the job that you are now?

GB: Hopefully, it would be something related to SQL Server, whether it was consulting or training, or just working as a DBA.

RM: Is there any moment or event, either in IT or computer science, that you would like to have been at, and why?

GB: It would have been nice to have been Bill Gates’ best friend at Harvard, and been involved in the absolute beginning of Microsoft. That would have been interesting and very lucrative if you had been in the right place at the right time, and had been able to stick with it. Being around during the early days of Intel would have been a similar situation.

© Simple-Talk.com

Exploring SSIS Architecture and Execution History Through Scripting 16 February 2012 by Feodor Georgiev

When you are using SSIS, there soon comes a time when you are confronted with having to do a tricky task such as searching for particular connection strings in all your SSIS packages, or checking the execution history of scheduled SSIS jobs. You can do this type of work effectively in T-SQL, as Feodor explains.

My previous article on SSIS focused on the architecture and the functioning of the product. I also provided a few essential T-SQL scripts which offer certain ways of documenting the SSIS environment. In this article I will focus more on the T-SQL scripting and the ways to reveal configuration, performance and architectural information through scripting.

Exploring SSIS’s Metadata Objects

SSIS Metadata Objects in sys.objects

Let’s start simple by exploring the metadata objects that are related to the SSIS installation. If we look at the Integration Services metadata objects in SQL 2005, we will notice that the objects contain the phrase ‘DTS’ in their names. By executing the following script in SQL 2005 we will get all objects related to the SSIS metadata (notice that the script is executed in the MSDB context):

USE msdb ;

SELECT * FROM sys.objects WHERE name LIKE '%dts%'

Later on, in SQL 2008 and later, we have objects containing the phrase ‘DTS’ as well as ‘SSIS’ in their names. Execute the following script to view the objects (again, in the context of the MSDB database):

USE msdb ;

SELECT * FROM sys.objects WHERE name LIKE '%dts%' OR name LIKE '%ssis%'

Why is this? In SQL Server 2005 you will find dbo.sysdtspackages and dbo.sysdtspackages90, which help SQL Server distinguish between Integration Services packages created in BIDS and legacy packages inherited and transferred from the old SQL Server 2000 DTS (Data Transformation Services). In SQL Server 2008 and up we find dbo.sysdtspackages and dbo.sysssispackages, where the first table contains legacy packages and the second contains the BIDS packages, with versions from 2005 and 2008.

SSIS Metadata Objects in other system tables

In SQL 2008 and up we have:

name - 2008
sysdtscategories         One row for each category description
sysdtspackagelog         Legacy
sysdtspackages           Legacy
sysdtssteplog            Legacy
sysdtstasklog            Legacy
sysssislog               One row per entry generated by SSIS package at runtime (when the SQL Server log provider is used)
sysssispackagefolders    One row for each folder in the SSIS structure
sysssispackages          One row for each SSIS package

… and in SQL 2005 there is …

name - 2005
sysdtscategories         One row for each category description
sysdtslog90              One row per entry generated by SSIS package at runtime
sysdtspackagefolders90   One row for each folder in the SSIS structure
sysdtspackagelog         Legacy
sysdtspackages           Legacy
sysdtspackages90         One row for each SSIS package
sysdtssteplog            Legacy
sysdtstasklog            Legacy

Structure and contents of the SSIS packages

As we know, SSIS packages are just structured XML files that contain all the information needed for the package to carry out its tasks. In other words, the SSIS package itself contains the objects in the flows, the precedence, the connections and their configurations. SSIS packages may be saved on the file system, or in the MSDB repository. When the package is saved in MSDB, the package definition is saved in the packagedata column of the dbo.sysssispackages table (or in dbo.sysdtspackages90 in SQL Server 2005). The column itself is of the image datatype, hence in order for us to retrieve the contents, we need to cast it as VARBINARY(MAX) first, and then as the XML data type. Depending on the security level of the package, however, it might not be very easy to explore the contents of the package definitions in MSDB; if the package is encrypted, the package definition will begin with the EncryptedData tag.

Retrieving the definitions of the SSIS Packages

So, here is how to retrieve the definitions of the SSIS packages in MSDB:

In 2005:

SELECT p.[name] AS [PackageName]
      ,[description] AS [PackageDescription]
      ,CASE [packagetype]
            WHEN 0 THEN 'Undefined'
            WHEN 1 THEN 'SQL Server Import and Export Wizard'
            WHEN 2 THEN 'DTS Designer in SQL Server 2000'
            WHEN 3 THEN 'SQL Server Replication'
            WHEN 5 THEN 'SSIS Designer'
            WHEN 6 THEN 'Maintenance Plan Designer or Wizard'
       END AS [PackageType]
      ,CASE [packageformat]
            WHEN 0 THEN 'SSIS 2005 version'
            WHEN 1 THEN 'SSIS 2008 version'
       END AS [PackageFormat]
      ,p.[createdate]
      ,CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) PackageXML
FROM [msdb].[dbo].[sysdtspackages90] p

In 2008 and up:

SELECT p.[name] AS [PackageName]
      ,[description] AS [PackageDescription]
      ,CASE [packagetype]
            WHEN 0 THEN 'Undefined'
            WHEN 1 THEN 'SQL Server Import and Export Wizard'
            WHEN 2 THEN 'DTS Designer in SQL Server 2000'
            WHEN 3 THEN 'SQL Server Replication'
            WHEN 5 THEN 'SSIS Designer'
            WHEN 6 THEN 'Maintenance Plan Designer or Wizard'
       END AS [PackageType]
      ,CASE [packageformat]
            WHEN 0 THEN 'SSIS 2005 version'
            WHEN 1 THEN 'SSIS 2008 version'
       END AS [PackageFormat]
      ,p.[createdate]
      ,CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) PackageXML
FROM [msdb].[dbo].[sysssispackages] p

Now that we have the definition, what can we do with it? We can parse it and extract some useful data.

Extracting connection strings from an SSIS definition

Here is how to retrieve the data connection strings:

In SQL 2005:

;WITH XMLNAMESPACES ('www.microsoft.com/SqlServer/Dts' AS pNS1,
                     'www.microsoft.com/SqlServer/Dts' AS DTS)  -- declare XML namespaces
SELECT c.name,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="DelayValidation"][1]', 'varchar(100)') AS DelayValidation,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ObjectName"][1]', 'varchar(100)') AS ObjectName,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="Description"][1]', 'varchar(100)') AS Description,
       SSIS_XML.value('pNS1:ObjectData[1]/pNS1:ConnectionManager[1]/pNS1:Property[@pNS1:Name="Retain"][1]', 'varchar(MAX)') AS Retain,
       SSIS_XML.value('pNS1:ObjectData[1]/pNS1:ConnectionManager[1]/pNS1:Property[@pNS1:Name="ConnectionString"][1]', 'varchar(MAX)') AS ConnectionString
FROM ( SELECT id,
              CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) AS PackageXML
       FROM [msdb].[dbo].[sysdtspackages90]
     ) PackageXML
CROSS APPLY PackageXML.nodes('/DTS:Executable/DTS:ConnectionManager') SSIS_XML ( SSIS_XML )
INNER JOIN [msdb].[dbo].[sysdtspackages90] c ON PackageXML.id = c.id

In SQL 2008 and up:

;WITH XMLNAMESPACES ('www.microsoft.com/SqlServer/Dts' AS pNS1,
                     'www.microsoft.com/SqlServer/Dts' AS DTS)  -- declare XML namespaces
SELECT c.name,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="DelayValidation"][1]', 'varchar(100)') AS DelayValidation,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ObjectName"][1]', 'varchar(100)') AS ObjectName,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="Description"][1]', 'varchar(100)') AS Description,
       SSIS_XML.value('pNS1:ObjectData[1]/pNS1:ConnectionManager[1]/pNS1:Property[@pNS1:Name="Retain"][1]', 'varchar(MAX)') AS Retain,
       SSIS_XML.value('pNS1:ObjectData[1]/pNS1:ConnectionManager[1]/pNS1:Property[@pNS1:Name="ConnectionString"][1]', 'varchar(MAX)') AS ConnectionString
FROM ( SELECT id,
              CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) AS PackageXML
       FROM [msdb].[dbo].[sysssispackages]
     ) PackageXML
CROSS APPLY PackageXML.nodes('/DTS:Executable/DTS:ConnectionManager') SSIS_XML ( SSIS_XML )
INNER JOIN [msdb].[dbo].[sysssispackages] c ON PackageXML.id = c.id

Extracting package configurations from an SSIS definition

Here is how to retrieve the package configurations:

In SQL 2005:

;WITH XMLNAMESPACES ('www.microsoft.com/SqlServer/Dts' AS pNS1,
                     'www.microsoft.com/SqlServer/Dts' AS DTS)  -- declare XML namespaces
SELECT c.name,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationType"][1]', 'varchar(100)') AS ConfigurationType,
       CASE CAST(SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationType"][1]', 'varchar(100)') AS INT)
            WHEN 0 THEN 'Parent Package'
            WHEN 1 THEN 'XML File'
            WHEN 2 THEN 'Environmental Variable'
            WHEN 3 THEN 'Registry Entry'
            WHEN 4 THEN 'Parent Package via Environmental Variable'
            WHEN 5 THEN 'XML File via Environmental Variable'
            WHEN 6 THEN 'Registry Entry via Environmental Variable'
            WHEN 7 THEN 'SQL Server'
       END AS ConfigurationTypeDesc,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationVariable"][1]', 'varchar(100)') AS ConfigurationVariable,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ObjectName"][1]', 'varchar(100)') AS ConfigurationName,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationString"][1]', 'varchar(100)') AS ConfigurationString
FROM ( SELECT id,
              CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) AS PackageXML
       FROM [msdb].[dbo].[sysdtspackages90]
     ) PackageXML
CROSS APPLY PackageXML.nodes('/DTS:Executable/DTS:Configuration') SSIS_XML ( SSIS_XML )
INNER JOIN [msdb].[dbo].[sysdtspackages90] c ON PackageXML.id = c.id

In SQL 2008 and up:

;WITH XMLNAMESPACES ('www.microsoft.com/SqlServer/Dts' AS pNS1,
                     'www.microsoft.com/SqlServer/Dts' AS DTS)  -- declare XML namespaces
SELECT c.name,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationType"][1]', 'varchar(100)') AS ConfigurationType,
       CASE CAST(SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationType"][1]', 'varchar(100)') AS INT)
            WHEN 0 THEN 'Parent Package'
            WHEN 1 THEN 'XML File'
            WHEN 2 THEN 'Environmental Variable'
            WHEN 3 THEN 'Registry Entry'
            WHEN 4 THEN 'Parent Package via Environmental Variable'
            WHEN 5 THEN 'XML File via Environmental Variable'
            WHEN 6 THEN 'Registry Entry via Environmental Variable'
            WHEN 7 THEN 'SQL Server'
       END AS ConfigurationTypeDesc,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationVariable"][1]', 'varchar(100)') AS ConfigurationVariable,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ObjectName"][1]', 'varchar(100)') AS ConfigurationName,
       SSIS_XML.value('./pNS1:Property[@pNS1:Name="ConfigurationString"][1]', 'varchar(100)') AS ConfigurationString
FROM ( SELECT id,
              CAST(CAST(packagedata AS VARBINARY(MAX)) AS XML) AS PackageXML
       FROM [msdb].[dbo].[sysssispackages]
     ) PackageXML
CROSS APPLY PackageXML.nodes('/DTS:Executable/DTS:Configuration') SSIS_XML ( SSIS_XML )
INNER JOIN [msdb].[dbo].[sysssispackages] c ON PackageXML.id = c.id

There are many other aspects to be explored in the definitions of the SSIS packages, and it is all a matter of finding the node names and parsing them. In the remaining part of this article, I would like to shift our attention to two areas: the interaction between SQL Server Agent and the SSIS packages, and some scripts for gathering execution statistics.

Overriding the package internal configurations

SSIS packages can be executed in several ways: as a scheduled job from the SQL Server Agent, or from the command line (or even from a batch file). Regardless of which method is used, it is always DTExec.exe that carries out the task. Before executing the SSIS package, the SQL Server Agent or the command-line script has to form an execution string and pass parameters to DTExec, and thus control the execution of the package. Here is a script which shows all SQL Agent job steps that execute SSIS packages, along with the custom configurations provided through the SQL Agent job:

USE [msdb]
GO
SELECT j.job_id, s.srvname, j.name, js.subsystem, js.step_id, js.command,
       j.enabled, js.output_file_name, js.last_run_outcome, js.last_run_duration,
       js.last_run_retries, js.last_run_date, js.last_run_time, js.proxy_id
FROM dbo.sysjobs j
JOIN dbo.sysjobsteps js ON js.job_id = j.job_id
JOIN MASTER.dbo.sysservers s ON s.srvid = j.originating_server_id
--filter only the job steps which are executing SSIS packages
WHERE subsystem = 'SSIS'
--use the line below to enter some search criteria
--AND js.command LIKE N'%ENTER_SEARCH%'
GO

As you may have noticed, you can also use the script above to filter and search through the commands of the SQL Agent jobs. For example, you can find all jobs which are executing encrypted SSIS packages by using …

AND js.command LIKE N'%/DECRYPT%'

…as the search criterion in the above script. You may also want to search for a server name, for example.
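Before writing such filters it helps to know roughly what the command column contains for an SSIS job step: it is an ordinary DTExec argument string. The fragment below is purely illustrative (the folder, package, server and password are invented), but it shows the kind of text the LIKE predicates above are matching against:

/SQL "\SomeFolder\SomePackage" /SERVER "MYSERVER\PROD01" /DECRYPT SomePassword /CHECKPOINTING OFF /REPORTING E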

Exploring execution history

Finally, let's look at some execution history of the SSIS packages which are scheduled as SQL Server Agent jobs. The following script returns all SQL Server Agent jobs which are currently (as of the moment the script is executed) executing SSIS packages, together with the last execution time and duration, as well as the execution command.

SET NOCOUNT ON

-- Check if the SQL Server Agent is running
IF EXISTS ( SELECT 1
            FROM MASTER.dbo.sysprocesses
            WHERE program_name = N'SQLAgent - Generic Refresher' )
BEGIN
    SELECT @@SERVERNAME AS 'InstanceName', 1 AS 'SQLServerAgentRunning'
END
ELSE
BEGIN
    SELECT @@SERVERNAME AS 'InstanceName', 0 AS 'SQLServerAgentRunning'
    RAISERROR('The SQL Server Agent is not running.', 16, 1) WITH SETERROR;
END

-- Execute the script
IF EXISTS ( SELECT *
            FROM tempdb.dbo.sysobjects
            WHERE id = OBJECT_ID(N'[tempdb].[dbo].[Temp1]') )
    DROP TABLE [tempdb].[dbo].[Temp1]
GO

CREATE TABLE [tempdb].[dbo].[Temp1]
    ( job_id UNIQUEIDENTIFIER NOT NULL
    , last_run_date NVARCHAR(20) NOT NULL
    , last_run_time NVARCHAR(20) NOT NULL
    , next_run_date NVARCHAR(20) NOT NULL
    , next_run_time NVARCHAR(20) NOT NULL
    , next_run_schedule_id INT NOT NULL
    , requested_to_run INT NOT NULL
    , request_source INT NOT NULL
    , request_source_id SYSNAME COLLATE database_default NULL
    , running INT NOT NULL
    , current_step INT NOT NULL
    , current_retry_attempt INT NOT NULL
    , job_state INT NOT NULL
    )

DECLARE @job_owner SYSNAME
DECLARE @is_sysadmin INT

SET @is_sysadmin = ISNULL(IS_SRVROLEMEMBER('sysadmin'), 0)
SET @job_owner = SUSER_SNAME()

INSERT INTO [tempdb].[dbo].[Temp1]
EXECUTE MASTER.dbo.xp_sqlagent_enum_jobs @is_sysadmin, @job_owner

UPDATE [tempdb].[dbo].[Temp1]
SET last_run_time = RIGHT('000000' + last_run_time, 6)
  , next_run_time = RIGHT('000000' + next_run_time, 6);

SELECT j.name AS JobName
     , j.enabled AS Enabled
     , CASE x.running
            WHEN 1 THEN 'Running'
            ELSE CASE h.run_status
                      WHEN 2 THEN 'Inactive'
                      WHEN 4 THEN 'Inactive'
                      ELSE 'Completed'
                 END
       END AS CurrentStatus
     , COALESCE(x.current_step, 0) AS CurrentStepNbr
     , CASE x.running WHEN 1 THEN js.step_name ELSE NULL END AS CurrentStepName
     , CASE WHEN x.last_run_date > 0
            THEN CONVERT(DATETIME,
                         SUBSTRING(x.last_run_date, 1, 4) + '-'
                       + SUBSTRING(x.last_run_date, 5, 2) + '-'
                       + SUBSTRING(x.last_run_date, 7, 2) + ' '
                       + SUBSTRING(x.last_run_time, 1, 2) + ':'
                       + SUBSTRING(x.last_run_time, 3, 2) + ':'
                       + SUBSTRING(x.last_run_time, 5, 2) + '.000', 121)
            ELSE NULL
       END AS LastRunTime
     , CASE h.run_status
            WHEN 0 THEN 'Fail'
            WHEN 1 THEN 'Success'
            WHEN 2 THEN 'Retry'
            WHEN 3 THEN 'Cancel'
            WHEN 4 THEN 'In progress'
       END AS LastRunOutcome
     , CASE WHEN h.run_duration > 0
            THEN ( h.run_duration / 1000000 ) * ( 3600 * 24 )
               + ( h.run_duration / 10000 % 100 ) * 3600
               + ( h.run_duration / 100 % 100 ) * 60
               + ( h.run_duration % 100 )
            ELSE NULL
       END AS LastRunDuration
     , js.command AS SSISPackageExecutionCommand
FROM [tempdb].[dbo].[Temp1] x
LEFT JOIN msdb.dbo.sysjobs j ON x.job_id = j.job_id
JOIN msdb.dbo.sysjobsteps js ON js.job_id = j.job_id
LEFT OUTER JOIN msdb.dbo.syscategories c ON j.category_id = c.category_id
LEFT OUTER JOIN msdb.dbo.sysjobhistory h ON x.job_id = h.job_id
                                        AND x.last_run_date = h.run_date
                                        AND x.last_run_time = h.run_time
                                        AND h.step_id = 0
WHERE x.running = 1
  AND js.subsystem = 'SSIS'

DROP TABLE [tempdb].[dbo].[Temp1]

In conclusion, SSIS is a vast product which provides a significant amount of metadata to the SQL Server administrator. In this article I have shown how to explore the SSIS metadata through some scripts, and hopefully they will make the daily administration of your SSIS environments much easier.

© Simple-Talk.com

A Complete Guide to Writing Timer Jobs in SharePoint 2010
20 February 2012 by Damon Armstrong

SharePoint allows you to run recurring processes in the background on a schedule. These are Timer Jobs. It is easy to get confused by the process of writing, scheduling, administering and updating timer jobs. Luckily, Damon has made it his mission to produce a complete guide for the SharePoint developer.

Many applications need to be able to specify some kind of recurring process that is run in the background to handle batch processing, execute long-running operations, or handle cleanup routines. In SharePoint 2010, these operations are written as SharePoint Timer Jobs, and in the following article I will cover all of the ins and outs of writing one.

What is a Timer Job?

Timer Jobs are recurring background processes that are managed by SharePoint. If you navigate to the Central Administration site, click on the Monitoring link from the main page, and then choose the Review job definitions link under the Timer Jobs section, you’ll see a list of scheduled timer job instances. Notice that I did not say a list of timer jobs, but rather a list of scheduled timer job instances. Unfortunately, the term ‘Timer Job’ in SharePoint is a bit too general. A timer job really consists of three parts: Timer Job Definitions, Timer Job Instances, and Timer Job Schedules.

A timer job definition is a .NET class that inherits from the SPJobDefinition class and defines the execution logic of the Timer Job. Since it is just a .NET class, the Timer Job Definition has all of the things that you would expect a class to have: properties, methods, constructors, etc.

A timer job instance, as you may have guessed, is an object instance of the .NET Timer Job Definition class. You can have multiple instances of a timer job definition, which allows you to define a Timer Job Definition once, but vary the way it operates by specifying different property values for each Timer Job Instance. Whether you need one or many instances of a Timer Job Definition depends entirely on what you are trying to accomplish.

A timer job schedule is the last part of the puzzle. SharePoint exposes a series of pre-defined scheduling classes that derive from the SPSchedule class. A timer job instance must be associated with one of these schedules in order to run, even if you only want to run the timer job instance once.

Timer Jobs in the Central Administration UI

As seems standard with a number of constructs in SharePoint, Microsoft has made it a bit confusing for developers by using the term “Job Definition” in the SharePoint user interface to mean something a bit different than what you would expect if you work with them in code. In Central Administration, you can review all of the “Job Definitions” by clicking on the Monitoring link from the main screen (1); then under the Timer Jobs heading clicking the Review Job Definitions (2) link. Since there is an SPJobDefinition class, you may be led to believe that this screen shows all of the timer job definitions that are available, but that is not the case. SharePoint does not have a mechanism to register a job definition by itself, so it does not have a list of the available job definitions. SharePoint only maintains a list of timer job instances that have been scheduled (i.e. SPJobDefinition instances with an associated SPSchedule). So the Review Job Definitions page is really a list of timer job instances that have been scheduled.

Adding to the confusion, there is also a link from the Review Job Definitions page to Scheduled Jobs. Since a timer job instance needs to be scheduled before it can be used, you would think this page contains a list of timer job instances that have been scheduled. Technically speaking, it does, but it’s basically the same list that you find on the Review Job Definitions page with a different view. The main difference is that this page displays, and is sorted by, the Next Start Time (which informs you when the job will run next) instead of by Title. Timer jobs with schedules that have been disabled will also not appear on this list, so it may have fewer items listed than the Review Job Definitions page. Clicking on the Title from either page takes you to the same timer job instance configuration page that allows you to modify the schedule of the timer job instance.

Timer Job Associations

A timer job instance must be associated with either a SharePoint web application (SPWebApplication) or a SharePoint service application (SPService). It may also be associated with a server (SPServer) if you desire, but it is not a requirement. The reason behind the association requirements is that the SPJobDefinition class inherits from the SPPersistedObject class. Without getting into many of the technical details, SharePoint automatically persists state information about SharePoint objects in the farm using a tree-like structure of SPPersistedObject instances. The root of this tree is the SharePoint Farm itself (an SPFarm object) and includes the various web applications, services, and a myriad of other SharePoint objects that reside in the farm. All SPPersistedObject instances must have a parent in order to reside in the tree, and Microsoft deemed the SPWebApplication and SPService objects appropriate places for timer job instances to live in that hierarchy.

What do Timer Job Associations Mean for a Developer?

What this means for you, the developer, is that there are two main constructors for the SPJobDefinition class. One of the constructors allows you to associate the timer job with a web application, and the other allows you to associate it with a service application. You will need to determine which one is best suited for your situation, a chore that should be relatively simple. If you are developing a service application, then the timer job should be associated with that service application. Otherwise, it should be associated with a web application. One question that may quickly arise is: which web application should I associate my timer job with if my timer job really isn’t targeting a specific web application? I recommend associating it with the Central Administration web application, which will be demonstrated later on in this article.

You also have the option of associating a timer job with a specific server in the farm (SPServer). By doing so, your timer job will only run on that one server. A server can be specified for either one of the constructors. In case you were curious, server association has absolutely nothing to do with the SPPersistedObject hierarchy.

How Do I Associate My Timer Job with the Central Admin Web Application?

You can get a reference to the web application associated with the Central Admin site through the SPAdministrationWebApplication.Local property. Just pass this as the web application to the web application constructor and your timer job will be associated with the Central Admin Web Application.

Can I Associate My Timer Job with a Server and Skip the Other Entities?

No. It has to be associated with a web application or service application because it must have a parent in the SPPersistedObject hierarchy for the farm. As mentioned before, the server associated with a timer job instance has nothing to do with that persistence hierarchy.

Can I Just Pass A Null Association Into the Constructor?

Nice try, but no. If you attempt to get around the association by passing null into the constructor for either the web application or the service application, it will result in a null object reference exception when that constructor is called in your code.

Using Associated Entities in Code

There are three properties on the SPJobDefinition that are important when it comes to the SharePoint entities associated with a timer job: WebApplication, Service, and Server. As you would hopefully expect, associating a timer job with any of these items results in the corresponding property being populated with a reference to the associated item. Timer jobs don’t necessarily need to use these references, but if you do happen to need them they are available.

How Do I Write a Timer Job Definition?

Writing a timer job definition is extremely easy. All you have to do is create a class that inherits from SPJobDefinition, implement a few constructors, and override the Execute method.

Required Assemblies

You will need to add a reference to Microsoft.SharePoint.dll if you are starting from a blank project. If you created a SharePoint 2010 project, you should not need to manually add any references to your project to write a timer job. If you need to reference an assembly manually, most of the SharePoint assemblies are located in C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\ISAPI.

Recommended using Statements

Both the SPJobDefinition and SPSchedule classes reside in the Microsoft.SharePoint.Administration namespace. The Execute method also accepts a Guid parameter, which lives in the System namespace. As such, you will need (at least) the following two using statements if you don’t want to fully qualify everything in your code:

using System;
using Microsoft.SharePoint.Administration;

Inherit from SPJobDefinition

All SharePoint timer job definitions ultimately inherit from SPJobDefinition, so our class will need to do the same. You can, of course, inherit from a different class as long as SPJobDefinition is somewhere in the inheritance chain.

public class MyTimerJob : SPJobDefinition
{
    //Class Code
}

Write the Timer Job Definition’s Constructors

The SPJobDefinition class exposes three constructors: the default constructor, a web application association constructor, and a service application association constructor. You have two requirements when writing the constructor(s) for your timer job definition:

1. Your timer job must implement a default (parameter-less) constructor.
2. Your timer job must call either the web application association constructor or the service application association constructor from the SPJobDefinition class in one of its constructors.

You are required to implement a default (parameter-less) constructor for your timer job for deserialization purposes. The SPJobDefinition class inherits directly from the SPPersistedObject class, which means that SharePoint automatically stores the state of your timer job to a permanent store without you having to implement any of that storage logic. However, the deserialization mechanism expects a default parameter-less constructor to be present in order to operate correctly. Failure to implement the default constructor will result in the following message when you call .Update() from your timer job class: "cannot be deserialized because it does not have a public default constructor."

You are also required to call either the web application association constructor or the service application association constructor from one of your timer job definition’s constructors. Remember, your timer job must be associated with one of these two entities. The only time that an association can be created is from the constructors on the SPJobDefinition class, so one of your constructors has to call down to the appropriate base constructor. There are four key pieces of information for which your timer job definition needs to account in the constructor:

1. Web Application or Service Application Association Reference
2. Server Association Reference (optional)
3. Name of the timer job
4. Lock Type of the job

Please understand that the two constructor requirements listed previously are not mutually exclusive – in other words, you are not required to have two constructors. If you have static values for these four key pieces of information, then you can implement a single default constructor that calls the appropriate base constructor with your static values, as in the following example:

public MyTimerJob() : base(
    /* NAME:      */ "My Timer Job",
    /* WEB APP:   */ SPAdministrationWebApplication.Local,
    /* SERVER:    */ null,
    /* LOCK TYPE: */ SPJobLockType.Job)
{
    //Constructor Code (if applicable)
}

Notice that the example above satisfies both requirements: the MyTimerJob class exposes a default constructor, that default constructor is calling down to the web application association constructor of the base class, and all four key pieces of information have been provided. Many timer jobs will, however, require information to be passed in via a constructor. If this is the case, then you will need to implement two constructors: the default (parameter-less) constructor, and the constructor with the parameters that need to be passed. If you are implementing a default constructor simply for the sake of having it for deserialization purposes, then you will find the following constructor helpful:

//Constructor for deserialization
public MyTimerJob() : base()
{
}

//...Your Other Constructor With Parameters...

Notice that you do not need to worry about passing any values into the base default constructor. Remember, SharePoint uses this constructor for deserialization, so all of the properties required by your timer job will be populated back into the timer job by SharePoint during that deserialization process.

Naming a Timer Job

One of the four key pieces of information that you must provide a timer job is the timer job name. Timer jobs have two properties related to naming: Name and DisplayName. The Name property is used to identify a timer job, so it must be unique among the timer job instances that exist under the parent SPService or SPWebApplication with which the timer job is associated. In other words, you can have two timer jobs with the same name, as long as they exist under different parents.

DisplayName should contain the name of the timer job as it will appear in Central Administration. If you do not provide an explicit DisplayName value, the value will default to the value in the Name property. Since this name is only used as a display value, it does not have to be unique. You should be aware, however, that the timer job instance lists in Central Administration do not display in a hierarchy – they appear as a flat list. As such, you should take care to distinguish the timer job DisplayName in some way for the sake of users.

For example, let’s say you have a timer job definition that cleans up files in a web application. You’ve got two timer job instances created, one of which is associated with Web Application A and one of which is associated with Web Application B. Since the timer job instances reside under different web applications, they can both have the same Name. If you do not give them different display names, this is what users will see in the timer job instance list in Central Administration:

Timer Job
Timer Job

This can be a bit confusing because it looks like the same timer job is defined twice. By simply varying the DisplayName based on the associated web application’s Title, ID, or URL, you can clear up the confusion and display something far more meaningful, like:

Timer Job – Web App A
Timer Job – Web App B
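One simple way to achieve that distinction, sketched below, is to fold the associated web application’s name into the job name passed to the base constructor. The class and job names here are hypothetical, and the sketch assumes a parameterless constructor like the one shown earlier is also present for deserialization:

public class FileCleanupJob : SPJobDefinition
{
    //Required for deserialization
    public FileCleanupJob() : base() { }

    //Fold the web application's name into the job name so instances are
    //distinguishable in the flat Central Administration list
    public FileCleanupJob(SPWebApplication webApp)
        : base("File Cleanup Job (" + webApp.Name + ")", webApp, null, SPJobLockType.Job)
    {
    }
}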

Later on we’ll discuss how to store custom properties in a timer job instance. Be aware that you can also use these custom properties to set both the Name and the DisplayName values to ensure that the name is unique under a given parent and that the user has some way to distinguish between timer jobs in the timer job instance list.

Specifying a Timer Job Lock Type

Another of the four key pieces of information that you must provide the timer job instance constructor is the SPJobLockType value, which helps dictate where and how many times the timer job runs. There are three options for this value:

None               Indicates that there are no locks on the job, so it will run one time on each server in the farm, but only if the parent web application or service application with which the timer job is associated has been provisioned to that server. If you want a job to run on all the servers in the farm, this is the option you will want to choose.

ContentDatabase    Indicates that the timer job will run once for each content database associated with the web application with which the timer job is associated. If there is one content database for the web application, it will run once. If there are twenty content databases for the web application, then it will run twenty times. When you specify this option, the targetInstanceId parameter on the Execute method will be populated with the GUID of the content database for which the timer job is firing. Use this option when you have timer job operations that must interact with content databases.

Job                Indicates that the job will run only one time.
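When the ContentDatabase lock type is used, the Execute override can resolve the targetInstanceId back to the content database it was fired for. The following minimal sketch assumes a web-application-associated job and simply iterates the ContentDatabases collection rather than relying on any particular indexer:

public override void Execute(Guid targetInstanceId)
{
    //Find the content database this invocation was fired for
    foreach (SPContentDatabase db in this.WebApplication.ContentDatabases)
    {
        if (db.Id == targetInstanceId)
        {
            //Do the per-content-database work here, e.g. inspect db.Name or db.Sites
            break;
        }
    }
}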

If you are wondering exactly where a timer job will run, know that lock types play a major role in defining that location. We will cover how to determine exactly which machine a timer job will run on in more detail later in this article.

Override the Execute Method

The main reason you write a timer job is to run code on a periodic basis. All of the code you want to run when your timer job executes should be located in the conveniently named Execute method. Simply override the method and put in whatever code you want to run:

public override void Execute(Guid targetInstanceId)
{
    //Code to execute when the timer job runs
}

As mentioned before, if your timer job has a lock type value of ContentDatabase, then the targetInstanceId parameter is populated with the ID of the content database for which the timer job is being run. Otherwise, you can disregard this parameter. There is really no limit to what you can do inside of this method, so your timer job process can be as simple or as complicated as you like.

Writing the MyTimerJob Demo Timer Job Definition

In an effort to demonstrate what we’ve discussed, we’ll go ahead and build out a very simple timer job definition that writes out a single line of text containing the current date/time to a file each time it runs. Of course, you can implement far more complex timer jobs, but for the sake of demonstration I do not want an overly complex scenario that takes a lot of time to set up. Plus, it should make it easy to see that your timer job is actually working. Following is the code for the MyTimerJob class:

public class MyTimerJob : SPJobDefinition
{
    //Constructor
    public MyTimerJob() : base()
    {
    }

    //Constructor
    public MyTimerJob(string name, SPWebApplication webApp, SPServer server, SPJobLockType lockType)
        : base(name, webApp, server, lockType)
    {
    }

    //Timer Job Code (note: FileInfo, Path and StreamWriter require using System.IO)
    public override void Execute(Guid targetInstanceId)
    {
        string directory = "C:\\";
        string fileName = string.Format("{0}.txt", this.Name);
        FileInfo fi = new FileInfo(Path.Combine(directory, fileName));
        if (!fi.Directory.Exists) fi.Directory.Create();
        using (StreamWriter sw = new StreamWriter(fi.FullName, true))
        {
            sw.WriteLine(string.Format("{0}", DateTime.Now));
            sw.Flush();
            sw.Close();
        }
    }
}

The first thing to notice about this class is that it derives from the SPJobDefinition class. As mentioned before, all timer job definitions ultimately derive from this class. Next, we have two constructors. The first constructor is a default, parameter-less constructor that is required for deserialization purposes. The second constructor mimics the base constructor from SPJobDefinition that associates a timer job with a SharePoint web application. After the constructors, you will see the overridden Execute method. This is the method that contains any code that you want executed during the timer job. In our case, the code builds out a directory and file name, ensures the directory exists, and writes the current date/time to the file. That’s all there is to it. Making a timer job definition is a pretty simple and straightforward process.

Scheduling a Timer Job Instance

Once you have a job definition, all that is left to do is create an instance of that definition, set any applicable properties on the instance, and associate the instance with a schedule that defines how often the timer job runs. To do this, you have to create an appropriate SPSchedule object and assign it to the Schedule property on your timer job instance. Since the SPSchedule class is an abstract base class that defines common functionality required for scheduling a timer job, you will not be creating instances of an SPSchedule object directly. SharePoint ships with a number of classes that derive from the SPSchedule class that offer a variety of scheduling options. Most of these are fairly self-explanatory, but here is the list nonetheless:

SPYearlySchedule         Runs the timer job once per year
SPMonthlySchedule        Runs the timer job once per month on the specified day (for example, the 15th of the month)
SPMonthlyByDaySchedule   Runs the timer job once per month on the specified day and week (for example, the 3rd Friday of the month)
SPWeeklySchedule         Runs the timer job once per week
SPDailySchedule          Runs the timer job once every day
SPHourlySchedule         Runs the timer job once every hour
SPMinuteSchedule         Runs the timer job once every n minutes, where n is the value in the Interval property
SPOneTimeSchedule        Runs the timer job once

Scheduling a Job to Run Each Day

Following is an example demonstrating how to schedule the MyTimerJob timer job to run once per day between the hours of 11:05 a.m. and 2:15 p.m.

var timerJobInstance = new MyTimerJob("My Timer Job (Daily)",
    SPAdministrationWebApplication.Local, null, SPJobLockType.Job);

timerJobInstance.Schedule = new SPDailySchedule()
{
    BeginHour = 11,
    BeginMinute = 5,
    EndHour = 14,
    EndMinute = 15,
};

timerJobInstance.Update();

First, we create an instance of the timer job class. In this case we’re calling the timer job instance “My Timer Job (Daily)”, associating it with the Central Administration web application, not associating it with any particular server in the farm, and specifying the lock type value as Job.

Next, we assign a new SPDailySchedule instance to the Schedule property on the timer job instance. We’re using the object initialization syntax on the SPDailySchedule constructor to define the window in which the timer job may run – the 11:05 start time is defined using the BeginHour and BeginMinute properties, and the 2:15 end time is defined using the EndHour and EndMinute properties. You can also get really specific and define the start and end of the window down to the second using the BeginSecond and EndSecond properties if you so desire.

Finally, we call the Update method on the timer job instance to save it and start it running on the schedule you have defined. If you fail to call Update, then your timer job will not run and it will not show up in any of the timer job lists in Central Administration. If you fail to associate your timer job instance with a schedule, your timer job will not run and will not show up in the timer job lists in Central Administration (even if you do call the Update method).

What’s with the Begin and End Properties on Schedules?

All of the built-in SPSchedule classes allow you to define a window in which the timer job may run. The date/time when that window begins is defined by the properties prefixed with Begin, like BeginDay, BeginHour, BeginMinute, etc. The end of the window is defined by the properties prefixed with End, like EndDay, EndHour, and EndMinute. An SPSchedule object communicates the next date and time when the timer job instance is supposed to run via the NextOccurrence method. For all of the built-in SharePoint SPSchedule objects, this method returns a randomized value that occurs somewhere in the window that has been defined. This is extremely beneficial for processor-intensive jobs that run on all servers in the farm, because each machine in the farm calls the NextOccurrence method and receives a different start time than the other servers in the farm. Thus, the start time for each server will be staggered to avoid slamming all of the servers in the farm with a processor-intensive timer job at the same time, allowing your farm to continue processing requests.
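As an illustration of the windowing idea, here is a hedged sketch using SPMinuteSchedule: each server picks a random second inside the Begin/End window of every interval. The property values are placeholders only:

timerJobInstance.Schedule = new SPMinuteSchedule()
{
    BeginSecond = 0,   //window opens at the top of the interval
    EndSecond = 59,    //...and closes at the end of that minute
    Interval = 15      //run once every 15 minutes
};
timerJobInstance.Update();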

Can SharePoint “Miss” a Timer Job if the Scheduled Window is Too Small?

No. Let’s say that you define a timer job instance with a schedule that has a two-second window that starts at 10:00:01 and ends at 10:00:02. SharePoint uses the start and end times to calculate a random value that falls between those values. As such, the timer job will be scheduled to run either at 10:00:01 or 10:00:02 (because those are the only two possible values in this scenario). Although it is random, it is a concrete value that the SharePoint Timer Service is aware of and will use to ensure that all timer jobs are run.

For example, let’s say that at 10:00:00 the SharePoint Timer Service “checks” to see if it should be running any jobs. Since it is not yet time to run your timer job instance, the job will not be started. Let’s say the next time the SharePoint Timer Service checks to see if it should be running a job is at 10:00:05, effectively missing the window when the timer job can start. Some people mistakenly believe that since the window was missed, SharePoint will simply not run the job. Rest assured, that is not the case. SharePoint has a concrete time at which that timer job instance was supposed to start, and if the current time is past that start time then SharePoint is going to start that timer job.

Can I Make a Custom Schedule Class?

Writing a custom SPSchedule is outside the scope of this article, but it is certainly possible. The question is: how useful is it? Writing a custom schedule requires you to create a class that inherits from the SPSchedule class and overrides the NextOccurrence and ToString methods. The ToString method must return a valid recurrence string that defines how often the job recurs, and the syntax for this string has predefined recurrence intervals (i.e. you can’t create your own). As such, it appears that you’re stuck with the intervals that have already been defined, and those are effectively exposed via the SPSchedule objects outlined in the table above. You can find more information about recurrence values from Mark Arend’s blog post on the subject.

Note: I have not done extensive research in this area, and I am not completely familiar with how the recurrence value from the ToString method and the DateTime returned from the NextOccurrence depend on one another. You may be able to “trick” SharePoint by having a valid recurrence value but returning a NextOccurrence value that matches the schedule you actually want to follow.

How Do I Update the Progress Bar for a Timer Job?

Timer jobs have the ability to communicate their overall progress to an end user, which is especially useful for long running processes. Seeing the progress bar update lets users know that the timer job is working and hasn’t hung for some reason. It also gives them a feeling for how long the timer job is going to run if they are waiting for it to complete.

Updating the progress for a timer job is extremely simple: you just call the UpdateProgress method and pass in a value from 0 to 100 to indicate the completion percentage of your timer job. The hardest part is probably figuring out the integer math to come up with the percentage:

public override void Execute(Guid targetInstanceId)
{
    int total = 1000;
    for (int processed = 1; processed <= total; processed++)
    {
        //Next line slows down the loop so we can see the progress bar in the UI
        System.Threading.Thread.Sleep(10);

        int percentComplete = (processed * 100) / total;
        UpdateProgress(percentComplete);
    }
}

One mistake that people often make is calculating the percentage the way you were taught in school – by dividing the number of items processed by the total and then multiplying by 100. Unfortunately, this always results in a value of 0, because the decimal value is truncated in integer math. So you need to multiply the dividend by 100 first so that your value still has meaning after the decimal portion is truncated.

Divide First (Incorrect)    (5 / 10) * 100 = 0
Multiply First (Correct)    (5 * 100) / 10 = 50

How Do You Pass Parameters to a Timer Job?

You have two approaches for passing information to a timer job. One option is to have your timer job read from a defined location, like a SharePoint list or a SQL database. To pass information to your timer job, you just write it to that designated location and your timer job will have access to it. This approach works well when you only plan on having one instance of your timer job definition (see the sketch at the end of this section).

You also have the option of storing key/value pairs along with your timer job instance using the Properties Hashtable on the SPJobDefinition. The only requirement is that any values you place in the Hashtable must be serializable, because they will be persisted to the SPPersistedObject hierarchy with your timer job instance. Since the properties are serialized with each timer job instance, this approach makes a lot of sense when you plan on having multiple instances of a timer job definition. For an example of how to use the Properties Hashtable, see the next section.
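To illustrate the first approach, here is a minimal sketch of an Execute method that reads its settings from a SharePoint list. Everything specific in it is an assumption for illustration only: the site URL, the list name and the Value column are all hypothetical, and the code needs a using Microsoft.SharePoint; directive in addition to the using statements shown earlier.

public override void Execute(Guid targetInstanceId)
{
    //Hypothetical site URL; you could also resolve a site from this.WebApplication
    using (SPSite site = new SPSite("http://intranet"))
    using (SPWeb web = site.OpenWeb())
    {
        SPList settingsList = web.Lists["Timer Job Settings"];   //hypothetical list
        foreach (SPListItem item in settingsList.Items)
        {
            string settingName = (string)item["Title"];
            string settingValue = (string)item["Value"];         //hypothetical column
            //...use the setting values to drive the job's behavior...
        }
    }
}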

Are Class-Level Timer Job Properties Stored Automatically?

No. Although a timer job instance is serialized automatically into the SPPersistedObject hierarchy, the public properties on the timer job are not automatically included in this process. If you want to expose a strongly-typed property that is serialized, back the property with the Properties Hashtable as demonstrated in the following example:

public string MyProperty
{
    get { return (string)Properties["MyProperty"]; }
    set { Properties["MyProperty"] = value; }
}

Deleting a Timer Job

Deleting a timer job is really simple. All you have to do is call the Delete method on the timer job instance. The real challenge is getting a reference to your timer job instance so you can call that Delete method. Both the SPWebApplication and SPService classes expose a property named JobDefinitions, which returns an SPJobDefinitionCollection containing the timer job instances associated with the entity. Unfortunately, SPJobDefinitionCollection does not expose any helpful methods for locating your timer job instance, so you’ll have to manually iterate through the collection and check each item to see if it’s the one you want. Following is a helpful extension method named DeleteJob that demonstrates how to look through the collection and find your timer job instance:

//Extension methods must live in a static class; ToList() also requires using System.Linq
public static class SPJobDefinitionCollectionExtensions
{
    public static bool DeleteJob(this SPJobDefinitionCollection jobsCollection, string jobName)
    {
        var jobsList = jobsCollection.ToList();
        for (int i = 0; i < jobsList.Count; i++)
        {
            if (jobsList[i].Name == jobName)
            {
                jobsList[i].Delete();
                return true;
            }
        }
        return false;
    }
}

If you include this extension method, then you can call DeleteJob directly from the JobDefinitions property on both the SPWebApplication and SPService classes:

SPAdministrationWebApplication.Local.JobDefinitions.DeleteJob("Timer Job Name");

Which Server Does the Timer Job Run On?

One frequently asked question about timer jobs is: which server do they run on? The answer depends on the following factors:

Which SPJobLockType is associated with the timer job instance?
Which server is the code that creates the timer job instance running on?
Is the timer job instance associated with a specific server?
Is the parent web application or service application associated with the timer job instance provisioned to the server?

Remember that timer jobs are associated with either a web application or a service application. In order for a server to be eligible to run a timer job instance, the server must have the associated web application or service application for the timer job provisioned to it.

When the SPJobLockType of the timer job instance is set to Job or ContentDatabase, the timer job is executed on a single server. By default, the server that ran the code to create the timer job instance will also be the server that actually runs the timer job. However, if that server is ineligible to run the timer job because it does not have the associated web or service application provisioned, then the timer job will be run by the first server in the farm that does have the appropriate associated entity provisioned.

When the SPJobLockType of the timer job instance is set to None, the timer job is executed on all of the servers in the farm on which the timer job instance’s associated web or service application has been provisioned.

When a specific server has been associated with a timer job instance, the timer job will only run on the specified server, even if the SPJobLockType is None. Furthermore, if that server is ineligible to run the timer job because it does not have the appropriate web or service application provisioned, then the job will simply not run.

What Account Does the Timer Job Run Under?

Timer jobs are executed by the SharePoint 2010 Timer service (OWSTIMER.EXE), which you can find in the Services list under the Administrative Tools item in the Windows Control Panel of your server. To determine the account the SharePoint 2010 Timer service runs under, just look at the account shown in the Log On As column.

By default, the Farm account is associated with the SharePoint 2010 Timer service. However, there is nothing stopping an administrator from changing the service account through the Windows Services interface. If you are experiencing security or access issues, make sure to manually check the account on each server in the farm to ensure they are all using the same account, and that it is the account you expected.

Conclusion

Timer Jobs are just one small part of the beast that is SharePoint, but I hope this has left you with a good understanding of what timer jobs are, how they work, and how to write them!

© Simple-Talk.com

TortoiseSVN and Subversion Cookbook Part 5: Instrumenting Files with Version Information
21 February 2012 by Michael Sorens

Contents

Enabling keyword substitution in a file
Inserting the author, the revision, or other keywords when committing
Automatically enabling keyword expansion in new files
Keeping your keyword expansions to fixed widths
Finding keyword anomalies
Troubleshooting why keyword expansion fails

Subversion lets you embed, and automatically update, information within source-controlled files to make it easy to see who did what, and when they did so. It is not entirely straightforward to get it working, though; unless of course you read, and follow, Michael's easy guide.

This is the fifth installment of the TortoiseSVN and Subversion Cookbook series, a collection of practical recipes to help you navigate through the occasionally subtle complexities of source control with Subversion and its ubiquitous GUI front-end, TortoiseSVN. So far this series has covered:

Part 1: Checkouts and commits in a multiple-user environment.
Part 2: Adding, deleting, moving, and renaming files, plus filtering what you add.
Part 3: Putting things in and taking things out of source control.
Part 4: Sharing source-controlled libraries in other source-controlled projects.

This installment examines the less well-known but extremely handy world of embedded version information.

Reminder: Refer to the Subversion book and the TortoiseSVN book for further reading as needed, and as directed in the recipes below.

Say you are trying to track down a defect and need to review a collection of files as you probe the system, test hypotheses, and follow your hunches. You believe that this defect showed up only in the last month and seems tied to the minor release that was the result of adding features X and Y worked on by developers A, B, and C. So you know who, you know what (i.e. which files you want to examine) and you know when (both by date and by revision number, by cross-referencing the release M.N tag with revision history). So now all you need is an easy way to identify the who, what, and when of the files you are examining. Subversion lets you embed and automatically update these identifying pieces of information within each file you choose with the use of keywords.

The Keyword Substitution section in the Subversion book introduces keywords succinctly: “Subversion has the ability to substitute keywords—pieces of useful, dynamic information about a versioned file—into the contents of the file itself. Keywords generally provide information about the last modification made to the file. Because this information changes each time the file changes, and more importantly, just after the file changes, it is a hassle for any process except the version control system to keep the data completely up to date. Left to human authors, the information would inevitably grow stale.”

To use keywords in a file you insert a keyword anchor, which is simply a keyword sandwiched between two dollar signs, e.g. $Date$ or $Id$. The table below displays the available keywords, some of which have aliases. You can use either the main keyword or its alias interchangeably in a keyword anchor.

Keyword    Alias                  Meaning
Date       LastChangedDate        Date of last known commit; date in local time.
Revision   LastChangedRevision    Revision of last known commit.
Author     LastChangedBy          Last known user to commit.
HeadURL    URL                    URL to the latest version of the file in the repository.
Id         none                   Abbreviated combination of the other four keywords; date in UTC time.
Header     none                   Same as Id except the URL is not abbreviated.

Note that keywords do not update based on repository activity; rather, they update based on your activity, because keyword expansion is a client-side operation. When you commit, the keywords are updated because of your changes. When you update, keywords are updated because of other people’s changes. There are some other things you’ll need to do to allow keyword expansion to occur, however, as I’ll explain in the recipes of this section.

Enabling keyword substitution in a file

Keyword substitution happens only in those files where you have specifically enabled it using the svn:keywords property. You can manually enable it on a file by setting the Subversion properties of that file. To do this, open the TortoiseSVN properties panel from the context menu either with TortoiseSVN >> Properties or with Properties >> Subversion >> Properties and create, or edit, the property there.

Alternately, as with most Subversion properties, if you set svn:keywords on a folder you can apply it recursively to every file within that folder. In version 1.6 you must manually select the ‘Apply property recursively’ checkbox in the property editor. Version 1.7 has streamlined the process: it automatically checks the ‘Apply property recursively’ checkbox for folders and unchecks it for files. Technically the svn:keywords property does not apply to folders at all, only to files. Thus, when you apply the svn:keywords property to a folder, you might think nothing happens if the folder is small enough. TortoiseSVN applies the property to all child files but not to the folder itself, and it is the folder’s property list that you are looking at once you apply the property. TortoiseSVN shows a progress box as it processes all the children but again, for a small folder, you might not even notice it.

A third approach to setting svn:keywords on a file is the proper, lazy approach: let the system do it for you when you add a file to source control. See the Automatically enabling keyword expansion in new files recipe.

The value you provide to the svn:keywords property is simply a list of one or more keywords (see the table in the introduction). In version 1.6 you are on your own with supplying this list. One great feature of version 1.7 is the custom property editors for each Subversion property. For the svn:keywords property you are given a list of the available keywords and you simply check the ones you want to include. Upon closing the property editor your definition appears in the list of Subversion properties (Figure 5-1).

Figure 5-1 A Subversion property list showing the svn:keywords property. Any of the keywords listed may be actively substituted in the file to which these properties are attached.
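If you prefer the command line to the TortoiseSVN property editor, stock Subversion commands can set the same property (the file and folder names below are just examples; adjust the keyword list to taste):

# Enable selected keywords on a single file
svn propset svn:keywords "Id Author Date Revision HeadURL" src\EnumerableDebugger.cs

# Or apply a keyword list recursively to every file under a folder
svn propset -R svn:keywords "Author Date Id" src

# Verify what is currently set
svn propget svn:keywords src\EnumerableDebugger.cs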

Inserting the author, the revision, or other keywords when committing

To actually have keywords appear in your files you must enable keywords (see the Enabling keyword substitution in a file recipe) and you must instrument your file with keyword anchors (see the introductory remarks). (You must also satisfy a few further constraints—see Troubleshooting why keyword expansion fails later in this installment.) For example, here is a typical format that I tend to favor—I use just a single keyword anchor (marked in red), but this particular keyword references a composite of the other primary keywords (Date, Revision, Author, and HeadURL). So this is what your file might look like just before a commit:

/*
 * ============================================================
 * @ID       $Id$
 * @created  2010-12-01
 * @project  http://cleancode.sourceforge.net/
 * ============================================================
 */

The commit action triggers keyword substitution so your working copy immediately after a commit might look like this for a file called EnumerableDebugger.cs being checked in at revision 1158:

/*
 * ============================================================
 * @ID       $Id: EnumerableDebugger.cs 1158 2011-10-17 04:26:50Z ms $
 * @created  2010-12-01
 * @project  http://cleancode.sourceforge.net/
 * ============================================================
 */

Note that Subversion is smart enough to recognize anchors whether they are virgin or whether they have already had substitution applied. Thus, if you commit the same file some time later it will update the revision number and the date in the previously expanded keyword, e.g.:

/*
 * ============================================================
 * @ID       $Id: EnumerableDebugger.cs 1199 2011-11-05 12:14:50Z ms $
 * @created  2010-12-01
 * @project  http://cleancode.sourceforge.net/
 * ============================================================
 */

Alternately, you might prefer individual keywords, perhaps something like this:

########################################################################
# Revision        $Revision$
# Last Revised    $Date$
# Author          $Author$
# File            $HeadURL$
########################################################################

And note that you may use either keywords or aliases (as presented in the table in the introductory remarks), so this is equivalent:

########################################################################
# Revision        $LastChangedRevision$
# Last Revised    $LastChangedDate$
# Author          $LastChangedBy$
# File            $HeadURL$
########################################################################

However you choose to organize your keyword anchors, typically you put them in some commented preamble to your file.

Automatically enabling keyword expansion in new files

It would be quite a hassle if, every time you add a new file to the repository, you had to manually edit its Subversion properties to enable keyword substitution (as detailed in the Enabling keyword substitution in a file recipe). Fortunately, TortoiseSVN—well, Subversion really—lets you enable keyword substitution in newly added files automatically with some simple, one-time setup.

First, locate your Subversion configuration file in one of your Windows ApplicationData directories. To find just the right one, examine $Env:APPDATA from PowerShell (or %APPDATA% from DOS). The full path to the configuration file is $Env:APPDATA\Subversion\config.

Second, search for the enable-auto-props property and make sure it is set to yes. This is the master switch for enabling automatic property attachment to new files. Once the master switch is turned on, then you can enable groups of files specified by extension.

enable-auto-props = yes

Finally, enable automatic properties for the specific files that you are interested in by specifying a line in the config file for the given file extension. In the example fragment shown below, a variety of file extensions define automatic properties but only .h and .txt files define keywords among their properties (highlighted in red). Furthermore, the particular keywords you want to use must be included in the defined list. For example, here the Id keyword is activated for *.txt files but not for *.h files.

*.c = svn:eol-style=native
*.cpp = svn:eol-style=native
*.h = svn:keywords=Author Date;svn:eol-style=native
*.dsp = svn:eol-style=CRLF
*.dsw = svn:eol-style=CRLF
*.sh = svn:eol-style=native;svn:executable
*.txt = svn:eol-style=native;svn:keywords=Author Date Id Rev URL;

With those settings in place in the configuration file, the next time you add a file with a .txt suffix that includes any keyword anchors, they will be expanded when you commit the file. Similarly, the next time you add an include file (.h) containing either the Author or Date keyword anchor, those will be expanded.

Keeping your keyword expansions to fixed widths

If your keyword anchors appear last on each line, as in this…

########################################################################
# Revision        $Revision$
# Last Revised    $Date$
# Author          $Author$
########################################################################

…it does not really matter what width the expanded values take, e.g.

########################################################################
# Revision        $Revision: 1234 $
# Last Revised    $Date: 2011-11-11 12:01:03 -0500 $
# Author          $Author: ms $
########################################################################

However, if your preference is to put them earlier in the line and attempt to make the subsequent phrases line up horizontally like this…

########################################################################
# $Revision$   Revision
# $Date$       Last Revised
# $Author$     Author
########################################################################

…then as soon as you commit the file your alignments will be askew:

########################################################################
# $Revision: 1234 $  Revision
# $Date: 2011-11-11 12:01:03 -0500 $  Last Revised
# $Author: ms $  Author
########################################################################

Subversion provides a fixed-length keyword syntax to address just this issue. For each keyword anchor in your file, instead of specifying just $anchor$ use $anchor::□□□□$ (each box represents one space character), where the number of spaces between the double colon and the final dollar sign defines the fixed-length field width. Thus, the above example becomes this:

########################################################################
# $Revision::          $ Revision
# $Date::              $ Last Revised
# $Author::            $ Author
########################################################################

When committed, the result is this—notice that short values are padded with spaces while longer values are truncated to the given field width (and include an octothorp to indicate truncation):

########################################################################
# $Revision:: 1234     $ Revision
# $Date:: 2011-11-11 1#$ Last Revised
# $Author:: ms         $ Author
########################################################################

See the Keyword Substitution section of the Subversion book for more details.

Finding keyword anomalies

If you decide to become a keyword aficionado and instrument all your files with keyword anchors, you will likely want to have an easy way to check that you have made all the right connections among anchors, enabled files, and auto-enabled properties, and to determine whether you missed any. Furthermore, once you manage to achieve a harmonious balance of anchors, enabled files, and auto-enabled properties, you will want to be able to verify over time that the balance you have laboriously put in place remains. Because neither Subversion nor TortoiseSVN provides this capability inherently, I created the PowerShell Measure-SvnKeywords function to handle it. (The link takes you directly to the API of that function; the root of my open-source PowerShell library is here.) Note that this function requires that you have command-line Subversion available, not just TortoiseSVN. (But if you are using TortoiseSVN 1.7 or later, it includes the command-line executables in the installation!) Measure-SvnKeywords reports a variety of statistics on keywords and keyword-related anomalies, letting you easily see where you are missing keyword anchors or, conversely, where you have keyword anchors but did not enable keyword expansion. It even gives you information to determine if there are other files where you might want to add keywords. Think of it as being primarily designed to answer these two questions: Do you have files with keyword anchors that do not have keyword expansion enabled? Do you have files with keyword expansion enabled that do not use keyword anchors?

Here is the calling signature of the function:

Measure-SvnKeywords [[-Path] <String[]>] [-Include <String[]>] [-Exclude <String[]>] [-Recurse]
    [-ExcludeTree <String[]>] [-EnableKeywords]

The first four parameters are quite standard; you will find them behaving identically to those in the standard Get-ChildItem cmdlet, for example. The penultimate parameter, ExcludeTree, is semi-standard, in that it operates the same as it does in Get-EnhancedChildItem (also from my open-source library). That parameter extends the capability of Get-ChildItem to let you prune whole subtrees, quite a useful capability! The final parameter, EnableKeywords, lets you go beyond just measuring Subversion keywords and actually update them; I'll say more on that shortly. Measure-SvnKeywords generates a report of a variety of keyword-related statistics on the set of files you specify. It can take some time to run, though: it takes perhaps a full minute to process a couple of thousand files in my working copy. Because of the non-instantaneous run time, it makes use of the standard Write-Progress cmdlet to provide feedback during execution; this flexible cmdlet adapts to its environment (see Figure 5-2). Running inside the PowerGUI Script Editor, it displays a first-class pop-up progress monitor (left side of the figure). Inside a plain, text-oriented PowerShell window, it displays an ASCII rendition of a progress bar (right side of the figure).

Figure 5-2 The PowerShell Write-Progress cmdlet renders a progress bar for Measure-SvnKeywords, adapting to the environment in which it runs (PowerGUI on the left, plain PowerShell window on the right).

The Measure-SvnKeywords report includes these six sections:

Extensions enabled in configuration file: Enumerates each file type (by extension) enabled for auto-property attachment in your configuration file, along with the keyword anchors assigned to each type.
Summary of SVN files with keywords: Summary of all SVN files instrumented with keyword anchors, plus a count of each file type.
All files with keywords not enabled IN CONFIG FILE: List of all files (not just SVN files) that have keyword anchors but whose file types are not enabled in the configuration file.
SVN files without keywords: List of all SVN files not instrumented with keyword anchors.
SVN files without keywords where keywords are enabled: List of all SVN files not instrumented with keyword anchors yet having the svn:keywords property.
SVN files with keywords to be enabled: List of all SVN files instrumented with keyword anchors but without the svn:keywords property.
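Before looking at sample output, here is a minimal sketch of how the function might be invoked. The paths, extensions and exclusion names are purely illustrative assumptions; the parameters themselves are the ones shown in the signature above, and the function is assumed to have been imported from the author's CleanCode library into the current session.

# A hypothetical run over part of a working copy, excluding build output folders.
Measure-SvnKeywords -Path C:\code\powershell -Recurse -Include '*.ps1','*.psm1','*.txt' -ExcludeTree 'bin','obj'

# Re-run with -EnableKeywords to set svn:keywords on files that contain anchors
# but lack the property; you still review and commit those property changes yourself.
Measure-SvnKeywords -Path C:\code\powershell -Recurse -EnableKeywords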

Here is an excerpt from the report generated on my own development system:

=== Extensions enabled in configuration file:

*.bat   => Author Date Id Rev URL
*.cmd   => Author Date Id Rev URL
*.cs    => Author Date Id Rev URL
*.java  => Author Date Id Rev URL
*.js    => Author Date Id Rev URL
*.pl    => Author Date Id Rev URL
*.pm    => Author Date Id Rev URL
*.ps1   => Author Date Id Rev URL
*.psm1  => Author Date Id Rev URL
*.sql   => Author Date Id Rev URL
*.txt   => Author Date Id Rev URL
*.xml   => Author Date Id Rev URL

=== Summary of SVN files with keywords:

Extension=.cs    Occurrences= 92
Extension=.html  Occurrences=  3
Extension=.java  Occurrences= 53
Extension=.js    Occurrences= 29
Extension=.pm    Occurrences= 36
Extension=.ps1   Occurrences=  9
Extension=.psm1  Occurrences=  5
Extension=.sql   Occurrences=  8
Extension=.xml   Occurrences= 92

=== All files with keywords not enabled IN CONFIG FILE:

None

=== SVN files without keywords (10 files):

Extension=.bat Occurrences= 3
*****C:\code\dotnet\SqlDiffFramework\installer\package.bat
*****C:\code\js\ccwebpages\jsmake.bat
*****C:\code\js\validate\jsmake.bat

Extension=.cs Occurrences= 1
*****C:\code\dotnet\SqlDiffFramework\SqlDiffFramework\Program.cs

Extension=.html Occurrences= 6
*****C:\code\powershell\CleanCode\Assertion\module_overview.html
*****C:\code\powershell\CleanCode\DocTreeGenerator\module_overview.html
*****C:\code\powershell\CleanCode\EnhancedChildItem\module_overview.html
*****C:\code\powershell\CleanCode\IniFile\module_overview.html
*****C:\code\powershell\CleanCode\SvnSupport\module_overview.html
*****C:\code\powershell\CleanCode\namespace_overview.html

=== SVN files without keywords where keywords are enabled (4 files):

Extension=.cs Occurrences= 1
*****C:\code\dotnet\SqlDiffFramework\SqlDiffFramework\Program.cs

Extension=.pl Occurrences= 1
*****C:\code\cleancode-support\pscaption.pl

Extension=.ps1 Occurrences= 2
*****C:\code\powershell\scripts\AnalyzeMySvnKeywords.ps1
*****C:\code\powershell\scripts\GenerateCleanCodeAPI.ps1

=== SVN files with keywords to be enabled (3 files):

Extension=.ps1 Occurrences= 2
*****C:\code\powershell\CleanCode\SvnSupport\SvnInfo.ps1
*****C:\code\powershell\CleanCode\Svn\SvnTrackerPat.ps1

Extension=.psm1 Occurrences= 1
*****C:\code\powershell\CleanCode\Assertion\Assertion.psm1

=== Enabling keywords on files containing keywords:

property 'svn:keywords' set on 'C:\code\powershell\CleanCode\Assertion\Assertion.psm1'
property 'svn:keywords' set on 'C:\code\powershell\CleanCode\Svn\SvnInfo.ps1'
property 'svn:keywords' set on 'C:\code\powershell\CleanCode\Svn\SvnTrackerPat.ps1'

The final section of the report (Enabling keywords on files containing keywords) displays the effect of the -EnableKeywords parameter: it processes the files identified in the section just above it, i.e. those that have keyword anchors but do not have the svn:keywords property. Any such files clearly indicate an error, because keyword anchors are useful if and only if svn:keywords is defined. Other sections of the report may or may not show something that needs to be fixed; that is something you will need to check individually. But any files in that last section of the report are in a known inconsistent state, so they may be fixed programmatically. Recalling the two-stage process of Subversion (make changes, then commit changes), activating the -EnableKeywords action is perfectly safe; you still have to commit these property changes, so you may review them at your leisure.

Troubleshooting why keyword expansion fails

For Subversion keyword expansion to occur you must satisfy all of the criteria listed below. If any one of them is missing you will not see keyword expansion occur, and neither Subversion nor TortoiseSVN provides any clue as to which criterion failed. The table distinguishes files newly added to Subversion (SVN Add) from those already in Subversion that are just being modified (SVN Update). The additional criteria for new files (items 1 through 3) are technically not required for keyword expansion to occur; rather, they are convenience steps that alleviate the need to perform items 4 and 5 by hand for each new file that you add.

     Action                                                                New file   Existing file
 1   Master switch enabled in config file (enable-auto-props property)        •            —
 2   File type defined in config file                                         •            —
 3   Keyword of interest specified for file type in config file               •            —
 4   Keyword expansion enabled for given file (svn:keywords exists)           •            •
 5   Particular keyword enabled (included in svn:keywords property list)      •            •
 6   File instrumented with keyword anchor                                    •            •
 7   Keyword anchors correctly cased                                          •            •
 8   File svn:mime-type indicates text                                        •            •
 9   File is not Unicode (UTF-16 or UTF-32)                                   •            •
10   File committed to repository                                             •            •

Notes on the table:

Items 1 through 6 are amply covered in the earlier recipes in this section.

Item 7: You must use the correct case for keyword anchors in your file for them to be recognized. That is, you must use $Date$ or $Revision$; case variations (e.g. $DATE$) will not work.

Item 8: Subversion only performs keyword substitution on files that it considers to be human-readable, that is, files which don't carry an svn:mime-type property whose value indicates otherwise.

Item 9: I almost did not include this line item when I encountered the situation, because I could not believe Subversion does not support keyword expansion on Unicode files! To investigate this, I added a Unicode file to Subversion and noticed that it set the mime-type (in Subversion properties) to application/octet-stream. To see if I could override this indication of non-text status, I changed the mime-type manually to text, which actually resulted in the svn:mime-type property being deleted from the properties list (since text is presumably the default). I committed the file to the repository, then added a keyword anchor and committed again. The keyword anchor was not expanded. This issue, as it turns out, is really a sub-category of item 8: Subversion simply does not treat Unicode files as text, so it does not expand keywords. This is an outstanding defect. I found one other symptom related to this: I tried to assign the svn:eol-style property a value of native, but TortoiseSVN refused, saying the file has inconsistent line endings; in reality, it appears to be simply because it does not realize the file is text. To confirm, I converted my file from Unicode to ASCII and then the keywords expanded upon commit.

Item 10: This last item, committing your file, seems so innocuous, but you should take a moment to consider the implications. Keyword expansion from virgin anchors or, perhaps more insidiously, keyword update from previously expanded anchors, occurs only when you initiate an SVN Update or SVN Commit, i.e. at the time you synchronize your working copy with the repository. That is, keyword expansion is strictly a client-side operation. As the Subversion book states, "…your client 'knows' only about changes that have occurred in the repository when you update your working copy to include those changes. If you never update your working copy, your keywords will never expand to different values even if those versioned files are being changed regularly in the repository. [my emphasis]" So SVN Update will trigger keyword expansion, as will SVN Commit: both of these operations synchronize your working copy with the repository.
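When you need to work out which criterion is failing for a particular file, the property-related checks (items 4, 5 and 8) are easy to probe with command-line Subversion. The commands below are a sketch run from PowerShell; the file path and keyword list are purely illustrative, and svn.exe is assumed to be on your PATH.

# Inspect the Subversion properties on one (hypothetical) file.
$file = 'C:\code\powershell\scripts\Example.ps1'
svn proplist --verbose $file       # items 4 and 8: are svn:keywords and svn:mime-type sensible?
svn propget svn:keywords $file     # item 5: is the keyword you expect actually in the list?

# If the property is missing or incomplete, set it, then review and commit as usual.
svn propset svn:keywords 'Author Date Id Rev URL' $file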

© Simple-Talk.com

Database Migration Scripts: Getting from place A to place B

Published Tuesday, February 28, 2012 1:00 AM

We'll be looking at a typical database 'migration' script which uses an unusual technique to migrate existing 'de-normalised' data into a more correct form. So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into 'de-normalisation' habits. What's this? A new column with a list of tags or 'types' assigned to books. Because books aren't really in just one category, someone has 'cured' the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for a history book really has to look in the entire list of associated tags rather than the 'Type' field that only keeps the primary tag.

We have other problems. The TypeList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting in there. The reporting system can't be persuaded to do reports based on the tags, and the database developers are complaining about the unCoddly things going on in their database. In your version of PUBS, this extra column doesn't exist, so we've added it and put in 10,000 titles using SQL Data Generator.

/* So how do we refactor this database? Firstly, we create a table of all the tags. */

IF OBJECT_ID('TagName') IS NULL OR OBJECT_ID('TagTitle') IS NULL
BEGIN
  CREATE TABLE TagName
    (TagName_ID INT IDENTITY(1,1) PRIMARY KEY,
     Tag VARCHAR(20) NOT NULL UNIQUE)

  /* ...and we insert into it all the tags from the list
     (remembering to take out any leading spaces) */

  INSERT INTO TagName (Tag)
    SELECT DISTINCT LTRIM(x.y.value('.', 'Varchar(80)')) AS [Tag]
    FROM (SELECT Title_ID,
                 CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>') AS XMLkeywords
          FROM dbo.titles) g
    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )

/* we can then use this table to provide a table that relates tags to articles */

  CREATE TABLE TagTitle
    (TagTitle_ID INT IDENTITY(1, 1),
     [title_id] [dbo].[tid] NOT NULL REFERENCES titles (Title_ID),
     TagName_ID INT NOT NULL REFERENCES TagName (Tagname_ID)
     CONSTRAINT [PK_TagTitle] PRIMARY KEY CLUSTERED ([title_id] ASC, TagName_ID)
     ON [PRIMARY])

CREATE NONCLUSTERED INDEX idxTagName_ID ON TagTitle (TagName_ID) INCLUDE (TagTitle_ID,title_id)

/* ...and it is easy to fill this with the tags for each title ... */

  INSERT INTO TagTitle (Title_ID, TagName_ID)
    SELECT DISTINCT Title_ID, TagName_ID
    FROM (SELECT Title_ID,
                 CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>') AS XMLkeywords
          FROM dbo.titles) g
    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )
    INNER JOIN TagName ON TagName.Tag = LTRIM(x.y.value('.', 'Varchar(80)'))
END

/* That's all there was to it. Now we can select all titles that have the military tag, just to try things out */
SELECT Title
FROM titles
  INNER JOIN TagTitle ON titles.title_ID = TagTitle.Title_ID
  INNER JOIN Tagname ON Tagname.TagName_ID = TagTitle.TagName_ID
WHERE tagname.tag = 'Military'

/* and see the top ten most popular tags for titles */
SELECT Tag, COUNT(*)
FROM titles
  INNER JOIN TagTitle ON titles.title_ID = TagTitle.Title_ID
  INNER JOIN Tagname ON Tagname.TagName_ID = TagTitle.TagName_ID
GROUP BY Tag
ORDER BY COUNT(*) DESC

/* and if you still want your list of tags for each title, then here they are */
SELECT title_ID, title,
  STUFF(
    (SELECT ',' + tagname.tag
     FROM titles thisTitle
       INNER JOIN TagTitle ON titles.title_ID = TagTitle.Title_ID
       INNER JOIN Tagname ON Tagname.TagName_ID = TagTitle.TagName_ID
     WHERE ThisTitle.title_id = titles.title_ID
     FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '')
FROM titles
ORDER BY title_ID

So we've refactored our PUBS database without pain. We've even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship.

We've done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I'd do a whole lot of tests first before doing that. Yes, I've left out the assertion tests too, which should check the edge cases and make sure the result is what you'd expect. One thing I can't quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary 'type' to be first in the list, but what if the entire order is significant? Thank goodness it isn't in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence. You'll see immediately that we can create a synchronisation script for deployment from a comparison tool such as SQL Compare, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of over-riding this part of the data synchronisation process: taking the part of the script that fills the tables, checking that the tables have not already been filled, and executing it as part of the transaction.

Of course, you might prefer the approach I've taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run. The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to roll back the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time. I've shown you how to do this with the part of the script…

/* and if you still want your list of tags for each title, then here they are */
SELECT title_ID, title,
  STUFF(
    (SELECT ',' + tagname.tag
     FROM titles thisTitle
       INNER JOIN TagTitle ON titles.title_ID = TagTitle.Title_ID
       INNER JOIN Tagname ON Tagname.TagName_ID = TagTitle.TagName_ID
     WHERE ThisTitle.title_id = titles.title_ID
     FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '')
FROM titles
ORDER BY title_ID

…which would be turned into an UPDATE … FROM script:

UPDATE titles
SET typelist = ThisTagList
FROM (SELECT title_ID, title,
        STUFF(
          (SELECT ',' + tagname.tag
           FROM titles thisTitle
             INNER JOIN TagTitle ON titles.title_ID = TagTitle.Title_ID
             INNER JOIN Tagname ON Tagname.TagName_ID = TagTitle.TagName_ID
           WHERE ThisTitle.title_id = titles.title_ID
           ORDER BY CASE WHEN tagname.tag = titles.[type] THEN 1 ELSE 0 END DESC
           FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') AS ThisTagList
      FROM titles) f
  INNER JOIN Titles ON f.title_ID = Titles.title_ID

You'll notice that it isn't quite a round trip because the tags are in a different order, though we've managed to make sure that the primary tag is the first one as originally. So, we've improved the database for the poor book distributors using PUBS. It is not a major deal but you've got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.

by Phil Factor

The Road to Professional Database Development: Set-Based Thinking

21 February 2012 by Peter Larsson

Under the pseudonym of 'SwePeso', Peter Larsson is famous on SQL forums for the amazing performance he can get from SQL. How does he do it? In the first of a series of articles, Peter explains his secrets.

The single most common barrier to writing efficient database code, for a developer versed in a typical client language (such as C#), is an inability to leave behind the habit of working on one row at a time, in favor of working with a set (a collection of rows). What is the difference? Let's examine a very simple example of summing up 5 rows in a table. The following code creates a table variable (@Sample) and populates it with the rows of data.

DECLARE @Sample TABLE ( Data INT )

INSERT @Sample ( Data ) VALUES ( 1 ), ( 2 ), ( - 4 ), ( 5 ), ( 8 )

SELECT Data FROM @Sample

For a .NET developer this pseudo-code would be the way to perform this simple sum:

Dim intS AS Integer = 0, objR AS Row   'Initialise the sum to zero

For Each objR In coll.Rows
    intS += objR.Data
Next

Debug.Print intS

This is the standard approach for a procedural language; we not only tell the compiler what we want (the sum of all objects) but also how we want to approach the objects when calculating the sum. In this example, the coll variable holds a collection with objects of type Row. The For-Next loop iterates through all the objects and, for each object, fetches the current value from the Data property and adds the value to the intS variable. In other words, the code operates on one row at a time.

T-SQL is a declarative language, which means that the programmer is supposed to tell the database server what is required and let SQL Server decide the best way to finalize the task. However, many programmers force the procedural approach on SQL Server, with code such as the following:

DECLARE curSum CURSOR FOR SELECT Data FROM @Sample

DECLARE @s INT , @Total INT = 0

OPEN curSum

FETCH NEXT FROM curSum INTO @s

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Total += @s
    FETCH NEXT FROM curSum INTO @s
END

CLOSE curSum
DEALLOCATE curSum

SELECT @Total

Even though this code is written in T-SQL, it still operates on one object (row) at any given time. This approach is commonly referred to as Row-By-Agonizing-Row (RBAR). If we increase the number of rows tenfold, the time to complete the task will increase by a factor of ten; for a very large table (100 million rows or more) this sum could easily take hours to complete. So how do you do this in a set-based way? Luckily for us, T-SQL has an aggregate function available to do the job for us.

SELECT SUM ( Data ) FROM @Sample

The following table compares the performance of the cursor method versus the set-based method, for an increasing number of rows.

Rows     Cursor                   Set-based                Performance factor
         Duration (ms)   Reads    Duration (ms)   Reads    Duration   Reads
1k                 47      4009               0       7          –      953
10k               492     40004               3      42        164      953
100k             5189    400389              28     388        179     1039
1000k           51358   4003848             286    3847        180     1041

As you can see, both solutions scale linearly, i.e. when we increase the number of rows by a factor of ten, we increase the query duration by roughly the same factor. However, there is one big difference, and that is that the set-based solution is about 180 times faster, and puts only about 1/1000th of the read pressure on the server! So why is the performance difference so vast? We'll discuss this in full detail in a later article in this series, but in essence, the set-based approach allows SQL Server to return all the required rows very efficiently, fully exploiting the underlying data storage mechanisms. The internal structure where the data is stored in SQL Server is called a page, and it defines the minimum amount of data read from or written to the database, regardless of whether you want all rows in that page or only one row. So, with a set-based approach, we only need to fetch a page once to get all x rows stored within, whereas with the RBAR approach, we must re-fetch the page x times. We'll also see in a later article how, with appropriate indexes in place, we can speed up this process still further.

This paradigm shift is all there is to it! SQL Server has some built-in aggregate functions to accomplish the task of summing a number of rows, and many others. The key difference is that, in set-based thinking, instead of working with individual rows, we work with sets which can contain any number of rows. To become a professional database developer you have to abandon the old habit of thinking about what you want to do with each row, and start thinking instead about what you want to do with the set, by manipulating the columns. Think vertical instead of horizontal! If we take this example a step further, considering the task of multiplying together all the rows, we will encounter the second biggest barrier to becoming a professional database developer, and that is math skills. For a .NET developer, this pseudo-code would be the way to go:

Dim intP AS Integer = 1, objR AS Row   'Initialise the product to 1

For Each objR In coll.Rows
    intP *= objR.Data
Next

Debug.Print intP

However, we just established that we no longer want to work with single rows, right? Unfortunately, while SQL Server provides a handy SUM aggregate function, it does not provide a MULTIPLY aggregate function. So, how do we perform this task in a set-based fashion? This is where your math skills come in handy.

A * B * C equals e^(ln(A) + ln(B) + ln(C))

Having transformed the multiplication into a sum, it's easier to see that the final query should look something like this:

SELECT EXP ( SUM ( LOG ( Data ))) FROM @Sample

But we still have an issue here: logarithmic operations are only allowed on positive values, zero excluded (logarithms of zero and of negative values are undefined over the real numbers, and will give an Invalid Floating Point Operation error in T-SQL), and multiplication by zero, under any circumstances, is equal to zero. Also, remember that a multiplication involving an odd number of negative numbers results in a negative product, and a multiplication involving an even number of negative numbers results in a positive product. Here is one example of how to solve this problem:

IF EXISTS(SELECT * FROM @Sample WHERE Data = 0)
    SELECT 0.0E   -- If any value is zero, the total product is zero.
ELSE
    SELECT CASE IsNegativeProduct WHEN 1 THEN -EXP(theSum) ELSE EXP(theSum) END
    FROM (
            SELECT SUM(LOG(ABS(Data))),                           -- Sum all exponents
                   -- Keep track if product is negative or positive
                   SUM(CASE WHEN Data < 0 THEN 1 ELSE 0 END) % 2
            FROM @Sample
         ) AS d (theSum, IsNegativeProduct)
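Written out, the identity the query above relies on looks like this (a sketch of the math only; zero has to be handled separately because the logarithm is undefined there):

\[
\prod_{i=1}^{n} x_i \;=\; (-1)^{m}\,\exp\!\Bigl(\sum_{i=1}^{n}\ln\lvert x_i\rvert\Bigr),
\qquad x_i \neq 0,\quad m = \#\{\, i : x_i < 0 \,\}
\]

The SUM(CASE…) % 2 term in the query is simply working out whether m is odd or even, i.e. whether the product should be negative or positive.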

The main reason for the checks in the above code is the limitations of the LOG function. So, can we complete the task in another fashion and still produce the correct result? Yes, we can:

DECLARE @p INT = 1

UPDATE @Sample SET @p *= Data

SELECT @p

This code, which looks very much like the C# pseudo-code, produces the right answer, a value of -320, but how did it work? The difference is that we tell T-SQL to multiply the values but we do not force T-SQL to retrieve the rows in any particular order. Another reason we can use the code above is due to the nature of a product: if we want the product of A and B (a * b), the reverse (b * a) is also valid; it doesn't matter in which order we take the values in the multiplication, as long as they are of the same data type. This leads us to the third 'light bulb' moment on the road to becoming a professional database developer: understanding that rows in a database have no order. This is very different to the way .NET developers think. A collection of objects is normally a double-linked list, where you can traverse forward and backward. Rows in a table can be logically ordered by assigning a sequence number or a sequence denoted by date and time, but you can never rely on the physical order of rows in a set.

Summary

Think in sets and not in single rows
Brush up your math skills as much as you possibly can
Rows have no determined physical order

Coming Next…. Efficient storage and retrieval: Normalization rules.

© Simple-Talk.com

Revisiting Hosting Reflector

23 February 2012 by Nick Harrison

You can automate .NET Reflector processes, and run .NET Reflector from within a .NET application, from within ASP.NET or even within ASP.NET MVC. What is more, you can host a Reflector add-in. This opens up many possibilities, as Nick Harrison points out, and he has the code to prove it too.

Three years ago, I wrote about hosting Reflector in your own application. At that time, we created a User Control that used Reflector to show the disassembled code for whatever method we specified. A lot has changed in the intervening years. If you are not already familiar with Reflector, it is a wonderful tool that allows you to disassemble compiled code and navigate through this output. There is also a wide array of plug-ins available to extend the functionality. Here we will exploit the same features that allow you to write plug-ins to drive Reflector. Reflector has gone through a couple of major version upgrades and more than its share of controversy. There are lots of new language features in the .NET framework, and ASP.NET is no longer the only game in town for building a web UI in a .NET world. So, how do we go about harnessing the power of Reflector for our own purposes, and what has changed in the past three years? We will explore this and a couple of other questions as we update the code from this older article to the latest version of both Reflector and the .NET framework.

Setting up the Environment

At first glance, it looks like nothing has changed at all, and this is close to the truth. All of the interfaces that we used before are still there and they still have the same meaning. In fact, if you start with the code from the original article, simply replace the references to Reflector with the most current version, and switch the Target Framework to 4.0, everything compiles. This gives us a warm fuzzy feeling, but all is not right. When we run the original code, we are immediately presented with a most unsightly error.

My initial solution was to simply add the ASP COM Compatibility attribute to my page directive since I know that this will force the web page to operate as an STA thread. So, I add this:

<%@ Page Language="C#" AutoEventWireup="true" Codebehind="Default.aspx.cs" Inherits="ReflectorHost._Default" ASPCOMPAT="true" %>

And when I refresh the page the error goes away and I get the same output that I got three years ago. And it looks like all is right with the world, until I refresh the page:

This is an even uglier error than the first one. Clearly, the page attribute did not solve the problem. Fortunately we get a clue in that both times, the error message is on the same line of code. Turning our attention to this line, we can discover that there is an overload to the constructor. This overload exposes a new Boolean parameter named embedded.

Well, this looks promising. So we change the errant line to this

serviceProvider = new ApplicationManager(null,true);

Now when we rerun the application, we get the expected results and we can refresh the page to our heart's content. Considering all of the changes that Reflector has been through, this is not a lot of changes to make three-year-old code work with the latest version.

Cleaning up the Code

Well the code now works and runs as expected, but looking back at it, it seems a little dated. It does not take advantage of any of the new compiler syntax candy that Microsoft has passed out recently. Let’s see what we can do. The for loop to initialize the language can be rewritten from:

foreach (ILanguage language in languageManager.Languages)
{
    languageNames.Add(language.Name);
}

To a single LINQ statement like this:

languageNames = (from l in languageManager.Languages.Cast<ILanguage>()
                 select l.Name).ToList();

…or this depending on your preferred syntax.

languageNames = (languageManager.Languages.Cast<ILanguage>()
                 .Select(l => l.Name)).ToList();

This is only a stylistic preference because both ways compile to the same IL and Reflector will not be able to tell the difference.

You may notice that this is not what it looks like from Reflector:

Why is that? I suspect that I am missing a setting in the ConfigureLanguageConfiguration method, but I have not figured out which one yet. This is a frustrating inconsistency between Reflector and our hosted implementation. Surprisingly, evaluating LINQ-style expressions has been the only inconsistency that I have found.

Reflector Meets MVC

As I mentioned earlier, ASP.NET is also a little dated. The cool kids these days use MVC and use custom editors instead of WebControls. How will that look different? The first thing that we need to do is move much of the heavy lifting out of the existing ReflectorComponent class. Let's create a new class called ReflectorSupport. Into this class we will move most of the constructor code for ReflectorComponent and the DisassembleMethod itself. The ReflectorComponent now looks like this:

public ReflectorComponent()
{
    reflector = new ReflectorSupport();
}

private ReflectorSupport reflector;

public int LanguageCode { get; set; }

public string AssemblyPath { get; set; }

public string TypeName { get; set; }

public string MethodName { get; set; }

protected override void Render(HtmlTextWriter writer)
{
    writer.WriteLine("");
    writer.AddAttribute(HtmlTextWriterAttribute.Class, "code");
    writer.RenderBeginTag(HtmlTextWriterTag.Div);
    reflector.Disassemble(LanguageCode, AssemblyPath, TypeName, MethodName, writer);
    writer.RenderEndTag();
}

public List<string> LanguageNames
{
    get { return reflector.LanguageNames; }
}

With this code pulled out of the UserControl, we are free to reuse it in our MVC application. This makes our MVC application very straightforward to implement. To duplicate the functionality from our ASP.NET application, we need a controller like this:

public class DefaultController : Controller
{
    public ActionResult Index()
    {
        return View(0);
    }

    [HttpPost]
    public ActionResult Index(int Language)
    {
        return View(Language);
    }
}

We will pass the selected language to the View. We start by passing in 0 to default to the first language. Subsequent post backs will pass in the selected item from the language drop down back to the controller which we will pass back to the View. The View may look like this:

<% Html.BeginForm(); {%>
    <%=Html.DropDownList("Language", Html.LanguageDropDown(Model), new {onchange = "this.form.submit();"})%>

    <% var assemblyPath = Server.MapPath(@"~\bin\ReflectorWrapper.dll");%>
    <%=Html.ShowCode(Model, assemblyPath, "ReflectorSupport", ".ctor")%>
<% }%>

There are a couple of interesting things to note here. We make the drop-down list behave like an AutoPostBack by setting its onchange event to submit the form. The call to BeginForm ensures that the form will post back to the same controller. The final pieces to add are the extension methods on HtmlHelper. The LanguageDropDown method will simply pull from the property exposed by the ReflectorSupport class:

public static IEnumerable<SelectListItem> LanguageDropDown(this HtmlHelper html, int whichLanguage)
{
    var reflector = new ReflectorSupport();

    var returnValue = (from language in reflector.LanguageNames
                       select new SelectListItem { Text = language }).ToList();
    int index = 0;
    foreach (var item in returnValue)
    {
        item.Value = index.ToString();
        index++;
        if (item.Value == whichLanguage.ToString())
            item.Selected = true;
    }
    return returnValue;
}

The ShowCode method is very simple since the ReflectorSupport class is doing all of the work:

public static string ShowCode(this HtmlHelper html, int languageCode, string assemblyPath,
                              string typeName, string methodName)
{
    var reflector = new ReflectorSupport();
    var htmlTextWriter = new HtmlTextWriter(html.ViewContext.Writer);
    reflector.Disassemble(languageCode, assemblyPath, typeName, methodName, htmlTextWriter);
    return "";
}

This also showcases the appeal of the MVC framework. Comparing the two implementations shows how much simpler it can be to implement the same functionality with MVC.

Hosting an Add-in

Disassembling code is fun and informative, but it is not the only trick up Reflector's sleeve. There is also a wide array of add-ins available, and we can host these in our application too. One of my favorite add-ins is Code Metrics. It allows you to easily track key software metrics such as cyclomatic complexity, coupling, and number of instructions. In an earlier article, I provided a good overview of how to use this add-in to help identify where to focus your refactoring efforts. It can be very helpful to host this add-in to automate gathering these metrics.

Let's start by running Reflector on the add-in code. Because this is an add-in, we want to look for the class that implements the IPackage interface. We can quickly see that this is the CodeMetricPackage object. This object includes the two methods Load and Unload to implement the interface. The Load method handles hooking the add-in into the Reflector runtime. Because we are hosting Reflector, it is up to us to create the environment that the add-in needs to operate. Actually, we have it a lot easier than Reflector does, because we are not aiming to host every add-in; we are aiming to host and manipulate only this single add-in that has caught our eye.

From the implementation of the CodeMetricPackage, we see that the Load and Unload methods deal with wiring up event handlers. From this code, we learn that the user interacts with the add-in through the CodeMetricWindow. We can skip most of the details in the CodeMetricPackage and redirect our attention to this CodeMetricWindow. In looking at this object, we see that it is internal sealed, so we cannot instantiate it directly, but we can investigate how the window works and replicate the code on our own. Remember, we are not trying to duplicate the usage and behavior of the add-in; we only want to access some of the functionality. We can ignore the logic associated with displaying a list of all accessible assemblies, we are not really interested in allowing the user to sort the metric results, and we don't care about the status updates. This allows us to ignore much of the code and hone in on the StartAnalysis method.

In looking at this method, we learn that the heavy lifting is done by the Analyze method of the CodeMetricManager. From this method, we see that the CodeMetricManager includes a collection of Assemblies and a collection of CodeMetrics. Again, we aren't interested in the details of the UI feedback from the event handlers, but we are interested in how the CodeMetrics and the Assemblies get populated. The CodeMetrics collection is initialized in the constructor for the CodeMetricWindow. Reflector registers three different classes of metrics; we are only interested in the MethodCodeMetric. Looking around a bit further, we see that the CodeMetricManager also includes a couple of methods for maintaining the list of Assemblies. We will call the AddAssembly method to specify which assemblies to analyze. Once again, our job is easy because we won't allow the user to interactively pick an assembly. We will specify an assembly, analyze it, and report the results. We are now ready to add a CodeMetrics method to the ReflectorSupport class by piecing together the pieces that we have found so far.

public IList<MethodMetricData> CodeMetrics(string assemblyPath)
{
    var returnValue = new List<MethodMetricData>();
    var assemblyManager = (IAssemblyManager) serviceProvider.GetService(typeof(IAssemblyManager));
    assemblyManager.Resolver = new CustomAssemblyResolver(assemblyManager);

We will come back to the MethodMetricData class shortly, but so far everything looks like the same code used to initialize the environment earlier. Now we are ready to initialize and configure the CodeMetricManager. Unlike Reflector, we will only register the one metric; we could change this line to pull in other metrics:

var codeMetricManager = new CodeMetricManager();
codeMetricManager.Register(new MethodCodeMetric());

Now we let Reflector load the specified assembly and pass it to the codeMetricManager. The CodeMetrics collection serves double duty. Not only does it identify the metrics to run but it also tracks the results. We explicitly add our assembly of interest to the CodeMetricManager, call the Analyze method and gather the results.

if (File.Exists(assemblyPath))
{
    IAssembly assembly = assemblyManager.LoadFile(assemblyPath);
    codeMetricManager.AddAssembly(assembly);
    codeMetricManager.Analyze();
    foreach (CodeMetric metric in codeMetricManager.CodeMetrics)
    {
        foreach (DataRow row in metric.Result.Rows)
        {
            var data = new MethodMetricData
            {
                Name = (string) row[0],
                CodeSize = (int) row[1],
                CyclomaticComplexity = (int) row[2],
                Instructions = (int) row[3],
                Locals = (int) row[4],
                MaxStack = (int) row[5],
                ExceptionHandlers = (int) row[6],
                Throw = (int) row[7],
                NewObj = (int) row[8],
                Ret = (int) row[9],
                CastClass = (int) row[10]
            };
            returnValue.Add(data);
        }
    }
}

The metric result uses a custom type defined in the Reflector.CodeMember assembly, but it is not the most intuitive to use from our perspective. Plus, we don't want to create an external dependency on Reflector or the CodeMetric add-in. This is why we created MethodMetricData. It is a simple POCO (Plain Old CLR Object) that exposes a property for each metric that the MethodCodeMetric calculates.

namespace ReflectorWrapper
{
    public class MethodMetricData
    {
        public string Name { get; set; }
        public int CodeSize { get; set; }
        public int CyclomaticComplexity { get; set; }
        public int Instructions { get; set; }
        public int Locals { get; set; }
        public int MaxStack { get; set; }
        public int ExceptionHandlers { get; set; }
        public int Throw { get; set; }
        public int NewObj { get; set; }
        public int Ret { get; set; }
        public int CastClass { get; set; }
    }
}

The MethodMetricData is also suitable for use as a Model in our MVC application. Our controller may look like this:

public class CodeMetricsController : Controller
{
    public ActionResult Index()
    {
        var executingAssembly = System.Reflection.Assembly.GetExecutingAssembly();
        var references = executingAssembly.GetReferencedAssemblies();
        var model = (from name in references
                     let assembly = Assembly.Load(name)
                     select new ReflectorSourceModel
                     {
                         DisplayName = name.Name,
                         Location = assembly.Location
                     }).ToList();
        return View(model);
    }

    public ActionResult Detail(string name)
    {
        var reflector = new ReflectorSupport();
        IList<MethodMetricData> data = reflector.CodeMetrics(name);
        data = data.Where(p => p.CyclomaticComplexity >= 10 || p.Instructions >= 200)
                   .OrderByDescending(p => p.CyclomaticComplexity)
                   .ToList();
        return View(data);
    }
}

For the Index method, we create a simple model with two properties, DisplayName and Location. The view will show the DisplayName and use the Location as the name parameter to the Detail action. In the Detail method we filter the results to show only those where the cyclomatic complexity is at least 10 or the number of instructions is at least 200. As we outlined in the Metric Driven Refactoring article, these methods warrant a close review and may need to be refactored. In this example, we are simply displaying the results on a web page. This could easily be incorporated into a daily build to track when key metrics fall outside of standards. There is lots of useful information available from Reflector using similar techniques.

Conclusion

While this is not a widely publicized feature, the ability to host .NET Reflector within a .NET application opens up some interesting possibilities. I've shown you how to get started, but there are many potentially interesting ways that .NET Reflector can be used with the methods I've described here. The code for this project is in the speech-bubble at the top of the article.

© Simple-Talk.com

The ASs of Distributed Computing

27 February 2012 by Buck Woody

What's the 'Cloud'? Nothing more than one or more of three different types of distributed service, conceptually similar to any other service such as telephone or gas. These services provide Infrastructure, Software and Platform. Buck Woody cuts a trail through the jungle of marketing verbiage to reveal the technology behind the Cloud.

The term "Cloud" is, sometimes intentionally I think, far too vaguely defined. I tend to use the term Distributed Computing to refer to this new way of working. Distributed Computing is simply operating some or all of your computing needs somewhere else, often operated by an external vendor rather than an internal team. Even that description is probably too simplified to be practical. "Some or all of your computing" leaves a lot open to interpretation, so the industry developed a few more terms that are technically more specific, describing which parts of the computing are sourced to another location, and how they are structured. This leads to all kinds of other discussions, such as architectures, performance and pricing. As with any discipline, focusing on specifics leads to further distinctions that become unworkable at general levels. For instance, we call a canine house pet a "dog", although a zoologist would argue that this is far too vague; we should use the term Canis lupus familiaris. A fellow zoologist would argue even this is too non-specific, and would classify the animal using even more terms. At some point the description includes whether the dog is domestic or feral, or even whether the animal is alive or dead! But for most of us, stating "When a man's best friend is a dog, then that dog has a problem" is enough. We all know what the sentence means. We run into the same issue when clarifying the Distributed Computing discussion. The industry has defined three primary terms, with more coming each week, it seems, describing the ways you can distribute your computing needs. The terms are:

Infrastructure as a Service
Software as a Service
Platform as a Service

The "as-a-Service" moniker is both useful and much-maligned. I find it quite descriptive, because it explains that what precedes the moniker is being delivered, operated, controlled and maintained by someone else. Although even that might be a little misleading… I'll explain that nuance in a moment. In general, each of these terms means that the setup, operation and maintenance for each of these areas is handled just like a utility handles providing you power, clean water, and phone service. Note that this means that the choice of the provider of those services matters a great deal. For instance, let's take the phone example. Many of us have strong preferences for a particular phone vendor because of the selection of phones, the uptime, charges, coverage and more. Many of those same considerations come into play when you're selecting not only an architecture preference, but the vendor that provides it. This moves you away from the straight technology question towards business questions like security, experience and trust. Let's explore where these terms are specific enough, some of the interesting benefits and considerations when using each paradigm, and a few of the architectural patterns available in each. Understand that I'll stick to the most basic components of these terms, so there is a lot of room for further interpretation of each. But just like many parts of life, various levels of understanding can be useful; not everything needs to be defined down to the atomic level.

Infrastructure as a Service - IaaS

Moving part or all of an infrastructure was one of the earliest uses of the term "cloud", and it all started with virtualization. Computers at their most basic normally involve four components: a Central Processing Unit (CPU), some sort of persistent storage (I/O), non-persistent memory (RAM), and a network interface (NIC). Since the very earliest days of the mainframe, technical design experts realized that these components were merely facilities to store and manipulate data. They theorized, and later built, software components that emulated each of these components, and virtualization was born. As mainframes matured, as early as 1972 an entire operating system was used to hold a virtualization environment. The virtualization we recognize today has been in use since before the microcomputer.

Interesting but useless side note: The earliest names for technology did not sound like an infant named every web site on the planet, and all projects were not named after some child’s stuffed toy. Most had a mix of acronyms or numbers, such as VM/CMS. But the earliest “mascot” for virtualization started with, and remains - a teddy bear!

At first, enterprises were slow to adopt virtualization at the PC server level. The hypervisors weren't mature enough to handle high-intensity loads, so Virtual Machines (VMs) were limited to things like development systems, testing systems and personal workstations. But as the hypervisors increased in performance, the "one application / one physical server" rule began to relax, to the point that serious, real-time workloads, even databases, became acceptable targets for virtualization. With virtualization entrenched in the workplace, it became common in a relatively short time not only to remove multiple physical servers and collapse them onto fewer physical "hosts", but even to remove those physical machines from the premises. After all, most companies have a data center, a building where the physical computers are housed, since the cost of setting up the proper electrical, temperature and safety levels makes this better done outside the location where the company does its regular business. In fact, many of these facilities began to be shared among companies. You might not need an entire data center, and its commensurate costs. Companies sprang up with the sole purpose of housing physical computers for a company's virtualization environment. These facilities (often called co-locations or colos) provided a service, arguably one of the first "as-a-Service" offerings: not only hosting these physical computers, but often even providing the hardware itself. And the "cloud" (in practice if not in marketing term) was born. From there, it was simply a matter of time before the next step in servicing this need was taken, and the formal "Cloud Computing" environment was created by vendors and offered for sale on a wide scale.

Primary Characteristics

This brief but necessary history lesson brings us to the basic characteristics that set Infrastructure-as-a-Service apart from other architectures. The first is that the assets (CPU, RAM, NIC and I/O) are presented essentially "raw". You can treat an asset as if you completely control it, within the boundaries of the vendor that provides it. For instance, in the case of storage (I/O), a "drive" is presented to you for access across the public Internet or, in some cases, a paid-for direct network tunnel for better performance. But what you do with the storage, and in some cases even the file system you use, is up to you. It's just a resource for you to consume, located somewhere else. The second characteristic is that of abstraction. Although that drive is presented to you as if it were a Storage Area Network (SAN) disk or local hard drive, you have no control over the actual physical components underneath. For that matter, you don't even know what those components are. And in many cases, there are possibly several more levels of software before you actually hit hardware. This allows the vendor to shift your assets around, replace them with faster/newer/cheaper components, and so on. Which brings up the third characteristic of IaaS: the provisioning and servicing aspect. In fact, this characteristic is shared among all of the "as-a-Service" designations, and is actually the heart of the "cloud", or Distributed Computing. The key here is that the vendor provides not only the buildings and the facilities, the personnel, the power, cooling, computers and other hardware, but also some sort of system that allows you to simply request what you want to use, which is called provisioning. If you're a technical professional involved in your own company's infrastructure, this is something you normally provide. Someone submits a request for a computing need, and you're expected to figure out how much capacity you have, how much you need, and to build and provide the system. A Distributed Computing vendor for IaaS now takes over that task. The next part of this characteristic is the servicing and maintenance of the systems. Someone has to ensure the system is up, functional and performing well. In the case of IaaS, this is often where the servicing aspect stops. All of this has an interesting side-effect regarding purchasing and maintaining software for the system. While the vendor handles everything from the facilities through the hardware abstraction, the point of demarcation is the software from the Operating System (OS) up.

Compatible Use-Cases

Since IaaS simply virtualizes physical servers and handles the provisioning aspects, you could say that almost any computing need will run on this architecture. Of course, there are some practicality issues that make that too broad a blanket statement. You normally start at the operating system installation. Most of the time, this is a pre-configured image with a selection of approved "images" to choose from, which is completely understandable. After all, even virtualized machines need drivers compatible with their host software, and the vendor wants to limit how many they have to support in order to be able to be responsive. So you'll need to do some investigation into which operating systems you require for your application. That being said, since you have complete control over the operating system, many INSTALL.BAT or SETUP.EXE-type programs are suited to IaaS systems. You can also cluster systems or figure out a scale-out solution, but since you don't have complete control over the hardware and infrastructure, you'll need to work with your IaaS vendor to ensure things work the way you expect.

Considerations

With all of the "as-a-Service" solutions, there are security considerations. There may or may not be a way to connect your enterprise security environment (such as Active Directory) to the IaaS environment; check with your vendor to see how that can be implemented. Even so, you can't rely on things like encryption alone to secure your private assets. Any time someone has access to your environment, special care needs to be taken in the security arena. Federation, in which you allow not only your local users access to an asset but also folks from other areas (such as customers) with their own security mechanisms, needs to be thought through carefully. In an IaaS environment, although your software is initially licensed and patched to a certain level for you, from there on these items will be your responsibility. Not only will you need to apply patches for the VM's internal environment, but any operating system, coding platform and other software will need to be treated as if you still owned it onsite. Upgrading and patching needs to be built in to your operations just as it always has been. Another consideration is the aforementioned scale paradigm. A Virtual Machine can only reach a certain number of CPUs, NICs and so on before it becomes impossible to scale it "up" any higher. You may have a fabric of software you use in order to scale these systems outward, which is the proper way to handle increasing loads in a Distributed Computing environment. Ensure that you understand how your fabric of choice works with your vendor's IaaS environment.

Software as a Service - SaaS

Probably one of the easiest "as-a-Service" paradigms to understand is that of Software as a Service. Simply put, this is an environment where you log on to a remote system, use the software via a series of screens and buttons, and log off. There's often nothing to install, and little to configure, to begin using it. An example that even pre-dates the current "cloud" moniker is a remote financial application. Companies have for years been using a SaaS environment to handle established patterns in software systems like accounting, finance and payroll. In fact, when I visit many companies to talk about their use of Distributed Computing, they tell me they aren't currently using any. I then bring up this example and they are surprised to learn they've been using SaaS for a long time.

Primary Characteristics

A SaaS offering is composed of a set of software, often on the web, that is running on a set of remote servers. The Operating System, hardware, scale and all other aspects of computing are normally hidden from the users. For the most part, a SaaS offering is set up for a particular series of screens or application paths. However, some SaaS offerings can be customized, which is what leads a few purists to debate the term. In fact, some are merely groups of functions that need to be customized before any kind of use, which makes them less SaaS and more "some-other-function-as-a-Service"; but if there are no Virtual Computing environment factors to consider, no code to write, and nothing to deploy, it normally fits the description of SaaS.

Compatible Use-Cases

For the most part, and with the caveats already mentioned, SaaS is well suited as a "best fit" solution. If you need to run an office suite of software and your connectivity is robust, there are multiple online solutions available: nothing to install, nothing to license, just use and (in many cases) pay.

Considerations

That brings up the first consideration with a SaaS solution: the cost. It's important to understand how you'll be billed for using the service. Free offerings may be fine, but most of those are not licensed to be used within a company. Even if they were, nothing can be operated for free, so it's either ad-supported or perhaps the vendor has access to your private data.

Support is another consideration. You need to ensure you have support available, that it is robust, and that it works. Since you're essentially outsourcing an entire function, you need to be confident that your users can get support and training when they need it.

One of the biggest considerations is connectivity. The SaaS vendor may have amazing facilities, great uptime, perfect support and response, but if the users can't get to it, it's down for them. Some vendors solve this issue by installing local software that caches the data when offline, much like modern e-mail software.

Finally, ensure you understand data management in a SaaS provider. Who owns your data? Who has access to it? How is it backed up, and how is it restored?

Platform as a Service - PaaS

One of the newest, and sometimes more complicated, "as-a-Service" offerings is "Platform as a Service", or PaaS. In this paradigm, not only are the hardware, virtualization and other infrastructure controlled, provisioned and maintained, but so are the Operating System and, in most cases, even the scale-out paradigm. You write code, deploy it to the service, and your users use it as a SaaS offering.

Primary Characteristics

If you think about IaaS as the ability to use a Virtual Computer, you can think of PaaS as a complete deployment environment available for your use. Imagine for a moment that you write some code and hand that un-compiled code to a friend for her to compile and run on her server; that, in essence, is the PaaS experience. Often the PaaS solution comes with multiple components. Not only can you deploy code to run, but storage systems, queues, caches, secure connections, federations, and other services are available. The way that you interact with the PaaS environment depends on the vendor. With some, you write code on a local system and compile, deploy, test and use the software on the vendor's PaaS environment. With others, you get a local emulation of the PaaS, and can do everything "offline" until you're ready to deploy the tested code to "production", which is, once again, the vendor's PaaS.

Compatible Use-Cases

There are multiple places where a PaaS makes sense, most involving the need to write your own code, although as an aside many PaaS providers allow pre-configured packages (such as Hadoop for big data and so on) to be deployed with no code at all. A flexible environment where you need to be able to write and deploy code quickly fits well in a PaaS environment. There's nothing to plan for or configure, the operating system version is always "now", and it's just ready to accept your code. There are no licenses to buy, no virus scanner to configure and so on. Another well-suited use-case is elastic scale, meaning that the system needs to be able to grow, sometimes quickly and massively, and then shrink back down. In some cases this is a manual effort; in others it can be coded directly into the deployed application. Anything facing the public Internet that needs to interact with both customers and internal stakeholders fits well in a PaaS. There are even connectors and components to allow federated security, making the deployment even more in line with standard patterns. If you do in fact have code you have written within your organization, a PaaS solution can provide rapid "hybrid" integration, allowing functions that need the speed or elasticity advantages simply to be coded into the current software. This allows, for instance, customers to enter data on a web page, and the internal accounting, sales, fulfillment and other departments to access that data and combine it with their internal systems, all without exposing the internal network needlessly.

Considerations

Stateless programming is essential for scale in a PaaS, or indeed in any kind of Distributed Computing. Since most SETUP.EXE applications expect their own server, they are sometimes not well suited to PaaS. Backup and recovery is a shared effort between you and the PaaS vendor. There are things you can do to plan for these kinds of events right in your code, and many PaaS vendors provide features in their platform to ensure availability and continuity. The languages supported by the PaaS vendor are also important. Some PaaS vendors lock you into only one or two languages; others provide multiple choices.

There are other "aaS's" being explored and phased into the lexicon. Data as a Service - DaaS, Reporting as a Service - RaaS, and many others have been announced, with more on the way. It's important to note that Distributed Computing is here to stay, but it isn't meant as a replacement for the way we work today. It's a supplement, just as every other computing paradigm has been. Physical servers, virtualized private environments, colos, and even mainframes are all still here. The technical professional is most useful when he or she takes the time to learn the several ways of working with technology, and applies them properly to the business problem at hand.

© Simple-Talk.com

Tuning Red Gate: #4 of Some

Published Thursday, February 23, 2012 12:00 AM

First time connecting to these servers directly (keys to the kingdom, bwa-ha-ha-ha. oh, excuse me), so I'm going to take a look at the server properties, just to see if there are any issues there. Max memory is set, cool, first possible silly mistake clear. In fact, these look to be nicely set up. Oh, I'd like to see the ANSI Standards set by default, but it's not a big deal. The default location for database data is the F:\ drive, where I saw all the activity last time. Cool, the people maintaining the servers in our company listen: the parallelism threshold is set to 35 and optimize for ad hoc is enabled. No shocks, no surprises. The basic setup is appropriate.

On to the problem database. Nothing wrong in the properties. The database is in SIMPLE recovery, but I think it's a reporting system, so no worries there. Again, I'd prefer to see the ANSI settings for connections, but that's the worst thing I can see. Time to look at the queries, tables, indexes and statistics, because all the information I've collected over the last several days suggests that we're not looking at a systemic problem (except possibly not enough memory), but at the traditional tuning issues.

I just want to note that I started looking at the system, not the queries. So should you when tuning your environment. I know, from the data collected through SQL Monitor, what my top poor-performing queries are, and the most frequently called, etc. I'm starting with the most frequently called. I'm going to get the execution plan for this thing out of the cache (although, with the cache dumping constantly, I might not get it). And it's not there. Called 1.3 million times over the last 3 days, but it's not in cache. Wow. OK. I'll see what's in cache for this database:

SELECT  deqs.creation_time,
        deqs.execution_count,
        deqs.max_logical_reads,
        deqs.max_elapsed_time,
        deqs.total_logical_reads,
        deqs.total_elapsed_time,
        deqp.query_plan,
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                  (deqs.statement_end_offset - deqs.statement_start_offset) / 2 + 1) AS QueryStatement
FROM    sys.dm_exec_query_stats AS deqs
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
WHERE   dest.dbid = DB_ID('Warehouse')
        AND deqs.statement_end_offset > 0
        AND deqs.statement_start_offset > 0
ORDER BY deqs.max_logical_reads DESC ;

And looking at the most expensive operation, we have our first bad boy:

Multiple table scans against very large sets of data and a sort operation. A sort operation? It's an insert. Oh, I see, the table is a heap, so it's doing an insert, then sorting the data and then inserting into the primary key. First question: why isn't this a clustered index? Let's look at some more of the queries. The next one is deceiving. Here's the query plan:

You're thinking to yourself, what's the big deal? Well, what if I told you that this thing had 8,036,318 reads? I know, you're looking at skinny little pipes. Know why? Table variable. Estimated number of rows = 1. Actual number of rows? Well, I'm betting several more than one, considering it's read 8 MILLION pages off the disk in a single execution. We have a serious and real tuning candidate. Oh, and I missed this: it's loading the table variable from a user-defined function. Let me check, let me check. YES! A multi-statement table-valued user-defined function. And another tuning opportunity. This one's a beauty, seriously. Did I also mention that they're doing a hash against all the columns in the physical table? I'm sure that won't lead to scans of a 500,000-row table, no, not at all. OK. I lied. Of course it does. At least it's on the top part of the Loop, which means the scan is only executed once.

I just did a cursory check on the next several poor performers: all calling the UDF. I think I found a big tuning opportunity. At this point, I'm typing up internal emails for the company. Someone just had their baby called ugly. In addition to a series of suggested changes that we need to implement, I'm also apologizing for being such an unkind monster as to question whether that third eye & those flippers belong on such an otherwise lovely child.

PowerShell for SharePoint Developers 28 February 2012 by Dave McMahon

For some reason, SharePoint developers haven't taken to PowerShell with the same enthusiasm as the DBAs and SysAdmins. Dave McMahon is a man on a mission to explain that PowerShell can provide plenty of power for repetitive tasks and, once learned, can mean very quick scripting.

As a group, I would say that we .NET and SharePoint Developers are pretty poor generally when it comes to PowerShell. Seeing that it's 'command line stuff', most Developers I know will either turn their noses up at it, or give me the 'rabbit in the headlights' look then make a hasty exit, when I suggest they try it. To be fair, I did the same a few months back but, as with all things, when faced with the necessity to do something we normally come up trumps. I'm pleased to report that I'm now relaxed with PowerShell and even take a perverse pleasure in doing things on the command line; just because I can. If you want this to be you, then read on!

The aim of this article is mainly to get you, a SharePoint Developer, comfortable with the idea of using PowerShell to carry out repetitive tasks. I'll approach this pretty much as I approached the job of teaching myself; I decided I would learn as much as I needed to know to get by on the particular task I had at hand, and then move on from there. A secondary aim of this article is to provide you with a few very basic scripts which, as a Developer, you will probably want to use; whilst they are all out there on the great and good Internet, it's nice to have a little package of useful stuff in one place to refer to from time to time.

By the way, if you want any further motivation to learn PowerShell, don't forget that STSADM, the previous scripting option for SharePoint, is deprecated and so there is no guarantee it will be available in future editions. Microsoft are very clear on this: PowerShell is the future of scripting with SharePoint! The examples I've included in this article were all written in the SharePoint 2010 Management Shell, which you find at Start -> All Programs -> Microsoft SharePoint 2010 Products.

PowerShell

Looking back on my progress in learning PowerShell, there were just a few things I needed to know to really get going and get productive:

Learn how to get PowerShell to run my scripts
Learn how to get help
Learn how to declare a variable
Learn how to work with Collections
Learn how to work with Logical Operators
Learn how to use Boolean values

I'm going to go through all these very quickly, then as we work through a few examples I'll add in a few extra bits 'n' pieces.

Learn how to get PowerShell to run my scripts

On the face of it, this may seem a strange thing to say. PowerShell is a command shell, right? For running scripts, right? So why should I have to do anything to PowerShell to get it to do its primary purpose? It all comes down to that word which most developers loathe and fear: security. Since, by using PowerShell, you can do pretty much anything to both the machine you are on and to remote machines too, it's only common sense to have some safeguards in place which try to protect you from malicious scripts. I'm not going to go into the world of PowerShell best practice here. To do things as you should on a production system, I highly recommend you read and implement the instructions in the following Scott Hanselman article. If you want to just play around on your laptop, then from the Windows Start menu select All Programs -> Microsoft SharePoint 2010 Products -> SharePoint 2010 Management Shell and then type the following:

PS C:> Set-ExecutionPolicy Unrestricted
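If you are curious what policy is currently in force, or you would rather not change the setting machine-wide, the standard Get-ExecutionPolicy and Set-ExecutionPolicy cmdlets let you check the policy and scope the change. A quick sketch (RemoteSigned here is just one reasonable choice, not something this article requires):

PS C:\> Get-ExecutionPolicy -List
PS C:\> Set-ExecutionPolicy RemoteSigned -Scope CurrentUser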

Contrary to popular belief, Unrestricted will not allow you to run scripts willy-nilly. Each time you run a script that was downloaded from the Internet (and so is not trusted or "signed"), it will prompt you to confirm that you wish to run it. Good enough to my mind for development. If you know 100% that you will not be running any scripts other than your own, you could use Bypass instead of Unrestricted, as it disables any prompts and warnings; I'll leave that decision to you. So now you know how to actually get PowerShell to run your scripts.

Learn How to Get Help

One thing about PowerShell is that its syntax is pretty simple and, thankfully, pretty consistent. If you want to get some help on PowerShell, you only have to type the following:

PS C:> get-help

I think even I can remember that. If I want to get help on a specific SharePoint PowerShell command, say Enable-SPFeature, then I would type:

PS C:> get-help Enable-SPFeature

This then gives me everything I need to know about the command and also how to get to some examples and detailed technical information. Job done.
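To jump straight to those examples, or to the full technical detail, Get-Help takes a couple of handy switches:

PS C:> get-help Enable-SPFeature -Examples
PS C:> get-help Enable-SPFeature -Full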

Learn How to Declare a Variable

As Developers we're in the business of working with variables. We're always assigning and reading variables. It's something we do best. So how do you do it in PowerShell? We use the $ symbol; thus, if I wanted to use a variable called myInt I would refer to it as $myInt. Easy enough, and you will be glad to know that PowerShell is case insensitive. So, by means of a real example, the first command in the following block assigns a collection of the Service instances to a variable, and the second command simply outputs the contents of the $services variable to the console:

PS C:> $services = Get-SPServiceInstance;
PS C:> $services

TypeName                          Status  Id
--------                          ------  --
Secure Store Service              Online  5c1a5db1-3b58-45a1-8d7a-b3d56e8327f7
Microsoft SharePoint Foundati...  Online  305e3461-834c-4a7d-8b04-fe6bfb5fd68d
Microsoft SharePoint Foundati...  Online  3585503d-e49e-4c75-9b80-cc3df2dd1658
SharePoint Server Search          Online  96f199cb-80d0-4938-a80f-c82239db7d57
Search Query and Site Setting...  Online  658e22fc-7ca6-4d6b-895f-6140e3fe52c5
Microsoft SharePoint Foundati...  Online  8b81a9b7-148f-4ac3-ab6b-1052cc78391a
Central Administration            Online  3f1e33bd-4b64-4303-831a-6cb6d39cf1fb
Microsoft SharePoint Foundati...  Online  f9ea50d0-356c-47ca-8b24-e1fe09d3b157
Microsoft SharePoint Foundati...  Online  4412f136-8fba-424e-849e-25d3b7df601c
SharePoint Foundation Help Se...  Online  ebf68004-fb57-42c1-a6ce-a23873d3c1f7
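If you would like to see assignment in isolation before touching SharePoint objects, an ordinary variable behaves much as you would expect (the values here are purely illustrative):

PS C:> $myInt = 42
PS C:> $myString = "SharePoint"
PS C:> $myInt.GetType().Name
Int32
PS C:> $myString.ToUpper()
SHAREPOINT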

Learn how to Work with Collections

Quite often you end up with a collection variable. This happens because, although PowerShell is aware of types, a variable can actually hold any type. So, taking our $services variable above, just type a dot '.' and then press the Tab key. You will see the Count property show up.

PS C:> $services.Count

If you keep hitting Tab you will go through all the available properties and methods on the type currently held in the $services variable. The Count is a dead giveaway that you are dealing with a collection variable; if you want further confirmation, keep tabbing until you see the GetEnumerator() method. In the above example with Get-SPServiceInstance we ended up with a collection variable, but often you want to go to work on the individual objects inside it. How do you get at each single object? In this case the syntax is:

PS C:>foreach($service in $services){Write-Host $service.TypeName}
Managed Metadata Web Service
User Profile Synchronization Service
Business Data Connectivity Service
Secure Store Service
Claims to Windows Token Service
Microsoft SharePoint Foundation Workflow Timer Service
PerformancePoint Service
Application Registry Service
Microsoft SharePoint Foundation Sandboxed Code Service
Visio Graphics Service
SharePoint Server Search
Document Conversions Launcher Service
Document Conversions Load Balancer Service
Search Query and Site Settings Service
Web Analytics Web Service
Microsoft SharePoint Foundation Web Application
Central Administration
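You can also reach a single element by position rather than looping; for example, to look at the first service instance in the collection (index 0 is just an illustration, and your ordering may differ):

PS C:> $services[0].TypeName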

The foreach loop is very C#-ish, I think you'll agree? The Write-Host command simply outputs the result to the console window so that you can see the results listed. If you want to do a LINQ type of query, where you filter a list with a where clause, you can type the following (assuming a single-server farm) to get a reference to the Secure Store Service:

PS C:>$service = Get-SPServiceInstance | where {$_.TypeName -eq "Secure Store Service" };
PS C:\>$service

TypeName              Status  Id
--------              ------  --
Secure Store Service  Online  5c1a5db1-3b58-45a1-8d7a-b3d56e8327f7

Let's take a look at this example in a little more detail, as it contains the first unique sort of PowerShell syntax and a first real 'gotcha' if you are not careful. First off we have the 'pipe' between 'Get-SPServiceInstance' and 'where'. Now most people make a big thing of the 'pipe' and indeed it is a very powerful feature. Basically, each expression or statement in PowerShell produces an output which can be used as an input to another expression or statement. I'd highly recommend that, as you get more familiar with PowerShell, you get into the habit of using it. However, when you first start out, you will find you don't need it as much as you might think (there's a pipe-free sketch at the end of this section to show what I mean), and I think that excessive use of the pipe can lead to incomprehensible code for beginners. Think about who is coming after you, not how 'cool' your code looks!

Next we have the $_ construct, which is the PowerShell way of accessing each element of a collection, an implied loop variable if you like. This essentially allows you to do a trawl through a collection on one line, the collection in this case being the collection of SharePoint Service Instances. So the $_ represents, in this case, a single SharePoint Service Instance. Finally, and this is the 'gotcha', we have the logical comparison operator '-eq'.
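As promised, here is that pipe-free version: the same "give me the Secure Store Service" filter written with the foreach pattern from earlier. It is only a sketch, but it should leave the same instance in $service:

PS C:> $service = $null
PS C:> foreach ($s in Get-SPServiceInstance) { if ($s.TypeName -eq "Secure Store Service") { $service = $s } }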

Learn how to work with Logical Operators

Why not use '=' as in VB, or the '==' sign as in C#, for comparison? Well, the latter is not valid PowerShell syntax and the former is the assignment operator. See this article for details about the various comparison operators. So the PowerShell command:

PS C:> if ($s == 4) { Write-Host ‘true’ };

Will error, as ‘==’ is invalid syntax.

PS C:> if ($s = 4) { Write-Host ‘true’ };

Will always return ‘true’ as $s is assigned to the value 4.

PS C:> if ($s -eq 4) { Write-Host ‘true’ };

Will write 'true' to the console only if the value stored in the $s variable is 4. So, if you accidentally used '=' instead of '-eq' in the earlier where clause, you would rename all your SharePoint Service Instances to have a TypeName of 'Secure Store Service', if SharePoint allowed that. You can see that in certain circumstances this could have serious consequences! You have been warned! As an aside, all the logical operators in PowerShell are preceded with a dash, so to do a logical AND you would use '-and', and likewise for OR you would use '-or'. Simple enough to remember. Why did the designers of the language choose -eq as opposed to ==? I don't really know, but I'm guessing they assumed that most people who had done scripting on Windows used VBScript and were familiar with the '=' sign being both the comparison and assignment operator. I'm further assuming they wanted to clearly differentiate between the two in PowerShell, hence the -eq syntax.

Learn how to use Boolean values

Thankfully this is easy enough, as it should be, although a little odd in its syntax. A Boolean value is identified by the terms $true and $false. Note the dollar signs before the words. Don't ask me why. You can use them just like normal variables, which I guess is why they are the way they are. When using them in conjunction with cmdlets, though, they are generally used a little differently. PowerShell cmdlets which have side-effects have a "-Confirm" switch which, if set to $true, prompts the user to confirm they wish to carry out the action. So if I wanted to start up the PerformancePoint Service instance I would type something like:

PS C:\> Get-SPServiceInstance | where { $_.TypeName -eq "PerformancePoint Service" } | Start-SPServiceInstance -Confirm:$true

Note the colon after the -Confirm parameter. It won't work without it; this is how you pass a value to a Boolean switch parameter. In a similar vein, the representation for a null reference is $null.
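$null mostly turns up in comparisons, for instance when a filter finds nothing. A quick sketch (the service name here is deliberately bogus):

PS C:> $service = Get-SPServiceInstance | where { $_.TypeName -eq "No Such Service" }
PS C:> if ($service -eq $null) { Write-Host "Nothing matched" }
Nothing matched

This is exactly the pattern used a little later on to check whether the SharePoint snap-in is loaded.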

Using PowerShell with SharePoint

So what's a useful thing to do with PowerShell as a Developer? Well, as a developer you need to deploy your SharePoint WSP solution and activate any features which are in those solutions. Now, in your development environment, Visual Studio does all that for you. However, in order to bridge the gap to your production system, you can use PowerShell to add and install your solution, and you can also enable your features. In fact, you really can only use PowerShell or STSADM to add a solution to the Farm Solution Store, so let's see the PowerShell for each of those:

C:>Add-SPSolution -LiteralPath "C:\sp.wsp"
C:>Install-SPSolution -Identity "sp.wsp" -WebApplication "http://sp" -GACDeployment
C:>Enable-SPFeature -Identity "MyFeature Name" -Url "http://sp"

They're pretty simple and self-explanatory, I hope. I'm assuming your SharePoint site is at http://sp, your solution package is called sp.wsp and your feature is called "MyFeature Name". Obviously, change the names to reflect your environment. Note that in PowerShell, unlike STSADM, you need to use absolute paths (hence the literal C:\sp.wsp above). You might like to roll back your deployment, in which case you need to deactivate your features, then uninstall and delete your solution:

C:>Disable-SPFeature -Identity "MyFeature Name" -Url "http://sp"
C:>Uninstall-SPSolution -Identity "sp.wsp" -WebApplication "http://sp"
C:>Remove-SPSolution -Identity "sp.wsp"
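If you want to check what is actually in the farm before or after running these commands, you can query it with Get-SPSolution and Get-SPFeature. A quick sketch, reusing the same illustrative names:

PS C:> Get-SPSolution | where { $_.Name -eq "sp.wsp" }
PS C:> Get-SPFeature | where { $_.DisplayName -like "MyFeature*" }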

Pretty useful. But I suspect you don't want to have to type all of those lines separately. You'd like to combine them into a script, maybe with parameters, which you can run from the command line in one go. However, all our commands so far have relied on using the Microsoft SharePoint 2010 Management Shell, which through its configuration has pre-loaded the necessary PowerShell library for SharePoint. When we run our scripts, most likely we'll want to use something like a Pre or Post Build command, or MSBuild. We need to make sure that PowerShell loads the correct library. Installing SharePoint registers the necessary DLL, but we need to load that DLL into our PowerShell session. So let's look at another bit of standard PowerShell to help us.

PS C:>$snapin = get-pssnapin | where { $_.Name -eq "Microsoft.SharePoint.PowerShell" };
PS C:>if($snapin -eq $null)
>>{
>>Write-Host "Loading SharePoint PowerShell Snapin"
>>Add-PsSnapin Microsoft.SharePoint.PowerShell
>>}
>>
PS C:>

The first line checks whether the SharePoint PowerShell snap-in is loaded. If you type the above at the command prompt, you will see the >> appear after the closing bracket of the if clause when you hit Enter. PowerShell knows when it has an incomplete statement and prompts you with >>. To get out of the multi-line mode, just hit Enter twice. This piece of PowerShell will allow us to run our script without having to use the SharePoint 2010 Management Shell, as it uses the Add-PSSnapin command, which adds in the registered snap-in Microsoft.SharePoint.PowerShell. We can then run our SharePoint PowerShell commands safe in the knowledge that we have access to the relevant libraries. To complete our first PowerShell for SharePoint script, open up Notepad or Notepad++ or your favourite text editor and type in the following:

Param ($action)
$snapin = get-pssnapin | where { $_.Name -eq "Microsoft.SharePoint.PowerShell" };
if ($snapin -eq $null)
{
    Write-Host "Loading SharePoint PowerShell Snapin"
    Add-PsSnapin Microsoft.SharePoint.PowerShell
}
if ($action -eq "install")
{
    Add-SPSolution -LiteralPath "C:\sp.wsp";
    Install-SPSolution -Identity "sp.wsp" -WebApplication "http://sp" -GACDeployment;
    Enable-SPFeature -Identity "MyFeature Name" -Url "http://sp"
}
if ($action -eq "uninstall")
{
    Disable-SPFeature -Identity "MyFeature Name" -Url "http://sp"
    Uninstall-SPSolution -Identity "sp.wsp" -WebApplication "http://sp"
    Remove-SPSolution -Identity "sp.wsp"
}

Save the file as test.ps1. Now open a standard Windows console and type 'PowerShell', then hit Enter. This will open the standard Windows PowerShell console. Assuming you have a SharePoint farm with the URL http://sp and a solution package called sp.wsp which contains a feature called "MyFeature Name", then to install the solution and enable the feature, type:

PS C:>.\test.ps1 install

To deactivate and uninstall the feature and solution type:

PS C:> .\test.ps1 uninstall

Note that we type ".\test.ps1" and not just "test.ps1": unlike cmd.exe, PowerShell does not run scripts from the current directory unless you qualify the path. If you want to do more advanced PowerShell, and develop scripts, functions and so forth, then I suggest you use PowerGUI, which you can download from here. It is a great tool and I highly recommend its use when working with PowerShell. It provides a VS2010-like experience, complete with IntelliSense and tooltips.
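If you later want to call the same script from a Visual Studio Pre or Post Build event, or from an MSBuild task, as mentioned earlier, you can invoke it through powershell.exe directly. A sketch (the script path is purely illustrative):

powershell.exe -ExecutionPolicy Bypass -File "C:\Scripts\test.ps1" install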

In Summary

Hopefully this brief and rapid introduction to PowerShell for SharePoint has given you a taste of what you can do. My aim has been to get you over those first few hurdles and to get you even minimally productive with PowerShell. You should now be able to:

Open the SharePoint 2010 Management Shell
Set the Execution Policy
Get PowerShell help
Assign variables
Work with loops
Work with PowerShell logical operators
Work with PowerShell Boolean values
Create and run a PowerShell script in any PowerShell window

I hope this has been of some use and that you will now not run a mile when somebody suggests to you that you should work with PowerShell. Good luck!

© Simple-Talk.com