Homepage

Opportunity Nokia's [Thu, 17 Feb 00:00] The editor finds hope in Microsoft's alliance with Nokia. These are two companies that need each other's help in the Smartphone market, and the result could surprise the industry.
Correlating SQL Server Profiler with Performance Monitor [Wed, 16 Feb 00:00] Both Performance Monitor and SQL Server Profiler provide valuable information about your server. However, if you combine the data from the two sources, then this correlated information can help find some of the subtler bugs. Brad explains how.
SQL Scripts Manager with PowerShell [Thu, 17 Feb 00:00] SQL Scripts Manager was released as a Christmas present to Simple-Talk subscribers. William Brewer then wrote an appreciation of the tool. Now, he reveals a secret: it also runs PowerShell scripts and, hence, SMO. He has the scripts to prove it, though hopefully, you'll soon be running your own PowerShell from SQL Scripts Manager.
What Counts for a DBA: Skill [Tue, 15 Feb 00:00] “Practice makes perfect,” right? Well, not exactly. Different types of programming require different skill sets, and you as a programmer must recognize the difference between the procedural languages and SQL, and treat them differently. Skill comes from practicing doing things the right way and making “right” a habit.
Look-up Tables in SQL [Tue, 01 Feb 00:00] Lookup tables can be a force for good in a relational database. Whereas the 'One True Lookup Table' remains a classic of bad database design, an auxiliary table that holds static data, and is used to look up values, still has powerful magic. Joe Celko explains.
The Presentation Isn't Over Until It's Over [Mon, 14 Feb 00:00] Phil Factor was challenged to blog about his best and worst presentations. The worst was obviously the one he has already recounted, when a realistic toy pistol rolled out accidentally from his briefcase. This presentation, on the other hand...
Raw Materials: Dinner Out [Wed, 16 Feb 00:00] Derek sees all, knows all, won't say how.
Partitioning Your Code Base through .NET Assemblies and Visual Studio Projects [Thu, 10 Feb 00:00] Should every Visual Studio project really be in its own assembly? And what does 'Copy Local=True' really mean? Patrick Smacchia is no stranger to large .NET projects, is well placed to lay a few myths to rest, and gives some advice that promises up to a tenfold increase in speed of compilation.
Web Testing with Selenium Sushi: A Practical Guide and Toolset [Wed, 09 Feb 00:00] How does one test the user-interface of a web application? Too often, the answer seems to be 'clumsily, slowly, and not very well'. The technology of automated, repeatable testing of websites is still developing, but it exists; and Michael Sorens is here to describe an even better approach, based on Selenium.
Hitting the Ground Running with Parallel Extensions in .NET 4.0 [Tue, 01 Feb 00:00] With the arrival of Parallel Extensions in .NET 4.0, the concurrent programming powers traditionally reserved for the most elite of developers are now available to all of us. With multi-core processors increasingly becoming the norm, this is good news, and Jeremy Jarrell gives us the essential knowledge we'll need to get started.

Inside the tent.... Occasional Editorial announcements.

Opportunity Nokia's

Published Thursday, February 17, 2011 3:25 PM

Nokia’s alliance with Microsoft is likely to be good news for anyone using Microsoft technologies, and particularly for .NET developers. Before the announcement, the future wasn’t looking so bright for the ‘mobile’ version of Windows, Windows Phone. Microsoft currently has only 3.1% of the Smartphone market, even though it has been involved in it for longer than its main rivals. Windows Phone has now got the basics right, but that is hardly sufficient by itself to change its predicament significantly. With Nokia's help, it is possible.

Despite the promise of multi-tasking for third-party apps, integration with Microsoft platforms such as Xbox and Office, direct integration of Twitter support, and the introduction of IE 9 “later this year”, there have been frustratingly few signs of urgency on Microsoft’s part in improving the Windows Phone product. Until this happens, there seems little prospect of reward for third-party developers brave enough to support the platform with applications. This is puzzling when one sees how well SQL Server and Microsoft’s other server technologies have thrived in recent years, under good leadership from a management that understands the technology. The same just hasn’t been true for some of the consumer products. In consequence, iPads and Android tablets have already exposed diehard Windows users, for the first time, to an alternative GUI for consumer Tablet PCs, and the comparisons aren’t always in Windows’ favour.

Nokia’s problem is obvious: Android’s meteoric rise. Android now has 33% of the worldwide market for smartphones, while the market share of Nokia’s Symbian has dropped from 44% to 31%. As details of the agreement emerge, it would seem that Nokia will bring a great deal of expertise, such as imaging and Nokia Maps, to Windows Phone that should make it more competitive. It is wrong to assume that Nokia’s decline will continue: the shock of Android’s sudden rise could be enough to sting them back to their previous form, and they have Microsoft’s huge resources and marketing clout to help them. For the sake of the whole Windows stack, I really hope the alliance succeeds.

by Andrew Clarke

Correlating SQL Server Profiler with Performance Monitor

16 February 2011 by Brad McGehee

Both Performance Monitor and SQL Server Profiler provide valuable information about your server. However, if you combine the data from the two sources, then this correlated information can help find some of the subtler bugs. Brad explains how.

In the past, when watching the % Processor Time counter in Performance Monitor on my live production SQL Servers, I would occasionally see sudden spikes in CPU utilization, to 50, 70 or even 80%. These spikes might last several minutes or more, then disappear. In some extreme cases I would see spikes that lasted 30, or even 60 minutes. I always wanted to know which SQL Server activity was causing the spike, but the problem was that I had no way of correlating a specific statement running in SQL Server to a specific resource usage spike. With SQL Server 2005 Profiler, I now have the tools to identify the causes of such spikes. I can import Performance Monitor log data and compare it directly with Profiler activity. If I see a spike in CPU utilization, I can identify which statement or statements were running at the same time, and diagnose potential problems. In this article, I will describe how to perform a correlation analysis using Profiler and Performance Monitor, covering:

How to collect Profiler data for correlation analysis
How to collect Performance Monitor data for correlation analysis
How to capture Profiler traces and Performance Monitor logs
How to correlate SQL Server 2005 Profiler data with Performance Monitor data
How to analyze correlated data

I assume you have a basic working knowledge of Performance Monitor (sometimes called System Monitor) as well as Profiler, so that we can focus on how to use the two tools together. If you need further information on the basics of using Performance Monitor, check out Books Online.

How to Collect Profiler Data for Correlation Analysis

While it is possible to correlate most Profiler events to most Performance Monitor counters, the area of greatest correlation is between Profiler Transact-SQL events and Performance Monitor counters that indicate resource bottlenecks. This is where I focus my efforts, and the following sections describe how I collect Profiler data for correlation with Performance Monitor. As always, feel free to modify my suggestions to suit your own needs. The key, as always when using Profiler, is to capture only those events and data columns you really need, in order to minimize the workload on your production server when the trace is running.

Events and Data Columns

My favorite Profiler template, when correlating Profiler trace data with Performance Monitor counter data, is the one I outlined in How to Identify Slow Running Queries. This template collects data on the following events:

RPC:Completed
SP:StmtCompleted
SQL:BatchStarting
SQL:BatchCompleted
Showplan XML

In addition, I include these data columns:

Duration
ObjectName
TextData
CPU
Reads
Writes
IntegerData
DatabaseName
ApplicationName
StartTime
EndTime
SPID
LoginName
EventSequence
BinaryData

Note that in order to perform an accurate correlation between Profiler and Performance Monitor data, you need to capture both the StartTime and EndTime data columns as part of your trace.
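As an aside that is not part of Brad's walkthrough, the same events and data columns can also be captured with a lightweight server-side trace instead of the Profiler GUI. The sketch below is a minimal, hedged example showing only the SQL:BatchCompleted event (event 12) and a few of the columns listed above; the output path is hypothetical, and you would repeat the sp_trace_setevent calls for the remaining events and columns.

-- Minimal server-side trace sketch (the path C:\Traces\correlation is hypothetical)
DECLARE @TraceID INT;
DECLARE @maxfilesize BIGINT;
DECLARE @on BIT;
SET @maxfilesize = 100;
SET @on = 1;

EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\correlation', @maxfilesize, NULL;

-- Event 12 = SQL:BatchCompleted; columns 1 = TextData, 13 = Duration, 14 = StartTime,
-- 15 = EndTime, 16 = Reads, 17 = Writes, 18 = CPU
EXEC sp_trace_setevent @TraceID, 12, 1, @on;
EXEC sp_trace_setevent @TraceID, 12, 13, @on;
EXEC sp_trace_setevent @TraceID, 12, 14, @on;
EXEC sp_trace_setevent @TraceID, 12, 15, @on;
EXEC sp_trace_setevent @TraceID, 12, 16, @on;
EXEC sp_trace_setevent @TraceID, 12, 17, @on;
EXEC sp_trace_setevent @TraceID, 12, 18, @on;
-- ...repeat for the other events and data columns in the lists above...

EXEC sp_trace_setstatus @TraceID, 1;   -- start the trace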

Filters

The only filter I create is based on Duration, because I want to focus my efforts on those SQL statements that are causing the most problems. Selecting the ideal Duration for the filter is not always easy. Generally, I might initially capture only those events that are longer than 1000 milliseconds in duration. If I find that there are just too many events to easily work with, I might "raise the bar" to 5000 or 10000 milliseconds. You will need to experiment with different durations to see what works best for you. In the example for this article, I use 1000 milliseconds. I don't filter on DatabaseName, or any other data column, because I want to see every statement for the entire SQL Server instance. Performance Monitor counters measure the load on an instance as a whole, not just the load on a single database.
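On a related note (again, not something this article relies on), if the trace is saved to a file, the same duration cut can be applied afterwards in T-SQL. This is a minimal sketch with a hypothetical file name; remember that Duration is stored in microseconds in SQL Server 2005 trace files, even though Profiler displays milliseconds.

-- Hedged sketch: load a saved trace file and keep only the slow statements
SELECT StartTime, EndTime, Duration, CPU, Reads, Writes, TextData
FROM sys.fn_trace_gettable(N'C:\Traces\correlation.trc', DEFAULT)  -- hypothetical path
WHERE Duration >= 1000000   -- 1,000 milliseconds expressed in microseconds
ORDER BY Duration DESC;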

Ordering and Grouping Columns

I don't do any grouping, but I generally order the data columns in an order that works well for me. You can perform any grouping or aggregation you want, but it won't affect the correlation analysis, and so I generally omit it.

How to Collect Performance Monitor Data for Correlation Analysis

I assume you know the basics of using Performance Monitor, but in case you don't know how to create logs, I will describe in this section how to set up a Performance Monitor log to capture counter activity, which can then be correlated with Profiler trace events.

NOTE: Performance Monitor comes in different versions, depending on the operating system, and the way logs are created differs from version to version. In this example, I am using the version of Performance Monitor that is included with Windows Vista and Windows 2008.

The activity data collected by Performance Monitor can be displayed "live" on a graph, or you can store it in a log file, using what is called a user-defined data collector set. In order to correlate Performance Monitor data with Profiler trace data, you must store the activity data in a log file. This log file can then be imported into Profiler for the correlation analysis. Performance Monitor provides a wizard to help you do this, which entails three main steps:

Creating a new Log file definition
Selecting Performance Counters
Creating and saving the Log file

Defining a new Performance Monitor Log File

On starting Performance Monitor, you will see a screen similar to the one shown in Figure 1-1. By default, Performance Monitor operates in "live graphing" mode, which shows the graph being created on the screen in real time. Figure 1-1: Start Performance Monitor. The appearance of the screen will vary somewhat from OS to OS. In Vista and Windows Server 2008, a Performance Monitor log is referred to as a Data Collector Set. To set up a new data collector set (i.e. log file), double-click on "Data Collector Sets" then right-click on "User Defined" and select "New | Data Collector Set", as shown in Figure 1-2:

Figure 1-2: You need to create a new "Data Collector Set." You will be presented with the "Create a new Data Collector Set" screen, as shown in Figure 1-3: Figure 1-3: Give the Data Collector Set its own name. Assign the Data Collector Set a name, such as "Profiler Correlation", which is how it will be referred to later in this article. At the bottom of the screen, select "Create Manually" and click "Next". I recommend that you use the manual option over the template option because you have more flexibility when selecting the events you want to collect. The screen shown in Figure 1-4 appears:

Figure 1-4: You want to create a "Performance Counter" data collector set. To create our Performance Monitor log, check the box next to "Performance Counter" and click "Next". The other events that can be collected are of no use to us when performing our correlation analysis.

Selecting Performance Counters for the Log File

The next screen in the wizard, shown in Figure 1-5, allows you to select the counters you'd like to record and save in the log file. Figure 1-5: You now need to select the Performance Monitor counters you want to capture as part of your log file. Performance Monitor offers several hundred counters, many more than there are Profiler events. However, you don't want to select more counters than you need, as it will just make correlation analysis that much more difficult. My goal is to select only those Performance Monitor counters I need to identify key CPU, disk I/O, and memory bottlenecks within SQL Server. With this in mind, I generally track two different counters for each of the three key bottleneck areas:

LogicalDisk: % Disk Time – Indicates the activity level of a particular logical disk. The higher the number, the more likely there is an I/O bottleneck. Be sure to select those counters for the logical drives that contain your mdf and ldf files. If you have these separated on different logical disks, then you will need to add this counter for each logical disk.
LogicalDisk: Avg. Disk Queue Length – If a logical disk gets very busy, then I/O requests have to be queued. The longer the queue, the more likely there is an I/O bottleneck. Again, be sure to select those counters for each logical drive that contains your mdf and ldf files.
Memory: Available Mbytes – Measures how much RAM is currently unused, and so available for use by SQL Server and the OS. Generally speaking, if this drops below 5MB, it is a possible indication of a memory bottleneck.
Memory: Pages/sec – Measures how much paging the OS is performing. A high number may indicate a potential memory bottleneck.
Processor: % Processor Time: _Total – Measures the percentage of available CPUs in the computer that are busy. Generally speaking, if this number exceeds 80% for long periods of time, this may be an indication of a CPU bottleneck.
System: Processor Queue Length – If the CPUs get very busy, then CPU requests have to be queued, waiting their turn to execute. The longer the queue, the more likely there is a CPU bottleneck.

The first two are for LogicalDisk, the second two are for Memory, and the last two (although they belong to different counter objects) are for the Processor. I find that using two counters per area, rather than one, provides just enough information to identify the cause of most bottlenecks. You will probably want to modify the above list to suit your own needs and environment, but it's a good starting point. Having selected your performance counters, the screen will look similar to Figure 1-6: Figure 1-6: Select those counters that best meet your needs. Click "OK" to proceed, and the screen shown in Figure 1-7 returns:

Figure 1-7: Set the "Sample Interval." The next step is to choose how often Performance Monitor counter data is to be collected. The default value is once every 15 seconds. However, when it comes to performing a Profiler and Performance Monitor correlation analysis, accurate timing is important, so I highly recommend that you select a sample interval of 1 second. The upside is that a 1-second interval will help you to see the correlation between Profiler events and Performance Monitor counters much more clearly. The downside is that you will collect a lot of data very quickly. Generally, this is not a problem if you capture a minimum number of counters and don't run your trace for hours at a time.

Creating and Saving the Log File

Once you have entered the sample interval, click "Next", and the screen shown in Figure 1-8 appears: Figure 1-8: Specify where you want Performance Monitor logs to be stored. Specify where you would like to store the Performance Monitor log. Any place will work, but you don't want to forget this location as you will need to be able to retrieve the log file for importing into Profiler later. Once you have specified the log location, click "Next" to continue and the screen shown in Figure 1-9 appears:

Figure 1-9: You can specify if this data collector set is to be the default set or not. If this is the only data collector set you have, then it will be set to the default data collector set. If you have other data collector sets, you can choose one of those to be the default. From the perspective of this analysis, the default setting is irrelevant. Either way, make sure that "Save and Close" is selected, and then click "Finish" to save your work. The wizard will close, returning you to Performance Monitor. You will see your new log file, "Profiler Correlation", listed under Data Collector Sets, as shown in Figure 1-10: Figure 1-10: Once you are done, your screen should look similar to this.

Collecting Performance Monitor Data

We are now done with the hard work. The only thing left to do is to start collecting the "Profiler Correlation" data. To do this, right-click on the "Profiler Correlation" data collection set and select "Start", as shown in Figure 1-11:

Figure 1-11: Right-click on the name of the Data Collector Set to start and stop it. To stop collecting activity data, right-click on the "Profiler Correlation" data collector set and click "Stop". You can also schedule Data Collector Sets to start and stop automatically, using the Vista-based Performance Monitor main menu. In the next section, we will look at the best ways to start and stop both Profiler traces and Performance Monitor data collector sets.

How to Capture Profiler Traces and Performance Monitor Logs

Now that we've created both a Profiler trace and a Performance Monitor Data Collector Set, we are ready to start both tools and begin collecting the data that we will later correlate. Here are some points to keep in mind when running these tools to capture data:

Run your copy of Profiler and Performance Monitor on a computer other than the SQL Server you are monitoring.
Both the Profiler trace and the Performance Monitor logs should be started and stopped at about the same time. For example, if you decide to run an analysis for a two-hour period, start both tools at 8:00 AM and end them at 10:00 AM. If the traces don't occur at the same time, they cannot be correlated.
Be sure that the SQL Server instance that you are monitoring, and the computer on which you are running Profiler and Performance Monitor, are in the same time zone. If they are not, the data cannot be correlated correctly.
Make sure that the physical server running your SQL Server instance is not doing other work that could interfere with the analysis, such as running a different program or performing a backup. The only way to perform an accurate correlation analysis is to ensure the SQL Server is performing only regular production activity.
As I have said many times, be sure that you only collect the minimum necessary number of events, data columns, and counters that you need for your analysis, especially when collecting data every second. We want to minimize the impact that Profiler and Performance Monitor have on the system.
Run your trace and log files at a "representative" time of day. For example, if you find that your server's resources are peaking almost every morning between 9:00 AM and 11:00 AM, then this is the best time to capture your data.
Monitor the size of the Profiler trace file and Performance Monitor log file during the capture, to ensure that not too much data is being collected. If the file sizes get too big, you may have to stop your data capture sooner than you planned.

Once you have completed capturing the data for both tools, you are ready to perform the correlation analysis.

How to Correlate SQL Server 2005 Profiler Data with Performance Monitor Data

Correlating Performance Monitor and Profiler data is a straightforward process that simply involves importing both sets of data into Profiler. Start Profiler and load the trace file you want to correlate. It should be displayed on the screen, as shown in Figure 1-12, just as for any other trace:

Figure 1-12: Start Profiler and load the trace file. From the main menu of Profiler, select File | Import Performance Data, as shown in Figure 1-13: Figure 1-13: Select File|Import Performance Data

NOTE: If the "Import Performance Data" option is grayed out, exit Profiler, then restart it, reload the trace data, and try again.

The screen shown in Figure 1-14 appears:

Figure 1-14: Select the Performance Monitor log you want to correlate with your Profiler data. Locate your Performance Monitor log file and then click "Open":

Figure 1-15: You must select the counters you want to correlate with your Profiler data. The screen shown in Figure 1-15 allows you to select which counters to display as part of the correlation analysis. Ideally, you should include no more than about six, or the analysis screen (shown in Figure 1-17) will get too busy and be hard to read. In this example, only six counters were collected, so import them all by clicking on the checkbox next to the server's name, as shown in Figure 1-16:

Figure 1-16: This is what the screen looks like if you select all the counters. Once you have selected the counters you want to include, click on "OK." The correlation analysis screen appears, as shown in Figure 1-17: Figure 1-17: The correlation analysis screen can be hard to read unless it is displayed on a big screen. As you can see, despite the fact that we limited ourselves to six performance counters, the screen is still very busy (although I did have to shrink the screen to fit this page). The next section discusses how to read and analyze this correlated data.

How to Analyze Correlated Data

Before we analyze the data, let's take a closer look at the screen in Figure 1-17. It's divided into four sections. The top section of the screen (see figure 1-18) is one that we are already very familiar with. It lists all the captured Profiler events in the order they occurred, and data columns are ordered the way you specified when you created the trace:

Figure 1-18: We have seen this Profiler screen many times before. You should also be familiar with the bottom section of the screen (figure 1-19). It displays the contents of the TextData column for the event selected in Figure 1-18. In this case, a ShowPlan XML event was selected, so we see a graphical execution plan for the query following this event: Figure 1-19: The bottom screen displays the value of the TextData data column of the event selected in the top screen. So far, so good. Now let's examine the middle section of the screen, shown in Figure 1-20:

Figure 1-20: The second and third screens are shown above. It shows a line graph of all the counter activity for the duration of the trace, below which is a table showing each of the counters used in the graph. This table includes lots of valuable information, such as minimum, maximum, and average values for each counter. If the line graph becomes too busy, you can deselect some of the counters, making the screen easier to read. For example, if I remove all the counters except for % Processor Time, the screen looks as shown in Figure 1-21:

Figure 1-21: It is much easier to view only one counter at a time. It is much easier to view the graph one counter at a time, but bear in mind that you may then miss some valuable information, such as how one counter relates to another. Another way to make it easier to view the line graph activity is to zoom in on the data, using a hidden zoom option. For example, consider the original graph, shown again in Figure 1-22, which is very difficult to read: Figure 1-22: This screen is impossible to read. If we zoom in on a particular time range, it becomes much easier to see what is going on. To zoom in, click on the graph at a start point, say 4:29 PM. Holding down the left mouse button, drag the mouse pointer to an end point, such as 4:30 PM, and then release the mouse button. The screen will zoom in, displaying the time range you specified and making the screen much more readable, as you can see in Figure 1-23:

Figure 1-23: This line graph is much easier to read. If you want to zoom back out again, right-click on the line graph and choose "Zoom Out". As you can probably tell, correlation analysis is a manual process. There is nothing automated about it. There are two different ways to approach a correlation analysis:

1. Start from Performance Monitor – identify time periods where a certain resource has become a bottleneck, and then look for corresponding Profiler events in that period. 2. Start from Profiler – identify a long running event and then examine the performance counter data to see if this event has caused a resource issue

We'll examine each technique in turn.

Correlation Analysis Part One: Starting from Performance Monitor

Of the different ways you can correlate data, I find the simplest is to identify long periods of excess resource activity and then drill down until you identify the statement or statements that are causing the problem. So, let's say we want to start out reviewing Performance Monitor data, looking for areas where one or more server resources have become a bottleneck for an extended time period. Having done this, we can then identify those Profiler events that occurred during the same time period, and so try to locate the cause of the stress on those server resources. The first step I take is to maximize the screen size, in order to view the data in a much more readable fashion. Next, I select a single performance counter at a time and look for periods of time where that resource is maxed out (or close). In this example, I am going to focus on % Disk Time because I want to identify statements that have a significant impact on disk I/O. Having deselected all counters other than % Disk Time, and zoomed in to a short, 2-minute time period, the line graph looks as shown in Figure 1-24: Figure 1-24: In this small amount of time, disk I/O reached 100% six times. As you can see, there is a period of about a minute and a half where % Disk Time reached 100%. Obviously, whenever a disk is operating at 100% of capacity, this is a very strong indication of a disk I/O bottleneck. Of the six times the line chart reaches 100% disk activity, the fourth has the longest duration. Our next job is to find out what Profiler events were running during this time period. To do this, click on the line at the left of the fourth spike. A red line should appear where you clicked, and the event that was executing at the time indicated by the red line should be highlighted, as shown in Figure 1-25:

Figure 1-25: When I click on the left hand side of the line graph, it turns red and also highlights the Profiler event in the top of the screen. Now click on the right side of the spike, and a red line should appear, indicating when the resource spike ended, as shown in Figure 1-26: Figure 1-26: The red line is now at the right side, or the end of, the peak. Notice also that the highlighted line at the top of the screen is now reflecting a different event (or row) on the Profiler screen. This indicates the event that was running when the spike ended. To make this clearer, Figure 1-27 shows an expanded section of the Profiler trace, with the events that mark the start and end of the resource spike highlighted in blue:

Figure 1-27: The two rows highlighted in blue indicate the top and bottom boundaries of the events that were running during the spike in % Disk Time. Using Profiler, you can only see one blue line at a time. I have cheated here to make my point more obvious. By selecting the beginning and ending points of our resource spike on the line graph, we have identified the events that occurred during this time period. Hopefully these events will look familiar to you, as we have seen them before in this book. Essentially, we see four events that represent the execution of a single statement within a stored procedure:

Row one is a SQL:BatchStarting event and indicates that a stored procedure is about to execute.
Row two shows the execution plan for the statement within the stored procedure.
Row three is the execution of the statement within the stored procedure. This is where the actual work takes place.
Row four, while shown to be outside the time line, is actually a part of this single query. The SQL:BatchCompleted event indicates that the stored procedure has completed.

As you can see with this example, the timing correlation may not always be 100% perfect, but it will be close. So what did this exercise tell us? Essentially, we now know why the % Disk Time spiked as it did. Not only can we see that the statement within the stored procedure was the cause of the spike, we can also see from the Profiler data that the statement had to read a total of 158,375 pages and make 161 page writes, in order to return 85 rows of data. In addition, we can look at the graphical execution plan for the statement, and consider ways to rewrite the query so that it is more efficient. Although this is not a dramatic example, it shows you how you can correlate a spike in server resource activity in the line graph with specific events running inside SQL Server.

Correlation Analysis Part Two: Starting from Profiler

For this analysis, we'll start from the Profiler data and then drill down into the Performance Monitor activity. For example, let's say that you are reviewing the Profiler trace data and find an event that takes 11,190 milliseconds to run. Eleven seconds is a long time for a query to run, so you want to find out whether running this particular query harms server performance. The first step is to identify the beginning and ending events that comprise the statement. In this example, the events are shown in Figure 1-28:

Figure 1-28: To begin this analysis, first identify all the events that comprise a single statement. We see that a stored procedure has executed a single statement and that this is encapsulated within four events. We can also see that the statement executed within the stored procedure took 14,623 page reads to return 72,023 rows of data. Now let's see how this particular statement affected server resource usage, as indicated by our Performance Monitor counters. The process is simply the "reverse" of that described in the previous section. When you click on the first event, SQL:BatchStarting, a red line will appear on the line graph, indicating the point in time where that event occurred, as shown in Figure 1-29:

Figure 1-29: I have zoomed in on the line graph to make it easier for you to see where the statement starts. You can immediately see that this Profiler event preceded a spike in disk I/O activity. When you click on the fourth of the four events, SQL:BatchCompleted, a red line appears at the point in time this event occurred (i.e. when the statement completed executing), as shown in Figure 1-30: Figure 1-30: By looking at Figures 1-29 and 1-30, you can see the activity that occurred when the statement executed. You can see that the completion of execution of the statement marks the end of the 100% spike in disk activity. This is strong evidence that this statement is having an impact on server resources. Now it is up to you to decide if the impact is big enough to merit action and, if so, what steps you will take to fix the query so that it runs more efficiently in the future.

Summary

In this article, you learned how to correlate Profiler events with Performance Monitor activity. This gives you the ability to find out what statement might be causing a spike in server resource usage, or to examine a single statement and find out how running it affects resource utilization. Correlation analysis is a manual process, so it can be time-consuming and needs practice. However, once you master it, you will find it an invaluable tool for identifying relationships between Profiler events and actual physical server activity.

This article is adapted from an extract of Brad's book 'Mastering SQL Server Profiler'. Since the book was published, Red Gate have released SQL Monitor, a monitoring and alerting tool for SQL Server. Click here to get a free copy of Brad's book and download a free trial of SQL Monitor.

SQL Scripts Manager with PowerShell

17 February 2011 by William Brewer

SQL Scripts Manager was released as a Christmas present to Simple-Talk subscribers. William Brewer then wrote an appreciation of the tool. Now, he reveals a secret: it also runs PowerShell scripts and, hence, SMO. He has the scripts to prove it, though hopefully, you'll soon be running your own PowerShell from SQL Scripts Manager.

The free tool SQL Scripts Manager (SSM) does much more than just run SQL scripts. It will also run IronPython and PowerShell scripts. By using PowerShell, you'll be able to run SMO for a wide range of tasks for which PowerShell scripts are provided, but now you'll have a standard front-end for the parameters and help text, and to see both the progress and results of the script. For me, it is a great advantage to have a single way of running scripts, whether they're SQL, PowerShell or Python, within a single interface. SSM also gives you some very nice facilities for getting the user interface right. When you need to provide scripts that other people may be required to run, or that might have to be run by remoting in via a tablet, then the user interface requires a new level of attention.

Why?

I've been to a lot of IT 'shops' where the various essential scripts for maintaining the servers are in a rather mixed state of development, and it isn't always clear what they do and what parameters you give to them. They come in a plethora of scripting languages, and you have to look in a number of directories to find them. A muddle. The user of SQL Scripts Manager (SSM) is no longer forced to navigate to three or more different places on the PC in order to run scripts. This is particularly useful where scripts must be run in a particular order, and where they come in different flavours of scripting language. For the users, there is no real difference as far as they are concerned: it all works as expected. From an IronPython or PowerShell script, it is easy to drive SMO in order to perform a huge range of administrative tasks for SQL Server. Actually, it seems to run any PowerShell script I throw at it. As PowerShell is the means of automating administrative functions across the range of Microsoft servers, you can use SSM for doing them. It is more convenient too, since you have a potential means of running, from one application, a large number of scripts, all of which get their parameters from a user interface in a similar way, and all of which can display their results or report on progress likewise.

How?

Just to recap from my first article, SSM works by looking at a directory of XML-based files, each of which contains a script of some sort, and all the ancillary information that is required in order to run the script. It uses what is in this directory to create a menu of scripts to run. It will poll the directory occasionally to detect changes within it: this means that, if you add, alter or delete scripts, it will read in the configuration file to check it and, if it likes what it sees, it then displays it. Otherwise, it highlights the error. Scripts are stamped with a key, to distinguish authenticated from non-authenticated scripts. For the general run-of-the-mill script, the task of embedding the script into the XML wrapper is pretty simple, and there are standard templates provided to help with this (here and here).

Running PowerShell from SQL Scripts Manager: Hello World!

To be run in SSM, the PowerShell “Hello World” morphs into the XML document (file-type rgtool) …

Hello World Test

There are two things you’ll notice. These involve the $Progress object that is exposed to PowerShell by SSM. The first is $progress.Success = $true, which reports to the UI that the script executed OK (for IronPython, the syntax is RedGate.Progress.Success = True;). Even if a script produces no errors, it may have failed to do what the user wanted, which is why it has to be set to ‘True’ explicitly. The second is $progress.Message, to which is assigned every message you want to send to the progress log. We’ve chosen just to have the progress log in the final results window, and have used just the ForceLog value in the ‘displaytype’ attribute of the UI output, in order to open it out rather than collapse it.

This script is unusually innocent of an input user-interface. Here is a rather more useful script that lists your server downtime (or uptime, if you look at life that way) and starts by eliciting the name of the server. It creates a grid and writes information into it.

Uptime events

This script will list the service start and stop dates on a server.

You’ll see here that we are just getting the user to specify the name of the computer and passing it to the PowerShell function. The interest here is in getting the data from PowerShell to the grid. Here, you have a $progress object exposed that allows you to write to the grid, specifying the columns and notifying the start of a new row. The line that writes the description of the event from the Event Log is this… $progress.RowValue('Event', $line.EventCode);

and the date is done likewise with … $progress.RowValue('Date', $line.TimeGenerated); … where 'Date' is the key for the column and the second parameter is the object value to be displayed in the current row for that column. When a row is complete, the call to the function … $progress.RowFinish() … ensures that the next data is written to the following row of the grid. It bumps the current row on to the next. The function … $progress.TableFinish() … completes the population of the grid (for IronPython, you'd use RedGate.Progress.Success = True;).

The progress object is at the heart of integrating a script with the Output UI. There are a number of values and methods that are worth knowing about. These are the main ones.

The Public interface of $progress

This is probably the best place to list the attributes and methods of the $progress object (RedGate.Progress in IronPython).

Attributes

$progress.PercentageComplete # gets, or sets, the percentage completion of the progress bar
$progress.Message # places, or retrieves, a message in the message widget
$progress.Finished # set to $true when the script is finished
$progress.Success # flag, $true or $false, to inform the output UI whether the script was successful

Methods

$progress.Log("log message"); # writes the string to the log widget
$progress.PrintError("Error message"); # writes the string as an error (in red)
$progress.PrintError("fmt", params string[] args); # writes the sprintf result as an error
$progress.RowFinish(); # move on to the next row, making it the current one
$progress.RowFinish("rowKey");
$progress.RowValue("colKey", value); # write the value into the "colKey" column of the current row
$progress.ClearRows(); # clear the rows of the current grid

Continuous monitoring with PowerShell and SSM

There is a great deal of power hidden away in SQL Scripts Manager. Here is a PowerShell script that continuously monitors a process. You can do this at the command line, but it will always work more neatly with a GUI.

This script will list the running processes on a machine.

The Output window of the UI

For a start, we've used some trickery to specify the UI display. There are several options you can use for the display, and you can combine them together as this script does.

None (not advised, but there for the minimalists; it is better always to include at least OutputLog or ForceLog so that there is something that the application can print error messages to)
ProgressBar (yes, you can specify a progress bar, and call a function to specify % completion; it is really essential for long-running scripts)
Message (traditional message area)
Grid (a traditional results table, which will be most useful for multi-column results)
OutputLog (a scrolling 'log' display that, by default, is closed or folded; this is the best type where the predominant display must be the grid)
ForceLog (a scrolling 'log' display that is fully open, a good substitute for the command line! ForceLog is better in most cases, since it's not so obvious to the user how OutputLog is unfolded)

and there are some short-cut UI specifications in case you don't like using the OR (|):

All = ProgressBar | Message | Grid | OutputLog
Simple = Message | ForceLog
ProgressOnly = ProgressBar | Message | OutputLog
NeverEnding = Grid | OutputLog

In this script, we'll use the NeverEnding value.

SSM and SMO: Scripting a database!

When you specify a database within SSM, you get passed an object that gives you a great deal of information. You need to choose the right control for the job in hand. The details are here. The relevant controls are:

createdatabase, which includes a means of specifying a database name, a SQL Server instance and the connection credentials (as, for example, when the database will be created when the script is run).
createtable, which allows the SQL Server instance and the connection credentials to be specified. It includes a combo box for selecting an existing database and a text box for specifying a table (for example, if you are using the script to import data into a new table).
server, which allows the SQL Server instance and the connection credentials to be specified.
database, which allows the SQL Server instance and the connection credentials to be specified. It includes a combo box for selecting a database.
table, which allows the SQL Server instance and the connection credentials to be specified, and includes combo boxes for selecting a database and a table.

If you merely want to attach to a server, you would specify this in the UI by means of the server control. You specify the ID that you want to use for the object. If you assume that you’ve assigned the ID ‘connection’ to the server control, you can, if you want, access $connection.ConnectionTimeout, $connection.Datasource, $connection.NetworkLibrary, $connection.PortID, $connection.ExecutionTimeOut, $connection.Encrypt (Boolean), $connection.Credentials and $connection.InitialCatalog, which are more than enough for database work. You'll rarely need more than the Datasource and Credentials. You can access all the necessary attributes of the credentials for login via the Credentials object. These values can be accessed by means of $connection.Credentials.Instance, $connection.Credentials.IntegratedSecurity, $connection.Credentials.UserID, and $connection.Credentials.Password. This will be necessary in order to use SMO or SqlClient if you are supporting SQL Server security. The User/Password combination will be needed if you aren’t using Windows security, as when you are attaching to servers outside the domain. The Database object has ConnectionTimeout, Datasource, Database, NetworkLibrary, PortID, ExecutionTimeOut, Encrypt (Boolean), Credentials and InitialCatalog.

With this article, I’ve provided an example that uses the database control and SMO, taken originally from Allen White’s PowerShell code. It is too long to put in the article itself, so it is attached. The part of the code that does the business of connecting to the server is shown here. Because we've used the database control, and assigned it the 'connection' ID, we use the $connection.Datasource and $connection.Database attributes. This script demonstrates how to cater for both integrated security and SQL Server security, in order to access servers outside the domain:

if ($connection.Credentials.IntegratedSecurity)
{
    # Windows authentication: connect using the current credentials
    $s = new-object ('Microsoft.SqlServer.Management.Smo.Server') $connection.DataSource
}
else
{
    # SQL Server authentication: build a ServerConnection with an explicit login
    $mySrvConn = new-object Microsoft.SqlServer.Management.Common.ServerConnection
    $mySrvConn.ServerInstance = $connection.DataSource
    $mySrvConn.LoginSecure = $false
    $mySrvConn.Login = $connection.Credentials.UserID
    $mySrvConn.Password = $connection.Credentials.Password
    $s = new-object Microsoft.SqlServer.Management.SMO.Server($mySrvConn)
}
$progress.Message = "Connected to " + $connection.DataSource

Conclusions

Most routine administration scripts for SysAdmin and DBA work require a flexible user interface that does just a few things well. As far as the input screens go, the standard widgets and specialised database-access widgets are fine. It might be nice to have common filters to validate input, but one can get along without them. The output screens aren't going to be wild or wacky, and are generally little more than a place to display messages or result tables. By putting the basics in place, making TSQL scripting easy, and allowing PowerShell scripting for server administration and SMO, along with IronPython for the more complex procedures involving the local file system, SQL Scripts Manager seems to provide a great way of sorting out the mass of existing scripts, and a simple means of developing new ones, so that, if a server needs attention when they're off-duty, the DBA or SysAdmin can be sure that the essential maintenance scripts can be run by whoever is on-site, or can easily be run by remoting in from an iPad. This seems a much better approach than trying to predict all the common routine tasks that the DBA finds scripts for and producing a huge set of apps. With SSM, you can customise scripts to your exact requirements.


Louis Davidson

What Counts for a DBA: Skill

Published Tuesday, February 15, 2011 1:57 AM

“Practice makes perfect,” right? Well, not exactly. The reality of it all is that this saying is an untrustworthy aphorism. I discovered this in my “younger” days when I was a passionate tennis player, practicing and playing 20+ hours a week. No matter what my passion level was, without some serious coaching (and perhaps a change in dietary habits), my skill level was never going to rise to a level where I could make any money at the sport that involved anything other than selling tennis balls at a sporting goods store. My game may have improved with all that practice, but I had too many bad practices to overcome. Practice by itself merely reinforces what we know and what we can figure out naturally. The truth is actually closer to the expression used by Vince Lombardi: “Perfect practice makes perfect.”

So how do you get to become skilled as a DBA if practice alone isn’t sufficient? Hit the Internet and start searching for SQL training and you can find 100 different sites. There are also hundreds of blogs, magazines, books, and conferences, both onsite and virtual. But then how do you know who is good? Unfortunately, the experience level of the writer can often be the worst guide. Some of the best DBAs are frighteningly young, and some got their start back when programs were stored on stacks of paper with little holes in them.

As a programmer, is it really so hard to understand normalization? Set-based theory? Query optimization? Indexing and performance tuning? The biggest barrier often is previous knowledge, particularly programming skills cultivated before you get started with SQL. In the world of technology, it is pretty rare that a fresh programmer will gravitate to database programming. Database programming is very unsexy work, because without a UI all you have are a bunch of text strings that you could never impress anyone with. Newbies spend most of their time building UIs or apps with procedural code in C# or VB, scoring obvious, interesting wins. Making matters worse is that SQL programming requires mastery of a much different toolset than almost any mainstream programming skill. Instead of controlling everything yourself, most of the really difficult work is done by the internals of the engine (written by other non-relational programmers…we just can’t get away from them.)

So is there a golden road to achieving a high skill level? Sadly, with tennis, I am pretty sure I’ll never discover it. However, with programming it seems to boil down to practice in applying the appropriate techniques for whatever type of programming you are doing. Can a C# programmer build a great database? As long as they don’t treat SQL like C#, absolutely. Same goes for a DBA writing C# code. None of this stuff is rocket science, as long as you learn to understand that different types of programming require different skill sets, and you as a programmer must recognize the difference between one of the procedural languages and SQL and treat them differently. Skill comes from practicing doing things the right way and making “right” a habit.

by drsql

Look-up Tables in SQL

01 February 2011 by Joe Celko

Lookup tables can be a force for good in a relational database. Whereas the 'One True Lookup Table' remains a classic of bad database design, an auxiliary table that holds static data, and is used to look up values, still has powerful magic. Joe Celko explains.

History

Tables, in a properly designed schema, model either an entity or a relationship, but not both. Slightly outside of the tables in the data model, we have other kinds of tables. Staging tables bring in "dirty data" so we can scrub it and then insert it into base tables. Auxiliary tables hold static data for use in the system, acting as the relational replacement for computations. This is not a new idea. If you can find an old text book (1960's or earlier), there is a good chance you will find look-up tables in the back. Finance books had compound interest, net present value (NPV) and internal rate of return (IRR). Trigonometry books had sines, cosines, tangents and maybe haversines and spherical trig functions. There were no cheap calculators; and slide rules were good to only three decimal places and required some skill to use. Look-up tables were easy for anyone to use and usually went to five decimal places. I still remember my first Casio scientific calculator that I bought with my Playboy Club Key account in 12 monthly installments. The price of that model dropped to less than one payment before I paid it off. These machines marked the end of look-up tables in textbooks. Today, you can get a more powerful calculator on a spike card in the check-out line of a school bookstore.

Basic Look-up Table Design

The concept of pre-calculating a function and storing the outputs can be carried over to databases. Programmers do it all the time. Most of the time, they are used for display purposes rather than computations. That is, you are likely to see a table which translates an encoding into a description that a human being can understand. The ISO-11179 data element-naming conventions have the format "[role_]attribute_property". The example used in the base document was an attribute of "tree" with properties like "tree_diameter", "tree_species" and so forth. Some properties do not apply to some attributes -- "employee_diameter" is not something we need to model just yet and "employee_species" is a bit insulting.

The attribute properties that deal with encodings for scales are the candidates for look-up tables. Here is a list and definitions for some of the basic ones I introduced in my book SQL PROGRAMMING STYLE.

"_id" = identifier; it is unique in the schema and refers to one entity anywhere it appears in the schema. A look-up table deals with attributes and their values, not entities, so by definition this is not used in such tables. That is why things like "_category_id" or "_type_id" are garbage names. Never use "<table name>_id"; that is a name based on location and tells you this is probably not a real key at all. Just plain "id" is too vague to be useful to anyone and will screw up your data dictionary when you have to find a zillion of them, all different, but with the same data element name and perhaps the same oversized data type.

"_date" or "dt" = date, temporal dimension. It is the date of something -- employment, birth, termination, and so forth; there is no such column name as just a date by itself.

"_nbr" or "num" = tag number; this is a string of digits or even alphanumerics that names something. Do not use "_no" since it looks like the Boolean yes/no value. I prefer "nbr" to "num" since it is used as a common abbreviation in several European languages.

"_name" or "nm" = this is an alphabetic name and it explains itself. It is also called a nominal scale.

"_code" or "_cd" = a code is a standard maintained by a trusted source outside of the enterprise. For example, the ZIP code is maintained by the United States Postal Service. It has some legal authority to it.

"_size" = an industry standard or company scale for a commodity, such as clothing, shoes, envelopes or machine screws. There is usually a prototype that defines the sizes kept with a trusted source.

"_seq" = sequence, ordinal numbering. This is not the same thing as a tag number, since it cannot have gaps. It also has a rule for successors in the sequence.

"_cat" = category, an encoding that has an external source that has very distinct groups of entities. There should be strong formal criteria for establishing the category. The classification of Kingdoms in biology is an example.

"_class" = an internal encoding that does not have an external source and that reflects a sub-classification of the entity. There should be strong formal criteria for the classification. The classification of plants in biology is an example.

"_type" = an encoding that has a common meaning both internally and externally. Types are usually less formal than a class and might overlap. For example, a driver's license might be for multiple kinds of vehicles: motorcycle, automobile, taxi, truck and so forth. The differences among type, class, and category are an increasing strength of the algorithm for assigning the type, class, or category. A category is very distinct; you will not often have to guess whether something is "animal, vegetable or mineral" to put it in one of those categories. A class is a set of things that have some commonality; you have rules for classifying an animal as a mammal or a reptile. You may have some cases where it is harder to apply the rules, such as the egg-laying mammal in Australia, but the exceptions tend to become their own classification -- monotremes in this example. A type is the weakest of the three, and it might call for a judgment. For example, in some states a three-wheeled motorcycle is licensed as a motorcycle. In other states, it is licensed as an automobile. And in some states, it is licensed as an automobile only if it has a reverse gear. The three terms are often mixed in actual usage. For example, a blood_type has a laboratory procedure to obtain a value of {A, B, AB, O}, if you want to know for sure. Stick with the industry standard, even if it violates the definitions given above.

"_status" = an internal encoding that reflects a state of being which can be the result of many factors. For example, "credit_status" might be computed from several sources. The word "status" comes from "state" and we expect that there are certain allowed state changes. For example, your marital status can change to "Divorced" only if it is "Married" currently.

Here is where programmers start to mess up. Consider this table, taken from an actual posting:

CREATE TABLE Types (type_id INTEGER, type_name VARCHAR(30));

Is this for blood, work visas or what? The table name cannot be more vague. There is no key. The first column is absurd as well as vague. An attribute can be "_type" or "_id" but never both. Entities have identifiers; scalar values do not. Think about a table of mathematical constants and tell me the identifier of pi, e or phi. Type_id is stupid for the same reason. Hey, why not carry this silliness from "type_id" to "type_id_value" and beyond.

Another version of the same disaster, taken from actual postings, is to add a redundant, non-relational IDENTITY table property.

CREATE TABLE Product_Types
(product_type_id INTEGER IDENTITY NOT NULL, -- is this the key?
 product_type_code CHAR(5) NOT NULL, -- is this the key?
 type_generic_description VARCHAR(30) NOT NULL);

All these simple look-up tables need is a column for the attribute_property as the key, and the description or name or both. If you don't get the difference between a name and a description, consider the name "Joe Celko" and "Creepy looking white guy", which is a description. A look-up table of three-letter airport codes will probably return a name. For example, the abbreviation code "MAD" stands for "Barajas International Airport" in Madrid. An encoding for, say, types of hotels might return a description, like these:

hotel type   description
R0           Basic Japanese Ryokan, no plumbing, no electricity, no food
R1           Japanese Ryokan, plumbing, electricity, Japanese food
R2           Japanese Ryokan, like R1 with internet and television
R3           Japanese Ryokan, like R2 with Western meal options
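
As a sketch only (the names and sizes here are hypothetical), the DDL for that hotel-type look-up, together with a covering index that carries the name and description, might be:

CREATE TABLE Hotel_Types
(hotel_type_code CHAR(2) NOT NULL PRIMARY KEY,
 hotel_type_name VARCHAR(30) NOT NULL,
 hotel_type_description VARCHAR(100) NOT NULL);

-- The INCLUDE columns ride along in the index, so a look-up need never touch the base table.
CREATE INDEX Hotel_Types_code_idx
ON Hotel_Types (hotel_type_code)
INCLUDE (hotel_type_name, hotel_type_description);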

A product code will probably return both a name and a description. For example, the name might be "The Screaming Ear Smasher" and the description "50000 Watt electric guitar and amplifier" in the catalog. If you build an index on the code key, you can use the INCLUDE feature to carry the name and/or description into the index (as in the sketch above), and the table itself is now redundant.

One True look-up Table

The One True look-up Table (OTLT) is a nightmare that keeps showing up. The idea is that you put ALL the encodings into one huge table rather than have one table for each one. I think that Paul Keister was the first person to coin the phrase "OTLT" (One True Look-up Table) and Don Peterson (www.SQLServerCentral.com) gave the same technique the name "Massively Unified Code-Key" or MUCK tables in one of his articles. The rationale is that you will only need one procedure to maintain all of the encodings, and one generic function to invoke them. The "Automobiles, Squids and Lady GaGa" function! The technique crops up time and time again, but I'll give him credit as the first writer to give it a name. Simply put, the idea is to have one table to do all of the code look-ups in the schema. It usually looks like this:

CREATE TABLE OTLT -- Generic_Look_Ups?
(generic_code_type CHAR(10) NOT NULL, -- horrible names!
 generic_code_value VARCHAR(255) NOT NULL, -- notice size!
 generic_description VARCHAR(255) NOT NULL, -- notice size!
 PRIMARY KEY (generic_code_value, generic_code_type));

The data elements are meta-data now, so we wind up with horrible names for them. They are nothing in particular, but magical generics for anything in the universe of discourse. So if we have Dewey Decimal Classification (library codes), ICD (International Classification of Diseases), and two-letter ISO-3166 country codes in the schema, we have them all in one, honking big table.

Let's start with the problems in the DDL and then look at the awful queries you have to write (or hide in VIEWs). We need to go back to the original DDL and add a CHECK() constraint on the generic_code_type column. Otherwise, we might "invent" a new encoding system by typographical error. The Dewey Decimal and ICD codes are digits and have the same format -- three digits, a decimal point and more digits (usually three); ISO-3166 is alphabetic. Oops, we need another CHECK constraint that will look at the generic_code_type and make sure that the string is in the right format. Now the table looks something like this, if anyone attempted to do it right, which is not usually the case:

CREATE TABLE OTLT
(generic_code_type CHAR(10) NOT NULL
   CHECK (generic_code_type IN ('DDC', 'ICD', 'ISO3166', ..)),
 generic_code_value VARCHAR(255) NOT NULL,
 CONSTRAINT Valid_Generic_Code_Type
   CHECK (CASE
          WHEN generic_code_type = 'DDC'
           AND generic_code_value LIKE '[0-9][0-9][0-9].[0-9][0-9][0-9]'
          THEN 'T'
          WHEN generic_code_type = 'ICD'
           AND generic_code_value LIKE '[0-9][0-9][0-9].[0-9][0-9][0-9]'
          THEN 'T'
          WHEN generic_code_type = 'ISO3166'
           AND generic_code_value LIKE '[A-Z][A-Z]'
          THEN 'T'
          ELSE 'F' END = 'T'),
 generic_description VARCHAR(255) NOT NULL,
 PRIMARY KEY (generic_code_value, generic_code_type));

Since the typical application database can have dozens and dozens of codes in it, just keep extending this pattern for as long as required. Not very pretty, is it? Before you think about some fancy re-write of the CASE expression, remember that SQL Server allows only ten levels of nesting. Now let us consider adding new rows to the OTLT.

INSERT INTO OTLT (generic_code_type, generic_code_value, generic_description)
VALUES ('ICD', '259.0', 'Inadequate Genitalia after Puberty'),
       ('DDC', '259.0', 'Christian Pastoral Practices & Religious Orders');

If you make an error in the generic_code_type during an insert, update or delete, you have screwed up a totally unrelated value. If you make an error in the generic_code_type during a query, the results could be interesting. This can be really hard to find when one of the similarly structured encoding schemes has unused codes in it.


The next thing you notice about this table is that the columns are pretty wide VARCHAR(n), or even worse, that they are NVARCHAR(n) which can store characters from a strange language. The value of (n) is most often the largest one allowed. Since you have no idea what is going to be shoved into the table, there is no way to predict and design with a safe, reasonable maximum size. The size constraint has to be put into the WHEN clause of that second CHECK() constraint between generic_code_type and generic_code_value. Or you can live with fixed length codes that are longer than what they should be.

These large sizes tend to invite bad data. You give someone a VARCHAR(n) column, and you eventually get a string with a lot of white space and a small odd character sitting at the end of it. You give someone an NVARCHAR(255) column and eventually it will get a Buddhist sutra in Chinese Unicode.

Now let's consider the problems with actually using the OTLT in a query. It is always necessary to add the generic_code_type as well as the value which you are trying to look up.

SELECT P1.ssn, P1.lastname, .., L1.generic_description
  FROM OTLT AS L1, Personnel AS P1
 WHERE L1.generic_code_type = 'ICD'
   AND L1.generic_code_value = P1.disease_code
   AND ..;

In this sample query, you need to know the generic_code_type of the Personnel table's disease_code column, and of every other encoded column in the table. If you get a generic_code_type wrong, you can still get a result.

You also need to allow for some overhead for data type conversions. It might be more natural to use numeric values instead of VARCHAR(n) for some encodings to ensure a proper sorting order. Padding a string of digits with leading zeros adds overhead and can be risky if programmers do not agree on how many zeros to use.

When you execute a query, the SQL engine has to pull in the entire look-up table, even if it only uses a few codes. If one code is at the start of the physical storage, and another is at the end of the physical storage, I have to do a lot of caching and paging. When I update the OTLT table, I have to lock out everyone until I am finished. It is like having to carry an encyclopedia set with you when all you needed was a magazine article.

Now consider the overhead with a two-part FOREIGN KEY in a table:

CREATE TABLE EmployeeAbsences
(..
 generic_code_type CHAR(3) -- min length needed
    DEFAULT 'ICD' NOT NULL
    CHECK (generic_code_type = 'ICD'),
 generic_code_value CHAR(7) NOT NULL, -- min length needed
 FOREIGN KEY (generic_code_type, generic_code_value)
    REFERENCES OTLT (generic_code_type, generic_code_value),
 ..);

Now I have to convert the character types for more overhead. Even worse, ICD has a natural DEFAULT value (000.000 means "undiagnosed"), while Dewey Decimal does not. Older encoding schemes often used all 9's for "miscellaneous" so they would sort to the end of the reports in COBOL programs. Just as there is no Magical Universal "id", there is no Magical Universal DEFAULT value. I just lost one of the most important features of SQL!

I am going to venture a guess that this idea came from OO programmers who think of it as some kind of polymorphism done in SQL. They say to themselves that a table is a class, which it isn't, and therefore it ought to have polymorphic behaviors, which it doesn't.

Look-Up Tables with Multiple Parameters

A function can have more than one parameter, and often does in commercial situations. Such functions can be ideal candidates for a look-up table when the computation is complex. My usual example is the Student's t-distribution, since I used to be a statistician. It is used for small sample sizes that the normal distribution cannot handle. It takes two parameters, the sample size and the confidence interval (how sure you want to be about your prediction). The probability density function is:
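
For reference, the density of Student's t-distribution with ν degrees of freedom (Γ is the gamma function) is:

f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}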

Got any idea, just off the top of your head, how to write this in T-SQL? How many of you can identify the Greek letters in this thing? Me neither. The nice part about using this in the real world is that you don't need all the possible values. You work with a set of three to ten confidence intervals and, since it is meant for small samples, you don't need a lot of population values. Here is a table cut and pasted from Wikipedia.com.

df   One Sided 75%   80%   85%   90%   95%   97.5% 99%   99.5% 99.75% 99.9% 99.95%
     Two Sided 50%   60%   70%   80%   90%   95%   98%   99%   99.5%  99.8% 99.9%
1    1.000 1.376 1.963 3.078 6.314 12.71 31.82 63.66 127.3 318.3 636.6
2    0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 14.09 22.33 31.60
3    0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 7.453 10.21 12.92
4    0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5    0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 4.773 5.893 6.869
6    0.718 0.906 1.134 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7    0.711 0.896 1.119 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8    0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9    0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10   0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11   0.697 0.876 1.088 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12   0.695 0.873 1.083 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13   0.694 0.870 1.079 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14   0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15   0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16   0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17   0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18   0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19   0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20   0.687 0.860 1.064 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21   0.686 0.859 1.063 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22   0.686 0.858 1.061 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23   0.685 0.858 1.060 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.767
24   0.685 0.857 1.059 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25   0.684 0.856 1.058 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26   0.684 0.856 1.058 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27   0.684 0.855 1.057 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.690
28   0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29   0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.659
30   0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
40   0.681 0.851 1.050 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50   0.679 0.849 1.047 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60   0.679 0.848 1.045 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
80   0.678 0.846 1.043 1.292 1.664 1.990 2.374 2.639 2.887 3.195 3.416
100  0.677 0.845 1.042 1.290 1.660 1.984 2.364 2.626 2.871 3.174 3.390
120  0.677 0.845 1.041 1.289 1.658 1.980 2.358 2.617 2.860 3.160 3.373
∞    0.674 0.842 1.036 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
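
As a sketch of how such values could be held (the table and column names are hypothetical), a two-parameter look-up keyed on degrees of freedom and confidence level might be declared and queried like this:

CREATE TABLE T_Distribution
(degrees_of_freedom INTEGER NOT NULL,
 one_sided_confidence DECIMAL(5,2) NOT NULL, -- e.g. 95.00, 97.50
 t_value DECIMAL(7,3) NOT NULL,
 PRIMARY KEY (degrees_of_freedom, one_sided_confidence));

-- Look up the critical value instead of evaluating the density.
SELECT t_value
  FROM T_Distribution
 WHERE degrees_of_freedom = 10
   AND one_sided_confidence = 95.00; -- 1.812, from the row for 10 above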

Unlike the calculus, nobody should have any trouble loading it into a look-up table. In the January 2005 issue of The Data Administration Newsletter (www.TDAN.com) I published an article on a look-up table solution to a more difficult problem.

If you watch the Food channel on cable, or if you just like Memphis-style BBQ, you know the name "Corky's". The chain was started in Memphis in 1984 by Don Pelts and has grown by franchise at a steady rate ever since. They will never be a McDonald's, because all the meats are slow-cooked for up to 22 hours over hickory wood and charcoal, and then every pork shoulder is hand-pulled. No automation, no mass production. They sell a small menu of items by mail order via a toll-free number or from their website (www.corkysbbq.com) and ship the merchandise in special boxes, sometimes using dry ice. Most of the year, their staff can handle the orders. But at Christmas time, they have the problem of success.

Their packing operation consists of two lines. At the start of the line, someone pulls a box of the right size and puts the pick list in it. As it goes down the line, packers put in the items, and when it gets to the end of the line, it is ready for shipment. This is a standard business operation in lots of industries. Their people know what boxes to use for the standard gift packs and can pretty accurately judge any odd-sized orders. At Christmas time, however, mail-order business is so good that they have to get outside temp help. The temporary help does not have the experience to judge the box sizes by looking at a pick list. If a box that is too small starts down the line, it will jam up things at some point. The supervisor has to get it off the line and re-pack the order by hand. If a box that is too large goes down the line, it is a waste of money and creates extra shipping costs.

Mark Tutt (On The Mark Solutions, LLC) has been consulting with Corky's for years and set up a new order system for them on Sybase. One of the goals of the new system is to print the pick list and shipping labels with all of the calculations done, including what box size the order requires. Following the rule that you do not re-invent the wheel, Mr. Tutt went to the newsgroups to find out if anyone had a solution already. The suggestions tended to be along the lines of getting the weights and shapes of the items and using a Tetris program to figure out the packing. Programmers seem to love to face every new problem as if nobody has ever done it before and nobody will ever do it again. The "Code first, research later!" mentality is hard to overcome.

The answer was not in complicated 3-D math, but in the past 4 or 5 years of orders in the database. Human beings with years of experience had been packing orders and leaving a record of their work to be mined. Obviously the standard gift packs are easy to spot. But most of the orders tend to be something that had occurred before, too. The answers are there, if you will bother to dig them out. First, Mr. Tutt found all of the unique configurations in the orders, how often they occurred and the boxes used to pack them. If the same configuration had two or more boxes, then you should go with the smaller size. As it turned out, there were about 4995 unique configurations in the custom orders, which covered about 99.5% of the cases. Next, this table of configurations was put into a stored procedure that did a slightly modified exact relational division to obtain the box size required. A fancy look-up table with a variable parameter list!
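
A sketch of the kind of exact relational division involved, using hypothetical tables Box_Configurations (config_id, box_size, item_sku, item_qty) for the mined configurations and Order_Items (order_nbr, item_sku, item_qty) for the order being packed, might look like this for a given @order_nbr:

SELECT TOP (1) C.box_size
  FROM (SELECT DISTINCT config_id, box_size
          FROM Box_Configurations) AS C
 WHERE NOT EXISTS  -- nothing in the order that the configuration lacks
       (SELECT item_sku, item_qty FROM Order_Items
         WHERE order_nbr = @order_nbr
        EXCEPT
        SELECT item_sku, item_qty FROM Box_Configurations
         WHERE config_id = C.config_id)
   AND NOT EXISTS  -- nothing in the configuration that the order lacks
       (SELECT item_sku, item_qty FROM Box_Configurations
         WHERE config_id = C.config_id
        EXCEPT
        SELECT item_sku, item_qty FROM Order_Items
         WHERE order_nbr = @order_nbr)
 ORDER BY C.box_size;  -- assuming box_size sorts smallest first

This is only an illustration of the technique; a production procedure would also have to fall back on human judgment for the roughly 0.5% of orders that match no stored configuration.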

© Simple-Talk.com | Next | Section Menu | Main Menu | Previous |

Phil Factor's Phrenetic Phoughts: The wilder shores of Transact SQL

The Presentation Isn't Over Until It's Over

Published Monday, February 14, 2011 12:22 AM

The senior corporate dignitaries settled into their seats, looking important in a blue-suited sort of way. The lights dimmed as I strode out in front to give my presentation. I had ten vital minutes to make my pitch. I was about to dazzle the top management of a large software company who were considering the purchase of my software product. I would present them with a dazzling synthesis of diagrams and graphs, followed by a live demonstration of my software projected from my laptop. My preparation had been meticulous. It had to be: a year's hard work was at stake, so I'd prepared it to perfection. I stood up and took them all in, with a gaze of sublime confidence.

Then the laptop expired.

There are several possible alternative plans of action when this happens:

A. Stare at the smoking laptop vacuously, flapping one's mouth slowly up and down
B. Stand frozen like a statue, locked in indecision between fright and flight
C. Run out of the room, weeping
D. Pretend that this was all planned
E. Abandon the presentation in favour of a stilted and tedious dissertation about the software
F. Shake your fist at the sky, and curse the sense of humour of your preferred deity

I started for a few seconds on plan B, normally referred to as the 'Rabbit in the headlamps of the car' technique. Suddenly, a little voice inside my head spoke. It spoke the famous inane words of Yogi Berra: 'The game isn't over until it's over.' 'Too right', I thought. What to do? I ran through the alternatives A-F inclusive in my mind, but none appealed to me. I was completely unprepared for this. Longevity has since taught me more than I wanted to know about the wacky sense of humour of fate, and nowadays I would have taken two laptops. I hadn't, but I decided to do the presentation anyway, as planned. I started out ignoring the dead laptop and pretending, instead, that it was still working. The audience looked startled. They were expecting plan B to be succeeded by plan C, I suspect. They weren't used to denial on this scale.

After my introductory talk, which didn't require any visuals, I came to the diagram that described the application I'd written. I'd taken ages over it and it was hot stuff. Well, it would have been, had it been projected onto the screen. It wasn't. Before I describe what happened then, I must explain that I have thespian tendencies. My triumph as Professor Higgins in My Fair Lady at the local operatic society is now long forgotten, but I remember, at the time of my finest performance, the moment that, glancing up over the vast audience of moist-eyed faces during the poignant scene between Eliza and Higgins at the end, I realised that I had a talent that one day could possibly be harnessed for commercial use.

I just talked about the diagram as if it were there, throwing in some extra description. The audience nodded helpfully when I'd done enough. Emboldened, I began a sort of mime, well, more of a ballet, to represent each slide as I came to it. Heaven knows I'd done my preparation and, in my mind's eye, I could see every detail, but I had to somehow project the reality of that vision to the audience, much the same way any actor playing Macbeth should do the ghost of Banquo. My desperation gave me a manic energy. If you've ever demonstrated a Windows application entirely by mime, gesture and florid description, you'll understand the scale of the challenge, but then I had nothing to lose.
With a brief sentence of description here and there, and arms flailing whilst outlining the size and shape of graphs and diagrams, I used the many tricks of mime, gesture and body-language learned from playing Captain Hook, or the Sheriff of Nottingham, in pantomime. I set out determinedly on my desperate venture. There wasn't time to do anything but focus on the challenge of the task: the world around me narrowed down to ten faces and my presentation: ten souls who had to be hypnotized into seeing a Windows application: one that was slick, well organized and functional.

I don't remember the details. Eight minutes of my life are gone completely. I was a thespian berserker. I know, however, that I followed the basic plan of building the presentation in a carefully controlled crescendo until the dazzling finale where the results were displayed on-screen. 'And here you see the results, neatly formatted and grouped carefully to enhance the significance of the figures, together with running trend-graphs!' I waved a mime to signify an animated window opening, and looked up, in my first pause, to gaze defiantly at the audience. It was a sight I'll never forget. Ten pairs of eyes were gazing in rapt attention at the imaginary window, and several pairs of eyes were glancing at the imaginary graphs and figures. I hadn't had an audience like that since my starring role in Beauty and the Beast. At that moment, I realized that my desperate ploy might work. I sat down, slightly winded, when my ten minutes were up.

For the first and last time in my life, the audience of a 'PowerPoint' presentation burst into spontaneous applause. 'Any questions?' 'Yes. Have you got an agent?'

Yes, in case you're wondering, I got the deal. They bought the software product from me there and then. However, it was a life-changing experience for me, and I have never again trusted technology as part of a presentation. Even if things can't go wrong, they'll go wrong, and they'll kill the flow of what you're presenting. If you can't do something without the techno-props, then you shouldn't do it. The greatest lesson of all is that great presentations require preparation and 'stage-presence' rather than fancy graphics. Graphics are a great supporting aid, but they should never dominate to the point that you're lost without them.

by Phil Factor

| Section Menu | Main Menu |

Raw Materials: Dinner Out

16 February 2011 by Larry Gonick

This is the first of a series commissioned to promote Red Gate's new SQL Monitor.

To see all of Larry's cartoons for Simple-Talk click here.

Free! Larry Gonick’s new "Raw Materials: Sheepizing Derek" eBooklet and trial of new SQL Monitor. SQL Monitor’s web UI means that – like Derek – you too can monitor your servers in real time, whenever, wherever via internet-enabled mobile devices. Download the eBooklet and learn more about SQL Monitor now.

© Simple-Talk.com

Partitioning Your Code Base Through .NET Assemblies and Visual Studio Projects

10 February 2011 by Patrick Smacchia

Should every Visual Studio project really be in its own assembly? And what does 'Copy Local=True' really mean? Patrick Smacchia is no stranger to large .NET projects, is well placed to lay a few myths to rest, and gives some advice that promises up to a tenfold increase in speed of compilation.

This article is aimed at:

- Providing a list of DOs and DON'Ts when it comes to partitioning a code base into .NET assemblies and Visual Studio projects.
- Shedding light on .NET code componentization and packaging.
- Suggesting ways of organizing the development environment more effectively.

The aim of this is to increase the speed of .NET developer tools, including VS and the C#/VB.NET compilers, by up to an order of magnitude. This is done merely by rationalizing the development of a large code base. This will significantly increase productivity and decrease the maintenance cost of the .NET application. This advice is gained from years of real-world consulting and development work and has proved to be effective in several settings and on many occasions.

Why create another .NET assembly?

The design of Visual Studio .NET (VS) has always encouraged the idea that there is a one-to-one correspondence between assemblies and VS projects. It is easy to assume from using the Visual Studio IDE that VS projects are components of your application, and that you can create projects at whim, since by default VS proposes to take care of the management of project dependencies.

Assemblies, .exe and .dll files, are physical units. They are units of deployment. By contrast, a component is better understood as a logical unit of development and testing. A component is therefore a finer-grained concept than an assembly: an assembly will typically contain several components. Today, most .NET development teams end up having hundreds, and possibly thousands, of VS projects. The task of maintaining a one-to-one relationship between assembly and component will have these consequences:

- Developers' tools will slow down by up to an order of magnitude. The whole stack of .NET tooling infrastructure, including VS and the C# and VB.NET compilers, works much faster with fewer, larger assemblies than it does with many smaller assemblies.
- Deployment packaging will become a complex task and therefore more error-prone.
- Installation time and application start-up time will increase because of the overhead cost per assembly.
- In the case of an API whose public surface is spread across several assemblies, there will be latency because of the burden on client API consumers to figure out which assemblies to reference.

All these common .NET development problems are a consequence of using a physical object, an assembly, to implement a logical concept, a component. So, if we shouldn't automatically create a new assembly for each component, what are the good reasons to create an assembly? And what common practices don't constitute good reasons to do so?

Common valid reasons to create an assembly

- Tier separation, if there is a requirement to run some different pieces of code in different AppDomains, different processes, or different machines. The idea is to avoid overwhelming the precious Windows process memory with large pieces of code that are not needed. In this case, an assembly is especially created to contain shared interfaces used for communication across tiers.
- AddIn/PlugIn model, if there is a need for a physical separation of interface/factory/implementation. As in the case of tier separation, an assembly is often dedicated to containing shared interfaces used for communication between the plugin and its host environment.
- Potential for loading large pieces of code on demand. This is an optimization made by the CLR: assemblies are loaded on demand. In other words, the CLR loads an assembly only when a type or a resource contained in it is needed for the first time. Because of this, you don't want to overwhelm your Windows process memory with large amounts of code that are seldom if ever required.
- Framework features separation. With very large frameworks, users shouldn't be forced to embed every feature into their deployment package. For example, most of the time an ASP.NET process doesn't do any Windows Forms work and vice-versa, hence the need for the two assemblies System.Web.dll and System.Windows.Forms.dll. This is valid only for large frameworks with assemblies sized in MB. A quote from Jeremy Miller, renowned .NET developer, explains this perfectly: "Nothing is more irritating to me than using 3rd party toolkits that force you to reference a dozen different assemblies just to do one simple thing."
- Large pieces of code that don't often evolve (often automatically generated code) can become a drain on developer productivity if they are continuously handled in the development environment. It is better to isolate them in a dedicated assembly within a dedicated VS solution that only rarely needs to be opened and compiled on the developer workstation.
- Test/application code separation. If only the assemblies are released rather than the source code, it is likely that tests should be nested in one or several dedicated test assemblies.

Common invalid reasons to create an assembly

- Assembly as a unit of development, or as a unit of test. Modern source control systems make it easy for several developers to work simultaneously on the same assembly (i.e. the same Visual Studio project). The unit should, in this case, be the source file. One might think that, by having fewer and bigger VS projects, you'd increase the contention on sharing VS .sln, .csproj and .vbproj files. But as usual, the best practice is to keep these files checked out just for the few minutes required to tweak project properties or add new empty source files.
- Automatic detection of dependency cycles between assemblies by MSBuild and Visual Studio. It is important to avoid dependency cycles between components, but you can still do this if your components are not assemblies but subsets of assemblies. There are tools such as NDepend which can detect dependency cycles between components within assemblies.
- Usage of internal visibility to hide implementation details. The public/internal visibility level is useful when developing a framework where it is necessary to hide the implementation details from the rest of the world. Your team is not the rest of the world, so you don't need to create assemblies especially to hide implementation details from them. In order to prevent usage and restrict visibility of some implementation details, a common practice is to define some sub-namespaces named Impl, and use tooling like NDepend or others to restrict usage of the Impl sub-namespaces.

Merging assemblies

There are two different ways to merge the contents of several assemblies into a single one:

- Use the ILMerge tool to merge several assemblies into a single one. With ILMerge, the merged assemblies lose their identity (name, version, culture, and public key).
- Embed several assemblies as resources in a single assembly, and use the AppDomain.CurrentDomain.AssemblyResolve event to extract the assemblies at runtime (see the sketch below). The difference with ILMerge is that all the assemblies keep their identity.
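
Here is a sketch of that second approach; the class name and the resource-naming convention are assumptions made for the illustration, not part of any particular product:

using System;
using System.Reflection;

static class EmbeddedAssemblyLoader
{
    // Call once at start-up, before any type from an embedded assembly is touched.
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            // Assumed convention: each library is embedded as a resource named "MyApp.Lib.<SimpleName>.dll".
            string simpleName = new AssemblyName(args.Name).Name;
            string resourceName = "MyApp.Lib." + simpleName + ".dll";
            using (var stream = Assembly.GetExecutingAssembly()
                                        .GetManifestResourceStream(resourceName))
            {
                if (stream == null) return null;   // not one of our embedded assemblies
                var raw = new byte[stream.Length];
                stream.Read(raw, 0, raw.Length);
                return Assembly.Load(raw);         // the embedded assembly keeps its own identity
            }
        };
    }
}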

Merging assemblies in either of these ways doesn't, however, solve the problems that are due to there being too many VS projects, which cause a significant slowdown of VS and of the compilers' execution time.

Reducing the number of assemblies

Technically speaking, the task of merging the source code of several assemblies into one is relatively light and takes just a few hours. The tricky part is to define the proper new partition of code across assemblies. At that point you'll certainly notice that there are groups of assemblies with high cohesion. These are certainly candidates to be merged together into a single assembly. By looking at assembly dependencies with a Dependency Structure Matrix (DSM) such as the one in NDepend, these groups of cohesive assemblies form obvious squared patterns around the matrix diagonal. Here is a DSM taken from more than 700 assemblies within a real-world application:

Increase Visual Studio solution compilation performance

You can use a simple technique to reduce the compilation time of most real-world VS solutions by up to an order of magnitude, especially when you have already merged several VS projects into a few. On a modern machine, the optimal performance of the C# and VB.NET compilers is about 20K logical lines of code per second, so you can measure the room for improvement ahead. A logical line of code (Loc) represents a sequence point. A sequence point is the code excerpt highlighted in dark red in the VS code editor window when creating a breakpoint. Most .NET tools for developers, including VS and NDepend, measure lines of code through sequence points. By default, VS stores each VS project in its own directory. Typically VS suggests the folder hierarchy for a project, named here MyVSProject:

A VS solution typically has several VS projects and, by default, each VS project lives in its own directory hierarchy. At compilation time, each project builds its assembly in its own bin\Debug or bin\Release directory. By default, when a project A references a project B, project B is compiled before A. However, the assembly B is then duplicated in the bin\Debug or bin\Release directory of A. This duplication is the consequence of the value 'True' having been set by default for the option 'Copy Local' of an assembly reference. Whereas it makes sense for a small solution, it will soon cause problems for larger applications. As the size and complexity of a solution increase, the practice of duplicating assemblies at compilation time becomes extremely costly in terms of performance. In other words:

Copy Local = true is evil

Imagine a VS solution with 50 projects. Imagine also that there is a core project used by the 49 other projects. At Rebuild-All time, the core assembly will be compiled first, and then duplicated 49 times. Not only is this a huge waste of disk, but also of time. Indeed, the C# and VB.NET compilers don't seem to have any checksum and caching algorithm to pinpoint whether the metadata of an assembly has already been parsed. As a result, the core assembly has its metadata parsed 49 times, and this takes a lot of time and can actually consume most of the compilation resources. From now on, when adding a reference to a VS project, make sure, first, to add an assembly reference and, second, that Copy Local is set to False (which doesn't always seem to be the case by default).
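
For reference, this is roughly what an assembly reference looks like in the .csproj file once Copy Local is False; the assembly name and path below are placeholders, and Private is simply MSBuild's name for the Copy Local setting:

<Reference Include="MyCore">
  <HintPath>..\bin\Debug\MyCore.dll</HintPath>
  <Private>False</Private>
</Reference>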

A slight drawback to referencing assemblies directly, rather than their corresponding VS projects, is that it is now your responsibility to define the build order of the VS projects. This can be achieved through the Project Dependencies panel.

Organize the development environment

When Copy Local is set to true, the top-level assemblies, typically executable assemblies, automatically end up with the whole set of assemblies that they use duplicated in their own .\bin\Debug directory. When the user starts an executable assembly, it just works. There is no FileNotFoundException, since all the assemblies that are needed are in the same directory. If you set 'Copy Local = false', VS will, unless you tell it otherwise, place each assembly alone in its own .\bin\Debug directory. Because of this, you will need to configure VS to place assemblies together in the same directory. To do so, for each VS project, go to VS > Project Properties > Build tab > Output path, and set the Output path to ..\bin\Debug for the debug configuration, and ..\bin\Release for the release configuration.

Now that all assemblies of the solution reside in the same directory, there is no duplication and VS works and compiles much faster.

Organisation of Assemblies

If there are many library assemblies and just a few executable assemblies, it might be useful to keep only the executable ones in the output directory ..\bin\Debug (and in ..\bin\Release as well). Library assemblies are then stored in a dedicated sub-directory, ..\bin\Debug\Lib (and ..\bin\Release\Lib). This way, when users browse the directory, they only see the executables, without the dll assemblies, and so can start any executable straight away. This is the strategy we adopted for the three NDepend executable assemblies.

If you wish to nest libraries in a Lib sub-directory, it is necessary to tell the CLR how to locate, at run time, the library assemblies in the sub-directory .\Lib. For that you can use the AppDomain.CurrentDomain.AssemblyResolve event, this way:

using System;
using System.IO;
using System.Reflection;

class Program
{
    internal static Assembly AssemblyResolveHandler(object sender, ResolveEventArgs args)
    {
        // args.Name is the full display name; keep only the simple name for the file name.
        string simpleName = new AssemblyName(args.Name).Name;
        string libPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location) +
                         Path.DirectorySeparatorChar + "Lib" + Path.DirectorySeparatorChar;
        var assembly = Assembly.LoadFrom(libPath + simpleName + ".dll");
        return assembly;
    }

    static void Main(string[] args)
    {
        // Bind the resolver before any library type is needed, then do the real work in SubMain().
        AppDomain.CurrentDomain.AssemblyResolve += AssemblyResolveHandler;
        SubMain();
    }

    static void SubMain()
    {
        // ...
    }
}

In this piece of code, you will notice how we construct the path of the sub-directory by relying on the property Assembly.GetExecutingAssembly().Location, bind the event AppDomain.CurrentDomain.AssemblyResolve immediately in the Main() method, and then call a SubMain() method. We need the SubMain() method because, if library types were called from the Main() method, the CLR would try to resolve the library assemblies even before the Main() method is called, hence even before the AppDomain.CurrentDomain.AssemblyResolve event is bound.

Instead of relying on the AppDomain.CurrentDomain.AssemblyResolve event, it is also possible to use an executableAssembly.exe.config file for each executable. Each file will redirect the CLR probing for assemblies to the sub-directory .\Lib. To do so, just add an Application Configuration File for each executable assembly and point the CLR probing at the Lib sub-directory.
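
A minimal sketch of such a file, assuming the .\Lib layout described above, relies on the standard probing element:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <!-- Tell the CLR to also probe the Lib sub-directory when it loads assemblies. -->
      <probing privatePath="Lib" />
    </assemblyBinding>
  </runtime>
</configuration>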

Web Testing with Selenium Sushi: A Practical Guide and Toolset

09 February 2011 by Michael Sorens

How does one test the user-interface of a web application? Too often, the answer seems to be 'clumsily, slowly, and not very well'. The technology of automated, repeatable, testing of websites is still developing, but it exists; and Michael Sorens is here to describe an even better approach based on Selenium

Contents

Selenium Basics
Selenium IDE
Selenium Remote Control
Simple Selenium Sample
IDE Context
RC Context
Introducing Selenium Sushi
Focus Your Visual Studio Qi
Initial File Integration
Restore the Balance between Selenium IDE and Selenium RC
Running with Selenium Sushi
Retargeting Environments with Selenium Sushi
Conclusion
Footnotes

Some testing categories, such as unit testing, are well-supported by tools. User-Interface (UI) testing, however, has challenges all its own, and is often looked upon with trepidation because it is difficult to automate. So each new release of a product requires hordes of manual testers clicking and typing away to regression-test the UI. Automation attempts to replace humans with programs that do the clicking and the typing. Automated UI testing would be great—if the UI never changed. But, particularly during development of a new product, the user interface frequently and constantly changes; that is just the nature of the beast. Even small changes in a UI, though, can easily break automation testing. It is of no consequence to a human if an input field has moved a few pixels or a label has changed from “UserName” to “Username”, but even innocuous changes such as those can cause automation to no longer find what it expects in the window, requiring tremendous effort to maintain automated tests.

Broadly speaking, applications with a graphical user interface fall into two types: client applications, those that run on a user’s computer, and web applications, those that run on a remote server and are accessed with a client browser. Automating UI testing for client applications requires overcoming the burdens outlined above, but for web applications the problems are even more severe. A web application must run in many different browsers on different operating systems with different user settings (e.g. javascript enabled/disabled). The permutations are compounded by having to support different versions of each browser (e.g. Internet Explorer 8 vs. Internet Explorer 9).

But there is hope—there are a lot of web testing tools available. Unlike some technologies where market dominance has gone to either a single player or a small pool of major players, though, the collection of web testing tools is a gallimaufry, a patchwork of utilities, plugins, and add-ons, each focusing on its own favorite piece of the web. Wikipedia usually has great listings and comparisons of utilities in various categories, but their list of web testing tools is surprisingly sparse. Following the link from there to their list of GUI testing tools produces a slightly longer list, though not limited to just web testing. However, a link from there to SoftwareQATest.com opens the floodgates wide to reveal a huge list of web test tools. I have used only a handful of these tools, so I certainly do not claim to be an expert in which tools are better than others, but one I have been using recently that stands out as a very good tool—and appears on all 3 web pages listed above—is Selenium.

I measure goodness in a software package by using a simple yardstick against three aspects: features, documentation, and support. Selenium is a functional web page tester that lets you:

- Record and playback actions (primarily typing and mouse clicks).
- Manage tests by organizing them into test cases and test case suites.
- Exercise tests against multiple browsers and platforms.
- Convert tests to your favorite language (C#, Ruby, PHP, Perl, Python, Java) for full customization.

That short but powerful list of features is one of the main reasons I became enamored with this free product. Another is its 160-page user manual: good documentation, as I am sure you know, is a rarity for software products and the folks behind Selenium did a reasonable job. How do I measure the quality of the documentation? First let me touch upon the third quality of my software judging triumvirate: support. As yet, I have no judgment about Selenium’s support—because I have had no call to use it. That is how I judge the documentation to be good. In the interests of full disclosure I have had questions not answered by the documentation, but a quick web search has revealed the answers thus far.

Any software package, however, has defined boundaries. This article illustrates how to stretch Selenium’s boundaries with Selenium Sushi[1], providing both a support library and a .NET project framework, to make Selenium more useful and make you more productive. Skip down to Table 3 to see at a glance what Selenium Sushi brings to the table; the table is not right here because it helps if you have some understanding of Selenium first.

Selenium Basics

Selenium IDE

Selenium provides several packages; the first piece you should install is the Selenium IDE, a Firefox add-on. This IDE lets you record and playback tests, but in Firefox only. Once installed, launch the IDE from Firefox’s Tools menu and you can immediately start recording actions and familiarizing yourself with Selenium commands. The IDE lets you play back what you record against your currently focused browser window. So if you happen to switch tabs to look up something, then try to run your tests against a now “foreign” web page, you will get an outpouring of errors.

The IDE is deceptively simple looking (which I think is a mark of a good design). You can play back an entire test case or an entire suite (which is just a collection of test cases). But you can also set both break points and start points within a test case. You could then, for example, run from the start of a test case to the middle by setting a break point. Alternately, execute from the middle to the end by setting a start point. Set both to run several lines in the middle. Or open the context menu on a single command to execute just that one line. Finally, you can even run a portion of a line. That is, a typical command specifies a target on the web page (a button, a link, a piece of text, etc.). Use the Find button at the bottom of the command pane to highlight that target to make sure you have specified it correctly. (This realm of locators, as Selenium calls them, is one of Selenium’s best features, allowing you to specify web elements with XPath or CSS or DOM, among other syntaxes. I hope to have a separate article just focusing on locators.)

Within the IDE you can retarget your test to a different server without modifying the test case at all—simply edit the Base URL field at the top of the IDE window. This lets you, for example, run the same test against your QA server and again against your staging server. But be careful—changes to the Base URL are not saved when you save the test case! If you want to permanently change it in the test case, you must open it up in an external text editor to do so.

Selenium Remote Control

Running tests in other browsers requires the next Selenium package, Selenium Remote Control. This package includes a server that you run on your local machine and libraries to interface to your preferred language. This RC package, as it is called, is just a zip file; to mesh with the examples I use in this article, unpack it into C:\Program Files\Selenium. The server itself is just a jar file, so launch it with a simple java command:

java -jar selenium-server.jar -multiwindow

The -multiwindow option specifies that, when it launches browsers, Selenium should use two windows: one for your application and one for its control window. The alternative is -singlewindow, where Selenium uses frames; be aware that some web applications may not render properly embedded in a frame, though. Once your server is running you can then execute tests against it (in this case, unit tests from a .NET project). You can also write tests from scratch using Visual Studio’s unit test framework, but starting off I recommend using Selenium IDE to record tests, then export to source code and edit as needed. Selenium natively supports NUnit, so that is what I focus on here. Once you understand the mechanics it is a simple matter to use your own favorite test framework, though.

Natively, Selenium stores test cases as HTML. The IDE editor has two tabs, Table and Source. The Table tab is where you work with recording steps and fine-tuning locators. The Source tab, by default, shows the HTML code-behind. To move code from Selenium IDE to .NET, you have several choices.

1. In-Place Conversion. Select Options >> Format >> C# to convert the current test case to C# format in the IDE window itself. You could then copy and paste into Visual Studio. As soon as you change the format in this manner, however, you no longer have access to the Table tab nor to the playback controls. To regain access, simply convert back to HTML format.

2. Export to File. Select File >> Export to convert the current test case to C# format and store it in a file. On the export menu you directly select from the same set of language choices. Your editor format remains unchanged.

3. Single-Line Export to Clipboard. After you do an initial conversion using (1) or (2) above, you will usually tweak and customize the code to your needs. Thus, you do not want to come back later and re-export the whole test case again, but you may want to use the IDE to update specific tests within the test case. Selenium IDE provides a convenient way to convert a single test, again in the format of your choice. First, under Options >> Clipboard Format select your language. Your IDE format (Options >> Format) should remain at HTML. In the Table tab of the IDE editor, click on the line you want to migrate. Convert and copy the line in one action, either by keyboard (the standard Control-C on Windows) or by context menu. Invoking Paste in Visual Studio emits the test in the language you selected.

Simple Selenium Sample

IDE Context

Before delving into Selenium Sushi, take a look at the sample described here. Figure 1 shows the Selenium IDE with one test case containing 18 commands: 16 tests and 2 actions. Tests can be identified by the assert, verify, or wait command prefix, as in verifyText, or waitForTextPresent. Any other command performs an action, such as open or click. (Technically, the clickAndWait command in Figure 1 is both an action and a test.)

Command Type   Description
Wait...        Look for the element until it appears; abort if timeout is exceeded.
Verify...      Look for the element; continue whether found or not.
Assert...      Look for the element; abort if not found.
Table 1 Basic Differences among Selenium Test Commands

Note the difference between assert and verify in Table 1. One typical scenario is to use an initial assert as a sanity check (to make sure you are on the right web page, for example). If you do not find the header or the logo or certain text to indicate this, there is no point in testing further on the page. If you get past your assert commands, then you know you are on the right page, so then do a series of verify commands to validate various page elements. Verify is better than assert so you can see all the errors rather than just stop at the first one.

Figure 1 shows the result of actually running the test case—note the green and red bars highlighting the individual tests. Green means a test passed; red means it failed. The lower panel shows the log of the execution with errors marked prominently in red. Careful observation reveals that the rule about assert/abort and verify/continue does not always hold true! Since I used only verify commands the test case should have run to completion, yet the last six lines are white, indicating they have not been executed. The final error in the log shows why: the specified element in the verifyText command was not found; this is apparently a condition considered serious enough by Selenium that it felt compelled to abort. I do not agree with that design decision—hence my bug indicator at left. The workaround for this: change the verifyText to verifyElementPresent, which specifically tests for whether or not the locator points to something on the page. As you can see in the earlier red-marked lines in Figure 1, execution does continue beyond failing commands of that type.

Figure 1 Selenium IDE Showing Results of a Test Case

RC Context

The IDE provides a nice GUI for initial test development, but you will quickly find you want to customize, or to exercise different browsers or different web servers (i.e. a QA server vs. a development server). For these, you need to move from the GUI to code. Using the IDE’s export command as described earlier, I took the test case above and created a file called ExportedTestCase.cs. I imported this into a new Visual Studio project called ExportedRawSample, available in the code archive accompanying this article. To resolve references in the generated code, I added the RC libraries—all the DLLs in C:\Program Files\Selenium\selenium-dotnet-client-driver-1.0.1. After successfully compiling, you can then launch the Selenium server, as described earlier, in preparation for processing requests from your tests.

The next step is to confirm that you get the same results as from the IDE, as a baseline, by running the unit tests in code. Since Selenium’s tests are NUnit tests rather than native Visual Studio unit tests, you cannot run these tests from Visual Studio’s Test menu. Here are just a few choices for running these tests:

- With ReSharper, the outstanding code quality analysis, refactoring, and productivity plug-in for Visual Studio, you can run the tests directly from Visual Studio (ReSharper >> Unit Tests >> Run Unit Tests). If you want to debug, choose Debug Unit Tests as the final menu choice. Alternately, you can right-click a file name in the solution explorer and select Run Unit Tests or Debug Unit Tests.
- Another good alternative is TestDriven.NET, a plug-in dedicated to running NUnit tests from within Visual Studio. Run your unit tests by right-clicking either in the code editor or on a file name in the solution explorer and selecting Run Test(s) or, if you want to debug, Test With Debugger.
- The GUI from NUnit itself runs tests independently of Visual Studio. This is most appropriate to use after you have debugged your tests and rolled them out, and are ready to focus on what the tests may uncover from your application under test. Once you launch the NUnit GUI, you first create a new project (File >> New Project), add .dll or .exe files to the project (Project >> Add Assembly), select the test(s) of interest from the navigation pane, and press the Run button.

(Note that NUnit is free, but ReSharper and TestDriven.NET are not—unless you are a student or an open-source developer.) When I used ReSharper to launch the tests in ExportedTestCase.cs it opened two Firefox windows as expected, ran through the tests and reported just a single error, the final error reported by Selenium IDE earlier (Figure 2). The other errors shown by Selenium IDE were not reported here due to a Selenium bug. I ran the same tests with the NUnit GUI and the results were the same: only the final error was reported.[2]

Figure 2 ReSharper Results of a Test Case

Selenium RC is also afflicted with the same defect mentioned earlier for Selenium IDE for verifyText tests: if the locator fails to find an element, the test case aborts even though it is a Verify test, not an Assert test. (Technically it is the selenium.GetText method, used as an argument to verifyText, that throws an exception.) Finally, take a look at the code generated by the export from Selenium IDE. Here is the first portion of the test case with my comments added to show the IDE command corresponding to each chunk of code. Note the massive code repetition because each IDE command always expands to the same chunk of code.

[Test]
public void TheExportedTestCaseTest()
{
    selenium.Open("/");

    // * * * * * * waitForTextPresent * * * * *
    for (int second = 0;; second++) {
        if (second >= 60) Assert.Fail("timeout");
        try {
            if (selenium.IsTextPresent("regexpi:Privacy policy.*Terms and conditions")) break;
        } catch (Exception) {}
        Thread.Sleep(1000);
    }

    // * * * * * * verifyText * * * * *
    try {
        Assert.AreEqual("Home", selenium.GetText("//form[@id='aspnetForm']/div[2]/div/div[2]/div[1]/span"));
    } catch (AssertionException e) { verificationErrors.Append(e.Message); }

    // * * * * * * verifyText * * * * *
    try {
        Assert.AreEqual("SQL", selenium.GetText("link=SQL"));
    } catch (AssertionException e) { verificationErrors.Append(e.Message); }

    // * * * * * * verifyText * * * * *
    try {
        Assert.AreEqual(".NET", selenium.GetText("ctl00_Navigation1_lnkGotoNet"));
    } catch (AssertionException e) { verificationErrors.Append(e.Message); }

    // * * * * * * verifyElementPresent * * * * *
    try {
        Assert.IsTrue(selenium.IsElementPresent("//img[@alt='A service from Red Gate']"));
    } catch (AssertionException e) { verificationErrors.Append(e.Message); }

    // * * * * * * verifyElementPresent * * * * *
    try {
        Assert.IsTrue(selenium.IsElementPresent(
            "//a[contains(text(),'Which of Your Stored Procedures are Using the Most Resources?')]"));
    } catch (AssertionException e) { verificationErrors.Append(e.Message); }
    . . .

This brief introduction to Selenium RC revealed several defects that I found just during my initial explorations with the product, as well as the major code smell of an overabundance of repeated code. My initial impetus for developing Selenium Sushi was to clean up these code smells, but it addresses the defects mentioned here as well.

Just one final note that applies to Selenium RC and Selenium Sushi: the first time you try to test your application on Internet Explorer you will find it excruciatingly slow. There are two key pieces of information needed to alleviate this. First, use -singlewindow rather than -multiwindow when invoking the Selenium server, unless your application requires its own window (i.e. it is “frame busting” by using frames itself or having pop-ups). See Hacking Selenium to improve its performance on IE. Second, change the XPath engine used by Internet Explorer from the default ajaxslt to javascript-xpath. Search for useXpathLibrary on this Selenium Reference page or, for a nicer API format, look in your Selenium installation at …/javadocs/com/thoughtworks/selenium/Selenium.html. To make this change, there is nothing to download—just add one line of code in your C# library, as described in this StackOverflow post: How to use javascript-xpath. With Selenium RC you need to add this yourself; with Selenium Sushi, it is already supplied in the Common\Setup.cs file. Table 2 shows the effects of these two configuration changes running against Internet Explorer 8 on Windows XP (all configurations ran very fast against Firefox 3.6).

Window Layout   XPath Library      Relative Performance
-multiwindow    ajaxslt            Extremely slow
-multiwindow    javascript-xpath   Moderate
-singlewindow   ajaxslt            Moderate
-singlewindow   javascript-xpath   Fast
Table 2 Performance Differences on Internet Explorer Due to Windowing and XPath Libraries

Introducing Selenium Sushi

Selenium Sushi supplements Selenium IDE and Selenium RC, providing an assortment of productivity enhancements allowing you, for example, to: retarget different browsers and different application servers without recompiling, handle file URLs automatically and, most importantly, convert a standard code file emitted by the IDE into a substantially smaller piece of source code. (Table 3 itemizes all the enhancements.) You generate code from the IDE and you run the Selenium server, just as described above. You modify the code file emitted by the IDE to use the Selenium Sushi library, letting you hide all the code that is not directly related to your test at hand. Finally, you plug the code file into the provided Visual Studio project template to supply the requisite infrastructure, and then you run the tests just as you did with RC. Plus, you get workarounds for a couple of Selenium bugs I discovered along the way.

Feature (and which of Selenium IDE, Selenium RC, and Selenium Sushi provides it):

- Generate web page locators: Selenium IDE
- Confirm web page elements from locators (via Find button in table view): Selenium IDE
- Record test case: Selenium IDE
- Execute test case/suite in Firefox: Selenium IDE, Selenium RC, Selenium Sushi
- Execute test case/suite in other browsers: Selenium RC, Selenium Sushi
- Execute a single test (via context menu): Selenium IDE
- Execute a portion of a test case (via start points and break points): Selenium IDE
- Execute a subset of test cases in a test suite (via NUnit GUI): Selenium RC, Selenium Sushi
- Execute data-driven test cases, or other iterative test cases (via custom programming): Selenium RC, Selenium Sushi
- Retarget browser without recompiling (via .NET config file): Selenium Sushi
- Retarget web server, e.g. QA vs. development vs. staging, without recompiling (via .NET config file): Selenium Sushi
- Report diagnostics (browser, environment, and other details, e.g. in NUnit GUI output tab): Selenium Sushi
- Provides one-line to one-line mapping of the IDE’s Verify… and WaitFor… commands, vs. multiple-line code fragments repeated many times throughout the auto-generated code (requires manual editing in RC): Selenium Sushi
- Handles file URLs automatically (RC allows file URLs, but they must be coded differently than non-file URLs): Selenium Sushi
- Supports relative file URLs (to allow different team members to have their local file tree in different locations on disk): Selenium Sushi
- Provides Visual Studio project template: Selenium Sushi

Table 3 Enhancements Provided by Selenium Sushi

Selenium Sushi is an open-source project that is, at the moment, available exclusively as a download from this article on Simple-Talk.com. The development team (namely, me :-) has gone the extra mile to package up all the bits and pieces into an easily digestible form for you, the reader. I welcome your questions (either at the bottom of this article or via email) but if you like Selenium Sushi, I welcome your contributions even more! The accompanying code archive contains the following:

- SeleniumTest folder: Visual Studio solution containing 4 projects (the library, the RC sample, the template, and the RC sample sprinkled with sushi) plus complete API documentation for the library (SeleniumTest\API\Index.html).
- Test Cases folder: contains the Selenium IDE test case from which all the samples in this article derive.
- Sample Web Page folder: used by the ExportToSushi sample, this contains an offline copy of a tiny fragment of the Simple-Talk.com home page as the Local environment target, in contrast to the Production environment target of Simple-Talk.com itself.

Focus Your Visual Studio Qi[3]

Before transforming the raw, unadorned Selenium RC code into enhanced Selenium Sushi code, you need to prepare your Visual Studio project. Start by copying the project template into your solution and naming it as you see fit; in the accompanying code, I have named the new project ExportToSushi. Next, customize the template to your needs. The Common\Setup.cs file lists all the values for the TestEnvironment enum: Local, QA, Development, Staging, or Production. Comment out the ones you do not wish to use; uncomment the ones you want. Then go into the project settings (Properties >> Settings page) and add or delete settings for home page and connection to match your selected TestEnvironment values. The template includes settings for Development and Production just as a sample. Once you have made those simple customizations, check that the (essentially empty) project compiles before proceeding further.

If you prefer to create your project from scratch, the additional steps are:
- Create a new class library project.
- Add all RC DLLs from C:\Program Files\Selenium\selenium-dotnet-client-driver-1.0.1 as project references.
- Add a reference to my open-source library (CleanCode.TestSupport.Selenium.DLL).
- Copy the Common folder and the MainTests folder from SeleniumSushiTemplate into your project and adjust the namespaces in Setup.cs and SuiteSetup.cs, respectively.
- Just as above, edit the clearly marked region in Common\Setup.cs to enable or disable the environments you want.
- Create project settings (Properties >> Settings page) for general settings (Environment, BrowserType, SeleniumPort) and environment-specific settings to match your Common\Setup updates (HomePage_* and Connection_*), where you replace the wildcard with any or all of the TestEnvironment choices (Local, QA, Development, Staging, or Production).
- If you plan to use the NUnit GUI, add a post-build event (Properties >> Build Events page) to copy your application's config file to the project's top-level directory (or wherever you plan to store the NUnit project file). Here is the parameterized line I use verbatim in each project; Visual Studio takes care of plugging in the correct macro values:

copy "$(TargetDir)\$(TargetFileName).config" "$(ProjectDir)\$(TargetName).config"

Initial File Integration

Now you have a project framework that compiles, ready to accept tests. To migrate the sample file:
- Copy the ExportedTestCase.cs file into the MainTests folder of the ExportToSushi project and adjust its namespace to match (ExportToSushi.MainTests).
- Delete the designated Setup and TearDown methods, plus the two variable declarations, at the top of the file.
- Rename the class to MyTests to match the class name in SuiteSetup.cs. You may, of course, use a name of your own choice, but then also rename it in SuiteSetup.cs.
- Make the class partial, since it is now split between SuiteSetup.cs and this file.
- Delete the [TestFixture] attribute on the class because it is already present in SuiteSetup.cs and may only appear once.
- Remove the first line of the test case (the selenium.Open("/") call); that is subsumed into the Setup.StartSelenium method.
- If you have recorded steps to log in and/or to navigate to the page that your tests target, those should appear next in the code. Move those lines of code into the Login and Preliminaries methods in Setup.cs. The template project includes a few sample lines; replace those with your real lines. Update the values for username and password if you need to use them. Finally, in SuiteSetup.cs enable or disable the calls to Login and Preliminaries, as your needs dictate.

What remains of the code should now be test commands interspersed with supporting action commands; all of the preliminary action should now appear in Setup.cs (see the sketch below for how the pieces fit together). The final step is to replace code chunks corresponding to test commands with Selenium Sushi library calls (as detailed next), restoring the simplicity and clarity of Selenium IDE within your code.
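For orientation, here is a minimal sketch of how the two halves of the partial class might fit together after these edits. This is an assumption-laden illustration rather than the template's actual code: the fixture-level attribute names and the test method name are guesses, while Setup.StartSelenium, Login, and Preliminaries are the methods described above.

// SuiteSetup.cs (sketch): fixture-level plumbing supplied by the template
using NUnit.Framework;

namespace ExportToSushi.MainTests
{
    [TestFixture]
    public partial class MyTests
    {
        [TestFixtureSetUp]
        public void SuiteSetup()
        {
            Setup.StartSelenium();    // starts the browser and opens the configured home page
            // Setup.Login();         // enable if your tests require authentication
            // Setup.Preliminaries(); // enable if your tests start from some other page
        }
    }
}

// ExportedTestCase.cs (sketch): the IDE-generated half after the edits above
namespace ExportToSushi.MainTests
{
    public partial class MyTests
    {
        [Test]
        public void TheExportedTestCase()
        {
            // ... test commands interspersed with supporting actions ...
        }
    }
}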

Restore the Balance between Selenium IDE and Selenium RC

Referring back to Figure 1, the IDE shows that the first test command is waitForTextPresent. That command emitted this code:

for (int second = 0; ; second++)
{
    if (second >= 60) Assert.Fail("timeout");
    try
    {
        if (selenium.IsTextPresent("regexpi:Privacy policy.*Terms and conditions")) break;
    }
    catch (Exception) { }
    Thread.Sleep(1000);
}

Replace that chunk of code with this Selenium Sushi call to restore a clean, simple, IDE-test-to-line-of-code mapping. This WaitForTextPresent method is an extension method encapsulating the above code to let you exactly mirror the test command from Selenium IDE in code:

selenium.WaitForTextPresent("regexpi:Privacy policy.*Terms and conditions");
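To demystify what such an extension method does, here is a minimal sketch of one possible implementation, simply wrapping the generated polling loop shown above; Selenium Sushi's actual code may differ, and the timeout parameter is an assumption.

using System;
using System.Threading;
using NUnit.Framework;
using Selenium;

public static class SushiStyleExtensions
{
    // Sketch: poll for the text until it appears or the timeout expires.
    public static void WaitForTextPresent(this ISelenium selenium, string pattern, int timeoutSeconds = 60)
    {
        for (int second = 0; ; second++)
        {
            if (second >= timeoutSeconds) Assert.Fail("timeout");
            try
            {
                if (selenium.IsTextPresent(pattern)) break;
            }
            catch (Exception) { }   // ignore transient lookup failures and keep polling
            Thread.Sleep(1000);
        }
    }
}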

The next test in Figure 1 is verifyText. The emitted code is this:

try
{
    Assert.AreEqual("Home", selenium.GetText("//form[@id='aspnetForm']/div[2]/div/div[2]/div[1]/span"));
}
catch (AssertionException e)
{
    verificationErrors.Append(e.Message);
}

If that had been an assertText command instead of verifyText, you would have seen just this:

Assert.AreEqual("Home", selenium.GetText("//form[@id='aspnetForm']/div[2]/div/div[2]/div[1]/span"));

The behavioral differences of assert vs. verify (see Table 1) account for the difference in code. The static methods of NUnit's Assert class throw exceptions that cause the current method to abort. To circumvent the abort when a verify command is used, the Selenium-generated code traps the exception and postpones reporting the error until the end of the test case. Selenium Sushi encapsulates that functionality to eliminate clutter in your test case. The bottom line: replace the whole try-catch chunk of code with this, making assert and verify commands symmetric:

Verify.AreEqual("Home", selenium.GetText("//form[@id='aspnetForm']/div[2]/div/div[2]/div[1]/span"));
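For the curious, a deferred-failure helper along these lines is easy to picture. The following is a hypothetical sketch of the idea rather than Selenium Sushi's actual Verify class; the AssertEmpty name is invented for illustration.

using System.Collections.Generic;
using NUnit.Framework;

public static class Verify
{
    private static readonly List<string> errors = new List<string>();

    // Like Assert.AreEqual, but defer the failure instead of aborting the test.
    public static void AreEqual(object expected, object actual)
    {
        try { Assert.AreEqual(expected, actual); }
        catch (AssertionException e) { errors.Add(e.Message); }
    }

    // Like Assert.IsTrue; the bare boolean is all the context available (see footnote 4).
    public static void IsTrue(bool condition)
    {
        try { Assert.IsTrue(condition); }
        catch (AssertionException) { errors.Add("[false]"); }   // only reached when condition is false
    }

    // Hypothetical: called from the suite's TearDown to report every deferred failure at once.
    public static void AssertEmpty()
    {
        if (errors.Count > 0)
            Assert.Fail("******** Verify Errors: ********\n" + string.Join("\n", errors.ToArray()));
    }
}

Something equivalent is presumably wired into the suite's TearDown, which would explain why the error report later in this article surfaces as a TearDown exception.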

Repeat this cleanup for each test supported by Selenium Sushi, shown here with example arguments:

selenium.WaitForTitle("title text");
selenium.WaitForTextPresent("some text");
selenium.WaitForElementPresent("//div/table/tr/td");
Verify.AreEqual("some text", selenium.GetText("//td/font/b/i"));
Verify.IsTrue(someBool);

Since I have used only verify commands and no assert commands in the sample, all of them need this cleanup. The net result: the original ExportedTestCase.cs file shrinks from 176 lines to 32 lines, a reduction of over 80%! (Note that this is probably close to a best-case scenario; actual savings in your code will vary.) Figure 3 illustrates the entire test case (chopped off only on the right edge to fit publishing requirements). By encapsulating unnecessary details into library calls, the logic and intent of each test in the test case now reveals itself with the same clarity as in the Selenium IDE!

Figure 3 Sample Project's Code Reformulated with the Selenium Sushi library

Running with Selenium Sushi

Just as with moving from IDE to RC, the next step is to confirm that you get the same results with Selenium Sushi. Open the NUnit project file (ExportToSushi.nunit) and, making sure you still have the Selenium Server running, run the tests. The test result comes back much faster this time, almost instantly. No, Selenium Sushi does not have the capability of speeding up web testing a hundred-fold! Rather, it is just telling you that you forgot to initialize your configuration file. Figure 4 shows the NUnit output: the error shown in the top panel indicates that you have not specified an environment type.

Figure 4 NUnit GUI showing Selenium Sushi Has Not Yet Been Configured

Open the ExportToSushi.config file in the root directory of the ExportToSushi project and look for the Environment setting (XPath: //ExportToSushi.Properties.Settings/setting[@name='Environment']). Change the None value to one of the valid values shown in Figure 4. Note that you could adjust the default value in your project settings so that, upon compiling, your config file has a valid value, but that is not advised. Setting the default value to None, as the SeleniumSushiTemplate does, has an important advantage: if you mistype a value at some later date when you are changing the value in your config file, NUnit will immediately report this to you, just as in Figure 4. This happens because if the .NET framework attempts to load an invalid value for a setting, it assigns your default, in this case None. None is a special value that, while valid, is unusable, so the system reports the problem. If, on the other hand, you set a default of Staging, for example, then later want to change to Development but misspell it, the system would silently use the default Staging without reporting any problem, and you would be testing against the wrong host!

Update the Environment setting in the config file to Production for the purpose of this test, because the home page for production is set to the same URL that the Selenium RC and Selenium IDE tests used: http://www.simple-talk.com. Also update the BrowserType setting in the config file to your browser of choice; otherwise, NUnit will similarly balk that no browser has yet been specified. Be aware that when you recompile, the config file is regenerated with the default values, effectively overwriting the changes you just made, so you should probably keep it open in an editor and simply save it again with your non-default values. (If you are instead running from ReSharper inside Visual Studio, it uses your actual application settings rather than the root-level config file used by the NUnit GUI. So to run Selenium Sushi with ReSharper you must set Environment and BrowserType in the project settings (Properties >> Settings page) to values other than None. Be sure to set them back to None when you are ready to roll out your tests, for the reason explained above.)

Go back to the NUnit window and reload the project (File >> Reload project); otherwise, NUnit will not see your updates to the config file. Run again and this time the tests should all execute. Unlike Selenium RC, which has the defect noted earlier for Assert.IsTrue results, the final report from Selenium Sushi matches all the errors from the original Selenium IDE test run.[4] (Per the footnote, the output for Assert.IsTrue and Verify.IsTrue lacks context, so on my "to do" list is an item to add overloads to those methods to provide context.) But in this case, matching all the errors is not good enough! Recall that both IDE and RC had a defect: they aborted the test case if a locator in a verify command did not find its target on the web page. So far, Selenium Sushi is replicating that behavior. To remedy the bug, replace all GetText calls with the Selenium Sushi extension method, GetTextSafely. That is, change this:

Verify.IsTrue(
    Regex.IsMatch(
        selenium.GetText(
            "css=div.articlesummary a:contains('LINQ Lycanthropy: Transformations into LINQ')~div.articledetail2"),
        "Michael Sorens.*05 January 2011"));

. . .to this:

Verify.IsTrue(
    Regex.IsMatch(
        selenium.GetTextSafely(
            "css=div.articlesummary a:contains('LINQ Lycanthropy: Transformations into LINQ')~div.articledetail2"),
        "Michael Sorens.*05 January 2011"));
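Conceptually, GetTextSafely just traps the locator failure that would otherwise abort the test case. Here is a hypothetical sketch of that idea; the error list and its reporting are assumptions on my part, not Selenium Sushi's actual internals.

using System.Collections.Generic;
using Selenium;

public static class SafeLookupExtensions
{
    private static readonly List<string> seleniumErrors = new List<string>();

    // Sketch: return the element text, or record the locator failure and return "".
    public static string GetTextSafely(this ISelenium selenium, string locator)
    {
        try
        {
            return selenium.GetText(locator);
        }
        catch (SeleniumException e)
        {
            seleniumErrors.Add(e.Message);   // surfaces later in the "Selenium Errors" section
            return string.Empty;             // the enclosing Verify then fails without aborting
        }
    }
}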

With that in place you eliminate the premature test-case exit and expose all such locator errors, which Selenium Sushi includes in the final error report shown by either ReSharper or NUnit. It separates errors into two sections, as shown below; I have highlighted them in red here for clarity:

TearDown : NUnit.Framework.AssertionException :
******** Verify Errors: ********
[false]
[false]
[false]
[false]
[false]
[false]
[false]
[false]
[ String lengths differ. Expected length=4, but was length=3. Strings differ at index 3.
  expected:<"SQLx">
  but was:<"SQL">
  ------^ ]
******** Selenium Errors: ********
[ERROR: Element //div[@class='articlesummary'][a[contains(text(),'Which of Your Stored Procedures are Using the Most Resources?')]] not found]
[ERROR: Element css=div.articlesummary a:contains('Which of Your Stored Procedures are Using the Most Resources?')~div.articledetail2 not found]
[ERROR: Element css=div.articlesummary a:contains('Showplan Operator of the Week - Merge Interval')~div.articledetail2 not found]
[ERROR: Element css=div.articlesummary a:contains('LINQ Lycanthropy: Transformations into LINQ')~div.articledetail2 not found]

The Verify Errors section reports errors from Verify commands; note that this still suffers from the lack-of-context problem mentioned earlier. The Selenium Errors section reports the locator failures from GetTextSafely commands. Technically, this latter group indicates problems with your test code itself rather than with your application under test. If you wish to confirm that a locator points to an existing element, it is more appropriate to use the VerifyElementPresent command. A final note about the NUnit GUI: it does not correctly render strings with embedded newlines. When the true generated output was the two dozen or so lines shown above, the NUnit GUI showed only the first couple of lines! With another test I got perhaps 50% of the actual output. Thus, the amount of output you see varies with the text of the output. All is not lost, though, as there is a workaround: select all and copy, then paste into a text editor, and the entire text reappears! (So the output is apparently present yet unseen; Heisenberg would be proud.)

Retargeting Environments with Selenium Sushi

You have actually already seen how to retarget either your environment or your browser: just edit the project's configuration file and change the Environment and BrowserType settings. To test against Internet Explorer instead of Firefox, simply change the BrowserType value in the config file from Firefox to IExplorer. You can change the Environment setting in a similar fashion, but it serves a somewhat different purpose. Think of the Environment setting as a routing mechanism; it reroutes both your application's home page and its database connection, if any. This is quite handy for ensuring that you get the same results in QA as you do on your staging server, for example. To use this capability, you must specify in the config file where these values point for each environment. Then, in the Common\Setup.cs file, you activate the corresponding data entries, choosing from any or all of the provided locations: Local, QA, Development, Staging, and Production. The sample project shows just two of these activated: Development and Production. In the initial Selenium Sushi tests you ran above, you set the value to Production. From the Setup file, you can see that this maps the home page to the HomePage_Prod setting.

/************************************************************************/
// adjust as needed for your supported environments

private static Dictionary<TestEnvironment, string> HomePageMap =
    new Dictionary<TestEnvironment, string>
{
    // { TestEnvironment.Local,       Properties.Settings.Default.HomePage_Local },
    // { TestEnvironment.QA,          Properties.Settings.Default.HomePage_QA },
    { TestEnvironment.Development,    Properties.Settings.Default.HomePage_Dev },
    // { TestEnvironment.Staging,     Properties.Settings.Default.HomePage_Staging },
    { TestEnvironment.Production,     Properties.Settings.Default.HomePage_Prod }
};

private static Dictionary<TestEnvironment, string> ConnectionStringMap =
    new Dictionary<TestEnvironment, string>
{
    // { TestEnvironment.Local,       Properties.Settings.Default.Connection_Local },
    // { TestEnvironment.QA,          Properties.Settings.Default.Connection_QA },
    { TestEnvironment.Development,    Properties.Settings.Default.Connection_Dev },
    // { TestEnvironment.Staging,     Properties.Settings.Default.Connection_Staging },
    { TestEnvironment.Production,     Properties.Settings.Default.Connection_Prod }
};

/************************************************************************/
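To see how those maps get used, here is a hypothetical sketch of the kind of lookup the suite setup presumably performs; the enum-typed Environment property and the commented-out DefaultSelenium construction are illustrative assumptions, not the template's actual code.

// Sketch: route the run to the configured environment's home page.
TestEnvironment env = Properties.Settings.Default.Environment;   // e.g. Production
string homePage = HomePageMap[env];                              // e.g. the HomePage_Prod value
// selenium = new DefaultSelenium("localhost", seleniumPort, browserType, homePage);
// selenium.Start();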

With this setup you could also switch to Development, mapping to the HomePage_Dev setting. Observe in the settings that the value for HomePage_Dev looks like a file URL—except it is a relative file URL:

file:///../../../../../Code/Sample Web Page/simple-talk.htm

These beasts do not exist in nature, as you may know; Selenium Sushi supports relative file URLs for convenience in your development environment. Typically your source code will be in source control, and each developer on your team is free to check out the source tree in an arbitrary location. You might prefer C:\stuff\projects\priority_projects\here while your teammate might prefer it to be just at C:\. If a standard, absolute file URL were all you could use, each person would have to customize, and leave checked out, a copy of the config file. So Selenium Sushi supports file URLs relative to your test case DLL.

To make it even easier to find the right relative path, Selenium Sushi provides diagnostic output to guide you. Test output appears on the Errors and Failures tab; switch over to the Text Output tab after you have executed a test to see Selenium Sushi's diagnostic information for the test (Figure 5). Here you see your browser and environment settings. Of particular interest are the Home Page and the Current Dir lines. The current directory is the location of your test case DLL file, and the relative URL for the home page is relative to that directory. The diagnostic output shows both the relative URL and the absolute URL that it resolves to, based on that current directory. By the way, if you get the relative URL wrong, NUnit will abort the test immediately and report the problem; there is no overhead waiting for Selenium to initialize, open a browser, then attempt to open the file URL and find it missing.

Figure 5 Diagnostics in NUnit GUI showing details of a Relative File URL Resolution

Selenium RC supports absolute file URLs, so once Selenium Sushi has translated the relative URL to an absolute URL and confirmed it is a valid path, it uses only the absolute URL from that point on. A minor point to be aware of is that Selenium RC does not allow you to specify a file URL in the selenium constructor, where standard (non-file) URLs are specified; you must defer that to a later open call. You will see the chunk of code in Selenium Sushi's StartSelenium method that handles this automatically for you.
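The resolution step itself is straightforward. The following is a minimal sketch of the idea (resolving against the test assembly's directory), not Selenium Sushi's actual implementation; the class and method names are invented for illustration.

using System;
using System.IO;
using System.Reflection;

static class RelativeFileUrl
{
    // Sketch: turn "file:///../../Code/Sample Web Page/simple-talk.htm" (relative to the
    // test DLL's directory) into an absolute file:// URL that Selenium RC can open.
    public static string Resolve(string relativeFileUrl)
    {
        string relativePath = relativeFileUrl.Substring("file:///".Length);
        string currentDir = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
        string absolutePath = Path.GetFullPath(Path.Combine(currentDir, relativePath));
        if (!File.Exists(absolutePath))
            throw new FileNotFoundException("Relative file URL does not resolve", absolutePath);
        return new Uri(absolutePath).AbsoluteUri;
    }
}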

Conclusion

In the short time I have been using Selenium I have found it to be an exciting tool; nevertheless, I quickly developed Selenium Sushi as I went along to make it more effective and efficient to work with. My intent was to make it a general enough platform enhancement to be a timesaver for others as well. It is a work in progress, but I felt Selenium Sushi was complete enough that it could be useful to people now. Some items that would yet be worthwhile:
- Create a better Assert.IsTrue and Verify.IsTrue. As mentioned above, these have context in the IDE but no context in RC or Selenium Sushi, so they need to provide some context.
- Add additional extension methods to support more IDE commands, to avoid massive code duplication.
- Add support for SqlConnections; the library currently supports just OleDbConnections.
- Automation, possibly leveraging Selenium Grid.
- Properly document the CleanCode.TestSupport.Selenium code and publish the API on my open-source web site.

I will address some or all of these as time allows. If you would like to contribute, you are most welcome to; drop me a line. If you are intrigued by Selenium, be sure to also take a look at Selenium Grid, an enhancement to drastically reduce test time by running tests on a grid of computers in parallel. Also, be aware of the Selenium Conference coming up soon in San Francisco!

Fine Print: This article was developed with Visual Studio 2010, ReSharper 5.1, NUnit 2.5.9, Selenium IDE 1.0.10, and Selenium RC 1.0.3. The bug icon used several times is courtesy of Microsoft's free clip art repository at http://office.microsoft.com/en-us/Images/results.aspx?qu=insects.

Footnotes

[1] Traditional sushi is rice commonly topped with a wide variety of other ingredients. In the same vein, I chose the name Selenium Sushi to convey the variety of enhancements this library and template provide for Selenium. (Also because of its alliterative appeal. :-)

[2] Upon further investigation I determined that when Assert.IsTrue fails, it throws an AssertionException with an empty Message! On the other hand, Assert.AreEqual correctly fills in the exception Message, detailing the discrepancy. To see this, change the first "SQL" in this line to "SQLx" to make it fail, then rerun the test:

Assert.AreEqual("SQL", selenium.GetText("link=SQL"));

[3] Qi, also spelled in English as chi or ch’i, is the energy force of all living things. Perhaps it is silly to wax philosophic about code design and development, but developing reliable, clean code is more of an art than a science. Raw Selenium RC code is frequently not harmonious with its repetitive chunks of code. Selenium Sushi really does restore its balance.

[4] Selenium Sushi's results are better than RC's in that it at least reports "Error: false" when Verify.IsTrue fails. Unfortunately, it still suffers from an unavoidable artifact: the IDE output shows each test line as it executes, providing context for the terse "Error: false" message; here, you have no such context, just the error message. As a contrast, again change the first "SQL" in this line to "SQLx" to make it fail, then rerun the test, and you will see that this, at least, gives sufficient detail:

Verify.AreEqual("SQLx", selenium.GetText("link=SQL"));

© Simple-Talk.com

Hitting the Ground Running with Parallel Extensions in .NET 4.0

01 February 2011 by Jeremy Jarrell

With the arrival of Parallel Extensions in .NET 4.0, the concurrent programming powers traditionally reserved for the most elite of developers are now available to all of us. With multi-core processors increasingly becoming the norm, this is good news, and Jeremy Jarrell gives us the essential knowledge we'll need to get started.

Over the next few years, your users will begin replacing their current computers with newer machines containing increasing numbers of multi-core processors, and they'll expect your software to make that investment pay off. Although concurrent programming and the tools associated with it have traditionally been an arena reserved only for the true gurus of our field, the advent of Parallel Extensions in .NET 4.0 has brought these same tools to the masses. This article will serve as the first step on your path to learning how to use these new parallel tools, and also provide resources for more advanced topics when you're ready to go a bit deeper. While this article will introduce you to some of these extensions, and show how you can quickly incorporate them into your own applications to take advantage of all of those additional processors that your users are now loading up on, bear in mind that this is merely a taste of the parallel power now available to you. Our goal is to introduce you to the concepts of Parallel Extensions, not turn you into a Concurrent Programming Ninja from scratch. We'll start by taking a look at PLINQ, which is an easy way to work with the new extensions, and then we'll dig deeper and take a look at the library which drives it all.

PLINQ

Most .NET developers today are familiar with LINQ, the technology that brought functional programming ideas into the object-oriented environment. Parallel LINQ, or ‘PLINQ’, takes LINQ to the next level by adding intuitive parallel capabilities onto an already powerful framework.

var customers = new[]
{
    new Customer { ID = 1, FirstName = "John",   LastName = "Smith" },
    new Customer { ID = 2, FirstName = "Suzy",   LastName = "White" },
    new Customer { ID = 3, FirstName = "Robert", LastName = "Johnes" }
};

var results = from c in customers.AsParallel()
              where c.FirstName == "John"
              select c;

With the simple addition of the AsParallel() extension method, the .NET runtime will automatically parallelize the operation across multiple cores. In fact, PLINQ will take full responsibility for partitioning your data into multiple chunks that can be processed in parallel. PLINQ partitioning is a bit out of the scope of this article, but if you're curious about the inner workings of it, this blog post from Microsoft's own Parallel Programming team does a great job of explaining the details. All of this sounds easy, right? Well, mostly it is, but there are a few limitations to be aware of. One such limitation is that PLINQ only works against local collections. This means that if you're using LINQ providers over remote data, such as LINQ to SQL or ADO.NET, then you're out of luck for this version. However, if you're working against objects in memory, then PLINQ is ready to go, with a caveat: in addition to being in-memory, the collection on which you're operating must support the extension methods exposed by the ParallelEnumerable class. As of .NET 4.0, most common collections in the framework already support these methods, so this will likely only be an issue if you're working with arcane collections or collections of your own design. Let's try another example. In our previous query, we simply left it up to the runtime to decide how best to parallelize our query given the resources available, but sometimes we want a bit more control. In these cases, we can add the WithDegreeOfParallelism() extension method to specify across how many cores we'd like to scale our query...

var results = from c in customers.AsParallel().WithDegreeOfParallelism(3)
              where c.FirstName == "John"
              select c;

By now, you're probably starting to realize that PLINQ makes parallelizing most of your queries an almost trivial matter. In fact, you may be wondering why you wouldn't just parallelize everything from now on. Well, before you jump headfirst into parallelization bliss, you need to know that concurrent programming isn't appropriate for everything. In fact, there are certain situations where parallelization can even have some downsides. We've already mentioned the fact that PLINQ only works against in-memory collections, but let's take a look at a few more pitfalls of PLINQ.

Things to Bear in Mind

Since PLINQ chunks the collection into multiple partitions and executes them in parallel, the results that you would get from a PLINQ query may not be in the same order as the results that you would get from a serially executed LINQ query. To be fair, this is probably less a pitfall, and more something that you just need to be aware of. Since the two sets are still perfectly equivalent, this won't be an issue in many cases, but if you're expecting your results as a series ordered in a specific sequence, then this may lead to a bit of a surprise! However, you can work around this by introducing the AsOrdered() method into your query, which will force a specific ordering into your results. Keep in mind, however, that the AsOrdered() method does incur a performance hit for large collections, which can erase many of the performance gains of parallelizing your query in the first place:

var results = from i in ints.AsParallel().AsOrdered()
              where i > 100
              select i;

Since the runtime must first partition your dataset in order to execute it in parallel, parallelizing your query naturally incurs a slight amount of overhead. This overhead is also incurred by the additional management required to synchronize the results from the multiple tasks. For complex operations, this overhead is usually negligible compared to the benefits, but for simple operations this added overhead may quickly outweigh any gain you receive from parallelization. Therefore, it's a good idea to introduce parallelization only after you've noticed a CPU bottleneck. This will prevent you from prematurely optimizing your code and, in the process, doing more harm than good. MSDN also has a great list of some additional things to bear in mind when parallelizing your code. Periodically skimming this list to keep these pitfalls fresh in your mind can save you hours of agonizing debugging down the road.

Cancellation Control

Everything we’ve discussed thus far assumes that you’ll always want your queries to finish, but what if you need to cancel an already-running query? Luckily, PLINQ has provisions for that as well. To cancel a running query, PLINQ uses a cancellation token from the Task Parallel Library (which we’ll look at in a moment), in the form of the CancellationTokenSource object. The code below demonstrates this object in action…

var cancellationSource = new CancellationTokenSource();

var results = from i in ints.AsParallel().WithCancellation(cancellationSource.Token)
              where i > 100
              select i;

// Elsewhere...
cancellationSource.Cancel();

try
{
    results.ForAll(Console.WriteLine);
}
catch (OperationCanceledException)
{
    // Handle the exception...
}

The example above warrants a bit of explanation. The OperationCanceledException is thrown when a parallel operation is canceled by the holder of a cancellation token, but you may be wondering why we are wrapping only the results in a try/catch rather than the actual PLINQ query. This is due to the fact that LINQ queries are evaluated lazily. Simply speaking, a lazily evaluated query is one that is not executed until we attempt to enumerate the results object, even though we created it at the beginning of the code sample. This is why the OperationCanceledException is actually thrown from the results object, not the query itself. You can see this lazy evaluation in action when attempting to view the results of a LINQ query in the Visual Studio debugger…
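If you want to see deferred execution for yourself outside the debugger, a tiny console experiment along these lines makes the point (the data here is made up purely for illustration):

using System;
using System.Linq;

class LazyEvaluationDemo
{
    static void Main()
    {
        var words = new[] { "alpha", "beta", "gamma" };

        // Defining the query runs nothing; the Select delegate has not executed yet.
        var query = words.AsParallel().Select(w =>
        {
            Console.WriteLine("evaluating " + w);
            return w.ToUpper();
        });

        Console.WriteLine("query defined, nothing evaluated yet");

        // Only enumerating the results forces the query to execute.
        foreach (var w in query)
            Console.WriteLine(w);
    }
}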

Figure 1. Lazy evaluation in action.

Although these tokens allow us to cancel an already-running query from another thread, it's important to note that PLINQ will not pre-emptively cancel a running thread. Rather, it will continue to poll the cancellation token at various points through the process, thus avoiding the unexpected cancellation issues previously found with Thread.Abort. In actuality, it will wait for the current task to complete before terminating the overall query. In other words, cancelling a PLINQ query simply tells the runtime not to begin any new iterations after a cancellation token has been issued.

The Task Parallel Library

Now that we're starting to get an understanding of the parallel capabilities of PLINQ, let's dig a bit deeper into the framework that supports and drives it. Also included in the Parallel Extensions is the Task Parallel Library, or TPL, which is an advanced concurrency library that's appropriate for those scenarios when you need more fine-grained control. To be fair, PLINQ is really built on top of the TPL, and is an easier means of using it, but you can dig into the TPL directly if you need to do some serious tinkering.

Working with Tasks

The fundamental building block of the TPL is known as a Task, which is conceptually very similar to threads in earlier versions of the .NET Framework. In fact, Tasks are actually implemented under the hood using the standard CLR ThreadPool, which has been optimized to handle Tasks starting in .NET 4.0. Let’s take a quick look at some of the simple things you might need to do with tasks, just so you can see how easy the new extensions are to work with. Tasks are small, self-contained units of work, and can often be expressed by a single method or even a lambda expression. Once a Task has been encapsulated in this way, we can begin its execution by wrapping it in the StartNew() method…

Task.Factory.StartNew(() => Console.WriteLine("This is a task!"));

Sometimes we may be interested in the results of a Task, as well, and in these cases we can retrieve the results of the Task through its Result property, which waits until the value is available...

var task = Task.Factory.StartNew<string>(() =>
    string.Format("Today's date is {0}", DateTime.Now.ToShortDateString()));
Console.WriteLine(task.Result);

Note the use of the generic type arguments on the StartNew() method; this denotes to the compiler that our task will return a value of type string. Much of your work with the TPL will likely focus on individual tasks, but sometimes it’s useful for a Task to beget other Tasks. In these situations, Tasks can be arranged in a parent-child relationship...

var parentTask = Task.Factory.StartNew(() =>
{
    Console.WriteLine("In the parent...");

    Task.Factory.StartNew(() => Console.WriteLine("In the child..."),
                          TaskCreationOptions.AttachedToParent);
});

Finally, starting Tasks is easy, but what if you need to cancel one? Luckily, since PLINQ is built on the TPL, cancelling Tasks in the TPL uses the exact same mechanism as in PLINQ. By passing a cancellation token (drawn from a CancellationTokenSource) to a Task when it's created, the holder of the CancellationTokenSource can cancel that Task from anywhere else in the application…

var cancellation = new CancellationTokenSource();

Task.Factory.StartNew(() =>
{
    for (var i = 0; i < 1000; i++)
    {
        cancellation.Token.ThrowIfCancellationRequested();
        Console.WriteLine("In the loop...");
    }
}, cancellation.Token);

When we request a cancellation via our token, we'll break out of the loop above at the beginning of the very next iteration. These are just a few examples of how to make use of the TPL, but if you'd like to get a deeper understanding of the underlying concepts behind parallelism then this article from MSDN Magazine is an excellent start.

AggregateExceptions

With all of the power available to PLINQ and the TPL, you may be wondering what happens when something goes wrong. Luckily, functionality has been added to the framework to handle the special problems that can arise when working with concurrent programming. The AggregateException is a new exception type that is thrown when an error occurs in the context of one of the new concurrent operations available in the framework; it records all exceptions that occur across all threads, and combines those into a single, aggregated exception. This rather neatly allows you to locate errors that occurred in any thread comprising a single operation…

var task = Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() =>
    {
        Console.WriteLine("In the first task.");
        throw new Exception("Exception from the first task");
    }, TaskCreationOptions.AttachedToParent);

    Task.Factory.StartNew(() =>
    {
        Console.WriteLine("In the second task");
        throw new Exception("Exception from the second task");
    }, TaskCreationOptions.AttachedToParent);
});

try
{
    task.Wait();
}
catch (AggregateException e)
{
    DisplayInnerExceptions(e);
}

private static void DisplayInnerExceptions(AggregateException e)
{
    foreach (var exception in e.InnerExceptions)
    {
        var aggregate = exception as AggregateException;
        if (aggregate != null)
            DisplayInnerExceptions(aggregate);
        else
            Console.WriteLine(exception.Message);
    }
}

Figure 2. A simple exception handling demonstration.

However, if you recall, a Task can parent multiple other Tasks, and since it's entirely possible for both a parent Task and a child Task to each throw exceptions, it's also possible to receive an AggregateException that, in itself, contains other AggregateExceptions. In this case it would become necessary to recursively walk the exception tree to ensure that we find all the exceptions that were thrown…

Figure 3. The potentially nasty AggregateException tree

That could quickly become a bit of a headache, but luckily the .NET framework provides a convenient method that performs this operation for us, flattening all of the exceptions into a series of individual exceptions…

try
{
    task.Wait();
}
catch (AggregateException e)
{
    foreach (var exception in e.Flatten().InnerExceptions)
        Console.WriteLine(exception.Message);
}

The code above produces the exact same output as before, only in a more elegant way. In fact, the Flatten() method hides the complexity of recursively walking the exception tree entirely.

What Lies Ahead

As the comparatively free lunch provided by Moore's Law for so many years steadily draws to a close, developers will be forced to adapt their methods to take advantage of evolving hardware. This almost inevitably means that you will be parallelizing your code in the future in order to take advantage of the explosion of processors sure to begin materializing in all levels of consumer hardware. Although concurrent programming has long been a relatively black art, new frameworks such as PLINQ (and the TPL that drives it) promise to bring this once-arcane skill within the grasp of developers everywhere. Once you master these skills, you can rest assured that your code won't be left behind by this new wave of processing power. If this taste of the parallel power now available in .NET has whetted your appetite for more, then there are a multitude of great resources available to help you dive deeper, both online and in print. A particularly great book is Adam Freeman's Pro .NET Parallel Programming in C#. Also of value is Joseph and Ben Albahari's C# 4.0 in a Nutshell; in fact, Joseph and Ben have been kind enough to make their chapter on threading available online, in its entirety, for free. This should serve as a great jump-start to not only PLINQ, but to the .NET threading model in general, so I wholeheartedly suggest you take a look!

© Simple-Talk.com