Friday, 27 April 2012

Using MiniProfiler with ASP.NET WebForms

I had heard about MiniProfiler, an open source project from the developers at Stack Overflow for profiling web pages, code and database calls. I had wanted to use it before now but thought it only applied to ASP.NET MVC websites, which was disappointing since I work on a WebForms website. It turns out I was wrong: you can apply it to WebForms too.

I had also read many opinions about it, a lot of them mixed. I found Scott Hanselman's blog post introducing it interesting not for the content (though it was great as ever) but for the comments that were posted. They ranged from "this is the best thing since sliced bread" to pure venom-filled rage.

After thinking about it I did what any rational person would do - I tried it out for myself to form my own opinions. Here they are...

NuGet = Magic

The first thing of course is to download, install and integrate MiniProfiler into my web project. In the past that would have meant exactly that: 3 separate steps to perform before I could even do anything.

However, thanks to the power of NuGet, I simply installed the MiniProfiler NuGet package directly from Visual Studio. NuGet downloaded it and installed the assembly references automatically for me and all in a couple of seconds.

Any NuGet aficionados reading that last paragraph who think I sound a bit simple should be aware that this is the first time I've used it. Consider me a convert though, I can't wait to use it again.


I then followed the "Getting Started" steps from the MiniProfiler website and used the WebForms sample from their GitHub repository to plug in some boilerplate code to get it to work.
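For reference, the boilerplate from the WebForms sample amounts to starting and stopping the profiler in Global.asax. This is a sketch based on the 2012-era MiniProfiler API; the "local requests only" check is just one sensible choice while trying it out:

```csharp
// Global.asax.cs - a sketch of the MiniProfiler WebForms boilerplate
using System;
using StackExchange.Profiling;

public class Global : System.Web.HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // Only profile requests from the local machine while experimenting
        if (Request.IsLocal)
        {
            MiniProfiler.Start();
        }
    }

    protected void Application_EndRequest(object sender, EventArgs e)
    {
        MiniProfiler.Stop();
    }
}
```

The other half is a call to `<%= StackExchange.Profiling.MiniProfiler.RenderIncludes() %>` just before the closing body tag of your master page, which emits the script that renders the results tab.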

But naturally it didn't work first time - remember, we are dealing with an open source project here! The first time I ran my code I saw no results "chiclet", as it is called - a little tab in the top corner of the page which you can click to see the profiling results. The page just looked as it always had.

After some Googling around and searching on StackOverflow I found out that I had missed this important setting from my web.config file:

    <system.webServer>
      <modules runAllManagedModulesForAllRequests="true" />
    </system.webServer>

This setting was missing because we recently upgraded an existing WebForms project to .NET 4; apparently it is included by default in new projects. With this enabled, I finally see this:

Profiling your Code

Now that everything is set up, I can start profiling code. The rationale behind MiniProfiler is to find areas that you think could perform better and then start adding using blocks around your code to record timings, like this:

private void MyLongRunningFunction()
{
    using (MiniProfiler.Current.Step("Profiling my code"))
    {
        // Your code goes here, then timings appear in the results
        // tab as "Profiling my code"
    }
}

This seems to annoy a lot of people, their argument being that they shouldn't be making their code "ugly" just to do some simple profiling. My opinion on the matter is:
  1. I've seen far uglier code than this.
  2. You don't have to put these blocks of code everywhere, just the areas you want to inspect, and
  3. Once you've finished profiling and if it really bothers you, why not just remove them?

Profiling your Database

Now this is where I do have a bone to pick with the MiniProfiler developers. But first, let me give some context.

At work we use stored procedures for all our database calls, no inline SQL. We then have a data-access layer which basically has a class method mapped to each stored procedure using standard ADO.NET code (raw SqlConnection and SqlCommand objects). Overall we must have hundreds of these methods throughout this layer. Sometimes it's not perfect but overall I'm happy with the way it works and see no reason to change it.

So I wasn't that impressed when I read what was said about integrating database profiling with MiniProfiler. Quoted from their website:
The profiler includes powerful and comprehensive database profiling capabilities. To enable wrap your database connection with a profiling connection.
Basically they are suggesting I re-write a lot of my data-access code to facilitate this library, which I disagree with: this library is meant to fit around my needs, not the other way around.
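For what it's worth, the wrapping they suggest looks something like this - a sketch using the `ProfiledDbConnection` decorator the library provides; the helper method name is my own:

```csharp
using System.Data.Common;
using System.Data.SqlClient;
using StackExchange.Profiling;
using StackExchange.Profiling.Data;

public static class ProfiledDb
{
    // A sketch of the suggested approach: every place you create a
    // SqlConnection, wrap it so MiniProfiler records the SQL it executes.
    public static DbConnection GetOpenConnection(string connectionString)
    {
        DbConnection connection = new SqlConnection(connectionString);
        connection = new ProfiledDbConnection(connection, MiniProfiler.Current);
        connection.Open();
        return connection;
    }
}
```

Which is fine if you have one connection factory, but not when connections are created ad hoc in hundreds of data-access methods.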

Unfortunately, database profiling is something I really wanted, as my gut instinct tells me that is where a lot of our performance bottlenecks could be. Having pondered the situation, I decided to compromise by wrapping our ADO.NET code in a using/timing block to achieve a similar effect - in fact, I posted an explanation of what I did on this StackOverflow question. I imagine I am losing some benefits with this approach, but at least I can see which stored procedures are called on each request and how long they take to complete.
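In practice the compromise looks something like this - a sketch where the stored procedure name and parameter are made up for illustration, but the shape matches our real data-access methods:

```csharp
using System.Data;
using System.Data.SqlClient;
using StackExchange.Profiling;

public static class ClientData
{
    // Hypothetical data-access method: wrapping the existing ADO.NET call
    // in a Step() records how long the stored procedure takes, without
    // changing the data-access code itself.
    public static DataTable GetClientOrders(string connectionString, int clientId)
    {
        using (MiniProfiler.Current.Step("usp_GetClientOrders"))
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("usp_GetClientOrders", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@ClientID", clientId);

            var table = new DataTable();
            using (var adapter = new SqlDataAdapter(command))
            {
                adapter.Fill(table);
            }
            return table;
        }
    }
}
```

Handily, `Step()` is a null-safe extension method, so these blocks cost almost nothing when no profiler is running.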

The highlighted timings are stored procedure calls, captured simply by wrapping them in Step() profiler calls.


It took me a couple of hours to get everything working but it has resulted, at the very least, in showing me that we do a lot of duplicate database calls which are unnecessary. I'm looking forward to finding the slowest parts of our code (which I know are there) and seeing how we can optimise them.

Regarding my opinions on MiniProfiler itself, I'm not entirely sure why the detractors make such a fuss over it. Yes, it is quite basic as a profiler and doesn't cover all your running code like some professional tools would, but what do you expect from something that is free to use?

Having had some issues around database profiling myself, I can also appreciate why some people would not want to alter their code just to get a pretty profiling page to appear on their website. Personally, although the MiniProfiler developers sell it as something you can run in a live environment, I see it more as a testing/debugging aid for use in our development environment. It might be interesting to see how a real server setup affects the results, but ultimately I think I will use this during development on our code and stored procedures.

Tuesday, 17 April 2012

Saving Your Database Server From CPU Meltdown

This week I helped investigate a problem with our database server. Our customers were complaining that they could not log in to our website, so we took a look at our database server and found it to be using 100% CPU! Uh oh...

Something was clearly hammering our SQL Server database but how do you find the problem? Where do you start?


I quickly Googled around to find out how to track down performance bottlenecks and happened upon this excellent blog post: Finding the Causes of Poor Performance in SQL Server. I would highly recommend you bookmark that link; it might just save your bacon one day.

I won't repeat the post ad nauseam, but effectively it involves setting up a server-side trace on your database - just like SQL Profiler but without the GUI - and then reading the trace file it produces into a temporary table so you can query it.
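The reading-back step uses the built-in `fn_trace_gettable` function. A rough sketch (the file path is just an example, and the cast is needed because the trace returns TextData as ntext, which can't be grouped directly):

```sql
-- Load the trace file into a temporary table so it can be queried
select * into #TraceResults
from fn_trace_gettable('C:\Traces\WebTrace.trc', default);

-- Find the most expensive calls by total duration
select cast(TextData as nvarchar(4000)) as Query,
       count(*)       as Calls,
       sum(Duration)  as TotalDuration,
       sum(CPU)       as TotalCPU
from #TraceResults
group by cast(TextData as nvarchar(4000))
order by TotalDuration desc;
```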

So we set up a trace to record all stored procedure calls for 30 minutes and analysed the results. We found that, within this period, one stored procedure was being called over 8,000 times and was taking up a vast amount of time and resources. Great, let's start improving this procedure.


I can't go into great details about the problematic procedure - the system I work on is for a commercial business - but let me give an overview of what it was doing.

We have a table called Client which represents a hierarchical data structure, so every row in this table has a foreign key pointing to a parent Client. The procedure was given the ID of a Client record and then had to recurse up through the hierarchy until a certain condition was met, at which point it would return the top-most Client ID.

There were several things wrong with this approach:
  1. The procedure was using a database function in the WHERE clause to do this hierarchical searching, which is not very good for performance. Couple that with having to search an entire table and it must have slowed down the procedure a lot.
  2. Searching through the hierarchy of Client records was done using a WHILE loop; check the current row and if it did not meet a certain condition check the parent row and keep repeating. In my experience, SQL Server is never that great at working in this imperative fashion.

The fact that this procedure was being invoked so often meant it had to be rewritten to perform better. The solution I came up with uses another table we have called ClientLookup. This solves the problem of storing a hierarchy in the database by flattening it: for every primary key in the Client table it holds all of that row's parent IDs, so filtering on a particular Client ID gives you the entire list of parent Clients in one go. The ClientLookup table is also well indexed.
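As a rough sketch of the rewrite - with table and column names simplified, and the "certain condition" replaced by a placeholder - the loop becomes a single indexed join:

```sql
-- Before: a WHILE loop walking up the parent references one row at a time.
-- After: one seek against the flattened hierarchy. ClientLookup holds one
-- row per (client, ancestor) pair, so filtering on @ClientID returns every
-- parent in a single indexed read.
select top (1) L.ParentClientID
from ClientLookup L
inner join Client C
    on C.ClientID = L.ParentClientID
where L.ClientID = @ClientID
  and C.MeetsCondition = 1       -- placeholder for the real business condition
order by L.HierarchyLevel desc;  -- hypothetical column: top-most ancestor first
```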

After rewriting this procedure to join on our ClientLookup table instead of a database function, CPU usage shot down dramatically. Below is a screenshot of the CPU graph in Resource Monitor. I've added an arrow to show the point where I made my procedure changes live.


This one simple change has meant our entire system is now working at light speed; in fact I'm amazed that doing this one thing has made such a profound impact. CPU usage on the server is the lowest it has been in years, webpage request times have dropped greatly and response times to the website from around the world have been halved.

What can we take away from this? Check your database performance regularly. Using the technique I used above, finding the biggest bottleneck and fixing it could give you big wins.

Sunday, 8 April 2012

SQL Server - Delete Millions of Rows Without Pummeling the Transaction Log

Here's a problem that I've encountered several times and struggled with before I settled on this solution.

Let's say that you have a table in your SQL Server which has millions of rows in it and for whatever reason you decide you want to delete all of them. So, you initially think "simple, I'll just do this":

delete from [MyTable]

But there's a problem with this. If your database uses the full recovery model, every change is written to the transaction log so the data can be recovered. Send this command to SQL Server and it has to write millions of entries to the transaction log alongside the delete itself. If this also happens to be your live database, you've now effectively brought it grinding to a halt. Not good.

So, the way around this problem is actually quite simple. Instead of deleting millions of rows at once, delete much smaller batches of rows but keep doing it until the table is empty.

declare @rows int = (select COUNT(*) from [MyTable])

while @rows > 0 begin
    delete top (10000) from [MyTable]

    set @rows = (select COUNT(*) from [MyTable])
end

Deleting 10,000 rows at a time makes barely a dent in the transaction log, and because the operation is repeated in small batches, other connections to the database can still get work done in between. That means you can run this in a live environment without hurting performance much - though I would hope you know what you are doing if you are clearing a table that big on a live system!
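One small refinement worth considering: instead of re-counting the whole table after every batch (a full scan per iteration), use @@ROWCOUNT, which simply reports how many rows the last delete removed:

```sql
declare @rows int = 1

while @rows > 0 begin
    delete top (10000) from [MyTable]

    -- @@ROWCOUNT is the number of rows the delete just removed;
    -- when it reaches zero the table is empty and the loop ends.
    set @rows = @@ROWCOUNT
end
```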