In this post I am going to describe an approach for identifying and removing bugs from a Sitecore solution. This approach is broad and heuristic in nature, mostly based on personal experiences. If you wish to dive further into the methodologies used in debugging, a good starting point is the Wikipedia article on debugging (https://en.wikipedia.org/wiki/Debugging).
As developers we engage in many activities: We routinely plan, implement and test new features, and we usually have a shared methodology and vocabulary with which we approach and describe these activities. On the other hand, when it comes to debugging, we sometimes end up haphazardly trying things out in a somewhat unstructured way. This makes it hard to plan bug fixing, measure the progress and describe the work conducted.
At its core, debugging is a three-part process, in which we understand, then locate and finally fix a problem within a piece of software. I would argue that keeping these three activities separate is the first step towards a rigorous approach to debugging. While the process is iterative in nature, I find it beneficial to think of it as sequential, and this is how I am going to describe it below.
Step 1: Understanding the bug
Typically, a bug is reported by a QA or a customer via a ticket or bug report. We are often tempted to jump straight into the code without assessing whether we understand the bug or have enough information to locate and fix it. While this shortcut can sometimes save time, it is essentially based on luck and gut feeling.
In a rigorous approach to bug fixing, the first step when receiving a bug report is to assess the bug:
- First of all: Is the reported behaviour even a bug?
- Secondly, if it is a bug, do we have enough information to locate and fix it?
If this is not the case, we need to ask follow-up questions before continuing. This is not to signal a lack of trust, nor to point out gaps in the bug report. This is simply a part of the debugging process and indicates that we take the bug report seriously.
When our understanding is sufficient to locate and fix the bug, we should restate the bug in writing on the bug report: We write down our understanding of the problem and what we believe will resolve it (the ‘definition of done’ for the bug). This allows the QA or customer to point out potential misunderstandings and helps us avoid ‘scope creep’.
Step 2: Locating the bug
Next, we try to locate the bug. This almost always starts with us reproducing the bug – first on the environment on which the bug was reported (in situ) and next on our local machine (ex situ).
Step 2.1: Reproducing the bug in situ
Given that our understanding of the bug is correct, we should be able to recreate the bug. If we cannot reproduce the bug in situ we need to consider whether our understanding of the bug is sufficient or even correct:
- Maybe the reported behaviour, which would constitute a bug, is not the actual behaviour. A classic Sitecore example is content changes not appearing due to an editor forgetting to publish.
- Or maybe the reported behaviour is correct, but the steps we take to reproduce the bug are not sufficient. This means that our understanding of the bug lacks one or more ‘hidden variables’: The bug might only appear sometimes, either randomly or under some condition we have not yet understood.
In general, we should be prepared to go back to step 1 if we fail to reproduce the bug. While we can sometimes speed things up by trying out different scenarios, we should always timebox this activity: If we are not able to reproduce the bug after a certain amount of time, we must go back to step 1 and collect more information from the QA or customer.
Step 2.2: Reproducing the bug ex situ
Once we have reproduced the bug in situ, we move on to reproducing the bug locally, on our development machine.
If we can reproduce the bug in situ but fail to reproduce the bug ex situ, we will need to consider why: In a Sitecore solution there are in many cases significant differences between a local installation and a test environment, and we should look at these differences to find an explanation.
Examples include:
- We use the master database as the content database locally
- We use the Preview API locally
- We use a different Sitecore topology locally
- We use an OnPrem installation locally and a Cloud installation on the test environment
Sitecore’s complex configuration system can also lead to differences between environments. If we suspect differences in configuration, we should compare the relevant sections of the configuration using showconfig.aspx.
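Comparing two configuration dumps by eye is error-prone, so it can help to script the comparison. The following is a minimal sketch, not a finished tool: it assumes the output of showconfig.aspx has been saved as XML from each environment, and it uses Python simply because it is short to read. The function names and the setting names in the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Map every element path to its attributes and text content."""
    root = ET.fromstring(xml_text)
    flat = {}

    def walk(element, path):
        # Use the name attribute (if any) to tell sibling nodes apart.
        # Limitation: unnamed siblings with the same tag overwrite each other.
        name = element.get("name")
        label = element.tag + (f"[@name='{name}']" if name else "")
        here = f"{path}/{label}"
        flat[here] = (dict(element.attrib), (element.text or "").strip())
        for child in element:
            walk(child, here)

    walk(root, "")
    return flat


def diff_configs(local_xml, remote_xml):
    """Return (kind, path) pairs for every node that differs."""
    local, remote = flatten(local_xml), flatten(remote_xml)
    report = []
    for path in sorted(set(local) | set(remote)):
        if path not in remote:
            report.append(("only local", path))
        elif path not in local:
            report.append(("only remote", path))
        elif local[path] != remote[path]:
            report.append(("differs", path))
    return report


# Hypothetical sample data standing in for two saved showconfig.aspx dumps.
LOCAL = """<sitecore><settings>
  <setting name="Example.SearchProvider" value="Solr" />
  <setting name="Example.PreviewSite" value="website" />
</settings></sitecore>"""
TEST = """<sitecore><settings>
  <setting name="Example.SearchProvider" value="Azure" />
</settings></sitecore>"""

for kind, path in diff_configs(LOCAL, TEST):
    print(kind, path)
```

In practice the two XML strings would be read from the saved files. The report only tells us where the environments differ; whether a given difference explains the bug is still something we have to reason about.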
Sometimes we need to introduce changes to our local installation to mimic the behaviour of the test environment. We might need to point it to a specific Solr, enable Application Insights or make other configuration changes that will align our local installation with the test environment. We do this in a systematic fashion, changing one thing at a time until we are able to recreate the bug ex situ.
Step 2.3: Locating the bug
Having recreated the bug locally, we now continue to locate the bug, figuring out exactly which component is failing. When locating a bug we use a ‘divide-and-conquer’ approach: While you will find slightly different definitions of this, it essentially means breaking a large problem into smaller problems and solving one problem at a time.
Most of us already do this intuitively: If, for example, wrong data is being shown on the website, we first try to figure out whether the data is wrong in the content database, hence splitting the problem into two.
However, because this happens intuitively, we sometimes find ourselves examining parts of the code that we have already ruled out as the failing part. To avoid this, if we do conclude that the correct data is in fact stored in the database, we should make a note of all the components we now know are not failing. Not only does this save us unnecessary work, it also makes it easier to hand over the bug to another developer and makes our progress explicit – even though we have yet to locate the bug.
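The idea of halving the problem while keeping a note of what has been ruled out can be sketched in a few lines. This is only an illustration in Python, under two assumptions: the data passes through an ordered chain of components, and once the data is wrong it stays wrong further down the chain. The stage names and the observations are hypothetical.

```python
def locate_first_failure(stages, is_correct_after):
    """Bisect an ordered chain of stages for the first one with wrong output.

    Assumes the output of the last stage is known to be wrong, and that
    wrong data never becomes correct again further down the chain.
    """
    ruled_out = []  # stages shown to work; worth writing on the ticket
    low, high = 0, len(stages) - 1
    while low < high:
        mid = (low + high) // 2
        if is_correct_after(stages[mid]):
            # Data is still correct here, so every stage up to mid works.
            ruled_out.extend(stages[low:mid + 1])
            low = mid + 1
        else:
            # Data is already wrong here, so the fault is at mid or earlier.
            high = mid
    return stages[low], ruled_out


# Hypothetical chain and manual observations for one bug.
STAGES = ["master database", "publish to web", "search index", "rendering"]
observed_correct = {
    "master database": True,
    "publish to web": True,
    "search index": False,
    "rendering": False,
}

culprit, ruled_out = locate_first_failure(STAGES, observed_correct.get)
print("suspect:", culprit)
print("ruled out:", ruled_out)
```

In a real investigation each check is a manual inspection rather than a dictionary lookup, but the bookkeeping is the same: the list of ruled-out stages is exactly the note we want to leave on the bug report.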
Step 3: Fixing the bug
Once we have located the bug, we will implement a fix. Often this is the easiest part of debugging, but there are some pitfalls we should avoid:
First, we should avoid refactoring as part of a bug fix. If we find that the erroneous code would benefit from refactoring, we should always do this in a separate task. A bug fix is supposed to be isolated, targeting the reported problem and nothing else. The goal is to avoid changing existing functionality or introducing new functionality.
But on the other hand, we should also acknowledge that a bug fix is new code – and in a sense new (and correct) functionality. Therefore, we need to have a test setup to handle this. Having a suite of automated tests is a great way to avoid regression errors.
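As a small illustration of what such a test setup can look like, the sketch below pairs a fix with two tests: one that reproduces the reported bug and one that pins down behaviour that already worked. It is written in Python for brevity; the helper build_item_url and the bug it fixes are hypothetical and only stand in for whatever code the real fix touches.

```python
import unittest


def build_item_url(site_root, item_path):
    """Hypothetical helper; the fix is to avoid a double slash in the join."""
    return site_root.rstrip("/") + "/" + item_path.lstrip("/")


class BuildItemUrlTests(unittest.TestCase):
    def test_reported_bug_trailing_slash_on_root(self):
        # Input taken from the restated bug report; before the fix
        # this combination produced a URL containing a double slash.
        self.assertEqual(
            build_item_url("https://example.com/", "/news"),
            "https://example.com/news",
        )

    def test_existing_behaviour_is_unchanged(self):
        # Guards against the fix changing what already worked.
        self.assertEqual(
            build_item_url("https://example.com", "news"),
            "https://example.com/news",
        )
```

Saved to a file, the tests can be run with python -m unittest. The first test doubles as the ‘definition of done’ from step 1, and the second keeps the fix isolated to the reported problem.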
In general, we should use a ‘do no harm’ approach to fixing bugs: If a feature is completely broken we will sometimes allow rather large code changes. But if a feature is just a bit off, we should avoid extensive interventions.
Some helpful tools
As developers we have a large array of tools available for debugging. Which tools we choose to use is partly based on personal experience and will differ depending on the situation. Here is a list of tools I use when debugging:
Not surprisingly, I use the built-in debugger in Visual Studio all the time. In addition to setting breakpoints, I tend to use the Immediate Window a lot. I also write debugging information to the debug output (the Output window in Visual Studio) using System.Diagnostics.Debug.WriteLine instead of writing to the Sitecore logs, to avoid having to wait for the local logs to be flushed. I sometimes use dotPeek as a symbol server for compiled assemblies, which allows me to set breakpoints in Sitecore dlls. (https://www.jetbrains.com/help/decompiler/Using_product_as_a_Symbol_Server.html)
When debugging calls to external dependencies, I always have Fiddler running and my local IIS configured to use Fiddler as a proxy (https://www.telerik.com/blogs/capturing-traffic-from-.net-services-with-fiddler). I set up a filter in Fiddler showing traffic from the browser as well as from IIS. This allows me to inspect and auto respond to traffic and is my go-to tool for a broad range of issues. In rare cases I use Wireshark to analyse network traffic on the network layer.
To analyse performance and memory issues I usually use dotTrace from JetBrains together with the tools built into Visual Studio.
For Sitecore-related bugs, I have a clean Sitecore installation running on my machine. I also have the relevant packages from Sitecore unpacked on my computer, especially when dealing with PaaS solutions. In that way, I am always able to compare my local (On Premise) installation with the content of the Azure AppService packages.
Finally, if I am faced with a bug that is not related to custom code I always Google it, as the solution is almost always out there already.
Final thoughts
The approach outlined above is only one of many ways to approach debugging. However, I believe that the main benefit of such an outline – even if changed and adapted – is to approach debugging as a structured process, in which progress can be quantified and measured. Hopefully this can make debugging less frustrating and more fun.