Scheduled publishing in Sitecore

I recently helped a colleague implement lightweight scheduled publishing in Sitecore 10.2. The solution lets content editors schedule the publishing and unpublishing of content using the existing publishing restrictions, and it keeps both the amount of code and the impact on the overall solution small.

In a standard Sitecore installation, publishing content is a manual process. While Sitecore lets you restrict the publishing of an item or an item version to a certain time window, these restrictions do not automatically publish and unpublish the item. They merely restrict when the item can be published, to prevent accidental publishing.

As Sitecore writes: “If you specify a date range for when an item or item version is publishable, it does not mean that the item is published on the start date and removed again on the end date. Instead, it means that the item is available for publishing and included when you publish that item or the entire website. To make an item appear on your website on the start date and be removed again on the end date, you must run the Publish Item wizard on both days.”

This behaviour had confused content editors right from the start (and I tend to agree that it is confusing), so we opted to turn the publishing restrictions into actual publish scheduling.

Overview of our solution

To achieve this, we implemented the following solution. First of all, if an item has any publishing restrictions set, we regard it as scheduled for publishing. To easily find all items with publishing restrictions, we added a computed field called has_publishing_restrictions to the master index.

When an item with publishing restrictions is published, Sitecore’s publishing pipeline already makes sure that the item is published or unpublished according to those restrictions, adding the item to or removing it from the web database as needed. Hence, a naive approach could simply be to publish all items with publishing restrictions from a scheduled task running, e.g., every 5 minutes. That would work, but it would result in a large number of unnecessary publishes.

So, while we did create a scheduled task running every 5 minutes, we also implemented a check so that we only publish an item if it:

  • is publishable but does not exist in the web database, or
  • is not publishable but does exist in the web database.
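Both conditions boil down to comparing what the publishing restrictions say with what the web database contains. The snippet below is a minimal, Sitecore-free sketch of that rule under simplifying assumptions: `PublishingState` and `NeedsPublish` are hypothetical names, and the two boolean parameters stand in for the real Sitecore checks (publishability from the restrictions, existence from a lookup in the web database).

```csharp
using System;

// Hypothetical helper illustrating the "incorrect publishing state" rule.
// In the real solution, isPublishable comes from the item's publishing
// restrictions and existsInWeb from a lookup in the web database.
public static class PublishingState
{
    // Returns true when a publish is needed to bring the web database
    // in line with the publishing restrictions.
    public static bool NeedsPublish(bool isPublishable, bool existsInWeb)
    {
        // Publishable but missing from web: publish to add it.
        // Not publishable but present in web: publish to remove it.
        // Otherwise the item is already in the correct state.
        return isPublishable != existsInWeb;
    }
}
```

In other words, a publish is only triggered when the two flags disagree; when they agree, the item can be skipped.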

I will simply refer to this as the item having an incorrect publishing state. The code that checks the publishing state already exists in Sitecore (Sitecore.Publishing.Pipelines.PublishItem.DetermineAction) and runs when a publish takes place; we more or less duplicated it in our scheduled task.

With this check in place, we could skip items that were already in the correct publishing state. But we still had to run the check for every item with publishing restrictions every 5 minutes. To avoid this, we also implemented a cache containing the items we already knew were in the correct publishing state: once our scheduled task had published an item, and the item was therefore in the correct state, we added it to this cache and skipped it when the task ran again. This last part was the trickiest bit, but let’s start with the easy parts first.

Step 1: Adding a computed field to the index

As mentioned above, the first step is to add a computed field to the index, indicating whether an item has any publishing restrictions:

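// ComputeFieldValue of a computed index field: a class implementing
// Sitecore.ContentSearch.ComputedFields.IComputedIndexField, which must
// also be registered in the index configuration as has_publishing_restrictions.
public object ComputeFieldValue(IIndexable indexable)
{
    // Only Sitecore items are relevant; skip anything else that is indexed
    var indexableItem = indexable as SitecoreIndexableItem;
    if (indexableItem == null)
        return null;

    Item item = indexableItem.Item;
    var database = item.Database;

    // Item-level restrictions, shared by all languages and versions
    if (item.Publishing.NeverPublish)
        return true;

    if (item.Publishing.PublishDate != DateTime.MinValue)
        return true;

    if (item.Publishing.UnpublishDate != DateTime.MaxValue)
        return true;

    // Version-level restrictions, checked for every version in every language
    foreach (var language in item.LanguagesWithContent())
    {
        var languageItem = database.GetItem(item.ID, language);
        if (languageItem == null)
            continue;

        foreach (var v in languageItem.Versions.GetVersions())
        {
            if (v.Publishing.HideVersion)
                return true;

            if (v.Publishing.ValidFrom != DateTime.MinValue)
                return true;

            if (v.Publishing.ValidTo != DateTime.MaxValue)
                return true;
        }
    }

    return false;
}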

Step 2: Retrieve all items with publish restrictions from the index

The next step is to retrieve all items with publishing restrictions from the index. Again, this is straightforward. We create a SearchResultItem:

namespace ScheduledPublish
{
    using Sitecore.ContentSearch;
    using Sitecore.ContentSearch.SearchTypes;

    // Exposes the computed has_publishing_restrictions field to LINQ queries
    public class PublishingRestrictionsSearchResultItem : SearchResultItem
    {
        [IndexField("has_publishing_restrictions")]
        public bool HasPublishingRestrictions { get; set; }
    }
}

We are now able to retrieve all the items we need from the index:

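public IEnumerable<Item> GetItemsWithPublishingRestrictions(Database database)
{
    var index = ContentSearchManager.
        GetIndex($"sitecore_{database.Name}_index");

    using (var searchContext = index.CreateSearchContext())
    {
        // The index holds one document per language version of an item, so
        // results are grouped by item ID (in memory, after ToArray()).
        // Items that are still in the index but no longer in the database
        // are skipped, and the result is materialized before the search
        // context is disposed.
        return searchContext.
            GetQueryable<PublishingRestrictionsSearchResultItem>().
            Where(i => i.HasPublishingRestrictions).ToArray().
            GroupBy(i => i.ItemId).
            Select(g => database.GetItem(g.Key)).
            Where(item => item != null).
            ToArray();
    }
}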

Step 3: Create logic to determine the publishing state of an item

We will now create the logic that determines the publishing state of an item and whether it is correct. What we are looking for are items that should be published but are not, and items that should not be published but are. To do this, we start by creating a few helper methods (I have left out the most trivial of them):

namespace ScheduledPublish
{
    using Sitecore;
    using Sitecore.Data;
    using Sitecore.Data.Items;
    using Sitecore.Data.Managers;
    using Sitecore.Globalization;
    using Sitecore.Publishing;
    using System.Collections.Generic;
    using System.Linq;

    public static class ItemExtensions
    {
        public static Database[] PublishingDatabases(this Database database)
        {
            // Left out
        }

        public static bool HasVersion(this Item item)
        {
            // Left out
        }

        public static IEnumerable<Language> LanguagesWithContent(this Item item)
        {
            // Left out
        }

        public static bool IsVersionPublished(this Item item)
        {
            foreach (var targetDatabase in item.Database.PublishingDatabases())
            {
                var languageItem = targetDatabase.GetItem(item.ID, item.Language);
                
                if (languageItem == null || !languageItem.HasVersion())
                    return false;

                var versions = languageItem.Versions.GetVersionNumbers();

                if (!versions.Any(v => v.Number == item.Version.Number))
                    return false;
            }

            return true;
        }

        public static bool IsRevisionPublished(this Item item)
        {
            foreach (var targetDatabase in item.Database.PublishingDatabases())
            {
                var targetItem = targetDatabase.GetItem(item.ID, item.Language, item.Version);

                if (targetItem == null)
                    return false;

                if (targetItem[Sitecore.FieldIDs.Revision] != item[Sitecore.FieldIDs.Revision])
                    return false;
            }

            return true;
        }

        public static bool IsItemPublished(this Item item)
        {
            foreach (var targetDatabase in item.Database.PublishingDatabases())
            {
                if (targetDatabase.GetItem(item.ID) == null)
                    return false;
            }

            return true;
        }
    }
}

With these methods in place, we are able to figure out whether an item, an item version or even a specific revision is published.
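The distinction between a published version and a published revision matters when an editor changes a version that is already online: the version number exists in the web database, but its content is stale. The sketch below models this with plain C# types; `VersionSnapshot` and `PublishChecks` are hypothetical names, and the `Revision` string stands in for the value of Sitecore’s revision field (`Sitecore.FieldIDs.Revision`).

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model of one language version of an item in a
// database: a version number plus the value of its revision field.
public record VersionSnapshot(int Number, string Revision);

public static class PublishChecks
{
    // Same idea as IsVersionPublished: the version number exists in the
    // target database, regardless of its content.
    public static bool IsVersionPublished(VersionSnapshot source, IEnumerable<VersionSnapshot> target)
    {
        foreach (var v in target)
            if (v.Number == source.Number)
                return true;
        return false;
    }

    // Same idea as IsRevisionPublished: the same version exists and carries
    // the same revision, i.e. the latest edits have been published.
    public static bool IsRevisionPublished(VersionSnapshot source, IEnumerable<VersionSnapshot> target)
    {
        foreach (var v in target)
            if (v.Number == source.Number)
                return v.Revision == source.Revision;
        return false;
    }
}
```

For example, if the web database holds version 2 with revision "rev-a" and an editor saves version 2 again (revision "rev-b"), the version counts as published but the revision does not, so it needs a new publish.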

We are now able to detect items with the wrong publishing state. We do this in a two-pronged process: first, we determine whether the item itself is unpublishable but currently published. This check is simple to implement and looks like this:

private static bool ShouldUnpublish(Item item)
{
    var isPublishable = item.Publishing.IsPublishable(DateTime.UtcNow, false);
    var isPublished = item.IsItemPublished();

    return !isPublishable && isPublished;
}

With this check done, we have covered the case where the item as a whole must be unpublished. We now need to dive into each item version to figure out whether the item is in the correct publishing state, considering a few things:

  • If an item version is publishable, but the parent is not published, we should not expect the item version to be published
  • If an item has no publishable versions, we should also not expect it to be published
  • If an item has multiple publishable versions, we should only expect the last of these to be published.
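These three rules can be condensed into one question: which version, if any, should the web database contain for a given language? The following is a Sitecore-free sketch under simplifying assumptions; `VersionInfo` and `ExpectedVersion` are hypothetical names, and the `IsPublishable` flag stands in for the combined `IsPublishable`/`IsValid` checks used in ShouldPublish.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of an item version and whether its
// publishing restrictions allow it to be published right now.
public record VersionInfo(int Number, bool IsPublishable);

public static class ExpectedVersion
{
    // Returns the version number expected in the web database,
    // or null when nothing should be published for this language.
    public static int? Expected(bool parentPublished, IEnumerable<VersionInfo> versions)
    {
        // A child cannot be expected online while its parent is not.
        if (!parentPublished)
            return null;

        // Of all publishable versions, only the latest one is expected.
        var latest = versions
            .Where(v => v.IsPublishable)
            .OrderBy(v => v.Number)
            .LastOrDefault();

        // No publishable version means nothing is expected online.
        return latest?.Number;
    }
}
```

ShouldPublish then compares this expectation with what is actually in the web database: if nothing is expected but some version is online, or the expected version’s revision is not online, a publish is needed.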

In code, this looks like this:

private static bool ShouldPublish(Item item)
{
    var parent = item.Parent;
            
    if (parent == null) 
        return false;

    if (!parent.IsItemPublished())
        return false;
                
    var versions = item.Versions.GetVersions();

    var publishableVersion = versions.
        Where(v => v.Publishing.IsPublishable(DateTime.UtcNow, false)).
        Where(v => v.Publishing.IsValid(DateTime.UtcNow, false)).
        OrderBy(v => v.Version.Number).LastOrDefault();

    var hasPublishableVersion = publishableVersion != null;

    if (!hasPublishableVersion)
        return versions.Any(v => v.IsVersionPublished());
    else
        return !publishableVersion.IsRevisionPublished();
}

As we want to support multiple languages, the ShouldPublish check needs to run for each language in which the item has content. The method below returns the languages that are in an incorrect publishing state (all of them, if the item as a whole should be unpublished):

private IEnumerable<Language> GetLanguagesToPublish(Item item)
{
    if (ShouldUnpublish(item))
        return item.LanguagesWithContent();

    return item.LanguagesWithContent().
        Where(language => ShouldPublish(item.Database.GetItem(item.ID, language)));
}

Step 4: Put the pieces together

We are now able to put the pieces together. In our scheduled task, we get all items with publishing restrictions from the index and iterate through them, checking the publishing state of each item. If we find any items in the wrong publishing state, we publish them in the detected languages:

public Dictionary<Item, IEnumerable<Language>> GetItemsToPublish(Database database)
{
    var itemsToPublish = new Dictionary<Item, IEnumerable<Language>>();

    // Ensure parents are processed first
    var items = this.GetItemsWithPublishingRestrictions(database).OrderBy(i => i.Paths.FullPath);

    foreach (var item in items)
    {
        var languagesToPublish = GetLanguagesToPublish(item).ToArray();

        if (languagesToPublish.Any())
            itemsToPublish.Add(item, languagesToPublish);
    }

    return itemsToPublish;
}
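The “parents first” comment relies on a property of Sitecore paths: a parent’s full path is a strict prefix of its children’s paths, so it sorts before them. A quick check with plain strings illustrates this; `PathOrdering` is a hypothetical helper and the paths are made-up examples. An ordinal comparer is used here to keep the result independent of culture settings, whereas the code above uses the default string comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathOrdering
{
    // Sorts item paths so that every parent precedes its descendants:
    // a parent path is a strict prefix of its children's paths, and a
    // prefix always sorts before the longer string in ordinal comparison.
    public static List<string> ParentsFirst(IEnumerable<string> paths) =>
        paths.OrderBy(p => p, StringComparer.Ordinal).ToList();
}
```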

By running this from a scheduled task and publishing the returned items in the detected languages, we are almost there. But there is a catch: while we no longer publish items that are already in the correct publishing state, we still check the publishing state of every item each time the scheduled task runs.

Step 5: Implementing an IgnoreCache

To avoid this, we implement a cache containing all the items we already know are in the correct publishing state. We call this the IgnoreCache, because it contains all the items we can safely ignore.

However, while we might know that a given item is in the correct state today, this might change in the future, either because a content editor changes the item’s publishing restrictions, or because one of the dates in the publishing restrictions is reached.

The first case is trivial and can be handled by removing items from the cache in an item:saved event handler. The second case is trickier, and we approached it as follows. Each item with date-specific publishing restrictions has a number of dates defined. It might be as simple as one date for publishing the item and one for unpublishing it. However, with publishing restrictions on multiple item versions, we might have a long list of dates, some in the past and some in the future. Each time one of these dates is reached, our assessment of the correct publishing state might change.

So we gather all these dates, order them, and then determine the nearest date in the future at which our assessment of the correct publishing state might change. As most cache mechanisms available in Sitecore (we are simply using a System.Web.Caching.Cache in our implementation) support expiry of cached objects, we simply let the cached object expire on the next date defined in the publishing restrictions.
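The IgnoreCache class itself is not shown in this post, so the following is only a self-contained sketch of the expiry idea, not the implementation we used (which relies on System.Web.Caching.Cache). `IgnoreCacheSketch` is a hypothetical name, a `Guid` stands in for a Sitecore item ID, and the clock is injected so that expiry can be tested.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical, minimal stand-in for the IgnoreCache: item IDs mapped to an
// optional absolute expiry (UTC). A null expiry means "ignore until evicted",
// e.g. by an item:saved handler calling Remove.
public class IgnoreCacheSketch
{
    private readonly ConcurrentDictionary<Guid, DateTime?> _entries = new();
    private readonly Func<DateTime> _utcNow;

    public IgnoreCacheSketch(Func<DateTime> utcNow) => _utcNow = utcNow;

    // Store the item until the next date in its publishing restrictions
    public void Add(Guid itemId, DateTime? expiresUtc) => _entries[itemId] = expiresUtc;

    // Called when a content editor changes the item
    public void Remove(Guid itemId) => _entries.TryRemove(itemId, out _);

    public bool Contains(Guid itemId)
    {
        if (!_entries.TryGetValue(itemId, out var expires))
            return false;

        // Once the next restriction date is reached, the item's correct
        // publishing state may have changed, so the entry is dropped.
        if (expires.HasValue && _utcNow() >= expires.Value)
        {
            _entries.TryRemove(itemId, out _);
            return false;
        }

        return true;
    }
}
```

With this shape, the scheduled task only re-checks an item when it has been edited or when one of its restriction dates has passed.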

To get this date, we implement the following method:

public DateTime? GetIgnoreDate(Item item)
{
    var dateTimes = new List<DateTime>();

    // Item-level publishing restrictions
    if (item.Publishing.PublishDate != DateTime.MinValue)
        dateTimes.Add(item.Publishing.PublishDate);

    if (item.Publishing.UnpublishDate != DateTime.MaxValue)
        dateTimes.Add(item.Publishing.UnpublishDate);

    // Version-level restrictions, for every version in every language
    foreach (var language in item.LanguagesWithContent())
    {
        var languageItem = item.Database.GetItem(item.ID, language);
        if (languageItem == null)
            continue;

        foreach (var version in languageItem.Versions.GetVersions())
        {
            if (version.Publishing.ValidFrom != DateTime.MinValue)
                dateTimes.Add(version.Publishing.ValidFrom);

            if (version.Publishing.ValidTo != DateTime.MaxValue)
                dateTimes.Add(version.Publishing.ValidTo);
        }
    }

    // The nearest future date, or null if no restriction date lies ahead
    var now = DateTime.UtcNow;
    return dateTimes.
        Where(dateTime => dateTime > now).
        OrderBy(dateTime => dateTime).
        Select(dateTime => (DateTime?)dateTime).
        FirstOrDefault();
}

Extending our loop with the IgnoreCache, we get this final implementation:

public Dictionary<Item, IEnumerable<Language>> GetItemsToPublish(Database database)
{
    var itemsToPublish = new Dictionary<Item, IEnumerable<Language>>();

    // Ensure parents are processed first
    var items = this.GetItemsWithPublishingRestrictions(database).OrderBy(i => i.Paths.FullPath);

    foreach (var item in items)
    {
        if (IgnoreCache.Contains(item))
            continue;
                
        var languagesToPublish = GetLanguagesToPublish(item).ToArray();

        if (languagesToPublish.Any())
            itemsToPublish.Add(item, languagesToPublish);

        IgnoreCache.Add(item, this.GetIgnoreDate(item));
    }

    return itemsToPublish;
}

Summary

With this in place, we have turned the publishing restrictions offered by Sitecore into publish scheduling. You will notice that we, like Sitecore, do not check related items, meaning that it is up to the content editor to make sure that related data sources and media items are published as needed. We did discuss whether to implement a warning in the Content Editor if related items were not properly configured, or to simply publish related items as well, and this is probably something we will look into in the future. But for now, these fairly simple steps make scheduled publishing possible. A big thanks to Jimmie Overby for letting me present this great work.