A pluggable Keywords extractor

I noticed that customers usually do not want to enter keywords for SEO, as it is too much work. I won't go into the discussion of whether it is necessary to have them on your page at all, but this might help. While coding this, I had the idea to see if I could use EPiServer's IoC functionality to plug in different providers, as you might not want to use the one I used.
And you can, very easily. I'll touch on that part later.

To be able to insert the extracted keywords, first add the "KeywordsMetaTagAttribute" to your keywords property.

[CultureSpecific]
[KeywordsMetaTag]
[BackingType(typeof(PropertyStringList))]
public virtual string[] MetaKeywords { get; set; }
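The post does not show how the attribute itself is declared. A minimal sketch, assuming `KeywordsMetaTagAttribute` is a plain marker attribute with no members, and using a hypothetical `SeoPage` class and `KeywordsPropertyLocator` helper in place of a real EPiServer page type, shows how the decorated property can be found by reflection:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Assumed shape: a marker attribute that only flags the property
// that should receive the extracted keywords.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class KeywordsMetaTagAttribute : Attribute
{
}

// Hypothetical stand-in for an EPiServer page type. [CultureSpecific] and
// [BackingType] are left out so the sketch compiles without EPiServer.
public class SeoPage
{
    public virtual string Title { get; set; }

    [KeywordsMetaTag]
    public virtual string[] MetaKeywords { get; set; }
}

public static class KeywordsPropertyLocator
{
    // Returns the first public instance property marked with
    // [KeywordsMetaTag], or null when the type has none.
    public static PropertyInfo Find(Type pageType)
    {
        return pageType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .FirstOrDefault(p => Attribute.IsDefined(p, typeof(KeywordsMetaTagAttribute)));
    }
}
```

With this sketch, `KeywordsPropertyLocator.Find(typeof(SeoPage))` returns the `PropertyInfo` for `MetaKeywords`, whose `PropertyType` is `string[]`. `Attribute.IsDefined` is used instead of `MemberInfo.IsDefined` because the latter ignores its `inherit` argument for properties, while the former also checks inherited property declarations.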

When a page is published, I collect all text from the properties marked "Searchable" (see another post, or GitHub, for that code) and submit it to the extraction service of choice.

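// Runs while a page is being published. "page" is the PageData being published
// and keywordsMetatagProperty is the reflected property carrying [KeywordsMetaTag]
// (both defined elsewhere in the handler).
IEnumerable<string> props = GetSearchablePropertyValues(page, page.ContentTypeID);

// Join all searchable values and strip the markup before analysis.
string textToAnalyze = TextIndexer.StripHtml(string.Join(" ", props), 0);

ReadOnlyCollection<string> keywordList;

try
{
    keywordList = this.ExtractionService.Service.GetKeywords(textToAnalyze);
}
catch (ActivationException activationException)
{
    // Thrown when no IExtractionService can be resolved from the container.
    Logger.Error("[SEO] No extraction service available", activationException);
    return;
}

// A provider may return null or nothing; leave the existing keywords alone.
if (keywordList == null || keywordList.Count == 0)
{
    return;
}

// Store the keywords in the shape the property expects.
if (keywordsMetatagProperty.PropertyType == typeof(string[]))
{
    page[keywordsMetatagProperty.Name] = keywordList.ToArray();
}
else if (keywordsMetatagProperty.PropertyType == typeof(List<string>))
{
    // ReadOnlyCollection<string> is not a List<string>, so copy it.
    page[keywordsMetatagProperty.Name] = keywordList.ToList();
}
else if (keywordsMetatagProperty.PropertyType == typeof(string))
{
    page[keywordsMetatagProperty.Name] = string.Join(",", keywordList);
}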
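The type check at the end of that handler can be isolated into a pure function, which makes it easy to test without a running site. This is a sketch only: `KeywordValueConverter` and `ToPropertyValue` are hypothetical names, and returning `null` for unsupported property types is an assumption about how you might signal "leave the property alone":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordValueConverter
{
    // Converts the extracted keywords into a value matching the CLR type of
    // the keywords property. Returns null for unsupported types so the
    // caller can skip the assignment.
    public static object ToPropertyValue(Type propertyType, IList<string> keywords)
    {
        if (propertyType == typeof(string[]))
        {
            return keywords.ToArray();
        }

        if (propertyType == typeof(List<string>))
        {
            // A ReadOnlyCollection<string> is not a List<string>, so copy it.
            return new List<string>(keywords);
        }

        if (propertyType == typeof(string))
        {
            // Plain string properties get a comma-separated list.
            return string.Join(",", keywords);
        }

        return null;
    }
}
```

A `ReadOnlyCollection<string>`, as returned by `GetKeywords`, implements `IList<string>` and can be passed in directly.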

So where does the extraction service of choice come from? It is injected through EPiServer's IoC container:

 protected Injected<IExtractionService> ExtractionService { get; set; }

The provider should implement the “IExtractionService” interface and you should add a service configuration attribute to your provider.

[ServiceConfiguration(typeof(IExtractionService))]
public class AlchemyExtractionService : IExtractionService

That is all that is needed: EPiServer registers the provider and injects it for you, which is really great and saves a lot of work.
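To illustrate the pluggability, here is a sketch of the contract and of a second provider that needs no external service. The `IExtractionService` signature is inferred from how `GetKeywords` is called and implemented in this post; `WordFrequencyExtractionService` and `KeywordGenerator` are hypothetical, and the generator takes the provider through its constructor instead of EPiServer's `Injected<T>` so the example compiles without EPiServer. Ranking words by raw frequency is a deliberately naive stand-in for a real extraction service:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text.RegularExpressions;

// Contract inferred from the GetKeywords calls in this post.
public interface IExtractionService
{
    ReadOnlyCollection<string> GetKeywords(string text);
}

// Hypothetical provider that ranks words by how often they occur. In an
// EPiServer site it would also carry [ServiceConfiguration(typeof(IExtractionService))];
// the attribute is omitted here so the sketch compiles on its own.
public class WordFrequencyExtractionService : IExtractionService
{
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "that", "this", "with", "from", "have", "your", "they", "will" },
        StringComparer.OrdinalIgnoreCase);

    private readonly int maxItems;

    public WordFrequencyExtractionService(int maxItems = 20)
    {
        this.maxItems = maxItems;
    }

    public ReadOnlyCollection<string> GetKeywords(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            // Return an empty collection, never null, so callers can check Count.
            return new ReadOnlyCollection<string>(new List<string>());
        }

        // Words longer than three letters, most frequent first, ties alphabetical.
        List<string> keywords = Regex.Matches(text.ToLowerInvariant(), @"\p{L}+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => w.Length > 3 && !StopWords.Contains(w))
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Take(this.maxItems)
            .Select(g => g.Key)
            .ToList();

        return new ReadOnlyCollection<string>(keywords);
    }
}

// Hypothetical consumer that depends only on the interface.
public class KeywordGenerator
{
    private readonly IExtractionService extractionService;

    public KeywordGenerator(IExtractionService extractionService)
    {
        this.extractionService = extractionService ?? throw new ArgumentNullException(nameof(extractionService));
    }

    public string[] Generate(string text)
    {
        return this.extractionService.GetKeywords(text).ToArray();
    }
}
```

Because `KeywordGenerator` only knows the interface, swapping `WordFrequencyExtractionService` for another implementation requires no change to the consumer, which is the same decoupling the `[ServiceConfiguration]` registration gives you in EPiServer.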

As an example, I have created a provider that uses Alchemy for keyword extraction. It is configured either by adding the "seo.alchemy.key" key to appSettings, or by adding a KeywordGenerationSettingsBlock property, decorated with the "KeywordGenerationSettingsAttribute", to your start page. Values set on the block take precedence over the appSettings key. The code is on GitHub; it is nothing spectacular.
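The settings precedence can be summarized as a small pure function. This sketch assumes the same rules the provider applies in its `GetSettings` method: a non-empty key on the block wins over the `seo.alchemy.key` appSetting, and the maximum number of keywords falls back to 20 unless the block sets a positive value. `AlchemySettings` and `AlchemySettingsResolver` are hypothetical names:

```csharp
using System;

// Hypothetical value type holding the resolved provider settings.
public sealed class AlchemySettings
{
    public AlchemySettings(string apiKey, int maxItems)
    {
        ApiKey = apiKey;
        MaxItems = maxItems;
    }

    public string ApiKey { get; }

    public int MaxItems { get; }
}

public static class AlchemySettingsResolver
{
    public const int DefaultMaxItems = 20;

    // appSettingsKey: value of the "seo.alchemy.key" appSetting (may be null).
    // blockKey / blockMaxItems: values from the settings block on the start
    // page, or null when no such block is configured.
    public static AlchemySettings Resolve(string appSettingsKey, string blockKey, int? blockMaxItems)
    {
        // A non-empty key on the block wins over the appSettings key.
        string apiKey = string.IsNullOrWhiteSpace(blockKey) ? appSettingsKey : blockKey;

        // Only a positive MaxItems from the block is honoured.
        int maxItems = blockMaxItems.HasValue && blockMaxItems.Value > 0
            ? blockMaxItems.Value
            : DefaultMaxItems;

        return new AlchemySettings(apiKey, maxItems);
    }
}
```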

    using System;
    using System.Collections.Generic;
    using System.Collections.ObjectModel;
    using System.Configuration;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Reflection;
    using System.Text;
    using System.Web;

    using EPiServer;
    using EPiServer.Core;
    using EPiServer.ServiceLocation;

    using log4net;

    using Newtonsoft.Json;

    [ServiceConfiguration(typeof(IExtractionService))]
    public class AlchemyExtractionService : IExtractionService
    {
        private const int DefaultMaxItems = 20;

        private static readonly ILog Logger = LogManager.GetLogger(typeof(AlchemyExtractionService));

        protected Injected<IContentRepository> ContentRepository { get; set; }

        private int MaxItems { get; set; }

        private string AlchemyKey { get; set; }

        public ReadOnlyCollection<string> GetKeywords(string text)
        {
            this.GetSettings();

            // Without an API key there is nothing to call; return an empty list.
            if (string.IsNullOrWhiteSpace(this.AlchemyKey))
            {
                return new ReadOnlyCollection<string>(new List<string>());
            }

            try
            {
                string uri = string.Format(
                    CultureInfo.InvariantCulture,
                    "http://access.alchemyapi.com/calls/text/TextGetRankedKeywords?apikey={0}&text={1}&maxRetrieve={2}&keywordExtractMode=strict&outputMode=json",
                    this.AlchemyKey,
                    HttpUtility.UrlEncode(text),
                    this.MaxItems);

                WebRequest request = WebRequest.Create(uri);
                request.Method = "POST";

                string json;

                // Dispose the response, stream and reader even if reading fails.
                using (WebResponse response = request.GetResponse())
                using (Stream stream = response.GetResponseStream())
                {
                    if (stream == null)
                    {
                        return new ReadOnlyCollection<string>(new List<string>());
                    }

                    using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                    {
                        json = reader.ReadToEnd();
                    }
                }

                AlchemyResponse alchemyResponse = JsonConvert.DeserializeObject<AlchemyResponse>(json);

                // Treat an error status or a missing keyword list as "no keywords".
                if (alchemyResponse == null
                    || "error".Equals(alchemyResponse.status, StringComparison.OrdinalIgnoreCase)
                    || alchemyResponse.keywords == null)
                {
                    return new ReadOnlyCollection<string>(new List<string>());
                }

                // Keep only reasonably relevant keywords, most relevant first.
                List<string> keywords = alchemyResponse.keywords
                    .Where(k => k.relevance >= 0.5)
                    .OrderByDescending(k => k.relevance)
                    .Select(k => k.text)
                    .ToList();

                return new ReadOnlyCollection<string>(keywords);
            }
            catch (Exception exception)
            {
                Logger.Error("[SEO] Error getting keywords from Alchemy", exception);
                return new ReadOnlyCollection<string>(new List<string>());
            }
        }

        private static bool HasAttribute<T>(PropertyInfo propertyInfo) where T : Attribute
        {
            return Attribute.GetCustomAttribute(propertyInfo, typeof(T)) != null;
        }

        private void GetSettings()
        {
            // The appSettings key and the default MaxItems are the fallbacks.
            this.MaxItems = DefaultMaxItems;
            this.AlchemyKey = ConfigurationManager.AppSettings["seo.alchemy.key"];

            PageData startPageData = this.ContentRepository.Service.Get<PageData>(ContentReference.StartPage);

            // Look for a start page property marked [KeywordGenerationSettings].
            PropertyInfo keywordSettingsProperty = startPageData.GetType()
                .GetProperties()
                .FirstOrDefault(HasAttribute<KeywordGenerationSettingsAttribute>);

            if (keywordSettingsProperty == null)
            {
                return;
            }

            KeywordGenerationSettingsBlock keywordGenerationSettings =
                startPageData[keywordSettingsProperty.Name] as KeywordGenerationSettingsBlock;

            if (keywordGenerationSettings == null)
            {
                return;
            }

            // Values on the settings block override the fallbacks when they are set.
            if (keywordGenerationSettings.MaxItems > 0)
            {
                this.MaxItems = keywordGenerationSettings.MaxItems;
            }

            if (!string.IsNullOrWhiteSpace(keywordGenerationSettings.AlchemyKey))
            {
                this.AlchemyKey = keywordGenerationSettings.AlchemyKey;
            }
        }
    }
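The relevance filtering step can be tested in isolation once it is lifted out of the HTTP call. A sketch, assuming a hypothetical `RankedKeyword` type in place of the JSON-mapped `AlchemyResponse` keywords and a hypothetical `KeywordRanking.FilterByRelevance` helper, with the 0.5 threshold taken from the provider above:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical shape of one ranked keyword in the provider's response.
public sealed class RankedKeyword
{
    public string Text { get; set; }

    public double Relevance { get; set; }
}

public static class KeywordRanking
{
    // Keeps keywords at or above the threshold, most relevant first.
    // An "error" status or a missing keyword list yields an empty collection.
    public static ReadOnlyCollection<string> FilterByRelevance(
        string status, IEnumerable<RankedKeyword> keywords, double minRelevance = 0.5)
    {
        if (string.Equals(status, "error", StringComparison.OrdinalIgnoreCase) || keywords == null)
        {
            return new ReadOnlyCollection<string>(new List<string>());
        }

        List<string> selected = keywords
            .Where(k => k != null && !string.IsNullOrWhiteSpace(k.Text) && k.Relevance >= minRelevance)
            .OrderByDescending(k => k.Relevance)
            .Select(k => k.Text)
            .ToList();

        return new ReadOnlyCollection<string>(selected);
    }
}
```

Keeping this logic separate from the web request means the threshold and ordering can be checked with plain unit tests, while the provider itself only handles configuration and transport.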

I really like being able to create pluggable functionality this way. I'll update my translation provider the same way in the near future, so that you do not have to use Bing, but can inject your own.

The main module and the Alchemy provider are both available on GitHub.

I’ll create some NuGet packages as well.
