The SEO module provides tools for analyzing and validating content in your JCR repository. The SEO module is typically used to analyze pages but could also be used to analyze any type of content node.

The SEO module contains:

  • the Content Tuner app that allows you to run auditors and view their results
  • the Audit Manager and several pre-defined auditors
  • an AuditPageAction that runs active auditors
  • a FlexiAuditPageAction that runs auditors you select

First, a bit of terminology: an auditor is a self-contained test of a selected website page or any other content node. The SEO module contains a variety of auditors that can be configured to test different aspects of your pages and content.

Auditors analyze a selected node and either pass or fail the node. Audit failures are graded at three different levels:

  • Error - the problem must be corrected 
  • Warning - the problem should be corrected
  • Note - the problem is not severe but could be corrected

Auditors can: 

  • test the rendered HTML of your page
  • test if required properties are defined for your page or content
  • connect to an external service to analyse your page, for example an HTML validation service or SEO analysis service
  • be extended and customized to any analysis you need

The Audit Manager provides a framework for executing one or more auditors and saving their results. You can add new auditors to the Audit Manager and change the configuration of defined auditors.

Installation

You can install the SEO module by either: 

  • Downloading a pre-built SEO app module jar file and placing it in the WEB-INF/lib directory (see Installing a module for more information)
  • Adding a Maven dependency

Maven is the easiest way to install the module. However, there are two different versions of the SEO module depending on what Magnolia version you are using.

Add the following dependency to your bundle if you are using Magnolia v5.6 or newer:

<dependency>
  <groupId>info.magnolia.seo</groupId>
  <artifactId>seo</artifactId>
  <version>1.16.1</version>
</dependency>


Add the following dependency to your bundle if you are using Magnolia v5.5 or older:

<dependency>
  <groupId>info.magnolia.seo</groupId>
  <artifactId>seo</artifactId>
  <version>1.15</version>
</dependency>


Content Tuner app

You can launch an audit and view the results in the Content Tuner app. You can also export the results of an audit as an Excel file or text file. 

The Content Tuner app is installed in the Edit group of Admin Central:

The Content Tuner app has two views: the browser view and the audit detail view.

The browser view shows the current page tree with two new columns: the audit status, which indicates if the page has errors, warnings or notes, and the time the last audit was performed.

The audit detail view (available when an audit has been run on the selected page) gives information about the audit results.

The audit detail view shows an overview of how many errors, warnings and notes were found in the last audit. It also has a section for each successful or failed audit that was performed.

You can click on an audit section to expand it and find more details on the problem found. For some audits, you might also find a button that links to an app where you can correct the problem. 

In this example, the button in the Undefined property "title" section will open the Pages app and page editor where you can edit the page's properties and add the title.

Configuration

You can configure the SEO module actions and the SEO auditors with the Config app. Configuring each of these is described in detail below.

Actions

The SEO module contains two custom actions that can be configured into other apps. Both actions launch an analysis by the Audit Manager using configured auditors:

  • AuditPageAction
  • FlexiAuditPageAction

AuditPageAction launches an analysis running all active auditors (see the active property for auditor configuration below). 

FlexiAuditPageAction opens a dialog showing all currently configured auditors, allowing the user to pick and choose which auditors are run.

Both actions are configurable just like any other action. For example: 

Neither AuditPageAction nor FlexiAuditPageAction has any extra properties; both can be configured as standard actions. See Action definition for more on configuring actions. 

Auditors

The SEO module contains several auditors: 

  • HtmlElementAuditor
  • I18NAuditor
  • I18NPropertyDefinitionAuditor
  • I18NPropertyValidationAuditor
  • LinkAuditor
  • MetaDescriptionAuditor
  • ParagraphLengthAuditor
  • PropertyDefinitionAuditor
  • PropertyValidationAuditor
  • ValidHtmlAuditor

What each of these auditors does and how they can be configured is described below. 

Definition versus Validation Auditors

There are two sets of auditors that do related but separate jobs. 

For internationalized properties (properties with language variants), you have I18NPropertyDefinitionAuditor and I18NPropertyValidationAuditor. 

For plain vanilla properties (properties without language variants), you have PropertyDefinitionAuditor and PropertyValidationAuditor. 

The definition auditors - PropertyDefinitionAuditor and I18NPropertyDefinitionAuditor - check that value(s) for a property have been defined. They do not validate the value(s) of the property. 

The validation auditors - PropertyValidationAuditor and I18NPropertyValidationAuditor - validate the value(s) for a designated property if the property (or its language variant) has been defined. 

Validation and definition auditors work in conjunction. They separate the work of determining if a property is defined from determining if it is valid and contains appropriate content. 

Standard Auditor Properties

There are a few properties common to all auditors:

Property | Required/Optional | Allowed values | Notes
name | required | a unique string | The name of the auditor, usually the node name (either from JCR or YAML configuration). Each auditor must have a unique name among all configured auditors, because auditors save some results under their name.
description | required | a string | A short description of the auditor. The description is displayed when selecting audits.
active | required | true, false | Defines if the auditor is active. AuditPageAction only executes active auditors; inactive auditors are skipped. FlexiAuditPageAction lets users select which auditors are run, and both active and inactive auditors can be selected.
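
The difference between the two actions, and the reason auditor names need to be unique, can be illustrated with a small sketch. The AuditorConfig class and the method names below are hypothetical stand-ins invented for this example; they are not the module's real classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AuditorSelectionDemo {
    // Hypothetical holder for the standard auditor properties (name, description, active).
    static final class AuditorConfig {
        final String name;
        final String description;
        final boolean active;

        AuditorConfig(String name, String description, boolean active) {
            this.name = name;
            this.description = description;
            this.active = active;
        }
    }

    // Mirrors the described AuditPageAction behaviour: only active auditors run.
    static List<String> activeOnly(List<AuditorConfig> configured) {
        List<String> names = new ArrayList<>();
        for (AuditorConfig auditor : configured) {
            if (auditor.active) {
                names.add(auditor.name);
            }
        }
        return names;
    }

    // Mirrors the described FlexiAuditPageAction behaviour: whatever the user
    // picked runs, whether the auditor is active or not.
    static List<String> userSelected(List<AuditorConfig> configured, Set<String> picked) {
        List<String> names = new ArrayList<>();
        for (AuditorConfig auditor : configured) {
            if (picked.contains(auditor.name)) {
                names.add(auditor.name);
            }
        }
        return names;
    }

    // Returns every name that is used more than once.
    static Set<String> duplicateNames(List<AuditorConfig> configured) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (AuditorConfig auditor : configured) {
            if (!seen.add(auditor.name)) {
                duplicates.add(auditor.name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<AuditorConfig> configured = List.of(
                new AuditorConfig("titleDefined", "Checks that a title is set", true),
                new AuditorConfig("brokenLinks", "Checks links in the page", false));
        System.out.println(activeOnly(configured));
        System.out.println(userSelected(configured, Set.of("brokenLinks")));
    }
}
```

A check like duplicateNames is only a suggestion for catching naming clashes before results are saved; the source does not say whether the module performs such a check itself.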

HtmlElementAuditor

Class name: info.magnolia.services.seo.audit.impl.HtmlElementAuditor

HtmlElementAuditor checks for the presence of a specified HTML element. If the HTML element is found at least once, the audit passes; otherwise the audit fails. HtmlElementAuditor can be applied to any node that can be rendered by the Magnolia RenderingEngine. 

HtmlElementAuditor uses jsoup queries to parse and find HTML elements, see https://jsoup.org for more about jsoup. jsoup queries have a jQuery or CSS like syntax.

Here are some examples: 

a[href]

Find all anchor elements with a href attribute.

meta[name="keywords"]

Find meta keywords elements in the HTML. 

See https://jsoup.org/cookbook/extracting-data/selector-syntax for more on jsoup queries.

Here's an example of a configured HtmlElementAuditor:

In addition to the standard auditor properties discussed above, HtmlElementAuditor can be configured with the following properties:

Property | Required/Optional | Allowed values | Notes
level | required | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
auditProperty | required | a unique string | Defines the property name for storing failed audit results. The property name should be unique among auditors, or auditors may overwrite each other's results.
auditValue | required | a string | Defines a message or explanation for a failed audit. The message can have placeholders that are replaced with information about the node and auditor: {0} is the node path, {1} is the configured query property. Example: Oops! Couldn't find {1} in the page {0}!
query | required | a string | A valid jsoup query. See https://jsoup.org/cookbook/extracting-data/selector-syntax for more on jsoup queries.
invalidValue | required if valuePattern is defined, optional otherwise | a string | Defines a message or explanation if a query result does not match valuePattern. The message can have placeholders that are replaced with: {0} is the query result, {1} is the valuePattern.
valuePattern | optional | a valid Java regular expression | If defined, the valuePattern is applied to the results returned by the jsoup query. If the valuePattern does not match a result, an audit failure of the configured level is recorded.
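
The numbered placeholders in auditValue and invalidValue look like the syntax of java.text.MessageFormat. Whether the module actually uses that class is an assumption; the sketch below only shows how such a message would be filled in if it does. The class name, node path and query are made up for the example.

```java
import java.text.MessageFormat;

class AuditMessageDemo {
    // Fills {0} with the node path and {1} with the configured query.
    static String render(String auditValue, String nodePath, String query) {
        return MessageFormat.format(auditValue, nodePath, query);
    }

    public static void main(String[] args) {
        // In a MessageFormat pattern a literal apostrophe is written as two
        // apostrophes; a single one starts a quoted section.
        String auditValue = "Oops! Couldn''t find {1} in the page {0}!";
        System.out.println(render(auditValue, "/travel/about", "meta[name=\"keywords\"]"));
        // prints: Oops! Couldn't find meta[name="keywords"] in the page /travel/about!
    }
}
```

If your messages contain apostrophes and the placeholders are not being replaced, the MessageFormat quoting rule is one thing worth checking.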


I18NAuditor

Class name: info.magnolia.services.seo.audit.impl.I18NAuditor

I18NAuditor will find all internationalised fields in a page and determine if each field has values for all supported languages. Supported languages and the default language are derived from the assigned sites defined in the Site Manager.  

Each field with a value for a supported language will be counted and the total number of all expected values for internationalised fields will be totalled. A ratio of actual internationalised field values versus expected internationalised fields is computed and compared to threshold delegates defined for the I18NAuditor. 

A threshold delegate defines an upper and lower bound for a given audit result based on the actual internationalised field values to expected field values. 

For example, the actual/expected ratio could be graded as follows: 

  • between 0 and 0.6: the audit fails with an error
  • between 0.6 and 0.8: the audit fails with a warning
  • between 0.8 and 0.9: the audit fails with a note
  • greater than 0.9: the audit passes

The threshold delegates replace the "level" property for the I18NAuditor and allow it to report problems of different severity.
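
As a rough illustration of this grading, the following sketch maps a ratio to a level using the example bounds above. The Threshold class is a hypothetical stand-in for a threshold delegate, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption, since the source does not say how boundary values are handled.

```java
import java.util.List;

class I18nThresholdDemo {
    // Hypothetical stand-in for a threshold delegate: a level plus its bounds.
    static final class Threshold {
        final String level;
        final double lower; // inclusive (assumption)
        final double upper; // exclusive (assumption)

        Threshold(String level, double lower, double upper) {
            this.level = level;
            this.lower = lower;
            this.upper = upper;
        }
    }

    // The example bounds from the text, labelled with the level names used elsewhere.
    static final List<Threshold> THRESHOLDS = List.of(
            new Threshold("auditErrors", 0.0, 0.6),
            new Threshold("auditWarnings", 0.6, 0.8),
            new Threshold("auditNotes", 0.8, 0.9));

    // Grades the ratio of actual to expected internationalised field values.
    static String grade(int actualValues, int expectedValues) {
        if (expectedValues == 0) {
            return "passed"; // no internationalised fields to check
        }
        double ratio = (double) actualValues / expectedValues;
        for (Threshold threshold : THRESHOLDS) {
            if (ratio >= threshold.lower && ratio < threshold.upper) {
                return threshold.level;
            }
        }
        return "passed";
    }

    public static void main(String[] args) {
        // 4 internationalised fields in 3 languages give 12 expected values.
        System.out.println(grade(9, 12)); // ratio 0.75, prints: auditWarnings
    }
}
```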

I18NAuditor can be configured with the following properties, in addition to the common properties above:


Property | Required/Optional | Allowed values | Notes
auditProperty | required | a unique string | Defines the property name for storing failed audit results. The property name should be unique among auditors or auditors may overwrite their results.
passedProperty | optional | a unique string | Defines the property name for storing valid links. The property name should be unique among auditors.
rootUrl | required | a string | Defines the base URL to be used when checking relative links. Relative links will be appended to the base URL and then checked, so the base URL should end with a slash.
excludedLinks | optional | a list of Java regular expressions | Defines one or more patterns of URLs to be ignored. You can define more than one regular expression. If no regular expressions are defined, all links will be checked. See https://docs.oracle.com/javase/tutorial/essential/regex/ for more on Java regular expressions.
validStatuses | optional | a list of HTTP status codes as integers | Defines the expected HTTP status codes for the link to be considered valid. If not set, the list of valid status codes is: 200 (SC_OK).
pauseTime | optional | an integer | Defines a delay (in milliseconds) between checking links. You can set this property to a non-zero value to avoid flooding a server with HTTP requests. If not set, the pause time will be 0 (no delay between requests).


I18NPropertyDefinitionAuditor

Class name: info.magnolia.services.seo.audit.impl.I18NPropertyDefinitionAuditor

I18NPropertyDefinitionAuditor is a companion to PropertyDefinitionAuditor. Instead of checking for the definition of a node property for the default language, I18NPropertyDefinitionAuditor can check that values are defined for all or some of the available languages for the site.

Note that I18NPropertyDefinitionAuditor just checks that a property is defined, not the value of the property. You can use I18NPropertyValidationAuditor to check that the values per language are valid. 

Here's an example I18NPropertyDefinitionAuditor: 

I18NPropertyDefinitionAuditor can be configured with the following properties, in addition to the standard properties above:

Property | Required/Optional | Allowed values | Notes
propertyName | required | a string | Defines the node property name to be checked.
level | optional | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
validateAll | optional | true, false (default is true) | Controls which languages are checked for the property. If validateAll is true, the property is checked for language variants in all languages defined for the site. If validateAll is false, only the language variants listed in the expectedLanguages property are checked.
expectedLanguages | optional | a list of language codes or language plus country codes | Defines the list of languages to be checked for language variants of the property. It can be a subset of the languages defined for the site. Languages not included in expectedLanguages will not be checked. expectedLanguages is used only if validateAll is set to false.
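
The interplay of validateAll and expectedLanguages can be sketched as follows. This is not the module's code: the class and method names are invented, and the naming scheme for language variants (bare property name for the default language, a "_language" suffix for the others) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class I18nDefinitionDemo {
    // Picks the languages to audit according to validateAll and expectedLanguages.
    static List<String> languagesToCheck(List<String> siteLanguages, boolean validateAll,
                                         List<String> expectedLanguages) {
        if (validateAll) {
            return siteLanguages;
        }
        List<String> result = new ArrayList<>();
        for (String language : siteLanguages) {
            if (expectedLanguages.contains(language)) {
                result.add(language);
            }
        }
        return result;
    }

    // Lists the languages for which no variant of the property exists on the node.
    // Assumption: variants are stored as "<propertyName>_<language>", with the
    // default language using the bare property name.
    static List<String> missingVariants(Set<String> nodeProperties, String propertyName,
                                        String defaultLanguage, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String language : languages) {
            String key = language.equals(defaultLanguage)
                    ? propertyName
                    : propertyName + "_" + language;
            if (!nodeProperties.contains(key)) {
                missing.add(language);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> site = List.of("en", "de", "fr");
        List<String> toCheck = languagesToCheck(site, true, List.of());
        System.out.println(missingVariants(Set.of("title", "title_de"), "title", "en", toCheck));
        // prints: [fr]
    }
}
```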

I18NPropertyValidationAuditor

Class name: info.magnolia.services.seo.audit.impl.I18NPropertyValidationAuditor

I18NPropertyValidationAuditor is a companion to PropertyValidationAuditor. Instead of validating the value of a node property for the default language, I18NPropertyValidationAuditor can validate the values of a designated property for all or some of the available languages for the site.

Note that I18NPropertyValidationAuditor just validates the property values and will not check that all language variants of a property are defined. You can use I18NPropertyDefinitionAuditor to check that all required language variants of a property are defined. 

Here's an example I18NPropertyValidationAuditor: 

I18NPropertyValidationAuditor can be configured with the following properties, in addition to the standard properties above: 

Property | Required/Optional | Allowed values | Notes
propertyName | required | a string | Defines the node property name to be checked.
level | optional | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
validateAll | optional | true, false (default is true) | Controls which languages are checked for the property. If validateAll is true, the property values are checked for all languages defined for the site. If validateAll is false, only the languages listed in the expectedLanguages property are checked.
expectedLanguages | optional | a list of language codes or language plus country codes | Defines the list of languages to be checked for property values. It can be a subset of the languages defined for the site. Languages not included in expectedLanguages will not be checked. expectedLanguages is used only if validateAll is set to false.
valuePatterns | optional | a map of language codes or language plus country codes to regular expressions | Defines a validation pattern - a Java regular expression - to validate the property value for a particular language. If the property value does not match the value pattern for the language, the audit will fail. If valuePatterns does not have an entry for a particular language, defaultValuePattern (see below) is used to validate the property value for that language. See https://docs.oracle.com/javase/tutorial/essential/regex/ for more on Java regular expressions.
defaultValuePattern | optional | a valid Java regular expression | Defines a validation pattern - a Java regular expression - to be used when a language-specific value pattern has not been defined in valuePatterns. If the property value does not match the value pattern for the language, the audit will fail. See https://docs.oracle.com/javase/tutorial/essential/regex/ for more on Java regular expressions.
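
The fallback from valuePatterns to defaultValuePattern can be pictured with a short sketch. The class and method names are hypothetical, and two details are assumptions rather than documented behaviour: that the whole value has to match the pattern, and that a value counts as valid when no pattern applies at all.

```java
import java.util.Map;
import java.util.regex.Pattern;

class I18nValidationDemo {
    // Validates one language variant of a property value.
    static boolean isValid(String language, String value,
                           Map<String, String> valuePatterns, String defaultValuePattern) {
        // Use the language-specific pattern if there is one, else the default.
        String regex = valuePatterns.getOrDefault(language, defaultValuePattern);
        if (regex == null) {
            return true; // assumption: nothing configured means nothing to fail
        }
        // Assumption: the entire value must match the pattern.
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // German titles must be 10 to 70 characters; other languages just need a value.
        Map<String, String> valuePatterns = Map.of("de", ".{10,70}");
        String defaultValuePattern = ".+";

        System.out.println(isValid("de", "Kurz", valuePatterns, defaultValuePattern));    // prints: false
        System.out.println(isValid("fr", "Bonjour", valuePatterns, defaultValuePattern)); // prints: true
    }
}
```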


LinkAuditor

Class name: info.magnolia.services.seo.audit.impl.LinkAuditor

LinkAuditor will find links in a rendered HTML page and check if they are accessible. The URLs contained in HTML anchor, link and img elements are extracted and checked. Other URLs, such as URLs contained in Javascript functions won't be detected and so won't be checked.

Note: checking a large number of links can be time-consuming. You may want to use the excludedLinks property to ignore some links, or run the LinkAuditor only when necessary.

Here's an example LinkAuditor:

LinkAuditor can be configured with the following properties, in addition to the standard properties above:


Property | Required/Optional | Allowed values | Notes
level | required | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
auditProperty | required | a unique string | Defines the property name for storing failed audit results. The property name should be unique among auditors, or auditors may overwrite each other's results.
passedProperty | optional | a unique string | Defines the property name for storing valid links. The property name should be unique among auditors.
rootUrl | required | a string | Defines the base URL to be used when checking relative links. Relative links will be appended to the base URL and then checked, so the base URL should end with a slash.
excludedLinks | optional | a list of Java regular expressions | Defines one or more patterns of URLs to be ignored. You can define more than one regular expression. If no regular expressions are defined, all links will be checked. See https://docs.oracle.com/javase/tutorial/essential/regex/ for more on Java regular expressions.
validStatuses | optional | a list of HTTP status codes as integers | Defines the expected HTTP status codes for the link to be considered valid. If not set, the list of valid status codes is: 200 (SC_OK).
pauseTime | optional | an integer | Defines a delay (in milliseconds) between checking links. You can set this property to a non-zero value to avoid flooding a server with HTTP requests. If not set, the pause time will be 0 (no delay between requests).
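
To make the roles of excludedLinks, rootUrl and validStatuses more concrete, here is a minimal sketch that makes no network requests. The class and method names are invented for the example. Matching an exclusion pattern anywhere in the URL and dropping a leading slash from relative links are both assumptions, not documented behaviour of the module.

```java
import java.util.List;
import java.util.regex.Pattern;

class LinkCheckDemo {
    // True if the URL matches any excludedLinks pattern and should be skipped.
    // Assumption: a pattern may match anywhere in the URL.
    static boolean isExcluded(String url, List<String> excludedLinks) {
        for (String regex : excludedLinks) {
            if (Pattern.compile(regex).matcher(url).find()) {
                return true;
            }
        }
        return false;
    }

    // Relative links are appended to rootUrl, which is why rootUrl should end with a slash.
    // Assumption: a leading slash on the relative link is dropped first.
    static String resolve(String rootUrl, String link) {
        if (link.startsWith("http://") || link.startsWith("https://")) {
            return link;
        }
        String relative = link.startsWith("/") ? link.substring(1) : link;
        return rootUrl + relative;
    }

    // A link is valid if its status is listed; with no list, only 200 counts.
    static boolean isValidStatus(int status, List<Integer> validStatuses) {
        List<Integer> allowed = validStatuses.isEmpty() ? List.of(200) : validStatuses;
        return allowed.contains(status);
    }

    public static void main(String[] args) {
        List<String> excludedLinks = List.of("/logout$", "^mailto:");
        System.out.println(isExcluded("https://example.com/logout", excludedLinks)); // prints: true
        System.out.println(resolve("https://example.com/", "about.html"));
        // prints: https://example.com/about.html
        System.out.println(isValidStatus(301, List.of())); // prints: false
    }
}
```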


MetaDescriptionAuditor

Class name: info.magnolia.services.seo.audit.impl.MetaDescriptionAuditor

MetaDescriptionAuditor checks a node for a property named "description" and if defined, checks the length of the value. 

Many search engines ignore long meta descriptions - usually all text after about 160 to 180 characters. MetaDescriptionAuditor can help you check pages for long meta descriptions.

Note: MetaDescriptionAuditor assumes that the node property "description" contains the meta description text.

Here's an example MetaDescriptionAuditor:

MetaDescriptionAuditor can be configured with the following properties, in addition to the standard properties above:

Property | Required/Optional | Allowed values | Notes
length | optional | an integer | Defines the maximum meta description length. Descriptions longer than the maximum fail with a warning. If no length is defined, the maximum length is 160.
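
The length rule itself is simple enough to sketch in a few lines. The names below are invented for the example, and letting a node without a description pass is an assumption, since the text only says the length is checked if the property is defined.

```java
class MetaDescriptionDemo {
    static final int DEFAULT_MAX_LENGTH = 160;

    // Returns true when the description is absent or within the limit.
    static boolean passes(String description, Integer maxLength) {
        if (description == null) {
            return true; // assumption: only defined descriptions are measured
        }
        int limit = (maxLength != null) ? maxLength : DEFAULT_MAX_LENGTH;
        return description.length() <= limit;
    }

    public static void main(String[] args) {
        String description = "x".repeat(161);
        System.out.println(passes(description, null)); // prints: false
        System.out.println(passes(description, 180));  // prints: true
    }
}
```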



ParagraphLengthAuditor

Class name: info.magnolia.services.seo.audit.impl.ParagraphLengthAuditor

ParagraphLengthAuditor checks the length in words of HTML elements containing text, not the overall length in characters. You can use ParagraphLengthAuditor to check for pages with overly long text blocks. 

ParagraphLengthAuditor can check any HTML element that contains text and that can be found by a jsoup query.

Here's an example ParagraphLengthAuditor:

ParagraphLengthAuditor can be configured with the following properties, in addition to the standard properties above:

Property | Required/Optional | Allowed values | Notes
level | required | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
auditProperty | required | a unique string | Defines the property name for storing failed audit results. The property name should be unique among auditors, or auditors may overwrite each other's results.
auditValue | required | a string | Defines a message or explanation for a failed audit. The message can have placeholders that are replaced with information about the node and auditor: {0} is the node path, {1} is the configured query property. Example: Oops! Couldn't find {1} in the page {0}!
query | optional | a string | A valid jsoup query. See https://jsoup.org/cookbook/extracting-data/selector-syntax for more on jsoup queries. If not specified, ParagraphLengthAuditor checks the text of "p" (paragraph) HTML elements.
maxWords | optional | an integer | Defines the maximum number of words allowed in the text block. If not specified, the limit is 150.
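
A word-based limit, as opposed to a character-based one, might be checked along these lines. This is only a sketch with invented names; splitting on whitespace is an assumption about how words are counted, and the module may count them differently.

```java
class ParagraphLengthDemo {
    static final int DEFAULT_MAX_WORDS = 150;

    // Counts whitespace-separated words in the text of one element.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Returns true when the text block stays within the word limit.
    static boolean passes(String text, Integer maxWords) {
        int limit = (maxWords != null) ? maxWords : DEFAULT_MAX_WORDS;
        return countWords(text) <= limit;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  The quick  brown fox ")); // prints: 4
        System.out.println(passes("one two three", 2));            // prints: false
    }
}
```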

PropertyDefinitionAuditor

Class name: info.magnolia.services.seo.audit.impl.PropertyDefinitionAuditor

PropertyDefinitionAuditor checks to see if a specified node property is defined. It doesn't check the value of the property; use PropertyValidationAuditor for that. 

You can use PropertyDefinitionAuditor to check for missing properties in pages or content nodes. You can use I18NPropertyDefinitionAuditor to check the definition of internationalized properties (properties with language variants).

Here's an example of a configured PropertyDefinitionAuditor:

PropertyDefinitionAuditor can be configured with the following properties, in addition to the standard properties above:


Property | Required/Optional | Allowed values | Notes
propertyName | required | a string | Defines the node property name to be checked.
level | optional | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).

PropertyValidationAuditor

Class name: info.magnolia.services.seo.audit.impl.PropertyValidationAuditor

PropertyValidationAuditor is a companion to I18NPropertyValidationAuditor. PropertyValidationAuditor validates the value of a designated property. If the property is internationalized (that is, the property has language variants), use I18NPropertyValidationAuditor to validate the values of the property.

Note that PropertyValidationAuditor just validates the property value and will not check that the property is defined. You can use PropertyDefinitionAuditor to check that the property is defined. 

Here's an example PropertyValidationAuditor: 

PropertyValidationAuditor can be configured with the following properties, in addition to the standard properties above: 

Property | Required/Optional | Allowed values | Notes
propertyName | required | a string | Defines the node property name to be checked.
level | optional | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
expectedLanguages | optional | a list of language codes or language plus country codes | expectedLanguages defines a list of the languages to be checked for property values. It can be a subset of the languages defined for the site. Languages not included in expectedLanguages will not be checked. expectedLanguages will be used only if validateAll is set to false.
valuePattern | required | a valid Java regular expression | Defines a validation pattern - a Java regular expression - to check the property value. If the property value does not match the value pattern, the audit will fail. See https://docs.oracle.com/javase/tutorial/essential/regex/ for more on Java regular expressions.

ValidHtmlAuditor

Class name: info.magnolia.services.seo.audit.impl.ValidHtmlAuditor

ValidHtmlAuditor checks the rendered HTML of a page for correctness using the W3C HTML validator (see https://validator.w3.org for more information). 

ValidHtmlAuditor will capture and save any HTML errors found by the W3C HTML validator; see https://validator.w3.org/docs/errors.html for more on HTML errors returned by the W3C validator.

Here's an example:

ValidHtmlAuditor can be configured with the following properties, in addition to the standard properties above:

Property | Required/Optional | Allowed values | Notes
level | required | auditErrors, auditWarnings, auditNotes | Determines how a failed audit will be counted: as an error (auditErrors), as a warning (auditWarnings) or as a note (auditNotes).
auditProperty | required | a unique string | Defines the property name for storing failed audit results. The property name should be unique among auditors, or auditors may overwrite each other's results.
auditValue | required | a string | Defines a message or explanation for a failed audit. The message can have placeholders that are replaced with information about the node and auditor: {0} is the node path, {1} is the configured query property. Example: Oops! Couldn't find {1} in the page {0}!
strict | optional | true, false | Controls the level of validation done by the W3C HTML validator. If strict is set to true, the W3C HTML validator returns all errors, warnings and notes found. If strict is set to false, only HTML errors are returned. If not set, strict defaults to false.

7 Comments

  1. Nice app. looks similar to yoast seo from wordpress

  2. Where I can download jar file?

    1. It's only available as enterprise module within the nexus repository

  3. Hi, are you planning to update SEO module with new Vaadin version? It's impossible to startup Magnolia 5.6 with SEO module installed.

    1. SEO-9 created

  4. Yes, I certainly will! I might have to split off a 5.5 branch and a 5.6 branch to accommodate different Vaadin versions...

  5. Great, I will wait for it. Do you have any idea when upgrade will be available?