

This module is outdated. Newer and easier solutions are now available. Please have a look at

The Deadlink app is a community-developed app that reports broken links on a website, through the Magnolia CMS UI.

A long-standing client request in the Magnolia CMS 4 series was a dead link checker. With the release of 5.0, the first app to be developed by the community was the Deadlink app. The app was developed by Marvin Kerkhoff.

Version 1.0 (released Sep. '13) features list:

  • Based on Magnolia 5.0

  • See which webpages link to the broken pages (done)

  • Report page health (done)

  • Recursive Link Tracking (done)

  • Search HTML pages for resources like images and CSS (done)

  • Recurring scanning (done, via the scheduler module)

  • Tracks external pages (done) 

Version 1.1 (released Oct. '13) features list:

  • Based on Magnolia 5.1
  • Send report via email (open)
  • Send message to pulse (done)
  • Add i18n support (done)
  • Excel Export (done)
  • Codereview and Cleanup (done)
  • Make additional configuration options available, e.g. ignored links, scanned attributes (done)
  • Delete some ugly hacks (done) 
  • Support for HTTP authentication (done)
  • Version 1.1.1: Added some new translations and tweaked the config for better results
  • Version 1.1.5: Added some UI Improvements and switched the actions to command actions

Version 1.2 features list:

  • Make the scanning asynchronous for multiple scans (requires some changes in Magnolia 5.0; see MGNLUI-1901)
  • Implement a control like (should be part of the scheduler module)
  • Clean up messages for end users (open)

Install into an existing project

Jar files can be found on Magnolia Nexus. To add the app using module dependencies:
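The dependency entry might look like the following. The groupId, artifactId and version here are assumptions derived from the module's package name (de.marvinkerkhoff) and the versions mentioned on this page; check Nexus for the actual coordinates:

```xml
<!-- Hypothetical coordinates: verify the real groupId/artifactId/version on Magnolia Nexus -->
<dependency>
  <groupId>de.marvinkerkhoff</groupId>
  <artifactId>magnolia-deadlink-app</artifactId>
  <version>1.1.6</version>
</dependency>
```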


For more on module dependencies, see:

How does it work?

You can add an external URL in the browser app and choose whether you want to crawl only one page or linked pages as well. After this you can start the crawl.

After the crawler has finished scanning the website, you can see the results in a report. At the top you see the page health, the number of scanned links, the scan time and the start time.

On the left side you see a table of all links found on the website. On the right side you see the pages where the selected link is used.

You can also export your report to a common format like Excel:


E-Mail Configuration

After you have added your SMTP configuration to Magnolia, you only need to set SMTPOutEmail in the Deadlink app module configuration and then enter your email address in the report dialog.

Basic-Auth Configuration

You can configure a Basic Auth username and password for each report.

End User Configuration

  • SMTPOutEmail: FROM address for outgoing mails (String)
  • ignoredLinks: which links should be ignored by the checker; default mailto:,tel:,javascript:,# (String)
  • poolsize: number of parallel threads; default 10 (Long)
  • false = deep check by downloading the complete page; true = shallow check via HTTP HEAD request (see also:
  • resultWidth: length of the short URL displayed in the table of results; default 35 (Long)
  • timeout: determines the timeout in milliseconds until a connection is established; default 2000 (Long)
  • userAgent: user agent which will be used to request URLs from the server; you can also define a mobile agent here; default Mozilla/5.0 (String)
  • maxScannedPages: how many pages should be scanned in depth; default 5000 (Long)
  • proxy: (optional) e.g. if you need a special proxy setting; default null (String)
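As an illustration of the resultWidth setting: the shortened display URL could be produced along these lines. This is a hypothetical sketch, not the module's actual code:

```java
public class LinkShortener {

    /**
     * Truncates a URL for display, like the resultWidth setting (default 35).
     * URLs that already fit are returned unchanged; longer ones are cut and
     * marked with "...".
     */
    static String shorten(String url, int resultWidth) {
        if (url.length() <= resultWidth) {
            return url;
        }
        return url.substring(0, resultWidth) + "...";
    }
}
```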

Scheduled Reports

If you want to add a scheduled report, please add a report in the Deadlink app and configure a job like the following in the Magnolia Scheduler module:
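A job entry in the Scheduler module's configuration might look roughly like this. The node path follows the Scheduler module's usual layout, but the catalog, command and parameter names below are assumptions; check the Deadlink module's command catalog for the real ones:

```
/modules/scheduler/config/jobs/deadlinkScan
    active      true
    cron        0 0 3 * * ?     (every night at 03:00)
    catalog     deadlink        (assumed catalog name)
    command     scan            (assumed command name)
    params
        reportName  myReport    (assumed parameter)
```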

Tracking Pixel

The implementation does not interpret JavaScript at the moment. But if you use another tracking method, for example tracking pixels, make sure you filter out your IP address before you scan, so the crawler's requests don't show up in your statistics.

About the implementation

The app is inspired by the linkchecker project from Swapnil Sapar. It uses the HttpClient component to request an HTML document and scans it, multithreaded, for href and src attributes. I've added some additional features to scan the related HTML pages and save the results in a performant JCR structure. To avoid scanning external pages, pages are skipped if the hostname is not the same. Anchor, mailto:, javascript: and tel: links are ignored by the crawler.
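The filtering rules above can be sketched like this. This is a hypothetical illustration of the described behaviour, not the module's actual code:

```java
import java.net.URI;
import java.util.List;

public class LinkFilter {

    // Mirrors the default "ignoredLinks" configuration: mailto:, tel:, javascript:, #
    static final List<String> IGNORED_PREFIXES =
            List.of("mailto:", "tel:", "javascript:", "#");

    /** Anchors, mailto:, javascript: and tel: links are skipped entirely. */
    static boolean isIgnored(String link) {
        String l = link.trim().toLowerCase();
        return IGNORED_PREFIXES.stream().anyMatch(l::startsWith);
    }

    /** Pages with a different hostname are checked but not crawled deeper. */
    static boolean isExternal(String baseUrl, String link) {
        String baseHost = URI.create(baseUrl).getHost();
        String linkHost = URI.create(link).getHost();
        // Relative links have no host and are treated as internal.
        return linkHost != null && !linkHost.equalsIgnoreCase(baseHost);
    }
}
```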

The following details are saved for every link:



  • A shortened URL, based on the configured resultWidth
  • url: the full URL
  • type: the tag name of the link, e.g. a, img or javascript
  • status: "Good" or "Broken"
  • contentType: content type of the scanned link
  • contentLength: content length of the scanned link
  • scanTime: scan time for the link
  • verifiedThread: name of the thread which scanned the link
  • external: if true, deeper scanning is skipped
  • notscanned: false once the link has been scanned

Every link node can be the parent of several linkUsage nodes, which save the following information:



  • A shortened URL, based on the configured resultWidth
  • url: the full URL of the parent page
  • pageTitle: title of the parent page
  • caption: the link text or alt attribute

 Have fun with my second app, the Form2DB App


  1. Mock-up, or did you get this implemented? If mock-up, yupp, a link checker is on the radar, but if implemented, can you provide more info?

  2. It's already implemented. But I think we need a recurring-scanning functionality, because at the moment it is not possible to make an action asynchronous in Magnolia 5. Scanning a complete website can be a very long task.

  3. Want to share the config and implementation process in more detail? That's very cool. Link checker has been a frequent q. Excellent stuff. If I have missed something here (pages etc.), apologies. Would make a nice tutorial.


    1. I can share detailed documentation. But first I will finish the scheduler stuff. Do you know if there is a preferred scheduler lib? I think I saw the Quartz scheduler in older Magnolia versions.

      1. True, the Magnolia scheduler module actually uses Quartz.

        1. The command for adding a Magnolia Scheduler job is now also implemented. In the next versions I will try to add a user interface where you can add recurring scheduler tasks very easily. But for the moment this works fine with the scheduler module.

    2. Please let me know if you are interested in more details, and what you need. (wink) 

  4. Marvin Kerkhoff, where can this be downloaded? Does it need to be installed on author as well, or only on public, and how/where would one use it? Specifically, I assume that one needs to install this on public for the link checker to make sense, but then how do authors work with it? Or is there communication between the app on author and its "execution service" on public? Just asking to point out where things would benefit from clarification.

    1. Hi Boris, thanks for your interest. You can download it from your Nexus instance; you will find the name in the "Install" section described above.

      You can use it entirely on the author instance; you don't need to install it on a public instance. The module scans external pages, so there is "at the moment" no dependency between the page tree and your scan report. I am thinking about a service command that would check the currently activated page on the public instance, so you'd get feedback for your workflow!

      If you want to install it for testing, I guess it would be good to wait until Monday, 24 October 2013. I will release version 1.1 this weekend.

      1. Marvin

        many thanks. Happily waiting until Monday then! (I assume you meant Monday Oct 28)

  5. Hi Marvin. Apologies - was just in the middle of updating the doc. A quick q: I'd like to credit this to you in the Academy. Can you confirm that this is a personal project? I remember you demoing it to me during the conference. See: Pulling it into the Academy will guarantee a much wider audience. You can still edit this page etc., as I would just be using a special macro to reference your original, so as to avoid duplication of content. See also, comments on: Same page pulled into Academy:

    1. Hey Gavan,

      Yes, it's a personal project. Please also note that I am Marvin, not Martin.



  6. Hi Marvin. Change of plan; instead of pulling the page into the Academy, I will link to it. This is a more subtle approach. Casual Academy visitors might take the community apps as 'official'.

  7. Tried to install to a Magnolia 5.5.2 with



    It fails with:


    2014-03-03 15:20:03,818 ERROR org.apache.commons.digester.Digester : Body event threw exception

    org.apache.commons.beanutils.ConversionException: Error converting from 'String' to 'Class' de/marvinkerkhoff/setup/DeadlinkVersionHandler : Unsupported major.minor version 51.0

     at org.apache.commons.beanutils.converters.AbstractConverter.handleError(...)
     at org.apache.commons.beanutils.converters.AbstractConverter.convert(...)
     at org.apache.commons.betwixt.strategy.ConvertUtilsObjectStringConverter.stringToObject(...)
     at org.apache.commons.digester.Digester.endElement(...)
     at info.magnolia.module.model.reader.BetwixtModuleDefinitionReader.readFromResource(...)
     at info.magnolia.module.model.reader.BetwixtModuleDefinitionReader.readAll(...)
     at info.magnolia.module.ModuleManagerImpl.loadDefinitions(...)
     at info.magnolia.init.MagnoliaServletContextListener.contextInitialized(...)
     ...

    Caused by: java.lang.UnsupportedClassVersionError: de/marvinkerkhoff/setup/DeadlinkVersionHandler : Unsupported major.minor version 51.0

     at java.lang.ClassLoader.defineClass1(Native Method)
     at org.apache.catalina.loader.WebappClassLoader.findClass(...)
     at org.apache.catalina.loader.WebappClassLoader.loadClass(...)
     at org.apache.commons.beanutils.converters.ClassConverter.convertToType(...)
     at org.apache.commons.beanutils.converters.AbstractConverter.convert(...)

  8. Tom Wespi: That's a Java class version error - "Unsupported major.minor version 51.0" means the module was compiled for Java 7, so your JVM is too old; run Magnolia on Java 7 or newer.

    Marvin Kerkhoff: Thanks for this, it's awesome.

    Some comments:

    1) The Skull Icon isn't showing up for me (sad) That was one of the main reasons to install this module. Even if it didn't work at all I wanted to install it just to have a skull icon in my app launcher. And now it's not showing up! Boo!

    2) If you click "show report" before running "scan" at least once, you get an exception...

    We're testing right now to see if we can install it for our customers. I'll let you know how it goes.

    Thanks and regards from Vienna,


    1. Hi Richard,

      Yeah, the skull icon. I would love to have it, but it is not among the standard icons of Magnolia's icon font. It was only loaded once you clicked on the app, because only then was my CSS loaded. So I removed it for the moment, until Magnolia adds this icon to the standard icon list (wink)

      2) Thanks for the bug report. I will try to figure out the reason.

  9. Hi Marvin,

    Here's the next issue... (do you want me to post these here, or is there a better place to file issues? Does DeadLink have a JIRA project somewhere?)

    While scanning I was getting tons of errors like:

    java.lang.NoClassDefFoundError: org/apache/http/impl/client/HttpClientBuilder

    Looking into this:

    Deadlink depends on httpclient 4.3. In my install this is being omitted due to a conflict with httpclient 4.2.1, which is a dependency of the groovy module, and (indirectly) of the rest-integration module...

    In my tests, adding an <exclusion> where I include the groovy dependency works: httpclient 4.3 is then pulled in and link checking works. I still need to test the groovy module to make sure it runs fine with 4.3; I'll give feedback on how that works out. But maybe Magnolia could upgrade to httpclient 4.3, or you could downgrade to 4.2.1; otherwise everyone installing either the groovy or the rest-integration module alongside the deadlink module will have the same problem.
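    For reference, the workaround looks like this in a project POM. The groovy module's coordinates below are assumptions; adjust them to the actual dependency entry in your POM:

    ```xml
    <!-- Hypothetical coordinates: exclude the older httpclient pulled in by the groovy module -->
    <dependency>
      <groupId>info.magnolia</groupId>
      <artifactId>magnolia-module-groovy</artifactId>
      <exclusions>
        <exclusion>
          <groupId>org.apache.httpcomponents</groupId>
          <artifactId>httpclient</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    ```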



    Note in case someone is looking for proxy settings: setting the proxy via VM properties (e.g. -Dhttp.proxyPort=8080) also works.

  11. Hi Richard,

    there is a Jira project in the Magnolia Jira. I am using some of the relevant functions in the crawler app; they would need to be changed if I downgrade to 4.2.1. But in the end it would not be a big change.

    Please post it as a Jira ticket.

    1. Done: DEAD-4

  12. Ok, it's all working really well now. A really impressive module, nice work!

    I have the following suggestions to really make it production-ready:

    1) Strip the fragment from links before checking them (no need to check the fragment unless you're going to check for the presence of <a name="anchor"> in the target page).

    2) At the moment it is checking a lot of redundant links. If I run a report on a website it rechecks, for example, each navigation link for every page. That doesn't make much sense: we can assume that if the link worked 30 seconds ago, when I checked it for the parent page, it is still working now when I want to check it for the child page. Solution: after stripping the fragment part of the URL, store each checked URL in a big hashmap and only perform checks for new URLs.
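    The suggested dedup could be sketched like this; a minimal illustration, not a patch against the module's code:

    ```java
    import java.util.HashSet;
    import java.util.Set;

    public class CheckedUrls {

        private final Set<String> seen = new HashSet<>();

        /** Removes the fragment, e.g. "page.html#section" -> "page.html". */
        static String stripFragment(String url) {
            int hash = url.indexOf('#');
            return hash < 0 ? url : url.substring(0, hash);
        }

        /** Returns true only the first time a (fragment-stripped) URL is seen. */
        boolean needsCheck(String url) {
            return seen.add(stripFragment(url));
        }
    }
    ```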


    1. Added

      DEAD-5, DEAD-6 and DEAD-7

      DEAD-7 is a nasty one.

      1. 2 of 4 are fixed; have a look at version 1.1.6.