Automating Your Accessibility Tests

Seren Davies

Accessibility is one of those things we all wish we were better at. It can lead to a bunch of questions like: how do we make our site better? How do we test what we have done? Should we spend time each day going through our site to check everything by hand? Or just hope that everyone on our team has remembered to check their changes are accessible?

This is where automated accessibility tests can come in. We can set up automated tests and have them run whenever someone makes a pull request, and even alongside end-to-end tests, too.

Automated tests can’t cover everything, however: only 20 to 50% of accessibility issues can be detected automatically. For example, we can’t yet automate the comparison of an alt attribute with an image’s content, and there are some screen reader tests that need to be carried out by hand too. To ensure our site is as accessible as possible, we will still need to carry out manual tests, and I will cover these later.

First, I’m going to explain how I implemented automated accessibility tests on Elsevier’s ecommerce pages, and share some of the lessons I learnt along the way.

Picking the right tool

One of the hardest, but most important, parts of creating our automated accessibility tests was choosing the right tool.

We began by investigating aXe CLI, but soon realised it wouldn’t fit our requirements. It couldn’t check pages that required a visitor to log in, so while we could test our product pages, we couldn’t test any customer account pages. Instead we moved over to Pa11y. Its beforeScript step meant we could log into the site and test pages such as the order history.

The example below shows how the beforeScript step completes a login form and then waits for the login to complete before testing the page:

beforeScript: function(page, options, next) {
  // An example function that can be used to make sure changes have been confirmed before continuing to run Pa11y
  function waitUntil(condition, retries, waitOver) {
    page.evaluate(condition, function(err, result) {
      if (result || retries < 1) {
        // Once the changes have taken place continue with Pa11y testing
        waitOver();
      } else {
        retries -= 1;
        setTimeout(function() {
          waitUntil(condition, retries, waitOver);
        }, 200);
      }
    });
  }

  // The script to manipulate the page must be run with page.evaluate to be run within the context of the page
  page.evaluate(function() {
    const user = document.querySelector('#login-form input[name="email"]');
    const password = document.querySelector('#login-form input[name="password"]');
    const submit = document.querySelector('#login-form input[name="submit"]');
    user.value = 'user@example.com';
    password.value = 'password';
    submit.click();
  }, function() {
    // Use the waitUntil function to set the condition, number of retries and the callback
    waitUntil(function() {
      return window.location.href === 'https://example.com';
    }, 20, next);
  });
}

The waitUntil callback allows the test to be delayed until our test user is successfully logged in.

Another thing to consider when picking a tool is the type of error messages it produces. aXe groups all elements with the same error together, so the list of issues is a lot easier to read, and it’s easier to identify the most common problems. For example, here are some elements that have insufficient colour contrast:

Violation of "color-contrast" with 8 occurrences!
Ensures the contrast between foreground and background colors meets
WCAG 2 AA contrast ratio thresholds. Correct invalid elements at:
  - #maincontent > .make_your_mark > div:nth-child(2) > p > span > span
  - #maincontent > .make_your_mark > div:nth-child(4) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(2) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(4) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(6) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(8) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(10) > p > span > span
  - #maincontent > .inform_your_decisions > div:nth-child(12) > p > span > span
For details, see: https://dequeuniversity.com/rules/axe/2.5/color-contrast

aXe also provides links to their site where they discuss the best way to fix the problem.

In comparison, Pa11y lists each individual error, which can lead to a very verbose list. However, it does provide helpful suggestions for how to fix problems, such as suggesting an alternative shade of a colour to use:

• Error: This element has insufficient contrast at this conformance level.
  Expected a contrast ratio of at least 4.5:1, but text in this element has a contrast ratio of 2.96:1.
  Recommendation: change text colour to #767676.
  ⎣ WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail
  ⎣ #maincontent > div:nth-child(10) > div:nth-child(8) > p > span > span
  ⎣ <span style="color:#969696">Featured products:</span>
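To make the numbers in that message less mysterious, here is a minimal sketch of the WCAG 2 contrast ratio calculation. It assumes the text sits on a plain white background (the Pa11y output above doesn’t say what the background is), and the function names are my own for illustration, not part of Pa11y. It only handles six-digit hex colours.

```javascript
// Convert one 8-bit sRGB channel to its linear value, as defined by WCAG 2.
function linearise(channel) {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a six-digit hex colour such as '#969696'.
function relativeLuminance(hex) {
  const value = parseInt(hex.replace('#', ''), 16);
  const r = (value >> 16) & 0xff;
  const g = (value >> 8) & 0xff;
  const b = value & 0xff;
  return 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b);
}

// Contrast ratio between two colours: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(foreground, background) {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// Assumption: the background is white.
console.log(contrastRatio('#969696', '#ffffff').toFixed(2)); // 2.96
console.log(contrastRatio('#767676', '#ffffff').toFixed(2)); // 4.54
```

Under that assumption, the original grey comes out at 2.96:1, matching the error message, while the recommended #767676 just clears the 4.5:1 threshold.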

Integrating into our build pipeline

We decided the perfect time to run our accessibility tests would be alongside our end-to-end tests. We have a Jenkins job that detects changes to our staging site and then triggers the end-to-end tests, and in turn our accessibility tests. Our Jenkins job retrieves the contents of a GitHub repository containing our Pa11y script file and npm package manifest.

Once Jenkins has cloned the repository, it installs any dependencies and executes the tests via:

npm install && npm test

Bundling the URLs to be tested into our test script means we don’t need a command line style test where we list each URL we wish to test in the Jenkins CLI. Listing URLs on the command line is an effective method, but it can become cluttered and obscure which URLs are being tested.

In the middle of the office we have a monitor displaying a Jenkins dashboard and from this we can see if the accessibility tests are passing or failing. Everyone in the team has access to the Jenkins logs and when the build fails they can see why and fix the issue.

Fixing the issues

As mentioned earlier, Pa11y can generate a long list of areas for improvement which can be very verbose and quite overwhelming. I recommend going through the list to see which issues occur most frequently and fix those first. For example, we initially had a lot of errors around colour contrast, and one shade of grey in particular. By making this colour darker, the number of errors decreased, and we could focus on the remaining issues.
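One way to find the most frequent issues is to count results by their rule code. The sketch below assumes each result is an object with a code property, as in the Pa11y output shown earlier; the countByCode function and the sample data are hypothetical and only there for illustration.

```javascript
// Count how often each rule code appears and return [code, count] pairs,
// most frequent first.
function countByCode(results) {
  const counts = new Map();
  for (const result of results) {
    counts.set(result.code, (counts.get(result.code) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample data in the shape of Pa11y results.
const sampleResults = [
  { code: 'WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail', type: 'error' },
  { code: 'WCAG2AA.Principle1.Guideline1_1.1_1_1.H37', type: 'error' },
  { code: 'WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail', type: 'error' },
  { code: 'WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail', type: 'error' }
];

for (const [code, count] of countByCode(sampleResults)) {
  console.log(count, code);
}
```

With this sample, the colour contrast rule is listed first with three occurrences, which is the kind of signal that told us to start with our shade of grey.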

Another thing I like to do is to tackle the quick fixes, such as adding alt text to images. These are small things that allow us to make an impact instantly, giving us time to fix more detailed concerns such as addressing tabindex issues, or speaking to our designers about changing the contrast of elements on the site.

Manual testing

Adding automated tests to check our site for accessibility is great, but as I mentioned earlier, this can only cover 20-50% of potential issues. To improve on this, we need to test by hand too, either by ourselves or by asking others.

One way we can test our site is to throw our mouse or trackpad away and interact with the site using only a keyboard. This allows us to check items such as tab order, and ensure menu items, buttons etc. can be used without a mouse. The commands may be different on different operating systems, but there are some great guides online for learning more about these.

It’s tempting to add alt text and aria-labels to make errors go away, but if they don’t make any sense, what use are they really? Using a screen reader we can check that alt text accurately represents the image. This is also a great way to double check that our ARIA roles make sense, and that they correctly identify elements and how to interact with them. When testing our site with screen readers, it’s important to remember that not all screen readers are the same and some may interact with our site differently to others.

Consider asking a range of people with different needs and abilities to test your site, too. People experience the web in numerous ways, and their needs may be permanent, temporary or even situational. They may interact with your site in ways you hadn’t even thought about, so this is a good way to broaden your knowledge and awareness.

Tips and tricks

One of our main issues with Pa11y is that it may find issues we don’t have the power to solve. A perfect example of this is the one pixel image Facebook injects into our site. So, we wrote a small function to go through such errors and ignore the ones that we cannot fix.

const test = pa11y({
  ....
  hideElements: '#ratings, #js-bigsearch',
  ...
});

const ignoreErrors: string[] = [
  '<img src="https://books.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" border="0" style="cursor: pointer;" class="lightbox-is-image">',
  '<script type="text/javascript" id="">var USI_orderID=google_tag_mana...</script>',
  '<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=123456789012345&ev=PageView&noscript=1">'
];
const filterResult = result => {
  if (ignoreErrors.indexOf(result.context) > -1) {
    return false;
  }
  return true;
};
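The filter above only matches when the context string is identical, so it would stop working if, say, the ID in the Facebook pixel URL changed. A possible variation, and not what we ran, is to match on a stable fragment of the markup instead. The ignoreFragments list, the filterByFragment function and the sample data below are all hypothetical.

```javascript
// Hypothetical fragments identifying third-party markup we cannot change.
const ignoreFragments = [
  'books.google.com/intl/en/googlebooks/images/',
  'facebook.com/tr?'
];

// Keep a result unless its context contains one of the fragments.
const filterByFragment = result =>
  !ignoreFragments.some(fragment => (result.context || '').includes(fragment));

// Hypothetical sample results: one third-party pixel, one issue of our own.
const sample = [
  { context: '<img height="1" width="1" style="display:none" src="https://www.facebook.com/tr?id=123456789012345&ev=PageView&noscript=1">' },
  { context: '<span style="color:#969696">Featured products:</span>' }
];

console.log(sample.filter(filterByFragment).length); // 1
```

The trade-off is that a loose fragment could hide an issue we can fix, so the fragments should be as specific as the third-party markup allows.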

Initially we wanted to focus on fixing the major problems, so we added a rule to ignore notices and warnings. This made the list of errors much smaller and allowed us to focus on fixing major issues such as colour contrast and missing alt text. The ignored notices and warnings can be added in later after these larger issues have been resolved.

const test = pa11y({
  ignore: [
    'notice',
    'warning'
  ],
  ...
});

Jenkins gotchas

While using Jenkins we encountered a few problems. Sometimes Jenkins would indicate a build had passed when in reality it had failed. This was because Pa11y had timed out due to PhantomJS throwing an error, or the test didn’t go past the first URL. Pa11y has recently released a new beta version that uses headless Chrome instead of PhantomJS, so hopefully these issues will occur less often.

We tried a few approaches to solve these issues. First we added error handling, iterating over the array of test URLs so that if an unexpected error happened, we could catch it and exit the process with an error indicating that the job had failed (using process.exit(1)).

for (const url of urls) {
  try {
    console.log(url);
    let urlResult = await run(url);
    urlResult = urlResult.filter(filterResult);
    urlResult.forEach(result => console.log(result));
  }
  catch (e) {
    console.log('Error:', e);
    process.exit(1);
  }
}
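The loop above uses await, so it has to live inside an async function. Here is a minimal sketch of how it might be wrapped, returning an exit code instead of calling process.exit directly so it is easier to test. The runAll name and the fakeRun stub are hypothetical, and treating any remaining result as a failure is my assumption here rather than something the snippet above does.

```javascript
// Run every URL in turn and resolve to the exit code the process should use.
// `run` is any function that takes a URL and resolves to an array of results.
async function runAll(urls, run, filterResult) {
  let issueCount = 0;
  for (const url of urls) {
    try {
      const results = (await run(url)).filter(filterResult);
      results.forEach(result => console.log(url, result.code));
      issueCount += results.length;
    } catch (e) {
      // An unexpected error (such as a timeout) fails the job straight away.
      console.log('Error:', e);
      return 1;
    }
  }
  // Assumption: any issue left after filtering should fail the build.
  return issueCount > 0 ? 1 : 0;
}

// Hypothetical stub standing in for the real Pa11y call.
const fakeRun = async url => {
  if (url.includes('broken')) {
    throw new Error('timed out');
  }
  return [{ code: 'WCAG2AA.Principle1.Guideline1_4.1_4_3.G18.Fail', context: '<span>' }];
};

// Example wiring in a real script:
// runAll(urls, run, filterResult).then(code => process.exit(code));
```

Passing run and filterResult in as arguments means the same function can be exercised with a stub, without needing a browser or a live site.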

We also had issues with unhandled rejections sometimes caused by a session disconnecting or similar errors. To avoid Jenkins indicating our site was passing with 100% accessibility, when in reality it had not executed any tests, we instructed Jenkins to fail the job when an unhandled rejection or uncaught exception occurred:

process.on('unhandledRejection', (reason, p) => {
  console.log('Unhandled Rejection at:', p, 'reason:', reason);
  process.exit(1);
});
process.on('uncaughtException', (err) => {
  console.log(`Caught exception: ${err}\n`);
  process.exit(1);
});

Now it’s your turn

That’s it! That’s how we automated accessibility testing for Elsevier ecommerce pages, allowing us to improve our site and make it more accessible for everyone. I hope our experience can help you automate accessibility tests on your own site, and bring the web a step closer to being accessible to all.

Seren is a Software Engineer at Elsevier. In between working and promoting accessibility she can be found doing the occasional bit of nail art.