Mainstreaming Accessibility: A Progress Report

Although nobody argues that publications and the systems that deliver and render them shouldn’t be accessible to everybody, the issue of accessibility has often been more a source of guilt than action. Publishers have treated it as something they know they should get around to, someday. That someday doesn’t come until the lack of accessibility causes them to lose sales—which rarely happens (that they’re aware of). And institutions—for example, libraries—typically consider themselves at the mercy of the publishers. They can only provide what they can get.

Meanwhile, on college campuses, where this is a big issue (most are mandated to provide an accessible version of all the course material a print-disabled student needs), the Disability Services Offices diligently remediate (aka “fix”) whatever they can get: rarely an EPUB, often a PDF, and sometimes just the print. All of this mostly manual repair work is expensive and time consuming, and it can be repeated for the very same book or journal article at campuses all across the country. It is a pathetic situation.

The good news is that this situation is slowly but surely changing. More and more publications and systems are designed for accessibility. And although it still requires some extra work to implement those accessibility features properly, that work is diminishing as accessibility becomes embedded in standard specifications and workflows.

Converging Standards

One of the biggest factors enabling this is that the file formats and other standards for accessibility are ones that publishers and their partners and vendors use anyway. There is still the misconception that an accessible publication has to be a special version, one which not only has to be made, but managed, separately from the regular version.

This used to be the case. For example, the DAISY Consortium, the global authority on accessibility, promulgated an XML specification called DTBook that was designed for assistive technology. But hardly anybody outside of K-12 education used it (although it’s still the basis of the NIMAS standard for that type of content). Few publishers or vendors understood it, and it required a separate workflow and distribution process.

Now, DAISY specifies EPUB 3 as the proper format for the interchange of accessible publications. EPUB is the standard format for ebooks, even for Amazon (although Amazon does have its own proprietary format, the proper format in which to provide an ebook to Amazon is EPUB). Most publishers and their vendors create EPUBs routinely.

Best of all, EPUB itself is not a “special format”; it’s based entirely on web standards. The content documents in an EPUB—for example, each chapter of a book—are tagged as HTML. (The EPUB spec currently requires expressing this HTML as XML; the markup is all HTML.) Virtually all of the other aspects of the EPUB specification are web standards as well. The standards for accessible publications and for ebooks have converged around web standards.

One of the most important of those is the Web Content Accessibility Guidelines (WCAG). WCAG sets out the basic principles of accessibility (content should be perceivable, operable, and understandable by everybody, and robust so that it does not become inaccessible over time) and then drills down into specific requirements accompanied by “success criteria” that let you know when you’ve got things right.

WCAG has become the foundation for almost all accessibility specifications and requirements globally. This is another example of the convergence. Just as accessible publications have moved from specialized formats to the widely used EPUB, accessibility laws and regulations that used to differ significantly country by country or region by region are now all generally based on WCAG. For example, the most widely cited accessibility regulation in the United States, Section 508, was updated in 2018 (the so-called “Section 508 Refresh”) to be based on and aligned with WCAG, which has the benefit of harmonizing it with corresponding standards like the EU’s EN 301 549.

To avoid the perfect being the enemy of the good, WCAG provides three levels of conformance, A, AA, and AAA, designating increasing levels of accessibility. Level A is pretty much a no-brainer; most websites and publications should have little trouble achieving it. The EPUB Accessibility 1.0 specification aims generally for AA conformance, recognizing that although AAA is the ideal, there are aspects of the AAA level that aren’t currently realistic for publishers to accomplish. The EPUB Accessibility specification also provides metadata, developed in concert with schema.org, to describe the nature and extent of accessibility features in a publication.
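
To make the metadata point concrete, here is a minimal Python sketch of reading schema.org accessibility properties out of an EPUB package document. The package fragment is hypothetical and written only for this illustration (a real package document carries other required metadata too), and the function name accessibility_metadata is an assumption, not part of any specification or library.

```python
import xml.etree.ElementTree as ET

# Hypothetical package-document fragment, written for this illustration only.
OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="schema:accessMode">textual</meta>
    <meta property="schema:accessMode">visual</meta>
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:accessibilityFeature">MathML</meta>
    <meta property="schema:accessibilityHazard">none</meta>
    <meta property="schema:accessibilitySummary">Images are described; math is MathML.</meta>
  </metadata>
</package>"""

OPF_NS = "{http://www.idpf.org/2007/opf}"


def accessibility_metadata(opf_xml):
    """Collect schema.org accessibility properties from an OPF string."""
    found = {}
    root = ET.fromstring(opf_xml)
    for meta in root.iter(OPF_NS + "meta"):
        prop = meta.get("property", "")
        # Matches accessMode, accessibilityFeature, accessibilityHazard, etc.
        if prop.startswith("schema:access"):
            name = prop[len("schema:"):]
            found.setdefault(name, []).append((meta.text or "").strip())
    return found


if __name__ == "__main__":
    for name, values in accessibility_metadata(OPF).items():
        print(name, "=", ", ".join(values))
```

A library or retailer could use a listing like this to tell a reader, before download, whether a title has image descriptions or MathML.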

The Catch: You Have to Use the Standards Correctly

When I wax ecstatic about this convergence, people often take me to mean that if you use HTML and EPUB you’ve got accessible content. Not so fast. I have to point out that these standards don’t guarantee accessibility: they guarantee accessibilityability. They enable, and facilitate, the creation of accessible content, but you must use them properly.

In my consulting work I often see websites and EPUBs that their publishers think are wonderful, but which fail miserably regarding accessibility. The best (or worst) example is a set of EPUBs I got from a very well respected press (I won’t name them) that sent all their books to a single conversion vendor, a very well regarded one (I won’t name them either), to create their EPUBs because that vendor did the best job. But when I cracked open the example EPUBs I was horrified to see that all the inherent accessibility in them had been undermined.

Assistive technology (AT), like a screen reader, is guided by the HTML markup in a website or EPUB. That markup has inherent structural semantics. For example, AT expects to find <p>s for paragraphs in <section>s, and a user can understand which sections are subsections of other sections guided by the headings <h1>–<h6>.

Those presumably fine EPUBs had no structural semantic markup whatsoever, not even a <section> or a <p>. Instead, everything was tagged with the semantically neutral <div>, with proprietary attributes to distinguish the components—paragraphs, headings, lists, everything. Yet the publisher thought they were excellent because the EPUBs looked like the print book. And those EPUBs passed EPUBCheck because <div> is a valid tag in HTML. Those EPUBs weren’t invalid, they were just really bad.
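
The difference is easy to see side by side. The following Python sketch counts how much of a content document uses structural elements versus the semantically neutral <div> and <span>. Both sample fragments are invented for this illustration (the class attributes stand in for the vendor’s proprietary attributes, which I am not reproducing), and the list of “semantic” elements is my own rough selection, not an official one.

```python
from collections import Counter
from html.parser import HTMLParser

# A rough, illustrative selection of elements that carry structural meaning.
SEMANTIC = {"section", "article", "aside", "nav", "p", "ul", "ol", "li",
            "h1", "h2", "h3", "h4", "h5", "h6", "table", "figure", "blockquote"}
NEUTRAL = {"div", "span"}


class TagCounter(HTMLParser):
    """Counts every start tag seen in the markup."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def semantic_ratio(markup):
    """Share of structural tags among structural plus neutral tags."""
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    semantic = sum(n for t, n in parser.counts.items() if t in SEMANTIC)
    neutral = sum(n for t, n in parser.counts.items() if t in NEUTRAL)
    total = semantic + neutral
    return semantic / total if total else 1.0


# Hypothetical fragments: the same content, tagged two ways.
DIV_SOUP = ('<div class="chap"><div class="h1">Title</div>'
            '<div class="para">Text.</div></div>')
SEMANTIC_VERSION = "<section><h1>Title</h1><p>Text.</p></section>"

if __name__ == "__main__":
    print(semantic_ratio(DIV_SOUP))          # 0.0
    print(semantic_ratio(SEMANTIC_VERSION))  # 1.0
```

Both fragments are valid HTML and can be styled to look identical, which is exactly why validation alone did not catch the problem.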

It Can’t Be That Easy!

Just getting your HTML markup right is pretty basic, but no, that’s not all there is to it. Here are some other issues that are important to make content accessible.

▪    HTML structural semantics are necessary but not sufficient. For example, plain HTML has no tags for “sidebar” or “footnote” or “chapter”; sidebars and footnotes are both <aside>s, and chapters are just <section>s. WAI-ARIA and DPUB-ARIA provide additional semantic distinctions to guide assistive technology, for example to identify a <section> as a chapter or an <aside> as a footnote.

▪    Math should be MathML. Although most math in scholarly and STM content is MathML upstream in the workflow, that MathML does not usually get incorporated into commercially distributed EPUBs because browsers and EPUB readers don’t always get it right. But AT needs it.

▪    Tables should be HTML tables, which is basic. But tables can get gnarly for assistive technology. One simple thing that helps a lot is to provide the <th> tag (designating a table header cell) for both column headings (which is typical) and row headings (which isn’t). That way, for a table consisting of a row for each state and things about the states in the columns, for the cell in row 22 and column three, AT can say “Michigan” and “population” rather than “row 22” and “population.” Avoiding merged cells is another thing; that is something that almost always has to be fixed in remediation.

▪    Here’s one you know already: images need descriptions. This is usually the missing piece even in otherwise pretty accessible HTML or EPUBs. This includes, but is more than, simply alt text. Decorative images should not have alt text; the “null alt” (alt="") is actually correct there. Alt text should not just repeat a caption (a common mistake), and the null alt is also okay if the image is fully described in the content. The biggest challenge: images that convey a lot of information need an extended description, in addition to the brief alt text. It takes time to learn how to do this right.
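
Some of the items above can be checked mechanically. Here is a minimal Python sketch, under my own assumptions, that flags two of them: an <img> with no alt attribute at all (a null alt is accepted), and a table body row that does not begin with a <th> row header. The sample markup is hypothetical, uses placeholder cell values, and also shows DPUB-ARIA roles on the <section> and <aside>. The class and function names are invented for illustration, and whether an alt text is actually good still takes a human.

```python
from html.parser import HTMLParser


class AccessibilityAudit(HTMLParser):
    """Flags <img> without an alt attribute and table body rows
    that do not start with a <th> row header."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.in_tbody = False
        self.row_cells = None  # cell tags seen in the current body row

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "alt" not in attrs:
            # alt="" is fine for decorative images; a missing attribute is not.
            self.problems.append(f"img without alt: {attrs.get('src', '?')}")
        elif tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.row_cells = []
        elif tag in ("th", "td") and self.row_cells is not None:
            self.row_cells.append(tag)

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "tr" and self.row_cells is not None:
            if self.row_cells and self.row_cells[0] != "th":
                self.problems.append("table row without a row header")
            self.row_cells = None


def audit(markup):
    parser = AccessibilityAudit()
    parser.feed(markup)
    parser.close()
    return parser.problems


# Hypothetical content document fragment with placeholder cell values.
SAMPLE = """
<section role="doc-chapter">
  <h1>State Data</h1>
  <img src="ornament.png" alt=""/>
  <img src="chart.png"/>
  <table>
    <thead><tr><th scope="col">State</th><th scope="col">Population</th></tr></thead>
    <tbody>
      <tr><th scope="row">Michigan</th><td>(value)</td></tr>
      <tr><td>Ohio</td><td>(value)</td></tr>
    </tbody>
  </table>
  <aside role="doc-footnote"><p>A note.</p></aside>
</section>
"""

if __name__ == "__main__":
    for problem in audit(SAMPLE):
        print(problem)
```

On the sample, the decorative ornament passes because of its null alt, the chart is flagged, and only the row that starts with a plain <td> is reported.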

What About JATS XML?

Most readers of this article are in the scholarly space, and that space is dominated by two XML standards: JATS, the Journal Article Tag Suite (NISO Z39.96-2019), and BITS, the Book Interchange Tag Suite. These have become the lingua franca of scholarly publishing. But they are not HTML and EPUB. They are a completely different XML model, expressly designed for scholarly content, and the two are closely aligned with each other.

It has frustrated me for years that scholarly publishing has not widely embraced accessibility, because most scholarly publications have almost everything they need to be accessible. Scholarly journal content, in particular, has some of the most explicitly, granularly tagged content of any publication type. It is very straightforward to convert the JATS XML of almost all journal articles into HTML. The table model in JATS is now optimized to be converted to HTML tables, and the equations are almost always MathML. Also, journal articles are well structured and simple to navigate. The only missing piece is usually the image descriptions, and JATS provides markup for both alt text and extended descriptions. Although BITS is not as universal in the world of scholarly books, it is gaining ground, especially among publishers of both journals and books, and the BITS markup in a book chapter is virtually identical to the JATS markup in a journal article.
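
To show how direct that mapping can be, here is a minimal Python sketch that converts a tiny JATS section into HTML, carrying the alt text along. The JATS fragment is hypothetical, the function names are invented for illustration, and only a handful of elements (sec, title, p, fig, graphic, alt-text, caption) are handled. It is not how any particular platform does its conversion; production workflows typically use far more complete transforms.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical JATS fragment, written for this illustration only.
JATS = """<sec>
  <title>Results</title>
  <p>Growth was steady.</p>
  <fig id="f1">
    <caption><p>Figure 1. Growth over time.</p></caption>
    <alt-text>Line chart with an upward trend.</alt-text>
    <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="f1.png"/>
  </fig>
</sec>"""

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def text_of(el):
    """Escaped text content of an element, or an empty string."""
    return escape("".join(el.itertext()).strip()) if el is not None else ""


def sec_to_html(sec, level=1):
    """Map a JATS <sec> to an HTML <section>, recursing into subsections."""
    parts = ["<section>"]
    for child in sec:
        if child.tag == "title":
            parts.append(f"<h{level}>{text_of(child)}</h{level}>")
        elif child.tag == "p":
            parts.append(f"<p>{text_of(child)}</p>")
        elif child.tag == "sec":
            parts.append(sec_to_html(child, min(level + 1, 6)))
        elif child.tag == "fig":
            graphic = child.find("graphic")
            src = graphic.get(XLINK_HREF, "") if graphic is not None else ""
            alt = text_of(child.find("alt-text"))
            caption = text_of(child.find("caption"))
            parts.append(f'<figure><img src="{escape(src)}" alt="{alt}"/>'
                         f"<figcaption>{caption}</figcaption></figure>")
    parts.append("</section>")
    return "".join(parts)


if __name__ == "__main__":
    print(sec_to_html(ET.fromstring(JATS)))
```

The point of the sketch is that the alt text already present in the JATS lands in the HTML alt attribute with no extra editorial work.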

Here’s my favorite story about how close scholarly content is getting to being fully accessible without trying. A couple of years ago, I was following the development of Atypon’s Literatum platform, which hosts some 40% of the world’s English language scholarly journal content. When they told me they were embedding an EPUB reader in the upcoming release (which was subsequently released in May 2018) so that users could click on an EPUB and just read it the way they can click on a PDF, I was thrilled.

When I asked them what a publisher had to do to get EPUBs of their articles, they said “just check the box that they want an EPUB”; content submitted in conformance to the Atypon JATS spec would be automatically converted to an EPUB. And when I asked them what proportion of the equations (there are millions of them) are MathML, they said basically all of them are.

“Wow,” I said, “that means you’re really close to having fully accessible journal articles!”

“Huh?” they said. They had not considered the accessibility implications at all. Needless to say, they were pleased to find out they’d done such a good thing. (And they’ve since done more work to make them even better.)

The reason I love that story is that it underscores how close we are to the concept of “born accessible”; the phrase, coined by Betsy Beaumon, CEO of Benetech, means that content that is born digital should be born accessible. Get it right from the start. Atypon had done almost all that needed to be done for accessibility without trying. They did it simply to make their platform better.

Training the Vendors

One reason that was relatively straightforward for scholarly journals is that they are typically very consistent—both within a publisher and between publishers—and they almost all use the same markup, JATS XML. Other sectors don’t enjoy that kind of homogeneity.

I mentioned above that one of the most critical areas for accessibility is in higher education, where institutions are mandated to provide accessible content and can be sued if they don’t. And it’s not just the threat of litigation that makes all those DSOs at all those colleges and universities go to all that work to remediate content for their students; they’re mission-driven institutions that do what they need to do to educate their students.

That means that although the publishers are theoretically insulated from lawsuits, they are under increasing pressure to provide course materials that are accessible. Imagine how complex it is to make a big college textbook accessible, not to mention the platforms on which educational content is increasingly delivered, and the complex nature of that content, full of multimedia and interactive features. This is a huge challenge.

More good news: higher ed publishers, particularly the “Big Five,” Cengage, Macmillan Learning, McGraw-Hill, Pearson, and Wiley, have done truly amazing work over the last few years to address this challenge.

It has not yet sunk in to colleges and universities that many of the books they get from those publishers, especially through a service like VitalSource, are available as fully accessible EPUBs. Their DSOs continue to ask for PDFs and remediate them. Macmillan Learning, for example, now issues most new titles as fully accessible EPUBs, and they don’t have the PDFs. They must go to special efforts to get a DSO a PDF for a book that they know is going to be a ton of work to remediate when there is a fully accessible EPUB available. Aargh!

We are in a transitional period. Let’s hope it’s short.

Most people don’t realize that most publishers don’t actually do the work of making content into PDFs and EPUBs and HTML; they contract with vendors to do that work. Although their ability to provide accessible content is a laudable achievement, the Big Five higher ed publishers have another, arguably more significant accomplishment: they’ve trained their vendors to do that work right. They have spent literally years working with their vendors to make sure the tagging is right, to make sure the image descriptions are right, and so forth.

The result—the vendors now know how to do that for all their other customers. This is big.

Smaller Publishers Are Getting It Right Too

Those Big Five higher-ed publishers are some of the biggest publishers in the world. It would be easy to say “sure, they can do this, but most publishers can’t.” My favorite counterexample is the University of Michigan Press.

The University of Michigan Press is part of Michigan Publishing at the University of Michigan Library. They have done a wonderful job of building accessibility into their editorial and production workflows. They started small, focusing on only one or two of the areas in which they publish to develop and perfect their workflows. By last year, when I asked them how much extra work it is in production to make their books fully accessible, they said “hardly any—maybe an hour of QA.”

There are two reasons for this, other than the fact that they did pretty much everything right. Their books are not complex textbooks or STM books; they publish in the humanities and social sciences. And they integrated accessibility throughout their workflows, rather than making it a task that somebody had to do after normal production was done.

The best example of this: those pesky image descriptions. They successfully got their authors to provide image descriptions; it’s now even being written into author contracts. They did that by providing excellent guidance as to how to do good image descriptions, and by training their editors to understand what good image descriptions are so that they could work with the authors as the content is developed. By the time the manuscript is turned over to production, the descriptions are pretty much good to go, though the copyeditors can still refine them.

An interesting byproduct—this often made the content better, because the authors would realize that some of that explanation belonged in the main text itself, which means an extended description isn’t needed anymore, and the book is better for all readers.

Better for Everybody

That’s really the key message about accessibility. Accessible content and systems are better for everybody. Just as we’ve come to take curb cuts and closed captioning and voice assistants for granted—all of which were developed originally for accessibility—we’re making progress toward the day when we can take accessibility for granted too. We’re not there yet, but we’re getting close to Betsy Beaumon’s “born accessible” vision becoming a reality.