Supplemental Materials Survey

June 2010

In October 2009, Alexander (Sasha) Schwarzman at the AGU (American Geophysical Union) conducted an informal survey of scientific journal publishers to learn how other publishers were dealing with the issue of “supplemental materials.” Conducted mainly through the e-mail listservs of CrossRef TWG and eXtyles, Schwarzman’s questions “touched a raw nerve” and generated more responses than he had been expecting. This article is an extract of the full survey report, issued in November 2009, which is available from the AGU website (see relevant links at the end of this article). Schwarzman’s article was the impetus for a January 2010 Supplemental Materials Roundtable meeting on the subject, co-sponsored by NISO and NFAIS, and for the subsequent Working Group on Supplemental Journal Materials that the two organizations launched.

Problem Statement

As Emilie Marcus, Editor in Chief of Cell, put it in her editorial, Taming Supplemental Material (Cell 139(1):11 (2009), doi: 10.1016/j.cell.2009.09.021):

Unfortunately, over the years supplemental material has evolved into a seemingly limitless repository for additional “stuff”. …It has become a mechanism for expanding the overall content of a paper without any delineated change in editorial standards. …Authors often feel compelled, by their own desire to be comprehensive and in response to questions raised in the review process, to include increasingly large amounts of data that exceed the traditional restrictions of the printed article. Reviewers may feel responsible, as the supplemental material is ultimately published as part of the peer-reviewed publication, to assess this information with the same attention and standards as the main body of the article, which often means that they are asked to evaluate the equivalent of two papers in the place of one. And readers may find it difficult to navigate through large supplements and may be unsure about how carefully the supplemental material was evaluated in the review process.

What is the definition of supporting material?

There is a clear split within the publishing community between those who declare the electronic article the copy of record and those who don’t. The supporting material definition is easier for those publishers who consider the print journal to be the normative copy; for them, anything that cannot be printed automatically falls into the category of supporting material.

For those of us, however, who define the electronic article as the copy of record, the decision is not so obvious. [The Cell editors in] Elsevier’s “Article of the Future” initiative distinguish between three major conceptual categories:

» evidence that provides deeper support for the points made in the main paper,

» large data sets and multimedia that can only be presented online, and

» detailed information about the methods.

Other publishers think along similar lines, e.g., “material that is not critical to the overall message of the paper but which supports it,” “information that will be of interest to some readers but is not essential to the central message of the paper,” “data and other materials that directly support the main conclusions of a paper but are considered additional or secondary.”

Does the notion of supporting material make sense in an electronic-only environment?

There is no clear consensus here. Interestingly, many respondents who are currently dealing primarily with print tend to think that the notion may not be applicable in the electronic world. Yet, this optimistic view is not shared by electronic publishers. It seems to me, however, that in actuality the “print” and “electronic” groups are not that far apart; they share the same concern but use different language to express it. While the print camp wants to achieve an uninterrupted flow of narrative (and to do so dumps the offending interrupters “on the Web”), the electronic camp wants to ensure smooth navigation (and to that end dumps the culprits on the lower levels where they are less visible—either through an ingenious user interface or by providing a link instead of displaying an item right away).

Who is to decide what supporting material is?

There is a virtual consensus here that while the initial division between “main” and “supporting” material comes from the author, the ultimate decision must rest with the editor who has to have guidance from the publisher.

Personally, I think that once a conceptual decision of what constitutes supporting material is made, a submission system interface can help a great deal in guiding the author in this respect.

How do you ensure uniform application of “supportiveness” criteria?

Everyone seems to be resigned to the fact that there can be no uniformity in applying the “supportiveness” criteria across different journals published by the same publisher, much less across an entire scientific discipline. However, I would think that a publisher should articulate what the criteria are for a given title and insist that editors apply them consistently. Otherwise, a publisher runs the risk that decisions will be made selectively or arbitrarily, and that editors will be left in an “I know it when I see it” situation.

Should supporting material be peer-reviewed?

I am happy to report that everyone, without exception, thinks that supporting material must be peer-reviewed.

What are the different kinds of supporting material? Does it exist only on the level of article components or also on that of an entire article?

It appears that we can distinguish between two main kinds of supporting material, each treated somewhat differently:

» Supporting components, e.g., supporting tables, figures, multimedia, computer programs, etc. (Data sets are a special type of this component.)

» Supporting structural sections, e.g., text (narrative), possibly containing math and a separate reference list.

To state the obvious, while supporting components exist on the component level, structural sections exist on the level of an entire article.

Some publishers explicitly stipulate how many [supporting] components, and of what type, an article may contain. Other publishers have no explicit restrictions on how many supporting components can be accepted. Importantly, there is often a difference between the “main” and “supporting” components in (a) their acceptable formats, and (b) whether and to what extent they are processed.

When it comes to data sets, we can distinguish between two rather different cases: (1) those data sets that have been deposited with one of the official data centers and (2) those that have not. CrossRef accepts metadata deposits for data sets, so a data set can have a DOI. In the area of geophysics there is a World Data Center System Roster, and I suspect that similar approaches exist in other disciplines, such as astronomy, biology, etc. The important point here is that when a data set is deposited with an official data center the whole “supporting vs. main” issue becomes irrelevant; the component is now an external resource that can be cited in the references by its metadata and [identifier]. It seems to me that it would be in the publisher’s best interests to make every effort to encourage authors to deposit their data sets with an official data center, or even to insist that they do so once the manuscript has been accepted.
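The distinction between the two cases can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not any publisher's actual workflow: the `DataSet` fields, the `place_data_set` helper, the citation layout, and the sample DOI (which uses a placeholder prefix) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DataSet:
    # All field names are hypothetical; they stand in for whatever
    # metadata a submission system actually collects.
    creators: str
    year: int
    title: str
    data_center: Optional[str] = None  # official data center, if deposited
    doi: Optional[str] = None          # identifier assigned on deposit


def place_data_set(ds: DataSet) -> Tuple[str, str]:
    """Decide where a data set belongs in the published article.

    Case (1): deposited with a data center and identified by a DOI.
    It becomes an external resource cited in the reference list.
    Case (2): not deposited. It stays with the article as a
    supporting component.
    """
    if ds.data_center and ds.doi:
        # Illustrative citation layout, not a prescribed style.
        citation = (
            f"{ds.creators} ({ds.year}), {ds.title}, "
            f"{ds.data_center}, doi:{ds.doi}."
        )
        return ("reference", citation)
    return ("supporting-component", ds.title)


deposited = DataSet(
    creators="Example, A.",
    year=2009,
    title="Sample magnetometer readings",
    data_center="Example Data Center",
    doi="10.5555/example-dataset",  # placeholder DOI, not a real record
)
undeposited = DataSet(creators="Example, A.", year=2009,
                      title="Sample magnetometer readings")

print(place_data_set(deposited))
print(place_data_set(undeposited))
```

The point of the sketch is that the decision depends only on whether the deposit has happened; once it has, the “supporting vs. main” question no longer needs to be asked for that item.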

Some journals, especially those where articles conform to a rigid format, define very clearly which sections fall into the “supporting” category. For others the picture is less clear; there is no consensus on what constitutes in-article Appendix versus online supporting material. The same kind of derivation of a formula or a proof of a lemma can in one case be part of an Appendix, while in the other appear only online.

What about readability, usability, preservation, and reuse?

Why does a scientist need a publisher? Well, of course we shepherd the manuscript through peer review, but we also add value to the content in a number of other ways:

» make it readable through copy editing;

» make it navigable and accessible through user interface;

» make multichannel publishing, e.g., Web/HTML, Web/PDF, Print/PDF, PDAs, iPhone/Blackberry, e-Readers, etc., possible by applying markup in accordance with de-facto semantic and syntactic best practices;

» facilitate the relationship of an article to its scientific context and promote its discoverability by linking references, building citation indices, assigning DOIs to the article and sometimes to its components, and depositing/disseminating article metadata through abstracting and indexing services;

» preserve the narrative by printing it on acid-free paper and/or marking it up; and

» preserve the components by ensuring they are submitted in/converted to formats that have a good chance of survival or could at least be migrated with lossless conversion.

When we look at supporting material we discover:

» with rare exceptions, supporting material is not being copy edited;

» supporting material items are usually not presented the same way as their “main” brethren, e.g., instead of an individual HTML document/section or a carefully processed image one will see a link to a PDF or MS Word [file], or to the whole group of documents, sometimes of different type (tables, figures, text) stitched together;

» supporting structural sections are universally not being marked up;

» supporting references are not being deposited and are not being linked;

» supporting components are often presented in author-submitted formats that do not meet archival standards or cannot be easily migrated;

» even when supporting material is provided in standard formats, e.g., PDF/A, such formats are less likely to be usable than more robust ones, such as XML.

The implications:

While a publisher makes a reasonable effort to ensure that the main content of the article lends itself to multichannel publishing, the probability is lower for supporting material.

While a publisher can be reasonably confident that the scientific content of the article can be recreated in the future as technology changes, the same cannot always be said about supporting material with the same degree of confidence.

What this means for the purposes of our discussion is that, effectively, we can formulate a couple of additional operational criteria for defining supporting material:

» Usability: Supporting material is not likely to be as versatile, robust, and usable as the main article when it comes to multichannel publishing.

» Longevity: While the main article is going to enjoy eternal life with many reincarnations along the way, supporting material is likely to rot and die, with very little possibility of resuscitation.

Still there is no free lunch

Even though supporting material is not processed nearly to the same degree as the main article, it is still not without a cost to a publisher. Supporting material needs to be integrated with the main article; some degree of quality control needs to be exercised; the material’s existence needs to be reflected in the metadata; minimal markup needs to be applied, etc. Yet, with only one exception, all respondents have indicated that they do not charge authors for supporting material.

There are two sides to the question of supporting material cost: on the one hand, a publisher absorbs supporting material processing expenses; on the other hand, a publisher saves costs by not holding supporting material to the same standards of usability and longevity as the main article.

The majority of respondents stated that they take a “pragmatic” approach to dealing with supporting material. Leaving aside the non-printability issue, it appears that “pragmatic” here can refer to two different things: (a) arriving at a working definition of what is essential and what is not to the scientific conclusions of the article, and (b) achieving a trade-off between saving costs by sacrificing usability and longevity and providing access that should suffice at least in the short run.

Preventing abuse

It is a common concern that supporting material has become a “back-door to publication.”

I could discern two approaches to stemming the abuse: (a) enacting strict editorial guidelines, like imposing a limit on the total number of supporting components, and (b) charging authors for supporting material.

Tagging practices

The NLM Journal Publishing Tag Set allows one to tag a supporting section the same way as any other structural section and give it a requisite title. When it comes to tagging a component, the <supplementary-material> element allows one to treat it in a variety of ways. The approach of the Tag Set is to consider <supplementary-material> to be an element on par with <fig> or <table-wrap> elements, rather than to be able to indicate that a particular <fig> or <table-wrap> plays a “supporting” role.
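As a rough illustration of this tagging approach, the Python sketch below builds a supporting section tagged like any other <sec>, with a title and one <supplementary-material> element per component. The element and attribute names follow the NLM Tag Set as I understand it; the `supporting_section` helper, the dictionary keys, the file name, and the caption are invented for the example, and the output is not validated against the DTD.

```python
import xml.etree.ElementTree as ET

# XLink namespace, used for the href that points at the component file.
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def supporting_section(title, components):
    """Build a <sec> holding one <supplementary-material> per component.

    `components` is a list of dicts with hypothetical keys:
    id, href, mimetype, mime_subtype, caption.
    """
    # The supporting section is an ordinary structural section with a title.
    sec = ET.Element("sec", {"sec-type": "supplementary-material"})
    ET.SubElement(sec, "title").text = title
    for comp in components:
        # Each component is its own element, a sibling in kind to a
        # figure or table rather than a figure flagged as "supporting".
        supp = ET.SubElement(sec, "supplementary-material", {
            "id": comp["id"],
            f"{{{XLINK}}}href": comp["href"],
            "mimetype": comp["mimetype"],
            "mime-subtype": comp["mime_subtype"],
        })
        caption = ET.SubElement(supp, "caption")
        ET.SubElement(caption, "p").text = comp["caption"]
    return sec


example = supporting_section(
    "Supporting Material",
    [{
        "id": "sm1",
        "href": "table-s1.csv",          # illustrative file name
        "mimetype": "text",
        "mime_subtype": "csv",
        "caption": "Station coordinates (illustrative).",
    }],
)
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

Note how little markup is involved: a title, a pointer to a file, and a caption. The content of the component itself stays in whatever format the author submitted.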

There is a consensus that eXtyles has no problem exporting supporting material to XML, which is not surprising, given the fact that supporting sections and components markup is minimal.

Summary

While all of the publishers surveyed were distributing [supplemental] materials, there was little consistency in how they were handled. There was consensus in the view that all supplemental materials should be peer-reviewed, but not necessarily about the rigor of that review. The size and scope of the supporting materials was an issue, as was the question of whether and where those materials reside online. Publishers generally responded that supplemental materials did not go through the same production processes, such as editing, layout, consistent markup, etc. While this ensures that the supporting data remains intact and unchanged, the lack of production management could lead to problems when a publisher wants to archive the information or migrate it to a future system.