Automating the Digital Supply Chain: Just DOI it

By David Sidman and Tom Davidson, Content Directions, Inc.
Appeared in the March 2001 issue of Upgrade: The Magazine of the Software and Information Industry Association (SIIA).

For publishers, the most dangerous aspect of digital content distribution is not piracy but the lack of viable alternatives to piracy. All the encryption and rules enforcement in the world won’t help a publisher that can’t offer end-users an acceptable, legitimate way to obtain its content. Such an alternative will almost certainly include some enforcement and copy-protection measures, but to be truly viable in the marketplace it will need to be effortless, efficient, and responsive to new technologies and user demands. It will need to provide value to the user that goes beyond a simple stamp of legitimacy.

The Digital Object Identifier, or DOI, goes a long way toward addressing these needs. The DOI can be thought of as a supercharged bar code for content on the Internet. As a unique identifier for digital content, it enables the automation of the supply chain, the integration of systems from many technology vendors, and the tracking of content use and distribution outside of traditional channels. As a persistent, actionable link (think permanent, context-sensitive URL) from the content back to the copyright owner that can travel with the file, the DOI enables sophisticated, transparent, and user-friendly DRM solutions, as well as many other applications.

Best of all, the DOI’s not vaporware! Developed at the Corporation for National Research Initiatives (the research organization run by Dr. Robert Kahn) by the original inventors of packet-switched networks such as the Internet, the underlying technology of the DOI system is robust, scalable (to quadrillions of objects), and on the Internet standards track. The DOI has been fully implemented in the scientific journals sector by more than 60 of the top publishers, who have tagged nearly 3 million articles and use the DOI system to enable cross-linking between publishers. It has also been recommended by the Association of American Publishers as the identifier of choice for ebooks.

A Permanent, Globally Unique Identifier for Digital Content

While it may seem that the bar codes on consumer goods in the supermarket are there to speed customers through the checkout line, they actually owe their existence to the power of unique product identifiers (in this case the Universal Product Code, or UPC) to produce efficiencies throughout the supply chain. Once supermarkets and their suppliers adopted a unique ID, all kinds of transactions, from inventory control to ordering to distribution to transportation to real-time financial reporting, could be automated efficiently and accurately. The savings in labor costs realized at the checkout were sufficient to justify the costs of installing scanners and computer inventory systems in the first place, but were soon far exceeded by the savings from automation once a critical mass of industry players had adopted the new scheme. The UPC, the ISBN for books, and the CUSIP number for securities all exist so that diverse computer systems can communicate effectively about the item in question.

Online content has no physical inventory, transportation, or logistics, but its sale, distribution, syndication, copyright protection, and re-use require a fully analogous, if not more complex, chain of transactions. These transactions will all be managed by diverse systems that will need to interoperate. Currently, every publisher and every online bookseller uses some kind of identifier to reference its products internally, and every DRM software package, content management or hosting system, and e-commerce system ships with a blank field called “identifier.” Pairs of players can work out bilateral agreements every time they wish to have their systems communicate, but the true efficiencies and advantages of digital distribution will not be realized until these identifiers are either synchronized or otherwise made universally interoperable. The DOI is just the ticket: a shared, globally unique identifier that lets these systems talk to each other and to end-users successfully, reliably, and cost-effectively.

Since product identifiers are so integral to so many business processes, replacing an existing scheme outright is not always an attractive prospect. It is important to note here, then, that the DOI needn’t be a replacement for other identification systems, but can be implemented as an ‘upgrade’ for them. To understand this, we need to examine the structure of the DOI itself.

A typical DOI might read:

10.1065/abc123defg

In this example, “10.1065” is the unique publisher prefix (assigned by the International DOI Foundation) and “abc123defg” is the item identifier, or suffix, assigned by the publisher. The format of the suffix is wide open, allowing publishers to incorporate legacy identification schemes and thereby avoid re-engineering existing systems or the commercial relationships that may depend on them. For instance, a publisher could continue to use ISBNs to identify printed books internally and with physical distributors, and could construct DOIs that incorporate the work’s ISBN. DOIs for saleable component parts of the work, or for different formats of the work, could also be constructed so that the legacy identifier is derivable from the DOI.
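To make the prefix/suffix split concrete, here is a minimal Python sketch of how a publisher might embed an ISBN in a DOI suffix and recover it later. The suffix convention (“isbn-” followed by the digits, plus an optional “.component” qualifier), the function names, and the ISBN itself are hypothetical illustrations, not part of any DOI standard; the only structural rule relied on is that a DOI’s prefix begins with “10.” and is separated from the suffix by the first “/”.

```python
from typing import NamedTuple, Optional


class DOI(NamedTuple):
    prefix: str  # publisher prefix, e.g. "10.1065"
    suffix: str  # item identifier assigned by the publisher


def parse_doi(doi: str) -> DOI:
    """Split a DOI at the first '/' into its prefix and suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return DOI(prefix, suffix)


# Hypothetical house convention: "isbn-<ISBN digits>[.<component>]".
SUFFIX_TAG = "isbn-"


def doi_from_isbn(prefix: str, isbn: str, component: Optional[str] = None) -> str:
    """Build a DOI whose suffix embeds a legacy ISBN (hyphens stripped)."""
    suffix = SUFFIX_TAG + isbn.replace("-", "")
    if component:
        suffix += "." + component  # e.g. a chapter or a format variant
    return f"{prefix}/{suffix}"


def isbn_from_doi(doi: str) -> Optional[str]:
    """Recover the embedded ISBN, or None if the DOI doesn't follow the convention."""
    suffix = parse_doi(doi).suffix
    if not suffix.startswith(SUFFIX_TAG):
        return None
    return suffix[len(SUFFIX_TAG):].split(".", 1)[0]


print(parse_doi("10.1065/abc123defg"))  # DOI(prefix='10.1065', suffix='abc123defg')
doi = doi_from_isbn("10.1065", "0-123-45678-9", component="ch03")  # placeholder ISBN
print(doi)                 # 10.1065/isbn-0123456789.ch03
print(isbn_from_doi(doi))  # 0123456789
```

Because the legacy identifier stays derivable from the DOI, existing ISBN-keyed systems can keep working unchanged while the DOI is used externally.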

Perhaps the most immediate and exciting benefit to a publisher of adopting the DOI, though, is that by making the DOI available to end-users, the publisher can effortlessly turn any pre-existing identifier into a persistent, actionable identifier with an efficient, scalable, Internet-based resolution and routing system behind it, as we’ll see in the next section.

DOI: The Actionable Identifier

A DOI is more than just a flexible, globally unique ID. Like a telephone number, a DOI is a number you can do something with. In much the same way that the Internet’s Domain Name System (DNS) resolves domain names to network addresses, a network service called the Handle System resolves DOIs to their current network locations.

So what’s the DOI’s advantage over a standard URL? In a word, persistence. In 1997, Brewster Kahle estimated that the half-life of a typical URL is 44 days. When a publisher moves digital content from one server to another, renames a file or directory, changes content hosting providers, or even sells the content to another publisher, the URL for that content is likely to change. Every time this happens, every link pointing to the old URL breaks. ‘Redirect’ pages paper over the problem, but they are a kludgy stopgap for a fundamental mismatch: users and publishers care about “what” an item is, while URLs care about “where” it is.
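To see what a 44-day half-life implies, suppose (as a simplifying assumption, not part of Kahle’s estimate) that links die off exponentially. The fraction N(t)/N₀ of URLs still working after t days would then be:

```latex
\frac{N(t)}{N_0} = \left(\tfrac{1}{2}\right)^{t/44},
\qquad
\frac{N(365)}{N_0} = 2^{-365/44} \approx 2^{-8.3} \approx 0.003
```

Under that model, only about 0.3% of the URLs in a year-old set of links would still resolve.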

The DOI provides a crucial layer of indirection between the identifier and the content it identifies. When a piece of content is first published, its publisher registers its identity and current network location in the DOI system. Incoming requests for the DOI are resolved to the appropriate URL. When the content moves, the publisher simply updates its DOI record with the new URL, and users resolving the same DOI are then routed to the new location. Throughout the content’s lifetime, the publisher maintains dynamic control of all inbound links to its content. (In many current implementations, the DOI doesn’t resolve directly to the content, but rather to a web page on the publisher’s site that provides the user with information about the content identified, and the option to view or purchase it.)
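The indirection described above can be sketched as a tiny in-memory registry. This is a toy model for illustration only, assuming a single trusted registry; the class and method names (ToyResolver, register, update, transfer, resolve) and the example URLs are hypothetical and do not reflect the Handle System’s actual protocol or API.

```python
from typing import Dict


class ToyResolver:
    """In-memory stand-in for a DOI resolution service (not the Handle System API)."""

    def __init__(self) -> None:
        self._url: Dict[str, str] = {}    # DOI -> current location
        self._owner: Dict[str, str] = {}  # DOI -> party allowed to update it

    def register(self, doi: str, owner: str, url: str) -> None:
        if doi in self._url:
            raise ValueError(f"{doi} is already registered")
        self._url[doi] = url
        self._owner[doi] = owner

    def _check_owner(self, doi: str, owner: str) -> None:
        if self._owner.get(doi) != owner:
            raise PermissionError(f"{owner} does not control {doi}")

    def update(self, doi: str, owner: str, new_url: str) -> None:
        """Point an existing DOI at a new location; only its owner may do this."""
        self._check_owner(doi, owner)
        self._url[doi] = new_url

    def transfer(self, doi: str, owner: str, new_owner: str) -> None:
        """Hand control of a DOI to another party, e.g. after the content is sold."""
        self._check_owner(doi, owner)
        self._owner[doi] = new_owner

    def resolve(self, doi: str) -> str:
        return self._url[doi]  # raises KeyError if the DOI was never registered


r = ToyResolver()
doi = "10.1065/abc123defg"
r.register(doi, "publisher-a", "http://old-host.example.com/abc.html")
r.update(doi, "publisher-a", "http://new-host.example.com/abc.html")
print(r.resolve(doi))  # http://new-host.example.com/abc.html
```

The identifier that users and other sites store never changes; only the record behind it does, which is why links made before the move keep working after it.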

In keeping with the emphasis on the “what” and not the “where” of content, the Handle System itself is capable of doing more than a simple direct mapping of a DOI to a URL. Any number of URLs and other pieces of data can be included in a handle record, and client software will soon be able to parse these records and perform such neat tricks as auto-selecting the fastest server for downloads, or offering the user a choice of available formats.
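A handle record is a list of typed values, each with an index, a type (such as URL), and data. The following minimal sketch, loosely modeled on that structure, shows how client software might offer a choice of formats or pick the fastest server. The fmt tag, the function names, the example URLs, and the latency figures are all hypothetical; a real client would have to measure latencies itself, and real deployments may encode formats differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class HandleValue:
    index: int
    type: str      # e.g. "URL" or "HS_ADMIN"
    data: str      # for URL values, the location itself
    fmt: str = ""  # hypothetical format tag; not a standard handle field


def formats_offered(record: Sequence[HandleValue]) -> List[str]:
    """Formats a client could offer the user to choose from."""
    return sorted({v.fmt for v in record if v.type == "URL" and v.fmt})


def pick_url(record: Sequence[HandleValue], preferred: Sequence[str],
             latency_ms: Dict[str, float]) -> Optional[str]:
    """Return a URL in the most-preferred available format, choosing the
    lowest-latency copy when several servers offer it (unmeasured = slowest)."""
    urls = [v for v in record if v.type == "URL"]
    for fmt in preferred:
        candidates = [v for v in urls if v.fmt == fmt]
        if candidates:
            best = min(candidates, key=lambda v: latency_ms.get(v.data, float("inf")))
            return best.data
    return None


record = [
    HandleValue(1, "URL", "http://us.example.com/book.pdf", "pdf"),
    HandleValue(2, "URL", "http://eu.example.com/book.pdf", "pdf"),
    HandleValue(3, "URL", "http://www.example.com/book.html", "html"),
    HandleValue(100, "HS_ADMIN", "(admin data omitted)"),
]
latency = {"http://us.example.com/book.pdf": 180.0,
           "http://eu.example.com/book.pdf": 40.0}

print(formats_offered(record))                     # ['html', 'pdf']
print(pick_url(record, ["pdf", "html"], latency))  # http://eu.example.com/book.pdf
```

Keeping several locations in one record lets the client, not the publisher’s web server, decide which copy to fetch.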

Just as the cost savings at the checkout counter drove adoption of the UPC in the world of consumer goods, so is the ‘actionability’ of the DOI a substantial benefit to early adopters that doesn’t depend on the network effects of widespread adoption of the DOI in the publishing supply chain. No complex supply chain can go long without a standard identification scheme, however, and the DOI is well on its way to becoming the identifier of choice for publishers of electronic content.

David Sidman (dsidman@contentdirections.com) is CEO of Content Directions, Inc., a consulting firm dedicated to promoting the adoption and implementation of the Digital Object Identifier (DOI) throughout all sectors of online publishing: text, music, video, etc. Prior to founding Content Directions in August 2000, David was Director of New Publishing Technologies at John Wiley & Sons, a leading global publisher. Tom Davidson was formerly Associate Director of Consulting and Product Development at CDI.

More info:

The International DOI Foundation: www.doi.org

AAP Press Release on Ebooks: www.publishers.org/home/press/ebookpr.htm

The Handle System: www.handle.net

The “CrossRef” implementation for scientific journal content: www.crossref.org

Content Directions, Inc. The DOI Experts: www.contentdirections.com