Abstract
After a period of experimentation and prototyping, digital editions are considered a common standard and a serious, quite often even better, alternative to printed editions. In addition, TEI/XML provides a well-established standard for the markup of all relevant structural and semantic elements of an edition. In spite of this process of consolidation, the digital edition is still accompanied by harsh critique, particularly the objection that markup based on XML fosters a text model of an Ordered Hierarchy of Content Objects (OHCO) that does not fit all editorial problems and limits the flexibility of the editor. As a consequence, many attempts have been undertaken to overcome these limits of XML, but so far without much success. By narrowing the perspective down to problems of the text model seemingly caused by XML, however, it was often overlooked that a digital edition consists of more than an XML file. This contribution attempts to show that the critique can be dissolved when the edition is viewed not merely as an XML file, but as an ensemble of its components. In doing so it can also be shown that, contrary to what its critics maintain, a digital edition is no less stable or persistent than its printed predecessor. The seeming fluidity of the digital edition disappears if it is no longer judged by its visible surface, but, in accordance with its algorithmic nature, by the interplay of its components: text, structure, layout, interface and metadata.
However, despite the progress digital editions have made in recent years, they still encounter serious criticism, particularly regarding the lack of stability and the still unsolved issues of long-term archiving and accessibility, quite often leading to so-called hybrid editions that seemingly better preserve the editorial outcome against future decay and ensure its accessibility and quotability.
Yet this feeling of deficiency, so to speak, and the concern about the longevity of many years' labor seem to be rooted less in technical or methodological arguments than in a profound misunderstanding of the nature of digital editions. Even if recent papers from the DH community suggest otherwise, digital editions are still judged primarily by their appearance or their surface as it is visible to the reader and user, rather than by features that are more appropriate for explaining their particular character. Therefore, a change of perspective seems necessary to properly understand the structures, functions and properties that are constitutive of digital editions in order to tackle the problems and concerns that accompany the transition from the analog to the digital edition.
I argue in this paper that once the elements and functions of digital editions are sufficiently determined and properly understood, most of the problems that we encounter today with digital editions will disappear or at least be resolved much more easily. The shift in perspective I propose here illustrates shortcomings of the theoretical approaches of the previously dominant document and text model and offers a set of components or building blocks that help better understand the nature of digital editions and construct them in ways that ensure more reliability and instill trust in the digital format.
2. Markup and Overlap – towards a consolidated text and document model
Even though it was occasionally debated, digital editing started with markup. The distinction between text in the narrower sense, on the one hand, and information about text, called markup or in some cases annotation, on the other, proved to be the key invention for digital publishing in general and digital editing in particular. The invention, to be more precise, consists less in the inclusion of formatting or other signs in a text to produce the output of a rendition process (as was the case with early word processor software), but in the fact that markup is taken to be a distinct set of code that can be generally described by formal standards and that exists independently of the text at hand, so that it can be shared and adapted to various requirements. The decisive invention, therefore, is the general markup language, which could be fixed as a set of rules in Document Type Definitions (DTD) and later in other schema languages. Without this straightforward method, easy to handle even for technical laymen, digital editing would never have had the success it has had in recent years. As usual in such cases, the breakthrough is conditioned less by the mathematical or intellectual beauty or soundness of the argument than by various social factors and economic contexts that can be conceived of as a paradigm. Critics like Schloen fail to take these particular historical circumstances into account when emphasising that the relational model devised by the mathematician Codd is superior to the markup model of the lawyer Goldfarb. Most influential was the development of SGML and its two offspring, HTML and XML. Today XML is the de facto standard for digital editions and increasingly for digital publications, too, though in the case of the latter PDF-encoded documents still prevail, thus defying the open markup concept by adhering to the analog paradigm in mimicking printed documents on the screen.
From the outset, the development and usage of the markup language XML and its most prominent application profile in the humanities, the Text Encoding Initiative (TEI), was accompanied by criticism about the inappropriateness of the so-called OHCO model for encoding complex textual phenomena and representing text properly. The main objection concerns the problem of overlap, as XML has to be nested and does not allow for overlap. Cases of conflicting physical and logical structures, for instance page versus division, are well known, but other structures may be incompatible as well, such as linguistic versus semantic encoding, e. g. connected semantic information that spreads over two sentences or paragraphs.
Notwithstanding the various suggestions that have been proposed to solve these issues, no final solution could be provided that addresses all the points raised by critics, and overlap issues continue to plague encoders, not only those that adhere to the OHCO model. However, surprisingly enough, this has not diminished the popularity of XML in general and TEI in particular, and TEI/XML continues to be the medium of choice when it comes to digital editing, even if members of the TEI community have more recently felt the need to emphasize that there is a difference between the TEI abstract model and its XML serialisation. In view of its acknowledged shortcomings, the perseverance of the TEI/XML model is most striking, especially since there is no lack of alternative solutions to overcome the deficiencies of XML and the OHCO text model. Among them MECS, GODDAG, TexMECS, LMNL and CHOIR/OCHRE all promise to transcend »the document paradigm«, as does Text as a Graph (TAGML), which relies on a highly flexible graph model. What is it exactly that prevents all these solutions from replacing XML right away? For instance, the most recent concept, Text as a Graph, seems to be an elegant and sound one, promising a much greater deal of flexibility. A first serialization draft was put forward by Dekker et al., and according to the authors this technique solves the problems that come with XML. However, along with older ›convincing‹ solutions, TAG shares the fate that while being ›a good idea‹, it only reaches a few very narrow specialist circles, while the great majority seems to be satisfied with established XML solutions, above all because there are many useful tools available that work efficiently with XML. Above and beyond this, even if one might be convinced that TAG is better than XML, we do not yet know whether or not it really is suitable for solving all the issues that arise in the context of text encoding. The question remains whether the benefits of TAG cannot also be achieved with XML, or whether changing really is such an advantage that it justifies replacing a well-established infrastructure. In the long run, graph theory is equally insufficient to, e. g., represent something like the autopoietic textual concept that McGann proposes. Accordingly, even if graph theory as a way of modelling text may help gain more »clarity of thought about textuality«, it seems wise to be cautious with respect to definitive text models, and we should always be aware that converting text into a machine-readable form does not, in and of itself, mean ›understanding‹ the text. Markup is always a very limited way of making the main structural features of a text explicit or of singling out particular entities. Consequently, a »model of« text will always be restricted to very basic features, while a »model for« approach seems more promising, as we can concentrate on functional characteristics that do not claim to explain the very nature of text, but make it operative for queries and manipulations. Text in this sense allows introducing a digital pragmatics or even a digital hermeneutics into the digital edition. This approach opens up possibilities of working with and through digital texts in ways that are profoundly different from traditional approaches and former concepts of authorship, relating to ideas expressed by Robinson, Shillingsburg or Gabler, all of whom emphasize the social dimension of the digital edition.
3. Not just markup – completing the document model
Currently TEI, its abstract model and its serialisation in XML are without a doubt the most important standard for digital editions. However, the model is incomplete if we confine ourselves to markup and ignore the other components and factors required to implement an edition. Just like the analog book, which developed gradually from scroll to codex and from manuscript to print and came to obtain its particular standardized appearance and format, the digital book or edition also requires a »standardized« structure and form enabling it to be ›shelved‹ by libraries and used by scholars seeking to apply digital methods. ›Books‹ of that kind need to be reliably referenced, and their content needs to be displayed in various forms, be it for reading, for sorting, or for other purposes. Machine-readability is a key prerequisite for digital editions and plays a crucial role for all kinds of applications, e. g. to perform queries, to analyse document content, or to link it to the semantic web. These kinds of applications need to discern document structures such as preface, introduction, abstract, main text, footnotes, bibliography, index and the like. Documents and data should be easily disseminated via standardized interfaces, concepts and entities extracted, and different versions synchronized. Users may wish to contribute to digital editions as social products and to engage in collaborative editorial projects. Seen against this background, it is imperative to leverage the full potential of the digital by avoiding traditional approaches that simply mimic the analog, e. g. by using PDF. This is why most theoretical works on digital editions share the conviction that they have to be considered entirely different from analog ones. In view of this fundamental difference it is crucial to understand what actually is the gist of the digital edition and what its pertinent components are, allowing researchers to devise a model that is stable enough to create reliable and consistent digital editions.
This is my first statement:
This is my first statement:
The first encoding seems to represent the right order while the second is plainly wrong. However, if we apply the following XSLT statements, the second version would be likewise correct with regard to the resulting output.
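The original listings do not survive in this copy of the text; the following is a hedged sketch of the kind of encodings and XSLT statements meant here (the element names and structure are an assumption, not the author's original listings):

```xml
<!-- Encoding 1: document order matches reading order -->
<statements>
  <first>This is my first statement:</first>
  <second>This is my second statement.</second>
</statements>

<!-- Encoding 2: document order reversed -->
<statements>
  <second>This is my second statement.</second>
  <first>This is my first statement:</first>
</statements>

<!-- XSLT: output order is fixed by the stylesheet, not by
     document order, so both encodings render identically -->
<xsl:template match="statements"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <p><xsl:value-of select="first"/></p>
  <p><xsl:value-of select="second"/></p>
</xsl:template>
```

Because the template selects `first` and `second` by name rather than processing children in document order, the seemingly ›wrong‹ second encoding yields the same output as the first.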
While nobody would encode text in the wrong order, the example provides a good idea of the interdependency of the semantics within the processing model and demonstrates that we should not mix up the document model with the XML grammar. To express this idea in another, more subtle way, one must acknowledge »that XML has a semantics if – and only if – you also have a way of taking that representation and using it to produce something else, which, in practice, means having some means for transforming the XML data […].«
Another example relates to the choice issue raised by Dekker in his description of the TAG model. He argues that the order of TEI elements such as <choice>, if altered, results in a different document, even if the meaning is the same. Again, this disregards the nature of XML documents, whose appearance can be determined either by an explicit process model or by a schema file. It makes no sense to insist on the order given in an XML file without taking the XSLT or schema file into account. A processor is not dependent on the given text order when it computes the file, but when using, e. g., a RelaxNG expression such as
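A hedged sketch of such a RelaxNG expression (compact syntax; the element names are hypothetical): the interleave operator `&` declares that both child elements must occur, but in either order, so a validating processor treats the two orderings as equivalent documents.

```rnc
# Both child elements are required, but may appear in any order.
element statements {
  element first { text }
  & element second { text }
}
```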
If we accept this holistic approach to the technical implementation of the digital edition, it becomes evident that the edition is realised only by an algorithmic process that usually encompasses a couple of files such as .xml, .xslt, .xsd, .css or .js, each of which makes up a necessary component of the edition. These files serve both as the basis for the presentation of the edition in some kind of viewport and for its long-term preservation. In other words, we have an ensemble of files that can and should be fully described. By drawing on established and transparent formats we are also able to ensure their long-term archival preservation. As already noted, not the surface, but »data is the important long-term outcome«. Accordingly, one has to keep in mind that this kind of ›edition‹ is by nature algorithmic. It enacts the input-process-output model of all software applications. The XML document(s) are data that are processed, and the result is realized in a viewport of some kind. The result, the surface rendered by this process, is always derivative. Without the underlying data and functions it cannot be understood and should not be mistaken for the edition itself. For example, if the document is accompanied by one script for displaying an original view and one for a normalized view, then not just one of the two constitutes the edition; rather, the edition is the possibility of displaying both textual variants. The surface appearing on the viewport is just one of many possibilities that, as mentioned, can be entirely described as a definite number of combinations and determined or produced by a finite set of statements and data information.
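Such an ensemble might look like the following (a hypothetical directory layout offered only for illustration, not a prescribed standard):

```
edition/
  edition.xml      data: text and markup (TEI)
  edition.rng      schema constraining the markup
  original.xslt    process: diplomatic view
  normalized.xslt  process: normalized view
  styles.css       layout
  scripts.js       interface behavior
```

Neither of the two XSLT files alone ›is‹ the edition; the edition is the possibility of generating either view from the same data.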
4. The Components of an Edition
Patrick Sahle has quite convincingly distinguished six different meanings of text in his monumental work on digital editions. However, even if »the first task of any editor is to know what is meant by the terms document, text and work«, for identifying the constitutive parts of an edition it is better to abstract from that sort of textual interpretation and focus on the basic technical conditions.
Text in this sense, therefore, is primarily determined by a sequence of code points and an application of markup. Sahle, while quoting the definition of markup put forward by Sperberg-McQueen (»By markup I mean all the information in a document other than the ›contents‹ of the document itself, viewed as a stream of characters«), questions whether ASCII or Unicode are sufficient to characterize this kind of text. He objects: »dass er genau mit einem eng begrenzten, durch technische Bedingungen bestimmten Grund-Zeichenraum zusammenfallen sollte, ist wenig plausibel« (that it should coincide precisely with a narrowly limited basic character space determined by technical conditions is hardly plausible). To be sure, editorial theory is related to semiotics or bibliography, and it is important to gain a clear idea of what kind of editorial principle an edition pursues. However, even though there are many meanings of text, as demonstrated by Sahle and others, Unicode is the technical precondition of all of them, and despite Sahle's qualms it makes good sense to take the Unicode character set to be a basic component of a digital edition and a prerequisite for long-term archiving. The Unicode standard limits the way text can be read by readers and by machines, and the Unicode codepoint set is at the core of what text technically is. Even if easily overlooked, digital text conforms to the input-process-output principle, too, depending on a rendering engine and a graphic representation of the letter, usually described in a font. Text as a sequence of Unicode codepoints forms the basis of all editorial texts, while the code and graph together form the rendered output on the viewport.
In addition or extension to the text proper, further media objects can be included in the edition, such as images, 3D objects, audio or video. These are subsumed under ›text‹ as they are treated on the content level. They accompany the edition as research data and have both illustrative and verifying functions, as in the case of digital facsimiles providing a view of the sources or testimonies on the basis of which the edition was compiled. For most of these, standard formats for long-term archiving are available, e. g. TIFF as a robust standard for still images. For some others the situation is still unclear, e. g. for 3D representations.
The difference between text and markup is sometimes blurred in view of codes such as carriage return or line feed that can likewise be encoded by markup or by Unicode codepoints, but in practice markup is preferred to formatting codepoints. Markup and text are closely related, and the possible usage of both parts is encoded in the schema file. These three components constitute the ›textual‹, or more precisely, the structural level of the edition. For the sake of long-term archiving we have to require that these three components be present and discernible:
- a sequence of codepoints (Unicode),
- structural or descriptive markup (whether inline or stand-off), and
- a schema defining elements, attributes, and structures permitted in the edition. Usually, there is also a TEI/XML file and a customization of the TEI (ODD, RelaxNG).
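In TEI practice, all three components can be made explicit in a single file skeleton, for instance as follows (a minimal, hypothetical sketch; the schema file name is an assumption):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- schema association via the xml-model processing instruction -->
<?xml-model href="edition.rng" type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><!-- metadata --></teiHeader>
  <text>
    <body>
      <!-- structural markup wrapping the character data -->
      <p>Character data as a sequence of Unicode codepoints.</p>
    </body>
  </text>
</TEI>
```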
The next important component is made up of the processing requirements of the edition. As a whole, it can be described by a processing model, as suggested e. g. by the TEI. Yet this processing model is rather abstract and only provides a general description independent of its actual implementation in some programming language. Typical for XML are XSLT and XQuery. Other programming or scripting languages are possible, but not recommendable, as they run the risk of encoding functionalities in code that is proprietary and leans on particular software. The W3C standards XSLT and XQuery are presumably less prone to obsolescence and, due to their particular XML logic, easier to integrate into editorial projects.
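In the TEI processing model, such requirements can be declared in the ODD customization itself; a hedged sketch (the behaviour name follows the TEI Guidelines, the chosen element and rendition are merely illustrative):

```xml
<elementSpec ident="hi" mode="change">
  <model behaviour="inline">
    <!-- render <hi> as inline text in italics -->
    <outputRendition>font-style: italic;</outputRendition>
  </model>
</elementSpec>
```

The declaration states what should happen to the element, while the concrete implementation (in XSLT, XQuery or elsewhere) can be generated or written against it.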
One last component, the layout, is often overlooked. Some seem to assume that layout plays no, or only a minor, role for digital texts and that we have to be clear about the semantics in the first place. But this would disregard the affordances of an aesthetic digital presentation. The history of print has shown the relevance and influence of aesthetic presentations, whose semantics are often not really determinable. It makes a difference whether a page is set in Roman or Gothic font, and it makes a difference whether there is sufficient space between the lines and at the border, even when the semantics are not fully clear. A site well designed with pleasant fonts, colors, and a printing, or rather viewing, area that is easy to survey and navigate, and agreeable to read, makes the edition delightful for the user and offers better usability and understanding of the meaning. What is missing today is a new printers' art, so to speak, for digital presentations, that is, an art that is in a way very close to media design, but focuses on the beauty and usability of digital publications. This component encompasses all of the techniques to show or use the edition at the viewport. As a matter of course, what we get depends on the output format. The basic output format for digital editions is HTML, but there may also be other formats such as PDF, EPUB or even LaTeX, DOCX, ODT or similar outputs. Not all of these formats are suitable for including all of the many possible outputs a digital edition is able to generate. For instance, a PDF is perhaps not a good choice for conveying complex navigational logic in the display of different text versions, but provides the best choice for simply reading pages on a computer screen. The same holds true for EPUB, which is a good choice for smartphones or tablets, but fails to serve applications that attempt to extract structured information. Sahle observed that the current change of technology often leads to a loss of aesthetic vigor, and McKitterick pointed out that in view of the material evidence »it is not a straightforward or speedy translation from original to screen«. Therefore, it seems advisable to pay more attention to design issues and appropriate presentations.
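Part of such a printers' art for the screen can be expressed in the layout component itself, e. g. in the edition's stylesheet; a hypothetical sketch (class names and values are assumptions, not recommendations):

```css
/* Reading view: a comfortable measure and generous leading */
.edition-text {
  font-family: "Junicode", Georgia, serif; /* Junicode: a font designed for editions */
  max-width: 36em;
  margin: 2em auto;
  line-height: 1.6;
}
/* Visually distinguish editorial interventions from the source text */
.edition-text .supplied {
  color: #666;
  font-style: italic;
}
```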
5. Integration of the Components
In view of the peculiar nature of the digital edition, Price raised the question of what the best term might be to describe it properly. Besides edition he suggested project, database, archive, thematic research collection and arsenal, and indicated a preference for the latter. Whatever term one may choose, it becomes clear from his discussion that the classical term edition is no longer applicable and, if employed, must be re-defined. The reason lies not only in its volatility and fluidity, discussed above and explained by its algorithmic nature, but also in its lack of integrity and various degrees of coherence. However, in practice the changing states of coherence never lead to a complete dissolution of the editorial text, and it holds true that the »whole of all its possible variant readings and interpretations makes up a virtual unity identical with itself. Text in this view is not an arbitrary unity, for if it were seen as such, no text would differ from any other.«
Sometimes it seems difficult to distinguish the editorial core from additional material, or to evaluate the quality and persistence of resources that are linked to the edition. For instance, do the digital facsimiles of sources belong to the edition proper? Is an external resource, such as an authority file record, to which the edition provides a link, part of the edition? Hypertext is a powerful tool, but brings about conceptual challenges. Digital editions are always net or hyper editions. If the various parts of the edition need not be in one place, but can be assembled by hyperlinks, issues of permanent availability and responsibility arise. According to the principle of transclusion, viewport and editorial components may be at various locations, but if the editor or editorial team has no control over the distributed resources, reliability and quality become jeopardized, as resources may disappear or change their meaning over time. In general, there are two ways to address this problem. The best solution is copying all resources to the edition. This allows for the exact mirroring of the state of knowledge the editor possessed at a certain point in time. If this is not possible, external resources should be linked, documented, and displayed in a way that makes the external provenance visible and makes readers or users aware of possible alterations for which the editor is not responsible. Of course, some resources or digital sources are more reliable than others, but this belongs to a new kind of digital literacy the user needs in order to evaluate the resource properly.
All resources bundled into an edition should be carefully documented. This documentation must be integrated into the edition itself and makes up a central component describing resources, files, metadata, and the structure, as well as processing instructions for rendering the digital edition. This assembly as a vivid organism can be viewed as what Robinson ties to the notion of ›work‹. How this is accomplished is not standardized. Some include this information in a TEI document, others use METS, some use other means for this purpose. To ensure the integrity of the edition, all internal and external resources must be listed here so as to determine what is part of the edition and how the components may interact.
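A METS file group, for example, can list the components of the ensemble in a machine-readable way; a minimal, hypothetical sketch (file names are assumptions):

```xml
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="edition">
      <file ID="f-data"><FLocat LOCTYPE="URL" xlink:href="edition.xml"/></file>
      <file ID="f-schema"><FLocat LOCTYPE="URL" xlink:href="edition.rng"/></file>
      <file ID="f-view"><FLocat LOCTYPE="URL" xlink:href="normalized.xslt"/></file>
    </fileGrp>
  </fileSec>
</mets>
```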
6. The edition as interface
If the particular nature of the digital edition is taken seriously as being entirely different from the analog or printed edition, some basic features of the printed edition that are no longer operative in the digital one must be reconsidered. Consequently, the author, publisher, or other contributor of an edition must be aware of the functional and dynamic opportunities a digital edition has to offer in order to serve both the natural and the mechanical ›reader‹. The authorial intention, so to speak, has to take into account not only new ways of digital hermeneutics and analysis, but also the fact that a digital edition is always an internet edition and, therefore, part of a larger network. Hence requirements arise with respect to its hypertextuality and interfaces, or its suitability to be integrated into the semantic web. As already mentioned, research data such as facsimiles of primary sources collected in the course of compiling sources (recensio) should be included either by downloading or linking.
One of the most challenging issues is quotation. Because there is no stable surface or carrier material like sheets of paper that allows for reliable pagination or numbering of lines, a different technique must be applied. This is perhaps one of the most fundamental changes in adopting the digital edition. Previous citation mechanisms are no longer viable; the citation itself changes its very nature. To come to grips with this problem, two basic principles must be acknowledged: first, that a digital edition is always also an internet edition addressed by URLs, and second, that the instructions and data necessary to reconstruct a particular view must be available. In other words, the process of quoting is nothing else than an instruction, expressed by a URL, that encompasses or activates all information necessary to reconstruct a particular view or repeat a particular behavior of the edition.
Consequently, by quoting a digital edition we no longer refer to some point in the text, as was the case with the printed edition; rather, by means of a URL we activate a mechanism that generates the view or version to be quoted. Roughly speaking, this URL refers to an editorial REST interface. It is employed to submit, e. g., a GET request evoking a particular response (view). In contrast to using a URL basically as a locator that identifies documents or fragments, a REST interface is a suitable means of communication with digital editions, as it corresponds best to their algorithmic nature. Accordingly, a quotation is to be regarded as a particular communication act performed via a REST interface, or more generally an interface that allows for precisely evoking the view one wishes to reference. Even if it is limited to a subset of possible views, a good example of such a RESTful interface in connection with an edition can be explored at the Carl Maria von Weber edition at Paderborn, which uses the Swagger/OpenAPI software.
We are now able to better comprehend how the digital edition is closely tied to the concept of the interface and how the interface makes up a core feature of the digital edition, how the interface is itself a way of ›interpreting‹ the edition, and how to communicate with it (e. g. reading, extracting, ordering, sorting, or querying). For instance, an ›interface URL‹ may create a view that is produced by applying particular XML and XSLT or XQuery files. It is paramount that this kind of response is not dependent on proprietary software frameworks but is an integral part of the edition itself. This allows for copying the edition as a whole, similar to a book that can be moved from one place to another without losing any of its functionalities. Until now there has been no standard for setting this kind of ›quotation‹ to work, but we can already discern three key ingredients, i. e. the values or data (XML), the functions (XSLT, XQuery), and the parser (e. g. Saxon, Xalan). Since the parser is standardized, we can ignore it here. Accordingly, it should suffice to quote the XML and XSLT or XQuery file to create a view. A possible URL might look like the following:
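A hedged sketch of what such a URL could look like (host, paths and parameter names are entirely hypothetical; no standard prescribes them):

```
https://edition.example.org/view
    ?xml=texts/letter-042.xml
    &xsl=styles/normalized.xsl
```

The URL names the data file and the function file; together with a standardized parser they determine the view, so the quotation itself carries the instruction for reproducing it.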
However, this kind of interface functionality has to be stable. As is well known, URLs tend to disappear or change. To address this problem it is necessary to assign persistent identifiers to the edition or parts thereof. This can be achieved by using a PURL, DOI, Handle, URN or another mechanism ensuring that the ›quoted‹ statement always retrieves the same view. It should be noted, however, that the main purpose of persistent identifiers is identification by naming, not locating. Thus a persistent identifier can be assigned to several items of the same edition. Locating functionality is provided by a resolving mechanism such as https://doi.org/ for DOIs or URNs, but can also be achieved by any search engine.
In turn, the documents should be equipped with IDs to ensure granular citation of parts to be quoted. For instance, our example:
This is my first statement:
could be quoted by
or by using a query instruction:
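Hedged sketches of both variants (identifiers, host and parameter names are hypothetical): the quoted fragment carries an xml:id, which can then be addressed either by a fragment identifier appended to the view URL or by a query parameter:

```
<p xml:id="stmt-1">This is my first statement: …</p>

https://edition.example.org/view?xml=example.xml&xsl=view.xsl#stmt-1

https://edition.example.org/query?xpath=//p[@xml:id='stmt-1']
```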
As a matter of course, quoting via an interface can be further differentiated as to the kind of usage the user makes of the edition. Interfaces are employed not only for reading the whole or parts of the edition, but also for searching or working on the edition. Searching an edition can be regarded as a particular kind of machine-reading. A search action, ideally expressed in a formal query grammar (e. g. CQL, XPath, SPARQL etc.) or simply as a full-text search, results in a list of hits pointing to pertinent passages or offering snippets for inspection. The hits themselves, as single items or as a set, can also be viewed as quotable text parts. On the one hand, provision has to be made that the edition can be interrogated by such a query. On the other hand, input can be brought to the edition in any format, as long as it conforms to the data model specified in the schema expressing the model and as long as it is suitable for a lossless conversion to the targeted XML/TEI format. Therefore, even if it is not to be recommended, it is nonetheless possible to use other editing software that employs formats other than XML/TEI, for instance Word when equipped with suitable templates. The sense and relevance of the interface component consists exactly in that it functions according to a formalized data model allowing for conversion of data into whatever form is needed. Conversely, serving an interface means rendering all proprietary modelling, without loss of its integrity, as conformant to the edition model manifest in the interface.
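A search of this kind might be expressed, for instance, in XQuery; a hedged sketch (file name, search term and element choice are assumptions):

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: return every paragraph mentioning "Weber", quotable via its xml:id :)
for $p in doc("edition.xml")//tei:p[contains(., "Weber")]
return <hit ref="{data($p/@xml:id)}">{substring($p, 1, 80)}</hit>
```

Each hit points back to an addressable part of the edition, so the result set itself becomes quotable.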
As demonstrated above, the digital edition does not have much in common with the traditional printed one. Its dynamic or algorithmic nature must be taken seriously. This means that we have to abandon the two-dimensional form of the printed book as authoritative in order to open ourselves to the particular character of the digital one. We go wrong when we try to compare digital and printed editions only according to their surface features. There can be no stability and persistence of the surface of the digital edition, and even if some scholars claim to be able to persistently archive an output format such as PDF/A, they do not do justice to, or even in some cases mutilate, the digital edition when they reduce it to just one of many possible output formats, data and document representations, or search options.
A digital edition as such is not visible, but all its components can be fully described and its views reliably reproduced. By acknowledging the algorithmic nature of the digital edition, it loses its protean character and becomes as stable and reliable as the printed edition used to be over the centuries. All of the components that we have identified as making up the edition (text and markup, editorial functions, layout for output or input, descriptive, structural, administrative or technical metadata, and interfaces) are for the most part already standardized by agencies such as the W3C or leading national libraries, or are in the process of being standardized, in order to provide a reliable framework for the centuries to come. Perhaps it is quite the opposite. By its very nature of allowing easy and lossless dissemination, the digital edition is even more stable than a paper-based edition ever was, because in stark contrast to digital editions, print editions are prone to material decay.