Traditionally, ontology engineering is based on the presumption that the meaning of a proposition results from the combination of the meaning of its elements (concepts) and its syntactical structure. The reach of this ›principle of compositionality‹ is, however, a contested topic in semantics. Its opponents defend the primacy of propositional meaning and derive the meaning of concepts from their contribution to propositional meaning. In this situation, this paper argues for an approach to ontology design that does not presuppose a stance in this debate. The proposed ›minimal doxographical ontology‹ is intended as a heuristic tool charting unknown or complex domains. It regards propositional meaning as atomic and relates it to a bearer of propositional content (persons or texts). The strengths of such an approach are first discussed in a simplified example, the analysis of legal stipulations on alcoholic beverages. A more complex use case concerns the doxographical analysis of debates in the history of early modern philosophy. In closing, the paper sketches briefly how this approach may be extended using ontologies as hermeneutic tools in the interpretation of sources from the history of philosophy.
In recent years, technologies of knowledge representation that are usually subsumed under the heading of the ›semantic web‹ have been used within the digital humanities in disciplines as diverse as literary studies (e. g. regarding the ontology of fictional characters), philosophy (the Wittgenstein ontology), or history (LODE, an ontology for the description of historical events). Thinking about the ›semantic web‹ comes most naturally to digital humanists approaching the discipline from what could be called a ›cultural heritage‹ angle, e. g. librarians, archivists, or curators. In these areas, the production of meaningful metadata is part of everyday workflows; the transition from cataloguing guidelines to machine-readable metadata standards to semantic web languages like RDF comes quite naturally and is an important step in fighting the ›siloisation‹ of digital collections by embedding them in a web of ›linked open data‹.
Whether ontologies are an important part of the digital humanist's tool set is still a disputed question. Some of the more radical defenders of ontologies are unperturbed by such criticism. They contend that ontologies do not just represent or model knowledge, but capture features of mind-independent reality. This may even be true for artefacts.
Whether semantic web technologies are a good fit for a given use case is, however, first of all a technological problem that probably should not be decided on purely philosophical grounds. Nevertheless, philosophy may be able to contribute to some foundational debates in the digital humanities, if its function is taken to consist not in the provision of foundations, but of ›maieutic impulses‹ that help to explicate hidden presuppositions and stimulate the rethinking of unacknowledged biases and blind spots.
In this spirit, this paper discusses one such unacknowledged presupposition of ontology design. Both the knowledge to be modelled in an ontology and the ontology itself are necessarily articulated in propositional form. Correspondingly, the formal structure of ontology languages (like the Web Ontology Language OWL) is consciously modelled on central premises of formal semantics, first and foremost the ›principle of compositionality‹. This ancestry, however, may not be as innocuous as it seems, because philosophers of language debate whether the impact of the principle of compositionality is limited by a second principle, the ›context principle‹. Thus, those interested in the capabilities of ontologies for modelling knowledge must first clarify the possible impact of these debates on ontology design.
The most appropriate strategy in such uncharted territory is ›risk avoidance‹. Accordingly, it may be possible to use the tools of the semantic web in an unassuming and modest manner, as a heuristic tool for mapping vague, complicated, or partially unknown domains. I will first discuss the raison d'être for such a modest approach using a somewhat contrived example and show some problems we encounter in trying to extend the well-known wine ontology. It has already been shown how the modelling of certain domains can profit from analytical restraint, namely if we desist from analysing propositional content into component terms and bind this content to the existence of concrete spatiotemporal entities as their ›bearers‹. The fruitfulness of such a minimal ontology of discourses depends, however, on use cases in ›real life‹. Hence, the proposed ›ontology of what people said‹ is applied to a ›doxographical map‹ of a spatially and temporally circumscribed discourse in the history of philosophy, the debate about the proper definition of the term ›philosophy‹ in early modern Iberian philosophy.
The history of philosophy is, of course, more than just doxography. It should ideally be complemented by interpretations of what people said. In my conclusion, I will sketch how we may use the resources of semantic web technologies to describe the conceptual hierarchies that are implicated by what philosophers (and, possibly, others) have to say. However, it is important to keep in mind that in the light of the foundational discussions of the first part of this paper, such ontologies will always be interpretations, leaving room for controversy and dissent that is probably inevitable if we try to capture the meaning of a text. This is true regardless of the medium we use to express our findings.
2. Ontologies, Compositionality, Contextuality
In a first approximation, ontologies can be defined as »explicit formal specifications of the terms in the domain and relations among them«. Such a specification determines »a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them«. Moreover, such a specification is supposed to be indispensable for ›analyzing domain knowledge‹. OWL distinguishes two sorts of concepts: classes, which »provide an abstraction mechanism for grouping resources with similar characteristics« and are defined in so-called ›class axioms‹, and properties, which are defined in so-called ›property axioms‹. The third category of statements to be found in an OWL document concerns facts about individuals. All statements are composed out of classes, properties, and constants as building blocks. These building blocks must have been defined beforehand: their intension must be known before statements can be constructed.
This means that ontology engineering is firmly rooted in a theory of meaning based on Frege's ›principle of compositionality‹: the »[…] meaning of a complex expression is determined by its structure and the meanings of its constituents«. This nexus raises interesting questions. Those who believe that ontologies may be capable of modelling knowledge in a given domain without being committed to the stronger view that they capture features of mind-independent reality may be content to limit the scope of compositionality to a given language. Or they could maintain that it only applies to the formal language which is used for articulating the model of a domain, because artificial languages can be constructed in such a way as to exhibit compositionality as a feature. Those who subscribe to a more realist interpretation of concepts in an ontology would probably have to accept the stronger thesis of ›cross-linguistic compositionality‹: »For every complex expression e in L, the meaning of e in L is functionally determined through a single function for all possible human languages by the structure of e in L and the meanings of the constituents of e in L«.
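The principle can be illustrated with a toy model (the lexicon, names, and operators below are invented purely for illustration and belong to no ontology language): the meaning of a complex expression is computed by a single function from the meanings of its constituents and its syntactic structure, with nothing outside the expression contributing.

```python
# Toy compositional semantics: atomic expressions get their meaning
# (here: their extension, a set of individuals) from a lexicon; the
# meaning of a complex expression is computed from the meanings of
# its parts plus its structure (the operator).
lexicon = {
    "wine": {"merlot", "rioja"},
    "fermented_beverage": {"merlot", "rioja", "pilsner", "sake"},
}

def meaning(expr):
    """Return the extension of an expression, compositionally."""
    if isinstance(expr, str):           # atomic: look it up
        return lexicon[expr]
    operator, left, right = expr        # complex: structure decides
    if operator == "and":               # intersective combination
        return meaning(left) & meaning(right)
    if operator == "or":                # union
        return meaning(left) | meaning(right)
    raise ValueError(f"unknown structure: {operator}")
```

On this picture the meaning of a whole is exhausted by its parts and structure; the context principle discussed below denies exactly this priority for natural languages.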
But even if people may have reasoned disagreements about the scope of compositionality, the ›compositionalist‹ bias is apparently built into the very notion of an ontology as a ›common vocabulary for researchers‹. However, this first Fregean principle conflicts with a second principle also discussed in relation to his philosophy of language, the ›principle of contextuality‹ (or ›context principle‹): »The meaning of an expression is determined by the meanings of all complex expressions in which it occurs as a constituent.« So whereas compositionalists hold that the meaning of a proposition is the sum of the meanings of its parts plus the semantic contribution of the proposition's structure, contextualists presume that propositional meaning comes first and that the meaning of the constituents of a proposition depends on their role in all other propositions in which they are contained. But, again, the scope of this priority claim must be determined precisely. In the context of this paper, it is helpful to follow Stainton and to distinguish three different understandings of the priority expressed in context principles, namely methodological, metasemantical, and ›psychological‹ interpretations of this priority of propositional over conceptual meaning.
In a methodological perspective, we assume that an analysis of the meaning of subsentential expressions must take into account the context of the proposition they appear in. This understanding of propositional priority may even be compatible with compositionality, because we can understand how the meaning of a subsentential expression appearing in a proposition that we understand can be isolated and transferred into new contexts, allowing us to express a proposition that we had not yet understood. In other words, we may need both compositionality and contextuality of meaning in order to explain linguistic creativity, the capability of expressing new thoughts by recombining elements which we already understand.
But this does not mean that we are necessarily committed to the stronger metasemantic thesis that propositional meaning is in some substantial sense the only (or only the most relevant) source for the meaning of subsentential expressions. If this stronger thesis were applicable to the methodology of ontology design, the project as such might well be hopeless, because the recombination of terms could always lead to mutations in meaning that are unforeseeable for the designer.
The psychological thesis states that competent speakers of a natural language cannot grasp the meaning of subsentential expressions in isolation. Proponents of this view are probably sceptical with regard to the attempt to represent online resources by applying subsentential expressions to them: for them, tagging as such cannot be a meaningful linguistic activity. Even though such radical criticism may be misplaced, we should keep in mind that the vision of the semantic web is built around the notion of knowledge. And – difficult philosophical problems with non-propositional forms of knowledge like knowing-how or foundational perceptual beliefs notwithstanding – an ontology can only codify knowledge that can be explicated in propositional form. So even if we do not subscribe to the strong psychological thesis that subsentential expressions as such are basically meaningless, we could still accept the methodological guideline that ontology design is concerned with knowledge that can be expressed or explicated in propositional form. Implicit awareness of the meaning of subsentential expressions thus should always be explicated in propositional form, regardless of whether competent speakers can use or understand such expressions in isolation.
So even those ontology designers who would subscribe to compositionality as an essential constituent of their self-understanding still face interesting problems: should we presume that ontologies mirror cross-linguistic universals or is their usefulness limited to speakers within a given linguistic community? Do ontologies mostly track extensions, i. e. the reference of terms, or should we give them an intensional interpretation as well, taking into account their meaning? Do we accept the notion that statements in an ontology are fully devoid of context, so that their meaning really consists of nothing but the sum total of subsentential meanings and the contribution made by syntax?
In thinking about these questions we should never lose track of the fact that ontologies are no end in themselves: they are technological instruments, so that their scope and utility are determined first and foremost by pragmatic considerations. It is therefore imprudent to assume that in order to build an ontology it is necessary to choose one side in these complex and unresolved philosophical debates. We should rather ask ourselves to what extent our understanding of ontology design is determined by unacknowledged biases in our implicit theories of meaning and whether it is possible to build ontologies in a way that is not committed to any explicit stance.
Such a minimal approach to coming to terms with a given domain would, at least on the heuristic level, consist in two decisive moves:
- The meaning of propositions (i. e. their ›propositional content‹) is taken to be opaque: it is only referred to by a name. This allows us to avoid any commitment as to whether or not the meaning of a proposition can in fact be analysed compositionally in a particular case.
- Propositional contents are only admitted if they can be connected to a spatio-temporal entity (mostly a person or a document) that articulates a propositional attitude towards this content, i. e. asserts, denies, or reflects upon the content in question. In other words, the propositional content designated by the name ›wine is made from grapes‹ is not to be analysed into a subject term designating a drink, an object term designating fruit, and a relation term designating the process of turning fruit into a drink. And it is not to be admitted unless we can trace this content to a person or document that either asserts, denies, or reflects on the propositional content ›wine is made from grapes‹.
It should be noted that the opacity of propositional meanings is not taken to be absolute. We still can and should talk about subject terms, object terms, and relation terms contributing to the constitution of propositional content. But we can do so without preconceived notions about how single terms contribute to the meaning of the propositions they appear in. In order to elucidate this point, I will compare the approach proposed here to standard procedures of modelling knowledge about statements, i. e. reification. But first it will be helpful to discuss a simplified example that is meant to demonstrate that using the approach proposed here we can deal in a simple and transparent manner with inconsistent statements within a domain as well as with statements that may prove to be troublesome when related to other domains.
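The two moves just described can be condensed into a few lines of Python. This is a heuristic sketch, not an implementation of any existing library; all class and attribute names are invented for illustration.

```python
from dataclasses import dataclass

# Move 1: a propositional content is opaque. It is referred to only by
# a name; its internal structure (subject, relation, object terms) is
# deliberately not analysed.
@dataclass(frozen=True)
class PropositionalContent:
    name: str  # e.g. "wine_is_made_from_grapes"

# Move 2: a content is only admitted via a doxographical statement that
# binds it to a spatio-temporal bearer (a person or a document) through
# one of three propositional attitudes.
ATTITUDES = ("asserts", "denies", "reflects")

@dataclass(frozen=True)
class DoxographicalStatement:
    bearer: str                    # person or document
    attitude: str                  # one of ATTITUDES
    content: PropositionalContent

    def __post_init__(self):
        if self.attitude not in ATTITUDES:
            raise ValueError(f"unknown attitude: {self.attitude}")

# The record consists of such statements and nothing else:
record = [
    DoxographicalStatement(
        "West_Virginia_legislators", "asserts",
        PropositionalContent("sake_is_legal_wine_in_WV")),
]
```

Note that the record never commits us to any compositional analysis of `sake_is_legal_wine_in_WV`: the name is the only handle on the content.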
3. Legal Wine
§175-2-2 of the Legislative Rule 175CSR 2 governing the activities of the West Virginia Alcohol Beverage Control Commission stipulates that wine in the sense of West Virginia state law is
»any beverage obtained by the fermentation of the natural content of fruits, or other agricultural products, containing sugar and includes, but is not limited to, still wines, champagne and other sparkling wines, carbonated wines, imitation wines, vermouth, cider, perry, sake, or other similar beverages offered for sale or sold as wines containing not less than seven percent (7%) nor more than twenty-four percent (24%) alcohol by volume.«
Beer is defined as »any beverage obtained by the fermentation of barley, malt, hops, or any other similar product or substitute, and containing more alcohol than that of nonintoxicating beer or nonintoxicating craft beer«. Alcoholic liquors are defined as »alcohol, beer, including barley beer, wine, including barley wine [my emphasis] and distilled spirits, […]«. So in West Virginia state law, the concept ›wine‹ includes products based on pears, apples, and rice as long as they contain more than 7% and less than 24% ethanol, i. e. apparently all alcoholic beverages between these limits that are not beer, since beer is discussed under a different heading. But then again »barley wine« is identified as a kind of wine. However, it shares all relevant properties with beer except its alcoholic strength.
So the law is self-contradictory. If we wanted to model the taxonomy of beverages in West Virginia state law, we would have to settle either for a concept of wine that includes beverages made from barley and does not require fermentation based on sugar, or we would have to disregard the subsumption of barley ›wine‹ under the concept of wine, so that our model remains incomplete.
Concepts in law are necessarily vague: courts must have the freedom to apply the law to new beverages that were unknown when the legislation was written. Stipulated meanings in a law can contradict our common-sense notions, so that they cannot easily be mapped onto existing ontologies that, like the wine ontology, understand wine as a potable liquid made from grapes. The occurrence of contradictions, vagueness, and tensions between concepts in different domains can lead sceptics to the conclusion that, since concepts are nothing but social constructions that do not follow the strict requirements of the ontology engineer, the whole endeavour of modelling knowledge in a machine-readable way is doomed. Conversely, realists would probably point out that the legal meaning of ›wine‹ in West Virginia could be reconstructed in principle, if we had functioning ontologies of artefacts and social institutions. The resulting determination may be incredibly complex, but feasible in principle.
Or we may wonder whether ›reification‹ can be a solution. RDF, a framework for describing web resources semantically (commonly serialised as XML), offers support for this technique, so I will use its syntax to explain the notion.
RDF represents a reified statement as four statements with particular RDF properties and objects: the statement (S, P, O), reified by resource R, is represented by:
- R rdf:type rdf:Statement
- R rdf:subject S
- R rdf:predicate P
- R rdf:object O
The first triple (R rdf:type rdf:Statement) can be used to refer to the statement that is composed of the three terms S, P, and O. We could thus refer to the statement »Sake is legal wine in West Virginia« by simply naming the statement »Sake_is_legal_wine_in_WV« (or SLWWV) and composing it out of the subject term »Sake«, the relation »is subclass of«, and the object term »legal_wine_in_West_Virginia«. Such a ›quadlet‹ does allow us to refer to a statement as a whole. So we could express the intentions of West Virginia legislators in formulating the Legislative Rule by forming a second statement with »West Virginia legislators« as subject term, »stipulate« as relation term, and »SLWWV« as object term: West Virginia legislators stipulate that sake is legally wine in West Virginia.
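The quadlet pattern can be mimicked with plain Python tuples (a sketch only; no RDF library is assumed, and the term names mirror the example above):

```python
def reify(r, s, p, o):
    """Represent the statement (s, p, o), reified by resource r,
    as the four triples of the RDF reification vocabulary."""
    return [
        (r, "rdf:type", "rdf:Statement"),
        (r, "rdf:subject", s),
        (r, "rdf:predicate", p),
        (r, "rdf:object", o),
    ]

# Reify "Sake is legal wine in West Virginia" under the name SLWWV:
slwwv = reify("SLWWV", "Sake", "rdfs:subClassOf",
              "legal_wine_in_West_Virginia")

# The reified statement can now serve as the object of a further
# triple expressing the legislators' propositional attitude:
meta = ("West_Virginia_legislators", "stipulate", "SLWWV")
```

The crucial point is that `SLWWV` is an ordinary resource: other triples can talk about it, which is exactly what the second statement does.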
But reification helps us only as long as the domain to be modelled is not characterised by self-contradictory notions. If we wanted to reify the statement »barley wine is wine«, we would run into problems. Since barley wine is in fact stipulated to be beer and since the stipulation for beer contradicts the stipulations for wine (beer is based on fermentation of grain, wine is based on fermentation of sugar), any ontology trying to capture the intentions of West Virginia legislators is bound to fail, because these intentions contradict each other: a coherent model is impossible. This is different from a situation in which we are merely unsure about the factual truth or falsity of a statement: a reified statement can be false as long as its falsehood is purely factual. But reification cannot save us from logical or conceptual incoherence.
The way out of this quandary is to deny »barley wine is wine« the status of an RDF statement. »Barley wine is wine« is just the name of a statement containing a subject term, a relation term, and an object term, but none of these terms is part of an RDF triple. Hence their aggregation in a statement does not constitute an RDF statement. This expresses the fact that the status of this triple of terms as the description of a resource (i. e. something ›out there‹) is uncertain. Since the reference of the statement is unclear, the same must be presumed for its meaning (or lack thereof). The meaning of the statement is opaque, even though we can specify the terms it contains. But it is equally important to describe the content, whatever it may be, as a propositional content that can be ascribed to the creators of this statement, i. e. presumably legislators in the state of West Virginia.
The main advantage of such an approach over proper reification is that it can be used heuristically: we do not need a full-blown ontology for capturing the content of a given discourse in a form that is amenable to further refinement and development. This heuristic approach is particularly useful when we are interested in the connection between what has been said and who said it, i. e. in all domains in which we capture the opinions of people and thus proceed ›doxographically‹. And it can accommodate the development of the intension of a concept over time and thus be helpful for tracking the history of concepts, beliefs, and theories.
4. A Use Case: Capturing a Philosophical Discourse Doxographically
If we want to condense the approach sketched in this paper into a handy slogan, we could say that it focuses on what people say about the world rather than on what there is in the world. It records opinions rather than facts. In the history of philosophy, doxography, the recording of opinions, is a venerable tradition going back to ancient times. So the minimal ontology for capturing opinions of others can be said to proceed ›doxographically‹. It comprises abstract and spatiotemporal entities, namely persons holding or texts articulating a certain belief and the propositional content of the belief. Propositional attitudes can be understood as properties of spatiotemporal entities, namely the property of asserting, denying, or merely reflecting upon a given propositional content.
Such a minimal doxographical ontology can be used to capture the content of a given discourse without making any assumptions about the conceptual structure of the respective domain. In a proof of concept at EMTO Nanopub I have assembled ›doxographical facts‹ about the debate on how to define ›philosophy‹ in early modern Iberian philosophy, collating the viewpoints of Gaspar Cardillo de Villalpando, the Complutenses, the Conimbricenses, Diego Mas, Vicente Montanes, Antonio Rubio, José Saenz de Aguirre, and Franciscus Toletus as ›nanopublications‹. Even without additional conceptual analysis of these propositions, we can gain some interesting insights from this purely doxographical ›record keeping‹.
Figure 1 shows a network of the eight authors and the propositional content they assert, deny, or reflect upon in their texts about the proper definition of philosophy. It has been produced in Gephi, a very comprehensive tool for the production of network diagrams. The authors are displayed as ›nodes‹ in this network diagram that only serve as starting points of ›edges‹ (arrows). The edges themselves are coloured according to the propositional attitude that exists between author and propositional content: green arrows signify an assertion, red arrows a negation, grey arrows a neutral stance (e. g. a quotation). Propositional contents are signified by those nodes in which edges end. Since the visualisation is quite complex, zooming and panning are supported via the SVGpan library. This allows the viewer to explore the structure of the presented network of authors and propositional contents interactively. The points of arrows are linked to URLs of nanopublications on EMTO Nanopub.
Even a cursory inspection of this visualisation provides some interesting insights into parameters that usually are not in the centre of attention of historians of philosophy. We can discern marked differences in the ratio of grey arrows to coloured arrows across authors. Antonio Rubio mostly reports theses without taking an explicit stance: most of the arrows starting from this node are grey. In contrast, Cardillo de Villalpando (at the bottom of the figure) prefers a thetical style of writing and asserts only theses he takes to be the case: all arrows starting from this node are green. A second surprising result is that, even though many may view Spanish scholasticism as a unified school of thought, many topics come up in only one author. Only a minority of assertions or denials concern more than one or two authors. Reflections like these may lead to a more precise quantitative analysis of argumentative strategies, but would certainly not have come into view by just reading the texts.
But beyond such ›stylometric‹ reflections, visualisations also help us to understand the structure of a debate more precisely. Debates consist of contents that are either asserted or denied. So we can omit all edges that denote mere reflection on a given content and include only assertion and negation as propositional attitudes (edges). And participants in a debate are supposed to endorse at least one thesis that is endorsed or denied by other participants as well.
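The two criteria can be stated as a small filtering routine. The sketch below runs on invented edge data, not the actual EMTO dataset, and implements a slightly simplified variant: it keeps only committed (asserting or denying) edges whose content at least two distinct authors take a stance on, which in turn removes any participant who shares no thesis with the others.

```python
def debate_core(edges):
    """Filter (author, attitude, content) edges down to the debate core:
    drop mere reflection, then keep only contents on which at least two
    distinct authors take a committed stance."""
    # Criterion 1: only assertion and negation count as debate moves.
    committed = [e for e in edges if e[1] in ("asserts", "denies")]
    # Criterion 2: a content belongs to the debate only if at least
    # two authors are committed to it (positively or negatively).
    authors_per_content = {}
    for author, _, content in committed:
        authors_per_content.setdefault(content, set()).add(author)
    return [e for e in committed
            if len(authors_per_content[e[2]]) >= 2]

# Invented toy data loosely echoing the use case:
edges = [
    ("Rubio", "reflects", "philosophy_is_habit"),
    ("Mas", "asserts", "philosophy_is_knowledge_of_causes"),
    ("Toletus", "denies", "philosophy_is_knowledge_of_causes"),
    ("Montanes", "asserts", "philosophy_concerns_the_divine"),
]
core = debate_core(edges)
```

In this toy record the author who only reflects drops out entirely, mirroring the effect described for the real data below.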
Applying these two criteria simplifies the picture considerably. One author drops out of the picture, because he does not fulfil the second criterion: Rubio does not take a stance that is either asserted or denied by another author in the debate. Apparently, there are two camps in the debate, one that seems to focus on philosophy as knowledge of causes (in the upper region of figure 2) and one that seems to be concerned with the nexus between philosophy and the Divine (in the bottom region of figure 2). The bridge between both camps is built by Vicente Montañés, who asserts contents that can be found in both camps. This impression is reinforced when we simplify further and include only those propositional contents that are asserted by at least two thinkers (figure 3):
Two general points are worth emphasising: first, it should be noted that the visualisations presented here are the result of algorithms for the visualisation of networks implemented in Gephi. Some minor manual adjustments were necessary, but the overall representation of the structure of the debate is not the result of conscious design decisions. Since it is the machine that does the work of structuring the debate, hermeneutic biases are minimised in this step. Second, this approach to visualising excerpts of ›the history of philosophy‹ allows us to trace each and every ›visual assertion‹ to the relevant evidence, since every edge that connects an author to a propositional content, i. e. every doxographical statement, is linked to a nanopublication providing the bibliographical data of the source text and the author making the doxographical statement.
›Doxography‹ is an essential, though mostly underrated, element in the workflow of any historian of philosophy. Working with a text, we must first produce summaries, excerpts, or other research notes that help us fix its content, before we tackle the more complex task of reconstructing its arguments, comparing them to other sources, and evaluating their validity either in their historical context or in relation to contemporary problems. Dealing with this process using digital tools may in itself transform and enhance existing practices in the history of philosophy. But, more importantly, it also opens up new research questions and may change our understanding of the discipline as a whole.
5. Digital Doxography and Heuristic Ontologies: A Vision
In the use case presented here, the ›semantic‹ dimension of semantic web technologies was conspicuously absent. But we can now articulate a broader vision of how the heuristic use of ontologies could transform not merely the record keeping of the digital doxographer, but also our strategies of interpreting philosophical sources.
One particularity of (at least some) philosophical theories consists in the way they try to develop conceptual hierarchies that could quite easily be transformed into statements of an ontology. Take, for example, the following two propositional contents:
S:philosophy R:is O:habit
S:part of philosophy R:is O:species of philosophy
For an expert in the domain it is fairly obvious that the first triple expresses a relation of conceptual subordination; it could thus be transformed into a corresponding OWL statement:
<rdfs:subClassOf rdf:resource="#habit" />
The second example expresses the identification of the extensions of two concepts. Everything that falls under the intension ›is part of philosophy‹ at the same time falls under the intension ›is a species of philosophy‹. In other words, both concepts are, in the terminology of OWL, equivalent classes:
<owl:Class rdf:ID="part_of_philosophy">
<owl:equivalentClass rdf:resource="#species_of_philosophy" />
</owl:Class>
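Such transformations could be partly automated once an expert has classified a triple. The sketch below is illustrative only (no existing converter is assumed); the classification of a triple as expressing subordination or extensional identity remains an interpretive decision supplied as an argument.

```python
def to_owl(subject, obj, kind):
    """Emit an RDF/XML fragment for a doxographical triple that an
    expert has classified as 'subclass' (conceptual subordination)
    or 'equivalent' (identical extensions)."""
    s, o = subject.replace(" ", "_"), obj.replace(" ", "_")
    if kind == "subclass":
        return (f'<owl:Class rdf:ID="{s}">\n'
                f'  <rdfs:subClassOf rdf:resource="#{o}" />\n'
                f'</owl:Class>')
    if kind == "equivalent":
        return (f'<owl:Class rdf:ID="{s}">\n'
                f'  <owl:equivalentClass rdf:resource="#{o}" />\n'
                f'</owl:Class>')
    raise ValueError(f"no OWL rule for classification: {kind}")

# The two patterns discussed above:
subordination = to_owl("philosophy", "habit", "subclass")
equivalence = to_owl("part of philosophy", "species of philosophy",
                     "equivalent")
```

Note that the relation term ›is‹ appears in both source triples; it is precisely because the OWL construct cannot be read off the surface form that the expert's classification enters as an interpretive input.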
Of course, it is important to note that again we should not misconstrue such statements as being concerned with philosophy as a thing in the world. These statements, too, must be suitably qualified, namely as ›intentional objects‹ of philosophical thought in a given period. And a second important caveat applies: by transforming the isolated statements of a doxographical record into the reconstruction of a conceptual scheme, we leave the domain of ›facts‹ and enter the realm of interpretation. The more complex a source text is, the more we may expect deviations between different attempts at reconstruction. In this respect, the use of code as a medium of interpretation does not change its fundamental hermeneutic characteristics. But ontologies, understood as the expression of a coherent vision of a given conceptual scheme, could nevertheless develop into powerful tools for the historian of philosophy.
This is particularly true for those domains in which mass digitisation projects have made available large numbers of previously unknown or inaccessible sources. Since we may expect some progress regarding optical character recognition of historical prints, it is to be hoped that a large number of these texts will at some point in the future be available as machine-readable e-texts. And even though practitioners in the field assert that natural language processing of Latin texts is difficult, because these texts are written in Latin, some progress on this front will hopefully be made too. The upshot is that these developments may help us extract doxographical triples from a large number of texts, both in Latin and the vernaculars, in order to gain a deeper and more comprehensive understanding of the historical record as it stands. The methodology proposed here may then prove to be a fruitful strategy for turning this content into semantically rich information.