To facilitate the application of network analysis to the study of multi-sided Aegean seals, a graph database was implemented in Neo4j. Many of the attributes describing the seals contain uncertain values, which require special attention when they are incorporated into the graph database. The article examines where these uncertainties originate and presents how they can be modelled in the graph database. Finally, a practical example is presented that focuses on depictions of creatures on the seals and makes use of the graph database.
- 1. Introduction
- 2. Data Source
- 3. Uncertainties in the Dataset
- 3.1 Uncertainties from Provenience
- 3.2 Uncertainties from a Seal’s Condition
- 3.3 Uncertainties by Human Action
- 3.4 Uncertain Values in the Data Set
- 4. The Graph Database
- 4.1 Data Model
- 4.2 Import into Neo4j
- 4.3 Modelling Uncertainties
- 5. Use Case
- 6. Conclusion
- Bibliographic References
- List of Figures with Captions
1. Introduction
Aegean seals are small objects of stone, bone or ivory in varying shapes, including discs, cylinders, rectangular blocks and triangular prisms. They originate mostly from Bronze Age Crete (Minoan seals) and mainland Greece (Mycenaean seals) and date from about 3000 to 1100 BCE.
The seals were used for labelling, sealing or securing other objects by producing relief impressions in soft materials like clay with their engraved faces. About ten percent of the approximately 10,000 seals known today have more than one such seal face, i.e. are multi-sided. An example is depicted in figure 1. For this group, it remains unclear whether the choice of the motifs engraved on the different faces of a seal followed specific rules or was haphazard. The work presented here is part of a project that aims to answer this question by means of computational methods.
Network analysis and its visualisation are exploratory tools that can help in understanding how motif components such as depicted creatures are combined with each other. The method comes from network theory, a field concerned with the study of graphs. Storing information about the multi-sided seals in a graph database was therefore a natural choice. While developing the data model, special attention had to be paid to unclear or uncertain attributes describing a seal, which originate e.g. from depicted creatures lacking the distinct features needed for their unambiguous identification.
The data source used for this work is described in the following section, as is the way information about the seals is organised. Section 3 focuses on the different kinds of uncertain attributes and explores why uncertainties exist in the dataset. Section 4 presents how the information was transferred into a graph database and introduces the data model. Section 5 demonstrates the database in a use case on depicted creatures, and section 6 concludes.
2. Data Source
All seals considered in this work are recorded in the ›Corpus der Minoischen und Mykenischen Siegel‹ (CMS), a project established in Marburg in 1958, which moved to Heidelberg in 2011. The aim of the CMS is to document and publish all known Aegean seals. In 2007, all seal descriptions were included in the freely accessible object database Arachne. In Arachne, each seal is described in a relational database model with about fifty attributes, such as number of seal faces, dimensions, material, ornaments, and figurative motifs. These attributes are organised into eleven thematic groups: ›identification‹, ›provenience‹, ›shape‹, ›material & technique‹, ›measurements & preservation‹, ›general information about decoration‹, ›stylistic classification‹, ›ornaments‹, ›characters‹, ›figurative motifs excepting creatures‹, and ›creatures‹.
For this work, a total of 1033 multi-sided seals were taken into account: 301 two-sided, 637 three-sided, and 87 four-sided seals, as well as a few seals with more faces (three with 5, three with 6, one with 8, and one with 14 faces).
Information about the seals was imported from Arachne via its Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) endpoint and then processed into two tables. In the first table, each row contains all attributes describing one seal face; in the second, all attributes of the seal faces belonging to the same seal are merged into a single row. A third table, containing information about depicted creatures, was generated via web scraping. Finally, a fourth table with geospatial information was compiled from the second table (one seal per row) and manually enriched with the coordinates of each seal’s place of discovery. These tables constitute the data basis for all applications in this project, including the graph database described here.
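The step from the first table (one row per seal face) to the second (one row per seal) can be illustrated with a minimal Python sketch. It is not the processing code used in the project; the column names `seal`, `face` and `creature` as well as the sample rows are hypothetical and serve only to show the grouping.

```python
from collections import defaultdict


def merge_faces(face_rows):
    """Group per-face rows by seal and collect each attribute's values in face order.

    Every row is a dict with the hypothetical keys "seal" and "face" plus
    any number of attribute columns.
    """
    seals = defaultdict(lambda: defaultdict(list))
    for row in sorted(face_rows, key=lambda r: (r["seal"], r["face"])):
        for key, value in row.items():
            if key not in ("seal", "face"):
                seals[row["seal"]][key].append(value)
    return {seal: dict(attrs) for seal, attrs in seals.items()}


# Made-up sample rows, not taken from the CMS data.
rows = [
    {"seal": "seal-1", "face": "b", "creature": "Rind"},
    {"seal": "seal-1", "face": "a", "creature": "Ziege?"},
    {"seal": "seal-2", "face": "a", "creature": "Vogel"},
]
merged = merge_faces(rows)
```

Keeping the values as one list per attribute preserves the face order, so the per-face information is not lost in the merged row.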
3. Uncertainties in the Dataset
Uncertainties in this dataset originate from three different causes, which are linked to the seal’s provenience, condition, and ultimately to a human component. These will be briefly examined in the next three subsections. The last subsection describes how values in the dataset are marked as uncertain and which attributes actually contain uncertain values.
3.1 Uncertainties from Provenience
The provenience of a seal, i.e. the archaeological context in which it was found, is important for examining the meaning of a seal in light of its place of deposit and the other objects associated with it. A secure find context also provides certainty for the geographic localisation and dating of a seal. For a few seals, the lack of a secure context might even cast doubt on their authenticity. If a seal was acquired under dubious circumstances, which in most cases means that it was not found during an archaeological excavation, information about its geographic origin has to be taken with a pinch of salt and context dating is not possible at all. Geographic information is available for a total of 583 seals, but a secure context is known for only 289 of them.
3.2 Uncertainties from a Seal’s Condition
The overall condition of a seal plays a crucial role in the identification of its engravings. Determining the ornaments used, the hieroglyphic signs or a creature’s species is much easier when studying a well-preserved and complete seal than a battered, broken or even fragmentary one. The seals in different states of preservation in figure 2 serve as a visual example.
3.3 Uncertainties by Human Action
An unpredictable cause of uncertain descriptions of a seal is human action, which takes many forms. To begin with, the seals are man-made objects, so the seal engraver may already have made mistakes when cutting the stone or may even have obfuscated the motifs on purpose, resulting in ambiguous compositions. Additionally, the carved material, the tools used, the style, and the craftsmanship of the engraver led to very different depictions of the same motif. This can be seen by comparing the seal sides depicted in figures 2 and 3.
All this might then contribute to an uncertain denomination of the motifs depicted on a seal when modern scholars catalogue the seals. Depending on a scholar’s background, additional uncertainties may arise from insufficient knowledge in other fields of expertise, e.g. the wrong attribution of a specific bird species. In some cases, further vagueness is introduced when a seal is studied by more than one scholar and their interpretations disagree.
When the information is then transferred into the database Arachne, further uncertainties can be introduced by input errors or by leaving some fields empty. The latter is ambiguous, because it could mean either that the field was forgotten or that no information is actually available for it.
3.4 Uncertain Values in the Data Set
Three different markers are used in Arachne to denote uncertain but probable values in the database: a question mark, a pair of alternative options joined by ›oder‹ (or), and a combination of the two. When options are indicated, exactly two are given; no value in the dataset has more than two options. In figure 3, for example, the values for the attribute ›Lebewesen‹ (creature) of CMS X 322c, CMS IS 038a, and CMS III 504a are ›Ziege?‹ (goat?), ›Rind oder Ziege‹ (bovine or goat), and ›Rind oder Ziege?‹ (bovine or goat?), respectively.
For supposedly present values which could not be identified, the terms ›undefinierbar‹ or ›undef.‹ (undefinable, indeterminate) were used. The term ›Intagliolücke‹ (gap in the intaglio) is also used for similar cases.
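How these markers could be recognised programmatically is sketched below in Python. This is an illustration under stated assumptions, not the project's code: the names `Reading` and `parse_value` are hypothetical, a question mark is assumed to stand at the end of a value and to apply to the value as a whole, options are assumed to be separated by ›oder‹, and ›Intagliolücke‹ is treated like the indeterminate terms.

```python
from dataclasses import dataclass

# Terms used for values that are present but could not be identified.
INDETERMINATE = {"undefinierbar", "undef.", "Intagliolücke"}


@dataclass(frozen=True)
class Reading:
    options: tuple        # one or two candidate values, empty if indeterminate
    questioned: bool      # True if the value carries a question mark
    indeterminate: bool   # True for 'undefinierbar', 'undef.', 'Intagliolücke'


def parse_value(raw):
    """Split a raw attribute value into its options and uncertainty markers."""
    text = raw.strip()
    if text in INDETERMINATE:
        return Reading((), False, True)
    questioned = text.endswith("?")
    if questioned:
        text = text[:-1].strip()
    options = tuple(part.strip() for part in text.split(" oder "))
    return Reading(options, questioned, False)


# The three creature values quoted from figure 3.
for value in ("Ziege?", "Rind oder Ziege", "Rind oder Ziege?"):
    print(value, "->", parse_value(value))
```

Separating the options from the markers keeps the value itself (e.g. ›Ziege‹) comparable across certain and uncertain occurrences.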
Not all of the sources of uncertainty discussed above are represented in the data set. Some uncertainties caused by human action are not denoted as such because they are implicit, such as those caused by errors or lack of knowledge. An additional attribute stating whether or not a seal came from a secure context was added to the data set by consulting the introductory sections of the printed CMS volumes.
Markers for uncertainty can be found in values for almost all attributes in the data set. From the eleven thematic groups mentioned in section 2 only ›identification‹ does not contain any vague values.
Besides the additionally introduced attribute for secure contexts, some attributes about the provenience of a seal such as places of origin are marked with a question mark when deemed doubtful. Uncertainties in the description of a seal’s shape are present if the seal is in a bad condition or only a fragment is preserved. This is why seemingly distinct properties such as the shape of the whole seal or of individual seal faces, the type of perforation or in one case – CMS V 256 – even the number of seal faces cannot always be indicated without doubt.
Attributes in the thematic group ›material & technique‹ include information about a seal’s material, the technique employed to make the seal, and any other details worth cataloguing. Uncertain values are present for all of those attributes in the dataset.
As can be expected, the attributes in the group ›measurements & preservation‹ that record dimensions do not contain any uncertain values. The preservation condition of a seal, in contrast, does, especially if it cannot be securely established whether a seal shows tool marks or signs of burning.
All attributes describing the engravings a seal is bearing have values with uncertainties. They are part of the thematic groups ›general information about decoration‹, ›stylistic classification‹, ›ornaments‹, ›characters‹, ›figurative motifs excepting creatures‹, and ›creatures‹. In addition to the example for uncertain identification of creatures given in figure 3 above, two more cases shall be provided.
The attribute ›Standardornament‹ (standard ornament), part of the thematic group ›ornaments‹, can contain multiple values, each of which can individually be marked as uncertain. Figure 4 shows CMS II,1 136a, for which ›Standardornament‹ contains the value ›Hakenspirale?(2), Punkt, undefinierbar‹ (spiral hook?(2), dot, indeterminate). It indicates that possibly two spiral hooks, one dot and a further, indeterminate element are depicted on the seal.
The attribute ›Schrift‹ (script), which belongs to the thematic group ›characters‹, may also contain multiple values in order to list all characters present on a seal’s face. The script used on Aegean seals is in most cases Cretan hieroglyphic. The hieroglyphs are specified by using the acronym ›CHIC‹ and a number which relates to the Corpus Hieroglyphicarum Inscriptionum Cretae. For example, the four-sided seal CMS II,2 316 in figure 5 has script on all sides, of which one character, CHIC 056, on face c is marked as uncertain because of the seal face’s damage.
4. The Graph Database
For the task at hand, which is concerned with the interrelations of motifs, not all attributes available are relevant. Thus the data model which is described in subsection 4.1 is simpler than the one used in Arachne. The graph database management system used was Neo4j, which offers a freely available Community Edition. The import of data into Neo4j is outlined in subsection 4.2 and modelling of uncertainties in the graph database is detailed in the last subsection.
4.1 Data Model
A major difference between the data model described here and the one used by Arachne is that, for the graph database, seals and their seal faces were modelled as distinct nodes. This makes it possible to separate attributes describing features of the whole seal, like material, shape or geographic information, from those describing only a single face, such as depicted motifs. This distinction was not made in Arachne.
Besides the two node types for a seal and for a seal side, 13 further node types were included, each related either to the whole seal or to a side: ›Siegeltyp‹ (seal shape), ›Materialgruppe‹ (material group), ›Material‹ (material), ›Ort‹ (place), ›Stilgruppe‹ (style group), ›Epoche‹ (period), ›Makroornament‹ (macro ornament), ›Standardornament‹ (standard ornament), ›Symbole‹ (symbols), ›Objekte‹ (objects), ›Pflanzen‹ (plants), ›Schrift‹ (script), and ›Lebewesen‹ (creatures).
Nodes can be connected with each other by edges which in Neo4j are called relationships. For this dataset, a total of 17 different relationship types were created. The relationship type defines how and to which other node types a node can be connected. For example, the relationship type ›hat_Siegelseite‹ (has seal side) connects the node type ›Siegel‹ (seal) with the node type ›Siegelseite‹ (seal side). Four different relationship types can be used to connect the node type ›Siegelseite‹ (seal side) with ›Epoche‹ (period), in order to state whether the period represents the beginning of a context date (›hat_EpocheAnfKontext‹), its end (›hat_EpocheEndKontext‹), the beginning of a style date (›hat_EpocheAnfStil‹) or its end (›hat_EpocheEndStil‹).
The node types and their relationships are depicted in figure 6. The node type ›Siegel‹ (seal) is further described with additional properties, including the CMS number, the calculated volume, and the numbers of empty seal faces and depicted creatures.
4.2 Import into Neo4j
As described in section 2, the dataset was processed into two tables, with further information provided in two additional tables. All four tables were imported into Neo4j with Cypher, a query language for graph databases. The Cypher script created for the import not only imports the dataset but also cleans and filters it: attributes with empty values are not imported into the graph, nor are attributes containing ›None‹ or ›nein‹ (no).
In Arachne, multiple identical values for an attribute are not listed separately; instead, the number of repetitions is given in parentheses, as already seen for CMS II,1 136a in section 3.4 and figure 4, where ›Standardornament‹ contains the value ›Hakenspirale?(2), Punkt, undefinierbar‹ (spiral hook?(2), dot, indeterminate). If the number is greater than six, a greater-than sign is used, resulting in e.g. ›Hakenspirale(>6)‹. This was accounted for during import by creating one edge for each repetition. Unfortunately, the exact number behind ›>6‹ could not be determined, which is why a maximum of six edges is created between a seal side and e.g. a specific standard ornament.
In figure 7, a part of the Cypher script for the import of ›Standardornament‹ demonstrates how data is converted into the graph data model. The resulting graph for the seal CMS II,1 085 from figure 1 is shown in figure 8. Overall, 4637 nodes were created, with 18306 relationships interconnecting them.
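The handling of repetition counts can be illustrated with a short Python sketch. It is not the Cypher import script shown in figure 7; the function name `expand`, the regular expression and the comma as separator between entries are assumptions made for this illustration. A value of ›>6‹ is capped at six edges, as described above.

```python
import re

# An entry is a name, optionally followed by "(n)" or "(>n)".
ITEM = re.compile(r"^(?P<name>.*?)(?:\((?P<gt>>)?(?P<n>\d+)\))?$")
MAX_EDGES = 6  # ">6" cannot be resolved, so at most six edges are created


def expand(raw):
    """Return (name, edge_count) for every comma-separated entry of a value."""
    result = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        match = ITEM.match(part)
        count = int(match.group("n")) if match.group("n") else 1
        result.append((match.group("name").strip(), min(count, MAX_EDGES)))
    return result


print(expand("Hakenspirale?(2), Punkt, undefinierbar"))
# [('Hakenspirale?', 2), ('Punkt', 1), ('undefinierbar', 1)]
print(expand("Hakenspirale(>6)"))
# [('Hakenspirale', 6)]
```

The question mark is deliberately left attached to the name here, so that the uncertainty marker can be evaluated in a separate step.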
4.3 Modelling Uncertainties
There are different ways to represent uncertainties in graph databases, and the choice depends on the application and its planned queries. For instance, different nodes can be created for certain and uncertain values, like a node for ›Ziege‹ (goat) and another one for ›Ziege?‹ (goat?). In the worst case this would double the number of nodes. Adding an uncertainty property to a node would likewise increase the number of nodes. Another way is to use the edges, either by providing different kinds of relationships (e.g. ›has‹ and ›may-have‹) or by including weights. The former leads to an increase in the number of relationship types in the graph. The latter approach keeps the number of nodes and edges at a minimum and also allows a measure of the uncertainty, such as a percentage, to be included. In the data model presented here, uncertainties are modelled by applying edge weights ranging from 0 (0 % sure) to 1 (100 % sure).
Not all of the uncertainties mentioned in section 3 can be represented in the graph database; this is only possible for those which are marked as such in the data source. The weights are also set with the Cypher script, based on the value provided in Arachne. If a value comes without any uncertainty marker (e.g. ›Ziege‹ (goat)), the weight of the edge is set to 1. If the value is marked as uncertain with a question mark (e.g. ›Ziege?‹ (goat?)), the weight x of the edge lies in the range 0.5 < x < 1 and was set to 0.8 in this case. For those values where two options are given, two edges are created. The weight of those edges depends on the presence of a question mark. A value like ›Rind oder Ziege‹ (bovine or goat) leads to two edges with a weight of 0.5, i.e. a 50 % chance that one of the two options is actually depicted. The value ›Rind oder Ziege?‹ (bovine or goat?) results in two edges with a weight y in the range 0 < y < 0.5, which was set to 0.3 in this case. Furthermore, a count was introduced in order to be able to distinguish between two uncertain edge pairs belonging to one seal side.
In figure 9, seal CMS IS 038 is shown as a graph in Neo4j. Three edges connecting the seal sides with creatures are uncertain, with weights of 0.5 and 0.8.
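The weighting rules can be summarised in a small Python sketch. The weights 1, 0.8, 0.5 and 0.3 are those given above; everything else is an assumption for illustration only, in particular the function name `creature_edges` and the form of the count, which is modelled here as a running number per option pair on one seal side. In the project itself the weights are set by the Cypher import script.

```python
def creature_edges(values):
    """Turn the raw creature values of one seal side into weighted edges.

    Returns (creature, weight, pair) tuples. `pair` numbers the option
    pairs on this side (1, 2, ...) and is None for single-value edges.
    """
    edges = []
    pair_no = 0
    for raw in values:
        text = raw.strip()
        questioned = text.endswith("?")
        if questioned:
            text = text[:-1].strip()
        options = [option.strip() for option in text.split(" oder ")]
        if len(options) == 1:
            # certain value: 1.0, value with question mark: 0.8
            edges.append((options[0], 0.8 if questioned else 1.0, None))
        else:
            # two options: 0.5 each, with additional question mark: 0.3 each
            pair_no += 1
            weight = 0.3 if questioned else 0.5
            edges.extend((option, weight, pair_no) for option in options)
    return edges


print(creature_edges(["Ziege?"]))
# [('Ziege', 0.8, None)]
print(creature_edges(["Rind oder Ziege"]))
# [('Rind', 0.5, 1), ('Ziege', 0.5, 1)]
```

With the pair number, two edges that belong to the same ›oder‹ value remain identifiable even if a seal side carries more than one such value.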
5. Use Case
After data import, the graph database can be queried with Cypher. In this way, the number of seals bearing creatures can be determined (724 seals in total). The seals with uncertain edges to the node type ›Lebewesen‹ (creatures) can likewise be identified and counted with a single query (329 seals). When viewing the database in the browser provided by Neo4j, query results can also be visualised and explored, which is something most relational database management systems do not provide out of the box.
As mentioned in the first section, the graph database was set up to facilitate doing network analysis on the dataset. This shall be demonstrated on the set of creatures depicted on a seal, where a specific research objective is to find out which creatures are combined with each other on seals.
For this task a set of nodes containing all creatures and the connections between them is needed. Since the connections are not contained in the database, they have to be created with the query shown in figure 10.
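What such a derived set of connections contains can be shown with a Python sketch. It does not reproduce the Cypher query of figure 10; it assumes that two different creatures are connected if they occur on the same seal, and the function name `co_occurrences` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(seal_creatures):
    """Count, for every unordered pair of different creatures, the seals showing both."""
    pairs = Counter()
    for creatures in seal_creatures.values():
        # set() removes repeated creatures, sorted() fixes the order within a pair
        for first, second in combinations(sorted(set(creatures)), 2):
            pairs[(first, second)] += 1
    return pairs


# Made-up sample data, not taken from the CMS dataset.
sample = {
    "seal-1": ["Ziege", "Rind", "Ziege"],
    "seal-2": ["Rind", "Ziege"],
    "seal-3": ["Vogel"],
}
print(co_occurrences(sample))
# Counter({('Rind', 'Ziege'): 2})
```

The resulting counts can serve as edge weights of a creature network, in which frequently combined creatures are linked more strongly.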
The result can then be exported in order to be processed with network analysis software such as visone. The dataset can be provided in GraphML format, which can be exported from Neo4j with a single command by using the neo4j-shell-tools. Here another major advantage of using a graph database becomes clear: the list of nodes and edges does not have to be tediously produced from the four tables introduced in section 2.
Two different datasets were exported as GraphML, imported into visone, analysed and visualised. Figure 11 displays the resulting graph of the dataset containing all co-occurrences of different creatures, including certain and uncertain ones. When only creatures with a certainty of 0.8 or above are considered, the graph shown in figure 12 results.
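To illustrate the target format, the following Python sketch writes a list of weighted edges as a minimal GraphML document using only the standard library. It is not how the neo4j-shell-tools produce their export; the function name `to_graphml`, the key names `label` and `weight`, and the generated node ids are assumptions for this example.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def _tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{NS}}}{name}"


def to_graphml(edges):
    """Serialise undirected (source, target, weight) edges as a GraphML string."""
    ET.register_namespace("", NS)
    root = ET.Element(_tag("graphml"))
    ET.SubElement(root, _tag("key"), {
        "id": "label", "for": "node", "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, _tag("key"), {
        "id": "weight", "for": "edge", "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, _tag("graph"), {"id": "G", "edgedefault": "undirected"})

    # Give every node a simple id and keep its name as a label.
    names = sorted({name for source, target, _ in edges for name in (source, target)})
    ids = {name: f"n{index}" for index, name in enumerate(names)}
    for name in names:
        node = ET.SubElement(graph, _tag("node"), {"id": ids[name]})
        ET.SubElement(node, _tag("data"), {"key": "label"}).text = name
    for source, target, weight in edges:
        edge = ET.SubElement(graph, _tag("edge"), {
            "source": ids[source], "target": ids[target]})
        ET.SubElement(edge, _tag("data"), {"key": "weight"}).text = str(weight)
    return ET.tostring(root, encoding="unicode")


xml_text = to_graphml([("Rind", "Ziege", 0.5)])
```

A file with this structure lists nodes and edges explicitly, which is the kind of input that network analysis tools reading GraphML expect.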
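The difference between the two datasets comes down to a threshold on the edge weights. A minimal Python sketch of this selection is given below, assuming edges in the hypothetical form (seal, creature, weight); the function name `creatures_per_seal` and the sample edges are made up for illustration.

```python
from collections import defaultdict


def creatures_per_seal(edges, min_weight=0.0):
    """Collect the creatures of each seal, keeping only edges with weight >= min_weight."""
    per_seal = defaultdict(set)
    for seal, creature, weight in edges:
        if weight >= min_weight:
            per_seal[seal].add(creature)
    return {seal: sorted(creatures) for seal, creatures in per_seal.items()}


# Made-up sample edges, not taken from the CMS dataset.
sample_edges = [
    ("seal-1", "Ziege", 0.8),
    ("seal-1", "Rind", 0.5),
    ("seal-2", "Vogel", 1.0),
    ("seal-2", "Rind", 0.3),
]
everything = creatures_per_seal(sample_edges)        # certain and uncertain values
confident = creatures_per_seal(sample_edges, 0.8)    # only weights of 0.8 or above
```

With a threshold of 0.8, values with two options (weights 0.5 and 0.3) drop out, while certain values and values marked only with a question mark remain.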
6. Conclusion
The implementation of a graph database for Aegean seals with more than one side for sealing requires special attention when uncertainties are present in the dataset. It is not possible to include uncertainties that are only implicit in the dataset in the data model, but for those made explicit, various ways of modelling exist. In the case presented here, modelling uncertainties as weighted edges proved to be most suitable.
Working with a graph database in practice proves to be intuitive and fast, especially because data is not only displayed in tabular form, but also as a graph with expandable nodes. Thus the user is allowed to further explore queried results.
Exporting data for further examination with network analysis software feels almost natural, and the process can be tuned further by including or excluding uncertain values. The use case presented only scratches the surface of the possibilities; for example, the result displayed in figure 11 includes both options for uncertain values with options. In further experiments, the data could be analysed with only one option included and then compared to an analysis including the other option.
By expanding the graph database model with further node types in order to e.g. introduce broader categories for the creatures, such as ›wild animal‹ or ›domestic animal‹, the underlying rules inherent to the use of motifs on the seals might be uncovered.