Rachael Hu / Documentation: commit 2f7f2e07 (parent 12524992), "add detail", authored Aug 06, 2015 by twl8n.

1 changed file, Vocabulary-properties-and-ontologies.md, with 86 additions and 32 deletions.
...
### Property domain
By design, each property has a domain that is made explicit by its definition. Wikipedia illustrates the issue,
since some terms lead only to a disambiguation page. My "automotive" example may not be good, but the
"belt" example holds up quite well.
"Automotive" defined as "related to automobiles" includes "automobile" and therefore has a broad domain. There
might be no property "automobile". We need a policy for properties, but I'm fairly sure that a property should
be as broad as possible without being ambiguous, and that narrowing is handled by the ontology and by multiple
properties.
The properties "automotive" and "automobile" are closely related, and in common use it is reasonable to use
them interchangeably in many situations. Yet "automotive" + "seat belt" and "automobile" + "seat belt" differ
in subtle ways, subtle enough to guarantee that some database records will be coded one way and some another.
Wherever one term was used where the other was meant, the ambiguity makes any reasoning about these closely
related terms likely to be erroneous.
"seat belts" are "webbing designed to retain an occupant or object in a seat". There is no single property
"automotive--seat belts", but the separate ability to create subject lists allows for "automotive" + "seat
belt", as well as "roller coaster" + "seat belt" and "aircraft" + "seat belt".
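To make the "no compound properties" idea concrete, here is a minimal Python sketch. It assumes each record carries a set of atomic property labels. The record ids and the `find_records` helper are hypothetical and not part of any existing schema. The combination "automotive" + "seat belt" becomes a query over two properties instead of a third, compound property.

```python
# Hypothetical records: each one carries a set of atomic property labels.
# There is no compound property such as "automotive--seat belts"; the
# combination is expressed by tagging a record with both properties.
records = {
    "rec-1": {"automotive", "seat belt"},
    "rec-2": {"roller coaster", "seat belt"},
    "rec-3": {"aircraft", "seat belt"},
    "rec-4": {"automotive", "parts"},
}


def find_records(records, *properties):
    """Return the ids of records tagged with every requested property."""
    wanted = set(properties)
    return sorted(rid for rid, props in records.items() if wanted <= props)


print(find_records(records, "seat belt"))                # ['rec-1', 'rec-2', 'rec-3']
print(find_records(records, "automotive", "seat belt"))  # ['rec-1']
```

Adding "roller coaster" or "aircraft" as the second property narrows the same "seat belt" result in a different direction, without any new property being created.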
The curatorial question: is a seat belt simply a complex list of "belt" and "seat"? I think not. The word
"belt" is (in English) applied to clothing, machinery belts, and webbing. Arguably, "seat belt" is simply
...
Is it reasonable to have a property "airplane" as well as "aircraft"? Perhaps, and there is some burden on
archivists to choose "jet (aircraft)" + "seat belt" when speaking of a Boeing 747, and "aircraft" + "glider" +
"seat belt" when referring to a glider, although I'm not convinced that glider seat belts add useful facets of
information.
The ontology needs to clarify sub-classes where "jet (aircraft)" and "glider" are more specific
examples of "aircraft". In fact, using a broad category is superfluous when the narrow property is supplied.
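A broad category is superfluous once the narrow property is present, so it can be dropped in a pruning step. The sketch below assumes a hypothetical, acyclic child-to-parent table (`PARENT`) that stands in for the ontology. It is an illustration, not an existing API.

```python
# Hypothetical ontology fragment, child -> parent, assumed to be acyclic.
PARENT = {
    "jet (aircraft)": "aircraft",
    "glider": "aircraft",
}


def ancestors(prop, parent=PARENT):
    """Yield every broader property above `prop`, nearest first."""
    while prop in parent:
        prop = parent[prop]
        yield prop


def drop_superfluous(props, parent=PARENT):
    """Remove properties already implied by a narrower property on the same record."""
    implied = {a for p in props for a in ancestors(p, parent)}
    return set(props) - implied


print(sorted(drop_superfluous({"aircraft", "glider", "seat belt"})))  # ['glider', 'seat belt']
```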
It would be just as useful for search and analysis to have a separate topical subject "gliders". In fact, from a
computational point of view, there is probably no difference between complex lists of topical subjects and
...
universal rule: There are no complex properties, although multiple properties are allowed (and encouraged).
Properties are not themselves complex. "automotive" is broad, and if the added specificity of "parts" is
desired then a second topic "parts" (Parts: components of a large entity) needs to be used. Component lists
are in the domain of ontologies, not properties. There is no single property "automotive--parts", or
"automotive--paintings", and this needs to be enforced by database design, user interface and policy.
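The database itself could enforce atomic properties with a `CHECK` constraint on the property table. The sketch below uses Python's built-in `sqlite3` module. The table layout and the `add_property` helper are assumptions made for illustration. The "no `--` in a label" test is only a rough proxy for "not a compound property".

```python
import sqlite3

# Hypothetical schema: the property table itself refuses compound labels,
# so the rule holds no matter which interface writes to it.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE property (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE,
        definition TEXT NOT NULL,
        CHECK (label NOT LIKE '%--%')
    )
    """
)


def add_property(conn, label, definition):
    """Insert a property; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO property (label, definition) VALUES (?, ?)",
                (label, definition),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK or UNIQUE violation
        return False


print(add_property(conn, "automotive", "related to automobiles"))   # True
print(add_property(conn, "parts", "components of a large entity"))  # True
print(add_property(conn, "automotive--parts", "automotive parts"))  # False
```

The user interface and policy would still need their own checks. The constraint only guarantees that nothing slips past them into the table.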
### Proper entities are not properties
The ontology linkage handles issues such as "automobiles" + "detroit". There is no property "detroit",
although there is a CPF entity for "Detroit, MI USA". It is possible to conflate CPF entities in the user
interface to enable the construction of a topical subject "automobiles" + "detroit", although that is a bad
idea. The data should be as well-constructed as possible. When searching for subject and place, it is
reasonable to either parse the place name, or have the user explicitly choose a place name.
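The sketch below keeps subject and place apart. It assumes a hypothetical `Record` type whose subjects are property labels and whose place is the id of a CPF entity. The id `cpf-0001` is invented for the example. Because the user picks the place explicitly, no "detroit" property is ever needed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical CPF entity table; the id is invented for this example.
CPF_PLACES = {"cpf-0001": "Detroit, MI USA"}


@dataclass
class Record:
    rid: str
    subjects: Set[str] = field(default_factory=set)  # property labels
    place_id: Optional[str] = None                    # link to a CPF entity, not a property


def search(records: List[Record], subject: str, place_id: Optional[str] = None) -> List[str]:
    """Match on a topical subject and, optionally, an explicitly chosen CPF place."""
    return [
        r.rid
        for r in records
        if subject in r.subjects and (place_id is None or r.place_id == place_id)
    ]


records = [
    Record("rec-1", {"automobiles"}, "cpf-0001"),
    Record("rec-2", {"automobiles"}),
    Record("rec-3", {"aircraft"}, "cpf-0001"),
]
print(search(records, "automobiles", "cpf-0001"))  # ['rec-1']
print(search(records, "automobiles"))              # ['rec-1', 'rec-2']
print(CPF_PLACES[records[0].place_id])             # Detroit, MI USA
```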
Consider what happens if (and I'm opposed to this) all CPF entities were imported into the property
table. That would be denormalization and data duplication, and would only end in tears.
### Use Markov models instead of an ontology
Ontologies are difficult to create, and there is little agreement about them, in either structure or
content. There are several to choose from, and the properties they use are (speaking frankly) a huge
mess. Linking each aspect of an entity record's properties to the ontology is an onerous task, and fraught
with error, largely because the linking is often a judgement call.
A technology exists that is easy to implement, powerful, and tractable in real life.
We can create Markov matrices of the property terms. Repeatedly multiplying a Markov matrix by itself causes it
to converge, which reveals property relatedness as it exists in the data. The effect is quite powerful and obviates
the need for a hand-created ontology. Missing relations (known to exist, but not discovered by the Markov
convergence because no records actually contain the desired relation) are easily rectified by either of two
methods. The first is to add the correct relations to existing records. The second is to create non-public
special records containing related terms and make those special records available to the Markov modeling
process. The whole Markov solution is only 2 or 3 pages of code, so we can write it and evaluate its
effectiveness.
See: Everything Is Miscellaneous.
https://en.wikipedia.org/wiki/Everything_Is_Miscellaneous
http://www.youtube.com/watch?v=WHeta_YZ0oE
http://www.youtube.com/watch?v=x3wOhXsjPYM
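Here is a short Python sketch of the Markov idea. It is one possible reading, not a finished design. It assumes that relatedness comes from properties appearing on the same record. It turns the co-occurrence counts into a row-stochastic (Markov) matrix, then multiplies that matrix by itself so that terms linked only indirectly pick up non-zero weight. The sample records and function names are hypothetical.

```python
from itertools import combinations


def cooccurrence(records):
    """Count how often each pair of property terms appears on the same record."""
    terms = sorted({t for rec in records for t in rec})
    index = {t: i for i, t in enumerate(terms)}
    counts = [[0.0] * len(terms) for _ in terms]
    for rec in records:
        for a, b in combinations(sorted(set(rec)), 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    return terms, counts


def to_markov(counts):
    """Row-normalize counts so each row sums to 1 (a Markov transition matrix).
    A term that never co-occurs with another term keeps an all-zero row."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def matmul(a, b):
    """Plain product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk(p, steps):
    """Return p multiplied by itself `steps` times in total (steps >= 1): the
    probability of reaching term j from term i in exactly `steps` hops."""
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result


records = [
    {"automotive", "seat belt"},
    {"aircraft", "seat belt"},
    {"aircraft", "glider"},
]
terms, counts = cooccurrence(records)
p1 = to_markov(counts)
p2 = walk(p1, 2)
i, j = terms.index("automotive"), terms.index("aircraft")
print(p1[i][j], p2[i][j])  # 0.0 0.5
```

Here "automotive" and "aircraft" never share a record, yet after two steps they are related through "seat belt". A missing relation could be supplied the second way described above: append a non-public special record such as `{"glider", "seat belt"}` before building the matrix. One caution about "convergence": for a connected, aperiodic chain, the powers eventually approach a matrix whose rows all equal the same stationary distribution. For that reason this sketch takes a fixed, small number of steps. Choosing that number is left open.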
### Ontology uses property, but is a separate problem
The alternative to the Markov relation discovery is an ontology that relates properties both in relatedness,
and as a hierarchy from broad to narrow. As far as I can tell, such a network does not yet exist, and it would
be time-consuming to create, since it has to be done by hand, by humans. There are existing ontologies, but to
use them we would be forced to use their property lists, and (sadly) the property lists I've seen are both
incomplete and poorly constructed.
When two very different things are described, some of their properties may be the same. Consider, for example,
the topical subjects of a publisher and of a work of art. The publisher creates books about automobiles,
especially cars that have been artistically painted. The work of art is a painting of an automobile.
```
Publisher subject:Automobiles subject:Painting (fine art)
Painting subject:Automobiles subject:Painting (fine art)
```
The underlying properties are the same in both branches of the ontology, but the ontological relationship is
quite different because one is a corporate body, and the other is an object. It is not the domain of a
property to know how it is applied to a database record. Also, the larger context of what is being described
changes how the description is perceived. In any case, the use of properties is sufficient for search and
discovery. Data beyond properties is necessary to make assertions about the records.
The example above is limited to properties used as topical subjects. It seems reasonable to apply an additional
property, "typeOf", still drawing on the same (original, large) list of properties. Types of "publisher
(corporateBody)" and "painting (object)" seem obvious. Applying both subject and type pairs will explicitly
categorize any database record, even without an ontology. However, it is unclear how this somewhat loosely
coupled description will affect our ability to reason about database records.
It also seems reasonable to constrain some properties to be used only as certain types. A gender property is
nonsense in the context of a topical subject. On the other hand, "painting (fine art)" could be both a subject
and a typeOf. The conservative approach is to limit each property to a single type.
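A lookup like the following could check the conservative rule of one type per property. The `ALLOWED_ROLES` table is hypothetical. `"attribute"` is an invented role name, used only to keep "gender" out of the subject role.

```python
# Hypothetical table of permitted roles per property. Following the
# conservative approach, every property is limited to a single role.
ALLOWED_ROLES = {
    "automobiles": {"subject"},
    "painting (fine art)": {"subject"},       # could arguably also be typeOf
    "publisher (corporateBody)": {"typeOf"},
    "gender": {"attribute"},                  # "attribute" is an invented role name
}


def is_allowed(role, prop):
    """True only if `prop` is known and may be used in `role`."""
    return role in ALLOWED_ROLES.get(prop, set())


# The conservative policy: exactly one role per property.
assert all(len(roles) == 1 for roles in ALLOWED_ROLES.values())
print(is_allowed("subject", "automobiles"))  # True
print(is_allowed("subject", "gender"))       # False
```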
### Ontology and property interact to create search facets
A search for "aircraft" should turn up "glider (aircraft)" even if the record in question lacks "aircraft" as
a specific topical subject. In general, a search for a parent property should include all child properties as
specified by the ontology. Searching for the Spanish term "ropa" will include "cinturon", which has the English
label "belt (clothing)".
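The sketch below shows how the ontology could widen a search. The query term is mapped to a property through labels in any language, then expanded to every narrower property beneath it. The `PARENT` and `LABELS` tables are small hypothetical stand-ins for the ontology and the multilingual label table.

```python
from collections import defaultdict

# Hypothetical ontology fragment: narrower property -> broader property.
PARENT = {
    "glider (aircraft)": "aircraft",
    "jet (aircraft)": "aircraft",
    "belt (clothing)": "clothing",
}
# Hypothetical label table: terms in any language -> property.
LABELS = {
    "ropa": "clothing",
    "cinturon": "belt (clothing)",
}


def narrower(parent):
    """Invert the child -> parent table into parent -> set of children."""
    kids = defaultdict(set)
    for child, par in parent.items():
        kids[par].add(child)
    return kids


def expand(prop, kids):
    """Return `prop` plus every property below it in the hierarchy."""
    found, stack = set(), [prop]
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(kids.get(p, ()))
    return found


def search(term, records, parent=PARENT, labels=LABELS):
    """Map the search term to a property, widen it to its children, and match records."""
    prop = labels.get(term.lower(), term.lower())
    wanted = expand(prop, narrower(parent))
    return sorted(rid for rid, props in records.items() if props & wanted)


records = {
    "rec-1": {"glider (aircraft)", "seat belt"},
    "rec-2": {"belt (clothing)"},
    "rec-3": {"automotive"},
}
print(search("aircraft", records))  # ['rec-1']
print(search("Ropa", records))      # ['rec-2']
```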