merging branch tom into master

a20a2182 · Tom Laudeman · 12aaedb3 · 088b8233 · a20a2182 · a20a2182
Commit a20a2182 authored Oct 30, 2015 by Tom Laudeman
13 changed files
--- a/Discussion/name_compoents_alternates.md
+++ b/Discussion/name_compoents_alternates.md
+
+#### Name and alternate name
+
+There is no consensus on the canonical name. There probably can be no single, always preferred name. What is
+preferred depends on context, and will vary for different purposes. In the data we can capture a reasonable
+amount of context, but only the users know what is preferred. Computationally, we should treat all names as
+alternates. We can offer names in one (or more) of several agreed-upon formats, leaving the choice up to the
+user.
+
+The distinction between name and alternate name has no effect on identity matching because the match is done on all alternates, and uses
+all available data from the identity constellation. 
+
+#### Name components
+
+Given the variety of components in names, it is not possible to create a canonical, single set of components
+for many names and their alternates. Even the concept of "preferred" name is debatable.
+
+It is (mostly) possible to parse out the components for each name and each alternate name. Thus we have as many
+sets of name components as we have names for a single identity constellation. 
+
+Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
+family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
+in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
+name_format. 
+
+If we want language specific name components then we need to add field "language" to name_components. Due
+to components being extracted from possibly several name strings, we probably will not join table name to
+table name_component. It would be a many-to-one join, and given that name component relates to cpf, and not
+to one or more name strings, a join between table name and table name component is not logical.
+
+Do we need information about how each name component was derived? If so we are probably better off saving a
+log of the name parser than trying to use the database as part of the historical record of name
+parsing. One aspect of database-centric component derivation would be a join table to handle the
+many-to-many relation between table name and table name component. Complete info about derivation also
+requires the name parsing version number and any configuration at the time the parsing was done.
+
+Field nc_label (table name_component) must come from a controlled vocabulary in order for dynamic
+formatting to work.
+
+We should not allow our technology to be defined by the "minimal existing implementation" of name. We will
+cripple SNAC if we only meet the minimal definition of name. Additionally, SNAC has needs beyond that of
+our archives and authority stakeholders.
+
+#### Minimal list of component labels
+
+The least flexible system (of the 3 or 4 we have reviewed) for name components is probably MARC. Even in MARC
+the $c is repeatable, allowing for a large number of (unlabeled) components. This system is probably too
+restrictive, although it allows us to capture middle name when possible. However, lack of flexible labels
+makes MARC a weak standard for names.
+
+surname, forename, additions, numeration, expansion
+
+#### Larger list of components
+
+This list comes from a combination of MARC, Unimarc, ISNI, ArchivesSpace, and British Library. We might be wise
+to include others as well (VIAF, BnF, AnF, Archives Hub UK).
+
+surname, middle, forename, prefix, suffix, epithet, title, pretitle, numeration, additional
+
+Unfortunately, current name format guidelines are ambiguous. The most common problem is that middle name could
+be a second forename or the second of several additional name components. 
+
+There is general agreement on "name" and "non-name" parts, although no guides explicitly talk about this. Name
+parts are surname, forename, and middle name. Many other non-name parts are often found in names. Both the
+name and non-name parts have ambiguous rules within most systems, and the various systems and cultures have
+incomplete agreement. 
+
+The database is improved by labeling the components where possible, but our algorithms and user
+interface can (fairly) easily remain agnostic about component labels while processing and displaying the
+components as well as formatting the components into names.
+
+In the past, failure to create names by (re-) formatting components has led to inconsistent names. While it is
+not possible to 100% parse or format names, it is also true that humans who did the data entry were not 100%
+accurate in their formats. The computer can be more consistent, and probably nearly as accurate as the human
+editors (especially where the editors cannot agree or where they have ambiguous rules). While it is
+technically feasible to format names from components, it is also feasible to keep and (carefully) display the
+names as originally entered.
+
+#### Overview
+
+ISNI: prefix (NR), surname (NR), forename (additional forenames) (R-ish), middle name (second and subsequent
+forenames) (R-ish), suffix (NR)
+
+Unimarc: $a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
+
+MARC: $a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
+
+
+#### Detailed fields by authority
+
+(R) are repeatable fields. (NR) are non-repeating fields.
+
+
+#### ISNI
+
+From: ISNI fields of tab delimited format for data submission, A. MacEwan et al.
+
+http://dx.doi.org/10.1080/01639374.2012.730601
+
+http://www.tandfonline.com/doi/abs/10.1080/01639374.2012.730601?journalCode=wccq20
+
+ISNI components:
+
+prefix, surname, forename and additional forenames, middle name second and subsequent forenames, suffix
+
+- prefix: e.g. Sir
+- surname: all parts of surname in the form commonly used; for alt form use alt name
+- forename: one or more forenames or initials
+- middle: second and subsequent forenames
+- suffix: e.g. Esq
+- ISNI's input format supports a single alternate name in indirect format
+
+#### Unimarc
+
+http://www.ifla.org/files/assets/uca/unimarc_updates/BIBLIOGRAPHIC/u-b_700_update.pdf
+
+Unimarc components:
+
+$a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
+
+
+#### LoC MARC 1xx
+
+http://www.loc.gov/marc/authority/ad100.html
+
+MARC and Unimarc are surname oriented: "that part of the name by which the name is entered in ordered
+lists", although they allow the main entry to be a forename (distinguished by indicators attribute). All
+non-surname parts of the given name are thrown together into forename, both have a numeration field, both
+allow many additional components. MARC $c acknowledges that there are many types of other words associated
+with names.
+
+MARC components:
+
+$a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
+
+(First indicator determines direct or indirect format of $a.)
+
+```
+100 - MAIN ENTRY--PERSONAL NAME (NR)
+   Indicators
+      First - Type of personal name entry element
+         0 - Forename
+         1 - Surname
+         3 - Family name
+      $a - Personal name (NR)
+      $b - Numeration (NR) [roman numerals]
+      $c - Titles and other words associated with a name (R) [Jr., King of Sweden, Meister, pseud., Sir, (Anglo-Norman poet), Esq., II]
+      $d - Dates associated with a name (NR)
+      $q - Fuller form of name (NR)
+```
+
+#### ArchivesSpace
+
+http://sandbox.archivesspace.org/
+
+ArchivesSpace components:
+prefix, title, "primary part of name (required)", rest of name, suffix, fuller form.
+
+
+
+
--- a/Discussion/name_components_alternates.md
+++ b/Discussion/name_components_alternates.md
+
+#### Name and alternate name
+
+There is no consensus on the canonical name. There probably can be no single, always preferred name. What is
+preferred depends on context, and will vary for different purposes. In the data we can capture a reasonable
+amount of context, but only the users know what is preferred. Computationally, we should treat all names as
+alternates. We can offer names in one (or more) of several agreed-upon formats, leaving the choice up to the
+user.
+
+The distinction between name and alternate name has no effect on identity matching because the match is done on all alternates, and uses
+all available data from the identity constellation. 
+
+#### Name components
+
+Given the variety of components in names, it is not possible to create a canonical, single set of components
+for many names and their alternates. Even the concept of "preferred" name is debatable.
+
+It is (mostly) possible to parse out the components for each name and each alternate name. Thus we have as many
+sets of name components as we have names for a single identity constellation. 
+
+Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
+family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
+in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
+name_format. 
+
+If we want language specific name components then we need to add field "language" to name_components. Due
+to components being extracted from possibly several name strings, we probably will not join table name to
+table name_component. It would be a many-to-one join, and given that name component relates to cpf, and not
+to one or more name strings, a join between table name and table name component is not logical.
+
+Do we need information about how each name component was derived? If so we are probably better off saving a
+log of the name parser than trying to use the database as part of the historical record of name
+parsing. One aspect of database-centric component derivation would be a join table to handle the
+many-to-many relation between table name and table name component. Complete info about derivation also
+requires the name parsing version number and any configuration at the time the parsing was done.
+
+Field nc_label (table name_component) must come from a controlled vocabulary in order for dynamic
+formatting to work.
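A minimal sketch of why a controlled vocabulary matters for dynamic formatting, assuming components are stored as (label, value) pairs. The names `NC_LABELS`, `format_name`, `DIRECT`, and `INDIRECT` are hypothetical and not part of any SNAC schema; the label set is borrowed from the larger list of components in this discussion, and the example name is purely illustrative.

```python
# Hypothetical controlled vocabulary for nc_label (assumption: taken from
# the "larger list of components" in this discussion).
NC_LABELS = frozenset({
    "surname", "middle", "forename", "prefix", "suffix",
    "epithet", "title", "pretitle", "numeration", "additional",
})

# Hypothetical name formats: ordered (label, separator) pairs. The separator
# is written before the component unless it is the first one emitted.
INDIRECT = [("surname", ""), ("forename", ", "), ("middle", " "), ("suffix", ", ")]
DIRECT = [("prefix", ""), ("forename", " "), ("middle", " "),
          ("surname", " "), ("suffix", ", ")]


def validate(components):
    """Reject any component whose label is outside the vocabulary."""
    for label, _value in components:
        if label not in NC_LABELS:
            raise ValueError(f"unknown nc_label: {label!r}")


def format_name(components, name_format):
    """Build a display name by walking the format and looking up labels."""
    validate(components)
    parts = []
    for label, separator in name_format:
        # Labels may repeat, so collect every value carrying this label.
        values = [value for lab, value in components if lab == label]
        if not values:
            continue  # this name has no such component; skip it
        parts.append((separator if parts else "") + " ".join(values))
    return "".join(parts)


components = [("forename", "John"), ("middle", "Quincy"), ("surname", "Adams")]
print(format_name(components, INDIRECT))
print(format_name(components, DIRECT))
```

Because the formatter looks components up by label, a free-text label such as "family name" would never match a format entry; rejecting it up front keeps the failure visible instead of silently dropping the component.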
+
+We should not allow our technology to be defined by the "minimal existing implementation" of name. We will
+cripple SNAC if we only meet the minimal definition of name. Additionally, SNAC has needs beyond that of
+our archives and authority stakeholders.
+
+#### Minimal list of component labels
+
+The least flexible system (of the 3 or 4 we have reviewed) for name components is probably MARC. Even in MARC
+the $c is repeatable, allowing for a large number of (unlabeled) components. This system is probably too
+restrictive, although it allows us to capture middle name when possible. However, lack of flexible labels
+makes MARC a weak standard for names.
+
+surname, forename, additions, numeration, expansion
+
+#### Larger list of components
+
+This list comes from a combination of MARC, Unimarc, ISNI, ArchivesSpace, and British Library. We might be wise
+to include others as well (VIAF, BnF, AnF, Archives Hub UK).
+
+surname, middle, forename, prefix, suffix, epithet, title, pretitle, numeration, additional
+
+Unfortunately, current name format guidelines are ambiguous. The most common problem is that middle name could
+be a second forename or the second of several additional name components. 
+
+There is general agreement on "name" and "non-name" parts, although no guides explicitly talk about this. Name
+parts are surname, forename, and middle name. Many other non-name parts are often found in names. Both the
+name and non-name parts have ambiguous rules within most systems, and the various systems and cultures have
+incomplete agreement. 
+
+The database is improved by labeling the components where possible, but our algorithms and user
+interface can (fairly) easily remain agnostic about component labels while processing and displaying the
+components as well as formatting the components into names.
+
+In the past, failure to create names by (re-) formatting components has led to inconsistent names. While it is
+not possible to 100% parse or format names, it is also true that humans who did the data entry were not 100%
+accurate in their formats. The computer can be more consistent, and probably nearly as accurate as the human
+editors (especially where the editors cannot agree or where they have ambiguous rules). While it is
+technically feasible to format names from components, it is also feasible to keep and (carefully) display the
+names as originally entered.
+
+#### Overview
+
+ISNI: prefix (NR), surname (NR), forename (additional forenames) (R-ish), middle name (second and subsequent
+forenames) (R-ish), suffix (NR)
+
+Unimarc: $a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
+
+MARC: $a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
+
+
+#### Detailed fields by authority
+
+(R) are repeatable fields. (NR) are non-repeating fields.
+
+
+#### ISNI
+
+From: ISNI fields of tab delimited format for data submission, A. MacEwan et al.
+
+http://dx.doi.org/10.1080/01639374.2012.730601
+
+http://www.tandfonline.com/doi/abs/10.1080/01639374.2012.730601?journalCode=wccq20
+
+ISNI components:
+
+prefix, surname, forename and additional forenames, middle name second and subsequent forenames, suffix
+
+- prefix: e.g. Sir
+- surname: all parts of surname in the form commonly used; for alt form use alt name
+- forename: one or more forenames or initials
+- middle: second and subsequent forenames
+- suffix: e.g. Esq
+- ISNI's input format supports a single alternate name in indirect format
+
+#### Unimarc
+
+http://www.ifla.org/files/assets/uca/unimarc_updates/BIBLIOGRAPHIC/u-b_700_update.pdf
+
+Unimarc components:
+
+$a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
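The Unimarc subfields listed above line up by name with the minimal label list (surname, forename, additions, numeration, expansion). A rough sketch of that correspondence follows; the mapping itself, the names `UNIMARC_TO_LABEL` and `label_subfields`, and the sample values are assumptions made for illustration, not an agreed crosswalk.

```python
# Assumed mapping from Unimarc subfield codes to the minimal label list.
UNIMARC_TO_LABEL = {
    "a": "surname",      # $a Surname (NR)
    "b": "forename",     # $b Given name remainder (NR)
    "c": "additions",    # $c Additions (R)
    "d": "numeration",   # $d Roman numerals (NR)
    "g": "expansion",    # $g Expansion (NR)
}

# Only $c is marked repeatable (R) in the list above.
REPEATABLE = {"c"}


def label_subfields(subfields):
    """Turn [(code, value), ...] into [(label, value), ...].

    Codes outside the mapping are skipped. A repeated non-repeatable
    code raises ValueError.
    """
    seen = set()
    labeled = []
    for code, value in subfields:
        label = UNIMARC_TO_LABEL.get(code)
        if label is None:
            continue
        if code in seen and code not in REPEATABLE:
            raise ValueError(f"subfield ${code} is not repeatable")
        seen.add(code)
        labeled.append((label, value))
    return labeled


# Illustrative input only; not taken from a real record.
print(label_subfields([("a", "Smith"), ("b", "John"), ("c", "Sir"), ("c", "poet")]))
```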
+
+
+#### LoC MARC 1xx
+
+http://www.loc.gov/marc/authority/ad100.html
+
+MARC and Unimarc are surname oriented: "that part of the name by which the name is entered in ordered
+lists", although they allow the main entry to be a forename (distinguished by indicators attribute). All
+non-surname parts of the given name are thrown together into forename, both have a numeration field, both
+allow many additional components. MARC $c acknowledges that there are many types of other words associated
+with names.
+
+MARC components:
+
+$a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
+
+(First indicator determines direct or indirect format of $a.)
+
+```
+100 - MAIN ENTRY--PERSONAL NAME (NR)
+   Indicators
+      First - Type of personal name entry element
+         0 - Forename
+         1 - Surname
+         3 - Family name
+      $a - Personal name (NR)
+      $b - Numeration (NR) [roman numerals]
+      $c - Titles and other words associated with a name (R) [Jr., King of Sweden, Meister, pseud., Sir, (Anglo-Norman poet), Esq., II]
+      $d - Dates associated with a name (NR)
+      $q - Fuller form of name (NR)
+```
+
+#### ArchivesSpace
+
+http://sandbox.archivesspace.org/
+
+ArchivesSpace components:
+prefix, title, "primary part of name (required)", rest of name, suffix, fuller form.
+
+
+
+
--- a/Requirements/Workflow Engine.md
+++ b/Requirements/Workflow Engine.md
+
+#### Introduction
+
+This workflow engine is a lightweight request routing tool. In our application it encapsulates our business
+processes at a high level. Architecturally, it lives inside web middle-ware. Its function in the middle-ware
+is to handle calling the proper high level functions. We have two workflow engines, because we separate web
+UI based workflow from fundamental business (policy) issues, rather than conflating the two problems.
+
+Small web applications that will always be small (a maximum of 5 web pages) often use "page controllers" where
+each page handles its own logic, and connections between pages are implicit in the links. Larger sites use a
+"front controller" which is a single point of control.
+
+The workflow engine handles the application decision making logic in the front controller. Business
+process decisions are handled in the server-side front controller, and we have a separate workflow limited to
+the browser and UI. Web HTTP requests go to the browser controller, where they are normalized for the server
+controller. REST calls are also normalized and sent to the same server controller. Thus interactions with the
+server internals always follow consistent business and policy workflow.
+
+It is important to remember that nearly all aspects of the current application design involve lightweight
+solutions to typical problems. Rather than a comprehensive framework, we have chosen to use a select set of
+off the shelf software modules to construct a framework suitable to our needs.
+
+
+#### Requirements
+
+The workflow engine encapsulates only decision making. It assumes other code deeper in the application will do
+the real work. The decisions are written down in a 4 column state table. Workflow is testable by stepping
+through the state table manually. Workflow is also testable via computational methods that will validate that
+the states will reach an exit, and that all states are reachable.
+
+The 4 columns are: starting state, boolean transition test, transition function to run, next state. There are
+3 pseudo-functions: jump, return, wait. The jump will push the current state onto an internal stack and jump
+to a new state. The return pops the stack and returns to that state where it immediately transitions to the
+next state. The wait might be called exit since it causes workflow to stop. 
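The computational validation mentioned above (every state is reachable, and every state can reach an exit) can be sketched as two graph searches over the state table. This is a simplified illustration under stated assumptions: rows are tuples of (state, test, function, next state), a row whose function is `wait` marks an exit, and the jump/return stack is ignored. The names `check_table` and `reachable_from` and the sample table are hypothetical.

```python
from collections import deque


def reachable_from(start, edges):
    """Breadth-first search: every state reachable from start via edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_table(table, start):
    """Return (unreachable_states, states_that_cannot_reach_an_exit)."""
    forward, backward = {}, {}
    states, exits = set(), set()
    for state, _test, function, next_state in table:
        states.add(state)
        if function == "wait":
            exits.add(state)  # assumption: a wait row ends the workflow
            continue
        states.add(next_state)
        forward.setdefault(state, set()).add(next_state)
        backward.setdefault(next_state, set()).add(state)
    reachable = reachable_from(start, forward)
    can_exit = set()
    for exit_state in exits:
        # Searching the reversed edges finds every state that leads here.
        can_exit |= reachable_from(exit_state, backward)
    return states - reachable, states - can_exit


# Hypothetical table: "orphan" is never entered and never reaches "done".
TABLE = [
    ("start", "is_new", "create_record", "edit"),
    ("start", "", "", "done"),
    ("edit", "", "save_record", "done"),
    ("done", "", "wait", ""),
    ("orphan", "", "", "orphan"),
]
unreachable, no_exit = check_table(TABLE, "start")
print(unreachable, no_exit)
```

The check is structural only: it treats every row as a possible transition and does not evaluate the boolean tests, so it can miss states that are unreachable because a test is never true.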
+
+Workflow always begins with a default starting state. From a starting node, the boolean transition test is
+run. If true, the transition will occur. If false, the next row with the same starting state name runs its
+boolean transition test. If a transition function exists, it will be run (eval'd). The workflow transitions
+to the next state, and the process repeats until the wait function is reached.
+
+To accommodate multiple boolean transition tests, there can be multiple rows with the same starting state
+name. These are tested in the order they occur in the state table. If none of the transition tests are true,
+the machine halts with an error. This possibility is revealed during testing. By convention, an absent (empty)
+transition test evaluates as true, thus any starting state may (and probably should) have a default catch-all
+row. In keeping with business rules this answers the workflow question "What happens at this step if
+everything goes wrong?"
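The row-by-row evaluation described above can be sketched as a small interpreter. This is an illustration under assumptions, not the project's implementation: tests and functions are looked up by name in dictionaries rather than eval'd, an empty test string counts as true, only the `wait` pseudo-function is handled (jump and return are left out), and the names `run_workflow`, `WorkflowError`, `is_locked`, and the sample table are hypothetical.

```python
class WorkflowError(Exception):
    """Raised when no row for the current state has a true test."""


def run_workflow(table, tests, functions, context, start="start", max_steps=1000):
    """Walk the state table until a wait row; return the (state, function) trace."""
    state = start
    trace = []
    for _ in range(max_steps):
        # Rows for a state are tried in table order; the first true test wins.
        for row_state, test, function, next_state in table:
            if row_state != state:
                continue
            if test == "" or tests[test](context):
                break
        else:
            raise WorkflowError(f"no transition from state {state!r}")
        trace.append((state, function))
        if function == "wait":
            return trace  # wait stops the workflow
        if function:
            functions[function](context)  # the real work happens elsewhere
        state = next_state
    raise WorkflowError("step limit reached without wait")


def unlock_record(context):
    """Stand-in for application code the engine knows only by name."""
    context["locked"] = False


TABLE = [
    ("start", "is_locked", "unlock_record", "ready"),
    ("start", "", "", "ready"),      # default catch-all row
    ("ready", "", "wait", ""),
]
TESTS = {"is_locked": lambda ctx: ctx["locked"]}
FUNCTIONS = {"unlock_record": unlock_record}

context = {"locked": True}
trace = run_workflow(TABLE, TESTS, FUNCTIONS, context)
print(trace, context)
```

Using a name-to-function dictionary instead of eval keeps the set of callable functions explicit, which also makes it easy to check that every function named in the table exists before the workflow runs.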
+
+
+#### Implementation as thought problem
+
+Implementation can be handled in several ways; thinking through them may help you extrapolate how the
+system works.
+
+In the first mode, the workflow engine state table's functions are eval'd as literal function calls. For every
+function that the state table calls for a given state transition, the function must exist in the system. A
+string "unlock_record()" when eval'd will run the function unlock_record. The workflow engine doesn't know
+what exactly goes on inside that function, but it does "know" that it will unlock the current record. 
+
+Creation of the workflow involves a shared understanding between the programmer writing the workflow, and the
+programmer creating the system code.
--- a/Specifications/Originals/constellation_linked.gv
+++ b/Specifications/Originals/constellation_linked.gv
+digraph States {
+        # dot -Tsvg constellation_linked.gv -O
+        # Will create constellation_linked.gv.svg
+	label = "\n\nIdentity Constellation\nTwo linked identities";
+        labelloc="t";
+	fontsize=20;
+        inputscale=0;
+        # sep=1;
+        # splines=true;
+        overlap=false;
+        node [pos="4,5!"]; "root1";
+        node [pos="1,3!"]; ne1;
+        node [pos="3,3!"]; an1;
+        node [pos="5,3!"]; ed1;
+        node [pos="7,3!"]; cr1;
+        node [pos="10,3!"]; occ1;
+        node [pos="9,4!"]; rr1;
+
+        node [pos="3.3,1.5!"]; "root2";
+        node [pos="1,2!"]; ne2;
+        node [pos="3,0!"]; an2;
+        node [pos="1.5,1!"]; an3;
+        node [pos="5,0!"]; ed2;
+        node [pos="7.4,2!"]; cr2;
+        node [pos="9,0!"]; occ2;
+        node [pos="8,-1!"]; occ22;
+        node [pos="10,1.5!"]; rr2;
+
+        "ne1","ne2" [label="alt name"];
+        "an1", "an2", "an3" [label="alt name"];
+        "ed1", "ed2" [label="exist dates"];
+        "occ1", "occ2", "occ22" [label="occupation/function"];
+        "cr1", "cr2" [label="cpf relation"];
+        "rr1", "rr2" [label="resource relation"];
+        "root1" [label="identity-A"];
+        "root2" [label="identity-B"];
+
+        root1 -> ne1;
+        root1 -> an1;
+        root1 -> ed1;
+        root1 -> occ1;
+        root1 -> cr1;
+        root1-> rr1;
+
+        cr1 -> cr2 ;
+        cr2 -> cr1 ;
+
+        root2-> rr2;
+        cr2 -> root2  [dir="back"];
+        root2 -> occ2;
+        root2 -> occ22;
+        root2 -> ed2;
+        root2 -> an2;
+        root2 -> an3;
+        root2 -> ne2;
+
+
+}
--- a/Specifications/Originals/constellation_linked.gv.svg
+++ b/Specifications/Originals/constellation_linked.gv.svg
--- a/Specifications/Originals/constellation_overview.odg
+++ b/Specifications/Originals/constellation_overview.odg
--- a/Specifications/Originals/constellation_overview.svg
+++ b/Specifications/Originals/constellation_overview.svg
--- a/Specifications/Originals/identity_constellation.gv
+++ b/Specifications/Originals/identity_constellation.gv
+digraph States {
+        // neato -n2 -Tsvg identity_constellation.gv -O
+        // 
+        // Absolute positioning appears to only work with neato, and only if all nodes are pinned,
+        // but not always. neato -n2 units are points, and inputscale appears to be ignored
+        // sep=0.2 splines=polyline overlap=false allows the pos values to be followed,
+        // while getting the lines to go around nodes.
+
+	label = "\n\nIdentity Constellation";
+        labelloc="t";
+	fontsize=20;
+        // inputscale=75;
+        sep=0.08;
+        splines=polyline;
+        overlap=false;
+
+        "an1" [label="alt name"];
+        "an2", "an3" [label="alt name"];
+        "ed1" [label="exist dates"];
+        "occ1", "occ2" [label="occupation\nor function"];
+        "cr1", "cr2" [label="cpf relation"];
+        "rr1" [label="resource relation"];
+        
+
+        root1 [pos="350,400!" label="identity root"];
+        place [pos="200,450!" label="related place"];
+        an1 [pos="100,320!" ];
+        an2 [pos="100,250!" ]; 
+        an3 [pos="100,240!" ];
+        ed1 [pos="200,200!"];
+        biog [pos="160,400!" label="biog hist"] ;
+        cr1 [pos="500,300!"];
+        cr2 [pos="600,200!"];
+        et [pos="300,100!" label="entity type"];
+        occ1 [pos="350,250!"];
+        occ2 [pos="450,200!"];
+        rr1 [pos="550,350!"];
+        src [pos="550,400!" label="source"];
+        usedate [pos="105,315!", label="use dates"];
+
+        an1 -> usedate;
+        root1 -> et;
+        root1 -> src;
+        root1 -> place;
+        root1 -> an1;
+        root1 -> an2;
+        root1 -> an3;
+        root1 -> ed1;
+        root1 -> occ1;
+        root1 -> occ2;
+        root1 -> cr1;
+        root1-> rr1;
+        root1 -> biog;
+
+        cr1 -> cr2 ;
+        cr2 -> cr1 ;
+
+}
--- a/Specifications/Originals/identity_constellation.gv.svg
+++ b/Specifications/Originals/identity_constellation.gv.svg
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
+ "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<!-- Generated by graphviz version 2.34.0 (20140101.1016)
+ -->
+<!-- Title: States Pages: 1 -->
+<svg width="637pt" height="472pt"
+ viewBox="0.00 0.00 637.14 472.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 468)">
+<title>States</title>
+<polygon fill="white" stroke="white" points="-4,4 -4,-468 633.14,-468 633.14,4 -4,4"/>
+<text text-anchor="middle" x="314.57" y="-396" font-family="Times,serif" font-size="20.00">Identity Constellation</text>
+<!-- an1 -->
+<g id="node1" class="node"><title>an1</title>
+<ellipse fill="none" stroke="black" cx="41.5963" cy="-269.413" rx="41.6928" ry="18"/>
+<text text-anchor="middle" x="41.5963" y="-265.713" font-family="Times,serif" font-size="14.00">alt name</text>
+</g>
+<!-- usedate -->
+<g id="node15" class="node"><title>usedate</title>
+<ellipse fill="none" stroke="black" cx="120.194" cy="-225.48" rx="43.5923" ry="18"/>
+<text text-anchor="middle" x="120.194" y="-221.78" font-family="Times,serif" font-size="14.00">use dates</text>
+</g>
+<!-- an1&#45;&gt;usedate -->
+<g id="edge1" class="edge"><title>an1&#45;&gt;usedate</title>
+<path fill="none" stroke="black" d="M67.3322,-255.027C72.9733,-251.874 79.0394,-248.484 84.9796,-245.163"/>
+<polygon fill="black" stroke="black" points="87.1102,-247.982 94.1315,-240.048 83.6949,-241.872 87.1102,-247.982"/>
+</g>
+<!-- an2 -->
+<g id="node2" class="node"><title>an2</title>
+<ellipse fill="none" stroke="black" cx="70.388" cy="-182.051" rx="41.6928" ry="18"/>
+<text text-anchor="middle" x="70.388" y="-178.351" font-family="Times,serif" font-size="14.00">alt name</text>
+</g>
+<!-- an3 -->
+<g id="node3" class="node"><title>an3</title>
+<ellipse fill="none" stroke="black" cx="52.8941" cy="-82.5315" rx="41.6928" ry="18"/>
+<text text-anchor="middle" x="52.8941" y="-78.8315" font-family="Times,serif" font-size="14.00">alt name</text>
+</g>
+<!-- ed1 -->
+<g id="node4" class="node"><title>ed1</title>
+<ellipse fill="none" stroke="black" cx="176.494" cy="-118" rx="48.9926" ry="18"/>
+<text text-anchor="middle" x="176.494" y="-114.3" font-family="Times,serif" font-size="14.00">exist dates</text>
+</g>
+<!-- occ1 -->
+<g id="node5" class="node"><title>occ1</title>
+<ellipse fill="none" stroke="black" cx="326.494" cy="-168" rx="55.3091" ry="26.7407"/>
+<text text-anchor="middle" x="326.494" y="-171.8" font-family="Times,serif" font-size="14.00">occupation</text>
+<text text-anchor="middle" x="326.494" y="-156.8" font-family="Times,serif" font-size="14.00">or function</text>
+</g>
+<!-- occ2 -->
+<g id="node6" class="node"><title>occ2</title>
+<ellipse fill="none" stroke="black" cx="426.494" cy="-118" rx="55.3091" ry="26.7407"/>
+<text text-anchor="middle" x="426.494" y="-121.8" font-family="Times,serif" font-size="14.00">occupation</text>
+<text text-anchor="middle" x="426.494" y="-106.8" font-family="Times,serif" font-size="14.00">or function</text>
+</g>
+<!-- cr1 -->
+<g id="node7" class="node"><title>cr1</title>
+<ellipse fill="none" stroke="black" cx="476.494" cy="-218" rx="52.7911" ry="18"/>
+<text text-anchor="middle" x="476.494" y="-214.3" font-family="Times,serif" font-size="14.00">cpf relation</text>
+</g>
+<!-- cr2 -->
+<g id="node8" class="node"><title>cr2</title>
+<ellipse fill="none" stroke="black" cx="576.494" cy="-118" rx="52.7911" ry="18"/>
+<text text-anchor="middle" x="576.494" y="-114.3" font-family="Times,serif" font-size="14.00">cpf relation</text>
+</g>
+<!-- cr1&#45;&gt;cr2 -->
+<g id="edge14" class="edge"><title>cr1&#45;&gt;cr2</title>
+<path fill="none" stroke="black" d="M493.913,-200.581C509.973,-184.521 533.976,-160.518 551.972,-142.523"/>
+<polygon fill="black" stroke="black" points="554.679,-144.765 559.275,-135.219 549.729,-139.815 554.679,-144.765"/>
+</g>
+<!-- cr2&#45;&gt;cr1 -->
+<g id="edge15" class="edge"><title>cr2&#45;&gt;cr1</title>
+<path fill="none" stroke="black" d="M559.076,-135.419C543.016,-151.479 519.012,-175.482 501.017,-193.477"/>
+<polygon fill="black" stroke="black" points="498.31,-191.235 493.713,-200.781 503.259,-196.185 498.31,-191.235"/>
+</g>
+<!-- rr1 -->
+<g id="node9" class="node"><title>rr1</title>
+<ellipse fill="none" stroke="black" cx="526.494" cy="-268" rx="71.4873" ry="18"/>
+<text text-anchor="middle" x="526.494" y="-264.3" font-family="Times,serif" font-size="14.00">resource relation</text>
+</g>
+<!-- root1 -->
+<g id="node10" class="node"><title>root1</title>
+<ellipse fill="none" stroke="black" cx="326.494" cy="-318" rx="55.4913" ry="18"/>
+<text text-anchor="middle" x="326.494" y="-314.3" font-family="Times,serif" font-size="14.00">identity root</text>
+</g>
+<!-- root1&#45;&gt;an1 -->
+<g id="edge5" class="edge"><title>root1&#45;&gt;an1</title>
+<path fill="none" stroke="black" d="M277.519,-309.648C225.166,-300.719 142.657,-286.648 90.379,-277.732"/>
+<polygon fill="black" stroke="black" points="90.7905,-274.252 80.3444,-276.021 89.6137,-281.152 90.7905,-274.252"/>
+</g>
+<!-- root1&#45;&gt;an2 -->
+<g id="edge6" class="edge"><title>root1&#45;&gt;an2</title>
+<path fill="none" stroke="black" d="M300.146,-301.895C251.614,-272.23 153.009,-211.959 153.009,-211.959 153.009,-211.959 132.706,-204.61 112.23,-197.197"/>
+<polygon fill="black" stroke="black" points="113.162,-193.813 102.568,-193.7 110.78,-200.395 113.162,-193.813"/>
+</g>
+<!-- root1&#45;&gt;an3 -->
+<g id="edge7" class="edge"><title>root1&#45;&gt;an3</title>
+<path fill="none" stroke="black" d="M306.818,-301.066C258.568,-259.541 134.333,-152.62 79.5364,-105.461"/>
+<polygon fill="black" stroke="black" points="81.5287,-102.558 71.6661,-98.6872 76.9625,-107.863 81.5287,-102.558"/>
+</g>
+<!-- root1&#45;&gt;ed1 -->
+<g id="edge8" class="edge"><title>root1&#45;&gt;ed1</title>
+<path fill="none" stroke="black" d="M313.183,-300.251C286.541,-264.729 226.578,-184.778 195.689,-143.593"/>
+<polygon fill="black" stroke="black" points="198.362,-141.323 189.562,-135.423 192.762,-145.523 198.362,-141.323"/>
+</g>
+<!-- root1&#45;&gt;occ1 -->
+<g id="edge9" class="edge"><title>root1&#45;&gt;occ1</title>
+<path fill="none" stroke="black" d="M326.494,-299.906C326.494,-276.556 326.494,-235.353 326.494,-205.194"/>
+<polygon fill="black" stroke="black" points="329.994,-205.034 326.494,-195.034 322.994,-205.034 329.994,-205.034"/>
+</g>
+<!-- root1&#45;&gt;occ2 -->
+<g id="edge10" class="edge"><title>root1&#45;&gt;occ2</title>
+<path fill="none" stroke="black" d="M335.545,-299.899C352.073,-266.843 387.408,-196.172 408.844,-153.301"/>
+<polygon fill="black" stroke="black" points="411.983,-154.848 413.325,-144.338 405.722,-151.718 411.983,-154.848"/>
+</g>
+<!-- root1&#45;&gt;cr1 -->
+<g id="edge11" class="edge"><title>root1&#45;&gt;cr1</title>
+<path fill="none" stroke="black" d="M350.928,-301.711C376.098,-284.931 415.489,-258.67 443.433,-240.041"/>
+<polygon fill="black" stroke="black" points="445.692,-242.742 452.071,-234.282 441.809,-236.917 445.692,-242.742"/>
+</g>
+<!-- root1&#45;&gt;rr1 -->
+<g id="edge12" class="edge"><title>root1&#45;&gt;rr1</title>
+<path fill="none" stroke="black" d="M370.385,-307.027C398.4,-300.024 435.101,-290.848 465.878,-283.154"/>
+<polygon fill="black" stroke="black" points="466.791,-286.534 475.644,-280.713 465.094,-279.743 466.791,-286.534"/>
+</g>
+<!-- place -->
+<g id="node11" class="node"><title>place</title>
+<ellipse fill="none" stroke="black" cx="176.494" cy="-368" rx="57.3905" ry="18"/>
+<text text-anchor="middle" x="176.494" y="-364.3" font-family="Times,serif" font-size="14.00">related place</text>
+</g>
+<!-- root1&#45;&gt;place -->
+<g id="edge4" class="edge"><title>root1&#45;&gt;place</title>
+<path fill="none" stroke="black" d="M287.866,-330.876C268.843,-337.217 245.702,-344.931 225.453,-351.681"/>
+<polygon fill="black" stroke="black" points="224.155,-348.424 215.775,-354.906 226.369,-355.065 224.155,-348.424"/>
+</g>
+<!-- biog -->
+<g id="node12" class="node"><title>biog</title>
+<ellipse fill="none" stroke="black" cx="136.494" cy="-318" rx="42.4939" ry="18"/>
+<text text-anchor="middle" x="136.494" y="-314.3" font-family="Times,serif" font-size="14.00">biog hist</text>
+</g>
+<!-- root1&#45;&gt;biog -->
+<g id="edge13" class="edge"><title>root1&#45;&gt;biog</title>
+<path fill="none" stroke="black" d="M271.028,-318C245.227,-318 214.692,-318 189.24,-318"/>
+<polygon fill="black" stroke="black" points="189.034,-314.5 179.034,-318 189.034,-321.5 189.034,-314.5"/>
+</g>
+<!-- et -->
+<g id="node13" class="node"><title>et</title>
+<ellipse fill="none" stroke="black" cx="276.494" cy="-18" rx="49.2915" ry="18"/>
+<text text-anchor="middle" x="276.494" y="-14.3" font-family="Times,serif" font-size="14.00">entity type</text>
+</g>
+<!-- root1&#45;&gt;et -->
+<g id="edge2" class="edge"><title>root1&#45;&gt;et</title>
+<path fill="none" stroke="black" d="M319.416,-299.896C303.808,-259.976 267.81,-167.909 267.81,-167.909 267.81,-167.909 272.337,-89.7605 274.854,-46.3123"/>
+<polygon fill="black" stroke="black" points="278.363,-46.2682 275.447,-36.0825 271.374,-45.8633 278.363,-46.2682"/>
+</g>
+<!-- src -->
+<g id="node14" class="node"><title>src</title>
+<ellipse fill="none" stroke="black" cx="526.494" cy="-318" rx="34.394" ry="18"/>
+<text text-anchor="middle" x="526.494" y="-314.3" font-family="Times,serif" font-size="14.00">source</text>
+</g>
+<!-- root1&#45;&gt;src -->
+<g id="edge3" class="edge"><title>root1&#45;&gt;src</title>
+<path fill="none" stroke="black" d="M381.94,-318C413.48,-318 452.454,-318 481.982,-318"/>
+<polygon fill="black" stroke="black" points="482.005,-321.5 492.005,-318 482.005,-314.5 482.005,-321.5"/>
+</g>
+</g>
+</svg>
--- a/Specifications/Originals/identity_constellation_repeats.gv
+++ b/Specifications/Originals/identity_constellation_repeats.gv
+digraph States {
+        // neato -n2 -Tsvg identity_constellation_repeats.gv -O
+        // 
+        // Absolute positioning appears to only work with neato, and only if all nodes are pinned,
+        // but not always. neato -n2 units are points, and inputscale appears to be ignored
+        // sep=0.2 splines=polyline overlap=false allows the pos values to be followed,
+        // while getting the lines to go around nodes.
+
+	label = "\n\nIdentity Constellation\n(R) repeatable fields";
+        labelloc="t";
+	fontsize=20;
+        // inputscale=75;
+        sep=0.05;
+
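+        // nodesep is not a synonym for sep: nodesep is dot's spacing between adjacent nodes in a rank,
+        // while sep is the margin left around nodes when neato removes overlaps.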
+        // nodesep=0.1;
+
+        splines=polyline;
+        overlap=false;
+
+        "an1" [label="name/alt(R)"];
+        "ed1" [label="exist dates"];
+        "occ1" [label="occupation\nor function(R)"];
+        "cr1" [label="identity relation(R)"];
+        "rr1" [label="resource relation(R)"];
+        
+
+        root1 [pos="470,400!" label="identity root"];
+        place [pos="320,450!" label="related place(R)"];
+
+        an1             [pos="270,350!" ];
+        pref            [pos="120,410!" label="preferred"];
+        usedate         [pos="120,350!", label="use dates"];
+        name_components [pos="140,300!", label="components"];
+        language        [pos="150,250!", label="language"];
+        script          [pos="180,200!", label="script"];
+        authorized_form [pos="210,140!", label="authorized\nform"];
+
+        an1 -> language;
+        an1 -> script;
+        an1 ->authorized_form;
+        an1 -> pref;
+
+        name_components -> surname;
+        name_components -> forename;
+        name_components -> numeration;
+        name_components -> prefix;
+        name_components -> suffix;
+
+        surname [pos="0,350!", label="surname(R)"];
+        forename [pos="0,300!", label="forename(R)"];
+        numeration [pos="0,250!", label="numeration"];
+        prefix [pos="0,200!", label="prefix(R)"];
+        suffix [pos="0,150!", label="suffix(R)"];
+
+        ed1 [pos="330,270!"];
+        biog [pos="280,400!" label="biog hist"] ;
+        cr1 [pos="730,310!"];
+        et [pos="340,100!" label="entity type"];
+        occ1 [pos="550,250!"];
+        subject [pos="460,180!" label="topical subject(R)"];
+        rr1 [pos="720,200!"];
+        src [pos="670,400!" label="source(R)"];
+        citation [pos="690,450!" label="citation(R)"];
+
+        root1 -> subject;
+        root1 -> citation;
+        root1 -> et;
+        root1 -> src;
+        root1 -> place;
+        root1 -> an1;
+        root1 -> ed1;
+        root1 -> occ1;
+        root1 -> cr1;
+        root1-> rr1;
+        root1 -> biog;
+        an1 -> usedate;
+        an1 -> name_components;
+}
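
The graph above can also be read as a nested record. A minimal sketch in Python, assuming each node marked (R) maps to a list and every other node to a single optional value. The class names, the field names, the `original` field, and the use of plain strings as value types are all hypothetical and are not taken from the SNAC schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameComponents:
    # Nodes marked (R) in the diagram become lists; the rest hold at most one value.
    surname: List[str] = field(default_factory=list)
    forename: List[str] = field(default_factory=list)
    numeration: Optional[str] = None
    prefix: List[str] = field(default_factory=list)
    suffix: List[str] = field(default_factory=list)


@dataclass
class Name:
    original: str  # assumption: the name string as recorded; not a node in the diagram
    preferred: bool = False
    use_dates: Optional[str] = None
    components: Optional[NameComponents] = None
    language: Optional[str] = None
    script: Optional[str] = None
    authorized_form: Optional[str] = None


@dataclass
class IdentityConstellation:
    entity_type: str
    exist_dates: Optional[str] = None
    biog_hist: Optional[str] = None
    names: List[Name] = field(default_factory=list)            # name/alt(R)
    occupations: List[str] = field(default_factory=list)       # occupation or function(R)
    subjects: List[str] = field(default_factory=list)          # topical subject(R)
    places: List[str] = field(default_factory=list)            # related place(R)
    identity_relations: List[str] = field(default_factory=list)
    resource_relations: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)


# Placeholder data: one constellation with a parsed name and an unparsed alternate.
example = IdentityConstellation(
    entity_type="person",
    names=[
        Name(
            original="Example, Alex",
            preferred=True,
            components=NameComponents(surname=["Example"], forename=["Alex"]),
        ),
        Name(original="A. Example"),
    ],
)
```

Each name carries its own set of components, so a constellation holds as many component sets as it has parsed names.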
--- a/Specifications/Originals/identity_constellation_repeats.gv.svg
+++ b/Specifications/Originals/identity_constellation_repeats.gv.svg
--- a/Specifications/Originals/readme.md
+++ b/Specifications/Originals/readme.md
+
+
+##### Constellation diagrams
+
+The diagrams show conceptual table names, which mirror the SQL database schema. An identity
+has one record in each table unless noted as repeatable.
+
+constellation_linked.gv.svg is an overview showing how two related records are linked via identity
+relation. Each record is independent. The only link between records is the identity relation, which has one
+record for each side of the relationship. For reasons of clarity, the two identity constellations are not
+shown in full detail.
+
+identity_constellation_repeats.gv.svg is a detailed view of a single identity constellation with repeatable
+records noted.
+
+identity_constellation.gv.svg is an earlier, simplified view of the data. 
+
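
The readme says an identity relation has one record for each side of the relationship. A minimal sketch of that idea in Python, assuming integer identity ids and free-text relation types. `IdentityRelation`, `link`, and the field names are hypothetical and do not come from the SNAC schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IdentityRelation:
    owner_id: int       # the constellation this record belongs to
    related_id: int     # the constellation it points at
    relation_type: str  # how the owner relates to the related identity


def link(a_id: int, b_id: int, a_to_b: str, b_to_a: str) -> List[IdentityRelation]:
    """Return the two records that together describe one relationship."""
    return [
        IdentityRelation(a_id, b_id, a_to_b),
        IdentityRelation(b_id, a_id, b_to_a),
    ]


# Placeholder ids and relation types.
rows = link(1, 2, "parent of", "child of")
```

Because each side owns its own record, either constellation can be read or edited without loading the other.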
--- a/Specifications/data_constellation.md
+++ b/Specifications/data_constellation.md
+
+SNAC Data outline
+
+This is a broad view of various kinds of data in the SNAC web application. At the core, SNAC data was
+historically EAC-CPF. Working with the CPF data causes two things to happen. First, CPF data itself becomes
+more of a constellation than discrete fields. Second, using and manipulating the data requires many types of
+metadata.
+
+SNAC has always had aspects of controlled vocabulary and authority work. Both of those are being formalized
+and both add data to the SNAC application.
+
+Most of the data resides in the SQL database. Nearly every item below corresponds to a SQL table. The database
+also has additional tables serving various linking and record keeping functions. At this time, we have several
+non-SQL data stores: XTF, Neo4j, and an Elasticsearch index.
+
+- EAC-CPF constellation (broadly disambiguated from "identity", "entity", "EAC-CPF", "record", etc).
+    - Canonical data in SQL tables
+    - XML output generated as necessary
+    - Metadata
+        - Version system
+            - data current public version
+            - data current edit version (for records being edited)
+            - data old public versions
+            - data old edit versions
+        - Merge history
+        - Links to outside resources
+            - Archives
+            - Finding aids
+    - Multilingual strings
+        - Web UI labels
+        - Controlled vocabulary strings, including labels and definitions
+    - Controlled vocabularies
+        - Multilingual strings
+        - Have category and hierarchy
+        - All vocabularies share a base data structure
+        - Use varies by policy; does this imply a vocabulary workflow?
+- Name format system
+    - Multiple known formats
+    - Canonical SNAC format?
+    - Context sensitive to language, script, and user?
+- Workflows
+    - Web UI workflow
+        - Workflow specific to web domain, pages, buttons, output type, etc.
+    - Server workflow
+        - archivist edit
+        - split/merge
+            - identity reconciliation suggested merge
+            - manual merge
+        - policy based workflows
+        - technical workflows
+- Web admins
+    - Create and assign institution roles (more powerful than institution admins)
+- Institutions
+    - Institution admins
+    - Users
+    - SNAC CPF entries for institutions
+    - At least one role per institution
+- Users
+    - Dashboard tabs
+        - Historical Research Tool per-user search history
+        - Maintenance tool per-user workflow task status
+        - Notifications, all users
+    - Account info
+        - name, email, user id, password
+        - roles, as many as necessary
+    - Web session, possibly multiple sessions per user
+    - REST API session, similar or identical to web sessions
+    - Removing a role from a user revokes the associated privilege
+- Roles
+    - Created and maintained by admins with role privileges
+    - Single privilege per role, must be coordinated with workflows and application functions
+    - At least one role exists per institution
+    - At least one role per user (HRT user)
+    - Potentially, roles for ad-hoc groups (sub-institution, department, professional orgs, etc.)
+    - Need explicit, on-going policy guidance
+- Reports
+    - Read the database
+    - Availability based on roles
+- XTF full text index
+- Neo4j graph database
+    - Elasticsearch full text index
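
The Users and Roles items in the outline say that each role carries a single privilege, that a user holds as many roles as necessary, and that removing a role revokes the associated privilege. A minimal sketch of that model in Python; the class names, method names, and the role and privilege strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Role:
    name: str
    privilege: str  # single privilege per role, as in the outline


@dataclass
class User:
    user_id: str
    roles: Set[Role] = field(default_factory=set)

    def privileges(self) -> Set[str]:
        # Privileges are derived from roles and never stored on the user.
        return {role.privilege for role in self.roles}

    def can(self, privilege: str) -> bool:
        return privilege in self.privileges()

    def remove_role(self, role: Role) -> None:
        # Dropping the role is what revokes the privilege.
        self.roles.discard(role)


# Placeholder roles: a default research role plus an editing role.
hrt_user = Role("hrt_user", "search")
editor = Role("editor", "edit_constellation")
user = User("u1", {hrt_user, editor})
```

Deriving privileges from roles on each check means there is no separate privilege list to keep in sync when roles change. The sketch does not enforce the "at least one role per user" rule; that would need a check in `remove_role`.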