renamed name_compoents.md, expected the original to be removed, now running git rm

83e33e1d · Tom Laudeman · a20a2182 · a20a2182
Commit 83e33e1d authored Nov 02, 2015 by Tom Laudeman
Show whitespace changes
Inline Side-by-side

Showing with 0 additions and 161 deletions

name_compoents_alternates.md Discussion/name_compoents_alternates.md +0 -161

No files found.
--- a/Discussion/name_compoents_alternates.md
+++ b/Discussion/name_compoents_alternates.md
-
-#### Name and alternate name
-
-There is no consensus on the canonical name. There probably can be no single, always preferred name. What is
-preferred depends on context, and will vary for different purposes. In the data we can capture a reasonable
-amount of context, but only the users know what is preferred. Computationally, we should treat all name as
-alternates. We can offer names in one (or more) of several agreed-upon formats, leaving the choice up to the
-user.
-
-Name and alternate has no effect on identity matching because the match is done on all alternates, and uses
-all available data from the identity constellation. 
-
-#### Name components
-
-Give the variety of components in names, it is not possible to create a canonical, single set of components
-for many names and their alternates. Even the concept of "preferred" name is debatable.
-
-It is (mostly) possible to parse out the components for each name and each alternate name. Thus we have as many
-sets of name components as we have names for a single identity constellation. 
-
-Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
-family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
-in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
-name_format. 
-
-If we want language specific name components then we need to add field "language" to name_components. Due
-to components being extracted from possibly several name strings, we probably will not join table name to
-table name_component. It would be a many-to-one join, and given that name component relates to cpf, and not
-to one or more name strings, a join between table name and table name component is not logical.
-
-Do we need information about how each name component was derived? If so we are probably better off saving a
-log of the name parser than trying to use the database as part of the historical record of name
-parsing. One aspect of database-centric component derivation would be a join table to handle the
-many-to-many relation between table name and table name component. Complete info about derivation also
-requires the name parsing version number and any configuration at the time the parsing was done.
-
-Field nc_label (table name_component) must come from a controlled vocabulary in order for dynamic
-formatting to work.
-
-We should not allow our technology to be defined by the "minimal existing implementation" of name. We will
-cripple SNAC if we only meet the minimal definition of name. Additionally, SNAC has needs beyond that of
-our archives and authority stake holders.
-
-#### Minimal list of components labels
-
-The least flexible system (of the 3 or 4 we have reviewed) for name components is probably MARC. Even in MARC
-the $c is repeatable, allowing for large number of (unlabeled) components. This system is probably too
-restrictive, although it allows us to capture middle name when possible. However, lack of flexible labels
-makes MARC a weak standard for names.
-
-surname, forename, additions, numeration, expansion
-
-#### Larger list of components
-
-This list comes from a combination of MARC, Unimarc, ISNI, ArchiveSpace, and British Library. We might be wise
-to include others as well (VIAF, BnF, AnF, Archives Hub UK).
-
-surname, middle, forename, prefix, suffix, epithet, title, pretitle, numeration, additional
-
-Unfortunately, current name format guidelines are ambiguous. The most common problem is that middle name could
-be a second forename or the second of several additional name components. 
-
-There is general agreement on "name" and "non-name" parts, although no guides explicitly talk about this. Name
-parts are surname, forename, and middle name. Many other non-name parts are often found in names. Both the
-name and non-name parts have ambiguous rules within most systems, and the various systems and cultures have
-incomplete agreement. 
-
-The database is improved by labeling the compoents where possible, but our algorithms and user
-interface can (fairly) easily remaing agnostic about compoent labels while processing anddisplaying the
-compoents as well as formatting the components into names.
-
-In the past, failure to create names by (re-) formatting components has led to inconsistent names. While is it
-not possible to 100% parse or format names, it is also true that humans who did the data entry were not 100%
-accurate in their formats. The computer can be more consistent, and probably nearly as accurate as the human
-editors (especially where the editors cannot agree or where they have ambiguous rules). While it is
-technically feasible to format names from components, it is also feasible to keep and (carefully) display the
-names as originally entered.
-
-#### Overview
-
-ISNI: prefix (NR), surname (NR), forename (additional forenames) (R-ish), middle name (second and subsequent
-forenames) (R-ish), suffix (NR)
-
-Unimarc: $a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
-
-MARC: $a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
-
-
-#### Detailed fields by authority
-
-(R) are repeatable fields. (NR) are non-repeating fields.
-
-
-#### ISNI
-
-From: ISNI fields of tab delimited format for data submission, A. MacEwan et al.
-
-http://dx.doi.org/10.1080/01639374.2012.730601
-
-http://www.tandfonline.com/doi/abs/10.1080/01639374.2012.730601?journalCode=wccq20
-
-ISNI components:
-
-prefix, surname, forename and additional forenames, middle name second and subsequent forenames, suffix
-
- prefix: e.g. Sir
- surname: all parts of surname in the form commonly used; for alt form use alt name
- forename: one or more forenames or initials
- middle: second and subsequent forenames
- suffix: e.g. Esq
- ISNI's input format supports a single alternate name in indirect format
-
-#### Unimarc
-
-http://www.ifla.org/files/assets/uca/unimarc_updates/BIBLIOGRAPHIC/u-b_700_update.pdf
-
-Unimarc components:
-
-$a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
-
-
-LoC MARC 1xx
-
-http://www.loc.gov/marc/authority/ad100.html
-
-MARC and Unimarc are surname oriented: "that part of the name by which the name is entered in ordered
-lists", although they allow the main entry to be a forename (distinguished by indicators attribute). All
-non-surname parts of the given name are thrown together into forename, both have a numeration field, both
-allow many additional components. MARC $c acknowledges that there are many types of other words associated
-with names.
-
-MARC components:
-
-$a name (NR), $b numeration (NR), $c titles and other words (R), $q fuller form (NR)
-
-(First indicator determines direct or indirect format of $a.)
-
-```
-100 - MAIN ENTRY--PERSONAL NAME (NR)
-   Indicators
-      First - Type of personal name entry element
-         0 - Forename
-         1 - Surname
-         3 - Family name
-      $a - Personal name (NR)
-      $b - Numeration (NR) [roman numerals]
-      $c - Titles and other words associated with a name (R) [Jr., King of Sweden, Meister, pseud., Sir, (Anglo-Norman poet), Esq., II]
-      $d - Dates associated with a name (NR)
-      $q - Fuller form of name (NR)
-```
-
-#### ArchiveSpace
-
-http://sandbox.archivesspace.org/
-
-ArchiveSpace components:
-prefix, title, "primary part of name (required)", rest of name, suffix, fuller form.
-
-
-
-