Commit ecb16a8d by Tom Laudeman

expand name component prose, edit the various component lists for clarity

parent d56092fa
#### Name and alternate name
There is no consensus on the canonical name. There probably can be no single, always preferred name. What is
preferred depends on context, and will vary for different purposes. In the data we can capture a reasonable
amount of context, but only the users know what is preferred. Computationally, we should treat all names as
alternates. We can offer names in one (or more) of several agreed-upon formats, leaving the choice up to the
user.
The distinction between name and alternate name has no effect on identity matching because the match is done
on all alternates, and uses all available data from the identity constellation.
#### Name components
-There is only one set of name components for a given cpf identity, thus name_component is related to table
-cpf, not table name. To derive the components, we can parse the preferred name. Or using a more complex
-algo, we can parse all the names in a given language and build a consensus set, or a canonical set of
-components. Initially I thought that a consensus set precludes having a component "japanese family name",
-but now I can't see why. There is no reason for a canonical set of components not to include components
-used only in a subset of preferred name forms.
+Given the variety of components in names, it is not possible to create a canonical, single set of components
+for many names and their alternates. Even the concept of "preferred" name is debatable.
+
+It is (mostly) possible to parse out the components for each name and each alternate name. Thus we have as many
+sets of name components as we have names for a single identity constellation.
Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
name_format.
If we want language specific name components then we need to add field "language" to name_components. Due
to components being extracted from possibly several name strings, we probably will not join table name to
@@ -18,11 +34,6 @@ parsing. One aspect of database-centric component derivation would be a join tab
many-to-many relation between table name and table name component. Complete info about derivation also
requires the name parsing version number and any configuration at the time the parsing was done.
Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
name_format.
Field nc_label (table name_component) must come from a controlled vocabulary in order for dynamic
formatting to work.
@@ -32,15 +43,43 @@ our archives and authority stake holders.
#### Minimal list of components labels
The least flexible system (of the 3 or 4 we have reviewed) for name components is probably MARC. Even in MARC
the $c is repeatable, allowing for a large number of (unlabeled) components. This system is probably too
restrictive, although it allows us to capture middle name when possible. However, the lack of flexible labels
makes MARC a weak standard for names.
surname, forename, additions, numeration, expansion
#### Larger list of components
This list comes from a combination of MARC, Unimarc, ISNI, ArchivesSpace, and British Library. We might be wise
to include others as well (VIAF, BnF, AnF, Archives Hub UK).
surname, middle, forename, prefix, suffix, epithet, title, pretitle, numeration, additional
Unfortunately, current name format guidelines are ambiguous. The most common problem is that middle name could
be a second forename or the second of several additional name components.
There is general agreement on "name" and "non-name" parts, although no guides explicitly talk about this. Name
parts are surname, forename, and middle name. Many other non-name parts are often found in names. Both the
name and non-name parts have ambiguous rules within most systems, and the various systems and cultures have
incomplete agreement.
The database is improved by labeling the components where possible, but our algorithms and user
interface can (fairly) easily remain agnostic about component labels while processing and displaying the
components as well as formatting the components into names.
In the past, failure to create names by (re-) formatting components has led to inconsistent names. While it is
not possible to 100% parse or format names, it is also true that humans who did the data entry were not 100%
accurate in their formats. The computer can be more consistent, and probably nearly as accurate as the human
editors (especially where the editors cannot agree or where they have ambiguous rules). While it is
technically feasible to format names from components, it is also feasible to keep and (carefully) display the
names as originally entered.
#### Overview
-ISNI: prefix, surname, forename (additional forenames), middle name (second and subsequent forenames), suffix
+ISNI: prefix (NR), surname (NR), forename (additional forenames) (R-ish), middle name (second and subsequent
+forenames) (R-ish), suffix (NR)
Unimarc: $a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
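The (R) and (NR) markers above could be checked mechanically. The following is only a sketch: the rule table mirrors the Unimarc line as listed here, while the function name `repeatability_violations`, the `(code, value)` input shape, and the choice to treat unknown subfield codes as non-repeatable are assumptions for illustration.

```python
from collections import Counter

# Repeatability per subfield code, as listed above: True = (R), False = (NR).
UNIMARC_REPEATABLE = {"a": False, "b": False, "c": True, "d": False, "g": False}


def repeatability_violations(subfields, rules=UNIMARC_REPEATABLE):
    """Return sorted subfield codes that repeat although marked (NR).

    subfields: list of (code, value) tuples for a single name.
    Codes missing from the rule table are treated as non-repeatable,
    which is an assumption of this sketch.
    """
    counts = Counter(code for code, _ in subfields)
    return sorted(
        code for code, n in counts.items()
        if n > 1 and not rules.get(code, False)
    )
```

A name with two $c additions passes this check, while a name carrying two $a surnames would be reported, which is the kind of input where a human editor would need to decide how to record the second surname.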
......