Rachael Hu / Documentation
Commit ecb16a8d authored Oct 21, 2015 by Tom Laudeman
expand name component prose, edit the various component lists for clarity
parent d56092fa
Showing 1 changed file with 51 additions and 12 deletions
Discussion/name_compoents_alternates.md
#### Name and alternate name
There is no consensus on the canonical name. There probably can be no single, always preferred name. What is
preferred depends on context, and will vary for different purposes. In the data we can capture a reasonable
amount of context, but only the users know what is preferred. Computationally, we should treat all names as
alternates. We can offer names in one (or more) of several agreed-upon formats, leaving the choice up to the
user.
The distinction between name and alternate name has no effect on identity matching, because matching is done
on all alternates and uses all available data from the identity constellation.
#### Name components
-There is only one set of name components for a given cpf identity, thus name_component is related to table
-cpf, not table name. To derive the components, we can parse the preferred name. Or using a more complex
-algo, we can parse all the names in a given language and build a consensus set, or a canonical set of
-components. Initially I thought that a consensus set precludes having a component "japanese family name",
-but now I can't see why. There is no reason for a canonical set of components not to include components
-used only in a subset of preferred name forms.
+Given the variety of components in names, it is not possible to create a canonical, single set of components
+for many names and their alternates. Even the concept of "preferred" name is debatable.
+It is (mostly) possible to parse out the components for each name and each alternate name. Thus we have as many
+sets of name components as we have names for a single identity constellation.
Suggest: we not become dogmatic about component labels. We should avoid "family name" vs surname even though
family name is perhaps more culturally relevant. Ditto givenname vs forename. Goal: be culturally agnostic
in the name_component vocabulary, and capture cultural practice in some other place/table, such as table
name_format.
If we want language specific name components then we need to add field "language" to name_components. Due
to components being extracted from possibly several name strings, we probably will not join table name to
...
@@ -18,11 +34,6 @@ parsing. One aspect of database-centric component derivation would be a join tab
many-to-many relation between table name and table name component. Complete info about derivation also
requires the name parsing version number and any configuration at the time the parsing was done.
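The join table described above could be sketched roughly as follows. This is only an illustration using SQLite through Python's standard `sqlite3` module. The table name `name_component_derivation`, every column other than `nc_label`, and the sample rows are assumptions made up for the example, not part of the proposed schema.

```python
import sqlite3

# Hypothetical schema: only the table names "name" and "name_component" and the
# field nc_label come from the discussion; everything else is an assumption.
SCHEMA = """
CREATE TABLE name (
    id       INTEGER PRIMARY KEY,
    original TEXT NOT NULL            -- name string as originally entered
);
CREATE TABLE name_component (
    id       INTEGER PRIMARY KEY,
    nc_label TEXT NOT NULL,           -- label from the controlled vocabulary
    nc_value TEXT NOT NULL,
    language TEXT                     -- optional language of the component
);
CREATE TABLE name_component_derivation (
    name_id        INTEGER NOT NULL REFERENCES name(id),
    component_id   INTEGER NOT NULL REFERENCES name_component(id),
    parser_version TEXT NOT NULL,     -- version of the name parser used
    parser_config  TEXT,              -- configuration at parse time
    PRIMARY KEY (name_id, component_id)
);
"""


def build_demo():
    """Create an in-memory database with two made-up names sharing components."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO name (id, original) VALUES (?, ?)",
        [(1, "Smith, John"), (2, "John Smith")],
    )
    conn.executemany(
        "INSERT INTO name_component (id, nc_label, nc_value) VALUES (?, ?, ?)",
        [(1, "surname", "Smith"), (2, "forename", "John")],
    )
    # Many-to-many: each component was derived from both name strings.
    conn.executemany(
        "INSERT INTO name_component_derivation VALUES (?, ?, ?, ?)",
        [(1, 1, "0.1", "{}"), (1, 2, "0.1", "{}"),
         (2, 1, "0.1", "{}"), (2, 2, "0.1", "{}")],
    )
    return conn


def names_for_component(conn, component_id):
    """Return the original name strings a component was derived from."""
    cur = conn.execute(
        "SELECT n.original FROM name n "
        "JOIN name_component_derivation d ON d.name_id = n.id "
        "WHERE d.component_id = ? ORDER BY n.id",
        (component_id,),
    )
    return [row[0] for row in cur.fetchall()]
```

Keeping the parser version and configuration on the join row, rather than on the component, is one possible choice; it lets the same component be re-derived later by a newer parser without losing the earlier record.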
Field nc_label (table name_component) must come from a controlled vocabulary in order for dynamic
formatting to work.
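As a minimal sketch of why a controlled vocabulary matters for dynamic formatting: if every `nc_label` is known in advance, a format can be written as a template over those labels. The label set below is the "larger list" from this discussion; the functions `validate_components` and `format_name`, the template syntax, and the sample name are assumptions for illustration only.

```python
from collections import defaultdict

# Controlled vocabulary: the "larger list of components" from this discussion.
NC_LABELS = frozenset({
    "surname", "middle", "forename", "prefix", "suffix",
    "epithet", "title", "pretitle", "numeration", "additional",
})


def validate_components(components):
    """Raise ValueError if any (label, value) pair uses an unknown label."""
    for label, _value in components:
        if label not in NC_LABELS:
            raise ValueError(f"unknown nc_label: {label!r}")


def format_name(components, template):
    """Format components with a template such as '{surname}, {forename}'.

    Repeated labels are joined with a space; labels missing from the
    components render as empty strings, and extra whitespace is collapsed.
    """
    validate_components(components)
    grouped = defaultdict(list)
    for label, value in components:
        grouped[label].append(value)
    values = defaultdict(str, {k: " ".join(v) for k, v in grouped.items()})
    return " ".join(template.format_map(values).split())
```

With the made-up components `[("surname", "Smith"), ("forename", "John")]`, the template `"{forename} {surname}"` and the template `"{surname}, {forename} {middle}"` both work, because the formatter can rely on the label names. An uncontrolled label such as "family name" would be rejected before formatting.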
...
@@ -32,15 +43,43 @@ our archives and authority stake holders.
#### Minimal list of components labels
The least flexible system (of the 3 or 4 we have reviewed) for name components is probably MARC. Even in MARC
the $c is repeatable, allowing for a large number of (unlabeled) components. This system is probably too
restrictive, although it allows us to capture middle name when possible. However, lack of flexible labels
makes MARC a weak standard for names.
surname, forename, additions, numeration, expansion
#### Larger list of components
This list comes from a combination of MARC, Unimarc, ISNI, ArchivesSpace, and British Library. We might be wise
to include others as well (VIAF, BnF, AnF, Archives Hub UK).
surname, middle, forename, prefix, suffix, epithet, title, pretitle, numeration, additional
Unfortunately, current name format guidelines are ambiguous. The most common problem is that middle name could
be a second forename or the second of several additional name components.
There is general agreement on "name" and "non-name" parts, although no guides explicitly talk about this. Name
parts are surname, forename, and middle name. Many other non-name parts are often found in names. Both the
name and non-name parts have ambiguous rules within most systems, and the various systems and cultures have
incomplete agreement.
The database is improved by labeling the components where possible, but our algorithms and user
interface can (fairly) easily remain agnostic about component labels while processing and displaying the
components as well as formatting the components into names.
In the past, failure to create names by (re-) formatting components has led to inconsistent names. While it is
not possible to 100% parse or format names, it is also true that humans who did the data entry were not 100%
accurate in their formats. The computer can be more consistent, and probably nearly as accurate as the human
editors (especially where the editors cannot agree or where they have ambiguous rules). While it is
technically feasible to format names from components, it is also feasible to keep and (carefully) display the
names as originally entered.
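A small sketch of the two ideas above, under stated assumptions: display can ignore labels entirely by joining component values in stored order, and the computed form can be compared with the name as originally entered to find inconsistent records while still keeping the original. The function names and the sample records are hypothetical.

```python
def join_components(components, separator=" "):
    """Label-agnostic display: join non-empty values in stored order."""
    return separator.join(value for _label, value in components if value)


def find_inconsistent(records, formatter=join_components):
    """Return the as-entered names that differ from their computed form.

    records is an iterable of (original, components) pairs, where components
    is a list of (label, value) pairs. The originals are never modified.
    """
    return [
        original
        for original, components in records
        if formatter(components) != original
    ]
```

For example, two made-up records with the same components `[("forename", "John"), ("surname", "Smith")]` but entered as "John Smith" and "Smith John" would yield only "Smith John" as inconsistent under the default formatter.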
#### Overview
-ISNI: prefix, surname, forename (additional forenames), middle name (second and subsequent forenames), suffix
+ISNI: prefix (NR), surname (NR), forename (additional forenames) (R-ish), middle name (second and subsequent
+forenames) (R-ish), suffix (NR)
Unimarc: $a Surname (NR), $b Given name remainder (NR), $c Additions (R), $d Roman numerals (NR), $g Expansion (NR)
...