Commit 0f35071c by twl8n

Cleanup, some edits for consistency

parent 4c483cd6
See "comment:" for comments, discussion, todo, etc.
#### Governance and Policies, etc.
- Data curation, preservation, graceful retirement
- Data expulsion vs. embargo vs. mark as deleted vs. physical record delete vs. delete from all tapes/disk/media
- Duplicates, backups, restore, related policy and technical issues
#### List of requirements
This is the definitive list of all requirements. Anything the application needs to do must be in this
list. Each item and group of items is explained in detail later in the document. Being a "list", this includes
only sufficient detail to disambiguate items.
1. SNAC links to external archival resources
2. External resources link to SNAC as an authority. (Tom comment: is SNAC also an archival resource?)
- Clarify: the co-op version 1 is not going to support bulk data ingest
The following interfaces include both direct programming language interfaces and REST interfaces. We need to
determine which (REST or direct) is available for each. Modifying data should probably go through
authorization and should probably be subject to work flow. That implies that the work flow has a REST
interface, and that the REST interface is the only public interface.
- Identity Reconciliation (IR) (direct)
#### User interface for Discovery
#### Functionality for Splitting
comment: Add prose to explain how splitting interacts with the work flow (historically called the "queue").

comment: Add prose to cover the manual splitting of single record components (bioghist) into multiple parts.

Keeping in mind that our descriptions are authoritative, and will be referenced via persistent identifier
(ARK), it will be necessary to de-authorize or invalidate the ARK of a description which has been split. The
ARK server will note the new ARKs of the resulting descriptions in both human-readable and machine-actionable
formats. Outside parties with an invalid ARK will probably have to update their descriptions manually, since
the entity name alone is too ambiguous for a computer to disambiguate. We can, however, easily create a report
of deprecated ARKs on a per-institution basis. When merging descriptions, the main ARK will be retained, and
merged ARKs can simply redirect to it.

Note: determine which operations require a new ARK, either because the old ARK has changed so much that the
original reference is meaningless, or for other causes TBD.
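Here is a minimal sketch of the ARK bookkeeping described above. It assumes an in-memory table instead of a real ARK server. The `ArkRegistry` class, its methods, the status values, and the example ARK strings are all hypothetical. A merged ARK resolves to a redirect to the retained ARK. A split ARK resolves to the list of successor ARKs, which a web front end could render as a human-readable page while also serving the same answer as structured data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ArkRecord:
    """State of one ARK in the (hypothetical) resolver table."""
    ark: str
    status: str = "active"  # "active", "merged", or "split"
    successors: List[str] = field(default_factory=list)


class ArkRegistry:
    """Illustrative split/merge bookkeeping; not a real ARK server API."""

    def __init__(self) -> None:
        self._records: Dict[str, ArkRecord] = {}

    def mint(self, ark: str) -> None:
        self._records[ark] = ArkRecord(ark)

    def merge(self, retained: str, merged: List[str]) -> None:
        # The main ARK is retained; each merged ARK simply redirects to it.
        for ark in merged:
            record = self._records[ark]
            record.status = "merged"
            record.successors = [retained]

    def split(self, old: str, new_arks: List[str]) -> None:
        # The old ARK is invalidated and points at every resulting description.
        for ark in new_arks:
            self.mint(ark)
        record = self._records[old]
        record.status = "split"
        record.successors = list(new_arks)

    def resolve(self, ark: str) -> dict:
        """Machine-actionable answer; a web front end could render it as HTML."""
        record: Optional[ArkRecord] = self._records.get(ark)
        if record is None:
            return {"status": "unknown", "ark": ark}
        if record.status == "merged":
            return {"status": "redirect", "location": record.successors[0]}
        if record.status == "split":
            return {"status": "deprecated", "successors": list(record.successors)}
        return {"status": "active", "ark": ark}

    def deprecated_arks(self) -> List[str]:
        """Starting point for a deprecated-ARK report (per-institution filtering omitted)."""
        return sorted(a for a, r in self._records.items() if r.status != "active")
```

Outside parties holding a split ARK get the successor list rather than a redirect. This matches the expectation that they will have to choose the right description manually.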
Having found a description in need of splitting, we need a UI to support creating one or more additional
descriptions. This should have a "save" …

… by moderators before being "posted", where posting makes the modifications visible to the standard discovery
tools. There are also some issues in how we manage ARKs of split descriptions.
comment: confirm collaborative editing is not a requirement

comment: confirm that locking is a requirement

In theory, several people in separate locations could collaborate in real time on description
maintenance. However, that type of collaboration is fairly complex. We don't want to support collaborative
description splitting in the first version, so we need a feature to "lock" descriptions. This means we need a
mechanism for seeing who has the lock, and for sending that person a message. Unless we're going to expose the
email addresses of our users, we will need an anonymized email system (or email forwarding system).
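The lock requirement can be sketched as a small lock table. The table records who holds each lock and relays messages by user ID, so the sender never sees the holder's email address. This is an in-memory illustration with hypothetical names (`LockManager`, `message_holder`, the `outbox` list). A real implementation would keep locks in the database and hand the outbox to an email forwarding service that looks up addresses server-side.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lock:
    description_id: int
    holder_id: int
    acquired_at: float  # epoch seconds, useful for spotting stale locks


class LockManager:
    """In-memory stand-in for a lock table; a real system would use the database."""

    def __init__(self) -> None:
        self._locks: Dict[int, Lock] = {}
        self.outbox: List[dict] = []  # messages for a (hypothetical) forwarding service

    def acquire(self, description_id: int, user_id: int) -> bool:
        """Take the lock if it is free; re-acquiring your own lock just returns True."""
        current = self._locks.get(description_id)
        if current is None:
            self._locks[description_id] = Lock(description_id, user_id, time.time())
            return True
        return current.holder_id == user_id

    def release(self, description_id: int, user_id: int) -> bool:
        """Only the holder may release a lock."""
        current = self._locks.get(description_id)
        if current is None or current.holder_id != user_id:
            return False
        del self._locks[description_id]
        return True

    def holder(self, description_id: int) -> Optional[int]:
        """Who has the lock, so the UI can show the locked state."""
        current = self._locks.get(description_id)
        return None if current is None else current.holder_id

    def message_holder(self, description_id: int, sender_id: int, text: str) -> bool:
        """Queue a message addressed by user ID; no email address is revealed to the sender."""
        holder_id = self.holder(description_id)
        if holder_id is None:
            return False
        self.outbox.append({"to_user": holder_id, "from_user": sender_id, "text": text})
        return True
```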
An ideal split UI will easily allow text/fields to be selected and moved to one of the possibly multiple
splits, via a single mouse click or …

… When starting description maintenance, the descriptions involved are locked to prevent other users from
modifying them. The system notes this lock and makes the locked state visible in the discovery interface. It
seems safe to assume that one of the merged descriptions will become the authoritative description. This
single description will be retained, and the other merged descriptions marked as deleted. We can retain the
ARK of the single retained description. The main description will be copied, with the original still visible
to the discovery tool, …
#### Admin Client for Maintenance System
comment: Does this mean the admin dashboard?
#### User Management
Authentication is the process of validating user logins to the system. Authorization is the related aspect of
controlling which parts of the system users may access (or even which parts they may know exist).
We can use OpenID for authentication, but we will need a user profile for SNAC roles and authorization. There
are examples of PHP code to implement OpenID at Stack Exchange:
…
The Linux model of three privilege types, "user", "group", and "other", works well for authorization
permissions, and we should use this model. "User" is an authenticated user. "Group" is a set of users, and a
user may belong to several groups. In SNAC, as in most of the non-Linux world, a "group" is known as a "role",
so SNAC will call them "roles". In Linux, "other" privileges apply to everyone else, which in SNAC would mean
public, non-authenticated users. SNAC does not really have an "other" class, though: the "Researcher" role
applies to public users.
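Here is a sketch of how the role model might look in code. It assumes that privileges accumulate across all of a user's roles and that an unauthenticated request holds only the Researcher role. The role-to-privilege table and the privilege names are hypothetical placeholders. The requirements leave the actual privilege set TBD.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set

# Hypothetical role-to-privilege table; the real privilege set is still TBD.
ROLE_PRIVILEGES: Dict[str, FrozenSet[str]] = {
    "Researcher": frozenset({"search", "view_description"}),
    "Maintenance": frozenset({"edit_description"}),
    "Moderator": frozenset({"approve_change", "post_change"}),
    "Web admin": frozenset({"manage_accounts", "assign_roles", "run_reports"}),
}


def roles_for(user_roles: Optional[Iterable[str]]) -> Set[str]:
    """None means an unauthenticated request: there is no "other" class, only Researcher."""
    if user_roles is None:
        return {"Researcher"}
    return set(user_roles) | {"Researcher"}  # every account is also a Researcher


def effective_privileges(roles: Iterable[str]) -> Set[str]:
    """Union of the privileges of every role held; unknown roles grant nothing."""
    privileges: Set[str] = set()
    for role in roles:
        privileges |= ROLE_PRIVILEGES.get(role, frozenset())
    return privileges


def is_allowed(user_roles: Optional[Iterable[str]], privilege: str) -> bool:
    return privilege in effective_privileges(roles_for(user_roles))
```

Treating the public as Researcher keeps a single code path for permission checks, without a separate "other" branch.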
Users can have several roles, and will have all the privileges of all their roles. Role membership is managed
by an administrative UI (part of the dashboard) and related API code. User information such as name, phone
number, and even password can also change. User ID values cannot be changed, and a user ID is never reused,
even after account deletion.
We expect to create additional roles as necessary for application functions.
Roles include a large number of "is institution member" roles. These should be roles like any other, but we
may want to flag these role records to make them easy to manage and easy to display in the UI. Any user can
have zero or more roles that define their institutional affiliation. This primarily affects reporting and
admin. In the case of reports, membership in an institution constrains the reporting: when setting up a
report, users may only choose from institutions of which they are members. Some reports may auto-detect the
user's membership.
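One way to flag institutional roles and enforce the reporting constraint is sketched below. The `member:` name prefix is a hypothetical flag; a boolean column on the role table would work just as well. The function names are illustrative only.

```python
from typing import Iterable, List, Optional, Set

INSTITUTION_PREFIX = "member:"  # hypothetical flag for "is institution member" roles


def institutions_of(roles: Iterable[str]) -> Set[str]:
    """Affiliation roles are ordinary roles; here they are recognized by a name prefix."""
    return {r[len(INSTITUTION_PREFIX):] for r in roles if r.startswith(INSTITUTION_PREFIX)}


def report_institutions(roles: Iterable[str], requested: Optional[Iterable[str]] = None) -> List[str]:
    """Institutions a report may cover: only those the user is a member of.

    If nothing is requested, the report auto-detects the user's memberships.
    """
    allowed = institutions_of(roles)
    if requested is None:
        return sorted(allowed)
    chosen = set(requested)
    outside = chosen - allowed
    if outside:
        raise PermissionError(f"not a member of: {', '.join(sorted(outside))}")
    return sorted(chosen)
```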
By and large, when we refer to "accounts" we mean web accounts managed by the Manager/Web admin. The general
public can use the discovery interface without an account, but saving search history and other
session-related discovery tools require an account. It is technically possible to give public users a
single-session dashboard. That has not been mentioned as a requirement and is probably a low priority, but it
might be almost trivial to implement.
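Here is a sketch of the single-session idea. It assumes that account holders' history is stored by user ID, while public history is stored by session ID and only until the session ends. `SearchHistory` and its methods are hypothetical.

```python
from typing import Dict, List, Optional


class SearchHistory:
    """Hypothetical history store: persistent for accounts, per-session for the public."""

    def __init__(self) -> None:
        self._by_user: Dict[int, List[str]] = {}
        self._by_session: Dict[str, List[str]] = {}

    def record(self, query: str, session_id: str, user_id: Optional[int] = None) -> None:
        if user_id is not None:
            self._by_user.setdefault(user_id, []).append(query)
        else:
            self._by_session.setdefault(session_id, []).append(query)

    def history(self, session_id: str, user_id: Optional[int] = None) -> List[str]:
        if user_id is not None:
            return list(self._by_user.get(user_id, []))
        return list(self._by_session.get(session_id, []))

    def end_session(self, session_id: str) -> None:
        # Anonymous history disappears with the session; account history is kept.
        self._by_session.pop(session_id, None)
```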
Every account will be in the "Researcher" role, which has the same privileges as the general public plus a TBD
set of basic privileges, including search history and certain researcher reports.
| User type                  | Role                | Description                                                           |
|----------------------------+---------------------+-----------------------------------------------------------------------|
| Sysadmin                   | Server admin        | Maintain server, backups, etc.                                        |
| Database Administrator     | DBA                 | Schema maintenance, data dumps, etc.                                  |
| Software engineer          | Developer           | Coding, testing, QA, release management, data loading, etc.           |
| Manager                    | Web admin           | Web accounts: create, manage, assign roles, run reports               |
| Peer vetting               | Vetting             | Approve moderators, reviewers, content experts                        |
| Moderator                  | Moderator           | Approve maintenance changes, posting those changes                    |
| Reviewer/editor            | Maintenance         | Maintainer privileges, interacts with moderators                      |
| Content expert             | Maintenance         | Domain expert, may have zero institutional roles                      |
| Documentary editor         | Maintenance         | Distinguished by?                                                     |
| Maintenance                | Maintenance         | Distinguished by?                                                     |
| Researcher                 | Researcher          | Use the discovery interface and history dashboard                     |
| Archival description donor | Block upload        | Bulk uploads of CPF or finding aids                                   |
| Name authority manager     | Name authority      | Donates name authority data, perhaps via bulk upload                  |
| Institutional admins       | Institutional admin | Institutional role admin dashboard, institutional reports             |
| Public                     | Researcher          | No account, researcher role, no dashboard or single session dashboard |
Remember: institutional affiliation roles aren't in the table above. There will be many of those roles, and
users may have zero, one, or several institutional roles that define which institutions they are members of.
It is possible for an institutional admin to be a member of more than one institution. Institutional admins
have the ability to:

- view membership lists of their institution(s)
- add or remove their institution's affiliation role for users
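Here is a sketch of the institutional admin checks. It assumes that affiliation roles are named with a hypothetical `member:` prefix, and that an admin may only list or change memberships for institutions they belong to. `Directory`, `members`, and `set_membership` are illustrative names, not an existing API.

```python
from typing import Dict, Set

ROLE_PREFIX = "member:"  # hypothetical naming for institutional affiliation roles


class Directory:
    """Hypothetical role store keyed by user ID."""

    def __init__(self, roles: Dict[int, Set[str]]) -> None:
        self.roles = roles

    def _admin_institutions(self, admin_id: int) -> Set[str]:
        mine = self.roles.get(admin_id, set())
        if "Institutional admin" not in mine:
            return set()
        return {r[len(ROLE_PREFIX):] for r in mine if r.startswith(ROLE_PREFIX)}

    def _check(self, admin_id: int, institution: str) -> None:
        if institution not in self._admin_institutions(admin_id):
            raise PermissionError(f"user {admin_id} cannot administer {institution}")

    def members(self, admin_id: int, institution: str) -> Set[int]:
        """View the membership list of one of the admin's own institutions."""
        self._check(admin_id, institution)
        return {uid for uid, rs in self.roles.items() if ROLE_PREFIX + institution in rs}

    def set_membership(self, admin_id: int, user_id: int, institution: str, member: bool) -> None:
        """Add or remove the affiliation role, only for an institution the admin belongs to."""
        self._check(admin_id, institution)
        user_roles = self.roles.setdefault(user_id, set())
        if member:
            user_roles.add(ROLE_PREFIX + institution)
        else:
            user_roles.discard(ROLE_PREFIX + institution)
```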
Roles which require one or more institutional roles (affiliation):
…

… command line accounts involved, and server configuration. This aspect of administration integrates with
versioning, backup, and software releases.
#### Reports
While the web interface is the primary public face of SNAC, many other views of the data and metadata are
necessary, especially for admins and governance. Those "views" are reports, and they will primarily be
generated by integrating a third-party reporting package such as Jaspersoft Business Intelligence Suite, whose
community edition is free, open source, and includes a full range of tools. All SNAC data resides in
PostgreSQL, a standard SQL relational database management system (RDBMS), which simplifies the process of
adding reporting and business intelligence.
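Whatever reporting package is chosen, each report ultimately runs SQL against the database. That only works if the schema records what the report needs. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere. The `description` and `edit_log` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema; sqlite3 stands in for PostgreSQL so the sketch is self-contained.
SCHEMA = """
CREATE TABLE description (
    id INTEGER PRIMARY KEY,
    institution TEXT NOT NULL
);
CREATE TABLE edit_log (
    id INTEGER PRIMARY KEY,
    description_id INTEGER NOT NULL REFERENCES description(id),
    editor_id INTEGER NOT NULL,
    status TEXT NOT NULL  -- e.g. 'pending' or 'approved'
);
"""

# Example report: approved edits per institution (institutions with none still appear).
APPROVED_EDITS_PER_INSTITUTION = """
SELECT d.institution, COUNT(e.id) AS approved_edits
FROM description AS d
LEFT JOIN edit_log AS e ON e.description_id = d.id AND e.status = 'approved'
GROUP BY d.institution
ORDER BY d.institution
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO description VALUES (?, ?)",
                     [(1, "inst_a"), (2, "inst_b"), (3, "inst_c")])
    conn.executemany("INSERT INTO edit_log VALUES (?, ?, ?, ?)",
                     [(1, 1, 10, "approved"), (2, 1, 11, "pending"),
                      (3, 2, 10, "approved"), (4, 1, 12, "approved")])
    return conn


def approved_edits_per_institution(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    return conn.execute(APPROVED_EDITS_PER_INSTITUTION).fetchall()
```

A report like this is impossible if edits are not logged with a status. Checking each planned report against the schema early is therefore cheaper than adding columns later.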
(How much detail do we want about reports? Maybe just half a dozen examples?)
… developers via SSH. If the institution hosting the project can only allow employees on the server, then we
may need to create a new server strategy.
One option is to do our hosting on Amazon. If so, what is the hosting fallback if Amazon has an outage? If we
host with Amazon, do we have to pay extra for multiple availability zones? Where does Amazon house offsite
things like tape backups? If we use Amazon, we will have to research what can go wrong, since our current
sysadmins are experienced with the model of local hardware colocation.
One common failure of standard server practice is to assume that backups are working. We should test our
backups on some schedule to verify that …
In addition to current and planned features (need a list), we should consider the following:
- Expose all CPF descriptions to search crawlers so that Google and Bing can index our data. Google has
started using schema.org markup for improved hinting about certain kinds of data.

- Expose the facets of our data as web pages or directories of web pages so that the facets can be browsed
outside XTF, and indexed by Google and Bing.

- Administration interface/functionality, including private/admin-facing internal discovery tools and data
modification (Tom, Brian, Rachael, Ray)
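For the first item, one common way to provide schema.org hints is a JSON-LD block embedded in each description's HTML page. The sketch below assumes a hypothetical mapping from EAC-CPF entity types to schema.org types. `Person`, `Organization`, `Thing`, `name`, `url`, `sameAs`, and `description` are real schema.org vocabulary, but the choice of which CPF fields to expose is an assumption.

```python
import json
from typing import List, Optional

# Hypothetical mapping from EAC-CPF entityType values to schema.org types.
SCHEMA_TYPES = {"person": "Person", "corporateBody": "Organization"}


def cpf_jsonld(name: str, entity_type: str, page_url: str,
               same_as: Optional[List[str]] = None,
               description: Optional[str] = None) -> str:
    """Build a schema.org JSON-LD document for one CPF description page."""
    data = {
        "@context": "https://schema.org",
        "@type": SCHEMA_TYPES.get(entity_type, "Thing"),  # e.g. "family" falls back to Thing
        "name": name,
        "url": page_url,
    }
    if same_as:
        data["sameAs"] = same_as  # links to the same entity elsewhere
    if description:
        data["description"] = description
    return json.dumps(data, indent=2)


def jsonld_script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD for embedding in the HTML page that crawlers fetch."""
    # Escape "</" so a value containing "</script>" cannot end the tag early;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"
```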
The last item above is available only to management and editorial admins, and is not required by any other
users. Not all admins should (or …
#### Ability to Open/Close the Site during Maintenance
If the web application has a "closed for maintenance" feature, this feature would be available to web admins,
even though it is the Linux sysadmins who will do the maintenance. A common major failure of web applications
is the assumption that the product is always up. This creates havoc when the site simply fails to load due to
an outage, planned or otherwise. With a little work we should be able to have an orderly "site is closed" web
page and status message for planned outages. For unplanned outages, we might be able to fail over to some kind
of system status message. This is a low-priority feature since downtime is probably only a few hours per year.
At the same time, if it isn't too difficult to implement, it sets our project apart from the majority who
either ignore the problem or let their help desk folks spend an hour apologizing to customers.

When the product is closed, web admins should be able to log in (assuming login is possible).

comment: Do we want an architecture where the login is essentially a separate product so that we can have a
"lobby" and other front end features that continue to work even when the backend is down for maintenance?

Most sites simply return a server error or a misleading status (such as 404 Not Found) when the site is down
for whatever reason. The appropriate status for a temporary outage is 503 Service Unavailable. We can avoid
this a couple of ways. The simplest is to use some Apache server features and a few simple scripts so that
users see a nice message, served with a 503 status, when the site is down for maintenance. This very simple
approach requires little or no change to our software architecture. The more elegant approach is to use one of
several system architectures that keep a small system front end always running.
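The Apache approach needs no application changes. If the application were instead served from a Python WSGI stack, the same idea could be a small middleware. This is a sketch under that assumption, and the project's actual stack may differ. The flag file path, `ALWAYS_OPEN`, and the function names are hypothetical. The middleware returns HTTP 503 with a `Retry-After` header, so clients and crawlers treat the outage as temporary. Login and admin paths are let through so web admins can still reach the application.

```python
import os
from typing import Callable, Iterable

# Hypothetical names: the flag file a sysadmin creates, and paths that stay open.
MAINTENANCE_FLAG = "/var/run/snac/maintenance"
ALWAYS_OPEN = ("/login", "/admin")

StartResponse = Callable[..., Callable[[bytes], object]]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]

MAINTENANCE_PAGE = (b"<html><body><h1>Closed for maintenance</h1>"
                    b"<p>Please try again later.</p></body></html>")


def flag_file_present() -> bool:
    return os.path.exists(MAINTENANCE_FLAG)


def maintenance_middleware(app: WSGIApp,
                           is_closed: Callable[[], bool] = flag_file_present,
                           retry_after: int = 3600) -> WSGIApp:
    """Serve a friendly 503 page while closed, except on login/admin paths."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if is_closed() and not path.startswith(ALWAYS_OPEN):
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(MAINTENANCE_PAGE))),
                ("Retry-After", str(retry_after)),  # seconds until clients should retry
            ])
            return [MAINTENANCE_PAGE]
        return app(environ, start_response)

    return wrapped
```

Requests on the open paths still go to the application, so admins can log in only if the backend itself is running. This matches the "(assuming login is possible)" caveat. A lobby that survives a full backend outage would need the separate front end raised in the comment.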
#### Sandbox for Training, perhaps as a clone of the QA system?
… Main Artifact: Research Agenda, schemas and specifications (esp. merge spec)
[a] Awkward. Unclear perhaps that "the same" means records referring to the same identity, and not "the same"
as the previous sentence.

[b] Could it be phrased as "...for matching name records, linking those descriptions to a single
authoritative CF identity."? I am not sure the adverb "Critically" has noteworthy value here. Or should it be
replaced with something like "Basically" | "Essentially" | "Effectively"?

[c] Is this the same as the pilot phase, or after the pilot?

[d] First-time readers may not be clear that the database contains singleton and merged records. Confusion
may arise because we always say, "the merged records are discoverable..." In fact, both unmerged and merged
records are discoverable.

[e] I find this sentence awkward. Should the sentence maybe end with something more like "...can accurately
determine are matching descriptions | descriptions for the same identity."?

[f] We seem to have a name consistency issue. Names here should match names on the SNAC web site, grant
materials, etc.

[g] This is a planned feature, thus the next sentence instead of this sentence.

[h] Work on alternative 1 to extract out functions common to all prototype architectures, and distill Alt 1
architecture.

[i] We need user id and group (role) in order to implement most of the UI features. Unless the CRM is tightly
integrated with the Prototype, there will be problems. Correspondence, contracts, etc. present an interesting
problem.

[j] It may be asking too much to find an off-the-shelf CRM that integrates both with our UI/UX and an
off-the-shelf issue tracker.

[k] Note this management role!

[l] Good point that we may need a programmer to handle tier 2 help desk issues, if not during the prototype,
then later.

[m] This section doesn't cover the manual splitting of parts of the record that go into the various splits.
For example, a bioghist might need to be split several ways, and for that we need some kind of WYSIWYG editor.

[n] This also needs a rewrite to align with the data architecture/queue, etc.

[o] Right? Review the rule of when ARKs are invalid.

[p] We sketched out an edit queue based approach in one of the DC meetings.

[q] If we don't lock, two people could have live edits, and one of them is not going to get the expected
result, unless I'm missing something. Certainly both edits will take place, but the final state could result
in the first edit being wiped out, just as can happen in RDBMS commits. The locking seems to me more a feature
of business logic than transaction logic.

[r] Why not do something like use OAuth and Google?

[s] I know ASpace uses Jasper reports with good success, but I'm not convinced the database will record
information on everything we want to report on.

[t] A corollary requirement is that the database contain all necessary data for any report we anticipate.

[u] Host in multiple availability zones.

[v] Several teams at CDL, including DSC, have several years' experience running production services in Amazon.

[w] I'm pretty sure this is a current feature.

[x] This is just for the backend? The front end should not need to go down.