Commit 1be90de9 by Robbie Hott

Started splitting requirements.md. Moved progess files to Unsorted/

parent 4d3208dc
# Staffing Model (Brian's draft suggestions)
Production of a cooperatively maintained high profile web site requires
different types of Technical and non-technical work.
Operations Team
- Communications and interactions with end users and content owners,
from marketing to user support, assessment
- Manages help desk
- Support production web application infrastructure, including
monitoring, "on call" for first tier response to system monitors
- batch ingest of new data sources
- signs up and on-boards new pilot members
- Proactive content QA and remediation
- work organized around issue queue / customer relationship management
system
Main Artifact: Ticketing Issue tracker that automatically generates a
ticket for an email to help@example.edu
Development Team
- Create new features that deliver customer value
- Maintain tests for new features
- second tier support of deployed features, developers on call for
their deployed code
- deploy code to test, stage, and production environments
- work organized around sprints
Main Artifact: User story backlog that supports scoring stories by
points,
Research Team
- Conduct experiments with new algorithms and technologies
- interoperation (and participation in the development) of relevant
domain specific standards and practices
Main Artifact: Research Agenda, schemas and specifications (esp. merge
spec)
...@@ -18,6 +18,11 @@ Data should be available to be downloaded in the following formats: ...@@ -18,6 +18,11 @@ Data should be available to be downloaded in the following formats:
## System Reports ## System Reports
While the web interface is the primary public face of SNAC, many other views of the data and meta data are
necessary, especially for admins and governance. Those "views" are reports and will primary be generated via
integration of a third-party reporting package such as Jaspersoft Business Intelligence Suite, which is free,
open source, and includes a full range of tools.
For each user of the system, the following reports should be available for download: For each user of the system, the following reports should be available for download:
* List of records the user has edited * List of records the user has edited
......
...@@ -9,10 +9,10 @@ on stuff where we don't agree. ...@@ -9,10 +9,10 @@ on stuff where we don't agree.
Requirements: Requirements:
- expose an http accessible API that is viable for `wget` or `curl`, browser `<form>`, and Ajax calls. - expose an http accessible API that is viable for `wget` or `curl`, browser `<form>`, and Ajax calls.
- Supported input format depends on the complexity of the requested operation. - Supported input format depends on the complexity of the requested operation.
- Public functions require no authentication. Everything else must include authentication data. - Public functions require no authentication. Everything else must include authentication data.
- Sandbox functionality to for training and testing, which doesn't modify actual SNAC data
### Web application output via template ### Web application output via template
...@@ -89,3 +89,26 @@ The watcher should have the ability to disable their watch. After each edit, all ...@@ -89,3 +89,26 @@ The watcher should have the ability to disable their watch. After each edit, all
watchers will get a notification. The watch does not apply to any single field, but to the entire description, and therefore also to future descriptions which result from merging. watchers will get a notification. The watch does not apply to any single field, but to the entire description, and therefore also to future descriptions which result from merging.
When an identity constellation is split, the watch propagates to both resulting records. The user will be informed of the change, and then may choose to disable one of the watchers. When an identity constellation is split, the watch propagates to both resulting records. The user will be informed of the change, and then may choose to disable one of the watchers.
### Ability to Open/Close the Site during Maintenance
If the web application has a "closed for maintenance" feature, this feature would be available to web admins,
even though it is the Linux sysadmins who will do the maintenance. A common major failure of web applications
is the assumption that the product is always up. This creates havoc when the site simply fails to load due to
an outage, planned or otherwise. With a little work we should be able to have an orderly "site is closed" web
page and status message for planned outages. We might be able to failover to some kind of system status
message. This is a low priority feature since downtime is probably only a few hours per year. At the same
time, if it isn't too difficult to implement, it sets our project apart from the majority who either ignore
the problem, or let their help desk folks spend an hour apologizing to customers.
When the product is closed, web admins should be able to login (assuming login is possible).
comment: Do we want an architecture where the login is essentially a separate product so that we can have a
"lobby" and other front end features that continue to work even when the backend is down for maintenance?
Most sites simply return a server error or site not available (404) when the site is down for whatever
reason. We can avoid this a couple of ways. The simplest is to use some Apache server features and a few
simple scripts so that users see a nice message when the site is down for maintenance. This very simple
approach requires little or no change to our software architecture. The more elegant approach is to use one of
several system architectures that  keep a small system front end always running.
# User Management
Authentication is validating user logins to the system. Authorization is the related aspect of controlling
which parts of the system users may access (or even which parts they may know exist).
We can use OpenID for authentication, but we will need a user profile for SNAC roles and authorization. There
are examples of PHP code to implement OpenID at stackexchange:
http://stackoverflow.com/questions/4459509/how-to-use-open-id-as-login-system
OpenID seems to constantly be changing, and sites using change frequently. Google has (apparently) deprecated
OpenID 2.0 in favor of Open Connect. Facebook is using something else, but apparently FB still works with
OpenID. Stackexchange supports several authentication schemes. If they can do it, so can we. Or we can support
one scheme for starters and add others as necessary. The SE code is not open source, so we can't see how much
work it was to support the various OpenID partners.
Authorization involves controlling what users can do once they are in the system. That function is sort of
more solved by OAuth or OpenID by sharing the user profile. However, SNAC has specific requirements,
especially our roles, and those will not be found in other system. There is not anything we must have from
user profiles. We might want their social networking profile, but social networking is not a core function of
SNAC.
By default users can't do anything that isn't exposed to the non-authenticated public users. Privileges are
added and users are given roles (aka groups) from which they inherit privileges. The authorization system is
involved in every transaction with the server to the extent that every request to the server is checked for
authorization before being passed to the code doing the real work.
The Linux model of three privilege types "user", "group", and "other" works well for authorization permissions
and we should use this model. "User" is an authenticated user. "Group" is a set of users, and a user may
belong to several groups. In SNAC and the non-Linux world "group" is known as "role", so SNAC will call them
"roles". "Other" privileges apply to SNAC as public, non-authenticated users, although we don't really have
"other", and the "researcher" role applies to public users.
Users can have several roles, and will have all the privileges of all their roles. Role membership is managed
by an administrative UI (part of the dashboard) and related API code. User information such as name, phone
number, and even password can also change. User ID values cannot be changed, and a user ID is never reused,
even after account deletion.
We expect to create additional roles as necessary for application functions.
Roles include a large number "is instution member" roles. These should be roles like any other, but we may
want to flag these role records to make them easy to manage and easy to display in the UI. Any user can have
zero or more roles that define their instutional affiliation. This primarily effects reporting and admin. In
the case of reports, membership in an institution constrains the reporting. When setting up a report, users
may only choose from institutions of which they are members. Some reports may auto-detect the user's
membership.
By and large when we refer to "accounts" we mean web accounts managed by the Manager/Web admin. The general
public can use the discovery interface without an account, but saving search history, and other
session related discovery tools requires an account. It is technically possible to have a single session
dashboard. Although that has not been mentioned as a requirement and is probably a low priority, it might be
almost trivial to implement.
Every account will be in the "Researcher" role which has the same privileges as the general public, but with a
TBD set of basic privileges including: search history, certain researcher reports.
| User type | Role | Description |
|----------------------------+---------------------+-----------------------------------------------------------------------|
| Sysadmin | Server admin | Maintain server, backups, etc. |
| Database Administrator | DBA | Schema maintenance, data dumps, etc. |
| Software engineer | Developer | Coding, testing, QA, release management, data loading, etc. |
| Manager | Web admin | Web accounts: create, manage, assign roles, run reports |
| Peer vetting | Vetting | Approve moderators, reviewers, content experts |
| Moderator | Moderator | Approve maintenance changes, posting those changes |
| Reviewer/editor | Maintenance | Maintainer privileges, interacts with moderators |
| Content expert | Maintenance | Domain expert, may have zero institutional roles |
| Documentary editor | Maintenance | Distinguished by? |
| Maintenance | Maintenance | Distinguished by? |
| Researcher | Researcher | Use the discovery interface and history dashboard |
| Archival description donor | Block upload | Bulk uploads of CPF or finding aids |
| Name authority manager | Name authority | Donates name authority data perhaps via bulk upload |
| Institutional admins | Institutional admin | Instutional role admin dashboard, institutional reports |
| Public | Researcher | No account, researcher role, no dashboard or single session dashboard |
Remember: institutional affiliation roles aren't in the table above. There will be many of those roles, and
users may have zero, one, or several institutional roles that define which insitutions that user is a member
of.
It is possible for an institutional admin to be a member of more than one institution. Institutional Admins
have abilities:
- view membership lists of their institution(s)
- add or remove their instutional role for users.
Roles which require one or more instutitutional roles (affiliation):
- Block upload
- Name authority
- Institutional admin
Roles which may have zero or more institutional roles:
- Web admin
- Vetting
- Moderator
- Maintenance (likely to have one or more)
- Researcher
There are several dashboard sections:
- Standard researcher history
- Standard user account management (password, email, etc.)
- Web admin account creation, deletion, role assignments
- Vetting admin (if we have vetting)
- Available reports.
# SNAC Server Architecture
The system will be architected as a LAMP system, with the following components:
* Linux: CentOS 7
* Apache: Apache 2 web server
* PHP: PHP 7
* PostgreSQL: Postgres
Each component of the architecture will run on this platform. Any sub-component must either produce it's own http server on an available port, such as Elastic Search, or utilize the main Apache web server running a virtual host.
The following diagrams describe the architecture of internal components:
* [Overall Server Architecture](http://gitlab.iath.virginia.edu/snac/Documentation/raw/master/Specifications/SNAC%20Server%20Architecture.pdf)
...@@ -21,27 +21,20 @@ This is the definitive list of all requirements, briefly. Anything the applicati ...@@ -21,27 +21,20 @@ This is the definitive list of all requirements, briefly. Anything the applicati
list. Each item and group of items is explained in detail later in the document. Being a "list", this includes list. Each item and group of items is explained in detail later in the document. Being a "list", this includes
only sufficient detail to disambiguate items. only sufficient detail to disambiguate items.
**Todo: We should reorganize these by component of the system, then create a formal specifications for each component as to what it must do and how they interact.** * User Management
- authentication
- authentication
- user creation - user creation
- authentication - authentication
- account maintenance - account maintenance
- authorization
- authorization
- user/group/other (ugo) with read/write (rw) privilege system, typical for Linux - user/group/other (ugo) with read/write (rw) privilege system, typical for Linux
- admin tool for group privs - admin tool for group privs
- create/edit/delete groups - create/edit/delete groups
- create privs matched to API functionality - create privs matched to API functionality
* User Interface
- search/discover cpf data; need a list of filters/facets - search/discover cpf data; need a list of filters/facets
- dashboard - dashboard
- dashboard content/tabs
- dashboard content/tabs
- search history - search history
- clear history - clear history
- search history - search history
...@@ -49,28 +42,23 @@ only sufficient detail to disambiguate items. ...@@ -49,28 +42,23 @@ only sufficient detail to disambiguate items.
- change sort order (maybe version 2) - change sort order (maybe version 2)
- system/work flow messages - system/work flow messages
- account settings - account settings
- social media - social media
- web admin (for admins only) - web admin (for admins only)
- institutional admin (for institutional admins only) - institutional admin (for institutional admins only)
- moderator admin (for moderators only) - moderator admin (for moderators only)
- edit cpf data
- available reports
- links to the rest of the web site, especially search and edit
- edit cpf data
- edit UI - edit UI
- per field data validation - per field data validation
- record validation - record validation
- user message system in UI - user message system in UI
- work flow - work flow
- dashboard for workspace, task list - dashboard for workspace, task list
* Generated Documents
- available reports
- EAC-CPF export
- RDF/Turtle triples
- JSON-LD
- split merged records, know that some record consists of merged records - split merged records, know that some record consists of merged records
...@@ -95,12 +83,6 @@ only sufficient detail to disambiguate items. ...@@ -95,12 +83,6 @@ only sufficient detail to disambiguate items.
- architecture of identity, expandable design - architecture of identity, expandable design
- authority and controlled vocabulary management - authority and controlled vocabulary management
- web UI/web application
- Linux, Apache httpd, PostgreSQL (aka Postgres)
- HTML, CSS, JavaScript
- work flow
- work flows - work flows
- may want wild-west non-locking edits - may want wild-west non-locking edits
...@@ -500,138 +482,6 @@ also needs to support bulk data edits of several types. ...@@ -500,138 +482,6 @@ also needs to support bulk data edits of several types.
Does this mean the admin dashboard? Does this mean the admin dashboard?
#### User Management
Authentication is validating user logins to the system. Authorization is the related aspect of controlling
which parts of the system users may access (or even which parts they may know exist).
We can use OpenID for authentication, but we will need a user profile for SNAC roles and authorization. There
are examples of PHP code to implement OpenID at stackexchange:
http://stackoverflow.com/questions/4459509/how-to-use-open-id-as-login-system
OpenID seems to constantly be changing, and sites using change frequently. Google has (apparently) deprecated
OpenID 2.0 in favor of Open Connect. Facebook is using something else, but apparently FB still works with
OpenID. Stackexchange supports several authentication schemes. If they can do it, so can we. Or we can support
one scheme for starters and add others as necessary. The SE code is not open source, so we can't see how much
work it was to support the various OpenID partners.
Authorization involves controlling what users can do once they are in the system. That function is sort of
more solved by OAuth or OpenID by sharing the user profile. However, SNAC has specific requirements,
especially our roles, and those will not be found in other system. There is not anything we must have from
user profiles. We might want their social networking profile, but social networking is not a core function of
SNAC.
By default users can't do anything that isn't exposed to the non-authenticated public users. Privileges are
added and users are given roles (aka groups) from which they inherit privileges. The authorization system is
involved in every transaction with the server to the extent that every request to the server is checked for
authorization before being passed to the code doing the real work.
The Linux model of three privilege types "user", "group", and "other" works well for authorization permissions
and we should use this model. "User" is an authenticated user. "Group" is a set of users, and a user may
belong to several groups. In SNAC and the non-Linux world "group" is known as "role", so SNAC will call them
"roles". "Other" privileges apply to SNAC as public, non-authenticated users, although we don't really have
"other", and the "researcher" role applies to public users.
Users can have several roles, and will have all the privileges of all their roles. Role membership is managed
by an administrative UI (part of the dashboard) and related API code. User information such as name, phone
number, and even password can also change. User ID values cannot be changed, and a user ID is never reused,
even after account deletion.
We expect to create additional roles as necessary for application functions.
Roles include a large number "is instution member" roles. These should be roles like any other, but we may
want to flag these role records to make them easy to manage and easy to display in the UI. Any user can have
zero or more roles that define their instutional affiliation. This primarily effects reporting and admin. In
the case of reports, membership in an institution constrains the reporting. When setting up a report, users
may only choose from institutions of which they are members. Some reports may auto-detect the user's
membership.
By and large when we refer to "accounts" we mean web accounts managed by the Manager/Web admin. The general
public can use the discovery interface without an account, but saving search history, and other
session related discovery tools requires an account. It is technically possible to have a single session
dashboard. Although that has not been mentioned as a requirement and is probably a low priority, it might be
almost trivial to implement.
Every account will be in the "Researcher" role which has the same privileges as the general public, but with a
TBD set of basic privileges including: search history, certain researcher reports.
| User type | Role | Description |
|----------------------------+---------------------+-----------------------------------------------------------------------|
| Sysadmin | Server admin | Maintain server, backups, etc. |
| Database Administrator | DBA | Schema maintenance, data dumps, etc. |
| Software engineer | Developer | Coding, testing, QA, release management, data loading, etc. |
| Manager | Web admin | Web accounts: create, manage, assign roles, run reports |
| Peer vetting | Vetting | Approve moderators, reviewers, content experts |
| Moderator | Moderator | Approve maintenance changes, posting those changes |
| Reviewer/editor | Maintenance | Maintainer privileges, interacts with moderators |
| Content expert | Maintenance | Domain expert, may have zero institutional roles |
| Documentary editor | Maintenance | Distinguished by? |
| Maintenance | Maintenance | Distinguished by? |
| Researcher | Researcher | Use the discovery interface and history dashboard |
| Archival description donor | Block upload | Bulk uploads of CPF or finding aids |
| Name authority manager | Name authority | Donates name authority data perhaps via bulk upload |
| Institutional admins | Institutional admin | Instutional role admin dashboard, institutional reports |
| Public | Researcher | No account, researcher role, no dashboard or single session dashboard |
Remember: institutional affiliation roles aren't in the table above. There will be many of those roles, and
users may have zero, one, or several institutional roles that define which insitutions that user is a member
of.
It is possible for an institutional admin to be a member of more than one institution. Institutional Admins
have abilities:
- view membership lists of their institution(s)
- add or remove their instutional role for users.
Roles which require one or more instutitutional roles (affiliation):
- Block upload
- Name authority
- Institutional admin
Roles which may have zero or more institutional roles:
- Web admin
- Vetting
- Moderator
- Maintenance (likely to have one or more)
- Researcher
There are several dashboard sections:
- Standard researcher history
- Standard user account management (password, email, etc.)
- Web admin account creation, deletion, role assignments
- Vetting admin (if we have vetting)
- Available reports.
#### Web Application Administration
System administration will be required for the web application and the
server hosting the web site. This is well understood from a technical
point of view. We should have more than usual documentation of the
command line accounts involved, and server configuration. This aspect of
administration integrates with versioning, backup, and software
releases.
#### Reports
While the web interface is the primary public face of SNAC, many other views of the data and meta data are
necessary, especially for admins and governance. Those "views" are reports and will primary be generated via
integration of a third-party reporting package such as Jaspersoft Business Intelligence Suite, which is free,
open source, and includes a full range of tools. All SNAC data resides in PostgreSQL, the standard SQL
relational database management system (RDBMS) which simplifies the process of adding reporting and business
intelligence.
(How much detail do we want about reports? Maybe just half a dozen
examples?)
#### System Administration #### System Administration
...@@ -693,79 +543,9 @@ logic that we will support with UI, code, and database tables/fields. ...@@ -693,79 +543,9 @@ logic that we will support with UI, code, and database tables/fields.
Many reports will be limited certain roles. Admin users will likely be Many reports will be limited certain roles. Admin users will likely be
heavy report users. heavy report users.
#### Ability to Open/Close the Site during Maintenance
If the web application has a "closed for maintenance" feature, this feature would be available to web admins,
even though it is the Linux sysadmins who will do the maintenance. A common major failure of web applications
is the assumption that the product is always up. This creates havoc when the site simply fails to load due to
an outage, planned or otherwise. With a little work we should be able to have an orderly "site is closed" web
page and status message for planned outages. We might be able to failover to some kind of system status
message. This is a low priority feature since downtime is probably only a few hours per year. At the same
time, if it isn't too difficult to implement, it sets our project apart from the majority who either ignore
the problem, or let their help desk folks spend an hour apologizing to customers.
When the product is closed, web admins should be able to login (assuming login is possible).
comment: Do we want an architecture where the login is essentially a separate product so that we can have a
"lobby" and other front end features that continue to work even when the backend is down for maintenance?
Most sites simply return a server error or site not available (404) when the site is down for whatever
reason. We can avoid this a couple of ways. The simplest is to use some Apache server features and a few
simple scripts so that users see a nice message when the site is down for maintenance. This very simple
approach requires little or no change to our software architecture. The more elegant approach is to use one of
several system architectures that  keep a small system front end always running.
#### Sandbox for Training, perhaps as a clone of the QA system?
#### ArchiveSpace Feature Planning via Brad #### ArchiveSpace Feature Planning via Brad
This section will require some discussion (conference calls) with Brad This section will require some discussion (conference calls) with Brad
and others. and others.
#### Staffing Model (Brian's draft suggestions)
Production of a cooperatively maintained high profile web site requires
different types of Technical and non-technical work.
Operations Team
- Communications and interactions with end users and content owners,
from marketing to user support, assessment
- Manages help desk
- Support production web application infrastructure, including
monitoring, "on call" for first tier response to system monitors
- batch ingest of new data sources
- signs up and on-boards new pilot members
- Proactive content QA and remediation
- work organized around issue queue / customer relationship management
system
Main Artifact: Ticketing Issue tracker that automatically generates a
ticket for an email to help@example.edu
Development Team
- Create new features that deliver customer value
- Maintain tests for new features
- second tier support of deployed features, developers on call for
their deployed code
- deploy code to test, stage, and production environments
- work organized around sprints
Main Artifact: User story backlog that supports scoring stories by
points,
Research Team
- Conduct experiments with new algorithms and technologies
- interoperation (and participation in the development) of relevant
domain specific standards and practices
Main Artifact: Research Agenda, schemas and specifications (esp. merge
spec)
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment