Started splitting requirements.md. Moved progress files to Unsorted/

Commit 1be90de9 authored Sep 24, 2015 by Robbie Hott
8 changed files
--- a/Discussion/Staffing Model.md
+++ b/Discussion/Staffing Model.md
+# Staffing Model (Brian's draft suggestions)
+Production of a cooperatively maintained, high-profile web site requires
+different types of technical and non-technical work.
+Operations Team
+- Communications and interactions with end users and content owners,
+    from marketing to user support, assessment
+- Manages help desk
+- Support production web application infrastructure, including
+    monitoring, "on call" for first-tier response to system monitors
+- Batch ingest of new data sources
+- Signs up and on-boards new pilot members
+- Proactive content QA and remediation
+- Work organized around issue queue / customer relationship management
+    system
+Main Artifact: Issue-ticketing tracker that automatically generates a
+ticket for each email sent to help@example.edu
+Development Team
+- Create new features that deliver customer value
+- Maintain tests for new features
+- Second-tier support of deployed features, developers on call for
+    their deployed code
+- Deploy code to test, stage, and production environments
+- Work organized around sprints
+Main Artifact: User story backlog that supports scoring stories by
+points
+Research Team
+- Conduct experiments with new algorithms and technologies
+- Interoperation with (and participation in the development of) relevant
+    domain-specific standards and practices
+Main Artifact: Research Agenda, schemas and specifications (esp. merge
+spec)
--- a/Requirements/Generated Documents.md
+++ b/Requirements/Generated Documents.md
@@ -18,6 +18,11 @@ Data should be available to be downloaded in the following formats:
 ## System Reports
+While the web interface is the primary public face of SNAC, many other views of the data and metadata are
+necessary, especially for admins and governance. Those "views" are reports and will primarily be generated via
+integration of a third-party reporting package such as Jaspersoft Business Intelligence Suite, which is free,
+open source, and includes a full range of tools.
 For each user of the system, the following reports should be available for download:
 * List of records the user has edited

--- a/Requirements/Identity Reconciliation.md
+++ b/Requirements/Identity Reconciliation.md
+# Identity Reconciliation
--- a/Requirements/User Interface.md
+++ b/Requirements/User Interface.md
@@ -9,10 +9,10 @@ on stuff where we don't agree.
 Requirements:
 - expose an http accessible API that is viable for `wget` or `curl`, browser `<form>`, and Ajax calls.
 - Supported input format depends on the complexity of the requested operation.
 - Public functions require no authentication. Everything else must include authentication data.
+- Sandbox functionality for training and testing, which doesn't modify actual SNAC data
 ### Web application output via template
@@ -89,3 +89,26 @@ The watcher should have the ability to disable their watch. After each edit, all
 watchers will get a notification. The watch does not apply to any single field, but to the entire description, and therefore also to future descriptions which result from merging.
 When an identity constellation is split, the watch propagates to both resulting records.  The user will be informed of the change, and then may choose to disable one of the watchers.
+### Ability to Open/Close the Site during Maintenance
+If the web application has a "closed for maintenance" feature, this feature would be available to web admins,
+even though it is the Linux sysadmins who will do the maintenance. A common major failure of web applications
+is the assumption that the product is always up.  This creates havoc when the site simply fails to load due to
+an outage, planned or otherwise. With a little work we should be able to have an orderly "site is closed" web
+page and status message for planned outages. We might be able to fail over to some kind of system status
+message. This is a low priority feature since downtime is probably only a few hours per year.  At the same
+time, if it isn't too difficult to implement, it sets our project apart from the majority who either ignore
+the problem, or let their help desk folks spend an hour apologizing to customers.
+When the product is closed, web admins should still be able to log in (assuming login is possible).
+comment: Do we want an architecture where the login is essentially a separate product so that we can have a
+"lobby" and other front end features that continue to work even when the backend is down for maintenance?
+Most sites simply return a server error or a "not found" (404) page when the site is down for whatever
+reason, rather than the more appropriate "503 Service Unavailable". We can avoid this in a couple of ways.
+The simplest is to use some Apache server features and a few simple scripts so that users see a clear
+message when the site is down for maintenance. This very simple approach requires little or no change to our
+software architecture. The more elegant approach is to use one of several system architectures that keep a
+small system front end always running.
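The "simple scripts" approach above can be sketched as a flag-file check in front of the application. This is a minimal sketch only. It uses Python WSGI purely for illustration, although SNAC's stack is Apache and PHP. The flag path, the `is_admin` hook, and the one-hour `Retry-After` value are all hypothetical. While the flag file exists, ordinary visitors get an orderly 503 page and web admins are still let through.

```python
import os

# Hypothetical flag file: a sysadmin creates it before planned maintenance
# and removes it afterwards. The path is an assumption, not a SNAC convention.
MAINTENANCE_FLAG = "/var/www/snac/MAINTENANCE"

MAINTENANCE_PAGE = (
    b"<html><body><h1>SNAC is closed for maintenance</h1>"
    b"<p>Please try again later.</p></body></html>"
)


def maintenance_middleware(app, flag_path=MAINTENANCE_FLAG,
                           is_admin=lambda environ: False, retry_after=3600):
    """Wrap a WSGI app so that, while the flag file exists, non-admin
    requests get an orderly 503 page instead of a broken site."""
    def wrapped(environ, start_response):
        if os.path.exists(flag_path) and not is_admin(environ):
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/html; charset=utf-8"),
                # Tells clients and crawlers when to retry (in seconds).
                ("Retry-After", str(retry_after)),
                ("Content-Length", str(len(MAINTENANCE_PAGE))),
            ])
            return [MAINTENANCE_PAGE]
        # Site is open (or the user is a web admin): pass the request through.
        return app(environ, start_response)
    return wrapped
```

The same check could instead live in Apache configuration or in a PHP front controller. Returning 503 rather than 404 tells browsers, monitors, and search engines that the outage is temporary.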
--- a/Requirements/User Management.md
+++ b/Requirements/User Management.md
+# User Management
+Authentication is validating user logins to the system. Authorization is the related aspect of controlling
+which parts of the system users may access (or even which parts they may know exist).
+We can use OpenID for authentication, but we will need a user profile for SNAC roles and authorization. There
+are examples of PHP code implementing OpenID on Stack Overflow:
+http://stackoverflow.com/questions/4459509/how-to-use-open-id-as-login-system
+OpenID seems to be constantly changing, and the sites using it change frequently. Google has (apparently)
+deprecated OpenID 2.0 in favor of OpenID Connect. Facebook is using something else, but apparently FB still
+works with OpenID. Stack Exchange supports several authentication schemes. If they can do it, so can we. Or we can support
+one scheme for starters and add others as necessary. The SE code is not open source, so we can't see how much
+work it was to support the various OpenID partners.
+Authorization involves controlling what users can do once they are in the system. OAuth and OpenID only
+partly address this, by sharing the user profile. However, SNAC has specific requirements, especially our
+roles, and those will not be found in other systems. There is nothing we must have from external user
+profiles. We might want their social networking profile, but social networking is not a core function of
+SNAC.
+By default users can't do anything that isn't exposed to the non-authenticated public users. Privileges are
+added and users are given roles (aka groups) from which they inherit privileges. The authorization system is
+involved in every transaction with the server to the extent that every request to the server is checked for
+authorization before being passed to the code doing the real work.
+The Linux model of three privilege types "user", "group", and "other" works well for authorization permissions
+and we should use this model.  "User" is an authenticated user. "Group" is a set of users, and a user may
+belong to several groups. In SNAC and the non-Linux world "group" is known as "role", so SNAC will call them
+"roles". "Other" privileges apply to SNAC as public, non-authenticated users, although we don't really have
+"other", and the "researcher" role applies to public users.
+Users can have several roles, and will have all the privileges of all their roles. Role membership is managed
+by an administrative UI (part of the dashboard) and related API code. User information such as name, phone
+number, and even password can also change. User ID values cannot be changed, and a user ID is never reused,
+even after account deletion.
+We expect to create additional roles as necessary for application functions.
+Roles include a large number of "is institution member" roles. These should be roles like any other, but we may
+want to flag these role records to make them easy to manage and easy to display in the UI. Any user can have
+zero or more roles that define their institutional affiliation. This primarily affects reporting and admin. In
+the case of reports, membership in an institution constrains the reporting. When setting up a report, users
+may only choose from institutions of which they are members. Some reports may auto-detect the user's
+membership.
+By and large when we refer to "accounts" we mean web accounts managed by the Manager/Web admin. The general
+public can use the discovery interface without an account, but saving search history and other
+session-related discovery tools require an account. It is technically possible to have a single session
+dashboard. Although that has not been mentioned as a requirement and is probably a low priority, it might be
+almost trivial to implement.
+Every account will be in the "Researcher" role, which has the same privileges as the general public plus a
+TBD set of basic privileges, including search history and certain researcher reports.
+| User type                  | Role                | Description                                                           |
+|----------------------------|---------------------|-----------------------------------------------------------------------|
+| Sysadmin                   | Server admin        | Maintain server, backups, etc.                                        |
+| Database Administrator     | DBA                 | Schema maintenance, data dumps, etc.                                  |
+| Software engineer          | Developer           | Coding, testing, QA, release management, data loading, etc.           |
+| Manager                    | Web admin           | Web accounts: create, manage, assign roles, run reports               |
+| Peer vetting               | Vetting             | Approve moderators, reviewers, content experts                        |
+| Moderator                  | Moderator           | Approve maintenance changes, posting those changes                    |
+| Reviewer/editor            | Maintenance         | Maintainer privileges, interacts with moderators                      |
+| Content expert             | Maintenance         | Domain expert, may have zero institutional roles                      |
+| Documentary editor         | Maintenance         | Distinguished by?                                                     |
+| Maintenance                | Maintenance         | Distinguished by?                                                     |
+| Researcher                 | Researcher          | Use the discovery interface and history dashboard                     |
+| Archival description donor | Block upload        | Bulk uploads of CPF or finding aids                                   |
+| Name authority manager     | Name authority      | Donates name authority data perhaps via bulk upload                   |
+| Institutional admins       | Institutional admin | Institutional role admin dashboard, institutional reports             |
+| Public                     | Researcher          | No account, researcher role, no dashboard or single session dashboard |
+Remember: institutional affiliation roles aren't in the table above. There will be many of those roles, and
+users may have zero, one, or several institutional roles that define which institutions that user is a member
+of.
+It is possible for an institutional admin to be a member of more than one institution. Institutional Admins
+have abilities:
+- view membership lists of their institution(s)
+- add or remove their institutional role for users.
+Roles which require one or more institutional roles (affiliation):
+- Block upload
+- Name authority
+- Institutional admin
+Roles which may have zero or more institutional roles:
+- Web admin
+- Vetting
+- Moderator
+- Maintenance (likely to have one or more)
+- Researcher
+There are several dashboard sections:
+- Standard researcher history
+- Standard user account management (password, email, etc.)
+- Web admin account creation, deletion, role assignments
+- Vetting admin (if we have vetting)
+- Available reports
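The authorization rules above can be sketched as a default-deny privilege check. Under this model a user holds the union of the privileges of all their roles, and every request is checked before the real work runs. This is a minimal sketch only. It is in Python rather than SNAC's PHP, and it shows only a few role names from the table. The privilege names (`edit_cpf`, `approve_edit`, and so on) and the helper functions are hypothetical. It also does not model the separate read/write bits of the Linux-style scheme.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical privilege names; the real set is TBD and would be matched
# to API functions. Only a few roles from the table are shown.
PUBLIC_PRIVILEGES = frozenset({"search", "view"})

ROLE_PRIVILEGES = {
    "Researcher": PUBLIC_PRIVILEGES | {"search_history", "researcher_reports"},
    "Maintenance": frozenset({"edit_cpf"}),
    "Moderator": frozenset({"approve_edit", "publish_edit"}),
    "Web admin": frozenset({"create_account", "assign_role", "run_report"}),
}


@dataclass(frozen=True)
class User:
    user_id: int                           # never changed, never reused
    roles: frozenset = frozenset()
    institutions: frozenset = frozenset()  # institutional affiliation roles


def privileges_for(user: Optional[User]) -> frozenset:
    """Default deny: anonymous users get only public privileges; accounts
    get the union of the privileges of every role they hold."""
    if user is None:
        return PUBLIC_PRIVILEGES
    privs = set(ROLE_PRIVILEGES["Researcher"])  # every account is a Researcher
    for role in user.roles:
        privs |= ROLE_PRIVILEGES.get(role, frozenset())  # unknown role adds nothing
    return frozenset(privs)


def dispatch(user: Optional[User], privilege: str, handler):
    """Check authorization on every request before running the handler."""
    if privilege not in privileges_for(user):
        raise PermissionError(f"privilege {privilege!r} not granted")
    return handler()


def report_institutions(user: User, requested) -> frozenset:
    """A report may only cover institutions the user is a member of."""
    return frozenset(requested) & user.institutions
```

Keeping the check in a single `dispatch` step means that a new API function cannot accidentally bypass authorization. It also means that adding a role only requires a new entry in the role-to-privilege mapping.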
--- a/Specifications/Server Architecture.md
+++ b/Specifications/Server Architecture.md
+# SNAC Server Architecture
+The system will be architected as a LAMP system, with the following components:
+* Linux: CentOS 7
+* Apache: Apache 2 web server
+* PHP: PHP 7
+* PostgreSQL: Postgres
+Each component of the architecture will run on this platform. Any sub-component must either run its own HTTP server on an available port, such as Elasticsearch, or use the main Apache web server running a virtual host.
+The following diagrams describe the architecture of internal components:
+* [Overall Server Architecture](http://gitlab.iath.virginia.edu/snac/Documentation/raw/master/Specifications/SNAC%20Server%20Architecture.pdf)
--- a/Requirements/SQL-Schema-Tech-Requirements.md
+++ b/Requirements/SQL-Schema-Tech-Requirements.md
--- a/Requirements/requirements.md
+++ b/Requirements/requirements.md
@@ -21,27 +21,20 @@ This is the definitive list of all requirements, briefly. Anything the applicati
 list. Each item and group of items is explained in detail later in the document. Being a "list", this includes
 only sufficient detail to disambiguate items.
-**Todo: We should reorganize these by component of the system, then create a formal specifications for each component as to what it must do and how they interact.**
+* User Management
+    - authentication
- authentication
        - user creation
        - authentication
        - account maintenance
+    - authorization
- authorization
        - user/group/other (ugo) with read/write (rw) privilege system, typical for Linux
        - admin tool for group privs
        - create/edit/delete groups
        - create privs matched to API functionality
+* User Interface
- search/discover cpf data; need a list of filters/facets
+    - search/discover cpf data; need a list of filters/facets
        - dashboard
+    - dashboard content/tabs
- dashboard content/tabs
        - search history
            - clear history
            - search history
@@ -49,28 +42,23 @@ only sufficient detail to disambiguate items.
            - change sort order (maybe version 2)
    - system/work flow messages
    - account settings
    - social media
    - web admin (for admins only)
    - institutional admin (for institutional admins only)
    - moderator admin (for moderators only)
+    - edit cpf data
-  - available reports
-  - links to the rest of the web site, especially search and edit
- edit cpf data
        - edit UI
        - per field data validation
        - record validation
        - user message system in UI
        - work flow
        - dashboard for workspace, task list
+* Generated Documents
+    - available reports
+    - EAC-CPF export
+    - RDF/Turtle triples
+    - JSON-LD
 - split merged records, know that some record consists of merged records
@@ -95,12 +83,6 @@ only sufficient detail to disambiguate items.
  - architecture of identity, expandable design
  - authority and controlled vocabulary management
- web UI/web application
-  - Linux, Apache httpd, PostgreSQL (aka Postgres)
-  - HTML, CSS, JavaScript
-  - work flow
 - work flows
  - may want wild-west non-locking edits
@@ -500,138 +482,6 @@ also needs to support bulk data edits of several types.
 Does this mean the admin dashboard?
-#### User Management
-Authentication is validating user logins to the system. Authorization is the related aspect of controlling
-which parts of the system users may access (or even which parts they may know exist).
-We can use OpenID for authentication, but we will need a user profile for SNAC roles and authorization. There
-are examples of PHP code to implement OpenID at stackexchange:
-http://stackoverflow.com/questions/4459509/how-to-use-open-id-as-login-system
-OpenID seems to constantly be changing, and sites using change frequently. Google has (apparently) deprecated
-OpenID 2.0 in favor of Open Connect. Facebook is using something else, but apparently FB still works with
-OpenID. Stackexchange supports several authentication schemes. If they can do it, so can we. Or we can support
-one scheme for starters and add others as necessary. The SE code is not open source, so we can't see how much
-work it was to support the various OpenID partners.
-Authorization involves controlling what users can do once they are in the system. That function is sort of
-more solved by OAuth or OpenID by sharing the user profile. However, SNAC has specific requirements,
-especially our roles, and those will not be found in other system. There is not anything we must have from
-user profiles. We might want their social networking profile, but social networking is not a core function of
-SNAC. 
-By default users can't do anything that isn't exposed to the non-authenticated public users. Privileges are
-added and users are given roles (aka groups) from which they inherit privileges. The authorization system is
-involved in every transaction with the server to the extent that every request to the server is checked for
-authorization before being passed to the code doing the real work.
-The Linux model of three privilege types "user", "group", and "other" works well for authorization permissions
-and we should use this model.  "User" is an authenticated user. "Group" is a set of users, and a user may
-belong to several groups. In SNAC and the non-Linux world "group" is known as "role", so SNAC will call them
-"roles". "Other" privileges apply to SNAC as public, non-authenticated users, although we don't really have
-"other", and the "researcher" role applies to public users.
-Users can have several roles, and will have all the privileges of all their roles. Role membership is managed
-by an administrative UI (part of the dashboard) and related API code. User information such as name, phone
-number, and even password can also change. User ID values cannot be changed, and a user ID is never reused,
-even after account deletion.
-We expect to create additional roles as necessary for application functions.
-Roles include a large number "is instution member" roles. These should be roles like any other, but we may
-want to flag these role records to make them easy to manage and easy to display in the UI. Any user can have
-zero or more roles that define their instutional affiliation. This primarily effects reporting and admin. In
-the case of reports, membership in an institution constrains the reporting. When setting up a report, users
-may only choose from institutions of which they are members. Some reports may auto-detect the user's
-membership.
-By and large when we refer to "accounts" we mean web accounts managed by the Manager/Web admin. The general
-public can use the discovery interface without an account, but saving search history, and other
-session related discovery tools requires an account. It is technically possible to have a single session
-dashboard. Although that has not been mentioned as a requirement and is probably a low priority, it might be
-almost trivial to implement.
-Every account will be in the "Researcher" role which has the same privileges as the general public, but with a
-TBD set of basic privileges including: search history, certain researcher reports. 
-| User type                  | Role                | Description                                                           |
-|----------------------------+---------------------+-----------------------------------------------------------------------|
-| Sysadmin                   | Server admin        | Maintain server, backups, etc.                                        |
-| Database Administrator     | DBA                 | Schema maintenance, data dumps, etc.                                  |
-| Software engineer          | Developer           | Coding, testing, QA, release management, data loading, etc.           |
-| Manager                    | Web admin           | Web accounts: create, manage, assign roles, run reports               |
-| Peer vetting               | Vetting             | Approve moderators, reviewers, content experts                        |
-| Moderator                  | Moderator           | Approve maintenance changes, posting those changes                    |
-| Reviewer/editor            | Maintenance         | Maintainer privileges, interacts with moderators                      |
-| Content expert             | Maintenance         | Domain expert, may have zero institutional roles                      |
-| Documentary editor         | Maintenance         | Distinguished by?                                                     |
-| Maintenance                | Maintenance         | Distinguished by?                                                     |
-| Researcher                 | Researcher          | Use the discovery interface and history dashboard                     |
-| Archival description donor | Block upload        | Bulk uploads of CPF or finding aids                                   |
-| Name authority manager     | Name authority      | Donates name authority data perhaps via bulk upload                   |
-| Institutional admins       | Institutional admin | Instutional role admin dashboard, institutional reports               |
-| Public                     | Researcher          | No account, researcher role, no dashboard or single session dashboard |
-Remember: institutional affiliation roles aren't in the table above. There will be many of those roles, and
-users may have zero, one, or several institutional roles that define which insitutions that user is a member
-of.
-It is possible for an institutional admin to be a member of more than one institution. Institutional Admins
-have abilities:
- view membership lists of their institution(s)
- add or remove their instutional role for users. 
-Roles which require one or more instutitutional roles (affiliation):
- Block upload
- Name authority
- Institutional admin
-Roles which may have zero or more institutional roles:
- Web admin
- Vetting
- Moderator
- Maintenance (likely to have one or more)
- Researcher
-There are several dashboard sections:
- Standard researcher history
- Standard user account management (password, email, etc.)
- Web admin account creation, deletion, role assignments
- Vetting admin (if we have vetting) 
- Available reports. 
-#### Web Application Administration
-System administration will be required for the web application and the
-server hosting the web site. This is well understood from a technical
-point of view. We should have more than usual documentation of the
-command line accounts involved, and server configuration. This aspect of
-administration integrates with versioning, backup, and software
-releases.
-#### Reports
-While the web interface is the primary public face of SNAC, many other views of the data and meta data are
-necessary, especially for admins and governance. Those "views" are reports and will primary be generated via
-integration of a third-party reporting package such as Jaspersoft Business Intelligence Suite, which is free,
-open source, and includes a full range of tools.  All SNAC data resides in PostgreSQL, the standard SQL
-relational database management system (RDBMS) which simplifies the process of adding reporting and business
-intelligence.
-(How much detail do we want about reports? Maybe just half a dozen
-examples?)
 #### System Administration
@@ -693,79 +543,9 @@ logic that we will support with UI, code, and database tables/fields.
 Many reports will be limited to certain roles. Admin users will likely be
 heavy report users.
-#### Ability to Open/Close the Site during Maintenance
-If the web application has a "closed for maintenance" feature, this feature would be available to web admins,
-even though it is the Linux sysadmins who will do the maintenance. A common major failure of web applications
-is the assumption that the product is always up.  This creates havoc when the site simply fails to load due to
-an outage, planned or otherwise. With a little work we should be able to have an orderly "site is closed" web
-page and status message for planned outages. We might be able to failover to some kind of system status
-message. This is a low priority feature since downtime is probably only a few hours per year.  At the same
-time, if it isn't too difficult to implement, it sets our project apart from the majority who either ignore
-the problem, or let their help desk folks spend an hour apologizing to customers.
-When the product is closed, web admins should be able to login (assuming login is possible). 
-comment: Do we want an architecture where the login is essentially a separate product so that we can have a
-"lobby" and other front end features that continue to work even when the backend is down for maintenance?
-Most sites simply return a server error or site not available (404) when the site is down for whatever
-reason. We can avoid this a couple of ways. The simplest is to use some Apache server features and a few
-simple scripts so that users see a nice message when the site is down for maintenance. This very simple
-approach requires little or no change to our software architecture. The more elegant approach is to use one of
-several system architectures that  keep a small system front end always running.
-#### Sandbox for Training, perhaps as a clone of the QA system?
 #### ArchiveSpace Feature Planning via Brad
 This section will require some discussion (conference calls) with Brad
 and others.
-#### Staffing Model (Brian's draft suggestions)
-Production of a cooperatively maintained high profile web site requires
-different types of Technical and non-technical work.
-Operations Team
- Communications and interactions with end users and content owners,
-    from marketing to user support, assessment
- Manages help desk
-   Support production web application infrastructure, including
-    monitoring, "on call" for first tier response to system monitors
- batch ingest of new data sources
-   signs up and on-boards new pilot members
- Proactive content QA and remediation
-   work organized around issue queue / customer relationship management
-    system
-Main Artifact: Ticketing Issue tracker that automatically generates a
-ticket for an email to help@example.edu
-Development Team
- Create new features that deliver customer value
-   Maintain tests for new features
- second tier support of deployed features, developers on call for
-    their deployed code
- deploy code to test, stage, and production environments
-   work organized around sprints
-Main Artifact: User story backlog that supports scoring stories by
-points,
-Research Team
- Conduct experiments with new algorithms and technologies
- interoperation (and participation in the development) of relevant
-    domain specific standards and practices
-Main Artifact: Research Agenda, schemas and specifications (esp. merge
-spec)