moving files from wiki

Commit da334479 authored Aug 04, 2015 by Tom Laudeman

Showing with 146 additions and 4 deletions

README.md +3 -2

tat_requirements/plan.md +75 -2

tat_requirements/requirements.md +68 -0

--- a/README.md
+++ b/README.md
 # Documentation

-Quite a bit of documentation is found on the Wiki. (See the "Wiki" menu on the left nav bar.)
+Documentation is best created as files (preferably in markdown format) in the repository, in a relevant directory.
+
+The currently-being-revised TAT requirements are found in the [tat_requirements](tat_requirements) directory.

-The currently-being-revised TAT requirements are found here in the "tat_requirements".

 Note for TAT functional requirements: need to have UI widget for search of very long fields, such as the Joseph Henry cpfRelations
 that contain some 22K entries. Also need to list all fields which might have large numbers of values. In fact, part of the meta data for

--- a/tat_requirements/plan.md
+++ b/tat_requirements/plan.md
@@ -2,16 +2,24 @@
 Big questions
 ---

- how is gitlab backed up?
+- (solved) how is gitlab backed up?

+  - Shayne backs up the whole gitlab VM.
+
+- We need a complete description of the controlled vocabulary hierarchy. This ties in with, and is similar to
+  (perhaps computationally transformable into), a tagging system. See [Tag system](#controlled-vocabularies-and-tag-system).
+  

 Overview and order of work
 ---


 1. create tech documents, filling in as much prose as possible
+   - currently ongoing

 1. create prototype software to test tech requirements, iterate updating requirements and prototype
+   - Work flow engine is working and has both a command-line and web interface
+   - We have a SQL database schema

 1. create tests for test driven development, and validate prototype

@@ -31,20 +39,79 @@ Code we write

  - exists, needs tests, needs requirements
  
- SQL schema (Robbie)
+  - needs to be integrated into an index.php script that also checks authentication
+  
+  - can the workflow also support the login.php authentication? (Yes).
+  
+- SQL schema (Robbie, Tom)

  - exists, needs tests, needs requirements
+  
+  - should we re-architect so that tables become normal tables, the views go away, and versioned records are moved to shadow tables?
+  
+  - add features for delete-via-mark (as opposed to actual delete)
+  
+  - add features to support embargo
+  
+  - *maybe, discuss* change vocabulary.sql from `INSERT` statements to `COPY`. It would be smaller and faster, although in
+    reality, as soon as the data is in the database, the text file will never be touched again.
+    
+- Discuss: can/should we create a tag system to deal with ad-hoc requirements later in the project? See [Tag system](#controlled-vocabularies-and-tag-system).

 - CPF to SQL parser (Robbie)

  - exists, needs tests, needs requirements
  
+- Name serialization tool, selectable pre-configured formats
+
 - Name string parser

+    - Can we find a grammar-based parser for PHP? Should we use a standalone parser?
+
 - Date parser

+  - Can this use the same parser engine as the name string parser?
+
+- CPF record edit, edit each field
+
+- CPF record split, split data into separate cpf identities, deprecate old ARK, mint new ARKs
+
+- CPF record merge, combine fields, deprecate old ARKs, mint new ARK
+
 - coding style, class template (architect Robbie)

+- We need a UI edit/chooser widget for searching and selecting among large numbers of options, such as the Joseph
+  Henry cpfRelations that contain some 22K entries. We also need to list all fields which might have large numbers
+  of values. In fact, part of the meta data for every field is "number of possible entries/repeat values" or
+  whatever that's called. From a software architecture perspective, the answer is 0, 1, or infinite.
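
The delete-via-mark item in the SQL schema list above could look roughly like the following. This is only a sketch, written in Python with an in-memory SQLite database so it runs standalone; the project itself targets PHP and Postgres. The table name `name_entry`, the column `is_deleted`, and the sample rows are assumptions made for illustration, not part of the existing schema.

```python
import sqlite3

# Hypothetical table and column names; the real schema is not shown in this plan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name_entry ("
    " id INTEGER PRIMARY KEY,"
    " original TEXT NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0)"
)
# Illustrative sample rows only.
conn.executemany(
    "INSERT INTO name_entry (original) VALUES (?)",
    [("Henry, Joseph",), ("Henry, J.",)],
)


def mark_deleted(conn, entry_id):
    """Flag a row as deleted instead of physically removing it."""
    conn.execute("UPDATE name_entry SET is_deleted = 1 WHERE id = ?", (entry_id,))


def live_entries(conn):
    """Return only the rows that have not been marked deleted."""
    rows = conn.execute(
        "SELECT original FROM name_entry WHERE is_deleted = 0 ORDER BY id"
    )
    return [row[0] for row in rows]


mark_deleted(conn, 2)
```

After `mark_deleted(conn, 2)` the row is still in the table, but `live_entries(conn)` no longer returns it, which is the behavior that distinguishes delete-via-mark from an actual delete.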
+
+
+Controlled vocabularies and tag system 
+---
+
+The tags are very similar to vocabulary entries in that the tag values are controlled. The difference is
+weaker moderation of tags and more readiness to create new tags (types). The tag table would consist of tag and
+value columns, and is essentially a name-value system. If we create tags, should we try to enforce some data typing,
+that is: string, int, date, float, etc.?
+
+
+It seems pretty clear that some controlled vocabularies need to have n sub-vocabularies. And some subclasses
+can probably appear with several super-classes. 'Periodicals' appears in around 400 subjects. It also appears
+that the order of sub and super is not well defined. In some cases 'Periodicals' is the final subject, and in
+other cases the first subject. Curiously, topical subject bears a strong resemblance to the "tags" suggested
+above, as in these entries:
+  
+```
+INSERT INTO vocabulary (type, value) values ( 'subject', 'American literature--19th century--Periodicals' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'American literature--20th century--Periodicals' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'Periodicals' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'Periodicals--19th century' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'World politics--Periodicals' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'World politics--Pictorial works' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'World politics--Societies, etc.' );
+INSERT INTO vocabulary (type, value) values ( 'subject', 'World politics--Study and teaching' );
+```
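
One way to examine the ordering problem is to split each subject on its `--` delimiter and record where a given term falls. The sketch below does that in Python for the eight sample values listed above; the function names `split_subject` and `term_positions` are made up for illustration, and it assumes `--` is the only component separator.

```python
from collections import Counter

# Sample values copied from the vocabulary INSERT statements above.
subjects = [
    "American literature--19th century--Periodicals",
    "American literature--20th century--Periodicals",
    "Periodicals",
    "Periodicals--19th century",
    "World politics--Periodicals",
    "World politics--Pictorial works",
    "World politics--Societies, etc.",
    "World politics--Study and teaching",
]


def split_subject(value):
    """Split a subject heading into its components on the '--' delimiter."""
    return value.split("--")


def term_positions(values, term):
    """Count whether `term` is the only, first, last, or a middle component."""
    counts = Counter()
    for value in values:
        parts = split_subject(value)
        if term not in parts:
            continue
        if len(parts) == 1:
            counts["only"] += 1
        elif parts[0] == term:
            counts["first"] += 1
        elif parts[-1] == term:
            counts["last"] += 1
        else:
            counts["middle"] += 1
    return dict(counts)


positions = term_positions(subjects, "Periodicals")
```

Running the same count over the full vocabulary would show how often a term such as 'Periodicals' acts as a super-class versus a sub-class, which may help decide whether a strict hierarchy or a flatter tag model fits the data better.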
+

 Code we use off the shelf
 ---
@@ -56,7 +123,13 @@ Code we use off the shelf
 - test framework, need to choose one

 - authentication
+  - session management, especially as applies to authentication tokens, cookies and something which prevents
+    XSS (cross-site scripting) attacks
+    
+- JavaScript UI component tools, jQuery; what others?

 - reports, probably Jasper

 - PHP, Postgres, Linux, Apache httpd, etc.
+
+- language modules
--- a/tat_requirements/requirements.md
+++ b/tat_requirements/requirements.md
+Requirements from Rachael's spreadsheet
+---
+
+- Programmers contribute some time to help with the technology side of the gap analysis of institutional capability
+
+- We need a concrete plan for persistent IDs.
+
+  - We need to manage base HREF stubs that are combined with persistent IDs to form working URLs. Ideally, all
+    the URLs could be composed via a format string (printf), so we could just store the ID, HREF stub, and
+    format string and be done with it. However, some URLs have interesting issues that require code and thus
+    exceed the abilities of normal format strings. We can certainly roll out an early version with format
+    strings, and add some clever functions later as necessary.
+
+- Do we need any additional requirements for related name linking?
+
+- Clarify: the co-op version 1 is not going to support bulk data ingest
+
+- Clarify: the co-op version 1 is not going to support bi-directional data exchange and update
+
+- Do we need full delete? For example, a CPF contains something illegal and must be fully deleted. How do we
+  delete from backups? Is either of these even required by policy?
+  
+- Are we assuming that data from the web browser has been sanity checked before hitting the server? Does the
+  server need to cache edit data prior to writing the data to the cpf database? For example, what if someone
+  enters "19th century" in a date field? It isn't valid, but we need to save their work.
+  
+- We need to sanity check any links we create, especially links back into SNAC.
+
+- Don't forget the X-to-CPF field mapping documentation, and this ties in to the "CPF data contributor's"
+  guide (below)
+
+- We need the "CPF data contributor's" guide.
+
+- What authority work will we be doing?
+
+- What authority data from other sources do we cache locally?
+
+- Create detailed functional requirements for controlled vocabularies, and a detailed implementation
+  specification.
+  
+- Clarify: versioning is per-record, not per-field. 
+
+- Need a watch/notification API. It needs a canonical name. Is there an off-the-shelf event monitor that will
+  easily integrate with the web REST API and work flow manager?
+  
+- Clarify: Are we integrating SNAC and ArchivesSpace in co-op version 1? Will ArchivesSpace have to use our REST API?
+
+- How is embargo implemented at the database level? What are the requirements for embargo?
+
+- Clarify / verify: Technical review vs content review is handled by a combination of roles and work flow.
+
+- Reports: Where are we keeping the Big List of All Reports? 
+
+- Clarify: row 43, (unclear) Consider implementing a linked data standard for relationship links
+  instead of having to download an entire document of links, as it is configured now.
+  
+- Search: need the Big List of Search Facets, and someone needs to verify that Elasticsearch can do facets.
+
+- Does co-op version 1 have a timeline visualization? Does it have a "sort by timeline"? What does it mean to
+  sort by timeline?
+  
+- Clarify: What is a context widget? - row 52, Continue to develop and refine context widget. (technical
+  requirements unclear)
+  
+- Clarify: we need requirements for citations, and details about where they integrate with the rest of the
+  system.
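
The persistent ID item above proposes storing an ID, an HREF stub, and a printf-style format string, then composing the URL from them. A minimal Python sketch of that early, format-string-only version follows. The field names, the stub, and the ID value are placeholders invented for illustration, not real SNAC identifiers.

```python
# Hypothetical stored record: persistent ID, base HREF stub, and a
# printf-style format string. All values are placeholders.
link = {
    "stub": "http://example.org/ark:/",
    "persistent_id": "99999/abc123",
    "format": "%(stub)s%(persistent_id)s",
}


def compose_url(record):
    """Build a working URL from the stored stub, ID, and format string."""
    return record["format"] % {
        "stub": record["stub"],
        "persistent_id": record["persistent_id"],
    }


url = compose_url(link)
```

URLs that need more than substitution would not fit this model; those are the cases where the item above anticipates adding custom functions later.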
+
+
 List of requirements
 ---