Reworked and added comments, questions, and suggestions to the plan.md document. Added comments to the requirements.md document.

Commit 8e234548 authored Sep 01, 2015 by Robbie Hott
Showing with 90 additions and 31 deletions

plan.md tat_requirements/plan.md +84 -31

requirements.md tat_requirements/requirements.md +6 -0

--- a/tat_requirements/plan.md
+++ b/tat_requirements/plan.md
@@ -52,10 +52,47 @@

 1. create version 1 of software

+**RH: I'd rework this a little, but would like feedback:**

-#### Code we write
+1. List requirements of the overall application. (done)

- Data validation API
+2. Organize requirements by component, clean up, flesh out, and vet.
+
+3. Create formal specifications for each component based on the official, clean, requirements document. Prototypes may help ensure the formal spec is written correctly.
+
+4. Define a timeline for development and prototyping based on the formal specifications document.
+
+5. Create tests for test-driven development based on the formal specification.  This includes creating and mining ground-truth data.
+
+6. Develop software based on formal specification that passes the given tests.
+
+#### Coding style requirements (architect Robbie)
+
+- Code quality
+  - 4-space tabs with literal spaces
+  - line-lengths of 100 characters or less
+  - Upper-cased class names with camel casing
+  - lower-cased field and variable names with camel casing
+  - no underscores in variable names (we may revisit the choice between camel case and underscores, but should pick only one for sanity)
+- Intra-code documentation
+  - All files must have javadoc-style documentation with author attribution, definition of the file, and short-text of the code license
+  - All classes and definitions must include javadoc-style documentation
+
+#### Non-component notes to be worked into requirements 
+- CPF record edit, edit each field
+
+- CPF record split, split data into separate cpf identities, deprecate old ARK, mint new ARKs
+
+- CPF record merge, combine fields, deprecate old ARKs, mint new ARK
+
+### System Design
+
+#### Developed Components
+
+- Data validation engine
+  - **API:** Custom JSON (needs formal spec)
+
+  - The data validation engine applies a written system of rules to incoming data.  The rules must be written in a human-readable form, so that non-technical individuals can write and understand them.  A rule-writing guide must be supplied with hints and help for writing rules properly.  The engine will be pluggable and written as an MVC application.  The model will read the user-written rules (stored in a Postgres database or flat-file system, depending on the model) and apply them to any input given on the view.  The initial view will be a JSON API, which accepts individual fields and returns the input with either validation suggestions or valid flags.

  - rule based system abstracted out of the code
  - rules are data
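
The "rules are data" idea behind the validation engine can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the rule format, the field names, the `validate_field` function, and the response shape are hypothetical, since the plan notes that the JSON API still needs a formal spec. Python is used only to keep the sketch short; the plan itself lists PHP.

```python
import json
import re

# Hypothetical rules, kept as data rather than code. In the plan these would be
# read from a Postgres table or a flat file; a list of dicts stands in for that.
RULES = [
    {
        "field": "date",
        "pattern": r"^\d{4}(-\d{2}(-\d{2})?)?$",
        "suggestion": "Use YYYY, YYYY-MM, or YYYY-MM-DD.",
    },
    {
        "field": "name",
        "pattern": r"\S",
        "suggestion": "Name must not be empty.",
    },
]


def validate_field(field, value, rules=RULES):
    """Apply every rule for `field`; return the input with a valid flag or suggestions."""
    suggestions = [
        rule["suggestion"]
        for rule in rules
        if rule["field"] == field and not re.search(rule["pattern"], value)
    ]
    return {
        "field": field,
        "input": value,
        "valid": not suggestions,
        "suggestions": suggestions,
    }


if __name__ == "__main__":
    # The invalid input is echoed back rather than discarded.
    print(json.dumps(validate_field("date", "19th century"), indent=2))
```

Because the rules are plain records, a non-technical editor could in principle add or change a rule without touching the engine, which is the property the plan asks for.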
@@ -69,20 +106,25 @@
    policy documentation.

 - Identity Reconciliation (aka IR) (architect Robbie)
+  - **API:** Custom JSON (needs formal spec) 

 - needs docs wrangled

 - workflow manager (architect Tom)
+  - **API:** Custom JSON? (needs formal spec)

  - exists, needs tests, needs requirements
+    * **We need to stop now, write requirements, then apply those requirements to the existing system to ensure we meet the requirements**
  
  - needs to be integrated into an index.php script that also checks authentication
  
  - can the workflow also support the login.php authentication? (Yes).
  
- SQL schema (Robbie, Tom)
+- PostgreSQL Storage: schema definition (Robbie, Tom)
+  - **API:** SQL

  - exists, needs tests, needs requirements
+    * **We need to stop now, write requirements, then apply those requirements moving forward to ensure we meet the requirements**
  
  - should we re-architect so that tables become normal tables, the views go away, and versioned records are moved to shadow tables?
  
@@ -96,31 +138,60 @@
 - discuss; Can/should we create a tag system to deal with ad-hoc requirements later in the project? [Tag system](#controlled-vocabularies-and-tag-system)

 - CPF to SQL parser (Robbie)
+  - **API:** EAC-CPF XML input, JSON output? (needs formal spec)

  - exists, needs tests, needs requirements
+    * **We need to stop now, write requirements, then apply those requirements moving forward to ensure we meet the requirements**
  
 - Name serialization tool, selectable pre-configured formats

- Name string parser
+- Name string parser **We need to distinguish between name parser and name-identity-string parser**
+  - **API:** subroutine? JSON?

    - Can we find a grammar-based parser for PHP? Should we use a standalone parser?

+    - Can we expose this as a JSON API such that it's given a name-string and returns an identity object with that identity's information?  Possibly also a score indicating how well we think we parsed it?
+
 - Date parser
+  - **API:** subroutine? JSON?

  - Can this use the same parser engine as the name string parser?
+  - **This should be distinct, or may be a subroutine of the name-identity-string parser**
+  - Can we expose this as a JSON API such that given a set of dates, it returns a date object that lists out the individual dates?  Then, it could be called from the name-string parser to parse out dates.
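
A minimal sketch of the date-parser idea raised above, assuming a hypothetical `parse_dates` function and a hypothetical response shape (neither is specified in the plan). It only extracts four-digit years, which is far less than a real date parser would need to handle; it is meant to show the "string in, date object out" contract that a name-identity-string parser could call.

```python
import json
import re


def parse_dates(text):
    """Return a date object listing each four-digit year found in `text`.

    Hypothetical response shape: the original string plus a list of the
    individual dates, in the order they appear.
    """
    years = re.findall(r"\b\d{4}\b", text)
    return {"original": text, "dates": years}


if __name__ == "__main__":
    print(json.dumps(parse_dates("1797-1878")))
```

Input with no recognizable year, such as "19th century", yields an empty `dates` list, so a caller can tell that nothing was parsed while still keeping the original string.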

- CPF record edit, edit each field
+- Editing User interface
+  - **API:** HTML front-end, makes calls to internal JSON API
+  - Must have ajax-backed interaction for displaying and searching large lists, such as cpfRelations for an identity.
+    - We need to have a UI edit/chooser widget for search and select of large numbers of options, such as the Joseph
+      Henry cpfRelations that contain some 22K entries. Also need to list all fields which might have large numbers
+      of values. In fact, part of the metadata for every field is "number of possible entries/repeat values" or
+      whatever that's called. From a software architecture perspective, the answer is 0, 1, infinite.

- CPF record split, split data into separate cpf identities, deprecate old ARK, mint new ARKs
+- History Research Tool (redefined)
+  - **API:** HTML front-end, makes calls to internal JSON API
+  - Needs to be reworked to support the Postgres backend

- CPF record merge, combine fields, deprecate old ARKs, mint new ARK
+#### Off-the-shelf Components

- coding style, class template (architect Robbie)
+- gitlab for developer documentation, code version management, internal issue tracking and milestone keeping
+
+- github public code repository
+
+- test framework, need to choose one
+
+- authentication
+  - session management, especially as applies to authentication tokens, cookies and something which prevents
+    XSS (cross-site scripting) attacks
+    
+- JavaScript UI component tools, jQuery; what others?
+  - Suggestions: Bootstrap, AngularJS, jQuery UI
+
+- reports, probably Jasper
+
+- PHP, Postgres, Linux, Apache httpd, etc.
+
+- language modules ?

- We need to have UI edit/chooser widget for search and select of large numbers of options, such as the Joseph
-  Henry cpfRelations that contain some 22K entries. Also need to list all fields which might have large numbers
-  of values. In fact, part of the meta data for every field is "number of possible entries/reapeat values" or
-  whatever that's called. From a software architecture perspective, the answer is 0, 1, infinite.


 #### Controlled vocabularies and tag system 
@@ -146,23 +217,5 @@ World politics--Societies, etc.
 World politics--Study and teaching
 ```

+**RH: I agree, this is super tricky.  We need to distinguish between types of controlled vocab, so that we don't mix Occupation and Subject.  A tagging system might be very nice for at least subjects.**

-#### Code we use off the shelf
-
- gitlab for docs, code version management, issue tracking(?)
-
- github public code repository?
-
- test framework, need to choose one
-
- authentication
-  - session management, especially as applies to authentication tokens, cookies and something which prevents
-    XSS (cross-site scripting) attacks
-    
- JavaScript UI component tools, JQuery; what others?
-
- reports, probably Jasper
-
- PHP, Postgres, Linux, Apache httpd, etc.
-
- language modules
--- a/tat_requirements/requirements.md
+++ b/tat_requirements/requirements.md
@@ -21,6 +21,8 @@ This is the definitive list of all requirements, briefly. Anything the applicati
 list. Each item and group of items is explained in detail later in the document. Being a "list", this includes
 only sufficient detail to disambiguate items.

+**RH: We should reorganize these by component of the system, then create a formal specification for each component describing what it must do and how the components interact.**
+
 - authentication

  - user creation
@@ -129,6 +131,7 @@ only sufficient detail to disambiguate items.
    format string and be done with it. However, some URLs have interesting issues that require code and thus
    exceed the abilities of normal format strings. We can certainly roll out an early version with format
    strings, and add some clever functions later as necessary.
+    * **RH: I don't understand this.  If we're generating persistent IDs, then we don't need to write "clever functions"**

 - Do we need any additional requirements for related name linking, or more accurately identity linking? Each
  identity has an ARK which is a persistent ID with an associated URL. Use cases for identity links:
@@ -144,6 +147,7 @@ only sufficient detail to disambiguate items.
  2. External resources link to SNAC as an authority. (Tom comment: is SNAC also an archival resource?)

 - Clarify: the co-op version 1 is not going to support bulk data ingest
+    * **Why not?**

 - Clarify: the co-op version 1 is not going to support bi-directional data exchange and update

@@ -152,9 +156,11 @@ only sufficient detail to disambiguate items.
  
 - Are we assuming that data from the web browser has been sanity checked before hitting the server? (Yes, by
  the data validation API) 
+    * **What is the data validation API?  Wouldn't we need a validation engine?  Where will the engine live, in the browser?**
  
 - Does the server need to save temporary edit data prior to writing the data to the cpf database? For example, what if
  someone enters "19th century" in a date field? It isn't valid, but we need to save their work. (Yes, we need to save invalid user input, and give the user a useful message for each type of data validation failure.)
+    * **Where does the data validation engine live?  If we make that part of the UI, it may live on the browser as a downloaded JS engine**
  
 - We need to sanity check any links we create, especially links back into SNAC.