Commit 607a63e6 by Robbie Hott

Updated and cleaned the README file

parent 29459436
# Introduction to Documentation
The currently-being-revised TAT requirements are found in the [tat_requirements](tat_requirements).
The currently-being-revised Technical Requirements are found in the [Requirements Directory](Requirements). Formal Specifications of those requirements, as available are in the [Specifications Directory](Specifications). This repository also includes [Help](Help) on using Gitlab, Git, and the Markdown text format; [Documentation](Third Party Documentation) on third-party software and applications being used; [Historical Documentation](Historical Documentation) on previous iterations of SNAC; and [Discussions](Discussion) and [Notes](Notes) from the technical team.
The best place to start is the big, overall [plan](tat_requirements/plan.md).
The best place to start is the big, overall [plan](plan.md) document, which describes the process forward with defining requirements and specifications.
This is Gitlab, a work-alike clone of the Github web site, but installed locally on a SNAC server. Gitlab is a
This repository is stored in Gitlab, a work-alike clone of the Github web site, but installed locally on a SNAC server. Gitlab is a
version control system with a suite of project management tools.
Ideally we will all create documentation in markdown format (.md files). You may create and edit files from
the web interface here on gitlab, or download files and edit locally. You can also upload any file type using
Ideally we will all create documentation in [markdown format](http://daringfireball.net/projects/markdown/) (.md files). You may create and edit files from
the web interface here on Gitlab, or download files and edit locally. You can also upload any file type using
standard git commands, or use a Git graphical client (see below). Choose a relevant directory for your docs,
or create a new directory as necessary.
Markdown files are simple text files, which makes them easy to edit and universally portable. Markdown has a
limited set of conventions to denote headers, lists, URLs and so on. When uploaded to gitlab or github,
markdown files are rendered into nicely styled HTML. Tools are available to convert markdown into .doc, .pdf,
LaTex and other formats.
LaTex and other formats. For more information on Markdown, see [this guide](Help/Markdown.md).
#### How to use gitlab and markdown
[Help using gitlab](Help-using-gitlab.md)
---
Notes to merge with other requirements
---
Note for TAT functional requirements: need to have UI widget for search of very long fields, such as the Joseph Henry cpfRelations
that contain some 22K entries. Also need to list all fields which migh have large numbers of values. In fact, part of the meta data for
every field is "number of possible entries/reapeat values" or whatever that's called.This wiki serves as the documentation of the SNAC technical team as relates to Postgres and storage of data. Currently, we are documenting:
* the schema and reasons behind the schema,
* methods for handling versioning of eac-cpf documents, and
* elastic search for postgres.
* Need a data constraint and business logic API for validating data in the UI. This layer checks user inputs against some set of rules and when there is an issue it informs the user of problems and suggests solutions. Data should be saved regardless. We could allow "bad" data to go into the database (assumes text fields) and flag for later cleaning. That's ugly. Probably better to save inputs in some agnostic repo which could be json, frozen data structures, or name-value pairs, or even portable source format. The problem is that most often data validation is hard coded into UI JavaScript or host code. Validation really should be configurable separate from the display of data, workflow automation, and data storage. While our database needs to do certain rudimentary sanity checks, the database can't be expected to send messages all the way back up to the UI layer. Nor should the database be burdened with validation rules which are certain to be mutable. Ideally, the validation rules would work in the same state machine framework as the workflow automation API, and might even share some code.
* Need a mechanism to lock records. Even mark-as-deleted records won't necessarily solve all our problems, so we might want records that are "live", but not editable for whatever reason. The ability to lock records and having the lock integrated across all the data-aware APIs is a good idea.
* On the topic of data, we will have user/group/other and r/w permissions, and all data-aware APIs should be designed with that in mind.
* "Literate programming" Using MCV and the state machine workflow automation probably meets (and exceeds) Knuth's ideas of Literate programming. Look at his idea and make sure we aren't missing any key concepts.
* QA and testing needs to be several layers, one of which is simple documentation. We should have code that examines data for various qualities, which when done in a comprehensive manner will test the data for all properties described in the requirements. As bugs are discovered and features added, this data testing code would expand. Code should be tested on several levels as well, and the tests plus comments in the tests constitute our full understanding of both data and code.
* Entities (names) have ID values, ARKs and various kinds of persistent ids. Some subsystem needs to know how to gather various ID values, and how to generate URIs and URLs from those id values. All discovered ids need to be attached to the cpf identity in a table related_id. We will need to track the authority that issued the id, so we need an authority table. Perhaps it is best (as previously discussed) to create a CPF record for each authority, and use the CPF persistent ID as the authority identifier in the related_id table.
```
create table related_id (
ri_id auto primary key,
id_value text,
uri text,
url text,
authority_id int -- fk to cpf.id?
);
```
* Allow config of CPF output formats via web interface. For example, in the CPF generator, we can offer some format and config options such as name formats in <part> and/or <relationEntry>
- include 4 digit fromDate-toDate for person
- include dates for corporateBody
- use "fl." for active dates
- use "active" for active dates
- use explicit "b." and "d."
- only use "b." or "d." for single 4 digit dates
- enclose date in parentheses
- add comma between name and dates (applies only if there is a date)
and so on. In theory that could be done for all kinds of CPF variations. We should have a single checkbox for "use most portable CPF formats" although I suspect the best data exchange format is not XML, but SQlite, SQL INSERT statements, or json.
* Does our schema track who edited a record? If not, we should add user_id to the version table.
* We should review the schema and create rules that human beings follow for primary key names, and foreign key names. Two options are table_id and table_id_fk. There may be a better option
* Option 1 is nice for people doing "join on", but I find "join on" syntax confusing, especially when combined with a where clause
* Option 2 seems more natural, especially when there is a where clause. Having different field names means that field names are more likely to be unique, thus not requiring explicit table.field syntax. Option 2a will be used most of the time. Option 2b shows a three table join where table.field syntax is required.
```
-- 1
select * from foo as a, bar as b where a.table_id=b.table_id;
-- 2a
select * from foo as a, bar as b where table_id=table_id_fk;
-- or 2b
select * from foo as a, bar as b, baz as c where a.table_id=b.table_id_fk and a.table_id=c.table_id_fk;
```
* Identity Reconciliation has a planned "why" feature to report on why a match matches. It would be nice to also report the negative: why didn't this match X? I'm not even sure that is possible, but something to think about.
#### Help Links
* [Git and Gitlab](Help/Git-and-Gitlab.md)
* [Markdown](Help/Markdown.md)
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment