# CPF SQL schema

## Version implementation

1. each table contains version info.
    1. never update
    2. always insert and insert into version_history
    3. ideally, we don't need to remember to update existing "current" record (trigger?)
2. each table contains `cpf_id` foreign key
3. Current record is where `version=max(version)` or `current<>0`
    1. use trigger as necessary to avoid lots of update on insert
    2. use a view as necessary
    3. maybe simply use where clause and brute force the first implementation
4. table version_history has all version info for all users for all tables
   1. version_history has its own sequence because it is a special table

```PLpgSQL
create table version_history (
    id int primary key default nextval('version_history_seq'),
    user int, -- fk to user.id
    date timestamp, -- now()
)

create table foo_internal (
       id int primary key default nextval('unique_id_seq'),,
       cpf_id int, -- fk to cpf.id
       data text,
       version int -- fk to version_history.id, sequence is unique foreign key 
       current int -- or bool, depending on implementation this field is optional
       );

-- Optional view to simplify queries by not requiring every where clause to use version=max(version)
create view foo as select * from foo_internal where current;

create view foo as select * from foo_internal where valid = true;

-- a view foo_current that is the most current record
create view foo_current as select * from foo_internal where version=max(version);

select * from foo_current where id=1234;

```

Having renamed all the tables to table_internal and created one or more views, any use of the table_internal name tends to make the code un-portable. We need to make sure the views perform as well as the longer, explicit queries.

```PLpgSQL
-- Yikes, using foo_internal means this query breaks if the internal tables have a schema or name
-- change.
select * from foo_internal where id=1234 and version=max(version);

-- Normally, we would use the foo view, not table foo_internal
select * cpf_current as cpf,foo_current as foo where cpf.id=foo.cpf_id;

```

*Caveat* The actual implementation needed to change the setup of `foo_internal`, since in some cases multiple entries could be for the same `cpf_id` (ie name entries, date entries, sources, documents, etc).  So, there are two options: 
    * `(id, version)` primary key in each table, where we get the latest version for each id and join on `cpf_id` for the latest entries for that cpf, or
    * `(cpf_id, version)` foreign key, where a join must also match the current version of the cpf record to get the latest entries for that cpf (note independence on this table's `id` field.
In commit 533fc082fb3c68f7c2bf8edbbace2571c8f963bc, I've chosen the first version of this scheme.

## Splitting of merged records

1. split may result in N new records
2. each field (especially biogHist) might be split, but only via select/cut/copy/paste. Editing of content is not supported.
   1. Edit during split is a later (if ever) feature.
3. In a 2 way split, where originals are A and B, 
   1. we need to show the user the original record A (original fields) and the most current version of each field
   2. ditto original B and current version of each field
   3. each field one checkbox: keep
4. Some web UI is created for splitting, including field subsplit. The UI details by prototyping.
   1. splitting a single field involves select, choose, and a resulting highlight. 
   2. We allow multi-select, multi-choose, and accumulate select/choose sections.
5. Nth phase shows original N plus the current, probably with gray-highlight for previous records. All fields/text are selectable since content is often in several records. 
6. In the database, we create a new cpf record for each split record, and invalidate the merged record.
7. what happens in table merge_history when a merge is split?


## Schema changes to support merge and split

1. add `valid` field to table cpf
   1. change valid to 0/false when the cpf record is merged
2. add table merge_history, all result in multiple records being created
   1. Merge: from=old (singleton) and to=new (merged)
   2. Split from merge: from=old (merged) and to=new (singleton)
   3. Split (original singleton): from=old and to=new (multiple singletons)
3. When splitting merge, need an additional record to link new split with its parent (pre-merge) record. Only 1 generation back.
   1. Does this lineage record need a special field in table split_merge_history? (No.)

```PLpgSQL
create table split_merge_history (
    from_id int, -- fk cpf.id
    to_id int,  -- fk cpf.id
    date timestamp,
);
```