Commit ac5756a1 by Tom Laudeman

Merge remote-tracking branch 'origin/master' into tom

parents b05d6875 728ec78d
SNAC Cooperative Meeting
ArchivesSpace Integration Discussion Notes
2015-12-02
Discussion leaders: Brad Westbrook and Mike Rush
What can it do for you? What functionality would you like to see?
- “Start reconciliation” button
- Compare ASpace w/ SNAC, create linkages
- Scope to resource or larger set
- Add prompt to create SNAC entry
- Making sure matching happens on the right data
- Accommodating legacy data
- Creation of content
- Identities not in SNAC
- Upload/add to SNAC
- Identities in SNAC
- Download from SNAC
- Local extension/customization
- Controls for process of amending SNAC
- Remember ASpace may be a local source of authority
- Record refresh function
- Local version not intended for SNAC inclusion
- Refresh different sections of EAC record on different schedules or
w/ different permissions
- Working off of central source or off of local cache? Fundamental
issue to resolve
- Spawn EAC bioghist into exported EAD (as an option)
- SNAC discovery interface
- Search additional sources beyond SNAC (part of the SNAC research
interface)
- Feed repository holding info back into SNAC
- Discovery of unknown identities
- Use case: download an entity and all related entities at the start
of a processing project
- Notifications of refresh changes
- Maintain timestamps for record downloads and refreshes
BENEFITS
- Coupling increases sustainability of both ASpace and SNAC
- Efficiencies
- Packages workflows
- Shares the work
- Makes relationships easier to manage and share
- Spark uptake of SNAC
- Bigger entity set
- Bigger relation set
- More use of collections
- One less interface for staff users
Scope of Grant
- Central vs. Local copy use cases
- EAD3 export (to include relations)
- Contribution function
- EAC-CPF data model compliance
......@@ -9,6 +9,7 @@ This repository contains all the documentation for the SNAC Web Application and
* [Historical Documentation](Historical Documentation) on previous iterations of SNAC
* Technical [Discussions](Discussion) related to the SNAC project
* [Notes](Notes) from the technical team.
* Database schema diagrams auto-generated by SchemaSpy http://shannonvm.village.virginia.edu/~twl8n/schema_spy_output/
The best place to start is the big, overall [plan](plan.md) document, which describes the process forward with defining requirements and specifications.
......@@ -25,6 +26,7 @@ limited set of conventions to denote headers, lists, URLs and so on. When upload
markdown files are rendered into nicely styled HTML. Tools are available to convert markdown into .doc, .pdf,
LaTeX and other formats. For more information on Markdown, see [this guide](Help/Markdown.md).
#### Help Links
* [Git and Gitlab](Help/Git-and-Gitlab.md)
......
# EAC-CPF XML Output Requirements
When generating EAC-CPF out of the database, the system should be able to provide the following pieces of information and follow these guidelines.
## Control/Source Data
Daniel has given the following requirement for sources: "What scholars and archivists want is to be able to say who provided the descriptive data, when, and based on what sources. A description is a constellation of assertions made by one or more agents/editors based on one or more sources. Thus Source Statement should be repeatable, as any given assertion can be revised over time, or a new editor comes along and simply provides corroborating data from a new source without revising the description otherwise."
A source block should be creatable for the tags for descriptive data (first-order data, FOD) in a constellation. There may be multiple blocks for each piece of data, thus they are repeatable blocks. Each block must contain:
* Scholar or archivist responsible for the descriptive assertion (FOD)
* Latest transaction date for the source block (the date the assertion was made or updated)
* Transaction type (was this a new descriptive assertion (FOD) or a revision)
* Source citation (the URI, citation, or source for this assertion (FOD))
* Source data (the text of what the scholar or archivist got out of the source)
* Descriptive rules (any rules that the scholar or archivist used to formulate the assertion (FOD) from the source)
* Language and script associated with the assertion (used by the scholar or archivist, or source)
* Note (a human-readable note from the scholar or archivist about the assertion (FOD))
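As a rough illustration of the list above, a source block could be modelled as a small record type. This is only a sketch: the class name `SourceBlock`, the field names, and the two transaction-type values are hypothetical and not part of the requirement. It shows one field per required item, and that a single piece of first-order data can carry several blocks.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceBlock:
    """One repeatable source block for a piece of first-order data (FOD)."""
    agent: str               # scholar or archivist responsible for the assertion
    transaction_date: date   # date the assertion was made or last updated
    transaction_type: str    # "new" or "revision" (hypothetical values)
    citation: str            # URI, citation, or source for the assertion
    source_data: str         # text the agent took from the source
    descriptive_rules: str   # rules used to formulate the assertion
    language: str            # language used by the agent or source
    script: str              # script used by the agent or source
    note: str                # human-readable note about the assertion

    def __post_init__(self):
        # Reject transaction types outside the two cases named in the list.
        if self.transaction_type not in ("new", "revision"):
            raise ValueError(f"unknown transaction type: {self.transaction_type}")


# Illustrative values only: two blocks attached to the same assertion,
# the second corroborating the first without revising the description.
blocks = [
    SourceBlock("editor_a", date(2015, 12, 2), "new", "VIAF",
                "Jefferson, Thomas, 1743-1826.", "example rules", "eng", "Latn",
                "initial assertion"),
    SourceBlock("editor_b", date(2015, 12, 3), "revision", "example citation",
                "Jefferson, Thomas, 1743-1826.", "example rules", "eng", "Latn",
                "corroborating source"),
]
```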
\ No newline at end of file
# Codebase Organization
We define the following PHP namespace organization, with mirroring directory structure.
```
\snac
\client
\webui
\workflow
\rest
\testui
\server
\workflow
\identityReconciliation
\dateParser
\nameEntryParser
\dataValidation
\reporting % Wrapper
\authentication % Wrapper for OAuth
\authorization
\database % Database interaction (mainly wrappers)
% possible wrappers for Neo4J and Elastic Search
\exceptions % SNAC exceptions to be thrown on error
\interfaces % interfaces used throughout the codebase
```
Note that the client applications (web UI and REST API) must make all back-end requests through the internal server API. This separates concerns, keeps the server more secure, and allows for a higher level of scalability.
All endpoints (server, webui, and rest) will share the same codebase, each with its own `index.php` that includes the codebase and instantiates and executes the appropriate handler class. Then, we may have:
```
/var/www/snac-www instantiates \snac\client\webui\WebUI
/var/www/snac-api instantiates \snac\client\rest\RestAPI
/var/www/snac-internal instantiates \snac\server\Server
```
### Repo Organization
```
/
/src % containing all snac sourcecode
/snac % ...
/virtualhosts
/www % landing for /var/www/snac-www
/rest % landing for /var/www/snac-api
/internal % landing for /var/www/snac-internal
/test % containing all unit tests (mirrors /src)
/snac % ...
LICENSE % code license
README.md % readme file for the repo
```
......@@ -18,6 +18,12 @@ Source code must match the following style guidelines:
* Class names start with upper-case letters
* Variable and field names start with lower-case letters
* No underscores allowed in variable names
* Filenames must match the name of the class defined within (exactly)
* Directory structure must mirror the namespace structure (PHP)
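The last two rules mean a class's file location can be derived from its fully qualified name. The helper below is a minimal sketch of that mapping, not project code: the function name `class_to_path`, the `src` root, and the `.php` extension are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def class_to_path(fqcn: str, src_root: str = "src") -> str:
    """Map a PHP fully qualified class name to the file expected to define it."""
    # Split on the namespace separator and drop the empty leading segment.
    parts = [p for p in fqcn.split("\\") if p]
    if not parts:
        raise ValueError("empty class name")
    *namespace, class_name = parts
    # Directories mirror the namespace; the filename matches the class exactly.
    return str(PurePosixPath(src_root, *namespace, class_name + ".php"))


print(class_to_path(r"\snac\client\webui\WebUI"))
```

Under these assumptions, `\snac\client\webui\WebUI` maps to `src/snac/client/webui/WebUI.php`.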
## Test-Driven Development
Each class should have matching, appropriate unit tests. For PHP code, those tests will be executed using the [PHPUnit](https://phpunit.de/index.html) unit testing framework.
## Internal Documentation of Code
......@@ -111,5 +117,3 @@ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```
# SNAC Control Metadata
Each piece of first-order data in the database may have metadata associated with it. The information the database will keep track of is:
* _source citation_ : The citation associated with this particular first-order data.
* _sub citation_ : The part of the citation associated with this first-order data (i.e. "page 5").
* _source data_ : The text found in the citation related to this first-order data. This is an "as seen" field.
* _descriptive rules_ : The rules associated with creating this first-order data from the source citation.
* _language_ : The language in which the source data was found or user was working.
* _script_ : The script in which the source data was found or user was working.
* _note_ : The note from the user associated with this citation and first-order data.
This specification leads to a SQL database with the following structure:
```
id int primary key default nextval('id_seq'), -- id for Metadata
fk_id int , -- FK to data described
fk_table text , -- table name for FK
version int , -- version from version_history
citation int , -- FK to source table
sub_citation text , -- subcitation information
source_data text , -- source data
descriptive_rules int , -- FK to vocabulary
language int , -- FK to vocabulary
script int , -- FK to vocabulary
note text -- human-readable note
```
This data will be used to build EAC-CPF source control blocks, as defined in [the EAC-CPF Output requirements](/Requirements/EAC-CPF Output.md). The user who made the assertion and created the first-order data and the associated timestamp will be pulled from the `version_history` table through the `version` foreign-key.
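The lookup described above can be sketched with an in-memory SQLite database. Several things here are assumptions rather than part of the specification: the table name `metadata`, the `version_history` columns (`id`, `user_name`, `modified_at`), the `fk_table` value `name`, and the use of SQLite in place of the sequence-based schema shown above. The sketch only illustrates joining a metadata row to its version record to recover who made the assertion and when.

```python
import sqlite3


def source_blocks_for(conn, fk_table, fk_id):
    """Return metadata for one piece of first-order data, with user and timestamp."""
    rows = conn.execute(
        """
        SELECT m.sub_citation, m.source_data, m.note,
               v.user_name, v.modified_at
        FROM metadata AS m
        JOIN version_history AS v ON v.id = m.version
        WHERE m.fk_table = ? AND m.fk_id = ?
        ORDER BY m.id
        """,
        (fk_table, fk_id),
    ).fetchall()
    return [dict(row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript(
    """
    -- Hypothetical minimal version_history; the real columns are not given here.
    CREATE TABLE version_history (id INTEGER PRIMARY KEY, user_name TEXT, modified_at TEXT);
    CREATE TABLE metadata (
        id INTEGER PRIMARY KEY, fk_id INT, fk_table TEXT, version INT,
        citation INT, sub_citation TEXT, source_data TEXT,
        descriptive_rules INT, language INT, script INT, note TEXT
    );
    """
)
# Illustrative rows only.
conn.execute("INSERT INTO version_history VALUES (1, 'example_user', '2015-12-02')")
conn.execute(
    "INSERT INTO metadata (fk_id, fk_table, version, sub_citation, source_data, note) "
    "VALUES (10, 'name', 1, 'page 5', 'Jefferson, Thomas, 1743-1826.', 'illustrative row')"
)

blocks = source_blocks_for(conn, "name", 10)
```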
This source diff could not be displayed because it is too large. You can view the blob instead.
{
    "constellation" : {
        "dataType": "Constellation",
        "id": null,
        "version": null,
        "ark": null,
        "entityType": null,
        "otherRecordIDs": [],
        "maintenanceStatus": null,
        "maintenanceAgency": null,
        "maintenanceEvents": [],
        "sources": [],
        "legalStatuses": [],
        "conventionDeclaration": null,
        "constellationLanguage": null,
        "constellationLanguageCode": null,
        "constellationScript": null,
        "constellationScriptCode": null,
        "language": null,
        "languageCode": null,
        "script": null,
        "scriptCode": null,
        "nameEntries": [
            {
                "dataType": "NameEntry",
                "id": null,
                "version": null,
                "original": "Jefferson, Thomas, 1743-1826.",
                "preferenceScore": "99",
                "contributors": [
                    {
                        "type": "authorizedForm",
                        "contributor": "VIAF"
                    }
                ],
                "language": null,
                "scriptCode": null,
                "useDates": null
            }
        ],
        "occupations": [],
        "biogHists": [],
        "existDates": [],
        "existDatesNote": null,
        "relations": [],
        "resourceRelations": [],
        "functions": [],
        "places": [],
        "subjects": [],
        "nationality": null,
        "gender": null,
        "generalContext": null,
        "structureOrGenealogy": null,
        "mandate": null
    }
}
# Server API
The internal server's API uses JSON exclusively, for both requests and responses. Requests to the server will use HTTP PUT requests to a localhost port (set up as a firewalled virtualhost in Apache). This allows future scalability, including moving the internal server onto a separate machine from the front-facing web and REST interfaces.
## Server Requests
Requests can be tested from the command line using curl, in the form:
```bash
curl -XPUT "http://shannonvm.village.virginia.edu:82" -d '{ "JSON" : "Request"}'
```
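For clients that are not shell scripts, the same call can be sketched with Python's standard library. This is an illustrative equivalent of the curl command, not project code: the helper names `build_request` and `send` are hypothetical, and the `Content-Type` header is an assumption, since the curl example does not set one explicitly.

```python
import json
import urllib.request

# Host and port copied from the curl example above.
SERVER_URL = "http://shannonvm.village.virginia.edu:82"


def build_request(payload: dict, url: str = SERVER_URL) -> urllib.request.Request:
    """Build an HTTP PUT request carrying the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed header
    )


def send(payload: dict, url: str = SERVER_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; send() does.
request = build_request({"JSON": "Request"})
```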
Requests should be of the form:
```json
{
    "command" : "command_name",
    "constellation" : {
        "...":"..."
    }
}
```
### Request API Definitions
* `command` : the command to issue to the server. The following commands are allowed:
    * User-level Commands
        * `login` - log in with the given credentials
        * `user_info` - get the user information for the given user
    * Constellation Management Commands
        * `edit` - Edit a constellation. Read the constellation `"id"` from the given constellation object
          and return all pieces of the constellation object to the client for editing. Sets the state
          of the server to note that this user is editing this constellation.
        * `insert` - Insert the given constellation in the database. Returns the full constellation
          out of SNAC with ID and version numbers within the Constellation structure.
        * `update` - Update a constellation. Read the constellation `"id"` from the given constellation
          object and make the requested changes to the constellation. In an update **many** of the non-repeating portions of the constellation object are required. Also, any subcomponent given, such as nameEntry objects, must be given in full. Except for the outer constellation object, any subcomponents must either be given with all fields or left out completely. **Partial components will result in data loss.**
        * `search` - Return a list of constellations matching the given constellation. Read all the data available
          in the given constellation object and return a list of constellations that match (or are similar).
* `user` : the current user's information
* `token` : the current session token
* `constellation` : all available parts of the constellation over which to enact the command. Certain commands require different amounts of the constellation object to be given. The full specification for the constellation structure may be viewed [here](/Specifications/Server API/Constellation.md).
    * Example: to request a constellation, the client may use the portions of the constellation that are known (`id` or `ark_id`)

            {
                "command" : "get",
                "constellation" : {"id" : 12345}
            }
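Because partial subcomponents in an `update` lead to data loss, a client may want to check a request before sending it. The sketch below is one possible client-side guard, not part of the server API: the function names are hypothetical, and the field list is taken from the `NameEntry` object in the constellation template, on the assumption that "given in full" means every one of those keys is present.

```python
# Keys of a NameEntry object as they appear in the constellation template.
NAME_ENTRY_FIELDS = {
    "dataType", "id", "version", "original", "preferenceScore",
    "contributors", "language", "scriptCode", "useDates",
}


def missing_fields(component: dict, required: set) -> list:
    """Return the required keys that the component does not carry."""
    return sorted(required - set(component))


def check_update(request: dict) -> list:
    """List problems that would make an `update` request lose data."""
    problems = []
    if request.get("command") != "update":
        return problems
    entries = request.get("constellation", {}).get("nameEntries", [])
    for index, entry in enumerate(entries):
        missing = missing_fields(entry, NAME_ENTRY_FIELDS)
        if missing:
            problems.append(f"nameEntries[{index}] is missing: {', '.join(missing)}")
    return problems


# A partial nameEntry: only two of the nine fields are given.
partial = {
    "command": "update",
    "constellation": {
        "id": 12345,
        "nameEntries": [{"dataType": "NameEntry", "original": "Jefferson, Thomas, 1743-1826."}],
    },
}
problems = check_update(partial)
```

An empty list from `check_update` means no partial `nameEntries` were found; other subcomponent types would need their own field sets.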
## Server Responses
The Server response API has the following form:
```json
{
    "timing" : 50,
    "request" : {
        "Note" : "Entire original request will be returned to the client."
    },
    "error" : {
        "type" : "Type of error",
        "message" : "Error message"
    },
    "constellation" : {
        "...":"..."
    }
}
```
### Response API Definitions
* `timing` : the time in milliseconds for the server to compute the response
* `request` : the client's entire request to the server, as a sub-element. This is mainly returned for the client's reference.
* `error` : an error message, if the server had a problem handling the request. This object has two subcomponents:
    * `type` : the type of the error. A full list of error types is defined [here](http://shannonvm.village.virginia.edu:83).
    * `message` : a detailed error message (in English prose) stating the exact nature of the problem
* `constellation` : The matching constellation for the client's query. All portions of the constellation known to the server will be returned to the client. If there are multiple constellations, this will return a list:

        {
            "constellation" : [
                { "...":"..." },
                { "...":"..." },
                { "...":"..." }
            ]
        }
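Since `constellation` may hold either a single object or a list, client code has to normalise it. The following is a minimal sketch of one way to do that, with hypothetical names (`ServerError`, `constellations_from`). It assumes that a missing or empty `error` object means the request succeeded, which the definitions above imply but do not state outright.

```python
class ServerError(Exception):
    """Raised when the response carries an `error` object."""

    def __init__(self, error_type, message):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def constellations_from(response: dict) -> list:
    """Return the constellations in a response as a list, or raise on error."""
    error = response.get("error")
    if error:
        raise ServerError(error.get("type"), error.get("message"))
    result = response.get("constellation")
    if result is None:
        return []
    # A single match arrives as an object, several matches as a list.
    return result if isinstance(result, list) else [result]


single = constellations_from({"timing": 50, "constellation": {"id": 12345}})
several = constellations_from({"timing": 50, "constellation": [{"id": 1}, {"id": 2}]})
```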
\ No newline at end of file