Integrity Checker Tool (ICT)
The text below will need a thorough revision due to various changes that have not yet been taken into account
The Integrity Checker Tool (ICT) tests the integrity of (a selection of) the data in files that are curated within a particular scope, i.e. the SAF, the MRGs, and curated texts. The integrity checking of other data, e.g. formatted texts, such as HRGs, is outside the scope of the ICT.
In order for the ICT to be used optimally, it will assume for specific kinds of files that the integrity of other files is guaranteed, as follows:
| When testing a ... , | the integrity is assumed of |
|---|---|
| MRG | SAF |
| curated file | MRG and SAF |
The idea behind this is to enable curators to only test changes they have made rather than testing the entire set of files.
As the tool hasn't been made, and no practical experience has been gained, many of these optimizations may not work in the first versions.
There's a lot of duplication in syntax specs. For example, the SAF spec and MRG spec define the regex for various kinds of tags all over the place. It would be nice to have a way by which syntax can be specified in one location that is 'naturally predictable' so that both readers and maintainers of the documentation can easily find it. One way might be to include the syntax in a 'popover', i.e. that we define stuff with particular syntax as a concept and have the syntax be included in its 'popover text'.
Installing the Tool
The tool can be installed from the command line and made globally available.
This section will be written once there is an actual tool to install.
We expect that the installation command will be something like:
npm install -g @tno-terminology-design/ict
Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.
CMD.exe (Windows):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If you have installed the package globally, confirm the global NPM modules path by running `npm config get prefix`. The global modules are usually stored under `<prefix>/node_modules`.
- Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be `<prefix>` on Windows. To add it to PATH, you can edit your environment variables or run `set PATH=%PATH%;<prefix>` in CMD.

PowerShell (Windows):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: Check the global NPM modules path as in CMD.
- Environment Variables: Update the PATH environment variable as in CMD. You can also use `$env:Path += ";<prefix>"` to update the PATH temporarily in the current PowerShell session.

Bash (Linux/Mac):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If globally installed, run `npm config get prefix` to get the global modules path, usually `<prefix>/lib/node_modules`.
- Environment Variables: Add the `<prefix>/bin` directory to your `PATH` if it's not already there. You can do this by adding `export PATH=$PATH:<prefix>/bin` to your `~/.bashrc` or `~/.zshrc` file.
Calling the Tool
The behavior of the ICT can be configured per call e.g. by a configuration file and/or command-line parameters. Examples include specifications for:
The command-line syntax is as follows:
ict [ <scopedir> ] <cmd> [ <paramlist> ]
| Where: | |
|---|---|
| `<scopedir>` | (optional) specifies the scopedir of the scope whose integrity is to be tested. If `<scopedir>` is omitted and a configuration file is used, its value is read from that file. In cases where `<scopedir>` isn't specified at all, the current directory is assumed to be the scopedir. In this document, we use the term "this scopedir" to refer to the value of `<scopedir>`, and "this scope" to refer to the associated scope. |
| `<cmd>` | The following commands are valid: |
| `<paramlist>` | a list of parameters that provide further specifications for what the ICT should be checking. |
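The fallback order for `<scopedir>` described in the table can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `resolve_scopedir` and the representation of the configuration file as a plain mapping are assumptions made for this example.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_scopedir(cli_scopedir: Optional[str],
                     config: Optional[Mapping[str, str]] = None,
                     cwd: Optional[Path] = None) -> Path:
    """Return "this scopedir" using the fallback order from the table above.

    Hypothetical helper: `config` stands for the already-parsed
    configuration file, if one was given.
    """
    if cli_scopedir:                        # 1. explicit <scopedir> argument
        return Path(cli_scopedir)
    if config and config.get("scopedir"):   # 2. value from the configuration file
        return Path(config["scopedir"])
    # 3. neither was specified: assume the current directory
    return cwd if cwd is not None else Path.cwd()
```

Passing `cwd` explicitly is only there to make the function easy to test; a real tool would probably just use the process's working directory.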
Parameters (Command-line arguments)
The current set of parameters is just an initial suggestion. We'll need to see what will actually be needed in practice.
Legend
The columns in the following table are defined as follows:
- `Key` is the text to be used as a key.
- `Value` represents the kind of value to be used.
- `Req'd` specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If (always) required, it MUST either be present in the configuration file, or as a command-line parameter.
- `Cmds` specifies a `<cmd>` value: if the ICT is called with this `<cmd>`, then this parameter will be used by the tool as described. A `*` in this field indicates that this parameter can be used with every command.
- `Description` specifies the meaning of the `Value` field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.
| Key | Value | Req'd | Cmds | Description |
|---|---|---|---|---|
| `config` | `<path>` | n | * | Path (including the filename) of the tool's (YAML) configuration file. This file contains the default key-value pairs to be used. Allowed keys (and the associated values) are documented in this table. Command-line arguments override key-value pairs specified in the configuration file. This parameter SHOULD NOT appear in the configuration file itself. |
| `scopedir` | `<path>` | Y | * | Path to the scopedir within which the tool is to operate, i.e.: this scopedir. |
| `syntax` | | n | * | This argument has no value. If present, the syntax of all (YAML) fields in the file is checked against their specifications (see e.g. SAF specs, Curated Texts, TermRefs, etc.). |
| `vsntag` | `<vsntag>` | | -mrg | versiontag that is used to select the version of the MRG to be checked. The MRG that is selected will either have `<vsntag>` as the contents of the field `terminology.vsntag`, or as an element in the list of `terminology.altvsntags`. |
| `term` | `<term>` | n | -txt | term that identifies a particular curated file. The curated file whose (front-matter) field `term` matches this parameter will be integrity-checked. |
| `grouptags` | `<grouptags>` | n | -txt | List of grouptags. Every curated file whose (front-matter) field `grouptags` has an element that also appears as an element in the `<grouptags>` list will be integrity-checked. |
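How `term` and `grouptags` select curated files can be illustrated with a small sketch. It assumes that the front matter of each curated file has already been parsed into a mapping; the function name `select_curated_files` is hypothetical. The behaviour when both parameters are given (the union of both selections) or when neither is given (nothing is selected) is also an assumption, since this document does not specify it.

```python
from typing import Any, Iterable, List, Mapping, Optional, Sequence


def select_curated_files(front_matters: Iterable[Mapping[str, Any]],
                         term: Optional[str] = None,
                         grouptags: Sequence[str] = ()) -> List[Mapping[str, Any]]:
    """Return the front-matter records of the curated files to be checked."""
    wanted = set(grouptags)
    selected = []
    for fm in front_matters:
        file_tags = fm.get("grouptags") or []
        # `term` selects the file whose front-matter `term` matches exactly.
        matches_term = term is not None and fm.get("term") == term
        # `grouptags` selects files sharing at least one grouptag with the list.
        matches_group = bool(wanted.intersection(file_tags))
        if matches_term or matches_group:   # assumption: union of both selections
            selected.append(fm)
    return selected
```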
Integrity Checks
The checks that are done on files depend on the kind of file that is being checked. This section lists the tests for the various kinds of files. Every file is assumed to be part of this scope, and reside in the associated scopedir (i.e.: this scopedir).
SAF integrity
The SAF must be a file that contains valid YAML syntax.
The integrity of a SAF requires the following conditions to be satisfied for the keys in the scope section:
- `scopedir` must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
- `curatedir`, when appended to the value of "scopedir/", must point to the directory that stores the curated files.
- `glossarydir` must point to an existing directory.
- `mrgfile` must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have an MRG).
- `hrgfile` must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have a HRG).
- `license` must be an existing file in the directory pointed to by `scopedir`.
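The file-system conditions on the scope section could be checked along the following lines. This is a sketch under stated assumptions: the SAF has already been parsed into a mapping, the local path of this scopedir is known, and `glossarydir` is resolved relative to it (this document only says it must be an existing directory). The check on the `scopedir` key itself is left out, and `check_saf_scope` is a hypothetical name.

```python
from pathlib import Path
from typing import Any, List, Mapping


def check_saf_scope(scope: Mapping[str, Any], scopedir: Path) -> List[str]:
    """Return error messages for the `scope` section of a parsed SAF."""
    errors = []
    # Directories that must exist (assumption: relative to this scopedir).
    for key in ("curatedir", "glossarydir"):
        value = scope.get(key)
        if not value or not (scopedir / value).is_dir():
            errors.append(f"{key}: '{value}' is not an existing directory")
    # Files that must exist in this scopedir, as stated in the list above.
    for key in ("mrgfile", "hrgfile", "license"):
        value = scope.get(key)
        if not value or not (scopedir / value).is_file():
            errors.append(f"{key}: '{value}' is not an existing file")
    return errors
```

Returning a list of messages rather than raising on the first problem fits the requirement that every error condition is logged.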
The integrity of a SAF requires the following conditions to be satisfied for every element in the scopes section:
- `scopetags` must be a non-empty list of scopetags.
- `scopedir` must be a valid URL that points to an existing directory resource.
The integrity of a SAF requires the following conditions to be satisfied for every element in the versions section:
- `vsntag` SHOULD NOT appear as an element in the `altvsntags` field of this `version` element, and it MUST NOT appear in the `vsntag` or `altvsntags` fields of any other element in the `versions` section.
- `altvsntags` must be a (possibly empty) list of versiontags, each of which SHOULD NOT appear in the `vsntag` field of the element, and MUST NOT appear in the `vsntag` or `altvsntags` fields of any other element in the `versions` section.
- `termselection` must be a non-empty list of term selection instructions.
- `status` SHOULD be a non-empty field.
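The conditions on the versions section amount to a uniqueness check on versiontags across elements, plus a few per-element checks. A minimal sketch is shown below; it assumes the `versions` section has been parsed into a list of mappings, treats a missing `altvsntags` as an empty list, reports MUST-level violations as errors and SHOULD-level violations as warnings, and uses the hypothetical name `check_versions`.

```python
from typing import Any, Dict, List, Mapping, Sequence, Set, Tuple


def check_versions(versions: Sequence[Mapping[str, Any]]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the `versions` section of a parsed SAF."""
    errors: List[str] = []
    warnings: List[str] = []
    used_by: Dict[Any, Set[int]] = {}  # versiontag -> indices of elements using it

    for i, version in enumerate(versions):
        vsntag = version.get("vsntag")
        alts = version.get("altvsntags", [])
        if not isinstance(alts, list):
            errors.append(f"versions[{i}]: altvsntags must be a list")
            alts = []
        if vsntag in alts:                      # SHOULD NOT: repeated within one element
            warnings.append(f"versions[{i}]: vsntag '{vsntag}' also appears in altvsntags")
        if not version.get("termselection"):    # must be a non-empty list
            errors.append(f"versions[{i}]: termselection must be a non-empty list")
        if not version.get("status"):           # SHOULD be non-empty
            warnings.append(f"versions[{i}]: status should be non-empty")
        for tag in {vsntag, *alts} - {None}:
            used_by.setdefault(tag, set()).add(i)

    # MUST NOT: a versiontag may not be used by more than one version element.
    for tag in sorted(used_by, key=str):
        if len(used_by[tag]) > 1:
            errors.append(f"versiontag '{tag}' is used by more than one version element")
    return errors, warnings
```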
MRG integrity
The integrity checking for MRG files assumes that the integrity conditions of a SAF file are satisfied, and that the MRG file itself contains valid YAML syntax.
The integrity checking comprises every (group of) test(s) as specified in this sub-section.
The MRG MUST have sections named terminology, scopes, and entries.
Integrity checks for the terminology section include:
- `scopedir` must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
- `vsntag` must be a versiontag that SHOULD NOT appear as an element in the `altvsntags` field.
- `altvsntags` must be a (possibly empty) list of versiontags, none of which appear in the `vsntag` field.
- `license` must be an existing file in the directory pointed to by `scopedir`.
Integrity checks for the scopes section include:
- `scopetags` must be a non-empty list of scopetags.
- `scopedir` must be a valid URL that points to an existing directory resource other than the scopedir of the current scope. This directory MUST contain a SAF.

Open question: do we need an option to test the integrity of such SAFs?
Integrity checks for the entries section consist of one part that is generic for all entries, and another part that depends on the value of the termType field (so that checking of e.g. entries of type concept and of type pattern can have different checks.) The checks that every entry must pass include the following:
- `scopetag` MUST also appear as the value of `terminology.scopetag`, or as an element in one of the `scopes.scopetags` elements.
- `termType` SHOULD be tbd.
- `grouptags` MUST be a list of grouptag elements.
- `license` MUST be an existing file in the directory pointed to by `scopedir`.
- `status` SHOULD match an element in the list `scope.statuses` of the SAF.
- `locator`, if specified, MUST have a readable resource (file) at `scopedir/curatedir/locator`, where `scopedir` and `curatedir` are specified in the SAF.
- `navurl`, if specified, MUST return an HTML-resource when specified as the URL in a HTTP(S) request method `GET` or `HEAD`.
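Some of the generic entry checks need no file or network access and can be sketched compactly. The example below covers only `scopetag`, `grouptags` and `status`. It assumes the MRG has been parsed into nested mappings and that the SAF's `scope.statuses` list is passed in separately. Treating a missing `grouptags` field as an empty list is an assumption, and `check_entry` is a hypothetical name.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def check_entry(entry: Mapping[str, Any],
                mrg: Mapping[str, Any],
                saf_statuses: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for one MRG entry (partial, generic checks)."""
    errors: List[str] = []
    warnings: List[str] = []

    # Collect every scopetag this MRG knows about.
    known_scopetags = set()
    own_scopetag = (mrg.get("terminology") or {}).get("scopetag")
    if own_scopetag is not None:
        known_scopetags.add(own_scopetag)
    for scope in mrg.get("scopes") or []:
        known_scopetags.update(scope.get("scopetags") or [])

    scopetag = entry.get("scopetag")
    if scopetag not in known_scopetags:                      # MUST
        errors.append(f"scopetag '{scopetag}' is not known in this MRG")
    if not isinstance(entry.get("grouptags", []), list):     # MUST
        errors.append("grouptags must be a list")
    status = entry.get("status")
    if status not in saf_statuses:                           # SHOULD
        warnings.append(f"status '{status}' is not listed in scope.statuses")
    return errors, warnings
```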
For specific kinds of MRG entries, the following additional constraints MUST be satisfied:
- Terms: As header fields for term termTypes need to be discussed, we do not yet specify any constraints.
- Concepts: The following constraints MUST hold for MRG entries of type concept:
  - if a `glossaryText` contains a TermRef, then the TermRef SHOULD be resolvable (reference to the term-ref-integrity checks).
  - `hoverText` MUST NOT contain any TermRef, nor any other markdown links.
- Relations (termType: relation): As relations need to be discussed, we do not yet specify any constraints.
- Mental Models: As patterns need to be discussed, we do not yet specify any constraints.
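The `hoverText` constraint for entries of type concept lends itself to a simple pattern check. The sketch below flags inline markdown links of the form `[text](target)`. It assumes that TermRefs are written in this markdown-link-like form, so that one pattern covers both cases; TermRefs written in any other syntax would not be detected. The names `MARKDOWN_LINK` and `hovertext_violations` are hypothetical.

```python
import re
from typing import List

# Matches inline markdown links such as [text](target).
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")


def hovertext_violations(hover_text: str) -> List[str]:
    """Return every markdown-link-like fragment found in a hoverText value."""
    return MARKDOWN_LINK.findall(hover_text)
```

An empty result means the value passes this particular check; any returned fragment can be quoted in the error message to help the author locate it.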
Checks need to be added to ensure congruence between terms and any synonyms that are defined for them. For example, they should have the same value in various fields, e.g., termType, isa (but not glossaryText or synonymOf)
Curated Text integrity
The integrity of any curated text file requires the integrity conditions of the MRG file to be satisfied, as well as the following conditions:
- TBD
Concepts
The integrity of any curated text file that has termType: concept requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:
- TBD
Patterns
The integrity of any curated text file that has termType: pattern requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:
- TBD
Processing, Errors and Warnings
The ICT starts by reading its command-line and configuration file. If the command-line has a key that is also found in the configuration file, the command-line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged.
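The precedence rule can be sketched as a merge of two mappings in which command-line values win, followed by a validity pass whose findings are collected for logging. This is an illustration only: the set of allowed keys is taken from the parameter table in this document, while the function name `merge_settings` and the idea of representing both sources as already-parsed mappings are assumptions.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Keys taken from the parameter table in this document.
ALLOWED_KEYS = {"config", "scopedir", "syntax", "vsntag", "term", "grouptags"}


def merge_settings(config_file: Mapping[str, Any],
                   cli_args: Mapping[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Merge both sources (command line wins) and collect problems to log."""
    problems: List[str] = []
    if "config" in config_file:
        # The table says this key SHOULD NOT appear in the file itself.
        problems.append("'config' should not appear in the configuration file")
    # Later mappings override earlier ones, so command-line values take precedence.
    merged = {**config_file, **cli_args}
    for key in sorted(merged):
        if key not in ALLOWED_KEYS:
            problems.append(f"unknown key '{key}'")
    return merged, problems
```

For example, a configuration file that sets `vsntag` to one value and a command line that sets it to another yields the command-line value in the merged result.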
Then, the ICT TBD
The ICT logs every error and warning condition that it comes across while processing its configuration file, command-line parameters, and input files, in a way that helps tool operators and document authors to identify and fix such conditions.
Deploying the Tool
The ICT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.
Discussion Notes
This section lists the topics that may need further discussion