Human Readable Glossary Generator Tool (HRGT)
The Human Readable Glossary Generator Tool (HRGT) is a TEv2 text conversion tool that takes files that contain so-called MRGRefs as inputs, and that outputs (a copy of) these files in which these MRGRefs are converted into hrg-lists, i.e. lists of alphabetically sorted HRG entries that can be further processed by tools such as the TRRT, as well as rendering tools such as GitHub pages, Docusaurus, etc.
While MRGRefs have a default syntax, alternative syntaxes can be used by choosing (or specifying) the interpreter that the HRGT should be using.
Hrg-lists do not have a default structure, but there are various [predefined converters](#predefined-converters) (and [sorters](#predefined-sorters)) that can be chosen (or specified) for the HRGT to use.
Installing the Tool
The tool can be installed from the command line and made globally available by executing
npm install -g @tno-terminology-design/hrgt
Before running the tool from the command line, make sure you have met the necessary prerequisites for your operating environment.
CMD.exe (Windows)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If you have installed the package globally, confirm the global NPM modules path by running `npm config get prefix`. The global modules are usually stored under `<prefix>/node_modules`.
- Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be `<prefix>` on Windows. To add to PATH, you can edit your environment variables or run `set PATH=%PATH%;<prefix>` in CMD.
PowerShell (Windows)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: Check the global NPM modules path as in CMD.
- Environment Variables: Update the PATH environment variable as in CMD. You can also use `$env:Path += ";<prefix>"` to update the PATH temporarily in the current PowerShell session.
Bash (Linux/Mac)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If globally installed, run `npm config get prefix` to get the global modules path, usually `<prefix>/lib/node_modules`.
- Environment Variables: Add the `<prefix>/bin` directory to your `PATH` if it's not already there. You can do this by adding `export PATH=$PATH:<prefix>/bin` to your `~/.bashrc` or `~/.zshrc` file.
Calling the Tool
The behavior of the HRGT can be configured per call, e.g. by a configuration file and/or command line parameters. The command line syntax is as follows:
hrgt [ <paramlist> ] [ <globpattern> ]
where:
- `<paramlist>` is an (optional) list of parameters, as specified in the table below.
- `<globpattern>` specifies a set of (input) files that are to be processed. The set of input files must be specified either on the command line or in a configuration file that is specified with the `-c` option.
Legend
The columns in the following table are defined as follows:
- `Parameter` specifies the parameter and further specifications.
- `Req'd` specifies whether (`Y`) or not (`n`) the field is required to be present when the tool is being called. If required, it MUST either be present in the configuration file, or as a command line parameter.
- `Description` specifies the meaning of the `Value` field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.
If a configuration file is used, the long version of the parameter must be used (without the preceding `--`).
Parameter | Req'd | Description |
---|---|---|
-V , --version | n | output the version number of the tool. |
-h , --help | n | display help for command. |
-c , --config <path> | n | Path (including the filename) of the tool's (YAML) configuration file. |
-s , --scopedir <path> | Y | Path of the scope directory where the SAF is located. |
-o , --output <dir> | Y | (Root) directory for output files to be written. |
-f , --force | n | Allow overwriting of existing files. |
--int , --interpreter <type> or <regex> | n | Specifies the interpreter to be used to detect MRGRefs. This can either be a predefined interpreter, or a regex. |
--con[n] , --converter[n] <type> or <hexpr> | n | Specifies the converter to be used to produce HRG lists. This can either be a predefined converter, or a handlebars template. See HRGT Converters for details. |
--con[error] , --converter[error] <type> or <hexpr> | n | Specifies the converter to be used to replace the MRGRef with in case the associated MRG file could not be found. |
--sort , --sorter <type> or <hexpr> | n | Specifies the value to be used to sort HRG lists. This can either be a predefined value, or a handlebars template. |
-e , --onNotExist <action> | n | The action in case a vsntag was specified, but wasn't found in the SAF. |
The `-d` (`--debug`) option needs to be added when implemented.
Note that this option should work the same for all tools, which currently is certainly not the case.
`-e`, `--onNotExist` Actions
<action> | Description |
---|---|
'throw' | an error is thrown (an exception is raised), and processing will stop. |
'warn' | a message is displayed (and logged) and processing continues. |
'log' | a message is written to a log(file) and processing continues. |
'ignore' | processing continues as if nothing happened. |
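The four actions can be read as a simple dispatch. The following TypeScript sketch illustrates that reading; the function `handleNotExist` and the in-memory `log` array are hypothetical stand-ins and do not reflect how the HRGT itself is implemented.

```typescript
// Hypothetical sketch of the four `onNotExist` actions from the table above.
type OnNotExist = "throw" | "warn" | "log" | "ignore";

function handleNotExist(action: OnNotExist, message: string, log: string[]): void {
  switch (action) {
    case "throw":
      // An error is thrown and processing stops.
      throw new Error(message);
    case "warn":
      // The message is displayed and logged; processing continues.
      console.warn(message);
      log.push(message);
      return;
    case "log":
      // The message is only written to the log; processing continues.
      log.push(message);
      return;
    case "ignore":
      // Processing continues as if nothing happened.
      return;
  }
}
```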
HRG Generation
All text conversion tools, including the HRGT, convert (input) text files into results (output text files) by locating particular text patterns, doing some processing, and constructing texts that are used to replace the located text patterns with. This is illustrated in the figure below, and further explained in the page TEv2 Text Conversion:
Figure 1: The (generic) parts of a Text Conversion
The following subsections specify the particulars of the HRGT: the interpreter profile, its predefined interpreters, the intermediate processing, the construction of its converter profile, its predefined converters, and the predefined sorters.
HRGT Interpreter Profile
The interpreter profile of the HRGT consists of the following named capturing groups that are to be populated by the predefined interpreters, as well as by any custom interpreter.
Legend
- `Group`: name of the capturing group.
- `Req'd` specifies whether (`Y`) or not (`n`, or `F`) the group is required to have non-empty contents. The `F` means that we reserve this field for Future Use.
- `Description` specifies the meaning (purpose) for which the contents of the capturing group will be used.
Group | Req'd | Description |
---|---|---|
hrg | n | A terminology-identifier that specifies the MRG for which a HRG is to be generated. |
converter | n | Specifies the converter to be used to produce HRG entries. This can either be a predefined converter, or a handlebars template. See HRG Converters for details. |
sorter | n | Specifies the sorter to be used for sorting the HRG list. This can either be a predefined sorter, or a handlebars template. See HRG Sorters for details. |
Note that the values of these specified capturing groups will be regularized before they are used for processing.
When the `hrg` identifies a terminology that is not part of the current scope, the TermRefs that appear in the generated glossary must be resolved using the MRG that contains the identified terminology. Currently, the tools do not support a mechanism for doing this.
HRGT Predefined Interpreters
The HRGT has only one predefined interpreter, which is called `default`, the syntax of which is:
{% hrg="<hrg>" converter="<converter>" sorter="<sorter>" %}
where:
- `hrg` (optional) is a terminology-identifier that specifies the MRG for which a HRG is to be generated.
- `converter` (optional) specifies the converter to be used to produce the HRG list. This can either be a predefined converter, or a handlebars template. See HRG Converters for details.
- `sorter` (optional) specifies the sorter to be used for sorting the HRG list. This can either be a predefined sorter, or a handlebars template. See HRG Sorters for details.
If you specify a converter as part of this syntax, that's the only one that will be used.
There is currently no provision for having multiple converters in this syntax.
If you want multiple converters, you MUST use the command line or a configuration file.
For completeness, here is the regex that defines the `default` interpreter for the HRGT.
{%\s*hrg="(?<hrg>[^"]*)"\s*(?:converter="(?<converter>[^"]*)"\s*)?(?:sorter="(?<sorter>[^"]*)"\s*)?%}
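To see how this regex populates the named capturing groups, the following TypeScript sketch applies it to a piece of text. The function `findMrgRefs` and the `MrgRef` interface are hypothetical names used for illustration only; the braces are escaped here, which does not change what the regex matches.

```typescript
// Hypothetical helper that lists all MRGRefs found by the default interpreter.
interface MrgRef {
  hrg: string;
  converter: string | undefined;
  sorter: string | undefined;
  index: number; // position of the MRGRef in the input text
}

function findMrgRefs(text: string): MrgRef[] {
  // The default interpreter regex from the text above, with escaped braces.
  const regex =
    /\{%\s*hrg="(?<hrg>[^"]*)"\s*(?:converter="(?<converter>[^"]*)"\s*)?(?:sorter="(?<sorter>[^"]*)"\s*)?%\}/g;
  const refs: MrgRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = regex.exec(text)) !== null) {
    const groups = match.groups ?? {};
    refs.push({
      hrg: groups.hrg ?? "",
      converter: groups.converter, // undefined when the attribute is absent
      sorter: groups.sorter,
      index: match.index,
    });
  }
  return refs;
}
```

Applied to the line `{% hrg="myterms:test" converter="markdown-table-row" %}`, this yields a single MRGRef whose `hrg` group is `myterms:test`, whose `converter` group is `markdown-table-row`, and whose `sorter` group is not populated.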
Interpreter Customization
It may happen that an interpreter is needed that is not predefined. Fortunately, interpreters can be added by specifying a regex that populates the named capturing groups as specified by the interpreter profile.
This regex can be specified:
- as a command-line argument
- in (the `hrgt`-section of) the configuration file that the HRGT is specified to use.
In the future, it may become possible to specify interpreters and converters in the SAF, in which case they will have a name that will then be required for identifying an interpreter or converter of one's choice.
Processing
The purpose of the HRGT is to allow source texts to contain MRGRefs that are to be converted into hrg-lists.
To do that, the HRGT uses the specified interpreter to locate successive MRGRefs in its input files, and replaces each of them with an hrg-list that is constructed as follows:
- For each MRGRef, the associated MRG is located, and an empty hrg-list is created.
- Then, for every MRG entry in the MRG:
  - a corresponding converter profile object is created;
  - this converter profile object is fed to all converters that the HRGT was configured to use. If a converter produces a non-empty string, that result is added as an HRG entry to the hrg-list.
- Finally, the HRG entries in the hrg-list are sorted according to the sorter that is specified.
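The steps above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions: converters and sorters are modelled as plain functions rather than handlebars templates, the converter profile is reduced to a few MRG entry fields, and the names `buildHrgList`, `ProfileEntry`, `EntryConverter` and `EntrySorter` are hypothetical.

```typescript
// Simplified stand-in for the converter profile object (assumption).
interface ProfileEntry {
  term: string;
  glossaryText?: string;
  glossaryAbbr?: string;
}
type EntryConverter = (entry: ProfileEntry) => string;
type EntrySorter = (entry: ProfileEntry) => string;

function buildHrgList(
  entries: ProfileEntry[],
  converters: EntryConverter[],
  sorter: EntrySorter,
): string {
  // Each element pairs an HRG entry with the value used for sorting.
  const items: { sortValue: string; hrgEntry: string }[] = [];
  for (const entry of entries) {
    for (const convert of converters) {
      const hrgEntry = convert(entry);
      // Only non-empty converter results become HRG entries.
      if (hrgEntry !== "") {
        items.push({ sortValue: sorter(entry), hrgEntry });
      }
    }
  }
  items.sort((a, b) => a.sortValue.localeCompare(b.sortValue));
  return items.map((item) => item.hrgEntry).join("");
}
```

Because every converter is applied to every MRG entry, a second converter that returns an empty string for most entries (for example, one that only emits a row when `glossaryAbbr` is present) adds extra HRG entries without disturbing the rest of the list.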
Finding the MRG associated with an MRGRef
The MRG file associated with an MRGRef is found by resolving the terminology identifier that is specified in the named capturing group `hrg`, which leads to a valid scopetag and versiontag.
Since all MRGs follow the MRG naming conventions, it follows that the MRG that corresponds with a terminology identifier is in the file `mrg.<scopetag>.<versiontag>.yaml` in the glossarydir of the current scope.
If that file does not exist, the converter that was specified in the argument `con[error]` (or `converter[error]`) is executed, and its result will replace the MRGRef. This ensures that readers can be provided with an adequate error message, or whatever else the curators find useful to replace the MRGRef with.
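As an illustration, the lookup described here can be sketched in TypeScript. The sketch assumes that the scopetag and versiontag have already been resolved from the terminology identifier; the names `mrgFileName`, `locateMrg` and `MrgLookup` are hypothetical, and the file-existence check and error converter are passed in as plain functions so the example stays self-contained.

```typescript
// Result of the lookup: either the MRG path, or the text that replaces the MRGRef.
type MrgLookup =
  | { found: true; path: string }
  | { found: false; replacement: string };

// Builds the file name according to the naming convention quoted above.
function mrgFileName(scopetag: string, versiontag: string): string {
  return `mrg.${scopetag}.${versiontag}.yaml`;
}

function locateMrg(
  glossarydir: string,
  scopetag: string,
  versiontag: string,
  fileExists: (path: string) => boolean,
  errorConverter: (path: string) => string,
): MrgLookup {
  const path = `${glossarydir}/${mrgFileName(scopetag, versiontag)}`;
  if (fileExists(path)) {
    return { found: true, path };
  }
  // The MRG file is missing: the error converter supplies the replacement text.
  return { found: false, replacement: errorConverter(path) };
}
```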
Using Converters
The HRGT needs at least one converter to work with, and allows its users to specify multiple ones.
Converters can be specified:
- on the command line, using the `--con` or `--converter` option (see Calling the Tool);
- in the [configuration file](/docs/specs/files/configuration-file) that is specified with the `-c` option on the command line;
- as part of the MRGRef syntax.
If no converters can be found, a default will be provided.
You can also specify an error converter, which is a regular converter that will be used to produce output when an MRGRef could not be resolved to an (existing) MRG (file).
You can specify an error converter using the `--con[error]` option. If you do, that means you get additional output (as specified by the converter) when an error occurs.
Using Multiple Converters
Note that currently, configuring the HRGT to use multiple converters can only be done through the command line or a configuration file.
When specifying multiple converters they should be numbered, e.g., as in `converter[1]`, `converter[2]`, etc. The section on calling parameters tells you how such converters are to be specified. The ordering/numbering of such converters is irrelevant for the HRGT, as in the end, the HRG entries that they produce are sorted.
Having multiple converters allows one to create multiple HRG entries for a single MRG entry. This can be useful, e.g., if the MRG entry specifies aliases, or abbreviations. In such cases, one converter can create an HRG entry for the MRG entry itself, and another one can create an HRG entry for the alias or abbreviation, and refer to the actual entry.
Example: a converter that adds abbreviations
The following converter adds an HRG entry for an abbreviation. The HRG entry assumes that the HRG will be formatted as a markdown table.
"{{#if glossaryAbbr}}| {{glossaryAbbr}} | <a href="/tev2-specifications/docs/terms/termid" title="Termid: a text of the form `<termType>:<term>` that serves as an unambiguous identifier for a specific Semantic Unit in some given Scope.">{{glossaryTerm}}</a> |\n{{/if}}"
Error Converter
Whenever a MRGRef is identified, yet cannot be resolved to an MRG (file), this results in an error message being logged. Also, the MRGRef itself isn't changed.
Here is an example of a converter that adds a line to the log that the tool produces:
"{{log 'HRGT error converter:' err.dir '/' err.file '@' err.line ':' err.pos ' - Cannot find a corresponding MRG' level='warn'}}"
Sorting the HRG list
The HRG list contains elements that are each associated with one MRG entry, one HRG entry, and one value that is used for sorting. This value is the result of evaluating (the handlebars template specified by) the sorter, using moustache variables that come from the converter profile of the HRGT. See HRG Sorters for details.
HRGT Converter Profile
The converter profile of the HRGT specifies the structure of the data objects that its converters can use, i.e., in which its converters find the actual (context dependent) data that they need to produce a HRG entry.
This means that the converters used by the HRGT know:
- the interpreter that is used (i.e., its name as well as the regex) that finds the MRGRef.
- the values of each of the named capturing groups, as defined by the HRGT interpreter profile, and populated by that interpreter.
- all fields in the MRG entry for which the converter is being called.
- all fields from the terminology section of the MRG from which that MRG entry was taken.
- various fields that can be used to construct logging/error messages, such as the filename, line number etc. of the MRGRef.
HRGT Predefined Converters
The predefined converters for the HRGT are `markdown-table-row`, `markdown-section-2`, and `markdown-section-3`.
The `markdown-table-row` converter is defined by the following handlebars template.
| [{{#if glossaryTerm}}{{glossaryTerm}}{{else}}{{capFirst term}}{{/if}}]({{localize navurl}}) | {{#if glossaryText}}{{glossaryText}}{{else}}no `glossaryText` was specified for this entry.{{/if}} |\n
The `markdown-section-2` converter is defined by the following handlebars template.
## [{{#if glossaryTerm}}{{glossaryTerm}}{{else}}{{capFirst term}}{{/if}}]({{localize navurl}})\n\n{{#if glossaryText}}{{glossaryText}}{{else}}no `glossaryText` was specified for this entry.{{/if}}\n\n
The `markdown-section-3` converter is defined by the following handlebars template.
### [{{#if glossaryTerm}}{{glossaryTerm}}{{else}}{{capFirst term}}{{/if}}]({{localize navurl}})\n\n{{#if glossaryText}}{{glossaryText}}{{else}}no `glossaryText` was specified for this entry.{{/if}}\n\n
These converters use the following functions.
- `localize`: replaces the URL in its argument (i.e., `navurl`) with a (shorter) version by removing the protocol and host parts in case the resource is located on the same site.
- `capFirst`: capitalizes the first character of every word found in its argument.
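To make the behavior of these two functions concrete, here is a minimal TypeScript sketch written from the descriptions above. It is not the HRGT's own implementation: in particular, passing the host of the current site as a second argument (`siteHost`) is an assumption made to keep the example self-contained.

```typescript
// Sketch of `capFirst`: capitalizes the first character of every word.
function capFirst(text: string): string {
  return text.replace(
    /(^|\s)(\S)/g,
    (_match: string, lead: string, first: string) => lead + first.toUpperCase(),
  );
}

// Sketch of `localize`: drops protocol and host when the URL points to the same site.
// `siteHost` is a hypothetical parameter naming the host of the current site.
function localize(navurl: string, siteHost: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)(.*)$/i.exec(navurl);
  if (match === null) {
    return navurl; // no protocol and host present, so nothing to remove
  }
  const [, host = "", rest = ""] = match;
  if (host.toLowerCase() !== siteHost.toLowerCase()) {
    return navurl; // resource lives on another site: keep the full URL
  }
  return rest === "" ? "/" : rest;
}
```

With these definitions, `capFirst("semantic unit")` gives `Semantic Unit`, and `localize("https://essif-lab.github.io/framework/docs/terms/party", "essif-lab.github.io")` gives `/framework/docs/terms/party`.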
Converter Customization
It may happen that HRG entries must be formatted in a way that hasn't been foreseen, requiring a converter that wasn't predefined. Fortunately, converters can be added by specifying a handlebars template that constructs the appropriate HRG entries from the values that are available in the converter profile of the HRGT. Examples of what such converters might look like are given in the section about predefined converters.
A converter can be specified:
- as a command-line argument
- in (the `hrgt`-section of) the configuration file that the HRGT is specified to use.
In the future, it may become possible to specify interpreters, converters and sorters in the SAF, in which case they will have a name that will then be required for identifying an interpreter, converter, or sorter of one's choice.
HRGT Predefined Sorters
A HRG is a sorted list of HRG entries, where sorting can be done in various ways. By default (i.e. when the `sort` option isn't specified), this is done as specified by the (predefined) `default` sorting option.
The predefined sorting options are as follows:
Predefined option | What it does |
---|---|
default | Sorting of HRG entries is done by using the term field of their corresponding MRG entries as sort value. If multiple entries with the same term field contents exist, these entries are then sorted according to their termType field, making the sort order unique. |
glossaryterm | Sorting of HRG entries is done by using the glossaryTerm field of their corresponding MRG entries as sort value. If the glossaryTerm field does not exist, the sort value is computed using the default sorting method. |
Alternatively, you can specify a handlebars template. Every field in the MRG entry that is being converted can be used as a variable. So, specifying `--sorter "{{glossaryText}}"` would sort the HRG according to the contents of the `glossaryText` field in the MRG entries.
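The two predefined sorting options can be pictured as comparison functions. The TypeScript sketch below is one possible reading of the table above, not the HRGT's actual code: the names `SortableEntry`, `compareDefault` and `compareGlossaryTerm` are hypothetical, and using the `term` field as the fallback sort value for `glossaryterm` is an assumption.

```typescript
// Minimal view of an MRG entry, limited to the fields used for sorting.
interface SortableEntry {
  term: string;
  termType: string;
  glossaryTerm?: string;
}

// `default`: sort on `term`, and on `termType` when the terms are equal.
function compareDefault(a: SortableEntry, b: SortableEntry): number {
  return a.term.localeCompare(b.term) || a.termType.localeCompare(b.termType);
}

// `glossaryterm`: sort on `glossaryTerm` when present, else fall back to `term`
// (assumed fallback), with the default comparison as tie-breaker.
function compareGlossaryTerm(a: SortableEntry, b: SortableEntry): number {
  const left = a.glossaryTerm ?? a.term;
  const right = b.glossaryTerm ?? b.term;
  return left.localeCompare(right) || compareDefault(a, b);
}
```

Either function can be passed to `Array.prototype.sort`, e.g. `entries.sort(compareDefault)`.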
Sorter Customization
It may happen that a sorter is needed that is not predefined. Fortunately, sorters can be added by specifying a handlebars template that constructs a text that will be used as the value (key) on which sorting takes place.
A sorter can be specified:
- as a command-line argument
- in (the `hrgt`-section of) the configuration file that the HRGT is specified to use.
In the future, it may become possible to specify interpreters, converters and sorters in the SAF, in which case they will have a name that will then be required for identifying an interpreter, converter, or sorter of one's choice.
Example
Suppose that within the current scope:
- `myterms:test` is the terminology identifier for the terminology that contains definitions for the terms Glossary, Curator and Definition;
- the associated MRG for that terminology has been imported, making that terminology available within the current scope;
- we have a regular markdown file, within which we want to embed a markdown table which lists all definitions from that terminology.
The table would then be specified as follows:
| Term | Description |
| :--- | :---------- |
{% hrg="myterms:test" converter="markdown-table-row" %}
When this markdown file is processed by the HRGT, a new file is created where the above text has been converted into the following:
| Term | Description |
| :--- | :---------- |
| Glossary | an alphabetically sorted list of <a href="/tev2-specifications/docs/terms/term" title="Term: a word or phrase (i.e.: text) that is used to represent Concepts.">terms</a> with the (single) meaning it has in (at least) one context. |
| Curator (of a Scope) | a person responsible for curating, managing, and maintaining the <a href="/tev2-specifications/docs/terms/terminology" title="Terminology: a set of Terms that are used within a single Scope to refer to Concepts and other Semantic Units of a single Party (e.g. a Community), enabling Parties to reason and communicate ideas they have about one or more specific topics.">terminologies</a>, to ensure shared understanding among a <a href="https://essif-lab.github.io/framework/docs/terms/community" title="Community: a Party, consisting of at least two different Parties (the members of the Community) that seek to collaborate with each other so that each of them can achieve its individual Objectives more efficiently and/or effectively.">community</a> working together on a particular set of <a href="/framework/docs/terms/objective" title="Objective: Something toward which a Party (its Owner) directs effort (an aim, goal, or end of action).">objectives</a>. |
| Definition | the combination of a <a href="/tev2-specifications/docs/terms/term" title="Term: a word or phrase (i.e.: text) that is used to represent Concepts.">term</a> and a descriptive text, where the <a href="/tev2-specifications/docs/terms/term" title="Term: a word or phrase (i.e.: text) that is used to represent Concepts.">term</a> refers to a <a href="/tev2-specifications/docs/terms/concept" title="Concept: a Semantic Unit that captures the ideas/thoughts behind a classification of Entities (what makes Entities in that class 'the same').">concept</a> or other <a href="/tev2-specifications/docs/terms/semantic-unit" title="Semantic Unit: a basic building block of meaning or representation that exists within the 'mind' of a Party (i.e., in its Knowledge).">semantic unit</a>, and the descriptive text enables a set of <a href="https://essif-lab.github.io/framework/docs/terms/party" title="Party: an Entity that sets its Objectives, maintains its Knowledge, and uses that Knowledge to pursue its Objectives in an autonomous (sovereign) manner. Humans and Organizations are the typical examples.">parties</a> to have the same understanding about that <a href="/tev2-specifications/docs/terms/concept" title="Concept: a Semantic Unit that captures the ideas/thoughts behind a classification of Entities (what makes Entities in that class 'the same').">concept</a>. Ideally, the descriptive text is a <a href="/tev2-specifications/docs/terms/criterion" title="Criterion: (aka: criteria) a text that people can evaluate to base a judgment or decision on.">criterion</a> that such <a href="https://essif-lab.github.io/framework/docs/terms/party" title="Party: an Entity that sets its Objectives, maintains its Knowledge, and uses that Knowledge to pursue its Objectives in an autonomous (sovereign) manner. Humans and Organizations are the typical examples.">parties</a> can use to determine what is, and what is not, an instance (or example) of that <a href="/tev2-specifications/docs/terms/concept" title="Concept: a Semantic Unit that captures the ideas/thoughts behind a classification of Entities (what makes Entities in that class 'the same').">concept</a>. |
Further examples are provided on the Glossary Generation Demo page.
Errors and Warnings
The HRGT starts by reading its command line and optional configuration file. If the command line has a key that is also found in the configuration file, the command line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged. Improper syntax may be e.g. an invalid globpattern. Invalid conditions include non-existing directories or files, lack of write-permissions where needed, etc.
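The precedence rule described above (command line over configuration file) can be illustrated with a small TypeScript sketch. The function `mergeOptions` and the `ToolOptions` type are hypothetical; the sketch only shows the merge, and leaves out the syntax and validity checks.

```typescript
// Key-value pairs as they might come from the configuration file or the command line.
type ToolOptions = Record<string, string | boolean | undefined>;

function mergeOptions(configFile: ToolOptions, commandLine: ToolOptions): ToolOptions {
  // Start from the configuration file, then let command-line keys overwrite it.
  const merged: ToolOptions = { ...configFile };
  for (const [key, value] of Object.entries(commandLine)) {
    if (value !== undefined) {
      merged[key] = value;
    }
  }
  return merged;
}
```

For example, if the configuration file sets `output` to `out` and the command line sets it to `build`, the merged result uses `build`, while keys that appear only in the configuration file are kept as they are.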
The HRGT logs every error- and/or warning condition that it comes across while processing its configuration file, command line parameters, and input files, in a way that helps tool-operators and document authors to identify and fix such conditions.
Deploying the Tool
The HRGT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.