
Form Phrase

A Form Phrase is a word or phrase that occurs in oral or written texts and that refers to a particular semantic unit, yet is not (necessarily) the term that is used in the definition of that semantic unit. Form phrases can be, e.g., plural forms, possessive extensions, verb-conjugation forms, abbreviations, and other variations.

Examples
TermRefs such as `[party](@)`, `[parties](@)` or `[party(s)](@)` should all refer to the same semantic unit. This is achieved by specifying "party", "parties", and "party(s)" as form phrases for that semantic unit in the curated text that documents that unit.

Purpose

Form phrases serve as (standardized, human readable) identifiers for semantic units, enabling consistent and unambiguous references across various texts such as manuals, specifications, and guidelines. This is particularly useful (if not vital) in fields where precise terminology is key, ensuring that all stakeholders have a common understanding of the terms used and thereby reducing the potential for misinterpretation or confusion.

Specifying Form Phrases in Curated Texts

Form phrases are specified in the `formPhrases` field of the header of the curated text that describes the semantic unit to which they refer. Here is an example:
formPhrases: [ "actor", "actors", "actor's", "actor(s)", "human actor", "machine actor" ]

This specifies that whenever a TermRef is being converted by the TRRT, and the showtext or the term parts of that TermRef are any of these formPhrases, then the TermRef refers to the semantic unit that is documented by that curated text.

Note that a form phrase specification may include a form phrase macro, several of which are predefined. The example below is equivalent to the specification above:

formPhrases: [ "actor{ss}", "human actor", "machine actor" ]

The same variations can easily be added for the human and machine actors, as follows:

formPhrases: [ "actor{ss}", "human actor{ss}", "machine actor{ss}" ]
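To make the effect of a macro concrete, here is a minimal sketch of how such an expansion could work. It is not the toolkit's implementation. The macro table contains only `{ss}`, with values inferred from the equivalence between the two `actor` examples above; the function names and the way the table is organized are assumptions made for illustration.

```python
import re
from itertools import product

# Assumed macro table. Only {ss} is listed, and its values are inferred from
# the examples above ("actor{ss}" == "actor", "actors", "actor's", "actor(s)").
# The real set of predefined macros may be larger or defined differently.
MACROS = {
    "{ss}": ["", "s", "'s", "(s)"],
}

_MACRO_RE = re.compile(r"\{[a-z]+\}")


def expand_form_phrase(phrase):
    """Return all macro-free variants of a single form phrase."""
    parts = []  # one list of alternatives per segment of the phrase
    pos = 0
    for m in _MACRO_RE.finditer(phrase):
        parts.append([phrase[pos:m.start()]])             # literal text
        parts.append(MACROS.get(m.group(), [m.group()]))  # unknown macros stay as-is
        pos = m.end()
    parts.append([phrase[pos:]])
    return ["".join(combo) for combo in product(*parts)]


def expand_all(phrases):
    """Expand a whole formPhrases list, dropping duplicates but keeping order."""
    result = []
    for phrase in phrases:
        for variant in expand_form_phrase(phrase):
            if variant not in result:
                result.append(variant)
    return result


print(expand_all(["actor{ss}", "human actor", "machine actor"]))
```

Under these assumptions, the printed list is the same as the first, macro-free `formPhrases` example.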

Form Phrases in MRGs

The MRGT creates MRG entries from the headers of curated texts. When such a header contains form phrases that use macros, these form phrases are processed in two steps:

  1. they are converted into a set of form phrases that no longer contain such macros; this process is called Form Phrase Macro Expansion;
  2. each of the form phrases in that set is then regularized, so that tools can easily use them for matching.

Matching Form Phrases

Using (or: matching) form phrases is the process in which for a given word or phrase, it is determined whether or not it refers to a particular semantic unit. This is done, e.g., by the TRRT as it tries to find an MRG entry that corresponds with the showtext field of a TermRef.

This matching process expects the MRG entries in a designated MRG to contain a formPhrases field that is an array of regularized form phrases that no longer contain form phrase macros. The matching process proceeds as follows:

  1. Regularize the given word or phrase;
  2. Find all MRG entries that have the result as an entry in their formPhrases field;
  3. If there is exactly one such MRG entry, then the text is a form phrase for the semantic unit described by that MRG entry.

It is possible that there is no matching MRG entry.

If multiple MRG entries match, that is an error condition that should not happen. Such conditions are typically flagged as an error, e.g., by the MRGT, and they need to be resolved.
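The three steps and the two special cases (no match, multiple matches) can be sketched as follows. This is an illustration under stated assumptions, not the TRRT's code. The `regularize` function is a placeholder that only lowercases and normalizes whitespace, because the actual regularization rules are not described here. The entry layout (a dict with `term` and `formPhrases` keys) and the exception name are also hypothetical.

```python
import re


class AmbiguousFormPhraseError(Exception):
    """Raised when more than one MRG entry claims the same form phrase."""


def regularize(text):
    # Placeholder regularization (an assumption): trim, lowercase, and collapse
    # runs of whitespace. The real rules may do more than this.
    return re.sub(r"\s+", " ", text.strip().lower())


def match_form_phrase(text, mrg_entries):
    """Return the single MRG entry that `text` refers to, or None if there is none."""
    key = regularize(text)                                    # step 1
    hits = [entry for entry in mrg_entries
            if key in entry.get("formPhrases", [])]           # step 2
    if not hits:
        return None                                           # no matching entry
    if len(hits) > 1:                                         # error condition
        terms = ", ".join(entry["term"] for entry in hits)
        raise AmbiguousFormPhraseError(f"{text!r} matches several entries: {terms}")
    return hits[0]                                            # step 3


# Hypothetical MRG entries whose formPhrases are already expanded and regularized.
mrg_entries = [
    {"term": "actor", "formPhrases": ["actor", "actors", "actor's", "actor(s)"]},
    {"term": "party", "formPhrases": ["party", "parties", "party(s)"]},
]

entry = match_form_phrase("Parties", mrg_entries)
print(entry["term"] if entry else "no match")
```

Returning `None` for the no-match case and raising for the ambiguous case keeps the two outcomes apart: the first is a normal result, the second signals a problem in the terminology that needs to be resolved.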

Guidance for Specifying Form Phrases in Curated Texts

  1. Character Composition: A form phrase is composed of a sequence of characters that may include letters, numbers, and spaces. Spaces are permissible if they are a standard part of the term (e.g., "hard drive").

  2. Limited Special Characters: Generally, a form phrase should not contain special characters like punctuation marks (.,;:!? etc.), except for hyphens, underscores, or other characters if they are an integral part of the term (e.g., "non-refundable", "e-mail").

  3. Case Sensitivity: While a form phrase may include uppercase or lowercase letters, it is typically treated as case-insensitive during the matching process. This ensures that variations in capitalization do not affect the identification of the term.

  4. Adherence to Language Rules: A form phrase should conform to the grammatical and morphological rules of the language it's used in, including correct spelling and, where applicable, pluralization or possessive forms.

  5. Uniqueness within Context: Each form phrase must be unique within its context or domain to avoid ambiguities. It should not overlap with or be a substring of another form phrase within the same set of terms.

  6. Contextual Relevance: The form phrase should be relevant to its context and accurately represent the term or concept it's associated with, aligning with domain-specific terminology and usage.
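Two of these guidelines, limited special characters and uniqueness, lend themselves to a mechanical check. The sketch below is a hypothetical lint helper, not part of any existing tool. The set of allowed characters is an assumption: it admits apostrophes and parentheses because they occur in the examples above. Uniqueness is checked only as exact, case-insensitive equality across semantic units.

```python
import re
from collections import defaultdict

# Assumed character policy: letters, digits, underscores, spaces, hyphens,
# apostrophes, and parentheses. Adjust to the conventions of your own scope.
_ALLOWED = re.compile(r"^[\w '()\-]+$")


def lint_form_phrases(units):
    """Check macro-free form phrases.

    `units` maps a (hypothetical) term identifier to its list of form phrases.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    owners = defaultdict(set)  # lowercased phrase -> term ids that use it
    for unit, phrases in units.items():
        for phrase in phrases:
            if not _ALLOWED.match(phrase):
                problems.append(f"{unit}: unexpected character in {phrase!r}")
            owners[phrase.lower()].add(unit)
    for phrase, who in sorted(owners.items()):
        if len(who) > 1:
            problems.append(f"{phrase!r} is claimed by: {', '.join(sorted(who))}")
    return problems


units = {
    "actor": ["actor", "actors", "actor's"],
    "party": ["party", "Actors", "party?"],
}
for problem in lint_form_phrases(units):
    print(problem)
```

Running such a check before MRG generation would surface duplicates early, rather than as multiple-match errors during matching.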