2. Utility Interface Model

Note

A utility interface model (UIM) is a central aspect of the cmdse model since it enables all rich-semantic features. manparse is a tool developed along with cmdse to extract such UIMs from manpages; see Section 7.

It is defined as follows:

Structured data describing the command line interface capabilities of a utility executable identified by its utility name. The capabilities are defined through:

These are defined for a particular version range. The term “utility” is directly borrowed from the POSIX.1-2008 reference[1].

2.1. Synopses

The POSIX.1-2008 reference[1] strictly defines the syntax of a utility (or command) synopsis:

utility_name[-a][-b][-c option_argument][-d|-e][-f[option_argument]][operand…]

Overall, this standard syntax is well defined. The doclifter[3] author reports a 93% success rate for its manpage-to-DocBook extractor on a bare Ubuntu install.

2.1.1. POSIX.1-2008 Strict Rules

The following syntax rules are non-exhaustive but give a quick overview of the standard:

  • Options are denoted with hyphen - prefixes and separated by blank characters.

  • Optional words are enclosed between square brackets [].

  • Exclusive expressions are denoted with the pipe | character.

  • Alternatively, mutually-exclusive options and operands may be listed with multiple synopsis lines. For example:

    utility_name -d[-a][-c option_argument][operand…]
    utility_name[-a][-b][operand…]
  • Repeatable expressions are followed by an ellipsis character (…) or three dot characters (...).

  • Names that require substitution may be enclosed in angle brackets <> or written with embedded underscore _ characters (non-mandatory).

  • Utilities with many flags generally show all of the individual flags (that do not take option-arguments) grouped, as in:

    utility_name [-abcDxyz][-p arg][operand]

  • Utilities with very complex arguments may be shown as follows:

    utility_name [options][operands]

  • Unless otherwise specified, whenever an operand or option-argument is, or contains, a numeric value, the number is interpreted as a decimal integer.
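To make the rules above concrete, here is a minimal sketch of a tokenizer for a simplified subset of the synopsis syntax. It is an illustration only, not the parser used by manparse: the function name tokenize_synopsis, the token kinds, and the regular expressions are all assumptions made for this example.

```python
import re

# Token kinds for a simplified subset of the POSIX synopsis syntax.
# The names and patterns are assumptions made for this sketch.
TOKEN_RE = re.compile(r"""
    (?P<open>\[)                  # start of an optional expression
  | (?P<close>\])                 # end of an optional expression
  | (?P<pipe>\|)                  # exclusive alternative
  | (?P<ellipsis>\.\.\.|…)        # repeatable expression
  | (?P<option>-[A-Za-z0-9]+)     # one option or several grouped flags
  | (?P<word>[A-Za-z0-9_<>]+)     # utility name, operand or option-argument
  | (?P<blank>\s+)                # separator, dropped from the output
""", re.VERBOSE)


def tokenize_synopsis(synopsis):
    """Split a synopsis line into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(synopsis):
        m = TOKEN_RE.match(synopsis, pos)
        if m is None:
            raise ValueError(f"unexpected character {synopsis[pos]!r} at {pos}")
        if m.lastgroup != "blank":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize_synopsis("utility_name[-a][-c option_argument][-d|-e][operand...]"):
        print(token)
```

A real extractor would then need to build a tree from the bracket tokens and to handle the many manpages that deviate from the standard syntax; this sketch stops at the lexical level.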

2.1.2. POSIX.1-2008 Guidance Rules

The POSIX.1-2008 reference[1] also defines guidelines which utilities should implement.

These guidelines are non-mandatory, but many are implemented in Unix system utilities. The list below is non-exhaustive and retains the rules which might affect the cmdse project:

  • G1, G2 Utility names should be between two and nine characters, inclusive, and should include lowercase letters (the lower character classification) and digits only from the portable character set.
  • G3 Each option name should be a single alphanumeric character (the alnum character classification) from the portable character set. Multi-digit options should not be allowed.
  • G4 All options should be preceded by the '-' delimiter character.
  • G5 One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one - delimiter.
  • G6 Each option and option-argument should be a separate argument, except as noted in Utility Argument Syntax, item (2).
  • G8 When multiple option-arguments are specified to follow a single option, they should be presented as a single argument, using comma , characters within that argument or blank characters within that argument to separate them.
  • G9 All options should precede operands on the command line.
  • G10 The first -- argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the - character.
  • G11 The order of different options relative to one another should not matter, unless the options are documented as mutually-exclusive.
  • G12 The order of operands may matter and position-related interpretations should be determined on a utility-specific basis.
  • G13 For utilities that use operands to represent files to be opened for either reading or writing, the - operand should be used to mean only standard input (or standard output when it is clear from context that an output file is being specified) or a file named '-'.
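Several of these guidelines can be observed with the getopt module from the Python standard library, whose getopt.getopt function follows the traditional Unix getopt() conventions. The sketch below assumes a hypothetical utility with two flags, -a and -b, and one option -c that takes an option-argument; the option string and the helper name split_args are illustrative only.

```python
import getopt

# Hypothetical utility: flags -a and -b, and -c which takes an option-argument.
OPTSTRING = "abc:"


def split_args(argv):
    """Return (options, operands) for argv, utility name excluded."""
    return getopt.getopt(argv, OPTSTRING)


if __name__ == "__main__":
    # G5: flags grouped behind one '-', the last one taking an option-argument.
    print(split_args(["-abc", "value", "file"]))
    # G9: parsing stops at the first operand, so the trailing -a is an operand.
    print(split_args(["file", "-a"]))
    # G10: '--' ends the options, so -b is treated as an operand.
    print(split_args(["-a", "--", "-b"]))
    # G13: a lone '-' is an operand, not an option.
    print(split_args(["-a", "-"]))
```

Note that getopt.gnu_getopt, from the same module, relaxes G9 by allowing options and operands to be intermixed, which is one of the reasons the accepted non-POSIX rules need to be listed separately.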

2.1.3. Accepted non-POSIX rules

  • POSIX guideline G3 must be extended with GNU-style and X-Toolkit-style options.

to be continued

2.2. Option Description Model

An option description model is a set of option descriptions. An option description is defined as follows:

Structured data composed of a description text field and a collection of match models. Each match model is related to an option expression variant and has a regular expression with one or two groups. When two groups can be matched, the latter is the option parameter of an explicit option assignment.

It is traditionally found in the “OPTIONS” section of Linux manual pages. Below is an invented example:

OPTIONS
    -h, --help
        Display help.

    -a, --all
        Select all items.

     ...
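The definition above can be sketched as a pair of small data classes. This is a hedged illustration rather than the actual cmdse data model: the class names MatchModel and OptionDescription, their fields, and the regular expressions are assumptions, and the --color option is a hypothetical addition used only to show a two-group match.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MatchModel:
    variant: str         # name of the option expression variant
    pattern: re.Pattern  # regular expression with one or two groups

    def match(self, expression):
        """Return (option, parameter) or None; parameter is None for switches."""
        m = self.pattern.fullmatch(expression)
        if m is None:
            return None
        groups = m.groups()
        return groups[0], (groups[1] if len(groups) > 1 else None)


@dataclass
class OptionDescription:
    description: str
    match_models: list = field(default_factory=list)

    def match(self, expression):
        """Return (variant, option, parameter) for the first matching model."""
        for model in self.match_models:
            result = model.match(expression)
            if result is not None:
                return (model.variant,) + result
        return None


# "-a, --all" from the invented OPTIONS example: one group per model.
all_option = OptionDescription(
    "Select all items.",
    [
        MatchModel("POSIX_SHORT_SWITCH", re.compile(r"(-a)")),
        MatchModel("GNU_SWITCH", re.compile(r"(--all)")),
    ],
)

# Hypothetical option with an explicit assignment: the second group
# captures the option parameter.
color_option = OptionDescription(
    "Set the color mode.",
    [MatchModel("GNU_EXPLICIT_ASSIGNMENT", re.compile(r"(--color)=(.*)"))],
)

if __name__ == "__main__":
    print(all_option.match("--all"))
    print(color_option.match("--color=auto"))
```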

2.2.1. Option Expression Variants

Three option styles exist in the Unix world:

  1. POSIX Style
  2. GNU Style
  3. X Toolkit Style

Table 2.1 lists the different option expression variants and their corresponding style.

Table 2.1 Option expression variants

| Expression (assigned value in “<>”) | Variant | Description | Style | Prevalence |
|---|---|---|---|---|
| -o | POSIX_SHORT_SWITCH | One-letter option switch | POSIX | Very common |
| -opq | POSIX_GROUPED_SHORT_FLAGS | Stacked one-letter option switches. This is equivalent to -o -p -q. | POSIX | Common |
| -o <value> | POSIX_SHORT_ASSIGNMENT | One-letter option switch with value assignment | POSIX | Very common |
| -o<value> | POSIX_SHORT_STICKY_VALUE | One-letter option switch with integer sticky value | POSIX | Common |
| -option | X2LKT_SWITCH | Long option switch | X Toolkit | Less common |
| +option | X2LKT_REVERSE_SWITCH | Long option switch reset (xterm(1)) | X Toolkit | Rare |
| -option <value> | X2LKT_IMPLICIT_ASSIGNMENT | Long option switch with implicit value assignment | X Toolkit | Less common |
| -option=<value> | X2LKT_EXPLICIT_ASSIGNMENT | Long option switch with explicit value assignment | X Toolkit | Less common |
| --option | GNU_SWITCH | Long option switch | GNU | Very common |
| --option <value> | GNU_IMPLICIT_ASSIGNMENT | Long option switch with implicit value assignment | GNU | Very common |
| --option=<value> | GNU_EXPLICIT_ASSIGNMENT | Long option switch with explicit value assignment | GNU | Very common |
| -- | POSIX_END_OF_OPTIONS | Signals the end of options, i.e. upcoming arguments must be treated as operands[2] | POSIX | Common |
| option | HEADLESS_OPTION | An “old style” option, see tar(1)[4] for an example. | NONE | Very rare |
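Taken in isolation, a single command-line argument can be compatible with several of the variants in Table 2.1. The sketch below illustrates this with one regular expression per single-argument variant. The patterns are assumptions written for this example and are not taken from cmdse or manparse. Variants whose value is a separate argument (the implicit and short assignments) and HEADLESS_OPTION, which is lexically indistinguishable from an operand, are left out.

```python
import re

# One assumed pattern per single-argument variant of Table 2.1.
VARIANT_PATTERNS = {
    "POSIX_END_OF_OPTIONS": r"--",
    "POSIX_SHORT_SWITCH": r"-[A-Za-z0-9]",
    "POSIX_GROUPED_SHORT_FLAGS": r"-[A-Za-z0-9]{2,}",
    "POSIX_SHORT_STICKY_VALUE": r"-[A-Za-z0-9].+",
    "X2LKT_SWITCH": r"-[A-Za-z][A-Za-z0-9-]+",
    "X2LKT_REVERSE_SWITCH": r"\+[A-Za-z][A-Za-z0-9-]+",
    "X2LKT_EXPLICIT_ASSIGNMENT": r"-[A-Za-z][A-Za-z0-9-]+=.*",
    "GNU_SWITCH": r"--[A-Za-z0-9][A-Za-z0-9-]+",
    "GNU_EXPLICIT_ASSIGNMENT": r"--[A-Za-z0-9][A-Za-z0-9-]+=.*",
}


def candidate_variants(argument):
    """Return the names of all variants whose pattern matches the argument."""
    return {
        name
        for name, pattern in VARIANT_PATTERNS.items()
        if re.fullmatch(pattern, argument)
    }


if __name__ == "__main__":
    for argument in ["-o", "-opq", "--all", "--color=auto", "+sb", "--", "file"]:
        print(argument, sorted(candidate_variants(argument)))
```

With these patterns, -opq is compatible with POSIX_GROUPED_SHORT_FLAGS, POSIX_SHORT_STICKY_VALUE and X2LKT_SWITCH at once, so the lexical form alone does not settle the interpretation; knowing which variants a given utility actually supports is needed to disambiguate.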

2.3. Option scheme

An option scheme is a set of option expression variants which delimits the option expressions supported by a utility. A list of presets provided by cmdse is shown in Table 2.2.

Table 2.2 List of option scheme presets
Preset Description Supported option expression variants
POSIX-Strict Option expressions can be composed solely with POSIX-styled variants.
  • POSIX_SHORT_SWITCH
  • POSIX_GROUPED_SHORT_FLAGS
  • POSIX_SHORT_ASSIGNMENT
  • POSIX_END_OF_OPTIONS
Linux-Standard Option expressions can be of any common GNU or POSIX-styled variants. Very often, an option has either one GNU and one POSIX variant, or a single POSIX variant.
  • POSIX_SHORT_SWITCH
  • POSIX_GROUPED_SHORT_FLAGS
  • POSIX_SHORT_ASSIGNMENT
  • GNU_SWITCH
  • GNU_IMPLICIT_ASSIGNMENT
  • GNU_EXPLICIT_ASSIGNMENT
  • POSIX_END_OF_OPTIONS
Linux-Explicit Option expressions can be of any common GNU or POSIX-styled variants, with GNU assignments restricted to the explicit form.
  • POSIX_SHORT_SWITCH
  • POSIX_GROUPED_SHORT_FLAGS
  • POSIX_SHORT_ASSIGNMENT
  • GNU_SWITCH
  • GNU_EXPLICIT_ASSIGNMENT
  • POSIX_END_OF_OPTIONS
Linux-Implicit Option expressions can be of any common GNU or POSIX-styled variants, with GNU assignments restricted to the implicit form.
  • POSIX_SHORT_SWITCH
  • POSIX_GROUPED_SHORT_FLAGS
  • POSIX_SHORT_ASSIGNMENT
  • GNU_SWITCH
  • GNU_IMPLICIT_ASSIGNMENT
  • POSIX_END_OF_OPTIONS
X-Toolkit-Strict Option expressions can be composed solely with X-Toolkit-styled variants.
  • X2LKT_SWITCH
  • X2LKT_REVERSE_SWITCH
  • X2LKT_IMPLICIT_ASSIGNMENT
  • X2LKT_EXPLICIT_ASSIGNMENT
  • POSIX_END_OF_OPTIONS
X-Toolkit-Standard Option expressions can be composed solely with X-Toolkit-styled variants and POSIX short switches.
  • X2LKT_SWITCH
  • X2LKT_REVERSE_SWITCH
  • X2LKT_IMPLICIT_ASSIGNMENT
  • X2LKT_EXPLICIT_ASSIGNMENT
  • POSIX_SHORT_SWITCH
  • POSIX_END_OF_OPTIONS
X-Toolkit-Explicit Option expressions can be composed solely with X-Toolkit-styled variants and POSIX short switches, with assignments restricted to the explicit form.
  • X2LKT_SWITCH
  • X2LKT_REVERSE_SWITCH
  • X2LKT_EXPLICIT_ASSIGNMENT
  • POSIX_SHORT_SWITCH
  • POSIX_END_OF_OPTIONS
X-Toolkit-Implicit Option expressions can be composed solely with X-Toolkit-styled variants and POSIX short switches, with assignments restricted to the implicit form.
  • X2LKT_SWITCH
  • X2LKT_REVERSE_SWITCH
  • X2LKT_IMPLICIT_ASSIGNMENT
  • POSIX_SHORT_SWITCH
  • POSIX_END_OF_OPTIONS
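Since an option scheme is a set of variants, presets can be modelled directly as sets. The sketch below encodes the first four presets of Table 2.2 from their variant lists and shows how a scheme narrows down the possible interpretations of an argument. The constant names and the restrict helper are assumptions made for this illustration, not part of cmdse.

```python
# Presets encoded from the variant lists of Table 2.2.
POSIX_STRICT = frozenset({
    "POSIX_SHORT_SWITCH",
    "POSIX_GROUPED_SHORT_FLAGS",
    "POSIX_SHORT_ASSIGNMENT",
    "POSIX_END_OF_OPTIONS",
})
LINUX_STANDARD = POSIX_STRICT | {
    "GNU_SWITCH",
    "GNU_IMPLICIT_ASSIGNMENT",
    "GNU_EXPLICIT_ASSIGNMENT",
}
LINUX_EXPLICIT = LINUX_STANDARD - {"GNU_IMPLICIT_ASSIGNMENT"}
LINUX_IMPLICIT = LINUX_STANDARD - {"GNU_EXPLICIT_ASSIGNMENT"}


def restrict(candidates, scheme):
    """Keep only the candidate variants that the scheme supports."""
    return set(candidates) & scheme


if __name__ == "__main__":
    # "-opq" is lexically compatible with three variants; under the
    # Linux-Standard preset only the grouped-flags reading remains.
    ambiguous = {
        "POSIX_GROUPED_SHORT_FLAGS",
        "POSIX_SHORT_STICKY_VALUE",
        "X2LKT_SWITCH",
    }
    print(restrict(ambiguous, LINUX_STANDARD))
```

The X-Toolkit presets could be encoded the same way from their own variant lists.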

2.4. Sub-commands

to be written


[1] See POSIX.1-2008, sec. 12.1, “Utility Conventions”
[2] See POSIX.1-2008, sec. 12.1, guideline 10 which states that “The first -- argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the - character.” This behavior is implemented in a great number of bash builtin commands and unix programs.
[3] See Gitlab project
[4] Tar “Old Option Style”