2. Utility Interface Model
Note
A utility interface model (UIM) is a central aspect of the cmdse model, since it enables all of its rich-semantic features. manparse is a tool developed alongside cmdse to extract UIMs from manpages, see Section 7.
It is defined as follows:
Structured data describing the command-line interface capabilities of a utility executable, identified by its utility name. These capabilities are defined through:
- a set of at least one synopsis, see Section 2.1;
- an option description model, which is a set of options and their related expressions, see Section 2.2;
- an option scheme, see Section 2.3;
- an optional set of sub-commands, see Section 2.4.
These capabilities are defined for a particular version range of the utility. The term “utility” is borrowed directly from the POSIX.1-2008 reference[1].
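The definition above maps naturally onto a record type. The following Python sketch is a hypothetical model, not the cmdse API: the class and field names (`UtilityInterfaceModel`, `OptionDescription`, `option_scheme`, `version_range`) are illustrative assumptions, synopses are kept as raw strings, the version range is simplified to a pair of version strings, and modelling sub-commands as nested UIMs is an assumption since Section 2.4 is not written yet.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class OptionDescription:
    expressions: List[str]   # e.g. ["-a", "--all"]
    description: str = ""    # prose taken from the manual page


@dataclass
class UtilityInterfaceModel:
    utility_name: str
    synopses: List[str]                    # must hold at least one synopsis
    options: List[OptionDescription]       # the option description model
    option_scheme: FrozenSet[str]          # allowed option expression variants
    sub_commands: List["UtilityInterfaceModel"] = field(default_factory=list)
    version_range: Optional[Tuple[str, str]] = None  # (lowest, highest) version

    def __post_init__(self):
        # Enforce "a set of at least one synopsis".
        if not self.synopses:
            raise ValueError("a UIM requires at least one synopsis")


# Usage: a partial model of GNU ls.
ls_model = UtilityInterfaceModel(
    utility_name="ls",
    synopses=["ls [OPTION]... [FILE]..."],
    options=[OptionDescription(["-a", "--all"], "do not ignore entries starting with .")],
    option_scheme=frozenset({"POSIX_SHORT_SWITCH", "GNU_SWITCH"}),
)
```

Validating the "at least one synopsis" constraint in `__post_init__` keeps malformed models from being built in the first place.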
2.1. Synopses
The POSIX.1-2008 reference[1] strictly defines the syntax of a utility (or command) synopsis.
This standard syntax is well defined overall: the author of doclifter[3] reports a 93% success rate for its manpage-to-DocBook extractor on a bare Ubuntu install.
2.1.1. POSIX.1-2008 Strict Rules
The following syntax rules are non-exhaustive but give a quick overview of the standard:
- Options are denoted with hyphen ‘-’ prefixes and separated by blank characters.
- Optional words are enclosed between square brackets ‘[]’.
- Exclusive expressions are denoted with the pipe ‘|’ character.
- Alternatively, mutually-exclusive options and operands may be listed with multiple synopsis lines. For example:
    utility_name -d [-a] [-c option_argument] [operand…]
    utility_name [-a] [-b] [operand…]
- Repeatable expressions are followed by an ellipsis ‘…’ or three dots ‘...’.
- Names that require substitution may be enclosed in angle brackets ‘<>’ or embedded with underscore ‘_’ characters (non-mandatory).
- Utilities with many flags generally show all of the individual flags (that do not take option-arguments) grouped, as in:
    utility_name [-abcDxyz] [-p arg] [operand]
  Utilities with very complex arguments may be shown as follows:
    utility_name [options] [operands]
- Unless otherwise specified, whenever an operand or option-argument is, or contains, a numeric value, the number is interpreted as a decimal integer.
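The bracket, pipe and ellipsis rules above form a small recursive grammar. The Python sketch below parses a single synopsis line into a nested tuple tree. It is a hypothetical illustration, not the manparse implementation: the function names and the node labels `word`, `optional`, `choice`, `repeat` and `seq` are assumptions, and `<name>` placeholders or grouped flags such as `-abc` are simply kept as plain words.

```python
import re

# Tokens: '[', ']', '|', '...', or a word that ends right before one of them.
TOKEN_RE = re.compile(r"\[|\]|\||\.\.\.|[^\s\[\]|]+?(?=\.\.\.|[\s\[\]|]|$)")


def tokenize(line):
    # Normalise the single-character ellipsis '…' to three dots first.
    return TOKEN_RE.findall(line.replace("…", "..."))


def parse_synopsis(line):
    """Parse one synopsis line into (utility_name, expression tree)."""
    tokens = tokenize(line)
    if not tokens:
        raise ValueError("empty synopsis")
    pos = 1  # tokens[0] is the utility name

    def parse_alternatives():
        # alternatives := sequence ('|' sequence)*
        nonlocal pos
        branches = [parse_sequence()]
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            branches.append(parse_sequence())
        return branches[0] if len(branches) == 1 else ("choice", branches)

    def parse_sequence():
        # sequence := item*, stopping at ']' or '|'
        items = []
        while pos < len(tokens) and tokens[pos] not in ("]", "|"):
            items.append(parse_item())
        return items[0] if len(items) == 1 else ("seq", items)

    def parse_item():
        # item := ('[' alternatives ']' | word) ['...']
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            node = ("optional", parse_alternatives())
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("unbalanced '[' in synopsis")
            pos += 1
        elif tok == "...":
            raise ValueError("ellipsis without a preceding expression")
        else:
            node = ("word", tok)
        if pos < len(tokens) and tokens[pos] == "...":
            pos += 1
            node = ("repeat", node)
        return node

    tree = parse_alternatives()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tokens[0], tree


name, tree = parse_synopsis("utility_name [-d|-e] [operand…]")
print(name, tree)
# utility_name ('seq', [('optional', ('choice', [('word', '-d'), ('word', '-e')])), ('optional', ('repeat', ('word', 'operand')))])
```

A recursive-descent parser is used here because optional groups can nest arbitrarily, which a single regular expression cannot track.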
2.1.2. POSIX.1-2008 Guidance Rules
The POSIX.1-2008 reference[1] also defines utility syntax guidelines. These guidelines are not mandatory, but many of them are implemented by Unix system utilities. The following list is non-exhaustive and retains only the rules that might affect the cmdse project:
- G1, G2: Utility names should be between two and nine characters, inclusive, and should include lowercase letters (the lower character classification) and digits only from the portable character set.
- G3: Each option name should be a single alphanumeric character (the alnum character classification) from the portable character set. Multi-digit options should not be allowed.
- G4: All options should be preceded by the ‘-’ delimiter character.
- G5: One or more options without option-arguments, followed by at most one option that takes an option-argument, should be accepted when grouped behind one ‘-’ delimiter.
- G6: Each option and option-argument should be a separate argument, except as noted in Utility Argument Syntax, item (2).
- G8: When multiple option-arguments are specified to follow a single option, they should be presented as a single argument, using comma ‘,’ characters within that argument or blank characters within that argument to separate them.
- G9: All options should precede operands on the command line.
- G10: The first ‘--’ argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the ‘-’ character.
- G11: The order of different options relative to one another should not matter, unless the options are documented as mutually-exclusive.
- G12: The order of operands may matter and position-related interpretations should be determined on a utility-specific basis.
- G13: For utilities that use operands to represent files to be opened for either reading or writing, the ‘-’ operand should be used to mean only standard input (or standard output when it is clear from context that an output file is being specified) or a file named ‘-’.
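Several of these guidelines can be observed with Python's standard `getopt` module, whose API is modelled on the C `getopt()` function and which, unlike GNU getopt, stops at the first operand. The short-option string `"abc:"` (flags `-a` and `-b`, plus `-c` taking an option-argument) is an arbitrary example.

```python
import getopt

# -a and -b are flags; the trailing ':' means -c takes an option-argument.
SHORT_OPTS = "abc:"

# G5: flags grouped behind one '-', the last one taking an option-argument.
opts, operands = getopt.getopt(["-abc", "value", "file"], SHORT_OPTS)
print(opts, operands)  # [('-a', ''), ('-b', ''), ('-c', 'value')] ['file']

# G10: everything after the first '--' is an operand, even '-b'.
opts, operands = getopt.getopt(["-a", "--", "-b"], SHORT_OPTS)
print(opts, operands)  # [('-a', '')] ['-b']

# G9: parsing stops at the first operand, so the trailing '-b' is an operand.
opts, operands = getopt.getopt(["-a", "file", "-b"], SHORT_OPTS)
print(opts, operands)  # [('-a', '')] ['file', '-b']

# G13: a lone '-' is not treated as an option; it is left as an operand.
opts, operands = getopt.getopt(["-a", "-"], SHORT_OPTS)
print(opts, operands)  # [('-a', '')] ['-']
```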
2.1.3. Accepted non-POSIX rules
- POSIX guideline G3 must be extended to accept GNU-style and X-Toolkit-style (multi-character) options.
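Python's `getopt.gnu_getopt()` gives a quick feel for the GNU extension: it accepts long options and lets options and operands be interleaved, contrary to G9. The last call shows why X-Toolkit-style options need dedicated handling: a parser following POSIX conventions reads `-all` as three grouped flags. The option names are arbitrary examples.

```python
import getopt

# GNU style: long options, with "output=" declaring that --output takes a value.
# Assumes the POSIXLY_CORRECT environment variable is unset; otherwise parsing
# would stop at the first operand, as with getopt.getopt().
opts, operands = getopt.gnu_getopt(
    ["file", "--all", "--output=out.txt", "-b"], "ab", ["all", "output="]
)
print(opts)      # [('--all', ''), ('--output', 'out.txt'), ('-b', '')]
print(operands)  # ['file']

# X-Toolkit style "-all" is indistinguishable from grouped POSIX flags for a
# POSIX parser: it is read as -a -l -l.
opts, _ = getopt.getopt(["-all"], "al")
print(opts)      # [('-a', ''), ('-l', ''), ('-l', '')]
```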
To be continued.
2.2. Option Description Model
An option description model is a set of option descriptions. The latter is defined as follows:
Option descriptions are traditionally found in the “OPTIONS” section of Linux manual pages. Below is an invented example:
OPTIONS
       -h, --help
              Display help.
       -a, --all
              Select all items.
       ...
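As a rough illustration of how option descriptions can be recovered from such a section, the Python sketch below groups each option header line (starting with ‘-’ or ‘+’) with the description lines that follow it. This is a hypothetical heuristic, not how manparse works: `parse_options_section` is an illustrative name, and real manpages, with roff markup and descriptions that may themselves start with ‘-’, need a more careful parser.

```python
# Invented OPTIONS section, as rendered on a manual page.
SAMPLE = """\
OPTIONS
       -h, --help
              Display help.
       -a, --all
              Select all items.
"""


def parse_options_section(text):
    """Return a list of (option expressions, description) pairs."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line in ("OPTIONS", "..."):
            continue  # skip the section title, blank lines and elisions
        if line[0] in "-+":
            # Option header: comma-separated expressions such as "-h, --help".
            expressions = [e.strip() for e in line.split(",")]
            entries.append((expressions, ""))
        elif entries:
            # Description line: append it to the most recent option header.
            expressions, description = entries[-1]
            entries[-1] = (expressions, f"{description} {line}".strip())
    return entries


for expressions, description in parse_options_section(SAMPLE):
    print(expressions, "->", description)
# ['-h', '--help'] -> Display help.
# ['-a', '--all'] -> Select all items.
```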
2.2.1. Option Expression Variants
Three option styles exist in the Unix world: POSIX, X Toolkit and GNU.
Table 2.1 lists the option expression variants along with their corresponding style and prevalence.
| Expression (assigned value in “<>”) | Variant | Description | Style | Prevalence |
|---|---|---|---|---|
| -o | POSIX_SHORT_SWITCH | One-letter option switch | POSIX | Very common |
| -opq | POSIX_GROUPED_SHORT_FLAGS | One-letter option stack switch, equivalent to -o -p -q | POSIX | Common |
| -o <value> | POSIX_SHORT_ASSIGNMENT | One-letter option switch with value assignment | POSIX | Very common |
| -o<value> | POSIX_SHORT_STICKY_VALUE | One-letter option switch with integer sticky value | POSIX | Common |
| -option | X2LKT_SWITCH | Long option switch | X Toolkit | Less common |
| +option | X2LKT_REVERSE_SWITCH | Long option switch reset (xterm(1)) | X Toolkit | Rare |
| -option <value> | X2LKT_IMPLICIT_ASSIGNMENT | Long option switch with implicit value assignment | X Toolkit | Less common |
| -option=<value> | X2LKT_EXPLICIT_ASSIGNMENT | Long option switch with explicit value assignment | X Toolkit | Less common |
| --option | GNU_SWITCH | Long option switch | GNU | Very common |
| --option <value> | GNU_IMPLICIT_ASSIGNMENT | Long option switch with implicit value assignment | GNU | Very common |
| --option=<value> | GNU_EXPLICIT_ASSIGNMENT | Long option switch with explicit value assignment | GNU | Very common |
| -- | POSIX_END_OF_OPTIONS | Signals the end of options, i.e. upcoming arguments must be treated as operands[2] | GNU | Common |
| option | HEADLESS_OPTION | An “old style” option, see tar(1)[4] for an example | NONE | Very rare |
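Table 2.1 can be turned into an ordered list of patterns, as in the hypothetical Python sketch below (`classify` and its regular expressions are assumptions, not the cmdse grammar). It returns the set of candidate variants for an expression written as in the table. Note that `-opq` and `-option` have the same shape, so the expression alone cannot tell POSIX_GROUPED_SHORT_FLAGS from X2LKT_SWITCH.

```python
import re

# Ordered (pattern, candidate variants) pairs mirroring Table 2.1; the first
# full match wins. "<value>" stands for any placeholder written in "<>".
VARIANT_PATTERNS = [
    (r"--", {"POSIX_END_OF_OPTIONS"}),
    (r"--\w[\w-]*=<\w+>", {"GNU_EXPLICIT_ASSIGNMENT"}),
    (r"--\w[\w-]* <\w+>", {"GNU_IMPLICIT_ASSIGNMENT"}),
    (r"--\w[\w-]*", {"GNU_SWITCH"}),
    (r"-\w<\w+>", {"POSIX_SHORT_STICKY_VALUE"}),
    (r"-\w <\w+>", {"POSIX_SHORT_ASSIGNMENT"}),
    (r"-\w", {"POSIX_SHORT_SWITCH"}),
    (r"-\w[\w-]*=<\w+>", {"X2LKT_EXPLICIT_ASSIGNMENT"}),
    (r"-\w[\w-]* <\w+>", {"X2LKT_IMPLICIT_ASSIGNMENT"}),
    # "-opq" and "-option" share one shape: only context can tell them apart.
    (r"-\w[\w-]*", {"POSIX_GROUPED_SHORT_FLAGS", "X2LKT_SWITCH"}),
    (r"\+\w[\w-]*", {"X2LKT_REVERSE_SWITCH"}),
    (r"\w[\w-]*", {"HEADLESS_OPTION"}),
]


def classify(expression):
    """Return the set of Table 2.1 variants an option expression may belong to."""
    for pattern, variants in VARIANT_PATTERNS:
        if re.fullmatch(pattern, expression):
            return set(variants)
    raise ValueError(f"unrecognised option expression: {expression!r}")


print(classify("--option=<value>"))  # {'GNU_EXPLICIT_ASSIGNMENT'}
print(classify("-o <value>"))        # {'POSIX_SHORT_ASSIGNMENT'}
print(sorted(classify("-opq")))      # ['POSIX_GROUPED_SHORT_FLAGS', 'X2LKT_SWITCH']
```

The order of the list matters: the one-letter POSIX patterns are tried before the more general X-Toolkit ones, so `-o <value>` is not mistaken for an X-Toolkit implicit assignment.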
2.3. Option scheme
An option scheme is a set of option expression variants which delimits the option expressions supported by a utility. A list of presets provided by cmdse is shown in Table 2.2.
Preset | Description | Supported option expression variants |
---|---|---|
POSIX-Strict | Option expressions can be composed solely of POSIX-styled variants. | |
Linux-Standard | Option expressions can use any common GNU or POSIX-styled variant. Very often, an option has either one GNU and one POSIX variant, or a single POSIX variant. | |
Linux-Explicit | Option expressions can use any common GNU or POSIX-styled variant with explicit assignments. | |
Linux-Implicit | Option expressions can use any common GNU or POSIX-styled variant with implicit assignments. | |
X-Toolkit-Strict | Option expressions can be composed solely of X-Toolkit-styled variants. | |
X-Toolkit-Standard | Option expressions can be composed solely of X-Toolkit-styled variants and POSIX short variants. | |
X-Toolkit-Explicit | Option expressions can be composed solely of X-Toolkit-styled variants and POSIX short variants. | |
X-Toolkit-Implicit | Option expressions can be composed solely of X-Toolkit-styled variants and POSIX short variants. | |
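Since an option scheme is a set of variants, checking a utility against it, or using it to settle the ambiguity between grouped POSIX flags and X-Toolkit switches, reduces to set operations. In the Python sketch below only the two strict presets are spelled out, derived by assumption from the Style column of Table 2.1; the variant lists of the other cmdse presets are not reproduced, and `conforms` and `disambiguate` are hypothetical names.

```python
# Presets below are assumptions derived from the Style column of Table 2.1;
# they are not copied from cmdse.
POSIX_STRICT = frozenset({
    "POSIX_SHORT_SWITCH",
    "POSIX_GROUPED_SHORT_FLAGS",
    "POSIX_SHORT_ASSIGNMENT",
    "POSIX_SHORT_STICKY_VALUE",
})
X_TOOLKIT_STRICT = frozenset({
    "X2LKT_SWITCH",
    "X2LKT_REVERSE_SWITCH",
    "X2LKT_IMPLICIT_ASSIGNMENT",
    "X2LKT_EXPLICIT_ASSIGNMENT",
})


def conforms(variants_used, scheme):
    """True when every variant used by a utility belongs to the scheme."""
    return set(variants_used) <= scheme


def disambiguate(candidates, scheme):
    """Keep only the candidate variants that the scheme allows."""
    return set(candidates) & scheme


ambiguous = {"POSIX_GROUPED_SHORT_FLAGS", "X2LKT_SWITCH"}  # e.g. "-all"
print(disambiguate(ambiguous, POSIX_STRICT))      # {'POSIX_GROUPED_SHORT_FLAGS'}
print(disambiguate(ambiguous, X_TOOLKIT_STRICT))  # {'X2LKT_SWITCH'}
print(conforms({"POSIX_SHORT_SWITCH", "GNU_SWITCH"}, POSIX_STRICT))  # False
```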
2.4. Sub-commands
To be written.
[1] See POSIX.1-2008, Base Definitions, chapter 12, “Utility Conventions”.
[2] See POSIX.1-2008, sec. 12.2, guideline 10, which states that “The first -- argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the - character.” This behavior is implemented by a great number of bash builtin commands and Unix programs.
[3] See the doclifter GitLab project.
[4] See the GNU tar manual, “Old Option Style”.