4. Call expression parsing
To add semantics, the call expression parser tries to guess the nature of each argument by converting it into a token. Once given some context, these tokens are assembled into metadata, which comprises:
- option assignment expressions and a description of their left-side and value parts
- standalone option expressions and a description of the underlying option
- command operands and their description
4.1. Parsing workflow
Fig. 4.1 gives an overview of the steps involved in call expression parsing. These steps are grouped into three higher-level stages (A, B, C). The core of call expression parsing happens in B, through tokenization (see Section 4.2 for details on token typing). Some static bash analysis must be done upstream, however (A, see Section 3.4 for more details about this step). After parsing, the call expression is assembled into a metadata structure (C).
4.2. Tokenization
The first step creates a list of tokens that maps the command arguments (Fig. 4.1, item B.1). Each token first receives a “context-free” type (see Table 4.1 for a listing); token types are later updated using basic inference rules and command meta-information. “Context-free” means that the type can be determined without any information about the token’s siblings or position, which makes this first assignment trivial.
In a second step (Fig. 4.1, item B.3), tokens are assigned “semantic token type” values (Table 4.3) using inference rules and information extracted from the utility interface model (UIM, Fig. 4.1, item B.2). The underlying algorithm is described in detail in Section 4.4.
When a semantic type cannot be inferred, the user is prompted for it (Fig. 4.1, item B.4).
4.2.1. Context-free token typing
Table 4.1 lists the context-free token types. The last column gives, for each of them, the semantic types it can be transformed into. Some context-free token types overlap with semantic token types because they have a single semantic candidate (they resolve to themselves, shown as “self”). They are considered “non-ambiguous” and need no further transformation.
Context-free token type | Is option flag? | Examples (given in brackets “[]”) | Semantic type candidates |
---|---|---|---|
POSIX_SHORT_STICKY_VALUE | yes | [-o<int-value>] | self |
GNU_EXPLICIT_ASSIGNMENT | yes | [--option=<value>] | self |
X2LKT_EXPLICIT_ASSIGNMENT | yes | [-option=<value>] | self |
X2LKT_REVERSE_SWITCH | yes | [+option] | self |
POSIX_END_OF_OPTIONS | yes | [--] | self |
ONE_DASH_LETTER | yes | [-o] <value> [-o] | |
ONE_DASH_WORD_ALPHANUM | yes | [-opq] [-option] | |
ONE_DASH_WORD | yes | [-long-option] [-long-option] <value> | |
TWO_DASH_WORD | yes | [--option] | |
OPT_WORD | no[1] | -o [<value>] --option [<value>] -option [<value>] option | |
WORD | no | ls [~/] -o /some/file --option /some/files -option /some/file | |
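As an illustration of why this first pass needs no context, the following minimal sketch assigns a context-free type from the text of a single argument. The regular expressions, their ordering, and the function name `context_free_type` are assumptions made for this example and are not taken from the actual implementation; in particular, the boundary drawn here between OPT_WORD and WORD (bare identifier-like words versus everything else) is only a guess.

```python
import re

# Ordered (pattern, context-free type) pairs; the first match wins.
# These patterns are illustrative guesses, not the project's actual rules.
CONTEXT_FREE_RULES = [
    (re.compile(r"^--$"), "POSIX_END_OF_OPTIONS"),
    (re.compile(r"^--[A-Za-z0-9][\w-]*=.*$"), "GNU_EXPLICIT_ASSIGNMENT"),
    (re.compile(r"^-[A-Za-z0-9][\w-]+=.*$"), "X2LKT_EXPLICIT_ASSIGNMENT"),
    (re.compile(r"^\+[A-Za-z0-9][\w-]*$"), "X2LKT_REVERSE_SWITCH"),
    (re.compile(r"^-[A-Za-z]\d+$"), "POSIX_SHORT_STICKY_VALUE"),
    (re.compile(r"^-[A-Za-z0-9]$"), "ONE_DASH_LETTER"),
    (re.compile(r"^-[A-Za-z0-9]{2,}$"), "ONE_DASH_WORD_ALPHANUM"),
    (re.compile(r"^-[A-Za-z0-9][\w-]+$"), "ONE_DASH_WORD"),
    (re.compile(r"^--[A-Za-z0-9][\w-]*$"), "TWO_DASH_WORD"),
    (re.compile(r"^[A-Za-z][A-Za-z0-9_-]*$"), "OPT_WORD"),  # assumed boundary
]


def context_free_type(argument: str) -> str:
    """Return a context-free type using only the argument's own text."""
    for pattern, token_type in CONTEXT_FREE_RULES:
        if pattern.match(argument):
            return token_type
    return "WORD"  # fallback for paths, globs and anything else


# Tokens of `find . -type file`, typed without looking at neighbours.
FIND_TYPES = [context_free_type(arg) for arg in [".", "-type", "file"]]
```

Because each argument is examined in isolation, `-type` can only be narrowed down to ONE_DASH_WORD_ALPHANUM at this stage; deciding whether it is a switch, a group of short flags or the left side of an assignment is left to the semantic pass.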
4.2.2. Semantic token typing
Note
See Section 2.2.1 for details on the existing option expression styles from which most of these semantic token types are derived.
Table 4.3 lists the semantic token types. Each type has a positional model (Table 4.2) from which rules can be inferred.
As an example of such an inference, in the call expression find . -type file, “file” is a token whose positional model is OPT_IMPLICIT_ASSIGNMENT_VALUE and whose type is X2LKT_IMPLICIT_ASSIGNMENT_VALUE, while “-type” is an OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE of type X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE.
Positional model name | Description | Binding | Is “option part” | Is “option flag” | Is “semantic” |
---|---|---|---|---|---|
OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE | The left side of an implicit option assignment in the form left-side <value>. | right | yes | yes | yes |
OPT_IMPLICIT_ASSIGNMENT_VALUE | The right side of an implicit option assignment in the form left-side <value>. | left | yes | no | yes |
STANDALONE_OPT_ASSIGNMENT | An option token with a value assignment. | none | yes | yes | yes |
OPT_SWITCH | An option switch, that is, an option without a value. | none | yes | yes | yes |
COMMAND_OPERAND | A command operand. | none | no | no | yes |
UNSET | Positional model unset. | inferred | inferred | inferred | no |
In Table 4.2, the first five models apply to semantic token types, while the last one applies to context-free types. The attributes of the last one are inferred dynamically from the set of semantic candidates associated with a token instance. For example, if the positional models of all semantic candidates of a context-free type have “is option part” set to true, the attribute is inferred to be true.
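The attribute inference for UNSET tokens can be sketched as follows. The `PositionalModel` class, the `infer_flag` helper and the choice of returning `None` when the candidates disagree are assumptions made for this illustration; the text above only specifies the case where all candidates agree.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PositionalModel:
    name: str
    binding: str            # "right", "left" or "none"
    is_option_part: bool
    is_option_flag: bool


# The five semantic positional models, transcribed from Table 4.2.
MODELS = {
    model.name: model
    for model in (
        PositionalModel("OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE", "right", True, True),
        PositionalModel("OPT_IMPLICIT_ASSIGNMENT_VALUE", "left", True, False),
        PositionalModel("STANDALONE_OPT_ASSIGNMENT", "none", True, True),
        PositionalModel("OPT_SWITCH", "none", True, True),
        PositionalModel("COMMAND_OPERAND", "none", False, False),
    )
}


def infer_flag(candidates: Iterable[PositionalModel], attribute: str) -> Optional[bool]:
    """Return the shared value of `attribute`, or None if candidates disagree.

    Returning None for disagreement (or for no candidates) is an assumption.
    """
    values = {getattr(candidate, attribute) for candidate in candidates}
    return values.pop() if len(values) == 1 else None


# A token that can only become a switch or an assignment left side
# is an option part and an option flag whatever the final choice is.
option_like = [MODELS["OPT_SWITCH"], MODELS["OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE"]]
# A token that may be an option value or an operand cannot be decided yet.
undecided = [MODELS["OPT_IMPLICIT_ASSIGNMENT_VALUE"], MODELS["COMMAND_OPERAND"]]
```

With these definitions, `infer_flag(option_like, "is_option_part")` evaluates to `True`, while `infer_flag(undecided, "is_option_part")` stays undetermined.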
Semantic token type | Example (given in brackets “[]”) | Positional model |
---|---|---|
X2LKT_REVERSE_SWITCH | [+option] | OPT_SWITCH |
POSIX_SHORT_SWITCH | [-o] | OPT_SWITCH |
POSIX_GROUPED_SHORT_FLAGS | [-opq] | OPT_SWITCH |
POSIX_SHORT_ASSIGNMENT_LEFT_SIDE | [-o] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE |
POSIX_SHORT_ASSIGNMENT_VALUE | -o [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE |
POSIX_SHORT_STICKY_VALUE | [-o<value>] | STANDALONE_OPT_ASSIGNMENT |
X2LKT_SWITCH | [-option] | OPT_SWITCH |
X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE | [-option] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE |
X2LKT_IMPLICIT_ASSIGNMENT_VALUE | -option [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE |
X2LKT_EXPLICIT_ASSIGNMENT | [-option=<value>] | STANDALONE_OPT_ASSIGNMENT |
GNU_SWITCH | [--option] | OPT_SWITCH |
GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE | [--option] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE |
GNU_IMPLICIT_ASSIGNMENT_VALUE | --option [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE |
GNU_EXPLICIT_ASSIGNMENT | [--option=<value>] | STANDALONE_OPT_ASSIGNMENT |
POSIX_END_OF_OPTIONS | [--] | OPT_SWITCH |
OPERAND | [<operand>] | COMMAND_OPERAND |
HEADLESS_OPTION | [option] | OPT_SWITCH |
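The relation between semantic token types and positional models is a plain lookup, which the sketch below transcribes from the table above. The names `SEMANTIC_TO_POSITIONAL` and `describe` are hypothetical, and typing the “.” of the find example as OPERAND is an assumption, since the text only discusses “-type” and “file”.

```python
# Semantic token type -> positional model, transcribed from Table 4.3.
SEMANTIC_TO_POSITIONAL = {
    "X2LKT_REVERSE_SWITCH": "OPT_SWITCH",
    "POSIX_SHORT_SWITCH": "OPT_SWITCH",
    "POSIX_GROUPED_SHORT_FLAGS": "OPT_SWITCH",
    "POSIX_SHORT_ASSIGNMENT_LEFT_SIDE": "OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE",
    "POSIX_SHORT_ASSIGNMENT_VALUE": "OPT_IMPLICIT_ASSIGNMENT_VALUE",
    "POSIX_SHORT_STICKY_VALUE": "STANDALONE_OPT_ASSIGNMENT",
    "X2LKT_SWITCH": "OPT_SWITCH",
    "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE": "OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE",
    "X2LKT_IMPLICIT_ASSIGNMENT_VALUE": "OPT_IMPLICIT_ASSIGNMENT_VALUE",
    "X2LKT_EXPLICIT_ASSIGNMENT": "STANDALONE_OPT_ASSIGNMENT",
    "GNU_SWITCH": "OPT_SWITCH",
    "GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE": "OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE",
    "GNU_IMPLICIT_ASSIGNMENT_VALUE": "OPT_IMPLICIT_ASSIGNMENT_VALUE",
    "GNU_EXPLICIT_ASSIGNMENT": "STANDALONE_OPT_ASSIGNMENT",
    "POSIX_END_OF_OPTIONS": "OPT_SWITCH",
    "OPERAND": "COMMAND_OPERAND",
    "HEADLESS_OPTION": "OPT_SWITCH",
}


def describe(annotated):
    """Pair each (text, semantic type) with its positional model."""
    return [(text, sem, SEMANTIC_TO_POSITIONAL[sem]) for text, sem in annotated]


# The `find . -type file` example; typing "." as OPERAND is an assumption.
FIND_EXAMPLE = describe([
    (".", "OPERAND"),
    ("-type", "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE"),
    ("file", "X2LKT_IMPLICIT_ASSIGNMENT_VALUE"),
])
```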
4.3. Analytic Model
4.4. Option parsing algorithm
This section takes an in-depth look at the tokenization step (B) from Fig. 4.1. The parser holds a list of tokens in memory (Fig. 4.2). Each token starts with a context-free type, and the parser’s job is done when every token holds a semantic type. To get there, it proceeds as follows:
1. Initialize the token list by mapping each argument to a context-free token.
2. Fetch the utility interface model (UIM), if it exists.
3. Pass the list and the UIM as arguments to the parse function (Fig. 4.3). This function does the following:
   3.1. Check for a POSIX_END_OF_OPTIONS typed token (Fig. 4.4) and convert all remaining tokens to its right into operands.
   3.2. Repeat the following until the last two iterations have yielded no context-free to semantic conversion. For each non-semantic token:
      - Call inferRight (Fig. 4.5) and inferLeft (Fig. 4.6). These functions try to infer the token’s semantic type from the types of its siblings. For example, if the left sibling’s type is X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE, the only possible type for this token is X2LKT_IMPLICIT_ASSIGNMENT_VALUE.
      - If the token type is “option part”, use the option descriptions from the UIM to try an exact match (Fig. 4.8). For example, the token is --reverse and the utility interface model contains an option description that exactly matches --reverse.
      - If no exact match is found, check for a pattern match with the option scheme (Fig. 4.9). For example, if the token -pq is encountered and the program option scheme is “Linux-Standard-Explicit” (see Table 2.2), the only possible mapping for ONE_DASH_WORD is POSIX_GROUPED_SHORT_FLAGS.
      - Finally, increment conversions if the token type “is semantic”.
4. Until all tokens are of “semantic” type, prompt the user for a token type annotation and loop back to 3.2.
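The control flow of the steps above can be sketched as a fixed-point loop. This is a deliberately reduced illustration, not the actual parse function: the UIM is modelled as a hypothetical dictionary from option text to semantic type, only inference from the left sibling and exact UIM matching are shown (inferRight and scheme pattern matching are omitted), and the loop stops after a single pass without conversion rather than two.

```python
from dataclasses import dataclass
from typing import Dict, List

# Left-side semantic type -> semantic type of the value that follows it
# (pairs taken from Table 4.3).
VALUE_AFTER = {
    "POSIX_SHORT_ASSIGNMENT_LEFT_SIDE": "POSIX_SHORT_ASSIGNMENT_VALUE",
    "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE": "X2LKT_IMPLICIT_ASSIGNMENT_VALUE",
    "GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE": "GNU_IMPLICIT_ASSIGNMENT_VALUE",
}


@dataclass
class Token:
    text: str
    type: str               # context-free or semantic type name
    semantic: bool = False  # True once a semantic type is assigned


def resolve(token: Token, semantic_type: str) -> None:
    token.type, token.semantic = semantic_type, True


def parse(tokens: List[Token], uim: Dict[str, str]) -> List[Token]:
    """Type what can be typed; return tokens still needing a user annotation."""
    # Step 3.1: everything to the right of "--" becomes an operand.
    for i, tok in enumerate(tokens):
        if tok.type == "POSIX_END_OF_OPTIONS":
            resolve(tok, "POSIX_END_OF_OPTIONS")
            for rest in tokens[i + 1:]:
                resolve(rest, "OPERAND")
            break
    # Step 3.2: iterate until a pass converts nothing (simplified stop rule).
    conversions = 1
    while conversions:
        conversions = 0
        for i, tok in enumerate(tokens):
            if tok.semantic:
                continue
            left = tokens[i - 1] if i > 0 else None
            if left is not None and left.type in VALUE_AFTER:
                resolve(tok, VALUE_AFTER[left.type])  # inference from left sibling
            elif tok.text in uim:
                resolve(tok, uim[tok.text])           # exact match in the UIM
            if tok.semantic:
                conversions += 1
    return [tok for tok in tokens if not tok.semantic]


# `find . -type file` with a hypothetical UIM describing only "-type".
find_tokens = [
    Token(".", "WORD"),
    Token("-type", "ONE_DASH_WORD_ALPHANUM"),
    Token("file", "OPT_WORD"),
]
pending = parse(find_tokens, {"-type": "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE"})
```

In this run, “-type” is resolved through the UIM and “file” through its left sibling, while “.” remains in `pending`; in the workflow described above, such leftover tokens are the ones the user would be prompted about before looping back to step 3.2.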
4.5. Edge cases and extension perspectives
Some argument constructs still need to be anticipated. The following problematic examples are left open for further enhancements:
- How to model restricted operands such as those of dd(1)? Although they look like headless options, dd operands are “typed”.
- How to model commands whose operands can be another command, such as find -exec <command> {} ; ?
[1] Although HEADLESS_OPTION is an option, it is very rare and should only be matched when defined in a utility interface model or reviewed by the user. By default, we therefore assume that a WORD is not an option.