4. Call expression parsing
To add semantics, the call expression parser tries to guess the nature of each argument by converting it into a token. Once given some context, these tokens are assembled into metadata, which comprises:
- option assignment expressions, with a description of their left-side and value parts
- standalone option expressions, with a description of the underlying option
- command operands and their description
4.1. Parsing workflow
Fig. 4.1 gives an overview of the steps involved in call expression parsing. These steps are grouped into three higher-level steps (A, B, C). The core of call expression parsing happens in B, through tokenization (see Section 4.2 for details on token typing). Some static bash analysis must be done upstream (A, see Section 3.4 for more details about this step). After parsing, the call expression must be assembled into a metadata structure (C).
Fig. 4.1 Call expression parsing dataflow
4.2. Tokenization
The first step consists in creating a list of tokens that maps the command arguments (Fig. 4.1, item B.1). Token types are then updated using basic inference rules and command meta-information. Tokens are first assigned “context-free” types (see Table 4.1 for a listing). “Context-free” means that the type can be determined without any information about the token’s siblings or position, which makes this assignment trivial.
In a second step (Fig. 4.1, item B.3), tokens are assigned “semantic token type” values (Table 4.3) using inference rules and information extracted from the utility interface model (UIM, Fig. 4.1, item B.2). The underlying algorithm is described in detail in Section 4.4.
When a semantic type cannot be inferred, the user is prompted (Fig. 4.1, item B.4).
4.2.1. Context-free token typing
Table 4.1 lists the context-free token types. The last column gives the semantic type candidates, that is, the semantic types a context-free type can be transformed into. Some context-free token types overlap with semantic token types because they have a single semantic candidate (resolved to self). They are considered “non-ambiguous” and need no further transformation.
| Context-free token type | Is option flag? | Examples given in brackets “[]” | Semantic type candidates | 
|---|---|---|---|
| POSIX_SHORT_STICKY_VALUE | yes | [-o<int-value>] | self | 
| GNU_EXPLICIT_ASSIGNMENT | yes | [--option=<value>] | self | 
| X2LKT_EXPLICIT_ASSIGNMENT | yes | [-option=<value>] | self | 
| X2LKT_REVERSE_SWITCH | yes | [+option] | self | 
| POSIX_END_OF_OPTIONS | yes | [--] | self | 
| ONE_DASH_LETTER | yes | [-o] <value>; [-o] | | 
| ONE_DASH_WORD_ALPHANUM | yes | [-opq]; [-option] | | 
| ONE_DASH_WORD | yes | [-long-option]; [-long-option] <value> | | 
| TWO_DASH_WORD | yes | [--option] | | 
| OPT_WORD | no[1] | -o [<value>]; --option [<value>]; -option [<value>]; option | | 
| WORD | no | ls [~/]; -o /some/file; --option /some/files; -option /some/file | | 
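The context-free pass can be sketched as an ordered list of patterns where the first full match wins. This is a minimal Python sketch, not the parser’s actual implementation: the regular expressions, the `classify` function and the rule order are assumptions chosen to reproduce the examples of Table 4.1, and OPT_WORD is left out so that every argument without an option prefix falls back to WORD.

```python
import re
from enum import Enum, auto


class ContextFreeType(Enum):
    POSIX_END_OF_OPTIONS = auto()
    GNU_EXPLICIT_ASSIGNMENT = auto()
    X2LKT_EXPLICIT_ASSIGNMENT = auto()
    X2LKT_REVERSE_SWITCH = auto()
    POSIX_SHORT_STICKY_VALUE = auto()
    ONE_DASH_LETTER = auto()
    ONE_DASH_WORD_ALPHANUM = auto()
    ONE_DASH_WORD = auto()
    TWO_DASH_WORD = auto()
    WORD = auto()


T = ContextFreeType

# Hypothetical patterns (assumptions of this sketch), tried in order.
# Each pattern must match the whole argument (re.fullmatch semantics).
RULES = [
    (re.compile(r"--"), T.POSIX_END_OF_OPTIONS),
    (re.compile(r"--[A-Za-z0-9][\w-]*=.*"), T.GNU_EXPLICIT_ASSIGNMENT),
    (re.compile(r"--[A-Za-z0-9][\w-]*"), T.TWO_DASH_WORD),
    (re.compile(r"-[A-Za-z0-9][\w-]*=.*"), T.X2LKT_EXPLICIT_ASSIGNMENT),
    (re.compile(r"\+[A-Za-z][\w-]*"), T.X2LKT_REVERSE_SWITCH),
    (re.compile(r"-[A-Za-z]"), T.ONE_DASH_LETTER),
    (re.compile(r"-[A-Za-z][0-9]+"), T.POSIX_SHORT_STICKY_VALUE),
    (re.compile(r"-[A-Za-z0-9]+"), T.ONE_DASH_WORD_ALPHANUM),
    (re.compile(r"-[A-Za-z0-9][\w-]*"), T.ONE_DASH_WORD),
]


def classify(argument: str) -> ContextFreeType:
    """Return the context-free type of one argument, ignoring its siblings."""
    for pattern, token_type in RULES:
        if pattern.fullmatch(argument):
            return token_type
    # Anything that does not look like an option flag is a plain word.
    return T.WORD


# Usage: map the arguments of `find . -type file` to context-free types.
types = [classify(arg) for arg in [".", "-type", "file"]]
```

Because each rule looks only at the argument itself, the order of the rules is the only place where ambiguity is settled at this stage; everything else is deferred to the semantic pass.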
4.2.2. Semantic token typing
Note
See Section 2.2.1 for details on the existing option expression styles from which most of these semantic token types are derived.
Table 4.3 lists the semantic token types. Each type has a positional model (Table 4.2) from which rules can be inferred.
As an example of such an inference, in the call expression find . -type file, “file” is a token whose positional model is OPT_IMPLICIT_ASSIGNMENT_VALUE and whose type is X2LKT_IMPLICIT_ASSIGNMENT_VALUE, while “-type” is an OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE token of type X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE.
| Positional model name | Description | Binding | is “option part” | is “option flag” | is “semantic” | 
|---|---|---|---|---|---|
| OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE | The left side of an implicit option assignment in the form left-side <value>. | right | yes | yes | yes | 
| OPT_IMPLICIT_ASSIGNMENT_VALUE | The right side of an implicit option assignment in the form left-side <value>. | left | yes | no | yes | 
| STANDALONE_OPT_ASSIGNMENT | A token option with value assignment. | none | yes | yes | yes | 
| OPT_SWITCH | An option switch, that is without value. | none | yes | yes | yes | 
| COMMAND_OPERAND | A command operand. | none | no | no | yes | 
| UNSET | Positional model unset. | inferred | inferred | inferred | no | 
In Table 4.2, the first five models apply to semantic token types, while the last one (UNSET) applies to context-free types. The attributes of UNSET are inferred dynamically from the set of semantic candidates associated with a token instance. For example, if the positional models of all the semantic candidates of a context-free type have is “option part” set to true, the attribute is inferred as true.
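The inference of these attributes can be sketched as follows. The rule itself comes from the paragraph above; the `PositionalModel` dataclass, the function names, the use of `None` to stand for UNSET, and the choice to return false for an empty candidate set are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PositionalModel:
    name: str
    is_option_part: bool
    is_option_flag: bool


# The five semantic positional models, with the attributes of Table 4.2.
LEFT_SIDE = PositionalModel("OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE", True, True)
VALUE = PositionalModel("OPT_IMPLICIT_ASSIGNMENT_VALUE", True, False)
STANDALONE = PositionalModel("STANDALONE_OPT_ASSIGNMENT", True, True)
SWITCH = PositionalModel("OPT_SWITCH", True, True)
OPERAND = PositionalModel("COMMAND_OPERAND", False, False)


def _resolve(attr: str,
             model: Optional[PositionalModel],
             candidates: Sequence[PositionalModel]) -> bool:
    # A set positional model answers directly.
    if model is not None:
        return getattr(model, attr)
    # UNSET (None here): true only if every candidate agrees.
    # Treating "no candidates" as false is an assumption of this sketch.
    return bool(candidates) and all(getattr(c, attr) for c in candidates)


def is_option_part(model, candidates) -> bool:
    return _resolve("is_option_part", model, candidates)


def is_option_flag(model, candidates) -> bool:
    return _resolve("is_option_flag", model, candidates)


# Usage: a token whose candidates are a switch or an assignment left side
# is certainly an option flag, even though its exact type is still unknown.
flag_for_sure = is_option_flag(None, [SWITCH, LEFT_SIDE])
```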
| Semantic token type | Example, given in brackets, “[]” | Positional model | 
|---|---|---|
| X2LKT_REVERSE_SWITCH | [+option] | OPT_SWITCH | 
| POSIX_SHORT_SWITCH | [-o] | OPT_SWITCH | 
| POSIX_GROUPED_SHORT_FLAGS | [-opq] | OPT_SWITCH | 
| POSIX_SHORT_ASSIGNMENT_LEFT_SIDE | [-o] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE | 
| POSIX_SHORT_ASSIGNMENT_VALUE | -o [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE | 
| POSIX_SHORT_STICKY_VALUE | [-o<value>] | STANDALONE_OPT_ASSIGNMENT | 
| X2LKT_SWITCH | [-option] | OPT_SWITCH | 
| X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE | [-option] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE | 
| X2LKT_IMPLICIT_ASSIGNMENT_VALUE | -option [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE | 
| X2LKT_EXPLICIT_ASSIGNMENT | [-option=<value>] | STANDALONE_OPT_ASSIGNMENT | 
| GNU_SWITCH | [--option] | OPT_SWITCH | 
| GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE | [--option] <value> | OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE | 
| GNU_IMPLICIT_ASSIGNMENT_VALUE | --option [<value>] | OPT_IMPLICIT_ASSIGNMENT_VALUE | 
| GNU_EXPLICIT_ASSIGNMENT | [--option=<value>] | STANDALONE_OPT_ASSIGNMENT | 
| POSIX_END_OF_OPTIONS | [--] | OPT_SWITCH | 
| OPERAND | [<operand>] | COMMAND_OPERAND | 
| HEADLESS_OPTION | [option] | OPT_SWITCH | 
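Once every token holds a semantic type, its positional model is enough to group tokens into the metadata parts listed at the start of this chapter. The sketch below illustrates this with the `find . -type file` example. It is only an illustration under stated assumptions: the `Metadata` class, the `assemble` function, the tuple-based token representation and the partial `POS_MODEL` lookup (a subset of Table 4.3) are hypothetical and not taken from the tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Positional model per semantic token type (subset of Table 4.3).
POS_MODEL = {
    "POSIX_SHORT_SWITCH": "OPT_SWITCH",
    "X2LKT_SWITCH": "OPT_SWITCH",
    "GNU_SWITCH": "OPT_SWITCH",
    "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE": "OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE",
    "X2LKT_IMPLICIT_ASSIGNMENT_VALUE": "OPT_IMPLICIT_ASSIGNMENT_VALUE",
    "GNU_EXPLICIT_ASSIGNMENT": "STANDALONE_OPT_ASSIGNMENT",
    "X2LKT_EXPLICIT_ASSIGNMENT": "STANDALONE_OPT_ASSIGNMENT",
    "OPERAND": "COMMAND_OPERAND",
}


@dataclass
class Metadata:
    assignments: List[Tuple[str, str]] = field(default_factory=list)
    switches: List[str] = field(default_factory=list)
    operands: List[str] = field(default_factory=list)


def assemble(tokens: List[Tuple[str, str]]) -> Metadata:
    """Group (value, semantic type) pairs by positional model."""
    meta = Metadata()
    pending_left_side = None  # left side waiting for the value to its right
    for value, semantic_type in tokens:
        model = POS_MODEL[semantic_type]
        if model == "OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE":
            pending_left_side = value
        elif model == "OPT_IMPLICIT_ASSIGNMENT_VALUE":
            if pending_left_side is None:
                raise ValueError(f"value {value!r} has no left side")
            meta.assignments.append((pending_left_side, value))
            pending_left_side = None
        elif model == "STANDALONE_OPT_ASSIGNMENT":
            # Explicit assignments carry both parts in one token.
            left, _, right = value.partition("=")
            meta.assignments.append((left, right))
        elif model == "OPT_SWITCH":
            meta.switches.append(value)
        else:  # COMMAND_OPERAND
            meta.operands.append(value)
    if pending_left_side is not None:
        raise ValueError(f"left side {pending_left_side!r} has no value")
    return meta


# Usage with the example from this section: find . -type file
meta = assemble([
    (".", "OPERAND"),
    ("-type", "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE"),
    ("file", "X2LKT_IMPLICIT_ASSIGNMENT_VALUE"),
])
```

The binding column of Table 4.2 is what makes this grouping possible: a left side binds to the token on its right, and a value binds to the token on its left.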
4.3. Analytic Model
4.4. Option parsing algorithm
This section takes an in-depth look at the tokenization step (B) from Fig. 4.1. The parser holds a list of tokens in memory (Fig. 4.2). Each token starts with a context-free type. The parser’s job is done when all tokens hold a semantic type. To get there, it proceeds through the following steps:
1. Initialize the token list by mapping each argument to a context-free token.
2. Fetch the utility interface model (UIM), if it exists.
3. Pass the list and the UIM as arguments to the parse function (Fig. 4.3), which does the following:
   - 3.1. Check for a POSIX_END_OF_OPTIONS typed token (Fig. 4.4) and convert all remaining tokens to its right into operands.
   - 3.2. Repeat the following operation until the last two passes have produced no context-free to semantic conversion. For each non-semantic token:
     - Call inferRight (Fig. 4.5) and inferLeft (Fig. 4.6). These functions try to infer the semantic type of the token from the types of its siblings. For example, if the type of the left sibling is X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE, the only possible type for this token is X2LKT_IMPLICIT_ASSIGNMENT_VALUE.
     - If the token type is “option part”, use the option descriptions from the UIM to try an exact match (Fig. 4.8). For example, the token --reverse matches when the UIM contains an option description for exactly --reverse.
     - If no exact match is found, check for a pattern match with the option scheme (Fig. 4.9). For example, if the token -pq is encountered and the program option scheme is “Linux-Standard-Explicit” (see Table 2.2), the only possible mapping for ONE_DASH_WORD is POSIX_GROUPED_SHORT_FLAGS.
     - Finally, increment the conversion count if the token type “is semantic”.
4. Until all tokens are of “semantic” type, prompt the user for a token type annotation and loop back to step 3.2.
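The steps above can be sketched as a small fixed-point loop. This is a simplified Python illustration, not a transcription of Fig. 4.3 to Fig. 4.9: the UIM is reduced to a hypothetical dictionary from exact flag strings to semantic types, only a left-sibling rule stands in for inferRight and inferLeft, scheme-based reduction is omitted, and the loop stops after the first pass without any conversion. Tokens still unresolved are returned so that a caller can prompt the user.

```python
from dataclasses import dataclass
from typing import Dict, List

# Implicit assignment left sides and the value type they impose on the
# token to their right (type names taken from Table 4.3).
VALUE_TYPE_FOR = {
    "POSIX_SHORT_ASSIGNMENT_LEFT_SIDE": "POSIX_SHORT_ASSIGNMENT_VALUE",
    "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE": "X2LKT_IMPLICIT_ASSIGNMENT_VALUE",
    "GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE": "GNU_IMPLICIT_ASSIGNMENT_VALUE",
}

# Types treated as semantic in this sketch (subset of Table 4.3).
SEMANTIC_TYPES = set(VALUE_TYPE_FOR) | set(VALUE_TYPE_FOR.values()) | {
    "POSIX_END_OF_OPTIONS", "OPERAND", "POSIX_SHORT_SWITCH", "GNU_SWITCH",
    "X2LKT_SWITCH", "GNU_EXPLICIT_ASSIGNMENT", "X2LKT_EXPLICIT_ASSIGNMENT",
}


@dataclass
class Token:
    value: str
    type: str

    def is_semantic(self) -> bool:
        return self.type in SEMANTIC_TYPES


def check_end_of_options(tokens: List[Token]) -> None:
    """Step 3.1: everything to the right of `--` becomes an operand."""
    seen = False
    for token in tokens:
        if seen:
            token.type = "OPERAND"
        elif token.type == "POSIX_END_OF_OPTIONS":
            seen = True


def infer_from_left(tokens: List[Token], index: int) -> bool:
    """Simplified sibling rule: a left side forces the next token's type."""
    if index > 0 and tokens[index - 1].type in VALUE_TYPE_FOR:
        tokens[index].type = VALUE_TYPE_FOR[tokens[index - 1].type]
        return True
    return False


def match_option_description(token: Token, uim: Dict[str, str]) -> bool:
    """Exact match against the (hypothetical) flag -> semantic type table."""
    semantic_type = uim.get(token.value)
    if semantic_type is None:
        return False
    token.type = semantic_type
    return True


def parse(tokens: List[Token], uim: Dict[str, str]) -> List[Token]:
    """Type tokens in place; return those that still need a user prompt."""
    check_end_of_options(tokens)
    while True:
        conversions = 0
        for index, token in enumerate(tokens):
            if token.is_semantic():
                continue
            if infer_from_left(tokens, index) or match_option_description(token, uim):
                conversions += 1
        if conversions == 0:  # step 3.2 exit condition, simplified
            break
    return [token for token in tokens if not token.is_semantic()]


# Usage: find . -type file, with a UIM that only knows `-type`.
tokens = [Token(".", "WORD"), Token("-type", "ONE_DASH_WORD_ALPHANUM"),
          Token("file", "WORD")]
unresolved = parse(tokens, {"-type": "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE"})
```

In this run, `-type` is resolved by the exact match and `file` by its left sibling within the same pass, while `.` stays context-free and ends up in `unresolved`, which is the situation step 4 handles by prompting the user.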
 
Fig. 4.3 Parse function
 
Fig. 4.4 CheckEndOfOptions function
 
Fig. 4.5 InferRight function
 
Fig. 4.6 InferLeft function
 
Fig. 4.7 ConvertToSemantic function
 
Fig. 4.8 MatchOptionDescription function
 
Fig. 4.9 ReduceCandidatesWithScheme function
4.5. Edge cases and extension perspectives
Some argument constructs still need to be anticipated. The following problematic examples are open questions for further enhancement:
- How to model restricted operands such as in dd(1)? Although they look like headless options, dd operands are “typed”.
- How to model commands whose operands can be another command, such as find -exec <command> {} ;?
[1] Although HEADLESS_OPTION is an option, it is very rare and should only be matched when defined in a utility interface model or reviewed by the user. So, by default, we assume a WORD is not an option.

![@startuml
!include styles.puml
class Program {
  + String projectURL
  + String commandIdentifier
}
class ProgramInterfaceModel {
  Program program
  OptionScheme optionScheme
  OptionDescription[] optionDescriptions
}
enum TokenPositionalModel {
  + Binding binding = 'UNKNOWN' | 'NONE' | 'LEFT' | 'RIGHT'
  + Bool isSemantic
  - Bool isOptionFlag
  - Bool isOptionPart
  ..models..
  OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE
  OPT_IMPLICIT_ASSIGNMENT_VALUE
  STANDALONE_OPT_ASSIGNMENT
  OPT_SWITCH
  COMMAND_OPERAND
  UNSET
}
enum OptionExpressionVariant {
 Regex flagRegex
 TokenType flagType
 Optional<TokenType> valueType
 OptionStyle style = 'POSIX' | 'XTOOLKIT' | 'GNU' | 'NONE'
 .. variants ..
 POSIX_SHORT_SWITCH
 POSIX_GROUPED_SHORT_FLAGS
 POSIX_SHORT_ASSIGNMENT
 POSIX_SHORT_STICKY_VALUE
 X2LKT_SWITCH
 X2LKT_REVERSE_SWITCH
 X2LKT_IMPLICIT_ASSIGNMENT
 X2LKT_EXPLICIT_ASSIGNMENT
 GNU_SWITCH
 GNU_IMPLICIT_ASSIGNMENT
 GNU_EXPLICIT_ASSIGNMENT
 POSIX_END_OF_OPTIONS
 HEADLESS_OPTION
}
class OptionScheme {
  OptionExpressionVariant[] variants
}
class OptionDescription {
  + OptionExpressionVariants[] supportedVariants
  + ValueModel valueModel = 'NONE' | 'OPTIONAL' | 'MANDATORY'
  + String description
  + Optional<TokenType> matchDescription(Token token)
}
class CallExpression {
 + String commandIdentifier
 + String[] arguments
 + String raw
 + LineRange lines
}
class Token {
  + Int argumentPosition
  + TokenType type
  + String value
  + Token boundTo
  + OptionDescription optionDescription
  + TokenType[] semanticCandidates
  + PositionalModel[] posModelCandidates()
  + Bool isOptionFlag()
  + Bool isOptionPart()
  + Bool isBoundToOneOf(Binding[] bindings)
  + Bool isBoundTo(Binding binding)
  + Bool matchOptionDescription(OptionDescription[] options)
  + Bool reduceCandidatesWithScheme(OptionScheme scheme)
}
class CallExpressionMetadata {
  CallExpression callExpression
  OptionExpression[] optionExpressions
  Operand[] operands
  Token[] tokens
}
enum TokenType {
  + PositionalModel posModel
  + Bool isSemantic()
}
enum ContextFreeTokenType {
  + SemanticTokenType[] semanticCandidates
  -----
  .. ContextFree and Semantic ..
  X2LKT_REVERSE_SWITCH
  GNU_EXPLICIT_ASSIGNMENT
  X2LKT_EXPLICIT_ASSIGNMENT
  POSIX_END_OF_OPTIONS
  .. Strictly ContextFree ..
  ONE_DASH_WORD
  ONE_DASH_LETTER
  TWO_DASH_WORD
  WORD
}
note "isOption* is resolved to type.posModel.isOption* \nwhen type.posModel is not UNSET or to true when \n '∀c ∈ {posModelCandidates}, c.isOption* = true', false otherwise.\nSeemingly, isBoundToOneOf is resolved to \n'token.type.posModelbinding.binding ∈ {bindings}'\nwhen posModel is not UNSET, otherwise to\n'{bindings} ∩ {token.posModelCandidates} = {bindings}'." as N2
Token  .. N2
N2 .. TokenPositionalModel
enum SemanticTokenType {
  + OptionExpressionVariant variant
  -----
  .. ContextFree and Semantic ..
  X2LKT_REVERSE_SWITCH
  GNU_EXPLICIT_ASSIGNMENT
  X2LKT_EXPLICIT_ASSIGNMENT
  POSIX_END_OF_OPTIONS
  .. Strictly Semantic ..
  POSIX_SHORT_SWITCH
  POSIX_GROUPED_SHORT_FLAGS
  POSIX_SHORT_ASSIGNMENT_LEFT_SIDE
  POSIX_SHORT_ASSIGNMENT_VALUE
  POSIX_SHORT_STICKY_VALUE
  GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE
  GNU_IMPLICIT_ASSIGNMENT_VALUE
  X2LKT_SWITCH
  X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE
  X2LKT_IMPLICIT_ASSIGNMENT_VALUE
  OPERAND
  HEADLESS_OPTION
}
class Parser {
  CallExpressionMetadata parse(CallExpression callExpression)
}
TokenType <|-- ContextFreeTokenType
TokenType <|-- SemanticTokenType
OptionDescription o-- ProgramInterfaceModel
OptionExpressionVariant o-- OptionScheme
OptionExpressionVariant o-- OptionDescription
OptionExpressionVariant o--o TokenType
OptionScheme o-- ProgramInterfaceModel
TokenType   "1" *-- "*" Token
TokenPositionalModel *-- TokenType
OptionDescription "?" o-- "*" Token
CallExpression o-- Parser
Token o-- CallExpressionMetadata
CallExpressionMetadata o-- Parser
ProgramInterfaceModel o-- Parser
Program o-- ProgramInterfaceModel
@enduml](../../_images/plantuml-5bb3c245a327b7c2f936ef193c902946d116f0e1.png)