.. _call-expression-parsing:

#######################
Call expression parsing
#######################

To add semantics, the :term:`call expression` parser tries to guess the nature of each argument by converting it into a token.
Given some context, these tokens are then assembled into metadata comprising:

- option assignment expressions, with a description of their left-side and value parts;
- standalone option expressions, with a description of the underlying option;
- command operands, with their description.

Parsing workflow
################

:numref:`call-expression-process-flow` gives an overview of the steps involved in call expression parsing.
These steps are grouped into higher-level steps (*A*, *B*, *C*).
The core of call expression parsing happens in *B*, through tokenization (see :numref:`token-typings` for details on token typing).
Some static Bash analysis must however be done upstream (*A*; see :numref:`call-expression-structure` for more details about this step).
After parsing, the call expression must be assembled into a metadata structure (*C*).

.. _call-expression-process-flow:

.. uml:: /diagrams/call-expr-process.puml
   :caption: Call expression parsing dataflow
   :align: center
   :width: 500

.. _token-typings:

Tokenization
############

The first step consists of creating a list of tokens, one for each command argument (:numref:`call-expression-process-flow`, *item B.1*).
Token types are then refined using basic inference rules and command meta-information.
Initially, each token is assigned a "context-free" type (see :numref:`context-free-tokens` for a listing).
"Context-free" means that the nature of a token can be captured without any information about its siblings or its position, which makes this first assignment trivial.

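
To make the notion of context-free typing concrete, here is a minimal Python sketch that assigns a context-free type to a single argument by looking only at the argument itself, never at its siblings or position.
The regular expressions, their order and the ``classify_context_free`` name are hypothetical choices made for this illustration, not the parser's actual rules.
``POSIX_SHORT_STICKY_VALUE`` is not produced by this sketch: since its textual pattern is not specified here, a sticky value such as ``-o/tmp`` simply falls back to ``WORD``.

.. code-block:: python

   import re

   # Hypothetical rules, for illustration only: the first pattern that fully
   # matches the argument gives its context-free type.
   CONTEXT_FREE_RULES = [
       (r"--", "POSIX_END_OF_OPTIONS"),
       (r"--[^=]+=.*", "GNU_EXPLICIT_ASSIGNMENT"),
       (r"--[^=]+", "TWO_DASH_WORD"),
       (r"\+[A-Za-z][A-Za-z0-9-]*", "X2LKT_REVERSE_SWITCH"),
       (r"-[^-=][^=]*=.*", "X2LKT_EXPLICIT_ASSIGNMENT"),
       (r"-[A-Za-z0-9]", "ONE_DASH_LETTER"),
       (r"-[A-Za-z0-9]{2,}", "ONE_DASH_WORD_ALPHANUM"),
       (r"-[A-Za-z0-9][A-Za-z0-9-]*", "ONE_DASH_WORD"),
       (r"[A-Za-z][A-Za-z0-9_-]*", "OPT_WORD"),
   ]


   def classify_context_free(argument: str) -> str:
       """Return the context-free type of one argument, ignoring its siblings."""
       for pattern, token_type in CONTEXT_FREE_RULES:
           if re.fullmatch(pattern, argument):
               return token_type
       # Anything else (paths such as "~/" or "/some/file", ".", ...) is a WORD.
       return "WORD"


   if __name__ == "__main__":
       print([classify_context_free(arg) for arg in [".", "-type", "f"]])

For ``find . -type f``, this sketch yields ``WORD``, ``ONE_DASH_WORD_ALPHANUM`` and ``OPT_WORD``.
Each of these types still has several semantic candidates, which is why the semantic typing step is needed.
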

In a second step (:numref:`call-expression-process-flow`, *item B.3*), token types are converted into semantic token types (:numref:`semantic-token-properties`) using inference rules and information extracted from the :term:`utility interface model` (UIM, :numref:`call-expression-process-flow`, *item B.2*).
The underlying algorithm is described in detail in :numref:`option-parsing-algorithm`.
When a semantic type cannot be inferred, the user is prompted (:numref:`call-expression-process-flow`, *item B.4*).

Context-free token typing
=========================

:numref:`context-free-tokens` lists the context-free token types.
The last column lists the semantic type candidates, i.e. the semantic types into which each context-free type can be converted.
Some context-free token types coincide with semantic token types because they have a single semantic candidate (resolved to *self*).
Such types are considered "non-ambiguous" and need no further transformation.

.. _context-free-tokens:

.. list-table:: Context-free token types
   :header-rows: 1
   :widths: 40 10 10 40

   * - Context-free token type
     - Is option flag?
     - | Examples
       | *given in*
       | *brackets* "[]"
     - Semantic type candidates
   * - ``POSIX_SHORT_STICKY_VALUE``
     - yes
     - ``[-o<value>]``
     - *self*
   * - ``GNU_EXPLICIT_ASSIGNMENT``
     - yes
     - ``[--option=<value>]``
     - *self*
   * - ``X2LKT_EXPLICIT_ASSIGNMENT``
     - yes
     - ``[-option=<value>]``
     - *self*
   * - ``X2LKT_REVERSE_SWITCH``
     - yes
     - ``[+option]``
     - *self*
   * - ``POSIX_END_OF_OPTIONS``
     - yes
     - ``[--]``
     - *self*
   * - ``ONE_DASH_LETTER``
     - yes
     - | ``[-o] <value>``
       | ``[-o]``
     - * ``POSIX_SHORT_ASSIGNMENT_LEFT_SIDE``
       * ``POSIX_SHORT_SWITCH``
   * - ``ONE_DASH_WORD_ALPHANUM``
     - yes
     - | ``[-opq]``
       | ``[-option]``
     - * ``POSIX_GROUPED_SHORT_FLAGS``
       * ``X2LKT_SWITCH``
       * ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE``
   * - ``ONE_DASH_WORD``
     - yes
     - | ``[-long-option]``
       | ``[-long-option] <value>``
     - * ``X2LKT_SWITCH``
       * ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE``
   * - ``TWO_DASH_WORD``
     - yes
     - ``[--option]``
     - * ``GNU_SWITCH``
       * ``GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
   * - ``OPT_WORD``
     - no\ [#headless-option-exception]_
     - | ``-o [<value>]``
       | ``--option [<value>]``
       | ``-option [<value>]``
       | ``[option]``
     - * ``OPERAND``
       * ``POSIX_SHORT_ASSIGNMENT_VALUE``
       * ``GNU_IMPLICIT_ASSIGNMENT_VALUE``
       * ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE``
       * ``HEADLESS_OPTION``
   * - ``WORD``
     - no
     - | ``ls [~/]``
       | ``-o [/some/file]``
       | ``--option [/some/file]``
       | ``-option [/some/file]``
     - * ``OPERAND``
       * ``POSIX_SHORT_ASSIGNMENT_VALUE``
       * ``GNU_IMPLICIT_ASSIGNMENT_VALUE``
       * ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE``

Semantic token typing
=====================

.. note::

   See :numref:`option-expression-syntax` for details on the option expression styles from which most of these semantic token types are derived.

:numref:`semantic-token-properties` lists the semantic token types.
Each of these types has a positional model (:numref:`token-positional-model`) from which inference rules can be derived.
For example, in the :term:`call expression` ``find . -type f``, "f" is a token of type ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE`` whose positional model is ``OPT_IMPLICIT_ASSIGNMENT_VALUE``, and "-type" is a token of type ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE`` whose positional model is ``OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE``.

.. _token-positional-model:

.. list-table:: Token positional model
   :header-rows: 1
   :widths: 20 40 10 10 10 10

   * - Positional model name
     - Description
     - Binding
     - | is
       | "option part"
     - | is
       | "option flag"
     - | is
       | "semantic"
   * - ``OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
     - The left side of an implicit option assignment of the form ``left-side <value>``.
     - *right*
     - *yes*
     - *yes*
     - *yes*
   * - ``OPT_IMPLICIT_ASSIGNMENT_VALUE``
     - The right side (value) of an implicit option assignment of the form ``left-side <value>``.
     - *left*
     - *yes*
     - *no*
     - *yes*
   * - ``STANDALONE_OPT_ASSIGNMENT``
     - A single token holding both an option and its assigned value.
     - *none*
     - *yes*
     - *yes*
     - *yes*
   * - ``OPT_SWITCH``
     - An option switch, that is, an option without a value.
     - *none*
     - *yes*
     - *yes*
     - *yes*
   * - ``COMMAND_OPERAND``
     - A command operand.
     - *none*
     - *no*
     - *no*
     - *yes*
   * - ``UNSET``
     - Positional model unset.
     - *inferred*
     - *inferred*
     - *inferred*
     - *no*

In :numref:`token-positional-model`, the first five models apply to semantic token types, while the last one (``UNSET``) applies to context-free types.
The attributes of ``UNSET`` are dynamically inferred from the set of semantic candidates associated with a token instance.
For example, if all the semantic candidates of a context-free token have a positional model whose "is option part" attribute is *yes*, the "is option part" attribute of the token is inferred to be *yes* as well.

.. _semantic-token-properties:

.. list-table:: Semantic token types
   :header-rows: 1
   :widths: 10 10 10

   * - Semantic token type
     - Example, *given in brackets* "[]"
     - Positional model
   * - ``X2LKT_REVERSE_SWITCH``
     - ``[+option]``
     - ``OPT_SWITCH``
   * - ``POSIX_SHORT_SWITCH``
     - ``[-o]``
     - ``OPT_SWITCH``
   * - ``POSIX_GROUPED_SHORT_FLAGS``
     - ``[-opq]``
     - ``OPT_SWITCH``
   * - ``POSIX_SHORT_ASSIGNMENT_LEFT_SIDE``
     - ``[-o] <value>``
     - ``OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
   * - ``POSIX_SHORT_ASSIGNMENT_VALUE``
     - ``-o [<value>]``
     - ``OPT_IMPLICIT_ASSIGNMENT_VALUE``
   * - ``POSIX_SHORT_STICKY_VALUE``
     - ``[-o<value>]``
     - ``STANDALONE_OPT_ASSIGNMENT``
   * - ``X2LKT_SWITCH``
     - ``[-option]``
     - ``OPT_SWITCH``
   * - ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE``
     - ``[-option] <value>``
     - ``OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
   * - ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE``
     - ``-option [<value>]``
     - ``OPT_IMPLICIT_ASSIGNMENT_VALUE``
   * - ``X2LKT_EXPLICIT_ASSIGNMENT``
     - ``[-option=<value>]``
     - ``STANDALONE_OPT_ASSIGNMENT``
   * - ``GNU_SWITCH``
     - ``[--option]``
     - ``OPT_SWITCH``
   * - ``GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
     - ``[--option] <value>``
     - ``OPT_IMPLICIT_ASSIGNMENT_LEFT_SIDE``
   * - ``GNU_IMPLICIT_ASSIGNMENT_VALUE``
     - ``--option [<value>]``
     - ``OPT_IMPLICIT_ASSIGNMENT_VALUE``
   * - ``GNU_EXPLICIT_ASSIGNMENT``
     - ``[--option=<value>]``
     - ``STANDALONE_OPT_ASSIGNMENT``
   * - ``POSIX_END_OF_OPTIONS``
     - ``[--]``
     - ``OPT_SWITCH``
   * - ``OPERAND``
     - ``[<operand>]``
     - ``COMMAND_OPERAND``
   * - ``HEADLESS_OPTION``
     - ``[option]``
     - ``OPT_SWITCH``

Analytic model
##############

.. _snippet-class-diagram:

.. uml:: /diagrams/snippet.puml
   :align: center
   :width: 100%

.. _option-parsing-algorithm:

Option parsing algorithm
########################

This section gives an in-depth look at the tokenization step (*B*) of :numref:`call-expression-process-flow`.
The parser holds a list of tokens in memory (:numref:`snippet-class-diagram`), each of which starts with a context-free type.
The parser's job is done when all tokens hold a semantic type.
To get there, it proceeds with the following steps:

#. Initialize the token list by mapping each argument to a context-free token.
#. Fetch the :term:`utility interface model` (UIM), if it exists.
#. Pass the token list and the UIM to the *parse* function (:numref:`algo-parse`), which does the following:

   #. Check for the existence of a ``POSIX_END_OF_OPTIONS`` token (:numref:`algo-check-end-of-options`); if one is found, convert all the tokens to its right into operands.
   #. Repeat the following operation until the last two iterations did not yield a single context-free to semantic conversion.
      For each non-semantic token:

      - Call *inferRight* (:numref:`algo-infer-right`) and *inferLeft* (:numref:`algo-infer-left`).
        These functions try to infer the semantic type of the token from the types of its siblings.
        For example, if the type of the left sibling is ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE``, the only possible type for this token is ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE``.
      - If the token "is option part", try an exact match against the option descriptions of the UIM (:numref:`algo-match-option-description`).
        For example, the token is ``--reverse`` and the :term:`utility interface model` contains an option description that exactly matches ``--reverse``.
      - If no exact match is found, check for a pattern match with the option scheme (:numref:`algo-reduce-candidates-with-scheme`).
        For example, if the token ``-pq`` is encountered and the program :term:`option scheme` is "Linux-Standard-Explicit" (see :numref:`option-schemes`), the only possible mapping for ``ONE_DASH_WORD_ALPHANUM`` is ``POSIX_GROUPED_SHORT_FLAGS``.
      - Finally, increment *conversions* if the token type "is semantic".

   #. As long as some tokens are not of "semantic" type, prompt the user for a token type annotation and loop back to step 3.2.

.. _algo-parse:

.. figure:: /algorithms/parse.svg
   :align: left

   Parse function

.. _algo-check-end-of-options:

.. figure:: /algorithms/checkEndOfOptions.svg
   :align: left

   CheckEndOfOptions function

.. _algo-infer-right:

.. figure:: /algorithms/inferRight.svg
   :align: left

   InferRight function

.. _algo-infer-left:

.. figure:: /algorithms/inferLeft.svg
   :align: left

   InferLeft function

.. _algo-convert-to-semantic:

.. figure:: /algorithms/convertToSemantic.svg
   :align: left

   ConvertToSemantic function

.. _algo-match-option-description:

.. figure:: /algorithms/matchOptionDescription.svg
   :align: left

   MatchOptionDescription function

.. _algo-reduce-candidates-with-scheme:

.. figure:: /algorithms/reduceCandidatesWithScheme.svg
   :align: left

   ReduceCandidatesWithScheme function

Edge cases and extension perspectives
#####################################

Some argument constructs must be anticipated.
The following problematic examples are left open for further enhancements:

- How should restricted operands, such as those of :linuxman:`dd(1)`, be modeled?
  Although they look like headless options, ``dd`` operands are "typed".
- How should commands whose operands can be another command, such as ``find -exec {} \;``, be modeled?

----------------------

.. container:: footnotes

   .. [#headless-option-exception] Although ``HEADLESS_OPTION`` is an option, it is very rare and should only be matched when defined in a :term:`utility interface model` or reviewed by the user.
      Hence, by default, an ``OPT_WORD`` token is assumed not to be an option flag.

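
The *parse* function described in :ref:`option-parsing-algorithm` can be sketched in Python as follows.
This is a simplified illustration under stated assumptions, not a transcription of the algorithm figures: the candidate lists are a subset of :numref:`context-free-tokens`, a token is considered semantic once a single candidate remains, the UIM is modeled as a hypothetical ``{flag: semantic type}`` dictionary, the scheme-based reduction and the user prompt are omitted, and the loop stops after the first iteration without any conversion.
The sibling rules in ``infer_right`` and ``infer_left`` (a value must be preceded by its left side, and a left side must be followed by its value) are assumptions derived from the *Binding* column of :numref:`token-positional-model`.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional

   # Subset of the context-free token table: context-free type -> semantic candidates.
   # HEADLESS_OPTION is left out of OPT_WORD: it should only come from the UIM or the user.
   CANDIDATES = {
       "POSIX_END_OF_OPTIONS": ["POSIX_END_OF_OPTIONS"],
       "ONE_DASH_WORD_ALPHANUM": ["POSIX_GROUPED_SHORT_FLAGS", "X2LKT_SWITCH",
                                  "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE"],
       "TWO_DASH_WORD": ["GNU_SWITCH", "GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE"],
       "OPT_WORD": ["OPERAND", "POSIX_SHORT_ASSIGNMENT_VALUE",
                    "GNU_IMPLICIT_ASSIGNMENT_VALUE", "X2LKT_IMPLICIT_ASSIGNMENT_VALUE"],
       "WORD": ["OPERAND", "POSIX_SHORT_ASSIGNMENT_VALUE",
                "GNU_IMPLICIT_ASSIGNMENT_VALUE", "X2LKT_IMPLICIT_ASSIGNMENT_VALUE"],
   }

   # Implicit assignment left sides (binding: right) and the value type they bind.
   LEFT_TO_VALUE = {
       "POSIX_SHORT_ASSIGNMENT_LEFT_SIDE": "POSIX_SHORT_ASSIGNMENT_VALUE",
       "X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE": "X2LKT_IMPLICIT_ASSIGNMENT_VALUE",
       "GNU_IMPLICIT_ASSIGNMENT_LEFT_SIDE": "GNU_IMPLICIT_ASSIGNMENT_VALUE",
   }
   VALUE_TO_LEFT = {value: left for left, value in LEFT_TO_VALUE.items()}


   @dataclass
   class Token:
       value: str
       candidates: List[str]  # semantic types still possible for this token

       @property
       def is_semantic(self) -> bool:
           return len(self.candidates) == 1

       @property
       def semantic_type(self) -> Optional[str]:
           return self.candidates[0] if self.is_semantic else None


   def restrict(token: Token, allowed) -> None:
       """Keep the candidates accepted by ``allowed``, unless none would remain."""
       kept = [c for c in token.candidates if allowed(c)]
       if kept:
           token.candidates = kept


   def check_end_of_options(tokens: List[Token]) -> None:
       """Turn every token to the right of ``--`` into an operand."""
       for i, token in enumerate(tokens):
           if token.semantic_type == "POSIX_END_OF_OPTIONS":
               for right in tokens[i + 1:]:
                   right.candidates = ["OPERAND"]
               return


   def infer_right(tokens: List[Token], i: int) -> None:
       """If the right sibling is a value, this token is its left side; else it is not a left side."""
       right = tokens[i + 1] if i + 1 < len(tokens) else None
       if right is not None and not right.is_semantic:
           return  # an ambiguous sibling tells us nothing yet
       bound = VALUE_TO_LEFT.get(right.semantic_type) if right else None
       if bound:
           restrict(tokens[i], lambda c: c == bound)
       else:
           restrict(tokens[i], lambda c: c not in LEFT_TO_VALUE)


   def infer_left(tokens: List[Token], i: int) -> None:
       """If the left sibling is a left side, this token is its value; else it is not a value."""
       left = tokens[i - 1] if i > 0 else None
       if left is not None and not left.is_semantic:
           return
       bound = LEFT_TO_VALUE.get(left.semantic_type) if left else None
       if bound:
           restrict(tokens[i], lambda c: c == bound)
       else:
           restrict(tokens[i], lambda c: c not in VALUE_TO_LEFT)


   def parse(tokens: List[Token], uim: dict) -> List[Token]:
       """Resolve as many tokens as possible; return those left for the user."""
       check_end_of_options(tokens)
       while True:
           conversions = 0
           for i, token in enumerate(tokens):
               if token.is_semantic:
                   continue
               infer_right(tokens, i)
               infer_left(tokens, i)
               # Every positional model but COMMAND_OPERAND is an "option part".
               if not token.is_semantic and "OPERAND" not in token.candidates:
                   expected = uim.get(token.value)  # exact match on the flag
                   if expected in token.candidates:
                       token.candidates = [expected]
               if token.is_semantic:
                   conversions += 1
           if conversions == 0:
               return [t for t in tokens if not t.is_semantic]

On ``find . -type f`` with a UIM describing ``-type``, the three tokens resolve to ``OPERAND``, ``X2LKT_IMPLICIT_ASSIGNEMNT_LEFT_SIDE`` and ``X2LKT_IMPLICIT_ASSIGNMENT_VALUE``.
With an empty UIM, only ``.`` is resolved, and ``-type`` and ``f`` are returned for user annotation.
Representing a token by its remaining candidates keeps every inference rule a simple filter, which is why the sketch needs no separate conversion step.
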