Comparative Function Arguments
Table of Contents
- Introduction
- Description
- Bash
- Background
- Simple Function and Call
- Positional Value
- Default Value
- Named Value
- Repeating Value
- Implicit Value
- Typed Value
- Caller Destructuring
- Author Destructuring
- Variadic Function
- Python
- Background
- Simple Function and Call
- Positional Value
- Default Value
- Named Value
- Repeating Value
- Implicit Value
- Typed Value
- Caller Destructuring
- Author Destructuring
- Variadic Function
- Guile
- Background
- Simple Function and Call
- Positional Value
- Default Value
- Named Value
- Repeating Value
- Implicit Value
- Typed Value
- Caller Destructuring
- Author Destructuring
- Variadic Function
- Rust
- Analysis
Introduction
Programming languages are complex programs that evolve over time with the aim of better serving their users. The most successful programming languages have sophisticated governance systems such as steering committees and RFC-like processes to incorporate feedback from users with on-the-ground knowledge into the language design. These processes embed significant practical knowledge into the programming language.
This paper is a proof of concept for extracting this knowledge from these languages. It examines a selection of programming languages and compares the features they provide around the syntactic specification of function arguments, on both the author and the caller side. It then takes the lessons learned from these disparate languages and creates a unified framework that incorporates these lessons. The purpose is NOT to create a one-size-fits-all "best" solution. Different programming languages are useful in different contexts precisely because they provide different features that suit the context the user is working in. However, the differences between programming languages go far beyond what is necessary for adapting to different contexts. For example, differences in standard library naming conventions, which delimiters to use, and allowable characters in symbol names are cosmetic differences that do not add context-specific value. However, these differences do add mental overhead when programmers context-switch between domains and increase the overall cost of learning to program in different domains. Wouldn't it be nice if we had a core programming language that settled the cosmetic differences but was flexible enough to provide the context-specific features that users benefit from?
Conventions
Footnotes are indicated by an asterisk (*) and can be viewed by hovering your cursor over them. They contain less formal commentary about the paper.
Citations are indicated by an italicized parenthesized nickname, such as (PySource) or (guile-manual). The nicknames give the reader an understanding of where the information is generally coming from while hovering over the citation will provide detailed information for the source and the location in the source where that specific citation is coming from. For example, all citations that say (PySource) refer to the same Python source repository, but each citation will also provide a different filename(s) and line number(s) indicating precisely where the relevant information can be found.
Reproducibility and Transparency
The source code used in example snippets can be found in the
online git repository.
The repository includes files appropriate for use with GNU Guix to reproduce the software
environment I used while creating this paper.
It pins to a specific revision of the main Guix channel so that updates do not interfere
with reproducibility.
Running make shell from the top-level directory launches a pure
shell*
(Here, "pure" means that enviornment conditions such as variables are unset so that
the shell does not use any artifacts from the base operating system by mistake.)
for the user so that environmental factors do not interfere with reproducibility.
AI Use Disclosure
I have asked an AI chatbot to give critical feedback on this paper. I took this feedback and incorporated it into future edits exactly as I would have if a human had given me this feedback. AI has not written even a single punctuation mark in this paper.
The commits I asked for feedback on follow:
- 268261b6a88ffa86263f444879491743d3212cfd
- d73e85529c0c8b93720fcd3e97e49d23997ead99
Features Examined
Some languages provide similar features under different names and sometimes the names that organically grew around a feature is not particularly descriptive. For example, different communities use different terms to refer to the values that shell functions interpret as implying a value. A shell function might recognize a value "--verbose" to mean that a variable named "verbose" should have the value "true". I have heard this mechanism variously referred to as a flag, a switch, or an option. This paper refers to this mechanism as an "implicit value" because the caller uses a name associated with the argument to imply a value which does not appear explicitly. A complete least of terms which describe a feature provided by more than one language follows:
Positional Value (other name(s): N/A):
These values are associated with arguments based on the index of the value in the list of
all positional values.
Default Value (other name(s): Optional Argument):
A value that an author associates with an argument for use when the caller declines to
supply a value.
Named Value (other name(s): Keyword Argument, Option
Argument*
(The term "option argument" is not to be confused with the term
"optional argument". The former is used by shell users to refer to an
argument that come after an option (such as "--ignore" "foo" in the previous example)
while an optional argument is a term used by scripters and system
programmers to refer to an argument associated with a default value.)
):
These values are associated with an argument based on a name which the caller attaches to
the value.
The argument may be associated with several synonymous names.
Repeating Value (other name(s): N/A):
Languages which support named values must have some strategy for handling what happens
when it receives multiple values for the same argument.
In most cases, the strategy is either to error out or overwrite the previous value.
However, other strategies such as counting the number of times the value is given
(sensible, for example, with implicit values, see below) are possible.
Implicit Value (other name(s)s: Switch, Flag, Option):
An argument has an implicit value if the caller can specify a name associated with the
argument but omit the value.
For example, many CLI tools recognize "--verbose" to mean that additional output should be
provided.
An argument with an implicit value may also have antonyms, which invert the
semantics of the primary name.
For example, some CLI tools also recognize "--quiet" to mean that verbose output should be
disabled, and possibly that some non-verbose output should be disabled as well.
Typed Value (other name(s): N/A):
Some languages allow or require associating a type with an argument, in which case the
value provided by the caller must be compatible with that type.
Destructuring (other name(s)s: Pattern, Unpacking):
Some languages provide a mechanism to break a composite structure into its component
parts.
In some cases this functionality is specified by the author, in others by the caller.
In both cases the caller provides a value of a specific type and different parts of that
value (for example, different elements in a list or different members of a struct) are
bound to different local variables in the body of the function.
The difference is whether the author decides how the value is destructured (as in the case
of Rust's patterns) or the caller does (as in the case of Python's argument unpacking).
It is theoretically possible for a language to provide both of these variants
simultaneously.
Variadic Function (other name(s): N/A):
A function which can accept an arbitrary number of values.
Variadic functions are distinct from arguments with default values because with default
values the author controls the range of acceptable argument counts and each value provided
by the caller is bound to a different local variable.
With variadic functions, the caller can pass in an arbitrarily large number of arguments
(physical limitations notwithstanding) and the values are typically aggregated into a
container; the variable bound to this container is the variadic variable.
Variadic values are values which are associated with this feature.
For example, a variadic function may require one positional value at the beginning of the
value list.
The first value is not a variadic value but all other values are.
Each of these features is involved with one or more concerns:
Argument Binding:
These features are concerned with the manner in which values provided by the caller are
associated with arguments specified by the author.
Features with this concern include Positional Values, Named Values, Repeating Values,
Destructuring, and Variadic Functions.
Variable Specifications:
These features are concerned with how authors specify the local variables that are
associated with arguments.
Features with this concern include Destructuring and Typed Values.
Value Specifications:
These features are concerned with the ways that callers can provide values which will be
associated with arguments.
Features with this concern include Default Values, Repeating Values, Destructuring, and
Implicit Values.
Value Constraints:
These features are concerned with restricting the set of values that callers can associate
with a specific argument.
The only feature with this concern is Typed Values.
Structure of Paper
The remainder of this paper will be divided into 3 parts: Description, Analysis, and Implementation. The purpose of these parts is to first understand what the current state of the craft is, then assess the advantages and disadvantages of different approaches, and finally create a framework that allows programmers to make context-aware decisions in an interoperable manner.
Description
Overview
This part is dedicated to describing the features of various languages that are sampled
by this paper, using the terms defined by this paper.
Each of these sections will start with a summary of the features provided by the language.
Next, it will explain the background knowledge necessary in order to understand the
feature descriptions.
This explanation will include an example of a "simple function and call" which serves two
purposes.
First, it provides the reader with a concrete example of what will be described in the
context of this language.
For example, the description in Python will focus on the behavior of the interpreter and
the bytecode emitted by the compiler.
Second, it provides a point of comparison when describing features.
In Python, the description of destructuring primarily describes the
CALL_FUNCTION_EX instruction, with some description of list-building
instructions towards the end, rather than describing every single instruction that is
emitted.
This keeps each section shorter because they do not have to explain the baseline they are
being compared to; the sections are not self-contained, but they all have exactly one
dependency and the dependency plus the section are self-contained.
This makes it easier for a reader to focus only on the features that are relevant to them,
if they so choose.
Finally, each of the features provided by the language will be described.
Finally, there will be 3 subsections: Syntax, Implementation, and Historical Record. All 3 of these sections inform the implementation of the mechanism.
The syntax section explains what the source code looks like when the feature is used. This clarifies what the feature looks like in practice.
The implementation section explains the language behavior which causes the feature to be provided. This suggests implementation strategies that might be useful in the framework.
The historical record section discusses, where possible, the motivation for the feature and lessons learned from implementation and community response. This helps the framework avoid or solve problems that others have dealt with in the past.
Bash
Bash, along with getopt, provides the following features:
Background
Bash has an interpreter which executes plain-text documents
(bash-5.1.16-source)
doc/bash.info lines 111-122
Bash Source. 5.1.16. SHA256: 5bac17218d3911834520dad13cd1f85ab944e1c09ae1aba55906be1f8192f558 GNU Project. https://ftp.gnu.org/gnu/bash/bash-5.1.16.tar.gz
.
Unlike most languages (and all other languages examined in this paper), bash does not
expect or allow function authors to specify arguments in the function signature.
Instead, Bash makes all functions variadic and binds each argument to a special variable
named after the position - the first argument will be bound to $1, the second
to $2, etc.
Note that the $ simply indicates the start of a variable (instead of a plain
string) in Bash.
This paper also assesses the command-line utility getopt, specifically the version provided by GNU which has features beyond what other implementations provide. I chose to include the use of this utility because it enjoys widespread use and it is more interesting to assess the feature-rich utility than the relatively featureless facilities provided by the core language (positional arguments and variadic functions only). Getopt has 2 different concepts when it comes to arguments, which it refers to as options and arguments. In the nomenclature of this paper, options are names and arguments are values. Getopt further distinguishes between option arguments, values that are associated with a name, and non-option arguments, values that are associated with a position.
Furthermore, getopt does not do variable binding in the traditional manner that we will
see in other languages in this paper.
Instead, it supports future processing by normalizing the form of the arguments by
ordering them and separating them.
It organizes them so that all named values appear first, then a literal --,
then all positional values.
It separates them by taking named values which are specified by a single value, by using
the form --name=value, and separating them into two separate values,
--name value.
This organization makes it straightforward to process named values using the built-in
while, case, and shift constructs.
Positional values can then be referenced by using their position name, $1,
$2, etc.
The implementation of getopt is split across two repositories: GLibc provides c-level functions which implement functionality (usable by c-level programs to process command-line arguments) and Util-Linux provides a wrapper around these functions to make them accessible directly from the command-line.
The GLibc function getopt_long is the entry point.
It accepts an array of argument values along with metadata to control how they are
processed (for example, valid names are passed through a control structure).
It expects the user to call it repeatedly in order to process all arguments and performs
2 processes while scanning the argument list.
One process is to immediately indicate the next value name (option) through the return
value and return the value (option argument), when present, through a global variable.
When this process is finished it returns a special value to indicate such.
The second process (which occurs simultaneously with the first process) is to reorganize
the given list of values so that all named values precede all positional values.
The Util-Linux wrapper calls getopt_long and prints the return values in a
normalized form.
Simple Function and Call
function add-values {
getopt -o "" -- "$@"
eval set -- $(getopt "" -- "$@")
shift
echo $(($1 + $2))
}
add-values 1000 1001
The first line in the function runs getopt, but does not have any side-effects; it simply
prints the organized and normalized arguments for examination in the output.
The second line is the idiomatic way to invoke getopt; it uses the output of getopt to
rewrite the argument list so that it takes effect.
The third line shifts the argument list; that is, it moves all positional arguments to the
left one position so that $3 becomes $2, $2 becomes
$1, and $1 is dropped.
It is idiomatic to call shift before processing positional arguments so that
the literal -- inserted by getopt does not cause confusion.
The final line uses the rewritten argument list to perform the addition.
Calling the function results in the following output:
-- '1000' '1001' 2001
Positional Value
Bash supports positional arguments through the use of special variables $1,
$2, etc.
Note that, without further processing by the author, named values and their names will
appear as positional values.
It is idiomatic to use the built-in shift function when processing named
values in order to avoid this issue.
Getopt facilitates this by ensuring that all named values preceed all positional values,
and divides the two groups with a literal --.
Syntax & Semantics
The syntax and semantics of positional values do not differ from the simple call.
Implementation
getopt_long recognizes that a value is not a name because it either does not
start with a -, or because it is only one character long (that is, the value
is exactly the string "-")
(glibc-source)
posix/getopt.c line 490
GLibc Source. 2.35. SHA256: 5123732f6b67ccd319305efd399971d58592122bcc2a6518a1bd2510dd0cf52e GNU Project. https://ftp.gnu.org/gnu/glibc/glibc-2.35.tar.xz
.
Values which are named are automatically processed when the name is processed, so they
will not be mistaken for positional values
(glibc-source)
Long options:
posix/getopt.c lines 573, 595
posix/getopt.c lines 351-365
Short options:
posix/getopt.c lines 653-696
GLibc Source. 2.35. SHA256: 5123732f6b67ccd319305efd399971d58592122bcc2a6518a1bd2510dd0cf52e GNU Project. https://ftp.gnu.org/gnu/glibc/glibc-2.35.tar.xz
.
Values which are named are moved to the beginning of the array whenever a new value
starts being processed
(glibc-source)
posix/getopt.c lines 504-520
GLibc Source. 2.35. SHA256: 5123732f6b67ccd319305efd399971d58592122bcc2a6518a1bd2510dd0cf52e GNU Project. https://ftp.gnu.org/gnu/glibc/glibc-2.35.tar.xz
.
Historical Record
Bash has used positional parameters in the above manner since at most version 1.14.7, the
oldest version available in the referenced repository
(bash-1.14.7-source)
documentation/features.texi lines 243-248.
Bash Source. 1.14.7. Commit: 726f63884db0132f01745f1fb4465e6621088ccf. GNU Project. https://git.savannah.gnu.org/git/bash.git
.
In the getopt C function specified by POSIX, arguments are not reorganized so
that named values appear prior to positional values.
Instead, the standard assumes that the caller provides all named values before positional
ones and stops processing at the first positional value found.
The sentinal value -- to indicate the beginning of positonal arguments was
included in the standard
(POSIX.2-1992)
pages 732-733.
Portable Operating System Interface (POSIX). 1003.2-1992. Institute of Electrical and Electronics Engineers. 1992.
.
The argument reorganization has been a GNU extension since the oldest version of GLibc
available in the referenced repository
(GLibc-first-commit)
posix/getopt.c lines 73-85
GLibc Source. Commit 28f540f45bbacd939bfd07f213bcad2bf730b1bf. GNU Project. https://sourceware.org/git/glibc.git
.
Default Value
While default values are not explicitly supported by getopt, they are
trivial to implement because of the manual processing that getopt expects the
programmer to perform.
Syntax & Semantics
Default values do not have first-class support in Bash and Getopt, so there is no specific syntax for specifying default values. However, it is trivial and common for shell scripters to support default values so this will be examined in the implementation section.
Implementation
Default values can be implemented by initializing local variables to default values, then
updating them if a new value is given through an argument.
For example, the following code provides a default value of 0 for both $A and
$B.
function add-values {
getopt -o "" -- "$@"
eval set -- $(getopt "" -- "$@")
A=0
B=0
# Move past the literal --
shift
# The code [ -n "$1" ] checks if the variable contains a value
if [ -n "$1" ]; then
A="$1"
shift
fi
if [ -n "$1" ]; then
B="$1"
shift
fi
echo $(($A + $B))
}
Historical Record
Bash and getopt have never provided first-class support for default values.Named Value
Syntax & Semantics
Implementation
Historical Record
In the POSIX specification, only argument names with a single letter are allowed and
argument names may only be prefixed with a single hyphen
(POSIX.2-1992)
pages 732-733.
Portable Operating System Interface (POSIX). 1003.2-1992. Institute of Electrical and Electronics Engineers. 1992.
.
The ability to use multicharacter option names with a double hyphen has been a GNU
extension since the oldest commit available in the referenced repository
(GLibc-first-commit)
posix/getopt.c lines 354-361
GLibc Source. Commit 28f540f45bbacd939bfd07f213bcad2bf730b1bf. GNU Project. https://sourceware.org/git/glibc.git
.
Repeating Value
Syntax & Semantics
Implementation
Historical Record
Implicit Value
Syntax & Semantics
Implementation
Historical Record
Typed Value
Syntax & Semantics
Implementation
Historical Record
Caller Destructing
Syntax & Semantics
Implementation
Historical Record
Author Destructuring
Syntax & Semantics
Implementation
Historical Record
Variadic Function
Syntax & Semantics
Implementation
Historical Record
Python
Python provides the following features:
- Positional Value
- Callers typically decide between naming or positioning values but authors can restrict this decision.
- Positional values must precede named values.
- Default Value
- Named Value
- Repeating Value
- Typed Value
- Caller Destructuring
- Restricted to iterables and dictionaries.
- Variadic Functions
- One or two variadic variables will be bound to a list for positional values and/or a dictionary for named values.
Python used to provide author destructuring for tuples, but this was removed in version 3.0.
Some bytecode instructions include hints in parentheticals after the instruction.
For example, LOAD_NAME will include the name being loaded:
While
LOAD_NAME 0 (add_values)
LOAD_CONST will include the literal value:
These hints are helpfully provided by the Python decompiler.
LOAD_CONST 1 (1000)
Background
Python has a compiler which produces bytecode
(PDG)
internals/compiler.rs section "Abstract"
Python Developer Guide. Commit: 323f9cf9438730fb64ed71b40a3cb343b6724841. Python Software Foundation. https://github.com/python/devguide
and an interpreter which executes the bytecode
(PDG)
internals/interpreter.rs section "Introduction"
Python Developer Guide. Commit: 323f9cf9438730fb64ed71b40a3cb343b6724841. Python Software Foundation. https://github.com/python/devguide
.
Argument and return values are given using a stack managed by the interpreter;
instructions may modify or move these values, even if this is not the primary purpose of
the instruction
(PDG)
internals/interpreter.rst section "The Evaluation Stack"
Python Developer Guide. Commit: 323f9cf9438730fb64ed71b40a3cb343b6724841. Python Software Foundation. https://github.com/python/devguide
.
There are 2 families of instructions that are used throughout the examples in this
section.
The LOAD family puts values on the stack from different locations depending
on the instruction.
The CALL family initiates a function call after the stack has been prepared.
There are also example-specific instructions which will be explained alongside the
relevant example.
Note: This section omits bookkeeping instructions that are not topically relevant.
For example, when calling a non-method function (one which is not associated with an
object instance), the interpreter pushes NULL onto the stack before pushing
the function.
This instruction, and similarly uninteresting instructions, are omitted for brevity.
LOAD family
Instructions prefixed with LOAD_ retrieve a value from an
instruction-dependent location and put it onto the stack.
Each instruction recieves an integer which represents an index into a C-level array.
Which array is referenced depends on the instruction.
When the array contains variable names, the instruction also retrieves the value
associated with that name.
The below table explains the contents of the array that each instruction references.
LOAD_CONST |
Constant values which appear literally or implicitly in source code
(PySource)
Doc/library/dis.rst lines 964-966 Python/ceval.c lines 1314-1536 Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython |
LOAD_FAST |
Names of local variables which are guaranteed to be initialized.
(PySource)
Doc/library/dis.rst lines 1253-1259, Lib/inspect.py line 514 Python/ceval.c lines 1314-1536 Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython |
LOAD_NAME |
Names of non-local variables.
If a local variable exists with the same name as a non-local variable then the
value bound to the local variable will be returned.
(PySource)
Doc/library/dis.rst lines 969-972 Lib/inspect.py line 511 Python/ceval.c lines 1314-1536 Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython |
CALL family
These instructions tell the interpreter to call a function.
This paper views this family as rooted in the plain CALL instruction, with
all others being variants on this core instruction.
When it needs to reference a behavior which occurs when a function is called, it examines
only the plain CALL instruction and assumes that other instructions behave
similarly unless the purpose of the variant is to change that specific behavior.
CALL |
Receives an integer indicating the number of argument values provided by the caller.
The stack will contain the function to call followed by the argument values in
separate stack locations.
(PySource)
Doc/library/dis.rst lines 1398-1410 Python/ceval.c lines 1314-1536 Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython |
Simple Function and Call
def add_values(a, b):
return a + b
add_values(1000, 1001)
The bytecode generated for the call to add_values performs 3 tasks.
First, it pushes the function onto the stack.
Next, it pushes literal values which will become associated with arguments.
Finally, it calls the function.
LOAD_NAME 0 (add_values) LOAD_CONST 1 (1000) LOAD_CONST 2 (1001) CALL 2
The bytecode generated for the definition is similar.
It uses LOAD_FAST (instead of LOAD_NAME) to refer to the variables
associated with arguments and BINARY_OP (instead of CALL) to use the
built-in + operator.
LOAD_FAST 0 (a) LOAD_FAST 1 (b) BINARY_OP 0 (+)
Positional Value
Provided by default.
Generally, callers can choose whether to provide values by name or position when they
make the call.
All positional values must precede all named values
(PyDoc)
reference/expressions.html section 6.3.4 "Calls"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
Function authors can specify that some arguments with only receive their value by
position.
These are referred to as "positional-only arguments".
Syntax & Semantics
Callers specify positional values by providing a comma-separated list of values.
Function authors specify positional-only arguments by listing a literal /
after the final positional-only argument
(PyDoc)
reference/compound_stmts.html section 8.7 "Function Definitions"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
def add_values_mixed(position, /, either):
return position + either
# valid
add_values_mixed(1000, 1001)
add_values_mixed(1000, b=1001)
# invalid: the value for argument
# "position" cannot be given by name
# add_values_mixed(position=1000, either=1001)
Implementation
The bytecode for function definitions is identical to the simple definition regardless of
the presence of positional-only arguments.
The restriction is enforced within the CALL instruction.
In particular, the helper function positional_only_passed_as_keyword uses the
co_posonlyargcount and co_localsplusnames members of the code
object.
These variables track the number of positional-only arguments
(PySource)
Doc/library/inspect.rst lines 180-181
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
and the names of all arguments
(PySource)
Include/cpython/code.h line 155
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
respectively.
If any names of positional-only arguments appear as keyword arguments then the helper
raises an error.
(PySource)
Python/ceval.c lines 1182-1244
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
Note that the helper is only called if the function does not include a variadic variable
for named values
(PySource)
Python/ceval.c lines 1417-1431
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
localsplus.
The CALL instruction copies positional arguments from the stack into this
array
(PySource)
Python/ceval.c lines 1341-1353Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython .
Historical Record
Positional values have always been available in Python and requires no special syntax to use* Unfortunately, positional values are assumed to be the default method of passing arguments by most programmers, including language authors, so there is no good citation for this assertion. .
PEP 570 introduced positional-only arguments
(PEPs)
peps/pep-0570.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
It gives several justifications for the change, most of which are concerned with
maintaining a healthy ecosystem.
There are two relevant*
There are also several concerns mentioned which are specific to Python
and/or its implementation.
For example, it references PEP 399 which requires that pure Python code and C
extensions have the same expressive power.
While important to the Python community, these concerns are not relevant to this
paper.
ecosystem harms the PEP is concerned with: inappropriate use of
value names by callers and increased maintenance burden for library authors.
Inappropriate use of value names includes using non-meaningful names, such as a math
function that takes one argument (the sqrt function takes one argument named
x).
It also includes providing values in an illogical order, such as calling the
range function and supplying the stop value before the
start
value.
The increased maintenance burden occurs because all argument names are automatically and irrevocably added to the API surface of all libraries. It could be the case that a library author wants to implement a change which should be non-breaking in principle, but prompts a variable name change for clarity. This variable name change transforms the overall change into a breaking change.
The PEP is also concerned with functions that include a variadic variable for named parameters. For these functions, any non-variadic variables restrict the domain of the variadic variable, as their names will be associated with the distinct variable rather than the variadic one.
Finally, the PEP notes the curious case of the range function, which the PEP
describes as accepting "an optional parameter to the left of its required parameter."
In particular, if the range function only receives a single argument it is
interpreted as the end of the range, but if it receives 2 arguments then the first is
interpreted as the start while the second is interpreted as the end.
This concern does not appear to be addressed by PEP 570*
At least, I do not see anything that addresses it when I read the PEP and the
implementation of range still inspects the number of provided arguments manually
(PySource)
Objects/rangeobject.c lines 81-120
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
.
Default Value
Provided by default argument values. Default values are evaluated at definition time and
evaluated exactly once
(PyDoc)
tutorial/controlflow.html section 4.8.1 "Default Argument Values"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
Syntax & Semantics
Function authors can define a default value by adding a literal = after the
name of an argument, then the value
(PyDoc)
reference/compound_stmts.html section 8.7 "Function Definitions"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
def add_values(mandatory, optional=2000):
return mandatory + optional
# valid
add_values(1000)
add_values(1000, 1001)
# invalid: the first argument is not
# associated with a default value
# add_values()
The value may refer to a variable, but the value contained by the variable at definition
time will always be used as the default value:
value = 2000
def add_values(mandatory, optional=value):
return mandatory + optional
value = 1000
add_values(0) # returns 2000
Implementation
Default values do not impact the bytecode generated for the definition or the call: they
are both identical to the simple call.
Instead, the CALL instruction retrieves default values from the code object
and uses them when necessary
(PySource)
Python/ceval.c lines 1314-1536
Trace from CALL implementation to initialize_locals
definition:
Call to _PyEvalFramePushAndInit Python/bytecodes.c lines 2689-2691
Call to initialize_locals Python/ceval.c line 1594
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
Historical Record
Default values were added in version 1.0.2
(PySource)
Misc/HISTORY lines 32809-32811
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
There is additionally a note that default values "would now be quite sensible" in the
version 0.9.4 release notes.
This version changed argument processing so that functions receive all arguments as
separate values, rather than as a single tuple
(PySource)
Misc/HISTORY lines 34550-34639
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
PEP 671 proposes adding a feature which would allow function authors to provide an
expression which will produce a default value at call time ("late evaluation")
(PEPs)
peps/pep-0671.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
The mailing list discussion includes several disagreements, including whether or not it is
appropriate for a function signature to contain un-inspectable objects and technical
difficulties about scoping rules for late evaluated values.
The proposal is still in the "draft" state, so it might be added in the future (possibly
after trivial or significant changes to the proposal), but there has been no activity on
the mailing list since 2021.
(PEP 671 Discussion)
PEP 671 (late-bound arg defaults), next round of discussion. TODO: list authors. https://mail.python.org/archives/list/python-ideas@python.org/thread/UVOQEK7IRFSCBOH734T5GFJOEJXFCR6A/
Named Value
Provided through keyword arguments.
Generally, callers can choose whether to provide values by name or position when they
make the call.
All named values must proceed all positional values
(PyDoc)
reference/expressions.html section 6.3.4 "Calls"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
Function authors can specify that some arguments will only receive their value by name.
These are referred to as "keyword-only arguments".
Syntax & Semantics
Callers provide named values by writing first a symbolic name, then a literal
=, then the value.
(PyDoc)
reference/expressions.html section 6.3.4 "Calls"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
Function authors specify keyword-only arguments by listing a literal * before
the first keyword-only argument
(PyDoc)
reference/compound_stmts.html section 8.7 "Function Definitions"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
def add_values_mixed(either, *, named):
return either + named
# valid
add_values_mixed(named=1001, either=1000)
add_values_mixed(1000, named=1001)
# invalid: the value for argument "named"
# must be given by name
# some_function(1000, 1001)
# invalid: named values cannot appear
# before positional values
# some_function(named=1001, 1000)
Implementation
The bytecode for function definitions is identical regardless of the presence of
keyword-only arguments.
The restriction is enforced within the CALL instruction.
In particular, the helper function initialize_locals checks that the number
of positional arguments is not more than expected
(PyDoc)
Python/ceval.c lines 1458-1462, trace_call_TO_initialize_locals
Trace from CALL implementation to initialize_locals definition:
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
by checking the
Call to `_PyEvalFramePushAndInit` Python/bytecodes.c lines 2689-2691
Call to `initialize_locals` Python/ceval.c line 1594
co_argcount member which tracks the number of arguments which
may be positional
(PySource)
Doc/library/inspect.rst lines 146-149
Python/assemble.c line 556
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
Bytecode for calls which use named values differ significantly from the simple call. For example, consider this code:
def add_values(a, b):
return a + b
add_values(b=1001, a=1000)
add_values(1000, b=1001)
The bytecode for the first call differs from the simple call by adding the
KW_NAMES instruction prior to the CALL instruction:
LOAD_NAME 1 (add_values)
LOAD_CONST 6 (1001)
LOAD_CONST 5 (1000)
KW_NAMES 8 (('b', 'a'))
CALL 2
KW_NAMES marks the given constant, in this case the tuple
('b', 'a'), as a set of argument names to be used by CALL
(PySource)
Python/bytecodes.c lines 2601-2605, 2644, 2869-2692, 2706-2709Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython Then, the
CALL instruction determines which values belong to which arguments
by corresponding their respective positions on the stack and in the tuple
(PySource)
Python/ceval.c lines 1383-1384Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython .
The process is similar when some values are provided by position and others by name. The second call above does not have any additional instructions to handle this case:
LOAD_NAME 2 (add_values_mixed)
LOAD_CONST 5 (1000)
LOAD_CONST 6 (1001)
KW_NAMES 12 (('b',))
CALL 2
The CALL instruction infers which value is named based on the restriction
that positional values must precede named values
(PySource)
Python/ceval.c lines 1383-1384
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
Historical Record
Named values were first introduced in Python 1.3
(Py1.3Source)
Doc/tut.tex lines 3540-3626
Python Source. 1.3. SHA256: 892a12d7360b5e6aed0f6a637f5dbdd4ea9a27ee0e2741337393b1d5711d2d33 Python Software Foundation. https://legacy.python.org/download/releases/src/python-1.3.tar.gz
The feature was based on the similar feature provided by Modula-3
(Py1.3Source)
Doc/tut.tex lines 3584-3586
Python Source. 1.3. SHA256: 892a12d7360b5e6aed0f6a637f5dbdd4ea9a27ee0e2741337393b1d5711d2d33 Python Software Foundation. https://legacy.python.org/download/releases/src/python-1.3.tar.gz
.
While keyword-only arguments (discussed later in this section) were added afterwards, the
core syntax and semantics of keyword-only arguments have remained unchanged.
PEP 3102 introduced keyword-only arguments. It provides a single justification for the change: variadic functions cannot make use of default values. The PEP gives the following example:
def sortwords(*words, case_sense=False):
pass
If the value associated with case_sense can be provided positionally then it
must be provided in every call even if the caller wants the default value of
False*
Unless the caller wants to sort the empty list. =)
.
Repeating Value
Python produces an error when a value is repeated.Syntax & Semantics
A repeated value is provided by naming 2 different values with the same name:
add_values(a=1000, b=1001, a=1002)
This produces an error:
SyntaxError: keyword argument repeated: a
Implementation
Python checks for repeating arguments statically at compile-time. Thecompiler_call helper function is used to emit instructions related to a
function call
(PySource)
Python/compile.c line 6126
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
The first thing this function does is call the validate_keywords function
(PySource)
Python/compile.c line 4914
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
This function manually compares each name to each proceeding name and returns an error if
a duplicate is found
(PySource)
Python/compile.c lines 4900-4905
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
Historical Record
The explicit check for duplicate value names was added in version 1.5a3 (PySource) Misc/HISTORY lines 30752-30753 Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython ; it is not clear what behavior Python exhibited before this change * Unfortunately, Python's 1.4 source does not compile on modern compilers with many errors such as "GCC no longer supports <varargs.h>", so I was not able to test the behavior at this version. .Implicit Value
This feature is not provided by Python.
Typed Value
Provided through type hinting. The Python compiler and interpreter do not change their behavior based on type hints. However, they do guarantee that the hints will be available to external tools and provide supporting infrastructure to help the tools work correctly. Both static analyzers and runtime checkers can make use of annotations.
Syntax & Semantics
Type hints are specified using function annotations, as defined in PEP 3107. This means that function authors add a colon and type name after the variable name associated with the argument:
def typed_argument(x: str):
pass
Implementation
Annotations are stored as metadata in the Python object which represents the function.
Libraries can access them through the __annotations__ property, which
contains a dictionary
(PEPs)
peps/pep-3107.rst "Accessing Function Annotations"
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
Historical Record
The foundations for type hinting were added in PEP 3102, which defines the syntax for
function annotations
(PEPs)
peps/pep-3102.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
The Python developers then waited for external community-driven tools to experiment with
different type-checking approaches.
Eventually, they took lessons learned from the community and created a set of
recommendations in PEPs 482, 483, and 484
(PEPs)
peps/pep-0484.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
Much of their content addresses type theory issues, such as generics, variance, and
special types like Any.
Since this initial introduction there have been a number of PEPs which further clarify
best practices or provide syntactic improvements to type specifications.
However, the core mechanism that this paper is concerned with - associating a type with an
argument, regardless of how that type is specified - remains unchanged.
Caller Destructing
Provided through argument unpacking. Caller destructuring is only available for iterables and mappings.
Syntax & Semantics
This feature allows a caller to prefix one or more iterables with * in order
to translate their contents into a set of positional values, and/or prefix one or more
mappings with ** to translate their contents into a set of named values.
For mappings, keys must strings naming an argument.
(PyDoc)
reference/expressions.html section 6.3.4 "Calls"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
def add_values(a, b, c):
return a + b + c
# all of the below calls are equivalent
# to this:
# add_values(1000, 1001, 1002)
# destructure an iterable into positional
# values
l = [ 1000, 1001, 1002 ]
add_values(*l)
# destructure multiple iterables into
# positional values
first_part = [ 1000 ]
second_part = [ 1001, 1002 ]
add_values(*first_part, *second_part)
# destructure a mapping into named
# values
d = { 'a': 1000, 'b': 1001, 'c': 1002 }
add_values(**d)
# destructure multiple mappings into
# named values
first_part = { 'b': 1001 }
second_part = { 'a': 1000, 'c': 1002 }
add_values(**first_part, **second_part)
Implementation
The difference between the simple call and a call which includes destructuring is best
explained by starting with the final instruction.
While the simple call uses the plain CALL instruction a destructuring call uses
the CALL_FUNCTION_EX instruction.
CALL_FUNCTION_EX receives either 0 or 1 which tells it whether or not there is
a mapping to destructure
(PySource)
Doc/library/dis.rst lines 1398-1410
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
When it receives 1, there is a mapping to destructure which will be on the top of the
stack.
While the caller can use any mapping (and any number of mappings),
CALL_FUNCTION_EX will always see a single dictionary when it executes (the
process which ensures this is discussed in more detail later in this section).
The dictionary is turned into a set of keyword arguments by interpreting the keys as names
identifying arguments.
(PySource)
Objects/call.c lines 1029-1053
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
The next item on the stack is an iterable to destructure.
In this case, CALL_FUNCTION_EX might see any iterable on the stack.
If the iterable is not a tuple it will convert it into a tuple
(PySource)
Python/bytecodes.c lines 3198-3207
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
The elements of this tuple will be used as positional values.
(PySource)
Python/bytecodes.c line 3219
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
When CALL_FUNCTION_EX receives 0 the process is similar, except that the
top element of the stack is an iterable and there is no mapping.
The compiler ensures that CALL_FUNCTION_EX only receives dictionaries (rather
than the arbitrary mapping object provided by the caller) with two instructions.
First, it issues a BUILD_MAP instruction to place a new dictionary on the
stack
(PySource)
Doc/library/dis.rst lines 1015-1023
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
Then it adds the keys and values of each mapping object into this dictionary by repeatedly
calling the DICT_MERGE instruction.
For example, this code:
add_values(*d)
Compiles to this bytecode (note that BUILD_MAP receives the value 0 to
indicate that it is building an empty dictionary):
LOAD_NAME 0 (add_values) LOAD_CONST 13 (()) BUILD_MAP 0 LOAD_NAME 3 (d) DICT_MERGE 1 CALL_FUNCTON_EX 1
When named values are provided separately from the destructured values, the freshly created dictionary is prepopulated with those values. For example, this code:
d = { 'b': 1001, 'c': 1002 }
add_values(a=1000, **d)
Compiles to this bytecode (note that in this case, BUILD_MAP receives the value
1 to indicate that there is one key-value pair on the stack):
LOAD_NAME 1 (add_values)
LOAD_CONST 17 (())
LOAD_CONST 11 ('a')
LOAD_CONST 3 (1000)
BUILD_MAP 1
LOAD_NAME 4 (d)
DICT_MERGE 1
CALL_FUNCTION_EX 1
When the caller provides multiple destructured iterables, or provides literal positional values in addition to one or more destructured iterables, the compiler issues instructions to merge them into a list, then converts that list into a tuple. For example, this code:
t0 = ( 1001, )
t1 = ( 1002, )
add_values(1000, *t0, *t1)
Compiles to this bytecode:
LOAD_NAME 1 (add_values) LOAD_CONST 3 (1000) BUILD_LIST 1 LOAD_NAME 4 (t0) LIST_EXTEND 1 LOAD_NAME 5 (t1) LIST_EXTEND 1 CALL_INTRINSIC_1 6 (INTRINSIC_LIST_TO_TUPLE) CALL_FUNCTION_EX 0
If the caller provides only a single iterable to destructure, and no literal positional
values, this iterable is placed onto the stack without modification and the tuple creation
logic contained within CALL_FUNCTION_EX itself is triggered
(PySource)
Python/bytcodes.c line 3202
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
.
Historical Record
When argument unpacking was first introduced in version 1.6
(PySource)
Misc/HISTORY lines 26740-26743
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
it only allowed callers to unpack a single iterable and/or a single mapping.
For example, the call add_values(*first_part, *second_part) would have been
illegal.
PEP 448 expanded argument unpacking so that multiple values can be destructured in the
same call
(PEPs)
peps/pep-0448.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
The rationale given for this change was enhanced readability, as previously callers would
either need to build iterables/dictionaries separately or destructure them manually,
adding additional lines of code which are semantically sparse.
Author Destructuring
While python does not currently support authorial destructuring, it did so prior to
version 3.0
(PEPs)
peps/pep-3113.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
It allowed authors to declare that arguments should receive tuples whose values would be
bound to separate local variables:
def distance((x1, y1), (x2, y2)):
pass
This function would require that callers pass in 2 values which are both tuples containing
2 elements.
The values from the first tuple would be bound to local variables x1 and
y1, while the values from the second would be bound to x2 and
y2.
The functionality was removed through PEP 3113. The rationale includes multiple implementation issues which are important to the Python community but not relevant to this paper.
Variadic Function
Provided by arbitrary argument lists and dictionaries. Positional values are collected by the former while named values are collected by the latter.
Syntax & Semantics
Function authors specify variadic-ness by specifying the name for one or two variadic
variables.
The name for the variadic list must be prefixied by a * while the name for the
variadic dictionary must be preceeded by a **
(PyDoc)
reference/compound_stmts.html section 8.7 "Function Definitions"
Python Documentation. 3.12.2. SHA-256: d600427f22db970ddf4f5699fdb5a8ebda8e9e92ddb09af807bbeabf07e64df6. Python Software Foundation. https://docs.python.org/3.12/archives/python-3.12.2-docs-html.zip
.
from itertools import chain
def add_values(*pos_vals, **named_vals):
return sum(chain(pos_vals, named_vals.values()))
# All of these values appear in the
# pos_values list
add_values(1000, 1001, 1002, 1003)
# All of these values appear in the
# named_values dictionary
add_values(named_val0=1000,
named_val1=1001,
named_val2=1002,
named_val3=1003)
# The values 1002 and 1003 appear in the
# pos_vals list while the names and
# values named_arg0=1000 and
# named_arg1=1001 appear in the
# named_vals dictionary
add_all_values(1002,
1003,
named_val0=1000,
named_val1=1001)
Implementation
The bytecode for variadic calls does not differ from the simple call (or a call with named
values, when relevant). Similarly, the bytecode for the definition of variadic functions
does not differ except implicitly in the way that the variables are used (eg, as lists
or dictionaries rather than plain values).
The interpreter tracks which positional values are also variadic values by checking the
co_argcount variable associated with the function's code object.
Remaining positional arguments are moved into the appropriate variadic variable, if it
exists
(PySource)
Python/ceval.c lines 1355-1376
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
It distinguishes variadic named values from non-variadic named values by checking if the
name is expected; the interpreter already has to keep track of this information because
an unrecognized value name is considered an error for non-variadic functions
(PySource)
Python/ceval.c lines 1378-1455
Python Source. 3.12.2. Commit: 6abddd9f6afdddc09031989e0deb25e301ecf315. Python Software Foundation. https://github.com/python/cpython
Historical Record
PEP 468 updated the variadic variable for named values such that the author can retrieve
the syntactic order in which the values were given.
The rationale given for this change is that some users are developing APIs where order
matters, such as serialization
(PEPs)
peps/pep-0468.rst
Python Enhancement Proposals (PEPs). Commit: fca6000dfd8b55127c9a1fcbac809ddb27aafab7. Python Software foundation. https://github.com/python/peps
.
Guile
Guile provides the following features:
- Destructuring
- Named Value
- Repeating Value
- Default Values
- Positional Value
- Typed Values
- Variadic Functions
All code samples were compiled with the partial-evaluation optimization turned off. This is because partial evaluation frequently removes the function call itself as the return values can be calculated at compile-time.
Additionally, bytecode listings include comments which help explain what the instruction
is doing.
The meaning of the comment depends on the instruction.
For example, the make-immediate bytecode instruction copies a literal static
value onto the stack.
It contains a comment which indicates the value loaded:
(make-immediate 4 4002) ;; 1000
While the call-scm<-scm-scm, which is pronounced "Call scheme from scheme
scheme" to signify that it is calling a built-in function that returns a scheme value and
accepts two scheme values as input, includes the name of the built-in function it is
calling:
(call-scm<-scm-scm 8 8 7 111) ;; lookup-bound
Note that the alloc-frame instruction will contain a comment referring to the
number of "slots" that the frame has; this refers to the size of the stack after the
instruction finishes executing.
These comments are helpfully added by the Guile decompiler.
Background
Guile implements the Scheme programming language, which is a dialect of Lisp.
Scheme was originally described in a 1975 paper for demonstrative purposes
(r0rs)
Page 1.
Scheme: an Interpreter for Extended Lambda Calculus. Massachusetts Institute of Technology - Artificial Intelligence Laboratory. https://standards.scheme.org/official/r0rs.pdf
.
Interest in the language led to a series of revisions to the original description and,
eventually, standardization.
These papers are refered to as "Revised Reports on Scheme", abbreviated to
"rNrs" where the N is replaced with a number representing the revision
count.
For example, the most recently published version of the standard is the 7th revision of
the Scheme programming language so it is referred to as "r7rs".
Guile was originally implemented as an interpreter which worked with a literal
representation of a program's text
(guile-manual)
Section 9.3.1 "Why a VM?"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
A virtual machine was added to Guile in the 2.0 release (2010)
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
(guile-source)
NEWS lines 5068-5071
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
and was rewritten for the 2.2 release (2017)
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
Modern Guile implements a compiler which produces bytecode, an interpreter which executes
bytecode, and an interpreter which directly executes program text
(guile-source)
modules/language/scheme/spec.scm lines 43-45
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
This paper will focus on compiled bytecode as this keeps the analysis consistent with
other sections and is the typical way to execute Guile
code*
For example, running guile script-name will first compile the script then
run the compiled file rather than running the script directly.
.
Guile bytecode operates as a stack machine with 2 pointers into the stack: the frame
pointer and the stack pointer.
(guile-manual)
Section 9.3.2 "VM Concepts"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
The frame pointer stores a location near*
It is "near" the beginning rather than "at" the beginning because some metadata is
stored before the frame pointer's location; this metadata will not be relevant to this
paper.
the beginning of the frame, where each frame represents a single function call.
The stack pointer keeps track of the end of the stack, like a pointer to the end of an
array
(guile-manual)
Section 9.3.3 "Stack Layout"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
When instructions take an index, they differ in whether they take in an index relative to
the stack pointer or the frame pointer
(guile-manual)
Section 9.3.5 "Compiled Procedures are VM Programs"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
In prose, this paper will always reference indexes relative to the stack pointer for the
sake of consistency; this has the effect that frames start at a higher index and end at a
lower index.
For example, in the simple call the frame pointer, associated with the beginning of the
stack, is moved to index 5 while the stack pointer, associated with the end of the stack,
is moved to index 2.
Guile produces different kinds of call instructions based on the call's location.
Code samples in the repository were intentionally crafted to ensure that they always
produced the plain call instruction; irrelevant parts of the code (such as a
constant value placed after the call) are omitted from this paper.
The optargs module
Many of the features described in this paper are provided through Guile's
optargs module, which was introduced in version 1.3.2 (1999)
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
(guile-source)
NEWS lines 11451-11525
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
At this time, Guile lacked a virtual machine and the module was implemented in pure
scheme.
This implementation operated by adding a prelude to the function definiton which searches
through the values provided by the caller to decide which values should be assigned to
which variables
(guile-1.3.2-source)
ice-9/optargs.scm
Guile Source. 1.3.2. Commit: 0a852b9424f949575afecb19a391023acc63e635. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The virtual machine was added to Guile in version 2.0.0 (2010)
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
(guile-source)
NEWS lines 5068-5071
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
This also entailed a rewrite of the optargs module which centered the
implementation around the internal <lambda-case> strucure.
This structure contains all the information about the different kinds of arguments
accepted by the function and facilitates the use of special-purpose VM instructions for
processing arguments efficiently
(guile-devel-archive)
2009-10 lines 6428-6453
Index of /archive/mbox/guile-devel. The GNU Project. https://lists.gnu.org/archive/mbox/guile-devel/
.
While the VM has gone through changes since then, including a complete rewrite in version
2.2
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
the implementation of the optargs module has remained stable.
The special-purpose VM instructions have been adapted to handle the details of VM
operation correctly and decrease latency, but the core logic used to process arguments has
proven to be robust.
Simple Function and Call
(define (add-values a b)
(+ a b))
(add-values 1000 1001)
The bytecode generated for the function definition is fairly straightforward.
0 (call-scm<-scm-scm 1 1 0 0) ;; add
1 (reset-frame 1) ;; 1 slot
2 (handle-interrupts)
3 (return-values)
Instruction 0 calls the built-in function add (which corresponds to the
+ function) and places the result into index 0
(guile-source)
liguile/vm-engine.c lines 1545-1549
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
Instruction 1 resizes the stack so that it contains 1 element - in this case, the return
value
(guile-source)
libguile/vm-engine.c lines 797-802
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Instruction 2 ensures that Guile properly handles signals like Ctrl-C (SIGINT) and
code instrumentation
(guile-manual)
Section 6.22.3 "Asynchronous Interrupts"
Section 9.3.7.6 "Instrumentation Instructions"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
it can be safely ignored here and it will not be mentioned again.
Finally, the return-values instruction moves the flow of execution back to
the caller.
Counterintutively, it does not actually handle moving the return values to
specific locations; it only sets the frame and instruction pointers to the caller's
values
(guile-source)
libguile/vm-engine.c lines 530-555
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The bytecode generated for the call to add-values is more complicated because
it is calling a user-defined function.
It needs to load the function and its arguments*
The function definition did not have to manage the arguments for the built-in call
because built-in calls take values from arbitrary stack locations which are specified
in the instruction arguments, so the caller-supplied locations were reusable.
,
then dispatch to the function.
0 (static-ref 7 16324) ;; add-values
1 (call-scm<-scm-scm 8 8 7 111) ;; lookup-bound
2 (scm-ref/immediate 5 8 1)
3 (make-immediate 4 4002) ;; 1000
4 (make-immediate 3 4006) ;; 1001
5 (handle-interrupts)
6 (call 3 3)
Instruction 0 loads the literal string "add-values" into index 7
(guile-source)
libguile/vm-engine.c line 2125-2129
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Instruction 1 looks up the value associated with the name add-values and
stores it in index 8
(guile-source)
libguile/vm-engine.c lines 1545-1549
libguile/intrinsics.c lines 374-37
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The value includes both the procedure itself and some associated metadata; instruction 2
retrieves the procedure itself and stores it in index 5
(guile-source)
libguile/vm-engine.c lines 1906-1910
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
Instructions 3-4 place the constant values 1000 and 1001 into indices 4 and 3, respectively. At this point the most relevant portion of the stack looks like this:
| Index | Value |
|---|---|
| 5 | #<procedure add-values> |
| 4 | 1000 |
| 3 | 1001 |
Finally, instruction 6 calls the function; internally, this involves moving the frame
pointer to index 5 (where the procedure is stored) and the stack pointer to index 2 (the
new end of the stack, which is smaller for the callee than for the caller)
(guile-source)
libguile/vm-engine.c lines 451-454
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Positional Value
Provided as the default mechanism for mapping values to arguments.
Syntax & Semantics
The syntax and semantics for both calls and definitions are identical to the simple call.
Implementation
Guile does not explicitly enforce a restriction that values can only be supplied positionally. This is because, technically, all values are always supplied positionally. When a caller provides named values they are actually providing pairs of values: keywords as names and arbitrary values as values. For example, given this call it is impossible to say whether the caller is supplying named values or positional values:
(make-dictionary #:first "one"
#:second "two"
#:third "three")
This could be a valid call to a function with named values:
(define* (make-dictionary #:key first second third)
(let ((result (make-hash-table)))
(hash-set! result #:first first)
(hash-set! result #:second second)
(hash-set! result #:third third)
result))
Or it could be a valid call to a function which takes 6 positional values:
(define* (make-dictionary key0 val0
key1 val1
key2 val2)
(let ((result (make-hash-table)))
(hash-set! result key0 val0)
(hash-set! result key1 val1)
(hash-set! result key2 val2)
result))
From the perspective of this particular caller, it is impossible to determine which implementation is in use without inspecting the code* Of course, a caller who wanted to use different keywords in their dictionary would notice the difference between these implmentations very quickly! .
However, it is likely that using named values for a function which does not accept them
will result in an error.
This is because names are themselves values and are unlikely to be the correct kind of
value for the function.
For example, this attempt to use named values resulted in an error because the local
variable a is bound to the value #:a, which is not a valid input
to +:
(define (add-values a b)
(+ a b))
(add-values #:a 5 #:b 7)
; ice-9/boot-9.scm:1685:16: In procedure raise-exception:
; In procedure +: Wrong type argument in position 1: #:a
Historical Record
The Scheme standard requires that implementations accept value positionally.
This has been true since the original paper describing lisp
(original-lisp-paper)
Pages 185-186
McCarthy, John. Recursive functions of symbolic expressions and their computation by machine, Part I. Association for Computing Machinery, 1980. https://doi.org/10.1145/367177.367199
.
The stated motivation is to distinguish between "functions" and "forms".
In that paper, a function is the abstract idea of a formula that can have concrete numbers
applied to it.
Traditionally, a form is syntactically identical to a formula, but it is meant to be a
stand-in for whatever value the formula will resolve to in context.
The differentiating syntax is taken from a previous work which had a similar concern and
does not explicitly justify the postional notation*
The decision to supply arguments positionally appears to derive from prior work
describing multi-argument functions as a series of functions which take in one of the
values and call the "next" function which will receieve the "next" value (this is
similar to the concept of currying used in some programming languages).
With this model, it is natural to write values positionally as they must be provided
in the correct order to ensure that each partial function operates correctly.
It's also worth noting that the examples of functions used in the paper are relatively
simple - they are all inlinable with the prose.
In the face of such simplicity, naming values would be verbose without much benefit.
(calculi-of-lambda-conversion)
Pages 3-7
Church, Alzono. The Calculi of Lambda-Conversion. Princeton University Press, 1941. https://doi.org/10.2307/2267126
While the semantics of positional arguments have remained the same, the implementation
has changed significantly.
In the oldest version of Guile available through source control, values are given to
a function by collecting them into a list, then dispatching based on the number of
values expected by the function
(guile-first-commit)
libguile/eval.c lines 1822-2004
Guile Source. Commit: 0f2d19dd46f83f41177f61d585732b32a866d613. The GNU Project. https://git.savannah.gnu.org/git/guile.git
With the addition of the VM, values are given by putting them into a frame managed by the
VM (eg, not a C-level frame), avoiding the need to create a list unless the function is
variadic.
Default Value
Provided through both #:optional and #:key.
When considering default values, both of these are semantically equivalent.
The difference between them is whether values are passed by position or by name.
This section will exclusively use #:optional because none of the functions
used in the examples are complicated enough to benefit from named values.
Syntax & Semantics
The default value for an argument can either be specified or unspecified.
If it is unspecified it is #false.
Arguments become associated with a default value by placing them after the
#:optional keyword in a define* form.
This verison of add-values demonstrates use of default values without
specification.
(define* (add-values #:optional a b)
(+ (or a 0) (or b 0)))
This function adds the given values, and the body of the function uses the value
0 if the caller omitted a vaule.
However, it is idiomatic to specify a default value in the function signature rather than testing the truthiness of the value. This is done by specifying the argument with a list instead of a symbol. The first element is the symbol naming the local variable and the second element is the default value.
(define* (add-values #:optional (a 0) (b 0))
(+ a b))
This version is semantically equivalent to the first.
Implementation
The bytecode for the function definition differs significantly. I will start with the bytecode for the function which includes default values in the signature as this version is simpler:
0 (bind-optionals 3) ;; 2 args
1 (alloc-frame 3) ;; 3 slots
2 (immediate-tag=? 1 4095 2308) ;; undefined?
3 (jne 2) ;; -> L1
4 (make-immediate 1 2) ;; 0
L1:
5 (immediate-tag=? 0 4095 2308) ;; undefined?
6 (jne 2) ;; -> L2
7 (make-immediate 0 2) ;; 0
L2:
8 (call-scm<-scm-scm 2 1 0 0) ;; add
9 (reset-frame 1) ;; 1 slot
10 (handle-interrupts)
11 (return-values)
Instruction 0, bind-optionals, checks if the caller omitted any argument
values.
If so, it fills in the associated variables with the undefined value
(guile-source)
libguile/vm-engine.c lines 3213-3233
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The next 2 sections, instructions 2-4 and 5-7, check whether or not the local variables associated with optional values are undefined; if so, they are filled with the default value specified by the author.
Finally, instructions 8-11 are the body of the function as found in the simple call.
The version which does not specify default values in the signature is similar, but contains additional (and repetitious) logic.
0 (bind-optionals 3) ;; 2 argss
1 (alloc-frame 3) ;; 3 slots
2 (immediate-tag=? 1 4095 2308) ;; undefined?
3 (jne 2) ;; -> L1
4 (make-immediate 1 4) ;; #f
L1:
5 (immediate-tag=? 0 4095 2308) ;; undefined?
6 (jne 2) ;; -> L2
7 (make-immediate 0 4) ;; #f
L2:
8 (immediate-tag=? 1 3839 4) ;; false?
9 (jne 2) ;; -> L3
10 (make-immediate 1 2) ;; 0
11 L3:
12 (immediate-tag=? 0 3839 4) ;; false?
13 (jne 2) ;; -> L4
14 (make-immediate 0 2) ;; 0
L4:
15 (call-scm<-scm-scm 2 1 0 0) ;; add
16 (reset-frame 1) ;; 1 slot
17 (handle-interrupts)
18 (return-values)
Instruction 0 is also bind-optionals.
Instructions 2-7 fill in the default value but in this case the value is
#:false rather than 0.
The next two sections check the argument values a second time and replace the argument
value with 0 if it was false (instructions 8-14) - these represent the
or statements in the source code.
Instruction 15-18 are the body of the function as found in the simple call.
The bytecode for calling a function using default values does not differ from the bytecode for the simple call.
Historical Record
Default values were included in the original optargs implementation
(guile-source)
NEWS lines 11451-11525
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The semantics at that time were nearly identical to the current semantics.
The one notable difference is that, in the case where the author does not define a
default value and the caller does not supply one, the variable was not bound to any value.
This was changed as part of merging a branch that was mostly concerned with how modules
are handled.
I was unable to find any record about the reason why this change was
made*
Marius Vollmer, the developer who authored the commit, was very kind in responding to
an email enquiring about this.
Unfortunately it has been more than 2 decades since the change was made and it was not
notable enough to warrant a place in permanent memory.
(guile-optargs-unbound-to-false)
Diff of ice-9/optargs.scm.
Guile Source. Commit: 296ff5e78b8322fe4bf00c5ec1497dc28da776b8. The GNU Project. https://git.savannah.gnu.org/git/guile.git
(guile-devel-archive)
2001-05 lines 11305-11313
Index of /archive/mbox/guile-devel. The GNU Project. https://lists.gnu.org/archive/mbox/guile-devel/
.
This was included in the 1.8 release series (2006)*
I infer that it was first added in 1.8 because commit
296ff5e78b8322fe4bf00c5ec1497dc28da776b8, which implements the change, is between
commit 0f24e75b73b9c0c745971de639a53748a395a1cb, which bumps the numbers in
GUILE-VERSION from 1.7 to 1.9 (guile uses odd numbers for development versions and
even numbers for release versions) and commit
c299f186ff9693fc88859daef037e3d94cc7c0ff, which adds content to the NEWS file
regarding changes new in 1.6.
However, I did not find any content in the NEWS file which discusses the change
(including in more recent release notes).
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
Named Value
Provided through the #:key arguments to optargs definitions.
Syntax & Semantics
The caller must use the name specified by the author, which will also be the name used by
the local variable*
The caller uses a keyword to name the value while the author uses a symbol to name the
variable, as normal, but they both use the same name.
.
When an author mandates a named value they also implicitly create a default value.
Details about default values are described in the appropriate section; for the purpose of
named values all we need to know is that if a caller omits a value then the local variable
will be bound to #:false.
For example, an author of add-values could allow named values like this:
(define* (add-values #:key a b)
(+ (or a 0) (or b 0)))
This function could be called in any of the following ways, with the result shown in the proceeding comment:
(add-values) ; 0
(add-values #:a 3) ; 3
(add-values #:b 3) ; 3
(add-values #:a 3 #:b 3) ; 6
However, these would not be legal:
(add-values 3)
(add-values 2 #:b 2)
When an author specifies that a value can be named this also means that it must be named, if it is provided at all.
Additionally, an author may specify that a caller can supply arbitrary named values which
are not specified in the function signature (and not bound to a local variable, unless
the function is also variadic).
This is done by adding #:allow-other-keys*
Ironically, while the optargs module does not support implicit values for user-defined
code, #:allow-other-keys is itself an implicit value.
.
to the function signature.
For example, a definition of add-values with this feature would allow all of
the following calls to be legal, and they would all produce the value 6.
(define* (add-values #:key a b #:allow-other-keys)
(+ (or a 0) (or b 0)))
(add-values #:a 3 #:b 3)
(add-values #:a 3 #:b 3 #:c 34)
(add-values #:c 34 #:a 3 #:b 3)
(add-values #:a 3 #:b 3 #:multiplier 88)
Implementation
When Guile compiles a procedure it stores the list of valid argument names next to the
SCM object representing the code
(guile-source)
libguile/vm-engine.c lines 734, 736, 743-745;
libguile/vm.c lines 1003, 1008-1009
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
When a function using named values is defined, the compiler adds additional instructions
as a prelude in the definition.
0 (bind-kwargs 1 0 1 3 16351)
1 (alloc-frame 3) ;; 3 slots
2 (immediate-tag=? 1 4095 2308) ;; undefined?
3 (jne 2) ;; -> L1
4 (make-immediate 1 14) ;; 3
L1:
5 (immediate-tag=? 0 4095 2308) ;; undefined?
6 (jne 2) ;; -> L2
7 (make-immediate 0 22) ;; 5
L2:
8 (call-scm<-scm-scm 2 1 0 0) ;; add
9 (reset-frame 1) ;; 1 slot
10 (handle-interrupts)
11 (return-values)
Instruction 0, bind-kwargs, performs the bulk of the work required to process
named values at runtime.
First, it initializes all relevant local variables (eg, ones that have default values or
are named) to the undefined value
(guile-source)
libguile/vm.c lines 995-998
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Next, it walks through the list of non-positional arguments with the assumption that every
even-indexed item is a keyword naming an argument and binds the odd-indexed items to the
appropriate variables*
Technically, it assumes that alternating value are keywords, except that it will
silently ignore a non-keyword if #:rest is included in the procedure
definition.
However, I am currently focused on named values in isolation of other features.
#:rest is discussed in the section on variadic functions.
.
(guile-source)
libguile/vm.c lines 1000-1037
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Instruction 1 prepares the stack for the function call.
Instructions 2-7 ensure that the local variables which come from named values are initialized properly; these instructions are actually about default values, not named values, so they are discussed in the appropriate section.
Finally, instructions 8-11 are the body of the function as found in the simple definition.
The bytecode for the call is largely similar to the simple call, except that it also
sends the keywords as additional argument values to the function (while they are not
considered values within the body of a define* form, they are values at this
lower level):
1 (call-scm<-thread 8 62) ;; current-module
2 (static-ref 7 16324) ;; add-values
3 (call-scm<-scm-scm 8 8 7 111) ;; lookup-bound
4 (scm-ref/immediate 5 8 1)
5 (static-ref 4 16331) ;; #:a
6 (make-immediate 3 34) ;; 8
7 (static-ref 2 16340) ;; #:b
8 (make-immediate 1 38) ;; 9
9 (handle-interrupts)
10 (call 3 5)
Historical Record
Named values were introduced with the original implementation of the optargs
module
(guile-source)
NEWS lines 11451-11525
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
In the original implementation, the let-ok-template helper function generated
a scheme-level let (or let*) form which initially binds each
variable name to either the default value supplied by the author or a fresh undefined
variable
(guile-initial-optargs)
module/ice-9/optargs.scm lines 128-135
Guile Source. Commit: 7e01997e88c54216678271de36b1c2088377492d. The GNU Project. https://git.savannah.gnu.org/git/guile.git
The bindfilter local in the let-keyword-template helper replaced
these initial values with the values supplied by the caller
(guile-initial-optargs)
module/ice-9/optargs.scm lines 152-168
Guile Source. Commit: 7e01997e88c54216678271de36b1c2088377492d. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
The current implementation is largely the same, except that it takes advantage of
being able to directly manipulate the VM state from C code instead of using macros to
mutate code syntactically.
The maintainers decided to move the implementation from scheme to C in order to decrease
the latency associated with a function call that uses named values
(guile-devel-archive)
2009-10 lines 10053-10056, 14008-14009, 14139-14141
Index of /archive/mbox/guile-devel. The GNU Project. https://lists.gnu.org/archive/mbox/guile-devel/
.
One change, which may or may not be considered a bug fix, has to do with the way that
Guile processes named values when the function also contains a variadic variable.
Originally, Guile required that all named values precede all (other) variadic values.
This was changed so that variadic values may be intermingled with named values, so long
as variadic values are not keywords (unless #:allow-other-keys is set, in
which case they may be keywords)
(guile-source)
Commit message, complete diff.
Guile Source. Commit: ff74e44ecba55f50b2c2c84bad2f13bed9489455. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Repeating Value
Guile uses the syntactically latest value given for an argument as the actual value for the argument.Syntax & Semantics
A repeating value is given by providing the name multiple times with different values:
(add-values #:a 1000 #:b 1001 #:a 1002) ;; 2003
Implementation
When Guile binds values to variables, it simply walks through the list of values provided by the caller, checks for keywords that match argument names, and fills in the variable with the following value when a match is found (guile-source)
libguile/vm-engine.c lines 734, 736, 743-745;
libguile/vm.c lines 1003, 1008-1009
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git . This algorithm implicitly causes the overwriting behavior without any explicit check for repeated values.
Historical Record
The original implementation of theoptargs module use let* for
creating bindings
(guile-initial-optargs)
module/ice-9/optargs.scm line 261module/ice-9/optargs.scm line 113
module/ice-9/optargs.scm line 156
module/ice-9/optargs.scm line 124
Guile Source. Commit: 7e01997e88c54216678271de36b1c2088377492d. The GNU Project. https://git.savannah.gnu.org/git/guile.git , so it would have also exhibited the overwriting behavior.
Implicit Value
This feature is not provided by Guile.
Typed Value
This feature is provided by defining methods in the Guile Object-Oriented Programming System (GOOPS).
Syntax & Semantics
In GOOPS methods are not contained within classes.
Instead, they are simply functions with type specifications.
When an author defines a method, they can choose to specify a type by providing a
two-element list in place of an argument name
(guile-manual)
Section 8.6 "Methods and Generic Functons"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
The first element of the list is the argument name and the second element is the class
name.
For example:
(define-method (add-values (a <number>) (b <number>))
(+ a b))
This defines a method that will only be executed when it is called with exactly two arguments which are both numbers. The author can also define alternative implementations for different types* While overloading is generally out of scope for this paper, the implementation section will not make sense without mentioning it. .
(define-method (add-values (a <list>) (b <list>))
(append a b))
Implementation
Defining a method defines two separate functions: a generic which is responsible for
method dispatching and the method which contains the body that the author defined
(guile-manual)
Section 8.6 "Methods and Generic Functons"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
A generic is an applicable struct (a structure which can be called like a function) which
stores all methods which share a name
(guile-source)
module/oop/goops.scm lines 2045-2245
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
When a caller uses that name with some arguments, the generic object searches all of the
methods associated with that name for a signature with the best fit
(guile-source)
module/oop/goops.scm lines 1381-1449
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
This indirectly causes an error if the caller provides a set of values which do not have a
compatible type: the searching process will fail to find a match and print a message like
this one:
ice-9/boot-9.scm:1685:16: In procedure raise-exception: No applicable method for #<add-values (2)> in call (add-values foo bar)
The bytecode for calling a function with typed values is identical to the simple call.
Historical Record
GOOPS was first added to the Guile repository in version 1.6
(guile-manual)
Section 9.1.4 "A Timeline of Selected Guile Releases"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
Its implementation was based on STklos, the object system for STK, and it was also
influenced by CLOS, the object system for Common Lisp
(guile-source)
module/oop/goops.scm lines 24-25
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
(guile-manual)
Section 8.0 "GOOPS"
Guile Manual. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git Shell command: info guile
.
The original code*
I am tentatively using 4b5d86e0334f6b8c0b37c55cf47a4cd30e7803e0 as the commit which
"adds GOOPS".
This is somewhat fictitious; 9 commits prior
(fdd70ea97c142dc8db1e3f147ac6f5bd6ae157c6) changed the major version "due to the merge
of GOOPS".
However, at the time this commit was made no GOOPS files had been added; the version
number was adjusted in anticipation of the merge, not after the merge was complete.
I am choosing the other commit as representing the initial add of GOOPS to the
repository because it is the last consecutive commit which adds or modifies a GOOPS
file after the version adjustment.
.
looks quite different from the modern version of the file due to changes in the core
language, bug fixes, and refactorings for performance and readability.
However, the underlying idea of storing all alternative implementations of a function in a
structure that uses a type signature to implementation map has remained stable.
Caller Destructuring
Provided through the apply function. This allows a caller to provide
a list which is destructured into a set of positional arguments.
Syntax & Semantics
Destructuring in guile has a simple interface, as apply is simply a function
that takes in a function as the first value, then any number of arbitrary values, and
requires a possibly empty list as the final value.
For example, all of these are equivalent:
(add-values 3 4)
(apply add-values 3 '(4))
(apply add-values '(3 4))
However, this would be an error because the final value is not a list:
(apply add-values 3 4)
Implementation
apply operates by first collecting all of the positional arguments into a
single list, with the final value as the tail of the list, and sending them to
scm_call_n*
A typical guile programmer (such as myself) who is familiar with the dotted list
syntax or the #:rest argument (both described in the section on variadic
functions) might be confused by the C-level API of the apply function.
It takes 2 required arguments and a rest argument.
The apply function takes any number of arguments and requires that the final argument
is a list.
Oddly, the body of the function simply calls cons* on the second required
argument and the rest argument; one might expect that this would cause the
user-provided list to be preserved instead of flattened.
However, when a C-level function is registered with a rest argument in guile it
receives an improper list of arguments, rather than the proper list provided when
#:rest is given to define*.
See the file examples/guile/c-gsubr-rest.scm (and its corresponding c
file) in the paper's repository for a demonstration.
The reason for this discrepency is not clear but the API of scm_apply
makes sense now.
(guile-source)
libguile/eval.c lines 585-622, lines 715-729
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
This function is responsible for setting up the VM stack correctly
(guile-source)
libguile/vm.c lines 1542-1621
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
For example, these lines save the frame's metadata and add the arguments to the correct
stack positions:
SCM_FRAME_SET_VIRTUAL_RETURN_ADDRESS (call_fp, vp->ip);
SCM_FRAME_SET_MACHINE_RETURN_ADDRESS (call_fp, 0);
SCM_FRAME_SET_DYNAMIC_LINK (call_fp, return_fp);
SCM_FRAME_LOCAL (call_fp, 0) = proc;
for (i = 0; i < nargs; i++)
SCM_FRAME_LOCAL (call_fp, i + 1) = argv[i];
These tasks are handled by different components in the simple call.
The frame's metadata is typically set by the call instruction , while the
arguments are added to the stack by the by separate instructions such as
make-immediate or scm-ref.
Historical Record
The notion of list destructuring with apply was included in the original
paper defining the lisp programming language
(original-lisp-paper)
Pages 189-190
McCarthy, John. Recursive functions of symbolic expressions and their computation by machine, Part I. Association for Computing Machinery, 1980. https://doi.org/10.1145/367177.367199
.
This original definition took exactly 2 parameters, a function and a list containing the
arguments.
r2rs extended this by allowing the caller to pass any number of values
between the function and the list
(r2rs)
Page 57
Revised Revised Report on Scheme or An UnCommon Lisp. Massachusetts Institute of Technology - Artificial Intelligence Laboratory. https://standards.scheme.org/official/r2rs.pdf
In the original interpreter, Guile simply called the function after manually unpacking the
arguments
(guile-first-commit)
libguile/eval.c lines 1869-1985
Guile Source. Commit: 0f2d19dd46f83f41177f61d585732b32a866d613. The GNU Project. https://git.savannah.gnu.org/git/guile.git
With the release of Guile 2.0, which added the virtual machine
(guile-source)
NEWS lines 5068-5071
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
the apply function delegated to scm_call_with_vm, a dedicated
C-level helper function for calling scheme-level functions
(guile-2.0-source)
libguile/eval.c lines 782-802
Guile Source. 2.0. Commit: 958a28e9fec33ebb4673294308a82ccd18cc6071. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
This was replaced with the scm_call_n function as part of a refactoring which
removed explicit references to the VM from many locations in C code as well as eliminating
VM visibility from scheme code
(guile-use-scm-call-n)
Commit message and diff of libguile/eval.scm at -486,41 +486,41
Guile Source. Commit: 55ee3607003702ef5c53994c6216b9f0f835e0f1. The GNU Project. https://git.savannah.gnu.org/git/guile.git
The reason for this change is not explicitly stated in the commit messages but is likely
related to performance improvements attributed to a rewrite of the virtual machine in the
Guile 2.2 release notes
(guile-source)
NEWS lines 1995-2004, 2124-2205
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
Author Destructuring
This feature is not provided by Guile.
Variadic Function
Provided by the "dotted list" syntax and the #:rest keyword in a
define*.
Syntax & Semantics
The #:rest syntax allows an author to define a variadic variable that will
contain a (possibly empty) list of variadic values.
The variadic variable must be declared after all other declarations.
For example, this definition and call would have the following
value*
As the variadic variable contains a list, the below code uses apply to
destructure the list.
This is described in detail in the section on caller destructuring.
:
(define* (add-values a b #:rest variadic-variable)
(apply + a b variadic-variable)
(add-values 1000 2001 1002 2003)
; 6006
The dotted list syntax is similar, except that the author uses a single period instead of
the #:rest keyword and this syntax works without optargs:
(define* (add-values a b . variadic-variable)
(apply + a b variadic-variable)
(add-values 1000 2001 1002 2003)
; 6006
Other than this syntactic difference, the two methods of defining a variadic function are identical.
Implementation
For definitions, the only notable difference from the simple function is the addition of
the bind-rest instruction near the beginning of the body.
This instruction takes values from the stack and constructs a new list which contains all
of them; this list is bound to the variadic variable
(guile-source)
libguile/vm-engine.c lines 756-783
libguile/vm.c lines 1041-1052
Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
static SCM
cons_rest (scm_thread *thread, uint32_t base)
{
SCM rest = SCM_EOL;
uint32_t n = frame_locals_count (thread) - base;
while (n--)
rest = scm_inline_cons (thread, SCM_FRAME_LOCAL (thread->vm.fp, base + n),
rest);
return rest;
}
For calls, the bytecode is identical to the simple call.
Historical Record
The dotted list syntax was first described in r2rs, with the same
semantics that are in use today
(r2rs)
Page 13
Revised Revised Report on Scheme or An UnCommon Lisp. Massachusetts Institute of Technology - Artificial Intelligence Laboratory. https://standards.scheme.org/official/r2rs.pdf
#:rest syntax was originally used by other lisp dialects and
implementations of Scheme; Guile provided it in the original release of the optargs module
in order to ease the transition for programmers used to those languages
(guile-source)
NEWS lines 11514-11518Guile Source. 3.1.2. Commit: 3b76a30e3ca1f0b7ee7944836c2fc5660596b3bd. The GNU Project. https://git.savannah.gnu.org/git/guile.git
As described in the section on positional values, the oldest versions of Guile manually
collected argument values into a Scheme-level list and dispatched based on the number of
values given.
This made it trivial for the interpreter to supply the variadic values by dropping
elements from the front of the list of all arguments
(guile-first-commit)
libguile/eval.c lines 1652-1655
Guile Source. Commit: 0f2d19dd46f83f41177f61d585732b32a866d613. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
When the VM was added in version 2.0, the bind-rest instruction was included.
It performs essentially the same work as the current implementation
(guile-2-0-source)
libguile/vm-i-system.c lines 718-732
Guile Source. 2.0. Commit: 958a28e9fec33ebb4673294308a82ccd18cc6071. The GNU Project. https://git.savannah.gnu.org/git/guile.git
.
Rust
Rust provides the following features:
- Positional Value
- Typed Value
- Author Destructuring
Background
Rust has a compiler which produces native code. It does not have a stable ABI and creating one is not considered desirable by the Rust developers (#600) Issue #600. "Define a Rust ABI". https://github.com/rust-lang/rfcs/issues/600 . When targeting Linux, the compiler appears to produce code that follows the System V ABI.
This paper uses Intel-style assembly syntax, for ease of use with the Intel Software Developer Manual.
Simple Function and Call
fn add_values(a: i32, b: i32) -> i32 {
a + b
}
fn main() {
add_values(1000, 1001);
}
The assembly for the call to add_values does 2 things.
First, it pushes the literal values 1000 and 1001 to the edi and
esi registers, respectively.
This follows the System V convention that the first 2 arguments are passed in these
registers (the specification implicitly assumes that all arguments are positional)
(SV-ABI)
Page 27.
System V Application Binary Interface. 2025. Lu, H.J., et al. https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/10470667619/artifacts/file/x86-64-ABI/abi.pdf
.
Then it calls the add_values function.
mov edi,0x3e8 mov esi,0x3e9 call 7940 <_ZN6simple10add_values17h33b93751bca0b5fbE>
The body of add_values does a number of things.
First it pushed the rax register onto the stack; this conventionally contains
the return value of function calls
(SV-ABI)
Page 27.
System V Application Binary Interface. 2025. Lu, H.J., et al. https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/10470667619/artifacts/file/x86-64-ABI/abi.pdf
.
Next, it adds the given arguments directly in the registers used to pass them and moves the
result onto the stack using rsp, which conventionally contains the stack
pointer
(SV-ABI)
Page 27.
System V Application Binary Interface. 2025. Lu, H.J., et al. https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/10470667619/artifacts/file/x86-64-ABI/abi.pdf
.
The next two instructions, seto and jo, are used to handle
overflows (in debug mode, Rust panics on an overflow).
Assuming there is no overflow, the code then moves the result of the addition from the
stack into the eax*
Note that eax is a subset of rax, so this is setting the
lower bits of the rax register; it is only setting the lower bits due to
the size of the integer specified in the code.
register.
Finally, it moves the top of the stack into the rcx register*
It is not clear why it is helpful to push into rcx; it is conventionally
used as the 4th positional argument and is not referenced at all in the calling site.
and returns to the calling site.
0000000000007940 <_ZN6simple10add_values17h33b93751bca0b5fbE>: push rax add edi,esi mov DWORD PTR [rsp+0x4],edi seto al jo 7952 <_ZN6simple10add_values17h33b93751bca0b5fbE+0x12> mov eax,DWORD PTR [rsp+0x4] pop rcx ret lea rdi,[rip+0x4c5ef] # 53f48 <_ZN3std3sys3pal4unix4args3imp15ARGV_INIT_ARRAY17hf0e4ba666b615240E+0x48> call QWORD PTR [rip+0x4f061] # 569c0 <_GLOBAL_OFFSET_TABLE_+0x1c0> nop
Positional Value
Provided as the only way to pass arguments.
Syntax & Semantics
Callers specify positional values by providing a comma-separated list of values. Authors specify local variable names with a corresponding list of comma-separated identifiers.
Implementation
The simple function and call use positional arguments; there are no additional features to examine here.
Historical Record
Rust used positional arguments since the pre-RFC era. Systems programming languages typically use positional arguments, as evidenced by the assumption in the System V ABI assumption that all arguments are positional.
Default Value
This feature is not provided by Rust.
Named Value
This feature is not provided by Rust.
Repeating Value
This feature is not supported by Rust. As it only allows passing values by position, it is not syntactically possible to supply multiple values for the same argument.
Implicit Value
This feature is not provided by Rust.
Typed Value
All function arguments in Rust are required to have an associated type.
Syntax & Semantics
Types are specified by placing a colon after the local variable name, then the type name:
fn add_values(a: i32, b: i32) -> i32 {
a + b
}
There are 2 reasons why the compiler might reject code due to the declared types.
One reason is that the given values are explicitly incompatible.
For example, if the user tries to pass literal strings when calling the above definition
of add_values, the compiler will emit this error
(rust-source)
compiler/rustc_hir_typeck/src/fn_ctxt/check.rs lines 791-793, 748-753
Rust Source. 1.87.0. Commit: 17067e9ac6d7ecb70e50f92c1944e545188d2359. The Rust Foundation. https://github.com/rust-lang/rust
:
error[E0308]: arguments to this function are incorrect
--> simple.rs:7:5
|
7 | add_values("1000", "1001");
| ^^^^^^^^^^ ------ ------ expected `i32`, found `&str`
| |
| expected `i32`, found `&str`
|
note: function defined here
--> simple.rs:1:4
|
1 | fn add_values(a: i32, b: i32) -> i32 {
| ^^^^^^^^^^ ------ ------
error: aborting due to 1 previous error
The other reason for an error is that type inference fails. While local variables declared in the function signature must have an explicit type, local variables declared in the body of the function do not; in this case, there is still an associated type but this type is inferred by the compiler based on usage. However, the programmer might try to use the variables in such a way that no associated type will satisfy all of the usage constraints. For example, if we additionally define this function:
fn add_unsigned_values(a: u32, b: u32) -> u32 {
a + b
}
Then try to call both of them with the same variables:
fn main() {
let a = 1000;
let b = 1001;
add_values(a, b);
add_unsigned_values(a, b);
}
The compiler will first infer that a and b must be of type
i32 based on the first call
(rust-source)
compiler/rustc_hir_typeck/src/fn_ctxt/check.rs lines 313-319
Rust Source. 1.87.0. Commit: 17067e9ac6d7ecb70e50f92c1944e545188d2359. The Rust Foundation. https://github.com/rust-lang/rust
,
then finds an incompatibility in the second call
(rust-source)
compiler/rustc_hir_typeck/src/fn_ctxt/check.rs lines 791-793, 748-753
Rust Source. 1.87.0. Commit: 17067e9ac6d7ecb70e50f92c1944e545188d2359. The Rust Foundation. https://github.com/rust-lang/rust
:
error[E0308]: arguments to this function are incorrect
--> typed-value-error.rs:13:5
|
13 | add_unsigned_values(a, b);
| ^^^^^^^^^^^^^^^^^^^ - - expected `u32`, found `i32`
| |
| expected `u32`, found `i32`
|
note: function defined here
--> typed-value-error.rs:5:4
|
5 | fn add_unsigned_values(a: u32, b: u32) -> u32 {
| ^^^^^^^^^^^^^^^^^^^ ------ ------
help: you can convert an `i32` to a `u32` and panic if the converted value doesn't fit
|
13 | add_unsigned_values(a.try_into().unwrap(), b);
| ++++++++++++++++++++
help: you can convert an `i32` to a `u32` and panic if the converted value doesn't fit
|
13 | add_unsigned_values(a, b.try_into().unwrap());
| ++++++++++++++++++++
Implementation
The generated assembly accesses memory under the assumption that the given values and/or pointers contain the expected data layout. While this means that the generated assembly will be different if the types in use are changed, there is no type-checking logic to examine in the final output.
Historical Record
Rust has required types on local variables declared in function signatures since the pre-RFC era.
Caller Destructuring
This feature is not provided by Rust.
Author Destructing
In Rust both tuples and structs can be destructured* Technically enums and slices can be destructured in function signatures as well, but the pattern must be irrefutable - meaning that only enums with a single variant can be destructured, and slices can only be destructured if all values are ignored. by the author.
Syntax & Semantics
For both tuples and structs the author specifies a pattern in place of a variable name:
fn add_values_tuple_destructured((a, b): &(i32, i32)) -> i32 {
a + b
}
struct Arguments {
a: i32,
b: i32,
}
fn add_values_struct_destructured(Arguments { a: a, b: b }: &Arguments) -> i32 {
a + b
}
In each case a local variable is named based on the names in the left side of the colon. In the case of structs, the right side of the inner colons represents the name of the local variable while the left side represents the name of the member in the struct.
Additionally, the author can choose to ignore some members of an input when destructuring.
They do this by specifying only the elements that they want to match and placing a literal
.. to indicate that other elements should be ignored:
fn add_values_tuple_too_many_destructured((a, b, ..): &(i32, i32, i32)) -> i32 {
a + b
}
struct TooManyArguments {
a: i32,
b: i32,
#[allow(dead_code)]
c: i32,
}
fn add_values_struct_too_many_destructured(TooManyArguments { a, b, .. }: &TooManyArguments) -> i32 {
a + b
}
Implementation
Use of destructuring does not change how the argument is passed.
In either case, a pointer to the given structure is passed using the expected register.
For example, given the following function definitions:
fn add_values_tuple(tuple: &(i32, i32)) -> i32 {
tuple.0 + tuple.1
}
fn add_values_tuple_destructured((a, b): &(i32, i32)) -> i32 {
a + b
}
fn add_values_struct(arguments: &Arguments) -> i32 {
arguments.a + arguments.b
}
fn add_values_struct_destructured(Arguments { a: a, b: b }: &Arguments) -> i32 {
a + b
}
The calling site will always load the address of the structure into rdi:
lea rdi,[rip+0x3e600] # 46018 <_fini+0x24c> call 7970 <_ZN20author_destructuring16add_values_tuple17h78fb3f52970da270E> lea rdi,[rip+0x3e5f4] # 46018 <_fini+0x24c> call 79a0 <_ZN20author_destructuring29add_values_tuple_destructured17h565795344f78b8c3E> lea rdi,[rip+0x3e5e8] # 46018 <_fini+0x24c> call 79c0 <_ZN20author_destructuring17add_values_struct17h7b66a2e7b4d693a8E> lea rdi,[rip+0x3e5dc] # 46018 <_fini+0x24c> call 79f0 <_ZN20author_destructuring30add_values_struct_destructured17hfb478771134896efE>
However, the bodies of the functions differ*
While the bodies of the destructured functions differ from their non-destructured
counterparts, the differences between a destructured tuple and destructured struct are
negligible.
when destructuring is used.
When it is not used, the function looks similar to the simple function, except
that it accumulates the values into eax by dereferencing the pointer in
edi instead of using edi (and esi) directly:
push rax mov eax,DWORD PTR [rdi] <---- pointer dereferenced> add eax,DWORD PTR [rdi+0x4] <---- dereferenced with offset, for "second" argument> mov DWORD PTR [rsp+0x4],eax <---- eax stored on the stack> seto al jo 7985 <_ZN20author_destructuring16add_values_tuple17h78fb3f52970da270E+0x15> mov eax,DWORD PTR [rsp+0x4] <---- eax restored from stack> pop rcx ret lea rdi,[rip+0x4c574] # 53f00 <_ZN3std3sys3pal4unix4args3imp15ARGV_INIT_ARRAY17hf0e4ba666b615240E+0x48> call QWORD PTR [rip+0x4f02e] # 569c0 <_GLOBAL_OFFSET_TABLE_+0x1c0> cs nop WORD PTR [rax+rax*1+0x0] nop DWORD PTR [rax+0x0]
However, when destructuring is used, a pointer to the second value is loaded into
rsi and a helper function is called:
push rax mov rsi,rdi <---- rsi initialized with rdi> add rsi,0x4 <---- rsi offset by same offset as in the non-destructuring version> lea rdx,[rip+0x4c569] # 53f18 <_ZN3std3sys3pal4unix4args3imp15ARGV_INIT_ARRAY17hf0e4ba666b615240E+0x60> call 78d0 <---- helper function called> <_ZN49_$LT$$RF$i32$u20$as$u20$core..ops..arith..Add$GT$3add17h90bb72e1fc0da902E> pop rcx ret cs nop WORD PTR [rax+rax*1+0x0]
The helper function is similar to the accumulation performed in the non-destructured versions:
mov eax,DWORD PTR [rdi] <---- eax initialized with first argument> add eax,DWORD PTR [rsi] <---- second argument accumulated into eax> mov DWORD PTR [rsp+0x14],eax <---- eax stored on the stack> seto al jo 78ef <_ZN49_$LT$$RF$i32$u20$as$u20$core..ops..arith..Add$GT$3add17h90bb72e1fc0da902E+0x1f> mov eax,DWORD PTR [rsp+0x14] <---- eax restored from stack> add rsp,0x18 ret
The discrepency is a reasonable literal translation of destructuring into assembly code: instead of dereferencing the pointer inline, the second part of the pattern is moved into the second positional register and a helper function which takes 2 (theoretically unrelated) positional arguments is called.
The assembly for the bodies of destructuring functions that ignore some elements is identical to the above assembly; ignored values are literally ignored.
Historical Record
Pattern-matching has been available in Rust since the pre-RFC era. However, it has evolved in the RFC era.
RFC 1492 allows the construct .. to appear alongside variable names in
destructured tuples.
This has the effect of ignoring unspecified members: previously, .. could
only be used to ignore all members in a tuple
(RFC 1492)
text/1492-dotdot-in-patterns.md
Rust Requests For Comment (RFCs). Commit: ea1f59fbc782b0c4cab5b19a628d9aa9ad834a58. Rust Foundation. https://github.com/rust-lang/rfcs.git
.
It was added for consistency with the similar feature usable in slices (subsets of
sequences)
*
The utility of this feature was probably seen as obvious by the developers, due to
their experience with the similar feature for slices.
It makes code more terse and readable when the author does not need all of the tuple
members.
.
There have been other changes to patterns through the RFC process, however they are not
relevant here due to the requirement that patterns in function signatures are irrefutable.
For example, there have been changes to conditional guards on patterns which can be
refuted when the condition evaluates to false.
Variadic Function
This feature is not provided by Rust.