Combinable Records: Motivation

Note: combinable records are deprecated

I recently published the beginnings of a patch that adds the combine property to fields in define-record-type*. This is a somewhat complex feature that is similar to, but distinct from, record inheritance.

Guix has been written as a collaboration of literally thousands of programmers, and the most prolific contributors have decades of experience, including particular expertise in Guile programming, so a proposal to extend a facility as central as define-record-type* deserves scrutiny.

Furthermore, there are good arguments for the simplicity of the current facilities. They allow constructing an object which has exactly one parent value that provides defaults for any field not otherwise specified. While inheritance in any form (whether in datatype definitions or in value construction) is useful for reducing duplicate work, it adds reading complexity. When multiple values are inherited, readers have to trace through multiple parents in order to understand the final object, and ambiguity arises when parents conflict (this is what causes the "deadly diamond of death" in datatype inheritance).

However, while defining a Guix system that integrates with QubesOS, I have found the need to combine pieces of configuration in a more complicated manner, and there is evidence of a similar need in Guix itself.

Inheritance in Guix

Guix uses a custom record facility powered by define-record-type*. It facilitates record definitions and construction with a lispier syntax and allows inheriting values from a parent object. For example:

(use-modules (guix records) (ice-9 pretty-print))

; Data type definition, there are a lot of symbols which define-record-type* will expand
; into helper procedures/macros. For the purposes of this example, the important parts are
; that the symbol `point` expands into a constructor, `x` and `y` are the members of
; the structure, and `point-x` and `point-y` are accessors.
(define-record-type* <point> point make-point
  point?
  this-point
  (x point-x (default 0))
  (y point-y (default 0)))

(define base (point (x 1) (y 2)))

(pretty-print base)
; Prints: #<<point> x: 1 y: 2>

(define (move-up start magnitude)
  (point (inherit start) (y (+ (point-y start) magnitude))))

(pretty-print (move-up base 3))
; Prints: #<<point> x: 1 y: 5>
(pretty-print (move-up base -3))
; Prints: #<<point> x: 1 y: -1>
(pretty-print (move-up (move-up base 38) 4))
; Prints: #<<point> x: 1 y: 44>

The move-up procedure creates a new point which has been moved "up" (positive in the y direction) by the given magnitude. The inherit clause at the beginning means that the x value of the new point is taken from the given point, start, while the y value is overridden by the explicit declaration (+ (point-y start) magnitude).
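For readers without Guix at hand, Guile's functional setters from (srfi srfi-9 gnu) offer a rough analogue of the inherit form: set-field returns a new record that copies every field of the original except the one named. This is only an analogy on a plain SRFI-9 record, not a description of how (guix records) implements inherit.

```scheme
(use-modules (srfi srfi-9)
             (srfi srfi-9 gnu))   ; set-field

;; A plain SRFI-9 record with the same shape as <point> above.
(define-record-type <point>
  (make-point x y)
  point?
  (x point-x)
  (y point-y))

(define base (make-point 1 2))

;; Like (point (inherit start) (y ...)): x is copied from START,
;; y is replaced by the new value.
(define (move-up start magnitude)
  (set-field start (point-y) (+ (point-y start) magnitude)))

(point-x (move-up base 3))                ; => 1
(point-y (move-up base 3))                ; => 5
(point-y (move-up (move-up base 38) 4))   ; => 44
```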

Utilization of Inheritance in Guix

Packages

The most voracious users of inheritance in Guix are the (gnu packages *) submodules. Inheritance is useful when defining multiple variants of the same package, if those variants cannot be expressed as different outputs. For example, there are several variants of the gdb package because they require different configure flags, which produce conflicting artifacts. The current implementation of inheritance is perfect for this task.

Systems

The (gnu system *) submodules also use inheritance, albeit more sparingly. Usage here is similar to usage in the packages submodules: some base system is defined, then variants on that system are defined. For example, the installer ISO is defined in gnu/system/install.scm, and this will work well for most systems, but special variants are defined for certain scenarios, such as installation onto a Pinebook. These are defined by inheriting from the primary installer definition and overriding the relevant fields. Again, current inheritance works well.

However, there is also an idiom in the code whose structure is not made explicit. For example, consider the operating system defined in gnu/system/examples/desktop.tmpl. This definition is presented to the user as one of several starting points for crafting their own definition. Below is a relevant subset of the configuration elements it uses:

(operating-system
  ...
  (packages (append (list nss-certs gvfs) %base-packages))
  (services (append (list (service gnome-desktop-service-type)
                          (service xfce-desktop-service-type)
                          (set-xorg-configuration
                           (xorg-configuration
                            (keyboard-layout keyboard-layout))))
                    %desktop-services))
  ...)

Both the packages and services fields provide examples of how the user can customize the system while retaining some reasonable baseline value, provided by %base-packages and %desktop-services.

%base-packages is defined in guix/gnu/system.scm and contains a list of packages that almost every GNU user will want to have: sudo for managing administrator privileges, iproute for network configuration, and so on. In other words, it is the core set of packages that a distribution would typically provide.

%desktop-services is defined in guix/gnu/services/desktop.scm and provides more than a dozen services which implement modern desktop expectations, such as NetworkManager for automatic network configuration and multiple screen lockers. It also includes %base-services which provides essential services such as TTYs and the Guix daemon itself.

So here we have two fragments of operating system definitions: the base fragment, which directly provides packages (%base-packages) and indirectly provides services (%base-services), and the desktop fragment, which provides only services (%desktop-services), including the indirectly provided %base-services. The directly provided values are hardcoded (%desktop-services appears literally in the operating-system declaration) while the indirectly provided values are not (%base-services is provided through %desktop-services).

Consider what happens as the definitions of these fragments change. If new services are added to either %base-services or %desktop-services, users who followed the template and included %desktop-services in their definition will receive the update. However, if there is a need to define %desktop-packages as a new component of the desktop fragment, then every user must change their operating system definition to use this new variable instead of the previously used %base-packages. The definition is "inheriting" from a set of variables which are abstractly related, but this relationship is not reified in code.
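Without the proposed feature, one way to reify the relationship is to bundle the related lists into a single value and combine such values field by field. The sketch below does this in plain Guile; the record type <os-fragment>, the procedure combine-fragments and the placeholder strings standing in for packages and services are all hypothetical, not part of Guix or of the patch.

```scheme
(use-modules (srfi srfi-1)    ; append-map
             (srfi srfi-9))

;; Hypothetical stand-in for an operating-system fragment. Packages and
;; services are plain strings so the sketch runs outside of Guix.
(define-record-type <os-fragment>
  (make-os-fragment packages services)
  os-fragment?
  (packages os-fragment-packages)
  (services os-fragment-services))

;; Combine fragments field by field, keeping everything each one declares.
(define (combine-fragments fragments)
  (make-os-fragment (append-map os-fragment-packages fragments)
                    (append-map os-fragment-services fragments)))

(define %base-fragment
  (make-os-fragment '("sudo" "iproute") '("guix-daemon" "mingetty")))

;; If the desktop fragment later gains packages, only this definition changes.
(define %desktop-fragment
  (make-os-fragment '("gvfs") '("network-manager" "screen-locker")))

;; The user's definition names each fragment once, not one variable per field.
(define my-system
  (combine-fragments
   (list %base-fragment
         %desktop-fragment
         (make-os-fragment '("nss-certs") '("gnome-desktop")))))

(os-fragment-packages my-system)
;; => ("sudo" "iproute" "gvfs" "nss-certs")
```

The cost of this hand-rolled approach is that every field must be listed in both the record type and the combining procedure, which is the bookkeeping a combine property on the field definitions is meant to absorb.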

Applicability to QubesOS

The current Guix code suggests that this feature would be immediately useful to Guix itself, and there is also evidence that it will be useful for maintaining a Guix template for QubesOS. For example, to support automatic network configuration, a template needs a set of services that connect to qubesdb and configure networking, and it also needs the /proc/xen file system to be mounted. These requirements span two different fields: services and file-systems.

Combinability Objectives

There are four objectives, each of which is at least partially met at the design level:

  1. Complement, not replace, linear inheritance
  2. Support an arbitrarily large number of fragments
  3. Minimize reading complexity, without ignoring actual complexity
  4. Break clearly if fragments are incompatible

Furthermore, these two rules support the first three objectives:

  1. Combinability is opt-in (by the author of the record type)
  2. Combiners are additive*

For example, consider the following code:

(use-modules (guix records)
             (srfi srfi-1))   ; lset-union
; The definition of a quotation data type includes 3 fields: author, which contains the
; name of the person who said it; text, which contains the words that the author said;
; and tags, which is a list of arbitrary strings. (The type is named quotation rather
; than quote because a constructor named quote would shadow Scheme's quote syntax, which
; every '(...) literal below relies on.)
;
; The tags field contains a combine attribute which defines the procedure used to combine
; values from different sources.
(define (combine-tags tag-list)
  (apply lset-union string=? tag-list))

(define-record-type* <quotation> quotation make-quotation
  quotation?
  this-quotation
  (author quotation-author)
  (text   quotation-text)
  (tags   quotation-tags (default '()) (combine combine-tags)))

; Next, several quotation fragments are defined. They can only contain a tags field
; because it is the only field with a combine procedure. Trying to add an author or text
; to a fragment declaration would result in an error.
(define star-trek-quote
  (quotation-fragment (tags '("sci-fi" "fiction" "star-trek"))))

(define historical-quote
  (quotation-fragment (tags '("historical" "non-fiction" "roman"))))

(define aggressive-quote
  (quotation-fragment (tags '("aggressive" "color-association:red"))))

(define peaceful-quote
  (quotation-fragment (tags '("peaceful" "color-association:green"))))

; The quotations themselves also define tags, which are combined with the tags contained
; in the fragments they are associated with.
(define vulcan-hello-quote
  (quotation
    (author    "Michael Burnham")
    (text      "They said 'Hello' in a language the Klingons understood.")
    (tags      '("disco-s1-e1"))
    (fragments (list star-trek-quote aggressive-quote))))

(define species-10c-quote
  (quotation
    (author    "Michael Burnham")
    (text      "(T)he ruins down here can tell us how they lived, what's important to them. Cultural context, a way to begin communicating.")
    (tags      '("disco-s4-e11"))
    (fragments (list star-trek-quote peaceful-quote))))

(define conquering-quote
  (quotation
    (author    "Julius Caesar")
    (text      "I came, I saw, I conquered")
    (tags      '("indirect-sourcing"))
    (fragments (list historical-quote aggressive-quote))))

(define precedent-quote
  (quotation
    (author    "Julius Caesar")
    (text      "All bad precedents begin as justifiable measures")
    (tags      '("unclear-sourcing"))
    (fragments (list historical-quote peaceful-quote))))

In this simple example, where fragments only contain one field, it would make sense to use the idiom and define %star-trek-quote-tags, %historical-quote-tags, etc. But in an operating system definition there are many fields which could be combined and different fields could be related to each other. For example, this is part of the configuration required for a Guix installation to be well-integrated into a QubesOS host:

(define qubesos-guest
  (operating-system-fragment
    (packages (list qubesdb))
    (services (list
      (service kernel-module-loader-service-type
        '("xen_blkback"
          "xen_evtchn"
          "xen_fbfront"
          "xen_gntdev"
          "xen_privcmd"
          "xen_scsiback"
          "xenfs"))
      (service qubesdb-service-type)
      (service qubes-networking-service-type)))
    (file-systems (list
      (file-system
        (mount-point "/proc/xen")
        (device      "xenfs")
        (type        "xenfs")
        (options     "defaults"))))))

With the existing idiom, each guest definition would have to include the packages, services, and file systems separately. And if it later turns out that a new initrd module is needed, every user would have to update their definition to include the new variable.
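To illustrate why a single fragment value helps when new fields appear, here is a plain-Guile sketch in which a fragment is an association list from field name to a list of values, and combination appends each field's values across fragments. The representation, the procedure combine-alist-fragments and all placeholder strings are hypothetical; the patch itself works on define-record-type* records, not alists.

```scheme
(use-modules (srfi srfi-1))   ; append-map, delete-duplicates

;; Hypothetical representation: a fragment is an alist mapping a field name
;; to the list of values that fragment contributes to that field.
(define (combine-alist-fragments fragments)
  (let ((fields (delete-duplicates
                 (append-map (lambda (fragment) (map car fragment)) fragments))))
    ;; For each field any fragment mentions, append every fragment's values.
    (map (lambda (field)
           (cons field
                 (append-map (lambda (fragment) (or (assq-ref fragment field) '()))
                             fragments)))
         fields)))

;; Placeholder strings stand in for real packages, services and file systems.
(define qubes-guest
  '((packages "qubesdb")
    (services "qubesdb-service" "qubes-networking-service")
    (file-systems "/proc/xen")))

;; A later revision of the fragment adds a field the user never mentions.
(define qubes-guest/revised
  (cons '(initrd-modules "xen-blkfront") qubes-guest))

;; The user's definition does not change between revisions.
(define (my-guest qubes-fragment)
  (combine-alist-fragments
   (list '((packages "emacs") (services "openssh-service"))
         qubes-fragment)))

(assq-ref (my-guest qubes-guest) 'packages)
;; => ("emacs" "qubesdb")
(assq-ref (my-guest qubes-guest/revised) 'initrd-modules)
;; => ("xen-blkfront")
```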

The combinability system supports the four objectives as much as is reasonable for a generic feature. Discipline in use is required to meet them fully.

* A notable exception to this is combining configuration settings for a specific piece of software. In this case it might make sense to have fragments override settings in previous fragments, instead of merely adding to them. This is not a new problem for software configuration, where we currently have default settings that the program ships with, system-wide settings in /etc, and per-user settings in $HOME, each of which can override the previous ones.
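The override semantics described in this footnote can be sketched in a few lines of plain Guile: each layer is an association list of settings, and lookup searches the most specific layer first. The procedure layered-ref and the example settings are hypothetical and only illustrate the contrast with additive combination.

```scheme
(use-modules (srfi srfi-1))   ; any

;; Override semantics: a later layer replaces an earlier layer's value
;; instead of being added to it.
(define (layered-ref layers key)
  ;; Search the most specific (last) layer first.
  (let ((pair (any (lambda (layer) (assoc key layer)) (reverse layers))))
    (and pair (cdr pair))))

;; Hypothetical settings mirroring program defaults, /etc and $HOME.
(define program-defaults '(("editor" . "nano") ("pager" . "less")))
(define system-wide      '(("editor" . "vi")))
(define per-user         '(("editor" . "emacs")))

(define layers (list program-defaults system-wide per-user))

(layered-ref layers "editor")   ; => "emacs"
(layered-ref layers "pager")    ; => "less"
```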

1. Complement, not replace, linear inheritance

This objective is enforced by the feature's design.

This feature does not modify linear inheritance at all, although it does strive to be compatible with it. A record can inherit from another record and still include fragments. The fragment values will be combined with the child's definition if one is given, otherwise they will be combined with the parent's definition. Furthermore, fragments themselves are defined in terms of define-record-type*, so they can inherit from each other.
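The precedence described above can be modelled as a small pure function. This is a sketch of the described behaviour, not code from the patch: %not-given, resolve-field and combine-lists are hypothetical names, and passing the record's own value before the fragments' values is an assumption about ordering.

```scheme
;; Hypothetical marker meaning "the child record did not set this field".
(define %not-given (list 'not-given))

;; Use the child's value if given, otherwise the parent's, then pass it
;; together with the fragments' values to the field's combine procedure.
(define (resolve-field child-value parent-value fragment-values combine)
  (let ((own (if (eq? child-value %not-given) parent-value child-value)))
    (combine (cons own fragment-values))))

;; A simple additive combine procedure over lists.
(define (combine-lists lists)
  (apply append lists))

(resolve-field %not-given '(parent) '((frag-1) (frag-2)) combine-lists)
;; => (parent frag-1 frag-2)
(resolve-field '(child) '(parent) '((frag-1) (frag-2)) combine-lists)
;; => (child frag-1 frag-2)
```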

2. Support an arbitrarily large number of fragments

This objective is supported by the feature's construction but requires discipline when authoring types.

This is achieved with careful selection of combine procedures. In the quote example, lset-union combines groups of tags without storing duplicate tags, and it can be applied to an arbitrarily large number of fragments.
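The combine-tags procedure from the quote example can be exercised on its own in plain Guile, without the patch. The sketch below feeds it many hypothetical fragments that share one tag, showing that the shared tag is stored once however many fragments contribute it.

```scheme
(use-modules (srfi srfi-1))   ; lset-union, iota

;; The combine procedure from the quote example.
(define (combine-tags tag-list)
  (apply lset-union string=? tag-list))

;; 1000 hypothetical fragments that all share "fiction" and each add one
;; tag of their own.
(define many-fragments
  (map (lambda (n) (list "fiction" (string-append "tag-" (number->string n))))
       (iota 1000)))

(length (combine-tags many-fragments))   ; => 1001, "fiction" is kept once
```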

Other kinds of fields could be combined differently. For example, a field might contain a string which will be written out to a configuration file. This field might define its combine procedure as (cute string-join <> "\n").
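As a sketch of that string-joining combiner, assuming the combine procedure receives the list of values contributed by the record and its fragments (the field and the configuration lines here are hypothetical):

```scheme
(use-modules (srfi srfi-26))   ; cute

;; Combine procedure for a hypothetical string field holding configuration
;; file text: each fragment contributes a line, joined with newlines.
(define combine-config-text (cute string-join <> "\n"))

(combine-config-text '("verbose = yes" "color = auto" "pager = less"))
;; => "verbose = yes\ncolor = auto\npager = less"
```

Unlike lset-union, this combiner is sensitive to the order of its inputs, so a reader still needs to know the order in which fragments are listed to predict the exact file contents.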

3. Minimize reading complexity without ignoring actual complexity

This is supported by the feature's construction but requires discipline when authoring types.

Inheritance is most complex when components interact with each other. For example, it is more difficult to understand an inheritance tree which defines and overrides the same method multiple times, compared to one that combines independent units of functionality.

The combination mechanism is intended to be purely additive. If two different operating system fragment definitions contain different services, the operating system that combines them will contain all of the services that both of them declare. Different fragments do not interact with each other; they are simply added on top of each other. Tracing through all of the fragments by hand to determine the final set of services can be tedious, but given those definitions there is no ambiguity about what will end up in the complete operating system.
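The additive property can be checked directly with SRFI-1 set predicates. In this sketch the service lists and the combine-services procedure are hypothetical placeholders; each of the last three expressions evaluates to a true value, showing that every fragment's services survive and that reordering fragments changes only the order of the result, not its contents.

```scheme
(use-modules (srfi srfi-1))   ; lset<=, lset=

;; Hypothetical service lists contributed by two operating system fragments.
(define desktop-services '("network-manager" "screen-locker"))
(define qubes-services   '("qubesdb" "qubes-networking"))

;; An additive combine procedure: keep everything every fragment declares.
(define (combine-services lists)
  (apply append lists))

(define a-then-b (combine-services (list desktop-services qubes-services)))
(define b-then-a (combine-services (list qubes-services desktop-services)))

;; Each fragment's services are a subset of the combined result...
(lset<= string=? desktop-services a-then-b)
(lset<= string=? qubes-services a-then-b)
;; ...and swapping the fragments changes only the order, not the contents.
(lset= string=? a-then-b b-then-a)
```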

Incidentally, this is one of the main reasons why combination is opt-in: for some fields, there is no sensible way to combine them. For example, the kernel of a particular operating system is a single fixed value, and it makes no sense to say "I want an operating system that simultaneously uses Linux and NetBSD". Therefore, it is incorrect (though technically possible) to define a combine procedure for the kernel field of an operating system. So long as there is no combine procedure for the kernel field, any attempt to set the kernel in a fragment will result in an error.
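The opt-in rule can be modelled with a table of combine procedures: a field without an entry simply cannot be combined. The table %combiners and the procedure combine-field below are hypothetical illustrations in plain Guile; in the patch the combine procedure is declared on the field itself.

```scheme
;; Hypothetical table of combine procedures. Only fields listed here may be
;; set in a fragment; kernel is deliberately absent because an operating
;; system has exactly one kernel.
(define %combiners
  `((packages . ,append)
    (services . ,append)))

(define (combine-field field values)
  (let ((combine (assq-ref %combiners field)))
    (unless combine
      (error "field cannot be set in a fragment:" field))
    (apply combine values)))

(combine-field 'services '(("openssh-service") ("qubesdb-service")))
;; => ("openssh-service" "qubesdb-service")
;; (combine-field 'kernel '(("linux") ("netbsd"))) raises an error instead.
```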

4. Break clearly if fragments are incompatible

This objective is partially supported by this feature, but requires potentially significant work by the type author.

The (in)compatibility of fragments is entirely dependent on the semantics of the structure, which can only be determined when the type is created. define-record-type* already defines a sanitize attribute which should be useful for this purpose.
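As an illustration of the kind of check a type author might write, here is a hypothetical compatibility check in plain Guile. File systems are modelled as (mount-point . device) pairs, and two fragments that mount different devices at the same mount point are rejected with an error that names the conflict. Where exactly such a check would be hooked in (for example, a sanitize procedure on the combined field) is an assumption, and check-mount-points is not part of Guix.

```scheme
(use-modules (srfi srfi-1))   ; filter, delete-duplicates

;; Hypothetical compatibility check on an already-combined list of file
;; systems, each modelled as a (mount-point . device) pair.
(define (check-mount-points file-systems)
  (for-each
   (lambda (mount-point)
     (let ((devices (delete-duplicates
                     (map cdr (filter (lambda (fs) (string=? (car fs) mount-point))
                                      file-systems)))))
       ;; More than one distinct device at one mount point is a conflict.
       (when (> (length devices) 1)
         (error "conflicting devices for mount point:" mount-point devices))))
   (delete-duplicates (map car file-systems)))
  file-systems)

;; Compatible: each mount point has a single device.
(check-mount-points '(("/proc/xen" . "xenfs") ("/" . "/dev/xvda")))
;; Incompatible: this call would raise an error naming "/".
;; (check-mount-points '(("/" . "/dev/xvda") ("/" . "/dev/sda1")))
```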
