scheme-langserver

Type System & Inference

This document replaces the older type-inference.md and type-inference.cn.md. It describes the complete type-inference pipeline implemented in analysis/type/.

1. Design Philosophy

Scheme is an untyped language. Unlike TypeScript or Typed Racket, it does not require programmers to annotate parameters or return values. Consequently, a traditional Hindley–Milner (HM) or System-F-based approach cannot be applied directly: there are no explicit type annotations to seed the inference process.

scheme-langserver takes a different path. It is built on the following assumptions and observations:

  1. Executable code is self-describing. If a program runs correctly under R6RS, then every primitive call (+, car, vector-ref, …) already carries strong type constraints. For example, in (lambda (a) (+ 1 a)) the parameter a must be a number? because + demands it.
  2. Gradual typing is sufficient for IDE diagnostics. We do not need a sound, machine-verified proof. We need useful information for hover-tooltips, jump-to-definition, and basic error detection. If we occasionally over-approximate a type as something?, the user still gets a working language server.
  3. The AST itself can act as a type variable. Instead of inventing a separate layer of type variables (α, β, …), every index-node in the virtual file system doubles as a placeholder for an unknown type. This makes the substitution list (the equation store) physically attached to the syntax tree, which simplifies debugging and caching.

The result is a two-phase pipeline:

Phase Module What it does
Substitution Generation substitutions/generator.sls + substitutions/rules/*.sls Walks the AST and builds a set of equations (substitutions) that relate AST nodes to type expressions.
Interpretation / Unification domain-specific-language/interpreter.sls Evaluates the equations like a small interpreter, expanding macros, applying λ-abstractions, and resolving index-node placeholders until concrete types are reached.

Both phases are expressed in a small domain-specific language (DSL) that lives inside the type subsystem.

For the list-pattern matcher used when applying lambda types and macro heads, see syntax-candy.md.


2. The Type Expression DSL

Type expressions are ordinary Scheme S-expressions. There is no separate parser. The DSL is validated and manipulated by predicates in domain-specific-language/inner-type-checker.sls.

2.1 Atoms

Form Meaning
something? The top type (universal). Every value inhabits it.
inner:void? The type of (void).
<- Function-arrow marker (used in infix position).
... Kleene-star: the preceding element may appear zero or more times.
**1 Kleene-plus: the preceding element may appear one or more times.

2.2 Composite Forms

Form Meaning
(inner:list? τ₁ τ₂ …) Proper list whose elements have types τ₁, τ₂, …
(inner:vector? τ₁ τ₂ …) Vector whose elements have types τ₁, τ₂, …
(inner:pair? τ-car τ-cdr) Pair (dotted or improper list).
(τ-ret <- (inner:list? τ-arg1 τ-arg2 …)) Function type: returns τ-ret, accepts arguments τ-arg1
(inner:record? pred (inner:pair? getter τ-field) …) Recogniser exists but not yet generated as a type value. Record types are currently represented indirectly via the predicator identifier and per-field getter/setter signatures.

2.3 Identifier References as Types

Wherever a raw symbol such as number? appears, the system actually stores an identifier-reference record (see analysis/identifier/reference.sls). This preserves library scoping: number? from (rnrs) and a user-defined number? are distinct types. The helper construct-type-expression-with-meta performs this wrapping.

2.4 Index-Nodes as Placeholders

An index-node (an AST node from the VFS) may appear inside a type expression. This is how the system represents unknowns. During interpretation the interpreter looks up the index-node-substitution-list of that node and continues evaluating. In HM terminology, an index-node is both a type variable and the term it describes.

2.5 Example Type Expressions

;; + under R6RS
(number? <- (inner:list? number? ...))

;; car, expressed as a macro (see §6)
((with ((a b c **1))
   (with-equal? inner:list? a b))
 (inner:list? fixnum? number?))

;; Record types are not yet emitted as unified values.
;; Instead, the predicate `point?` and accessors carry separate signatures:
;;   point?    : (boolean? <- (inner:list? something?))
;;   point-x   : (something? <- (inner:list? point?))
;;   make-point: (point? <- (inner:list? something? ...))

3. Phase I – Substitution Generation

The entry point is construct-substitutions-for in substitutions/generator.sls. It maps step over every top-level index-node in a document.

3.1 The step Dispatcher

step is a case-lambda with two arities:

  1. Normal mode (document index-node expanded+callee-list) – used for the main AST.
  2. Quasi-quoted mode (document index-node available-identifiers quasi-quoted-syntaxed expanded+callee-list) – used when descending inside ` or #' forms, so that unquote/unsyntax boundaries switch back to normal mode.

The dispatcher distinguishes five coarse shapes:

Shape Action
Leaf node trivial-process (literals, symbols, self-evaluating values)
quote / quasiquote / syntax / quasisyntax Special handling; syntax is skipped entirely because it is not a value.
Non-leaf list whose car is a symbol Look up available identifier references, route to a rule, then recurse into children.
Non-leaf list whose car is not a symbol Must be an application (e.g. ((lambda (x) x) 42)). Run application-process, then recurse.

3.2 Rule Routing

establish-available-rules-from takes the list of identifier references that are visible at the current index-node and returns a list of (rule-procedure . identifier-reference) pairs. It filters out parameter and syntax-parameter bindings (those are handled by the enclosing lambda/let rules) and then classifies each identifier by its identifier-reference-type:

identifier-reference-type Matched rule(s)
procedure, variable, getter, setter, predicator, constructor application-process
meta + symbol define define-process
meta + symbol lambda lambda-process
meta + symbol let / let* / letrec / letrec* let-process / let*-process / letrec-process
meta + symbol if if-process
meta + symbol cond cond-process
meta + symbol case-lambda case-lambda-process
meta + symbol begin begin-process
meta + symbol do do-process
others (self-defined libraries) route&add in self-defined-rules/router.sls

If an identifier is a meta-form, the corresponding rule runs before recursion. If it is an ordinary procedure, application-process runs. In all cases step then recurses into the children.

3.3 What Is a Substitution?

A substitution is simply an S-expression stored in an index-node’s index-node-substitution-list. It represents one possible type for that node. Because an expression may legitimately have multiple types (overloaded primitives, union types), the substitution list is a list of alternatives, not a single mapping.

For example, after running the generator on (+ 1 2), the node for the whole expression may contain:

;; substitution list of the (+ 1 2) node
((number? <- (inner:list? number? ...)) number? number?)

The first element is the type of +; the remaining elements are the argument nodes. During interpretation the interpreter sees that this is an inner:executable? (a function applied to arguments) and performs β-reduction.


4. Rule Reference (Detailed)

Rules live in analysis/type/substitutions/rules/ and a few self-defined extensions.

4.1 trivial.sls – Literals, Symbols, and Implicit Conversions

trivial-process is the most complex rule because it handles the bottom of the AST.

Literals

Literal form Type attached
char? char?
string? string?
boolean? boolean?
fixnum? fixnum?
bignum? bignum?
integer? integer?
cflonum? cflonum?
flonum? flonum?
rational? rational?
real? real?
complex? / number? number?

Composite literals

Symbols (identifiers)

For a symbol reference, trivial-process looks up all available identifier-references and generates substitutions depending on the binding kind:

  1. Constructor / Getter / Setter – attach the identifier itself (its type-expressions will carry the real type).
  2. Predicator – attach (boolean? <- (inner:list? something?)).
  3. Imported from another document – attach the target index-node (or its type-expressions if already known).
  4. Local variable – attach the index-node of the initialization site. This creates a direct edge in the equation graph.

Gradual Typing / Implicit Conversion

This is where the system converts “the code is executable” into “the parameter has the expected type”. When a local variable occurs as an argument in a function application, and that variable is a parameter, trivial-process generates a macro substitution on the variable’s definition site:

;; If `a` is the i-th parameter of a call to `+`
;; the substitution attached to `a`'s binding node becomes:
((with ((a b c))
   ((with ((x d0 d1 d2))
      di)            ;; di is the symbol for the i-th parameter
    c))
 +)

After macro expansion (see §6), this effectively says: “the type of a is whatever the type of +’s i-th parameter is”. This is the mechanism that turns (lambda (a) (+ 1 a)) into a : number? without any explicit annotation.

4.2 application.sls – Function Application

The simplest rule:

(define (application-process document index-node)
  (extend-index-node-substitution-list index-node (index-node-children index-node)))

It records that the type of the whole application is “the function node applied to the argument nodes”. The interpreter later reduces this.

4.3 lambda.sls – Lambda Abstraction

For (lambda (p₁ p₂ …) body):

  1. Collect the parameter index-nodes.
  2. Build (inner:list? p₁ p₂ …) via construct-parameter-index-nodes-products-with.
  3. Build the function type (body <- (inner:list? p₁ p₂ …)) via construct-lambdas-with.
  4. Attach it to the lambda node.

Because body and each pᵢ are themselves index-nodes, the type remains symbolic until interpretation.

4.4 let.sls / letrec.sls – Binding Forms

let is treated as syntactic sugar for lambda application:

(let ((x init)) body)
;; ≈
((lambda (x) body) init)

The rule therefore:

  1. Attaches the return-index-node (the last body expression) to the let node, and vice-versa.
  2. For each binding pair (identifier init), attaches init to identifier and identifier to init (bidirectional substitution).

Named let receives additional handling: the loop identifier is given a function type constructed from the binding parameters and the body, exactly like lambda.

letrec-process is almost identical to let-process; the key difference is that the loop identifier is not present, so no extra lambda wrapper is built.

4.5 if.sls – Conditional

For (if cond then else):

For two-arm if, the union has two members; for one-arm if, only the then branch is attached.

4.6 define.sls – Top-Level & Internal Definitions

Two shapes:

  1. (define (f p₁ p₂ …) body) – like lambda, but the resulting function type is attached to the identifier node of f, not to the define node.
  2. (define var init) – bidirectional link between var and init.

4.7 case-lambda.sls – Overloaded Arity

Each clause (((p₁ p₂ …) body)) generates an independent function type. All clauses are attached to the root case-lambda node, producing a list of alternative types. During interpretation the interpreter may resolve to any of them; in practice the call site arity usually selects the correct one.

4.8 record.slsdefine-record-type

Collects the constructor, predicator, getters and setters exported by the record definition and assigns them canonical types:

These are stored directly in the identifier-reference-type-expressions field so that they are available even when the record definition is imported from another file.

Note: Although inner-type-checker.sls defines an inner:record? recogniser of the form (inner:record? pred (inner:pair? getter τ) …), no rule currently generates such a unified record-type value. The record’s type is therefore represented indirectly through its predicate identifier and the per-field accessor signatures above.

4.9 cond.sls – Multi-Branch Conditional

Each clause (test expr₁ expr₂ …) is processed independently:

4.10 begin.sls – Sequencing

The type of a begin form is the type of its last expression:

(extend-index-node-substitution-list index-node (car (reverse (index-node-children index-node))))

4.11 Self-Defined Rules (ufo-match, ufo-try)

The router in self-defined-rules/router.sls allows external libraries to register custom rules. Currently supported:


5. Phase II – The DSL Interpreter

Once substitutions are generated, every index-node carries a list of raw type expressions. The interpreter resolves these into concrete types.

5.1 Entry Points

Function Purpose
type:interpret Core interpreter. Returns a type:environment whose result-list contains the resolved types.
type:interpret-result-list Convenience wrapper that extracts the result-list from type:interpret.
type:depature&interpret->result-list First expands any macros inside the expression, then interprets.
type:recursive-interpret-result-list Iteratively extends the substitution set to break recursive cycles (see §5.5).
type:interpret->strings Pretty-prints results as human-readable strings.

5.2 The type:environment Record

(define-record-type type:environment
  (fields
    (mutable substitution-list)  ; global equations that may be extended
    (mutable result-list)))      ; output accumulator

The environment is passed through recursive calls so that macro expansions can consult and even extend the substitution list.

5.3 type:interpret – Case Analysis

type:interpret receives (expression env memory max-depth). memory is a list of already-visited expressions (loop detection). max-depth defaults to PRIVATE-MAX-DEPTH (10).

The interpreter proceeds by structural case analysis:

Expression shape Action
null? Return '()
Depth exceeded or expression in memory Return the expression unchanged (give up).
inner:executable? with macro head Expand the macro (see §6), interpret each expansion.
inner:executable? with lambda head β-reduce: match arguments against parameter template, substitute into body, interpret body.
index-node? Look up its substitution-list; interpret each substitution.
inner:macro? Return unchanged (it needs an outer application context to expand).
inner:list?, inner:vector?, inner:pair?, inner:lambda? Interpret each element independently, then take the Cartesian product of all element results.
Plain list? (not an inner type) Interpret the car; if it becomes a lambda or macro, re-assemble and interpret again.
identifier-reference? Return its type-expressions (for constructors/getters/setters) or the identifier itself.
Anything else Return as-is (concrete atom).

5.4 Cartesian Product & Pruning

When a list type has multiple possible types for each element, the interpreter uses Cartesian product to enumerate all combinations. This is the main performance bottleneck.

private-generate-cartesian-product-procedure implements a tiered pruning strategy:

  1. Try full Cartesian product.
  2. If the product size exceeds PRIVATE-MAX-CARTESIAN-PRODUCT (50 000), filter each dimension to keep only type:partially-solved? items.
  3. Still too large? Increase the minimum solved-leaves threshold (2, then 3).
  4. Still too large? Keep only fully type:solved? items.
  5. Absolute fallback: return an empty list (effectively something?).

This graceful degradation prevents the interpreter from hanging on pathological code while still giving useful results for common cases.

5.5 Solving Recursion & The Halting Problem

Recursion is ubiquitous in Scheme. A naïve interpreter would loop forever on:

(define (factorial n)
  (if (= n 0) 1 (* n (factorial (- n 1)))))

type:recursive-interpret-result-list solves this with an iterative fixed-point loop:

(let loop ([i 0] [targets '(expr)] [env env] [result '()])
  (if (or (>= i max-recursion) (too-many-targets?))
      (dedupe result targets)
      (let* ([r0 (map (lambda (e) (type:depature&interpret->result-list e env)) targets)]
             [r1 (filter type:solved? r0)])
        (loop (+ i 1)
              (filter (lambda (x) (not (type:solved? x))) r0)
              env
              (append result r1)))))

This is not a full fixed-point solver, but in practice it resolves most self-referential function types because the base case (if) introduces a union type that eventually becomes solved.

5.6 Solved vs. Unsolved

(define (type:solved? expression)
  )

An expression is solved when:

type:partially-solved? counts how many leaf nodes are solved; it is used by the Cartesian-product pruner.


6. The Macro Engine

Some R6RS primitives cannot be expressed as simple function types. car, for example, must return the first element type of a list, but the list’s length is unknown at compile time. The system solves this with a small macro language embedded in the type DSL.

6.1 with – Pattern Abstraction

with has two syntactic forms depending on whether the denotions (pattern templates) are wrapped in an extra list.

Form A — flat denotions (most common)

((with (a b c)                ; denotions = (a b c)
     body)
  x y z)                      ; inputs    = (x y z)

Form B — wrapped denotions (single-input scenarios)

((with ((a b c **1))          ; denotions = ((a b c **1))
     body)
  (inner:list? x y z))        ; inputs    = ((inner:list? x y z))

When the interpreter sees a macro application, it:

  1. Interprets the arguments to obtain their type-level values.
  2. Calls candy:matchable? to see if the templates match.
  3. Uses candy:match-left to produce a list of ((symbol . value) …) pairs.
  4. Substitutes these pairs into body via private-with.
  5. Interprets the resulting body.

Critical invariant: execute-macro calls candy:matchable? with denotions on the left and expanded-inputs on the right. The two lists must have the same nesting level — flat against flat, wrapped against wrapped. Mismatching them causes candy:matchable? to return #f and the macro is silently left unexpanded. See §6.7 Trap 1.

6.2 Built-In Macro Helpers

Macro Semantics
with-append (with-append list1 list2)(append list1 list2), then recursively execute
with-equal? (with-equal? a b body)body if a equals b, else the original macro expression unchanged

with-equal? is the primary guard mechanism. When the guard fails, the entire expression is returned as-is; type:interpret then tries the next signature in the substitution list.

6.3 Macro Execution Internals

6.3.1 macro-head-execute-with — Gatekeeper

(define (macro-head-execute-with expression interpreted-inputs)
  (match expression
    [(('with ((? inner:macro-template? denotions) **1) body)
      (? inner:trivial? inputs) **1)
     (execute-macro `((with ,denotions ,body) ,@interpreted-inputs))]
    [else (raise 'macro-not-match:macro-head-execute-with)]))

The match pattern ((? inner:trivial? inputs) **1) uses ufo-match semantics: **1 means “one or more occurrences of the preceding pattern”, and the predicate inner:trivial? is checked per element. inputs binds to the entire list of matched elements.

Input shape Match result Why
(x) inputs = (x) Single element satisfies inner:trivial?
(x y z) inputs = (x y z) All three satisfy inner:trivial?
(x y z) where y is bare number? ❌ match fails Bare symbols fail inner:trivial?
(x y z) where y is identifier-reference ✅ match passes identifier-reference? is accepted

Field note: In real substitution generation every type predicate (fixnum?, string?, etc.) is wrapped as an identifier-reference by construct-type-expression-with-meta. Bare symbols fail the guard, which is why synthetic repros using raw symbols sometimes look like macro bugs when they are actually data-format mismatches.

6.3.2 execute-macro — The Expander

(define (execute-macro expression)
  (match expression
    [(('with ((? inner:macro-template? denotions) **1) body) inputs ...)
     (let ([expanded-inputs (map execute-macro inputs)])
       (if (and (andmap inner:trivial? expanded-inputs)
                (candy:matchable? denotions expanded-inputs))
           (execute-macro (private-with body (candy:match-left denotions expanded-inputs)))
           expression))]
    [('with-append (? list? a) (? list? b)) (execute-macro (append a b))]
    [('with-equal? a b body) (if (equal? a b) (execute-macro body) expression)]
    [else expression]))

Three checks before expansion:

  1. andmap inner:trivial? expanded-inputs — every argument must be a valid type expression.
  2. candy:matchable? denotions expanded-inputs — shapes must align (see §6.7 Trap 1 for the wrapping-level requirement).
  3. If both pass, candy:match-left produces bindings and private-with substitutes them into body.

On failure, execute-macro returns the original expression unchanged. This residue then flows back into type:interpret-result-list and participates in Cartesian products downstream.

6.3.3 private-with — Substitution Engine

(define (private-with body match-pairs)
  (fold-left
    (lambda (left pair)
      (let ([denotion (car pair)] [input (cdr pair)])
        (cond 
          [(symbol? denotion) (private-substitute left denotion input)]
          [(index-node? denotion) (private-substitute left denotion input)]
          [(identifier-reference? denotion) left]  ; deliberately ignored
          [(and (list? denotion) (list? input))
           (if (candy:matchable? denotion input)
               (private-with body (candy:match-left denotion input))
               (raise 'macro-not-match:private-with-list?))]
          [else (raise 'macro-not-match:private-with-else)])))
    body 
    match-pairs))

private-substitute is a naïve tree walk that replaces every occurrence of from with to. When denotion is an identifier-reference, private-with returns left unchanged — an identifier-reference in pattern position acts as a wildcard that matches anything but binds nothing.

When both denotion and input are lists, private-with recurses with candy:match-left. The previous code used candy:match-right when input contained **1 or ..., but that path was broken: match-right expands list-valued bindings into multiple flat pairs, and fold-left + private-substitute can only consume the first one. See §6.7 Trap 3 for details.

6.4 Example: car

car carries two signatures in rnrs-meta-rules.sls:

(car (with ((a b c **1)) (with-equal? inner:list? a b)))
(car (something? <- (inner:list? (inner:pair? something? something?))))

The first is a macro for inner:list? arguments; the second is a conservative function type for inner:pair? arguments. type:interpret tries them in order.

6.4.1 Macro execution trace

Apply the macro to (inner:list? fixnum? number?):

  1. execute-macro sees a with application:
    ((with ((a b c **1)) (with-equal? inner:list? a b))
     (inner:list? fixnum? number?))
    
  2. candy:matchable? checks template (a b c **1) against argument (inner:list? fixnum? number?).
    • Segment a matches inner:list?.
    • Segment b matches fixnum?.
    • Segment c **1 matches (number?) (one occurrence).
    • Match succeeds.
  3. candy:match-left produces bindings:
    ((a . inner:list?) (b . fixnum?) (c . (number?)))
    
  4. private-with substitutes into body (with-equal? inner:list? a b):
    (with-equal? inner:list? inner:list? fixnum?)
    
  5. execute-macro evaluates with-equal?:
    • (equal? inner:list? inner:list?)#t.
    • Result is fixnum?.

6.4.2 Match-failure & fallback

If the argument is not an inner:list? (e.g. an inner:pair?):

6.4.3 Template length requirement

The template (a b c **1) requires the argument to contain at least three elements (a, b, and at least one c). Therefore:

Argument shape Macro matches? Result
(inner:list? fixnum? number?) (3 elems) fixnum?
(inner:list? fixnum? number? string?) (4 elems) fixnum?
(inner:list? fixnum?) (2 elems) falls back to pair version → something?
(inner:pair? something? something?) ❌ (guard fails) something?

The same pattern applies to all c*r macros listed below.

6.5 Extending the c*r Macro Family

The cdr/cddr/cdddr/cddddr macros already exist in rnrs-meta-rules.sls. They use with-append (inner:list?) to reconstruct the tail type:

(cdr    (with ((a b c **1))       (with-equal? inner:list? a (with-append (inner:list?) c))))
(cddr   (with ((a b c d **1))     (with-equal? inner:list? a (with-append (inner:list?) d))))
(cdddr  (with ((a b c d e **1))   (with-equal? inner:list? a (with-append (inner:list?) e))))
(cddddr (with ((a b c d e f **1)) (with-equal? inner:list? a (with-append (inner:list?) f))))

The following inner:list? macros have been added to rnrs-meta-rules.sls. They improve hover-tooltip precision for code that uses cadr, caddr, etc. on known-length lists.

6.5.1 cadr – second element

(cadr (with ((a b c d **1)) (with-equal? inner:list? a c)))
Binding Value for (inner:list? τ₁ τ₂ τ₃ τ₄ …)
a inner:list?
b τ₁
c τ₂returned
d (τ₃ τ₄ …)

Requires the argument to contain at least four elements. Shorter lists fall back to the pair version.

6.5.2 caddr – third element

(caddr (with ((a b c d e **1)) (with-equal? inner:list? a d)))

Requires at least five elements. d binds τ₃.

6.5.3 cadddr – fourth element

(cadddr (with ((a b c d e f **1)) (with-equal? inner:list? a e)))

Requires at least six elements. e binds τ₄.

6.5.4 Nested-list macros (caar, caaar, cadar, cddar, …)

Macros that decompose nested list types are written by nesting with macro applications inside the macro body. execute-macro recursively expands them, and since the interpreter pre-expands macro arguments (including with-append and with-equal?) before matching a macro head, inner cdr steps work as well.

(caar (with ((a b c ...))
        (with-equal? inner:list? a
          ((with ((d e f **1)) (with-equal? inner:list? d e)) b))))

(caaar (with ((a b c ...))
         (with-equal? inner:list? a
           ((with ((d e f ...))
              (with-equal? inner:list? d
                ((with ((g h i **1)) (with-equal? inner:list? g h)) e)))
            b))))

(cadar (with ((a b c ...))
         (with-equal? inner:list? a
           ((with ((d e f ...))
              (with-equal? inner:list? d
                ((with ((g h i ...)) (with-equal? inner:list? g h))
                 (with-append (inner:list?) f))))
            b))))

(cddar (with ((a b c ...))
         (with-equal? inner:list? a
           ((with ((d e f ...))
              (with-equal? inner:list? d
                ((with ((g h i ...)) (with-equal? inner:list? g (with-append (inner:list?) i)))
                 (with-append (inner:list?) f))))
            b))))

The nesting depth is limited only by readability. In practice only caar, cadar, cddar, and caaar appear often enough in real code to justify the extra signature lines.

6.5.5 Summary table – all c*r functions

Function List macro? Kind Min elems Notes
car flat (a b c **1) 3 returns b
cdr flat (a b c **1) 3 returns (with-append (inner:list?) c)
caar nested with 3 (inner) car of car
cadr flat (a b c d **1) 4 returns c
cdar nested with 3 (inner) cdr of car
cddr flat (a b c d **1) 4 returns (with-append (inner:list?) d)
caaar nested with 3 (inner) car of car of car
caadr feasible (3-layer nested with), not yet added
cadar nested with 3 (inner) car of cdr of car
caddr flat (a b c d e **1) 5 returns d
cdaar feasible, not yet added
cdadr feasible, not yet added
cddar nested with 3 (inner) cdr of cdr of car
cdddr flat (a b c d e **1) 5 returns (with-append (inner:list?) e)
caaaar feasible (4-layer nested with), extremely rare in practice
caaadr feasible, not yet added
caadar feasible, not yet added
caaddr feasible, not yet added
cadaar feasible, not yet added
cadadr feasible, not yet added
caddar feasible, not yet added
cadddr flat (a b c d e f **1) 6 returns e
cdaaar feasible, not yet added
cdaadr feasible, not yet added
cdadar feasible, not yet added
cdaddr feasible, not yet added
cddaar feasible, not yet added
cddadr feasible, not yet added
cdddar feasible, not yet added
cddddr flat (a b c d e f **1) 6 returns (with-append (inner:list?) f)

Note: For every function in the table, an additional fallback signature (something? <- (inner:list? (inner:pair? something? something?))) is kept in rnrs-meta-rules.sls so that short lists and generic pairs still type-check conservatively.

6.5.6 Why some c*r functions are not added

After the execute-macro fix (§6.5.4), all c*r functions can theoretically receive a list macro—there is no longer a technical block. The remaining ones are omitted for two pragmatic reasons:

  1. Readability. A 4-layer nested with (e.g. caaaar) is ~10 lines of dense parenthesis soup. It is easy to get a bracket wrong and hard to review.
  2. Frequency. Functions beyond 3 layers (caaaar, caaadr, caadar, …) appear so rarely in real Scheme code that the pair fallback (something? <- (inner:list? (inner:pair? something? something?))) is sufficient for IDE use.

If a project genuinely needs caadr or cdaar, the pattern is identical to cadar/cddar: nest with macros, use with-append (inner:list?) for every cdr step, and keep the outer with-equal? inner:list? guard at each level.

6.6 candy:matchable? – Dynamic-Programming Matcher

The matcher lives in syntax-candy.sls. It segments a list into alternating fixed and repeatable sections. A segment is delimited by ... or **1.

Matching is performed by filling a 2-D matrix (rows = template segments, columns = argument segments) with three statuses:

The matrix is explored depth-first, allowing paths that step right or down but never diagonally. This is essentially a restricted regular-expression matcher for heterogeneous lists. It supports overlapping repeatable segments, which is needed for types like (inner:list? number? ... string? ...).

6.7 Known Traps & Invariants

Trap 1: Wrapping-Level Mismatch

execute-macro passes denotions and expanded-inputs to candy:matchable? without any unwrapping. The two must share the same nesting level.

Denotions Inputs Result
(a b c) (flat) (x y z) (flat) ✅ match
((a b c **1)) (wrapped) ((x y z)) (wrapped) ✅ match
(a b c) (flat) ((x y z)) (wrapped) ❌ mismatch — candy:matchable? returns #f
((a b c **1)) (wrapped) (x y z) (flat) ❌ mismatch

If you write a macro with wrapped denotions (Form B), the caller must also wrap the arguments. The c*r macros that use inner:list? (§6.5) rely on this wrapping because the entire list type (inner:list? τ₁ τ₂ …) is treated as a single macro argument.

Trap 2: Symbol-Category Mismatch

execute-macro requires (andmap inner:trivial? expanded-inputs). inner:trivial? accepts index-node?, identifier-reference?, something?, and inner:void?. It rejects bare symbols such as number? or fixnum?.

In real substitution rules all predicates are identifier-reference records produced by construct-type-expression-with-meta. However, hand-written unit tests that feed raw symbols will silently fail to match any macro, leaving the with expression unexpanded. If that residue later flows into a Cartesian product, the product includes the raw with expression as one of its branches, which usually produces a meaningless type.

Rule of thumb: never use bare symbols in synthetic repros. Wrap them with construct-type-expression-with-meta or at least ensure they satisfy inner:trivial?.

Trap 3: candy:match-right & private-with Interaction (Fixed)

When a template contains repeated segments (**1 or ...), candy:match-right returns a flat list of pairs:

;; template: (a b c **1)   argument: (x y z w)
;; candy:match-right produces:
((a . x) (b . y) (c . z) (c . w))

private-with then iterates over these pairs with fold-left and private-substitute. Because private-substitute replaces every occurrence of the denotion symbol, the two (c . _) pairs overwrite each other rather than accumulating into a list. This means repeated-segment substitution inside private-with was incomplete for the candy:match-right path.

A more severe variant was discovered: even when the template has no repeated segments, if the argument itself is a list type expression (e.g. (inner:list? something? ...) which contains ... or **1), match-right would still expand the matched binding into multiple pairs:

;; template: (x d0 d1)
;; argument: (inner:list? (something? <- ...) (inner:list? something? ...) **1)
;; match-right incorrectly produces:
((x . inner:list?)
 (d0 . (something? <- ...))
 (d1 . inner:list?)       <-- d1 bound to first element only
 (d1 . something?)
 (d1 . ...))

private-with would then substitute d1 with inner:list? and discard something? and ..., causing the parameter type to collapse from (inner:list? something? ...) to bare inner:list?.

Fix (2025-05-11): private-with now always uses candy:match-left, which preserves list-valued bindings as a single pair:

;; match-left correctly produces:
((x . inner:list?)
 (d0 . (something? <- ...))
 (d1 . (inner:list? something? ...)))   <-- d1 keeps the full list

User concern: interpreted-inputs passed to macros (e.g. the type signature of map) can themselves contain **1 or .... The old match-right branch would trigger on any **1 in the input, not just repeated-segment templates. Using match-left universally is safer because it never fragments a binding.

Trap 4: Macro Residue in Cartesian Products

When execute-macro fails to expand (guard fails or mismatch), it returns the original macro expression unchanged. If this residue is fed back into type:interpret-result-list, it participates in the Cartesian product of type combinations. The product generator does not know how to reduce a with expression, so the resulting branch is typically discarded as an invalid type, but it wastes time and may confuse downstream consumers.

Lesson learned: application.sls was briefly modified to inject with macros into every call site. Even when the macro did not expand, the residue broke test-define and test-let because unexpanded with terms appeared inside substitution lists that the interpreter tried to fold into type sets. The fix was to revert application.sls to its simple form.

Trap 5: with-append Is Not append

with-append is only recognised inside execute-macro. It is not a general function. You cannot write:

;; WRONG — with-append is not in scope outside a macro body
(let ([tail (with-append (inner:list?) (number?))]) ...)

It must appear as the immediate expression being interpreted, and its arguments must be literal lists known at macro-expansion time.

Trap 6: Template Length vs Runtime Length

A template like (a b c **1) requires the argument to contain at least three elements. If the list is shorter at runtime, the macro does not match and falls back to the next signature. This is by design — car on a two-element list cannot statically know whether the first element is the “head” or the “tail” in a type sense — but it means short lists always degrade to the conservative pair fallback (something?).

Trap 7: identifier-reference As Wildcard

In private-with, if the denotion is an identifier-reference?, the branch returns left unchanged:

[(identifier-reference? denotion) left]  ; deliberately ignored

This means an identifier-reference in pattern position acts as a wildcard that matches anything but binds nothing. It is useful when you want a segment to match without capturing its value (e.g. forcing a shape check but discarding the matched predicate). However, if you intended to capture the matched value into a variable, using an identifier-reference instead of a symbol will silently drop it.


7. Type Relations

domain-specific-language/interpreter.sls exports three relation predicates:

7.1 type:->? – Subtype

(type:->? left right env) asks whether left is a subtype of right.

Rules (in order):

  1. equal?#t.
  2. left is something?#f (top is supertype of nothing concrete).
  3. right is something?#t.
  4. Both are identifier-reference?s → walk the identifier-reference-parents chain (e.g. integer?rational?real?number?).
  5. Both are lists → convert inner:list? to inner:pair? via inner:?->pair, then element-wise type:->?.
  6. Fallback → check whether right appears in (type:interpret-result-list left).

7.2 type:<-? – Supertype

Reverse of type:->?.

7.3 type:=? – Type Equality

(and (type:->? left right) (type:<-? left right)).


8. Built-In Type Signatures (rnrs-meta-rules)

analysis/type/substitutions/rnrs-meta-rules.sls contains a large sorted alist that maps primitive names to their type signatures. It covers R6RS base, R6RS standard libraries, and Chez Scheme extensions—more than 1 100 entries.

Examples:

(+ (number? <- (inner:list? number? ...)))
(- (number? <- (inner:list? number? **1)))
(map ((inner:list? something? ...) <-
       (inner:list?
         (something? <- (inner:list? something?))
         (inner:list? something? ...) **1)))
(apply (something? <-
         (inner:list?
           (something? <- (inner:list? something?))
           something? ...
           (inner:list? something? ...))))

During workspace initialization (init-type-expressions in analysis/identifier/meta.sls), these signatures are loaded and attached to the corresponding identifier-reference records in the top-level environment. This is the seed data that makes the entire inference engine work: without knowing that + expects number?, the implicit conversion in trivial-process would have no target type.


9. Integration with the Rest of the Server

9.1 When Type Inference Runs

Type inference is gated by the type-inference? flag passed to init-workspace. When enabled:

  1. init-references (in analysis/workspace.sls) runs construct-substitutions-for on every document in topological order.
  2. After substitution generation, identifier-reference-type-expressions are populated by calling type:recursive-interpret-result-list on each exported identifier.

9.2 Incremental Updates

When a file is changed:

  1. update-file-node-with-tail rebuilds the document’s index-node-list.
  2. refresh-workspace-for determines whether the library header changed. If not, only the affected file’s substitutions are regenerated.
  3. shrink-paths computes topological batches; init-references re-runs step on the affected batch.

9.3 From Types to LSP Diagnostics

Currently the type system is used primarily for hover information (not yet for full publishDiagnostics). The textDocument/hover handler can look up an identifier’s type-expressions, map them through type:interpret->strings, and display them to the user. The old documents listed “TODO: Diagnostic based on Type System”; the infrastructure is in place, but the actual diagnostic rules (type-mismatch errors) are not yet implemented.


10. Performance Characteristics & Known Limitations

10.1 Strengths

10.2 Weaknesses

  1. Cartesian-product explosion A function with 4 parameters each having 4 possible types generates 4⁴ = 256 combinations. In the worst case this is factorial growth. The tiered pruner mitigates but does not eliminate the problem.

  2. Approximate recursion handling type:recursive-interpret-result-list uses a hard iteration limit (2). Deeply nested recursive types may not fully resolve.

  3. No polymorphic generalisation The system does not implement HM’s ∀α.τ generalisation. A function like (lambda (x) x) gets type (something? <- (inner:list? something?)), not a true parametric type. This is usually sufficient for IDE use but can lose precision.

  4. call/cc and control effects Continuations, dynamic-wind, and other control operators are inherently beyond the scope of this type system. They are typed as something? <- ….

  5. Mutability is not tracked set! and mutable data structures are not reflected in types. (define x 1) followed by (set! x "hello") may leave x with type fixnum? because the substitution graph does not model temporal mutation.

  6. max-depth is not propagated through recursive calls type:interpret accepts a max-depth parameter, but every internal recursive call invokes the 3-arity (type:interpret-result-list expr env memory) which silently falls back to the hard-coded PRIVATE-MAX-DEPTH (10). Consequently:

    • Specifying a shallow max-depth from the outside has no effect on nested expansions.
    • Recursive functions with complex bodies (e.g. binary-search in util/binary-search.sls) are expanded to depth 10 regardless of the caller’s intent.
    • Because memory-based cycle detection only records expressions on the current call chain, cross-function recursion (e.g. binary-searchprivate-collect → self-call) is not caught when the recursive target is reached through a substitution rather than a direct type:interpret-result-list call.
    • The combination of depth-10 expansion and missed cycle detection causes a Cartesian-product explosion that can produce millions of intermediate type expressions and hang the interpreter for minutes.
    • natural-order-compare survives the same mechanism because its body is simple (let + if + one recursive call); binary-search fails because its clause-2 body contains cond with three branches, let*, if, multiple recursive call sites, and a mutual call to private-collect.

    Short-term mitigation: Lower PRIVATE-MAX-DEPTH from 10 to a smaller value (e.g. 3–4).
    Proper fix: Thread max-depth through every internal recursive call in interpreter.sls (lines 224, 231, 238, 258, 268, 278, 285).


10.3 Performance Optimization Attempts (Field Notes)

This section records a systematic benchmarking and optimization effort on the type-inference subsystem. It is intended to prevent future developers from repeating the same dead ends.

Benchmarking Infrastructure

A standalone benchmark runner was added at bin/benchmark-type-inference.ss. It measures wall-clock time, CPU time, GC time, and result-list size for:

Key baseline numbers (before any optimization):

Benchmark Real Results
type:interpret binary-search cl 158.7 s 21,603
type:interpret natural-order-compare cl 12.9 s 218
init-workspace (with ti) >600 s (timeout)
hover assq-ref (with ti) 72.3 s 0

The extreme gap between binary-search (21k results, 158s) and natural-order-compare (218 results, 13s) shows that the problem is not type inference per se, but the branching factor of recursive functions under the current depth limit.

Attempt 1 — Hash-Set dedupe (Success)

Change: Rewrite util/dedupe.sls to use make-hashtable with equal-hash / equal? when the equality predicate is equal?. Custom predicates fall back to the original O(n²) filter.

Result:

Lesson: Deduplication is a real bottleneck when result lists grow to 20k+ items, but it is not the root cause. Even with O(n) dedupe, binary-search still takes ~100 s because the interpreter still generates 21k results.

Attempt 2 — Thread max-depth Through Internal Calls (Failure)

Hypothesis: type:interpret-result-list drops max-depth in 7 internal recursive call sites, causing all nested expansions to fall back to PRIVATE-MAX-DEPTH = 10. Fixing this would cap the explosion.

Change: Pass max-depth explicitly at lines 224, 231, 238, 258, 268, 278, 285.

Result: No measurable effect. binary-search cl remained at ~100 s with 21,603 results.

Root cause analysis: The external call (type:interpret-result-list target-node) already uses the default PRIVATE-MAX-DEPTH = 10. Internal calls falling back to 10 is therefore a no-op in this scenario. The real problem is that depth 10 itself is too deep for a highly branching recursive function.

Lesson: Passing max-depth is a correct code hygiene fix, but it does not solve the binary-search timeout on its own.

Attempt 3 — Memoization in type:interpret (Failure → False Start → Safe but Ineffective)

Hypothesis: The same sub-expressions are re-interpreted hundreds of times across cartesian-product branches. Caching (expression, env) → result-list should eliminate redundant work.

First implementation (at type:interpret level): On cache hit, set env.result-list directly and skip private:interpret-core.

Result: Catastrophic. Result count ballooned from 21,603 → 32,403, and runtime increased from 100 s → 181 s.

Root cause analysis: env is a mutable record shared across the entire call chain. When an internal cache hit mutated env.result-list, outer fold-left append calls in private:interpret-core collected stale or duplicated values, corrupting the aggregation.

Second implementation (at type:interpret-result-list level): Cache only the returned value, do not mutate env. Only index-node? expressions are cached (using make-eq-hashtable for O(1) identity lookup).

Result: Semantically safe — result count stayed at 21,603. But speedup was negligible (~1%).

Root cause analysis: binary-search is deeply recursive. After the first encounter of an index-node, it is immediately appended to memory. Subsequent recursive references to the same node find it in memory and are truncated, so the cache condition (not (contain? memory expression)) fails. Cache hits only occur for top-level expressions referenced from multiple distinct parents before they enter memory — a rare scenario for recursive functions.

Lesson: Memoization is safe when implemented as a value cache outside the mutable env, but for recursive functions the memory-based truncation mechanism defeats it.

Attempt 4 — private:interpret-core Extraction (Unexpected Side Effect)

Change: Refactor type:interpret by extracting its body into private:interpret-core and simplifying the case-lambda fallback chain so that 1-arity and 2-arity calls reach the 4-arity body directly instead of hopping through intermediate arities.

Result: No semantic change, but type:interpret binary-search cl dropped from ~100 s → 82.8 s and the full benchmark suite completed for the first time (previously it timed out at 600 s because earlier cases consumed too much time).

Lesson: The improvement is likely attributable to reduced call-stack overhead and/or better compiler optimisation from smaller functions, not a fundamental algorithmic win.

Synthesis — What Actually Works

Based on the above attempts, the following interventions are known to have real, measured impact:

Intervention Status Verified Impact
Hash-set dedupe ✅ Merged −37% on binary-search cl
Simplify case-lambda fallbacks ✅ Merged (as part of memoization revert) ~−15–20% on binary-search cl
Thread max-depth ❌ Reverted None (default depth = 10 anyway)
Memoization (type:interpret) ❌ Reverted Negative (pollutes shared env)
Memoization (type:interpret-result-list) ❌ Reverted ~0% (blocked by memory truncation)

The remaining bottleneck is the depth-10 expansion of highly branching recursive functions. The most promising fixes not yet attempted are:

  1. Lower PRIVATE-MAX-DEPTH (e.g. from 10 → 3–4) — trivial one-line change.
  2. Result-size budget — hard cap the number of items in env.result-list (e.g. 1,000) and truncate to something?.
  3. Cache type:partially-solved? — the cartesian-product pruner calls this predicate for every element on every tier; memoizing it would avoid repeated tree traversals.
  4. Global visited set — replace (or augment) chain-based memory with a per-top-level-call hash set so that cross-function recursion (binary-searchprivate-collect) is detected regardless of call path.

11. File Map

File Responsibility
analysis/type/substitutions/generator.sls construct-substitutions-for, step dispatcher, rule routing.
analysis/type/substitutions/util.sls construct-lambdas-with, construct-parameter-index-nodes-products-with.
analysis/type/substitutions/rnrs-meta-rules.sls Large alist of primitive type signatures.
analysis/type/substitutions/rules/trivial.sls Literals, symbols, implicit conversion (Gradual Typing).
analysis/type/substitutions/rules/application.sls Vanilla function application.
analysis/type/substitutions/rules/lambda.sls Lambda abstraction types.
analysis/type/substitutions/rules/let.sls let, let*, named let.
analysis/type/substitutions/rules/letrec.sls letrec, letrec*.
analysis/type/substitutions/rules/if.sls Conditional union types.
analysis/type/substitutions/rules/define.sls define for variables and procedures.
analysis/type/substitutions/rules/case-lambda.sls Overloaded arities.
analysis/type/substitutions/rules/record.sls define-record-type types.
analysis/type/substitutions/rules/cond.sls Multi-branch conditional.
analysis/type/substitutions/rules/begin.sls Sequencing.
analysis/type/substitutions/self-defined-rules/router.sls Extensible rule dispatcher.
analysis/type/substitutions/self-defined-rules/ufo-match/match.sls ufo-match pattern support.
analysis/type/domain-specific-language/interpreter.sls type:interpret, type:->?, macro expander, Cartesian-product pruner.
analysis/type/domain-specific-language/inner-type-checker.sls Predicates: inner:trivial?, inner:lambda?, inner:macro?, inner:executable?, inner:?->pair.
analysis/type/domain-specific-language/syntax-candy.sls candy:matchable?, candy:match-left, candy:match-right, segment-based DP matcher.
analysis/identifier/meta.sls construct-type-expression-with-meta, loader for rnrs-meta-rules.

12. Field Notes — car Macro & Synthetic-Repro Pitfall (2026-04)

Retrospective note: The original write-up of this section contained an incorrect root-cause analysis. The symptoms described below were real, but they stemmed from a flawed synthetic repro rather than from bugs in interpreter.sls. The section has been corrected to reflect what actually happened.

What Was Observed

A hand-written test (test-car-macro-bug.sps) constructed a raw car macro application using bare symbols for type predicates:

(type:interpret-result-list
  `((with ((a b c **1)) (with-equal? inner:list? a b))
    (inner:list? fixnum? string?))
  env)

The interpreter returned the macro expression unchanged instead of fixnum?.

The Actual Root Cause

The synthetic repro was unrealistic. In real substitution generation, every type predicate (fixnum?, string?, number?, etc.) is wrapped as an identifier-reference record by construct-type-expression-with-meta. inner:trivial? accepts identifier-reference objects, so real macro arguments pass the (? inner:trivial? inputs) **1 guard in macro-head-execute-with without issue.

In the synthetic repro, fixnum? and string? are bare symbols. They fail inner:trivial?, so ufo-match rejects the argument list before any macro logic runs. The macro never gets a chance to call candy:matchable?.

What the syntax-candy.sls Hardening Actually Fixed

The syntax-candy.sls changes made during this investigation did not fix a car macro bug — the macro already worked in real code — but they did eliminate several latent risks in the matcher:

Change Risk eliminated
private-segment now rejects templates with more than one repeat marker (... / **1) Prevents ambiguous DP-matcher states that could silently produce wrong bindings.
candy:match 2-arity now returns grouped symbol results directly Removes the fragile fold-left grouping logic from candy:match-left; repeated segments can no longer be split into multiple bindings by accident.
private:group-match-pairs compares segment-type instead of equal? on record objects Removes an implicit dependency on Chez Scheme’s default record equality, which could break if the segment record layout ever changes.
candy:match-right expands grouped results back to flat (rest-type . ready-type) pairs Keeps backward compatibility with private-with in interpreter.sls.

Verified Status

Test Status Notes
test-inner-type-checker.sps — direct car/cdr macro assertions ✅ Pass car returns first element type; cdr returns tail type.
test-lambda.sps — parameter-index-node type access ✅ Pass Implicitly exercises the same with/with-equal? path.
test-syntax-candy.sps ✅ Pass No regressions after matcher hardening.

Lesson

When debugging macro expansion, always feed the interpreter real type expressions (via construct-type-expression-with-meta or through the full substitution generator). Hand-constructing bare-symbol repros is fast, but it bypasses the identifier-reference wrapping that the rest of the pipeline assumes, producing false negatives that look like macro bugs but are actually data-format mismatches.