This document replaces the older
type-inference.mdandtype-inference.cn.md. It describes the complete type-inference pipeline implemented inanalysis/type/.
Scheme is an untyped language. Unlike TypeScript or Typed Racket, it does not require programmers to annotate parameters or return values. Consequently, a traditional Hindley–Milner (HM) or System-F-based approach cannot be applied directly: there are no explicit type annotations to seed the inference process.
scheme-langserver takes a different path. It is built on the following assumptions and observations:
+, car, vector-ref, …) already carries strong type constraints. For example, in (lambda (a) (+ 1 a)) the parameter a must be a number? because + demands it.something?, the user still gets a working language server.α, β, …), every index-node in the virtual file system doubles as a placeholder for an unknown type. This makes the substitution list (the equation store) physically attached to the syntax tree, which simplifies debugging and caching.The result is a two-phase pipeline:
| Phase | Module | What it does |
|---|---|---|
| Substitution Generation | substitutions/generator.sls + substitutions/rules/*.sls |
Walks the AST and builds a set of equations (substitutions) that relate AST nodes to type expressions. |
| Interpretation / Unification | domain-specific-language/interpreter.sls |
Evaluates the equations like a small interpreter, expanding macros, applying λ-abstractions, and resolving index-node placeholders until concrete types are reached. |
Both phases are expressed in a small domain-specific language (DSL) that lives inside the type subsystem.
For the list-pattern matcher used when applying lambda types and macro heads, see
syntax-candy.md.
Type expressions are ordinary Scheme S-expressions. There is no separate parser. The DSL is validated and manipulated by predicates in domain-specific-language/inner-type-checker.sls.
| Form | Meaning |
|---|---|
something? |
The top type (universal). Every value inhabits it. |
inner:void? |
The type of (void). |
<- |
Function-arrow marker (used in infix position). |
... |
Kleene-star: the preceding element may appear zero or more times. |
**1 |
Kleene-plus: the preceding element may appear one or more times. |
| Form | Meaning |
|---|---|
(inner:list? τ₁ τ₂ …) |
Proper list whose elements have types τ₁, τ₂, … |
(inner:vector? τ₁ τ₂ …) |
Vector whose elements have types τ₁, τ₂, … |
(inner:pair? τ-car τ-cdr) |
Pair (dotted or improper list). |
(τ-ret <- (inner:list? τ-arg1 τ-arg2 …)) |
Function type: returns τ-ret, accepts arguments τ-arg1 … |
(inner:record? pred (inner:pair? getter τ-field) …) |
Recogniser exists but not yet generated as a type value. Record types are currently represented indirectly via the predicator identifier and per-field getter/setter signatures. |
Wherever a raw symbol such as number? appears, the system actually stores an identifier-reference record (see analysis/identifier/reference.sls). This preserves library scoping: number? from (rnrs) and a user-defined number? are distinct types. The helper construct-type-expression-with-meta performs this wrapping.
An index-node (an AST node from the VFS) may appear inside a type expression. This is how the system represents unknowns. During interpretation the interpreter looks up the index-node-substitution-list of that node and continues evaluating. In HM terminology, an index-node is both a type variable and the term it describes.
;; + under R6RS
(number? <- (inner:list? number? ...))
;; car, expressed as a macro (see §6)
((with ((a b c **1))
(with-equal? inner:list? a b))
(inner:list? fixnum? number?))
;; Record types are not yet emitted as unified values.
;; Instead, the predicate `point?` and accessors carry separate signatures:
;; point? : (boolean? <- (inner:list? something?))
;; point-x : (something? <- (inner:list? point?))
;; make-point: (point? <- (inner:list? something? ...))
The entry point is construct-substitutions-for in substitutions/generator.sls. It maps step over every top-level index-node in a document.
step Dispatcherstep is a case-lambda with two arities:
(document index-node expanded+callee-list) – used for the main AST.(document index-node available-identifiers quasi-quoted-syntaxed expanded+callee-list) – used when descending inside ` or #' forms, so that unquote/unsyntax boundaries switch back to normal mode.The dispatcher distinguishes five coarse shapes:
| Shape | Action |
|---|---|
| Leaf node | trivial-process (literals, symbols, self-evaluating values) |
quote / quasiquote / syntax / quasisyntax |
Special handling; syntax is skipped entirely because it is not a value. |
Non-leaf list whose car is a symbol |
Look up available identifier references, route to a rule, then recurse into children. |
Non-leaf list whose car is not a symbol |
Must be an application (e.g. ((lambda (x) x) 42)). Run application-process, then recurse. |
establish-available-rules-from takes the list of identifier references that are visible at the current index-node and returns a list of (rule-procedure . identifier-reference) pairs. It filters out parameter and syntax-parameter bindings (those are handled by the enclosing lambda/let rules) and then classifies each identifier by its identifier-reference-type:
identifier-reference-type |
Matched rule(s) |
|---|---|
procedure, variable, getter, setter, predicator, constructor |
application-process |
meta + symbol define |
define-process |
meta + symbol lambda |
lambda-process |
meta + symbol let / let* / letrec / letrec* |
let-process / let*-process / letrec-process |
meta + symbol if |
if-process |
meta + symbol cond |
cond-process |
meta + symbol case-lambda |
case-lambda-process |
meta + symbol begin |
begin-process |
meta + symbol do |
do-process |
| others (self-defined libraries) | route&add in self-defined-rules/router.sls |
If an identifier is a meta-form, the corresponding rule runs before recursion. If it is an ordinary procedure, application-process runs. In all cases step then recurses into the children.
A substitution is simply an S-expression stored in an index-node’s index-node-substitution-list. It represents one possible type for that node. Because an expression may legitimately have multiple types (overloaded primitives, union types), the substitution list is a list of alternatives, not a single mapping.
For example, after running the generator on (+ 1 2), the node for the whole expression may contain:
;; substitution list of the (+ 1 2) node
((number? <- (inner:list? number? ...)) number? number?)
The first element is the type of +; the remaining elements are the argument nodes. During interpretation the interpreter sees that this is an inner:executable? (a function applied to arguments) and performs β-reduction.
Rules live in analysis/type/substitutions/rules/ and a few self-defined extensions.
trivial.sls – Literals, Symbols, and Implicit Conversionstrivial-process is the most complex rule because it handles the bottom of the AST.
Literals
| Literal form | Type attached |
|---|---|
| char? | char? |
| string? | string? |
| boolean? | boolean? |
| fixnum? | fixnum? |
| bignum? | bignum? |
| integer? | integer? |
| cflonum? | cflonum? |
| flonum? | flonum? |
| rational? | rational? |
| real? | real? |
| complex? / number? | number? |
Composite literals
(e₁ e₂ …) → (inner:list? node₁ node₂ …) where each nodeᵢ is either the child index-node or a virtual node created for unquote-splicing.#(e₁ e₂ …) → (inner:vector? node₁ node₂ …).(a . b) → (inner:pair? node-a node-b).Symbols (identifiers)
For a symbol reference, trivial-process looks up all available identifier-references and generates substitutions depending on the binding kind:
type-expressions will carry the real type).(boolean? <- (inner:list? something?)).index-node (or its type-expressions if already known).index-node of the initialization site. This creates a direct edge in the equation graph.Gradual Typing / Implicit Conversion
This is where the system converts “the code is executable” into “the parameter has the expected type”. When a local variable occurs as an argument in a function application, and that variable is a parameter, trivial-process generates a macro substitution on the variable’s definition site:
;; If `a` is the i-th parameter of a call to `+`
;; the substitution attached to `a`'s binding node becomes:
((with ((a b c))
((with ((x d0 d1 d2))
di) ;; di is the symbol for the i-th parameter
c))
+)
After macro expansion (see §6), this effectively says: “the type of a is whatever the type of +’s i-th parameter is”. This is the mechanism that turns (lambda (a) (+ 1 a)) into a : number? without any explicit annotation.
application.sls – Function ApplicationThe simplest rule:
(define (application-process document index-node)
(extend-index-node-substitution-list index-node (index-node-children index-node)))
It records that the type of the whole application is “the function node applied to the argument nodes”. The interpreter later reduces this.
lambda.sls – Lambda AbstractionFor (lambda (p₁ p₂ …) body):
index-nodes.(inner:list? p₁ p₂ …) via construct-parameter-index-nodes-products-with.(body <- (inner:list? p₁ p₂ …)) via construct-lambdas-with.Because body and each pᵢ are themselves index-nodes, the type remains symbolic until interpretation.
let.sls / letrec.sls – Binding Formslet is treated as syntactic sugar for lambda application:
(let ((x init)) body)
;; ≈
((lambda (x) body) init)
The rule therefore:
return-index-node (the last body expression) to the let node, and vice-versa.(identifier init), attaches init to identifier and identifier to init (bidirectional substitution).Named let receives additional handling: the loop identifier is given a function type constructed from the binding parameters and the body, exactly like lambda.
letrec-process is almost identical to let-process; the key difference is that the loop identifier is not present, so no extra lambda wrapper is built.
if.sls – ConditionalFor (if cond then else):
cond gets something? (we do not attempt boolean-truthiness analysis).then and else are each attached to the if node, and the if node is attached back to both branches. This encodes a union type: the value of the if may be either branch.For two-arm if, the union has two members; for one-arm if, only the then branch is attached.
define.sls – Top-Level & Internal DefinitionsTwo shapes:
(define (f p₁ p₂ …) body) – like lambda, but the resulting function type is attached to the identifier node of f, not to the define node.(define var init) – bidirectional link between var and init.case-lambda.sls – Overloaded ArityEach clause (((p₁ p₂ …) body)) generates an independent function type. All clauses are attached to the root case-lambda node, producing a list of alternative types. During interpretation the interpreter may resolve to any of them; in practice the call site arity usually selects the correct one.
record.sls – define-record-typeCollects the constructor, predicator, getters and setters exported by the record definition and assigns them canonical types:
predicator → (boolean? <- (inner:list? something?))constructor → (predicator <- (inner:list? something? …))getter → (something? <- (inner:list? predicator))setter → (inner:void? <- (inner:list? predicator something?))These are stored directly in the identifier-reference-type-expressions field so that they are available even when the record definition is imported from another file.
Note: Although
inner-type-checker.slsdefines aninner:record?recogniser of the form(inner:record? pred (inner:pair? getter τ) …), no rule currently generates such a unified record-type value. The record’s type is therefore represented indirectly through its predicate identifier and the per-field accessor signatures above.
cond.sls – Multi-Branch ConditionalEach clause (test expr₁ expr₂ …) is processed independently:
test gets something?.cond node (and back).begin.sls – SequencingThe type of a begin form is the type of its last expression:
(extend-index-node-substitution-list index-node (car (reverse (index-node-children index-node))))
ufo-match, ufo-try)The router in self-defined-rules/router.sls allows external libraries to register custom rules. Currently supported:
ufo-match – match-process recognizes (? pred var) patterns and attaches all available predicate identifiers (symbols ending in ?) to the pattern variable.ufo-try – try-process attaches union types for exception clauses.Once substitutions are generated, every index-node carries a list of raw type expressions. The interpreter resolves these into concrete types.
| Function | Purpose |
|---|---|
type:interpret |
Core interpreter. Returns a type:environment whose result-list contains the resolved types. |
type:interpret-result-list |
Convenience wrapper that extracts the result-list from type:interpret. |
type:depature&interpret->result-list |
First expands any macros inside the expression, then interprets. |
type:recursive-interpret-result-list |
Iteratively extends the substitution set to break recursive cycles (see §5.5). |
type:interpret->strings |
Pretty-prints results as human-readable strings. |
type:environment Record(define-record-type type:environment
(fields
(mutable substitution-list) ; global equations that may be extended
(mutable result-list))) ; output accumulator
The environment is passed through recursive calls so that macro expansions can consult and even extend the substitution list.
type:interpret – Case Analysistype:interpret receives (expression env memory max-depth). memory is a list of already-visited expressions (loop detection). max-depth defaults to PRIVATE-MAX-DEPTH (10).
The interpreter proceeds by structural case analysis:
| Expression shape | Action |
|---|---|
null? |
Return '() |
Depth exceeded or expression in memory |
Return the expression unchanged (give up). |
inner:executable? with macro head |
Expand the macro (see §6), interpret each expansion. |
inner:executable? with lambda head |
β-reduce: match arguments against parameter template, substitute into body, interpret body. |
index-node? |
Look up its substitution-list; interpret each substitution. |
inner:macro? |
Return unchanged (it needs an outer application context to expand). |
inner:list?, inner:vector?, inner:pair?, inner:lambda? |
Interpret each element independently, then take the Cartesian product of all element results. |
Plain list? (not an inner type) |
Interpret the car; if it becomes a lambda or macro, re-assemble and interpret again. |
identifier-reference? |
Return its type-expressions (for constructors/getters/setters) or the identifier itself. |
| Anything else | Return as-is (concrete atom). |
When a list type has multiple possible types for each element, the interpreter uses Cartesian product to enumerate all combinations. This is the main performance bottleneck.
private-generate-cartesian-product-procedure implements a tiered pruning strategy:
PRIVATE-MAX-CARTESIAN-PRODUCT (50 000), filter each dimension to keep only type:partially-solved? items.type:solved? items.something?).This graceful degradation prevents the interpreter from hanging on pathological code while still giving useful results for common cases.
Recursion is ubiquitous in Scheme. A naïve interpreter would loop forever on:
(define (factorial n)
(if (= n 0) 1 (* n (factorial (- n 1)))))
type:recursive-interpret-result-list solves this with an iterative fixed-point loop:
(let loop ([i 0] [targets '(expr)] [env env] [result '()])
(if (or (>= i max-recursion) (too-many-targets?))
(dedupe result targets)
(let* ([r0 (map (lambda (e) (type:depature&interpret->result-list e env)) targets)]
[r1 (filter type:solved? r0)])
(loop (+ i 1)
(filter (lambda (x) (not (type:solved? x))) r0)
env
(append result r1)))))
type:solved? into the result set.index-node placeholders may have accumulated new solved substitutions from siblings.PRIVATE-MAX-RECURSION (2) times or until the target set grows beyond PRIVATE-MAX-RECURSION-SET-SIZE (400).This is not a full fixed-point solver, but in practice it resolves most self-referential function types because the base case (if) introduces a union type that eventually becomes solved.
(define (type:solved? expression)
…)
An expression is solved when:
'something?, 'inner:void?).inner:trivial? type expression (i.e. not a raw application waiting for reduction).type:partially-solved? counts how many leaf nodes are solved; it is used by the Cartesian-product pruner.
Some R6RS primitives cannot be expressed as simple function types. car, for example, must return the first element type of a list, but the list’s length is unknown at compile time. The system solves this with a small macro language embedded in the type DSL.
with – Pattern Abstractionwith has two syntactic forms depending on whether the denotions (pattern templates) are wrapped in an extra list.
Form A — flat denotions (most common)
((with (a b c) ; denotions = (a b c)
body)
x y z) ; inputs = (x y z)
Form B — wrapped denotions (single-input scenarios)
((with ((a b c **1)) ; denotions = ((a b c **1))
body)
(inner:list? x y z)) ; inputs = ((inner:list? x y z))
When the interpreter sees a macro application, it:
candy:matchable? to see if the templates match.candy:match-left to produce a list of ((symbol . value) …) pairs.body via private-with.Critical invariant:
execute-macrocallscandy:matchable?withdenotionson the left andexpanded-inputson the right. The two lists must have the same nesting level — flat against flat, wrapped against wrapped. Mismatching them causescandy:matchable?to return#fand the macro is silently left unexpanded. See §6.7 Trap 1.
| Macro | Semantics |
|---|---|
with-append |
(with-append list1 list2) → (append list1 list2), then recursively execute |
with-equal? |
(with-equal? a b body) → body if a equals b, else the original macro expression unchanged |
with-equal? is the primary guard mechanism. When the guard fails, the entire expression is returned as-is; type:interpret then tries the next signature in the substitution list.
macro-head-execute-with — Gatekeeper(define (macro-head-execute-with expression interpreted-inputs)
(match expression
[(('with ((? inner:macro-template? denotions) **1) body)
(? inner:trivial? inputs) **1)
(execute-macro `((with ,denotions ,body) ,@interpreted-inputs))]
[else (raise 'macro-not-match:macro-head-execute-with)]))
The match pattern ((? inner:trivial? inputs) **1) uses ufo-match semantics: **1 means “one or more occurrences of the preceding pattern”, and the predicate inner:trivial? is checked per element. inputs binds to the entire list of matched elements.
| Input shape | Match result | Why |
|---|---|---|
(x) |
✅ inputs = (x) |
Single element satisfies inner:trivial? |
(x y z) |
✅ inputs = (x y z) |
All three satisfy inner:trivial? |
(x y z) where y is bare number? |
❌ match fails | Bare symbols fail inner:trivial? |
(x y z) where y is identifier-reference |
✅ match passes | identifier-reference? is accepted |
Field note: In real substitution generation every type predicate (
fixnum?,string?, etc.) is wrapped as anidentifier-referencebyconstruct-type-expression-with-meta. Bare symbols fail the guard, which is why synthetic repros using raw symbols sometimes look like macro bugs when they are actually data-format mismatches.
execute-macro — The Expander(define (execute-macro expression)
(match expression
[(('with ((? inner:macro-template? denotions) **1) body) inputs ...)
(let ([expanded-inputs (map execute-macro inputs)])
(if (and (andmap inner:trivial? expanded-inputs)
(candy:matchable? denotions expanded-inputs))
(execute-macro (private-with body (candy:match-left denotions expanded-inputs)))
expression))]
[('with-append (? list? a) (? list? b)) (execute-macro (append a b))]
[('with-equal? a b body) (if (equal? a b) (execute-macro body) expression)]
[else expression]))
Three checks before expansion:
andmap inner:trivial? expanded-inputs — every argument must be a valid type expression.candy:matchable? denotions expanded-inputs — shapes must align (see §6.7 Trap 1 for the wrapping-level requirement).candy:match-left produces bindings and private-with substitutes them into body.On failure, execute-macro returns the original expression unchanged. This residue then flows back into type:interpret-result-list and participates in Cartesian products downstream.
private-with — Substitution Engine(define (private-with body match-pairs)
(fold-left
(lambda (left pair)
(let ([denotion (car pair)] [input (cdr pair)])
(cond
[(symbol? denotion) (private-substitute left denotion input)]
[(index-node? denotion) (private-substitute left denotion input)]
[(identifier-reference? denotion) left] ; deliberately ignored
[(and (list? denotion) (list? input))
(if (candy:matchable? denotion input)
(private-with body (candy:match-left denotion input))
(raise 'macro-not-match:private-with-list?))]
[else (raise 'macro-not-match:private-with-else)])))
body
match-pairs))
private-substitute is a naïve tree walk that replaces every occurrence of from with to. When denotion is an identifier-reference, private-with returns left unchanged — an identifier-reference in pattern position acts as a wildcard that matches anything but binds nothing.
When both denotion and input are lists, private-with recurses with candy:match-left. The previous code used candy:match-right when input contained **1 or ..., but that path was broken: match-right expands list-valued bindings into multiple flat pairs, and fold-left + private-substitute can only consume the first one. See §6.7 Trap 3 for details.
carcar carries two signatures in rnrs-meta-rules.sls:
(car (with ((a b c **1)) (with-equal? inner:list? a b)))
(car (something? <- (inner:list? (inner:pair? something? something?))))
The first is a macro for inner:list? arguments; the second is a conservative function type for inner:pair? arguments. type:interpret tries them in order.
Apply the macro to (inner:list? fixnum? number?):
execute-macro sees a with application:
((with ((a b c **1)) (with-equal? inner:list? a b))
(inner:list? fixnum? number?))
candy:matchable? checks template (a b c **1) against argument (inner:list? fixnum? number?).
a matches inner:list?.b matches fixnum?.c **1 matches (number?) (one occurrence).candy:match-left produces bindings:
((a . inner:list?) (b . fixnum?) (c . (number?)))
private-with substitutes into body (with-equal? inner:list? a b):
(with-equal? inner:list? inner:list? fixnum?)
execute-macro evaluates with-equal?:
(equal? inner:list? inner:list?) → #t.fixnum?.If the argument is not an inner:list? (e.g. an inner:pair?):
a ≠ inner:list?.with-equal? returns the original macro expression unchanged.type:interpret sees the macro was not reduced, tries the next signature (the pair version), and returns something?.The template (a b c **1) requires the argument to contain at least three elements (a, b, and at least one c). Therefore:
| Argument shape | Macro matches? | Result |
|---|---|---|
(inner:list? fixnum? number?) (3 elems) |
✅ | fixnum? |
(inner:list? fixnum? number? string?) (4 elems) |
✅ | fixnum? |
(inner:list? fixnum?) (2 elems) |
❌ | falls back to pair version → something? |
(inner:pair? something? something?) |
❌ (guard fails) | something? |
The same pattern applies to all c*r macros listed below.
c*r Macro FamilyThe cdr/cddr/cdddr/cddddr macros already exist in rnrs-meta-rules.sls. They use with-append (inner:list?) to reconstruct the tail type:
(cdr (with ((a b c **1)) (with-equal? inner:list? a (with-append (inner:list?) c))))
(cddr (with ((a b c d **1)) (with-equal? inner:list? a (with-append (inner:list?) d))))
(cdddr (with ((a b c d e **1)) (with-equal? inner:list? a (with-append (inner:list?) e))))
(cddddr (with ((a b c d e f **1)) (with-equal? inner:list? a (with-append (inner:list?) f))))
The following inner:list? macros have been added to rnrs-meta-rules.sls. They improve hover-tooltip precision for code that uses cadr, caddr, etc. on known-length lists.
cadr – second element(cadr (with ((a b c d **1)) (with-equal? inner:list? a c)))
| Binding | Value for (inner:list? τ₁ τ₂ τ₃ τ₄ …) |
|---|---|
a |
inner:list? |
b |
τ₁ |
c |
τ₂ ← returned |
d |
(τ₃ τ₄ …) |
Requires the argument to contain at least four elements. Shorter lists fall back to the pair version.
caddr – third element(caddr (with ((a b c d e **1)) (with-equal? inner:list? a d)))
Requires at least five elements. d binds τ₃.
cadddr – fourth element(cadddr (with ((a b c d e f **1)) (with-equal? inner:list? a e)))
Requires at least six elements. e binds τ₄.
caar, caaar, cadar, cddar, …)Macros that decompose nested list types are written by nesting with macro applications inside the macro body. execute-macro recursively expands them, and since the interpreter pre-expands macro arguments (including with-append and with-equal?) before matching a macro head, inner cdr steps work as well.
(caar (with ((a b c ...))
(with-equal? inner:list? a
((with ((d e f **1)) (with-equal? inner:list? d e)) b))))
(caaar (with ((a b c ...))
(with-equal? inner:list? a
((with ((d e f ...))
(with-equal? inner:list? d
((with ((g h i **1)) (with-equal? inner:list? g h)) e)))
b))))
(cadar (with ((a b c ...))
(with-equal? inner:list? a
((with ((d e f ...))
(with-equal? inner:list? d
((with ((g h i ...)) (with-equal? inner:list? g h))
(with-append (inner:list?) f))))
b))))
(cddar (with ((a b c ...))
(with-equal? inner:list? a
((with ((d e f ...))
(with-equal? inner:list? d
((with ((g h i ...)) (with-equal? inner:list? g (with-append (inner:list?) i)))
(with-append (inner:list?) f))))
b))))
The nesting depth is limited only by readability. In practice only caar, cadar, cddar, and caaar appear often enough in real code to justify the extra signature lines.
c*r functions| Function | List macro? | Kind | Min elems | Notes |
|---|---|---|---|---|
car |
✅ | flat (a b c **1) |
3 | returns b |
cdr |
✅ | flat (a b c **1) |
3 | returns (with-append (inner:list?) c) |
caar |
✅ | nested with |
3 (inner) | car of car |
cadr |
✅ | flat (a b c d **1) |
4 | returns c |
cdar |
✅ | nested with |
3 (inner) | cdr of car |
cddr |
✅ | flat (a b c d **1) |
4 | returns (with-append (inner:list?) d) |
caaar |
✅ | nested with |
3 (inner) | car of car of car |
caadr |
❌ | — | — | feasible (3-layer nested with), not yet added |
cadar |
✅ | nested with |
3 (inner) | car of cdr of car |
caddr |
✅ | flat (a b c d e **1) |
5 | returns d |
cdaar |
❌ | — | — | feasible, not yet added |
cdadr |
❌ | — | — | feasible, not yet added |
cddar |
✅ | nested with |
3 (inner) | cdr of cdr of car |
cdddr |
✅ | flat (a b c d e **1) |
5 | returns (with-append (inner:list?) e) |
caaaar |
❌ | — | — | feasible (4-layer nested with), extremely rare in practice |
caaadr |
❌ | — | — | feasible, not yet added |
caadar |
❌ | — | — | feasible, not yet added |
caaddr |
❌ | — | — | feasible, not yet added |
cadaar |
❌ | — | — | feasible, not yet added |
cadadr |
❌ | — | — | feasible, not yet added |
caddar |
❌ | — | — | feasible, not yet added |
cadddr |
✅ | flat (a b c d e f **1) |
6 | returns e |
cdaaar |
❌ | — | — | feasible, not yet added |
cdaadr |
❌ | — | — | feasible, not yet added |
cdadar |
❌ | — | — | feasible, not yet added |
cdaddr |
❌ | — | — | feasible, not yet added |
cddaar |
❌ | — | — | feasible, not yet added |
cddadr |
❌ | — | — | feasible, not yet added |
cdddar |
❌ | — | — | feasible, not yet added |
cddddr |
✅ | flat (a b c d e f **1) |
6 | returns (with-append (inner:list?) f) |
Note: For every function in the table, an additional fallback signature
(something? <- (inner:list? (inner:pair? something? something?)))is kept inrnrs-meta-rules.slsso that short lists and generic pairs still type-check conservatively.
c*r functions are not addedAfter the execute-macro fix (§6.5.4), all c*r functions can theoretically receive a list macro—there is no longer a technical block. The remaining ones are omitted for two pragmatic reasons:
with (e.g. caaaar) is ~10 lines of dense parenthesis soup. It is easy to get a bracket wrong and hard to review.caaaar, caaadr, caadar, …) appear so rarely in real Scheme code that the pair fallback (something? <- (inner:list? (inner:pair? something? something?))) is sufficient for IDE use.If a project genuinely needs caadr or cdaar, the pattern is identical to cadar/cddar: nest with macros, use with-append (inner:list?) for every cdr step, and keep the outer with-equal? inner:list? guard at each level.
candy:matchable? – Dynamic-Programming MatcherThe matcher lives in syntax-candy.sls. It segments a list into alternating fixed and repeatable sections. A segment is delimited by ... or **1.
Matching is performed by filling a 2-D matrix (rows = template segments, columns = argument segments) with three statuses:
'matched – both segments align.'skipped – one side is skipped (used for zero-or-more).'unused – unreachable cell.The matrix is explored depth-first, allowing paths that step right or down but never diagonally. This is essentially a restricted regular-expression matcher for heterogeneous lists. It supports overlapping repeatable segments, which is needed for types like (inner:list? number? ... string? ...).
execute-macro passes denotions and expanded-inputs to candy:matchable? without any unwrapping. The two must share the same nesting level.
| Denotions | Inputs | Result |
|---|---|---|
(a b c) (flat) |
(x y z) (flat) |
✅ match |
((a b c **1)) (wrapped) |
((x y z)) (wrapped) |
✅ match |
(a b c) (flat) |
((x y z)) (wrapped) |
❌ mismatch — candy:matchable? returns #f |
((a b c **1)) (wrapped) |
(x y z) (flat) |
❌ mismatch |
If you write a macro with wrapped denotions (Form B), the caller must also wrap the arguments. The c*r macros that use inner:list? (§6.5) rely on this wrapping because the entire list type (inner:list? τ₁ τ₂ …) is treated as a single macro argument.
execute-macro requires (andmap inner:trivial? expanded-inputs). inner:trivial? accepts index-node?, identifier-reference?, something?, and inner:void?. It rejects bare symbols such as number? or fixnum?.
In real substitution rules all predicates are identifier-reference records produced by construct-type-expression-with-meta. However, hand-written unit tests that feed raw symbols will silently fail to match any macro, leaving the with expression unexpanded. If that residue later flows into a Cartesian product, the product includes the raw with expression as one of its branches, which usually produces a meaningless type.
Rule of thumb: never use bare symbols in synthetic repros. Wrap them with
construct-type-expression-with-metaor at least ensure they satisfyinner:trivial?.
candy:match-right & private-with Interaction (Fixed)When a template contains repeated segments (**1 or ...), candy:match-right returns a flat list of pairs:
;; template: (a b c **1) argument: (x y z w)
;; candy:match-right produces:
((a . x) (b . y) (c . z) (c . w))
private-with then iterates over these pairs with fold-left and private-substitute. Because private-substitute replaces every occurrence of the denotion symbol, the two (c . _) pairs overwrite each other rather than accumulating into a list. This means repeated-segment substitution inside private-with was incomplete for the candy:match-right path.
A more severe variant was discovered: even when the template has no repeated segments, if the argument itself is a list type expression (e.g. (inner:list? something? ...) which contains ... or **1), match-right would still expand the matched binding into multiple pairs:
;; template: (x d0 d1)
;; argument: (inner:list? (something? <- ...) (inner:list? something? ...) **1)
;; match-right incorrectly produces:
((x . inner:list?)
(d0 . (something? <- ...))
(d1 . inner:list?) <-- d1 bound to first element only
(d1 . something?)
(d1 . ...))
private-with would then substitute d1 with inner:list? and discard something? and ..., causing the parameter type to collapse from (inner:list? something? ...) to bare inner:list?.
Fix (2025-05-11): private-with now always uses candy:match-left, which preserves list-valued bindings as a single pair:
;; match-left correctly produces:
((x . inner:list?)
(d0 . (something? <- ...))
(d1 . (inner:list? something? ...))) <-- d1 keeps the full list
User concern:
interpreted-inputspassed to macros (e.g. the type signature ofmap) can themselves contain**1or.... The oldmatch-rightbranch would trigger on any**1in the input, not just repeated-segment templates. Usingmatch-leftuniversally is safer because it never fragments a binding.
When execute-macro fails to expand (guard fails or mismatch), it returns the original macro expression unchanged. If this residue is fed back into type:interpret-result-list, it participates in the Cartesian product of type combinations. The product generator does not know how to reduce a with expression, so the resulting branch is typically discarded as an invalid type, but it wastes time and may confuse downstream consumers.
Lesson learned:
application.slswas briefly modified to injectwithmacros into every call site. Even when the macro did not expand, the residue broketest-defineandtest-letbecause unexpandedwithterms appeared inside substitution lists that the interpreter tried to fold into type sets. The fix was to revertapplication.slsto its simple form.
with-append Is Not appendwith-append is only recognised inside execute-macro. It is not a general function. You cannot write:
;; WRONG — with-append is not in scope outside a macro body
(let ([tail (with-append (inner:list?) (number?))]) ...)
It must appear as the immediate expression being interpreted, and its arguments must be literal lists known at macro-expansion time.
A template like (a b c **1) requires the argument to contain at least three elements. If the list is shorter at runtime, the macro does not match and falls back to the next signature. This is by design — car on a two-element list cannot statically know whether the first element is the “head” or the “tail” in a type sense — but it means short lists always degrade to the conservative pair fallback (something?).
identifier-reference As WildcardIn private-with, if the denotion is an identifier-reference?, the branch returns left unchanged:
[(identifier-reference? denotion) left] ; deliberately ignored
This means an identifier-reference in pattern position acts as a wildcard that matches anything but binds nothing. It is useful when you want a segment to match without capturing its value (e.g. forcing a shape check but discarding the matched predicate). However, if you intended to capture the matched value into a variable, using an identifier-reference instead of a symbol will silently drop it.
domain-specific-language/interpreter.sls exports three relation predicates:
type:->? – Subtype(type:->? left right env) asks whether left is a subtype of right.
Rules (in order):
equal? → #t.left is something? → #f (top is supertype of nothing concrete).right is something? → #t.identifier-reference?s → walk the identifier-reference-parents chain (e.g. integer? → rational? → real? → number?).inner:list? to inner:pair? via inner:?->pair, then element-wise type:->?.right appears in (type:interpret-result-list left).type:<-? – SupertypeReverse of type:->?.
type:=? – Type Equality(and (type:->? left right) (type:<-? left right)).
rnrs-meta-rules)analysis/type/substitutions/rnrs-meta-rules.sls contains a large sorted alist that maps primitive names to their type signatures. It covers R6RS base, R6RS standard libraries, and Chez Scheme extensions—more than 1 100 entries.
Examples:
(+ (number? <- (inner:list? number? ...)))
(- (number? <- (inner:list? number? **1)))
(map ((inner:list? something? ...) <-
(inner:list?
(something? <- (inner:list? something?))
(inner:list? something? ...) **1)))
(apply (something? <-
(inner:list?
(something? <- (inner:list? something?))
something? ...
(inner:list? something? ...))))
During workspace initialization (init-type-expressions in analysis/identifier/meta.sls), these signatures are loaded and attached to the corresponding identifier-reference records in the top-level environment. This is the seed data that makes the entire inference engine work: without knowing that + expects number?, the implicit conversion in trivial-process would have no target type.
Type inference is gated by the type-inference? flag passed to init-workspace. When enabled:
init-references (in analysis/workspace.sls) runs construct-substitutions-for on every document in topological order.identifier-reference-type-expressions are populated by calling type:recursive-interpret-result-list on each exported identifier.When a file is changed:
update-file-node-with-tail rebuilds the document’s index-node-list.refresh-workspace-for determines whether the library header changed. If not, only the affected file’s substitutions are regenerated.shrink-paths computes topological batches; init-references re-runs step on the affected batch.Currently the type system is used primarily for hover information (not yet for full publishDiagnostics). The textDocument/hover handler can look up an identifier’s type-expressions, map them through type:interpret->strings, and display them to the user. The old documents listed “TODO: Diagnostic based on Type System”; the infrastructure is in place, but the actual diagnostic rules (type-mismatch errors) are not yet implemented.
xxx-process rule.something?; the server never hangs.Cartesian-product explosion A function with 4 parameters each having 4 possible types generates 4⁴ = 256 combinations. In the worst case this is factorial growth. The tiered pruner mitigates but does not eliminate the problem.
Approximate recursion handling
type:recursive-interpret-result-list uses a hard iteration limit (2). Deeply nested recursive types may not fully resolve.
No polymorphic generalisation
The system does not implement HM’s ∀α.τ generalisation. A function like (lambda (x) x) gets type (something? <- (inner:list? something?)), not a true parametric type. This is usually sufficient for IDE use but can lose precision.
call/cc and control effects
Continuations, dynamic-wind, and other control operators are inherently beyond the scope of this type system. They are typed as something? <- ….
Mutability is not tracked
set! and mutable data structures are not reflected in types. (define x 1) followed by (set! x "hello") may leave x with type fixnum? because the substitution graph does not model temporal mutation.
max-depth is not propagated through recursive calls
type:interpret accepts a max-depth parameter, but every internal recursive call invokes the 3-arity (type:interpret-result-list expr env memory) which silently falls back to the hard-coded PRIVATE-MAX-DEPTH (10). Consequently:
max-depth from the outside has no effect on nested expansions.binary-search in util/binary-search.sls) are expanded to depth 10 regardless of the caller’s intent.memory-based cycle detection only records expressions on the current call chain, cross-function recursion (e.g. binary-search → private-collect → self-call) is not caught when the recursive target is reached through a substitution rather than a direct type:interpret-result-list call.natural-order-compare survives the same mechanism because its body is simple (let + if + one recursive call); binary-search fails because its clause-2 body contains cond with three branches, let*, if, multiple recursive call sites, and a mutual call to private-collect.Short-term mitigation: Lower PRIVATE-MAX-DEPTH from 10 to a smaller value (e.g. 3–4).
Proper fix: Thread max-depth through every internal recursive call in interpreter.sls (lines 224, 231, 238, 258, 268, 278, 285).
This section records a systematic benchmarking and optimization effort on the type-inference subsystem. It is intended to prevent future developers from repeating the same dead ends.
A standalone benchmark runner was added at bin/benchmark-type-inference.ss. It measures wall-clock time, CPU time, GC time, and result-list size for:
init-workspace (with and without type inference)construct-substitutions-for (Phase I)type:interpret-result-list on specific functions (binary-search, natural-order-compare, assq-ref)hover)Key baseline numbers (before any optimization):
| Benchmark | Real | Results |
|---|---|---|
type:interpret binary-search cl |
158.7 s | 21,603 |
type:interpret natural-order-compare cl |
12.9 s | 218 |
init-workspace (with ti) |
>600 s (timeout) | — |
hover assq-ref (with ti) |
72.3 s | 0 |
The extreme gap between binary-search (21k results, 158s) and natural-order-compare (218 results, 13s) shows that the problem is not type inference per se, but the branching factor of recursive functions under the current depth limit.
dedupe (Success)Change: Rewrite util/dedupe.sls to use make-hashtable with equal-hash / equal? when the equality predicate is equal?. Custom predicates fall back to the original O(n²) filter.
Result:
type:interpret binary-search cl: 158.7 s → 100.0 s (−37%)init-workspace (no ti): 44.1 s → 38.7 s (−12%)Lesson: Deduplication is a real bottleneck when result lists grow to 20k+ items, but it is not the root cause. Even with O(n) dedupe, binary-search still takes ~100 s because the interpreter still generates 21k results.
max-depth Through Internal Calls (Failure)Hypothesis: type:interpret-result-list drops max-depth in 7 internal recursive call sites, causing all nested expansions to fall back to PRIVATE-MAX-DEPTH = 10. Fixing this would cap the explosion.
Change: Pass max-depth explicitly at lines 224, 231, 238, 258, 268, 278, 285.
Result: No measurable effect. binary-search cl remained at ~100 s with 21,603 results.
Root cause analysis: The external call (type:interpret-result-list target-node) already uses the default PRIVATE-MAX-DEPTH = 10. Internal calls falling back to 10 is therefore a no-op in this scenario. The real problem is that depth 10 itself is too deep for a highly branching recursive function.
Lesson: Passing max-depth is a correct code hygiene fix, but it does not solve the binary-search timeout on its own.
type:interpret (Failure → False Start → Safe but Ineffective)Hypothesis: The same sub-expressions are re-interpreted hundreds of times across cartesian-product branches. Caching (expression, env) → result-list should eliminate redundant work.
First implementation (at type:interpret level): On cache hit, set env.result-list directly and skip private:interpret-core.
Result: Catastrophic. Result count ballooned from 21,603 → 32,403, and runtime increased from 100 s → 181 s.
Root cause analysis: env is a mutable record shared across the entire call chain. When an internal cache hit mutated env.result-list, outer fold-left append calls in private:interpret-core collected stale or duplicated values, corrupting the aggregation.
Second implementation (at type:interpret-result-list level): Cache only the returned value, do not mutate env. Only index-node? expressions are cached (using make-eq-hashtable for O(1) identity lookup).
Result: Semantically safe — result count stayed at 21,603. But speedup was negligible (~1%).
Root cause analysis: binary-search is deeply recursive. After the first encounter of an index-node, it is immediately appended to memory. Subsequent recursive references to the same node find it in memory and are truncated, so the cache condition (not (contain? memory expression)) fails. Cache hits only occur for top-level expressions referenced from multiple distinct parents before they enter memory — a rare scenario for recursive functions.
Lesson: Memoization is safe when implemented as a value cache outside the mutable env, but for recursive functions the memory-based truncation mechanism defeats it.
private:interpret-core Extraction (Unexpected Side Effect)Change: Refactor type:interpret by extracting its body into private:interpret-core and simplifying the case-lambda fallback chain so that 1-arity and 2-arity calls reach the 4-arity body directly instead of hopping through intermediate arities.
Result: No semantic change, but type:interpret binary-search cl dropped from ~100 s → 82.8 s and the full benchmark suite completed for the first time (previously it timed out at 600 s because earlier cases consumed too much time).
Lesson: The improvement is likely attributable to reduced call-stack overhead and/or better compiler optimisation from smaller functions, not a fundamental algorithmic win.
Based on the above attempts, the following interventions are known to have real, measured impact:
| Intervention | Status | Verified Impact |
|---|---|---|
Hash-set dedupe |
✅ Merged | −37% on binary-search cl |
Simplify case-lambda fallbacks |
✅ Merged (as part of memoization revert) | ~−15–20% on binary-search cl |
Thread max-depth |
❌ Reverted | None (default depth = 10 anyway) |
Memoization (type:interpret) |
❌ Reverted | Negative (pollutes shared env) |
Memoization (type:interpret-result-list) |
❌ Reverted | ~0% (blocked by memory truncation) |
The remaining bottleneck is the depth-10 expansion of highly branching recursive functions. The most promising fixes not yet attempted are:
PRIVATE-MAX-DEPTH (e.g. from 10 → 3–4) — trivial one-line change.env.result-list (e.g. 1,000) and truncate to something?.type:partially-solved? — the cartesian-product pruner calls this predicate for every element on every tier; memoizing it would avoid repeated tree traversals.memory with a per-top-level-call hash set so that cross-function recursion (binary-search ↔ private-collect) is detected regardless of call path.| File | Responsibility |
|---|---|
analysis/type/substitutions/generator.sls |
construct-substitutions-for, step dispatcher, rule routing. |
analysis/type/substitutions/util.sls |
construct-lambdas-with, construct-parameter-index-nodes-products-with. |
analysis/type/substitutions/rnrs-meta-rules.sls |
Large alist of primitive type signatures. |
analysis/type/substitutions/rules/trivial.sls |
Literals, symbols, implicit conversion (Gradual Typing). |
analysis/type/substitutions/rules/application.sls |
Vanilla function application. |
analysis/type/substitutions/rules/lambda.sls |
Lambda abstraction types. |
analysis/type/substitutions/rules/let.sls |
let, let*, named let. |
analysis/type/substitutions/rules/letrec.sls |
letrec, letrec*. |
analysis/type/substitutions/rules/if.sls |
Conditional union types. |
analysis/type/substitutions/rules/define.sls |
define for variables and procedures. |
analysis/type/substitutions/rules/case-lambda.sls |
Overloaded arities. |
analysis/type/substitutions/rules/record.sls |
define-record-type types. |
analysis/type/substitutions/rules/cond.sls |
Multi-branch conditional. |
analysis/type/substitutions/rules/begin.sls |
Sequencing. |
analysis/type/substitutions/self-defined-rules/router.sls |
Extensible rule dispatcher. |
analysis/type/substitutions/self-defined-rules/ufo-match/match.sls |
ufo-match pattern support. |
analysis/type/domain-specific-language/interpreter.sls |
type:interpret, type:->?, macro expander, Cartesian-product pruner. |
analysis/type/domain-specific-language/inner-type-checker.sls |
Predicates: inner:trivial?, inner:lambda?, inner:macro?, inner:executable?, inner:?->pair. |
analysis/type/domain-specific-language/syntax-candy.sls |
candy:matchable?, candy:match-left, candy:match-right, segment-based DP matcher. |
analysis/identifier/meta.sls |
construct-type-expression-with-meta, loader for rnrs-meta-rules. |
car Macro & Synthetic-Repro Pitfall (2026-04)Retrospective note: The original write-up of this section contained an incorrect root-cause analysis. The symptoms described below were real, but they stemmed from a flawed synthetic repro rather than from bugs in
interpreter.sls. The section has been corrected to reflect what actually happened.
A hand-written test (test-car-macro-bug.sps) constructed a raw car macro application using bare symbols for type predicates:
(type:interpret-result-list
`((with ((a b c **1)) (with-equal? inner:list? a b))
(inner:list? fixnum? string?))
env)
The interpreter returned the macro expression unchanged instead of fixnum?.
The synthetic repro was unrealistic. In real substitution generation, every type predicate (fixnum?, string?, number?, etc.) is wrapped as an identifier-reference record by construct-type-expression-with-meta. inner:trivial? accepts identifier-reference objects, so real macro arguments pass the (? inner:trivial? inputs) **1 guard in macro-head-execute-with without issue.
In the synthetic repro, fixnum? and string? are bare symbols. They fail inner:trivial?, so ufo-match rejects the argument list before any macro logic runs. The macro never gets a chance to call candy:matchable?.
syntax-candy.sls Hardening Actually FixedThe syntax-candy.sls changes made during this investigation did not fix a car macro bug — the macro already worked in real code — but they did eliminate several latent risks in the matcher:
| Change | Risk eliminated |
|---|---|
private-segment now rejects templates with more than one repeat marker (... / **1) |
Prevents ambiguous DP-matcher states that could silently produce wrong bindings. |
candy:match 2-arity now returns grouped symbol results directly |
Removes the fragile fold-left grouping logic from candy:match-left; repeated segments can no longer be split into multiple bindings by accident. |
private:group-match-pairs compares segment-type instead of equal? on record objects |
Removes an implicit dependency on Chez Scheme’s default record equality, which could break if the segment record layout ever changes. |
candy:match-right expands grouped results back to flat (rest-type . ready-type) pairs |
Keeps backward compatibility with private-with in interpreter.sls. |
| Test | Status | Notes |
|---|---|---|
test-inner-type-checker.sps — direct car/cdr macro assertions |
✅ Pass | car returns first element type; cdr returns tail type. |
test-lambda.sps — parameter-index-node type access |
✅ Pass | Implicitly exercises the same with/with-equal? path. |
test-syntax-candy.sps |
✅ Pass | No regressions after matcher hardening. |
When debugging macro expansion, always feed the interpreter real type expressions (via construct-type-expression-with-meta or through the full substitution generator). Hand-constructing bare-symbol repros is fast, but it bypasses the identifier-reference wrapping that the rest of the pipeline assumes, producing false negatives that look like macro bugs but are actually data-format mismatches.