Kitten

Introduction

This is a quick introduction to Kitten. It glosses over some nuances of the language in order to provide a fast, high-level overview so you can quickly get started with using Kitten. I’m working on a more complete treatment of how to use Kitten in the form of a tutorial.

Comments

Comments are C-style. Multi-line comments are allowed to nest.

// Single-line comment.

/*
  Multi-line
  comment.
*/

/*
  Nested

  /* multi-line */

  comment.
*/

Literals

Integers consist of decimal digits, or a base prefix and digits in the corresponding base. They have type Int32 by default; this can be overridden with a suffix.

6       // Decimal
0b1010  // Binary
0o755   // Octal
0x10    // Hexadecimal
5i64    // Decimal Int64
0xFFu8  // Hexadecimal UInt8

Floating-point numbers consist of decimal digits, a decimal point (.), a fractional part, and an optional exponent. They have type Float64 by default, and this can also be overridden with a suffix.

6.28    // Regular notation
.5      // Leading digit omitted
1.0e+9  // Scientific notation
2.0f32  // Float32

Text literals use double-quotes (""), and are syntactic sugar for lists of characters. They have type List<Char>. Common character escapes are allowed using a backslash \. Character literals use single quotes ('').

// []
""

// ['M', 'e', 'o', 'w', '!', '\n']
"Meow!\n"

Ordinary text literals can’t contain newlines; a multi-line text literal is called a paragraph and consists of text wrapped in three double-quote marks. Paragraphs may be indented, and the indentation is stripped from each line.


"""
Here’s a paragraph.
It contains multiple lines.
(Haiku optional.)
"""

  """
  1 2 3 4 5
  6 7 8 9 10

  11 12
  """

// Equivalent to:

"Here’s a paragraph.\nIt contains multiple lines.\n(Haiku optional.)"

"1 2 3 4 5\n6 7 8 9 10\n\n11 12"

Lists

Lists are surrounded by square brackets ([]) with comma-separated elements of the same type. They have type List<T> where T is the type of the elements. A trailing comma is allowed after the last element.

// <T> List<T>
[]

// List<Int32>
[1, 2, 3]

// List<List<Char>>
[
  "four",
  "five",
  "six",
]

Note that the type of [] is <T> List<T>: an empty list may be used where a list of any type is expected.

Expressions and Stack-based Evaluation

Kitten is an expression language; there are no statements, only expressions that compute a result. Expressions may use infix operators with different precedences, as in many other languages. Kitten operators are not built into the language—they are ordinary functions with symbolic names.

// (2 * 2) + (3 * 3)
2 * 2 + 3 * 3

The biggest syntactic difference between Kitten and C-like languages is that function calls are postfix: the function name is written after the arguments. Instead of f(x, y), we write x y f. You can think of this like a generalization of the method-call syntax x.f() in many object-oriented languages, and it easily permits chaining multiple functions together in a “pipeline”.

"Meow!\n" print

// Prints "3-" followed by a newline.
-3 abs neg show reverse say

You can call an operator in postfix form by wrapping it in parentheses. You can also partially apply one operand: (/ 3) (“divide by three”) is shorthand for -> x; x / 3; likewise, (1 -) (“one minus”) is short for -> x; 1 - x.

2 2 (*) 3 3 (*) (+)

// (x * 2)
x (* 2)

// (2 - x)
x (2 -)

Here we start to see Kitten’s stack-based evaluation model. When computing 2 2 (*), we push 2 to the data stack, push another 2, then call * to multiply them, leaving the result on the stack. So one way to read postfix code is as a sequence of imperative commands operating on the stack.

2    // push 2   => 2
2    // push 2   => 2 2
(*)  // multiply => 4
3    // push 3   => 4 3
3    // push 3   => 4 3 3
(*)  // multiply => 4 9
(+)  // add      => 13

However, this is the low-level way to think about Kitten programs. At a high level, postfix syntax can be thought of as representing a data flow graph.

2233
2233
**
49
+
13

Locals

Local variables can be introduced with the -> name; syntax (or Unicode → name;). This takes a value from the stack and stores it in a local variable for the remainder of scope. Mentioning a local has the effect of pushing its value to the stack.

// 25
5 -> x; x * x

We can also introduce multiple variables at once; they are taken from the stack in the order they were pushed.

1 2 3 -> x, y, z;

x say  // 1
y say  // 2
z say  // 3

Local variables are always immutable. Pushing a local to the stack once simply moves its value (as if by memcpy); pushing the variable multiple times copies its value, as long as the type is copyable.

Conditionals and Blocks

Conditional execution uses the ifelifelse expression. Blocks can be surrounded by curly braces {} like in C, or prefixed with a colon : and delimited by indentation like in Python. The : form is called “layout” and desugars into the {} form.

if (food_dish is_empty) {
  ":<" say
  beg_for_food
} elif (mouse_nearby) {
  eat_mouse
} else {
  purr
}

if (food_dish is_empty):
  ":<" say
  beg_for_food
elif (mouse_nearby):
  eat_mouse
else:
  purr

Layout syntax is generally preferred for multi-line blocks.

You can omit the condition part of an if expression to take it from the stack. For example, here’s how the standard not word is defined.

// Start with a local variable…
-> x; if (x) { false } else { true }

// …take it from the stack…
-> x; x if { false } else { true }

// …and remove the redundancy.
if { false } else { true }

An if without an else is equivalent to an if with an empty else block. In other words, the else branch is the identity function.

if (have_food) { purr }
if (have_food) { purr } else {}

Algebraic Data Types

Kitten has algebraic data types (ADTs) defined with the type keyword. A type may have any number of constructors, introduced with the case keyword, and each constructor may have any number of fields, with the field types given in parentheses.

// Enumeration
// Multiple constructors with no fields
type Bool:
  case false
  case true

// Enumeration
// Explicitly indicating no fields
type Bool:
  case false ()
  case true ()

// Structure
// One constructor with multiple fields
type Pair<A, B>:
  case pair (A, B)

// A pair of type Pair<Float64, Char>
2.0 'x' pair

// Tagged Union
// Multiple constructors
// with different numbers of fields
type Optional<T>:
  case none
  case some (T)

// An empty Optional
// of type <T> Optional<T>
none

// A full Optional
// of type Optional<Int32>
5 some

A match expression takes an instance of an ADT, matches on its tag, and unpacks its fields (if any) onto the stack so they can be manipulated. You can think of match as the inverse of a constructor: whereas pair has the type A, B -> Pair<A, B>, the expression -> p; match (p) case pair {} has the type Pair<A, B> -> A, B—the inputs and outputs are swapped.

match (xs head)
case some:
  -> x;
  "Non-empty list; first value: " print
  x say
case none:
  "Empty list." say

The value being pattern-matched (x in match (x)) is called the scrutinee. Similarly to if, you can omit this from a match expression, in which case the value is taken from the stack. That is, match (x) case /* ... */ is equivalent to x match case /* ... */.

This code constructs a pair, deconstructs it, swaps its elements, and makes a new swapped pair.

// (1, 2)
1 2 pair

// (2, 1)
match case pair { swap pair }

A match expression may include an else branch, which is executed if none of the cases matched. If a match expression does not include a case for every constructor of a data type, and has no else branch, it’s called non-exhaustive. In that case, the compiler produces an implicit else branch, which calls fail to abort the program with an error message. This also causes the match expression to require the +Fail permission, which will be explained later on.

Function Definitions

You can define new functions with the define keyword. Every definition has a name, a type signature, and a body. Type signatures are written in parentheses, with the input and output types to the left and right, respectively, of an ASCII rightwards arrow -> or its Unicode equivalent (U+2192).

This definition squares a number by copying it and multiplying it with itself.

define square (Int32 -> Int32):
  dup (*)

A user-defined function, just like any other function, is called when we mention its name.

// 16
4 square

To push a function to the stack instead of calling it, wrap it in a block. This denotes an anonymous function, called a quotation. Quotations are often used with higher-order functions which take other functions as arguments, such as map, filter, fold_left, and zip_with.

// [1, 4, 9]
[1, 2, 3] { square } map

// [10, 20, 30]
[1, 2, 3] { (* 10) } map

The backslash syntax \f is a slightly more succinct equivalent, with the advantage that an operator can be referred to as simply \+ instead of { (+) }. Mnemonically, this is “escaping” the function to prevent it from being evaluated. This also works with partially applied operators.

// [1, 4, 9]
[1, 2, 3] \square map

// [2, 3, 4]
[1, 2, 3] \(+ 1) map

// [4, 10, 18]
[1, 2, 3] [4, 5, 6] \* zip_with

Lambdas

Whereas Kitten decouples the concepts of local variable bindings (->;…) and anonymous functions ({}) by default, in practice we often want to couple these things together. For example, in a match expression, we often want to name the values we extracted, if present. We could use a local variable binding inside a block:

match (xs head)
case some:
  -> x;
  x say
else:
  "nothing" say

But we can also use the combined “lambda” form, ->{}, which is compatible with the usual layout syntax for blocks:

match (xs head)
case some -> x {
  x say
} else {
  "nothing" say
}

match (xs head)
case some -> x:
  x say
else:
  "nothing" say

-> x y z { x + y + z } is exactly equivalent to { -> x y z; x + y + z }. Note that this also works on definitions.

define square
(Int32 -> Int32)
-> x:
  x * x

Multiple Inputs and Outputs

As in most languages, functions can take multiple inputs. In the type signature, their types are separated by commas.

define multiply3
(Float64, Float64, Float64 -> Float64):
  -> x, y, z;
  x * y * z

Notice what happens when we write this function in postfix:

define multiply3
(Float64, Float64, Float64 -> Float64):
  -> x, y, z;
  x y z (*) (*)

We’re pulling x, y, and z off the stack and pushing them right back! Of course, the compiler can trivially optimize this away, but it’s redundant. This is a good place to apply Kitten’s common practice of chaining functions without using local variables. We can just pass arguments and return values implicitly on the stack.

define multiply3
(Float64, Float64, Float64 -> Float64):
  (*) (*)

Unlike most languages, Kitten also allows functions to have multiple results, which is very useful for writing in a dataflow style.

define increment_both
(Int32, Int32 -> Int32, Int32):
  -> x, y;
  (x + 1) (y + 1)

define increment_both
(Int32, Int32 -> Int32, Int32):
  \(+ 1) to_both

to_both is a handy combinator: it takes one function and two values, and applies the function to both of the values.

Polymorphism

Type signatures often include type variables, which enable polymorphism. For example, here is a simple recursive function to reverse a list:

define reverse<T>
(List<T> -> List<T>):
  -> xs;
  match (xs head_tail)
  case some:
    unpair -> h, t;
    t reverse h append
  case none:
    []

The signature reads:

// For any type 'T'…
<T>
(

  // …given a list of values of that type…
  List<T>

  // …this function produces…
  ->

  // …another list of values of that type.
  List<T>

)

Functions may be generic in any number of types. For example, the type of the map function can be written as <A, B> (List<A>, (A -> B) -> List<B>). It takes a list of A values, and a function from A to B, and returns a list of B values. A and B may refer to the same type or different types.

// Same type (Int32 -> Int32)
// [1, 2, 3, 4]
[3, 6, 9, 12] { (/ 3) } map

// Different types (Int32 -> List<Char>)
// ["3", "6", "9", "12"]
[3, 6, 9, 12] \show map

Permissions

We’ve seen functions such as say that have side effects. Kitten’s type system includes a concept of permissions, which track the side effects that a function is allowed to perform. say, print, newline, and other such output functions require the +IO permission because they perform I/O, and this is reflected in their types.

// Takes no inputs,
// produces no outputs,
// and has permission to do I/O.
define newline (-> +IO):
  "\n" print

Some of the standard permissions in Kitten are:

+IO
Permission to do I/O, or more generally, break referential transparency. A function requiring +IO is allowed to produce side effects and return different values each time it’s called on the same inputs, e.g., "Enter a number: " ask "Enter a number: " ask returns whatever the user entered each time, despite being called with the same argument.
+Fail
Permission to abort the program with an assertion failure. This is required by the fail and abort words. It also appears in the implicit else branch of match expressions that don’t include a case for all constructors and don’t have an explicit else.
+Unsafe
Permission to break memory safety. This can be used for low-level programming in Kitten, such as accessing the memory representations of Kitten values or working with raw pointers.