Math-first HPC examples¶

Write numerical kernels as ordinary math on fixed-size array tiles. The compiler lowers them to loops, SIMD, and (when proved) OpenMP — you do not call __li_simd_* in application or Tier 1 benchmark code.

Surface (you write)	Status in `lic` today	Tier 1 bench
`dot(a, b)` / `a @ b` on `array[N, float]`	Implemented	`simd_dot`
`C = A @ B` on `array[M, array[K, float]]`	Implemented (fixed shapes)	`matmul_naive`
Blocked IKJ `C[i][j] += …` on tiles	Implemented (manual loops)	`matmul_blocked`
`sum(a)` on `array[N, int\\|float]`	Implemented	—
Element-wise `a * b` on arrays	Implemented (2i-a)	—
`@vectorized(lanes=4)` / `@no_vectorize` on `def`	Policy + array SIMD gate (7d-b)	—
`@vectorized` on `for`	Parse only	—
`@parallel` on `def` / loops	Policy + OpenMP when proved	—

Dot product (1d)¶

dot(x, y) and x @ y are the same when x and y are both array[N, float] with the same N.

def dot3(x: array[4, float], y: array[4, float]) -> float
  requires true
  ensures result >= 0.0
  decreases 0
=
  return x @ y

def main() -> int
  requires true
  ensures result == 0
  decreases 0
=
  var x: array[4, float]
  var y: array[4, float]
  var i: int = 0
  while i < 4
    x[i] = 1.0
    y[i] = 2.0
    i = i + 1
  var s: float = dot3(x, y)
  return 0

Shape error (compile fail): mismatched lengths — li-tests/math_linalg/array_dot_mismatch.li.

Large workloads: Tier 1 simd_dot uses repeated array[40000, float] tiles (see benchmark main.li); no intrinsics in the user file.

AXPY (scalar loop form)¶

Until element-wise alpha * x + y on whole arrays lands, express AXPY with an index loop (still math — no simd(...)):

def axpy(alpha: float, x: array[8, float], y: array[8, float]) -> unit
  requires true
  ensures true
  decreases 0
=
  var i: int = 0
  while i < 8
    y[i] = alpha * x[i] + y[i]
    i = i + 1

def main() -> int
  requires true
  ensures result == 0
  decreases 0
=
  var x: array[8, float]
  var y: array[8, float]
  var i: int = 0
  while i < 8
    x[i] = 1.0
    y[i] = 0.0
    i = i + 1
  axpy(2.0, x, y)
  return 0

Target form (Phase 2i/7e): same semantics with @vectorized(lanes=8) on the loop; decorators elaborate when 7d lowering is complete.

Matrix multiply (2d `@`)¶

Fixed 2d tiles use nested arrays: array[M, array[K, float]] @ array[K, array[N, float]] → array[M, array[N, float]].

def main() -> int
  requires true
  ensures result == 0
  decreases 0
=
  var A: array[2, array[3, float]]
  var B: array[3, array[2, float]]
  var C: array[2, array[2, float]]
  A[0][0] = 1.0
  B[0][0] = 1.0
  C = A @ B
  return 0

Inner dimension mismatch fails at compile time — li-tests/math_linalg/matmul_dim_mismatch.li.

Benchmark note: matmul_naive and matmul_blocked use many 64×64 tiles so stack size stays bounded; blocked variant uses BK=16 IKJ micro-blocks (matmul_blocked_tile_f64, 512 reps ≈ 512³ flops).

Reduction: `sum`¶

def main() -> int
  requires true
  ensures result == 0
  decreases 0
=
  var a: array[4, int]
  var i: int = 0
  while i < 4
    a[i] = i + 1
    i = i + 1
  var s: int = sum(a)
  return 0

MD-style force loop (scalar math today)¶

Molecular dynamics kernels are still written as scalar loops over particles; Tier 2 md_lennard_jones is pure Li without manual SIMD intrinsics in the driver. Parallel safety uses parallel for + disjoint_elem (see Fast math & parallelism).

# Sketch — full source: benchmarks/tier2_physics/md_lennard_jones/li/main.li
parallel for i in 0..<N
  requires disjoint_elem(i, forces)
  decreases N - i
=
  forces[i] = compute_pair_force(i)

Math source vs compiler expansion (audit)¶

You ship (benchmarks, handbooks)	Compiler may emit (MIR / LLVM appendix)
`return a @ b`	`ArrayDotF64` loop (SIMD autovec at `-O3`)
`C = A @ B` (2d)	`ArrayMatMul2DF64` triple loop
`sum(a)`	`ArraySumF64` / `ArraySumI64`

Do not paste __li_simd_splat_f64 / __li_simd_mul_f64 into new examples — use Compiler intrinsics appendix only when debugging codegen.

Run and test¶

./scripts/build.sh
./build/compiler/lic/lic build my_kernel.li -o my_kernel
./li-tests/run_all.sh   # math_linalg suite
python3 benchmarks/harness/bench.py --tier 1   # after build

Regression tests: li-tests/math_linalg/.