
NURL (Neural Unified Representation Language) v0.1.0 - designing a small LLVM-backed language with LLM token economics as the primary constraint
For the last few months I've been building NURL — a small,
self-hosted, LLVM-backed language whose syntax is shaped by a single
hypothesis: existing languages were optimised for human ergonomics,
and that's a poor fit for code generated token-by-token by an LLM.
Keywords, punctuation, and indentation exist for human eyes; an LLM
pays for every redundant token in both context and inference cost.
v0.1.0 just went public, source MIT/Apache-2.0:
https://github.com/nurl-lang/nurl. Site + browser playground:
https://nurl-lang.org.
I'd love feedback from this sub, especially on the design trade-offs
below. Genuine criticism welcome — I'm not married to the choices.
The design constraints
I tried to make these explicit and falsifiable rather than vibes:
- **Token efficiency.** Every syntactic construct minimises tokens without information loss. Single characters can carry full semantic meaning (`@` = function, `^` = return, `~` = loop, `-` = mutability prefix, `??` = pattern match, …).
- **Regular grammar.** No exceptions, no "this works here but not there". LL(1) parser with ≤4-token lookahead; the full EBNF fits on a page (`spec/grammar.ebnf`, currently v1.7).
- **Local semantics.** A token's meaning is derivable from ≤8 tokens of preceding context. No long-range dependencies that break mid-generation.
- **Deterministic compiler.** Same source → byte-identical IR. The self-hosted compiler must reproduce its own IR on the bootstrap's second pass or the build is rejected.
- **LLVM all the way down.** Codegen is delegated to clang; native Linux/Windows/macOS and `wasm32-wasi` all work. The compiler itself also builds to wasm.
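The "deterministic compiler" constraint boils down to a byte comparison between the first and second bootstrap passes. A minimal sketch of the idea in Python — the function name and placeholder IR are mine for illustration, not the actual `build.sh` logic:

```python
# Sketch of the byte-identical bootstrap gate. The real check lives in
# build.sh; the function name and the placeholder IR are illustrative.

def bootstrap_gate(stage1_ir: bytes, stage2_ir: bytes) -> None:
    """Reject the build unless the self-compiled compiler reproduces
    its own IR byte-for-byte on the second pass."""
    if stage1_ir != stage2_ir:
        raise SystemExit("non-deterministic IR: bootstrap gate failed")

# Two passes that agree (placeholder IR) pass the gate silently.
ir = b"define i64 @main() {\n  ret i64 0\n}\n"
bootstrap_gate(ir, ir)
print("bootstrap reproducible")
```

A byte-level `!=` is deliberately blunt: it catches nondeterminism sources like unordered map iteration leaking into codegen order, which a semantic diff would happily forgive.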
What it looks like
Everything is prefix notation, one shape: `OP ARG1 ARG2 …`.

```
@ add i a i b → i { ^ + a b }   // i = i64
( add 3 4 )                     // → 7
```
Algebraic data types and pattern matching:

```
: | Expr {
  Num i
  Add *Expr *Expr
  Mul *Expr *Expr
}

@ eval *Expr e → i {
  ^ ?? . e 0 {
    Num n → n
    Add l r → + ( eval l ) ( eval r )
    Mul l r → * ( eval l ) ( eval r )
  }
}
```
Closures carry a function-type literal (`@ ret_ty arg_tys`):

```
: (@ i i) square \ i x → i { * x x }
( square 7 )   // → 49
```
Strings live between backticks (`` `hello` ``) so single/double quotes can stay free for other syntax. The grammar deliberately reuses every character it can — there's no `for`/`while`/`if`/`fn` keyword in the language.
Token economy — a quick check
Hand-counted on a "sum 1..N" toy:
| Language | Tokens | Runtime | Targets |
|---|---|---|---|
| Python | ~46 | interp. | host |
| C | ~30 | native | many (per port) |
| NURL | ~13 | native | any LLVM target |
This isn't rigorous — it's just a sanity check that the design is pulling in the right direction. The real metric would be something
like expected tokens for an LLM to produce a correct program across
a corpus, which I haven't measured yet.
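For the curious, "hand-counted" means something like the crude splitter below (my own sketch — a model's BPE tokenizer merges and splits differently, which is one reason the table's numbers are approximate and won't match this exactly):

```python
import re

def rough_tokens(src: str) -> int:
    """Crude proxy for token count: identifiers, numbers, and individual
    punctuation characters. A real BPE vocabulary behaves differently,
    so treat this as a sanity check, not a measurement."""
    return len(re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", src))

# Hypothetical "sum 1..N" programs for two of the table's rows.
python_sum = (
    "def sum_to(n):\n"
    "    total = 0\n"
    "    for i in range(1, n + 1):\n"
    "        total += i\n"
    "    return total"
)
c_sum = "long sum_to(long n) { long t = 0; for (long i = 1; i <= n; i++) t += i; return t; }"

print(rough_tokens(python_sum), rough_tokens(c_sum))
```

The real comparison would run an actual model tokenizer (e.g. tiktoken) over equivalent programs in each language rather than a regex split.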
Toolchain bits
- Python bootstrap → self-hosted `nurlc.nu` → re-compiles to byte-identical IR (hard gate in `build.sh`).
- Stdlib: option/result/errors, string (`Vec[u8]`-backed, NUL-tolerant), int/float/time, lazy iter chains, cmp + sort, `HashMap[K V]`, `Vec[A]`, JSON, HTTP (libcurl + SSE streaming), CSV reader/writer (RFC 4180), POSIX/Win32 process spawning, SHA-256 HMAC + base64.
- Memory: default-immutable bindings, compiler-inserted auto-drop for owned strings, slices, and selected struct fields. No GC, no borrow checker — the auto-drop pass is conservative, and the type system tracks ownership transfer through return values.
- Hosted MCP server (`/mcp`) exposes the entire compiler to MCP clients (Claude Desktop, Cursor, Windsurf, Zed) — they can browse the stdlib, fetch examples, and build native/wasm binaries on the user's behalf.
Honest rough edges
- No fixed-width int types yet (`i8`, `u32`, `f64`, …) — the lexer splits `i8` into `i` + `8`. Workaround: cast with `#`. This is the most-asked-for feature.
- No borrow checker. Auto-drop covers common ownership patterns, but nested owned struct fields and arm-local bindings that fall through without `^` can leak.
- Generic instantiation is text-level — type parameters don't propagate through generic functions the way Rust/Haskell readers expect. Documented gotcha.
- Single-letter `[T]` parameter collides with the boolean literal `T`. Use `[E]` or `[A]` until I find a less hacky fix.
Where I'd love your input
- Is "tokens per program" the wrong metric? My gut says grammar regularity (no exceptions, predictable next-token distribution) is doing more of the work than raw token count when an LLM is generating code. Anyone seen actual measurements?
- Byte-identical bootstrap as a hard gate — too strict, or exactly the right paranoia level for a young self-hosted compiler?
- Pattern matching without a coverage checker yet. I'm leaning toward implementing exhaustiveness at the IR-gen layer rather than the type-check layer (lets me share code with switch lowering). Sane?
- Auto-drop vs. explicit ownership annotations — the conservative auto-drop pass is fine for "the 80% case" but leaks in nested-struct + control-flow-fallthrough corners. Has anyone tried a similar approach and stayed sane?
- LLM-first language design generally — is this a real constraint worth optimising for, or is the right take "frontier models will learn whatever syntax you throw at them, so optimise for humans anyway"?
Try it
- Browser playground (compiles to `wasm32-wasi`, runs locally in the tab — no server-side execution): https://play.nurl-lang.org
- Grammar (EBNF v1.7): https://github.com/nurl-lang/nurl/blob/main/spec/grammar.ebnf
- Gotchas / current rough edges: https://github.com/nurl-lang/nurl/blob/main/docs/GOTCHAS.md
- Roadmap: https://github.com/nurl-lang/nurl/blob/main/ROADMAP.md
Thanks for reading — happy to dig into any of this in the comments.