Refactor prompt
Read @llms.txt which contains the snapshot of the entire codebase.
Analyze the entire #codebase
Update REFACTOR.md so that it becomes a very detailed plan of refactoring the code, under the following principles:
- Production-grade Quality: Aim for clean, idiomatic, boring Rust. No clever macros where straightforward code is clearer.
- Parity With Reference Implementation: Behaviour must remain 100 % compatible with the original JavaScript reference implementation's test-suite unless a conscious deviation is documented.
- Incremental, Review-friendly Commits: Small, atomic commits that each compile and keep the test-suite green.
- Minimal Public-API Breakage: The current crate is already used in downstream code and WASM builds; any unavoidable breaking change must be sign-posted in the CHANGELOG and guarded by a semver bump.
- Performance Awareness: Never regress the existing Criterion benchmarks by more than 3 % unless the change gives a functional or maintainability win that clearly outweighs the cost.
- Great DX: Improve docs, examples and error messages as you touch code; run `./build.sh` locally before pushing.
- Security & Safety First: Eliminate `unsafe` (currently none); check for `TODO: unwrap`/`expect` and replace them with fallible code paths.
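As an illustration of the last principle, here is a minimal sketch of turning a panicking `unwrap` into a fallible code path. The function `parse_depth` is hypothetical and does not come from the crate; it only shows the shape of the change.

```rust
use std::num::ParseIntError;

// Before (panics on malformed input):
//     fn parse_depth(raw: &str) -> usize { raw.trim().parse().unwrap() }

/// After: the error is returned, so the caller decides how to react.
/// `parse_depth` is a hypothetical helper used only for illustration.
pub fn parse_depth(raw: &str) -> Result<usize, ParseIntError> {
    raw.trim().parse::<usize>()
}
```

Callers can then propagate the failure with `?` instead of crashing the process.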
The refactor will be delivered as a series of pull-requests structured around themes so that reviewers can digest them easily.
Below is a detailed, step-by-step playbook you, the engineer, should follow. Feel free to adjust the ordering if downstream work uncovers hidden coupling, but always keep commits small and the repo green.
1. On-boarding (½ day)
- Clone the repo, run `./build.sh`, and open `./build.log.txt` to make sure you start from a clean, reproducible state.
- Scan `docs/internal/CLAUDE.md`, `IMPLEMENTATION_SUMMARY.md` and `PLAN.md` to understand design intent.
- Run the benchmarks (`cargo bench --bench parsing`) and note baseline numbers in a personal scratchpad.
- Create a new branch `refactor/phase-1-module-layout` for the first PR.
2. Restructure the Module Tree (1 day)
Goal: make the crate's public surface and internal structure obvious at a glance.
1.1 Move binaries into `src/bin/`
Currently we have `main.rs` and `bin/harness.rs`; place both under `src/bin/` and use descriptive names (`cli.rs`, `harness.rs`). Adjust the Cargo manifest `[[bin]]` sections accordingly.
1.2 Introduce `src/ast/`
Create a dedicated module for the concrete syntax tree (tokens) and abstract syntax tree (`Value`) to localise parsing artefacts. File split suggestion:
- `src/ast/mod.rs`: re-exports
- `src/ast/token.rs`: existing `Token` enum + helper impls
- `src/ast/value.rs`: existing `Value`, `Number`, conversions, feature-gated `serde`
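The proposed split could look roughly like the sketch below. Inline modules stand in for the three files so the example fits in one block, and the enum variants are placeholders, not the crate's real definitions.

```rust
// Inline modules stand in for src/ast/mod.rs, token.rs and value.rs.
pub mod ast {
    pub mod token {
        /// Placeholder variants; the existing `Token` enum would move here.
        #[derive(Debug, Clone, PartialEq)]
        pub enum Token {
            LeftBrace,
            RightBrace,
            Text(String),
        }
    }

    pub mod value {
        /// Placeholder shape; the existing `Number` type would move here.
        #[derive(Debug, Clone, PartialEq)]
        pub enum Number {
            Integer(i64),
            Float(f64),
        }

        /// Placeholder shape; the existing `Value` type would move here.
        #[derive(Debug, Clone, PartialEq)]
        pub enum Value {
            Null,
            Bool(bool),
            Number(Number),
            String(String),
            Array(Vec<Value>),
        }

        // Conversions live next to the type they produce.
        impl From<i64> for Value {
            fn from(n: i64) -> Self {
                Value::Number(Number::Integer(n))
            }
        }
    }

    // What `src/ast/mod.rs` would contain: re-exports only.
    pub use self::token::Token;
    pub use self::value::{Number, Value};
}
```

With the re-exports in place, callers write `ast::Value` and never need to know which file a type lives in.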
1.3 Isolate Error Handling
Move `error.rs` into `src/error/mod.rs`; create sub-modules:
- `kind.rs`: the `Error` enum
- `position.rs`: a lightweight `Span { start: usize, end: usize }`
1.4 Public API Barrel File
`lib.rs` should become a concise index that re-exports public types; the heavy doc-comment with README inclusion can move to `docs/api.md`.
Deliverables: new folder structure, imports updated, tests & benchmarks still pass.
3. Simplify the Lexer (2-3 days)
The current lexer contains duplicated state machines and ad-hoc look-ahead logic. Steps:
2.1 Extract Config: Config flags like `allow_single_quotes` belong in `ParserOptions` only; remove the duplicates from the lexer. The lexer should tokenise regardless of permissiveness; the parser decides whether a token is legal in context.
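The division of labour in step 2.1 can be sketched as follows. Only `allow_single_quotes` and `ParserOptions` are names from the plan; the token variants and both functions are hypothetical, and escape sequences are ignored to keep the example short.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    DoubleQuoted(String),
    SingleQuoted(String),
}

#[derive(Debug, Clone, Copy, Default)]
pub struct ParserOptions {
    pub allow_single_quotes: bool,
}

/// Option-free lexing: both quote styles are recognised unconditionally.
/// Returns `None` if the input is not a complete quoted string.
pub fn lex_string(input: &str) -> Option<Token> {
    let mut chars = input.chars();
    let quote = chars.next()?;
    // The remainder must end with the same quote character.
    let body = chars.as_str().strip_suffix(quote)?;
    match quote {
        '"' => Some(Token::DoubleQuoted(body.to_string())),
        '\'' => Some(Token::SingleQuoted(body.to_string())),
        _ => None,
    }
}

/// The parser, not the lexer, consults the options.
pub fn accept_string(token: Token, options: &ParserOptions) -> Result<String, String> {
    match token {
        Token::DoubleQuoted(s) => Ok(s),
        Token::SingleQuoted(s) if options.allow_single_quotes => Ok(s),
        Token::SingleQuoted(_) => Err("single-quoted strings are disabled".to_string()),
    }
}
```

Because the lexer has no configuration, it can be tested and benchmarked in isolation, and every permissiveness rule lives in one place.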
2.2 Use `logos`: Evaluate replacing the handwritten lexer with the `logos` crate (MIT licensed, no runtime deps). Benchmark; accept if equal or faster and the code is clearer.
2.3 Remove `lexer2.rs`: It's an experiment that has diverged; either promote it (if chosen) or delete it.
2.4 Canonical Token Stream: Ensure every character of input maps to exactly one token stream position; add invariant tests (property tests with `quickcheck`) asserting that `tokens.iter().map(|t| t.len()).sum::<usize>() == input.len()`, apart from whitespace.
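The invariant from step 2.4 can be expressed as a plain predicate. The sketch below uses a toy whitespace-splitting lexer as a stand-in for the real one and only the standard library; in the actual test the predicate would be fed random inputs by `quickcheck`.

```rust
/// Toy token: a byte offset and a byte length.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub start: usize,
    pub len: usize,
}

/// Toy lexer (stand-in for the real one): every maximal run of
/// non-whitespace characters becomes one token.
pub fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, c) in input.char_indices() {
        match (c.is_whitespace(), run_start) {
            (false, None) => run_start = Some(i),
            (true, Some(s)) => {
                tokens.push(Token { start: s, len: i - s });
                run_start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = run_start {
        tokens.push(Token { start: s, len: input.len() - s });
    }
    tokens
}

/// The invariant: token bytes plus skipped whitespace bytes
/// account for the whole input.
pub fn covers_input(input: &str) -> bool {
    let token_bytes: usize = lex(input).iter().map(|t| t.len).sum();
    let whitespace_bytes: usize = input
        .chars()
        .filter(|c| c.is_whitespace())
        .map(char::len_utf8)
        .sum();
    token_bytes + whitespace_bytes == input.len()
}
```

Lengths are counted in bytes rather than characters so the check stays valid for multi-byte UTF-8 input.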
4. Parser Clean-up (3 days)
3.1 Introduce a `ParserState` struct that groups stateful data (`depth`, `lexer_offset`, etc.) instead of many boolean fields.
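One possible shape for `ParserState`, as a sketch. The field names `depth` and `lexer_offset` come from the plan; the methods and the `String` error type are assumptions made to keep the example self-contained.

```rust
/// Mutable parsing state gathered in one place.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ParserState {
    pub depth: usize,
    pub lexer_offset: usize,
}

impl ParserState {
    /// Enter a nested array or object, refusing to exceed `max_depth`.
    pub fn enter(&mut self, max_depth: usize) -> Result<(), String> {
        if self.depth >= max_depth {
            return Err(format!("maximum nesting depth {} exceeded", max_depth));
        }
        self.depth += 1;
        Ok(())
    }

    /// Leave a nested container; never underflows.
    pub fn leave(&mut self) {
        self.depth = self.depth.saturating_sub(1);
    }

    /// Record that `bytes` bytes of input were consumed.
    pub fn advance(&mut self, bytes: usize) {
        self.lexer_offset += bytes;
    }
}
```

Grouping the fields lets parser functions take `&mut ParserState` instead of a growing list of parameters.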
3.2 Recursion removal: Replace deep recursion on arrays/objects with an explicit stack to honour `max_depth` without risking stack overflow.
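To illustrate the explicit-stack idea, the sketch below checks bracket nesting iteratively. It is not the crate's parser: it ignores string contents and builds no values, and `check_nesting` and `NestError` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum NestError {
    TooDeep { offset: usize },
    Mismatched { offset: usize },
    Unclosed,
}

/// Walks the input with a heap-allocated stack instead of recursion.
/// Returns the deepest nesting level reached.
pub fn check_nesting(input: &str, max_depth: usize) -> Result<usize, NestError> {
    // Each entry is the closing bracket we expect for an open container.
    let mut stack: Vec<char> = Vec::new();
    let mut deepest: usize = 0;
    for (offset, c) in input.char_indices() {
        match c {
            '[' | '{' => {
                if stack.len() == max_depth {
                    return Err(NestError::TooDeep { offset });
                }
                stack.push(if c == '[' { ']' } else { '}' });
                deepest = deepest.max(stack.len());
            }
            ']' | '}' => {
                if stack.pop() != Some(c) {
                    return Err(NestError::Mismatched { offset });
                }
            }
            _ => {}
        }
    }
    if stack.is_empty() {
        Ok(deepest)
    } else {
        Err(NestError::Unclosed)
    }
}
```

Depth is now bounded by `max_depth` and heap capacity rather than by the thread's call stack, so very deep input produces an error value instead of a crash.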
3.3 Improve Error Reporting: Switch from raw `usize` positions to the `Span` type; implement `fmt::Display` to highlight the offending slice with a caret.
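A caret-style rendering might look like the sketch below. The `Diagnostic` wrapper is hypothetical, and the layout assumes a single-line ASCII source so that byte offsets equal display columns; real input would need line splitting and column-width handling.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical wrapper pairing a message with the source it refers to.
pub struct Diagnostic<'a> {
    pub message: &'a str,
    pub source: &'a str,
    pub span: Span,
}

impl fmt::Display for Diagnostic<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always draw at least one caret, even for an empty span.
        let width = self.span.end.saturating_sub(self.span.start).max(1);
        writeln!(f, "error: {}", self.message)?;
        writeln!(f, "{}", self.source)?;
        write!(f, "{}{}", " ".repeat(self.span.start), "^".repeat(width))
    }
}
```

Formatting `Diagnostic { message: "unexpected token", source: "{a: ]}", span: Span { start: 4, end: 5 } }` yields three lines: the message, the source, and a caret under the `]`.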
3.4 Config Validation: Add `ParserOptions::validate()` that returns `Result<(), ConfigError>`; e.g. `newline_as_comma=false` + `implicit_top_level=true` is ambiguously specified, so decide on a policy and enforce it.
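A sketch of `validate()` follows. It assumes, purely for illustration, that the chosen policy is to reject the ambiguous combination; the plan leaves that decision open, and the `ConfigError` variant name is invented.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParserOptions {
    pub newline_as_comma: bool,
    pub implicit_top_level: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConfigError {
    /// Hypothetical variant for the combination named in the plan.
    ImplicitTopLevelNeedsNewlineAsComma,
}

impl ParserOptions {
    /// Assumed policy: reject `implicit_top_level` without `newline_as_comma`.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.implicit_top_level && !self.newline_as_comma {
            Err(ConfigError::ImplicitTopLevelNeedsNewlineAsComma)
        } else {
            Ok(())
        }
    }
}
```

Calling `validate()` once when the parser is constructed keeps the check out of the hot path.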
3.5 Property-based tests: Port the reference implementation's round-trip tests; generate random forgiving JSON, parse it, serialise it back to canonical JSON, and compare using `serde_json::Value`.
5. Error & Result Type Revamp (1 day)
- Adopt the `thiserror` crate to cut boilerplate.
- Provide an `Error::source()` chain so WASM callers can access the root cause.
- Export a `type ParseResult<T = Value> = core::result::Result<T, Error>` alias.
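The sketch below writes out by hand roughly what `thiserror` would generate, so it compiles with the standard library alone. The error variants and `parse_integer` are hypothetical, and `i64` stands in for `Value` as the alias default.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::num::ParseIntError;

#[derive(Debug)]
pub enum Error {
    UnexpectedEnd,
    InvalidNumber(ParseIntError),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::UnexpectedEnd => write!(f, "unexpected end of input"),
            Error::InvalidNumber(_) => write!(f, "invalid number"),
        }
    }
}

impl StdError for Error {
    // Exposes the underlying cause so callers can walk the chain.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::UnexpectedEnd => None,
            Error::InvalidNumber(inner) => Some(inner),
        }
    }
}

impl From<ParseIntError> for Error {
    fn from(inner: ParseIntError) -> Self {
        Error::InvalidNumber(inner)
    }
}

/// In the crate the default would be `Value`; `i64` keeps this sketch small.
pub type ParseResult<T = i64> = core::result::Result<T, Error>;

/// Hypothetical entry point showing the alias and `?` conversion together.
pub fn parse_integer(text: &str) -> ParseResult {
    if text.is_empty() {
        return Err(Error::UnexpectedEnd);
    }
    Ok(text.parse::<i64>()?)
}
```

With `thiserror`, the `Display`, `source` and `From` implementations collapse into attributes on the enum, which is the boilerplate saving the first bullet refers to.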
6. WASM Bindings Overhaul (½ day)
- Re-generate with `wasm-bindgen` 0.2.latest; enable `weak-refs` for the memory-leak fix.
- Expose `parse_with_options(json, options)` where `options` is a JS object; use `serde_wasm_bindgen` for bridging.
7. Benchmark & CI Pipeline (1 day)
- Move Criterion benches under the `benches/` root, use `cargo bench --workspace`.
- GitHub Actions matrix: `stable`, `beta`, `nightly`, plus a `wasm32-unknown-unknown` build.
- Add `cargo udeps` and `cargo deny` checks.
8. Documentation Pass (1½ days)
- Update code comments to explain why, not just what.
- Auto-generate docs via `cargo doc --workspace --no-deps` in CI; deploy to `gh-pages`.
- Write a migration guide if any `pub` items are renamed.
9. Release Planning (½ day)
- Bump the version to `0.2.0`, following semver, since the internal layout changed.
- Update `CHANGELOG.md` with highlights: module re-org, logos lexer, better error messages.
9.1. Deliverable Checklist per PR
- `./build.sh` green locally.
- All tests & benches pass on CI.
- Coverage ≥ 90 % for touched code (grcov).
- Added / updated docs where public API changed.
- CHANGELOG entry under Unreleased.
10. Nice-to-have Stretch Goals (do not block v0.2.0)
- Plug in a streaming serializer to avoid building intermediate `Value`s for large input.
- Explore `simd-utf8` for lexing speed-ups.
- Accept `Cow<str>` input to allow zero-copy parsing in some contexts.
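For the `Cow<str>` idea, here is a small sketch of the zero-copy pattern on a single hypothetical helper, `unescape`. It handles only `\n` and pass-through escapes and is not the crate's API.

```rust
use std::borrow::Cow;

/// Resolves backslash escapes. When the input contains none, the original
/// `Cow` is returned untouched, so borrowed input stays borrowed.
pub fn unescape<'a>(input: Cow<'a, str>) -> Cow<'a, str> {
    if !input.contains('\\') {
        return input; // fast path: no allocation, no copy
    }
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some(other) => out.push(other),
                None => out.push('\\'), // trailing backslash kept as-is
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

Only strings that actually contain escapes pay for an allocation; everything else is handed back as a borrow of the caller's buffer.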
10.1. Final Notes
Treat the refactor as paving the road for long-term maintainability rather than chasing micro-optimisations. When in doubt choose readability, but back it up with benchmark data.