Refactor prompt
Read @llms.txt which contains the snapshot of the entire codebase.
Analyze the entire #codebase
Update REFACTOR.md so that it becomes a very detailed plan of refactoring the code, under the following principles:
- Production-grade Quality: Aim for clean, idiomatic, boring Rust. No clever macros where straightforward code is clearer.
- Parity With Reference Implementation: Behaviour must remain 100 % compatible with the test-suite of the original JavaScript reference implementation unless a conscious deviation is documented.
- Incremental, Review-friendly Commits: Small, atomic commits that each compile and keep the test-suite green.
- Minimal Public-API Breakage: The current crate is already used in downstream code and WASM builds; any unavoidable breaking change must be sign-posted in the CHANGELOG and guarded by a semver bump.
- Performance Awareness: Never regress the existing Criterion benchmarks by more than 3 % unless the change gives a functional or maintainability win that clearly outweighs the cost.
- Great DX: Improve docs, examples and error messages as you touch code; run `./build.sh` locally before pushing.
- Security & Safety First: Eliminate `unsafe` (currently none), check for `TODO: unwrap/expect`, replace with fallible code paths.
The refactor will be delivered as a series of pull-requests structured around themes so that reviewers can digest them easily.
Below is a detailed, step-by-step playbook that you, the engineer, should follow. Feel free to adjust the ordering if downstream work uncovers hidden coupling, but always keep commits small and the repo green.
1. On-boarding (½ day)
- Clone the repo, run `./build.sh`, open `./build.log.txt`; ensure you start from a clean, reproducible state.
- Scan `docs/internal/CLAUDE.md`, `IMPLEMENTATION_SUMMARY.md`, `PLAN.md` to understand design intent.
- Run the benchmarks (`cargo bench --bench parsing`) and note baseline numbers in a personal scratchpad.
- Create a new branch `refactor/phase-1-module-layout` for the first PR.
2. Restructure the Module Tree (1 day)
Goal: make the crate's public surface and internal structure obvious at a glance.
1.1 Move binaries into src/bin/
Currently we have main.rs and bin/harness.rs; place both under src/bin/ and use descriptive names (cli.rs, harness.rs). Adjust Cargo manifest [bin] sections accordingly.
1.2 Introduce src/ast/
Create a dedicated module for the concrete syntax tree (tokens) and abstract syntax tree (Value) to localise parsing artefacts. File split suggestion:
- `src/ast/mod.rs`: re-exports
- `src/ast/token.rs`: existing `Token` enum + helper impls
- `src/ast/value.rs`: existing `Value`, `Number`, conversions, feature-gated `serde`
1.3 Isolate Error Handling
Move error.rs into src/error/mod.rs; create sub-modules:
- `kind.rs`: the `Error` enum
- `position.rs`: a lightweight `Span { start: usize, end: usize }`
1.4 Public API Barrel File
lib.rs should become a concise index that re-exports public types; the heavy doc-comment with README inclusion can move to docs/api.md.
Deliverables: new folder structure, imports updated, tests & benchmarks still pass.
3. Simplify the Lexer (2-3 days)
The current lexer contains duplicated state machines and ad-hoc look-ahead logic. Steps:
2.1 Extract Config: Config flags like allow_single_quotes belong in ParserOptions only; remove duplication from lexer. The lexer should tokenise regardless of permissiveness; the parser decides if a token is legal in context.
2.2 Use logos: Evaluate replacing the handwritten lexer with the logos crate (MIT licensed, no runtime deps). Benchmark; accept if equal or faster and code is clearer.
2.3 Remove lexer2.rs: It's an experiment that has diverged; either promote it (if chosen) or delete.
2.4 Canonical Token Stream: Ensure every character of input maps to exactly one token stream position; add invariant tests (property test with quickcheck) that the sum of token.len() over the stream equals input.len(), apart from whitespace.
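
The invariant in 2.4 could be checked along the following lines. This is a minimal std-only sketch, not the crate's lexer: `Token`, `tokenize` and `covers_input` are hypothetical names, the toy tokenizer only knows JSON punctuation and bare runs, and lengths are counted in bytes. A quickcheck property would call `covers_input` on generated inputs.

```rust
/// Hypothetical token: a byte range into the input (not the crate's real `Token`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub start: usize,
    pub end: usize,
}

impl Token {
    /// Length of the token in bytes.
    pub fn len(&self) -> usize {
        self.end - self.start
    }
}

/// Toy tokenizer (assumption): punctuation is one token per byte,
/// any other run of non-whitespace bytes is a single token.
pub fn tokenize(input: &str) -> Vec<Token> {
    const PUNCT: &[u8] = b"{}[]:,";
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        if b.is_ascii_whitespace() {
            i += 1;
        } else if PUNCT.contains(&b) {
            tokens.push(Token { start: i, end: i + 1 });
            i += 1;
        } else {
            let start = i;
            while i < bytes.len()
                && !bytes[i].is_ascii_whitespace()
                && !PUNCT.contains(&bytes[i])
            {
                i += 1;
            }
            tokens.push(Token { start, end: i });
        }
    }
    tokens
}

/// The invariant: tokens are ordered and non-overlapping, and token bytes
/// plus skipped whitespace bytes account for every byte of the input.
pub fn covers_input(input: &str, tokens: &[Token]) -> bool {
    let token_bytes: usize = tokens.iter().map(Token::len).sum();
    let whitespace_bytes = input.bytes().filter(u8::is_ascii_whitespace).count();
    let ordered = tokens.windows(2).all(|w| w[0].end <= w[1].start);
    ordered && token_bytes + whitespace_bytes == input.len()
}
```

Checking order as well as the byte total guards against a lexer that emits the right total length from overlapping or out-of-order tokens.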
4. Parser Clean-up (3 days)
3.1 Introduce ParserState struct instead of many boolean fields to group stateful data (depth, lexer_offset, etc.).
3.2 Tail-recursion removal: Replace deep recursion on arrays/objects with an explicit stack to honour max_depth without risking stack overflow.
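
As an illustration of the explicit-stack idea in 3.2, here is a deliberately small sketch that parses nested arrays of non-negative integers without recursion. `Value`, `ParseError` and `parse_arrays` are simplified, hypothetical stand-ins for the crate's types; commas are treated like whitespace and trailing input after the first complete value is ignored.

```rust
/// Simplified stand-in for the crate's `Value`: numbers and arrays only.
#[derive(Debug, PartialEq)]
pub enum Value {
    Number(i64),
    Array(Vec<Value>),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    DepthExceeded { max_depth: usize },
    Unexpected { offset: usize },
    UnexpectedEnd,
}

pub fn parse_arrays(input: &str, max_depth: usize) -> Result<Value, ParseError> {
    // Each entry is an array that is still being filled. This Vec replaces
    // the call stack, so nesting depth is bounded by `max_depth`, not by
    // the thread's stack size.
    let mut stack: Vec<Vec<Value>> = Vec::new();
    let mut chars = input.char_indices().peekable();

    while let Some((offset, c)) = chars.next() {
        // `finished` holds a value completed by this step, if any.
        let finished = match c {
            ' ' | ',' => None,
            '[' => {
                if stack.len() >= max_depth {
                    return Err(ParseError::DepthExceeded { max_depth });
                }
                stack.push(Vec::new());
                None
            }
            ']' => {
                let items = stack.pop().ok_or(ParseError::Unexpected { offset })?;
                Some(Value::Array(items))
            }
            _ => {
                let first = c.to_digit(10).ok_or(ParseError::Unexpected { offset })?;
                let mut n = i64::from(first);
                while let Some(digit) = chars.peek().and_then(|&(_, d)| d.to_digit(10)) {
                    // Checked arithmetic: overflow is reported, never a panic.
                    n = n
                        .checked_mul(10)
                        .and_then(|n| n.checked_add(i64::from(digit)))
                        .ok_or(ParseError::Unexpected { offset })?;
                    chars.next();
                }
                Some(Value::Number(n))
            }
        };

        if let Some(value) = finished {
            match stack.last_mut() {
                // Attach the value to the innermost open array.
                Some(parent) => parent.push(value),
                // Nothing is open: this is the top-level value.
                None => return Ok(value),
            }
        }
    }
    Err(ParseError::UnexpectedEnd)
}
```

The depth check happens before the push, so the stack never holds more than `max_depth` open containers.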
3.3 Improve Error Reporting: Switch from raw usize positions to the Span type; implement fmt::Display to highlight offending slice with a caret.
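
One possible shape for the caret rendering in 3.3 is sketched below. `Span` follows the field layout given in this plan; `Highlight` is a hypothetical wrapper introduced only so that `Display` has access to the source text. The sketch assumes byte offsets and does not handle tab alignment.

```rust
use std::fmt;

/// Byte range into the source, as proposed for `position.rs`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical helper pairing a span with its source for display.
pub struct Highlight<'a> {
    pub source: &'a str,
    pub span: Span,
}

impl fmt::Display for Highlight<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let bytes = self.source.as_bytes();
        // Clamp so that a malformed span cannot cause a panic.
        let end = self.span.end.min(bytes.len());
        let start = self.span.start.min(end);

        // Locate the line containing `start`. Newlines are ASCII, so both
        // bounds are guaranteed to fall on char boundaries.
        let line_start = bytes[..start]
            .iter()
            .rposition(|&b| b == b'\n')
            .map_or(0, |i| i + 1);
        let line_end = bytes[start..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(bytes.len(), |i| start + i);
        let line = &self.source[line_start..line_end];

        // Count chars (not bytes) before the span and inside it.
        let column = line
            .char_indices()
            .take_while(|(i, _)| line_start + i < start)
            .count();
        let width = line
            .char_indices()
            .filter(|(i, _)| (start..end).contains(&(line_start + i)))
            .count()
            .max(1);

        writeln!(f, "{}", line)?;
        write!(f, "{}{}", " ".repeat(column), "^".repeat(width))
    }
}
```

For the source `"a: 1\nb: ?\nc: 3"` and `Span { start: 8, end: 9 }`, this prints the line `b: ?` followed by a caret under the `?`.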
3.4 Config Validation: Add ParserOptions::validate() that returns Result<(), ConfigError>; e.g. newline_as_comma=false + implicit_top_level=true is ambiguously specified, so decide policy and enforce.
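
A sketch of what `validate()` might look like follows. It assumes, purely for illustration, that the chosen policy is to reject the ambiguous combination; the plan leaves that decision open. The variant name `ImplicitTopLevelNeedsNewlineAsComma` is hypothetical, and the real `ParserOptions` has more fields than the two shown.

```rust
use std::fmt;

/// Only the two flags named in the plan; the real struct has more.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParserOptions {
    pub newline_as_comma: bool,
    pub implicit_top_level: bool,
}

/// Hypothetical error enum for invalid option combinations.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigError {
    ImplicitTopLevelNeedsNewlineAsComma,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::ImplicitTopLevelNeedsNewlineAsComma => write!(
                f,
                "implicit_top_level=true requires newline_as_comma=true"
            ),
        }
    }
}

impl std::error::Error for ConfigError {}

impl ParserOptions {
    /// Assumed policy: reject the ambiguous combination outright.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.implicit_top_level && !self.newline_as_comma {
            return Err(ConfigError::ImplicitTopLevelNeedsNewlineAsComma);
        }
        Ok(())
    }
}
```

Calling `validate()` once when the parser is constructed keeps the check out of the hot parsing path.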
3.5 Property-based tests: Port the reference implementation round-trip tests; generate random forgiving JSON, parse, serialise back to canonical JSON, compare using serde_json Value.
5. Error & Result Type Revamp (1 day)
- Adopt the `thiserror` crate to cut boilerplate.
- Provide an `Error::source()` chain so WASM callers can access the root cause.
- Export a `type ParseResult<T = Value> = core::result::Result<T, Error>` alias.
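
The sketch below shows how the `source()` chain and the `ParseResult` alias could fit together. To stay dependency-free it writes the `Display` and `std::error::Error` impls by hand, which is roughly the boilerplate `thiserror` is meant to replace. The `Value` and `Error` variants and the `parse_number` function are simplified, hypothetical stand-ins.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::num::ParseIntError;

/// Simplified stand-in for the crate's `Value`.
#[derive(Debug, PartialEq)]
pub enum Value {
    Number(i64),
}

/// Hypothetical error variants; the real enum will differ.
#[derive(Debug)]
pub enum Error {
    InvalidNumber { offset: usize, source: ParseIntError },
    UnexpectedEnd,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::InvalidNumber { offset, .. } => {
                write!(f, "invalid number at byte {}", offset)
            }
            Error::UnexpectedEnd => write!(f, "unexpected end of input"),
        }
    }
}

impl StdError for Error {
    /// Expose the underlying cause so callers can walk the chain.
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        match self {
            Error::InvalidNumber { source, .. } => Some(source),
            Error::UnexpectedEnd => None,
        }
    }
}

/// The alias from the plan: `ParseResult` alone means `Result<Value, Error>`.
pub type ParseResult<T = Value> = core::result::Result<T, Error>;

/// Hypothetical entry point, used only to exercise the types above.
pub fn parse_number(input: &str) -> ParseResult {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(Error::UnexpectedEnd);
    }
    // Byte offset of the first non-whitespace character.
    let offset = input.len() - input.trim_start().len();
    trimmed
        .parse::<i64>()
        .map(Value::Number)
        .map_err(|source| Error::InvalidNumber { offset, source })
}
```

The default type parameter keeps the common signature short, while `ParseResult<()>` or `ParseResult<Token>` remain available for helpers.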
6. WASM Bindings Overhaul (½ day)
- Re-generate with the latest `wasm-bindgen` 0.2.x; enable `weak-refs` to fix memory leaks.
- Expose `parse_with_options(json, options)` where `options` is a JS object; use `serde_wasm_bindgen` for bridging.
7. Benchmark & CI Pipeline (1 day)
- Move Criterion benches under the `benches/` root, use `cargo bench --workspace`.
- GitHub Actions matrix: `stable`, `beta`, `nightly`, plus a `wasm32-unknown-unknown` build.
- Add `cargo udeps` and `cargo deny` checks.
8. Documentation Pass (1½ days)
- Update code comments to explain why not just what.
- Auto-generate docs via `cargo doc --workspace --no-deps` in CI; deploy to `gh-pages`.
- Write a migration guide if any `pub` items are renamed.
9. Release Planning (½ day)
- Bump version to `0.2.0` following semver since internal layout changed.
- Update `CHANGELOG.md` with highlights: module re-org, logos lexer, better error messages.
9.1. Deliverable Checklist per PR
- `./build.sh` green locally.
- All tests & benches pass on CI.
- Coverage ≥ 90 % for touched code (grcov).
- Added / updated docs where public API changed.
- CHANGELOG entry under Unreleased.
10. Nice-to-have Stretch Goals (do not block v0.2.0)
- Plug a streaming serializer to avoid building intermediate `Value`s for large input.
- Explore `simd-utf8` for lexing speed-ups.
- Accept `Cow<str>` input to allow zero-copy parse in some contexts.
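
To make the `Cow<str>` stretch goal concrete, here is a small sketch of an entry point that accepts either borrowed or owned text and allocates only when it has to. `normalize_newlines` is a hypothetical pre-processing step chosen for illustration, not something the crate is known to contain.

```rust
use std::borrow::Cow;

/// Hypothetical pre-parse step: accepts `&str`, `String` or `Cow<str>`.
/// Input without CRLF is passed through untouched (zero-copy for `&str`);
/// only input that needs rewriting is copied.
pub fn normalize_newlines<'a>(input: impl Into<Cow<'a, str>>) -> Cow<'a, str> {
    let input = input.into();
    if input.contains("\r\n") {
        Cow::Owned(input.replace("\r\n", "\n"))
    } else {
        input
    }
}
```

Taking `impl Into<Cow<'a, str>>` keeps existing `&str` call sites source-compatible, which matters for the minimal-breakage principle above.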
10.1. Final Notes
Treat the refactor as paving the road for long-term maintainability rather than chasing micro-optimisations. When in doubt choose readability, but back it up with benchmark data.