Features
A clear list of what the tool can do, with plain limits and behaviour.
Goal
Fast cleaning, repeatable results, and no mystery behaviour.
Supported input
The current focus is structured CSV files: rows and columns with commas separating values. That covers most simple datasets used in spreadsheets, basic analytics, and machine learning demos.
Messy text, mixed formats, and deeply nested structures (semi-structured and unstructured data) are not supported yet. They are on the future roadmap.
How you use it
You can upload a CSV or paste CSV text. Then you run detection to see what issues exist, and run cleaning steps to produce a cleaned file you can download.
It is designed so you can explain what you did in plain English, for example: you removed duplicate rows and standardised null values.
Detection
Detection measures issues without changing data. This makes it easier to justify changes and compare before and after.
- Missing value detection: counts empty cells and missing entries.
- Duplicate detection: identifies duplicated rows.
These are deliberately simple and deterministic so results are stable and easy to audit.
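The two detection checks can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation: the function name `detect_issues`, the report keys, and the rule that a row shorter than the header counts as missing its trailing cells are all assumptions made for the example.

```python
import csv
import io
from collections import Counter


def detect_issues(csv_text: str) -> dict:
    """Count missing cells and duplicate rows without modifying the data."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {"rows": 0, "missing_cells": 0, "duplicate_rows": 0}
    header, data = rows[0], rows[1:]
    width = len(header)

    missing = 0
    for row in data:
        # Empty or whitespace-only cells count as missing.
        missing += sum(1 for cell in row if cell.strip() == "")
        # Assumption: a row shorter than the header is missing its trailing cells.
        missing += max(0, width - len(row))

    # A duplicate is any exact repeat of an earlier row.
    counts = Counter(tuple(row) for row in data)
    duplicates = sum(n - 1 for n in counts.values())

    return {"rows": len(data), "missing_cells": missing, "duplicate_rows": duplicates}


sample = "id,name,city\n1,Ann,Leeds\n2,,York\n1,Ann,Leeds\n3,Raj\n"
report = detect_issues(sample)
print(report)
```

Because the function only reads the input and has no randomness, running it twice on the same text gives the same report, which is the property that makes before-and-after comparisons meaningful.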
Cleaning
Cleaning applies a known change, and the output is a new CSV.
- Remove duplicate rows: keeps the first occurrence of each row and removes later duplicates.
- Standardise common null values: converts text like n/a, null, or none into a consistent missing representation.
- Download cleaned CSV: exports the cleaned result for reuse.
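As a rough sketch of the cleaning steps above, the Python below standardises null-like text and then removes duplicate rows, returning a new CSV string. The function names, the `NULL_TOKENS` set, the case-insensitive matching, the empty string as the missing marker, and the order of the two steps are assumptions for illustration; the tool's own rules may differ.

```python
import csv
import io

# Assumption: the tokens treated as null; the tool's real list may differ.
NULL_TOKENS = {"n/a", "null", "none"}


def standardise_nulls(rows, missing=""):
    """Replace null-like text (case-insensitive, trimmed) with one missing marker."""
    return [
        [missing if cell.strip().lower() in NULL_TOKENS else cell for cell in row]
        for row in rows
    ]


def remove_duplicate_rows(rows):
    """Keep the first occurrence of each row and drop later exact repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def clean_csv(csv_text: str) -> str:
    """Return a new CSV string; the input text is left untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, data = rows[0], rows[1:]
    data = remove_duplicate_rows(standardise_nulls(data))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + data)
    return out.getvalue()


sample = "id,name\n1,Ann\n2,N/A\n1,Ann\n3,none\n"
cleaned = clean_csv(sample)
print(cleaned)
```

Each step is a separate function so that it can be applied on its own, matching the idea that every change is a known, named operation.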
Offline-first mode (privacy)
The app tries to run locally first using WebAssembly. WebAssembly is a browser feature that lets compiled code run inside the page, so the cleaning logic can run without sending data away.
When offline mode is active, your CSV stays on your device. This matters if you are working with sensitive data.
API fallback mode (compatibility)
If WebAssembly is not available, the app can fall back to calling the server API to run the same style of operations. This keeps the tool usable in more environments.
In fallback mode your CSV is sent to the server for processing. If you are privacy-conscious, prefer offline mode and double-check the mode shown in the app.
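The offline-first decision can be pictured as a small dispatch function. The sketch below is language-neutral logic written in Python, not the app's browser code: `run_cleaning`, the mode labels, and the two stand-in cleaners are hypothetical names introduced for the example.

```python
from typing import Callable, Optional, Tuple

Cleaner = Callable[[str], str]


def run_cleaning(csv_text: str, local: Optional[Cleaner], remote: Cleaner) -> Tuple[str, str]:
    """Run locally when a local engine exists; otherwise use the remote one.

    Returns (mode, result) so the caller can show which mode handled the data.
    """
    if local is not None:
        # Offline mode: the CSV text never leaves this process.
        return "offline", local(csv_text)
    # Fallback mode: the CSV text is handed to the server-backed cleaner.
    return "api", remote(csv_text)


# Hypothetical stand-ins for the WebAssembly engine and the server API call.
def fake_local(text: str) -> str:
    return text.strip()


def fake_remote(text: str) -> str:
    return text.strip()


mode, result = run_cleaning(" a,b \n", fake_local, fake_remote)
print(mode)
mode_without_wasm, _ = run_cleaning(" a,b \n", None, fake_remote)
print(mode_without_wasm)
```

Returning the mode alongside the result is one way to make the active mode visible, so a user can confirm that data stayed on the device.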
Limits and non-goals
This is not a spreadsheet replacement and it is not an enterprise pipeline tool. It is a focused toolkit that keeps the behaviour understandable and visible.
- No hidden smart cleaning rules.
- No automatic changes without you choosing the operation.
- No requirement to sign in.