# Data IDE

After playing with [Rill Developer](https://github.com/rilldata/rill-developer), DuckDB, Vega, WASM, [Rath](https://rath.kanaries.net/), and other modern Data IDEs, I think we have all the pieces for an awesome web-based BI/data exploration tool. Some of the features it could have:

- Let me add local and remote datasets. Not just one, as I'd like to join them later.
- Let me plot them using Vega-Lite. Guide me through alternatives like [Vega's Voyager2](https://vega.github.io/voyager2/) does.
    - Might be as simple as surfacing Observable Plot with DuckDB WASM...
- Use LLMs to improve the datasets and offer next steps:
    - Get suggested transformations for certain columns. If it detects a date, extract the day of the week. If it detects a string, `lower()` it...
    - Get suggested plots. Since it'll know both the column names and the types, it should be possible to create a prompt that returns some plot ideas and another that takes those ideas and writes the Vega-Lite code to make them work.
    - Make it easy to query the data via natural language.
- Let me transform them with SQL ([DuckDB](https://duckdb.org/)) and Python ([JupyterLite](https://jupyterlite.readthedocs.io/en/latest/)). Similar to [Neptyne](http://web.archive.org/web/20250306181451/https://www.neptyne.com/) but in the browser (WASM).
- Let me save the plots in a separate space and give me a shareable, URL-encoded link.
    - Local datasets could be shared using something like [Magic Wormhole](https://github.com/magic-wormhole/magic-wormhole) or a temporary storage service.
- Let me grab the state of the app (YAML/JSON), version control it, and generate static (to publish on GitHub Pages) and dynamic (hosted somewhere) dashboards from it.
    - Similar to [evidence.dev](https://evidence.dev/) or [portal.js](https://portaljs.org/).
- It could also have "smart" data checks. Similar to [deepchecks](https://github.com/deepchecks/deepchecks), alerting about anomalies, outliers, noisy variables, ...
- Given the large amount of Open Data available, it could offer a way for people to upload their datasets [and get them augmented](http://web.archive.org/web/20250108164736/https://subsets.io/).
    - E.g., upload a CSV with year and country and the tool could suggest GDP per capita or population.

It could be an awesome front-end to explore Open Data.
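
To make the "suggested transformations" and "suggested plots" ideas more concrete, here is a minimal sketch in Python. It is a rule-based stand-in for the LLM step, not a proposal for how the tool must work: `infer_kind`, `suggest_transformations`, `suggest_plot`, and the `SUGGESTIONS` table are hypothetical names, and the type detection is deliberately naive. The suggested expressions are DuckDB SQL, and the plot is returned as a Vega-Lite spec without data.

```python
from datetime import date

# Hypothetical rule table: inferred column kind -> suggested DuckDB SQL expressions.
SUGGESTIONS = {
    "date": ["dayname({col})", "year({col})"],
    "string": ["lower({col})", "trim({col})"],
}
VEGA_TYPES = {"date": "temporal", "number": "quantitative", "string": "nominal"}


def _parses(parser, value):
    """Return True if parser(value) succeeds without a ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_kind(values):
    """Classify a sample of raw string values as 'date', 'number', or 'string'."""
    if values and all(_parses(date.fromisoformat, v) for v in values):
        return "date"
    if values and all(_parses(float, v) for v in values):
        return "number"
    return "string"


def suggest_transformations(columns):
    """Map each column name to a list of suggested SQL expressions."""
    return {
        name: [e.format(col=name) for e in SUGGESTIONS.get(infer_kind(vals), [])]
        for name, vals in columns.items()
    }


def suggest_plot(columns):
    """Return a Vega-Lite spec (without data) for the first sensible column pair."""
    kinds = {name: infer_kind(vals) for name, vals in columns.items()}
    numbers = [n for n, k in kinds.items() if k == "number"]
    if not numbers:
        return None
    y = numbers[0]
    # Prefer a time series, then a bar chart per category, then a scatter plot.
    for kind, mark in (("date", "line"), ("string", "bar"), ("number", "point")):
        xs = [n for n, k in kinds.items() if k == kind and n != y]
        if xs:
            return {
                "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
                "mark": mark,
                "encoding": {
                    "x": {"field": xs[0], "type": VEGA_TYPES[kind]},
                    "y": {"field": y, "type": "quantitative"},
                },
            }
    return None
```

In an LLM-backed version, the inferred column names and kinds would go into the prompt instead of a fixed rule table, and the returned spec would be validated before rendering.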
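
The shareable link idea could look something like the sketch below, assuming the app state is a small JSON-serializable dict. The base URL, the `state=` fragment key, and the function names are all made up for illustration; the approach is simply compress, base64url-encode, and put the result in the URL fragment.

```python
import base64
import json
import zlib
from urllib.parse import urlsplit

BASE_URL = "https://example.com/app"  # hypothetical app location


def encode_state(state, base_url=BASE_URL):
    """Serialize the app state into a URL whose fragment carries the payload."""
    raw = json.dumps(state, separators=(",", ":"), sort_keys=True).encode("utf-8")
    token = base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")
    # Drop the '=' padding to keep the link short; decode_state restores it.
    return f"{base_url}#state={token.rstrip('=')}"


def decode_state(url):
    """Recover the app state from a URL produced by encode_state."""
    _, _, token = urlsplit(url).fragment.partition("state=")
    padded = token + "=" * (-len(token) % 4)
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(padded)))
```

Keeping the payload in the fragment means it is not sent to the server in the HTTP request, which fits a static, browser-only deployment. The same dict could also be dumped to a JSON file and version controlled.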

## Relevant Projects

- [Rath](https://rath.kanaries.net/)
- [Hex.tech](https://hex.tech/)
- [Perspective](https://perspective.finos.org/)
- [Rill Developer](https://github.com/rilldata/rill-developer)
- [Datastation](https://datastation.multiprocess.io/)
- [Excalichart](http://web.archive.org/web/20231018190556/https://www.excalichart.com/)
- [Chartpilot](http://web.archive.org/web/20241007164422/https://www.chartpilot.com/)