# Data IDE

After playing with [Rill Developer](https://github.com/rilldata/rill-developer), DuckDB, Vega, WASM, [Rath](https://rath.kanaries.net/), and other modern Data IDEs, I think we have all the pieces for an awesome web-based BI/Data exploration tool. Some of the features it could have:

- Let me add local and remote datasets. Not just one, as I'd like to join them later.
- Let me plot them using Vega-Lite. Guide me through alternatives like [Vega's Voyager2](https://vega.github.io/voyager2/) does.
  - Might be as simple as surfacing Observable Plot with DuckDB WASM...
- Use LLMs to improve the datasets and offer next steps:
  - Get suggested transformations for certain columns. If it detects a date, extract the day of the week. If it detects a string, `lower()` it...
  - Get suggested plots. Given that it'll know both the column names and the types, it should be possible to create a prompt that returns some plot ideas and another that takes those ideas and writes the Vega-Lite code to make them work.
  - Make it easy to query the data via natural language.
- Let me transform them with SQL ([DuckDB](https://duckdb.org/)) and Python ([JupyterLite](https://jupyterlite.readthedocs.io/en/latest/)). Similar to [Neptyne](http://web.archive.org/web/20250306181451/https://www.neptyne.com/) but in the browser (WASM).
- Let me save the plots in a separate space and give me a shareable, URL-encoded link.
  - Local datasets could be shared using something like [Magic Wormhole](https://github.com/magic-wormhole/magic-wormhole) or a temporary storage service.
- Let me grab the state of the app (YAML/JSON), version control it, and generate static (to publish on GitHub Pages) and dynamic (hosted somewhere) dashboards from it.
  - Similar to [evidence.dev](https://evidence.dev/) or [portal.js](https://portaljs.org/).
- It could also have "smart" data checks. Similar to [deepchecks](https://github.com/deepchecks/deepchecks), alerting about anomalies, outliers, noisy variables, ...
- Given a large amount of Open Data, it could offer a way for people to upload their datasets [and get them augmented](http://web.archive.org/web/20250108164736/https://subsets.io/).
  - E.g., upload a CSV with year and country and the tool could suggest GDP per capita or population.

Could be an awesome front-end to explore Open Data.

## Relevant Projects

- [Rath](https://rath.kanaries.net/)
- [Hex.tech](https://hex.tech/)
- [Perspective](https://perspective.finos.org/)
- [Rill Developer](https://github.com/rilldata/rill-developer)
- [Datastation](https://datastation.multiprocess.io/)
- [Excalichart](http://web.archive.org/web/20231018190556/https://www.excalichart.com/)
- [Chartpilot](http://web.archive.org/web/20241007164422/https://www.chartpilot.com/)
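
The "suggested transformations" idea from the feature list (extract the day of the week from a date, `lower()` a string) could start as a plain rule table before any LLM is involved. A minimal sketch, assuming the schema arrives as a mapping of column name to DuckDB type name; `RULES`, `quote_ident`, and `suggest_transformations` are hypothetical names for illustration, not part of any tool listed here. The generated expressions rely on DuckDB's `dayname()` and `lower()` SQL functions.

```python
# Hypothetical rule table: DuckDB type name -> (SQL function, alias suffix).
RULES = {
    "DATE": [("dayname", "day_of_week")],
    "TIMESTAMP": [("dayname", "day_of_week")],
    "VARCHAR": [("lower", "lower")],
}


def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def suggest_transformations(schema):
    """Return suggested SELECT expressions for a {column: type} schema."""
    suggestions = []
    for name, type_name in schema.items():
        # Columns whose type has no rule simply get no suggestion.
        for func, suffix in RULES.get(type_name.upper(), []):
            alias = quote_ident(f"{name}_{suffix}")
            suggestions.append(f"{func}({quote_ident(name)}) AS {alias}")
    return suggestions


if __name__ == "__main__":
    schema = {"created_at": "DATE", "country": "VARCHAR", "amount": "INTEGER"}
    for expr in suggest_transformations(schema):
        print(expr)
```

Each suggestion is a fragment that could be dropped into a `SELECT` over the dataset. An LLM-backed version could keep the same output shape and only replace the rule lookup.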