Python bindings to oxyroot. Makes reading .root files blazing fast 🚀
# Python bindings for oxyroot

[![CI](https://github.com/vvsagar/py-oxyroot/actions/workflows/CI.yml/badge.svg)](https://github.com/vvsagar/py-oxyroot/actions/workflows/CI.yml)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

> **Warning**
> This project is an early prototype and is not yet recommended for production use. For a mature and well-tested alternative, please consider using [uproot](https://github.com/scikit-hep/uproot5).

A fast, Rust-powered Python reader for CERN ROOT files.

This package provides simple, Pythonic bindings to `oxyroot`, a Rust crate, for reading data from `.root` files. It is inspired by libraries like `uproot`. It uses Rust for high-performance data extraction and integrates with the scientific Python ecosystem by returning data as NumPy arrays.

## Features

- **High-Performance**: Core logic is written in Rust for maximum speed.
- **Parquet Conversion**: Convert TTrees directly to Apache Parquet files with a single command.
- **NumPy Integration**: Get branch data directly as NumPy arrays.
- **Simple, Pythonic API**: Easy to learn and use, and similar to `uproot`.

## Quick Start

Here's how to open a ROOT file, access a TTree, and read a TBranch into a NumPy array.

```python
import oxyroot
import numpy as np

# Open the ROOT file
file = oxyroot.open("ntuples.root")

# Get a TTree
tree = file["mu_mc"]

# List branches in the tree
print(f"Branches: {tree.branches()}")

# Get a specific branch and its data as a NumPy array
branch = tree["mu_pt"]
data = branch.array()

print(f"Read branch '{branch.name}' into a {type(data)}")
print(f"Mean value: {np.nanmean(data):.2f}")
```

## Converting to Parquet

You can convert all branches in a TTree, or a subset of them, to a Parquet file.

```python
# Convert the entire tree to a Parquet file
tree.to_parquet("output.parquet")

# Convert specific columns to a Parquet file with ZSTD compression
tree.to_parquet(
    "output_subset.parquet",
    columns=["mu_pt", "mu_eta"],
    compression="zstd"
)
```

## Performance

`oxyroot` is designed to be fast. Here is a simple benchmark comparing the time taken to read all branches of a TTree with `uproot` and `oxyroot`.

```python
import oxyroot
import uproot
import time

file_name = "ntuples.root"
tree_name = 'mu_mc'

# Time uproot
start_time = time.time()
up_tree = uproot.open(file_name)[tree_name]
for branch in up_tree:
    if branch.typename != "std::string":
        branch.array(library="np")
end_time = time.time()
print(f"Uproot took: {end_time - start_time:.3f}s")

# Time oxyroot
start_time = time.time()
oxy_tree = oxyroot.open(file_name)[tree_name]
for branch in oxy_tree:
    branch.array()
end_time = time.time()
print(f"Oxyroot took: {end_time - start_time:.3f}s")
```

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.
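
## Benchmarking Notes

The benchmark in the Performance section times each library once with `time.time()`, so the result can be skewed by disk caching and one-off startup costs. A minimal sketch of a repeatable timing helper is shown below, using only the standard library. The names `best_of` and `stand_in_workload` are hypothetical and not part of `oxyroot` or `uproot`. The workload is a placeholder so the snippet runs without a ROOT file; in practice you would pass a function that wraps one of the read loops from the Performance section.

```python
import time
from statistics import median


def best_of(fn, repeats=5):
    """Call fn() `repeats` times; return (fastest, median) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), median(timings)


def stand_in_workload():
    # Placeholder (assumption): replace with a function that opens the file
    # and reads every branch, as in the Performance section.
    return sum(i * i for i in range(100_000))


fastest, typical = best_of(stand_in_workload, repeats=5)
print(f"fastest: {fastest:.4f}s, median: {typical:.4f}s")
```

Reporting both the fastest and the median run makes it easier to tell a warm-cache best case from typical behaviour. Running both libraries through the same helper keeps the comparison symmetric.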