TOPIC

#data-engineering

Open source repositories tagged with #data-engineering, ranked by health score.

DioCrafts
DioCrafts/OpenFoundry
Go
91
health

🏭 The open-source Palantir Foundry alternative. Connect any data source, build ontologies, create pipelines, visualize with dashboards, and make AI-powered decisions. Self-hosted.

337
cocoindex-io
cocoindex-io/cocoindex
Python
89
health

Incremental engine for long-horizon agents 🌟 Star if you like it!

10.0k
dathere
dathere/qsv
Rust
88
health

Blazing-fast Data-Wrangling toolkit

3.6k
lakehq
lakehq/sail
Rust
86
health

Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.

2.7k
im-anishraj
im-anishraj/arnio
Python
83
health

C++ accelerated data quality toolkit for Python: CSV parsing, cleaning, schema validation, profiling, and pandas integration.

72