---
title: Overview
description: High-level overview of the Texera architecture, core concepts, and use cases.
weight: 10
---
{{% pageinfo %}}
**Texera** is an open-source system that supports collaborative data science at scale using Web-based workflows.
{{% /pageinfo %}}
Texera combines powerful backend dataflow execution with an intuitive, drag-and-drop web interface. It allows users to build, execute, and share complex data workflows seamlessly across teams without worrying about the underlying computing infrastructure.
---
## 🏗️ Architecture: How it Works
At its core, Texera acts as a bridge between a highly accessible frontend and a scalable distributed computing backend.
1. **Web-Based Interface (Frontend):** A rich GUI running directly in your browser. It allows users to construct data processing pipelines by dragging and dropping blocks on a canvas. No installation is required on client machines.
2. **Distributed Engine (Backend):** When a workflow is submitted, the Texera engine compiles the graphical representation into an optimized, distributed execution plan. It then spins up computing units to process massive datasets in parallel.
3. **Storage Integration:** Texera integrates with data lake and object storage technologies (such as lakeFS and MinIO) to persist run logs and datasets.
---
## 🧩 Core Concepts
To use Texera effectively, familiarize yourself with these foundational terms:
* **Operators:** The fundamental building blocks of a workflow. Each operator represents a single operation, such as filtering data, joining tables, training a machine learning model, or running a custom Python script. Operators have input and output ports through which data flows from one operator to the next.
* **Workflows:** A Directed Acyclic Graph (DAG) of linked operators. A workflow represents an end-to-end data pipeline.
* **Datasets:** Structured or semi-structured data sources uploaded to or generated by Texera. You can drag datasets directly into your workflow to begin processing them.
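
To make the operator and workflow concepts concrete, here is a minimal, framework-free sketch in Python. It is not Texera's API or its engine: the operator functions, the `links` mapping, and `run_workflow` are hypothetical names chosen for illustration. It only shows the general idea of running linked operators in dependency order, here using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` if the links do not form a DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical operators: each takes a list of rows and returns a new list.
def scan(_rows):
    # Stand-in for a dataset source; ignores its (empty) input.
    return [
        {"city": "Irvine", "temp": 24},
        {"city": "Oslo", "temp": 3},
        {"city": "Cairo", "temp": 35},
    ]

def keep_warm(rows):
    # Filter operator: keep rows warmer than 20 degrees.
    return [r for r in rows if r["temp"] > 20]

def project_city(rows):
    # Projection operator: keep only the city name.
    return [r["city"] for r in rows]

operators = {"scan": scan, "filter": keep_warm, "project": project_city}

# Links between operators: each name maps to the set of its upstream operators.
links = {"scan": set(), "filter": {"scan"}, "project": {"filter"}}

def run_workflow(operators, links):
    """Run every operator after all of its upstream operators have run."""
    results = {}
    order = list(TopologicalSorter(links).static_order())
    for name in order:
        # Concatenate the outputs of all upstream operators as this operator's input.
        rows = [row for up in sorted(links[name]) for row in results[up]]
        results[name] = operators[name](rows)
    return order, results

order, results = run_workflow(operators, links)
print(order)               # ['scan', 'filter', 'project']
print(results["project"])  # ['Irvine', 'Cairo']
```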
---
## 🎯 Use Cases & Target Audience
Texera bridges the gap between different technical proficiencies, making it ideal for teams to collaborate:
* **Data Scientists:** Quickly prototype data transformations, run machine learning algorithms, and visualize outputs without having to manage Spark or Kubernetes configurations manually.
* **Domain Experts & Analysts:** Utilize pre-built advanced analytics operators through an easy-to-learn visual interface, skipping the complex coding traditionally required for Big Data tasks.
* **Software Engineers:** Rapidly iterate on and contribute back to the system by writing modular Java/Scala code or by injecting custom Python UDFs (User Defined Functions) directly into the execution graph.
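
As a rough illustration of what a per-tuple UDF can look like, the sketch below uses plain Python generators. It deliberately does not use Texera's actual UDF interface: `add_fahrenheit`, `apply_udf`, and the `temp_c` and `temp_f` column names are assumptions made up for this example. The generator style lets one input row produce zero, one, or many output rows.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]

def add_fahrenheit(row: Row) -> Iterator[Row]:
    """Hypothetical UDF: emit the row with an added Fahrenheit column."""
    if row.get("temp_c") is None:
        return  # emit nothing, which drops rows with a missing reading
    yield {**row, "temp_f": row["temp_c"] * 9 / 5 + 32}

def apply_udf(udf: Callable[[Row], Iterator[Row]], rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for an engine feeding rows through a UDF one at a time."""
    for row in rows:
        yield from udf(row)

rows = [
    {"city": "Oslo", "temp_c": 0},
    {"city": "Nowhere", "temp_c": None},
    {"city": "Cairo", "temp_c": 35},
]
converted = list(apply_udf(add_fahrenheit, rows))
print(converted)
```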
> Texera enables you to move from prototype to production data pipelines seamlessly.