---
title: Overview
description: High-level overview of the Texera architecture, core concepts, and use cases.
weight: 10
---

{{% pageinfo %}}
**Texera** is an open-source system that supports collaborative data science at scale using Web-based workflows.
{{% /pageinfo %}}

Texera combines a scalable backend dataflow engine with an intuitive drag-and-drop web interface. Teams use it to build, execute, and share complex data workflows without having to manage the underlying computing infrastructure.

---

## 🏗️ Architecture: How It Works

At its core, Texera bridges an accessible browser-based frontend and a scalable distributed computing backend.

1. **Web-Based Interface (Frontend):** A rich GUI that runs directly in the browser. Users construct data processing pipelines by dragging operators onto a canvas and connecting them. Client machines need no installation.
2. **Distributed Engine (Backend):** When a workflow is submitted, the Texera engine compiles its graphical representation into an optimized, distributed execution plan. It then spins up computing units to process large datasets in parallel.
3. **Storage Integration:** Texera integrates with data lake and object storage technologies (such as lakeFS and MinIO) to persist records of workflow runs and to store datasets securely.
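This overview does not describe the engine's internals. The following is therefore only a minimal sketch of the general idea behind data-parallel execution, assuming simple hash partitioning: the input is split into partitions, and several workers apply the same operator to their own partitions concurrently. All function names are hypothetical and are not part of Texera. Threads stand in for the separate machines or processes a distributed engine would use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
RowOp = Callable[[Row], Optional[Row]]  # hypothetical operator shape: return None to drop a row


def hash_partition(rows: List[Row], key: str, num_workers: int) -> List[List[Row]]:
    """Route each row to exactly one partition by hashing its key column."""
    partitions: List[List[Row]] = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[hash(row[key]) % num_workers].append(row)
    return partitions


def run_partition(op: RowOp, partition: List[Row]) -> List[Row]:
    """Apply the operator to every row of one partition, dropping None results."""
    results: List[Row] = []
    for row in partition:
        out = op(row)
        if out is not None:
            results.append(out)
    return results


def run_parallel(rows: List[Row], op: RowOp, key: str, num_workers: int) -> List[Row]:
    """Partition the input, process partitions concurrently, then merge the outputs.

    Threads only simulate workers here; a real distributed engine would run
    each partition in a separate process or on a separate machine.
    """
    partitions = hash_partition(rows, key, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_worker = pool.map(lambda part: run_partition(op, part), partitions)
        return [row for part in per_worker for row in part]


def keep_high_scores(row: Row) -> Optional[Row]:
    """Example operator (a filter): keep rows whose score exceeds 50."""
    return row if row["score"] > 50 else None


rows = [{"id": i, "score": i * 10} for i in range(10)]
print(sorted(r["id"] for r in run_parallel(rows, keep_high_scores, "id", 3)))  # [6, 7, 8, 9]
```

Output order across partitions is not guaranteed in general, so the example sorts before printing. A filter needs no coordination between partitions, which is why it parallelizes this easily. Operators such as joins or aggregations additionally require that matching keys land in the same partition.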

---

## 🧩 Core Concepts

To use Texera effectively, familiarize yourself with these foundational terms:

* **Operators:** The fundamental building blocks of a workflow. Each operator performs a single operation, such as filtering data, joining tables, training a machine learning model, or running a custom Python script. Operators expose input and output ports through which data flows from one operator to the next.
* **Workflows:** Directed acyclic graphs (DAGs) of operators connected by links. Each workflow represents an end-to-end data pipeline.
* **Datasets:** Structured or semi-structured data sources that are uploaded to or generated by Texera. You can drag datasets directly into a workflow to start processing them.
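The following is a minimal, hypothetical model that makes the operator, port, and workflow vocabulary concrete. It is not Texera's internal representation. In this model, operators declare how many input and output ports they have, and each link connects an output port to an input port. A topological sort (Kahn's algorithm) yields a valid execution order and rejects any set of links that forms a cycle, which is exactly the property that makes a workflow a DAG. The class and operator names are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Operator:
    """A node in the workflow graph with a fixed number of ports (hypothetical model)."""
    op_id: str
    num_inputs: int
    num_outputs: int


@dataclass(frozen=True)
class Link:
    """An edge from one operator's output port to another operator's input port."""
    src: str
    src_port: int
    dst: str
    dst_port: int


@dataclass
class Workflow:
    operators: Dict[str, Operator] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add_operator(self, op: Operator) -> None:
        self.operators[op.op_id] = op

    def connect(self, src: str, src_port: int, dst: str, dst_port: int) -> None:
        # Reject links that reference ports the operators do not declare.
        if not 0 <= src_port < self.operators[src].num_outputs:
            raise ValueError(f"{src} has no output port {src_port}")
        if not 0 <= dst_port < self.operators[dst].num_inputs:
            raise ValueError(f"{dst} has no input port {dst_port}")
        self.links.append(Link(src, src_port, dst, dst_port))

    def topological_order(self) -> List[str]:
        """Kahn's algorithm: return an execution order, or raise if links form a cycle."""
        indegree = {op_id: 0 for op_id in self.operators}
        downstream: Dict[str, List[str]] = {op_id: [] for op_id in self.operators}
        for link in self.links:
            indegree[link.dst] += 1
            downstream[link.src].append(link.dst)
        ready = deque(op_id for op_id, deg in indegree.items() if deg == 0)
        order: List[str] = []
        while ready:
            op_id = ready.popleft()
            order.append(op_id)
            for nxt in downstream[op_id]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.operators):
            raise ValueError("links form a cycle; a workflow must be a DAG")
        return order


# Example: two sources are joined, filtered, and written to a sink.
wf = Workflow()
for op in [Operator("scan_a", 0, 1), Operator("scan_b", 0, 1),
           Operator("join", 2, 1), Operator("filter", 1, 1), Operator("sink", 1, 0)]:
    wf.add_operator(op)
wf.connect("scan_a", 0, "join", 0)
wf.connect("scan_b", 0, "join", 1)
wf.connect("join", 0, "filter", 0)
wf.connect("filter", 0, "sink", 0)
print(wf.topological_order())  # ['scan_a', 'scan_b', 'join', 'filter', 'sink']
```

A join is modeled here as an operator with two input ports. This is why links address ports rather than just operators: the engine needs to know which input is which side of the join.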

---

## 🎯 Use Cases & Target Audience

Texera bridges the gap between users with different levels of technical proficiency, which makes it well suited to collaborative teams:

* **Data Scientists:** Quickly prototype data transformations, run machine learning algorithms, and visualize outputs without manually managing Spark or Kubernetes configurations.
* **Domain Experts & Analysts:** Use pre-built advanced analytics operators through an easy-to-learn visual interface, skipping the complex coding traditionally required for big data tasks.
* **Software Engineers:** Iterate rapidly and contribute back to the system by writing native operators in Java/Scala or by injecting custom Python UDFs (user-defined functions) directly into the execution graph.
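The sketch below illustrates the general shape of a per-tuple Python UDF. The base class `TupleUDF`, the `process_tuple` method, and the `run_udf` driver are hypothetical names. They are modeled loosely on a per-tuple processing hook, and Texera's actual Python UDF interface may use different classes and signatures. The `run_udf` function stands in for the engine, which would normally feed tuples to the UDF.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


class TupleUDF(ABC):
    """Hypothetical UDF base class: called once per input tuple, may yield 0..n outputs."""

    @abstractmethod
    def process_tuple(self, tuple_: Record, port: int) -> Iterator[Record]:
        ...


class SplitWords(TupleUDF):
    """Example UDF: emit one output tuple per lowercase word in the 'text' field."""

    def process_tuple(self, tuple_: Record, port: int) -> Iterator[Record]:
        for word in str(tuple_["text"]).split():
            yield {"doc_id": tuple_["doc_id"], "word": word.lower()}


def run_udf(udf: TupleUDF, tuples: Iterable[Record], port: int = 0) -> List[Record]:
    """Stand-in for the engine: feed every tuple through the UDF and collect the outputs."""
    return [out for t in tuples for out in udf.process_tuple(t, port)]


docs = [{"doc_id": 1, "text": "Hello Texera"}, {"doc_id": 2, "text": ""}]
print(run_udf(SplitWords(), docs))
# [{'doc_id': 1, 'word': 'hello'}, {'doc_id': 1, 'word': 'texera'}]
```

Making the hook a generator lets one input tuple produce zero, one, or many outputs. A single interface can therefore express filters, maps, and flat-maps. The second document above, for example, produces no output at all.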

> Texera lets you take data pipelines from prototype to production within a single system.