docs/overview.md

title: Overview
description: High-level overview of the Texera architecture, core concepts, and use cases.
weight: 10

{{% pageinfo %}} Texera is an open-source system that supports collaborative data science at scale using Web-based workflows. {{% /pageinfo %}}

Texera combines backend dataflow execution with a drag-and-drop web interface. Users can build, execute, and share complex data workflows across teams without managing the underlying computing infrastructure.

🏗️ Architecture: How it Works

At its core, Texera connects a browser-based frontend to a scalable, distributed computing backend.

Web-Based Interface (Frontend): A rich GUI running directly in your browser. It allows users to construct data processing pipelines by dragging and dropping blocks on a canvas. No installation is required on client machines.
Distributed Engine (Backend): When a workflow is submitted, the Texera engine compiles the graphical representation into an optimized, distributed execution plan. It then spins up computing units to process massive datasets in parallel.
Storage Integration: Texera integrates with data lake and object storage technologies (such as lakeFS and MinIO) to keep a persistent log of runs and store datasets securely.
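
To make the idea of processing data in parallel more concrete, here is a minimal sketch in Python. It is not Texera's engine: the `partition` and `run_filter_in_parallel` helpers are hypothetical, and a local thread pool stands in for the computing units described above. It only illustrates the general pattern of splitting the input, running the same operation on each partition, and merging the results.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_workers):
    """Split rows round-robin into num_workers partitions."""
    return [rows[i::num_workers] for i in range(num_workers)]


def run_filter_in_parallel(rows, predicate, num_workers=4):
    """Apply the same filter to every partition on its own worker, then merge."""
    parts = partition(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker (a stand-in for a computing unit) filters one partition.
        filtered = pool.map(lambda part: [r for r in part if predicate(r)], parts)
    return [row for part in filtered for row in part]


# Illustrative data only.
rows = [{"id": i, "value": i * 10} for i in range(10)]
result = run_filter_in_parallel(rows, lambda r: r["value"] >= 50, num_workers=3)
print(sorted(r["id"] for r in result))  # → [5, 6, 7, 8, 9]
```

Because rows are spread across partitions, the merged output is not in input order, which is why the example sorts before printing.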

🧩 Core Concepts

To use Texera effectively, familiarize yourself with these foundational terms:

Operators: The fundamental building blocks of a workflow. Each operator represents a single operation, such as filtering data, joining tables, training a machine learning model, or running a custom Python script. Operators have input and output ports through which data flows from one operator to the next.
Workflows: A directed acyclic graph (DAG) of linked operators. A workflow represents an end-to-end data pipeline.
Datasets: Structured or semi-structured data sources uploaded to or generated by Texera. You can drag datasets directly into your workflow to begin processing them.
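
The relationship between operators and workflows can be sketched as a small graph. The example below is an illustration only: the operator names and the dictionary layout are assumptions, not Texera's internal workflow format. It uses Python's standard `graphlib` module to show why the acyclic property matters: a DAG always has a valid execution order, while a graph with a cycle does not.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each operator maps to the set of operators it reads from.
workflow = {
    "scan_tweets": set(),
    "scan_users": set(),
    "filter_english": {"scan_tweets"},
    "join_on_user_id": {"filter_english", "scan_users"},
    "bar_chart": {"join_on_user_id"},
}


def execution_order(dag):
    """Return one order in which every operator runs after its inputs."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if the graph is a valid DAG, False if it contains a cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(workflow)
print(order)
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # → False
```

Several orders can be valid for the same workflow (for example, the two scan operators can run in either order), so the printed list is one possibility rather than the only one.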

🎯 Use Cases & Target Audience

Texera is designed for teams whose members have different levels of technical expertise and need to collaborate on the same workflows:

Data Scientists: Quickly prototype data transformations, run machine learning algorithms, and visualize outputs without having to manage Spark or Kubernetes configurations manually.
Domain Experts & Analysts: Use pre-built advanced analytics operators through an easy-to-learn visual interface, skipping the complex coding traditionally required for big data tasks.
Software Engineers: Iterate rapidly and contribute back to the system by writing modular operators in Java or Scala, or by injecting custom Python UDFs (user-defined functions) directly into the execution graph.
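
As a rough illustration of what a Python UDF does conceptually, the sketch below processes one row at a time and yields zero or more output rows. It deliberately does not use Texera's actual UDF API: `Row`, `add_word_count`, and `apply_udf` are hypothetical names chosen for this example, and `apply_udf` is only a stand-in for the engine feeding rows to the function.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]


def add_word_count(row: Row) -> Iterator[Row]:
    """Hypothetical UDF: emit the input row with an extra word_count field."""
    text = str(row.get("text", ""))
    yield {**row, "word_count": len(text.split())}


def apply_udf(udf: Callable[[Row], Iterator[Row]], rows: Iterable[Row]) -> Iterator[Row]:
    """Stand-in for the engine: pass each row to the UDF and forward its output."""
    for row in rows:
        yield from udf(row)


# Illustrative data only.
rows = [{"id": 1, "text": "hello collaborative world"}, {"id": 2, "text": ""}]
out = list(apply_udf(add_word_count, rows))
print(out)
```

Writing the UDF as a generator lets a single input row produce no output (a filter), one output (a map), or several outputs (a flat map) without changing the surrounding code.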

Texera lets you move data pipelines from prototype to production within the same environment.