title: Overview
description: High-level overview of the Texera architecture, core concepts, and use cases.
weight: 10

{{% pageinfo %}} Texera is an open-source system that supports collaborative data science at scale using Web-based workflows. {{% /pageinfo %}}

Texera combines a backend dataflow execution engine with a drag-and-drop web interface. Users can build, execute, and share complex data workflows across teams without managing the underlying computing infrastructure.


🏗️ Architecture: How it Works

At its core, Texera acts as a bridge between a highly accessible frontend and a scalable distributed computing backend.

  1. Web-Based Interface (Frontend): A rich GUI running directly in your browser. It allows users to construct data processing pipelines by dragging and dropping blocks on a canvas. No installation is required on client machines.
  2. Distributed Engine (Backend): When a workflow is submitted, the Texera engine compiles the graphical representation into an optimized, distributed execution plan. It then spins up computing units to process massive datasets in parallel.
  3. Storage Integration: Texera integrates with data lake and object storage technologies (such as lakeFS and MinIO) to persist run logs and store datasets securely.
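
The parallel processing described in the Distributed Engine step can be pictured with a minimal sketch. This is not Texera's engine code: the `partition`, `filter_partition`, and `run_parallel` helpers are hypothetical, and a local thread pool stands in for distributed computing units.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    """Split rows into round-robin partitions (hypothetical helper)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def filter_partition(rows, min_amount):
    """A stand-in 'filter' operator applied to a single partition."""
    return [row for row in rows if row["amount"] >= min_amount]


def run_parallel(rows, min_amount, num_workers=4):
    """Run the filter on every partition concurrently, then merge the results."""
    parts = partition(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda part: filter_partition(part, min_amount), parts))
    return [row for part in results for row in part]


rows = [{"id": i, "amount": i * 10} for i in range(10)]
kept = run_parallel(rows, min_amount=50)
print(sorted(row["id"] for row in kept))  # [5, 6, 7, 8, 9]
```

Each partition is processed independently, so adding workers scales the same operator logic without changing it. A real distributed engine would also handle scheduling, data shuffling, and failures, which this sketch omits.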

🧩 Core Concepts

To use Texera effectively, familiarize yourself with these foundational terms:

  • Operators: The fundamental building blocks of a workflow. Each operator represents a single operation, such as filtering data, joining tables, training a machine learning model, or running a custom Python script. Operators have input and output ports through which data flows from one operator to the next.
  • Workflows: A workflow is a directed acyclic graph (DAG) of linked operators that represents an end-to-end data pipeline.
  • Datasets: Structured or semi-structured data sources uploaded to or generated by Texera. You can drag datasets directly into your workflow to begin processing them.
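
To make the relationship between operators and workflows concrete, here is a small sketch of a workflow modeled as a DAG. The `Operator` and `Workflow` classes are hypothetical and do not mirror Texera's internal data model; the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Operator:
    """One workflow step; the port counts are illustrative only."""
    name: str
    input_ports: int = 1
    output_ports: int = 1


@dataclass
class Workflow:
    operators: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source name, target name) pairs

    def add(self, op):
        self.operators[op.name] = op
        return self

    def link(self, source, target):
        self.links.append((source, target))
        return self

    def execution_order(self):
        """Return operator names in dependency order; raise CycleError on a cycle."""
        sorter = TopologicalSorter({name: set() for name in self.operators})
        for source, target in self.links:
            sorter.add(target, source)  # target depends on source
        return list(sorter.static_order())


wf = (
    Workflow()
    .add(Operator("scan", input_ports=0))
    .add(Operator("filter"))
    .add(Operator("sink", output_ports=0))
    .link("scan", "filter")
    .link("filter", "sink")
)
print(wf.execution_order())  # ['scan', 'filter', 'sink']
```

Because a workflow must be acyclic, linking `sink` back to `scan` would make `execution_order()` raise `CycleError` instead of returning an order.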

🎯 Use Cases & Target Audience

Texera serves users with different levels of technical proficiency, which makes it well suited to team collaboration:

  • Data Scientists: Quickly prototype data transformations, run machine learning algorithms, and visualize outputs without having to manage Spark or Kubernetes configurations manually.
  • Domain Experts & Analysts: Use pre-built advanced analytics operators through a visual interface, skipping the complex coding traditionally required for Big Data tasks.
  • Software Engineers: Iterate quickly and contribute back to the system by writing modular operators in Java or Scala, or by injecting custom Python UDFs (user-defined functions) directly into the execution graph.
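
One way to picture a per-record Python UDF is as a function that receives one record and yields zero or more output records. The sketch below is plain Python and does not use Texera's UDF API; `add_length_field` and `apply_udf` are hypothetical names, and `apply_udf` merely stands in for the engine that would drive the UDF.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]


def add_length_field(record: Record) -> Iterator[Record]:
    """Hypothetical UDF: drop records with empty text, add a 'length' field to the rest."""
    text = str(record.get("text", ""))
    if not text:
        return  # yielding nothing filters the record out
    yield {**record, "length": len(text)}


def apply_udf(
    records: Iterable[Record], udf: Callable[[Record], Iterator[Record]]
) -> Iterator[Record]:
    """Stand-in for the engine: pass each record through the UDF and stream the results."""
    for record in records:
        yield from udf(record)


rows = [{"text": "hello"}, {"text": ""}, {"text": "Texera"}]
out = list(apply_udf(rows, add_length_field))
print(out)  # [{'text': 'hello', 'length': 5}, {'text': 'Texera', 'length': 6}]
```

Writing the UDF as a generator lets a single function both transform and filter records, which is why the empty-text record produces no output.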

Texera enables you to move from prototype to production data pipelines seamlessly.