---
title: Overview
description: High-level overview of the Texera architecture, core concepts, and use cases.
weight: 10
---
{{% pageinfo %}} Texera is an open-source system that supports collaborative data science at scale using Web-based workflows. {{% /pageinfo %}}
Texera combines a scalable backend dataflow engine with an intuitive, drag-and-drop web interface. Teams can build, execute, and share complex data workflows without managing the underlying computing infrastructure.
🏗️ Architecture: How it Works
At its core, Texera connects an accessible, browser-based frontend to a scalable, distributed computing backend:
- Web-Based Interface (Frontend): A rich GUI running directly in your browser. It allows users to construct data processing pipelines by dragging and dropping blocks on a canvas. No installation is required on client machines.
- Distributed Engine (Backend): When a workflow is submitted, the Texera engine compiles the graphical representation into an optimized, distributed execution plan. It then spins up computing units to process massive datasets in parallel.
- Storage Integration: Texera integrates with data lake and object storage technologies (such as lakeFS and MinIO) to record workflow runs persistently and store datasets securely.
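
The compile step described in the list above can be pictured as turning the graph drawn on the canvas into an ordered set of stages. The following is a minimal sketch in plain Python, not Texera's actual compiler: the operator names and the `plan_stages` helper are hypothetical. It uses the standard library's `graphlib` module to group operators into stages whose members do not depend on each other and could therefore run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each operator maps to the set of operators it reads from.
workflow = {
    "scan_orders": set(),
    "scan_customers": set(),
    "filter_orders": {"scan_orders"},
    "join": {"filter_orders", "scan_customers"},
    "visualize": {"join"},
}


def plan_stages(dependencies):
    """Group operators into stages; operators within a stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep the output stable
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = plan_stages(workflow)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

In this sketch the two scans land in the first stage because neither has an upstream dependency, while the join has to wait for both the filter and the customer scan to finish.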
🧩 Core Concepts
To use Texera effectively, familiarize yourself with these foundational terms:
- Operators: The fundamental building blocks of a workflow. Each operator represents a single operation, such as filtering data, joining tables, training a machine learning model, or running a custom Python script. Operators have input and output ports through which data flows from one operator to the next.
- Workflows: A directed acyclic graph (DAG) of linked operators. A workflow represents a complete, end-to-end data pipeline.
- Datasets: Structured or semi-structured data sources uploaded to or generated by Texera. You can drag datasets directly into your workflow to begin processing them.
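
To make these terms concrete, here is a small sketch of operators linked into a workflow. It is an illustration in plain Python rather than Texera's internal data model: the `Operator` class, the `execute` function, and the sample rows are all assumptions made for this example. Each operator receives one list of rows per input port, and the two source operators stand in for datasets.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Operator:
    """A workflow node: takes one list of rows per input port, returns output rows."""
    name: str
    run: Callable[[List[List[Row]]], List[Row]]
    # One upstream operator per input port, in port order.
    inputs: List["Operator"] = field(default_factory=list)


def execute(op: Operator, cache: Dict[str, List[Row]]) -> List[Row]:
    """Evaluate an operator after its upstream operators, reusing cached results."""
    if op.name not in cache:
        port_data = [execute(upstream, cache) for upstream in op.inputs]
        cache[op.name] = op.run(port_data)
    return cache[op.name]


# Source operators play the role of datasets (hypothetical sample data).
orders = Operator("orders", lambda _: [
    {"order_id": 1, "customer_id": 10, "total": 250},
    {"order_id": 2, "customer_id": 11, "total": 40},
    {"order_id": 3, "customer_id": 10, "total": 90},
])
customers = Operator("customers", lambda _: [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Lin"},
])

# A filter operator with a single input port.
large_orders = Operator(
    "large_orders",
    lambda ports: [row for row in ports[0] if row["total"] >= 50],
    inputs=[orders],
)


def join_on_customer(ports: List[List[Row]]) -> List[Row]:
    """Join port 0 (orders) with port 1 (customers) on customer_id."""
    names = {c["customer_id"]: c["name"] for c in ports[1]}
    return [
        {**o, "name": names[o["customer_id"]]}
        for o in ports[0]
        if o["customer_id"] in names
    ]


joined = Operator("joined", join_on_customer, inputs=[large_orders, customers])

result = execute(joined, {})
for row in result:
    print(row)
```

Because every operator only refers to operators upstream of it, the links form a DAG, and evaluating the final operator pulls data through the whole pipeline.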
🎯 Use Cases & Target Audience
Texera lets people with different levels of technical expertise collaborate on the same workflows:
- Data Scientists: Quickly prototype data transformations, run machine learning algorithms, and visualize outputs without having to manage Spark or Kubernetes configurations manually.
- Domain Experts & Analysts: Use pre-built advanced analytics operators through an easy-to-learn visual interface, without the extensive coding traditionally required for big data tasks.
- Software Engineers: Iterate quickly and contribute back to the system by writing modular Java/Scala code, or by injecting custom Python UDFs (user-defined functions) directly into the execution graph.
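
A per-tuple Python UDF of the kind mentioned above can be sketched as a generator that receives one tuple at a time and yields zero or more output tuples. The snippet below is a standalone illustration in plain Python. The class name, the `process_tuple` method, the `apply_udf` helper, and the field names are assumptions made for this example and are not taken from Texera's UDF API, so consult the Texera UDF documentation for the real interface.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


class SentimentFlagUdf:
    """Hypothetical UDF: flags negative reviews and drops empty ones."""

    NEGATIVE_WORDS = {"bad", "slow", "broken"}

    def process_tuple(self, row: Row) -> Iterator[Row]:
        text = str(row.get("review", "")).strip().lower()
        if not text:
            return  # yield nothing: the tuple is filtered out
        words = text.split()
        is_negative = any(word in words for word in self.NEGATIVE_WORDS)
        yield {**row, "negative": is_negative}


def apply_udf(udf: SentimentFlagUdf, rows: Iterable[Row]) -> Iterator[Row]:
    """Stream rows through the UDF one tuple at a time."""
    for row in rows:
        yield from udf.process_tuple(row)


reviews = [
    {"id": 1, "review": "Fast and reliable"},
    {"id": 2, "review": ""},
    {"id": 3, "review": "Checkout is broken"},
]
flagged = list(apply_udf(SentimentFlagUdf(), reviews))
for row in flagged:
    print(row)
```

Writing the UDF as a generator lets a single function act as a map, a filter, or a one-to-many expansion, depending on how many tuples it yields per input.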
Texera lets you take data pipelines from prototype to production.