| <h1 align="center">Texera - Collaborative Data Science and AI/ML Using Workflows</h1> |
| |
| <p align="center"> |
| <a href="https://texera.io"> <img src="core/gui/src/assets/logos/full_logo_small.png" alt="texera-logo" width="192px" height="109px"/> </a> |
| <br> |
| <i>Texera supports scalable data computation and enables advanced AI/ML techniques.</i> |
| <br> |
| <i>"Collaboration" is a key focus, and we enable an experience similar to Google Docs, but for data science. </i> |
| <br> |
| |
| <h4 align="center"> |
| <a href="https://texera.io">Official Site</a> |
| | |
| <a href="https://texera.io/publications/">Publications</a> |
| | |
| <a href="https://texera.io/category/video/">Video</a> |
| | |
| <a href="https://texera.io/category/blog/">Blog</a> |
| | |
| <a href="https://github.com/Texera/texera/wiki/Getting-Started">Getting Started</a> |
| <br> |
| </h4> |
| |
| </p> |
| <p align="center"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Users-332-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Projects-86-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Workflows-2,481-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Executions-51K-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Workflow_Versions-357K-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Deployments-7-blue"> |
| <img alt="Static Badge" src="https://img.shields.io/badge/Largest_Deployment-100_nodes,_400_cores-green"> |
| </p> |
| |
| # Goals |
| |
| * Provide data science as cloud services; |
| * Provide a browser-based GUI to form a workflow without writing code; |
| * Allow non-IT people to access data science; |
| * Support collaborative data science; |
| * Allow users to interact with the execution of a job; |
| * Support huge volumes of data efficiently. |
| |
| # Workflow GUI |
| The Texera interface supports real-time collaboration on data science projects. Users can share data and workflows, access AI/ML techniques, and manage public and private resources. |
| The workflow in the use case shown below includes data cleaning, ML model training, and validation. |
|  |
| |
| # Publications (Computer Science) |
| * (5/2025) **Responsive Retrieval of Consistent States in Pipelined Executions of Dataflows** |
| Shengquan Ni and Chen Li |
| _To appear in HILDA Workshop at SIGMOD 2025_ |
| * (11/2024) **IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems** |
| Shengquan Ni, Yicong Huang, Zuozhi Wang, and Chen Li |
| _To appear in VLDB 2025_ |
| * (8/2024) **Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs** |
| Xiaozhen Liu, Yicong Huang, Xinyuan Lin, Avinash Kumar, Sadeem Alsudais, and Chen Li |
| _To appear in SIGMOD 2025_ |
| * (7/2024) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** |
| Zuozhi Wang, Yicong Huang, Shengquan Ni, Avinash Kumar, Sadeem Alsudais, Xiaozhen Liu, Xinyuan Lin, Yunyan Ding, and Chen Li |
| _In VLDB 2024, Scalable Data Science track_ | [PDF](https://www.vldb.org/pvldb/vol17/p3580-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2024-texera-presentation.pdf) |
| * (3/2024) **Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows** |
| Yicong Huang, Zuozhi Wang, and Chen Li |
| _In SIGMOD 2024 **Best Demo Runner-Up Award🏆**_ | [PDF](https://dl.acm.org/doi/10.1145/3626246.3654756) |
| * (2/2024) **Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly** |
| Alexander K Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, and Chen Li |
| _In DataPlat Workshop at ICDE 2024_ | [PDF](https://ieeexplore.ieee.org/abstract/document/10555112) | [Slides](https://chenli.ics.uci.edu/files/icde2024-dataplat-workshop.pdf) |
| <details> |
| <summary>Expand All</summary> |
| |
| * (8/2023) **Building a Collaborative Data Analytics System: Opportunities and Challenges** |
| Zuozhi Wang and Chen Li |
| _In Tutorial at VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p3898-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-texera-tutorial.pdf) |
| * (8/2023) **Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control** |
| Yicong Huang, Zuozhi Wang, and Chen Li |
| _In SIGMOD 2024_ | [PDF](https://dl.acm.org/doi/10.1145/3626712) | [Slides](https://chenli.ics.uci.edu/files/sigmod2024-udon-presentation.pdf) |
| * (8/2023) **Improving Iterative Analytics in GUI-Based Data-Processing Systems with Visualization, Version Control, and Result Reuse** |
| Sadeem Alsudais Ph.D. Thesis | [PDF](https://sadeemsaleh.github.io/Sadeem_phd_thesis.pdf) |
| * (7/2023) **Using Texera to Characterize Climate Change Discussions on Twitter During Wildfires** |
| Shengquan Ni, Yicong Huang, Jessie W. Y. Ko, Alexander Taylor, Xiusi Chen, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Suellen Hopfer, and Chen Li |
| _In Data Science Day at KDD 2023_ |
| * (7/2023) **Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions** |
| Sadeem Alsudais, Avinash Kumar, and Chen Li |
| _In HILDA Workshop at SIGMOD 2023_ | [PDF](https://dl.acm.org/doi/10.1145/3597465.3605219) |
| * (6/2023) **Texera: A System for Collaborative and Interactive Data Analytics Using Workflows** |
| Zuozhi Wang Ph.D. Thesis | [PDF](https://zuozhiw.github.io/Zuozhi_Wang_UCI_PhD_Thesis.pdf) |
| * (12/2022) **Towards Interactive, Adaptive and Result-aware Big Data Analytics** |
| Avinash Kumar Ph.D. Thesis | [PDF](https://arxiv.org/abs/2212.07096) |
| * (9/2022) **Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees** |
| Zuozhi Wang, Shengquan Ni, Avinash Kumar, and Chen Li |
| _In VLDB 2023_ | [PDF](https://www.vldb.org/pvldb/vol16/p256-wang.pdf) | [Slides](https://chenli.ics.uci.edu/files/vldb2023-fries.pdf) |
| * (7/2022) **Drove: Tracking Execution Results of Workflows on Large Datasets** |
| Sadeem Alsudais |
| _In the Ph.D. Workshop at VLDB 2022_ | [PDF](http://ceur-ws.org/Vol-3186/paper_10.pdf) |
| * (6/2022) **Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models** |
| Zhihui Yang, Yicong Huang, Zuozhi Wang, Feng Gao, Yao Lu, Chen Li, and X. Sean Wang |
| _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3734-yang.pdf) |
| * (6/2022) **Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera** |
| Xiaozhen Liu, Zuozhi Wang, Shengquan Ni, Sadeem Alsudais, Yicong Huang, Avinash Kumar, and Chen Li |
| _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p3738-liu.pdf) | [Demo Video](https://youtu.be/2gfPUZNsoBs) |
| * (4/2022) **Optimizing Machine Learning Inference Queries with Correlative Proxy Models** |
| Zhihui Yang, Zuozhi Wang, Yicong Huang, Yao Lu, Chen Li, and X. Sean Wang |
| _In VLDB 2022_ | [PDF](https://www.vldb.org/pvldb/vol15/p2032-yang.pdf) |
| * (7/2020) **Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera** |
| Zuozhi Wang, Avinash Kumar, Shengquan Ni, and Chen Li |
| _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p2953-wang.pdf) | [Video](https://www.youtube.com/watch?v=SP-XiDADbw0) | [Slides](https://docs.google.com/presentation/d/14U6RPZfeb8Ho0aO2HsCSc8lRs6ul6AxEIm5gpjeVUYA/edit?usp=sharing) |
| * (1/2020) **Amber: A Debuggable Dataflow system based on the Actor Model** |
| Avinash Kumar, Zuozhi Wang, Shengquan Ni, and Chen Li |
| _In VLDB 2020_ | [PDF](http://www.vldb.org/pvldb/vol13/p740-kumar.pdf) | [Video](https://www.youtube.com/watch?v=T5ShFRfHmgI) | [Slides](https://docs.google.com/presentation/d/1v8G9lDmfv4Ff2YWyrGfo_9iMQVF4N8a-4gO4H-K6rCk/edit?usp=sharing) |
| * (4/2017) **A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets** |
| Zuozhi Wang, Flavio Bayer, Seungjin Lee, Kishore Narendran, Xuxi Pan, Qing Tang, Jimmy Wang, and Chen Li |
| _In ICDE 2017_ **Best Demo Award** | [PDF](https://chenli.ics.uci.edu/files/icde2017-textdb-demo.pdf) | [Video](https://github.com/Texera/texera/wiki/Video) |
| |
| </details> |
| |
| # Publications (Interdisciplinary) |
| * (2/2025) **DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service** |
| Jiadong Bai, Xiaozhen Liu, Anthony Cuturrufo, Alexander Kundu Taylor, Jeehyun Hwang, Mingyu Derek Ma, Xinyuan Lin, Yanqiao Zhu, Yicong Huang, Yunyan Ding, Wei Wang, and Chen Li |
| _To appear in [Data Science Education K-12: Research to Practice Annual Conference 2025](https://web.cvent.com/event/d641bd9f-6c99-4cbc-951b-33b1ca05d4ed/summary)_ |
| * (7/2024) **Brain Image Data Processing Using Collaborative Data Workflows on Texera** |
| Yunyan Ding, Yicong Huang, Pan Gao, Andy Thai, Atchuth Naveen Chilaparasetti, M. Gopi, Xiangmin Xu, and Chen Li |
| _In Frontiers in Neural Circuits_ | [PDF](https://doi.org/10.3389/fncir.2024.1398884) |
| * (1/2024) **Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets** |
| Judith Borghouts, Yicong Huang, Suellen Hopfer, Chen Li, and Gloria Mark |
| _In TOCHI 2024_ | [PDF](https://dl.acm.org/doi/pdf/10.1145/3637876) |
| * (1/2024) **How the Experience of California Wildfires Shape Twitter Climate Change Framings** |
| Jessie W. Y. Ko, Shengquan Ni, Alexander Taylor, Xiusi Chen, Yicong Huang, Avinash Kumar, Sadeem Alsudais, Zuozhi Wang, Xiaozhen Liu, Wei Wang, Chen Li, and Suellen Hopfer |
| _In Climatic Change 2024_ | [PDF](https://link.springer.com/content/pdf/10.1007/s10584-023-03668-0.pdf) |
| * (11/2023) **The Marketing and Perceptions of Non-Tobacco Blunt Wraps on Twitter** |
| Joshua U. Rhee, Yicong Huang, Aurash J. Soroosh, Sadeem Alsudais, Shengquan Ni, Avinash Kumar, Jacob Paredes, Chen Li, and David S. Timberlake |
| _In Substance Use & Misuse 2023_ | [PDF](https://www.tandfonline.com/doi/epdf/10.1080/10826084.2023.2280572?needAccess=true) |
| |
| <details> |
| <summary>Expand All</summary> |
| |
| * (3/2023) **Understanding Underlying Moral Values and Language Use of COVID-19 Vaccine Attitudes on Twitter** |
| Judith Borghouts, Yicong Huang, Sydney Gibbs, Suellen Hopfer, Chen Li, and Gloria Mark |
| _In PNAS Nexus 2023_ | [PDF](https://academic.oup.com/pnasnexus/article-pdf/2/3/pgad013/49435858/pgad013.pdf) |
| * (10/2022) **Public Opinions Toward COVID-19 Vaccine Mandates: A Machine Learning-Based Analysis of U.S. Tweets** |
| Yawen Guo, Jun Zhu, Yicong Huang, Lu He, Changyang He, Chen Li, and Kai Zheng |
| _In AMIA 2022_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148373/pdf/1066.pdf) |
| * (9/2021) **The Social Amplification and Attenuation of COVID-19 Risk Perception Shaping Mask-Wearing Behavior: A Longitudinal Twitter Analysis** |
| Suellen Hopfer, Emilia J. Fields, Yuwen Lu, Ganesh Ramakrishnan, Ted Grover, Qiushi Bai, Yicong Huang, Chen Li, and Gloria Mark |
| _In PLOS ONE 2021_ | [PDF](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0257428) |
| * (4/2021) **Why Do People Oppose Mask Wearing? A Comprehensive Analysis of U.S. Tweets During the COVID-19 Pandemic** |
| Lu He, Changyang He, Tera Leigh Reynolds, Qiushi Bai, Yicong Huang, Chen Li, Kai Zheng, and Yunan Chen |
| _In JAMIA 2021_ | [PDF](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7989302/pdf/ocab047.pdf) |
| </details> |
| |
| # Getting Started |
| |
| * For users, visit [Guide to Use Texera](https://github.com/Texera/texera/wiki/Getting-Started). |
| * For developers, visit [Guide to Develop Texera](https://github.com/Texera/texera/wiki/Guide-for-Developers). |
| |
| Texera was formerly known as "TextDB" before August 28, 2017. |
| |
| # Acknowledgements |
| |
| This project is supported by the <a href="http://www.nsf.gov">National Science Foundation</a> under awards [IIS-1745673](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1745673) and [IIS-2107150](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2107150), and by AWS Research Credits and Google Cloud Platform Education Programs. |
| |
| * <a href="https://www.niddk.nih.gov/"><img src="https://github.com/Texera/texera/assets/17627829/d279897a-3efb-41c1-b2d3-8fd20c800ad7" alt="NIH NIDDK" height="30"/></a> This project is supported by an <a href="https://reporter.nih.gov/project-details/10818244">NIH NIDDK</a> award. |
| |
| |
| * <a href="http://www.yourkit.com"><img src="https://www.yourkit.com/images/yklogo.png" alt="YourKit" height="30"/></a> [YourKit](https://www.yourkit.com/) has provided an open-source license for its profiler to this project. |
| |
| # Citation |
| Please cite Texera as: |
| ``` |
| |
| @article{DBLP:journals/pvldb/WangHNKALLDL24, |
| author = {Zuozhi Wang and |
| Yicong Huang and |
| Shengquan Ni and |
| Avinash Kumar and |
| Sadeem Alsudais and |
| Xiaozhen Liu and |
| Xinyuan Lin and |
| Yunyan Ding and |
| Chen Li}, |
| title = {Texera: {A} System for Collaborative and Interactive Data Analytics |
| Using Workflows}, |
| journal = {Proc. {VLDB} Endow.}, |
| volume = {17}, |
| number = {11}, |
| pages = {3580--3588}, |
| year = {2024}, |
| url = {https://www.vldb.org/pvldb/vol17/p3580-wang.pdf}, |
| timestamp = {Thu, 19 Sep 2024 13:09:37 +0200}, |
| biburl = {https://dblp.org/rec/journals/pvldb/WangHNKALLDL24.bib}, |
| bibsource = {dblp computer science bibliography, https://dblp.org} |
| } |
| ``` |