| --- |
| authors: |
| - lindong: null |
| name: Dong Lin |
| date: "2023-04-19T08:00:00Z" |
| excerpt: The Apache Flink community is excited to announce the release of Flink |
| ML 2.2.0! This release focuses on enriching Flink ML's feature engineering |
| algorithms. The library now includes 33 feature engineering algorithms, making |
| it a more comprehensive library for feature engineering tasks. |
| title: Apache Flink ML 2.2.0 Release Announcement |
| aliases: |
| - /news/2023/04/19/release-ml-2.2.0.html |
| --- |
| |
| The Apache Flink community is excited to announce the release of Flink ML 2.2.0! |
| This release focuses on enriching Flink ML's feature engineering algorithms. The |
| library now includes 33 feature engineering algorithms, making it a more |
| comprehensive library for feature engineering tasks. |
| |
With the addition of these algorithms, we believe the Flink ML library is ready
for use in production jobs that require feature engineering capabilities, whose
output can then be consumed by both offline and online machine learning tasks.
| |
| We encourage you to [download the |
| release](https://flink.apache.org/downloads.html) and share your feedback with |
| the community through the Flink [mailing |
| lists](https://flink.apache.org/community.html#mailing-lists) or |
| [JIRA](https://issues.apache.org/jira/browse/flink)! We hope you like the new |
| release and we’d be eager to learn about your experience with it. |
| |
| # Notable Features |
| |
| ## Introduced API and infrastructure for online serving |
| |
| In machine learning, one of the main goals of model training is to deploy the |
| trained model to perform online inference, where the model server must respond |
| to incoming requests with millisecond-level latency. However, prior releases of |
| Flink ML only supported nearline inference using the Flink runtime, which may |
| not meet the requirements of online inference use-cases. |
| |
| With |
| [FLIP-289](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268), |
| Flink ML now provides an API and infrastructure for users to load a |
| ModelServable from model data generated by an Estimator. This ModelServable can |
| be replicated across multiple model servers to process online inference requests |
in parallel. As the ModelServable is effectively a UDF that does not rely on the
Flink runtime, it can also be integrated into other serving or processing
frameworks as a UDF to serve models trained by Flink ML.
| |
| As a first step, the LogisticRegressionModelServable has been added to serve the |
| logistic regression model online, and more servables will be added in the |
| future. This new feature enables Flink ML to be used for both offline and online |
| machine learning tasks, making it more versatile for a wider range of use cases. |
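
To give a sense of the new API, the sketch below loads a trained logistic
regression model as a servable and runs inference on a small in-memory
`DataFrame`, the lightweight, Flink-free data structure introduced by FLIP-289.
This is a minimal sketch rather than a complete serving program: the model path
is hypothetical, and the exact package and class names, as well as the default
column names, should be verified against the Flink ML 2.2 API documentation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.flink.ml.classification.logisticregression.LogisticRegressionModelServable;
import org.apache.flink.ml.linalg.Vectors;
import org.apache.flink.ml.servable.api.DataFrame;
import org.apache.flink.ml.servable.api.Row;
import org.apache.flink.ml.servable.types.BasicType;
import org.apache.flink.ml.servable.types.DataType;
import org.apache.flink.ml.servable.types.DataTypes;

public class OnlineServingSketch {

    public static void main(String[] args) throws Exception {
        // Load the servable from the (hypothetical) path where the trained
        // LogisticRegressionModel was saved by the training job.
        LogisticRegressionModelServable servable =
                LogisticRegressionModelServable.load("/path/to/saved/model");

        // Build a small in-memory DataFrame with the feature vectors to score.
        // This DataFrame is a plain Java structure (column names, types, rows)
        // and does not depend on the Flink runtime.
        List<String> columnNames = Collections.singletonList("features");
        List<DataType> dataTypes = Collections.singletonList(DataTypes.VECTOR(BasicType.DOUBLE));
        List<Object> row1 = Collections.singletonList(Vectors.dense(1.0, 2.0));
        List<Object> row2 = Collections.singletonList(Vectors.dense(-1.0, -2.0));
        DataFrame input =
                new DataFrame(columnNames, dataTypes, Arrays.asList(new Row(row1), new Row(row2)));

        // Synchronous, single-process inference; this call can run inside any
        // model server or be wrapped as a UDF in another processing framework.
        DataFrame output = servable.transform(input);

        // "prediction" is the default prediction column name; adjust it if the
        // model was configured differently during training.
        int predictionIndex = output.getIndex("prediction");
        for (Row row : output.collect()) {
            System.out.println(row.get(predictionIndex));
        }
    }
}
```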
| |
| ## Added 27 feature engineering algorithms |
| |
| Flink ML 2.2.0 significantly expanded the coverage of feature engineering |
| algorithms, increasing the number from 6 to 33. Flink ML now covers 28 out of |
| the 33 feature engineering algorithms provided in Spark ML, making it a more |
| comprehensive library for feature engineering tasks. |
| |
Feature engineering is a critical step in modern AI infrastructure, as it
preprocesses data not only for traditional machine learning algorithms like GBT
but also for deep learning algorithms and Transformer-based large language
models, which are increasingly popular. With the addition of these algorithms,
we hope Flink ML becomes more useful for the machine learning tasks of Flink
users.
| |
| All feature engineering algorithms can be easily accessed through the drop-down |
| list on the left side of |
| [this](https://nightlies.apache.org/flink/flink-ml-docs-master/docs/operators/feature/binarizer/) |
| Flink ML page. For each algorithm, we have provided Python and Java examples to |
| demonstrate how to use them. |
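
For example, a Java program using the Binarizer Transformer (the algorithm the
linked page opens on) roughly follows the pattern below. The input values here
are made up for illustration, and the runnable examples in the documentation
are the authoritative reference.

```java
import org.apache.flink.ml.feature.binarizer.Binarizer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class BinarizerSketch {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // A toy input table with a single numerical column.
        DataStream<Row> inputStream = env.fromElements(Row.of(0.2), Row.of(1.7), Row.of(3.0));
        Table inputTable = tEnv.fromDataStream(inputStream).as("feature");

        // Binarizer is a Transformer: values greater than the threshold become 1.0,
        // all other values become 0.0.
        Binarizer binarizer =
                new Binarizer()
                        .setInputCols("feature")
                        .setOutputCols("binarized")
                        .setThresholds(1.0);

        // Transformers return an array of output tables; Binarizer produces one.
        Table outputTable = binarizer.transform(inputTable)[0];
        outputTable.execute().print();
    }
}
```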
| |
| ## Added two production-validated online learning algorithms |
| |
| Flink ML offers a significant advantage over other machine learning libraries in |
| terms of its ability to perform online learning using Flink's streaming runtime. |
| To leverage this strength, we implemented two online algorithms in Flink ML and |
| successfully used them in a production machine learning job at Alibaba. |
| |
| This job involves dynamically clustering similar logs and detecting errors in |
| the logs to help site reliability engineers. By using OnlineStandardScaler and |
AgglomerativeClustering to standardize and cluster logs in real time, the job is
| able to update models more frequently with a much simpler infrastructure setup. |
| We presented this work at [Flink Forward Asia](https://flink-forward.org.cn/) |
| last year, and it will soon be integrated into the open-source project |
| [SREWorks](https://github.com/alibaba/SREWorks). |
| |
| With these online algorithms, Flink ML provides users with the ability to |
continuously update models using new data in real time, resulting in more
| accurate and up-to-date predictions. This can be particularly useful in use |
| cases where data is constantly streaming in, and it's important to make quick |
| decisions based on the latest available information. |
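
As a rough sketch of what this pattern looks like with OnlineStandardScaler:
the snippet below uses a toy bounded input and default parameters purely for
illustration, and the column names and package paths should be checked against
the algorithm's documentation page, which also describes the windowing and
model-version options that control how often new model data is emitted.

```java
import org.apache.flink.ml.feature.standardscaler.OnlineStandardScaler;
import org.apache.flink.ml.feature.standardscaler.OnlineStandardScalerModel;
import org.apache.flink.ml.linalg.Vectors;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class OnlineScalingSketch {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // For illustration the input is a small bounded stream; in the production
        // job described above it would be an unbounded stream of log feature
        // vectors, e.g. read from Kafka. "input" is the scaler's default input column.
        DataStream<Row> inputStream =
                env.fromElements(
                        Row.of(Vectors.dense(1.0, 2.0)),
                        Row.of(Vectors.dense(2.0, 4.0)),
                        Row.of(Vectors.dense(3.0, 6.0)));
        Table inputTable = tEnv.fromDataStream(inputStream).as("input");

        // OnlineStandardScaler continuously updates its mean/standard-deviation
        // model data as new records arrive; the windowing parameters determine
        // how often new model versions are emitted.
        OnlineStandardScaler scaler = new OnlineStandardScaler();
        OnlineStandardScalerModel model = scaler.fit(inputTable);

        // transform() standardizes each incoming record with the latest model data.
        Table outputTable = model.transform(inputTable)[0];
        outputTable.execute().print();
    }
}
```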
| |
| # Upgrade Notes |
| |
| This release is fully backward compatible with Flink ML 2.1. Users should be |
| able to upgrade to Flink ML 2.2.0 without worrying about any incompatibilities |
| or breaking changes. |
| |
| # Release Notes and Resources |
| |
| Please take a look at the [release |
| notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351884) |
| for a detailed list of changes and new features. |
| |
| The binary distribution and source artifacts are now available on the updated |
| [Downloads page](https://flink.apache.org/downloads.html) of the Flink website, |
| and the most recent distribution of Flink ML Python package is available on |
| [PyPI](https://pypi.org/project/apache-flink-ml). |
| |
| # List of Contributors |
| |
| The Apache Flink community would like to thank each one of the contributors that |
| have made this release possible: |
| |
| Zhipeng Zhang, Dong Lin, Fan Hong, JiangXin, Zsombor Chikan, huangxingbo, |
| taosiyuan163, vacaly, weibozhao, yunfengzhou-hub |
| |