| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| # Lab: Filtering a Stream (Ride Cleansing) |
| |
| If you haven't already done so, you'll need to first [setup your Flink development environment](../README.md). See [How to do the Labs](../README.md#how-to-do-the-labs) for an overall introduction to these exercises. |
| |
| The task of the "Taxi Ride Cleansing" exercise is to cleanse a stream of TaxiRide events by removing events that do not start or end in New York City. |
| |
| The `GeoUtils` utility class provides a static method `isInNYC(float lon, float lat)` to check if a location is within the NYC area. |
| |
| ### Input Data |
| |
| This series of exercises is based a stream of `TaxiRide` events, as described in [Using the Taxi Data Streams](../README.md#using-the-taxi-data-streams). |
| |
| ### Expected Output |
| |
| The result of the exercise should be a `DataStream<TaxiRide>` that only contains events of taxi rides which both start and end in the New York City area as defined by `GeoUtils.isInNYC()`. |
| |
| The resulting stream should be printed to standard out. |
| |
| ## Getting Started |
| |
| > :information_source: Rather than following the links to the sources in this section, you'll do better to find these classes in the flink-training project in your IDE. |
| > Both IntelliJ and Eclipse have ways to make it easy to search for and navigate to classes and files. For IntelliJ, see [the help on searching](https://www.jetbrains.com/help/idea/searching-everywhere.html), or simply press the Shift key twice and then continue typing something like `RideCleansing` and then select from the choices that popup. |
| |
| ### Exercise Classes |
| |
| This exercise uses these classes: |
| |
| - Java: [`org.apache.flink.training.exercises.ridecleansing.RideCleansingExercise`](src/main/java/org/apache/flink/training/exercises/ridecleansing/RideCleansingExercise.java) |
| - Scala: [`org.apache.flink.training.exercises.ridecleansing.scala.RideCleansingExercise`](src/main/scala/org/apache/flink/training/exercises/ridecleansing/scala/RideCleansingExercise.scala) |
| |
| ### Tests |
| |
| You will find the test for this exercise in |
| |
| - Java: [`org.apache.flink.training.exercises.ridecleansing.RideCleansingTest`](src/test/java/org/apache/flink/training/exercises/ridecleansing/RideCleansingTest.java) |
| - Scala: [`org.apache.flink.training.exercises.ridecleansing.scala.RideCleansingTest`](src/test/scala/org/apache/flink/training/exercises/ridecleansing/scala/RideCleansingTest.scala) |
| |
| Like most of these exercises, at some point the `RideCleansingExercise` class throws an exception |
| |
| ```java |
| throw new MissingSolutionException(); |
| ``` |
| |
| Once you remove this line, the test will fail until you provide a working solution. You might want to first try something clearly broken, such as |
| |
| ```java |
| return false; |
| ``` |
| |
| in order to verify that the test does indeed fail when you make a mistake, and then work on implementing a proper solution. |
| |
| ## Implementation Hints |
| |
| <details> |
| <summary><strong>Filtering Events</strong></summary> |
| |
| Flink's DataStream API features a `DataStream.filter(FilterFunction)` transformation to filter events from a data stream. The `GeoUtils.isInNYC()` function can be called within a `FilterFunction` to check if a location is in the New York City area. Your filter function should check both the starting and ending locations of each ride. |
| </details> |
| |
| ## Documentation |
| |
| - [DataStream API](https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html) |
| - [Flink JavaDocs](https://ci.apache.org/projects/flink/flink-docs-stable/api/java/) |
| |
| ## Reference Solutions |
| |
| Reference solutions are available in this project: |
| |
| - Java: [`org.apache.flink.training.solutions.ridecleansing.RideCleansingSolution`](src/solution/java/org/apache/flink/training/solutions/ridecleansing/RideCleansingSolution.java) |
| - Scala: [`org.apache.flink.training.solutions.ridecleansing.scala.RideCleansingSolution`](src/solution/scala/org/apache/flink/training/solutions/ridecleansing/scala/RideCleansingSolution.scala) |
| |
| ----- |
| |
| [**Back to Labs Overview**](../LABS-OVERVIEW.md) |