Apex Malhar Changelog

Version 3.5.0 - 2016-08-31

Sub-task

  • [APEXMALHAR-2047] - Create Factory Which Can Easily Create A Single Spillable Data Structure
  • [APEXMALHAR-2048] - Create concrete implementation of ArrayListMultiMap using managed state.
  • [APEXMALHAR-2070] - Create In Memory Implementation of ArrayList Multimap
  • [APEXMALHAR-2202] - Move accumulations to correct package
  • [APEXMALHAR-2208] - High-level API beam examples

Bug

  • [APEXMALHAR-998] - Compilation error while using UniqueValueCount operator.
  • [APEXMALHAR-1988] - CassandraInputOperator fetches less number of records inconsistenly
  • [APEXMALHAR-2103] - Scanner issues in FileSplitterInput class
  • [APEXMALHAR-2104] - BytesFileOutputOperator Refactoring
  • [APEXMALHAR-2112] - Contrib tests are failing because of inclusion of apache logger with geode dependency
  • [APEXMALHAR-2113] - Dag fails validation due to @NotNull on getUpdateCommand() in JdbcPOJOOutputOperator
  • [APEXMALHAR-2119] - Make DirectoryScanner in AbstractFileInputOperator inheritance friendly.
  • [APEXMALHAR-2120] - Fix bugs on KafkaInputOperatorTest and AbstractKafkaInputOperator
  • [APEXMALHAR-2128] - Update twitter4j version to the one support twitter APIs
  • [APEXMALHAR-2134] - Catch NullPointerException if some Kafka partition has no leader broker
  • [APEXMALHAR-2135] - Upgrade Kafka 0.8 input operator to support 0.8.2 client
  • [APEXMALHAR-2136] - Null pointer exception in AbstractManagedStateImpl
  • [APEXMALHAR-2138] - Multiple declaration of org.mockito.mockito-all-1.8.5 in Malhar library pom
  • [APEXMALHAR-2140] - Move ActiveFieldInfo class to com.datatorrent.lib.util
  • [APEXMALHAR-2158] - Duplication of data emitted when the Kafka Input Operator(0.8 version) redeploys
  • [APEXMALHAR-2168] - The setter method for double field is not generated correctly in JdbcPOJOInputOperator.
  • [APEXMALHAR-2169] - KafkaInputoperator: Remove the stuff related to Partition Based on throughput.
  • [APEXMALHAR-2171] - In CacheStore maxCacheSize is not applied
  • [APEXMALHAR-2174] - S3 File Reader reading more data than expected
  • [APEXMALHAR-2195] - LineReaderContext gives incorrect results for files not ending with the newline
  • [APEXMALHAR-2197] - TimeBasedPriorityQueue.removeLRU throws NoSuchElementException
  • [APEXMALHAR-2199] - 0.8 kafka input operator doesn't support chroot zookeeper path (multitenant kafka support)

Documentation

  • [APEXMALHAR-2153] - Add user documentation for Enricher

Improvement

  • [APEXMALHAR-1953] - Add generic (insert, update, delete) support to JDBC Output Operator
  • [APEXMALHAR-1957] - Improve HBasePOJOInputOperator with support for threaded read
  • [APEXMALHAR-1966] - Cassandra output operator improvements
  • [APEXMALHAR-2028] - Add System.err to ConsoleOutputOperator
  • [APEXMALHAR-2045] - Bandwidth control feature
  • [APEXMALHAR-2063] - Integrate WAL to FS WindowDataManager
  • [APEXMALHAR-2069] - FileSplitterInput and TimeBasedDirectoryScanner - move operational fields initialization from constructor to setup
  • [APEXMALHAR-2075] - Support fields of type Date,Time and Timestamp in Pojo Class for JdbcPOJOInputOperator
  • [APEXMALHAR-2087] - Hive output module
  • [APEXMALHAR-2096] - Add blockThreshold parameter to FSInputModule
  • [APEXMALHAR-2105] - Enhance CSV Formatter to take in schema similar to Csv Parser
  • [APEXMALHAR-2111] - Projection Operator config params shall use List instead of comma-separated field names
  • [APEXMALHAR-2121] - KafkaInputOperator emitTuple method should be able to emit more than just message
  • [APEXMALHAR-2148] - Reduce the noise of kafka input operator
  • [APEXMALHAR-2154] - Update kafka 0.9 input operator to use new CheckpointNotificationListener
  • [APEXMALHAR-2156] - JMS Input operator enhancements
  • [APEXMALHAR-2157] - Improvements in JSON Formatter
  • [APEXMALHAR-2172] - Update JDBC poll input operator to fix issues
  • [APEXMALHAR-2180] - KafkaInput Operator partitions has to be unchanged in case of dynamic scaling of ONE_TO_MANY strategy.
  • [APEXMALHAR-2185] - Add a Deduper implementation for Bounded data

New Feature

  • [APEXMALHAR-1701] - Deduper backed by Managed State
  • [APEXMALHAR-2019] - S3 Input Module
  • [APEXMALHAR-2026] - Spill-able Datastructures
  • [APEXMALHAR-2066] - JDBC poller input operator
  • [APEXMALHAR-2082] - Data Filter Operator
  • [APEXMALHAR-2085] - Implement Windowed Operators
  • [APEXMALHAR-2100] - Inner Join Operator using Spillable Datastructures
  • [APEXMALHAR-2116] - File Record reader module
  • [APEXMALHAR-2142] - High-level API window support
  • [APEXMALHAR-2151] - Enricher - Add delimited file format support to FSLoader

Task

  • [APEXMALHAR-2129] - ManagedState: Disable purging based on system time
  • [APEXMALHAR-2200] - Enable checkstyle for demos

Test

  • [APEXMALHAR-2161] - Add tests for AbstractThroughputFileInputOperator

Version 3.4.0 - 2016-05-24

Sub-task

  • [APEXMALHAR-2006] - Stream API Design
  • [APEXMALHAR-2046] - Introduce Spill-able data-structure interfaces
  • [APEXMALHAR-2050] - Move spillable package under state.
  • [APEXMALHAR-2051] - Remove redundant StorageAgent interface Malhar library
  • [APEXMALHAR-2064] - Move WindowDataManager to org.apache.apex.malhar.lib.wal
  • [APEXMALHAR-2065] - Add getWindows() method to WindowDataManager
  • [APEXMALHAR-2095] - Fix checkstyle violations of library module in Apex Malhar

Bug

  • [APEXMALHAR-1970] - ArrayOutOfBoundary error in One_To_Many Partitioner for 0.9 kafka input operator
  • [APEXMALHAR-1973] - InitialOffset bug and duplication caused by offset checkpoint
  • [APEXMALHAR-1984] - Operators that use Kryo directly would throw exception in local mode
  • [APEXMALHAR-1985] - Cassandra Input Oeprator: startRow set incorrectly
  • [APEXMALHAR-1990] - Occasional concurrent modification exceptions from IdempotentStorageManager
  • [APEXMALHAR-1993] - Committed offsets are not present in offset manager storage for kafka input operator
  • [APEXMALHAR-1994] - Operator partitions are reporting offsets for kafka partitions they don't subscribe to
  • [APEXMALHAR-1998] - Kafka unit test memory requirement breaks Travis CI build
  • [APEXMALHAR-2003] - NPE in FileSplitterInput
  • [APEXMALHAR-2004] - TimeBasedDirectoryScanner keep reading same file
  • [APEXMALHAR-2036] - FS operator tests leave stray test files under target
  • [APEXMALHAR-2042] - Managed State - unexpected null value
  • [APEXMALHAR-2052] - Enable checkstyle in parent POM
  • [APEXMALHAR-2060] - Add an entry for org.apache.apex in the log4j.properties
  • [APEXMALHAR-2072] - Cleanup properties of Transform Operator
  • [APEXMALHAR-2073] - Intermittent test failure: ManagedStateImplTest.testFreeWindowTransferRaceCondition
  • [APEXMALHAR-2078] - Potential thread issue in FileSplitterInput class
  • [APEXMALHAR-2079] - FileOutputOperator expireStreamAfterAccessMillis field typo
  • [APEXMALHAR-2080] - File expiration time is set too low by default in AbstractFileOutputOperator.
  • [APEXMALHAR-2081] - Remove FSFileSplitter, BlockReader, HDFSFileSplitter, HDFSInputModule
  • [APEXMALHAR-2088] - Exception while fetching properties for Operators using JdbcStore
  • [APEXMALHAR-2097] - BytesFileOutputOperator class should be marked as public

Improvement

  • [APEXMALHAR-1873] - Create a fault-tolerant/scalable cache component backed by a persistent store
  • [APEXMALHAR-1948] - CassandraStore Should Allow You To Specify Protocol Version.
  • [APEXMALHAR-1961] - Enhancing existing CSV Parser
  • [APEXMALHAR-1962] - Enhancing existing JSON Parser
  • [APEXMALHAR-1980] - Add metrics to Cassandra Input operator
  • [APEXMALHAR-1983] - Support special chars in topics setting for new Kafka Input Operator
  • [APEXMALHAR-1991] - Move Dimensions Computation Classes to org.apache.apex.malhar package and Mark evolving
  • [APEXMALHAR-2018] - HDFS File Input Module: Move generic code to abstract parent class.
  • [APEXMALHAR-2025] - Move FileLineInputOperator out of AbstractFileInputOperator
  • [APEXMALHAR-2031] - Allow Window Data Manager to store data in a user specified directory
  • [APEXMALHAR-2043] - Update checkstyle plugin declaration to use apex-codestyle-config artifact
  • [APEXMALHAR-2056] - Move Serde Interface Under utils and add methods which don't take mutable int
  • [APEXMALHAR-2077] - SingleFileOutputOperator should append partitionId to file name

New Feature

  • [APEXMALHAR-1897] - Large operator state management
  • [APEXMALHAR-1919] - Move Dimensional Schema To Malhar
  • [APEXMALHAR-1920] - Add dimensional JDBC Output Operator
  • [APEXMALHAR-1936] - Apache Nifi Connector
  • [APEXMALHAR-1938] - Operator checkpointing in distributed in-memory store
  • [APEXMALHAR-1942] - Apex Operator for Apache Geode.
  • [APEXMALHAR-1972] - Create Expression Evaluator Support quasi-Java Expression Language
  • [APEXMALHAR-2010] - Transform operator
  • [APEXMALHAR-2011] - POJO to Avro record converter
  • [APEXMALHAR-2012] - Avro Record to POJO converter
  • [APEXMALHAR-2014] - ParquetReader operator
  • [APEXMALHAR-2015] - Projection Operator
  • [APEXMALHAR-2023] - Enrichment Operator

Task

  • [APEXMALHAR-1859] - Integrate checkstyle with Malhar
  • [APEXMALHAR-1968] - Update NOTICE copyright year
  • [APEXMALHAR-1969] - Add idempotency support to 0.9 KafkaInputOperator
  • [APEXMALHAR-1975] - Add group id information to all apex malhar app package
  • [APEXMALHAR-1986] - Change semantic version check to use 3.3 release
  • [APEXMALHAR-2009] - concrete operator for writing to HDFS file
  • [APEXMALHAR-2013] - HDFS output module for file copy
  • [APEXMALHAR-2054] - Make the Query Operator in the App Data Pi Demo embedded in the Snapshot Server
  • [APEXMALHAR-2055] - Add Dimension TOPN support
  • [APEXMALHAR-2058] - Add simple byte[] to byte[] Serde implementation
  • [APEXMALHAR-2067] - Make necessary changes in Malhar for Apex Core 3.4.0
  • [APEXMALHAR-2093] - Remove usages of Idempotent Storage Manager

Version 3.3.1-incubating - 2016-02-27

Bug

  • [APEXMALHAR-1970] - ArrayOutOfBoundary error in One_To_Many Partitioner for 0.9 kafka input operator
  • [APEXMALHAR-1973] - InitialOffset bug and duplication caused by offset checkpoint
  • [APEXMALHAR-1984] - Operators that use Kryo directly would throw exception in local mode
  • [APEXMALHAR-1990] - Occasional concurrent modification exceptions from IdempotentStorageManager
  • [APEXMALHAR-1993] - Committed offsets are not present in offset manager storage for kafka input operator
  • [APEXMALHAR-1994] - Operator partitions are reporting offsets for kafka partitions they don't subscribe to
  • [APEXMALHAR-1998] - Kafka unit test memory requirement breaks Travis CI build
  • [APEXMALHAR-2003] - NPE in FileSplitterInput

Improvement

  • [APEXMALHAR-1983] - Support special chars in topics setting for new Kafka Input Operator

Task

  • [APEXMALHAR-1968] - Update NOTICE copyright year
  • [APEXMALHAR-1986] - Change semantic version check to use 3.3 release

Version 3.3.0-incubating - 2016-01-10

Sub-task

  • [APEXMALHAR-1877] - Move org.apache.hadoop.io.file.tfile from contrib to library in Malhar
  • [APEXMALHAR-1901] - Test- DTFileTest creates test folder under lib directory
  • [APEXMALHAR-1902] - Rename IdempotentStorage Manager
  • [APEXMALHAR-1910] - Fix existing checkstyle violations in BlockReader and FileSplitter
  • [APEXMALHAR-1912] - Fix existing check style violations in FileOutput, JMSInput, FTPInput, JDBC classes
  • [APEXMALHAR-1916] - Add FileAccess API and its DTFileImplementation
  • [APEXMALHAR-1931] - Augment FileAccess API
  • [APEXMALHAR-1941] - Add a default Slice comparator to Malhar/util
  • [APEXMALHAR-1943] - Add Aggregator to Malhar and make it top level interface
  • [APEXMALHAR-1944] - Add DimensionsConversionContext to Malhar and make it top class
  • [APEXMALHAR-1945] - Upgrade the version of japicmp to 0.6.2

Bug

  • [APEXMALHAR-1880] - Incorrect documentation for maxLength property on AbstractFileOutputOperator
  • [APEXMALHAR-1887] - shutdown field in WebSocketInputOperator should be volatile
  • [APEXMALHAR-1894] - Add an Input Port With An isConnected Method
  • [APEXMALHAR-1922] - FileStreamContext - Set filterStream variable to transient
  • [APEXMALHAR-1925] - The kafka offset manager may not store the offset of processed data in all scenarios
  • [APEXMALHAR-1928] - Update checkpointed offsettrack in operator thread instead of consumer thread
  • [APEXMALHAR-1929] - japicmp plugin fails for malhar samples
  • [APEXMALHAR-1934] - When offset is unavailable kafka operator stops reading data
  • [APEXMALHAR-1949] - JDBC Input Operator unnecessarily waits two times when the result is empty
  • [APEXMALHAR-1960] - Test failure KafkaInputOperatorTest.testRecoveryAndIdempotency

Improvement

  • [APEXMALHAR-1895] - Refactor Snapshot Server
  • [APEXMALHAR-1896] - Add Utility Functions For Working With Schema Tags
  • [APEXMALHAR-1906] - Snapshot Server support tags
  • [APEXMALHAR-1908] - Add Deserialization Function That Deserializes keys with multiple values
  • [APEXMALHAR-1913] - FileSplitter - Need access to modifiedTime of ScannedFileInfo class
  • [APEXMALHAR-1918] - FileSplitter - Need stopScanning method in Scanner
  • [APEXMALHAR-1940] - Create Operator Utility Class Which Converts Time To Windows
  • [APEXMALHAR-1958] - Provide access to doneTuple field in AbstractReconciler for derived classes

New Feature

  • [APEXMALHAR-1812] - Support Anti Join
  • [APEXMALHAR-1813] - Support Semi Join
  • [APEXMALHAR-1904] - New Kafka input operator using 0.9.0 consumer APIs

Task

  • [APEXMALHAR-1859] - Integrate checkstyle with Malhar
  • [APEXMALHAR-1892] - Fix missing javadoc
  • [APEXMALHAR-1905] - Test the old kafka input operator is compatible with 0.9.0 broker
  • [APEXMALHAR-1950] - Identify and mark Operators and Components as @Evolving
  • [APEXMALHAR-1956] - Concrete generic Implementation of Kafka Output Operator with auto metrics and batch processing
  • [APEXMALHAR-1964] - Checkstyle - Reduce the severity of line length check

Version 3.2.0-incubating - 2015-11-13

Sub-task

  • [MLHR-1870] - JsonParser unit test failing
  • [MLHR-1872] - Add license headers in unit tests of parsers and formatters
  • [MLHR-1886] - Optimize recovery of files which are not corrupted
  • [MLHR-1889] - AbstractFileOutputOperator should have rename method to do rename operation

Bug

  • [MLHR-1799] - Cassandra Pojo input operator is broken
  • [MLHR-1820] - Fix NPE in SnapshotServer
  • [MLHR-1823] - AbstractFileOutputOperator not finalizing the file after the recovery
  • [MLHR-1825] - AbstractFileOutputOperator throwing FileNotFoundException during the recovery
  • [MLHR-1830] - Fix Backword Compatibility Errors
  • [MLHR-1835] - WebSocketInputOperator Creates More And More Zombie Threads As It Runs
  • [MLHR-1837] - AbstractFileOutputOperator writing to same temp file after the recovery
  • [MLHR-1839] - Configure All The Twitter Demos To Use Embeddable Query
  • [MLHR-1841] - AbstractFileOutputOperator rotation interval not working when there is no processing
  • [MLHR-1852] - File Splitter Test Failing On My Machine
  • [MLHR-1856] - Make Custom Time Buckets Sortable
  • [MLHR-1860] - Check for null fileName in new wordcount app in wrong place
  • [MLHR-1864] - Some Times Expired Queries Are processed
  • [MLHR-1866] - Travis-ci build integration
  • [MLHR-1876] - WindowBoundedService Can Block The Shutdown Of A Container
  • [MLHR-1880] - Incorrect documentation for maxLength property on AbstractFileOutputOperator
  • [MLHR-1885] - Adding getter methods to the variables of KafkaMessage

Task

  • [MLHR-1857] - Apache license headers and related files
  • [MLHR-1869] - Update Maven coordinates for ASF release
  • [MLHR-1871] - Expand checks in CI build
  • [MLHR-1891] - Skip install/deploy of source archives

Improvement

  • [MLHR-1803] - Add Embeddable Query To AppDataSnapshotServer
  • [MLHR-1804] - Enable FileSplitter to be used as a non-input operator
  • [MLHR-1805] - Ability to supply additional file meta information in FileSplitter
  • [MLHR-1806] - Ability to supply additional block meta information in FileSplitter
  • [MLHR-1824] - Convert Pi Demo to support Query Operator
  • [MLHR-1836] - Integrate schema with Jdbc POJO operators
  • [MLHR-1862] - Clean up code for Machine Data Demo
  • [MLHR-1863] - Make Custom Time Bucket Comparable
  • [MLHR-1868] - Improve GPOUtils hashcode function